Scrapebox > Platform Identifier > GSA no engines found
Hello!
I am currently scraping with Scrapebox, using a list of footprints I found online until I can build my own.
Once the list is scraped, I open Platform Identifier and process the file, per engine, directly into GSA SER's identified folder, then I remove the duplicates.
I'm still getting a whole lot of 'no engine matches' in the logs, and my LPM dropped significantly.
I also notice that when I run 'Clean Up' from the tools in GSA SER, a lot of URLs still get removed. It's confusing because I just ran that process in Platform Identifier. So what's different? Do they not use the same footprints? What do I need to add in GSA SER?
What are some tips here? How can I improve my workflow?
Comments
- Create a new project in SER.
- Select the engines you want (and apply any filters).
- Right-click in the "where to submit" field -> "export footprints of all checked engine".
- Enter your niche's keywords in Scrapebox, or use generic A-Z, 0-9.
- Merge them with the exported footprints.
- Use good proxies when scraping Google.
- Opt for the detailed harvester rather than the custom harvester in Scrapebox.
This way, you will primarily scrape target URLs that SER's engines recognize. You can then expand this list in Scrapebox using the "link extractor" addon. That way, you can also make sure your custom footprints are included.
It seems to take longer than the custom harvester, but when you look at the results you'll see that it is far more successful.
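If you ever want to do the merge step outside Scrapebox, here is a minimal Python sketch of the idea. It assumes a merged query is simply a footprint followed by a keyword; the function name `merge_footprints` and the sample footprints are hypothetical, made up for illustration and not taken from SER's export.

```python
from itertools import product


def merge_footprints(footprints, keywords):
    """Pair every footprint with every keyword, skipping blanks and duplicates."""
    seen = set()
    queries = []
    for fp, kw in product(footprints, keywords):
        fp, kw = fp.strip(), kw.strip()
        if not fp or not kw:
            continue  # ignore empty lines from either list
        query = f"{fp} {kw}"  # assumed format: footprint, space, keyword
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


# Hypothetical sample data; in practice these would be the footprints
# exported from SER and your own keyword list.
footprints = ['"Powered by ExampleCMS"', 'inurl:"/guestbook.php"', ""]
keywords = ["gardening", "a"]

queries = merge_footprints(footprints, keywords)
for q in queries:
    print(q)
```

The result is one search query per line, which is the kind of list you would feed to the harvester.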
My goal in using 3rd party footprints is to scrape websites that aren't being scraped by other GSA users. I want a different link profile. @backlinkaddict
It also doesn't explain why GSA Platform Identifier identifies URLs as certain engines while GSA SER can't recognize them. Logically, it's supposed to be cleaning my scraped lists properly. That's why I was wondering if Platform Identifier somehow has more "footprints" (or whatever it uses to spot the engine) than GSA SER. I can't figure this out through Footprint Studio either.
Can anyone advise on that point?
I will however try what you both suggested and see if my results improve.
Thank you for your input!
Can't comment on PI as I am not using it.