Scrapebox > Platform Identifier > GSA no engines found

January 2024

Hello!

I am currently scraping with Scrapebox, using a list of footprints I found online until I can build my own.

Once the list is scraped, I open Platform Identifier and process the file, per engine, directly into GSA SER's identified folder, then I remove the duplicates.

I'm still getting a lot of 'no engine matches' entries in the logs, and my LPM has dropped significantly.

I also notice that when I run 'Clean Up' from the tools in GSA SER, a lot of URLs still get removed. That's confusing, because I had just run the same list through Platform Identifier. So what's different? Do the two not use the same footprints? What do I need to add in GSA SER?
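
For reference, the duplicate-removal step can be sketched outside the tools as well. This is only a minimal Python illustration under my own assumptions, not how Platform Identifier or GSA SER do it internally: the function names `dedupe_urls` and `dedupe_domains` are made up, and treating a trailing slash and a leading "www." as insignificant is a simplification.

```python
from urllib.parse import urlsplit


def dedupe_urls(urls):
    """Drop duplicate URLs, keeping first-seen order.

    Assumption: two URLs that differ only by a trailing slash are the same.
    """
    seen = set()
    unique = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines
        key = url.rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


def dedupe_domains(urls):
    """Keep only the first URL seen for each host.

    Assumption: "www.example.com" and "example.com" count as one host.
    URLs without a scheme have no parsable host and are dropped.
    """
    seen = set()
    unique = []
    for url in dedupe_urls(urls):
        host = urlsplit(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            unique.append(url)
    return unique
```

Calling `dedupe_urls` keeps one entry per page, while `dedupe_domains` keeps one entry per site, which is the stricter of the two.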

What are some tips here? How can I improve my workflow?

January 2024

No need to search for footprints online on 3rd party sites. Use SER instead.

Create a new project in SER.
Select the engines you want (+ apply filters).
Right-click in the "where to submit" field -> "export footprints of all checked engine"
This way, you also make sure your custom footprints are included.
Enter your niche's KWs or use generic A - Z, 0 - 9 in Scrapebox.
Merge them with the exported footprints.
Use good proxies when scraping Google.
Opt for the detailed harvester rather than the custom harvester in Scrapebox.
It seems to take longer than the custom harvester, but when you look at the results, you'll see that it is far more successful.

This way, you will primarily scrape target URLs that SER's engines will recognize. You can then expand this list in Scrapebox using the "link extractor" addon.
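
If it helps to see what the merge step produces, here is a minimal Python sketch of combining footprints with keywords into search queries. It is an illustration only, not Scrapebox's own merge code; the function name `merge_footprints` and the sample footprints in the usage note are hypothetical.

```python
from itertools import product


def merge_footprints(footprints, keywords):
    """Pair every footprint with every keyword, one query per pair.

    Blank entries are skipped and duplicate queries are dropped,
    keeping first-seen order.
    """
    queries = []
    seen = set()
    for footprint, keyword in product(footprints, keywords):
        footprint, keyword = footprint.strip(), keyword.strip()
        if not footprint or not keyword:
            continue
        query = f"{footprint} {keyword}"
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries
```

For example, `merge_footprints(['"Powered by Example"', 'inurl:guestbook'], ["dogs", "cats"])` yields four queries, one for each footprint and keyword pair. With an A - Z, 0 - 9 keyword list, the query count is the number of footprints times 36, so the list grows quickly.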

January 2024

organiccastle said:

No need to search for footprints online on 3rd party sites. Use SER instead.
Create a new project in SER.
Select the engines you want (+ apply filters).
Right-click in the "where to submit" field -> "export footprints of all checked engine"
This way, you also make sure your custom footprints are included.
Enter your niche's KWs or use generic A - Z, 0 - 9 in Scrapebox.
Merge them with the exported footprints.
Use good proxies when scraping Google.
Opt for the detailed harvester rather than the custom harvester in Scrapebox.
It seems to take longer than the custom harvester, but when you look at the results, you'll see that it is far more successful.
This way, you will primarily scrape target URLs that SER's engines will recognize. You can then expand this list in Scrapebox using the "link extractor" addon.

Hello,

My goal in using 3rd party footprints is to scrape websites that aren't being scraped by other GSA users. I want a different link profile. @backlinkaddict

It also doesn't explain why GSA Platform Identifier identifies URLs as certain engines while GSA SER can't recognize them. Logically, it should be cleaning my scraped lists properly. That's why I was wondering whether Platform Identifier somehow has more "footprints" (or whatever it uses to spot the engine) than GSA SER. I can't figure this out through Footprint Studio either.

Can anyone advise on that point?

I will, however, try what you both suggested and see if my results improve.

Thank you for your input!

January 2024

rosath said:

My goal in using 3rd party footprints is to scrape websites that aren't being scraped by other GSA users. I want a different link profile.

That's perfectly fine, but you need to ensure that SER knows how to identify and act on these targets. Hence the suggestion to export footprints from SER.

I can't comment on PI as I am not using it.

October 2024

Hi, what do you mean by "You can then expand this list in Scrapebox using the "link extractor" addon."?

Does it mean you are looking for external links on pages SB already found, hoping they will be on the same platforms? Thanks

October 2024

remirom said:

Hi, what do you mean by "You can then expand this list in Scrapebox using the "link extractor" addon."?
Does it mean you are looking for external links on pages SB already found, hoping they will be on the same platforms? Thanks

Both internal and external links. Some people like to post again on the same site.
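
To make the internal/external distinction concrete, here is a minimal Python sketch that pulls links out of one page's HTML and splits them by host. It only illustrates the idea and is not how the Scrapebox addon works internally; `LinkCollector` and `extract_links` are hypothetical names, and comparing hosts by exact match is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, page_url):
    """Return (internal, external) lists of absolute http(s) links."""
    parser = LinkCollector()
    parser.feed(html)
    page_host = urlsplit(page_url).netloc.lower()
    internal, external = [], []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.netloc.lower() == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

Internal links give more pages on a site that was already identified, while external links may lead to new sites that still need to be run through identification.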
