
Scrapebox > Platform Identifier > GSA no engines found

Hello!

I am currently scraping with Scrapebox, using a list of footprints I found online until I can build my own.

Once the list is scraped, I load it into Platform Identifier and process the file, per engine, directly into GSA SER's identified folder. Then I remove the duplicates.

I'm still getting a whole lot of 'no engine matches' in the logs, and my LPM has dropped significantly.

I also notice that when I run 'Clean Up' from the tools in GSA SER, a lot of URLs still get removed. It's confusing because I just ran that same process in Platform Identifier. So what's different? Do they not have the same footprints? What do I need to add in GSA SER?

What are some tips here? How can I improve my workflow?

Comments

  • Maybe instead of using Scrapebox, try the built-in scraper in GSA SER, then compare and see what you come up with?

    I made a quick guide on this somewhere in a thread about footprints. I am in the middle of trying to fix my own stuff, so I am no help other than to point you to that.

    Basically, it's "search online for URLs" in the site list tools, I believe, and a familiar "Scrapebox-like" window will pop up.
  • No need to search for footprints online on 3rd party sites. Use SER instead.
    • Create a new project in SER.
    • Select the engines you want (+ apply filters)
    • Right-click in the "where to submit" field -> "export footprints of all checked engine"
      This way, you also ensure to include your custom footprints.
    • Enter your niche's KWs or use generic A - Z, 0 - 9 in Scrapebox.
    • Merge with the footprints exported.
    • Use good proxies when scraping Google.
    • Opt for the detailed harvester rather than the custom harvester in Scrapebox.
      It seems to take longer than the custom harvester, but when you look at the results, you'll see that it is far more successful.
    This way, you will scrape primarily target URLs that will be recognized by SER's engine. You can then expand this list in Scrapebox using the "link extractor" addon.
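
The "merge with the footprints exported" step above can also be done outside Scrapebox. Here is a minimal sketch in Python, assuming a merge simply means pairing every footprint with every keyword as "footprint keyword". The function name `merge_footprints` and the sample footprints are hypothetical and are not taken from SER or Scrapebox.

```python
from itertools import product


def merge_footprints(footprints, keywords):
    """Pair every footprint with every keyword, one search query per pair.

    Assumption: a query is just "<footprint> <keyword>". Blank entries are
    skipped and repeated queries are dropped, keeping first-seen order.
    """
    queries = []
    seen = set()
    for footprint, keyword in product(footprints, keywords):
        footprint, keyword = footprint.strip(), keyword.strip()
        if not footprint or not keyword:
            continue  # ignore empty lines from either list
        query = f"{footprint} {keyword}"
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


if __name__ == "__main__":
    # Hypothetical sample data; real footprints would come from the SER export.
    sample_footprints = ['"Powered by ExampleCMS"', 'inurl:"/guestbook.php"']
    sample_keywords = ["gardening", "a"]
    for line in merge_footprints(sample_footprints, sample_keywords):
        print(line)
```

The resulting lines could be saved to a text file and loaded as the keyword list for harvesting. The list grows as footprints × keywords, so it gets large quickly.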
  • Hello,

    My goal in using 3rd party footprints is to scrape websites that aren't being scraped by other GSA users. I want a different link profile. @backlinkaddict

    It also doesn't explain why GSA Platform Identifier identifies URLs as certain engines while GSA SER can't recognize them. Logically, it's supposed to be cleaning my scraped lists properly. That's why I was wondering if Platform Identifier somehow has more "footprints" (or whatever it uses to spot the engine) than GSA SER. I can't figure this out through Footprint Studio either.

    Can anyone advise on that point?

    I will however try what you both suggested and see if my results improve.

    Thank you for your input!
  • Agreed, you won't see good results, or any at all, without working proxies. You'll miss many targets to proxy errors.

    To clarify...

    The "search online for URLs" feature I mentioned does all of this within SER, finding targets it could possibly post to and keeping them in the correct format.

    All footprints from all engines can be found in, or added to, the GUI there very easily. They can also be filtered by engine and OBL and/or exported.

    They can also be exported as suggested above.

    You can also append your own "predefined keywords" list to the selected engines/footprints (in SER).

    You can use the results there and add them into SER automatically, or export them to Scrapebox if you wish. There are many ways.

    Platform Identifier probably does better for this if you need to identify targets from SER's standpoint at a higher scale?

    Keep in mind that if you grab footprints online and use the default ones, you may keep getting the same results as other users running the default settings/footprints.

    Maybe try sourcing a list of URLs for a certain engine elsewhere (ones not found with the current SER footprints), then adding that list to Footprint Studio.

    Then scan it and build different custom footprints. You can use this newly sourced list on its own or append it to another one. Maybe you will find some new footprints and can test combinations that bring more of the engines you are after into the results.

    In my last test I used more "simple" footprints: for example, 1 or 2 with a high percentage and 1 lesser-used one (most likely found on newer, updated versions of the specific CMS/engine).

    This worked for me recently to get more rolling in, and I was thinking about looking into another engine I want to improve, so I will hopefully be testing that soon.

    Also, while footprints work to parse and find potential targets, the "script" is what does the work, so sometimes it must be adjusted for success.

    You can double-click on an engine in the pane and see the script, how it's identifying, and the [Register] [Steps].

    You may find that your changes aren't saved this way. In that case you would have to pull out the config .ini file, edit it, and then paste it back over in the right place with the right "permissions".
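
The "simple" footprint mix described in this comment (1 or 2 high-percentage footprints plus 1 lesser-used one) can be generated mechanically for testing. This is only a sketch under assumptions: the function name `combine_footprints`, the space-joined query format, and the sample strings are all hypothetical, and which footprints count as high-percentage would have to come from your own Footprint Studio scan.

```python
from itertools import combinations


def combine_footprints(common, rare, n_common=2):
    """Build one query per (group of n_common frequent footprints, one rarer footprint).

    Assumption: parts are joined with spaces, so a search engine would have
    to match all of them on the same page.
    """
    queries = []
    for group in combinations(common, n_common):
        for extra in rare:
            queries.append(" ".join(group + (extra,)))
    return queries


if __name__ == "__main__":
    # Hypothetical footprints, for illustration only.
    frequent = ['"Powered by ExampleCMS"', '"Leave a reply"', '"Register"']
    less_used = ['"ExampleCMS 5.2"']
    for query in combine_footprints(frequent, less_used, n_common=2):
        print(query)
```

Setting `n_common=1` gives the looser "1 common + 1 rare" variant. Stricter combinations usually return fewer results, so it is worth comparing both.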
  • rosath said:

    My goal in using 3rd party footprints is to scrape websites that aren't being scraped by other GSA users. I want a different link profile.
    That's perfectly fine, but you need to ensure that SER knows how to identify and act on these targets. Hence the suggestion to export footprints from SER.

    Can't comment on PI as I am not using it.

  • I think I understood and answered the part about getting different results than other users on default settings, but if there is something else, let me know. I did expand on the topic a bit in my recent post, since we posted at the same time lol