
How deep does GSA spider a site looking for a postable URL?

I'm trying to figure out the most efficient way to add newly scraped potential link targets to GSA.

If I add a single random inner URL which itself doesn't directly offer a way to place a link...

Or...do I take a whole big list of sites, trim them to the root, remove all duplicate domains in ScrapeBox, and just upload each site's root homepage to GSA (see the sketch at the end of this post)?

Does GSA crawl inner pages, starting from either the inner page or the root, looking for the right page to get a link from? Or do I need to extract all the links from each site and upload them all?

I plan on running all my scrapes through the platform identifier first, so maybe that is already sorting this out for me?

I'd guess most of the important contextual-style link opportunity sites have the registration/login links on every page? Do I need to adjust anything for potential comment/guestbook-type pages/links?

Thanks 
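
For reference, the trim-to-root-and-dedupe step described above boils down to something like this (a minimal Python sketch of what ScrapeBox is doing, nothing GSA-specific; the URLs are made up):

    from urllib.parse import urlparse

    def dedupe_to_roots(urls):
        """Trim each scraped URL to its root homepage, one entry per domain."""
        seen = set()
        roots = []
        for url in urls:
            parsed = urlparse(url)
            domain = parsed.netloc.lower()
            if domain and domain not in seen:
                seen.add(domain)
                roots.append(f"{parsed.scheme}://{domain}/")
        return roots

    scraped = [
        "https://example.com/blog/post-42?ref=feed",
        "https://example.com/about",
        "http://forum.example.org/thread/9",
    ]
    print(dedupe_to_roots(scraped))
    # ['https://example.com/', 'http://forum.example.org/']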

Comments

  • Sven www.GSA-Online.de
    edited October 2021
    SER downloads the URL and tries to detect which engine the site is based on. It usually doesn't matter which deep link it is.
    If nothing matches and the option to find alternative links is enabled, it goes to the root URL and tries to locate a deep link. That only makes sense if you use blog comments or similar engines.
    No further scraping is done from that point.
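    In rough Python, the described flow looks like the following (the fingerprints and helper names are invented for illustration; SER's real engine definitions live in separate engine files and match far more than one string):

        import re
        import urllib.request
        from urllib.parse import urljoin, urlparse

        # Invented fingerprints, for illustration only.
        ENGINE_FINGERPRINTS = {
            "blog_comment": 'id="commentform"',
            "mediawiki": 'content="MediaWiki',
        }

        def fetch(url):
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8", errors="replace")

        def detect_engine(html):
            page = html.lower()
            for engine, signature in ENGINE_FINGERPRINTS.items():
                if signature.lower() in page:
                    return engine
            return None

        def identify_target(url, locate_alternative=True):
            # Step 1: test the submitted URL itself, whatever its depth.
            engine = detect_engine(fetch(url))
            if engine:
                return url, engine
            if not locate_alternative:
                return None, None
            # Step 2: fall back to the root page and test it too.
            root = "{0.scheme}://{0.netloc}/".format(urlparse(url))
            root_html = fetch(root)
            engine = detect_engine(root_html)
            if engine:
                return root, engine
            # Step 3: scan the root page's links for one usable deep link,
            # e.g. a blog post that accepts comments. This is the last step;
            # nothing is spidered beyond it.
            for href in re.findall(r'href="([^"#]+)"', root_html):
                candidate = urljoin(root, href)
                engine = detect_engine(fetch(candidate))
                if engine:
                    return candidate, engine
            return None, None

    The point: at most the submitted page and the root get scanned; there is no deeper crawl.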
  • googlealchemist Anywhere I want
    "option is enabled to find alternative links"
    Are you referring to the project options "try to locate new url on "no engine match" (useful for some engines)"

    I couldnt find any global option for this, do I need to tick that in each project?

    Thanks
  • Sven www.GSA-Online.de
    Yes, that's the option I meant.
  • googlealchemist Anywhere I want
    Sven said:
    SER downloads the URL and tries to detect which engine the site is based on. It usually doesn't matter which deep link it is.
    If nothing matches and the option to find alternative links is enabled, it goes to the root URL and tries to locate a deep link. That only makes sense if you use blog comments or similar engines.
    No further scraping is done from that point.
    Does it not matter for the main contextual blog/wiki/etc. type sites, since the registration/login links would be on every page and not just the homepage? Or at least in the source code of every page, however it works?
  • Sven www.GSA-Online.de
    Once the engine is detected, SER will usually know what to open to register/login/submit.
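    To illustrate the idea (with made-up data; SER's real engine files encode this differently and in much more detail): knowing the engine means knowing the relative paths that matter, so the depth of the originally submitted URL is irrelevant:

        from urllib.parse import urljoin

        # Hypothetical per-engine action paths. MediaWiki's special pages
        # really do live at these addresses; the mapping itself is invented.
        ENGINE_ACTIONS = {
            "mediawiki": {
                "register": "index.php?title=Special:CreateAccount",
                "login": "index.php?title=Special:UserLogin",
            },
        }

        def action_url(root, engine, action):
            return urljoin(root, ENGINE_ACTIONS[engine][action])

        print(action_url("https://wiki.example.org/", "mediawiki", "register"))
        # https://wiki.example.org/index.php?title=Special:CreateAccount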