
Try to locate new URL on "no engine match" - does this feature have a depth limit?

OK, let's say I have imported 100 root URLs into my project,

with no search engines etc. checked,

and the "try to locate new URL" feature ticked.

Now let's say 50 of these root URLs are engine matched, so no new URLs are extracted from them.

For the remaining 50 (depth 0), from my understanding GSA extracts the URLs of the crawled pages,

and let's say it extracts 500 URLs as depth 1.

Now, does it continue to extract at depth 2 and beyond,

or does it stop at depth 1?

@sven

Thank you very much for the answer.

Comments

  • SvenSven www.GSA-Online.de
    It usually means that it will try to locate a new blog entry or download the root URL to see if there is something useful.
  • @Sven So that means it will not continue to crawl the entire website, right?

    Because if it continued to crawl the links extracted from the root URL, it would keep going until it had crawled the entire website.
  • SvenSven www.GSA-Online.de
    No, that's not what will happen.
  • Sorry, can you just confirm: if you trim a list of URLs to root, it won't crawl the whole website to look for targeted inner URLs to post on?

    I don't trim to root with SB/GS, but I've always wondered if I should delete duplicate domains from a raw list.
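
    (In case it helps, by "trim to root" I just mean keeping the scheme and host of each URL; a quick Python illustration, where trim_to_root is my own made-up helper name:)

        from urllib.parse import urlsplit

        def trim_to_root(url):
            """Keep only scheme + host, i.e. the root of the site."""
            parts = urlsplit(url)
            return f"{parts.scheme}://{parts.netloc}/"

        print(trim_to_root("https://example.com/blog/2021/05/post/?p=7"))
        # -> https://example.com/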
  • SvenSven www.GSA-Online.de
    "it will not crawl the whole page!"
  • @Sven I don't understand why you are giving so little info about this.

    It must have a simple explanation,

    like this (see my rough sketch below):
    1: crawl the root URL
    2: get its source
    3: extract the URLs found in the source of the root URL
    4: crawl them, but do not get new URLs from those crawled leaf URLs
    end
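
    In Python terms, something like this (just a sketch of what I mean, not GSA's actual code; crawl_depth_one and HREF_RE are my own made-up names):

        import re
        import urllib.request
        from urllib.parse import urljoin

        HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)

        def crawl_depth_one(root_url):
            # 1: crawl the root URL / 2: get its source
            with urllib.request.urlopen(root_url, timeout=10) as resp:
                source = resp.read().decode("utf-8", errors="replace")
            # 3: extract the URLs found in the source of the root URL
            leaf_urls = [urljoin(root_url, h) for h in HREF_RE.findall(source)]
            # 4: crawl the leaf URLs, but do NOT extract new URLs from them
            for url in leaf_urls:
                try:
                    with urllib.request.urlopen(url, timeout=10):
                        pass  # leaf page would be processed here; no deeper extraction
                except OSError:
                    continue  # skip unreachable leaves
            return leaf_urls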


  • SvenSven www.GSA-Online.de

    Sorry, but I don't understand what the problem here is.

    With this option turned on:

    1. On a 404 error it will go to the root URL (if a deep link was used as the starting URL) and try again from there to identify the engine.

    2. On a "no engine match" it will try to find a deeper link with something like a date in the URL or a ?p=<number>, to locate a blog entry to post to (in case Blog Comments are used).

    3. It will also try to locate iframes to post to.
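
    (A rough illustration of the heuristic in point 2; just a sketch of the idea in Python, not SER's actual code, and the exact patterns are my assumption:)

        import re

        DATE_RE = re.compile(r'/(19|20)\d{2}/\d{1,2}(/\d{1,2})?/')  # e.g. /2021/05/07/
        POST_ID_RE = re.compile(r'[?&]p=\d+')                       # e.g. ?p=1234

        def locate_blog_entry(extracted_urls):
            """Return the first link that looks like a blog entry, or None."""
            for url in extracted_urls:
                if DATE_RE.search(url) or POST_ID_RE.search(url):
                    return url
            return None

        print(locate_blog_entry([
            "https://example.com/about",
            "https://example.com/2021/05/07/hello-world/",
        ]))  # -> https://example.com/2021/05/07/hello-world/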

  • In that case, if a blog does not use such structured URLs for its posts, GSA would fail, right?

    It's a good approach, however.

    I think an option to set the depth of the inner crawl would be great.


  • SvenSven www.GSA-Online.de

    >In that case, if a blog does not use such structured URLs for its posts, GSA would fail, right?

    Yes.
