Try to locate new URL on "no engine match" - does this feature have a depth limit?

yavuz · March 2015

OK, let's say I have imported 100 root URLs into my project,

with no search engines etc. checked.

I have also ticked the "try to locate new URL" feature.

Now let's say 50 of these root URLs are engine matched, so no new URLs are extracted from them.

For the remaining 50 (depth 0), from my understanding GSA extracts URLs from the crawled pages,

and let's say it extracts 500 URLs as depth 1.

Does it then continue extracting for depth 2 and so on?

Or does it stop at depth 1?

@sven

Thank you very much for the answer.

Sven · March 2015

It usually means that it will try to locate a new blog entry or download the root url to see if there is something useful.

yavuz · March 2015

@Sven so does that mean it will not continue to crawl the entire website?

Because if it kept crawling the links extracted from the root URL, it would continue until it had crawled the entire website.

Sven · March 2015

No, that's not what will happen.

JudderMan · March 2015

Sorry, can you just confirm: if you trim a list of URLs to root, it won't crawl the whole website to look for targeted inner URLs to post on?

I don't trim to root with SB/GS but always wondered if I should delete duplicate domains from a raw list.

Sven · March 2015

"it will not crawl the whole page!"

yavuz · March 2015

@Sven I don't understand why you are giving so little info about this.

It must have a simple explanation,

like:
1: crawl the root URL
2: get its source
3: extract the URLs found in the source of the root URL
4: crawl them, but do not get new URLs from those crawled leaf URLs
end

Sven · March 2015

Sorry, but I don't understand what the problem is here.

Turning this option on means:

1. On a 404 error it will go to the root URL (if a deep link was used as the starting URL) and try again from there to identify the engine

2. On a "no engine match" it will try to find a deeper link with something like a date in the URL or a ?p=<number> to locate a blog entry to post to (in case Blog Comments are used)

3. It will also try to locate iframes to post to it
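
To make points 1 and 2 concrete, here is a rough Python sketch of that kind of URL check. It is only an illustration and not GSA's actual code: the function names and the date pattern are assumptions made up for this example, and the real matching rules may well differ.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed pattern: a /YYYY/MM/ style path segment, e.g. /2015/03/ or /2015/3/14/
DATE_IN_PATH = re.compile(r"/(19|20)\d{2}/(0?[1-9]|1[0-2])(/|$)")


def root_url(url):
    """Reduce a deep link to its root URL (the fallback used after a 404)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def looks_like_blog_entry(url):
    """True if the URL has a date-like path or a numeric ?p= parameter."""
    parts = urlsplit(url)
    if DATE_IN_PATH.search(parts.path):
        return True
    p_values = parse_qs(parts.query).get("p", [])
    return any(value.isdigit() for value in p_values)


def pick_candidates(links):
    """Keep only the links that look like blog entries, in their original order."""
    return [link for link in links if looks_like_blog_entry(link)]
```

With these assumptions, a link such as `http://example.com/2015/03/my-post/` or `http://example.com/?p=123` would be kept as a candidate, while `http://example.com/about` would be skipped.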

yavuz · March 2015

In that case, if a blog does not use such structured URLs for its posts, GSA would fail, right?

It's a good approach, however.

I think an option to set the depth of the inner crawl would be great.
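
To show what I mean by a depth option, here is a small Python sketch of a depth-limited crawl. It is purely hypothetical and not an existing GSA setting as far as this thread shows: `crawl`, `get_links` and `max_depth` are names invented for the example, and `get_links` stands in for whatever downloads a page and extracts its links.

```python
from collections import deque


def crawl(start_urls, get_links, max_depth):
    """Breadth-first crawl that stops expanding links beyond max_depth.

    Returns a dict mapping each depth to the URLs first reached at that depth.
    """
    start = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    seen = set(start)
    queue = deque((url, 0) for url in start)
    by_depth = {}
    while queue:
        url, depth = queue.popleft()
        by_depth.setdefault(depth, []).append(url)
        if depth >= max_depth:
            continue  # at the limit: visit the page but do not follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return by_depth


# Toy link graph standing in for a real website (no network access needed).
site = {"root": ["a", "b"], "a": ["c"], "b": [], "c": ["d"]}
result = crawl(["root"], lambda url: site.get(url, []), max_depth=1)
```

With `max_depth=1` only the root and the links found directly on it are visited, which is the "stop at depth 1" case from my first post. Raising the limit would let the crawl go one level deeper each time.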

Sven · March 2015

>In that case, if a blog does not use such structured URLs for its posts, GSA would fail, right?

yes