How do I keep the crawl depth very shallow?
The spider is trying to parse all kinds of files (includes, CSS, etc.).
I've tried setting the level to 1, but that doesn't seem to work:
it is still parsing hello/anotherlevel/more/here etc.
Do I need to restart the program, or is there a specific setting to parse only top-level HTML files?
Thanks!
Comments
1. Limit parsing to one level deep only. This would skip all unrelated sites, since the contact page is usually linked from every page and should be reachable with a single click, i.e. one level.
2. Watch the URL queue for a while and, if needed, add the unwanted URLs to the filter with a wildcard pattern such as *badjavapart* .
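
For what it's worth, the two suggestions above can be sketched in Python roughly as follows. This is only an illustration of the logic, not the program's actual code or settings: `should_crawl`, `MAX_DEPTH`, `BLOCKED_PATTERNS` and `SKIPPED_EXTENSIONS` are made-up names, the extension list is an assumption, and depth is counted as the number of clicks from the start page (start page = 0).

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical settings, named here only for illustration.
MAX_DEPTH = 1                              # links found on the start page
BLOCKED_PATTERNS = ["*badjavapart*"]       # wildcard filters from suggestion 2
SKIPPED_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def should_crawl(url, depth):
    """Return True if a URL found at the given click depth should be parsed."""
    # Suggestion 1: never go deeper than one click from the start page.
    if depth > MAX_DEPTH:
        return False
    # Skip stylesheets, scripts and images based on the URL path.
    path = urlparse(url).path.lower()
    if path.endswith(SKIPPED_EXTENSIONS):
        return False
    # Suggestion 2: drop anything matching a blocked wildcard pattern.
    return not any(fnmatchcase(url, pattern) for pattern in BLOCKED_PATTERNS)
```

With these assumed settings, a contact page linked from the start page passes, while CSS files, anything two clicks away, and URLs containing "badjavapart" are dropped before they reach the queue.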