Loaded URLs * from site lists
OK this has been asked before and I have pieces but I'd like to have a definitive understanding of this (and this may help the next people googling for it).
I have site lists from a project I consider finished, and I keep seeing: Loaded URLs 0/200 from site lists.
From what I understand, it keeps reading the identified lists (I checked that box, so that's fine by me), and the 0 means it has found no new URLs.
Question #1: OK, what about the 200? In a thread I found, you (Sven) said that it takes random URLs. Correct? So you're not reading the whole file but just taking 200 random lines?
Question #2: If it takes random lines, why is it 200 in my case? I saw that some people get different numbers. I probably have too at times, but I didn't pay attention.
Question #3: Is there any way to change that 200 or have the whole file read? (assuming everything above is correct)
Thanks!
Comments
1) Yes
2) It depends on your defined Max. Threads and the currently free threads. It would make no sense to load something like 1000 URLs when you have defined just 1 thread. It would be a waste of memory.
3) You can import your site list directly to the project using right click->import target urls-> from site list
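To make answers 1) and 2) a bit more concrete, here is a rough sketch of how a fixed number of random lines could be drawn from a large list without holding the whole file in memory. This is not SER's actual code: reservoir sampling is just one common way to do it, and the names sample_lines, batch_size and the urls_per_thread multiplier are made up for illustration. The real rule that turns thread counts into the number shown in the log is not stated in this thread.

```python
import random
from typing import Iterable, List


def batch_size(max_threads: int, free_threads: int, urls_per_thread: int) -> int:
    # Hypothetical rule: scale the batch with the threads that can actually work.
    # The multiplier is an invented parameter, not a documented SER setting.
    usable = max(1, min(max_threads, free_threads))
    return usable * urls_per_thread


def sample_lines(lines: Iterable[str], k: int, rng: random.Random) -> List[str]:
    # Reservoir sampling: one pass over the input, at most k lines kept in memory.
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir


if __name__ == "__main__":
    # Stand-in for a site list file; a real list would be read line by line.
    fake_list = (f"http://example.com/page{n}" for n in range(10_000))
    k = batch_size(max_threads=10, free_threads=4, urls_per_thread=5)
    picked = sample_lines(fake_list, k, random.Random())
    print(len(picked), "URLs picked out of the list")
```

With a scheme like this, the second number in the log would simply be how many lines were drawn, and the first would be how many of those were new to the project.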
I see your problem, but if I went through the whole file to check where it has new targets, it would almost kill the program in terms of performance and memory usage. That's not something you want.
Maybe it would be a good idea to add targets from PI directly to selected projects as SER would monitor changes on that and load the file.
no secret... http://pastebin.com/9t6Y0BgZ
A host hash is generated from the domain only, e.g. http://www.blah.com -> hash('blah.com') or http://sub.blah.com/ -> hash('sub.blah.com');
---
A URL hash is computed as hash( lowercase( URL ) );