I really hope someone is able to enlighten me about this mystery.
Here is the problem:
I ran a good scrape with multiple footprints for .edu and .gov merged with 100K keywords.
The result was more than 250,000 URLs, found in about 1-2 hours with 70 threads using private proxies.
After trimming those for duplicates and pointless extensions like .pdf, .png, etc., I ran them through GSA SER.
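The trimming step can be sketched roughly like this (a minimal Python sketch; the extension list, function name, and example URLs are illustrative, not the exact tool or list I used):

```python
# Hypothetical sketch: dedupe a scraped URL list and drop static-file
# extensions before importing into GSA SER. The extension set below is
# an illustrative assumption, not an exhaustive or official list.
from urllib.parse import urlparse

SKIP_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".gif", ".doc", ".zip"}

def trim_urls(urls):
    """Return unique URLs, skipping ones that point at static files."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        # Check only the path part, so query strings don't hide extensions
        path = urlparse(url).path.lower()
        if any(path.endswith(ext) for ext in SKIP_EXTENSIONS):
            continue
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept

urls = [
    "https://example.edu/guestbook.php",
    "https://example.edu/guestbook.php",  # exact duplicate, dropped
    "https://example.gov/report.pdf",     # document, dropped
]
print(trim_urls(urls))  # ['https://example.edu/guestbook.php']
```

Note this only removes exact duplicate URLs; it does not dedupe by domain, which matters later when GSA SER reports "Already parsed".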
And here is the funny part... more than half of those URLs come back as "Already parsed" and "No Engine".
The result is really bad; frankly, there's nothing to show for it.
I also tried extracting the footprints internally in GSA SER and then combining those with the 100K keywords...
Same result: NOTHING!
How is that possible?
Why are there no positives, i.e. verified links?
Why does it say "Already parsed" when I'm 100% sure both the footprints and the keywords are unique?
Damn, I'm really frustrated about these endless scraping runs with zero results, regardless of settings, footprints, methods, etc.
How on earth do you guys get success?
Anyone here up for running my scraped list through to see why I get nothing out of it?