Importing Target URLs Slows LPM Badly
Hello,
Basically I am scraping for URLs using Hrefer with the GSA footprints. I sort through them: remove duplicates, PR check by domain, etc. But I am still left with millions of URLs, resulting in TXT files that are hundreds of megabytes in size. These take too long to process via Advanced > Tools > Import URLs (identify platforms and sort in) > From File. The last time I did it, a 300 MB file took 4 days using 40 dedicated private proxies from BuyProxies (tested and speed-checked, under 1 second response time), and that was with the sort-in running exclusively; GSA was not running any projects while the sort-in was in progress.
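For reference, the cleanup step described above can be sketched along these lines (a minimal Python sketch; the file names are placeholders, not anything GSA- or Hrefer-specific). It streams the scrape file once, drops exact duplicate URLs, and writes the unique hosts to a second file so a PR-by-domain check only has to look each domain up once:

```python
# Minimal sketch of the cleanup step: stream a large scrape file, drop
# exact duplicate URLs, and collect one entry per host for the PR check.
from urllib.parse import urlparse

def dedupe_urls(in_path: str, url_out: str, domain_out: str) -> None:
    seen_urls, seen_hosts = set(), set()
    with open(in_path, encoding="utf-8", errors="ignore") as src, \
         open(url_out, "w", encoding="utf-8") as urls, \
         open(domain_out, "w", encoding="utf-8") as hosts:
        for line in src:
            url = line.strip()
            if not url or url in seen_urls:
                continue  # skip blank lines and duplicate URLs
            seen_urls.add(url)
            urls.write(url + "\n")
            host = urlparse(url).netloc.lower()
            if host and host not in seen_hosts:
                seen_hosts.add(host)
                hosts.write(host + "\n")

dedupe_urls("scrape_raw.txt", "urls_deduped.txt", "domains_for_pr_check.txt")
```

Streaming line by line keeps memory bounded to the two sets, which matters when the raw files run to hundreds of megabytes.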
Since sorting through my lists that way takes so long, I thought I'd import them at the project level: right-click the project and choose Import Target URLs > From File. When I did this, I noticed it killed my LPM, which went from around 90 to about 12. I think this is because it's processing those URLs in the background.
I feel a bit lost here. How can I use these scraped links without making GSA horribly slow?
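Given the LPM collapse described above, one hedged workaround (an assumption, not a documented GSA feature) is to split the big list into smaller chunk files and import them one at a time, so a project never has millions of unprocessed URLs queued in the background. A minimal Python sketch, with the chunk size and file names as placeholders:

```python
# Hypothetical workaround sketch: split a huge target list into smaller
# numbered chunk files that can be imported into a project one at a time
# instead of queuing millions of URLs at once. Chunk size is a guess.
def split_list(in_path: str, lines_per_chunk: int = 50_000) -> None:
    chunk, part = None, 0
    with open(in_path, encoding="utf-8", errors="ignore") as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                if chunk:
                    chunk.close()
                part += 1
                # Start a new chunk file, e.g. targets_part_001.txt
                chunk = open(f"targets_part_{part:03d}.txt", "w",
                             encoding="utf-8")
            chunk.write(line)
    if chunk:
        chunk.close()

split_list("urls_deduped.txt")
```

Feeding in, say, 50k targets at a time would at least make it easy to see whether LPM recovers between imports.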
Comments
You are not experiencing anything new. Scraped lists will yield an extremely low success rate, especially if you aren't using some more advanced tactics.
If you just run the list, it is going to sort it anyway.
Advanced tactics would entail footprint lists and using some more in-depth processes to evaluate what others are doing and capitalize on it.
Sorry that I am not more forthcoming about the advanced tactics portion.
Okay, yeah, I'm already doing that advanced stuff. I thought you were referring to something else.
In this thread, https://forum.gsa-online.de/discussion/838/independent-tool-for-importing-and-sorting-urls, mmtj mentioned:
"Now we identify the scrape in GSA (no proxies, re-download 1x) - we have a good dedi. with a superb line (poweruphosting) and some blazing fast VPS on a private cloud server and we can identify the 3mill. in one night easily."
I tested my poweruphosting VPS, and I can only identify 0.2 million in one night. That pains me, but I really don't know any better way to tackle the same problem.
But at least it's an idea for you, and I hope someone can come up with another way to solve the problem.