Importing Target URLs Slows the LPM Really Bad

edited February 2014 in Need Help
Hello,

Basically, I am scraping for URLs using Hrefer with the GSA footprints. I sort through them (remove duplicates, PR check by domain, etc.), but I am still left with millions of URLs in TXT files that are hundreds of megabytes in size. These take far too long to process via Advanced > Tools > Import URLs (identify platforms and sort in) > From File. The last time I did it, a 300 MB file took 4 days using 40 dedicated private proxies from BuyProxies (tested and speed-checked, less than 1 second response time), and that was with the sort-in running exclusively; GSA was not running any projects while the sort-in was in progress.
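For reference, the duplicate-removal-by-domain step can also be done with a small script before the file ever reaches GSA, which shrinks what it has to chew through. A minimal Python sketch, assuming one URL per line; the file names are placeholders, not anything GSA expects:

    # Keep only the first URL seen per domain in a large scraped list,
    # so the file shrinks before importing. Assumes one URL per line.
    from urllib.parse import urlparse

    seen = set()
    with open("scraped_urls.txt", encoding="utf-8", errors="ignore") as src, \
         open("deduped_urls.txt", "w", encoding="utf-8") as dst:
        for line in src:
            url = line.strip()
            if not url:
                continue
            domain = urlparse(url).netloc.lower()
            # Skip URLs without a parseable domain or on a domain we've kept already.
            if domain and domain not in seen:
                seen.add(domain)
                dst.write(url + "\n")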

Since it's taking such a long time to sort through my lists this way, I thought I'd import them at the project level instead: right-clicking the project and choosing Import Target URLs > From File. When I did this, I noticed it killed my LPM: it went from around 90 LPM to about 12 LPM. I think this is because it's processing those URLs in the background.

I feel a bit lost here. How can I use these scraped links without making GSA horribly slow?

Comments

  • Skip the sort process; just run the lists.
    You are not experiencing anything new. Scraped lists will yield an extremely low success rate, especially if you aren't using some more advanced tactics.
  • Thanks for your answer, krushinem.

    What do you mean by "skip the sort process; just run the lists"? Also, what more advanced tactics can I employ?


  • The sorting process is redundant; if you just run the list, it is going to sort as it runs anyway.
    Advanced tactics would entail footprint lists and some more in-depth processes to evaluate what others are doing and capitalize on it.
    Sorry that I am not more revealing on the advanced-tactics portion.
  • Sorry, I'm not sure what you mean by "run the list". Do you mean right-clicking on the project and choosing Import Target URLs > From File?

    Okay, yeah, I'm already doing that advanced stuff. I thought you were referring to something else.
  • Brandon (Reputation Management Pro)
    You should expect a low rate; you're grabbing random URLs from all over the internet, and only a small percentage of harvested sites will ever work. Once you build up that Verified folder, you can run it exclusively if you want a higher LPM. With a clean verified list I can sustain 300 LPM; importing scraped URLs gives me 10-50 LPM.
  • So there is no way around it? Either import using the sort-in process or import at the project level, and either way GSA will run slowly?


  • The following thread may help you:
    https://forum.gsa-online.de/discussion/838/independent-tool-for-importing-and-sorting-urls
    In it, mmtj mentioned:
    "Now we identify the scrape in GSA (no proxies, re-download 1x) - we have a good dedi. with a superb line (poweruphosting) and some blazing fast VPS on a private cloud server and we can identify the 3mill. in one night easily."
    I tested my poweruphosting VPS, and I can only identify 0.2 million in one night. That's painful, but I really don't know a better way to tackle the same problem.
    At least it's an idea for you, and hopefully someone can come up with another way to solve it. (One workaround is sketched below.)
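    One workaround that might make the identify step more tractable on a slower VPS: split the big file into fixed-size chunks and feed them in one batch at a time, so a single slow run doesn't tie up the whole list. A rough Python sketch; the chunk size and file names are just placeholders:

        # Split a huge URL list into fixed-size chunks so they can be
        # identified or imported one batch at a time instead of all at once.
        CHUNK_SIZE = 100000  # URLs per output file; placeholder value

        with open("scraped_urls.txt", encoding="utf-8", errors="ignore") as src:
            index, count = 0, 0
            dst = open("chunk_000.txt", "w", encoding="utf-8")
            for line in src:
                url = line.strip()
                if not url:
                    continue
                # Start a new chunk file once the current one is full.
                if count >= CHUNK_SIZE:
                    dst.close()
                    index += 1
                    count = 0
                    dst = open("chunk_%03d.txt" % index, "w", encoding="utf-8")
                dst.write(url + "\n")
                count += 1
            dst.close()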