How to increase the speed of identifying links?

I scraped about 10 million links (after removing duplicates) and set GSA SER to identify them 24/7. But the speed is very slow on a VPS with 300 Mbps download speed; I think it has completed only about 10% in 2 days. How can I increase the speed? Please help, I'm really tired, and I can't do other work because of the internet speed.

Comments

  • You can increase your threads, lower your HTML timeout (not recommended unless it's through the roof), and dedupe your scraped URLs by domain if you haven't already done so (unless you are really keen on image and blog comments).

    An alternative is simply not caring. I split all the URLs I scrape among 10 dummy projects on a server dedicated to this; that way SER is identifying, trying to post, and verifying at the same time.
  • @redfoxseo, how do you "identify those links into GSA SER"? Are you using the "Import URLs (identify platform and sort in)" option in Advanced Tools? If you are, then you can increase your speed by selecting only those platforms you want to identify...
  • @Olve1954, yes, I am doing it that way. However, I have just noticed that the slow speed is caused by the proxies. When I used semi-private proxies, the speed increased. So it might be a proxy issue.

    Second, I scraped with GScraper, so I don't really know which platform each link belongs to; sometimes GScraper also scrapes more than one platform for the footprints given.
  • @redfoxseo, I disable proxies when I "Import URLs (identify platform and sort in)". I also enable/tick only the platforms I use; this greatly increases the identification speed.
  • @Olve1954, if I disable the proxies, won't my host get warnings about spam complaints?
  • No, SER only visits or downloads the page and identifies which platform it belongs to. It doesn't submit any links, so there shouldn't be any spam complaints. Furthermore, if you remove duplicate domains, SER visits every site only once. So it's not spamming...
  • @Olve1954, thanks, I will try.
  • You have to disable proxies for this task or it will take years. Platform identification is CPU-intensive, so whether your VPS has 300 Mbps or 10 Gbps, the speed will be low. You would need a dedicated server to make it faster.
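
For anyone who wants to do the "dedupe by domain" and "split among 10 projects" steps outside SER before importing, here is a rough Python sketch. It is not how SER or GScraper does it internally; the function names and the sample URLs are made up for illustration, and it assumes "domain" simply means the hostname with a leading `www.` stripped (so `blog.example.org` and `example.org` count as different sites).

```python
from urllib.parse import urlsplit


def dedupe_by_domain(urls):
    """Keep only the first URL seen for each hostname.

    Assumption: "same domain" means same hostname after dropping a
    leading "www."; subdomains are treated as separate sites.
    """
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        # urlsplit only finds a hostname when a scheme is present, so
        # assume http:// for bare entries such as "example.com/page".
        candidate = url if "://" in url else "http://" + url
        try:
            host = urlsplit(candidate).hostname  # already lower-cased
        except ValueError:
            continue  # malformed URL, skip it
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


def split_round_robin(items, parts):
    """Deal items into `parts` lists of near-equal size."""
    return [items[i::parts] for i in range(parts)]


if __name__ == "__main__":
    # Hypothetical sample data standing in for a scraped URL list.
    scraped = [
        "http://example.com/a",
        "http://www.example.com/b",
        "https://EXAMPLE.com/c",
        "blog.example.org/post",
        "http://example.net/",
    ]
    unique = dedupe_by_domain(scraped)
    for number, chunk in enumerate(split_round_robin(unique, 2), start=1):
        print(f"project {number}: {chunk}")
```

With a real list you would read the URLs from your scrape file line by line, call `split_round_robin(unique, 10)`, and write each chunk to its own text file to import into one dummy project. Dealing the URLs round-robin keeps the chunks within one URL of each other in size.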