How to import 12 million URLs without freezing?
Before, I tried to import a list of 3 million URLs and SER froze. I am not sure if it really froze: I started the import and logged off my VPS, came back 24 hours later, and it was still importing. I don't know if it froze or what happened, but I had to force it to close with Ctrl+Alt+Delete.
Now I just scraped a list of 12 million URLs and I am scared of importing it. Is there anything I can do to avoid problems?
Comments
An SSD is a lot faster. SATA drives can consume a lot of processor resources during reads and writes, and it may take ages to import a list that size, especially on a VPS.
Chunking your imports seems to help. Try doing 4-5 million at a time.
How do I chunk it up? Scrapebox can't handle a task like that.
http://www.softpedia.com/get/System/File-Management/Text-File-Splitter.shtml
When you chunk it, try importing smaller chunks at a time - not all at once.
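If you don't want to rely on a separate splitter tool, a few lines of Python will split the file without ever loading it all into memory. This is just a sketch; the file name and chunk size are placeholders you'd adjust to your setup:

# split a huge URL list into ~4 million-line pieces without
# loading the whole file into RAM (reads one line at a time)
CHUNK_SIZE = 4_000_000   # lines per output file

part = 0
out = None
with open("urls.txt", "r", encoding="utf-8", errors="ignore") as src:
    for i, line in enumerate(src):
        if i % CHUNK_SIZE == 0:        # time to start a new chunk file
            if out:
                out.close()
            part += 1
            out = open("urls_part%d.txt" % part, "w", encoding="utf-8")
        out.write(line)
if out:
    out.close()

Each urls_partN.txt file can then be imported as its own smaller job.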
I'm betting your issue is the 1 GB of RAM.
That would rule it out in this instance. I have found several programs for dupe removal, including some on this forum, but all seem limited by the system RAM.
I am currently 7-zipping the scrapes, downloading them, and processing them offline on my desktop. This is a real pain, and I am sure there are better ways.
I am testing to see if GScraper's speed degrades drastically with the dedupe function on. I suspect it does on large or long scrapes, but we will see. If anyone has any suggestions, I'd like to hear them.
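One of those "better ways" around the RAM ceiling might be a two-pass, on-disk dedupe: hash each URL into one of N bucket files, then dedupe each bucket on its own, so the in-memory set only ever holds one bucket's worth of lines. A rough Python sketch, where the file names and bucket count are just placeholders:

import os
import zlib

SRC = "urls.txt"   # the scraped list (example name)
BUCKETS = 64       # more buckets = less RAM needed per pass

# Pass 1: distribute lines across bucket files by hash.
# Duplicate URLs always land in the same bucket.
outs = [open("bucket_%d.txt" % i, "w", encoding="utf-8") for i in range(BUCKETS)]
with open(SRC, "r", encoding="utf-8", errors="ignore") as src:
    for line in src:
        url = line.strip()
        if url:
            outs[zlib.crc32(url.encode()) % BUCKETS].write(url + "\n")
for f in outs:
    f.close()

# Pass 2: dedupe each bucket independently; only one bucket's
# URLs are ever held in memory at a time.
with open("urls_deduped.txt", "w", encoding="utf-8") as dst:
    for i in range(BUCKETS):
        name = "bucket_%d.txt" % i
        seen = set()
        with open(name, "r", encoding="utf-8") as f:
            for line in f:
                if line not in seen:
                    seen.add(line)
                    dst.write(line)
        os.remove(name)   # clean up the temporary bucket file

Peak memory is then roughly the size of the largest bucket rather than the whole list, so a 12 million URL file should be workable even on a 1 GB VPS.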