GSA URL Identification
I am trying to use GSA SER to identify URLs from a list text file: http://prntscr.com/nzvr3p
The text file has a list of about 10 million URLs. GSA SER is not processing the whole file: identification stops after around 300k-500k URLs, and a dialogue box pops up saying that identification of the text file has been completed. The popup shows around 200k identified and 300k unknown URLs. I have tried to process the file multiple times and even tried making the text file smaller, but SER never identifies the whole list.
@Sven Is there any way SER can process the whole file instead of just a few URLs?
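In the meantime, one possible workaround is to split the list into smaller files before importing, since SER seems to handle a few hundred thousand URLs per run. A minimal Python sketch; the 250k chunk size and the file names are assumptions for illustration, not anything SER prescribes:

# Split a huge URL list into smaller files so each one stays within the
# size SER seems to process reliably. CHUNK_SIZE and names are assumptions.
CHUNK_SIZE = 250_000  # URLs per output file

def split_url_list(path, chunk_size=CHUNK_SIZE):
    out = None
    with open(path, "r", encoding="utf-8", errors="replace") as src:
        for i, line in enumerate(src):
            if i % chunk_size == 0:
                if out:
                    out.close()
                out = open("urls_part_%04d.txt" % (i // chunk_size),
                           "w", encoding="utf-8")
            out.write(line)
    if out:
        out.close()

split_url_list("scraped_urls.txt")

Each urls_part_NNNN.txt file can then be fed to SER's identification one at a time.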
Comments
The file has no duplicates, and I have also sent the files to you in a PM. Please check.
I have spent huge resources on scraping, and such URLs with spaces are very rare in the scraped file, e.g. http://" k z w b " = " òf¿~èR"
Please let me know if there is any possible solution to process the scraped file.
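If those odd malformed entries are what trips up the identification, a pre-filter pass over the file might help. A minimal Python sketch, assuming bad lines can be spotted by spaces, quotes, or non-ASCII bytes; the file names are placeholders, not SER paths:

# Keep only lines that look like plain well-formed URLs; drop entries
# containing spaces, quotes, or non-ASCII bytes (a heuristic, not a spec).
def looks_valid(url):
    url = url.strip()
    return (url.startswith("http")
            and " " not in url
            and '"' not in url
            and url.isascii())

with open("scraped_urls.txt", "r", encoding="utf-8", errors="replace") as src, \
     open("clean_urls.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if looks_valid(line):
            dst.write(line.strip() + "\n")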
Earlier I was using the free Scrapebox DupeRemove tool to remove dupes from files, and it was lightning fast even with files up to 5 GB. With the current update, I have deduped a 1 GB file and the difference in speed is prominent now.
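For comparison, a plain external dedupe pass can also be run outside SER. A minimal Python sketch; storing 8-byte hashes instead of full URLs is my own choice to keep a 10M-line file comfortably in RAM, and the file names are placeholders:

# Stream the file line by line and keep an 8-byte hash of each URL seen;
# only the first occurrence of each line is written to the output.
import hashlib

def dedupe(src_path, dst_path):
    seen = set()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            h = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=8).digest()
            if h not in seen:
                seen.add(h)
                dst.write(line)

dedupe("scraped_urls.txt", "deduped_urls.txt")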