Target Url Import from text file splitting not working properly

AsimNawaz · June 2019

Hi @Sven
I am trying to import a 400 MB text file of target URLs and split it across 300 projects. What actually happens is that a few URLs are added to the first selected project, a few more are added to other projects seemingly at random, and more than 90% of the target URLs end up in the last selected project. Meanwhile, many of the 300 projects don't get a single target URL. I have tried this with different text files and with different projects, and the split goes wrong every time, whether I randomize the URLs or not. I have added a video recording of the process. My GSA SER version is 13.72, and the same thing happens with both updated and older versions. Please check the video and let me know what the issue is and how I can fix it. Thanks.
Here is the video https://streamable.com/9s1w4

Sven · June 2019

Thanks for the video... trying to debug that now...

Sven · June 2019

Fixed in the next update.

AsimNawaz · June 2019

I guess this is the same issue. We never recheck whether the import was properly done or not and then end up worrying where the links have gone.
https://forum.gsa-online.de/discussion/26290/i-have-imported-around-90-million-unique-urls-as-new-targets-and-they-get-deleted-with-these-setting/p1

Thank you so much @Sven

AsimNawaz · June 2019

@Sven
I have updated GSA SER but still have the same problem.

I am importing a 300 MB text file into 250 projects and still see the same issue. SER splits properly for the first 20-30 projects, then the remaining ones get zero, and the last one gets all of the remaining target URLs. I tried different text files too.

Sven · June 2019

Can you send the project backup + import file?

AsimNawaz · June 2019

@Sven please check your inbox for the files

Sven · June 2019

The problem is within that file. It has a big block of bogus data in it. No URL can be found anywhere in that block, so the rest of the import fails and the remainder of the file is imported into the last project. I will try to fix that based on the file you sent me.
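
Until a fix is available, one possible workaround is to pre-filter the file before importing it. The following is only a minimal sketch in Python, not how SER parses the file. It assumes one URL per line, each starting with http or https, and the names `looks_like_url` and `clean_targets` as well as the 2000-character limit are hypothetical choices made for illustration.

```python
from urllib.parse import urlsplit


def looks_like_url(line: str, max_len: int = 2000) -> bool:
    """Return True if the line plausibly holds a single http(s) URL.

    Assumption: one URL per line, with an http or https scheme.
    The max_len limit is an arbitrary cut-off, not a SER setting.
    """
    text = line.strip()
    if not text or len(text) > max_len:
        return False
    # Reject embedded whitespace and non-printable characters (e.g. binary junk).
    if not text.isprintable() or any(ch.isspace() for ch in text):
        return False
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit raises ValueError for some malformed hosts.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def clean_targets(lines):
    """Split lines into kept URLs and a count of dropped lines."""
    kept, dropped = [], 0
    for line in lines:
        if looks_like_url(line):
            kept.append(line.strip())
        else:
            dropped += 1
    return kept, dropped
```

Writing the kept lines to a new file and importing that file instead would keep a bogus block from reaching the importer, at the cost of silently dropping any target that does not match the assumptions above.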

AsimNawaz · June 2019

Thank you so much @Sven. I really appreciate your efforts. Loved it.

AsimNawaz · June 2019

Actually, I am scraping URLs and also using the verified links from GSA SER itself, then merging all the files and importing them into GSA SER for tier 2 links. For viewing files I am using Notepad++. Can you please let me know how to analyze the text file for bogus or invalid lines or characters that can cause SER to misbehave?

I am really thankful again for such support. You have the world's best, fastest, and most efficient support. Really amazed.

Sven · June 2019

If you load that into Notepad++ you will see some very long lines. That's obviously not correct. I don't know where these lines come from.
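
For a file of several hundred MB, scrolling through it by hand is slow, so the long lines can also be located with a small script. This is a rough sketch in Python under stated assumptions: the 2000-character threshold and the function names `find_long_lines` and `scan_file` are hypothetical, and the file is assumed to be plain text with one target per line.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: a genuine target URL is rarely longer than about 2000 characters.
MAX_URL_LENGTH = 2000


def find_long_lines(
    lines: Iterable[str], max_len: int = MAX_URL_LENGTH
) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, length) for every line longer than max_len."""
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > max_len:
            yield number, length


def scan_file(path: str, max_len: int = MAX_URL_LENGTH) -> List[Tuple[int, int]]:
    """Scan a text file line by line and list its suspiciously long lines."""
    # errors="replace" keeps the scan going if the bogus block is not valid UTF-8.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return list(find_long_lines(handle, max_len))
```

Calling `scan_file` on the merged file returns the line numbers to jump to in Notepad++ (Ctrl+G), which may also help trace which source file the bad block came from.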
