
Bug with saving verified URLs

AliTab GSAserlists.com
Hello @Sven

I suspect there is a problem with the way verified URLs are being saved. This might also be occurring with submitted URLs, but I haven't tested that yet. Not all URLs are being recorded in the file.





In a recent experiment, approximately 15 minutes elapsed, but only one URL was written to the file.



Comments

  • Sven www.GSA-Online.de
    New URLs are not saved instantly. They are written only once enough URLs of the same type have been collected, or once enough time has passed since the previous save.
  • I noticed similar behaviour...

    There is indeed a time interval before URLs are saved to file. You can see this if you click early on a project showing submitted URLs in the GUI: it says that nothing has been saved to disk yet, or something similar. This seems normal.

    On the other hand...

    I had a bunch of Pligg bookmarks, and I'm sure there was a decent number of them in the file. Recently, I used the site list tools to dedupe URLs (not domains) and then to check and clean up non-working ones.

    I also have a project trying to scrape bookmarks that should be appending to that file. After a while, I went into the site list manually to see what had been added for Pligg-type bookmarks, and there were now only 34?

    I'm not sure what exactly happened, but I had a ton more than that just a short time before rechecking.
  • AliTab GSAserlists.com
    Sven said:
    New URLs are not saved instantly. They are written only once enough URLs of the same type have been collected, or once enough time has passed since the previous save.
    Thank you for your response, Sven. While I understand the logic, it still seems that not all URLs are being saved. I've observed that some URLs are properly recorded in their respective files, yet others, even though they were verified earlier, appear to be completely missed by SER. It seems as though some URLs are being lost somewhere in the process.
  • Sven www.GSA-Online.de
    URLs are also not saved if SER clearly knows that they were pulled from a site list, e.g. the identified list, so it doesn't have to put them there again. If pulled from submitted, it will not add them to identified or submitted. If pulled from verified, it will not add them anywhere, as it assumes it already added them to the other lists before.
  • AliTab GSAserlists.com
    Sven said:
    URLs are also not saved if SER clearly knows that they were pulled from a site list, e.g. the identified list, so it doesn't have to put them there again. If pulled from submitted, it will not add them to identified or submitted. If pulled from verified, it will not add them anywhere, as it assumes it already added them to the other lists before.
    Thank you for your response, Sven. I would like to clarify that I am not using any of the site lists in my projects.
  • Sven www.GSA-Online.de
    Oh, there is another exception: if the site type uses fixed URLs, SER does not add them to the site lists either, because the URLs are taken from the script itself anyway.
  • AliTab GSAserlists.com
    Sven said:
    Oh, there is another exception: if the site type uses fixed URLs, SER does not add them to the site lists either, because the URLs are taken from the script itself anyway.
    That's not the case either. I was checking SER's default engines.
  • Sven www.GSA-Online.de
    Can you send me a sample so I can debug?
  • AliTab GSAserlists.com
    Sven said:
    Can you send me a sample so I can debug?
    Of course. Could you please specify exactly what data you need? I'll send it to you in a private message.
  • Sven www.GSA-Online.de
    Well, I would need either a project containing such URLs (the fewer the better) that I can run to see why it would not add them,
    or
    just the URL itself that you think should get added.
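
For readers trying to reason about which URLs should end up in a site list, the rules Sven describes in this thread can be modelled roughly as below. This is only an illustrative Python sketch, not SER's actual code: the names `lists_to_update` and `BufferedSiteList`, the idea that a URL is offered to every list stage it has reached, and the `min_batch` / `max_age_seconds` thresholds are all assumptions made for the example. SER's real thresholds and internals are not stated anywhere in the thread.

```python
from typing import Dict, List, Optional

# Lists that are skipped depending on where the URL was pulled from
# (per Sven's description; the table itself is a hypothetical encoding).
SKIPPED = {
    None: set(),                                        # found by the project itself
    "identified": {"identified"},
    "submitted": {"identified", "submitted"},
    "verified": {"identified", "submitted", "verified"},
}


def lists_to_update(reached: List[str],
                    pulled_from: Optional[str] = None,
                    uses_fixed_urls: bool = False) -> List[str]:
    """Return the site lists a URL would still be added to."""
    if uses_fixed_urls:
        # Fixed-URL site types are never added; the URLs come from the script.
        return []
    skip = SKIPPED[pulled_from]
    return [stage for stage in reached if stage not in skip]


class BufferedSiteList:
    """Holds new URLs in memory and writes them in batches (hypothetical model)."""

    def __init__(self, min_batch: int = 10, max_age_seconds: float = 600.0):
        self.min_batch = min_batch          # assumed "enough of the same type"
        self.max_age = max_age_seconds      # assumed "time since previous save"
        self.pending: Dict[str, List[str]] = {}
        self.saved: Dict[str, List[str]] = {}   # stands in for the files on disk
        self.last_save = 0.0

    def add(self, engine: str, url: str, now: float) -> None:
        self.pending.setdefault(engine, []).append(url)
        self.flush_if_due(now)

    def flush_if_due(self, now: float) -> None:
        interval_elapsed = now - self.last_save >= self.max_age
        for engine in list(self.pending):
            urls = self.pending[engine]
            # Save when the batch is big enough OR the interval has elapsed.
            if len(urls) >= self.min_batch or interval_elapsed:
                self.saved.setdefault(engine, []).extend(urls)
                del self.pending[engine]
        if interval_elapsed:
            self.last_save = now
```

Under this model, a single verified URL from a rarely hit engine stays in memory until the time-based flush fires, which would look like "only one URL written in 15 minutes" even though nothing is lost. It would not explain URLs that never appear at all, which is the case AliTab reports and the reason a sample project or URL is needed for debugging.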