
Any way to tell which URLs GSA imported and identified as unknown before it crashed?


This is in Options, Advanced, Site Lists, Tools > Site Lists > Import URLs / identify and add in...

I had a couple million URLs that I started importing, to identify and sort into my master identified list, while I was on vacation last week.

I came back and the software had crashed at some point. Is there a default location I'm missing where GSA saves the processed but unknown URLs?

I took the list I was importing and used Scrapebox to compare it against my identified folder and take out everything already in there, but I can't find any way to figure out which of the remaining URLs it processed and determined to be unknown before it crashed.

I'd rather not re-run close to two million links.

If not, could there be an option to automatically save the unknowns to a file, like I think the standalone platform identifier does?
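
For anyone without Scrapebox, the compare-and-remove step described above can be sketched in Python. This is only a rough sketch under assumptions: it assumes both lists are plain text with one URL per line and that an exact string match is good enough. The function names `read_urls` and `not_yet_identified` are made up for this example and are not part of GSA or Scrapebox.

```python
from pathlib import Path


def read_urls(path):
    """Read a text file and return its non-empty lines, stripped.

    Assumes one URL per line (an assumption about the file layout).
    """
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return [line.strip() for line in text.splitlines() if line.strip()]


def not_yet_identified(import_urls, identified_urls):
    """Return the import URLs that are not in the identified list.

    Keeps the original import order and drops repeated URLs, so the
    result can be searched by position later on.
    """
    seen = set(identified_urls)
    remaining = []
    for url in import_urls:
        if url not in seen:
            seen.add(url)  # also removes duplicates inside the import list
            remaining.append(url)
    return remaining
```

Usage would be along the lines of `not_yet_identified(read_urls("import.txt"), read_urls("identified.txt"))`, where both file names are placeholders. Exact matching is the simplest choice here; Scrapebox may compare URLs differently.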

Comments

  • Sven www.GSA-Online.de
    First, it should not crash at all! So let's find out why it crashed. ;)
    Anyway, you can try sorting the site list files in the folder by date and copying out the last line of the newest file. That must be the URL that was checked last.
    Search for this URL in your (hopefully) previously deduped file and you are at the position to continue from.
  • googlealchemist Anywhere I want
    Sven said:
    First, it should not crash at all! So let's find out why it crashed. ;)
    Anyway, you can try sorting the site list files in the folder by date and copying out the last line of the newest file. That must be the URL that was checked last.
    Search for this URL in your (hopefully) previously deduped file and you are at the position to continue from.
    I see what you're saying, thanks. That would be the last URL that was identified as usable in GSA, but most of the raw scrape is going to be 'unknown', so it may have processed another few tens or hundreds of thousands of URLs after that last identified URL, correct? Is there any way to find out where it left off there?
  • googlealchemist Anywhere I want
    I see the option to manually 'save unknown', but is there, or could there be, a way for it to autosave that to a file?
  • Sven www.GSA-Online.de
    It's saved in the temp folder and only copied once you use this function.
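
Sven's suggestion above (sort the site list files by date, take the last line of the newest file, then search for that URL in the import list) could be scripted roughly as below. Treat it as a sketch under assumptions: it assumes the site list folder holds plain `.txt` files with one URL per line and that the import list is in the order GSA processed it. The function names are hypothetical and not part of GSA. As pointed out in the thread, the result only marks the last URL that was *identified*, so any unknown URLs processed after it would be checked again.

```python
from pathlib import Path


def last_identified_url(site_list_dir):
    """Return the last non-empty line of the most recently modified .txt file.

    Returns None if the folder has no .txt files or the newest one is empty.
    The .txt extension and one-URL-per-line layout are assumptions.
    """
    files = sorted(Path(site_list_dir).glob("*.txt"),
                   key=lambda p: p.stat().st_mtime)
    if not files:
        return None
    text = files[-1].read_text(encoding="utf-8", errors="ignore")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else None


def remaining_urls(import_urls, last_url):
    """Return the URLs that come after last_url in the import list.

    If last_url is not found, the whole list is returned, since there
    is then no known position to resume from.
    """
    try:
        position = import_urls.index(last_url)
    except ValueError:
        return list(import_urls)
    return import_urls[position + 1:]
```

A possible use: call `last_identified_url` on the identified site list folder, pass the result to `remaining_urls` together with the deduped import list, and write the returned URLs to a new file to import.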