
Import URLs questions

googlealchemist
When I go into the main settings > Tools > Import URLs (identify platform and sort in), I can't tell what the difference is between this and the option below it (holding site lists).

When it finishes, the statistics box that pops up at the top says it added X URLs to the site list. Is that site list the global identified list?

If I just have a raw scraped list I need to identify, is there a difference between adding it to GSA this way versus importing the same list into a specific project that I have set up to identify/submit/verify with the 'use global identified lists' box checked?

Comments

  • Sven (www.GSA-Online.de)
    Import URLs (identify platform and sort in) << this will take e.g. your file with one URL per line, check which engine/platform each URL belongs to, and put it into the appropriate site list file.
    Import URLs (holding site lists) << this will download each URL from the source you give it, extract all the URLs linked on it (also text-only ones), and sort them in. An example would be pastebin.com URLs, where a lot of URLs are usually listed one per line.
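Conceptually, the first option works like a footprint classifier: fetch each URL from the file, test the page against per-engine footprints, and append the URL to that engine's site list file. A minimal sketch of the idea in Python (the footprint patterns and file naming below are invented placeholders, not SER's real engine definitions):

```python
import re
from pathlib import Path

# Hypothetical footprint table; SER's real engine database is far larger
# and these patterns are illustrative guesses only.
FOOTPRINTS = {
    "wordpress_comment": re.compile(r"wp-content|wp-comments-post\.php", re.I),
    "drupal_blog": re.compile(r"Powered by Drupal|/node/\d+", re.I),
}

SITELIST_DIR = Path("identified")  # per-engine site list files live here
SITELIST_DIR.mkdir(exist_ok=True)

def sort_in(url: str, page_html: str) -> str | None:
    """Match an already-fetched page against the footprints and append
    the URL to the matching engine's site list file (fetching omitted)."""
    for engine, pattern in FOOTPRINTS.items():
        if pattern.search(page_html):
            with (SITELIST_DIR / f"sitelist_{engine}.txt").open("a") as f:
                f.write(url + "\n")
            return engine
    return None  # no footprint matched, so the URL is dropped
```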
  • googlealchemist
    So in general it looks like the first option is what I want for uploading bulk scraped URLs to identify and then post to. Is there a difference or benefit to importing my list in the main settings area there vs. in the specific project that I have set up to identify and test new scrapes?

    But the second option, like you said, is for things like pastebin pages, which I've come across with many links in them. Would this also see fully naked URLs, i.e. ones that aren't hyperlinked? If so, would it be good for pulling potential links out of big spammy blog comment posts?
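To make the difference concrete, the second option's extraction step amounts to downloading the holding page and scanning its raw text for anything URL-shaped, which is why plain, non-hyperlinked URLs in a pastebin dump or a spammy comment thread still get picked up. A rough sketch under that assumption (SER's actual parser is not published, so the regex here is illustrative):

```python
import re
import urllib.request

# Matches URLs in plain text as well as inside <a href="..."> markup.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.I)

def extract_urls(holding_page_url: str) -> list[str]:
    """Download a 'holding' page and return every URL it mentions,
    hyperlinked or text-only, de-duplicated in order of appearance."""
    with urllib.request.urlopen(holding_page_url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return list(dict.fromkeys(URL_RE.findall(html)))
```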
  • Yes, the first option sends the identified URLs to the identified engine lists. It would be the same as if you were using GSA PI and told it to add identified URLs to the identified site list folder, per engine. That is the choice you want to make when using SER to identify CMS platforms from a raw scrape.
  • googlealchemist
    Thanks, that makes sense. Is there a benefit or any difference between uploading that list using the first option in the main GSA settings area vs. importing it into a specific test project directly?
  • I would have SER identify and sort in your raw scrapes because it will save time when SER is trying to turn the identified list into a verified list.

    GSA PI is a much better option for the identify-and-sort-in operation (and it's much faster than SER at this). It allows SER to keep posting, which is all SER should be doing for best results.