
Import List Questions

edited February 2013 in Need Help

1. I've scraped a very comprehensive list with ScrapeBox using numerous SER footprints + keywords. Do I need to "trim to root" these lists before I import them, or can I leave the full URLs and just "remove duplicate URLs"? Or does it even matter?

2. I have 3 locations running SER. From time to time I like to take the "submitted url lists" from each location and transfer them to my other locations running SER. When importing site lists from other SER installs, should I use Advanced>Tools>Import Site Lists, OR open the submitted sites list folder, merge all lists into a text file, and then import them by right-clicking a Project>Import Target URLs>From File?

I'm assuming they'll both do the same thing, but Import Target URLs will run through all URLs immediately, whereas Import Site Lists will submit to the URLs over time, for all projects (NOTE: only if Global Lists is checked).
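For the manual route in question 2 (merge all lists into one text file), a minimal Python sketch might look like this. The folder paths, the `*.txt` file pattern and the function name `merge_site_lists` are assumptions for illustration, not something SER provides or requires.

```python
from pathlib import Path


def merge_site_lists(folders, output_file):
    """Merge every .txt file in the given folders into one de-duplicated URL list."""
    seen = set()
    merged = []
    for folder in folders:
        # sorted() keeps the merge order stable between runs
        for path in sorted(Path(folder).glob("*.txt")):
            # errors="ignore": scraped lists often contain stray bytes
            text = path.read_text(encoding="utf-8", errors="ignore")
            for line in text.splitlines():
                url = line.strip()
                if url and url not in seen:
                    seen.add(url)
                    merged.append(url)
    Path(output_file).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

The resulting file is a plain one-URL-per-line list, which is the kind of file Project>Import Target URLs>From File is described as taking above.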

3. What's the difference between the 2 features under Advanced>Tools: Add URLs from Projects VS Import Site Lists?

Comments

  • LeeG Eating your first bourne

    If it were me, I would split the lists if they are big, then add them to T3 or T4 projects.

    Filter them that way and they only get added once; the working links then get added to the site lists.

  • Sven www.GSA-Online.de

    1) doesn't matter 

    2) Use Import Site Lists

    3) Import/Export is for site lists from other SER instances

  • @Sven - I need a bit more clarification please.

    1. Ok, so SER will find the registration pages on its own, regardless of whether the URL is at its root or is a random scraped URL? That doesn't seem to make much sense.

    2. Is my assumption correct in the 2nd paragraph of #2?

    3. This explanation didn't make sense.

  • Sven www.GSA-Online.de

    1. Just have a look at e.g. any forum engine. It first does "find link=Register" or something similar. And the Register/Login link is usually visible on sub pages as well.

    2) Yes, correct

    3) That Import/Export of site lists was made for exactly what you want. So use it to import site lists that you built on a different machine.

  • 1. Ok then, I'll trim to root. I would think an extended URL (www.website.com/string.php?t=something&=something) would be harder for SER to find register links on, as it would have to trim to root before it started the registration process. That's if it even does that at all.

    2. Ok thanks! Followup questions: a) If I import my URL lists, using "Import URL Lists" feature, am I importing them into the "Successful" AKA - Submitted folder? b) Once I have imported my URL lists into the submitted folder, will SER ensure that ALL PROJECTS run through the URLs imported?

    3. Cool....thanks!

  • Ozz
    edited February 2013
    you don't need to trim it to root. it's completely irrelevant because the "register" button is on each page. just leave the url as it is and everything is fine.
  • UPDATE: I ended up importing my HUGE URL list into each project by choosing "Import Target URLs" and now I want to delete all of them.

    I am selecting Show URLs>Show left Target URLs>Selecting All>Deleting then clicking OK. When I open the Target URL list again to ensure the URLs are gone, they are still there! I believe this is a bug, as none of the URLs for any of the projects are being deleted. I have tried this numerous times with no success.

  • right click project -> modify -> delete url cache
  • Sven www.GSA-Online.de
    fixed in upcoming version
  • @ Ozz- Thanks....that worked, but I didn't want to delete all URLs in cache, just the ones imported. I see Sven will be creating a fix.......

    @Sven- thanks for the fix. I will be awaiting this.....

    My last 2 followup questions were ignored above. Followup questions: a) If I import my URL lists, using "Import URL Lists" feature, am I importing them into the "Successful" AKA - Submitted folder? b) Once I have imported my URL lists into the submitted folder, will SER ensure that ALL PROJECTS run through the URLs imported?
  • Sven www.GSA-Online.de

    a) As it is parsing the URLs, it is adding them to the "identified" list.

    b) Yes and no. The projects can use the new URLs any time you add a new URL to it. But as they get the new URLs randomly (random position in file), it might take some time until they get a new one and not an old one.
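To give a rough feel for point b), here is a small Python sketch. It assumes each pick is uniformly random over all lines in the file and independent of earlier picks; the thread only says "random position in file", so that model, along with the function name `chance_of_new_url`, is an assumption rather than a description of SER's internals.

```python
def chance_of_new_url(old_count, new_count, picks):
    """Probability that at least one of `picks` random draws lands on a new URL.

    Assumes uniform, independent draws over old_count + new_count lines.
    """
    total = old_count + new_count
    if total == 0 or new_count == 0:
        return 0.0
    # Chance that every single draw hits an old URL, subtracted from 1
    return 1.0 - (old_count / total) ** picks
```

Under this model, the larger the existing list is compared with the fresh import, the more picks it takes before a project is likely to reach one of the new URLs.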

  • a) Well, when we're importing lists (Advanced>Import site lists) it asks us which folder we want to save that list in. So I asked you if the submitted sites folder was ok. Your answer doesn't seem to match up with my question. Please revise.

    b) Understood and thanks!
  • Sven www.GSA-Online.de
    a) So it's "Import Site Lists", not "Import URL Lists" as you asked previously. Yes, you have to specify which site list it was that you previously exported.
  • As always....thank you Sven!
  • Thanks for clarifying the Trim to root Sven.

    Trim to root seems to make the most sense when importing URLs. Then you can keep track of ALL your scraped URLs in one Excel file. Have one tab as a master list of all URLs and a second tab for newly scraped URLs.

    Each time you do a scrape:

    1. Trim to root
    2. Copy and paste into Excel, tab 2
    3. Do a VLOOKUP against the master list tab and sort out the duplicates. Move the unique ones into SER and also paste them to the master tab.

    Eliminates SER having to process duplicates.
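The three steps above can also be sketched in Python instead of Excel. This is only an illustration: `trim_to_root` and `new_roots` are hypothetical helper names, and treating scheme-less entries as `http://` is an assumption.

```python
from urllib.parse import urlsplit


def trim_to_root(url):
    """Reduce a URL to scheme + host, dropping the path and query string."""
    if "://" not in url:
        url = "http://" + url  # assumption: bare hosts are treated as http
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def new_roots(scraped_urls, master_roots):
    """Return roots not yet in the master set, adding them to it as we go."""
    fresh = []
    for url in scraped_urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        root = trim_to_root(url)
        if root not in master_roots:
            master_roots.add(root)
            fresh.append(root)
    return fresh
```

Here `master_roots` plays the role of the master tab and the returned list is what would be moved into SER.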
  • @scp - OR you could simply import your list and click on Advanced>Tools>Remove duplicate domains.....lol.
  • apologies in advance for these newbie questions...

    a) advanced > tools > "import URLs (identify platforms and sort in)"... do they get sorted to site_list-identified?

    b) i accidentally left the "save identified sites to" [unchecked] when i was doing my mass import... does it matter?

    c) in order to use the URLs from "import URLs (identify platforms and sort in)", we have to enable this in all projects like this?

    in each project, we [check] use URLs from global site lists if enabled
    [check] identified

    is this correct?


    thank you for the help!
  • Sven www.GSA-Online.de

    a) yes as no verification/submission was done

    b) no, should be imported anyway

    c) of course, else they are not used
