
Question: Scraping Targets for GSA

Hey Guys,

Great software, thanks!

I want to start feeding GSA rather than letting it find targets itself and I have 2 simple questions for you:

1. Getting rid of duplicates:
a. Do I have to trim to last folder, then remove duplicate URLs, OR
b. remove duplicate domains right after harvesting? (I'll be using Scrapebox for scraping, btw. I am not scraping for blog/image-commenting type links where every URL matters.)

2. List of identified targets:
a. Does GSA automatically save the targets to the proper .txt file inside the site_list-identified folder for each platform, OR
b. do I have to add them manually?

Tips are welcome, but if you are busy please give a short answer like 1 - A, 2 - B.
I have made it super easy for you :)

Thanks,

Comments

  • OzzOzz
    edited June 2013
    1b
    2a -> it doesn't matter whether you import the URLs directly into your project or use the identifier tool. Each identified URL will be saved to the global lists once you've selected this option in OPTIONS -> ADVANCED.

    As you are new to this forum, two important things:
    I) read the sticky threads ("compiled list of tips..." and "inofficial FAQ")
    II) use Google to search this board, as the on-board search function doesn't work that well. The search term in Google looks like this:
    site:forum.gsa-online.de SEARCH TERM
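
    For anyone who wants to dedupe a harvested list outside Scrapebox, the "remove duplicate domains" step (answer 1b) can be sketched in a few lines of Python. This is a hypothetical helper, not GSA or Scrapebox code; the `dedupe_by_domain` name and the www-stripping rule are my assumptions about what "duplicate domain" should mean:

    ```python
    from urllib.parse import urlparse

    def dedupe_by_domain(urls):
        """Keep only the first URL seen for each domain,
        similar to a 'remove duplicate domains' pass after harvesting."""
        seen = set()
        out = []
        for url in urls:
            domain = urlparse(url).netloc.lower()
            # Treat www.example.com and example.com as the same domain.
            if domain.startswith("www."):
                domain = domain[4:]
            if domain and domain not in seen:
                seen.add(domain)
                out.append(url)
        return out

    harvested = [
        "http://example.com/blog/post-1",
        "http://www.example.com/blog/post-2",  # duplicate domain, dropped
        "https://other.org/page",
    ]
    print(dedupe_by_domain(harvested))
    # ['http://example.com/blog/post-1', 'https://other.org/page']
    ```

    For platforms where every URL matters (blog/image comments), you would dedupe on the full URL instead of the domain.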
  • @Ozz

    Thanks, I appreciate it!

    P.S. I am not new to the forum.
  • you're welcome :)

    your account is less than 2 hours old, so I assumed you were new here ;)