Question: Scraping Targets for GSA

Gunel · June 2013

Hey Guys,

Great software, thanks!

I want to start feeding GSA my own targets rather than letting it find them itself, and I have two simple questions for you:

1. Getting rid of duplicates:

a. Do I have to trim to the last folder and then remove duplicate URLs, OR
b. Remove duplicate domains right after harvesting? (I'll be using Scrapebox for scraping. By the way, I am not scraping for blog/image comment links, where every URL matters.)

2. List of Identified Targets:

a. Does GSA automatically save the targets to the proper .txt file inside the site_list-identified folder for each platform, OR
b. Do I have to add them manually?

Tips are welcome, but if you are busy please give a short answer like 1 - A, 2 - B.

I have made it super easy for you

Thanks,

Ozz · June 2013

1b
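For anyone who wants to do the same cleanup outside Scrapebox, here is a rough Python sketch of both options from question 1. The function names are made up for illustration, and treating "www.example.com" and "example.com" as the same domain is an assumption, not something Scrapebox or GSA is confirmed to do.

```python
from urllib.parse import urlsplit


def trim_to_last_folder(url):
    """Option 1a helper: drop the final path segment, keeping the folder."""
    parts = urlsplit(url)
    folder = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{folder}"


def dedupe_by_domain(urls):
    """Option 1b: keep the first URL seen for each host, in input order."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).netloc.lower()
        # Assumption: a leading "www." does not make a different domain.
        if host.startswith("www."):
            host = host[4:]
        # Lines without a host (not a full URL) are skipped.
        if host and host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


if __name__ == "__main__":
    harvested = [
        "http://example.com/a/page1.html",
        "http://www.example.com/b/",
        "https://Other.org/x",
    ]
    for url in dedupe_by_domain(harvested):
        print(url)
```

With 1b only one URL per domain survives, which is fine when the exact page does not matter; 1a keeps one URL per folder, so the list stays longer.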

2a -> It doesn't matter whether you import the URLs directly into your project or use the identifier tool. Each identified URL will be saved to the global lists once you've selected this option in OPTIONS -> ADVANCED.

As you are new to this forum, two important things:

I) Read the sticky threads ("compiled list of tips..." and "inofficial FAQ").

II) Use Google to search this board, as the on-board search function doesn't work that well. A search term in Google looks like this:

site:forum.gsa-online.de SEARCH TERM

Gunel · June 2013

@Ozz

Thanks, I appreciate it!

P.S. I am not new to the forum.

Ozz · June 2013

You're welcome.

Your account is less than 2 hours old, so I assumed you were new here.
