Scraper Questions

safenet · March 2014

I searched the forum but could not find answers to the following:

When scraping new sites:

1. If we have the bandwidth and processing power, can the scraper be run at the same time as GSA submissions?

2. When scraping, assume we scraped for Pligg sites. When the scraper finds a potential site, GSA then tries to submit to it, correct?

3. If it was successful, where does GSA store the site data? Not the verified link - but the Pligg site itself?

4. Does GSA automatically remove duplicate sites it finds and if not, what happens with duplicates?

Thanks

Sven · March 2014

Please, everyone: if you open a thread here asking for help, don't also send me an email. It is double work for me to answer both the emails and the forum.

@safenet copy/paste from my email sent to you:

1. If we have the bandwidth and processing power, can the scraper
be run at the same time as GSA submissions?

Yes, though threads are shared between submissions/projects and the
scraper (options->advanced->tools->...).

2. When scraping, assume we scraped for Pligg sites.
When the scraper finds a potential site, GSA then tries to submit to it,
correct?

Yes

3.  If it was successful, where does GSA store the site data?  Not
the verified link - but the Pligg site itself?

On the site lists that you configure in options->advanced

4. Does GSA automatically remove duplicate sites it finds and if
not, what happens with duplicates?

It does not touch any site lists for anything other than adding new
sites. It does not check for duplicates, as this would reduce speed.
You can remove duplicates manually if you want, though.
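
For anyone who wants to script the manual cleanup, here is a minimal sketch in Python. It assumes a site list is a plain text file with one URL per line, which is an assumption and not something confirmed in this thread. The function names and the file path in the usage note are hypothetical.

```python
from urllib.parse import urlsplit


def dedupe_urls(lines):
    """Return URLs in first-seen order, dropping blank lines and repeats."""
    seen = set()
    unique = []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        parts = urlsplit(url)
        # Scheme and host are case-insensitive, so compare them in lower case.
        # Path and query are compared as written.
        key = (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


def dedupe_file(path):
    """Rewrite the text file at `path` without duplicate URLs.

    Assumes one URL per line (an assumption about the site list format).
    Returns the number of URLs kept.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        unique = dedupe_urls(f)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(unique) + ("\n" if unique else ""))
    return len(unique)
```

Usage would be something like `dedupe_file("sitelist.txt")` with your own file name. Since this overwrites the file, it is probably safest to work on a backup copy and to run it while nothing else is writing to the list.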
