
Set threads and search delay per project or globally?

googlealchemist · Anywhere I want
edited June 2022 in GSA Content Generator
Edit/Update: I just read this thread: https://forum.gsa-online.de/discussion/29872/feature-request-gsa-cg-threads-for-checking/p1. Is that what this accomplishes?

I'd like to make my content scraping as efficient as possible.

So I'd like to use a lot of threads for all of the non-search sites (or at least for everything other than Google).

For Google, though, I'd like settings similar to what I've found effective when harvesting in ScrapeBox without getting banned: a single thread with a delay, spread across the number of proxies I have.

Are any of the "other sites" prone to blocking a single IP that searches or scrapes them, which would make enabling proxies for any or all of them a good idea?

If project-specific settings aren't practical for whatever reason, is my best option to select only the non-search sites I want and run that content harvest with many threads, no delay, and no proxies?

Then unselect all of those non-search sites, select just the search engines, and switch to a single thread with a delay and proxies?

Comments

  • Sven · www.GSA-Online.de
    This is already optimized to the max. A search is performed per proxy+delay and the scraping of the search results without any delay.
  • googlealchemist · Anywhere I want
    Sven said:
    This is already optimized to the max. A search is performed per proxy+delay and the scraping of the search results without any delay.
    Cool, thanks.

    Since Google is a lot faster to ban IPs/proxies than other search engines, would the most practical approach be to run Google separately for each project/keyword set with stricter thread and delay settings, then unselect Google, select the other search engines, and run those with more threads and/or shorter delays?

    Is the "threads for search/scraping" checkbox only for search engines, or does it also cover searching/scraping the other article/generic sources?

    Is "threads for testing" meant for the non-search-engine sources, or am I misunderstanding? I see I can check the box to use proxies only for search engines and leave the other sites unchecked. But I don't want to use a single thread with delays for all those other sites if they aren't going to ban me for running more threads with no delay.

    I have 150 private proxies and have found that I need to run a single thread with a 4-second delay in ScrapeBox when harvesting footprints in Google to avoid long-term bans.
  • Sven · www.GSA-Online.de
    Since all timings are per proxy/IP, you don't have to worry here. The program takes care of everything.
  • googlealchemist · Anywhere I want
    Sven said:
    Since all timings are per proxy/IP, you don't have to worry here. The program takes care of everything.
    So I can set it to a lot of threads (ten, twenty, or more), and the software will throttle requests based on how many proxies I have and how sensitive each site is to requests and banning?
  • Sven · www.GSA-Online.de
    Yes, that's right.
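To make "all timings are per proxy/IP" concrete, here is a minimal Python sketch of a per-proxy throttle. It is not GSA Content Generator's actual code: the scheduler, the function names `schedule` and `min_gap_per_proxy`, and the assumption that ScrapeBox rotates proxies round-robin are all hypothetical. It also works through the arithmetic of the 150-proxy ScrapeBox setup above: one thread with a 4-second delay means each proxy rests at least 150 × 4 = 600 seconds between requests, for an overall rate of at most 0.25 requests per second.

```python
import heapq


def schedule(num_requests, proxies, per_proxy_delay):
    """Hypothetical per-proxy throttle: each request goes to whichever proxy
    becomes free first, and a proxy is not reused until per_proxy_delay
    seconds after its previous request started.
    Returns a list of (start_time, proxy) pairs."""
    heap = [(0.0, p) for p in proxies]  # (time the proxy is next free, proxy)
    heapq.heapify(heap)
    plan = []
    for _ in range(num_requests):
        free_at, proxy = heapq.heappop(heap)
        plan.append((free_at, proxy))
        heapq.heappush(heap, (free_at + per_proxy_delay, proxy))
    return plan


def min_gap_per_proxy(plan):
    """Smallest time between two consecutive requests on the same proxy."""
    last_use = {}
    gaps = []
    for t, proxy in plan:
        if proxy in last_use:
            gaps.append(t - last_use[proxy])
        last_use[proxy] = t
    return min(gaps) if gaps else float("inf")


proxies = [f"proxy{i:03d}" for i in range(150)]

# ScrapeBox-style global setting: one thread, 4 s between requests.
# Assuming round-robin rotation, each proxy is reused only every 150 requests.
global_delay = 4.0
rest_per_proxy = len(proxies) * global_delay  # 600.0 s
overall_rate = 1 / global_delay               # 0.25 requests/s

# Per-proxy timing with the same rest period per proxy.
plan = schedule(450, proxies, per_proxy_delay=rest_per_proxy)
print(min_gap_per_proxy(plan), plan[-1][0])
```

With the per-proxy delay set to 600 s, no proxy in this model is reused sooner than in the ScrapeBox setup. The sketch leaves thread count out entirely: assuming per-proxy timing works as described, threads only decide how many rested proxies can be working at once, not how soon any one proxy is reused. One trade-off it does expose is burstiness: all 150 proxies start at t = 0, so requests arrive in batches rather than evenly spaced, which a sensitive target like Google may treat differently.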