
How to speed up scraping pages

ahmethann Turkmenistan
edited March 2020 in GSA Website Contact
I purchased your software, and my first impression is that it is easy to manage and understand. Here are my first trial results: I ran scraping for 15 hours with 15 keywords, each of which returns at least 10,000,000 results on Google. For example, one keyword is "mobilya". (So I expect the other search engines to have at least 100K results each.)

I don't use Google since I don't have proxies. Moreover, I cannot use Yahoo because it opens a "cookie consent" popup that the software cannot get past, so it cannot search Yahoo at all. (Is there any way to bypass this ordinary cookie warning?) Instead, I use the other 23 international search engines.

So after 15 hours of work, including failures, it found 4,600 pages (and it couldn't find contact forms on half of them).

Here is a screenshot of my settings:
https://prnt.sc/rpcs9j

Am I doing something wrong? Finding 4,600 pages (including pages without a contact form) in 15 hours seems very low to me. Any ideas are appreciated.

Comments

  • Sven www.GSA-Online.de
    1. Google will only give you a maximum of 1,000 results per query.
    2. To increase scraping speed you want to use proxies (private ones, because public proxies are often banned).
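    The 1,000-result cap in point 1 means the reported result counts in the original post (at least 10,000,000 per keyword) matter much less than the number of keywords. A small Python sketch of that arithmetic, assuming the cap applies per keyword and, as a further assumption, 10 results per result page; the function names are hypothetical:

```python
# Hypothetical helpers: how many URLs a scraper can actually collect from
# one engine when every query is capped, whatever "About N results" says.
def harvestable_urls(keywords: int, reported_results: int, cap_per_query: int = 1000) -> int:
    # Each keyword yields at most min(reported, cap) URLs.
    return keywords * min(reported_results, cap_per_query)


def result_pages_needed(results: int, results_per_page: int = 10) -> int:
    # Ceiling division: result pages to paginate through (10 per page assumed).
    return -(-results // results_per_page)


# 15 keywords, each reporting about 10,000,000 results (figures from the post).
total = harvestable_urls(keywords=15, reported_results=10_000_000)
print(total)                      # 15000
print(result_pages_needed(1000))  # 100
```

    With the 15 keywords from the original post, that is at most 15,000 URLs from Google, gathered over roughly 100 result pages per keyword.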
  • ahmethann Turkmenistan
    Hey Sven, I said that I don't use Google. I use other search engines that don't have such tight limits, and I don't get blocked by any of them, so the problem is not related to Google. Could you please look at the screenshot above?
    Thanks
  • Sven www.GSA-Online.de
    Well, Google, Bing, whichever: they will block you sooner or later, I guess. But if you say they don't, fine.

    But using proxies will increase speed since the software can make as many simultaneous searches as you have proxies.
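    One way to picture point 2 and the remark above: if every search goes through one proxy, and each proxy handles one search at a time, then the number of simultaneous searches is bounded by the number of proxies. A minimal Python sketch under exactly that assumption; the proxy addresses, `run_searches` and `fake_search` are hypothetical, and this is not GSA's actual implementation:

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical proxy addresses; in practice these would be private proxies.
PROXIES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]


def run_searches(queries, proxies, search):
    """Call search(query, proxy) for every query, with each proxy serving
    one search at a time, so at most len(proxies) searches run at once."""
    free = queue.Queue()
    for p in proxies:
        free.put(p)

    def worker(query):
        proxy = free.get()  # wait until some proxy is not in use
        try:
            return search(query, proxy)
        finally:
            free.put(proxy)  # release the proxy for the next query

    # One worker thread per proxy: extra threads would only wait on the queue.
    with ThreadPoolExecutor(max_workers=len(proxies)) as pool:
        return list(pool.map(worker, queries))


# Stand-in for a real HTTP request; it only measures how many run at once.
active = 0
peak = 0
lock = threading.Lock()


def fake_search(query, proxy):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulated network latency
    with lock:
        active -= 1
    return (query, proxy)


results = run_searches([f"mobilya {i}" for i in range(9)], PROXIES, fake_search)
print(len(results))          # 9
print(peak <= len(PROXIES))  # True
```

    The design choice is a shared queue of free proxies: a worker takes a proxy, searches, and hands it back, so no proxy is used by two searches at once. In this model, a single IP (one entry in the list) allows only one search in flight at a time, however many threads are configured.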
  • ahmethann Turkmenistan
    edited March 2020
    I looked at the logs and see that it makes one Bing search every 3 minutes, which is very slow. That is why I am not blocked!

    Also, I use a single IP and selected 10 threads for searching, BUT I see that it makes searches sequentially. It makes a search on Baidu, waits 3 minutes (although the time to wait between search queries is set to 60 seconds; why does it behave like this?), then makes a search on Bing and waits 3 minutes...

    Why doesn't it search Bing, Baidu and Yandex in parallel and then wait 60 seconds?

    PLEASE LOOK: https://prnt.sc/rpcs9j

    I really don't understand why it behaves like this. Can you explain? Am I missing something?
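    The difference between the two schedules described above can be put in numbers. A rough Python sketch over the 15-hour run, ignoring how long each request takes (an assumption, so both results are upper bounds); the function names are hypothetical:

```python
# Back-of-the-envelope comparison of the two schedules over a 15-hour run.
# Assumption: request time itself is ignored, so both figures are upper bounds.
HOURS = 15
SECONDS = HOURS * 3600


def searches_sequential(interval_s: int) -> int:
    # One search, then wait interval_s seconds before the next one.
    return SECONDS // interval_s


def searches_parallel(engines: int, wait_s: int) -> int:
    # Query every engine at once, then wait wait_s seconds before the next round.
    return (SECONDS // wait_s) * engines


print(searches_sequential(180))   # 300: one search every 3 minutes, as in the log
print(searches_parallel(23, 60))  # 20700: 23 engines every 60 seconds
```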
  • Sven www.GSA-Online.de
    You keep posting the same screenshot again and again... that's not helpful.
    --
    You must have read the log wrong, because the search is performed on all search engines if the "domain - tld" is unique. Then the program waits about 60 seconds (+/- 10 seconds).
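    Sven's description, one search on every engine whose "domain - tld" key is unique followed by a pause of 60 seconds +/- 10 seconds, can be sketched in Python as below. This is not GSA's code: reading "domain - tld" as the host's domain name with the TLD stripped, and skipping duplicate keys within a round, are assumptions, and the engine URLs are hypothetical:

```python
import random
from urllib.parse import urlsplit


def engine_key(url: str) -> str:
    """Hypothetical "domain - tld" key: the host's domain label with the
    TLD stripped (www.bing.com -> "bing"). Assumption: multi-part TLDs
    such as .com.tr are not handled."""
    labels = (urlsplit(url).hostname or "").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def unique_engines(urls):
    # Keep the first URL for each key; later duplicates are skipped this round.
    chosen = {}
    for url in urls:
        chosen.setdefault(engine_key(url), url)
    return list(chosen.values())


def round_delay(base: float = 60.0, jitter: float = 10.0) -> float:
    # Pause between rounds: base seconds plus or minus up to `jitter` seconds.
    return base + random.uniform(-jitter, jitter)


# Hypothetical engine list: three Bing markets, Baidu and Yandex.
engines = [
    "https://www.bing.com/search?cc=tr",
    "https://www.bing.com/search?cc=de",
    "https://www.bing.com/search?cc=us",
    "https://www.baidu.com/s",
    "https://yandex.ru/search/",
]
print(unique_engines(engines))
# ['https://www.bing.com/search?cc=tr', 'https://www.baidu.com/s', 'https://yandex.ru/search/']
print(50.0 <= round_delay() <= 70.0)  # True
```

    Under this reading, the three Bing market URLs collapse into a single engine for the round, so a list dominated by Bing variants produces far fewer searches per round than its length suggests.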
  • ahmethann Turkmenistan
    No, I read the log correctly, and I am sharing it with you. I have 23 search engines, but it literally makes only one search at a time, and it waits 120 seconds or more between searches, which has nothing to do with the value I set. Here are the log files:

    https://filebin.net/fc47lhaxw6yu4imr

    If you need it, I can give you the VPS details or TeamViewer access. I really want this problem solved. Right now the software is performing very, very poorly.
  • Sven www.GSA-Online.de
    And yet it's correct!
    It seems you use only Bing. But Bing is counted as one search engine even though you use Bing in different languages/countries. Otherwise, please provide the project backup.
  • ahmethann Turkmenistan
    edited March 2020
    Dear Sven,
    I know what I use: I don't use only Bing! I use 23 different search engines. But since the software makes a single search at a time, and since the log does not show anything above the current line, you only see the part where it searches Bing!

    Again, I am not using a single search engine. BUT it uses a single search engine even though I set 10 threads. This is why I have been writing here for 2 days.

    OK, I am sending the project backup.
  • Sven www.GSA-Online.de
    Got the backup and checked it...
    You probably looked at the log when it was only parsing the last entries from MSN (Bing). At the start it also parsed all the other search engines you have checked, though many of them returned no results or not enough to parse.