Scraper simply not working, NO MATTER WHAT settings I use. Please help :(

edited March 2013 in Need Help
OK, I seem to be having a problem with the scraper that is driving me NUTS. I've had 3 projects running on a very fast VPS for over a week now without a SINGLE submission!

I have 150 shared proxies which have been tested and are not banned by Google.
I've set it to 50, then 75, then 100 threads... still nothing.
I've checked and unchecked all of the site filters (PR, OBL, Always use keywords to find sites).
Tried ticking and unticking the "Put keywords in quotes..." box.
Tried using only English search engines, and then tried using ALL search engines.
I have over 1000 generic health-related keywords in the projects.
I've checked every single platform available and filled in all required fields.

The only time GSA has actually submitted anything was when I found a verified URL list and manually added the URLs to my target sites; those submitted and verified just fine. I've thought about giving up on the scraper altogether and just scraping the platforms myself, but I plan on using this with over 100 different projects in the next 2 months, and that would defeat the purpose of even using GSA. I'm slowly approaching the cliff with this program and thinking about just going back to MS. Troubleshooting this is just eating too much of my time. Can somebody throw me a lifeline and make me a GSA lover???

Comments

  • Pictures of settings and logs... that's the only way we can help.
  • Sven www.GSA-Online.de
    Sounds to me like these proxies are still the problem. Maybe you have enabled the PR filter and the proxies are now banned from Google PR checking, so every URL gets rejected because its PR is unknown.
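The failure mode described here can be illustrated with a small sketch. This is not GSA's actual code: `filter_targets`, `lookup_pr`, and `min_pr` are hypothetical names, and the sketch assumes a PR lookup returns `None` when the proxy is blocked and that the filter treats an unknown PR as a rejection.

```python
from typing import Callable, Iterable, List, Optional


def filter_targets(
    urls: Iterable[str],
    lookup_pr: Callable[[str], Optional[int]],
    min_pr: int = 1,
) -> List[str]:
    """Keep only URLs whose PR is known and at least min_pr (hypothetical filter)."""
    kept = []
    for url in urls:
        pr = lookup_pr(url)
        if pr is None:
            # Unknown PR (e.g. the lookup was blocked): the URL is rejected.
            continue
        if pr >= min_pr:
            kept.append(url)
    return kept


def blocked_lookup(url: str) -> Optional[int]:
    """Simulates a PR lookup through a banned proxy: never returns a value."""
    return None


def working_lookup(url: str) -> Optional[int]:
    """Simulates a PR lookup that succeeds with a fixed, made-up PR of 3."""
    return 3


urls = ["http://example.com/a", "http://example.com/b"]
print(filter_targets(urls, blocked_lookup))  # every URL rejected
print(filter_targets(urls, working_lookup))  # every URL kept
```

Under these assumptions, a blocked lookup empties the target list even though the search itself returned results, which would look like a scraper that finds pages but never submits.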
  • edited March 2013
    Sven, you are partially right. I disabled the proxies and ran without them, and it worked. Now here's the strange part: I re-enabled the same proxies and everything magically worked! But only for about an hour or so. I thought that maybe my settings had killed the proxies, but then I rechecked them in both GSA and Scrapebox and they are alive. I also ran a PR check in Scrapebox on about 1,000 URLs and the proxies worked like a charm for PR checks.

    In addition to this, I went into one of the projects and disabled the PR filters that I had enabled (again), and it still doesn't work. I also ran without proxies again, and everything runs fine. But this time, when I re-enabled the proxies, it didn't work properly like the last time. Sven, I'd be more than happy to give you access to the VPS if you'd like to test/troubleshoot.

    @rodol: I'm not sure what info you'd get from these screenshots, as it's pretty much everything I just stated. The log is full of nothing but "[PAGE END] results on..." Here they are anyway though. Thanks.

    [4 images]
  • Uncheck "always use keywords to find target sites" and trim the 156 search engines down to 10 or 20.
  • @indylinks: That seems to have worked, but here's the problem:

    I'm not sure if that was what was actually causing the issue in the first place. I rechecked the "use keywords to find target sites" box and then chose all of the English sites again, and it's still working with my proxies just fine. In other words, I can't manually recreate the problem in order to determine its cause. It just seems to behave very sporadically and works when it wants to after I play around with the settings. So far I haven't been able to run this thing for more than a few hours without the issue popping back up again, so I will let this run overnight and report back to you guys in the morning (I'm on Pacific time).

    Thanks for everyone's help btw. Hopefully it was just some phantom fluke that will never return, but we'll see. And just from digging around, and by Sven's acknowledgement, this seems to be a pretty common problem with using proxies in GSA. Like I said, if it happens again I'm willing to give you access to the VPS (only Sven or his appointee) if you'd like to debug or something.
  • Hmm, why is your proxy type WEB? Mine are TRANS. Maybe your proxies aren't good for scraping and that's the issue. I believe WEB-type proxies get banned very quickly by Google.

    Try a huge scrape with Scrapebox to see if your proxies get banned quickly; use 2 or 3 of them for the test.
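For anyone who wants to run that kind of spot check outside Scrapebox, here is a minimal Python sketch using only the standard library. All names (`classify`, `fetch_via_proxy`, `check_proxies`) are made up for illustration, and the ban signals are assumptions: it treats HTTP 429 or 503, or a page mentioning "unusual traffic", as a sign that Google is blocking the proxy. It says nothing about how GSA or Scrapebox test proxies internally.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Tuple

BLOCK_STATUSES = {429, 503}       # assumed status codes for a blocked request
BLOCK_MARKER = "unusual traffic"  # assumed phrase on the block page


def classify(status: int, body: str) -> str:
    """Label one response as 'banned', 'ok', or 'error'."""
    if status in BLOCK_STATUSES or BLOCK_MARKER in body.lower():
        return "banned"
    if status == 200:
        return "ok"
    return "error"


def fetch_via_proxy(proxy: str, url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch url through proxy (e.g. 'http://host:port'); return (status, body)."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code and a body.
        return err.code, err.read().decode("utf-8", "replace")
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return 0, ""


def check_proxies(
    proxies: Iterable[str],
    fetch: Callable[[str, str], Tuple[int, str]] = fetch_via_proxy,
    url: str = "https://www.google.com/search?q=test",
) -> Dict[str, str]:
    """Run one test query per proxy and classify each result."""
    return {proxy: classify(*fetch(proxy, url)) for proxy in proxies}
```

Calling `check_proxies(["http://1.2.3.4:8080"])` with two or three real proxy addresses, repeated a few times over an hour, would show whether they flip from "ok" to "banned" under load. The `fetch` parameter is there so the classification logic can be tried without touching the network.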