getting started with python-gsa integration

gribeiro · August 2014

I've been looking for sample Python code that sends captchas to CB, but couldn't find much other than this one.

I've done some basic web crawling and web scraping with Python (using modules such as mechanize, lxml, and urllib2), but I guess my core problem here is that I'm not even sure what the proper sequence of steps is.

I suppose I'd have to:

1. send requests with python
2. locate the captcha image in the response
3. locate the form where the solved captcha must be submitted
4. send the captcha file to CB to be solved
5. grab CB's response
6. submit it in the form located in step 3
7. gain access to page, and do whatever I want to do with it (in my case, save its source)

So, my questions are:
1. is this sequence of steps more or less accurate? if not, what's the problem with it?
2. any recommendations of sample Python code for what I'm trying to achieve (i.e., gain access to 1000s of pages blocked by captchas)?
3. any other recommendations on using Python to achieve my goal -- do's and don'ts?

By the way, I'm on a Windows 64-bit machine, and I'm used to Python 2.7.

Thanks!

Sven · August 2014

1. sounds correct

2. the one from https://forum.gsa-online.de/discussion/8605/send-captchas-from-python-to-gsa-captcha-breaker/p1 seems good...at least it worked for him

3. it's no magic really...the way you have nailed it down in point 1 looks good to me
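
For what it's worth, here is a rough sketch of how steps 2 to 6 from the list above could fit together. It only uses the standard library and should import on both Python 2.7 and 3. Everything named here is an assumption made for illustration: `CaptchaFormFinder`, `plan_submission`, the idea that the captcha image has "captcha" in its `src`, that the page has a single form, and that the answer field is called `captcha`. The `solve_captcha` callable is a placeholder for however you hand the image to CB, and `fetch` is a placeholder for your own download function (urllib2, mechanize, ...). Neither is CB's actual API.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urljoin
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urlparse import urljoin


class CaptchaFormFinder(HTMLParser):
    """Hypothetical helper: records the captcha image, the form target,
    and the form's input fields. Assumes one form per page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.captcha_src = None
        self.form_action = ""
        self.form_method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # step 3: where the solved captcha has to be submitted
            self.form_action = attrs.get("action") or ""
            self.form_method = (attrs.get("method") or "get").lower()
        elif tag == "img" and "captcha" in (attrs.get("src") or "").lower():
            # step 2: assumption: the image URL mentions "captcha"
            if self.captcha_src is None:
                self.captcha_src = attrs["src"]
        elif tag == "input" and attrs.get("name"):
            # keep hidden tokens etc. so they are sent back unchanged
            self.fields[attrs["name"]] = attrs.get("value") or ""


def plan_submission(page_url, html, solve_captcha, fetch, answer_field="captcha"):
    """Return (submit_url, method, fields) for the captcha form in `html`.

    solve_captcha(image_bytes) -> text   placeholder for the CB call (steps 4-5)
    fetch(url) -> bytes                  placeholder for your HTTP client
    """
    finder = CaptchaFormFinder()
    finder.feed(html)
    if finder.captcha_src is None:
        raise ValueError("no captcha image found on %s" % page_url)

    image_bytes = fetch(urljoin(page_url, finder.captcha_src))
    fields = dict(finder.fields)
    fields[answer_field] = solve_captcha(image_bytes)  # step 6 payload
    return urljoin(page_url, finder.form_action), finder.form_method, fields
```

You would then send `fields` to `submit_url` with whatever client you already use. One thing to watch: the captcha image and the form submission normally have to go through the same session (same cookie jar), otherwise the site cannot match your answer to the image it served.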
