GSA Proxy Scraper - Parse extracted links from search-engine-tests

DzilRel · March 2016

Greetings

I am trying to uncheck "Parse extracted links from search-engine-tests" in the Provider section, because I have no idea what it really does and I wanted to see how the program performs without it. But each time I restart the Proxy Scraper, the option is activated again.

Is this a bug or a feature? Because it has a checkbox, which is usually used to check or uncheck things.

Cheers

Sven · March 2016

Indeed a bug; it will get fixed in the next update.

Let me explain what this option does, though:

When a proxy is tested against a search engine, the program performs a real search and does not just open the homepage. This search is done for a known proxy IP/host, in the hope that the results include a new site where more proxies are listed.
The result links are collected in a file and later, when this option is enabled, the program parses them and might be able to find new proxies. In the source column you will then read "Proxy-Search Links - <url>".
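
As a rough illustration of the parsing step described above, here is a minimal Python sketch. It is not the program's actual code: the regular expression and the function names `extract_proxies` and `source_label` are assumptions made for the example. Only the "Proxy-Search Links - <url>" label comes from the post.

```python
import re

# Candidate IPv4:port pairs, e.g. 203.0.113.7:8080 (the pattern is an assumption).
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{2,5})\b")


def extract_proxies(page_text):
    """Return unique, plausible ip:port candidates in order of appearance."""
    found = []
    for ip, port in PROXY_RE.findall(page_text):
        octets_ok = all(int(octet) <= 255 for octet in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        candidate = f"{ip}:{port}"
        if octets_ok and port_ok and candidate not in found:
            found.append(candidate)
    return found


def source_label(url):
    """Build the text shown in the source column for a link found this way."""
    return f"Proxy-Search Links - {url}"
```

Every candidate found this way would still need to go through the normal proxy tests before it counts as working.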

DzilRel · March 2016

@Sven ( SuperSven ) - Does that mean that as the list of proxies grows, the program will spend more time on collateral searches instead of focusing on checking the proxy providers list and the proxies? Because it seems to me that it takes hours to finish a simple tour, and that's very bad because I need the proxies to be checked regularly.

If it is as I say, can you please implement 2 kinds of scheduling? Like separating the processes?

1. Let me schedule the normal proxy checks on the proxies found on the lists I added

2. Make Parse extracted links from search-engine-tests and Use Search Engine to locate proxy lists a separate process.

Because if I understand correctly, the process of searching for new sources takes too much time and neglects the first job, which is quite important for me to have repeated regularly. Otherwise I can't imagine why it takes hours to finish a tour while barely a few proxies are added.

For some reason, probably because I was limited by the trial features, the trial seemed to move faster than the paid one.

I have had a paid license since yesterday or the day before.

Sven · March 2016

@DzilRel ( SuperDzilRel ) The time this scanning and testing takes should actually decrease, as I use the same function myself to locate new sources and add them with each update.
The program will only extract from a link if it's not from a known source.

However the time can differ a lot according to the sites it finds.
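
The "not from a known source" check could look something like the following Python sketch. This is only an assumption about how such a check might work: comparing by host name and the function name `new_source_links` are choices made for the example, and the program may compare full URLs or something else entirely.

```python
from urllib.parse import urlparse


def new_source_links(result_links, known_sources):
    """Keep one link per host, skipping hosts that are already known sources."""
    known_hosts = {urlparse(url).netloc.lower() for url in known_sources}
    seen_hosts = set()
    fresh = []
    for link in result_links:
        host = urlparse(link).netloc.lower()
        if host and host not in known_hosts and host not in seen_hosts:
            seen_hosts.add(host)
            fresh.append(link)
    return fresh
```

Under this assumption, the more sources are already known, the fewer links are left to parse, which fits the remark that the time should go down with each update.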

DzilRel · March 2016

@Sven Thank you master. Then it means that I have to tweak something, because it takes hours to finish a tour and the results are not really impressive: 5 Proxy-Search Links results in 1 day. Cheers

DzilRel · March 2016

@Sven - I am terrorizing you

Could you please tell me what I am doing wrong regarding the speed of a complete tour? The computer is strong and dedicated to the scraper, with Windows Server and a strong internet connection. Here are my settings:

Settings

http://prntscr.com/amd21b

Provider ( around 12000 sources I added, I will trim them in 1 week after checking the results )

http://prntscr.com/amd222

Automatic Search

http://prntscr.com/amd27u - Anon, Bing, Google, Yahoo=Costica tag

Filter

http://prntscr.com/amd2hn

Thank you sir!

DzilRel · March 2016

Oh, I got it. I think you mean that the time will decrease over time. Thanks. Ignore the above post.

Sven · March 2016

Hmm, well, you only want CONNECT proxies? Are you sure about that config? Also, it might all be better if you use no filters at all and just apply filters on export. That way the program can reduce its testing/work.
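
To make the "filter on export" idea concrete, here is a small Python sketch. It does not reflect how the program stores or exports proxies: the `Proxy` record, its field names, the type labels and the `export` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    address: str    # "ip:port"
    kind: str       # hypothetical labels, e.g. "CONNECT", "WEB", "SOCKS4", "SOCKS5"
    anonymity: str  # "transparent", "anonymous" or "elite"
    alive: bool


def export(proxies, kinds=None, anonymity=None):
    """Return addresses of working proxies matching the optional export filters."""
    selected = []
    for proxy in proxies:
        if not proxy.alive:
            continue
        if kinds is not None and proxy.kind not in kinds:
            continue
        if anonymity is not None and proxy.anonymity not in anonymity:
            continue
        selected.append(proxy.address)
    return selected
```

The full tested list stays intact, and each export (for example, only elite CONNECT proxies) is just a different view of it.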

DzilRel · March 2016

@Sven Hm, does GSA work with other types of proxies too? I read somewhere that CONNECT proxies are the ones that GSA uses.

What do you recommend I pick for GSA+Gscraper? All 4 types of proxies? The CONNECT ones worked very fast and nicely. I think I read about the CONNECT type on this forum. Am I wrong?

Noted about the filters, I will remove them, since they are already enforced on export.

Sven · March 2016

GSA SER is using all types of proxies. Most other programs do not support CONNECT proxies like Scrapebox, JDownloader or others.

DzilRel · March 2016

I am stupid then. Gscraper supports them really nicely. Thanks! You are awesome! Keep up the good work and soon enough I will have to order a T-shirt with "Sven is awesome!"

DzilRel · March 2016

@Sven - I know I am a pain in the back. But does GSA know how to use both SOCKS and web/CONNECT proxies in the same list? Or do I have to pick either SOCKS or web proxies?

Thank you and sorry. You need a support department just for me.

Sven · March 2016

GSA SER can handle any proxy because it detects what type it is on first usage/test. You can have a mix in the list.

DzilRel · March 2016

Thank you.

DzilRel · April 2016

@Sven - I reinstated the initial filters, because I think it's better to have a filter excluding a transparent proxy than to test it against Google, Bing and Yahoo too. Instead of having 4 tests per proxy, you have just 1: non-elite? You are out.

Something is not working right anyway. Today I ended up with fewer proxies than I used to have all the time with the trial, when I used only CONNECT proxies and always had around 500 elite proxies.

I am keeping tabs on the script to see what I can improve.

Sven · April 2016

Tests against Google are only done for anonymous/elite proxies, so you should have at least one anonymous test script enabled. It is really better to leave the default setup as it is and only export things as needed with filters.
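
The gating described here can be pictured with a short Python sketch. Only the rule that Google tests are reserved for anonymous/elite proxies comes from the post. The function name `plan_tests`, the test names and the assumption that other tests are not gated are illustrative guesses.

```python
# Tests that only run for anonymous/elite proxies (per the post: Google).
GATED_TESTS = {"google"}


def plan_tests(anonymity, enabled_tests):
    """Return the enabled tests that would run for a proxy of this anonymity level."""
    if anonymity in ("anonymous", "elite"):
        return list(enabled_tests)
    # Transparent or not-yet-classified proxies skip the gated tests.
    return [test for test in enabled_tests if test not in GATED_TESTS]
```

This also shows why an anonymity test has to be enabled: without a classification, no proxy would ever qualify for the gated test.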

DzilRel · April 2016

The default anonymous test is enabled. I do not target the google proxies too much, I will most probably disable the google test soon because I found a workaround in getting new links which works better.

The issue is that I saw many non-elite proxies that were tagged as Bing and Yahoo compatible, so even if Google tests are not conducted on non-elite proxies, the other tests go on. So it means that 3 tests were conducted on each of them, no matter if they are anonymous or not.

But as I said before, I think the time issue lies in the search for new sources/proxies rather than in scraping proxies from the existing websites in the Provider list and verifying them.

DzilRel · April 2016

@Sven - So. You know my settings more or less. Proxies are exported every 11 minutes. When loading them into GSA, I retest them, because something seems fishy. Check it out:

http://prntscr.com/amz2hg

Only 44 work from a fresh check. I am too tired and see only proxies, so I will look into this more tomorrow. I tried all kinds of settings. It seems that having more sources is very bad, because the program keeps checking proxies for hours, which basically gives me bad lists, since a proxy that worked 10 hours ago most probably will not work now. I even cancelled the search for new proxies, allowing the software to check only the existing sources.

Good night

Sven · April 2016

If you test against Bing, you will soon see that there are proxies that only accept Bing and nothing else.
If you test with SER against anything else but Bing, you get them as not working.

The reason is some kind of Bing cache system where you can use those servers as proxies as well. I tried to understand this back then but didn't get far. I just accepted it and carried on.

DzilRel · April 2016

Ok. I did not know that. Funny.

Question. How does Proxy Scraper work?

To be honest, after going pro I am a bit disappointed, because instead of going faster, the last few days I am mostly tinkering with the proxy scan setup, mostly pausing gsa and gscraper because the speed dropped too much due to too many proxies that expired.

So, each time a new round starts, I want the Scraper to test the currently working proxies (it does that).

But from the currently non-working proxies, I would like it to test only the elite ones that used to work. Otherwise, if I do not load any prefilter in the Filter section, it will check too many proxies every time. I do not care about the transparent and anonymous ones, and it would be just a waste of time to retest them.

Also, I would like to test the elite proxies already in the list, working and not working, every 10, 20 or 30 minutes, regardless of how long it takes to finish a round of scraping. Because for me, a round takes many, many hours.

I could do this if I purchased two licenses of the Proxy Scraper, and I will probably use a second trial one if it isn't possible another way.

Can I run 2 Proxy Scrapers on 2 profiles on the same PC without issues? With 150/50 threads for each one? Or would they conflict?

Thank you

Sven · April 2016

You can only run one instance per PC and license. However, if you don't care about transparent proxies, you should add them to the global filter. I would not throw away the anonymous ones, though. They are useful as well.

To speed this up you can lower the timeout a bit. The full 5 seconds are rarely used, and if they are, the proxy is probably unstable anyway. You can of course increase the threads and also disable the providers with a low success rate (sort by that column and disable the ones at the end).
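
The "sort by success rate and disable the ones at the end" step is done by hand in the program's provider list. Purely to illustrate the idea, a Python sketch with made-up field names (`name`, `success_rate`) and a hypothetical `split_providers` helper could look like this:

```python
def split_providers(providers, min_success_rate):
    """Sort providers by success rate (best first) and split off the weak ones."""
    ranked = sorted(providers, key=lambda p: p["success_rate"], reverse=True)
    keep = [p for p in ranked if p["success_rate"] >= min_success_rate]
    disable = [p for p in ranked if p["success_rate"] < min_success_rate]
    return keep, disable
```

The cutoff value is a judgment call. With thousands of sources, even a low threshold may remove a large share of the list and shorten each round noticeably.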

DzilRel · April 2016

Almighty and all-knowing Sven, does this mean that I can use anonymous proxies to scrape? Or submit with them?

And how low should I set the timeout filter? Now it's 5 seconds for all types, connect or not.

Also, I have 150 threads, because I read a post somewhere here where you said that more are not necessarily better. The PC is an i5 with 8 GB RAM, a good network card and router, and no other scripts running on it. I like to separate processes on different computers.

Thank you

Sven · April 2016

Anonymous means they are not leaking your IP, so yes, you can use them.

DzilRel · April 2016

Ok thank you for your time. I appreciate that you find the time to help an old blind fool who barely knows how to install a program. Cheers
