
Software Idea

edited July 2018 in Feature Requests
I have a software idea for you @Sven.

Basically something like a mini-Scrapebox, except it only has one function -- to scrape search engine results and automatically import them into specified projects.

I (and I take it many others) have servers that do nothing but run GSA SER scraping 24/7 with a lot of different engines. For submissions, though, I use 100 dedicated proxies, which often get banned during SER's search engine scraping because 100 just isn't enough. In something like Scrapebox I'll use 2000+ proxies (from the GSA PS/SB harvester etc.), or a few Google proxies if I'm scraping Google.

Anyways, here's my idea.

You set up a "project" in this theoretical tool (GSA Search Scraper?). It could pull engines from the SER directory so it knows which footprints to use. You then provide a custom list of keywords for the project; perhaps it could also ship with a massive built-in list if you don't want to provide your own, plus an option like SER's "don't use keywords when searching".
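To make that concrete, here's a rough Python sketch of how the tool could expand footprints and keywords into search queries. All the names are placeholders; the footprints would really come from the SER engine files:

```python
from itertools import product

footprints = ['"powered by wordpress"', 'inurl:guestbook']  # would be read from SER engine files
keywords = ["fitness", "gardening"]                         # user-supplied per project
use_keywords = True                                         # mirrors SER's "don't use keywords" toggle

def build_queries(footprints, keywords, use_keywords=True):
    """Yield one search query per footprint/keyword pair."""
    if not use_keywords:
        # footprints alone, no keyword appended
        yield from footprints
        return
    for fp, kw in product(footprints, keywords):
        yield f"{fp} {kw}"

for q in build_queries(footprints, keywords, use_keywords):
    print(q)
```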

Then, with SER open on the same PC, you tell it which project to automatically import the URLs into (there could also be an option to save URLs to file, but the real power would be in importing straight into SER). The tool would be multi-threaded and able to handle many projects running at the same time. It would just feed SER projects new URLs to process 24/7.
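As a very rough sketch of what I mean: worker threads scrape and push URLs onto a shared queue, and a feeder hands them off to SER. The scrape_engine() stub and the file hand-off are pure assumptions on my part; the real hook into SER would be up to you:

```python
import threading
import queue

url_queue = queue.Queue()

def scrape_engine(engine, query):
    """Placeholder for the real SERP scraper; would return result URLs."""
    return []  # hypothetical -- actual scraping logic not shown

def scrape_worker(engine, queries):
    """One thread per engine, pushing every hit onto the shared queue."""
    for q in queries:
        for url in scrape_engine(engine, q):
            url_queue.put(url)

def feeder(project_file):
    """Append scraped URLs to a file the matching SER project imports from (assumed hand-off)."""
    while True:
        url = url_queue.get()
        with open(project_file, "a", encoding="utf-8") as f:
            f.write(url + "\n")

threading.Thread(target=feeder, args=("project_targets.txt",), daemon=True).start()
```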

With a dedicated platform like this, perhaps it could open up the possibility of integrating new and different search engines that aren't already in SER. Even if not, it would let people use 2000 proxies for the scraping tool while still keeping their dedicated proxies for SER submissions.

The program could also integrate directly with GSA Proxy Scraper and use those proxies and/or its internal server when scraping the search engines. The way I imagine it, it would have the exact same proxy manager as SER: "uncheck proxies that don't work", "stop projects if no working proxies are available", etc. It could also support ReCaptcha v2 solving services for scraping Google. Services like SolveRecaptcha.com, or XEvil for people who have XRumer, would be useful for something like this.
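For the proxy side, I picture something like this sketch: test each proxy in parallel, drop the dead ones, and stop the project if nothing works. The test URL, timeout, and thread count are just illustrative:

```python
import concurrent.futures
import urllib.request

TEST_URL = "http://www.example.com/"  # illustrative test page

def proxy_works(proxy, timeout=10):
    """Return True if a simple request through the proxy succeeds."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open(TEST_URL, timeout=timeout)
        return True
    except Exception:
        return False

def filter_working(proxies):
    """'Uncheck proxies that don't work' -- keep only the live ones."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(proxy_works, proxies))
    working = [p for p, ok in zip(proxies, results) if ok]
    if not working:
        # mirrors the "stop projects if no working proxies" option
        raise RuntimeError("no working proxies available")
    return working
```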

- - - - - - - - - -

The mini-tool that's in SER is decent, but it's really bare-bones: you can't switch to a different set of proxies, it can't auto-import, etc. In the tool I propose, the GUI could look like SER's, with the engines listed in categories for easy selection. There could be options to "constantly stream new URLs into the SER project", "import URLs after X found", "import after X minutes", and so on.
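Those import triggers could work roughly like this: buffer the URLs and flush them into SER either after X are found or after X minutes, whichever comes first. Again, just a sketch with made-up defaults:

```python
import time

class ImportBuffer:
    def __init__(self, flush_count=500, flush_minutes=10, on_flush=print):
        self.urls = []
        self.flush_count = flush_count          # "import URLs after X found"
        self.flush_seconds = flush_minutes * 60 # "import after X minutes"
        self.last_flush = time.monotonic()
        self.on_flush = on_flush                # hand-off into SER would go here

    def add(self, url):
        self.urls.append(url)
        due_by_count = len(self.urls) >= self.flush_count
        due_by_time = time.monotonic() - self.last_flush >= self.flush_seconds
        if due_by_count or due_by_time:
            self.on_flush(self.urls)
            self.urls = []
            self.last_flush = time.monotonic()
```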

I could go on, but I think you get the gist of what I'm talking about. Something like this could easily go for ... $49? $99? It would essentially let you set up completely automated SER scraping servers, and it would make a really cool addition to your software collection.

Let me know what you think!

Comments

  • Deeeeeeee (the Americas)
    "The mini-tool that's in SER is decent, but it's really bare-bones"

    I really like your idea for an advanced GSA scraper that can do more than the SER or SB scrapers. With multiple projects running in SER, this would definitely be extremely useful and open up new options for using SER.

    Guess we'll see what Sven and GSA say about this?? :)