I took some time to gauge how effective the search engines other than Google, Yahoo and Bing are, since these engines can be good sources of URLs, particularly if your proxies are prone to bans. To do this I created a simple engine file with one footprint - "powered by wordpress". This has to be one of the most widespread footprints on the web, and any worthwhile search engine should be able to return some results for it.
Next, I used uncheck-by-mask to deselect every search engine with google, yahoo or msn in the name. This left me with around 575 search engines. I then started a scrape at 50 threads, with 60 seconds between searches, using a plain residential ISP IP.
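For anyone who wants to do the same mask filtering outside the tool, here is a rough sketch of the idea in Python. The function name `filter_by_mask` and the engine names in the sample list are made up for illustration; this is not how the software does it internally, just a case-insensitive substring check on the engine name.

```python
def filter_by_mask(engine_names, masks):
    """Return the engines whose name contains none of the masks (case-insensitive)."""
    lowered = [m.lower() for m in masks]
    return [
        name for name in engine_names
        if not any(m in name.lower() for m in lowered)
    ]

# Hypothetical engine names, for illustration only
engines = ["Google UK", "Yahoo Search", "MSN Live", "Gigablast", "Euroseek"]

# Drop everything matching the three masks used in the test above
remaining = filter_by_mask(engines, ["google", "yahoo", "msn"])
print(remaining)
```

With the sample list this leaves only Gigablast and Euroseek. A real engine list would need to be loaded from wherever your engine definitions are stored.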
After the scrape was finished, I took note of the search engines that gave no results. These were:
When I have time, I'm going to go through each engine individually to see whether it needs fixing or just deleting. So far, these are the updates I think need making:
Euroseek * - this now seems to be just a directory, with no search. I'd remove it.
Expopage - does not appear to be a search engine.
FindLink - appears to be dead.
Gigablast - seems to serve an ASCII-based captcha that I don't think can be solved with automated tools. Remove?
And some quick fixes to some engines:
Will add to this thread when I have time.