[Request] Search Engine Duplication Results List
Does anyone have a resource or a list of SEs that utilize their own results and DO NOT pull from other SE results like Google, Bing, etc?
I'm going to need to deselect some of the SEs I have selected recently from SER, as I have a feeling I'm pulling the same results over and over again from different SEs.
Comments
I also realize that SER is going to filter out any duplicate results anyway, so I'm in a bit of a quandary here.
Do I select a larger set of SEs for more results, but sacrifice system resources on the overlapping results (even though they're filtered)?
OR
Do I select a few SEs that I know will deliver unique results, but possibly come up with a fraction of the results for my links to be placed?
I had been using only a select few SEs for my projects in the past, but then I read a few threads from you and Sven about expanding our results with more SEs that no one else really utilizes. Which ones do you use for the best results with minimal overlap?
This will bugger up those who follow my tricks for getting good daily submissions.
If you're using Google as a search engine plus shared proxies, it's worth running three or four Googles and going for weird countries on some of them. There's plenty of choice in GSA. You could even go for a different selection of Googles on each project and tier to spread the search load.
The average person running any kind of search scraper with proxies hits the UK/US-type engines.
So your proxies might be banned from those engines, giving zero results on a term that might return thousands.
You're left running a Google fallback.
If you don't use a big bunch of blog engines, again you can find your IPs are banned.
If you only use Google, without the blog search versions, you need to alter your engine files to suit.
Each Google search can return 100 results per page.
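If you want to sanity-check this outside of SER, here's a rough sketch of what spreading the load across country-specific Googles looks like. It only builds the query URLs; the domain mix, the example keyword, and the assumption that Google still honours the num=100 parameter are all mine, so treat it as an illustration rather than anything SER does internally.

```python
# Minimal sketch: build country-specific Google query URLs so the search
# load is spread across several Google properties instead of hammering
# google.com with every proxy. Assumes Google still honours the classic
# "num" parameter (up to 100 results per page) -- verify before relying on it.
from urllib.parse import urlencode

# A few "weird country" Google domains; swap in whatever mix you prefer.
GOOGLE_DOMAINS = ["google.de", "google.co.jp", "google.com.ar", "google.fi", "google.co.za"]

def build_queries(keyword: str, results_per_page: int = 100) -> list[str]:
    """Return one search URL per country domain for the given keyword."""
    urls = []
    for domain in GOOGLE_DOMAINS:
        params = urlencode({"q": keyword, "num": results_per_page})
        urls.append(f"https://www.{domain}/search?{params}")
    return urls

if __name__ == "__main__":
    for url in build_queries('"powered by wordpress" "leave a comment"'):
        print(url)
```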
I spent almost a full day going over this, so let me share what I found, and what I eliminated:
Bing = Yahoo => eliminate either one, I kept Bing
Startpage = Google => eliminate Startpage
Lycos = Bing => eliminate Lycos
Ecosia = Bing => eliminate Ecosia
Keep DuckDuckGo - very unique results.
I dumped Sky because I couldn't access it on multiple occasions.
Ask is powered by Google, but they layer an algorithm on top of Google, so the results are a little bit unique; keep it.
Eliminate the Googles for English-speaking islands like Samoa, Antigua, Bahamas, Barbados, etc. They're too similar to Google.com.
Most compilers that say "Powered by Google, Bing, Yandex" are owned by a company called "InfoSpace" (they own 100 search engines which all do the same thing), so keep Excite, but get rid of the other compilers like MetaCrawler, Dogpile, etc.
Keep Baidu, Yandex.
Use international search engines which are choices about halfway down the list.
All in all, I end up with about 112 search engines.
Without feeding lists (which skews the results positively), using all platforms across about 30 projects (only about 4 of them bottom tiers with no submission limits), and just using the SEs to find targets, on 100 threads and 30 semi-private proxies I am able to get about 30,000 submissions per day, with roughly 15% verification.
For the record, I compared each search engine manually, side by side, with two browser tabs. I probably went about 4 pages deep with each search engine to see if those results were the same.
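If anyone wants to automate that side-by-side check instead of eyeballing two browser tabs, something like the sketch below would do it: dump the first few pages of results from each engine into a text file, then measure how much the result hosts overlap. The file names and the 50% cut-off are made up; the point is just the comparison.

```python
# Rough sketch of the overlap check described above: given two files of
# result URLs (one engine per file, first few pages each), reduce every
# URL to its host and report the Jaccard overlap of the two sets.
from urllib.parse import urlparse

def hosts(path: str) -> set[str]:
    """Read full URLs (one per line) and return the set of bare hosts."""
    with open(path, encoding="utf-8") as f:
        return {urlparse(line.strip()).netloc.lower().removeprefix("www.")
                for line in f if line.strip()}

def overlap(path_a: str, path_b: str) -> float:
    """Jaccard similarity of the two engines' host sets (0.0 to 1.0)."""
    a, b = hosts(path_a), hosts(path_b)
    return len(a & b) / len(a | b) if a | b else 0.0

if __name__ == "__main__":
    score = overlap("bing_results.txt", "yahoo_results.txt")  # placeholder files
    print(f"Overlap: {score:.0%}")
    if score > 0.5:
        print("Mostly the same index -- probably keep only one of these engines.")
```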
Ozz, you are absolutely correct - I did check that when I did it, and it only goes one page. The only reason I kept it is because they had completely different results than Bing and Google.
20 x random Googles. Bin the rest.
The only captcha software that can keep up with my link building is GSA Captcha Breaker, and it's still only in beta. Which gives some idea of the speed it can work at without locking up.
And I have had to drop my threads by thirty to slow SER down
I found that with the likes of Bing, Yahoo, etc., the results returned were small.
Bang out ten results, then waste time waiting for another search. It became too monotonous and tedious. Better to load several hundred search results at a time, bang through them and hope the decaptcha system can keep up. Which it does now, thanks to Sven.
Which goes back to Ozz's point, and I agree with you. I haven't done anything since then, but if we are talking about efficiency, you want the search engines that can deliver the most results in the quickest amount of time. Which is why I should let go of DuckDuckGo and some others as well.
What kills me is that these search engines are supposed to have completely different algorithms, yet they tend to deliver the same results (the English-speaking SEs), just maybe in a different order.
I think the only way to game it over the long term so you have massive diversity with different websites is to scrape each platform with something like SB for a zillion search terms, weed out the dups, and feed this beast.
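As a rough illustration of that "weed out the dups" step, here is a small sketch that takes a raw ScrapeBox-style harvest, keeps one URL per host, and writes a clean list you could import as targets. The file names are placeholders, and the scraping itself isn't handled here.

```python
# Sketch of the dedup step: take a big scraped URL list, keep one URL per
# host, and write the cleaned list out so it can be fed to SER as targets.
from urllib.parse import urlparse

def dedupe_by_host(in_path: str, out_path: str) -> None:
    """Copy in_path to out_path, keeping only the first URL seen per host."""
    seen: set[str] = set()
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            url = line.strip()
            if not url:
                continue
            host = urlparse(url).netloc.lower().removeprefix("www.")
            if host and host not in seen:
                seen.add(host)
                dst.write(url + "\n")

if __name__ == "__main__":
    dedupe_by_host("scrapebox_harvest.txt", "ser_targets.txt")  # placeholder names
```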
@Ozz - that is a great point! Maybe we should consider translating our keywords, once they have been run through English based SEs and then re-run them according to the new SEs we want results from (German, Polish, Chinese, Russian, etc). If this was automated as a separate option, on down the line, EVEN BETTER!
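To make the idea concrete, here is a rough sketch of how translated keyword lists could be generated ahead of time, one list per target language, before assigning them to the matching foreign Googles. The translation table is a stand-in you'd fill from a CSV or whatever translation service you trust; this isn't a SER feature, just the shape of the workflow while we wait for a built-in option.

```python
# Sketch of the translated-keyword idea: expand an English keyword list into
# per-language variants. The TRANSLATIONS table is a hand-made stand-in --
# replace it with your own translation source.
TRANSLATIONS = {
    "guest post": {"de": "Gastbeitrag", "ru": "гостевой пост", "pl": "wpis gościnny"},
    "leave a comment": {"de": "Kommentar hinterlassen", "ru": "оставить комментарий",
                        "pl": "zostaw komentarz"},
}

def expand_keywords(keywords: list[str], langs: list[str]) -> dict[str, list[str]]:
    """Return one keyword list per language code, skipping missing entries."""
    out: dict[str, list[str]] = {lang: [] for lang in langs}
    for kw in keywords:
        for lang in langs:
            translated = TRANSLATIONS.get(kw, {}).get(lang)
            if translated:
                out[lang].append(translated)
    return out

if __name__ == "__main__":
    for lang, kws in expand_keywords(list(TRANSLATIONS), ["de", "ru", "pl"]).items():
        print(lang, "->", ", ".join(kws))
```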
@Ron - thank you very much for your testing feedback. It has been very useful and will be well utilized. I believe in squeezing out as much efficiency as you possibly can with SER.
So here is the conclusive list we have so far:
Bing
Ask
Google
Yandex
Excite
Baidu
International SEs
Google and Bing with translated keywords (Google DE, Google RUS, etc)