Advantages of Scraping Your Own URLs over SER Finding Them?
I've always been curious whether there is any real advantage to scraping your own URLs (using set footprints) and importing them for SER to work through, versus letting SER scrape them for you during a typical project run, saving yourself the scraping time. I can't seem to pick out a real advantage here. Example:
I use SB to scrape Google and Yahoo URLs with the footprint "member.php" "Powered by MyBB" "golf bags", import the resulting list into SER, and run it. I just don't see how my results would vary much compared to importing my keyword "golf bags" into a project, setting the appropriate options to scrape the Google and Yahoo SEs, and making sure Forum > MyBB is checked.
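For clarity, here is roughly what that external scrape boils down to — a minimal Python sketch (the variable names are mine, not from SB or SER) of pairing a platform footprint with a niche keyword to form the queries sent to the engines:

```python
# Hypothetical sketch of an external scrape setup: every platform
# footprint is paired with every niche keyword to build the queries.
footprints = ['"member.php" "Powered by MyBB"']  # MyBB platform footprint
keywords = ["golf bags"]                         # niche keyword

queries = [f'{fp} "{kw}"' for fp in footprints for kw in keywords]
print(queries)  # ['"member.php" "Powered by MyBB" "golf bags"']
```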
Any reasoning to use one over the other is welcome.
Comments
As an example, I have a small list of private proxies (only 20), and sometimes while using SB I forget to switch to my shared list when checking PR. After a few links, all my private proxies are blacklisted. If I go back and do it with public or shared proxies, sure, it will take longer, but I will get all of them in the end.
@pisco - interesting theory, but I have never found any evidence that any of my private proxies are getting blacklisted or temp-banned by Google. I use SB all day, every day, with 20 private proxies. The thing scrapes 100,000+ results for me in less than 10 minutes. I check PR, OBL, and then post comments all in the same sitting without any issues whatsoever. There is actually a huge difference in proxy usage between programs that utilize an IE browser and those that use no browser at all (SB and SER fall into the latter category). I'd definitely use both private and public proxies for programs that actually open a browser like IE and make submissions that way.
I have never personally experienced problems with my private proxies being banned/blacklisted within SER. I have tested with private proxies, no proxies, and public proxies. The result: private and no proxies usually produce around the same results, whereas public proxies have produced dismal results. Even the creator of SB and SER (the guy in the tutorial videos for SER) states that private proxies or no proxies will always perform better. I have also watched (DVD and streaming video) and spoken to a TON of hardcore multitasking SEO experts who use 80+ private proxies with 5-10 VPS systems, and they never once mention their private proxies getting banned/blacklisted.
If you have any hard evidence to back up your claims, I would love to see it, as this could help improve everyone's performance on all automation software platforms.
I never said public or semi-private proxies perform better than private ones, just that private proxies aren't meant to be abused when you can get temp-banned on them (unless you have a huge number of private proxies you can replace from time to time). This is why you configure a delay between each search in SER; otherwise you would just go all out.
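To illustrate the delay idea, here is a minimal Python sketch of pacing searches across a proxy pool so no single proxy hammers the engine. The 60-second rest and the IPs are placeholder assumptions, not SER's actual settings:

```python
import itertools
import time

# Assumed values for illustration only -- not SER's real defaults.
proxies = [f"192.0.2.{i}:8080" for i in range(1, 21)]  # 20 placeholder proxies
rest_per_proxy = 60  # seconds each proxy should rest between its searches

proxy_cycle = itertools.cycle(proxies)
pause = rest_per_proxy / len(proxies)  # pool-wide pacing: 60/20 = 3s

def run_search(query: str, proxy: str) -> None:
    """Stand-in for the actual search request through the proxy."""
    print(f"searching {query!r} via {proxy}")

for query in ['"Powered by MyBB" "golf bags"']:
    run_search(query, next(proxy_cycle))
    time.sleep(pause)  # with 20 proxies rotating, each one rests ~60s
```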
Here is a screenshot of a test showing it happening: proxies were going all out, and after a while they all come back as dead and the PR is not picked up.
I'm not trying to contradict any expert/guru; heck, I'm still making pocket money from SEO. But the fact is this is happening to me, and that is why I use a shared list to scrape and PR check.
For 20 private proxies, use 2-3 connections.
This is still faster than having tons of public proxies and using 100 connections. Why? Public proxies are unreliable, so a lot of time is spent skipping/removing dead ones. Some public proxies are even transparent and show your REAL IP in some way; a rough check for that is sketched below.
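As an illustration of that transparency point, here is a minimal Python sketch of checking whether a proxy leaks your real IP. Using httpbin.org as the echo service is my assumption; any endpoint that reflects the request's origin IP and headers would work:

```python
import requests

def leaks_real_ip(proxy: str, real_ip: str) -> bool:
    """Rough transparency check: does the echo service see our real IP?

    Transparent proxies pass your address along in the origin field or
    in headers like X-Forwarded-For / Via, even though the request
    went through the proxy.
    """
    resp = requests.get(
        "http://httpbin.org/get",  # echoes origin IP and request headers
        proxies={"http": f"http://{proxy}"},
        timeout=10,
    ).json()
    seen = resp["origin"] + str(resp["headers"])
    return real_ip in seen

# Usage with placeholder values:
# leaks_real_ip("203.0.113.5:3128", "198.51.100.7")
```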
I'll just have to try that and see whether using 2 connections with 20 private proxies beats 100 connections with a shared proxy list (not talking public here).
So regarding SER, what is the magical ratio of private proxies to threads? Surely it can't be the same 10%, otherwise I would only run 2 threads.
If we are talking about PageRank checking or searching (with the Google search engine checked), then you have to lower connections or Google will ban your proxies.
For anything relying on Google, connections should be 10% of your proxies.
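Put as arithmetic (just restating the rule above, nothing more):

```python
# The 10% rule from this thread: Google-facing connections
# (searching, PR checks) = 10% of your proxy count.
num_proxies = 20
connections = max(1, round(0.10 * num_proxies))
print(connections)  # 2 -- consistent with "2-3 connections for 20 proxies"
```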
That screenshot doesn't show or prove anything. It just means the site has no PR. In some cases it could mean the PR simply couldn't be detected at that moment, but that's just theory.
Even when scraping Google, I use around 40-60 connections with 20 private proxies. No problem. In my experience, and that of many others I speak with who do 10X the amount of scraping/PR checking in mass amounts, the only problem you can face is complaints from the proxy provider for using the proxies too much, too frequently (if they're shared, that is).
@Tank and GlobalGoogler - Never ever heard of that 10% rule (and I do a lot of reading). Waaaay too conservative for my taste as well. I have never heard of anyone being that conservative with their paid private proxies. As long as you're not being ridiculous with them and letting them go at 100+ threads with, say, 10 proxies, I think we'll all be doing just fine.
ANYWAYS... the proxy discussion got my thread a little "sidetracked". Anyone else have feedback on the pros or cons of scraping and importing your own list OVER simply letting SER do its thang!
@Pisco - My bad, buddy! I only skimmed the top of your screenshot and didn't take notice of the bottom part where it states "Bad Proxies". I have never gotten that message when doing my checks before, but I'm happy to see SB at least reports it.
Back to topic... SB imports vs SER
You haven't outlined a distinction between scraping with Hrefer and just allowing SER to do it for you. You're basically commenting on how Hrefer is better at scraping than SB and on importing a ton of URLs. I hope you realize that doesn't address the original question.
Also I'm not talking about scraping targets within a project, but externally.
I guess it comes down to which program you're more comfortable with. I've been using SB for years, so I see no need to go elsewhere, as the advantages are nonexistent or very minuscule.
2nd point - I wasn't really talking about comparing SB with GSA's scraper, but rather comparing importing scraped URLs into SER versus typing in your own keywords and just letting SER run its course. THANK YOU though for your feedback. It is greatly appreciated.
Yeah, this thread got sidetracked, but in a good way. Some great information on SB.
@grafx77, getting back to your original question, I think it has to do with efficiency. SER can run at full throttle with all threads on a list that was fed to it, whereas its own scraping is confined by the timing parameters for the scrape and the timing of the search engines' responses.
Personally, if your needs are getting to the point where you need 50,000, 100,000, or 1,000,000 links a day (like ranking fast on hard keywords), then you will need that other heavy artillery.
But for what most of us do, I think GSA works pretty darn good.
Obviously the timing parameters can be adjusted, and the same constraints apply to SB and other scrapers, so this point is invalid.
I can see how using scrapers in conjunction with SER for MASSIVE campaigns, to help out with the load, would be very beneficial. I think this may be the best answer we've gotten so far on the topic. Thank you.