  • gooner SERLists.com
    @coneh34d - I'd be interested to know how you get on with the foreign keywords. I bought the lists but haven't used them yet.

    Automation is not really a problem for me: I have 1 server building a verified list and feeding 3 others automatically. So that's OK. I'm just having performance issues now: out of memory on the dedi, lag on the VPS, etc.

    As I solve one problem, another appears!
  • @gooner Windows 2008 or Windows 2012? The languages are a big bonus and you should definitely use them to build up your verified list.
  • gooner SERLists.com
    @coneh34d - I'll give the languages a go, cheers.

    2008 on the dedi.
    2012 on the VPSes. I should get that changed, I know.


  • @gooner I had a look at the series for Scrapebox, and it appears I have already used the techniques mentioned.

    How many proxies are you using, and what kind, to scrape in Scrapebox?
  • @gooner What are your techniques with SB? I mean, how many threads for each search engine? Like 25 for Google, 25 for Bing and so on? And what harvester timeout do you set based on that?

    I had some 35 working proxies and mistakenly set 75 threads for Google alone with a 45-second timeout, and it burnt all of them, lol. None work now.
  • gooner SERLists.com
    Hi @pratik,

    I have 100 proxies with 25 threads for each SE and a 90-second timeout.

    I run 1 scrape every day. Recently I've noticed I can only get 300k URLs from Google and a few more from Bing, but Yahoo always gives me 2 million or more. So it works out OK.

    If I leave SB alone for 2 days I can get a million or 2 from each SE. Hope it helps.
  • edited January 2014
    Also @gooner, do you sort/identify in SER or just import target URLs right away after removing dups? Do you still get good performance without doing so, since there might be many useless URLs?

    Also, do you see your proxies in the blacklisted column while scraping, even at those minimal settings? If yes, do they usually work again the next day or so, or do they stay blacklisted? I find Bing blacklists them more than Google does in SB.

    Edit: Saw your reply above now. Thanks for answering the previous question!

    Cheers.
  • gooner SERLists.com
    @pratik - No problem.

    I see some are blacklisted in that column, but I don't think it's accurate. Google shows few blacklisted but scrapes the fewest URLs, while Yahoo has many blacklisted but scrapes the most (Bing has many blacklisted too). Doesn't make sense to me.

    All seem to be OK after 24 hours, but I'm not using them for scraping in SER, of course.

    At the moment I have a VPS just for processing scraped lists (direct import), and my other servers use the verified lists from that VPS. I get about 10k new unique domains per day; it's plenty for me.

    But I'm still new to scraping; I can improve it for sure.
  • @gooner So you use SER's sort/identify feature on the VPS you've allotted for processing the list?

    Oh, and one more thing: what value do you use for results per page? I use 25 to keep useless sites out and maximize the share of sites SER can post to. Going too high usually (I think) results in lots of useless URLs and irrelevant platforms.

    From what I see, you seem to have some custom software or setup that processes them and sends the list to the servers where SER blasts them? Cool.
  • gooner SERLists.com
    @pratik - I import directly into projects. Sort and identify takes too long.

    Results per page is the default, whatever that is. I didn't even know that setting existed until you mentioned it :)

    I use Dropbox to share the verified lists. Very quick and easy. There are a few things to note if you use that method; I think @satans_apprentice made a post with all the details recently.
  • @gooner Cool. I import directly too. It does indeed take too long.

    Dropbox: yes, I use it just for uploading backups. I'll definitely look into it in future.

    Thanks once again!
  • gooner SERLists.com
    @pratik - No probs mate.
  • Tim89 www.expressindexer.solutions
    @gooner & @Pratik - how does "import and sort in" take too long? Stop your projects, ramp your threads up to around 1,000, then import them...
  • gooner SERLists.com
    I import over a million URLs per day; import and sort in takes way too long.
    It's easier just to let SER see if it can post to them, and there's no need to stop projects.
  • How often do you 'clean' your lists, @gooner? I have now stopped using import/identify and import directly instead. Thanks, it works so much faster.
  • gooner SERLists.com
    @judderman - I do that about once a month, but I've noticed it works much better if you check the "disable proxies" option when you run it. The first time I ran it with proxies it deleted half my list. Gutted!


  • @gooner What exactly does clean lists do? It removes non-working sites from the lists, I assume?
  • gooner SERLists.com
    @pratik - Yes exactly that.
  • gooner SERLists.com
    Here's a question for you guys... If you delete duplicate domains, that should also delete all duplicate URLs, right? If each domain appears only once in the list, it's not possible to have any duplicate URLs.

    So how come, if I delete duplicate URLs after duplicate domains, it still finds sites to delete?
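    The reasoning holds as long as both passes work on the same set of URLs with the same matching rules. The minimal Python sketch below is not SER's actual implementation; it assumes, hypothetically, that the site list is stored as one file per platform, that duplicate-domain removal runs within each file, and that duplicate-URL removal compares URLs across all files. Under those assumptions, a URL saved under two platforms survives the domain pass in both files and is then caught by the URL pass. Whether SER's cleanup tools really differ in scope like this is not confirmed in this thread.

```python
from urllib.parse import urlsplit


def dedup_domains(urls):
    """Keep only the first URL seen for each hostname."""
    seen_hosts = set()
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


def dedup_urls(urls):
    """Keep only the first occurrence of each exact URL string."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


# Hypothetical site list split into one file per platform (illustrative only).
site_lists = {
    "article": ["http://a.com/blog", "http://a.com/post-2", "http://b.com/"],
    "forum": ["http://a.com/blog", "http://c.com/board"],
}

# Pass 1: remove duplicate domains inside each file separately.
per_file = {name: dedup_domains(urls) for name, urls in site_lists.items()}

# Within one file, a URL pass after a domain pass has nothing left to remove.
for urls in per_file.values():
    assert dedup_urls(urls) == urls

# Pass 2: check URLs across all files; the shared URL is now a duplicate.
combined = [url for urls in per_file.values() for url in urls]
print(len(combined), len(dedup_urls(combined)))  # 4 3
```

    If both passes ran with the same scope and the same comparison, the second pass would find nothing, which is why a non-empty result points to some difference between them.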
  • That's weird indeed, @gooner. I tried removing duplicate domains once but reverted, as I got scared (lol) and wanted to see more people try it first to see how it works.

    @Tim89 Thanks. Never tried upping threads to 1K but will definitely give it a go.
  • gooner SERLists.com
    @pratik - Yeah, I know what you mean; I'm not confident in some of the cleanup features either.
  • edited January 2014
    Here is a helpful video that covers cleaning up your site list:





  • Thanks @gooner, I never clicked the disable proxies bit... damn.
  • Tim89 www.expressindexer.solutions
    @gooner Well, if you're importing 1 million fresh targets directly into a project every single day (this makes no sense), there's no way on this planet your one project is processing them all... you must have a massive backlog.

    You can easily import and sort your scraped lists within a couple of hours, tops. Then these targets will get hit by all of your projects, since they'll be available in your identified list, meaning you'll be attacking all of these identified targets while using fewer resources.

    You can probably import and sort a million URLs within a couple of hours, and possibly increase your threads to 5,000. All SER is doing is checking each URL for a matching footprint, then storing it in the corresponding identified site list...

    Importing these raw scraped lists directly into a project uses resources, as SER attempts to create accounts on and post to them, which also depends on your proxies and connection... Using my method eliminates this.

  • gooner SERLists.com
    It's not just 1 project; it's a whole VPS full of projects dedicated to processing scraped lists.

    I hear what you're saying, but I've tested both, and import and sort takes at least 3 times longer to process the same number of URLs.

  • Tim89 www.expressindexer.solutions
    A "whole VPS" isn't much power when it comes to processing millions of URLs per day (even with lots of projects, SER can only do so much with the hardware and thread count you give it). And the last time I checked, a VPS isn't really "dedicated" either: it's merely a virtual private platform on hardware shared among many users, in essence a new Windows user with allotted portions of RAM, disk space, etc.

    I'm coming from the perspective of having 4 dedicated machines (yes, actual machines with 16/32 GB of RAM each) running close to 1,000 threads 24/7.

    I'm not saying what you're doing is the wrong way of doing it; I'm saying it's difficult to change a habit you believe is worthwhile. I'm giving everyone a much, much more solid solution to work with, one that uses far fewer resources, and personally I suggest you try it this way too, as it could only benefit you.

    When I purchased GSA SER, I started out doing what you are doing: scraping and importing into a project set not to scrape search engines, etc. I outgrew that and found a more logical method that was staring me in the face, hence the option "Import URLs - Identify platform and sort in".

    I'm not sure if users know, but all you need to do is "Stop" your projects by hitting the big red stop button, then go to "Options" and set your threads to whatever your MACHINE can handle (not your connection speed, your machine). If you have a beefy machine, you can go all the way up to 10,000 threads; it doesn't matter. Then go to "Advanced -> Tools -> Import URLs - Identify Platform and sort in" and see how fast GSA sorts your scraped list, discarding all unpostable sources. It takes me minutes to process tens of thousands: the last list I imported had around 40,000 sources, which isn't that many, but I processed them in around 10 minutes, if that.

    Each to their own, I guess.
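    For a rough sense of scale, here is a back-of-the-envelope Python calculation using only figures quoted in this thread: about 40,000 URLs identified in about 10 minutes, versus over a million URLs imported per day. It assumes the rate scales linearly with list size, which is a simplification, since thread count, CPU and page size all affect the real rate.

```python
# Figures quoted in this thread; everything else is plain arithmetic.
SOURCES_IMPORTED = 40_000   # Tim89's last import
MINUTES_TAKEN = 10          # "around 10 minutes, if that"
DAILY_URLS = 1_000_000      # gooner's daily volume

# Assumes the observed rate stays constant as the list grows.
rate_per_minute = SOURCES_IMPORTED / MINUTES_TAKEN
minutes_for_daily = DAILY_URLS / rate_per_minute
rate_needed_for_two_hours = DAILY_URLS / 120

print(f"{rate_per_minute:.0f} URLs/min")                              # 4000 URLs/min
print(f"{minutes_for_daily / 60:.1f} h for 1M URLs")                  # 4.2 h for 1M URLs
print(f"{rate_needed_for_two_hours:.0f} URLs/min to finish in 2 h")   # 8333 URLs/min to finish in 2 h
```

    At that rate a million URLs would take a bit over four hours, so sorting them within two hours would need roughly double the quoted throughput, for example from a higher thread count on stronger hardware.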
  • AlexR Cape Town
    @Tim89 - "I'm not sure if users know, but all you need to do is "Stop" your projects, by hitting the big red stop button, then go to "Options" then set your threads to whatever your MACHINE can handle, not your connection speed, but your machine, if you have a beefy machine, you can go all the way up to 10,000 threads..."

    Thanks! Great tip!
  • @Tim89 I always find that I need to *lower* threads to import, as I max out the CPU and SER stops responding. E.g., I'll post at 800 threads but sometimes need to drop to 400-500 to keep the machine running smoothly (I'm fairly sure that SER not responding results in many URLs timing out rather than being correctly identified).
    I'm using a dedicated server with a Xeon E3: perhaps not the top of the range, but still a solid multithreading CPU. I'm curious how you manage to import at such a rate. What does your box run on? (I know you built it yourself.)
  • Tim89 www.expressindexer.solutions
    edited January 2014
    @namdas I'm not so sure these URLs are loaded from the web to identify their platform; that's why I increase my threads to such high limits, since it isn't affecting my proxies or connection.

    Yes, doing this will increase CPU load for sure. Increase your HTML timeout if you want to run SER at higher threads when posting.

    Unfortunately, your machine can only do what its hardware allows: if your CPU maxes out at 500 threads, then that is that. That said, SER is a piece of software that is very RAM-dependent, so it's not entirely CPU-related.

    How much RAM do you have in that machine?

    This is roughly the spec for all my machines:

    i7-3770K @ 4.0 GHz
    32 GB RAM

    I overclock all my computers just a little; some are sitting at 4 GHz, some at 4.5 GHz.


  • It's an E3-1245 v2 @ 3.4 GHz with 32 GB RAM running 2008 R2, but I have never seen SER use more than 2 GB, sadly.
    I have a feeling the CPU load also depends partly on the size of the pages: e.g., 1,500 threads on a list of contextual articles will be fine, while 1,500 threads on trackback pages with 2,000 OBL (outbound links) each will probably clog things up a fair bit.

  • Tim89 www.expressindexer.solutions
    @namdas Yes, you're probably right there; I primarily deal with contextual links.
  • @Tim89, I'm picking on you again, as you know your dedis... After speaking to you a few weeks ago and looking again at building my own server, it seems you can just buy ready-made gaming PCs with the spec I want on Amazon. Would it be OK to run one as a dedi, selling the GPU to make the overall cost of the PC cheaper? If so, that seems a financially viable way for me to go rather than paying a provider monthly, as I'm always looking for the best deals :)
  • Tim89 www.expressindexer.solutions
    @judderman Yes, that's perfectly fine. When I buy parts I tend to purchase gaming motherboards anyway, simply because they're better performers and I have the option to sell them on as gaming machines on eBay if I want to get rid of them.

    I bought a £500 Nvidia graphics card for this machine, simply because I may want to play some games. I used to be a big gamer in my teens but haven't touched a game in 3 years lol, but the option is there!
  • Awesome, cheers @Tim89. I assume the box/cooling will be more than good enough, rather than me guessing at what is right. If a company has built a machine specifically for gaming, it will be perfectly capable for me.

    LOL at gaming. The last time I picked up a controller, it was for a PS1!!! I think I've worked non-stop for 15 years :)
  • Tim89 www.expressindexer.solutions
    edited January 2014
    @judderman The cooling should be good enough; just make sure it has at least 2-3 fans. All computers come with two main fans, an exhaust at the back of the machine and a vacuum (intake) fan normally located in front of the HDD bays, so don't count those; they are standard.

    Try to go for at least 2 fans (not including the standard ones).

    Yes, I used to be a big player in CoD4 and MoH:AA; I was actually playing in the same online league as Matthew Woodward. What makes me laugh is that, if you check out his blog, he claims he was the first person ever in the game to do a 360 air shot with a sniper rifle, which is absolutely rubbish lol. I guess it's the blind leading the blind, though.
  • Excellent, thanks mate. Looks like £500 (after selling bits) for the one I want. Compare that to $90/month for the dedicated server I have (which is a different one from the one I ordered). I just need to see what Virgin Media can do for fast Internet.
  • Tim89 www.expressindexer.solutions
    @judderman Coolio, please share which Virgin Media product you end up going for. The last time I had Virgin was two years ago; I had to change when I moved.
  • gooner SERLists.com
    Virgin Media is pretty decent. I've got 100Mb, soon to be 120Mb. They double your speed for free and offer a discounted rate for about 6 months on a lot of deals, I think. Speeds depend on where you live, of course.
  • Just hide behind a VPN if you need to sequentially scan through a big list of proxies. Nothing like a Netscan detected to ruin your day and put your Internet access at risk.
  • @Judderman I set up my home server about 9 months ago, and at the time I went with the AMD FX-8350, as it was quite a bit cheaper than the nearest Intel equivalent. With 8 GB RAM, a 128 GB SSD, the processor, case, fans, etc., it cost about £400.

    @Tim89 I'm the same, but my problem is I keep buying them. I keep telling myself that I'll get time to play them soon enough, but there are still 7 or 8 in my drawer, still shrink-wrapped... I've even got the last two CoDs installed in Steam, but I bet I haven't fired up either of them more than once...
  • Cheers @davbel. AMD FX-8350 4.0 GHz (4.2 GHz Turbo), 8 cores, 16 GB RAM: this is the dedi I should have had, but they had none in stock. Instead I got 32 GB but a slower CPU, which is pointless for SER, as it doesn't need much RAM (I don't think). I think I'll buy one soon, as I hate monthly payments for anything.
  • Tim89 www.expressindexer.solutions
    @davbel Yeah, I'm literally the same; I have all my games in my drawer, ready for some day...
  • @judderman Have a look at Windows Home Server 2011 too. You can get it for less than £35.
  • Sven says Win 2008 is better for SER, but £35 is a bargain, cheers. I find SER works best on Win 7 64-bit. What Internet speed do you have, @davbel, and with which provider?
  • I've not had any issues with WHS2011 and SER, but @sven would know better :-)

    I'm in a non-Virgin area, so I have 38Mb fibre with Sky, which averages about 30-32Mb. SER consumes about 10Mb, so that still leaves a good 20Mb for web, streaming, downloads, etc.

    I'm looking to upgrade to 76Mb, as I also run a VPS with SolidSEO, so with double the bandwidth I'll be able to quite literally bring that "in-house" and build and run another server on the home network :))
  • Cool man, I have Virgin now, but only 30Mb for personal stuff. My dedi provider has a 1Gbps uplink, which is astonishingly fast.

    I think I'd get a business line, as I'm sure there's no throttling of speeds at peak times (well... that's what they say).
  • gooner SERLists.com
    Just got an email from Virgin telling me they are upgrading me to 152Mb for free in the next couple of months. Man, I might need to get a home server too!
  • ^^^ pah
  • gooner SERLists.com
    lol @davbel - I pay them enough money to watch football, never mind internet, so it's the least they can do :)
  • Why is this not about churn and burn anymore? Great tips and advice, guys, and I know everything talked about here is somewhat related to the topic... BUT I want to know a little bit more about the results of churn and burn techniques :-*
  • I don't know if you have been following the journey on BHW from the link that was provided on the first page of this thread, but the latest update is that his rankings apparently fell except for 1 keyword.

    I can't access BHW from my computer at work, so I don't remember the specifics, but I believe he's going to keep throwing backlinks at it to see if there are any improvements. If nothing happens, he'll move on to another Blogspot blog.

    For anchor text, he strictly used the keywords he wanted to rank for. Since there were 50 of them, each keyword got roughly 2% of the backlinks. He didn't use generic words as anchor text, nor did he use naked URL anchors.

    You should definitely keep an eye on that thread as he plans to run more tests to try to rank faster, as he said it took him a month to reach as high as he did.

  • I think the moral of the story is not to go mad chasing very, very tough, manually moderated keywords. That guy spent resources for roughly 1 month and made just $168. Not saying it's bad, but if he had chosen a medium-competition niche, maybe around 40K searches with low to medium competition, he might have made way more than this.

    Using resources in the right place is what's important, at least for me.
  • Yeah, I agree @Pratik, he could have gone into a relatively smaller niche and cleaned up. There are plenty of $40-60 niches that aren't moderated so harshly. I still wonder if it was the fact that he used Blogspot, but then again, we all blast YT videos, and that's a Google product too.


  • I think part of the problem is BHW.  Cutts has said publicly that the spam team spend a ton of time there looking for targets.
  • Especially if people use a Gmail address to sign up to BHW; then they could see what else that same email address is used for, i.e. hosting, tools, etc.

    [/tinfoil hat mode]
  • Seems like he's still doing well. I see him about to hit the first page for the main keyword. Maybe he just fell for all the other keywords? Even if he ranks for that one main keyword, he will still make a very good amount.
  • I think he said his main KW is not ranking? Not sure if the site you're talking about, @Hunar, is the same one as that guy's.

    @davbel Definitely. I don't think Google is foolish enough, or short enough on stats, to miss how many backlinks are being fired at their products. For example, I'm sure there's some way they can sort, filter and see the number of backlinks coming in daily to their Blogspot domains. And a guy who builds almost 70-100K backlinks daily to Blogspot can easily be spotted by the Google team.

    Now, that doesn't mean this is what happened or that it's a manual action, but there's a chance.
  • I'm 100% certain it's his site, because he gave away which affiliate company he's using, and if you look at the site I'm talking about you'll see the link lol. Maybe masking his buy/affiliate link would be better, but <shrug>
  • @Hunar, I'm not sure what keyword position you're seeing, but he did mention that all of them fell except for one that gets 40.5K searches. When he reported that all but one fell, he said that one was still at position 13.

    Do you guys really think this was a penalization (a manual action taken by Google)?

  • Yep, the exact one I'm talking about is at position 13. :)