
Verified Site List - Not Verified at All?

Hey guys, quick question for you. For the last few weeks I've been scraping like a madman to build a solid verified site list for my projects to use. The other day I thought, why not give it a test run: I created a dummy project just to see how high I could push the VPM. The results have actually been quite good, but I'm still not satisfied at all.

I've noticed something very strange: while using only links from my verified site lists, I got a lot of errors (at least a lot for a verified list) that I usually only see when going through a freshly scraped URL list (no form at all, no registration page, xxx expected, unknown platform, ...). This really puzzled me.

Then I thought of one reason that may be responsible for this: maybe some of these sites are temporarily unavailable, or have changed their layout, engine, etc., and therefore aren't what they were before.
Would you consider this a plausible reason too, or is there something wrong with my list/settings?

Comments

  • I cannot speak for others, but my scraped lists show the same behavior. I guess it's the nature of these things, as you mentioned - unavailability, misidentification, layout changes, etc.

    So at the very least you're not alone in this ;)
ron SERLists.com
    edited June 2014
    @Tixxpff - Just to add, those verified links definitely decay because of all those changes you guys mentioned.

So if you are using your verified list to make links, it is smart, every so often (minimally once a month), to:

- Dedupe at the URL and domain level (a scripted version is sketched after this list)
    - Then empty the verified folder (move the contents to a different folder)
    - Now you have an empty verified folder
    - Import the deduped links (formerly in your verified folder) into lower level tiers with all engines turned on
    - Let SER process and find the gold nuggets in this mess until it completes
    - Dedupe this 'refreshed' verified folder one last time at the URL and domain level
    - Then you can import this refreshed verified file into projects or simply run projects off verified, or both
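    For anyone who would rather script that dedupe step than click through SER's built-in tools, here is a minimal Python sketch. It assumes a flat text file with one URL per line (SER actually keeps per-engine sitelist files, so you would run it on a merged export); the filenames are placeholders, not part of ron's workflow:

```python
from urllib.parse import urlparse

def dedupe(in_path, url_out, domain_out):
    """Dedupe a URL list at the URL level, then at the domain level (keep first seen)."""
    seen_urls, seen_domains = set(), set()
    unique_urls, unique_domains = [], []
    with open(in_path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            url = line.strip()
            if not url or url in seen_urls:
                continue
            seen_urls.add(url)
            unique_urls.append(url)
            domain = urlparse(url).netloc.lower()
            if domain.startswith("www."):
                domain = domain[4:]  # treat www and non-www as the same domain
            if domain not in seen_domains:
                seen_domains.add(domain)
                unique_domains.append(url)
    with open(url_out, "w", encoding="utf-8") as f:
        f.write("\n".join(unique_urls))
    with open(domain_out, "w", encoding="utf-8") as f:
        f.write("\n".join(unique_domains))

dedupe("verified.txt", "dedupe_by_url.txt", "dedupe_by_domain.txt")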

    I know I didn't answer your question because I felt you guys already answered it correctly. All I am adding is some Best Practices into the equation so you can make more links, and more efficiently.
  • @ron This is exactly what I was going to ask next. I actually thought of this exact same method myself. I was wondering if it'd be a good idea to go through your whole verified list every now and then to keep it as clean and as high quality as possible.

    There's just one thing I didn't quite get. Do you keep your original verified folder, or do you delete it once you've gone through it and extracted the gold nuggets, as you call them? It sounded to me like you keep it, but I don't see the reason why.
    Let's say 50% of your original folder has passed and moved on to your new 'gold nugget' verified folder. Then your original verified folder would consist of exactly those links (which would be skipped, unless you allow multiple posts on the same site), plus another 50% whose quality is close to garbage, because many of them have become permanently unavailable and only a small percentage may be just temporarily down. All in all, pulling links from your nugget folder AND your former verified folder seems kind of redundant to me.
    But then again, maybe I totally misunderstood you there.
ron SERLists.com
    edited June 2014

    Nah, you throw it away. Your verified file is only as good as it is current. You processed the entire old file, extracted the gold, and now you throw away the old one - it has no value.

    That is why you want to run that old verified file through several spam tiers. You want several swings at the plate for each link, and if 3 (or more) projects can't make a link work from that old verified file, then who are you kidding, right? Time to take out the trash.

    You want to import it because SER will then try to post to each URL in order (as opposed to ticking a site list folder in the project, which chooses targets randomly - and often repetitively). You simply want nothing in the target URL cache of these spam tiers before you import the old verified file. You want no site lists checked. You want no search engines. You want all engines checked. You want to only process that list and nothing else. And when it is done, bam, the icon shows up, and you are finished.

    And yes, turn on "post to same domain more than once" or however it is worded. In fact, from these projects, you should also be deleting all target URL history. And to take it even one step further, use all new emails, as in 10 new ones for each project. You want the maximum efficiency to make every link possible with this type of processing. This is obviously a junk tier that makes no direct links to any of your moneysites.

nawshale Sales & Tech Support at www.SERVerifiedLists.com
    I run the same verified list under different filters and I always end up with an 80% verified ratio. To be honest, I don't worry about wasting targets or verified ratios; since I am scraping daily, plus I buy most verified lists, I always have enough unique targets.

    It's always good to subscribe to verified lists.

    Plus I really like SER's inbuilt scraper too.

  • @ron I just noticed you said that I probably want to clean my list at least once a month. Does this include letting it run through the spam tiers to create a new and fresh verified list? I do all the other stuff (deduping, etc.) on a regular basis, but running my whole list through 2-3 projects is going to take a lot of time and resources. My VPS may not be the worst, but it's no dedi.
ron SERLists.com
    @Tixxpff - I do find it amazing how quickly links go bad in the verified file. Please understand that I have made more mistakes than anyone on this forum. But because of that, I did figure out what works. Yes, you want to run it through an organized set of test projects.

    Do what I say:

    • Create 1 spam project from hell. No filters, all engines checked, pointing at Yahoo or Wikipedia, unlimited links.
    • Once you have it all set up, clone it 9 times. You want 10 spam projects from hell.
    • Name them Test 1, Test 2, Test 3, etc.
    • Stop all your 'real' projects, because this is going to put your efficiency on steroids anyway.
    • Dedupe your verified list, and move the contents to a different folder. Basically cut and paste.
    • Then back up that verified list to a second location just in case - you will never touch this second backup.
    • That means your new verified folder is bone dry - 100% empty.
    • Start dividing this sitelist into 10 chunks (a splitting script is sketched below).
    • Import Chunk 1 into Failed (make sure it is empty first) using Advanced Options.
    • Then import that sitelist from Failed into Test 1.
    • Do this process 9 more times so that each test project has its proper chunk.
    • After importing each chunk into its project, empty the Failed folder before loading the next one.
    • Your goal is to make this processing as fast as you can - that's why we divided the big butt verified sitelist.
    • Hit Start, go watch TV, and then go to bed.
    • When you wake up, even if it has not finished processing, you can turn your important projects back on (although I would prefer to wait until it is done).
    • When it is all done, turn off the test projects, and light up your real projects.
    You have no idea what a difference a clean sitelist makes. There are no excuses. You either want better efficiency or you don't. The damn projects will be fine without links for a day or two. The difference is enormous when you have a good list. Freaking enormous.
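    If you don't want to split the file by hand, here is a rough Python sketch of the chunking step mentioned in the list above (one URL per line assumed; the chunk count and filenames are placeholders):

```python
def split_into_chunks(in_path, n_chunks=10):
    """Split a URL list into n roughly equal files: chunk_1.txt ... chunk_n.txt."""
    with open(in_path, encoding="utf-8", errors="ignore") as f:
        urls = [line.strip() for line in f if line.strip()]
    size = -(-len(urls) // n_chunks)  # ceiling division so no URL is dropped
    for i in range(n_chunks):
        chunk = urls[i * size:(i + 1) * size]
        if chunk:
            with open(f"chunk_{i + 1}.txt", "w", encoding="utf-8") as out:
                out.write("\n".join(chunk))

split_into_chunks("old_verified.txt", n_chunks=10)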

    Yes, I have a dedi, but so what? A clean sitelist exponentially improves linkbuilding, whether on a VPS or a dedi. I still made 200 LPM on a VPS. By the way, here are my stats with our latest list...

    after only 12 hours:

    [SER stats screenshot]

    So a 250,000 verified links kind of day - because the list is clean. 

    It is worth your time to pause your operations and do what I said. You will be extremely happy.
  • @ron Holy crap, I didn't expect such a detailed answer. Thanks a lot, mate. Alright, I'll give it a shot. I was going to do it in a few days anyway, because I haven't done it before and wanted to see the before/after results. But since you gave me a complete step-by-step tutorial, I'll actually do it tonight.

    Just two really quick questions:

    "That is why you want to run that old verified file through several spam tiers. You want several swings at the plate for each link, and if 3 (or more) projects can't make a link work from that old verified file, then who are you kidding, right? Time to take out the trash."

    That's what you wrote earlier. But you didn't explicitly mention taking several swings at one URL in your step-by-step tutorial. Should I tick 'Continuously post to same URL even if failed before' for these spam tiers, or rather create 3 projects for the same chunk of links? You got me a little confused on that particular detail.


    • "Then Import from Failed that sitelist into Test 1"

    Is there a particular reason why you suggest importing the file chunks through one of the global site list folders (Failed, in this case) instead of simply splitting the file into 10 chunks and manually importing them by right-clicking each of the 10 projects?


    Again ron, words can't even begin to describe how helpful you are. I appreciate this so much, I really do.

ron SERLists.com
    edited June 2014
    @Tixxpff - You made it sound like your verified file was massive. That's why I suggested splitting it into 10 parts. If it wasn't so big, my preference would be to run the entire verified file through all 10 projects - 10 swings at the plate, lol. So all I was trying to do was give you a rational alternative for processing a very large file (my file is huge). Ideally, take multiple swings at the plate.

    You are correct. I may have overcomplicated the explanation because I was trying to split them up and all of that. It all depends on your approach. I prefer to import as a sitelist. It has been so long since I have done it otherwise that I can't remember, but I believe it is more efficient than just pasting in a bunch of targets. At least with the sitelist approach, everything is already sorted properly by engine.


  • Just wanted to point out something I've seen while scripting engines. Sometimes SER will make a connection (??) to a website yet not download any HTML. I'm not talking about a simple "download failed", but a website that SER believes has loaded successfully, which therefore returns "no engine matches" because SER sees nothing on the page. To pull a number out of my ass, I'd say it happens 2-5% of the time for me. For this very reason I'd always run my verified list at least twice through dummy projects to clean it.
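    If you want to double-check that failure mode outside of SER, here is a quick sketch using the third-party requests library (the 200-byte threshold and the user agent are arbitrary assumptions, not anything SER itself uses):

```python
import requests

def silently_empty(url, min_bytes=200, timeout=15):
    """True if the server answers with a success status but (near-)empty HTML -
    the case where a platform detector sees 'nothing' on the page."""
    try:
        r = requests.get(url, timeout=timeout,
                         headers={"User-Agent": "Mozilla/5.0"})
    except requests.RequestException:
        return False  # an outright failure, not the silent empty-body case
    return r.ok and len(r.content) < min_bytes

# Re-check suspicious targets a second time before discarding them:
print(silently_empty("http://example.com/"))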
  • Holy shit... I don't know if that was supposed to happen, but after running my list of 350k verified links through one dummy project, I'm down to 30k links. What the fuck? How is that even possible? I duplicated this dummy project and now I'm running 5 projects, each going through my whole verified site list.

    I'm not using any filters whatsoever. All I did was untick a few engines I never use. And I deduped my 350k list on a regular basis. I don't even know what to say right now. I thought I might lose 10-30%, hell, maybe even 50% - but 90% (!!!).

    I'll wait for the 5 new projects to finish and then evaluate the results again.
Brandon Reputation Management Pro
    @Tixxpff this is the first time you've done it. What this means is your list was pretty bad because it had been stagnant. That's why I always advise against buying lists: even if they sell a new list each month, it's already old.

    I do exactly what @ron said on a weekly basis. I have two dedicated servers scraping 24/7, between 25,000 and 100,000 URLs per minute each. I run all of those scrapes through multiple servers to find the good stuff. I just finished my new master list and it has 12,000 unique domains (no blog comments, indexers, or exploits).

    12,000 unique domains doesn't sound like much, but I get numbers similar to @ron's because it's a 100% clean list.
  • edited June 2014
    @Brandon OK, this actually does make a lot of sense to me, but then can you explain to me how people use spam link structures, firing off 200-300k links a day, if there are only ~50-100k verified links in your list?

    I mean, if even people like you, who scrape like their lives depend on it, can't sustain a big list with more than 200-300k links, then how am I supposed to pull that off?
    My plan was to scrape enough to build a basic verified list with 200-300k verifieds to use for my ranking projects. But since I'm now left with ~30k URLs (of which ~50% or even more aren't even dofollow), I don't even know how I'd create a tiered structure, because I simply don't have enough links.
Brandon Reputation Management Pro
    @Tixxpff multiple projects, allow posting on same domain, blog comments, trackbacks, etc.

    I care about one link per domain. I don't want to post 25 links on the same domain, although that would definitely make my numbers look better. I don't believe that certain types of links provide results, so I exclude them even though they would inflate the numbers. My point is that results are more important than numbers.
ron SERLists.com
    edited June 2014
    @Tixxpff - What @Brandon said about one link per domain is correct. That said, I honestly don't care about posting multiple times on lower-level tiers (I would never do that on a T1 property).

    I have unused capacity on the servers, so I just let it rip. What he is referring to is that Google has said on a number of occasions that if you have multiple links coming from a website, you only get credit for one link. So the point is, why even bother?

    Don't take this part too seriously... In the Theory Of Link Juice, it is allegedly a zero-sum game. If a website has incoming links worth, say, 15,000 units of link juice, that gets divided by the number of outgoing links. So if there are 15 outgoing links, each one carries 1,000 units of link juice. The Amended Ron Theory Of Link Juice (rofl) states that if I have 5 of those 15 links, I should get 5,000 units of link juice. Or at least more than those other guys who only have one link. I mean, I was a finance major, something doesn't add up, right? So just in case Google 'says one thing and does another', I pepper all the targets with extra links (on lower levels).
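    For what it's worth, the toy arithmetic behind both 'theories', using the numbers from the example above:

```python
incoming_juice = 15_000  # link juice flowing into the page
outgoing_links = 15      # outgoing links on that page
my_links = 5             # how many of those links are mine

per_link = incoming_juice / outgoing_links  # 1,000 units per outgoing link
print(per_link * my_links)  # Amended Ron Theory: 5 links -> 5,000 units
print(per_link * 1)         # Google's stated version: credit for one link only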

    I know I am talking some smack here, but I am barely awake, and I am totally justifying my reasoning for creating insane amounts of links. I also like it when the speedometer in a car goes up, up, up.

    As he said, the result is the only thing that matters. 
  • edited June 2014
    @ron So if I've understood you correctly, I should focus on getting as many different domains as possible on my T1 level. On T2+ and the secondary links pointing at each tier, you rely on posting multiple times to the same domain to increase the number of links, instead of posting to unique domains only (which results in more speed, and more link juice from each domain, considering your link juice formula).

    From what you guys have told me, I get the strong feeling that I should rethink my scraping strategy. Right now I'm using no filters on my scraping footprints, and my goal is to scrape as much as possible.
    But now I'm thinking I should filter out all the unnecessary nofollow platforms and additionally focus on contextual dofollow links. I'll still get a lot of nofollow links for diversity, but right now (applying no filters at all) I get 25% df and 75% nf.
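    If you want to measure that df/nf split on your own targets before committing to a footprint strategy, here is a rough sketch using the third-party requests and BeautifulSoup libraries (it only counts anchors on a single page, which is a simplification of how SER classifies engines):

```python
import requests
from bs4 import BeautifulSoup

def follow_counts(url, timeout=15):
    """Count dofollow vs nofollow anchors on a single page."""
    html = requests.get(url, timeout=timeout,
                        headers={"User-Agent": "Mozilla/5.0"}).text
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.find_all("a", href=True)
    # bs4 exposes rel as a list of tokens (or None if the attribute is absent)
    nofollow = sum(1 for a in anchors if "nofollow" in (a.get("rel") or []))
    return len(anchors) - nofollow, nofollow

df, nf = follow_counts("http://example.com/")
print(f"dofollow: {df}, nofollow: {nf}")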
ron SERLists.com
    edited June 2014
    @Tixxpff - Yes, ideally you want one link per domain to any moneysite with your T1.

    My link juice 'formula' was tongue-in-cheek. So please, do not take it all too seriously. I was just having fun.

    But I do make a ton of links on lower tiers - for better or worse. Ideally, I would love for them all to be unique domains, as that would be so helpful. But it is just impractical, in my opinion. Plus, things have worked out OK for me just blasting the snot out of lower tiers. Again, though, if I had infinite scraping ability and the patience to go along with it, I would do single posting to all domains.

    A lot of people try to play games with targeting dofollow. Not saying there isn't some validity to it, but in the extreme, it is very unnatural and makes you a target. So do that wisely.
  • @ron - Yes, of course I know you weren't dead serious when talking about your 'formula', but I liked the name and I think the essence of what you were trying to explain was pretty clear.

    Right now I'm trying to figure out how to properly build links, what to focus on, and how to do it exactly, because my current strategy obviously doesn't work that well, given that 90% of my scraped links die after a couple of weeks. Repetitive posting to the same domain/URL definitely makes things easier and more effective. And I need to be efficient, because there's nothing I hate more than being inefficient and wasting my time.

    Now, when you talk about 'lower tiers' are you referring to your secondary links that point at your CDF tiers, or does this include your lower CDF tier(s)? Or does every single link in your CDF pyramid come from a unique domain? Just wondering, because that'd be hard to pull off, I guess.

    Regarding the dofollow matter - I wasn't going to shoot 99% dofollow links at my money site, but I'd rather have as many as possible. Especially since even if you uncheck typical nofollow engines, you still end up with ~50% nofollow links.
ron SERLists.com
    edited June 2014

    @Tixxpff - Yeah, unchecking the nofollow engines creates fewer links. But if you have things spinning at a high LPM, then I say go for it. You are right, no matter what you uncheck, you are still going to get a fairly balanced blend. So I like it.

    With my secondary links, I was referring to T2. I was babbling that if I could possibly do it, I would love to have unique domains all the way through the pyramid, but it is incredibly impractical. I think some people try to do that in practice - but I am sure they make a lot fewer links (as well as constantly running out of targets, if done at any kind of scale).

    Overall, I like how you worded it. Insanity is doing the same thing and expecting different results. Change it up, dude!
