Some really basic questions about SER


Comments

  • I should clarify that the 1% verified figure is the share of verified links from a freshly scraped list. I pretty much pulled the number out of my ass, but it sounds about right according to what I've seen. I have no idea what percentage of submitted urls become verified.

    I do agree that you should optimize your footprints. So far I've only removed footprints from engines that perform poorly with my setup. However, I think you'll see a lot better results if you acquire footprints which aren't in SER by default. A little out-of-the-box thinking should also get you far here.

    Also, I'm wondering if you guys have really seen any difference from splitting url lists before importing them? I've imported lists of up to 15 million urls into one project and didn't notice much difference between that and 100k urls. Currently I'm keeping it to 2 million at a time per project though. Guess it's time for me to properly test this out once again, but I'd love to know what experience you guys have had.
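(A minimal sketch for anyone who wants to test the split-before-import question themselves: it chops a scraped list into fixed-size chunks so each file can be imported into its own project. The file name and the 2M chunk size are illustrative assumptions, not anything from this thread.)

```python
# Minimal sketch: split a large scraped URL list into fixed-size chunks
# so each chunk can be imported into a separate SER project.
# File names and the 2M chunk size are illustrative assumptions.

CHUNK_SIZE = 2_000_000

def write_chunk(path, part, urls):
    with open(f"{path}.part{part}.txt", "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")

def split_url_list(path, chunk_size=CHUNK_SIZE):
    chunk, part = [], 1
    with open(path, encoding="utf-8", errors="ignore") as src:
        for line in src:
            url = line.strip()
            if url:
                chunk.append(url)
            if len(chunk) >= chunk_size:
                write_chunk(path, part, chunk)
                chunk, part = [], part + 1
    if chunk:
        write_chunk(path, part, chunk)

if __name__ == "__main__":
    split_url_list("scraped_urls.txt")  # hypothetical file name
```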
  • @fakenickahl, yeah!

    That's what I've been confused about!

    If I scrape with footprints not recognized by SER, how can it post to them? Or rather, how can I make it post to them?

    By the way guys... Got to love this forum! Super answers, thanks everyone! :)
  • You can edit the engines yourself if SER is unable to post to the urls you find with new footprints.
  • So, how does that work in a nutshell? Do I go in and edit the text files in the global site list folders, or what?
  • goonergooner SERLists.com
    @artsi - Just checked our data and we are getting 5 - 30% verified from raw scrapes depending on the engine.
    Auto approve stuff like XpressEngine can be as high as 30%.

    As an overall average, Footprint Factory boasts 8% - so that would be a good target to aim for.
  • Ok, nice! That's good to know!

    I'll just see where I'll arrive at once I'm done with those lists...
  • Go to the location of your engines (C:\Program Files (x86)\GSA Search Engine Ranker\Engines) and edit "page must have1" in the .ini files according to your needs.
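(Rough illustration of that edit: the script below appends an extra detection string to the "page must have1" line of one engine file. The pipe-separated value format, the engine file name and the footprint are assumptions for illustration, so back up the original .ini before touching it.)

```python
# Rough sketch: append an extra detection string to "page must have1"
# in a SER engine .ini file. The pipe-separated value format and the
# engine/footprint names are assumptions, not confirmed details.

ENGINE_FILE = r"C:\Program Files (x86)\GSA Search Engine Ranker\Engines\SomeEngine.ini"
NEW_SNIPPET = '"Powered by SomeCMS"'  # placeholder footprint

def add_page_must_have(path, snippet):
    with open(path, encoding="utf-8", errors="ignore") as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if line.lower().startswith("page must have1="):
            lines[i] = line.rstrip("\n") + "|" + snippet + "\n"
            break
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)

add_page_must_have(ENGINE_FILE, NEW_SNIPPET)
```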
  • Wow, okay...

    I just ran some urls in my SER, and the numbers are very disappointing :(

    In SER number 1:
    - 193 000 urls
    - 9300 submitted
    - 500 verified
    (0,3% from total, 5,4% from submitted)

    In SER number 2
    - 209 000 urls
    - 15 750 submitted
    - 790 verified
    (0,4% from total, 5% from submitted)

    These numbers suck big time. I don't understand what I'm doing wrong...

    I think, though, that I used ALL footprints from SER, and then removed all that had fewer than 1M indexed results (did this in Gscraper). I have a keyword list of 428k keywords, and I used that.

    I now have some urls coming in that were scraped purely from verified contextual footprints, with keywords focused on my niche. Let's see if those are any better.

    Feeling kind of disappointed right now. Hopefully this'll get better.
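(For sanity-checking numbers like these, a tiny helper that turns raw counts into the same percentages can save some squinting; the figures are just the ones quoted above.)

```python
# Tiny helper: turn raw SER counts into the percentages quoted above.

def rates(total_urls, submitted, verified):
    return 100 * verified / total_urls, 100 * verified / submitted

for name, total, sub, ver in [("SER 1", 193_000, 9_300, 500),
                              ("SER 2", 209_000, 15_750, 790)]:
    of_total, of_submitted = rates(total, sub, ver)
    print(f"{name}: {of_total:.1f}% of total, {of_submitted:.1f}% of submitted")
```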

  • Okay...

    I just scraped a list of 800k urls. I used the Footprint Factory to come up with footprints, and I used a couple of hundred really broad keywords.

    I've run that list on 4 projects right now, and as the list is approaching its end, I have about 80 verified per project.

    What am I doing wrong here? How could it be even possible to have just 80 verified out of a list of 800 000 urls?

    Where should I begin to diagnose this problem? Does @gooner or @fakenickahl or @davbel have any suggestions?

    EDIT: Ah, seems like my log is almost exclusively filled with "no engine matches". Could this be it?
  • goonergooner SERLists.com
    Hey @artsi - First ensure your proxies and everything else is working properly.
    Assuming that's all ok... The next question to ask yourself is which engines they were. Some engines have a very low success rate... Wordpress for example.

    If it's none of those reasons then you have to assume it's a bad footprint and find another.
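(One quick way to put numbers behind that diagnosis is to tally the messages in a saved copy of the SER log. The sketch below assumes the log has been copied or exported to a plain text file, and the message patterns are just the ones mentioned in this thread.)

```python
# Quick diagnosis sketch: tally common messages in a saved SER log.
# Assumes the log was copied/exported to a plain text file; the patterns
# below are only the ones mentioned in this thread.
from collections import Counter

PATTERNS = ["no engine matches", "download failed"]

counts = Counter()
with open("ser_log.txt", encoding="utf-8", errors="ignore") as log:
    for line in log:
        lower = line.lower()
        for pattern in PATTERNS:
            if pattern in lower:
                counts[pattern] += 1

for pattern, n in counts.most_common():
    print(f"{pattern}: {n}")
```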
  • Hmm... The proxies seem to be fine.

    I do get quite a bit of "download failed" messages, though.

    @gooner, I had this exact same thing happening to me with earlier lists (with no footprints other than those from SER). Disappointingly low verified numbers.

    So, looking at the log here... I see a lot of wikipedia.org articles, and naturally "no engine matches". Think I should perhaps try scraping without keywords?

    EDIT: That particular list was for Buzznet, since I had the most verified urls in that engine.
  • goonergooner SERLists.com
    @artsi

    Scraping without keywords will only return limited results, I think.

    If possible, you need to try and add another qualifier to the footprint that will eliminate the wikis but keep the good URLs.

    So, you want something that the good URLs have that wikis don't.
    That might be very easy to find; on the other hand, it might not.
    I'm not familiar with that footprint so it's hard to say.
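(As a stopgap while refining the footprint itself, the scraped list can also be filtered afterwards to drop the obvious wiki URLs before importing it. This is only a sketch; the blacklist substrings are assumptions based on what came up in this thread.)

```python
# Stopgap sketch: strip obvious wiki URLs from a scraped list before import,
# while the footprint itself is being refined. Blacklist substrings are
# assumptions based on the thread, not an exhaustive list.
from urllib.parse import urlparse

WIKI_MARKERS = ("wikipedia.org", "wikimedia.org", "/wiki/")

def is_wiki(url):
    parsed = urlparse(url.strip())
    host_and_path = (parsed.netloc + parsed.path).lower()
    return any(marker in host_and_path for marker in WIKI_MARKERS)

with open("scraped_urls.txt", encoding="utf-8", errors="ignore") as src, \
     open("scraped_urls_no_wiki.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if line.strip() and not is_wiki(line):
            dst.write(line)
```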

  • Okay, since this is a "basic questions" thread, I'll just add another one in here...

    What exactly is a url in this case?

    I mean... I just had a look at the Buzznet website. It seems like nowadays it takes a Facebook account to register there. Does this mean, that I can't create Buzznet properties anymore?

    And one thing I've been confused about is this: let's say I have a bunch of verified links, which are on some sites I've created, like mysite.somepopulardomain.com.

    If I now take that verified link and put it through some projects, what is SER going to do with it? Log in and put up another post? Or create a new profile altogether?

    I understand that if I have a url to a blog post, SER would go ahead and leave a comment there, right?

    And thanks @gooner, I'll have another look at the footprints! :)

    How many times is it a good idea to push your verified list to projects? Isn't that going to leave a footprint?
  • goonergooner SERLists.com
    @artsi - No probs mate, if you post from the verified list it picks URLs from there randomly and not all will be successful, so I wouldn't worry about footprints there.

    Even if there was a clear footprint, I've never seen any evidence that this will result in your site being penalised.

    If you use new projects, SER will try to create a new account there, because it doesn't have the account data from the project that originally created the link.

    I don't know about Buzznet sorry.
  • So @gooner, let's say I'm creating 10 Web2.0's and I use SER to blast their DA up.

    I now have probably something like 1M identified and around 40k verified urls.

    I'm now blasting them one at a time, so that I have around 30 projects running towards a property. You think it would be ok to drive the verified list through all of those projects?

    So, 30 per property, and say, 10 properties = 300 projects to run the verified list through. Waddaya think, buddy? :)
  • goonergooner SERLists.com
    @artsi - Sure, why not? I always use both identified and verified.
    Identified for new lists (to grow my verified list bigger and increase link diversity).
    Verified because SER runs quicker from verified links.
  • @gooner, wow I think I just had an epiphany of some sort!

    Thanks a lot! :)
  • @gooner, here's another one for you...

    When I import target urls from site lists, do I need to have that option checked in the project options ("use global site lists if enabled", underneath the search engines)?
  • goonergooner SERLists.com
    @artsi - If you import directly into projects then no; if you want the project to post from URLs in those folders then yes.
  • @gooner, do you use Footprint Factory for coming up with footprints? I think I've created some pretty bad footprints and thus gotten really bad results.

    How many footprints do you think it would be a good idea to create for scraping? Any ballpark figures? :)
  • goonergooner SERLists.com
    Hi @artsi - I have Footprint Factory but haven't played with that much yet.

    There's no ideal number of footprints really. It depends on the platform. The more good footprints you can find the better of course.

    But you need to test the effectiveness of each footprint, so when testing you should be scraping one footprint at a time. If you throw a bunch into the scrape then how do you know which were good or bad?
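(If you do test footprints one at a time, simple bookkeeping is enough to rank them afterwards. The sketch below assumes you record, per footprint, how many URLs it scraped and how many ended up verified; the entries shown are made-up placeholders.)

```python
# Bookkeeping sketch for one-footprint-at-a-time testing: rank footprints
# by verified rate. The entries below are made-up placeholders.

results = {
    # footprint: (urls_scraped, verified)
    '"Powered by SomeCMS"': (120_000, 900),
    '"SomeCMS - leave a reply"': (45_000, 120),
}

ranked = sorted(results.items(),
                key=lambda kv: kv[1][1] / kv[1][0],
                reverse=True)

for footprint, (scraped, verified) in ranked:
    print(f"{footprint}: {100 * verified / scraped:.2f}% verified "
          f"({verified}/{scraped})")
```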

  • That's a good point!

    But isn't that going to take forever? I mean... If you pick one CMS, throw a bunch of urls into FPF, and as a result end up with, say, 100 snippets that you then turn into footprints... You could be looking at thousands of footprints.

    You suggest testing all of those individually? Like scraping a little, and then trying to post to those or how exactly?
  • Oh hey @gooner! I got an idea...

    In Gscraper, you can (if I've understood this right) delete footprints that have fewer than X indexed pages, right? Well... I'm going to have to try that one... Do you use Gscraper? Got any good suggestions on how many indexed pages a footprint should still have to be worth trying?
  • goonergooner SERLists.com
    @artsi - Yeah, good idea. I forgot about that.

    I think I eliminated all that have fewer than 100,000 as a starting point.

    I use Gscraper, but at the time I used Scrapebox to check that info; it does the same thing.
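(For that indexed-count filter, here's a minimal sketch under the assumption that the footprint-to-count data has been dumped to a tab-separated text file, e.g. from Scrapebox or Gscraper results; the file layout and names are assumptions, not a documented export format of either tool.)

```python
# Sketch: drop footprints whose indexed-result count is below a threshold.
# Assumes a tab-separated dump of "footprint<TAB>indexed_count" lines;
# that layout is an assumption, not a documented tool export format.

THRESHOLD = 100_000

with open("footprint_counts.tsv", encoding="utf-8", errors="ignore") as src, \
     open("footprints_filtered.txt", "w", encoding="utf-8") as dst:
    for line in src:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        footprint, count = parts[0], parts[1].replace(",", "")
        if count.isdigit() and int(count) >= THRESHOLD:
            dst.write(footprint + "\n")
```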

  • Okay, I've now experimented with all kinds of stuff....

    No wonder I was getting such bad results, because my footprints were just all over the place. Even if the words "privacy" and "Contact us" appear together on a site, that doesn't automatically make them footprints for phpFox or something :D

    Okay... My man @gooner, here's a good question for you. As established, I'm running 2 SERs. I think I've been unknowingly stuck at ReCaptchas, since I haven't used any captcha breaker other than GSA CB.

    Bad mistake. I've missed practically all the GOOD targets all these months. Oh well, live and learn.

    Well anyway...

    Now that I have two SERs - the other one having the 50-thread ReverseCaptchaOCR on it - I just got to thinking whether it would make more sense to use SER 1 for posting ONLY to contextual engines, and SER 2 to post to everything else.

    You think this would have any distinct advantages? :)