Some really basic questions about SER

Artsi · April 2014

So, I've been using SER for some time, but I'm still lacking the knowledge of some very basic things. I thought to ask them here for some clarification.

1) What is a thread, really?

Let's say you have 1 project for the sake of simplicity... If you're running 500 threads, does it mean that SER will either be trying to submit to 500 links from sitelists or scanning search engines for 500 keywords or what?

2) What's the difference between thread count / timeout in global options versus proxy options?

I don't understand what those two mean. I think the global option timeout means how long SER will wait (in one thread?) for a website to send the first bit of data, right? Well, what does the proxy timeout mean then? How about proxy thread count?

3) What causes "download failed" logs?

I'm seeing these quite a bit. Does it mean that either
1) the website is down
2) it takes too long to load
3) it hit the proxy / global options timeout limit?

4) What's a proper amount of threads to run?

Now, I know this is dependent on the VPS and a million other settings. I'm just curious as to how this relates to the number of proxies. Like if you're not using SER for scraping at all, can you radically raise the thread count?

5) Using imported lists

Sorry, I already posted a question about this, but let's have another run... I'm mainly curious about the engine selections when importing lists...

I mean, you're not scraping for targets when importing lists. Wouldn't it then make sense to check ALL engines and try to post to as many of them as possible? I've understood that if you don't have articles checked - as an example here - and you bring a list of 100k article urls, SER won't post to any of them, right?

Conversely, let's say you've been unable to post to a certain article engine when scraping for targets. Well, wouldn't it make sense to still tick that on when you're importing lists, in case it would succeed this time around?

I know these are really n00b questions, but I just don't understand these nevertheless

Would be awesome to hear input from guys like @Sven, @ron and other guys!

davbel · April 2014

1) A thread is one item of activity, so it could be searching for a site, posting a link, verifying a link. Basically anything that SER has to do.
2) You're right about the HTML timeout. The proxy timeout is how long SER waits before it decides that the proxy is dead. The proxy thread count is how many threads to use when you test your proxies
3) Yep, all of them. You get this whenever the site can't be accessed for whatever the reason
4) Depends on your setup, but should never be more than 10 x proxies e.g. 50 proxies = 500 threads. You can run this higher if you don't use SER to scrape and lower if you are scraping and don't want Google to ban your proxies
5.1) Yes you can tick everything and if it's unticked it won't post to them
5.2) You could do that, but you might be wasting SER's resources that could be used on other platforms to def get a link.
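
The "10 x proxies" rule of thumb from point 4 can be written out as a tiny calculation. This is only a sketch of that rule of thumb: the function name and the `per_proxy` parameter are made up for illustration and are not SER settings.

```python
def max_threads(proxy_count, per_proxy=10):
    """Upper bound on threads from the '10 x proxies' rule of thumb.

    per_proxy is a hypothetical knob: the post suggests going above 10
    when SER is not scraping, and below 10 when it is.
    """
    if proxy_count < 0 or per_proxy < 0:
        raise ValueError("counts must not be negative")
    return proxy_count * per_proxy


print(max_threads(50))  # the example from the post: 50 proxies
print(max_threads(30))  # the 30-proxy case asked about later in the thread
```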

Artsi · April 2014

@davbel

I think I just experienced some sort of Satori! Thanks a lot!

Just to clarify... So the proxy thread... Let's say I have 30 proxies. By having a thread count of 10 would then mean that SER can test out 10 at a time, leaving 20 for "real" use? So if that was 30, SER could then theoretically halt everything else for the time it tests proxies?

davbel · April 2014

@artsi yes, that's the case, but you're always better off stopping everything and then running a proxy test

Artsi · April 2014

Oh, an add-on here...

Continuing on the theme of importing lists...

Wouldn't keeping all the same engine selections ultimately lead to having a verified list full of duplicates? I mean, having at least one project trying to post to EVERYTHING - wouldn't that be wiser for the development of a verified list?

And here's one more...

6) Using verified list

I'm also kind of curious when I should drive the verified list into a project? I mean... If I have a list of my own and it's not shared with anyone, how often do you think it's okay to use it? I'm mainly thinking about leaving a footprint if each and every single project has like 95% the same links...

Artsi · April 2014

LOL!

In case someone's curious as well about what engines to check...

I just made a test on one of my SER's.

On one half I ran with ALL the engines checked. On the other half, I chose only those engines SER has successfully posted to and verified.

Results?

ALL engines: around 5 LpM
Verified ones: 115 LpM.

Guys, know your stats.

davbel · April 2014

You can remove duplicates within SER if you want to keep your list clean

I wouldn't worry about footprints too much with SER - providing you've got a big enough list.

Even if you hit two projects with the exact same list, some sites would give you a link to one project and not to the other, different links will stick and different ones will die.

Recently I've been using a variant of what the SERList chaps suggest you should do, but until I started doing that I was using identified and verified on all projects.

Artsi · April 2014

On all projects? Wow, okay.

I think it would be acceptable to at least use a portion of the verified links and then the rest from imported lists or something..

davbel · April 2014

@artsi test test and then test again. You'll find a way that works for you/ that you're happy with

Artsi · April 2014

Yep @davbel, that's one of the things I've certainly learned about SER.

Lately I've been focusing exclusively on using SER on my SEO efforts, so the learning curve is quite steep.

Thanks again for great answers!

Artsi · April 2014

Oh hey!

One more question!

When I'm importing URL's... Let's say I have a list of 100k, and 10 projects.

Now, if I paint all those projects and import that way, is SER going to import 100k into every selected project, or spread the 100k evenly across all projects?

Artsi · April 2014

Oh yeah and here's one more...

Is there a smart way to sort scraped urls into engines that SER recognizes and can post to?

Wouldn't that be a more effective way to import lists? Right? To have them be in a folder, as an example, and then put SER to read from that folder?

With 30 projects that would cut down the time it takes for every project to figure out what engine a url represents and whether it can even try to post there...

I think @ron mentioned somewhere having a project post to some trash url and then write into that folder? How do I make it so that nothing else is written into that folder?

davbel · April 2014

If you select all the projects then SER will ask if you want to split across the selected projects.
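
To make the difference concrete: importing 100k URLs into each of 10 projects means every project works through the same 100k, while splitting gives each project about 10k. How SER actually distributes the URLs when it splits is not stated here, so the round-robin below is only an assumed illustration, and `split_evenly` is a hypothetical helper name.

```python
def split_evenly(urls, project_count):
    """Deal URLs out round-robin so bucket sizes differ by at most one.

    Assumption: this mimics an even split in spirit only; it is not
    taken from SER itself.
    """
    if project_count < 1:
        raise ValueError("need at least one project")
    buckets = [[] for _ in range(project_count)]
    for index, url in enumerate(urls):
        buckets[index % project_count].append(url)
    return buckets


# 100k placeholder URLs across 10 projects
urls = [f"http://example.com/page{i}" for i in range(100_000)]
buckets = split_evenly(urls, 10)
print([len(bucket) for bucket in buckets])
```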

Something like the folder option for auto processing lists is in the works according to @sven, but it will still need to check the sites against its footprints to decide what platform the site belongs to.

Unless @ron knows a better way, you'd have to create a new folder, set it as verified and then run those projects exclusively until the import list had processed. You do that a couple of times to make sure SER has found everything it can.

Artsi · April 2014

Ok, great!

What do you think @davbel, would it be better to put all the successful ones into that folder? Or verified?

ron · April 2014

@Artsi - Your question was on how to process random targets in the most efficient manner. The most efficient manner is to simply set up a new project at a junk level (meaning not directly to a moneysite), import the URLs and let that project run. Whatever verifieds it produces will go into your verified list. End of story.

Artsi · April 2014

Okay, I'm going to have to do it that way. Thanks @ron!

Artsi · May 2014

Hey @ron, I just got really curious about running the lists your way

I have a couple more questions.

1) I think you said somewhere you have a SER of its own doing just sorting the lists, is this correct?

2) Why do you run these "trash projects" through them? Wouldn't it make more sense to put actual projects in there and grab the links for those as you go?

3) How would you run a scenario like this:
You have scraped and cleaned 5 lists:
- 36k
- 36k
- 56k
- 56k

Would you make a separate project for each of these, or put them all in the same project?

4) What kind of settings do you use?

I mean... Do you put the thread count a lot higher, as an example? Any other major changes on settings?

This is so awesome. I need to forget about just dumping the lists into projects as they are, as every project then needs to go through the same trouble of finding what works and what doesn't etc.

Thank you for your answers!

gooner · May 2014

@artsi - Are you talking about how we build the blue and red lists?
Or are you talking about how @ron uses lists on his SER installations?
The processes used for each is different.

Artsi · May 2014

@gooner - well yeah, I think the process is very similar to the way you build the lists. I got a blue list from you guys, and SER just ate it up, so that was very good! I figured that since I have a Gscraper and own VPS for it, I might as well try and build my own lists

So... Here's where my understanding is right now:
- scrape and clean lists
- create a couple of projects in SER, and import the lists
- have SER write on a new verified folder (I just create one on my desktop)
- after SER is done, I figured I'd make that folder be "identified" as an example, and then import this "identified" site list into projects

Am I on the right track at all here?

gooner · May 2014

@artsi - Glad it ran great for you

You are on the right track for sure. We do the same thing, just on a larger scale.

Artsi · May 2014

Yep, your list was VERY good! If I ever purchase a list again, it's going to be from you guys for sure @gooner!

Okay, I'll just put 4 projects into both of my SER's right now, and drive a couple of hundred thousand links through them. Let's see how it goes

gooner · May 2014

Good luck @artsi

Artsi · May 2014

Thanks and likewise to you @gooner!

Oh hey by the way... What kind of verified percentages do you guys typically get when running scraped lists through SER? 10%?

I have around half a million urls going through it right now. Super excited to see how it ends up!

And here's another question... When do you know SER has done everything it can for those lists? Do you wait for the "no more targets to post to" message, or do you watch the remaining urls or what?

For whatever reason, it seems like SER leaves a couple of hundred urls just hanging in there, and there they sit then, not giving out the "no more targets to post to" message.

fakenickahl · May 2014

When I'm running scraped lists through SER I'm typically getting about 1% of the list as verified and I'm only running unique domains.

I watch for the "no more targets to post to" message and then I check the remaining urls. This is because SER will sometimes give you the mentioned message in error while the cache still holds plenty of urls.

I have also seen SER leave some urls in the cache, but I usually just import a new list on top of these and SER goes nuts once again. Other times I'm just clearing the cache and forgetting about these leftover urls, as it's barely a drop in the ocean.
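
For anyone who wants to cut a scraped list down to unique domains outside of SER before importing, here is a minimal sketch. It assumes "domain" simply means the hostname with a leading `www.` stripped, so subdomains count as separate sites; `unique_by_host` is a made-up helper name, not anything from SER.

```python
from urllib.parse import urlsplit


def unique_by_host(urls):
    """Keep the first URL seen for each hostname, preserving input order."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        # urlsplit only fills in the hostname when a scheme or a leading
        # "//" is present, so bare "example.com/page" entries get a prefix.
        parts = urlsplit(url if "://" in url else "//" + url)
        host = (parts.hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host or host in seen:
            continue
        seen.add(host)
        kept.append(url)
    return kept


scraped = [
    "http://example.com/a",
    "http://www.example.com/b",
    "example.org/x",
    "https://EXAMPLE.org/y",
]
print(unique_by_host(scraped))
```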

JudderMan · May 2014

Artsi half a million might be OK, but next time it might be best to split the files into 50 or 100k chunks and let SER process them. It's meant to be quicker and doesn't clog SER up.

I'd just leave SER until it has no more targets to post to, or keep on adding lists as you build them, and let it burn.
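
Splitting a big list into 50k or 100k chunks is easy to do before importing. A minimal sketch, assuming the list is already loaded as one URL per item; `chunked` is a hypothetical helper name and the chunk size is just the figure mentioned above.

```python
def chunked(lines, size=100_000):
    """Yield consecutive slices of at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


# 250k placeholder URLs become chunks of 100k, 100k and 50k
urls = [f"http://example.com/{i}" for i in range(250_000)]
print([len(chunk) for chunk in chunked(urls)])
```

Each chunk can then be written to its own text file and imported one at a time.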

Artsi · May 2014

Ok, great!

@JudderMan, yeah I think I have around 400k urls right now - they're running on 2 SER's and divided between 8 projects. So, what's that - around 50k per project or so?

Hey do you guys put the identified and successful into their own folders as well? Could re-importing the identified and successful be any good in this case?

JudderMan · May 2014

Ah that'll do fine then dude.

Not sure about the other questions, gooner would be best to answer those.

Artsi · May 2014

Yep, super excited to see how it goes!

I must say that the 1% does sound a bit low for me... I was expecting something more to the tune of 10%... How about you @JudderMan? What kind of verified-% are you seeing?

gooner · May 2014

@artsi - We get 10 - 20% verified from scrapes, only using unique domains like @fakenickahl mentioned.

But we have worked really hard on the footprints to get to that %. It's a long long process.
So we are always tweaking the footprints to narrow down on scraping only the sites that SER can post to.
It's not possible to scrape only those URLs of course, but we try and improve the % all the time.

Personally, when i see a project has less than around 10,000 links - I throw another list in there, so it never gets too low.

I split the list into 100k chunks as @judderman suggested.

EDIT: Those figures are the submitted-to-verified %
1% from scrape to verified is probably about right. Not totally sure on that figure off-hand.

Artsi · May 2014

Ok, cool!

Yeah, that footprint thing is next on my "things to learn" list. I'll just get going with the footprints SER is currently able to post to, and once I start getting not-that-stellar results, I'll look into those footprints a bit more.

davbel · May 2014

I'd expect a min of about 10% submitted using one of your own scraped lists, however it can be much higher if you are using the right footprints on the right platform.

As far as submitted to verified, again this depends on the platform, but 5-10% would be bad and 50-70% would be good.

As @gooner says, getting the footprint right is pretty much the key thing to scraping. This is where you will have the biggest improvements in the number of verified.
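
The two percentages in this thread stack on top of each other: scraped to submitted, then submitted to verified. A rough worked example, using the figures quoted above (about 10% submitted, about 10% of those verified) and the half-million-URL list mentioned earlier; `funnel` is a made-up name and the rates are only the ballpark numbers from these posts.

```python
def funnel(scraped, submitted_pct, verified_pct):
    """Return (submitted, verified) counts for whole-number percentages."""
    submitted = scraped * submitted_pct // 100
    verified = submitted * verified_pct // 100
    return submitted, verified


scraped = 500_000
submitted, verified = funnel(scraped, submitted_pct=10, verified_pct=10)
overall_pct = verified * 100 / scraped
print(submitted, verified, overall_pct)
```

With those inputs the overall scrape-to-verified rate works out to 1%, which lines up with the "about 1%" figure reported earlier in the thread.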
