Comments
In case you want to downgrade, here's a link to 8.0:
http://www38.zippyshare.com/v/55616772/file.html
Both are running at 300 threads.
I'll just mess around a bit more and see if I can get anything to change on my own.
I will try with a list as well now to see if the issue is fixed or not
Artsi, may I ask how you build your lists?
I run Gscraper on the Geek plan on SolidSEO.
I exported the footprints from SER, threw in around 400k keywords, and let it roll. I then remove all the duplicates and check the HTTP status, indexing, and PR.
1M scraped URLs turns into 50-100k importable URLs for me right now.
I then just import them directly into projects.
I have this 8.0 version .exe available, so I started SER from that, and the problem persists.
Is the bug perhaps somewhere in the files and not in the software itself? I'm not a developer, just guessing at solutions. Not sure where the bug is. At some point I was tempted to learn programming, but I gave up when I realized it's not for me.
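For anyone who wants to script that filtering step outside of SER, a minimal Python sketch of the dedupe + HTTP-status pass could look like the lines below. The file names and timeout are placeholders, it runs sequentially (a real checker would be threaded), and it is only meant to illustrate the idea, not to replace whatever tool you actually use.

    # Sketch: remove exact duplicate URLs, then keep only URLs that answer with HTTP 200.
    # "scraped_urls.txt" / "filtered_urls.txt" are placeholder file names.
    import urllib.request

    with open("scraped_urls.txt") as f:
        urls = sorted({line.strip() for line in f if line.strip()})  # dedupe exact duplicates

    alive = []
    for url in urls:
        try:
            req = urllib.request.Request(url, method="HEAD",
                                         headers={"User-Agent": "Mozilla/5.0"})
            with urllib.request.urlopen(req, timeout=10) as resp:
                if resp.status == 200:
                    alive.append(url)
        except Exception:
            pass  # DNS errors, timeouts and 4xx/5xx responses are simply dropped

    with open("filtered_urls.txt", "w") as f:
        f.write("\n".join(alive))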
Regarding the way you build the list, I'm doing 99% the same process. The only difference is that I only check PR (on a domain level, not URL). I don't bother with indexing or other settings such as OBL (which I set in SER anyway).
A quick piece of advice: just try the free version of Footprint Factory and watch the first video on the website, then follow the instructions. I did that yesterday and used Gscraper with the footprints from one platform. You will end up with millions of potential targets.
I let Gscraper run for a limited period of time (a few hours), and now I'm filtering 1.2 million URLs pertaining to a single CMS platform (please note those 1.2 million are all unique domains, no duplicates at all).
And that's only from a small run using the FF free version (Pro gives you wild abilities in terms of footprints).
I plan to repeat the process for every CMS (especially the contextual and do-follow ones). This should put SER on steroids...
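If you ever want to do that unique-domain filtering yourself rather than in Gscraper, a rough Python sketch (treating the full hostname as the domain, which is a simplification) could be:

    # Sketch: keep only the first URL seen per host, so the list ends up one entry per domain.
    from urllib.parse import urlparse

    seen = set()
    unique = []
    with open("scraped_urls.txt") as f:          # placeholder input file
        for line in f:
            url = line.strip()
            if not url:
                continue
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            if host and host not in seen:
                seen.add(host)
                unique.append(url)

    with open("unique_domains.txt", "w") as f:   # placeholder output file
        f.write("\n".join(unique))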
Hmm... Right now I'm getting a lot of these "The remote name could not be resolved" errors as I check the HTTP status. But when I click on the URL, it's still alive and healthy. I wonder if this is a proxy issue?
Will check out the footprint factory!
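On the "remote name could not be resolved" question: one quick way to tell whether the URL or the proxy is at fault is to fetch the same page once directly and once through a proxy and compare. A small sketch of that test, where the URL and proxy address are just placeholders:

    # Sketch: request the same URL with and without a proxy to see which side is failing.
    import urllib.request

    URL = "http://example.com/"               # put the URL that reported the DNS error here
    PROXY = "http://123.45.67.89:8080"        # placeholder, use one of your own proxies

    def try_fetch(url, proxy=None):
        handlers = [urllib.request.ProxyHandler({"http": proxy, "https": proxy})] if proxy else []
        opener = urllib.request.build_opener(*handlers)
        try:
            with opener.open(url, timeout=10) as resp:
                return "HTTP %s" % resp.status
        except Exception as exc:
            return "failed: %s" % exc

    print("direct:   ", try_fetch(URL))
    print("via proxy:", try_fetch(URL, PROXY))
    # If the direct request works but the proxied one fails, the proxy (or its DNS) is the problem.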
@seo4all, I just watched the FpF video on their website. I don't quite understand what it does, though. Footprints? I mean... are you sure SER can post to those links if it doesn't understand what engines they are part of?
Artsi, just like with normal scraping, you can't know for sure whether SER will post to them or not, but it will definitely post to a certain number of them. I wouldn't be too worried about that. Scraping with Gscraper using the SER footprints is the same thing: you find tons of potential targets, but when you run them you realize you can only post to a part of them, even if in theory SER could post to all of them.
That's the reality of scraping. However, FF provides a way to uncover millions of targets which aren't being used by other GSA users.
Just try following the free video on their site. Download FF free, generate some CMS footprints, and scrape with them in Gscraper. It won't cost you a dime, and at the end you might have many more targets than Gscraper would find on its own with the default footprints.
Hope that helps you.
P.S. One thing I would like to reinforce here: just as the FF video said, select a CMS platform which gave you good results as a starting point.
So, as an example... Here's one of the footprints from Articles -> Wordpress articles:
"Powered by WordPress + Article Directory plugin"
If I go into Google with that, one of the results I find is this:
http://www.addnewarticles.com/health/cosmetic-dentistry-is-not-just-about-beauty.html
and I believe SER could post to that.
So, the FpF then... Did I understand it right that I paste some URLs into it, and it then gives me footprints like the one I pasted above, so that I can find more sites SER will recognize as being part of the Article engine?
I want to get this; I'm thinking this is probably crucial for my understanding.
With FF you'll have to import 25 unique domains in the free version (Pro allows you to upload unlimited domains).
Note that you must only upload unique domains.
After that, check "Process text Snippets" on the left side and click "Get Footprints".
On the "Footprint List Builder Tab" make sure to check "Put snippets in quotation marks" (this is required later on on Gscraper)
Once the program finishes, click "generate footprints" and export them to a txt file.
You'll have a few footprints which aren't in SER by default. Take those footprints, import them into Gscraper, import your keywords, and you're good to go.
An avalanche of potential targets. To filter them after the scraping is done, just apply the filters you would normally do, export the list, import it into SER, and let me know how it goes.
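Since the whole point is footprints SER doesn't already ship with, you could also diff the exported FF footprints against SER's exported footprints before scraping. A small sketch, with placeholder file names:

    # Sketch: keep only the Footprint Factory footprints that are not already in SER's export.
    def load(path):
        with open(path, encoding="utf-8", errors="ignore") as f:
            return {line.strip() for line in f if line.strip()}

    ser_defaults = load("ser_footprints.txt")   # footprints exported from SER (placeholder name)
    ff_generated = load("ff_footprints.txt")    # footprints exported from Footprint Factory

    with open("new_footprints.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(ff_generated - ser_defaults)))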
So, I bring URLs from my verified folder, say Joomla blogs (I currently have only 6 URLs).
Here are the footprints from SER for Joomla blogs:
"Fields marked with an asterisk are required" joomla
"Please login to write comment" "add new post"
"powered by joomla" "add new post"
"Smart Blog" "Add new post"
So, the FPF would go out and expand that footprint list manyfold, so that I could then upload it into Gscraper and go hunting for way more Joomla blogs than I would find with those footprints from SER alone?
Is this correct?
And how do the keywords come into play here? Say I want to find sites about dogs, and I have 10k keywords about dogs. Will the FPF / Gscraper then randomly combine those keywords with the newly found footprints to find even more, and even more specific, sites?
Thanks for the insights, @seo4all!
The keywords only come into play when you're using Gscraper.
If you want to find related URLs, you'll put your keywords in quotes in Gscraper. I personally don't use GSA to link directly to money sites, so I don't usually scrape niche-related URLs.
I go with general terms because for me what matters is numbers, not relevancy. It depends on what you're trying to rank, but if you want relevant sites, putting the keywords in quotes in Gscraper is the way to go.
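To make the mechanics concrete: Gscraper essentially crosses every footprint with every (quoted) keyword into one search query per pair, which is where the huge query volume comes from. A toy sketch with made-up example values:

    # Sketch: combine each footprint with each quoted keyword into one search query per pair.
    footprints = ['"Powered by WordPress + Article Directory plugin"',
                  '"powered by joomla" "add new post"']
    keywords = ["dog training", "dog grooming"]   # example niche keywords

    queries = ['%s "%s"' % (fp, kw) for fp in footprints for kw in keywords]
    for q in queries:
        print(q)
    # A few hundred footprints x 10k keywords gives millions of queries.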
One more thing... Do you know what "delete if index < ____" means in Gscraper? I thought it meant there are fewer than a million websites for that particular footprint, but now I'm not so sure anymore...
And let's say I come up with 50k Joomla blog URLs, as an example.
I then import that into SER. @Sven, could you help me real quick here... How does SER know that a particular URL is part of an article engine, Joomla in particular? Does SER just go after the URL, and if it can post to it, say "all right, this turned out to be a Joomla URL, so let's put that into the identified / verified folder"?
Or how does it work?
@Artsi, "delete if index < ..." in Gscraper is a function to delete the URLs that are below the value you enter there. Most likely you'll put a "1" there, then run an index check.
If the index value shows "0" after the check, it means the URL is not indexed in Google, so in this case "delete if index < 1" would delete all the URLs which are not indexed.
As far as importing into SER goes, I wouldn't be worried about it. You'll import the txt file, and before posting SER will automatically identify the platform.
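Put differently, the filter just drops anything whose index count is below the number you typed in. If you had the index-check results exported as tab-separated "url, count" lines (an assumed format, not necessarily Gscraper's own), the same filter in Python would be:

    # Sketch of "delete if index < 1": keep only URLs whose index-check count is 1 or more.
    # Assumes a tab-separated file with the URL in column 1 and the index count in column 2.
    keep = []
    with open("index_check_results.txt") as f:     # placeholder file name
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) >= 2 and parts[1].isdigit() and int(parts[1]) >= 1:
                keep.append(parts[0])

    with open("indexed_urls.txt", "w") as f:
        f.write("\n".join(keep))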
Solved for me... version 8.27 is way better.
@killerm, you're running OK at 8.27? How many threads / projects, and are you ONLY driving imported, external lists (not site lists)?
@Artsi, I think you're also doing the index check in vain (assuming you do the PR check). If the URL has at least a PR of 1, it means that 99% of the time it will be indexed in Google (there are exceptions to this, but very few, at least in my experience).
Instead of running an index check, I would recommend doing a PR check (on a domain level, not URL), deleting the URLs which have a PR of less than 1, and you're good to go.
In the time it takes to run an index check on a list, you could scrape more targets.
Please note that I'm not by any means a "scraping master". What I've told you here is, however, what works best for me. I used to do the index check as well a while back, and I didn't notice any productivity improvements or better results. What I did notice was that filtering a list was eating more time than it should.
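The domain-level PR filter is the same idea in code. In the sketch below, get_pagerank() is only a stand-in for whatever PR checker or API you actually use, not a real library call:

    # Sketch: drop every URL whose root domain has a PR below 1, with one lookup per domain.
    from urllib.parse import urlparse

    def get_pagerank(domain):
        # placeholder: plug in your own toolbar-PR checker or API here
        raise NotImplementedError

    pr_cache = {}
    kept = []
    with open("filtered_urls.txt") as f:           # placeholder input file
        for line in f:
            url = line.strip()
            if not url:
                continue
            domain = urlparse(url).netloc.lower()
            if domain not in pr_cache:
                pr_cache[domain] = get_pagerank(domain)   # cache so each domain is checked once
            if pr_cache[domain] >= 1:
                kept.append(url)

    with open("pr_filtered_urls.txt", "w") as f:
        f.write("\n".join(kept))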
One thing I'm wondering is this: doesn't the FpF use proxies at all?
Another thing is this... One of the footprints I was given is this:
leave a comment
Isn't that going to be on a gazillion other websites as well? I'm just wondering how useful it is to be scraping with such general footprints.