DRASTIC Efficiency Improvements - Part 1 - Targeted List Management
AlexR
Cape Town
I have been thinking about this tool and there are a few areas that need to be looked at and optimised. It's an incredible tool. I think as Google advances there will be a greater focus on quality over quantity of links. That said, the efficiency improvements I mention should assist those with different viewpoints. I have done this as a two-part series.
ISSUE 1:
We all use a tool to take a handful of keywords, find related keywords and extrapolate to get to about 1000 keywords that we can plug into GSA. The more keywords we use, the further we get away from the core that we want to rank for. BUT at the core there might be 100 UNIQUE keywords, and the other 900 are variations thereof. This results in a MASSIVE overlap in SE results. More on that later (part 2).
Let's assume we have 50 SE's enabled (just a round number, close to the 37 SE's recommended by Ozz).
1000 keywords.
SE's return 100 results on average. (some more, some less).
So we're getting 1000 x 100 x 50 = 5 000 000 URL's that get parsed.
5 000 000 checks for badwords, OBL, PR, etc, etc....that's a fair amount of resources.
So we have 5 million URL's to be parsed, checked, platform-sorted, etc. per project. This will take time! (And so many of these URL's are identical, since the keywords are similar and SE's tend to return similar results.)
Multiply this by 10 to 100 projects and you're anywhere between 50 million and 500 million URL's.
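For anyone who wants to plug in their own numbers, here's a quick back-of-the-envelope sketch in Python (the figures are just the averages I assumed above):

```python
# Back-of-the-envelope numbers from the post above (all assumed averages).
keywords = 1000          # extrapolated keyword list per project
engines = 50             # search engines enabled
results_per_query = 100  # average results returned per SE query

urls_per_project = keywords * engines * results_per_query
print(f"URLs to parse per project: {urls_per_project:,}")                 # 5,000,000

for projects in (10, 100):
    print(f"{projects} projects -> {urls_per_project * projects:,} URLs")  # 50M / 500M
```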
Now my first issue is this...GSA parses these and finds a number of "Identified" platforms where it can submit.
Right, so it's a funnel. 5 000 000 go in at level 1. Call this level 1 "Parsed" (obviously built up over a period of time).
Level 2 is the identified. No idea of numbers, but nothing we can do about this until more platforms, engines, etc are added.
Level 3 is the submitted. Out of the 5 000 000 URL's parsed, we now have a neat list of the places we can submit to.
Level 4 is the verified. We now have a list of the places where we got the link.
We also have a massive FAILED list that really could be due to anything...
There is fallout along this process: CS is down, a platform engine isn't working properly yet, an email gets blacklisted, a proxy gets banned, etc.
When we press "Clear Target URL's" we are saying: let's just start again. Let's go and parse another 5 million URL's and see how we go. Let's go and do 5 million new checks, filters, etc.
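Just to make the funnel idea concrete, here is a rough Python sketch of what I mean by keeping each level as its own per-project list. The class and names are purely illustrative, not SER's internals:

```python
# A rough sketch of the funnel as per-project lists. The class and field names
# are mine (illustrative only), not SER's internals; the point is that each
# level is kept, instead of being binned when "Clear Target URLs" is pressed.
from dataclasses import dataclass, field

@dataclass
class ProjectFunnel:
    parsed: set = field(default_factory=set)      # level 1: raw SE results
    identified: set = field(default_factory=set)  # level 2: platform recognised
    submitted: set = field(default_factory=set)   # level 3: submission made
    verified: set = field(default_factory=set)    # level 4: link found
    failed: set = field(default_factory=set)      # fallout from any level

    def submitted_not_verified(self):
        # Step 1 candidates: the "prime" links worth retrying
        return self.submitted - self.verified

    def identified_not_submitted(self):
        # Step 2 candidates: platform known, but filters blocked the submission
        return self.identified - self.submitted

    def unidentified(self):
        # Step 3 candidates: parsed, but no engine matched (yet)
        return self.parsed - self.identified
```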
SURELY, surely it would be neat to have the choice to focus on sharpening each level.
On a PER-PROJECT basis I feel these features must be added:
Step 1 - Resubmit the "Submitted But Not Verified" list (i.e. the per-project list where verification failed).
We've already done so VERY much work to get to this point. 5 million URL's parsed, OBL checks done, PR checks done, bad words checks done, etc, etc. Currently it's like "Oh well, link not found, let's strip it, bin all this data we have gathered and dump it in the big trash/failed list."
Surely we should be able to take this list of "Submitted BUT NOT VERIFIED", redo it and get these links! Try a different captcha service, a new email that's not blacklisted, edit the comment spintax.
They are the PRIME links we're after (they PASSED all our criteria; all OBL and PR filters passed). We're happy for CS to retry a captcha 6 times, so surely we should try this batch of links at least 3 times to get them verified.
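Conceptually, something like this (a minimal sketch only; "resubmit" stands in for whatever SER does internally with a new email, different captcha service or edited spintax, and is not its actual API):

```python
# Minimal sketch of the Step 1 idea only. "resubmit" is a hypothetical
# callable, not SER's actual API.
def retry_unverified(submitted, verified, resubmit, max_attempts=3):
    """submitted/verified: sets of URLs; resubmit: callable(url) -> bool."""
    for _ in range(max_attempts):
        pending = submitted - verified
        if not pending:
            break
        for url in pending:
            if resubmit(url):
                verified.add(url)
    return submitted - verified  # whatever is still unverified after the retries
```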
Step 2 - Resubmit the "Identified But Not Submitted" list.
If you're not getting submitted, there could be filters you set too high, PR checks that didn't work, too many bad words... Basically, you were just too strict. So we've identified the platform, but didn't submit to it. We've taken ages to parse these 5 million URL's and sort them into platforms, and we now have a nice identified list. Why go and clear the target URL's and start again? Rather focus on Step 1 above, then when you have maxed that out, focus here with a few lower parameters. I.e. rerun this list and try to get some more links onto the submitted list by lowering your filters. Surely, surely... clearing the Target URL cache and starting it all again is such a waste. So much parsing and sorting down the drain!
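Again, just to illustrate the idea: re-check the identified-but-not-submitted list against looser filters using the PR/OBL data already gathered, instead of parsing anything new. The Filters class and metrics layout here are placeholders I made up, not SER settings:

```python
# Sketch of Step 2: re-filter the identified-but-not-submitted list with looser
# limits, reusing the PR/OBL data from the first pass. Placeholder names only.
from dataclasses import dataclass

@dataclass
class Filters:
    min_pr: int = 3
    max_obl: int = 40

def requeue_with_relaxed_filters(identified_not_submitted, metrics, filters):
    """metrics: dict mapping url -> {'pr': int, 'obl': int} from the first pass."""
    return [
        url for url in identified_not_submitted
        if metrics[url]['pr'] >= filters.min_pr and metrics[url]['obl'] <= filters.max_obl
    ]

# e.g. relax from PR3/OBL40 to PR1/OBL100 and push the result back into the
# submission queue:
# targets = requeue_with_relaxed_filters(urls, metrics, Filters(min_pr=1, max_obl=100))
```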
Step 3 - Resubmit the "SE Results Parsed" list.
So you've parsed 5 million URL's and it's taken some time! For some of these (normally a small percentage) you've identified the platform. Let's say 25% are identified (and that feels like a very, very nice percentage); that still leaves roughly 3.75 million URL's unidentified. What if Sven releases a few platform fixes/new platforms? You've got the list (around 3.75 million) that matches/relates to your keywords, and Sven's new platforms get released. I'd love to be able to have a go at this list with the new platforms and find some great new target URL's to submit to. No need to go and reparse it all again... just need to check if each URL matches a platform...
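Something along these lines (a hypothetical sketch; the platform name, footprint strings and fetch logic are just placeholders for whatever the new engines actually look for):

```python
# Sketch of Step 3: when new engines are released, run only the platform check
# over the already-parsed-but-unidentified URLs instead of scraping the SE's
# again. The footprints and platform name below are made up for illustration.
import urllib.request

NEW_FOOTPRINTS = {
    "some_new_platform": ["powered by someplatform"],  # placeholder footprints
}

def reidentify(unidentified_urls, timeout=10):
    matches = {}
    for url in unidentified_urls:
        try:
            html = urllib.request.urlopen(url, timeout=timeout).read().decode("utf-8", "ignore")
        except Exception:
            continue  # dead page, proxy issue, etc. - just skip it
        page = html.lower()
        for platform, footprints in NEW_FOOTPRINTS.items():
            if any(fp in page for fp in footprints):
                matches.setdefault(platform, []).append(url)
    return matches  # fresh targets to push into the identified list
```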
FURTHER FEATURE: Management of the above lists.
1) Given that domains often have many pages that appear in the SE's, it would make sense to be able to remove duplicate domains from any of the above lists - you just don't want the same comment on every single blog page of a domain. (Yes, there is a filter, but remove the duplicates up front so you don't have to do all the checks and can keep the list neat.)
2) On the lists where you already have the PR, OBL and other filter data (since the list has been parsed, checked and then submitted), the option to mask/sort by PR or OBL. This way you can keep some very neat lists that can be used for key projects. (A rough sketch of points 1 and 2 follows this list.)
3) Maybe an option, when working on the lists, to run a few masks and then "Export to Project": you select a project and it loads the selected URL's into that project's target list to try again.
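Roughly what I have in mind for points 1) and 2), as a hedged sketch (the record layout with 'url', 'pr' and 'obl' is assumed for illustration, not SER's file format):

```python
# Rough sketch of points 1) and 2): de-duplicate a saved list by domain, then
# filter/sort by the PR and OBL values already collected.
from urllib.parse import urlparse

def dedupe_by_domain(urls):
    seen, kept = set(), []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain and domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept

def filter_and_sort(records, min_pr=3, max_obl=40):
    """records: list of dicts like {'url': ..., 'pr': ..., 'obl': ...}."""
    keep = [r for r in records if r['pr'] >= min_pr and r['obl'] <= max_obl]
    return sorted(keep, key=lambda r: (-r['pr'], r['obl']))  # best targets first
```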
I feel this would make this the very best tool and let it really handle all the different links you need to get ranked. It would also make it EXTREMELY efficient.
Comments
1) What tool are you using for extrapolating to 1000 keywords?
2) In item 3, "parse" means to examine what we know of each site. That means recalling OBL, PR, and the content of the pages. This sounds like a lot of data to store... unless we are already storing it, which I do not know. Is this as big a burden as I think it is?
These ideas are phenomenal to say the least.
The power of GSA is mind boggling.... with these features added to it.... It's.... I don't have the words
- Duplicate your projects > Clear URL cache & History > Import the "Submitted Sites" or "Identified Sites" list and run. Do this for all projects as needed. Once the list(s) are completed, just delete the duplicate project.
- Filtering by PR/OBL/etc.: take your list, for instance "Submitted Sites", and run it through SB. Run the PR checker, outbound links checker, etc. to sort it into a nicely filtered new list according to your criteria. Re-import the list into SER.
Obviously this isn't as convenient as having SER do all this for you during its initial parsing, but the above-mentioned steps are fairly easy and quick to complete... until or unless your features are added.