DRASTIC Efficiency Improvements - Part 1 - Targeted List Management

AlexR (Cape Town)
edited December 2012 in Feature Requests
I have been thinking about this tool, and there are a few areas that need to be looked at and optimised. It's an incredible tool. I think that as Google advances, there will be a greater focus on quality over quantity of links. That said, the efficiency improvements I mention should help those with different viewpoints too. I have split this into a two-part series.

ISSUE 1:
We all use a tool to take a handful of keywords, find related keywords, and then extrapolate until we have about 1000 keywords that we can plug into GSA. The more keywords we use, the further we drift from the core terms we want to rank for. BUT at the core there might be only 100 UNIQUE keywords, with the other 900 being variations of them. This results in a MASSIVE overlap in SE results. More on that in Part 2.

Let's assume we have 50 SEs enabled (just an arbitrary number, close to the 37 SEs recommended by Ozz).
1000 keywords.
SEs return 100 results on average (some more, some less).
So we're getting 1000 x 100 x 50 = 5 000 000 URLs that get parsed.
That's 5 000 000 checks for bad words, OBL, PR, etc., which is a fair amount of resources.
So we have 5 million URLs per project to be parsed, checked, sorted by platform, etc. This will take time! (And so many of these URLs are identical, since the keywords are similar and SEs tend to return similar results.)

Multiply this by 10 to 100 projects and you get anywhere between 50 million and 500 million URLs.
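The arithmetic above is easy to check, and it also shows why the overlap matters: if duplicate URLs were dropped before the bad-word/OBL/PR checks, each distinct URL would only be checked once. A rough Python sketch using the post's assumed averages; `unique_targets` is a hypothetical helper for illustration, not something GSA SER is known to expose:

```python
# Back-of-envelope URL counts, using the post's assumed averages (not measured figures).
KEYWORDS = 1000
RESULTS_PER_QUERY = 100   # average results per SE query
SEARCH_ENGINES = 50


def urls_per_project(keywords: int, results: int, engines: int) -> int:
    """Each keyword is queried on each engine, and each query returns `results` URLs."""
    return keywords * results * engines


def unique_targets(result_pages: list[list[str]]) -> list[str]:
    """Hypothetical helper: drop repeated URLs (first occurrence wins) before any
    bad-word/OBL/PR checks, so each distinct URL is only checked once."""
    seen: set[str] = set()
    unique: list[str] = []
    for page in result_pages:
        for url in page:
            if url not in seen:
                seen.add(url)
                unique.append(url)
    return unique


per_project = urls_per_project(KEYWORDS, RESULTS_PER_QUERY, SEARCH_ENGINES)
print(per_project)         # 5000000
print(per_project * 10)    # 50000000  (10 projects)
print(per_project * 100)   # 500000000 (100 projects)

# Two near-identical keywords tend to return overlapping results:
pages = [
    ["http://a.example/post", "http://b.example/post"],  # e.g. "blue widgets"
    ["http://b.example/post", "http://c.example/post"],  # e.g. "cheap blue widgets"
]
print(len(unique_targets(pages)))  # 3 (instead of 4)
```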

Now, my first issue is this: GSA parses these URLs and finds a number of "Identified" platforms it can submit to.

Right, so it's a funnel. 5 000 000 URLs go in at Level 1 (obviously over a period of time). Call Level 1 "Parsed".
Level 2 is "Identified". I have no idea of the numbers, but there's nothing we can do about this until more platforms, engines, etc. are added.
Level 3 is "Submitted". We now have a neat list of the URLs, out of the 5 000 000 parsed, that we can submit to.
Level 4 is "Verified". We now have a list of the places where we actually got the link.
We also have a massive FAILED list, which could be due to almost anything...
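Each level of the funnel is a subset of the level before it, so the lists argued for in this post are just set differences between adjacent levels, and none of them requires re-scraping. A minimal Python sketch, assuming a project kept the URLs for all four levels (`ProjectFunnel` is a hypothetical class, not SER's actual storage):

```python
from dataclasses import dataclass, field


@dataclass
class ProjectFunnel:
    """Hypothetical per-project record of the four funnel levels (not SER's real data model)."""
    parsed: set[str] = field(default_factory=set)      # Level 1: URLs scraped from SE results
    identified: set[str] = field(default_factory=set)  # Level 2: a platform/engine was recognised
    submitted: set[str] = field(default_factory=set)   # Level 3: a submission was made
    verified: set[str] = field(default_factory=set)    # Level 4: the link was found live

    def submitted_not_verified(self) -> set[str]:
        return self.submitted - self.verified

    def identified_not_submitted(self) -> set[str]:
        return self.identified - self.submitted

    def parsed_not_identified(self) -> set[str]:
        return self.parsed - self.identified


funnel = ProjectFunnel(
    parsed={"http://a.example/", "http://b.example/", "http://c.example/", "http://d.example/"},
    identified={"http://a.example/", "http://b.example/", "http://c.example/"},
    submitted={"http://a.example/", "http://b.example/"},
    verified={"http://a.example/"},
)
print(sorted(funnel.submitted_not_verified()))    # ['http://b.example/']
print(sorted(funnel.identified_not_submitted()))  # ['http://c.example/']
print(sorted(funnel.parsed_not_identified()))     # ['http://d.example/']
```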

There is fallout all along this process: CS is down, a platform engine gets improved, an email gets blacklisted, a proxy gets banned, etc.

When we press "Clear Target URL's", we are saying "let's just start again". Let's go and parse another 5 million URLs and see how we go. Let's go and do 5 million new checks, filters, etc.

SURELY, surely it would be better to have the option to focus on sharpening each level.
On a PER-project basis, I feel these features must be added:

Step 1 - Resubmit the "Submitted But Not Verified" list (i.e. the per-project list of URLs that failed verification).
We've already done SO much work to get to this point: 5 million URLs parsed, OBL checks done, PR checks done, bad-word checks done, etc. Currently it's like "Oh well, link not found, let's strip it, bin all the data we have gathered and dump it in the big trash/failed list."

Surely we should be able to take this "Submitted BUT NOT VERIFIED" list, rerun it and get these links! Try a different captcha service, a new email that's not blacklisted, or edit the comment spintax.

They are the PRIME links we're after (they PASSED all our criteria: every OBL and PR filter). We're happy for CS to retry a captcha 6 times, so surely we should try this batch of links at least 3 times to get them verified.
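Step 1 amounts to a bounded retry loop over the "Submitted But Not Verified" list, changing something (captcha service, email) between rounds. A hypothetical Python sketch: `attempt` stands in for "resubmit and re-check for the link", and the service names and email addresses are placeholders, not real SER settings:

```python
from typing import Callable, Iterable

# Hypothetical callback standing in for "resubmit, then re-check for the link".
# Arguments: (url, captcha_service, email) -> True if the link was verified.
Attempt = Callable[[str, str, str], bool]


def retry_unverified(
    urls: Iterable[str],
    attempt: Attempt,
    captcha_services: list[str],
    emails: list[str],
    max_rounds: int = 3,
) -> tuple[set[str], set[str]]:
    """Retry each unverified URL up to `max_rounds` times, rotating the captcha
    service and email every round. Returns (verified, still_failing)."""
    pending = list(dict.fromkeys(urls))  # de-duplicate, keep order
    verified: set[str] = set()
    for round_no in range(max_rounds):
        if not pending:
            break
        service = captcha_services[round_no % len(captcha_services)]
        email = emails[round_no % len(emails)]
        still_pending = []
        for url in pending:
            if attempt(url, service, email):
                verified.add(url)
            else:
                still_pending.append(url)
        pending = still_pending
    return verified, set(pending)


def fake_attempt(url: str, service: str, email: str) -> bool:
    # Pretend only one site works, and only with the second captcha service.
    return url == "http://b.example/" and service == "service-2"


ok, failed = retry_unverified(
    ["http://a.example/", "http://b.example/"],
    fake_attempt,
    captcha_services=["service-1", "service-2"],
    emails=["first@example.com", "second@example.com"],
)
print(ok)      # {'http://b.example/'}
print(failed)  # {'http://a.example/'}
```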

Step 2 - Resubmit the "Identified But Not Submitted" list. 
If URLs aren't getting submitted to, there could be filters you set too high, PR checks that didn't work, too many bad words... Basically, you were just too strict. So we've identified the platform but didn't submit to it. We've taken ages to parse these 5 million URLs and sort them into platforms, and we now have a nice identified list. Why clear the target URLs and start again? Rather focus on Step 1 above, then when you have maxed that out, focus here with somewhat lower settings, i.e. rerun this list and try to get some more links onto the submitted list by relaxing your filters. Surely, surely... clearing the Target URL cache and starting it all again is such a waste. So much parsing and sorting down the drain!
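Step 2 only works cheaply if the PR, OBL and bad-word results from the first pass are kept with each identified URL, so the list can be re-filtered with looser thresholds without checking anything again. A Python sketch under that assumption; `Target`, `Filters` and the threshold values are illustrative, not SER's real filter options:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    url: str
    pr: int         # PR recorded during the first pass (assumed to be stored)
    obl: int        # outbound link count recorded during the first pass
    bad_words: int  # bad-word hits recorded during the first pass


@dataclass(frozen=True)
class Filters:
    min_pr: int
    max_obl: int
    max_bad_words: int

    def passes(self, t: Target) -> bool:
        return (t.pr >= self.min_pr
                and t.obl <= self.max_obl
                and t.bad_words <= self.max_bad_words)


def rerun_identified(identified_not_submitted: list[Target], relaxed: Filters) -> list[Target]:
    """Re-filter already identified targets using stored values; no re-scraping or re-checking."""
    return [t for t in identified_not_submitted if relaxed.passes(t)]


strict = Filters(min_pr=3, max_obl=50, max_bad_words=0)
relaxed = Filters(min_pr=1, max_obl=100, max_bad_words=0)
leftovers = [
    Target("http://a.example/post", pr=2, obl=80, bad_words=0),  # failed strict on PR and OBL
    Target("http://b.example/post", pr=4, obl=30, bad_words=2),  # fails both on bad words
]
print([t.url for t in leftovers if strict.passes(t)])         # []
print([t.url for t in rerun_identified(leftovers, relaxed)])  # ['http://a.example/post']
```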

Step 3 - Resubmit the "SE Results Parsed" list. 
So you've parsed 5 million URLs and it's taken some time! For some of these (normally a small percentage), you've identified the platform. Let's say 25% are identified (and that feels like a very, very nice percentage); that leaves 3.75 million URLs unidentified. What if Sven releases a few platform fixes or new platforms? You've got this list (nearly 4 million URLs) that matches or relates to your keywords, and Sven's new platforms get released. I'd love to be able to have it take a go at this list with the new platforms and find some great new target URLs to submit to. No need to reparse it all again... it just needs to check whether each URL matches a platform...
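For Step 3, the expensive SE scraping can be skipped: the stored unidentified URLs only need to be matched against the new engine definitions. The sketch below assumes, for illustration only, that an engine is recognised by footprint strings in the page HTML and that pages are fetched through a supplied `fetch` function; SER's real detection logic may work differently, and "NewBlogEngine" is made up:

```python
from typing import Callable, Optional

# Hypothetical engine definitions: engine name -> footprint strings found in its pages.
Engines = dict[str, list[str]]


def identify(html: str, engines: Engines) -> Optional[str]:
    """Return the first engine whose footprint appears in the page, if any."""
    for name, footprints in engines.items():
        if any(fp in html for fp in footprints):
            return name
    return None


def reidentify(unidentified: list[str], fetch: Callable[[str], str],
               engines: Engines) -> dict[str, str]:
    """Re-check the stored unidentified URLs against (new) engine definitions.
    The SE scraping is skipped entirely; only each page is fetched and matched."""
    found: dict[str, str] = {}
    for url in unidentified:
        engine = identify(fetch(url), engines)
        if engine is not None:
            found[url] = engine
    return found


# Stand-in for downloading pages; "NewBlogEngine" is a made-up platform.
pages = {
    "http://a.example/": "<html><footer>Powered by NewBlogEngine</footer></html>",
    "http://b.example/": "<html><p>Nothing recognisable here</p></html>",
}
new_engines: Engines = {"NewBlogEngine": ["Powered by NewBlogEngine"]}
print(reidentify(list(pages), pages.__getitem__, new_engines))
# {'http://a.example/': 'NewBlogEngine'}
```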

FURTHER FEATURE: Management of the above lists.

1) Given that domains often have many pages that appear in the SEs...
It would make sense to be able to remove "Duplicate domains" from any of the above lists; you just don't want the same comment on every single blog page of a domain. (Yes, there is a filter, but removing duplicates up front means you don't have to do all the checks and can keep the list neat.)
2) On the lists where you already have the PR, OBL and other filter data (since the URLs have been parsed, checked and then submitted to), add the option to mask/sort by PR or OBL. This way, you can keep some very neat lists for key projects.
3) When working on the lists, maybe an option to run a few masks and then "Export to Project": you select a project, and it loads the selected URLs into that project's list to try again.
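The three list-management features (remove duplicate domains, mask/sort by stored PR and OBL, export to a project) can be sketched in Python as follows. This assumes each record keeps its PR and OBL from the first pass; `Record`, `export_to_project` and the project name are hypothetical, and the domain handling is deliberately simple (only a leading `www.` is stripped):

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Record(NamedTuple):
    url: str
    pr: int   # stored from the first pass (assumption)
    obl: int  # stored from the first pass (assumption)


def domain_of(url: str) -> str:
    """Lower-cased host with a leading 'www.' removed. Other subdomains are
    treated as separate domains (a simplification)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def dedupe_by_domain(records: list[Record]) -> list[Record]:
    """1) Keep only the first record seen for each domain."""
    seen: set[str] = set()
    kept: list[Record] = []
    for r in records:
        d = domain_of(r.url)
        if d not in seen:
            seen.add(d)
            kept.append(r)
    return kept


def mask(records: list[Record], min_pr: int = 0,
         max_obl: Optional[int] = None) -> list[Record]:
    """2) Keep records meeting the thresholds; highest PR first, then lowest OBL."""
    kept = [r for r in records
            if r.pr >= min_pr and (max_obl is None or r.obl <= max_obl)]
    return sorted(kept, key=lambda r: (-r.pr, r.obl))


def export_to_project(projects: dict[str, list[str]], project: str,
                      records: list[Record]) -> None:
    """3) Hypothetical 'Export to Project': append the URLs to that project's target list."""
    projects.setdefault(project, []).extend(r.url for r in records)


records = [
    Record("http://blog.example/a", pr=3, obl=40),
    Record("http://www.blog.example/b", pr=3, obl=10),  # same domain as the line above
    Record("http://other.example/x", pr=5, obl=90),
    Record("http://low.example/y", pr=0, obl=5),
]
best = mask(dedupe_by_domain(records), min_pr=1, max_obl=100)
projects: dict[str, list[str]] = {}
export_to_project(projects, "Key project", best)
print(projects)  # {'Key project': ['http://other.example/x', 'http://blog.example/a']}
```

Order matters: running `mask` first sorts each domain's best page (highest PR, then lowest OBL) to the front, so a following `dedupe_by_domain` keeps that page rather than whichever one was scraped first.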

I feel this would make it the very best tool, able to really handle all the different links you need to get ranked. It would also make it EXTREMELY efficient.



 

Comments

  • Totally agree. These features are exactly what I'd expect.
  • AlexR (Cape Town)
    This post took me ages to write! I'd love to get some feedback from others on it. I know my timing was bad, with CB going into beta, but I would love your thoughts on it.
  • Thanks for taking the time to write this down, Global. I completely agree with all points/aspects of your suggestions.
  • AlexR (Cape Town)
    @bytefaker - was it clear enough to understand? Sometimes it's tricky to write down what you can visualise in your head. :-0
  • Yeah, I totally understand where you're coming from here. We need to think smarter and really look at how we can refine and reuse rather than dump and restart, which makes no sense at all. Therefore, I'm +1ing all those well-put-together suggestions!
  • AlexR (Cape Town)
    @takeachance - thank you for your support. I really do feel strongly about this. The data is ALL there; we just need to store it and use it. I'm not a programmer, so I'm not sure how hard it would be to implement.
  • AlexR (Cape Town)
    edited December 2012
    @sven - would this be possible? 
  • I agree with @GlobalGoogler 100%. Doing this will save resources and time. We can also start compiling our own lists of great links to use on tier 2 or tier 3, or in different projects.
  • GG, I like all these ideas, and from what I read here you have thought hard about it all. It all makes great sense to me.
  • AlexR (Cape Town)
    I'd like to get a little more discussion on this! Does anyone else see the benefit? Be honest if you were too lazy to read this long post, or ask a question if you didn't understand why something was suggested! :-)


  • 1) What tool are you using for extrapolating to 1000 keywords?

    2) In item 3, "parse" means examining what we know of each site. That means recalling the OBL, PR and content of the pages. This sounds like a lot of data to store... unless we are already storing it, which I don't know. Is this as big a burden as I think it is?

  • AlexR (Cape Town)
    edited December 2012
    @psikogeek - "This sounds like a lot of data to store" - surely it's not too much to add the PR and OBL of each page? It's just appending two short numeric parameters. It can't be much of a burden!

    So each entry would be, e.g., a URL with a PR of 1 and an OBL of 20.

    We're getting the data anyway; I just want it stored so it can be used later.
  • AlexR (Cape Town)
    I know it's been the Christmas break, but I would love some active discussion on this. How are you guys getting around these issues? Am I missing something obvious?
  • edited January 2013
    TOTALLY agree, @GlobalGoogler. This has been on my mind for quite a while too.

    GSA really could be the best of the best with just a little added priority control and list management. It's just a little TOO automated at the moment. Don't get me wrong, everything about GSA's automation is phenomenal. But imagine the power of being able to manually prioritize and queue up lists based on all the properties and filters already in the program, like engine, type, PR, OBL, dofollow, etc.

    Imagine being able to re-queue failed submissions (more specifically, failed captcha submissions). Failed captcha submissions should have their own label and be selectable, so they can be right-clicked and re-queued for retry, especially if the site is known to have been successfully submitted to in another project (perhaps with a tick mark in the GUI to signify this).

    Holy tits. Imagine if you could select links from any project and organize them into a list that could be queued up inside any other project.

    Right click -> send to list xxxxxx

    High-value lists could be neatly managed and queued up from a right-click dropdown menu. Each pending submission would be shown in a VISIBLE work queue.

    Send list to project xxxxxx and set priority to xxxxxx

    The VISIBLE work queue could be easily managed and separated into tabs:
    [ Working ] [ Failed ] [ Submitted ] [ Verified ]

    Items could be selected by filter (engine, type, PR, OBL, dofollow, etc.).
    Right-click options would let users set the priority of one or more selected items.
    Items from the Failed tab could be sent back to the Working tab for resubmission.

    Specific captcha-solving services could be assigned to specific lists, for those links with pain-in-the-ass captchas. Perhaps there could also be an option for manual solving. We don't mind filling out a few captchas ourselves if we know the links are worth the effort!
  • lol Noc, I like how you tell things like a dreamer... imagine this, imagine that... that's good stuff. I hope Sven puts this on his to-do list.
  • AlexR (Cape Town)
    @NocT - you're getting the picture! :-) Let's keep the ideas coming so when Sven reads through this, it will be full of good ideas that he can pick and choose!
  • Mind = Blown

    These ideas are phenomenal to say the least.

    The power of GSA is mind-boggling... with these features added to it... It's... I don't have the words.
  • edited January 2013
    To be quite honest, I wouldn't mind paying extra so Sven can hire another programmer to help. These are pretty big features, and he has enough on his plate just dealing with engine updates all week.

    Maybe it could be released as a premium add-on; it makes good business sense, and I know I'm not alone when I say I would put up the $$ for it.
  • AlexR (Cape Town)
    I'd also be happy to pay for these features. They would make a drastic difference. 

    The other option is for a few of us to get together and fund someone to do the engine updates/platform improvements... leaving Sven free to focus on other things. Just an idea.
  • Yeah, I'd pay to have these features added as well. Not sure how much (I'm cheap, lol)... but they're definitely worth adding to the software. And if he's already swamped with new additions BEFORE these suggestions, it only makes sense to throw a little more money on the table. For one-payment software, you really can't beat this.
  • I agree 110% with the need for these features. Hopefully one day they'll be implemented in the program. Until then, here are some of my workaround suggestions.

    - Duplicate your project > Clear URL cache & History > import the "Submitted Sites" list or "Identified Site" list and run it. Do this for each project as needed. Once the list(s) have been processed, just delete the duplicate project.

    - Filtering by PR/OBL/etc.: take your list (for instance, "Submitted Sites") and run it through SB. Run the PR Checker, outbound links checker, etc. to sort it into a nicely filtered new list that matches your criteria, then re-import the list into SER.

    Obviously this isn't as convenient as having SER do all this for you during its initial parsing, but the above-mentioned steps are fairly easy and quick to complete... until (or if) your features are added.
  • +1 for @globalgoogler's suggestions; they make sense.
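The work queue described in the comments above (Working/Failed/Submitted/Verified tabs, per-item priorities, PR and OBL stored with each URL, and re-queuing failed captcha submissions) could be modelled roughly like this. It is a hypothetical Python sketch: `WorkQueue`, `Item`, the engine labels and the "captcha" failure reason are invented for illustration and are not part of GSA SER:

```python
import heapq
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Optional


class Status(Enum):
    WORKING = "Working"
    FAILED = "Failed"
    SUBMITTED = "Submitted"
    VERIFIED = "Verified"


@dataclass
class Item:
    url: str
    engine: str
    pr: int    # stored when the URL was first checked
    obl: int   # stored when the URL was first checked
    status: Status = Status.WORKING
    fail_reason: str = ""


class WorkQueue:
    """Priority queue of pending submissions plus one 'tab' per status."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []  # (-priority, sequence, url)
        self._seq = count()
        self._live: dict[str, int] = {}  # url -> sequence of its current heap entry
        self.items: dict[str, Item] = {}

    def enqueue(self, item: Item, priority: int = 0) -> None:
        item.status, item.fail_reason = Status.WORKING, ""
        self.items[item.url] = item
        seq = next(self._seq)
        self._live[item.url] = seq
        heapq.heappush(self._heap, (-priority, seq, item.url))

    def next_item(self) -> Optional[Item]:
        """Pop the highest-priority item; ties go to the earliest enqueued."""
        while self._heap:
            _, seq, url = heapq.heappop(self._heap)
            if self._live.get(url) == seq:  # ignore stale entries from earlier enqueues
                del self._live[url]
                return self.items[url]
        return None

    def mark(self, url: str, status: Status, reason: str = "") -> None:
        item = self.items[url]
        item.status, item.fail_reason = status, reason

    def tab(self, status: Status, min_pr: int = 0) -> list[Item]:
        """Items shown on one tab, optionally filtered by stored PR."""
        return [i for i in self.items.values() if i.status is status and i.pr >= min_pr]

    def requeue_failed(self, reason: str, priority: int = 0) -> None:
        """Send every failed item with the given reason back to the Working tab."""
        for item in self.tab(Status.FAILED):
            if item.fail_reason == reason:
                self.enqueue(item, priority)


q = WorkQueue()
q.enqueue(Item("http://a.example/", "Article", pr=2, obl=10), priority=1)
q.enqueue(Item("http://b.example/", "Blog Comment", pr=4, obl=30), priority=5)
first = q.next_item()                       # b.example (priority 5)
q.mark(first.url, Status.FAILED, "captcha")
second = q.next_item()                      # a.example
q.mark(second.url, Status.VERIFIED)
q.requeue_failed("captcha", priority=10)    # b.example goes back to Working
print(q.next_item().url)                    # http://b.example/
```

Stale heap entries are skipped when popped rather than searched for and removed, so re-queuing an item at a new priority is a single `heappush`.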
