
Building lists with Xenu

Since I'm pretty familiar with Xenu for scanning for broken links to pick up expired domains for my PBN, it struck me that it might also be a great tool to build your own GSA list very fast and very efficiently.

For the people who don't know what Xenu is: it's basically a free crawler that scans a specific page or an entire website for, among other things, broken outbound links.

Enter the beauty of the blog comment "footprint"

Because blog comments are often used to build upper tiers, they are a great place to find higher-quality, contextual targets.

The best place to start would be to pick about 500 of your verified blog comment URLs with extremely high OBL (>1000). As you can see, with just 500 of these pages at 1,000+ outbound links each, you already have a potential list of >500,000 targets. Obviously there will be a lot of overlap from duplicate domains and direct links to web 2.0s and money sites (which are obviously worthless to us), but there will also be thousands or maybe tens of thousands of valuable links we can import and post to with GSA.
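
Once Xenu has finished and you export the outbound links it found, the cleanup (deduplicating domains, dropping web 2.0s and other junk) is easy to script. A minimal Python sketch, assuming the export is a plain text file with one URL per line; the file names and the junk-domain list are only placeholders:

```python
from urllib.parse import urlparse

# Example web 2.0 / money-site domains to skip -- extend with your own list.
JUNK_DOMAINS = {"wordpress.com", "blogspot.com", "tumblr.com"}

def clean_targets(infile="xenu_outbound_links.txt", outfile="gsa_targets.txt"):
    seen_domains = set()
    kept = []
    with open(infile, encoding="utf-8", errors="ignore") as f:
        for line in f:
            url = line.strip()
            if not url.lower().startswith("http"):
                continue
            domain = urlparse(url).netloc.lower()
            if domain.startswith("www."):
                domain = domain[4:]
            # keep one URL per domain and skip obvious junk
            if not domain or domain in JUNK_DOMAINS or domain in seen_domains:
                continue
            seen_domains.add(domain)
            kept.append(url)
    with open(outfile, "w", encoding="utf-8") as f:
        f.write("\n".join(kept))
    return len(kept)

print(clean_targets(), "unique target domains kept")
```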

Besides its simplicity, Xenu is also pretty fast. Whenever I sift through huge directories to find expired domains for my PBN, it takes me about 3-4 hours for 500,000 links. Even if only 5% are usable, you'll quickly have a list of 25k targets to post to! All with minimal effort and just a couple of hours spent walking to the fridge to grab a new can of beer. :)

The only potential drawback I see with this method is that the probability of finding fresh, virgin platforms is relatively small. There will definitely be some diamonds in the rough, because you might pick up a "virgin" site someone else only recently discovered, but the majority will already have been found.

Still, if you parse through enough blog comments (say at least 5,000,000 links, or roughly 5k blog pages with extremely high OBL) over the course of a couple of days, you should end up with an awesome GSA list with minimal effort.

I will be testing it from now on. :)

Comments

  • SvenSven www.GSA-Online.de
    The way you describe here is the same technique used within SER, called "Use URLs linking on same verified URL (supported by some engines only)".
  • edited November 2014
    Wow! I had absolutely no clue you built that awesome feature into SER (i.e. I didn't exactly know what it meant). I mean, I obviously searched the forum for whether this Xenu or a similar method had already been mentioned somewhere, but to no avail.

    Still interested to see which is the quicker/more efficient way to do this: the awesome feature you already built in, or Xenu. Time for some tests. :D
  • SvenSven www.GSA-Online.de
    This feature within SER is of course slower, as it can only be used when you already have verified URLs from sites (on supported engines) linking to the same URL as the project.
  • edited November 2014
    Currently I'm only interested in articles and wikis, so I did a quick run with 50 blog comment URLs (~90,000 OBL):

    [screenshot: results of the run]

    A bit less than I had hoped, but assuming you do this with 5,000 comments, you could end up with more than 50k contextuals (disregarding potential duplicates for now). Shouldn't take more than a few days.
  • I guess this can be done with Scrapebox and the Outbound Link Checker or the Link Extractor addons? Anyway, nice tip for people who don't own SB. :)
  • @delta_squad my thoughts exactly. Nice tip though. Didn't think about using Xenu for this.
  • edited November 2014
    @delta_squad

    Same idea indeed. In fact, Scrapebox is probably a LOT faster as well since it doesn't actually check the sites for broken links. Definitely a far better idea to use the Scrapebox method!

     Thank you, I didn't even think about it until now. :)

    For people who don't have Scrapebox, or who also want to check the status of the links, Xenu is a good alternative.
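
    If you'd rather script the Scrapebox-style approach yourself, a rough sketch using only the Python standard library could look like the following (the input file name is a placeholder); it simply pulls the external links off each blog comment page without checking their status:

    ```python
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    class OutboundLinks(HTMLParser):
        """Collects external <a href> targets found on one page."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = set()

        def handle_starttag(self, tag, attrs):
            if tag != "a":
                return
            href = dict(attrs).get("href")
            if not href:
                return
            url = urljoin(self.base_url, href)
            # keep only links pointing to a different domain (outbound)
            if urlparse(url).netloc and urlparse(url).netloc != urlparse(self.base_url).netloc:
                self.links.add(url)

    def extract_outbound(pages_file="blog_comment_pages.txt"):
        targets = set()
        with open(pages_file, encoding="utf-8") as f:
            pages = [line.strip() for line in f if line.strip()]
        for page in pages:
            try:
                html = urllib.request.urlopen(page, timeout=15).read().decode("utf-8", "ignore")
            except Exception:
                continue  # unlike Xenu, we don't care why a page failed
            parser = OutboundLinks(page)
            parser.feed(html)
            targets |= parser.links
        return targets

    for url in sorted(extract_outbound()):
        print(url)
    ```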


  • An easier and faster option is to use an Ahrefs paid account. I'm able to quickly get millions of backlinks from competitors and their tier 2 backlinks, then I use EmEditor to extract only the type of backlinks I'm looking for (for example, all Drupal sites have /node/ in the URL, etc.) and use them in SER.
  • @dariobl not all Drupal sites have "node" in the URL, so you might be missing some opportunities there.
  • @jpvr90 I just gave a quick example; I have lists of every possible footprint used on the engines that SER can post to.

    With EmEditor it takes me 1-2 minutes to get 100k valuable backlinks out of a 1M-URL list, while it would take a few days with SER.
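
    For anyone who prefers a script over doing the extraction in an editor, the same line filtering is a few lines of Python (the footprints and file names below are only example placeholders, not a complete list); it streams the file line by line, so even multi-GB exports are fine:

    ```python
    # Example URL footprints -- swap in your own list.
    URL_FOOTPRINTS = ["/node/", "/wiki/index.php", "member.php?action=profile"]

    def extract_by_footprint(infile="ahrefs_export.txt", outfile="filtered.txt"):
        kept = 0
        with open(infile, encoding="utf-8", errors="ignore") as src, \
             open(outfile, "w", encoding="utf-8") as dst:
            for line in src:
                # keep only lines (URLs) containing one of the footprints
                if any(fp in line for fp in URL_FOOTPRINTS):
                    dst.write(line)
                    kept += 1
        return kept

    print(extract_by_footprint(), "matching URLs written")
    ```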
  • Wow this looks like a really good method.
  • @dariobl

    That's actually a great suggestion. It might miss a few platforms, but like you said, it saves a LOT of time. 

    Is it possible to PM me your footprints list? I have one myself with thousands of different footprints, but it might not be as complete as yours. Is it also possible to (semi-) automate the process instead of manually entering each footprint?
  • edited November 2014
    Here is a download link so everyone can get it: http://www78.zippyshare.com/v/5588551/file.html
    It's sorted by type, but the URL footprints aren't separated from the common Google footprints.

    There is a 100k keyword list as well, which is very useful in scraping software.

    If I want to do it automatically, I just load all the footprints into GScraper and it finds them by itself, but I usually search for only a few types of backlinks, so I go with the manual method.

    Btw, if anyone wants to get tons of backlinks for mass spam (aka churn and burn), the 301 method or anything else, do the following:

    1. Download the footprint list from the link above.
    2. Get GScraper (there is a "free" version floating around the web, but the software is cheap and extremely useful, so support the developers and buy it).
    3. Get a proxy list from this site: http://new-freshproxies.blogspot.com/ (updated multiple times a day).
    4. Load the footprints and proxies into GScraper. Make sure you choose the option to have GScraper reload the proxies from the text file every xx minutes, so you can add fresh proxies from the above-mentioned site every few hours once the speed drops.
    5. For keywords, use the niches most commonly hit by mass spam, such as "payday loans", "weight loss pills", "free xbox live codes", etc. You get the point.

    Then start running the tool. That way I'm able to get 200k-300k unique URLs per day (after removing the useless links and dupes), which is great and gives me a good LpM, as long as SER is running fine :). I could get much more if I created a list of at least 1,000 frequently used anchors for churn and burn projects and ran GScraper on multiple servers, but considering I have subscriptions with most of the link list sellers, an additional 200k a day is fine for me.
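
    Just to illustrate what the scraper ends up querying in steps 1 and 5: every footprint gets paired with every keyword. GScraper handles this merging for you, but a hypothetical Python sketch of the combination (file names are placeholders) would be:

    ```python
    from itertools import product

    def build_queries(footprints_file="footprints.txt", keywords_file="keywords.txt",
                      outfile="queries.txt"):
        with open(footprints_file, encoding="utf-8", errors="ignore") as f:
            footprints = [line.strip() for line in f if line.strip()]
        with open(keywords_file, encoding="utf-8", errors="ignore") as f:
            keywords = [line.strip() for line in f if line.strip()]
        with open(outfile, "w", encoding="utf-8") as out:
            # one search query per footprint/keyword pair, e.g. inurl:/node/ "payday loans"
            for footprint, keyword in product(footprints, keywords):
                out.write(f'{footprint} "{keyword}"\n')
        return len(footprints) * len(keywords)

    print(build_queries(), "search queries written")
    ```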


  • Awesome post @dariobl ! Any advantage to using EmEditor over, let's say, Notepad++?
  • You can also do this with GScraper.
  • edited November 2014
    @dariobl

    I quickly tried your method with just 5 inurl: footprints and I've already found over 2000 contextuals (sample size ~500k links). However, I don't know of an automatic way to extract the links that contain a footprint while leaving out the rest.

    Can you tell me how you do it with EmEditor?

    Thank you!
  • @delta_squad

    EmEditor can handle extremely large files, even ones a few GB in size, and it's extremely fast when you are exporting just the lines you need (URLs with your footprints), while Notepad++ keeps its temporary files in RAM, can't handle large files, and is really slow at exporting.


    You can do it with GScraper, but in that case you'll get lots of useless links as well. I do it manually with EmEditor because I'd rather have 500 working and useful contextual domains than 500k of garbage with a few useful links inside.


  • edited November 2014
    @dariobl

    Thanks. I did it with Notepad++ by bookmarking the lines with the inurl: footprints and deleting the rest. It kinda sucks to do it manually (almost 100 contextual inurl: footprints) and the program freezes for 20 minutes before it has processed it all. Used 1.9 million URLs this time, by the way. Will try it with EmEditor next time.

    I guess there is no other way though. 
  • That's why I told you to use EmEditor: it's way faster than Notepad++ and it's built to be used with large files. I'm using it in my company, as we deal with huge databases (files of a few gigabytes), and it works excellently.

    You can do it automatically in GScraper, but GScraper will collect the garbage as well (at least mine does all the time); that's why I always go the manual way, so at least I know what I'm working with.
  • edited November 2014
    @dariobl

    I tried it with EmEditor, but I still prefer Notepad++. Sure, EmEditor might be better at processing larger files, but the search function takes almost 2 minutes to go through my list (2 million URLs). In comparison, with Notepad++ it takes less than 5 seconds.

    I have no idea if this is caused by faulty settings on my side, but it really is a PITA to wait 2 minutes before being able to enter another search term. I'd rather do it almost instantly and then wait 20 minutes, grabbing a beer while Notepad++ unfreezes.

    Again, this might be caused by wrong settings on my side, but for me Notepad++ is the better option right now.
  • You're doing it wrong with EmEditor; you need to do the following:

    1. CTRL+F - enter the footprint - click Bookmark All
    2. Edit - Bookmarks - This Document - Extract bookmarked lines into a new file

    Done in a few seconds. :)