
Important Question: How do I Scrape Links with Scrapebox to use with GSA SER? + More questions :)

Hi, I've been using GSA for a few months now and I have always wondered a few things. First, how exactly do I use Scrapebox to get great links for GSA SER? Also, do I have to trim the URLs down to just the homepage / root URL?

Also, I'm running GSA on a VPS with 10 private proxies mixed with around 100+ public proxies, and Death By Captcha for captcha solving.

My niche is a hard one, as it's a money-making niche, and I'm just hoping I'm doing everything right.

I'm scraping for keywords related to my niche with Scrapebox. I have 2 projects running to my money site on strict filters (no junk, PR 1-10), set up the way Ron has it.

My LpM is at most around 1.36, but I'm thinking this is due to the keywords I'm using, because of the social media niche I'm in.

What I need, though, is to learn how to scrape good links with Scrapebox that GSA will make great use of. I've been looking for videos or guides and I cannot find one that is specific about exactly what I need to do to get the links.

Also, I seem to be getting A LOT of duplicate URLs and domains (around 10,000 duplicate URLs), which is wasting resources and captchas.

Comments

  • goonergooner SERLists.com
    Hey @ilegit,

    I use GScraper because it has the option not to scrape duplicate URLs. I'm not sure if Scrapebox has that, but if it does, then that should remove the duplicate URL problem.

    If not, GScraper has a free version which allows you to scrape 100,000 URLs at a time. That is fine for our needs.

    How I do it: take the footprints for the engines I want from SER and paste them into GScraper, then add your keywords. If you want niche-specific keywords, you will always get limited results.

    I just use a list of 100,000 generic keywords... use maybe 1,000 keywords per scrape... Leave it running overnight and you will have something like 50,000 - 100,000 unique URLs.

    Import them into SER and let it identify them; you should find it can post to around 60 - 80% of the scraped URLs... so around 30,000 - 80,000 viable targets scraped per night.

    That's how I do it, but I'm sure some people here have even better methods.
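
    If you'd rather script that batching step than click through it, here is a minimal sketch in Python. The file names are placeholders, the footprints are whatever you exported from SER, and quoting the keyword is just one common way to build the query:

    # Pair SER footprints with keyword chunks of ~1000 so each scrape run
    # stays a manageable size. "footprints.txt" and "keywords.txt" are
    # hypothetical file names - point them at your own lists.

    def chunks(items, size):
        for i in range(0, len(items), size):
            yield items[i:i + size]

    with open("footprints.txt", encoding="utf-8") as f:
        footprints = [line.strip() for line in f if line.strip()]

    with open("keywords.txt", encoding="utf-8") as f:
        keywords = [line.strip() for line in f if line.strip()]

    # One query file per batch of 1000 keywords, ready to paste into a scraper.
    for n, batch in enumerate(chunks(keywords, 1000), start=1):
        with open(f"queries_batch_{n}.txt", "w", encoding="utf-8") as out:
            for fp in footprints:
                for kw in batch:
                    out.write(f'{fp} "{kw}"\n')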
  • The thing is, I'm not scraping with Scrapebox; the duplicate URLs are coming from GSA alone. If I knew exactly how to use software to scrape for links, I would prefer that, so I don't end up with duplicate URLs and wasted resources.

    Also, how exactly do I use the list in GSA after I scrape it with other software?
  • goonergooner SERLists.com
    Well, GSA SER has to look at a URL before it knows it's a duplicate, so I'm not sure there is much you can do about that, except maybe scrape more targets and give SER more choice of which URLs to post to.

    To use your lists, you can import them directly into a project (or multiple projects) by selecting the projects, right-clicking, and choosing "Import Target URLs".

    For me, I get better success if I use Options > Advanced > Tools and then "Import URLs and identify".

    If you do that, make sure you have "save identified sites to" selected under Options > Advanced, and make sure all projects are set to post from identified lists (I set mine to post from verified lists too).

    Hope that makes sense?


  • Yeah, alright, thanks.
  • 2Take2 UK
    edited August 2013

    I'm by no means an expert, but I use Scrapebox to scrape for targets to import into SER, and this is my process. (You probably know most of this already, but I'll post it anyway to help others who may not.)

    This assumes that you have already gone through the 'submitted vs verified' lists and worked out which engines perform best.

    - Open GSA SER

    - Click: Options >> Advanced >> Tools >> Search online for URLs >> Add predefined footprint

    - Find the footprints you want, add them to the list, and copy them from the SER popup into notepad (one per line).

    - Copy the list into Excel, starting from cell A1.

    - In cell B1 type "%KW%" preceded by a single space (don't type the word "space"; just leave one space at the beginning of the cell).

    - Drag cell B1 down to the bottom of the list.

    - In cell C1 type: =A1&B1 and drag it down the list.

    You should now have a list in column C that looks something like this:

    powered by oxwall "%KW%"
    Powered By Oxwall "Join" "%KW%"
    inurl:"/oxwall/blogs" "%KW%"
    oxwall "User blogs" -oxwall.org "%KW%"
    Top Rated "Most Discussed" "Browse by Tag" "%KW%"

    - Copy the whole of column C to notepad and save it as something memorable like "GSA social scraping footprints".

    - Open Scrapebox and set the delay, proxies, connections, search engines etc. to your liking.

    - Import your list of keywords. 

    - Hit the [M] merge button and import the list of footprints that you just made with Excel. As you used the placeholder "%KW%", you should now see your footprints with your keywords appended in quotes, like so:

    powered by oxwall "YOUR KEYWORD"

    - Set Scrapebox to harvest overnight and go to bed, have a beer, bang your missus, or whatever else flicks your switch.

    When you wake up in the morning you should have about 1 million URLs (depending on your settings).

    - Use the remove duplicates feature in Scrapebox.

    - Export them as a text file and save them.

    - Open SER and right click on the project/s that you want to use them in.

    - Modify Project >> Import Target URLs >> choose the scraped URL file.

    - Sit back and leave SER to be the little mean green spamming machine that she is.

    Hope this helps!
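
    If Excel or Scrapebox struggles with a very large merge (one reader further down reports exactly that with 133k keywords), the same "%KW%" substitution can be done with a short script. A minimal sketch in Python, assuming one footprint per line (with the quoted "%KW%" placeholder) and one keyword per line; the file names are hypothetical:

    # Reproduce the Excel + [M] merge: substitute every keyword into the
    # "%KW%" placeholder of every footprint. File names are placeholders.

    with open("footprints_with_kw.txt", encoding="utf-8") as f:
        footprints = [line.strip() for line in f if line.strip()]

    with open("keywords.txt", encoding="utf-8") as f:
        keywords = [line.strip() for line in f if line.strip()]

    with open("merged_queries.txt", "w", encoding="utf-8") as out:
        for fp in footprints:
            for kw in keywords:
                if "%KW%" in fp:
                    out.write(fp.replace("%KW%", kw) + "\n")
                else:
                    # No placeholder in this footprint: append the keyword quoted.
                    out.write(f'{fp} "{kw}"\n')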


     

  • goonergooner SERLists.com
    @2take2 - Nice method mate!
    Roughly how many do you end up with after removing dups?
  • 2Take2 UK
    edited August 2013

    Thanks @gooner

    I normally end up with about 100k uniques, depending on whether I'm using niche-related KWs for things like blogs and forums, or more general KWs for things like social networks and articles.

  • @2Take2, really good method you shared, mate! Thanks a lot!
  • @2take2, or anybody with insight: what is the best way to determine which engines are performing best?

    Would you recommend starting off by scraping with all footprints for the engine types I want to use, and then filtering the list down later by comparing submitted to verified after a few scrapes?
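
    (For what it's worth, once you have per-engine submitted and verified counts, the comparison itself is easy to script. A minimal sketch in Python; the "engine_stats.csv" file and its engine,submitted,verified columns are my own assumption, to be filled from the stats SER shows you:)

    # Rank engines by verification rate. The CSV name and its columns are
    # assumptions - fill the file with your own per-engine numbers.
    import csv

    rows = []
    with open("engine_stats.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            submitted = int(row["submitted"])
            verified = int(row["verified"])
            rate = verified / submitted if submitted else 0.0
            rows.append((row["engine"], submitted, verified, rate))

    # Highest verification rate first - those are the engines worth keeping.
    for engine, submitted, verified, rate in sorted(rows, key=lambda r: -r[3]):
        print(f"{engine}: {verified}/{submitted} verified ({rate:.0%})")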
  • @2take2, you can make it easier with this link: http://textmechanic.com/Add-Prefix-Suffix-to-Text.html - just add the footprints, type the suffix "%KW%", and that's it :)
  • 2Take2 UK
    edited March 2014
    @hadoken - Yes I would do exactly that, but be aware that the best performing engines often change due to changes in the registration process etc.

    @muxmkt - Although I've moved on quite a bit from the way I used to scrape when I originally posted in this thread, that link will still come in handy.  Many thanks - bookmarked ;)
  • @2take2 thanks for that, makes sense
  • You are one cool dude, 2Take2...

    Many many thanks for this post. I've been searching for 2 days now, trying to learn how to scrape for GSA. I'm getting some decent results for the "not so competitive" searches, but I need a ton more targets for the tougher ones... and this will help me out A LOT...

    Thanks man... 
    ^:)^
  • @2Take2, thanks for sharing this. I just tried it (133k keywords and the footprints of 5 article engines) and Scrapebox is unable to finish the task (of replacing "%KW%" in every line) and it shuts down... Does anyone have the same problem?
  • Hello @gooner, do you also remove duplicate domains after GScraper completes, or only duplicate URLs?
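
    (Whatever the answer for GScraper itself, deduping a scraped list outside the scraper comes down to the key you dedupe on. A minimal sketch in Python; "scraped_urls.txt" is a hypothetical file name:)

    # Exact-URL dedup keeps one copy of each URL; domain dedup keeps only
    # the first URL seen per host. The input file name is a placeholder.
    from urllib.parse import urlparse

    with open("scraped_urls.txt", encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]

    unique_urls = list(dict.fromkeys(urls))  # drop exact duplicates, keep order

    seen_hosts = set()
    one_per_domain = []
    for url in unique_urls:
        host = urlparse(url).netloc.lower()
        if host not in seen_hosts:
            seen_hosts.add(host)
            one_per_domain.append(url)

    print(f"{len(urls)} scraped, {len(unique_urls)} unique URLs, "
          f"{len(one_per_domain)} unique domains kept")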
  • edited December 2014
    Hello @2Take2, thanks for this info, but I would like to suggest a tweak.

    You don't need to add "%KW%"; you can do the following instead:

    - Import your footprints.

    - Click the merge button and add the desired list of keywords.

    You will get the same result with less hassle.

    Correct me if I'm wrong.

    Thanks
  • @kiosh maybe you need a smaller KW file?

    @all, do you merge all the footprints, e.g. article, blog, and the like? Or is it more effective to use article footprints only?
  • MaX Portugal
    Hi guys, maybe you can help me. Can you please explain how I can take the footprints for the engines I want from SER?

    Your help is appreciated
  • @MaX Via the GSA built-in scraper (see the attached screenshot).
  • MaX Portugal
    Hi Unknown717,

    Thanks for the help. I had looked at Footprint Studio before but could not figure it out. What you have shown me is great; I will have no problem there. Thanks again.