Important Question: How do I Scrape Links with Scrapebox to use with GSA SER? + More questions :)
Hi, I've been using GSA for a few months now and have always wondered a few things. The first is: how exactly do I use Scrapebox to get great links for GSA SER? Also, do I have to trim the URLs down to only the homepage / root URL?
I'm running GSA on a VPS with 10 private proxies mixed with around 100+ public proxies, plus Death By Captcha.
My niche is a hard one, as it's a money-making niche, and I'm just hoping I'm doing everything right.
I'm scraping keywords related to my niche with Scrapebox. I have two projects running to my money site on strict filters (no junk, PR 1-10), set up the way Ron has it.
My LpM is at most around 1.36, but I'm thinking this is down to the keywords I'm using, because of the social media niche I'm in.
What I need, though, is to learn how to scrape good links with Scrapebox that GSA can make great use of. I've been looking for videos and guides, and I can't find one that is very specific about what exactly I need to do to get the links.
I also seem to be getting A LOT of duplicate URLs and domains (around 10,000 dup URLs), which is wasting resources and captchas.
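In case it helps anyone with the same two problems, both the root-trimming and the de-duplication can be done with a few lines of script before importing into SER. Below is a minimal sketch in Python; the file names are placeholders, and whether you should trim to the root at all depends on the engines you post to (some need the full inner URL), so treat it as an illustration rather than a recommendation.

```python
# Minimal sketch (hypothetical file names): trim scraped URLs to their
# root and drop duplicates before importing the list into SER.
from urllib.parse import urlparse

def trim_to_root(url: str):
    """Reduce e.g. http://site.com/blog/post?x=1 to http://site.com/"""
    parts = urlparse(url.strip())
    if not parts.scheme or not parts.netloc:
        return None  # skip malformed lines
    return f"{parts.scheme}://{parts.netloc}/"

with open("scraped_urls.txt", encoding="utf-8", errors="ignore") as f:
    roots = {r for r in (trim_to_root(line) for line in f) if r}

with open("deduped_roots.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(sorted(roots)))
```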
Comments
I use GScraper because it has an option not to scrape duplicate URLs. I'm not sure if Scrapebox has that, but if it does, that should remove the duplicate URL problem.
If not, GScraper has a free version which allows you to scrape 100,000 URLs at a time. That is fine for our needs.
How I do it is to take the footprints for the engines I want from SER and paste them into GScraper, then add your keywords. But if you want niche-specific keywords, you will always get limited results.
I just use a list of 100,000 generic keywords... use maybe 1,000 keywords per scrape (splitting the list into batches is easy to script; see the sketch at the end of this comment)... Leave it running overnight and you will have something like 50,000-100,000 unique URLs.
Import them into SER and let it identify them, and you should find it can post to around 60-80% of the scraped URLs... so around 30,000-80,000 viable targets scraped per night.
That's how I do it, but I'm sure some people here will have even better methods.
To use your lists, you can import directly into one or more projects by selecting them, right-clicking, and choosing "import target urls".
For me, I get better success if I use "Options" >> "Advanced" >> "Tools" and then "Import URLs and identify".
If you do that, make sure you have "save identified sites to" selected under "Options" >> "Advanced", and make sure all projects are set to post from identified lists (I also set mine to post from verified lists).
Hope that makes sense?
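To expand on the batching mentioned above, here is a minimal sketch in Python that splits a large keyword list into files of roughly 1,000 keywords each, one per scraping run. The file names and batch size are just illustrative, not anything GScraper requires.

```python
# Hypothetical sketch: split a big generic keyword list into batch files
# of ~1,000 keywords, so each scrape run gets its own chunk.
BATCH_SIZE = 1000  # keywords per scrape, per the suggestion above

with open("keywords_100k.txt", encoding="utf-8") as f:
    keywords = [line.strip() for line in f if line.strip()]

for i in range(0, len(keywords), BATCH_SIZE):
    batch = keywords[i:i + BATCH_SIZE]
    name = f"kw_batch_{i // BATCH_SIZE + 1:03d}.txt"  # kw_batch_001.txt, ...
    with open(name, "w", encoding="utf-8") as out:
        out.write("\n".join(batch))
```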
I'm by no means an expert, but I use Scrapebox to scrape for targets to import into SER, and this is my process (you probably know most of this already, but I'll post it anyway to help others who may not).
This assumes you have already gone through the 'submitted vs verified' lists and worked out which engines perform best.
- Open GSA SER
- Click: Options >> Advanced >> Tools >> Search online for URLs >> Add predefined footprint
- Find the footprints you want, add them to the list, and copy them from the SER popup into Notepad (one per line).
- Copy the list into Excel, starting from cell A1.
- In cell B1 type: space"%KW%" (don't type the word "space", just leave a space at the beginning).
- Drag cell B1 down to the bottom of the list.
- In cell C1 type: =A1&B1 and drag it down the list.
You should now have a list in column C that looks something like this:
powered by oxwall "%KW%"
Powered By Oxwall "Join" "%KW%"
inurl:"/oxwall/blogs" "%KW%"
oxwall "User blogs" -oxwall.org "%KW%"
Top Rated "Most Discussed" "Browse by Tag" "%KW%"
- Copy the whole of column C to Notepad and save it as something memorable like "GSA social scraping footprints". (If you'd rather skip the Excel steps, see the script sketch just below.)
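As an aside, the Excel steps above can also be done with a short script. A minimal sketch in Python; the input file name is a placeholder for wherever you pasted the SER footprints.

```python
# Hypothetical sketch: append the ' "%KW%"' placeholder to every footprint,
# replicating the A/B/C-column Excel steps above.
with open("ser_footprints.txt", encoding="utf-8") as f:
    footprints = [line.strip() for line in f if line.strip()]

with open("GSA social scraping footprints.txt", "w", encoding="utf-8") as out:
    for fp in footprints:
        out.write(f'{fp} "%KW%"\n')  # space, then the quoted placeholder
```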
- Open Scrapebox and set the delay, proxies, connections, search engines etc. to your liking.
- Import your list of keywords.
- Hit the [M] merge button and import the list of footprints you just made with Excel. As you used the placeholder "%KW%", you should now see your footprints with your keywords added onto the end in quotes (a script version of this merge is sketched after the export step below), like so:
powered by oxwall "YOUR KEYWORD"
- Set Scrapebox to harvest overnight and go to bed, have a beer, bang your missus, or whatever else flicks your switch.
When you wake up in the morning you should have about 1 million URLs (depending on your settings).
- Use the remove duplicates feature in Scrapebox.
- Export them as a text file and save them.
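For anyone who wants to prepare or sanity-check the query list outside Scrapebox, here is a minimal sketch in Python of what the merge amounts to. It mirrors the [M] merge described above (substituting each keyword into "%KW%") and de-duplicates the resulting queries; it is not Scrapebox's actual code, and the file names are placeholders.

```python
# Hypothetical sketch of the [M] merge: substitute every keyword into the
# "%KW%" placeholder of every footprint, then de-duplicate the queries.
with open("GSA social scraping footprints.txt", encoding="utf-8") as f:
    footprints = [line.strip() for line in f if line.strip()]
with open("keywords.txt", encoding="utf-8") as f:
    keywords = [line.strip() for line in f if line.strip()]

# e.g. 'powered by oxwall "%KW%"' + 'YOUR KEYWORD'
#   -> 'powered by oxwall "YOUR KEYWORD"'
queries = {fp.replace("%KW%", kw) for fp in footprints for kw in keywords}

with open("harvest_queries.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(sorted(queries)))
```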
- Open SER and right-click on the project(s) that you want to use them in.
- Modify project >> Import Target URLs >> choose the scraped URL file.
- Sit back and leave SER to be the little mean green spamming machine that she is.
Hope this helps!
Roughly how many do you end up with after removing dups?
Thanks @gooner
Normally I end up with about 100k uniques, depending on whether I'm using niche-related KWs for things like blogs and forums, or more general KWs for things like social networks and articles.
Would you recommend starting off by scraping with all footprints for the engine types I want to use, and then filtering the list down later by comparing submitted to verified after a few scrapes?
@muxmkt - Although I've moved on quite a bit from the way I used to scrape when I originally posted in this thread, that link will still come in handy. Many thanks - bookmarked
@all, do you merge all footprints, like article, blog, and the like? Or is it more effective to use article footprints only?
Your help is appreciated.
Thanks for the help. I have looked at the Footprint Studio before but could not figure it out. What you have shown me is great; I will have no problem there. Thanks again.