Scraping my own set of targets

loopline · July 2014

@Olve1954
You could also use the custom harvester and do any time span you want.

velsytetra · July 2014

@squawk1200 and @cherub, thank you for the video information.

jpvr90 · August 2014

I was scraping an 80k keyword list with SB for a couple of hours today on an extra server I have. I decided to stop the harvester and boom! SB crashed, as it usually does when handling large keyword lists. Now I don't even know which keywords have been processed. Back to square one, what a waste of time.

This is exactly why for scraping I recommend Gscraper over Scrapebox.

/rant

KayKay · August 2014

Hmm, just look in your harvested links folder and check the files. Maybe that helps.

derdor · August 2014

@jpvr90
Exactly.
I can't believe the Scrapebox devs have not been listening to hardcore users for the past 2-3 years.

1- The ability to import proxies in a timely fashion. (Do not tell me to use the custom harvester. There is literally no way the custom harvester can keep up with a million keywords. It crashes 100% of the time.)
2- Show us which keywords have or have not been processed, or better yet just auto-export them.

They are too stubborn to implement these 2 very important features.

Sorry, but unless you have 10k private proxies, scraping with private proxies is gone. It seems the Scrapebox devs are not hardcore scrapers, because they still don't let us import new proxies during scraping.

The good news is that a month ago I opened the update add-ons tab and saw 4-5 add-ons greyed out with 64-bit notes next to them.

I am hoping, and believe, the 64-bit version will have most of the features we have requested.
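
For anyone scripting their own scraping, the second request above (knowing which keywords have been processed) can be approximated with a simple checkpoint file. This is only a sketch under assumptions: it is not a Scrapebox or Gscraper feature, `scrape_one` is a hypothetical placeholder for whatever does the actual scraping, and the function names and the done-file path are invented for illustration.

```python
from pathlib import Path


def remaining_keywords(all_keywords, done_keywords):
    """Return keywords not yet processed, in original order, without duplicates."""
    done = {k.strip().lower() for k in done_keywords if k.strip()}
    seen = set()
    remaining = []
    for kw in all_keywords:
        key = kw.strip().lower()
        if key and key not in done and key not in seen:
            seen.add(key)
            remaining.append(kw.strip())
    return remaining


def run_with_checkpoint(keywords, scrape_one, done_path):
    """Scrape keywords one by one, logging each finished keyword to done_path.

    scrape_one is a hypothetical callable supplied by the caller.
    If the run crashes, the log still lists everything that completed.
    """
    done_file = Path(done_path)
    done = []
    if done_file.exists():
        done = done_file.read_text(encoding="utf-8").splitlines()
    todo = remaining_keywords(keywords, done)
    with done_file.open("a", encoding="utf-8") as log:
        for kw in todo:
            scrape_one(kw)
            log.write(kw + "\n")
            log.flush()  # write immediately so a crash loses at most one keyword
    return todo
```

After a crash, calling `run_with_checkpoint` again with the same keyword list and the same done-file skips everything already logged, so the run resumes instead of going back to square one.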

1linklist · August 2014

Gscraper is the way to go if you really want to get into scraping your own lists.

A quick, easy, less time-intensive way to throw some together, however, is an ahrefs subscription.

icarusVN · October 2014

"I guess I should restate in that I used to use scraping for raw urls, now days I have become adept at "tracing" other peoples work and I just find people who are doing great things and utilize scraping to find and examine what they do and dial things in. I don't need to process raw billions of urls, I just go see what everyone else accomplished with their hard work and "borrow" the process, combine it with what already know, bounce it off people in the know and dial in. "

...This piqued my interest...

It doesn't have to be an either/or situation. I have both programs, and SB's crashing does infuriate me, but they both have their place.

@loopline Some of your videos have been of tremendous benefit. Many thanks.

Don · October 2014

@loopline May I ask what proxies you are using for the scraping? Thanks

bluenun · October 2014

The original poster asked this specific question (which no one seems to have answered):

"Is there a place in SER where a list like this can be inserted? Is doing this a benefit, or is it just as good to let SER find the sites based on the keywords I gave it? Are people out there using Scrapebox?"

I have the same question: do I really need to use ScrapeBox to get target sites? It seems GSA does this already, or have I missed something?

icarusVN · October 2014

You don't need a separate scraper. Many of the more experienced users here do scrape the targets themselves, because a dedicated scraper is more efficient.

It is much more hands-on and time-consuming, though. Loopline has a good YouTube channel with lots of information on how to scrape with Scrapebox. A member here, @DonaldBeck, has a good video series on scraping too.

Do you need to scrape yourself? No, you don't.

loopline · October 2014

@Don
I have used all sorts of proxy providers. Right now I am enjoying proxy rack, as well as back connect proxies from reverse proxies (I have these primarily for other purposes, but they work well enough for scraping), and I have a lot of buyproxies.org proxies as well.

If you set a delay you can use your own IP. I use my own IP on every server I have, with a delay of roughly 100 seconds, and can get a surprising amount of results.

@bluenun
Sven has shown how you can use the "search online for URLs" feature under tools to scrape in SER, but the built-in scraper in the projects is pretty slow, IMHO. With Scrapebox you can get things done faster, and I believe with more flexibility.

Once Scrapebox 2.0 comes out it will have several distinct advantages over SER for scraping. It can be far faster, and being 64-bit lets you use that speed while still working with massive footprint/keyword lists.

I use Scrapebox to scrape for my SER, but I do plan on testing out SER with some very specific footprints, like a 24-hour scrape for specific things. I think having that in SER would be great, and you're only scraping once every 24 hours, so speed is not crucial; accuracy is.

Don · October 2014

@loopline That's interesting, thank you for your reply. If you don't mind, can you explain more about how I could use my own IP to scrape with the delay you mention? I don't want to get big lists, just a decent number of relevant targets. I've tried most proxies and some are good, but I'm excited to try the naked route.

Thanks

icarusVN · October 2014

@Don There is a full tutorial on YouTube. Search scrapebox + loopline.

Don · October 2014

Is there? OK, will check that. Thanks @icarusVN

loopline · October 2014

@Don
I've lost track of what I cover in all the videos, so it's probably there, but basically in Scrapebox 1.x you choose a delay from the delay drop-down in the lower right-hand quadrant. Choose RND (random).

Then under settings at the top go to adjust RND delay range. Set min to something like 50 and max to 60.

Then go to settings and uncheck both use custom harvester and use multi threaded harvester.

Then just load up keywords as normal and scrape Google. It's generally slow enough that you can get a lot done. The only caveat is that the delay only kicks in on pages with results. So if you load in a ton of keywords/footprints that have 0 results, it can still wind up going too fast, although generally this is not an issue.

In 2.0 there is a delay box right in the detailed harvester when it pops up. I just pick a number like 103 seconds and go with it, and it works well.
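
To make the idea concrete for anyone scripting their own scraper, here is a minimal Python sketch of the same approach: one thread, one request at a time, with a random delay between the min and max values mentioned above. It is an illustration only, not how Scrapebox works internally. `fetch` is a hypothetical placeholder for whatever performs the actual request, and `rnd_delay` and `scrape_sequentially` are names made up for this example.

```python
import random
import time


def rnd_delay(min_s=50, max_s=60):
    """Pick a delay in seconds uniformly between min_s and max_s."""
    if min_s > max_s:
        raise ValueError("min_s must not exceed max_s")
    return random.uniform(min_s, max_s)


def scrape_sequentially(keywords, fetch, min_s=50, max_s=60, sleep=time.sleep):
    """Run fetch(keyword) one keyword at a time with a random pause in between.

    fetch is a hypothetical callable supplied by the caller.
    sleep is injectable so the loop can be tried without really waiting.
    """
    keywords = list(keywords)
    results = {}
    for i, kw in enumerate(keywords):
        results[kw] = fetch(kw)
        # Pause after every request except the last one.
        if i < len(keywords) - 1:
            sleep(rnd_delay(min_s, max_s))
    return results
```

Unlike the 1.x behaviour described above, this sketch pauses after every keyword, including ones that return 0 results, so a list full of empty footprints cannot speed the loop up.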

icarusVN · October 2014

@loopline Will the new 64-bit Scrapebox be a paid update?

Don · October 2014

@icarusVN Scrapebox 2 is a free upgrade.
@loopline Thanks. I ended up watching a lot of your videos last night. Very helpful. I bought SB years ago but never knew the power of it, and I guess a lot of others are like that. Of course there's SB 2.0 soon. Can't wait for that...

loopline · October 2014

@icarusVN
The new 64-bit and 32-bit versions of Scrapebox will be free updates.

@Don
Yes, 2.0 will bring even more power. It has some really cool features and it's not even done, so I'm sure it will bring more. Plus it lays the framework for future updates, so they can build in more things. Before, it was often messy to add new things; now it's clean and laid out with the future in mind.

bluenun · November 2014

Can't wait to see what the new version will do.

JackLien · November 2014

@satyr85 Hi, after removing duplicates, how many links do you get per day?
