Scraping Directories.

February 2019

Is there a way I can scrape directories, yellow pages, etc. and get a list of companies with their information and websites?

February 2019

Amazing question. I've been thinking about this a lot.

I would think you could create an engine that collects any information from a page or site that you can classify as scrapable based on set criteria, and outputs the results to a file.

I have NEVER made any new engines, so I'm just interested in finding out, too.

February 2019

Is this for SER or GSA Website Contact?

February 2019

Yes. The idea is that I want to extract websites from certain categories so that I can then use those sites in GSA Website Contact. It is a way to get more targeted information.

February 2019

OK, because you posted in the wrong category then. Right now you can only add that as a custom search engine by editing the se.dat file.

February 2019

Thank you.

February 2019

OK, so to clarify, Sven, do you mean that all I have to do to add a new directory is EITHER use %search% for the KWs that are in the project, OR use specific fixed search terms in the engine itself, OR put the URL of the directory's sub-category right in the GSA-WC file, in the same format as the entries already there out of the box?

Then select THAT NEW SE ONLY, and Website Contact is already able to parse even those types of data sources (not just Yoogle and Gahoo) without any further modifications?

So there is no need to use SER for this task at ALL. But that would also have been doable, though unnecessarily more complex, no?

February 2019

I used a fixed URL in the SE file, with "%search%" in the sub-category I want.
I set %search% to be the KWs.

Let's see if that works.
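
To illustrate the idea of a fixed URL with a "%search%" placeholder, here is a minimal Python sketch of the substitution. It is not how GSA Website Contact works internally; the directory URL is a made-up example, the function name is hypothetical, and URL-encoding the keyword is my own assumption.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "%search%"  # the placeholder used in the search engine entry


def expand_search_url(template, keywords):
    """Return one URL per keyword by filling the %search% placeholder.

    Hypothetical helper for illustration only.
    """
    if PLACEHOLDER not in template:
        raise ValueError("template has no %search% placeholder")
    urls = []
    for kw in keywords:
        kw = kw.strip()
        if not kw:
            continue  # skip blank keywords
        # Assumption: the directory expects URL-encoded query text.
        urls.append(template.replace(PLACEHOLDER, quote_plus(kw)))
    return urls


if __name__ == "__main__":
    # Made-up directory URL; replace with the real sub-category URL.
    template = "https://directory.example.com/plumbers?q=%search%"
    for url in expand_search_url(template, ["emergency plumber", "boiler repair"]):
        print(url)
```

Each project keyword ends up as its own search URL against that one directory sub-category.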

Hmm... Lots of directories have business name and phone only.

Scraping the name would be cool if you need a list of businesses and phone numbers.

But you could also take those business names (with a SER engine you create, I presume, but HOW?!) and put them into Website Contact as a long list of KWs, and you should get the websites associated with those brands returned as results, if they exist.
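
For the "business names as a long list of KWs" step, something like the rough Python sketch below could clean up a scraped name list before pasting it in. It assumes you already have the names from somewhere; the sample names and the function name are made up, and it does not do the scraping itself.

```python
def names_to_keywords(names):
    """Normalise whitespace and drop blanks and case-insensitive duplicates.

    Hypothetical helper; keeps the first spelling seen for each name.
    """
    seen = set()
    keywords = []
    for name in names:
        cleaned = " ".join(name.split())  # collapse runs of whitespace
        key = cleaned.casefold()          # compare names case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            keywords.append(cleaned)
    return keywords


if __name__ == "__main__":
    # Made-up sample data standing in for scraped directory entries.
    scraped = ["Acme Plumbing", "  acme  plumbing ", "", "Bolt Electric"]
    # One keyword per line, ready to paste into a keyword list.
    print("\n".join(names_to_keywords(scraped)))
```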

February 2019

Here is a bunch of footprints which you can use for directories.
These are pulled from my list of verified directory links, which I extracted using Xrumer's link pattern analysis, plus some I had from long ago.

Some of these might be article directories; hopefully you can extract the ones that could work for you.

https://pastebin.com/N0jqCe4Q

February 2019

The following lot is extracted from the above list, but all of these contain the word "directory" in the footprint, so this might be a better place to start.

https://pastebin.com/4xqAQ9FV
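
If you want to redo that kind of filtering yourself on a footprint list, a minimal Python sketch could look like this. The sample footprints are invented for illustration (they are not taken from the pastes above), and the function name is hypothetical.

```python
def filter_footprints(lines, word="directory"):
    """Keep only non-empty footprints that contain `word`, ignoring case."""
    needle = word.casefold()
    kept = []
    for line in lines:
        footprint = line.strip()
        if footprint and needle in footprint.casefold():
            kept.append(footprint)
    return kept


if __name__ == "__main__":
    # Invented sample footprints, one per line as in a plain text list.
    sample = [
        '"Add URL" "Web Directory"',
        '"submit article"',
        "",
        '"powered by" "link directory script"',
    ]
    for footprint in filter_footprints(sample):
        print(footprint)
```

Changing the `word` argument lets you pull out other subsets, for example "article" if you want the article directories instead.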
