Is there a way I can scrape directories, yellow pages, etc and get a listing of company information with their website?
Comments
Deeeeeeee the Americas
Amazing question. I've been thinking about this a lot.
I would think you could create an engine that collects any information from a page or site that you can classify as scrapable based on set criteria, and outputs the results to a file.
I have NEVER made any new engines, so...I'm just interested in finding out, too.
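For what it's worth, the "collect by criteria and output to a file" idea can be sketched outside of any GSA tool. The snippet below is only an illustration in plain Python, assuming a directory page links out to each company's own website: it keeps anchors whose host differs from the directory's host and writes them out as CSV. The names `ListingLinkParser`, `extract_company_sites`, `to_csv` and the `directory.example` host are made up for this example, and it is not how SER or Website Contact engines are defined.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urlparse


class ListingLinkParser(HTMLParser):
    """Collect (anchor text, href) pairs for links that leave the directory's host."""

    def __init__(self, directory_host):
        super().__init__()
        self.directory_host = directory_host.lower()
        self.results = []
        self._href = None   # href of the external link currently open, if any
        self._text = []     # text fragments seen inside that link

    def _is_external(self, href):
        host = urlparse(href).netloc.lower()
        if not host:
            return False    # relative link, so it stays on the directory
        own = self.directory_host
        return not (host == own or host.endswith("." + own))

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self._is_external(href):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_company_sites(html, directory_host):
    """Return a list of (name, website) tuples found in one directory page."""
    parser = ListingLinkParser(directory_host)
    parser.feed(html)
    parser.close()
    return parser.results


def to_csv(rows):
    """Serialise the (name, website) rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "website"])
    writer.writerows(rows)
    return buf.getvalue()
```

Real directories often hide the outbound link behind a redirect or a details page, so the "external host" criterion would need adjusting per site.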
Sven www.GSA-Online.de
Is this for SER or GSA Website Contact?
elindo586
Yes... the idea is that I want to extract websites from certain categories so that I can then use those sites in GSA Website Contact. It is a way to get more targeted information.
Sven www.GSA-Online.de
OK, because you posted in the wrong category then. Right now you can only add that as a custom search engine by editing the se.dat file.
elindo586
Thank you..
Deeeeeeee the Americas
edited February 2019
OK, so to clarify, Sven: do you mean that all I have to do to add a new directory is EITHER use %search% for the KWs that are in the project, OR use specific fixed search terms in the engine itself, OR put the URL of the directory's sub-category right in the GSA-WC file, in the same format as the entries already there out of the box?
Then I select THAT NEW SE ONLY, and Website Contact is already able to parse even those types of data sources (not just Yoogle and Gahoo) without any further modifications?
So no need to use SER for this task at ALL. But that would also have been doable, though unnecessarily more complex, no?
Deeeeeeee the Americas
edited February 2019
I used a fixed URL, using "%search%" in the sub-cat I want, in the SE file.
I set %search% as the KWs.
Let's see if that works.
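To make the idea concrete, here is a rough sketch of what a `%search%` placeholder in a fixed URL amounts to: each keyword is URL-encoded and dropped into the template. This is plain Python for illustration only. How Website Contact itself substitutes and encodes the keyword, and what the se.dat entry has to look like, is not shown in this thread, so treat the `build_search_urls` helper and the `directory.example` URL as assumptions.

```python
from urllib.parse import quote_plus


def build_search_urls(template, keywords, placeholder="%search%"):
    """Return one URL per non-empty keyword, with the placeholder filled in."""
    if placeholder not in template:
        raise ValueError("template has no %r placeholder" % placeholder)
    urls = []
    for kw in keywords:
        kw = kw.strip()
        if kw:  # skip blank keyword lines
            # quote_plus encodes spaces as '+' and escapes reserved characters
            urls.append(template.replace(placeholder, quote_plus(kw)))
    return urls
```

For example, `build_search_urls("https://directory.example/plumbers?q=%search%", ["emergency plumber"])` gives a single URL ending in `?q=emergency+plumber`.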
Hmm... Lots of directories have business name and phone only.
Scraping the name would be cool if you need a list of businesses and phone numbers.
But you can also then take those business names (with a SER engine you create, I presume, but HOW?!) and put them into Website Contact as a long list of KWs, and you should get the websites associated with those brands returned as results, if they exist.
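The "names in, keywords out" step does not need an engine at all if the names are already in a text file. A minimal sketch, assuming each scraped line is `name<TAB>phone` (that layout is an assumption, as is the helper name `names_to_keywords`): it keeps only the name, collapses stray whitespace and drops case-insensitive duplicates, so the result can be pasted in as one keyword per line.

```python
def names_to_keywords(rows):
    """Turn 'name<TAB>phone' lines into a de-duplicated list of business names."""
    seen = set()
    keywords = []
    for row in rows:
        name = row.split("\t", 1)[0]      # keep the part before the first tab
        name = " ".join(name.split())     # collapse repeated or edge whitespace
        key = name.casefold()             # compare names case-insensitively
        if name and key not in seen:
            seen.add(key)
            keywords.append(name)
    return keywords


if __name__ == "__main__":
    sample = ["Acme Plumbing\t555-0100", "Bolt Electric\t555-0101"]
    print("\n".join(names_to_keywords(sample)))
```

Order is preserved, so the keyword list follows the order of the directory listing.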
Here is a bunch of footprints which you can use for directories. These are pulled from my list of verified directory links, which I extracted using Xrumer's link pattern analysis, plus some I had from long ago.
Some of these might be article directories; hopefully you can extract the ones that could work for you.
The following lot is extracted from the above list, but these all contain the word "directory" in the footprint, so this might be a better place to start.
https://pastebin.com/N0jqCe4Q
https://pastebin.com/4xqAQ9FV
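If you want to narrow a footprint list down yourself, the same kind of filter (keep only footprints containing a given word) is a few lines of Python. This is just a sketch: `filter_footprints` is a made-up helper, and it assumes the list is one footprint per line.

```python
def filter_footprints(footprints, word="directory"):
    """Keep footprints containing `word` (case-insensitive), without duplicates."""
    needle = word.casefold()
    seen = set()
    kept = []
    for line in footprints:
        fp = line.strip()
        if fp and needle in fp.casefold() and fp not in seen:
            seen.add(fp)
            kept.append(fp)
    return kept
```

Passing a different `word` (for example "article") would split out the article directories instead.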