
Scraping Directories.

Is there a way to scrape directories, yellow pages, etc. and get a listing of company information along with each company's website?

Comments

  • Deeeeeeee (the Americas)
    Amazing question. I've been thinking about this a lot.

    I would think you can create an engine that collects any information from a page or site that you can classify as scrapable based on set criteria, and outputs the results to a file.

    I have NEVER made any new engines, so... I'm just interested in finding out, too. :p
  • Sven (www.GSA-Online.de)
    Is this for SER or GSA Website Contact?
  • Yes, the idea is that I want to extract websites from certain categories so that I can then use those sites in GSA Website Contact. It is a way to get more targeted information.
  • Sven (www.GSA-Online.de)
    OK, because then you posted in the wrong category. Right now you can only add that as a custom search engine by editing the se.dat file.
  • Thank you.
  • Deeeeeeee (the Americas)
    edited February 2019
    OK, so to clarify, Sven: do you mean that all I have to do to add a new directory is EITHER use %search% for the KWs that are in the project, OR use specific fixed search terms in the engine itself, OR put the URL of the directory's sub-category right in the GSA-WC file, in the same format as the engines already there out of the box?

    Then I select THAT NEW SE ONLY, and Website Contact is already able to parse even those types of data sources (not just Yoogle and Gahoo) without any further modifications?!

    So there is no need to use SER for this task at ALL. But that would also have been doable, though unnecessarily more complex, no?
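
For anyone following along, here is a rough Python sketch of what a %search% placeholder in an engine URL amounts to: one URL per project keyword, or the URL unchanged if it is fixed. This is only an illustration of the idea. It is not the se.dat format and not how Website Contact works internally; the function name and the directory.example URL are made up.

```python
from urllib.parse import quote_plus


def expand_search_urls(template, keywords, placeholder="%search%"):
    """Return one URL per keyword, or the template itself if it is a fixed URL.

    Hypothetical helper: it only illustrates the placeholder idea from the
    thread and does not reproduce any GSA file format.
    """
    if placeholder not in template:
        # Fixed sub-category URL: nothing to substitute.
        return [template]
    urls = []
    for kw in keywords:
        kw = kw.strip()
        if kw:  # skip blank keyword lines
            urls.append(template.replace(placeholder, quote_plus(kw)))
    return urls


# Made-up directory URL used purely as an example.
TEMPLATE = "https://directory.example/plumbers?q=%search%"

if __name__ == "__main__":
    for url in expand_search_urls(TEMPLATE, ["emergency plumber", "boiler repair"]):
        print(url)
```

Keywords are URL-encoded with `quote_plus`, so multi-word KWs become `emergency+plumber` style query strings.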

  • Deeeeeeee (the Americas)
    edited February 2019

    1. I used a fixed URL in the SE file, with "%search%" in the sub-cat I want.
    2. I set %search% as the KWs.
    Let's see if that works.

    Hmm... Lots of directories list the business name and phone number only.

    Scraping the name would be cool if you need a list of businesses and phone numbers.

    But you can also take those business names (with a SER engine you create, I presume, but HOW?!) and put them into Website Contact as a long list of KWs, and you should get the websites associated with those brands returned as results, if they exist.
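
A minimal sketch of that name-to-keyword step, done outside SER with plain Python. It assumes a hypothetical directory page where each business sits in `<span class="name">` followed by `<span class="phone">`; real directories use their own markup, so the class names here are placeholders to adjust. The result is a de-duplicated list of business names that could be pasted in as KWs.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect (name, phone) pairs from a hypothetical directory page.

    Assumes <span class="name"> comes before <span class="phone"> for each
    listing; these class names are invented for the example.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (name, phone) tuples
        self._field = None    # field we are currently inside, if any
        self._current = {}    # fields gathered for the listing in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "span" and cls in ("name", "phone"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            text = self._current.get(self._field, "") + data.strip()
            self._current[self._field] = text

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            finished, self._field = self._field, None
            if finished == "phone":  # phone closes the listing
                self.rows.append((self._current.get("name", ""),
                                  self._current.get("phone", "")))
                self._current = {}


def names_to_keywords(html):
    """Return unique business names (case-insensitive), in page order."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    seen, keywords = set(), []
    for name, _phone in parser.rows:
        if name and name.lower() not in seen:
            seen.add(name.lower())
            keywords.append(name)
    return keywords


# Made-up sample markup; not taken from any real directory.
SAMPLE = """
<div class="listing"><span class="name">Acme Plumbing</span><span class="phone">555-0100</span></div>
<div class="listing"><span class="name">Bolt Electric</span><span class="phone">555-0101</span></div>
<div class="listing"><span class="name">acme plumbing</span><span class="phone">555-0100</span></div>
"""

if __name__ == "__main__":
    print("\n".join(names_to_keywords(SAMPLE)))
```

The phone numbers stay available in `parser.rows` if you want the name-and-phone list as well.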
  • royalmice (Website: https://asiavirtualsolutions.com | Skype: asiavirtualsolutions)
    edited February 2019
    Here is a bunch of footprints which you can use for directories.
    These are pulled from my list of verified directory links, which I extracted using Xrumer's link pattern analysis, plus some I had from long ago.

    Some of these might be article directories; hopefully you can extract the ones that could work for you.

    https://pastebin.com/N0jqCe4Q
  • royalmice (Website: https://asiavirtualsolutions.com | Skype: asiavirtualsolutions)
    edited February 2019
    The following lot is extracted from the above list, but these all contain the word "directory" in the footprint, so this might be a better place to start:

    https://pastebin.com/4xqAQ9FV
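
If you want to redo that kind of filtering on your own footprint list, a small sketch along these lines would do it. It assumes one footprint per line; the sample footprints below are invented for illustration and are not taken from the pastebin lists.

```python
def filter_footprints(lines, word="directory"):
    """Keep unique, non-empty footprints containing `word` (case-insensitive)."""
    needle = word.lower()
    seen = set()
    kept = []
    for line in lines:
        footprint = line.strip()
        if footprint and needle in footprint.lower() and footprint not in seen:
            seen.add(footprint)
            kept.append(footprint)
    return kept


# Invented sample footprints, one per line as they might appear in a text file.
SAMPLE = [
    '"Submit a site" "web directory"',
    '"Powered by ExampleArticles"',
    '"Business Directory" "add your listing"',
    '',
    '"Submit a site" "web directory"',
]

if __name__ == "__main__":
    for footprint in filter_footprints(SAMPLE):
        print(footprint)
```

To run it against a saved list, read the file with `open(path, encoding="utf-8")` and pass the file object in place of `SAMPLE`.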