Amazing question. I've been thinking about this a lot.
I would think you could create an engine that collects any information from a page, site, or whatever else you can classify as scrapable based on set criteria, and outputs the results to a file.
I have NEVER made a new engine, so I'm just interested in finding out, too.
Yes, the idea is to extract websites from certain categories so that I can then use those sites in GSA Website Contact. It's a way to get more targeted information.
OK, so to clarify, Sven: do you mean that to add a new directory, all I have to do is EITHER use %search% for the keywords (KWs) in the project, OR use specific fixed search terms in the engine itself, OR put the URL of the directory's sub-category right in the GSA-WC file, in the same format as the ones already there out of the box?
Then I select ONLY that new search engine, and Website Contact can already parse those kinds of data sources, not just Yoogle and Gahoo, without any further modifications?
So there's no need to use SER for this task at ALL. That would also have been doable, though unnecessarily complex, wouldn't it?
In the SE file, I used a fixed URL for the sub-category I want, with "%search%" in it.
I set %search% to be the KWs.
Let's see if that works.
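To make the placeholder idea concrete, here is a minimal Python sketch of what substituting keywords into a fixed sub-category URL amounts to. It is not GSA's engine file format or its internal logic: the directory URL and the `build_search_urls` helper are hypothetical, and URL-encoding each keyword with `quote_plus` is an assumption about how keywords get escaped.

```python
from urllib.parse import quote_plus

# Hypothetical sub-category URL; %search% marks where each keyword goes.
TEMPLATE = "https://directory.example.com/category/plumbers?q=%search%"


def build_search_urls(template: str, keywords: list[str]) -> list[str]:
    """Return one URL per keyword, with the keyword URL-encoded into %search%."""
    urls = []
    for kw in keywords:
        kw = kw.strip()
        if not kw:
            continue  # skip blank lines from a keyword list
        urls.append(template.replace("%search%", quote_plus(kw)))
    return urls


if __name__ == "__main__":
    for url in build_search_urls(TEMPLATE, ["emergency plumber", "drain cleaning"]):
        print(url)
```

Encoding matters because a keyword containing spaces or `&` would otherwise break the query string; `quote_plus` turns spaces into `+` and `&` into `%26`.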
Hmm... lots of directories list only the business name and phone number.
Scraping the names would be useful if you need a list of businesses and phone numbers.
But you could also take those business names (scraped with a SER engine you create, I presume, but HOW?!) and feed them into Website Contact as a long list of KWs. You should then get back the websites associated with those businesses, if they exist.
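How to do the scraping step is the open question here. Assuming you already have the scraped names in a CSV file with `name` and `phone` columns (a hypothetical layout), a minimal Python sketch of turning them into a deduplicated keyword list, one name per line, could look like this. It uses only the standard library, not any GSA API, and `names_to_keywords` and `write_keyword_file` are illustrative helper names.

```python
import csv
import io


def names_to_keywords(csv_text: str, name_column: str = "name") -> list[str]:
    """Extract unique, non-empty business names from scraped CSV text,
    keeping the first spelling seen and ignoring case when deduplicating."""
    seen = set()
    keywords = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(name_column) or "").strip()
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            keywords.append(name)
    return keywords


def write_keyword_file(keywords: list[str], path: str) -> None:
    """Write one keyword per line, a common format for keyword lists."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(kw + "\n" for kw in keywords)


if __name__ == "__main__":
    scraped = (
        "name,phone\n"
        "Acme Plumbing,555-0100\n"
        "Best Roofing,555-0101\n"
        "acme plumbing,555-0100\n"
    )
    print(names_to_keywords(scraped))  # ['Acme Plumbing', 'Best Roofing']
```

To produce the file for import, call something like `write_keyword_file(names_to_keywords(open("scraped.csv", encoding="utf-8").read()), "keywords.txt")`, where both file names are placeholders.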
Here is a bunch of footprints you can use for directories. They are pulled from my list of verified directory links, which I extracted using Xrumer's link pattern analysis, plus some I had from long ago.
Some of these might be article directories; hopefully you can pick out the ones that work for you.
The following lot is extracted from the above list, but these all contain the word "directory" in the footprint, so they might be a better place to start:
https://pastebin.com/N0jqCe4Q
https://pastebin.com/4xqAQ9FV
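If you download the full list and want to make that kind of cut yourself, for example with a different word, here is a minimal Python sketch. The sample footprints are made up for illustration and are not taken from the pastebin lists. A plain case-insensitive substring match is an assumption about how the second list was derived.

```python
def directory_footprints(footprints: list[str], word: str = "directory") -> list[str]:
    """Keep footprints that mention `word` (case-insensitive),
    dropping blank lines and exact duplicates while preserving order."""
    needle = word.lower()
    seen = set()
    result = []
    for fp in footprints:
        fp = fp.strip()
        if fp and needle in fp.lower() and fp not in seen:
            seen.add(fp)
            result.append(fp)
    return result


if __name__ == "__main__":
    # Hypothetical sample footprints, for illustration only.
    sample = [
        '"Add your site" Directory',
        '"submit article"',
        'inurl:directory "add url"',
        '"Add your site" Directory',
    ]
    print(directory_footprints(sample))
```

With a downloaded list saved as a text file, `directory_footprints(open("footprints.txt", encoding="utf-8").read().splitlines())` would apply the same filter; the file name is a placeholder.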