Keyword Generator - Made in Java - For Scraping

magicallymagically http://i.imgur.com/Ban0Uo4.png
edited April 2015 in Buy / Sell / Trade
Hello everyone :)

Well, I was kind of bored :P

So I made a small tool that is able to generate lists of UNIQUE keywords - something that can be used for ScrapeBox and GScraper...

Nothing Fancy - Just plain and very simple.

It's still kind of a prototype - more development is planned (more useful tools will be added over time)



image

image

What it does:

It's a Java program (can run on multiple platforms) - it will read a large text file, like a book.

Then it will build a UNIQUE keyword list from that book/source file.

No dupes - just unique keywords that can be used for scraping or other tasks...
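
For the technically curious: the whole operation is basically "read, split, deduplicate". A minimal Java sketch of that idea could look like the following (the file names and the tokenising rule are my own assumptions, not the tool's actual source):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class KeywordDedupe {
    public static void main(String[] args) throws IOException {
        String source = "book.txt";        // assumed input file
        String target = "keywords.txt";    // assumed output file

        Pattern nonLetters = Pattern.compile("[^\\p{L}]+");
        Set<String> unique = new TreeSet<>();              // sorted, no duplicates

        for (String line : Files.readAllLines(Paths.get(source), StandardCharsets.UTF_8)) {
            for (String word : nonLetters.split(line.toLowerCase())) {
                if (!word.isEmpty()) {
                    unique.add(word);                      // the Set silently drops dupes
                }
            }
        }
        Files.write(Paths.get(target), unique, StandardCharsets.UTF_8);
        System.out.println("Unique keywords: " + unique.size());
    }
}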

image


image


image

image

As I did spend a few hours making it, a small donation of $5 for each purchase would be hugely appreciated.

If you are interested in this small tool (be prepared to make a small donation and receive it soonish), feel free to send me a PM.

I expect the tool to be ready for final launch in about a week from now...

Minor adjustments need to be made before the final release ;)

Normally I would never approach a forum to sell something - consider this an exception, an offer for those who are interested.

Comments are welcome of course.

Comments

  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    - Added execution time for the entire operation:

    image

    In this case it took 42 seconds to generate 14,098 unique keywords, based on a book containing 361,961 words.

    That is pretty damn fast, for the record ;)
  • Good job! Keep it up mate! :)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @young_gooner
    Many thanks ;) More is up my sleeve - some additional tools to handle the trivial work and make things easier.

    I was really tired of looking for keywords - so I made this little thingy...

    Surely there will be some "slow" keywords in the generated lists - however, tweaks for handling those can be implemented later.
  • Kaine thebestindexer.com
    edited April 2015
    Nice. Would it be possible to grab a website's keywords (given its URL)? :)

    That could be good for competitor research.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine

    Indeed a good suggestion - it will be added in future updates.
    - Add a list of site URLs
    - Grab keywords from the targets
    - Sort and build a final list based on the results.
  • That could be handy. Does it only work with the English language?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @delta_squad
    That was indeed a good question!

    Right now the sorting handles English words - however, you are absolutely right in your observation.

    Several different sorting algorithms must be implemented to handle various languages - the user will simply select which sorting to run on the target file.

    Not so complicated to implement - just takes some time to add support for various languages.

    And yes, I will add support for this as well;)
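
    For example, one simple way to let the user pick a language-specific sort - just my guess at how it could be wired, not necessarily how this tool will do it - is to hand the chosen Locale to a java.text.Collator:

    import java.text.Collator;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;

    public class LocaleSortDemo {
        // Sort the keyword list using the rules of the user's chosen language
        static void sortForLocale(List<String> keywords, Locale locale) {
            keywords.sort(Collator.getInstance(locale));
        }

        public static void main(String[] args) {
            List<String> words = new ArrayList<>(Arrays.asList("zebra", "äpfel", "apple", "über"));
            sortForLocale(words, Locale.GERMAN);   // German collation handles ä and ü properly
            System.out.println(words);
        }
    }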


  • Awesome! Right now I'm using furykyle's keyword lists for scraping and I'm not going to run out of keywords any time soon, but with this tool you could potentially generate huge amounts of keywords once support for other languages is added. :)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @delta_squad
    You are absolutely correct here - I also used furykyle's keyword lists until recently, as they cover other languages.

    However, it would be a nice addition if we were able to generate our very own keywords 'on the fly' whenever we want to.

    On top of that, it reduces the number of people using the very same keywords in their scrapes.

    This tool enables us to do exactly that kind of task ;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    @Kaine

    Well, I actually built another prototype (even though my time is limited during Easter)...
    This little prototype demonstrates most of your suggestion:

    This demo scans 3 URLs and extracts their keywords.
    Results are printed to the screen - just for testing purposes.


    image

    Bear in mind that this demo just runs in a console prompt to show the idea...
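
    Roughly, a console demo of this kind of meta-keyword extraction could look like the sketch below (I'm using the jsoup library and example URLs here purely for illustration - the actual prototype may well do it differently):

    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class MetaKeywordDemo {
        public static void main(String[] args) {
            // Hypothetical target pages - replace with your own URL list
            List<String> targets = Arrays.asList(
                    "http://example.com/", "http://example.org/", "http://example.net/");

            Set<String> keywords = new LinkedHashSet<>();   // keeps order, drops duplicates
            for (String url : targets) {
                try {
                    Document doc = Jsoup.connect(url).timeout(10000).get();
                    Element meta = doc.selectFirst("meta[name=keywords]");
                    if (meta != null) {
                        for (String kw : meta.attr("content").split(",")) {
                            if (!kw.trim().isEmpty()) keywords.add(kw.trim().toLowerCase());
                        }
                    }
                } catch (Exception e) {
                    System.out.println("Skipping " + url + ": " + e.getMessage());
                }
            }
            keywords.forEach(System.out::println);
        }
    }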

    I think you can imagine the rest of the story
     
    - Combining those results and then sorting them...
    - Add support for proxies
    - Enhancements to run multi-threaded
    - Etc...

    Could be useful for some to include such a feature in this tool ;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Just some new updates on the topic...

    - Changed the graphical interface a little
    - Prepared support for various sorting algorithms (via user selection)
    - Added tab feature (additional tools will be added)
    - Prototype of the URL Keyword Extractor prepared (will be added in Tab 2)

    Upcoming Tabs:

    Clean Scrapings:
    - Remove .pdf, .xml, .mp3, .chm, .ppt and so on...
    - Ensure unique URLs for GSA

    EDU/GOV Sorter:
    - Sort all URLs - keep only .edu and .gov
    - Keep unique URLs only
    - Remove unneeded extensions like .xml, .pdf, etc...

    image
  • Kaine thebestindexer.com
    I think you can make a good tool with that ;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    Many thanks for your kind words :)
    Work is in full progress, and I will update this thread from time to time.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Additional features implemented - bear in mind it's still not complete...

    Features Added:
    - Sort scrapings from ScrapeBox/GScraper
    - Remove all unneeded results like .xml, .pdf, .mp3, .swf and more...
    - Keep unique URLs only (remove duplicate URLs - not domains)
    - Added a switch to change the sorting algorithm - Mode Normal or Mode EDU/GOV (rough sketch below)
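
    A rough sketch of how such a cleaning pass with a mode switch might be structured (the garbage list and the mode handling below are assumptions based on the feature list, not the tool's actual source):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class ScrapingCleaner {
        // File extensions treated as "garbage" - assumed list, extend as needed
        private static final String[] GARBAGE = {".xml", ".pdf", ".mp3", ".swf", ".chm", ".ppt"};

        static Set<String> clean(Iterable<String> urls, boolean eduGovOnly) {
            Set<String> result = new LinkedHashSet<>();    // dedupes full URLs, not domains
            for (String raw : urls) {
                String url = raw.trim();
                String lower = url.toLowerCase();
                if (lower.isEmpty() || hasGarbageExtension(lower)) continue;
                if (eduGovOnly && !(lower.contains(".edu/") || lower.endsWith(".edu")
                        || lower.contains(".gov/") || lower.endsWith(".gov"))) continue;
                result.add(url);
            }
            return result;
        }

        private static boolean hasGarbageExtension(String url) {
            for (String ext : GARBAGE) {
                if (url.endsWith(ext)) return true;
            }
            return false;
        }

        public static void main(String[] args) throws IOException {
            // false = "Normal Mode", true = "EDU/GOV Mode" (assumed mapping)
            Set<String> cleaned = clean(
                    Files.readAllLines(Paths.get("scrapings.txt"), StandardCharsets.UTF_8), false);
            Files.write(Paths.get("cleaned.txt"), cleaned, StandardCharsets.UTF_8);
        }
    }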


    The demo below uses 'Mode Normal', as the actual switch is not complete yet...

    image


    image

    image

    image

    image

    Upcoming Work:

    - Final implementation of the Keyword Scraper (Tab 2)
    - Final implementation of the switch for Normal versus EDU/GOV mode
    - Adjustments to the GUI
    - Cleanup and testing

    Later releases will include:
    - Support for different languages (Keyword Generator - algorithms)
    - Enhancements of existing functionality
    - Add-ons

    Stay tuned for more information;)

  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update:

    - The switch for sorting scrapings in either "Normal Mode" or "EDU/GOV Mode" is fully implemented.
    In both cases, all garbage is removed 'on the fly', leaving just a list of UNIQUE URLs as a result.

    *Garbage = .xml, .pdf, .mp3, .swf and more...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    URL Keyword Extractor Update:

    - Enhanced the code and implemented support for 30 threads (default)

    image

    Note: Don't mind the messy output in the picture - it will be sorted out in the graphical part.
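
    As an illustration, running the per-URL extraction on a fixed pool of 30 worker threads could be set up roughly like this (fetchMetaKeywords() is a hypothetical placeholder for the single-URL logic, not the tool's real method):

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ThreadedKeywordScraper {
        public static void main(String[] args) throws InterruptedException {
            List<String> targets = Arrays.asList("http://example.com/", "http://example.org/");
            Set<String> keywords = ConcurrentHashMap.newKeySet();    // thread-safe unique set

            ExecutorService pool = Executors.newFixedThreadPool(30);  // 30 workers by default
            for (String url : targets) {
                pool.submit(() -> keywords.addAll(fetchMetaKeywords(url)));
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);

            System.out.println("Finished all threads");
            System.out.println("unique words : " + keywords.size());
        }

        // Placeholder: the single-URL meta keyword extraction would go here
        static Set<String> fetchMetaKeywords(String url) {
            return Collections.emptySet();
        }
    }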


    In progress:
    - Implementation of the graphical part of this feature (GUI)
    - Cleaning up
  • Kaine thebestindexer.com
    edited April 2015
    Does it take the meta keywords or the keywords in the page?

    At this point, maybe add an article web scraper directly too...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    It currently takes the meta keywords from the target-page.

    - Support for both could be implemented later too.

    And an article scraper seems interesting to add too ;)
    *Your suggestion has been added to the to-do list.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update - Early implementation of the URL Scraper Function in the Graphical Interface!

    image

    Tasks completed:

    image

    Features:
    - Uses 30 threads
    - Scrapes meta keywords from target URLs (list input)
    - Generates a list of UNIQUE keywords based on the results
    - Shows a log of target URL visits in real time


    *That's it for now - taking a small break to clear my mind so I can look at it again with fresh eyes and do some cleaning/adjustments.

    Comments are still very welcome of course - feel free to jump in;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    PS...

    I just took a larger sample to see if it actually works - 1,000 random URLs:

    Finished all threads
    unique words : 2734
    total words : 7736
    Destination: C:\SEO2015\WillItWork.txt
    Closing Buffered Writer and finishing...

    image

    Speed was actually fast - exactly as expected;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Update:

    - Adjusted formatting of the log message

    image

    image

    *Note: Maybe add a function to trim URLs to root - depending on the job...
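
    If a trim-to-root option gets added, a minimal version could be as simple as this (just a sketch using java.net.URI, not a committed design):

    import java.net.URI;
    import java.net.URISyntaxException;

    public class TrimToRoot {
        // Reduce a full URL to its root, e.g. "http://example.com/a/b?x=1" -> "http://example.com/"
        static String trimToRoot(String url) throws URISyntaxException {
            URI u = new URI(url.trim());
            return u.getScheme() + "://" + u.getHost() + "/";
        }

        public static void main(String[] args) throws URISyntaxException {
            System.out.println(trimToRoot("http://example.com/blog/post?id=7"));
        }
    }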
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Here is a sample of the speed - using standard footprints and some keywords generated with this Tool:

    The test was done using my home connection and a laptop.

    image


    I'm quite sure some of you hardcore scrapers are able to pull even higher speeds with some quality footprints... Unfortunately, I'm not that good at making footprints :P

    Actually, the speed was still increasing at the moment of this comment: 34,723 URLs per minute and still climbing...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    LOL - Better add the proof for you guys to see....
    image



    *Edit

    A few min later:

    image

    *Edit

    Last one - should settle it:P

    image
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Some might wonder - how about the performance in GSA SER?

    Well let's take a look:

    Verified preview:

    image image

    Performance (after cleaning the raw scrapings with this Tool)

    image

    Randomly picked message from the GSA SER log:

    image

    And some more verified URLs - just to show the scraping went well (final output):

    image

    Conclusion:

    It's possible to use the new tool to generate decent keywords for scraping and to clean lists.

    Feel free to comment;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    - Work in progress:

    1. In the coming days, I intend to develop a simple Article Scraper that will be added to this tool-box.

    2. Additional enhancements to existing code, and cleanup.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update

    - Basic implementation of the URL Extractor (a small part of the Article Scraper)
    - Demo only (console only - not in the GUI yet)

    Extraction of all URLs on a given target:
    image
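
    The general idea of pulling every link off a page can be sketched in a few lines (again using the jsoup library and a placeholder URL for illustration - the prototype itself may be built differently):

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class UrlExtractorDemo {
        public static void main(String[] args) throws Exception {
            String target = "http://example.com/";                // hypothetical target page
            Document doc = Jsoup.connect(target).timeout(10000).get();
            for (Element link : doc.select("a[href]")) {           // every anchor tag on the page
                System.out.println(link.attr("abs:href"));         // resolved to an absolute URL
            }
        }
    }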
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update:

    - Prepared MERGE FILES functionality - merge several text files into one.

    Demo - Console Window:

    image

    image

    The graphical part, where the user selects the files, is easy to implement!

    The above image shows the files A.txt + B.txt + C.txt merged into one big file --> Merged.txt
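
    Functionally, a merge like this boils down to appending each source file to one output file. A minimal sketch (the file names are taken from the demo above, the rest is my own illustration):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.Arrays;
    import java.util.List;

    public class MergeFiles {
        public static void main(String[] args) throws IOException {
            List<Path> sources = Arrays.asList(Paths.get("A.txt"), Paths.get("B.txt"), Paths.get("C.txt"));
            Path merged = Paths.get("Merged.txt");

            Files.deleteIfExists(merged);
            for (Path source : sources) {
                List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
                Files.write(merged, lines, StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
            System.out.println("Merged " + sources.size() + " files into " + merged);
        }
    }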

    A great feature to add alongside the existing ones in this scraping Tool-Box.

    Feel free to comment

    More to come...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Update:

    For better understanding - I made a quick implementation in the Graphical User Interface:

    image

    Result:

    image
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Here is a demo of a large test.

    1. I merged several files from GScraper (Target18.txt)
    2. I used the 'Clean Scrapings' function in the Scraping Tool-Box (Target18-Cleaned.txt)

    image

    Ready to load into GSA SER...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015

    Update:

    Prepared an algorithm to generate various random footprints.
    - Handy for the lazy ones - make your tasks more random

    - Will be added under "Various Tools"

    Preview of the footprint generation in the console (just a few, for demonstration purposes...):

    image
    The user will be able to select an X number of footprints, which will then be generated randomly.
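
    To give an idea of what 'random footprints' could mean in practice, here is a tiny sketch that combines footprint templates with seed words at random (the template and seed lists are purely illustrative - not the ones built into the tool):

    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    public class RandomFootprints {
        // Example footprint templates and seed words - purely illustrative
        private static final List<String> TEMPLATES = Arrays.asList(
                "\"powered by %s\"", "inurl:%s", "intitle:%s \"leave a comment\"");
        private static final List<String> SEEDS = Arrays.asList("wordpress", "guestbook", "forum");

        public static void main(String[] args) {
            int count = 5;                         // the "X number" the user would select
            Random rnd = new Random();
            for (int i = 0; i < count; i++) {
                String template = TEMPLATES.get(rnd.nextInt(TEMPLATES.size()));
                String seed = SEEDS.get(rnd.nextInt(SEEDS.size()));
                System.out.println(String.format(template, seed));
            }
        }
    }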