
Keyword Generator - Made in Java - For Scraping

magicallymagically http://i.imgur.com/Ban0Uo4.png
edited April 2015 in Buy / Sell / Trade
Hello everyone :)

Well, I was kind of bored :P

So I made a small tool that generates lists of UNIQUE keywords, something that can be used with Scrapebox and GScraper...

Nothing Fancy - Just plain and very simple.

It's still a prototype. More development will follow (more useful tools will be added over time).



image

image

What it does:

It's a Java program (so it can run on multiple platforms). It reads a large text file, such as a book.

Then it builds a UNIQUE list of words from that book/source file.

No dupes, just unique keywords that can be used for scraping or other stuff...
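
For readers curious how this kind of deduplication works, here is a minimal Java sketch of the idea, not the author's actual code: read the file, split it into words, and keep each word once. The class name `KeywordGenerator`, the lowercase normalization and the "runs of letters" tokenization are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract unique words from a large text file (e.g. a book).
final class KeywordGenerator {
    // Assumed definition of a "word": a run of Unicode letters, optionally joined by apostrophes.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)*");

    // Returns lowercase words in first-seen order, with duplicates removed.
    static Set<String> uniqueKeywords(String text) {
        Set<String> unique = new LinkedHashSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            unique.add(m.group().toLowerCase(Locale.ROOT));
        }
        return unique;
    }

    // Reads the source file, deduplicates it and writes one keyword per line; returns the count.
    static int generate(Path source, Path destination) throws IOException {
        String text = Files.readString(source, StandardCharsets.UTF_8);
        Set<String> unique = uniqueKeywords(text);
        Files.write(destination, unique, StandardCharsets.UTF_8);
        return unique.size();
    }
}
```

A `LinkedHashSet` keeps the words in first-seen order; swapping in a `TreeSet` would give an alphabetical list instead.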

image


image


image

image

As I spent a few hours making it, a small donation of $5 per copy would be hugely appreciated.

If you are interested in this small tool (be prepared to make a small donation and receive it soonish), feel free to send me a PM.

I expect the tool to be ready for its final launch in about a week from now.

Minor adjustments need to be implemented before the final release ;)

Normally I would never approach a forum to sell something, so consider this an exception: an offer for those who are interested.

Comments are welcome of course.

Comments

  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    - Added execution time for the entire operation:

    image

    In this case it took 42 seconds to generate 14,098 unique keywords from a book containing 361,961 words.

    That is pretty damn fast for the record;)
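
    To put those numbers in perspective: 361,961 words in 42 seconds is roughly 8,600 words per second. Measuring a run like this in Java only needs a monotonic clock around the work; the `Timed` helper below is a hypothetical sketch, not the tool's code.

    ```java
    import java.time.Duration;
    import java.util.function.Supplier;

    // Hypothetical helper: run a task and report how long it took.
    final class Timed {
        static final class Result<T> {
            final T value;
            final Duration elapsed;

            Result(T value, Duration elapsed) {
                this.value = value;
                this.elapsed = elapsed;
            }
        }

        static <T> Result<T> run(Supplier<T> task) {
            long start = System.nanoTime(); // monotonic, unaffected by wall-clock changes
            T value = task.get();
            return new Result<>(value, Duration.ofNanos(System.nanoTime() - start));
        }

        // Throughput in items per second, e.g. words processed per second.
        static double perSecond(long items, Duration elapsed) {
            double seconds = elapsed.toNanos() / 1_000_000_000.0;
            return seconds == 0 ? Double.POSITIVE_INFINITY : items / seconds;
        }
    }
    ```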
  • Good job! Keep it up mate! :)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @young_gooner
    Many thanks ;) I have more up my sleeve: some additional tools to do the trivial work and make things easier.

    I was really tired of looking for keywords, so I made this little thingy...

    There will surely be some "slow" keywords in the generated lists; however, tweaks for handling such things can be implemented later.
  • KaineKaine thebestindexer.com
    edited April 2015
    Nice! Is it possible to grab a website's keywords (from its URL)? :)

    That could be good for working on competitors.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine

    Indeed a good suggestion. It will be added in a future update:
    - Add a list with site URLs
    - Grab keywords from the targets
    - Finally, sort and make a final list based on the results.
  • That could be handy. Does it only work with English?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @delta_squad
    That was indeed a good question!

    Right now the sorting handles English words only, so you are absolutely right in your observation.

    Several different sorting algorithms must be implemented to handle various languages, where the user simply selects which sorting to run on the target file.

    It's not complicated to implement; it just takes some time to add support for each language.

    And yes, I will add support for this as well ;)


  • Awesome! Right now I'm using furykyle's keyword lists for scraping and I'm not going to run out of keywords any time soon but with this tool you could generate potentially huge amounts of keywords when support for other languages is added. :)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @delta_squad
    You are absolutely correct here - I also used furykyle's keyword lists until recently, as they cover other languages.

    However, it would be a nice addition to be able to generate our very own keywords 'on the fly' whenever we want.

    On top of that, it reduces the number of people using the very same keywords in their scrapes.

    This tool enables us to do exactly such kind of tasks;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    @Kaine

    Well, I actually built another prototype (even though my time is limited during Easter)...
    This little prototype demonstrates most parts of your suggestion:

    This demo scans 3 URLs and extracts their keywords.
    Results are printed on screen, just for testing purposes.


    image

    Bear in mind that this demo just runs in a command prompt to show the idea...

    I think you can imagine the rest of the story:

    - Combine those results and sort them...
    - Add support for proxies
    - Make it run multi-threaded
    - Etc...

    Including such a feature in this tool could be useful for some ;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Just some new updates on topic...

    - I changed the graphical interface a little
    - Prepared support for various sorting algorithms (via user selection)
    - Added a tab feature (additional tools will be added)
    - Prototype of the URL Keyword Extractor prepared (will be added in Tab 2)

    Upcoming tabs:

    Clean Scrapings:
    - Remove .pdf, .xml, .mp3, .chm, .ppt and so on...
    - Ensure unique URLs for GSA

    EDU/GOV Sorter:
    - Sort all URLs, keeping only .edu and .gov
    - Keep unique URLs only
    - Remove unneeded extensions like .xml, .pdf etc...

    image
  • KaineKaine thebestindexer.com
    I think you can make a good tool with that ;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    Many thanks for your kind words :)
    Work is in full progress, and I will update this thread from time to time.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Additional features implemented. Bear in mind it's still not complete...

    Features added:
    - Sort scrapings from Scrapebox/GScraper
    - Remove all unneeded files like .xml, .pdf, .mp3, .swf and more...
    - Keep unique URLs only (removes duplicate URLs, not duplicate domains)
    - Added a switch to change the sorting algorithm: Mode Normal or Mode EDU/GOV

    The demo below uses 'Mode Normal', as the actual switch is not complete yet.

    image


    image

    image

    image

    image

    Upcoming work:

    - Final implementation of the Keyword Scraper (Tab 2)
    - Final implementation of the switch for normal versus EDU/GOV mode
    - Adjustments of the GUI
    - Cleanup and testing

    Later releases will include:
    - Support for different languages (Keyword Generator algorithms)
    - Enhancements of existing functionality
    - Add-ons

    Stay tuned for more information;)

  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update:

    - The switch for sorting scrapings in either "Normal Mode" or "EDU/GOV Mode" is fully implemented.
    In both cases, all garbage is removed 'on the fly', leaving just a list of UNIQUE URLs as the result.

    *Garbage = .xml, .pdf, .mp3, .swf and more...
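
    A cleaning pass like the one described (drop URLs ending in "garbage" extensions, keep each URL once, optionally keep only .edu/.gov hosts) could be sketched as follows. The extension list mirrors the examples in this thread and is not exhaustive; the class and enum names are hypothetical, and lines that don't parse as URLs are simply dropped.

    ```java
    import java.net.URI;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Locale;
    import java.util.Set;

    // Hypothetical sketch of a scrape cleaner with a Normal / EDU-GOV switch.
    final class ScrapeCleaner {
        enum Mode { NORMAL, EDU_GOV }

        // "Garbage" extensions taken from the examples in this thread (assumed, not exhaustive).
        private static final List<String> GARBAGE = List.of(".xml", ".pdf", ".mp3", ".swf", ".chm", ".ppt");

        static Set<String> clean(List<String> rawUrls, Mode mode) {
            Set<String> result = new LinkedHashSet<>(); // drops duplicate URLs, keeps first-seen order
            for (String raw : rawUrls) {
                String url = raw.trim();
                URI uri;
                try {
                    uri = URI.create(url);
                } catch (IllegalArgumentException e) {
                    continue; // malformed line: skip it
                }
                String host = uri.getHost();
                if (host == null) continue; // not an absolute web URL
                String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
                if (GARBAGE.stream().anyMatch(path::endsWith)) continue;
                if (mode == Mode.EDU_GOV) {
                    String h = host.toLowerCase(Locale.ROOT);
                    if (!(h.endsWith(".edu") || h.endsWith(".gov"))) continue;
                }
                result.add(url);
            }
            return result;
        }
    }
    ```

    The .edu/.gov check only matches hosts ending exactly in .edu or .gov, so country variants such as .edu.au would need extra rules.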
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    URL Keyword Extractor Update:

    - Enhanced the code and implemented support for 30 threads (default)

    image

    Note: Don't mind the messy output in the picture; it will be tidied up in the graphical part.


    In progress:
    - Implementation of the graphical part of this feature (GUI)
    - Cleaning up
  • KaineKaine thebestindexer.com
    edited April 2015
    Does it take the meta keywords or the keywords in the page?

    At this point, maybe add an article web scraper directly too...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    It currently takes the meta keywords from the target-page.

    - Support for both could be implemented later too.

    And an article scraper seems interesting to add too ;)
    *Your suggestion has been added to the to-do-list.
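
    Reading meta keywords from a fetched page boils down to finding `<meta name="keywords" content="...">` and splitting the content on commas. A rough regex-based sketch, assuming the page HTML is already downloaded (the class name `MetaKeywords` is hypothetical); regexes are fragile against unusual markup, so an HTML parser would be more robust in practice.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Locale;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical sketch: pull meta keywords out of an HTML string.
    final class MetaKeywords {
        // A whole <meta ...> tag, case-insensitive.
        private static final Pattern META_TAG = Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
        // name="keywords" / name='keywords' / name=keywords, and nothing longer like "keywords-extra".
        private static final Pattern NAME_KEYWORDS =
                Pattern.compile("\\bname\\s*=\\s*([\"']?)keywords\\1(?=[\\s/>]|$)", Pattern.CASE_INSENSITIVE);
        // content="..." (group 2) or content='...' (group 3).
        private static final Pattern CONTENT =
                Pattern.compile("\\bcontent\\s*=\\s*(\"([^\"]*)\"|'([^']*)')", Pattern.CASE_INSENSITIVE);

        static List<String> extract(String html) {
            List<String> keywords = new ArrayList<>();
            Matcher tag = META_TAG.matcher(html);
            while (tag.find()) {
                String t = tag.group();
                if (!NAME_KEYWORDS.matcher(t).find()) continue; // not a keywords tag
                Matcher c = CONTENT.matcher(t);
                if (!c.find()) continue;
                String content = c.group(2) != null ? c.group(2) : c.group(3);
                for (String kw : content.split(",")) {
                    String trimmed = kw.trim().toLowerCase(Locale.ROOT);
                    if (!trimmed.isEmpty()) keywords.add(trimmed);
                }
            }
            return keywords;
        }
    }
    ```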
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update - Early implementation of the URL Scraper Function in the Graphical Interface!

    image

    Tasks completed:

    image

    Features:
    - Using 30 Threads
    - Scraping meta keywords from target URLs (list input)
    - Generates a list of UNIQUE keywords based on the results
    - Shows a log of visited target URLs in real time


    *That's it for now. I'm taking a small break to clear my mind, so I can look at it again with fresh eyes and do some cleaning/adjustments.

    Comments are still very welcome of course - feel free to jump in;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    PS...

    I just took a larger sample, 1000 random URLs, to see if it actually works:

    Finished all threads
    unique words : 2734
    total words : 7736
    Destination: C:\SEO2015\WillItWork.txt
    Closing Buffered Writer and finishing...

    image

    Speed was actually fast - exactly as expected;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Update:

    - Adjusted the formatting of the log messages

    image

    image

    *Note: Maybe add a function to trim URLs to the root domain, depending on the job...
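
    Trimming to root would reduce each URL to its scheme and host (plus port, if any). A small sketch using `java.net.URI`; the `RootTrimmer` name is hypothetical:

    ```java
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.util.Optional;

    // Hypothetical sketch: trim a URL down to its root.
    final class RootTrimmer {
        // "http://www.example.com/blog/post?id=1" -> "http://www.example.com/"
        static Optional<String> toRoot(String url) {
            try {
                URI uri = new URI(url.trim());
                if (uri.getScheme() == null || uri.getHost() == null) return Optional.empty();
                String port = uri.getPort() == -1 ? "" : ":" + uri.getPort(); // keep explicit ports
                return Optional.of(uri.getScheme() + "://" + uri.getHost() + port + "/");
            } catch (URISyntaxException e) {
                return Optional.empty(); // not a parseable URL
            }
        }
    }
    ```

    Feeding the trimmed roots through a set afterwards would turn a list of deep links into a list of unique domains.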
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Here is a sample of the speed - using standard footprints and some keywords generated with this Tool:

    Test was done using my home-connection and a laptop.

    image


    I'm quite sure some of you hardcore scrapers are able to pull even higher speeds with some quality footprints... Unfortunately I'm not that good at making footprints :P

    Actually, the speed was still increasing at the time of this comment: 34,723 URLs/min and still climbing...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    LOL - Better add the proof for you guys to see....
    image



    *Edit

    A few min later:

    image

    *Edit

    Last one - should settle it:P

    image
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Some might wonder: how about the performance in GSA SER?

    Well let's take a look:

    Verified preview:

    image image

    Performance (after cleaning the raw scrapings with this Tool)

    image

    A randomly picked message from the GSA SER log:

    image

    And some more verified, just to show the scraping went well (final output):

    image

    Conclusion:

    The new tool can be used to generate decent keywords for scraping and to clean lists.

    Feel free to comment;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    - Work in progress:

    1. In the upcoming days, I intend to develop a simple Article Scraper that will be added to this Tool-Box.

    2. Additional enhancements of existing code, and cleanup.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update

    - Basic implementation of URL Extractor (Small Part of Article Scraper)
    - Demo Only (Just console - Not in GUI yet)

    Extraction of all URLs on a given target:
    image
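
    Extracting every link from a target page can be approximated by collecting `href` attributes and resolving relative links against the page address. A regex-based sketch under that assumption (class name hypothetical; only http/https links are kept):

    ```java
    import java.net.URI;
    import java.util.LinkedHashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical sketch: collect absolute http(s) links from a page's HTML.
    final class LinkExtractor {
        // href="..." or href='...' (case-insensitive); unquoted values are ignored.
        private static final Pattern HREF =
                Pattern.compile("href\\s*=\\s*(\"([^\"]*)\"|'([^']*)')", Pattern.CASE_INSENSITIVE);

        // Pass pageUrl with a path (e.g. "http://example.com/"): relative links may resolve
        // incorrectly against a base URI whose path is empty.
        static Set<String> extract(String pageUrl, String html) {
            URI base = URI.create(pageUrl);
            Set<String> links = new LinkedHashSet<>(); // unique links, first-seen order
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String href = m.group(2) != null ? m.group(2) : m.group(3);
                try {
                    URI resolved = base.resolve(href.trim()); // relative -> absolute
                    String scheme = resolved.getScheme();
                    if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                        links.add(resolved.toString());
                    }
                } catch (IllegalArgumentException e) {
                    // Skip hrefs that are not valid URIs (e.g. containing spaces).
                }
            }
            return links;
        }
    }
    ```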
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update:

    - Prepared MERGE FILES functionality - Merge several text-files into one text-file.

    Demo - Console Window:

    image

    image

    The Graphical part, where user selects files is easy to implement!

    The image above shows A.txt + B.txt + C.txt merged into one big file --> Merged.txt

    A great addition to the existing features of this Tool-Box for scrapings.

    Feel free to comment

    More to come...
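
    Merging text files like this is a straightforward line-by-line copy. A minimal sketch with `java.nio.file` (the `FileMerger` name is hypothetical); it streams each file instead of loading it all into memory, which matters for large scrape files:

    ```java
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    // Hypothetical sketch: merge several text files into one, preserving order.
    final class FileMerger {
        // Copies every line of each input (in the given order) into 'merged'; returns the line count.
        static long merge(List<Path> inputs, Path merged) throws IOException {
            long lines = 0;
            try (BufferedWriter out = Files.newBufferedWriter(merged, StandardCharsets.UTF_8)) {
                for (Path in : inputs) {
                    try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            out.write(line);
                            out.newLine(); // platform line separator
                            lines++;
                        }
                    }
                }
            }
            return lines;
        }
    }
    ```

    For the example above this would be called as `merge(List.of(A, B, C), Merged)`; duplicates are kept, so a cleaning or dedupe step can follow.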
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Update:

    For better understanding - I made a quick implementation in the Graphical User Interface:

    image

    Result:

    image
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Here is a demo of a large test.

    1. I merged several files from GScraper (Target18.txt)
    2. I used the 'Clean Scrapings' function in the Scraping Tool-Box (Target18-Cleaned.txt)

    image
    Ready to load into GSA SER...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015

    Update:

    Prepared algorithm to generate various Random Footprints.
    - Handy for the lazy ones - Make your tasks more random

    - Will be added under "Various Tools"

    Preview of footprint generation in the console (just a few, for demonstration purposes):

    image
    The user will be able to select X-Number of Footprints - which will be generated randomly.
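
    One way to generate random footprints is to combine a random search operator with a random phrase and keep drawing until the requested number of distinct footprints exists. The operator and phrase lists below are placeholders for illustration only, not the tool's real footprint data:

    ```java
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Random;
    import java.util.Set;

    // Hypothetical sketch: build N distinct random footprints from building blocks.
    final class FootprintGenerator {
        // Placeholder building blocks; a real tool would load these from a file.
        private static final List<String> OPERATORS = List.of("inurl:", "intitle:", "intext:", "");
        private static final List<String> PHRASES = List.of(
                "\"powered by wordpress\"", "\"leave a comment\"", "guestbook", "\"add new comment\"", "blog");

        // Returns up to 'count' distinct footprints; a seeded Random makes runs repeatable.
        static Set<String> generate(int count, Random random) {
            int max = OPERATORS.size() * PHRASES.size(); // all possible combinations
            Set<String> out = new LinkedHashSet<>();
            while (out.size() < Math.min(count, max)) {
                String op = OPERATORS.get(random.nextInt(OPERATORS.size()));
                String phrase = PHRASES.get(random.nextInt(PHRASES.size()));
                out.add(op + phrase); // duplicates are ignored by the set
            }
            return out;
        }
    }
    ```

    Capping the target at the number of possible combinations keeps the loop from running forever when the user asks for more footprints than the building blocks allow.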
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update (despite the limited interest, development continues...)
    - Added compression functionality under Various Tools
    It compresses multiple files from a folder into a single archive at a selected destination.

    Example:

    image

    image

    image

    image
    Inside the new Compressed File:

    image

    Will possibly add:
    - Decompression of files
    - Progressbar for all tasks under 'Various Tools'
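
    Compressing all files in a folder into a single archive maps directly onto `java.util.zip.ZipOutputStream`. A sketch assuming ZIP format and a flat folder (subfolders are skipped); the class name is hypothetical:

    ```java
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    // Hypothetical sketch: zip the regular files directly inside a folder.
    final class FolderZipper {
        // Returns the number of files added to the archive.
        static int zip(Path folder, Path zipFile) throws IOException {
            int count = 0;
            try (OutputStream fileOut = Files.newOutputStream(zipFile);
                 ZipOutputStream zip = new ZipOutputStream(fileOut);
                 DirectoryStream<Path> files = Files.newDirectoryStream(folder)) {
                for (Path file : files) {
                    if (!Files.isRegularFile(file)) continue; // skip subfolders
                    zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                    Files.copy(file, zip); // stream the file's bytes into the current entry
                    zip.closeEntry();
                    count++;
                }
            }
            return count;
        }
    }
    ```

    Write the archive outside the source folder, otherwise it may end up listed (and zipped) as one of its own inputs. Decompression would be the mirror image using `ZipInputStream`.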



  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Update:

    After some serious coding, additional language support for the Keyword Generator part is now possible!

    Here is a raw demo:

    Source - Thai
    image



    Result - Unique Keywords Thai:
    image

    Once I have some spare time, I will implement support for various languages in the graphical part of the Tool-Box.

    In theory, it should be possible to cover plenty of different languages ;)

    That means 'the switches', i.e. the radio buttons, will determine which language is used when generating keywords.
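
    Languages like Thai and Japanese don't separate words with spaces, so splitting on whitespace fails for them. The standard library's `java.text.BreakIterator` applies locale-specific word-break rules; the sketch below uses it as a stand-in for the tool's per-language algorithms, which aren't shown in this thread. How well it segments Thai or Japanese depends on the JDK's built-in rules.

    ```java
    import java.text.BreakIterator;
    import java.util.LinkedHashSet;
    import java.util.Locale;
    import java.util.Set;

    // Hypothetical sketch: locale-aware word splitting plus deduplication.
    final class LocaleWordSplitter {
        static Set<String> uniqueWords(String text, Locale locale) {
            BreakIterator words = BreakIterator.getWordInstance(locale);
            words.setText(text);
            Set<String> unique = new LinkedHashSet<>();
            int start = words.first();
            for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
                String token = text.substring(start, end);
                // Segments also include spaces and punctuation; keep only those containing a letter.
                if (token.codePoints().anyMatch(Character::isLetter)) {
                    unique.add(token.toLowerCase(locale));
                }
            }
            return unique;
        }
    }
    ```

    In the GUI, each language radio button would then simply map to a `Locale` passed into a method like this one.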
  • magicallymagically http://i.imgur.com/Ban0Uo4.png

    Test and preview of Japanese Language:


    image


    Result:

    image


    All good:)

    The actual implementation in the GUI (Graphical User Interface) will be handled asap...

    Next update should show generation from the actual program - so stay tuned;)
  • Very nice. PM me your PayPal :)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @spammasta
    Many thanks for your support:)

    Once it's ready for a launch - I will shoot you a pm;)

    Some minor things still need to be adjusted before its first release.

    However, we are getting close to the first release; give it a couple more days...

    Development will continue after the first release, meaning more features will be added and potential bugs fixed.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update - Language Sorting Algorithms implemented in the GUI

    1. Demo: Thai Selection

    image

    image

    image

    Result:
    image
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update - Language Sorting Algorithms implemented in the GUI

    1. Demo: English Selection

    image

    image

    image

    Result:
    image
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update - Language Sorting Algorithms implemented in the GUI

    1. Demo: Portuguese Selection
    image

    image

    Result:
    image
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    That means the following:
    - Support for:
    English, Greek, Portuguese, Japanese and Thai implemented and working!

    Final adjustments before first release:
    Before the first version is released, some minor adjustments and bug fixes are needed.
    No additional features will be added before release, only fixing and cleaning.
    - Expect it a few days from now.

    Thread will be updated here, once first release is ready;)
    The development will continue - as a donor you are supporting the future of this software.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Well I couldn't resist it:P

    - Added one more multi-feature before I begin to clean up:

    image

    Clicking on the monitor opens a pop-up showing CPU usage etc...

    image
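
    The kind of figures such a pop-up can show are available from the JVM itself. A sketch of a status snapshot (class name hypothetical); note that the standard `OperatingSystemMXBean` only reports the system load average, returning a negative value where that isn't available (typically on Windows), while a true CPU percentage requires the JDK-specific `com.sun.management.OperatingSystemMXBean`:

    ```java
    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    // Hypothetical sketch: a one-line status summary for a "system monitor" pop-up.
    final class SystemStatus {
        static String snapshot() {
            Runtime rt = Runtime.getRuntime();
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            long maxMb = rt.maxMemory() / (1024 * 1024);
            double load = os.getSystemLoadAverage(); // negative when the platform can't report it
            String loadText = load < 0 ? "n/a" : String.format("%.2f", load);
            return String.format("CPUs: %d | load avg: %s | heap used: %d of %d MB",
                    rt.availableProcessors(), loadText, usedMb, maxMb);
        }
    }
    ```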

    This new Statusbar will become Super Handy in terms of future updates!
    It will replace several progressbars and make the overall interface more streamlined...

    Future updates of the program will use this new status bar throughout and notify you about running tasks...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Initiated Cleanup and Preparation of first official launch.

    Completed tasks:
    - Added specific icon in the GUI
    - Added needed confirmation dialogs
    - Cleaned up file selectors (all now show .txt files by default)

    image


    TO DO before launch:
    - Change some listeners and their respective evaluations
    - Add program update log file
    - Adjust GUI
    - Compile final program and launch

    Expected timeframe:
    3-5 days
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    A few To-Do Completed:

    image

    Change Log:

    image

    - Made a stress test with a large file (worked)
    - Adjusted a few listeners (still some to go)

    image

    - Added Multi-Functionality in 'Latin' sorting algorithm - Now it covers more languages
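
    A "Latin" mode that covers more languages can lean on Unicode script properties: Java's `\p{IsLatin}` matches Latin-script letters including accented ones such as ã, ç and é, which covers Portuguese, French, Spanish and similar languages. A sketch under that assumption (class name hypothetical), normalizing to NFC first so accents written as combining marks stay attached to their letters:

    ```java
    import java.text.Normalizer;
    import java.util.LinkedHashSet;
    import java.util.Locale;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical sketch: unique words written in the Latin script (any Latin-based language).
    final class LatinWords {
        // Letters that belong to the Latin script, including accented forms.
        private static final Pattern LATIN_WORD = Pattern.compile("[\\p{IsLatin}&&\\p{L}]+");

        static Set<String> uniqueWords(String text) {
            // NFC turns "a" + combining tilde into a single "ã" so the word isn't split.
            String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
            Set<String> unique = new LinkedHashSet<>();
            Matcher m = LATIN_WORD.matcher(nfc);
            while (m.find()) {
                unique.add(m.group().toLowerCase(Locale.ROOT));
            }
            return unique;
        }
    }
    ```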

    To-DO
    - Fix remaining listeners
    - Adjust GUI
    - Compile and launch
  • Could you make a video of this scraper?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    @redfoxseo

    Sure; however, please note the program will feature 2 different kinds of scrapers:

    1. The one you see present now ---> 'Url Keyword Scraper'

    2. Upcoming feature 'Article Scraper' ---> Not present at the moment.

    The current one, i.e. the 'Url Keyword Scraper', takes a list with x amount of target links.

    It will then visit each target, and scrape the meta-keywords.

    During the process of visiting the site targets, 30 threads are used to ensure speed.

    Those 'Meta Keywords' will be added to an internal list...

    Finally, before writing out the result list, it removes all duplicates so that only unique keywords are left in the final list.
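
    The pipeline described above could be wired up roughly like this. It's a sketch, not the tool's code: fetching and keyword extraction are passed in as functions (an assumption, which also lets the threading be tested offline), `processed` is a counter that a progress bar could read, and the `fetch` helper with its 10/15-second timeouts is one possible choice using Java 11's `java.net.http.HttpClient`. Timeouts also keep the last few slow sites from holding up the run indefinitely.

    ```java
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.List;
    import java.util.Set;
    import java.util.TreeSet;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.function.Function;

    // Hypothetical sketch of a multi-threaded URL keyword scraper.
    final class UrlKeywordScraper {
        static final int THREADS = 30; // default thread count mentioned in this thread

        // Visits every target in parallel, extracts keywords from each page, merges them without duplicates.
        static Set<String> scrape(List<String> targets,
                                  Function<String, String> fetchPage,
                                  Function<String, List<String>> extractKeywords,
                                  AtomicInteger processed) throws InterruptedException {
            Set<String> unique = ConcurrentHashMap.newKeySet(); // thread-safe; duplicates dropped on insert
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);
            for (String url : targets) {
                pool.submit(() -> {
                    try {
                        unique.addAll(extractKeywords.apply(fetchPage.apply(url)));
                    } catch (RuntimeException e) {
                        // A dead or misbehaving site is skipped instead of stopping the run.
                    } finally {
                        processed.incrementAndGet(); // progress: one more target handled
                    }
                });
            }
            pool.shutdown(); // accept no new tasks; queued ones still run
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
            }
            return new TreeSet<>(unique); // sorted copy, ready to write out
        }

        // Lazily created shared client; HttpClient can be used from many threads.
        private static final class Http {
            static final HttpClient CLIENT = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(10))
                    .followRedirects(HttpClient.Redirect.NORMAL)
                    .build();
        }

        // One possible fetcher (Java 11+); failures surface as unchecked exceptions.
        static String fetch(String url) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(15))
                    .GET()
                    .build();
            try {
                return Http.CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while fetching " + url, e);
            }
        }
    }
    ```

    A live run would look like `scrape(targets, UrlKeywordScraper::fetch, someMetaKeywordExtractor, counter)`.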

    The other feature - the non present one (Article Scraper) is still under development, and it will be added to the Tool-Box, once it is ready;)

    In both cases, a small video demonstration would be ideal to show what is going on... and I will add them later ;)

  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    Update:


    Prior to release, Registration Protection has been added:

    image

    image

    image

    Once activated, the program starts up and no longer asks for input on subsequent startups:

    image

    This is to ensure that the program stays among donors only.

    That leaves very little left to do before it's released:)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Update:

    - Enhanced URL KeyWord Scraper, so it shows:
    Processed Targets in Real Time (Green)
    Loaded Targets (Blue)

    image

    Compiled the program and tested it on a separate machine:
    - Shows how it will be distributed as *FINAL*

    image

    Registration and execution both went flawlessly :)

    There are 2 minor issues left that I would like to fix before calling it ready for release...

    So once I have those 2 fixed, it's ready for you guys to try out :)

  • PM me your paypal.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @gsasurfer
    Thank you for your support:)

    Will pm you very soon - stay tuned;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    - Added use of Progressbar in URL KeyWord Scraper:

    image

    That gives a good visual of processed target-urls:
    1. Real Time view of processed targets (Green)
    2. Progressbar--> Overall Progress

    That leaves only one more tiny thing to fix, before I release it....
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    And here is a sample with more targets loaded:

    image

    Processing fast - using 30 threads...
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Alright, Ladies & Gentlemen :)

    Before I announce that Scraping Tool-Box 1.0 is ready to launch today, I would like to share some important information:

    1. This software is not sold - It is released to interested donors only (Support of development)
    2. Version 1.0 is far from finished - it is a process, where version 1.0 is initial release.
    3. Those interested may make a donation of $5 via PayPal. In return they receive the following:

    A. The current final compiled version, i.e. Scraping Tool-Box 1.0.
    B. Serial + activation key
    C. As a donor, you are entitled to receive future updates of the program
    - Make sure you tell PayPal that this is a DONATION, otherwise the payment will be rejected.


    FAQ:
    Q: Why is the URL Keyword Scraper sometimes really slow when approaching the last targets?
    A: Some sites take longer to load or respond than others. Threads are also being cleaned up.

    Q: Why is there no decompress feature?
    A: Well, it's not finished. More features, including this one, will come over time.

    Q: I can't start the application?
    A: Make sure you have the latest Java installed on your system; please visit this site:

    *Known potential bug:
    Notepad installed in a different location than the default installation path.
    Fix: Contact me, and I will remove and recompile the application

    Last Fixes and Improvements:
    - Changed the GUI a little:

    image


    image

    image

    image

    image

    image

    image

    Additional information:

    I will contact those who have contacted me via PM or in this thread.
    Please be patient - time constraints and real life issues can slow the processing :D

    Scraping Tool-Box 1.0 will be released later today, and I will start contacting those interested asap.
    Stay tuned - it's coming today ;)
  • Can you make a video as well, to see how it works?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @redfoxseo

    I will try my best and add some later:)

    In the meantime, here is a book to get you started:


    1. Download the book to a location on your hard drive.
    2. Start the program and activate the tab: Keyword Generator
    3. Select the book above as source
    4. Select destination (where the final list should be stored)
    5. Select 'English' as sorting algorithm
    6. Press 'Go' button

    - A few seconds or minutes later, depending on your machine, you will have a new fresh list with unique keywords.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    @redfoxseo


    Demo of Keyword Generator

    More videos will follow - quite time consuming to make, and I'm not a Video Guru:P


  • lol.. i don't have the time to read my own history... :))
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Video 2


    I will try to add additional videos; however, it's time to contact some people soon and release it :)
    Best way is by trying it out;)

    Please note:
    Some sites are down, slow, etc., which will of course influence the result...
    On top of that, threads must also be cleaned up.

    This video should give you an idea about how it works though;)
  • I understand what you are trying to say and how it relates to the software. Is there a trial version of this software?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    Sorry Wrong Video - One more try:

  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @redfoxseo

    Hmm...sorry, no trials as the program is distributed via donations.

    In other words:

    5$ makes you a donor and that entitles you to the program and updates;)

    The donation itself, is only to support the future development of the application.

    So it's not sold; it's actually free and driven by its donors.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    PMs have been sent to those who asked, either in this thread or via PM.

    Everyone is still welcome to contact me via PM or here in this thread.

    As stated before - This is just the beginning! 
    The development will continue, and more features added over time.
