
Keyword Generator - Made in Java - For Scraping

Comments

  • Hi Magically

    How do I get your software? PM sent. Still waiting for your response.
  • magicallymagically
    @zuluranger
    Hmm... strange, I didn't get any PM.
    I will send you a PM now with the information;)
  • Kaine thebestindexer.com
    magically 

    Very nice! How do we update to the new version?
  • magicallymagically
    @Kaine

    Once version 1.2 is ready I will send a pm to everyone with the new software and a download link.

    Everyone should then download the new version, delete the old one and replace it with the new one.
    - Activation should not be necessary unless it's the first time you use the software.

    Expect version 1.2 out very soon, just need to adjust a few things.

    (Version 1.21 will include tweaks to the article scraper.)
  • Kaine thebestindexer.com
    OK, I'll wait for 1.21 :)

  • magicallymagically
    @Kaine

    No need to wait buddy - I need feedback first on version 1.2.
    Please test the article extractor with at least 50-100 urls that you yourself have located up front.
    - I need to see how it goes for you guys first, before we add the remaining stuff, like copyright removal and url replacement.

    Update - Version 1.2 will be released today:


    image

    Changelog:

    image
    Current donors:

    Patience:D I will send a pm to you guys with the new release.


    Everyone else:
    Please consider joining this adventure, as the development is based purely on interest, support and donations. I don't make money on this project - actually it doesn't even pay for the electricity;)

  • magicallymagically
    @Scraping Tool-Box Donors

    PM has been sent out with new release - enjoy and have fun:D
  • Kaine thebestindexer.com
    edited April 2015

    I have downloaded it and all is OK. Before I test the article extractor: did you use a special footprint to scrape the urls? Like: site:wordpress.com + diet

    EDIT

    OK, I just played with it for 2 minutes and I see you can push more threads (up to 30), but that eats memory. To avoid that, write directly to the hard disk.

    I have to quit the software if I want to stop a job; maybe a button for that would be good :)

    EDIT

    Tested with a footprint like: site:wordpress.com +OTHER WORD
    The scrape is very quick: in 2 minutes, 590 unique urls were done.

    Of the 590, I got 525 articles downloaded (very good).

    Of those 525 articles, approximately 14 look like this: http://www60.zippyshare.com/v/Ob14Syzd/file.html
  • magicallymagically
    edited April 2015
    @Kaine

    Great to hear the upgrade to the new version worked fine:)

    Of course I knew it would lead to issues and problems - that is why I have delayed the rest of the features, like copyright removal and url replacement:P

    Let's break it down:

    1. The Article Extractor does not use any footprints - it relies completely on the target urls the user loads into the program.

    - The point is that the user needs to do some research up front, do a manual search in Google using various footprints, and then select good ones...

    That can be done automatically too - but it is not implemented.
    Also note that this would lead to poor quality, as the program won't care whether an article is 'good' or 'bad'...

    2. Threads
    Yep, you are right here - the program is currently set to use all 30 threads by default.
    Of course I knew that as well;)
    It needs to count the number of targets first:
    - If 5 urls are loaded - 1 thread would be enough
    - 100 urls - 10 threads would do
    And so on...
    Not a big issue really - and easy to implement.
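The balancing rule described above can be sketched in Java. A minimal sketch only; the one-thread-per-10-urls ratio and the cap of 30 are assumptions taken from the examples in this post, not the actual implementation:

```java
// Pick a worker-thread count from the number of loaded target urls,
// instead of always starting all 30 threads.
public class ThreadBalancer {
    static final int MAX_THREADS = 30;

    public static int threadsFor(int urlCount) {
        if (urlCount <= 0) return 0;
        // Roughly one thread per 10 targets, at least 1, at most MAX_THREADS.
        int threads = (urlCount + 9) / 10;
        return Math.min(threads, MAX_THREADS);
    }

    public static void main(String[] args) {
        System.out.println(threadsFor(5));    // 1
        System.out.println(threadsFor(100));  // 10
        System.out.println(threadsFor(1000)); // 30
    }
}
```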

    3. Stop Button
    Indeed - there is no stop button (yet:P)
    - Also no button for loading in replacement urls

    As I said - Those features will come in version 1.21;)

    The important thing here was to test whether the 'Article Extractor' really works in real life.
    And as far as I can see, it does exactly what it is supposed to do (ignoring the features below).

    To sum up:
    - Balance thread use
    - Stop button
    - Load replacement urls
    - Implementation of url replacement, copyright removal etc...

    *Edit
    In terms of 'strange results' like text files with nonsense: it will fail on some targets (different encodings and such).
    However I think most will work, and your test result of 525 articles out of 590 seems decent.
  • Kaine thebestindexer.com
    edited April 2015

    Yes, I scraped site:wordpress.com +OTHER WORD with Gscraper.
    For me the result is good; your software scrapes articles very fast, and copyright seems to be removed :)
    Maybe removing emails would be good too.
  • magicallymagically
    @Kaine
    Awesome to hear buddy:)

    The remaining 'tweaks' will be added in the upcoming release + some other enhancements/features.

    For now I just wanted to see how the Article Extractor performed 'raw' with default settings.

    I think you will see that next release has the remaining stuff you are looking for - At least I will give it a try;)

    Hope some other guys hanging around here on the forum will also discover this software...
    - It's 'hidden' in the sales section where many users don't look so much.
  • magicallymagically
    edited April 2015
    - Added to the to-do list:

    Implementation of Automatic Backup to DropBox

    - A timer will handle the task. The user can select files via the GUI.
    Uploads to DropBox will be done automatically.
    (Developer note: A token must be created to prevent reauthentication)

    Examples could be: Identified List, Verified List etc...

    Simple demo of authentication in the console:

    image

    image
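The developer note above about creating a token to prevent reauthentication usually comes down to persisting the access token after the first login, so later runs can skip the OAuth flow. A minimal sketch; the file name and property key are illustrative assumptions, and the actual DropBox authentication call is not shown:

```java
import java.io.*;
import java.util.Properties;

// Store the access token obtained on first run, and reuse it afterwards.
public class TokenStore {
    private final File file;

    public TokenStore(File file) { this.file = file; }

    public void save(String token) throws IOException {
        Properties p = new Properties();
        p.setProperty("dropbox.token", token);
        try (OutputStream out = new FileOutputStream(file)) {
            p.store(out, "Dropbox access token");
        }
    }

    public String load() throws IOException {
        if (!file.exists()) return null; // first run: full authentication required
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(file)) {
            p.load(in);
        }
        return p.getProperty("dropbox.token");
    }

    public static void main(String[] args) throws IOException {
        TokenStore store = new TokenStore(File.createTempFile("dbx", ".properties"));
        store.save("example-token");
        System.out.println(store.load()); // example-token
    }
}
```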

    Note: However the Article Extractor Features must be completed first + some other tweaks and enhancements.
  • magicallymagically
    edited April 2015
    - Very early and raw prototype of DropBox-Connect:

    image
  • magicallymagically
    edited April 2015
    Small test showing that it is possible to get a connection to DropBox:
    - Using a proper DropBox access code
    - Sensitive information is scrambled

    image


    TO DO:

    As we are now able to establish a connection to DropBox, some features need to be implemented:
    1. Upload of Zip-File
    2. Browsing feature to see the files
    3. Download of Zip-file

    - When this is implemented, a special function will be created to handle compressing of GSA Ser Project Files.

    - A Timer will handle upload of GSA Project Files to DropBox - Completely Automatic.

    Please note: This is an early prototype - more to come...
  • magicallymagically
    - GUI MOCK UP of DropBox Auto Backup (Proto-Type):

    image

    The DropBox Connect button initiates the dialog shown in the image above and establishes the connection.

    I will now try to implement the mentioned functionality above.
  • magicallymagically
    Update DropBox Auto Backup:
    - Implemented Browse and select Source
    - Implemented Browse Destination (Directly browse Dropbox folders)

    A sample showing a connected DropBox and a TreeView with folders:

    image
  • magicallymagically
    WOOOHOOO;)


    image

    So that means the following:
    - A method to pack the GSA Ser Project must be made
    - A timer to handle uploads must be made...

    Once those are made and tested, Scraping ToolBox will be able to auto-back-up GSA SER Projects;)
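The first item, a method to pack the GSA Ser Project folder, can be sketched with the standard java.util.zip classes. This is a generic folder-zipper under my own naming, not the author's actual implementation:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.stream.Stream;
import java.util.zip.*;

// Walk a project folder and write every regular file into one zip archive,
// storing entries relative to the folder root.
public class ProjectPacker {
    public static void zipFolder(Path folder, Path zipFile) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile));
             Stream<Path> files = Files.walk(folder)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                try {
                    zos.putNextEntry(new ZipEntry(folder.relativize(file).toString()));
                    Files.copy(file, zos); // stream the file body into the archive
                    zos.closeEntry();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Called with the GSA SER project folder as `folder`, this produces the single backup file the timer would later upload.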
  • magicallymagically
    edited April 2015
    - Prepared a function to compress the entire GSA SER Project folder:

    image

    image

    Getting a little bit tired right now - so taking a break before making the rest:P

    However, we are close to a final working solution of automatic backups...

    Strange to see so little interest, considering how many asked for such a feature.

    Known Issues:
    -Developer Notes:
    - Max 100 users (More requires Public Release via DropBox)
    - The file extension handling must be changed further (Zip/Rar/.SL) - that would probably mean a new API key
  • magicallymagically
    edited April 2015
    - Added Timer (Still not 100% complete)

    NB: Ignore the file that gets uploaded - it's just for demonstration.

    The interesting part is that the 'Timer' is activated and doing some compression in the background...
    Preparing the real file to upload;)

    Notice the Message Log:

    image

    Still a few things to do before a test can be done...
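The timer described here, which periodically kicks off the compress-and-upload work in the background, can be sketched with a ScheduledExecutorService. The interval and task are placeholders; as a side benefit, shutting the scheduler down gives a natural hook for the stop button requested earlier in the thread:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Run the backup task at a fixed interval on a background thread.
public class BackupTimer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(long interval, TimeUnit unit, Runnable compressAndUpload) {
        // First run after one interval, then repeatedly at that interval.
        scheduler.scheduleAtFixedRate(compressAndUpload, interval, interval, unit);
    }

    public void stop() {
        scheduler.shutdownNow(); // cancels pending backup runs
    }
}
```

For the 5-minute test interval mentioned below, `new BackupTimer().start(5, TimeUnit.MINUTES, task)` would do.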
  • magicallymagically
    edited April 2015
    - Added creation of internal storage for compression of the GSA Ser Project
    - Added a switch to handle specific backup of the GSA Ser Project only
    - Added retrieval of the GSA Ser Project folders

    image

    Hmm.. that leaves only very few things left to handle properly:

    - Adjustment of timer-selection interval
    - Upload of the zip file (since it's different from txt files)
    - Minor adjustments and tweaks of the GUI

    In other words - almost complete already;)

    Actually fucking awesome, to say the least:P
    Feel free to join the adventure anytime
  • Wow great job! You took it upon yourself to create this essential function. I barely had time to react because you're fast. Thanks for giving it a shot.
  • magicallymagically
    edited April 2015
    @zinne
    Many thanks buddy;)

    Here we go - just finished the actual upload function - and the sucker works:D

    First I created a new DropBox folder - notice the folder name - that is very important:
    image

    Next I started the backup feature, using a considerably smaller folder to simulate the process.
    I selected the ImageBurn folder - just for testing; it could have been the GSA Ser Project folder:

    image

    I specifically chose the Public folder I created on DropBox...

    The timer is now active and the program runs every 5 minutes, so I can test whether files get uploaded:

    image


    BINGO!!!! It's working flawlessly;)

    image


    Only a couple of things left to do:
    - Adjust the GUI and enable selection of the 'Backup Interval'
    - Test with a larger 'Backup Interval' to ensure it performs correctly

    That leads to an upcoming release of Scraping Tool-Box 1.3, with Automatic Backup to DropBox...
    Isn't that just cool???
  • Please PM me your PayPal.. I wanna buy
    :)
    thanks
  • magicallymagically
    @akwin

    Awesome buddy:)

    Many thanks for your support - Hugely appreciated.
    I will send you a PM as soon as 1.3 is ready for release (unless you want 1.2 right now) - and that won't be very long.
    Some small things to check and some minor adjustments, then we are there.
  • magicallymagically
    edited April 2015
    - Performed test with large file - this time the Real GSA SER Project Folder:

    image

    image

    image

    The real file:

    image

    Showing the file indeed got uploaded:

    image

    Examination of file downloaded from Dropbox:

    image

    This concludes that the new Backup Feature is working!

    *Known Issues or Limitations:

    - During upload, progress is not shown in the progress bar.
    That is because the stream needs to be wrapped (this is a guess)...
    However, to avoid blocking, this has been postponed for now, as more testing is needed.
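The stream-wrapping guess above is indeed the usual approach: wrap the upload stream in a FilterInputStream that counts bytes as the uploader consumes them, and let a listener update the progress bar. The listener interface here is an illustrative assumption, not part of the program:

```java
import java.io.*;

// Counts bytes as they are read, so a progress bar can be updated
// while an uploader drains the stream.
public class ProgressInputStream extends FilterInputStream {
    public interface Listener { void onProgress(long bytesRead, long total); }

    private final long total;
    private long read;
    private final Listener listener;

    public ProgressInputStream(InputStream in, long total, Listener listener) {
        super(in);
        this.total = total;
        this.listener = listener;
    }

    @Override public int read() throws IOException {
        int b = in.read();
        if (b != -1) update(1);
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) update(n);
        return n;
    }

    private void update(long n) {
        read += n;
        listener.onProgress(read, total);
    }
}
```

The upload call would then receive the wrapped stream instead of the raw file stream, with the listener driving the GUI progress bar.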


    Current Status: Complete
    I will compile version 1.3 very soon and release it.

    Expected timeframe: 1-3 days from now.

    - Feel free to make a donation and support the development (receive the program and updates as a donor)
    - Stay tuned - more information to come;)
  • That's OK.
    I will get updates, right?

    Then I wanna buy now..
    Thanks
    PM me. :)
  • magicallymagically
    edited April 2015
    @akwin
    Indeed you will get updates, that is correct buddy:)
    In fact, I think I will let you be the first one to try the new v 1.3 - So expect to get a pm from me later today.

    - Added 'Elapsed Time' in Automatic Backup to DropBox in upcoming V.1.3:
    That will give a better visual indication that the program is indeed active, in addition to the Message Log.

    image
  • magicallymagically
    @akwin PM sent;)

    Scraping Tool-Box v.1.3 has been released!
    - Current donors will receive an update later today - patience please:D


    Important things to notice in terms of using DropBox Auto-Backup:

    1. 
    If you want to make a backup of GSA SER Project Folder please observe the following:
    You will have to select or manually input the location like below, where "Username" should be replaced with your Windows user name:

    C:\Users\"Username"\AppData\Roaming\GSA Search Engine Ranker

    2. 
    You will have to select two buttons, otherwise it will not work properly:
    "Upload" and "GSA Project"

    3. 
    You will need to create a folder on your DropBox named: "Public"
    - Choose that folder as remote destination

    Hope everyone will benefit from this new feature - and see you soon with more features to come on the Article Extractor;)
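The location in step 1 could also be resolved programmatically instead of typing the user name by hand. A sketch assuming the standard Windows APPDATA environment variable; the non-Windows fallback is for illustration only:

```java
import java.nio.file.*;

public class GsaPath {
    // On Windows, APPDATA points at C:\Users\<name>\AppData\Roaming,
    // so the project folder can be built without knowing the user name.
    public static Path projectFolder() {
        String appData = System.getenv("APPDATA");
        if (appData == null) {
            appData = System.getProperty("user.home"); // non-Windows fallback
        }
        return Paths.get(appData, "GSA Search Engine Ranker");
    }
}
```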
  • magicallymagically
    - After the release of version 1.3, the focus will be on 3 things:

    1. The Article Extractor (Will get some additional features)
    2. Some minor fixes and tweaks.

    Number 3 is actually not a part of Scraping Tool-box itself - but something new:

    An experimental add-on for GSA SER - a special add-on that can submit differently than GSA SER itself.

    Codename: Sentinel

    It will be able to 'feed' GSA SER with submitted links; GSA SER will take over and handle the remaining steps.

    That means GSA SER will do the rest and add verified links etc., like it does now.

    As it is experimental, the platforms it can submit to will be limited in the beginning - however, if things work out great, it will be expanded over time.

    More on that topic later, stay tuned;)

