
Keyword Generator - Made in Java - For Scraping

magically
edited April 2015 in Buy / Sell / Trade
Hello everyone :)

Well, I was kind of bored :P

So I made a small tool that can generate UNIQUE keywords - something that can be used for Scrapebox and GScraper...

Nothing fancy - just plain and very simple.

It's still a kind of prototype - more development will follow (more useful tools will be added over time)



image

What it does:

It's a Java program (it can run on multiple platforms) - it will read a large text file, like a book.

Then it will build a UNIQUE keyword list from that book/source file.

No dupes - just unique keywords that can be used for scraping or other stuff...
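For the technically curious - the core idea boils down to a handful of lines of Java. This is only a minimal sketch of the concept (the file paths are hypothetical, and the tool's real code may differ):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Set;
    import java.util.TreeSet;

    public class KeywordGenerator {
        public static void main(String[] args) throws IOException {
            Path source = Path.of("book.txt");          // hypothetical source file
            Path destination = Path.of("keywords.txt"); // hypothetical output file

            Set<String> unique = new TreeSet<>(); // sorted, no dupes
            for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
                for (String word : line.toLowerCase().split("[^a-z]+")) {
                    if (word.length() > 2) { // skip noise like "a" and "an"
                        unique.add(word);
                    }
                }
            }
            Files.write(destination, unique, StandardCharsets.UTF_8);
        }
    }

Reading the whole book into memory is fine for book-sized files; a streaming reader would be the safer choice for very large sources.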

image

As I did spend a few hours making it, a small donation of $5 for each copy will be hugely appreciated.

If you are interested in this small tool (prepare to make a small donation and receive it soonish) - feel free to send me a PM.

I expect the small tool to be ready for final launch about a week from now...

Minor adjustments need to be implemented before the final release ;)

Normally I would never approach a forum and sell something - consider this an exception, an offer for those who are interested.

Comments are welcome of course.

Comments

  • magically
    edited April 2015
    - added Execution Time for the entire operation:

    image

    In this case it took 42 seconds to generate 14,098 unique keywords, based on a book containing 361,961 words.

    That is pretty damn fast, for the record ;)
  • Good job! Keep it up mate! :)
  • magically
    @young_gooner
    Many thanks ;) More is up my sleeve - some additional tools to handle the trivial work and make things easier.

    I was really tired of looking for keywords - so I made this little thingy...

    Surely there will be some "slow" keywords in the generated lists - however, tweaks to handle such things can be implemented later.
  • Kaine
    edited April 2015
    Nice - would it be possible to grab a website's keywords (with the URL)? :)

    Could be good for working on competitors.
  • magically
    @Kaine

    Indeed a good suggestion - it will be added in future updates:
    - Add a list with site URLs
    - Grab keywords from the targets
    - Finally sort and build a final list based on the results.
  • That could be handy. Does it only work with the English language?
  • magically
    @delta_squad
    That was indeed a good question!

    Right now the actual sorting handles English words - however, you are absolutely right in your observation.

    Several different sorting algorithms must be implemented to handle various languages - the user simply selects which sorting to run on the target file.

    Not so complicated to implement - just takes some time to add support for various languages.

    And yes, I will add support for this as well;)


  • Awesome! Right now I'm using furykyle's keyword lists for scraping and I'm not going to run out of keywords any time soon but with this tool you could generate potentially huge amounts of keywords when support for other languages is added. :)
  • magically
    @delta_squad
    You are absolutely correct here - I also used furykyle's keyword lists until recently, as they cover other languages.

    However, it would be a nice addition if we were able to generate our very own keywords 'on the fly' whenever we want to. 

    On top of that, we would reduce the number of people using the very same keywords in their scrapes.

    This tool enables exactly that kind of task ;)
  • magically
    edited April 2015
    @Kaine

    Well, I actually built another prototype (even though my time is limited during Easter)...
    This little prototype demonstrates most parts of your suggestion:

    This demo scans 3 URLs and extracts their keywords.
    Results are printed out on screen - just for test purposes.


    image

    Bear in mind that this demo just runs in a prompt to show the idea...

    I think you can imagine the rest of the story
     
    - Combining those results and then sorting them...
    - Add support for proxies
    - Enhancement to be able to run multi-threaded
    - Etc...

    Could be useful for some to include such a feature in this tool ;)
  • magically
    Just some new updates on topic...

    - I changed the graphical interface a little
    - Prepared support for various sorting algorithms (via user selection)
    - Added Tab Feature (additional tools will be added)
    - Prototype of URL Keyword Extractor prepared (will be added in Tab 2)

    Upcoming Tabs:

    Clean Scrapings:
    - Remove unneeded extensions like .pdf, .xml, .mp3, .chm, .ppt and so on...
    - Ensure unique URLs for GSA

    EDU/GOV Sorter:
    - Sort all URLs - keep only .edu and .gov
    - Keep unique URLs only
    - Remove unneeded extensions like .xml, .pdf etc... (a rough sketch of this cleaning logic follows below)
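    Sketched in plain Java, the cleaning could look something like this (the extension list and file names are illustrative only - not the tool's actual code):

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.LinkedHashSet;
        import java.util.Set;

        public class UrlCleaner {
            // illustrative "junk" extensions
            private static final String[] JUNK = {".pdf", ".xml", ".mp3", ".chm", ".ppt", ".swf"};

            public static void main(String[] args) throws IOException {
                boolean eduGovOnly = false; // the planned EDU/GOV mode switch
                Set<String> unique = new LinkedHashSet<>(); // keeps order, drops duplicate URLs
                for (String raw : Files.readAllLines(Path.of("scraped.txt"))) {
                    String url = raw.trim();
                    if (url.isEmpty() || isJunk(url.toLowerCase())) continue;
                    if (eduGovOnly && !url.toLowerCase().matches(".*\\.(edu|gov)(/.*)?")) continue;
                    unique.add(url);
                }
                Files.write(Path.of("cleaned.txt"), unique);
            }

            private static boolean isJunk(String url) {
                for (String ext : JUNK) if (url.endsWith(ext)) return true;
                return false;
            }
        }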

    image
  • Kaine
    I think you can make a good tool with that ;)
  • magically
    @Kaine
    Many thanks for your kind words :)
    Work is in full progress, and I will update this thread from time to time
  • magically
    Additional features implemented - bear in mind it's still not complete...

    Features Added:
    - Sort scrapings from scrapebox/gscraper
    - Remove all unneeded extensions like .xml, .pdf, .mp3, .swf and more...
    - Keep unique urls only (remove duplicate urls - not domain)
    - Added switch to change sorting algorithm - Mode Normal or Mode EDU/GOV


    The demo below is using 'Mode Normal' - as the actual switch is not complete yet..

    image

    Upcoming Work:

    - Final implementation of Keyword Scraper (Tab 2)
    - Final implementation of the switch for Normal versus EDU/GOV mode
    - Adjustments of GUI
    - Cleanup and test

    Later releases will include:
    - Support for different languages (Keyword Generator - Algorithms)
    - Enhancements of existing functionality
    - Add-ons

    Stay tuned for more information;)

  • magically
    Update:

    - Switch for sorting scrapings, either in "Normal Mode" or "EDU/GOV Mode", fully implemented.
    In both cases, all garbage is removed 'on the fly', leaving just a list of UNIQUE URLS as a result.

    *Garbage = .xml, .pdf, .mp3, .swf and more...
  • magically
    URL Keyword Extractor Update:

    - Enhanced the code and implemented support for 30 threads (default)

    image

    Note: Don't mind the messy output in the picture - it will be tidied up in the graphical part.


    In progress:
    - Implementation of the graphical part of this feature (GUI)
    - Cleaning up
  • Kaine
    edited April 2015
    Does it take the keyword meta or the keywords in the page?

    At this level, maybe add an article web scraper directly too...
  • magically
    @Kaine
    It currently takes the meta keywords from the target page.

    - Support for both could be implemented later too.

    And an article scraper seems interesting to add too ;)
    *Your suggestion has been added to the to-do list.
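    Just to illustrate the idea - grabbing the meta keywords of a single page can be as simple as this in Java, assuming the jsoup library (an assumption on my side; the tool's internals are not published):

        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;

        public class MetaKeywordGrab {
            public static void main(String[] args) throws Exception {
                // hypothetical target URL
                Document doc = Jsoup.connect("https://example.com").get();
                // reads <meta name="keywords" content="..."> of the target page
                String keywords = doc.select("meta[name=keywords]").attr("content");
                System.out.println(keywords);
            }
        }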
  • magically
    Update - Early implementation of the URL Scraper Function in the Graphical Interface!

    image

    Tasks completed:

    image

    Features:
    - Using 30 threads
    - Scraping meta keywords from target URLs (list input)
    - Generates a list with UNIQUE keywords, based on the results
    - Shows log visits of target URLs in real time


    *That's it for now - taking a small break to clear the mind, so I can look at it again with fresh eyes and do some cleaning/adjustments.

    Comments are still very welcome of course - feel free to jump in;)
  • magically
    PS...

    I just took a larger sample to see if it actually works - 1000 random URLs:

    Finished all threads
    unique words : 2734
    total words : 7736
    Destination: C:\SEO2015\WillItWork.txt
    Closing Buffered Writer and finishing...

    image

    Speed was actually fast - exactly as expected;)
  • magically
    edited April 2015
    Update:

    - Adjusted formatting in Log Message

    image

    image

    *Note: Maybe add a function to trim to root - depending on the job...
  • magically
    edited April 2015
    Here is a sample of the speed - using standard footprints and some keywords generated with this Tool:

    Test was done using my home-connection and a laptop.

    image


    I'm quite sure some of you hardcore scrapers are able to pull even higher speeds with some quality footprints... Unfortunately, I'm not that good at making footprints :P

    Actually the speed was still increasing at the moment of this comment: 34,723 URLs per minute and climbing...
  • magically
    edited April 2015
    LOL - Better add the proof for you guys to see....
    image



    *Edit

    A few min later:

    image

    *Edit

    Last one - should settle it:P

    image
  • magically
    Some might wonder - how about the performance in GSA SER?

    Well let's take a look:

    Verified preview:

    image

    Performance (after cleaning the raw scrapings with this Tool)

    image

    Randomly picked message from the GSA SER log:

    image

    And some more verified - just to show scraping did go well (Final Output):

    image

    Conclusion:

    It's possible to use the new tool to generate decent keywords for scraping and to clean lists.

    Feel free to comment;)
  • magically
    - Work in progress:

    1. In the upcoming days, I intend to try to develop a simple Article Scraper that will be added to this tool-box.

    2. Additional enhancements of existing code, and cleanup.
  • magically
    Update

    - Basic implementation of URL Extractor (Small Part of Article Scraper)
    - Demo Only (Just console - Not in GUI yet)

    Extraction of all URLs on a given target:
    image
  • magically
    Update:

    - Prepared MERGE FILES functionality - merge several text files into one text file.

    Demo - Console Window:

    image

    image

    The graphical part, where the user selects files, is easy to implement!

    The above image shows files A.txt + B.txt + C.txt merged into one big file --> Merged.txt

    A great addition to the existing features in this Tool-Box for scrapings.
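    For reference, merging text files is only a few lines of plain Java - a minimal sketch matching the A/B/C demo above (file names taken from the demo; error handling omitted):

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.StandardOpenOption;

        public class MergeFiles {
            public static void main(String[] args) throws IOException {
                Path merged = Path.of("Merged.txt");
                Files.deleteIfExists(merged);
                for (String name : new String[]{"A.txt", "B.txt", "C.txt"}) {
                    // append each source file to the big output file
                    Files.write(merged, Files.readAllBytes(Path.of(name)),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            }
        }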

    Feel free to comment

    More to come...
  • magically
    edited April 2015
    Update:

    For better understanding - I made a quick implementation in the Graphical User Interface:

    image

    Result:

    image
  • magically
    Here is a demo of a large test.

    1. I merged several files from GScraper (Target18.txt)
    2. I used the 'Clean Scrapings' function in Scraping Tool-Box (Target18-Cleaned.txt)

    image

    Ready to load into GSA SER...
  • magically
    edited April 2015

    Update:

    Prepared algorithm to generate various Random Footprints.
    - Handy for the lazy ones - Make your tasks more random

    - Will be added under "Various Tools"

    Preview of footprint generation in console (just a few, for demonstration purposes...):

    image
    The user will be able to select an X number of footprints, which will be generated randomly.
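    The idea in code form - a small sketch of random footprint generation (the template and keyword pools are made up for illustration; the tool's real pools are not shown):

        import java.util.List;
        import java.util.Random;

        public class RandomFootprints {
            public static void main(String[] args) {
                List<String> templates = List.of(
                        "\"powered by wordpress\" %s",
                        "inurl:blog %s",
                        "intitle:forum %s");
                List<String> keywords = List.of("diet", "skin care", "seo");
                Random rnd = new Random();
                int count = 5; // the X number the user selects
                for (int i = 0; i < count; i++) {
                    String footprint = templates.get(rnd.nextInt(templates.size()));
                    String keyword = keywords.get(rnd.nextInt(keywords.size()));
                    System.out.println(String.format(footprint, keyword));
                }
            }
        }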
  • magically
    Update (despite the little interest, the development continues...)
    - Added Compression Functionality under Various Tools
    It will compress multiple files from a folder to a selected destination.
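    In Java this maps naturally onto ZipOutputStream - a minimal sketch with hypothetical paths (not the tool's actual code):

        import java.io.IOException;
        import java.nio.file.DirectoryStream;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.zip.ZipEntry;
        import java.util.zip.ZipOutputStream;

        public class FolderCompressor {
            public static void main(String[] args) throws IOException {
                Path folder = Path.of("C:/SEO2015/lists");  // hypothetical source folder
                Path zip = Path.of("C:/SEO2015/lists.zip"); // hypothetical destination
                try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip));
                     DirectoryStream<Path> files = Files.newDirectoryStream(folder)) {
                    for (Path file : files) {
                        if (!Files.isRegularFile(file)) continue;
                        out.putNextEntry(new ZipEntry(file.getFileName().toString()));
                        Files.copy(file, out); // stream the file into the zip entry
                        out.closeEntry();
                    }
                }
            }
        }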

    Example:

    image
    Inside the new Compressed File:

    image

    Will possibly add:
    - Decompression of files
    - Progressbar for all tasks under 'Various Tools'



  • magically
    edited April 2015
    Update:

    After some serious coding - Additional language support for the Keyword Generator part is now possible!

    Here is a raw demo:

    Source - Thai
    image



    Result - Unique Keywords Thai:
    image

    Once I have some spare time - I will implement support of various languages in the Graphical Part of the Tool Box.

    In Theory - it should be possible to cover plenty of various languages;)

    That means that 'the switches', i.e. the radio buttons, will control which language is used when generating keywords.
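    One plausible way to do this per language (a sketch only - I don't know the tool's actual algorithms) is to tokenize by Unicode block, so each radio button maps to a different pattern:

        import java.util.Set;
        import java.util.TreeSet;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class ScriptTokenizer {
            public static void main(String[] args) {
                // \p{IsThai} matches the Thai Unicode block; other blocks
                // (IsGreek, IsHiragana, ...) would back the other buttons.
                Pattern thaiWords = Pattern.compile("[\\p{IsThai}]+");
                String source = "สวัสดี โลก สวัสดี"; // tiny stand-in for a Thai book
                Set<String> unique = new TreeSet<>();
                Matcher m = thaiWords.matcher(source);
                while (m.find()) unique.add(m.group());
                unique.forEach(System.out::println); // prints each word once
            }
        }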
  • magically

    Test and preview of Japanese Language:


    image


    Result:

    image


    All good:)

    Actual implementation in the GUI (Graphical User Interface) will be handled ASAP...

    Next update should show generation from the actual program - so stay tuned;)
  • Very nice. PM me your PayPal :)
  • magically
    @spammasta
    Many thanks for your support:)

    Once it's ready for launch - I will shoot you a PM ;)

    Some minor things still need to be adjusted before its first release.

    However, we are getting close to a first release - give it a couple more days...

    The development will continue after the first release, meaning more features will be added and potential bugs will be fixed.
  • magically
    Update - Language Sorting Algorithms implemented in the GUI

    1. Demo: Thai Selection

    image

    Result:
    image
  • magically
    Update - Language Sorting Algorithms implemented in the GUI

    1. Demo: English Selection

    image

    Result:
    image
  • magically
    Update - Language Sorting Algorithms implemented in the GUI

    1. Demo: Portuguese Selection
    image

    image

    Result:
    image
  • magically
    That means the following:
    - Support for:
    English, Greek, Portuguese, Japanese and Thai implemented and working!

    Final adjustments before first release:
    Before the first version is released, some minor adjustments and bugs need to be fixed.
    No additional features will be added before release - only fixing and cleaning.
    - Expect a few days from now.

    Thread will be updated here, once first release is ready;)
    The development will continue - as a donor you are supporting the future of this software.
  • magically
    Well I couldn't resist it:P

    - Added one more multi-feature before I begin to clean up:

    image

    Clicking on the monitor will result in a pop-up showing CPU usage etc...

    image

    This new status bar will become super handy in terms of future updates!
    It will replace several progress bars and make the overall interface more streamlined...

    Future updates of the program will use this new status bar globally and notify about running tasks...
  • magically
    Initiated Cleanup and Preparation of first official launch.

    Completed tasks:
    - Added specific icon in the GUI
    - Added needed confirmation dialogs
    - Cleaned up file selectors (all now show .txt files by default)

    image


    TO DO before launch:
    - Change some listeners and their respective evaluations
    - Add program update log file
    - Adjust GUI
    - Compile final program and launch

    Expected timeframe:
    3-5 days
  • magically
    edited April 2015
    A few to-dos completed:

    image

    Change Log:

    image

    - Made a stress test with a large file (worked)
    - Adjusted a few listeners (still some to go)

    image

    - Added Multi-Functionality in 'Latin' sorting algorithm - Now it covers more languages

    To-Do:
    - Fix remaining listeners
    - Adjust GUI
    - Compile and Launch
  • Could you make a video for this scraper?
  • magically
    edited April 2015
    @redfoxseo

    Sure, however please notice the program will feature 2 different kinds of scrapers:

    1. The one you see present now ---> 'URL Keyword Scraper'

    2. Upcoming feature 'Article Scraper' ---> not present at the moment.

    The current one - i.e. the 'URL Keyword Scraper' - takes a list with an x amount of target links.

    It will then visit each target and scrape the meta keywords.

    During the process of visiting the site targets, 30 threads are used to ensure speed.

    Those meta keywords are added to an internal list...

    Finally, before writing out the result list, it removes all duplicates and ensures only unique keywords are left in the final list.
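    In Java terms the flow looks roughly like this (a sketch assuming jsoup and a fixed thread pool - the shipped code may differ):

        import org.jsoup.Jsoup;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.List;
        import java.util.Set;
        import java.util.TreeSet;
        import java.util.concurrent.*;

        public class UrlKeywordScraper {
            public static void main(String[] args) throws Exception {
                List<String> targets = Files.readAllLines(Path.of("targets.txt"));
                Set<String> unique = ConcurrentHashMap.newKeySet(); // internal list, no dupes
                ExecutorService pool = Executors.newFixedThreadPool(30); // the 30 threads
                for (String url : targets) {
                    pool.submit(() -> {
                        try {
                            String kw = Jsoup.connect(url).timeout(10_000).get()
                                    .select("meta[name=keywords]").attr("content");
                            for (String k : kw.toLowerCase().split("\\s*,\\s*"))
                                if (!k.isBlank()) unique.add(k.trim());
                        } catch (Exception ignored) { /* dead or slow target */ }
                    });
                }
                pool.shutdown();
                pool.awaitTermination(30, TimeUnit.MINUTES); // "cleaning up" the threads
                Files.write(Path.of("keywords.txt"), new TreeSet<>(unique)); // sorted, unique
            }
        }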

    The other feature - the non-present one (Article Scraper) - is still under development, and it will be added to the Tool-Box once it is ready ;)

    In both cases a small video demonstration is optimal to show what is going on... and I will add them later ;)

  • magically
    edited April 2015
    Update:


    Prior to release, Registration Protection has been added:

    image

    The activated program starts up - and will no longer ask for input on the next startup:

    image

    This is to ensure that the program stays among donors only.

    That leaves very little left to do before it's released:)
  • magically
    Update:

    - Enhanced URL Keyword Scraper, so it shows:
    Processed targets in real time (green)
    Loaded targets (blue)

    image

    Compiled the program and tested it on a separate machine:
    - Shows how it will be distributed as *FINAL*

    image

    Registration and execution both went flawlessly :)

    There are 2 minor issues left that I would like to fix before calling it ready to release...

    So once I have those 2 fixed, it's ready for you guys to try out :)

  • PM me your PayPal.
  • magically
    @gsasurfer
    Thank you for your support:)

    Will pm you very soon - stay tuned;)
  • magically
    - Added use of a progress bar in the URL Keyword Scraper:

    image

    That gives a good visual of processed target URLs:
    1. Real-time view of processed targets (green)
    2. Progress bar --> overall progress

    That leaves only one more tiny thing to fix before I release it...
  • magically
    And here is a sample with more targets loaded:

    image

    Processing fast - using 30 threads...
  • magically
    All right - Ladies & Gentlemen :)

    Before I announce that Scraping Tool-Box 1.0 is ready to launch today, I would like to share some important information:

    1. This software is not sold - it is released to interested donors only (to support development).
    2. Version 1.0 is far from finished - it is a process, where version 1.0 is the initial release.
    3. Those interested may make a donation of $5 via PayPal - in return they receive the following:

    A. The current final compiled version, i.e. Scraping Tool-Box 1.0.
    B. Serial + activation key
    C. As a donor you are entitled to receive future updates of the program.
    - Make sure you inform PayPal that this is a DONATION - otherwise the payment will be rejected.


    FAQ:
    Q: Why is the URL Keyword Scraper sometimes really slow when approaching the last targets?
    A: Some sites take longer to load or respond than others. Threads are also being cleaned up.

    Q: Why is there no decompress feature?
    A: Well, it's not finished. More features, including this one, will come over time.

    Q: I can't start the application.
    A: Make sure you have the latest Java installed on your system, please visit this site:

    *Known potential bugs:
    If Notepad is installed in a different location than the default installation settings.
    Fix: Contact me, and I will remove and recompile the application.

    Last Fixes and Improvements:
    - Changed the GUI a little:

    image

    Additional Information:

    I will contact those who have contacted me via PM or in this thread.
    Please be patient - time constraints and real-life issues can slow the processing :D

    Scraping Tool-Box 1.0 is ready to release later today, and I will start contacting interested parties ASAP.
    Stay tuned - it's coming today ;)
  • Can you make a video as well, to show how it works?
  • magically
    @redfoxseo

    I will try my best and add some later:)

    In the meantime, here is a book to get you started:


    1. Download the book to a location on your harddrive.
    2. Start the program and activate the tab: Keyword Generator
    3. Select the book above as source
    4. Select destination (where the final list should be stored)
    5. Select 'English' as sorting algorithm
    6. Press 'Go' button

    - A few seconds or minutes later, depending on your machine, you will have a new fresh list with unique keywords.
  • magically
    edited April 2015
    @redfoxseo


    Demo of Keyword Generator

    More videos will follow - they're quite time-consuming to make, and I'm not a video guru :P


  • lol.. I don't have the time to read my own history... :))
  • magically
    Video 2


    I will try to add additional videos - however, it's time to contact some people soon and release it :)
    The best way is to try it out ;)

    Please note:
    Some sites are down, slow etc... it will influence the result of course...
    On top of that, threads must also be cleaned up.

    This video should give you an idea about how it works though;)
  • I understood what you are trying to say and it is related to the software - is there any trial version of this software?
  • magically
    Sorry, wrong video - one more try:

  • magically
    @redfoxseo

    Hmm... sorry, no trials, as the program is distributed via donations.

    In other words:

    $5 makes you a donor, and that entitles you to the program and updates ;)

    The donation itself is only to support the future development of the application.

    So - it's not sold; actually it's free and driven by its donors.
  • magically
    PMs have been sent out to those who either asked for it in this thread or via PM.

    Everyone is still welcome to contact me via PM or here in this thread.

    As stated before - this is just the beginning! 
    The development will continue, and more features will be added over time.

  • magically
    edited April 2015
    All right - once the first people have got the program and hopefully have activated it successfully, we will first approach potential bugs and try to correct these...

    - The next step is to enhance tab 4 'Various Tools' and add more features + enable the progress bar.
    Then we move on and create the article scraper (some parts are done already).

    I also have a plan to develop a 'headless submitter' - i.e. a feature to post on some selected targets.
    Why? Well, I think it's possible to approach some platforms differently than GSA SER does.
    We'll do some testing 'on the side' and figure out if this is indeed possible.

    This is a process, an adventure - and over time I think it will be possible to create something really cool ;)

  • magically
    edited April 2015
    Ohh I forgot in the previous message...

    Here is another Book in Spanish to get you going:


    NB: Make sure to select the proper sorting algorithm ----> Latin, before hitting the 'Go' button

    image
  • magically
    edited April 2015
    Work in progress for next release:

    - Split file (large file)

    I have prepared the code for 'Split File' - which will be added as a feature in the next release.

    Early raw demo - without implementation in the GUI:

    In this sample a file of 349 MB is split into 4 parts. The source file could have been significantly larger - however, this is just a sample/demo.
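    The underlying idea, sketched in Java (byte-based, so a part boundary can land mid-line; the paths and part count are illustrative):

        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStream;
        import java.nio.file.Files;
        import java.nio.file.Path;

        public class FileSplitter {
            public static void main(String[] args) throws IOException {
                Path source = Path.of("bigfile.txt"); // e.g. the 349 MB sample
                int parts = 4;
                long partSize = Files.size(source) / parts + 1;
                byte[] buf = new byte[8192];
                try (InputStream in = Files.newInputStream(source)) {
                    for (int p = 1; p <= parts; p++) {
                        long written = 0;
                        try (OutputStream out = Files.newOutputStream(Path.of("part" + p + ".txt"))) {
                            int n;
                            while (written < partSize && (n = in.read(buf, 0,
                                    (int) Math.min(buf.length, partSize - written))) != -1) {
                                out.write(buf, 0, n);
                                written += n;
                            }
                        }
                    }
                }
            }
        }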

    image

    image



  • magically
    Article Scraping - Sources

    I would appreciate it if someone could mention a list of some good sources, besides ezinearticles.

    A list of 10-15 sources would be a good starting point!

    I need some sources to work on ;)
  • Kaine
    edited April 2015
    I think it's better if you look for website articles, not directories. Too many users/duplicates, time after time.

    Once you see what is good, simply share the footprints used to find good sources for your software :)
  • magically
    edited April 2015
    @Kaine
    Okay, let me see if I understand you correctly...

    What you are saying is that the 'traditional' way of 'getting articles' is overused, resulting in way too similar content and duplicates... Also because most scrapers are using the same sources...

    So, in order to avoid making the same mistakes, a new approach is needed.

    How?

    1.
    - By feeding the program with 'special footprints', combined with 'targeted keywords', avoiding directories.

    That would involve making use of a search engine like Google or some other search engine...
    Grab the results and filter out the 'bad directories'.
    Scrape and deliver the content after removing the source URL and copyright stuff...

    Example searching Google with simple footprints:

    image

    Perhaps adding some additional filters, like defining a required length for the article content etc...

    Semi-automatic:

    *The above results could be listed as 'clickable' in a panel, and if the content is okay, the user can select it [x]

    When all targets are selected - finally scrape everything and write out the articles...

    *Perhaps run different sequences with different footprints and keywords, and present the search results prior to scraping everything...


    2. Other suggestions or strategies are welcome
     - Please add suggestions and strategies ;)
  • Kaine
    edited April 2015
    magically 

    I mean, find websites where your software can scrape articles easily.

    Example: <article> .... </article> in HTML5.

    Then find good footprints to locate those on the web, and users scrape those URLs to insert into the software.

    For example, the footprints of Wicked Article Creator (not good directories):

    site:goarticles.com + 
    site:ezinemark.com + 
    site:examiner.com + 
    site:voices.yahoo.com + 
    site:articlebiz.com + 
    site:articletrader.com + 
    site:a1articles.com + 
    site:articlesnatch.com + 
    site:pubarticles.com + 
    site:articlealley.com + 
    site:ezinearticles.com + 
    site:buzzle.com + 
    site:selfgrowth.com + 
    site:brighthub.com + 
    site:suite101.com + 
    site:isnare.com + 
    site:articlecity.com + 
    site:articlerich.com + 
    site:ideamarketers.com + 
    site:articleslash.com + 
    site:articlepool.com + 
    site:abcarticledirectory.com + 
    site:searcharticles.net + 
    site:streetarticles.com + 
    site:articlealley.com
    site:articlecube.com + 
    site:sooperarticles.com + 
    site:bukisa.com + 
    site:infobarrel.com + 
    site:gather.com + 
    site:isnare.com + 


    Maybe:

    site:wordpress.com +
    site:blogger.com +
    .....


    can return good results.
  • magically
    @Kaine
    Hmm... I think we are talking about the same thing :D

    If you look at the image above, you will see some text highlighted in green: "blog" "skin care"
    That would be the search term = footprint + keyword

    It can actually handle your suggested footprint: site:wordpress.com + diet

    --> Example: If 5 footprints + keywords are given, it will repeat the search with all footprints + keywords

    The real question would be whether the user should have a chance to view the article and, if it's found good --> select it.

    For instance, if 25 results are shown, the user finds 10 suitable and selects those.
    It will then do the job, scrape the articles and output the text files.

    - Or should that process be 100% automatic?
    -----------------------------------------------------------------------------------------------------------------------------------------

    Different Option:


    Perhaps you are suggesting simply feeding the scraper with URLs you have found up front?

    So, you would ask the scraper to load a list with targets, and simply scrape those?
    Meaning if you feed it with 50 URLs - it will scrape these and deliver the articles as output.

  • Kaine
    edited April 2015
    magically 

    "Perhaps you are suggesting to simply feed the scraper with urls you have found up front?

    So, you would ask the scraper to load a list with targets, and simply scrape those?
    Meaning if you feed it with 50 urls - it will scrape these and deliver the articles as output"

    Yes exactly, soft only visit urls and grab article, no scrape urls.
  • magically
    Yep, I think I get your point now :D

    Most users know how to make a decent footprint, so they can also do their own searching in Google...

    So here is the scenario:

    1. Users make their own target list (using their own footprints and keywords), example: site:wordpress.com + diet
    2. When they have collected enough information - they make the list.
    3. They load their list into the program.
    4. The program extracts all articles and writes out text files.

    In other words:

    The program must have a feature to do the following:

    - Load targets (referencing articles, of course)
    - Extract the article for each loaded URL
    - Write the article to a file, for each URL - without source information and copyright info.

    Is that correct?
  • magically
    edited April 2015
    Update - Work in progress for the next release:

    - Implementation of 'SPLIT FILE' in the GUI (**see above, the code is ready)

    - Prepared code for 'REMOVE DUPLICATE DOMAINS' (will be implemented in the GUI)

    Obviously GSA SER does not calculate it correctly... :D

    Scraping Tool-Box removes all junk files (.pdf, .xml, .chm etc) + removes duplicate domains in this new algorithm.

    image

    That will give users an opportunity to keep only UNIQUE URLS or remove duplicate domains.
     
    Comparison of cases:

    The source file contained 4,615,209 URLs

    Remaining targets left: 724,826 (Scraping Tool-Box)
    Remaining targets left: 713,554 (GSA SER)

    How come GSA SER has fewer results, considering it doesn't remove junk files during dedupe?
    @Sven Could it be a bug :))


    -Preparation of Article Scraper and Implementation
  • magically
    @Kaine

    Here is a very basic prototype of the article extractor:

    image

    Program was loaded with target:

    The site was visited and, during the visit, the article was extracted.
    Finally, it was printed out (to the console, for demonstration).
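    A bare-bones version of that extraction step in Java, assuming jsoup (my assumption - the prototype's code is not shown); it prefers the HTML5 <article> tag Kaine mentioned and falls back to the page body:

        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.select.Elements;

        public class ArticleExtract {
            public static void main(String[] args) throws Exception {
                // hypothetical target - real runs take a user-supplied list
                Document doc = Jsoup.connect("https://example.wordpress.com/some-post").get();
                Elements articles = doc.select("article");
                String text = articles.isEmpty() ? doc.body().text() : articles.first().text();
                System.out.println(text); // console output, as in the demo
            }
        }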

    I hope this is something like what you had in mind?
  • Kaine
    edited April 2015
    magically 

    Yes, that's it :) Do you think it's possible to launch multiple pages at the same time and scrape them after? (to avoid waiting on loading times)
  • magically
    @Kaine

    Great to hear:D

    Well, as you will feed the program with a 'known list of targets' - there is no need to open any browser during the process. 

    It will work similarly to the 'URL Keyword Scraper' - using multiple threads to extract the articles.

    That will speed up the process considerably.

    During the weekend I plan to do some more testing and coding of this feature, and as usual I will update the thread with the implementation, test results etc.
  • Thanks to @Kaine for pointing me at this thread, I hadn't seen it before. 

    @magically good work dude, this looks like a very handy tool.
  • magically
    @JudderMan
    Many thanks for your kind words, really appreciated:)
    - And indeed thanks to @Kaine as well for support, ideas and feedback.
  • magically
    -Work in progress:

    Early preview of upcoming new feature - Split File

    image

    Not completed yet, still some heavy coding left to do....

    Once this feature is fully implemented, the work of the article scraper will be initiated.
  • Can't wait to get this software. SHOW YOUR MAGIC...  :((
  • magically
    edited April 2015
    - Completed - Split File (Included in next release...)

    image

    Ability to select various units:

    image

    Selection of target file that needs to be split:

    image

    File size calculation is done 'on the fly'...

    image

    Process Initiated:

    image

    Task Completed:

    image

    Result:

    image

    Moving on to the next feature - I plan to start on it during the weekend (if time allows me to :D):
    - Work in Progress: Article Extractor
    Stay tuned ;)
  • Kaine
    It's the best feature for me ^^
  • magically
    @Kaine
    Indeed buddy;)

    Very early GUI MOCK-UP (it can still change a lot)

    image

    Will see if I can get some time during the weekend to write the code and enhance the GUI here...
    Stay tuned for progress and updates during the weekend and the upcoming week.
  • magically
    - Added Detection of File Encoding Type (Under Various Tools Tab)
    It will detect which encoding a text file is using 'on the fly' - really blazingly fast!

    It can detect the following encoding types:
    Chinese:
    ISO-2022-CN
    BIG5
    EUC-TW
    GB18030

    Cyrillic:
    ISO-8859-5
    KOI8-R
    WINDOWS-1251
    MACCYRILLIC
    IBM866
    IBM855

    Greek:
    ISO-8859-7
    WINDOWS-1253

    Hebrew:
    ISO-8859-8
    WINDOWS-1255

    Japanese:
    ISO-2022-JP
    SHIFT_JIS
    EUC-JP

    Korean:
    ISO-2022-KR
    EUC-KR

    Unicode:
    UTF-8
    UTF-16BE / UTF-16LE

    Others:
    WINDOWS-1252
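    That list of encodings happens to match what the juniversalchardet library supports, so here is a sketch assuming that library (an assumption on my part - the tool's internals are not published):

        import org.mozilla.universalchardet.UniversalDetector;
        import java.io.FileInputStream;

        public class EncodingDetect {
            public static void main(String[] args) throws Exception {
                UniversalDetector detector = new UniversalDetector(null);
                byte[] buf = new byte[4096];
                try (FileInputStream in = new FileInputStream("source.txt")) {
                    int n;
                    while ((n = in.read(buf)) > 0 && !detector.isDone()) {
                        detector.handleData(buf, 0, n); // feed bytes until confident
                    }
                }
                detector.dataEnd();
                System.out.println(detector.getDetectedCharset()); // e.g. UTF-8, or null
            }
        }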

    image

  • magically
    Update - Almost completed the Article Extractor

    A small demo:

    1. We find 3 random targets using this footprint: site:wordpress.com + skincare

    In this case, the following were picked:
    Let's hit start and see what happens...

    image

    Now we look at the destination folder:

    image
    Indeed, 3 articles have been extracted and generated :D

    Sample from article2:
    image


    To Do before release of Scraping Tool-Box 1.2

    - Minor adjustments in the GUI
    - Implementation of other minor stuff
    - Compile the program
    - Launch;)

    Expected timeframe for version 1.2:
    5-7 days

  • magically
    - Added Korean Language Support for Keyword Generator

    image

    As I can't read Korean - here is a translation:
    image


  • magically
    - Adjusted logfile:

    image
  • magically
    - Adjusted the 'Article Extractor' GUI even further:

    image

    - Please see the previous entry - Article Scraper completed
    Just needs very small fixes - and it's done;)
  • magically
    - Fixed formatting issues in 'URL Keyword Scraper'

    image

    Still to do, before release of version 1.2:
    - Minor adjustments & Enhancements
  • Kaine
    edited April 2015

    And does it clean the copyright? Maybe it would be good if it could also delete the time the article was posted, with the possibility to change the URLs in the article.
  • magically
    edited April 2015
    @Kaine
    hehehe:P

    Well, that will be added in v. 1.21

    Simply because I need some feedback on how it works on a large amount of targets...
    And there is 1 more thing to consider too - before adding this last feature to the article extractor.

    Performance - depending on the number of targets, removing various things before writing the text files could take some time. However, it can be done ;)


    I just need to see how it works for you guys in 'real life' first, before adding advanced 'tweaks' :D

    So, I suggest finishing up the remaining stuff and simply releasing v. 1.2 for you guys to try, then we take it from there...

    PS:
    If there happens to be a URL present in some of the articles - it's not complicated to generate a random URL as a replacement ;)

    Pick one randomly ----> now replace the existing one with the random one... (easy to implement later)


    PPS:

    I also hope to see more guys interested here - after all, we are all in the same boat, so why not help each other ;)
  • magically
    - Prepared code to handle replacement of existing URLs in scraped text, before making the final text files.

    Here is a little demo (please note - it will not be implemented before version 1.21)

    Scraped demo text:

    When we entered our core range of Aurelia Probiotic Skincare products in to the<p><a href='http://www.aureliaskincare.com/aurelia-tv/'><b>example</b></a> link.</p>Bible to be tested only a few weeks after our launch in January 2013, we could only dream of seeing one of our products in the final published book.

    image

    The demo shows: the existing URL is replaced with "http://www.SomeUrl.com/"

    Question: Will it work on any URL?
    Answer: Most likely not - however, it will cover and handle quite a lot ;)
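    For the curious, the replacement can be done with a regular expression - a minimal sketch (the pattern only covers href attributes, matching the demo above; quite naive compared to whatever the tool will ship):

        import java.util.List;
        import java.util.Random;
        import java.util.regex.Pattern;

        public class UrlReplacer {
            private static final Pattern HREF =
                    Pattern.compile("href\\s*=\\s*['\"][^'\"]*['\"]", Pattern.CASE_INSENSITIVE);

            public static void main(String[] args) {
                List<String> myLinks = List.of("http://www.SomeUrl.com/"); // user-supplied list
                Random rnd = new Random();
                String article = "<p><a href='http://www.aureliaskincare.com/aurelia-tv/'>example</a></p>";
                String replaced = HREF.matcher(article).replaceAll(
                        m -> "href='" + myLinks.get(rnd.nextInt(myLinks.size())) + "'");
                System.out.println(replaced); // original link swapped for one of ours
            }
        }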
  • Hi Magically

    How to get your software? PM sent. Still waiting for your response.
  • magically
    @zuluranger
    Hmm.. strange, I didn't get any PM.
    I will send you a PM now with the information ;)
  • Kaine
    magically 

    Very nice - how will the new version be delivered?
  • magically
    @Kaine

    Once version 1.2 is ready I will send a pm to everyone with the new software and a download link.

    Everyone should then download the new version, delete the old one and replace it with the new one.
    - Activation should not be necessary unless it's the first time you use the software.

    Expect version 1.2 out very soon, just need to adjust a few things.

    (Version 1.21 will include tweaks to the article scraper.)
  • Kaine
    OK, I'll wait for 1.21 :)

  • magically
    @Kaine

    No need to wait buddy - I need feedback first from version 1.2.
    Please test the article extractor with at least 50-100 URLs that you yourself have located up front.
    - I need to see how it goes for you guys first, before we add the remaining stuff, like copyright removal and URL replacement.

    Update - Version 1.2 will be released today:


    image

    Changelog:

    image
    Current donors:

    Patience:D I will send a pm to you guys with the new release.


    Everyone else:
    Please consider joining this adventure, as the development is based purely on interest, support and donations. I don't make money on this project - actually it cannot even pay for the electricity ;)

  • magically
    @Scraping Tool-Box Donors

    PMs have been sent out with the new release - enjoy and have fun :D
  • Kaine
    edited April 2015

    I have downloaded it and all is OK. Just before testing the article extractor - did you use a special footprint to scrape URLs, like: site:wordpress.com + diet?

    EDIT

    OK, just played with it for 2 min and I see you can push more threads (see 30), but that eats memory. To avoid that, write directly to the hard disk.

    I must quit the soft if I want to stop work; maybe one button for that would be good :)

    EDIT

    Tested with a footprint like site:wordpress.com + OTHER WORD.
    Scraping is very quick, and in 2 min 590 unique URLs were done.

    Of the 590, I got 525 articles downloaded (very good).

    Of those 525 articles, I have approx 14 articles like this: http://www60.zippyshare.com/v/Ob14Syzd/file.html
  • magically
    edited April 2015
    @Kaine

    Great to hear it worked fine to upgrade to the new version :)

    Of course I knew it would lead to issues and problems - that is why I have delayed the rest of the features, like copyright removal and URL replacement :P

    Let's break it down:

    1. The Article Extractor does not use any footprints - it completely relies on the target URLs the user loads into the program.

    - The point is that the user himself needs to do some research up front, do a manual search in Google using various footprints, and then select the good ones...

    That could be done automatically too - but it's not implemented.
    Also note that this would lead to poor quality, as the program won't care whether an article is 'good' or 'bad'...

    2. Threads
    Yep, you are right here - the program is currently set to use all 30 threads by default.
    Of course I knew that as well ;)
    It needs to count the number of targets first:
    - If 5 URLs are loaded - 1 thread would be enough
    - 100 URLs - 10 threads would do
    And so on...
    Not a big issue really - and easy to implement.

    3. Stop button
    Indeed - there is no stop button (yet :P)
    - Also no button for loading in replacement URLs

    As I said - Those features will come in version 1.21;)

    The important thing here was to test whether the 'Article Extractor' does indeed work in real life.
    And as far as I can see - it does exactly what it is supposed to do (ignoring the features below).

    To sum up:
    - Balance thread use
    - Stop button
    - Load replacement URLs
    - Implementation of URL replacement, copyright removal etc...

    *Edit
    In terms of 'strange results', like text files with nonsense: it will fail on some targets (different encodings and stuff).
    However, I think most will work, and your test result of 525 articles out of 590 seems decent.
  • Kaine
    edited April 2015

    Yes, I scraped site:wordpress.com + OTHER WORD with GScraper.
    For me the result is good; your soft scrapes articles very fast and the copyright seems to be removed :)
    Maybe removing emails would be good too.
  • magically
    @Kaine
    Awesome to hear buddy:)

    The remaining 'tweaks' will be added in the upcoming release + some other enhancements/features.

    For now, I just wanted to see how the Article Extractor performed 'raw', with default settings.

    I think you will see that the next release has the remaining stuff you are looking for - at least I will give it a try ;)

    Hope some other guys hanging around here on the forum will also discover this software...
    - It's 'hidden' in the sales section where many users don't look so much.
  • magically
    edited April 2015
    - Added to the to-do list:

    Implementation of Automatic Backup to DropBox

    - Will add a timer to handle the task. The user can select files via the GUI.
    Upload will be done automatically to DropBox
    (Developer note: a token must be created to prevent re-authentication)

    Examples could be: identified list, verified list etc...

    Simple demo of authentication in the console:

    image

    image

    Note: However, the Article Extractor features must be completed first + some other tweaks and enhancements.
  • magically
    edited April 2015
    - Very early and raw prototype of DropBox Connect:

    image
  • magically
    edited April 2015
    Small test that it is possible to get a connection to DropBox
    - Using a proper DropBox access code
    - Sensitive information is scrambled

    image


    TO DO:

    As we are now able to establish a connection to DropBox, some features need to be implemented:
    1. Upload of zip file
    2. Browsing feature to see the files
    3. Download of zip file

    - When this is implemented, a special function will be created to handle compressing of GSA SER project files.

    - A timer will handle the upload of GSA project files to DropBox - completely automatic.

    Please note: This is an early prototype - more to come...
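    For orientation only: with today's official Dropbox Java SDK (v2 - not what was available back then, so this is purely a sketch, not the code used here), the token-based upload looks like this:

        import com.dropbox.core.DbxRequestConfig;
        import com.dropbox.core.v2.DbxClientV2;
        import com.dropbox.core.v2.files.FileMetadata;
        import java.io.FileInputStream;
        import java.io.InputStream;

        public class DropBoxUpload {
            public static void main(String[] args) throws Exception {
                String token = "ACCESS-TOKEN"; // the stored token that avoids re-authentication
                DbxRequestConfig config = DbxRequestConfig.newBuilder("ScrapingToolBox").build();
                DbxClientV2 client = new DbxClientV2(config, token);
                try (InputStream in = new FileInputStream("Backup.zip")) {
                    FileMetadata meta = client.files()
                            .uploadBuilder("/Public/Backup.zip") // remote destination folder
                            .uploadAndFinish(in);
                    System.out.println("Uploaded: " + meta.getPathDisplay());
                }
            }
        }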
  • magically
    - GUI MOCK-UP of DropBox Auto Backup (prototype):

    image

    The DropBox Connect button initiates the image above and establishes the connection.

    I will now try to implement the functionality mentioned above.
  • magically
    Update DropBox Auto Backup:
    - Implemented Browse and select Source
    - Implemented Browse Destination (Directly browse Dropbox folders)

    A sample - shows a connected DropBox and a treeview with folders:

    image
  • magically
    WOOOHOOO;)


    image

    So - that means the following:
    - A method to pack the GSA SER project must be made
    - A timer to handle uploads must be made...

    Once those are made and tested, Scraping Tool-Box will be able to auto back up GSA SER projects ;)
  • magically
    edited April 2015
    - Prepared function to compress the entire GSA SER Project folder:

    image

    image

    Getting a little bit tired right now - so taking a break before making the rest :P

    However, we are close to a final working solution for automatic backups...

    Strange to see so little interest, considering how many were asking for such a feature.

    Known Issues:
    - Developer notes:
    - Max 100 users (more requires a public release via DropBox)
    - The file extension handling must be changed further (Zip/Rar/.SL) - which would probably mean a new API key
  • magically
    edited April 2015
    - Added Timer (Still not 100% complete)

    NB: Ignore the file that gets uploaded - it's just for demonstration.

    The interesting part is that the 'Timer' is activated and doing some compression in the background...
    Preparing the real file to upload;)

    Notice the Message Log:

    image

    Still a few things to do before a test can be done...
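    The timer part itself is standard Java - a sketch with a ScheduledExecutorService (the real implementation may use something else):

        import java.util.concurrent.Executors;
        import java.util.concurrent.ScheduledExecutorService;
        import java.util.concurrent.TimeUnit;

        public class BackupTimer {
            public static void main(String[] args) {
                ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
                timer.scheduleAtFixedRate(() -> {
                    // 1. compress the selected folder in the background
                    // 2. upload the resulting zip to DropBox
                    System.out.println("Backup tick - compress and upload...");
                }, 0, 5, TimeUnit.MINUTES); // e.g. a 5-minute interval
            }
        }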
  • magically
    edited April 2015
    - Added creation of internal storage for compression of the GSA SER project
    - Added switch to handle a specific backup of the GSA SER project only
    - Added retrieval of the GSA SER project folders

    image

    Hmm.. that leaves only a few things left to handle properly:

    - Adjustment of the timer-selection interval
    - Upload of the zip file (since it's different from txt files)
    - Minor adjustments and tweaks of the GUI

    In other words - almost complete already ;)

    Actually fucking awesome, to say the least :P
    Feel free to join the adventure anytime.
  • Wow great job! You took it upon yourself to create this essential function. I barely had time to react because you're fast. Thanks for giving it a shot.
  • magically
    edited April 2015
    @zinne
    Many thanks buddy;)

    Here we go - just finished the actual upload function - and the sucker works:D

    First I created a new DropBox folder - notice the folder name, that is very important:
    image

    Next I started the backup feature, using a considerably smaller folder to fake the process.
    I selected the ImageBurn folder - just for testing; it could have been the GSA SER project folder:

    image

    I specifically chose the Public folder I created on DropBox...

    The timer is now active and the program executes every 5 minutes, to test if files get uploaded:

    image


    BINGO!!!! It's working flawlessly ;)

    image


    Only a couple of things left to do:
    - Adjustments of the GUI and enabling selection of the 'Backup Interval'
    - Make a test with a larger 'Backup Interval' to ensure it performs correctly

    That leads to an upcoming release of Scraping Tool-Box 1.3, with Automatic Backup to DropBox...
    Isn't that just cool???
  • please PM me paypal.. wanna buy 
    :)
    thanks 
  • magically
    @akwin

    Awesome buddy:)

    Many thanks for your support - Hugely appreciated.
    I will send you a PM as soon as 1.3 is ready for release (unless you want 1.2 right now) - and that won't be very long.
    Some small things to check and some minor adjustments, then we are there.
  • magically
    edited April 2015
    - Performed test with large file - this time the Real GSA SER Project Folder:

    image

    The real file:

    image

    Showing the file indeed got uploaded:

    image

    Examination of file downloaded from Dropbox:

    image

    This concludes that the new Backup Feature is working!

    *Known Issues or Limitations:

    - During upload, progress is not shown in the progress bar.
    That is because the stream needs to be wrapped (this is a guess)...
    However, to avoid blocking, this has been postponed for now, as more testing is needed.


    Current Status: Complete
    I will compile version 1.3 very soon and release it.

    Expected timeframe: 1-3 days from now.

    - Feel free to make a donation and support the development (receive the program and updates as a donor)
    - Stay tuned - more information to come;)
  • That's OK.
    I will get updates, right?

    Then I wanna buy now..
    Thanks
    PM me. :)
  • magically
    edited April 2015
    @akwin
    Indeed you will get updates, that is correct buddy:)
    In fact, I think I will let you be the first one to try the new v 1.3 - So expect to get a pm from me later today.

    - Added 'Elapsed Time' to Automatic Backup to DropBox in the upcoming v1.3:
    That gives a better visual confirmation that the program is indeed active + the Message Log

    image
  • magically
    @akwin PM sent;)

    Scraping Tool-Box v.1.3 has been released!
    - Current donors will receive an update later today - patience please :D


    Important things to notice in terms of using DropBox Auto-Backup:

    1. 
    If you want to make a backup of the GSA SER project folder, please observe the following:
    You will have to select or manually input the location like below, where "Username" should be replaced with your name:

    C:\Users\"Username"\AppData\Roaming\GSA Search Engine Ranker

    2. 
    You will have to select two buttons, otherwise it will not work properly:
    "Upload" and "GSA Project"

    3. 
    You will need to create a folder on your DropBox named: "Public"
    - Choose that folder as remote destination

    Hope everyone will benefit from this new feature - and see you soon with more features to come on the Article Extractor;)
  • magically
    - After the release of version 1.3, the focus will be on 3 things:

    1. The Article Extractor (Will get some additional features)
    2. Some minor fixes and tweaks.

    Number 3 is actually not a part of Scraping Tool-box itself - but something new:

    An experimental add-on for GSA SER, a special add-on that can submit differently than GSA SER does.

    Codename: Sentinel

    It will be able to 'feed' GSA SER with submitted links, where GSA will take over and handle the remaining steps.

    That means GSA SER will do the rest, and add verified links etc. like it's doing now.

    As it is experimental, the platforms it can submit to will be limited in the beginning - however, if things work out great, it will be expanded over time.

    More on that topic later, stay tuned;)


  • edited April 2015
    Can you release some more vids, especially about the new features?
  • magically
    @redfox
    Surely more videos will be released, especially featuring the new features.
    In fact, I think a small collection should be kept in one place for reference.
    - Added to the to-do list.
  • magically
    - Added Google Search in Article Extractor

    It will build a list that can be loaded into the Article Extractor afterwards, in order to extract articles.

    Demonstration:
    image

    Asking for user input, like search term, number of results and destination (for the results)

    image

    Generated Results in the Message Log:

    image

    Result Text File:

    image

    That new generated list is ready to load as Targets for Article Extraction!

    Limitations:
    - Don't use operators like
    site:, inurl:, allinurl:, intext:

    Do it as shown in the demonstration ;)
  • How to download your software? Should I pay first? Please PM me
  • magically
    @shadir
    Will send a PM to you - stay tuned ;)
  • Kaine
    @magically

    Look at your PM - I sent you a message that I think could please many members ;)
  • magically
    @Kaine
    Awesome - I'm on it - will include it, stay tuned ;)
  • Kaine
    Very nice ;)

  • magically
    edited April 2015
    - New feature implementation (currently working on it)
    This is a very basic prototype - raw console output...

    BULK CHECK CF & TF

    image

    The idea is to add an x amount of URLs, and the bulk checker will return information like the above.
    Of course, some serious string manipulation is needed in order to present the data nicely!

    Like this:

    No. | URL                          | Status | CF | TF | External Backlinks | Referring Domains
    1   | https://forum.gsa-online.de/ | Found  | 39 | 31 | 131                | 22
    2   | https://forum.gsa-online.de/ | Found  | 39 | 31 | 131                | 22

    Speed and performance must be fast.
    Not an easy task - so patience guys:)

    I will work on several different projects during the upcoming 2 weeks - so whenever I have some time, I will continue working on this, and update the thread.

    One more example with a few more URLs:

    image
  • Kaine
    edited April 2015
    @magically Very fast work :)

    Maybe, regarding the string manipulation ("Of course some serious string manipulation is needed in order to present the data nicely!"),
    you could work directly with .csv?
  • magically
    @Kaine
    Unfortunately no - as that is a flash object - however, I will try to figure something out ;)

    Most important is the ability to get a list - however, the response times differ, which will make it a bit complicated.
  • magically
    - Update after some manipulation of the text:

    image
    The plan is to present the retrieved data as shown in the image...
    It will also generate a txt file with this information.

    However, much more must be done in order to make this work nicely in the GUI :P

    - Still, we are close to a working solution ;)
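    Since the plan includes a txt file users can sort themselves, a simple comma-separated output would open directly in Excel / OpenOffice - a tiny sketch with made-up rows matching the preview above:

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.List;

        public class MetricsCsv {
            public static void main(String[] args) throws IOException {
                List<String> rows = List.of(
                        "No,URL,Status,CF,TF,ExternalBacklinks,ReferringDomains",
                        "1,https://forum.gsa-online.de/,Found,39,31,131,22");
                Files.write(Path.of("bulk-check.csv"), rows); // sortable in a spreadsheet
            }
        }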
  • Kaine
    edited April 2015

    Mmm, thinking about that - maybe it would be good if the list were sorted by best domain.

    I mean, all URLs are mixed in the output, I think?

    That way it's hard to find the best domains easily/quickly.

    Same for extracting all the good URLs without deleting the end of the row :)

    Maybe choose the minimum TF/CF wanted beforehand and extract into 2 files? One with everything, a second with only the good domains (no other information, for example).



    I think you know the TF/CF calculation to define the domains with the best authority?

    Best is around 50/50, approximately?


  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    Surely it can be sorted - however, I need to know the 'Sort-Term' - i.e. what defines top authority.

    Some sort of calculation is needed...

    In other words - 2 lists can be generated.
    1. Raw Results
    2. Sorted by a defined specification.
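
    If the 'Sort-Term' ends up being the TF/CF ratio mentioned above (a ratio close to 1 is usually read as a balanced, trustworthy profile - that reading is my assumption), list 2 could be produced roughly like this, reusing the hypothetical CfTfRow from the earlier sketch:

        import java.util.ArrayList;
        import java.util.Comparator;
        import java.util.List;

        public class AuthoritySort {
            // Assumption: score = Trust Flow, weighted by how close TF/CF is to 1.
            static double score(CfTfRow r) {
                if (r.cf == 0) return 0;
                double ratio = (double) r.tf / r.cf;
                return r.tf * Math.min(ratio, 1.0 / ratio); // penalise lopsided profiles
            }

            public static void main(String[] args) {
                List<CfTfRow> rows = new ArrayList<>();
                rows.add(new CfTfRow(1, "forum.joomla.org", "Found", 63, 69, 5382605, 107775));
                rows.add(new CfTfRow(2, "torosyfaenas.com.mx", "Found", 13, 0, 33, 6));

                // List 2: sorted by the defined specification, best first.
                rows.sort(Comparator.comparingDouble(AuthoritySort::score).reversed());
                rows.forEach(r -> System.out.println(r.url + " -> score " + score(r)));
            }
        }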
  • KaineKaine thebestindexer.com
    edited April 2015

    I was wondering whether your app could retrieve the verified URLs from SER, test them and re-inject only the URLs with good authority :)
    That could be great for building an optimised tier directly, live.

    I just don't know if SER can see that a project has changed in real time.


    Do you think it's possible to do that?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    It would be able to retrieve the verified URLs - BUT... GSA SER keeps a record of the files inside its own database.
    It would be possible to take that list, sort it according to specific parameters and generate a new verified list.

    But keep in mind it wouldn't have any reference back to GSA SER, unless you run those targets once more in a new project.

    Now that we are at it - I realize that in order to do really effective sorting of CF & TF, a database is needed.
    In other words, if we are going to sort the data, implementation of a database is required.
    It wouldn't change anything for the user, as the sorting is done 'behind the scenes' via SQL.
    However, the workload for me will increase:D
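
    For the curious, the 'behind the scenes' part could be as small as an embedded database - a minimal sketch assuming SQLite over JDBC (the driver choice and table layout are my assumptions, not the actual implementation):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class CfTfDatabaseSketch {
            public static void main(String[] args) throws Exception {
                // Requires the sqlite-jdbc driver on the classpath (assumption).
                try (Connection con = DriverManager.getConnection("jdbc:sqlite:cftf.db");
                     Statement st = con.createStatement()) {

                    st.executeUpdate("CREATE TABLE IF NOT EXISTS results " +
                            "(url TEXT, status TEXT, cf INTEGER, tf INTEGER, " +
                            " backlinks INTEGER, domains INTEGER)");
                    st.executeUpdate("INSERT INTO results VALUES " +
                            "('forum.joomla.org', 'Found', 63, 69, 5382605, 107775)");

                    // The sorting happens in SQL, invisible to the user:
                    ResultSet rs = st.executeQuery(
                            "SELECT url, cf, tf FROM results ORDER BY tf DESC, cf DESC");
                    while (rs.next()) {
                        System.out.println(rs.getString("url") + "  CF=" + rs.getInt("cf")
                                + "  TF=" + rs.getInt("tf"));
                    }
                }
            }
        }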

  • KaineKaine thebestindexer.com
    edited April 2015

    lol, I think something like .xls (Excel/OpenOffice) is enough :)


  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    @Kaine
    hehehe, yeah that could also do it - users would be able to open it in Excel and sort there:P
    Actually less complicated:D

    *I will do some more work on that part over the weekend + test whether threads would be an option here in terms of retrieval.

    Actually, a new 'technology' is being used here - one that could be quite handy in the future.
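
    The simple route could look like this - write the rows as plain .csv, which Excel/OpenOffice opens and sorts directly (a minimal sketch; the file layout is my assumption):

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;
        import java.util.Arrays;
        import java.util.List;

        public class CsvExportSketch {
            public static void main(String[] args) throws IOException {
                List<String> lines = Arrays.asList(
                        "URL,Status,CF,TF,Backlinks,Domains",
                        "forum.joomla.org,Found,63,69,5382605,107775",
                        "www.youtube.com,Found,80,84,171723141,358061");
                // Excel / OpenOffice will open this directly and let the user sort.
                Files.write(Paths.get("cftf-results.csv"), lines);
            }
        }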
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    - Added Replace URLs to Article Extractor
    It will randomly replace the original links with your links instead!

    image

    It loads a list with your preferred links. The original URLs will then be replaced with your links!

    In other words, if you enable 'Replace URLs', the program will replace the existing URLs with your URLs before printing out the txt-files.


    Demo of an article where all links have been replaced (click on the image to see):
    image

    Download a sample with 24 articles where links have been replaced - if links were found:
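
    A rough sketch of how such a replacement pass can work - find the links in an article with a regex and swap each one for a random entry from the user's list (the regex and names here are illustrative assumptions, not the tool's actual code):

        import java.util.Arrays;
        import java.util.List;
        import java.util.Random;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class ReplaceUrlsSketch {
            private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>)]+");
            private static final Random RND = new Random();

            // Replace every URL found in the article with a random one from 'mine'.
            static String replaceUrls(String article, List<String> mine) {
                Matcher m = URL.matcher(article);
                StringBuffer sb = new StringBuffer();
                while (m.find()) {
                    String pick = mine.get(RND.nextInt(mine.size()));
                    m.appendReplacement(sb, Matcher.quoteReplacement(pick));
                }
                m.appendTail(sb);
                return sb.toString();
            }

            public static void main(String[] args) {
                String article = "Read more at http://example.com/original-post today.";
                System.out.println(replaceUrls(article, Arrays.asList("http://my-site.com/")));
            }
        }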

  • Any news on BULK CHECK CF & TF?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @zuluranger

    Indeed, I will get some more work done soon;) I was quite occupied with another project the last 2 weeks.
    However, my hands are free again, meaning I will continue with the process.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @zuluranger

    I will try to do some work Sunday and post some updates.
    - Will add a new 'tab' with experimental features; the first one coming up will be Bulk Check.
    However, keep in mind that these features are experimental only.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    - Experimental Features - the final curtain falls:

    The very last series of features will be experimental, starting with Bulk Check Page Rank.

    Early Preview:

    image

    image

    - Target URLs are automatically trimmed to the domain root (see the sketch after this list)
    - A multi-threaded handler takes care of the PR lookup
    - The list is finally written as a text file.
    (*As Google is very aggressive - some proxy handling must be added too!!!)

    - Bulk check of CF and TF will also be added here soon.
    - Checking proxies dead/alive will be added as well
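
    The trimming to domain root can be done with java.net.URI - a minimal sketch (assuming that is roughly how it's implemented; edge cases like missing schemes are omitted):

        import java.net.URI;
        import java.net.URISyntaxException;

        public class DomainRootSketch {
            // https://www.example.com/some/deep/page.html -> https://www.example.com/
            static String toDomainRoot(String url) throws URISyntaxException {
                URI u = new URI(url.trim());
                return u.getScheme() + "://" + u.getHost() + "/";
            }

            public static void main(String[] args) throws URISyntaxException {
                System.out.println(toDomainRoot("https://forum.gsa-online.de/discussion/123/thread"));
            }
        }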

    Once these last features are fully implemented - a final version will be released to every donor.

    The project will then be closed and discontinued, meaning no more development.

    Yes, that is correct - there was too little interest:(
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited May 2015
    - Added simple lookup - only 10 URLs are allowed per lookup - a 1-thread solution.

    image

    Issues:

    - Very slow and unstable
    - Loading more URLs will lead to no results

    I will try to do 2 things in order to improve the performance a little (see the sketch below).

    1. Split the lookup into 5-10 threads.
    2. Add the results in batches

    *The problem with adding more threads to do the job is severe memory usage

    Bear in mind that this feature is only experimental - we have no API to work with here.
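
    For point 1, a small fixed pool instead of one thread per URL keeps the memory usage bounded - a minimal sketch, where lookupPageRank is a hypothetical stand-in for whatever the real (API-less) lookup does:

        import java.util.Arrays;
        import java.util.List;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;

        public class PrLookupPool {
            // Hypothetical stand-in for the real Page Rank lookup.
            static int lookupPageRank(String url) {
                return 0;
            }

            public static void main(String[] args) throws InterruptedException {
                List<String> urls = Arrays.asList("http://example.com/", "http://example.org/");

                // 5-10 worker threads instead of one per URL keeps memory in check.
                ExecutorService pool = Executors.newFixedThreadPool(5);
                for (String url : urls) {
                    pool.submit(() -> System.out.println(url + " -> PR " + lookupPageRank(url)));
                }
                pool.shutdown();
                pool.awaitTermination(10, TimeUnit.MINUTES);
            }
        }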


    ***EDIT****

    - Added support for proxies during Page Rank lookup

    image


    *As always - remember to use fast Google-passed proxies!!!
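
    Routing a lookup through a proxy needs nothing exotic in Java - a minimal sketch with java.net.Proxy (the proxy host and port are placeholders for your own):

        import java.io.IOException;
        import java.net.HttpURLConnection;
        import java.net.InetSocketAddress;
        import java.net.Proxy;
        import java.net.URL;

        public class ProxyLookupSketch {
            public static void main(String[] args) throws IOException {
                // Placeholder proxy - substitute one of your own Google-passed proxies.
                Proxy proxy = new Proxy(Proxy.Type.HTTP,
                        new InetSocketAddress("127.0.0.1", 8080));

                URL url = new URL("http://example.com/");
                HttpURLConnection con = (HttpURLConnection) url.openConnection(proxy);
                con.setConnectTimeout(10_000);
                System.out.println("Response: " + con.getResponseCode());
                con.disconnect();
            }
        }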
  • KaineKaine thebestindexer.com
    Maybe anonymous proxies should be used for the bulk check of CF and TF?


  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine

    Yeah, it would be better to use proxies - absolutely.

    However, I will rewrite most of the feature for 'Bulk Check CF and TF' and probably add a database.
    Simply because it's easier to do calculations and retrievals later on.

    Will look at it later this week, when I'm in a better mood (right now I'm pissed:P)
    Not about this - it's something else;)
  • KaineKaine thebestindexer.com
    And cut to the root domain too ^^
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine

    Yeah - I'm still alive:P

    Just went through some testing and upgrading on one of my computers...
    Testing Windows 10 Technical Preview, so it sort of delayed the remaining stuff.

    However, I will get some more work done soonish.

    I played around with some automation for improving the CTR also;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited May 2015
    Well, in order to actually get something new released - we will leave the CF&TF Checker as it is for now:

    image


    Complete Log Message:

    Starting Ghost Engine.....

    Please Wait getting data.....

    URL: www.sme.ao Status: Found Citation Flow: 21 Trust Flow: 15 External Backlinks: 58964 Referring Domains: 32
    URL: forum.joomla.org Status: Found Citation Flow: 63 Trust Flow: 69 External Backlinks: 5382605 Referring Domains: 107775
    URL: www.youtube.com Status: Found Citation Flow: 80 Trust Flow: 84 External Backlinks: 171723141 Referring Domains: 358061
    URL: kabbalahexperience.com Status: Found Citation Flow: 26 Trust Flow: 34 External Backlinks: 13 Referring Domains: 4
    URL: londonfuse.ca Status: Found Citation Flow: 29 Trust Flow: 34 External Backlinks: 171 Referring Domains: 35
    URL: www.shopify.com Status: Found Citation Flow: 58 Trust Flow: 71 External Backlinks: 21738375 Referring Domains: 46082
    URL: www.relevantmagazine.com Status: Found Citation Flow: 48 Trust Flow: 40 External Backlinks: 246480 Referring Domains: 1799
    URL: torosyfaenas.com.mx Status: Found Citation Flow: 13 Trust Flow: 0 External Backlinks: 33 Referring Domains: 6
    Please Wait getting data.....

    URL: www.relevantmagazine.com Status: Found Citation Flow: 48 Trust Flow: 40 External Backlinks: 246480 Referring Domains: 1799
    URL: bangladesheconomy.wordpress.com Status: Found Citation Flow: 17 Trust Flow: 0 External Backlinks: 30 Referring Domains: 15
    URL: sk-ester.com Status: Found Citation Flow: 29 Trust Flow: 19 External Backlinks: 6 Referring Domains: 5
    URL: www.metareklam.net Status: Found Citation Flow: 13 Trust Flow: 0 External Backlinks: 5 Referring Domains: 2
    URL: dev.stwinefrides.org.uk Status: Found Citation Flow: 17 Trust Flow: 19 External Backlinks: 12 Referring Domains: 4
    URL: www.tigerstores.co.uk Status: Found Citation Flow: 43 Trust Flow: 32 External Backlinks: 58086 Referring Domains: 517
    URL: www.ohssl.org Status: Found Citation Flow: 19 Trust Flow: 18 External Backlinks: 516 Referring Domains: 26
    URL: de-de.facebook.com Status: Found Citation Flow: 56 Trust Flow: 48 External Backlinks: 1072578 Referring Domains: 3471
    Please Wait getting data.....

    URL: zeit-zum-aufwachen.blogspot.com Status: Found Citation Flow: 14 Trust Flow: 15 External Backlinks: 852 Referring Domains: 17
    URL: www.metalogicdesign.com Status: Found Citation Flow: 29 Trust Flow: 43 External Backlinks: 2363 Referring Domains: 13
    URL: issuu.com Status: Found Citation Flow: 67 Trust Flow: 75 External Backlinks: 2439468 Referring Domains: 47179
    URL: www.blogger.com Status: Found Citation Flow: 74 Trust Flow: 81 External Backlinks: 112147899 Referring Domains: 435686
    URL: lists.clean-mx.com Status: Found Citation Flow: 0 Trust Flow: 0 External Backlinks: 0 Referring Domains: 0
    URL: issues.joomla.org Status: Found Citation Flow: 52 Trust Flow: 48 External Backlinks: 9042 Referring Domains: 453
    URL: www.upinfra.com Status: Found Citation Flow: 0 Trust Flow: 0 External Backlinks: 0 Referring Domains: 0
    URL: www.postes-restantes.be Status: Found Citation Flow: 16 Trust Flow: 21 External Backlinks: 228 Referring Domains: 5
    Please Wait getting data.....

    URL: issues.joomla.org Status: Found Citation Flow: 52 Trust Flow: 48 External Backlinks: 9042 Referring Domains: 453
    URL: www.forosdelweb.com Status: Found Citation Flow: 41 Trust Flow: 51 External Backlinks: 49740 Referring Domains: 962
    URL: uk7.valuehost.co.uk Status: Found Citation Flow: 14 Trust Flow: 3 External Backlinks: 65 Referring Domains: 3
    URL: www.feuerwehr-lunz.at Status: Found Citation Flow: 19 Trust Flow: 25 External Backlinks: 60 Referring Domains: 20
    URL: www.efr-germany.de Status: Found Citation Flow: 26 Trust Flow: 24 External Backlinks: 2387 Referring Domains: 125
    URL: surface.syr.edu Status: Found Citation Flow: 25 Trust Flow: 28 External Backlinks: 98 Referring Domains: 30
    URL: www.stmichaelsabbey.com Status: Found Citation Flow: 32 Trust Flow: 42 External Backlinks: 101 Referring Domains: 29
    URL: www.eastmanandassociates.net Status: Found Citation Flow: 12 Trust Flow: 9 External Backlinks: 3 Referring Domains: 1
    Please Wait getting data.....

    URL: en.wikipedia.org Status: Found Citation Flow: 64 Trust Flow: 76 External Backlinks: 5407848 Referring Domains: 49455
    URL: www.surf-devil.com Status: Found Citation Flow: 29 Trust Flow: 38 External Backlinks: 946 Referring Domains: 92
    URL: www.surf-devil.com Status: Found Citation Flow: 29 Trust Flow: 38 External Backlinks: 946 Referring Domains: 92
    URL: www.surf-devil.com Status: Found Citation Flow: 29 Trust Flow: 38 External Backlinks: 946 Referring Domains: 92
    URL: dev06.hubzero.org Status: MayExist Citation Flow: 0 Trust Flow: 0 External Backlinks: 30 Referring Domains: 3
    URL: wilsonrealestateinvestment.com Status: Found Citation Flow: 18 Trust Flow: 8 External Backlinks: 59 Referring Domains: 25
    URL: www.philotheamission.org Status: Found Citation Flow: 14 Trust Flow: 6 External Backlinks: 2 Referring Domains: 1
    URL: www.geotimes.ge Status: Found Citation Flow: 36 Trust Flow: 48 External Backlinks: 53515 Referring Domains: 578

    - It takes some time to perform the lookup on the target URLs, hence I suggest adding no more than 50 per run

    I will see if we should enhance this feature later with a database, to be able to actually do some calculations.
    However, in my honest opinion, that really requires a full API.
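
    Should that enhancement ever happen, the log lines above are regular enough to parse straight into records for such a database - a sketch where the pattern mirrors the log format and everything else is assumption:

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class LogLineParser {
            // Mirrors the Ghost Engine log line format shown above.
            private static final Pattern LINE = Pattern.compile(
                    "URL: (\\S+) Status: (\\S+) Citation Flow: (\\d+) Trust Flow: (\\d+) " +
                    "External Backlinks: (\\d+) Referring Domains: (\\d+)");

            public static void main(String[] args) {
                String line = "URL: forum.joomla.org Status: Found Citation Flow: 63 "
                        + "Trust Flow: 69 External Backlinks: 5382605 Referring Domains: 107775";
                Matcher m = LINE.matcher(line);
                if (m.matches()) {
                    System.out.println(m.group(1) + "  CF=" + m.group(3) + "  TF=" + m.group(4)
                            + "  backlinks=" + m.group(5) + "  domains=" + m.group(6));
                }
            }
        }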
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited May 2015
    I will now 'fine-adjust' a few things... and prepare a release this upcoming week.

    To sum up the new features in upcoming release:

    - Added Google Search in Article Extractor (Use these results to scrape for articles)
    - Added Replace URLs in Article Extractor (Replace existing ones with your URLs)
    - Added Experimental Ghost Engine
    - Added Experimental Bulk Check Of CF & TF
    - Added Experimental Bulk Check PR

    - Wrapped the executable into an .exe file + added an icon for the .exe
    - Minor tweaks and bug fixes

    Limitations:
    - Don't use operators like
    Site:, Inurl, AllInurl, Intext in the new Google Search function

    - The bulk checkers are experimental only

    Additional information:

    Donors will receive a PM from me with the update, once it's ready - please have patience.
    Everyone else - feel free to contact me via PM.

    Estimated time to next release: 1 week
    image