Keyword Generator - Made in Java - For Scraping

Comments

  • edited April 2015
    Can you release some more videos, especially about the new features?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @redfox
    More videos will surely be released, especially ones covering the new features.
    In fact, I think a small collection should be kept in one place for reference.
    - Added to the to-do list.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    - Added Google Search in Article Extractor

    It builds a list that can then be loaded into the Article Extractor in order to extract articles.

    Demonstration:
    image

    The tool asks for user input: search term, number of results and destination (for the results)

    image

    image

    image

    image

    Generated Results in the Message Log:

    image

    Result Text File:

    image

    The newly generated list is ready to load as targets for article extraction!

    Limitations:
    - Don't use search operators such as site:, inurl:, allinurl: or intext:

    Do it as shown in the demonstration ;)
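
The limitation above could be checked before a query is sent. What follows is only a minimal sketch: `SearchTermCheck` and `containsOperator` are hypothetical names, and nothing in this thread shows whether or how the tool validates the search term.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the actual tool.
final class SearchTermCheck {
    // The four operators named in the limitations, matched case-insensitively
    // as a whole word followed by a colon, e.g. "site:example.com".
    private static final Pattern OPERATOR =
            Pattern.compile("(?i)\\b(site|inurl|allinurl|intext)\\s*:");

    // True if the search term contains one of the unsupported operators.
    static boolean containsOperator(String searchTerm) {
        return OPERATOR.matcher(searchTerm).find();
    }

    public static void main(String[] args) {
        System.out.println(containsOperator("best running shoes"));     // false
        System.out.println(containsOperator("site:example.com shoes")); // true
    }
}
```

A plain keyword phrase passes the check, while anything carrying one of the listed operators would be rejected up front.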
  • How to download your software? Should I pay first? Please PM me
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @shadir
    Will send you a PM - stay tuned ;)
  • KaineKaine thebestindexer.com
    @magically

    Check your PMs, I sent you a message about something I think many members would enjoy ;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    Awesome - I'm on it - will include it, stay tuned ;)
  • KaineKaine thebestindexer.com
    Very nice ;)

  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    - New feature implementation (currently working on it)
    This is a very basic prototype with raw console output...

    BULK CHECK CF & TF

    image

    The idea is to add any number of URLs, and the Bulk-Checker will return information like the above.
    Of course, some serious string manipulation is needed in order to present the data nicely!

    Like this:
    S. No.  URL                           Status  Citation Flow  Trust Flow  External Backlinks  Referring Domains
    1       https://forum.gsa-online.de/  Found   39             31          131                 22
    2       https://forum.gsa-online.de/  Found   39             31          131                 22
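
One way to get from raw console output to aligned columns like these is a fixed-width format string. This is a hedged sketch only: the class name `ResultTable` and the column widths are assumptions, and the tool's real string handling is not shown in the thread.

```java
// Illustrative sketch only: the class name and column widths are assumptions.
final class ResultTable {
    // Left-aligned text columns, right-aligned numeric columns.
    private static final String ROW_FORMAT = "%-6s %-32s %-8s %13s %10s %18s %17s";

    static String header() {
        return String.format(ROW_FORMAT, "S. No.", "URL", "Status",
                "Citation Flow", "Trust Flow", "External Backlinks", "Referring Domains");
    }

    static String row(int number, String url, String status, int citationFlow,
                      int trustFlow, long externalBacklinks, long referringDomains) {
        return String.format(ROW_FORMAT, number, url, status, citationFlow,
                trustFlow, externalBacklinks, referringDomains);
    }

    public static void main(String[] args) {
        System.out.println(header());
        System.out.println(row(1, "https://forum.gsa-online.de/", "Found", 39, 31, 131, 22));
    }
}
```

URLs longer than the chosen width would push the later columns to the right, so a real implementation would need to truncate or widen the URL column.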

    It has to be fast.
    Not an easy task, so be patient, guys :)

    I will work on several different projects during the upcoming 2 weeks - so whenever I have some time, I will continue working on this, and update the thread.

    One more example with a few more urls:

    image
  • KaineKaine thebestindexer.com
    edited April 2015
    @magically Very fast work :)

    Regarding "Of course some serious string manipulation is needed in order to present the data nicely!":
    could you work directly on the .csv?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    Unfortunately no, as that is a Flash object. However, I will try to figure something out ;)

    Most important is the ability to get a list. However, the response time varies, which makes it a bit complicated.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    - Update after some manipulation of the text:

    image
    The plan is to present the retrieved data as shown in the image...
    It will also generate a txt file with this information.

    However, much more must be done in order to make this work nicely in the GUI :P

    - Still, we are close to a working solution ;)
  • KaineKaine thebestindexer.com
    edited April 2015

    Hmm, thinking about that: it would be good if the list were sorted by best domain.

    I mean, all the URLs are mixed in the output, I think?

    That way it's hard to find the best domains easily and quickly.

    Same for extracting all the good URLs without having to delete the end of each row :)

    Maybe let the user choose the minimum TF and CF beforehand and export two files? One with everything, the second with only the good domains (with no other information, for example).



    I think you know how TF and CF are used to work out which domain has the best authority?

    The best is roughly 50/50?


  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    Surely it can be sorted. However, I need to know the 'sort term', i.e. what defines top authority.

    Some sort of calculation is needed...

    In other words - 2 lists can be generated.
    1. Raw Results
    2. Sorted by a defined specification.
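
The two lists could be produced along these lines. This is a sketch under stated assumptions: the thread never settles what defines top authority, so the sort term used here (Trust Flow descending, then Citation Flow descending) is purely an assumption, and `DomainMetrics` and `AuthoritySort` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value type; the field names are assumptions.
final class DomainMetrics {
    final String url;
    final int citationFlow;
    final int trustFlow;

    DomainMetrics(String url, int citationFlow, int trustFlow) {
        this.url = url;
        this.citationFlow = citationFlow;
        this.trustFlow = trustFlow;
    }
}

final class AuthoritySort {
    // ASSUMED sort term: Trust Flow descending, ties broken by Citation Flow descending.
    static final Comparator<DomainMetrics> BY_TRUST_THEN_CITATION =
            Comparator.comparingInt((DomainMetrics d) -> d.trustFlow)
                    .thenComparingInt(d -> d.citationFlow)
                    .reversed();

    // List 2: a sorted copy. The raw list (list 1) is left untouched.
    static List<DomainMetrics> sorted(List<DomainMetrics> raw) {
        List<DomainMetrics> copy = new ArrayList<>(raw);
        copy.sort(BY_TRUST_THEN_CITATION);
        return copy;
    }

    // Optional filter on user-chosen minimum TF and CF values.
    static List<DomainMetrics> atLeast(List<DomainMetrics> raw,
                                       int minTrustFlow, int minCitationFlow) {
        List<DomainMetrics> kept = new ArrayList<>();
        for (DomainMetrics d : raw) {
            if (d.trustFlow >= minTrustFlow && d.citationFlow >= minCitationFlow) {
                kept.add(d);
            }
        }
        return kept;
    }
}
```

Swapping in a different definition of authority only means replacing the comparator; the raw list stays available either way.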
  • KaineKaine thebestindexer.com
    edited April 2015

    I was wondering whether your app could retrieve the verified URLs from SER, test them and re-inject only the URLs with good authority :)
    That could be great for building an optimised tier live.

    I just don't know whether SER can detect in real time that a project has changed.


    Do you think it is possible to do that?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine
    It would be able to retrieve the verified URLs, BUT... GSA SER keeps a record of the files inside its own database.
    It would be possible to take that list, sort it according to specific parameters and generate a new verified list.

    But keep in mind, it wouldn't have any reference to GSA SER unless you run those targets once more in a new project.

    While we are at it: I now realize that in order to sort CF & TF really effectively, a database is needed.
    In other words, if we are going to sort the data, a database has to be implemented.
    It wouldn't change anything for the user, as the sorting is done behind the scenes via SQL.
    However, the workload for me will increase :D

  • KaineKaine thebestindexer.com
    edited April 2015

    lol, I think something like .xls (Excel / OpenOffice) is enough :)


  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    @Kaine
    hehehe, yeah, that could also do it. Users would be able to open it as .xls and sort there :P
    Actually less complicated :D
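
A spreadsheet-friendly export could look roughly like this. Note the assumptions: the sketch writes CSV rather than a real .xls file (Excel and OpenOffice both open CSV), and `CsvExport` and its methods are hypothetical names rather than anything from the tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; writes CSV, not the binary .xls format.
final class CsvExport {
    // Quote a field only when it contains a comma, a quote or a line break.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    static String toCsvLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields[i]));
        }
        return sb.toString();
    }

    // Writes a header row followed by one line per result row.
    static void write(Path target, List<String[]> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(toCsvLine("URL", "Status", "Citation Flow", "Trust Flow",
                "External Backlinks", "Referring Domains"));
        for (String[] row : rows) {
            lines.add(toCsvLine(row));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }
}
```

With one metric per column, sorting by TF or CF becomes a single click in the spreadsheet, so no database is needed on the tool's side.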

    *I will do some more work on that part over the weekend and test whether threads would be optional in this case for the retrieval.

    Actually, a new 'technology' is being used here, one that could be quite handy in the future.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited April 2015
    - Added Replace Urls to Article Extractor
    Randomly replaces the original links with your own links!

    image

    It loads a list with your preferred replacement links. The original URLs will be replaced with your links!

    In other words, if you enable 'Replace URLs', the program will replace existing URLs with your URLs before writing out the txt files.
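
The replacement step might be sketched as follows. This is not the tool's code: `UrlReplacer` is a hypothetical name, the URL pattern is a deliberately simplified assumption, and the `Random` is passed in so that the behaviour can be reproduced.

```java
import java.util.List;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating random link replacement.
final class UrlReplacer {
    // Simplified URL pattern (assumption): http(s):// followed by characters
    // that are not whitespace, quotes or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Replaces every URL found in the article with a randomly picked own link.
    static String replaceUrls(String article, List<String> ownLinks, Random random) {
        if (ownLinks.isEmpty()) {
            return article; // nothing to replace with
        }
        Matcher matcher = URL.matcher(article);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String pick = ownLinks.get(random.nextInt(ownLinks.size()));
            // quoteReplacement keeps '$' and '\' in the link from being
            // interpreted as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(pick));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because the pattern is greedy about non-whitespace characters, punctuation directly after a link (a trailing full stop, for instance) would be swallowed with it; a production version would need a stricter pattern.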


    Demo of article, where all links have been replaced (Click on the image to see):
    image

    Download a sample with 24 articles, where links have been replaced - if links were found:

  • Any news on BULK CHECK CF & TF?
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @zuluranger

    Indeed, will get some more work done soon;) I was quite occupied with another project the last 2 weeks.
    However hands are free again, meaning I will continue with the process.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @zuluranger

    I will try to do some work on Sunday and post some updates.
    - Will add a new 'tab' with experimental features; the first one coming up will be Bulk Check.
    However, keep in mind that these features are experimental only.
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    - Experimental Features - The final curtain falls:

    The very last series of features will be experimental, starting with Bulk Check Page Rank

    Early Preview:

    image

    image

    - Target URLs are automatically trimmed to the domain root
    - A multi-threaded handler takes care of the PR lookup
    - The list is finally written out as a text file.
    (*As Google is very aggressive, some proxy handling must be added too!!!)
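
Trimming targets to the domain root could be done with `java.net.URI`, roughly as below. The sketch assumes that "domain root" means scheme plus host (not the registrable domain) and that targets without a scheme default to `http://`; `DomainRoot` is a hypothetical name. Trimming tends to produce duplicates, so they are dropped while keeping the original order.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; "domain root" is taken to mean scheme + host.
final class DomainRoot {
    // Returns e.g. "https://forum.gsa-online.de/" for any page on that host.
    static String trim(String url) {
        String candidate = url.trim();
        if (!candidate.contains("://")) {
            candidate = "http://" + candidate; // assumed default scheme
        }
        URI uri = URI.create(candidate);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("No host in: " + url);
        }
        return uri.getScheme() + "://" + uri.getHost() + "/";
    }

    // Trims every target and removes duplicates, preserving first-seen order.
    static Set<String> trimAll(List<String> urls) {
        Set<String> roots = new LinkedHashSet<>();
        for (String url : urls) {
            roots.add(trim(url));
        }
        return roots;
    }
}
```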

    - Bulk check of CF and TF will also be added here soon.
    - A dead/alive proxy check as well

    Once these last features are fully implemented - a final version will be released to every donor.

    The project will then be closed and discontinued, meaning no more development.

    Yes, that is correct - too little interest:(
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited May 2015
    - Added simple lookup: only 10 URLs are allowed per lookup, single-thread solution.

    image

    Issues:

    - Very slow and unstable
    - Loading more URLs will lead to no results

    I will try to do 2 things in order to improve the performance a little.

    1. Split the lookup into 5-10 threads.
    2. Add the results in batches

    *The problem with adding more threads to do the job is severe memory usage

    Bear in mind that this feature is only experimental; we have no API to work with here.
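
The two planned improvements could be combined along these lines. This is a sketch under assumptions, not the actual implementation: `BatchLookup` is a hypothetical name and the `lookup` function stands in for whatever performs the real Page Rank request. A fixed-size pool caps the number of threads, which is one way to keep memory usage bounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: batched lookups on a bounded thread pool.
final class BatchLookup {
    // Splits the list into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Runs 'lookup' for every URL, one batch at a time, on 'threads' workers.
    static List<String> lookupAll(List<String> urls, int batchSize, int threads,
                                  Function<String, String> lookup) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> results = new ArrayList<>();
        try {
            for (List<String> batch : partition(urls, batchSize)) {
                List<Callable<String>> tasks = new ArrayList<>();
                for (String url : batch) {
                    tasks.add(() -> lookup.apply(url));
                }
                // invokeAll waits for the whole batch and keeps the input order.
                for (Future<String> future : pool.invokeAll(tasks)) {
                    results.add(future.get());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Lookup interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Results arrive batch by batch, so a GUI could append each batch to the message log as soon as it completes instead of waiting for the full list.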


    ***EDIT****

    - Added support for proxies during Page Rank lookup

    image


    *As always, remember to use fast, Google-passed proxies!!!
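
For reference, routing a request through a proxy in plain Java can look like this. It is only a sketch: the `host:port` line format, the 5-second timeouts and the name `ProxyList` are assumptions, and the thread does not say how the tool actually wires in its proxies.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Hypothetical helper; assumes one "host:port" entry per line.
final class ProxyList {
    // Turns a "host:port" line into an HTTP proxy definition.
    static Proxy parse(String line) {
        String[] parts = line.trim().split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected host:port but got: " + line);
        }
        int port = Integer.parseInt(parts[1]);
        // createUnresolved avoids a DNS lookup while the list is being parsed.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(parts[0], port));
    }

    // Prepares a connection that will go through the given proxy.
    static HttpURLConnection open(String url, Proxy proxy) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection(proxy);
        connection.setConnectTimeout(5000); // assumed timeout, in milliseconds
        connection.setReadTimeout(5000);
        return connection;
    }
}
```

A short connect timeout means slow proxies fail quickly rather than stalling a whole lookup.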
  • KaineKaine thebestindexer.com
    Maybe anonymous proxies must be used for the bulk check of CF and TF?


  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine

    Yeah, it would be better to use proxies - absolutely.

    However, I will rewrite most of the 'Bulk Check CF and TF' feature and probably add a database,
    simply because that makes calculations and retrieval easier later on.

    Will look at it later this week, when I'm in a better mood (right now I'm pissed :P)
    Not about this, it's something else ;)
  • KaineKaine thebestindexer.com
    And cut to the root domain too ^^
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    @Kaine

    Yeah - I'm still alive :P

    Just been through some testing and upgrading on one of my computers...
    Testing Windows 10 Technical Preview, so it sort of delayed the remaining stuff.

    However, I will get some more work done soonish.

    I played around with some automation for improving the CTR also;)
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited May 2015
    Well in order to actually get something new released - we will leave the CF&TF Checker as it is for now:

    image


    Complete Log Message:

    Starting Ghost Engine.....

    Please Wait getting data.....

    URL: www.sme.ao Status: Found Citation Flow: 21 Trust Flow: 15 External Backlinks: 58964 Referring Domains: 32
    URL: forum.joomla.org Status: Found Citation Flow: 63 Trust Flow: 69 External Backlinks: 5382605 Referring Domains: 107775
    URL: www.youtube.com Status: Found Citation Flow: 80 Trust Flow: 84 External Backlinks: 171723141 Referring Domains: 358061
    URL: kabbalahexperience.com Status: Found Citation Flow: 26 Trust Flow: 34 External Backlinks: 13 Referring Domains: 4
    URL: londonfuse.ca Status: Found Citation Flow: 29 Trust Flow: 34 External Backlinks: 171 Referring Domains: 35
    URL: www.shopify.com Status: Found Citation Flow: 58 Trust Flow: 71 External Backlinks: 21738375 Referring Domains: 46082
    URL: www.relevantmagazine.com Status: Found Citation Flow: 48 Trust Flow: 40 External Backlinks: 246480 Referring Domains: 1799
    URL: torosyfaenas.com.mx Status: Found Citation Flow: 13 Trust Flow: 0 External Backlinks: 33 Referring Domains: 6
    Please Wait getting data.....

    URL: www.relevantmagazine.com Status: Found Citation Flow: 48 Trust Flow: 40 External Backlinks: 246480 Referring Domains: 1799
    URL: bangladesheconomy.wordpress.com Status: Found Citation Flow: 17 Trust Flow: 0 External Backlinks: 30 Referring Domains: 15
    URL: sk-ester.com Status: Found Citation Flow: 29 Trust Flow: 19 External Backlinks: 6 Referring Domains: 5
    URL: www.metareklam.net Status: Found Citation Flow: 13 Trust Flow: 0 External Backlinks: 5 Referring Domains: 2
    URL: dev.stwinefrides.org.uk Status: Found Citation Flow: 17 Trust Flow: 19 External Backlinks: 12 Referring Domains: 4
    URL: www.tigerstores.co.uk Status: Found Citation Flow: 43 Trust Flow: 32 External Backlinks: 58086 Referring Domains: 517
    URL: www.ohssl.org Status: Found Citation Flow: 19 Trust Flow: 18 External Backlinks: 516 Referring Domains: 26
    URL: de-de.facebook.com Status: Found Citation Flow: 56 Trust Flow: 48 External Backlinks: 1072578 Referring Domains: 3471
    Please Wait getting data.....

    URL: zeit-zum-aufwachen.blogspot.com Status: Found Citation Flow: 14 Trust Flow: 15 External Backlinks: 852 Referring Domains: 17
    URL: www.metalogicdesign.com Status: Found Citation Flow: 29 Trust Flow: 43 External Backlinks: 2363 Referring Domains: 13
    URL: issuu.com Status: Found Citation Flow: 67 Trust Flow: 75 External Backlinks: 2439468 Referring Domains: 47179
    URL: www.blogger.com Status: Found Citation Flow: 74 Trust Flow: 81 External Backlinks: 112147899 Referring Domains: 435686
    URL: lists.clean-mx.com Status: Found Citation Flow: 0 Trust Flow: 0 External Backlinks: 0 Referring Domains: 0
    URL: issues.joomla.org Status: Found Citation Flow: 52 Trust Flow: 48 External Backlinks: 9042 Referring Domains: 453
    URL: www.upinfra.com Status: Found Citation Flow: 0 Trust Flow: 0 External Backlinks: 0 Referring Domains: 0
    URL: www.postes-restantes.be Status: Found Citation Flow: 16 Trust Flow: 21 External Backlinks: 228 Referring Domains: 5
    Please Wait getting data.....

    URL: issues.joomla.org Status: Found Citation Flow: 52 Trust Flow: 48 External Backlinks: 9042 Referring Domains: 453
    URL: www.forosdelweb.com Status: Found Citation Flow: 41 Trust Flow: 51 External Backlinks: 49740 Referring Domains: 962
    URL: uk7.valuehost.co.uk Status: Found Citation Flow: 14 Trust Flow: 3 External Backlinks: 65 Referring Domains: 3
    URL: www.feuerwehr-lunz.at Status: Found Citation Flow: 19 Trust Flow: 25 External Backlinks: 60 Referring Domains: 20
    URL: www.efr-germany.de Status: Found Citation Flow: 26 Trust Flow: 24 External Backlinks: 2387 Referring Domains: 125
    URL: surface.syr.edu Status: Found Citation Flow: 25 Trust Flow: 28 External Backlinks: 98 Referring Domains: 30
    URL: www.stmichaelsabbey.com Status: Found Citation Flow: 32 Trust Flow: 42 External Backlinks: 101 Referring Domains: 29
    URL: www.eastmanandassociates.net Status: Found Citation Flow: 12 Trust Flow: 9 External Backlinks: 3 Referring Domains: 1
    Please Wait getting data.....

    URL: en.wikipedia.org Status: Found Citation Flow: 64 Trust Flow: 76 External Backlinks: 5407848 Referring Domains: 49455
    URL: www.surf-devil.com Status: Found Citation Flow: 29 Trust Flow: 38 External Backlinks: 946 Referring Domains: 92
    URL: www.surf-devil.com Status: Found Citation Flow: 29 Trust Flow: 38 External Backlinks: 946 Referring Domains: 92
    URL: www.surf-devil.com Status: Found Citation Flow: 29 Trust Flow: 38 External Backlinks: 946 Referring Domains: 92
    URL: dev06.hubzero.org Status: MayExist Citation Flow: 0 Trust Flow: 0 External Backlinks: 30 Referring Domains: 3
    URL: wilsonrealestateinvestment.com Status: Found Citation Flow: 18 Trust Flow: 8 External Backlinks: 59 Referring Domains: 25
    URL: www.philotheamission.org Status: Found Citation Flow: 14 Trust Flow: 6 External Backlinks: 2 Referring Domains: 1
    URL: www.geotimes.ge Status: Found Citation Flow: 36 Trust Flow: 48 External Backlinks: 53515 Referring Domains: 578

    - Looking up the URL targets takes some time, so I suggest adding no more than 50 at a time.

    I will see whether we should enhance this feature later with a database, to be able to actually do some calculations.
    However, in my honest opinion that requires a full API.
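
Even without a database or an API, the log lines above are regular enough to be parsed back into numbers for later calculations. A minimal sketch, assuming the line format stays exactly as printed in the log; `CheckResult` and its field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type built from one line of the message log.
final class CheckResult {
    final String url;
    final String status;
    final int citationFlow;
    final int trustFlow;
    final long externalBacklinks;
    final long referringDomains;

    private static final Pattern LINE = Pattern.compile(
            "URL: (\\S+) Status: (\\S+) Citation Flow: (\\d+) Trust Flow: (\\d+)"
                    + " External Backlinks: (\\d+) Referring Domains: (\\d+)");

    private CheckResult(String url, String status, int citationFlow, int trustFlow,
                        long externalBacklinks, long referringDomains) {
        this.url = url;
        this.status = status;
        this.citationFlow = citationFlow;
        this.trustFlow = trustFlow;
        this.externalBacklinks = externalBacklinks;
        this.referringDomains = referringDomains;
    }

    // Returns null for lines that are not result lines,
    // such as "Please Wait getting data.....".
    static CheckResult parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new CheckResult(m.group(1), m.group(2),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
                Long.parseLong(m.group(5)), Long.parseLong(m.group(6)));
    }
}
```

Once parsed, the results can be filtered or sorted in memory, which may be enough for batches of around 50 URLs.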
  • magicallymagically http://i.imgur.com/Ban0Uo4.png
    edited May 2015
    I will now fine-tune a few things... and prepare a release in the upcoming week.

    To sum up the new features in the upcoming release:

    - Added Google Search in Article Extractor (Use these results to scrape for articles)
    - Added Replace Urls in Article Extractor (Replace existing ones with your urls)
    - Added Experimental Ghost Engine
    - Added Experimental Bulk Check Of CF & TF
    - Added Experimental Bulk Check PR

    - Wrapped the executable into an .exe file + added an icon for the .exe file
    - Minor tweaks and bug fixes

    Limitations:
    - Don't use search operators such as site:, inurl:, allinurl: or intext: in the new Google Search function

    - The bulk checkers are experimental only

    Additional information:

    Donors will receive a PM from me with the update, once it's ready - please have patience.
    Everyone else - feel free to contact me via PM.

    Estimated time to next release: 1 week
    image