No need to wait, buddy - I need feedback on version 1.2 first.
Please test the Article Extractor with at least 50-100 URLs that you have located yourself up front.
- I need to see how it goes for you guys first, before we add the remaining stuff, like copyright removal and url replacement.
Update - Version 1.2 will be released today:
Changelog:
Current donors:
Patience:D I will send a pm to you guys with the new release.
Everyone else:
Please consider joining this adventure, as the development is based purely on interest, support, and donations. I don't make money on this project - actually, it cannot even pay for the electricity;)
Great to hear the upgrade to the new version went fine:)
Of course I knew it would lead to issues and problems - that is why I have delayed the rest of the features, like copyright removal and URL replacement:P
Let's break it down:
1. The Article Extractor does not use any footprints - it relies completely on the target URLs the user loads into the program.
- The point is that the user needs to do some research up front: run manual Google searches using various footprints, and then select the good results...
That could be done automatically too - but it is not implemented.
Also note that this would lead to poor quality, as the program won't care whether an article is 'good' or 'bad'...
2. Threads
Yep, you are right here - the program is currently set to use all 30 threads by default.
Of course I also knew that as well;)
It needs to count the amount of targets first:
- If 5 urls are loaded - 1 thread would be enough
- 100 urls - 10 threads would do
And so on....
Not a big issue really - and easy to implement.
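The scaling rule above (1 thread for 5 URLs, 10 threads for 100, and so on) could be sketched like this. This is a hypothetical illustration, not the program's actual code: the function name `pick_thread_count` and the exact ratio of one thread per ~10 targets are my assumptions based on the examples given.

```python
def pick_thread_count(url_count, max_threads=30):
    """Hypothetical sketch: scale worker threads to the number of
    target URLs, roughly one thread per 10 targets, capped at 30."""
    if url_count <= 0:
        return 0
    # At least 1 thread, never more than the cap.
    return max(1, min(max_threads, url_count // 10))

print(pick_thread_count(5))     # 1 - a handful of URLs only needs one thread
print(pick_thread_count(100))   # 10 - matches the "100 urls - 10 threads" rule
print(pick_thread_count(1000))  # 30 - capped at the current maximum
```

Counting the loaded targets once before spawning workers keeps small jobs from wasting 30 idle threads, which is all the fix really requires.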
3. Stop Button
Indeed - there is no stop button (yet:P)
- Also no button for loading in replacement urls
As I said - Those features will come in version 1.21;)
The important thing here was to test whether the 'Article Extractor' actually works in real life.
And as far as I can see, it does exactly what it is supposed to do (ignoring the missing features below).
To sum up:
- Balanced thread use
- Stop button
- Loading of replacement URLs
- Implementation of URL replacement, copyright removal, etc...
*Edit
In terms of 'strange results' like text files with nonsense, it will fail on some targets (different encodings and such).
However, I think most will work, and your test result of 525 articles out of 590 seems decent.
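The "different encodings" failures could plausibly be softened with a fallback decode chain. A minimal sketch, assuming nothing about the tool's internals - `decode_page` and its candidate list are my own illustration, not the extractor's actual logic:

```python
def decode_page(raw_bytes, declared=None):
    """Hypothetical sketch: try the page's declared charset first,
    then common fallbacks, so odd encodings yield readable text
    instead of a nonsense file."""
    for enc in (declared, "utf-8", "iso-8859-1", "cp1252"):
        if not enc:
            continue
        try:
            return raw_bytes.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: substitute unmappable bytes rather than failing outright.
    return raw_bytes.decode("utf-8", errors="replace")

print(decode_page("café".encode("utf-8")))   # café (utf-8 succeeds)
print(decode_page("café".encode("cp1252")))  # café (utf-8 fails, fallback decodes it)
```

A chain like this would not rescue every target, but it matches the observed pattern: most pages extract fine, and only the odd-encoding minority produce garbage.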
Wow, great job! You took it upon yourself to create this essential function. I barely had time to react because you're so fast. Thanks for giving it a shot.
- After the release of version 1.3, the focus will be on 3 things:
1. The Article Extractor (Will get some additional features)
2. Some minor fixes and tweaks.
Number 3 is actually not a part of Scraping Tool-box itself - but something new:
An experimental add-on for GSA SER - a special add-on that can submit differently than GSA SER itself.
Codename: Sentinel
It will be able to 'feed' GSA SER with submitted links, where GSA will take over and handle the remaining steps.
That means GSA SER will do the rest, and add verified links etc. like it's doing now.
As it is experimental, the platforms it can submit to will be limited in the beginning - however, if things work out great, it will be expanded over time.
Comments
How to get your software? PM sent. Still waiting for your response.
I will get updates, right?
Then I want to buy now..
Thanks
PM me.