No need to wait, buddy - I need feedback on version 1.2 first.
Please test the Article Extractor with at least 50-100 URLs that you have located yourself up front.
- I need to see how it goes for you guys first, before we add the remaining stuff, like copyright removal and url replacement.
Update - Version 1.2 will be released today:
Changelog:
Current donors:
Patience:D I will send a pm to you guys with the new release.
Everyone else:
Please consider joining this adventure, as the development is based purely on interest, support, and donations. I don't make money on this project - actually, it cannot even pay for the electricity;)
Great to hear the upgrade to the new version went fine:)
Of course I knew it would lead to issues and problems - that is why I have delayed the rest of the features, like copyright removal and URL replacement:P
Let's break it down:
1. The Article Extractor does not use any footprints - it relies completely on the target URLs the user loads into the program.
- The point is that the user needs to do some research up front: run manual Google searches using various footprints, and then select the good results...
That could be done automatically too - but it is not implemented.
Also note that this would lead to poor quality, as the program won't care whether an article is 'good' or 'bad'...
2. Threads
Yep, you are right here - the program is currently set to use all 30 threads by default.
Of course I also knew that as well;)
It needs to count the amount of targets first:
- If 5 urls are loaded - 1 thread would be enough
- 100 urls - 10 threads would do
And so on....
Not a big issue really - and easy to implement.
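The scaling rule above (1 thread for 5 URLs, 10 threads for 100, and so on) could be sketched like this. This is a hypothetical illustration, not the program's actual code: the function name `pick_thread_count` and the exact ratio of one thread per ~10 targets are my assumptions based on the examples given.

```python
def pick_thread_count(url_count, max_threads=30):
    """Hypothetical sketch: scale worker threads to the number of
    target URLs, roughly one thread per 10 targets, capped at 30."""
    if url_count <= 0:
        return 0
    # At least 1 thread, never more than the cap.
    return max(1, min(max_threads, url_count // 10))

print(pick_thread_count(5))     # 1 - a handful of URLs only needs one thread
print(pick_thread_count(100))   # 10 - matches the "100 urls - 10 threads" rule
print(pick_thread_count(1000))  # 30 - capped at the current maximum
```

Counting the loaded targets once before spawning workers keeps small jobs from wasting 30 idle threads, which is all the fix really requires.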
3. Stop Button
Indeed - there is no stop button (yet:P)
- Also no button for loading in replacement urls
As I said - Those features will come in version 1.21;)
The important thing here was to test whether the 'Article Extractor' actually works in real life.
And as far as I can see, it does exactly what it is supposed to do (ignoring the missing features below).
To sum up:
- Balanced thread use
- Stop button
- Loading of replacement URLs
- Implementation of URL replacement, copyright removal, etc...
*Edit
In terms of 'strange results' like text files with nonsense, it will fail on some targets (different encodings and such).
However, I think most will work, and your test result of 525 articles out of 590 seems decent.
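The "different encodings" failures could plausibly be softened with a fallback decode chain. A minimal sketch, assuming nothing about the tool's internals - `decode_page` and its candidate list are my own illustration, not the extractor's actual logic:

```python
def decode_page(raw_bytes, declared=None):
    """Hypothetical sketch: try the page's declared charset first,
    then common fallbacks, so odd encodings yield readable text
    instead of a nonsense file."""
    for enc in (declared, "utf-8", "iso-8859-1", "cp1252"):
        if not enc:
            continue
        try:
            return raw_bytes.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: substitute unmappable bytes rather than failing outright.
    return raw_bytes.decode("utf-8", errors="replace")

print(decode_page("café".encode("utf-8")))   # café (utf-8 succeeds)
print(decode_page("café".encode("cp1252")))  # café (utf-8 fails, fallback decodes it)
```

A chain like this would not rescue every target, but it matches the observed pattern: most pages extract fine, and only the odd-encoding minority produce garbage.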
Wow, great job! You took it upon yourself to create this essential function. I barely had time to react because you're so fast. Thanks for giving it a shot.
- After the release of version 1.3, the focus will be on 3 things:
1. The Article Extractor (Will get some additional features)
2. Some minor fixes and tweaks.
Number 3 is actually not a part of Scraping Tool-box itself - but something new:
An experimental add-on for GSA SER - a special add-on that can submit differently than GSA SER itself.
Codename: Sentinel
It will be able to 'feed' GSA SER with submitted links, where GSA will take over and handle the remaining steps.
That means GSA SER will do the rest, and add verified links etc. like it's doing now.
As it is experimental, the platforms it can submit to will be limited in the beginning - however, if things work out great, it will be expanded over time.
Comments
How to get your software? PM sent. Still waiting for your response.
I will get updates, right?
Then I want to buy now..
Thanks
PM me.