
Platform Identifier


Comments

  • s4nt0s Houston, Texas
    @blackseocn - Ok, we'll look into that. Thanks for letting me know.
  • edited December 2014
    s4nt0s  @Sven

    For my comment above about the bug: it's not a pause/restart problem. I just found out the identifier won't identify correctly if you decrease the threads while it is running. Maybe you can check it right now.
  • s4nt0s Houston, Texas
    @blackseocn - Ok, we're looking into it. Thanks
  • rad
    edited December 2014
    Looking to purchase this software. How's it working out? Bugs etc.? How quick is it? How long does it take to go through
    a 1 mill URL list?

    thanks,


    Has this been implemented? I pulled this from another post below.

    Things changing:
    SER can sort the same URL to match multiple engines, while PI wasn't set to do this. It takes a URL and, after detection is done, it no longer checks for other engines.

    So tomorrow we will work on a "full detection" and "single detection" feature so that users can pick how they want this to work. Full detection will work the same as SER and sort the URL into multiple engines.
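
    As a rough illustration of the difference, here is a minimal Python sketch - the engine list and the engine_matches() footprint check are hypothetical stand-ins, not PI's actual code. Single detection stops at the first matching engine, while full detection keeps checking:

        ENGINES = ["Article-WordPress", "Blog Comment-General", "Forum-vBulletin"]

        def engine_matches(url, engine):
            # Hypothetical stand-in for the real footprint check.
            return engine.split("-")[0].lower() in url.lower()

        def single_detection(url):
            # Stop at the first engine that matches (the behaviour described above).
            for engine in ENGINES:
                if engine_matches(url, engine):
                    return [engine]
            return []

        def full_detection(url):
            # Keep checking, so one URL can be sorted into several engines (SER-style).
            return [e for e in ENGINES if engine_matches(url, e)]

        url = "http://example.com/forum/article-123"
        print(single_detection(url))   # ['Article-WordPress']
        print(full_detection(url))     # ['Article-WordPress', 'Forum-vBulletin']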

  • @rad,
    Current version allows full detection mode.

    @S4nt0s,
    I love the monitor folders option - now I can completely automate scraping - without having to add linklist files to SER. Everything is now on auto-pilot.

    It would be great to have an option for PI to automatically remove duplicate domains and duplicate URLs while monitoring folders. At the moment I have this option enabled through Gscraper, but it slows down scraping speed considerably and I would rather use PI to remove duplicates.

    Also a question about monitoring folders - if a project is stopped (e.g. while updating to a new version of PI), will PI start processing the folder from the beginning again, or will it remember where it was up to with the processing? Also, if I add new files to a folder which is being monitored, I don't think PI is picking up the newly added files (see the polling sketch at the end of this post).

    This is a really awesome tool for anyone who is scraping their own links.
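
    On the monitoring question above: at its core, a monitor-folders loop just has to re-scan the directory and hand any file it hasn't seen before to the identifier. A minimal sketch of that idea in Python - the folder path and the process_file() step are hypothetical placeholders, not PI's actual implementation:

        import os
        import time

        WATCH_DIR = r"C:\scraped-lists"   # hypothetical Gscraper output folder

        def process_file(path):
            # Placeholder for handing the file to the identifier.
            print("would identify platforms in:", path)

        seen = set()
        while True:
            for name in sorted(os.listdir(WATCH_DIR)):
                path = os.path.join(WATCH_DIR, name)
                if os.path.isfile(path) and path not in seen:
                    seen.add(path)
                    process_file(path)
            time.sleep(10)   # re-scan every 10 seconds for newly added files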

    s4nt0s Houston, Texas
    edited December 2014
    @rad - Yep, that was implemented a few updates ago. It's running fine :)

    How long it takes to go through a 1 mill URL list depends on how many threads you're running and bandwidth.

    -------

    @slimdusty72 - Ya, removing duplicates has been requested before and is something we'll probably add in the future. 

    Projects should continue where they left off, as long as they aren't "stopped" using the stop button. Otherwise, they are saved and restored to pick up where they left off. 

    The monitor folders option should detect when new files are added to the folder; it should start using the new file when the old one is finished. I will look into it today. Thanks
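
    To put the earlier "1 mill URL list" question into rough numbers, the arithmetic is just list size divided by throughput; the rate below is only an illustrative assumption, not a PI benchmark:

        urls = 1_000_000
        urls_per_second = 200   # assumed overall throughput (threads x per-thread rate x bandwidth)

        hours = urls / urls_per_second / 3600
        print(f"~{hours:.1f} hours at {urls_per_second} URLs/sec")   # ~1.4 hours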


  • rad
    edited December 2014
    @s4nt0s

    Thanks for the info, going to purchase. I'm running GSCRAPER with their proxies,
    scraping roughly 25k to 45k URLs a minute, running for days. My lists are in the billions,
    tens of billions. Some lists are 1 to 5 gigs.

    Sorting and importing in GSA is killing me. I can run 1000 threads no problem, 1 Gb internet
    connection on my VPS. Hopefully, it is a lot faster at sorting. 
  • s4nt0s Houston, Texas
    @rad - Wow, that's a lot of scraping lol. You can test out the short free trial to see if you like the speed. 
  • rad
    edited December 2014
    @s4nt0s

    Yeaaaaaa, I'm loving GSCRAPER with their proxies! BEAST!

    Thanks for the info 
  • @s4nt0s

    Was just playing around with the trial version. I can only run around 10 threads
    on my VPS with one project running. CPU usage stays around 100 percent with anything
    over 10 threads. Is that normal? Stats: 6 cores, 6 GB, 1 Gb internet, 60 GB HD.
  • s4nt0s Houston, Texas
    @rad - Did you use any type of filters or just a normal project? Can you enable the bandwidth limit and see if that helps? I keep mine set to around 10,000.
  • edited December 2014
    @s4nt0s

    Can you add an option like the one in GSA SER's Options to remove duplicate URLs and domains? I know you have this feature under Tools, but I think they are different: if I drag all the files into the identification folder, PI will compare all the URLs in the folder rather than the URLs within each file.
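
    To make the difference concrete, the small sketch below contrasts the two dedup scopes - within each file versus across every file in the folder. The dedupe() helper is hypothetical and is not how PI's Tools feature is implemented:

        def dedupe(urls):
            seen, out = set(), []
            for u in urls:
                if u not in seen:
                    seen.add(u)
                    out.append(u)
            return out

        files = {
            "list_a.txt": ["http://a.com/page1", "http://b.com/page2"],
            "list_b.txt": ["http://a.com/page1", "http://c.com/page3"],
        }

        # Per-file: duplicates are only removed inside each file,
        # so http://a.com/page1 still appears in both lists.
        per_file = {name: dedupe(urls) for name, urls in files.items()}

        # Folder-wide: all files are compared against each other,
        # so http://a.com/page1 is kept only once overall.
        folder_wide = dedupe(u for urls in files.values() for u in urls)

        print(per_file)
        print(folder_wide)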
  • s4nt0s Houston, Texas
    @blackseocn - Yes, I'll put that on the to-do list. :)
  • Is Full detection mode (Settings) the same as Deep Matching (Edit project)?
  • s4nt0s Houston, Texas
    No, they are different. The one in the new project window can help increase detection rate but is more CPU intensive. The other one allows a single URL to be sorted into multiple platforms if it matches more than one.

    A video: 
  • sickseo London, UK
    Is anyone facing issues with the monitor folder option?

    I'm using version 1.18. I've got 6 projects running, all of which are monitoring folders. 2 of the projects keep getting stuck after only processing a few URLs. It's been running for a few hours and it's just stuck. I've stopped and started the projects, as well as deleted them and created them again. I've also restarted the software.

    If I set up a process files project, it runs through the URLs till the end...a few hundred thousand of them. But when I set up a monitor folders project for the same scraped list, it gets stuck again!

    4 out of the 6 projects are working just fine, so I don't understand why 2 of them won't.
  • s4nt0s Houston, Texas
    @sickseo - I'm going to PM you a new version to try. 