
Platform Identifier


Comments

  • s4nt0s Houston, Texas
    @blackseocn - Ok, we'll look into that. Thanks for letting me know.
  • edited December 2014
    s4nt0s  @Sven

    For my comment above about the bug: it's not a pause/restart problem. I just found out the identifier won't identify correctly if you decrease the threads while it is running. Maybe you can check it right now.
  • s4nt0s Houston, Texas
    @blackseocn - Ok, we're looking into it. Thanks
  • rad
    edited December 2014
    Looking to purchase this software. How's it working out? Bugs etc.? How quick is it? How long does it take to go through
    a 1 mill URL list?

    thanks,


    Has this been implemented? I pulled this from another post below.

    Things changing:
    SER can sort the same URL to match multiple engines, while PI wasn't set to do this. It takes a URL and, after detection is done, it no longer checks for other engines.

    So tomorrow we will work on a "full detection" and "single detection" feature so that users can pick how they want this to work. Full detection will work the same as SER and sort the URL into multiple engines.
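
    As a rough illustration of the difference, here is a minimal Python sketch - the engine list and the engine_matches() footprint check are hypothetical stand-ins, not PI's actual code. Single detection stops at the first matching engine, while full detection keeps checking:

        ENGINES = ["Article-WordPress", "Blog Comment-General", "Forum-vBulletin"]

        def engine_matches(url, engine):
            # Hypothetical stand-in for the real footprint check.
            return engine.split("-")[0].lower() in url.lower()

        def single_detection(url):
            # Stop at the first engine that matches (the behaviour described above).
            for engine in ENGINES:
                if engine_matches(url, engine):
                    return [engine]
            return []

        def full_detection(url):
            # Keep checking, so one URL can be sorted into several engines (SER-style).
            return [e for e in ENGINES if engine_matches(url, e)]

        url = "http://example.com/forum/article-123"
        print(single_detection(url))   # ['Article-WordPress']
        print(full_detection(url))     # ['Article-WordPress', 'Forum-vBulletin']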

  • @rad,
    Current version allows full detection mode.

    @S4nt0s,
    I love the monitor folders option - now I can completely automate scraping - without having to add linklist files to SER. Everything is now on auto-pilot.

    It would be great to have an option for PI to automatically remove duplicate domains and duplicate URLs while monitoring folders. At the moment I have this option enabled through Gscraper, but it slows down scraping speed considerably and I would rather use PI to remove duplicates.

    Also a question about monitoring folders - if a project is stopped (e.g. while updating to a new version of PI), will PI start processing the folder from the beginning again, or will it remember where it was up to with the processing? Also, if I add new files to a folder which is being monitored, I don't think PI is picking up the newly added files (see the polling sketch at the end of this post).

    This is a really awesome tool for anyone who is scraping their own links.
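
    On the monitoring question above: at its core, a monitor-folders loop just has to re-scan the directory and hand any file it hasn't seen before to the identifier. A minimal sketch of that idea in Python - the folder path and the process_file() step are hypothetical placeholders, not PI's actual implementation:

        import os
        import time

        WATCH_DIR = r"C:\scraped-lists"   # hypothetical Gscraper output folder

        def process_file(path):
            # Placeholder for handing the file to the identifier.
            print("would identify platforms in:", path)

        seen = set()
        while True:
            for name in sorted(os.listdir(WATCH_DIR)):
                path = os.path.join(WATCH_DIR, name)
                if os.path.isfile(path) and path not in seen:
                    seen.add(path)
                    process_file(path)
            time.sleep(10)   # re-scan every 10 seconds for newly added files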

    s4nt0s Houston, Texas
    edited December 2014
    @rad - Yep, that was implemented a few updates ago. It's running fine :)

    How long it takes to go through a 1 mill URL list depends on how many threads you're running and bandwidth.

    -------

    @slimdusty72 - Ya, removing duplicates has been requested before and is something we'll probably add in the future. 

    Projects should continue where they left off, as long as they aren't "stopped" using the stop button. Otherwise, they are saved and restored to pick up where they left off. 

    The monitor folders option should detect when new files are added to the folder; it should start using the new file when the old one is finished. I will look into it today. Thanks
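
    To put the earlier "1 mill URL list" question into rough numbers, the arithmetic is just list size divided by throughput; the rate below is only an illustrative assumption, not a PI benchmark:

        urls = 1_000_000
        urls_per_second = 200   # assumed overall throughput (threads x per-thread rate x bandwidth)

        hours = urls / urls_per_second / 3600
        print(f"~{hours:.1f} hours at {urls_per_second} URLs/sec")   # ~1.4 hours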


  • rad
    edited December 2014
    @s4nt0s

    Thanks for the info, going to purchase. I'm running GSCRAPER with their proxies,
    scraping roughly 25k to 45k URLs a minute, running for days. My lists are in the billions,
    tens of billions. Some lists are 1 to 5 gigs.

    Sorting and importing in GSA is killing me. I can run 1000 threads no problem, 1 Gb internet
    connection on my VPS. Hopefully, it is a lot faster at sorting. 
  • s4nt0s Houston, Texas
    @rad - Wow, that's a lot of scraping lol. You can test out the short free trial to see if you like the speed. 
  • rad
    edited December 2014
    @s4nt0s

    Yeaaaaaa, I'm loving GSCRAPER with their proxies! BEAST!

    Thanks for the info 
  • @s4nt0s

    Was just playing around with the trial version. I can only run around 10 threads
    on my VPS with one project running. CPU usage stays around 100 percent with anything
    over 10 threads. Is that normal? Stats: 6 cores, 6 GB, 1 Gb internet, 60 GB HD.
  • s4nt0s Houston, Texas
    @rad - Did you use any type of filters or just a normal project? Can you enable the bandwidth limit and see if that helps? I keep mine set to around 10,000.
  • edited December 2014
    @s4nt0s

    Can you add an option like the one in GSA SER's Options to remove duplicate URLs and domains? I know you have this feature under Tools, but I think they are different: if I drag all the files into the identification folder, PI will compare all the URLs in the folder rather than the URLs within each file.
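
    To make the difference concrete, the small sketch below contrasts the two dedup scopes - within each file versus across every file in the folder. The dedupe() helper is hypothetical and is not how PI's Tools feature is implemented:

        def dedupe(urls):
            seen, out = set(), []
            for u in urls:
                if u not in seen:
                    seen.add(u)
                    out.append(u)
            return out

        files = {
            "list_a.txt": ["http://a.com/page1", "http://b.com/page2"],
            "list_b.txt": ["http://a.com/page1", "http://c.com/page3"],
        }

        # Per-file: duplicates are only removed inside each file,
        # so http://a.com/page1 still appears in both lists.
        per_file = {name: dedupe(urls) for name, urls in files.items()}

        # Folder-wide: all files are compared against each other,
        # so http://a.com/page1 is kept only once overall.
        folder_wide = dedupe(u for urls in files.values() for u in urls)

        print(per_file)
        print(folder_wide)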
  • s4nt0s Houston, Texas
    @blackseocn - Yes, I'll put that on the to-do list. :)
  • Is Full detection mode (Settings) the same as Deep Matching (Edit project)?
  • s4nt0s Houston, Texas
    No, they are different. The one in the new project window can help increase detection rate but is more CPU intensive. The other one allows a single URL to be sorted into multiple platforms if it matches more than one.

    A video: 
  • sickseo London, UK
    Is anyone facing issues with the monitor folder option?

    I'm using version 1.18. I've got 6 projects running, all of which are monitoring folders. 2 of the projects keep getting stuck after only processing a few URLs. It's been running for a few hours and it's just stuck. I've stopped and started the projects, as well as deleted them and created them again. I've also restarted the software.

    If I set up a process files project, it runs through the URLs till the end...a few hundred thousand of them. But when I set up a monitor folders project for the same scraped list, it gets stuck again!

    4 out of the 6 projects are working just fine, so I don't understand why 2 of them won't.
  • s4nt0s Houston, Texas
    @sickseo - I'm going to PM you a new version to try. 