
PI Is Idle?

I just got PI in an effort to better utilize GSA in tandem with Scrapebox. I have had URLs processing for the past few days, finding only article and social sites on particular platforms, and it has been doing fine.

Twenty minutes ago it stopped. The latest requests show "idle".

My Scrapebox sessions are still running and sending URLs to the harvester sessions just fine.

I have PI set to monitor the harvester sessions folder and save recognized URLs to another folder. I have also set up a blacklist.

I am also running a dedupe project that pulls from the folder the recognized PI URLs are saved to.

I have all deduped URLs going to my site_identified-list file for SER, and a cleaner project pulling from SER's identified site list.
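
For context, the dedupe step is conceptually doing something like this (just a rough sketch of the data flow; PI's dedupe project handles it internally, and the folder and file paths here are placeholders for my setup):

```python
# Rough sketch of the dedupe step: gather recognized URLs from PI's output
# folder, drop anything already in the SER identified site list, and append
# only the new ones. Paths are placeholders; PI's dedupe project does this
# internally.
from pathlib import Path

recognized_dir = Path(r"C:\PI\recognized")                 # folder PI saves recognized URLs to
sitelist_file = Path(r"C:\SER\site_identified-list.txt")   # SER identified site list

seen = set()
if sitelist_file.exists():
    seen.update(sitelist_file.read_text(encoding="utf-8", errors="ignore").splitlines())

new_urls = []
for url_file in recognized_dir.glob("*.txt"):
    for url in url_file.read_text(encoding="utf-8", errors="ignore").splitlines():
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            new_urls.append(url)

if new_urls:
    with sitelist_file.open("a", encoding="utf-8") as out:
        out.write("\n".join(new_urls) + "\n")
```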

I cannot figure out why PI has stopped. Could it be because it keeps pulling URLs that are on the blacklist? If so, how would I check that?

I am also harvesting over 1 million URLs every hour from SB, and I have checked the SER and Scrapebox sessions to make sure my CPU is not being overloaded.

I am only using 30%.

GSA SER also, for some reason, had 190 submitted and now says 160? I am not sure why; I will post about that in the SER thread, though. In the options I ticked the submit-urls folder to start saving them; hopefully that helps.

Any help would be appreciated, thank you!

Comments

  • s4nt0s (Houston, Texas)
    From what users have told me, you need to use the Scrapebox Automator plugin to set the output folder for SB, because the harvester sessions folder causes issues with PI. :(
  • Muchibu (United States)
    Really? OK. I have Automator; I will have to change it and see if it works. Thanks
  • I'm curious how to set it up properly too, as I had similar issues. I had a folder set up where the SB Automator dumped the files. From this folder a PI dedupe project moved the files to a second folder, and from that second folder a PI project identified the URLs and copied them to a third folder.
    The problem was that PI stopped after some URLs without finishing the batch. If I stopped the dedupe project and started the identifier project separately, it could finish all the URLs.
    Unfortunately I haven't found a solution for it yet, so I'm eager to see if there is one.
  • Muchibu (United States)
    I am not sure. I switched my Automator to save to a separate file, and it now works.

    1. I have an SB Automator file saving harvested URLs to a file.
    2. PI pulls from that file in two different PI projects:
    • The first project identifies blogs/image comments/directories/forums.
    • The second PI project identifies my contextual platforms of choice.
    3. The first PI project saves to my success file in the SER site lists.
    4. The second PI project saves to my identified SER site list.
    5. SER then has two projects:
    • One pulls the blogs/image comments/directories/forums list.
    • Two pulls the contextual platforms of choice.
    • One saves verified URLs to the failed site list.
    • Two saves verified URLs to the verified site list.

    Got the idea from Shaun on this forum.

    Haven't had the issue since. A rough sketch of the file flow is below.
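
    Conceptually the two-project split looks something like this (just a sketch; PI itself does the platform identification, and the paths, platform groupings, and the identify_platform() helper below are made-up placeholders):

```python
# Sketch of the two-way split that the two PI projects perform: route each
# harvested URL either to the SER success list (blogs/image comments/
# directories/forums) or to the identified list (contextual platforms).
# Paths and the identify_platform() helper are placeholders.
from pathlib import Path

harvest_file = Path(r"C:\SB\harvested.txt")                 # file the SB Automator saves to
success_list = Path(r"C:\SER\sitelist_success.txt")         # blogs/image/directory/forum
identified_list = Path(r"C:\SER\sitelist_identified.txt")   # contextual platforms

NON_CONTEXTUAL = {"blog_comment", "image_comment", "directory", "forum"}

def identify_platform(url: str) -> str:
    # Stand-in for PI's engine detection; the real logic lives inside PI.
    return "blog_comment" if "blog" in url else "contextual"

with success_list.open("a", encoding="utf-8") as success, \
     identified_list.open("a", encoding="utf-8") as identified:
    for url in harvest_file.read_text(encoding="utf-8", errors="ignore").splitlines():
        url = url.strip()
        if not url:
            continue
        target = success if identify_platform(url) in NON_CONTEXTUAL else identified
        target.write(url + "\n")
```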
  • I had a slightly different setup where SB scraped and saved the URLs into a folder as separate files, much like it does with the default scrape folder. I guess it is this continuous file movement that causes issues with PI.
    I guess I need to reconfigure things again or wait for s4nt0s to solve it... ;)
    Thanks for sharing!
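
    One thing that might be worth trying: have the scrape write each batch to a temporary name and only rename it into the folder PI watches once the file is complete, so PI never reads a half-written or still-moving file. That this is really what trips PI up is only my guess, and the paths below are placeholders:

```python
# Sketch: write a batch to a temporary file first, then rename it into the
# watched folder, so the monitoring tool only ever sees complete files.
# The watched-folder path is a placeholder.
import os
import tempfile
from pathlib import Path

watched_dir = Path(r"C:\PI\incoming")   # folder PI monitors

def drop_batch(urls, batch_name):
    watched_dir.mkdir(parents=True, exist_ok=True)
    # Write to a .tmp file in the same folder so the final rename stays on
    # one volume and is effectively atomic.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=watched_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write("\n".join(urls) + "\n")
    os.replace(tmp_path, watched_dir / f"{batch_name}.txt")

drop_batch(["http://example.com/post-1", "http://example.com/post-2"], "batch_001")
```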