
PI Is Idle?

I just got PI in an effort to better utilize GSA in tandem with Scrapebox. I have had URLs processing for the past few days, finding only article and social sites on particular platforms, and it has been doing fine.

Twenty minutes ago it stopped. The latest requests show "idle".

My Scrapebox sessions are still running and sending URLs to the harvester sessions just fine.

I have PI set to monitor the harvester sessions folder and save recognized URLs to another folder. I have also set up a blacklist.

I am also running a dedupe project that pulls from the folder the recognized PI URLs are saved to.

I have all deduped URLs going to my site_identified-list file for SER, and a cleaner project pulling from SER's identified site list.
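
For context, the dedupe step is conceptually doing something like this (just a rough sketch of the data flow; PI's dedupe project handles it internally, and the folder and file paths here are placeholders for my setup):

```python
# Rough sketch of the dedupe step: gather recognized URLs from PI's output
# folder, drop anything already in the SER identified site list, and append
# only the new ones. Paths are placeholders; PI's dedupe project does this
# internally.
from pathlib import Path

recognized_dir = Path(r"C:\PI\recognized")                 # folder PI saves recognized URLs to
sitelist_file = Path(r"C:\SER\site_identified-list.txt")   # SER identified site list

seen = set()
if sitelist_file.exists():
    seen.update(sitelist_file.read_text(encoding="utf-8", errors="ignore").splitlines())

new_urls = []
for url_file in recognized_dir.glob("*.txt"):
    for url in url_file.read_text(encoding="utf-8", errors="ignore").splitlines():
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            new_urls.append(url)

if new_urls:
    with sitelist_file.open("a", encoding="utf-8") as out:
        out.write("\n".join(new_urls) + "\n")
```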

I cannot figure out why PI has stopped. Could it be because it keeps pulling URLs that are on the blacklist? If so, how would I check that?

I am also harvesting over 1 million URLs every hour from SB, and I have checked the SER and Scrapebox sessions to make sure my CPU is not being overloaded.

I am only using 30%.

GSA SER also, for some reason, had 190 submitted and now says 160? I am not sure why; I will post about that in the SER thread, though. In the options I ticked the submit-urls folder to start saving them; hopefully that helps.

Any help would be appreciated, thank you!

Comments

  • s4nt0s (Houston, Texas)
    From what users have told me, you need to use the Scrapebox Automator plugin to set the output folder for SB, because the harvester sessions folder causes issues with PI. :(
  • Muchibu (United States)
    Really? OK. I have Automator; I will have to change it and see if it works. Thanks
  • I'm curious how to set it up properly too, as I had similar issues. I had a folder set up where the SB Automator dumped the files. From this folder a PI dedupe project moved the files to a second folder, and from that second folder a PI project identified the URLs and copied them to a third folder.
    The problem was that PI stopped after some URLs without finishing the batch. If I stopped the dedupe project and started the identifier project separately, it could finish all the URLs.
    Unfortunately I haven't found a solution for it yet, so I'm eager to see if there is one.
  • Muchibu (United States)
    I am not sure. I switched my Automator to save to a separate file, and it now works.

    1. I have an SB Automator file saving harvested URLs to a file.
    2. PI pulls from that file in two different PI projects:
    • The first project identifies blogs/image comments/directories/forums.
    • The second PI project identifies my contextual platforms of choice.
    3. The first PI project saves to my success file in the SER site lists.
    4. The second PI project saves to my identified SER site list.
    5. SER then has two projects:
    • One pulls the blogs/image comments/directories/forums list.
    • Two pulls the contextual platforms of choice.
    • One saves verified URLs to the failed site list.
    • Two saves verified URLs to the verified site list.

    Got the idea from Shaun on this forum.

    Haven't had the issue since. A rough sketch of the file flow is below.
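
    Conceptually the two-project split looks something like this (just a sketch; PI itself does the platform identification, and the paths, platform groupings, and the identify_platform() helper below are made-up placeholders):

```python
# Sketch of the two-way split that the two PI projects perform: route each
# harvested URL either to the SER success list (blogs/image comments/
# directories/forums) or to the identified list (contextual platforms).
# Paths and the identify_platform() helper are placeholders.
from pathlib import Path

harvest_file = Path(r"C:\SB\harvested.txt")                 # file the SB Automator saves to
success_list = Path(r"C:\SER\sitelist_success.txt")         # blogs/image/directory/forum
identified_list = Path(r"C:\SER\sitelist_identified.txt")   # contextual platforms

NON_CONTEXTUAL = {"blog_comment", "image_comment", "directory", "forum"}

def identify_platform(url: str) -> str:
    # Stand-in for PI's engine detection; the real logic lives inside PI.
    return "blog_comment" if "blog" in url else "contextual"

with success_list.open("a", encoding="utf-8") as success, \
     identified_list.open("a", encoding="utf-8") as identified:
    for url in harvest_file.read_text(encoding="utf-8", errors="ignore").splitlines():
        url = url.strip()
        if not url:
            continue
        target = success if identify_platform(url) in NON_CONTEXTUAL else identified
        target.write(url + "\n")
```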
  • I had a slightly different setup where SB scraped and saved the URLs into a folder as separate files, much like it does with the default scrape folder. I guess it is this continuous file movement that causes issues with PI.
    I guess I need to reconfigure things again or wait for s4nt0s to solve it... ;)
    Thanks for sharing!
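
    One thing that might be worth trying: have the scrape write each batch to a temporary name and only rename it into the folder PI watches once the file is complete, so PI never reads a half-written or still-moving file. That this is really what trips PI up is only my guess, and the paths below are placeholders:

```python
# Sketch: write a batch to a temporary file first, then rename it into the
# watched folder, so the monitoring tool only ever sees complete files.
# The watched-folder path is a placeholder.
import os
import tempfile
from pathlib import Path

watched_dir = Path(r"C:\PI\incoming")   # folder PI monitors

def drop_batch(urls, batch_name):
    watched_dir.mkdir(parents=True, exist_ok=True)
    # Write to a .tmp file in the same folder so the final rename stays on
    # one volume and is effectively atomic.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=watched_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write("\n".join(urls) + "\n")
    os.replace(tmp_path, watched_dir / f"{batch_name}.txt")

drop_batch(["http://example.com/post-1", "http://example.com/post-2"], "batch_001")
```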