PI Is Idle?
Muchibu
United States
I just got PI in an effort to better utilize GSA in tandem with Scrapebox. I've had URLs processing for the past few days, finding only article and social sites on particular platforms, and it has been doing fine.
Twenty minutes ago it stopped. The latest requests show "idle".
My Scrapebox sessions are still running and sending URLs to the harvester sessions just fine.
I have PI set to monitor the harvester sessions folder and save recognized URLs to another folder. I have also set up a blacklist.
I am also running a dedupe project that pulls from the folder the PI-recognized URLs are saved to.
All deduped URLs go to my site_identified-list file for SER, and a cleaner project pulls from SER's identified site list.
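For anyone following along, the dedupe step is conceptually just a merge that keeps unseen URLs. A minimal Python sketch under that assumption (the folder and file names are hypothetical stand-ins, not PI's internal logic):

```python
# Sketch: merge PI's recognized-URL files into the identified list,
# appending only URLs not already present. Paths are made up.
import glob

RECOGNIZED_DIR = "pi_recognized"               # folder PI saves recognized URLs to
IDENTIFIED_FILE = "site_identified-list.txt"   # identified-list file fed to SER

# Seed the seen-set from the existing list so reruns stay deduped.
try:
    with open(IDENTIFIED_FILE, encoding="utf-8") as f:
        seen = {line.strip() for line in f if line.strip()}
except FileNotFoundError:
    seen = set()

new_urls = []
for path in glob.glob(f"{RECOGNIZED_DIR}/*.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                new_urls.append(url)

# Append only the new, unique URLs.
with open(IDENTIFIED_FILE, "a", encoding="utf-8") as f:
    f.writelines(u + "\n" for u in new_urls)

print(f"appended {len(new_urls)} new unique URLs")
```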
I cannot figure out why PI has stopped. Could it be because it keeps pulling URLs that are on the blacklist? If so, how would I check that?
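One rough way to check that offline is to compare a harvested batch against the blacklist file directly. A sketch assuming plain-text lists, one URL or domain per line (file paths are hypothetical):

```python
# Sketch: count how many URLs in a harvested batch hit a domain blacklist.
# Paths and file formats are assumptions, not PI's actual behavior.
from urllib.parse import urlparse

HARVEST_FILE = "harvester_sessions/batch_001.txt"
BLACKLIST_FILE = "blacklist.txt"

# Load blacklisted domains (one per line).
with open(BLACKLIST_FILE, encoding="utf-8") as f:
    blacklist = {line.strip().lower() for line in f if line.strip()}

total = hits = 0
with open(HARVEST_FILE, encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        total += 1
        # Assumes harvested URLs include a scheme (http:// or https://).
        domain = urlparse(url).netloc.lower()
        # Count a hit if the domain or any parent domain is blacklisted.
        parts = domain.split(".")
        if any(".".join(parts[i:]) in blacklist for i in range(len(parts))):
            hits += 1

print(f"{hits} of {total} harvested URLs match the blacklist")
```

If most of a batch matches, that would at least explain PI having little left to identify, though not necessarily why it shows "idle".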
Also, I am harvesting over 1 million URLs every hour from SB, and I have checked SER and my Scrapebox sessions to make sure my CPU is not being overloaded.
I am only using about 30%.
GSA SER also, for some reason, showed 190 submitted and now says 160? Not sure why; I will post that in the SER thread, though. In Options I ticked the folder for saving submitted URLs, hoping that helps.
Any help would be appreciated, thank you!
Comments
I had the same problem: PI stopped after some URLs without finishing the batch. If I stopped the dedupe project and started the Identifier project separately, it could finish all the URLs.
Unfortunately I haven't found a solution for it yet, so I'm eager to see if there is one.
I got the idea from Shaun on this forum and haven't had the issue since.
I guess I need to reconfigure things again or wait for s4nt0s to solve it...
Thanks for sharing!