
Resubmission Process for Submitted URLs

edited February 2013 in Need Help
We all have those days when we leave SER running for quite some time, only to find out our verification rates are extremely low (high submissions, low verifications). The usual suspects are proxy problems, high thread counts, captcha services not working or low on credits, CPU/memory overload, etc. What we're left with is a massive amount of submitted/scraped URLs that would, under ideal circumstances, have been successfully verified. So my question is: what process do you use to resubmit these URLs once you've corrected all the "kinks in your chain"?

My Resubmission Process is:

1. Project > Right Click > Show URLs > Submitted > Export

2. Project > Right Click > Show URLs > Verified > Export

3. Trim to Root both the Submitted and Verified lists, then subtract the Verified list from the Submitted list (can be done within SB; see the short script sketch below)

4. Project > Right Click > Modify Project > Duplicate

5. Right click on the Duplicated Project and click Import Target URLs

6. Click Start!

If anyone else has a better method... please let me know. :-)
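A minimal Python sketch of step 3 (trim to root, then subtract Verified from Submitted), assuming both exports are plain .txt files with one URL per line; the file names below are placeholders, not anything SER produces:

    # Trim both exports to their root URLs and keep only the submitted
    # roots that never produced a verified link (step 3 above).
    from urllib.parse import urlparse

    def load_roots(path):
        # Read one URL per line and reduce each to scheme://host/.
        roots = set()
        with open(path, encoding="utf-8", errors="ignore") as f:
            for line in f:
                parts = urlparse(line.strip())
                if parts.scheme and parts.netloc:
                    roots.add(f"{parts.scheme}://{parts.netloc}/")
        return roots

    submitted = load_roots("submitted_export.txt")  # placeholder file name
    verified = load_roots("verified_export.txt")    # placeholder file name

    # Whatever was submitted but never verified becomes the re-import list.
    with open("resubmit_targets.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(sorted(submitted - verified)))

Subtracting after trimming to root drops every submitted domain that produced at least one verified link, which is exactly what step 3 is after; the output file is what gets imported as target URLs in step 5.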

Comments

  • AlexR Cape Town
    I have been reviewing this exact same thing this morning! Hence the post about .sl sitelists!

    I've also asked about this in the past, but I don't think it will become a feature until the lists aspect gets improved to work per project. That way you could re-import the failed or identified list on a per-project basis. Issue solved.
  • Actually... what you're requesting in that thread is not the same. You're trying to import and export site lists according to the Verified/Submitted/Identified folders. What I am explaining above is the ability to run through site lists on a per-project basis.

    Note: I just edited step #3. This, so far, is the only way I know of to resubmit URLs on a per-project basis and have them resubmitted almost immediately. If you're importing lists via Tools > Advanced, those lists are spread across all projects and URLs are chosen at random, at random intervals.

  • AlexR Cape Town
    @grafx77 - yes, I know it's not the same. I'd like a per-project option but wouldn't mind it starting with a global one. It seems a per-project basis might be a while off, if it comes at all. Not sure if this is even planned!
  • @Global - but you can already do this on a global basis. I am speaking only of doing it per project.
  • AlexR Cape Town
    You can't do it on a global basis yet, at least not efficiently. You'd have to take all the .txt files in your lists, merge them, pause SER, then re-import. That would take days when your lists are big. I've been running SER for ages, so the sitelists are massive. Just deduping them crashes my setup!
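    (As an illustration only: a rough sketch of that merge-and-dedupe step, assuming the site lists are plain .txt files collected into one folder; the folder path is a placeholder, not SER's actual site list location.)

        # Merge every .txt site list in one folder and write a deduplicated
        # copy. The folder path below is a placeholder.
        import glob

        seen = set()
        for path in glob.glob(r"C:\sitelists\*.txt"):
            with open(path, encoding="utf-8", errors="ignore") as f:
                for line in f:
                    url = line.strip()
                    if url:
                        seen.add(url)

        with open(r"C:\sitelists\merged_deduped.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(sorted(seen)))

    Everything is held in memory, so a very large set of lists will still strain RAM; the only gain is that the work happens outside SER itself.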
  • Ozz
    edited February 2013
    sorry if i misunderstood something out of context, but what do you mean by crashing? "not responding" isn't crashing, for example. deduping is a CPU-intensive process, so it's normal if your system doesn't respond while it runs.

    just stop all your projects, dedupe your urls and come back after half an hour or so. you can also dedupe all the site lists except "blog comments" first and do the "blog comments" list later.
  • AlexR Cape Town
    @Ozz - yes, in Task Manager SER becomes "Not Responding".

    The general blog sitelist file is 350 MB so far, so that might be causing the trouble. I always switch off projects when I dedupe.

    I just think we should also have an option for it to run in the background when the thread count is down and there are spare resources.
  • Ozz
    edited February 2013
    i'm not 100% sure about this, but i don't think it's a good idea to run this in the background because of the heavy CPU usage. every task that SER wants to do right now will slow down because of it.

    i have no problem at all with a scheduler for some daily tasks like deduping, link checking etc., where SER does most of these things one after another and combines tasks if it makes sense.
  • AlexR Cape Town
    I've just been thinking that the longer SER runs and the bigger these lists get, the more important it is to manage them.

    By managing them I mean:
    1) Dedupe.
    2) Remove broken links. (Maybe there is a quick check to see whether a header request works rather than scraping the whole page again; I read that idea somewhere here on the forum. A rough sketch of such a check follows below.)
    3) Remove entries where the platform has changed. (This must happen often, as many entries in the site lists are over a year old, and platforms change.)

    Not sure of the best process to do the above.
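    (As an illustration of point 2: a minimal sketch of such a header-only check, assuming a plain list of URLs; this is not something SER itself exposes.)

        # Send a HEAD request and treat anything that errors out or returns
        # a 4xx/5xx status as broken. Purely illustrative.
        import urllib.request

        def looks_alive(url, timeout=10):
            req = urllib.request.Request(url, method="HEAD")
            try:
                with urllib.request.urlopen(req, timeout=timeout) as resp:
                    return resp.status < 400
            except Exception:
                # Any network, HTTP or parsing error counts as broken here.
                return False

        print(looks_alive("http://example.com/"))

    Some servers refuse HEAD requests, so a stricter version would fall back to a normal GET before writing a URL off as broken.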