I really like the idea of a global site list (3 big "pots" of URLs which are used for scraped URLs and shared between the different projects). What I think is suboptimal is the use of the available resources. If there are unprocessed URLs, wouldn't it be possible to use the idle threads to get them done?
I know I work at the upper limits, but in my case 800 threads work nicely (taking about 80 Mbit/s) if I force SER to use them (for identifying sites and so on). But in normal operation there are rarely more than 10 threads in use (and very little bandwidth). In my eyes this is a waste of resources. Why not use the free threads to handle the global list alongside the "normal" searching/posting?
I don't see any disadvantages to this strategy, and it would be a HUGE performance boost for people who really use their global site list and feed SER with a lot of nice URLs.
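
To make the idea a bit more concrete, here is a rough sketch in Python of what I mean by using the free threads. It is only an illustration and not how SER works internally (I don't know its scheduler). The function name `fill_idle_slots` and the example URLs are made up, and the numbers are just the ones from my setup above.

```python
from collections import deque

def fill_idle_slots(max_threads, busy_threads, backlog):
    """Take up to (max_threads - busy_threads) URLs off the backlog.

    Hypothetical helper: 'backlog' stands in for the unprocessed
    global site list, and the returned batch is what the idle
    threads would work on next.
    """
    idle = max(0, max_threads - busy_threads)  # never negative
    batch = []
    while idle > 0 and backlog:
        batch.append(backlog.popleft())
        idle -= 1
    return batch

# Assumed example: 800 threads configured, 10 busy with normal
# searching/posting, 1000 unprocessed URLs in the global list.
backlog = deque(f"http://example.com/page{i}" for i in range(1000))
batch = fill_idle_slots(800, 10, backlog)
print(len(batch), len(backlog))  # 790 210
```

The normal searching/posting keeps priority here, because only the threads that are not busy get filled from the global list.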