Like others, I'm now working with the global site lists very often and frequently add new URLs to them. I've set my projects to use the global site list(s), but I really don't know, per project, how many unused URLs are left. So it would be nice to display, on a per-project basis, how many URLs are left to try to post to, like "274523 are ready to process..." (a rough sketch of such a counter is below). This would give a much better overview of how often the "global site lists queue" has to be refilled and how big the "buffer" of available sites is.
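Just to illustrate what I mean, here is a minimal sketch of such a counter, assuming the global site list and a per-project history are plain text files with one URL per line. The folder names are placeholders, not SER's real on-disk layout.

```python
# Minimal sketch of a "remaining URLs" counter. Folder names are
# placeholders; SER's actual storage layout may differ.
from pathlib import Path

def load_urls(folder: str) -> set:
    """Collect every URL from the plain-text .txt files in a folder."""
    urls = set()
    for f in Path(folder).glob("*.txt"):
        for line in f.read_text(encoding="utf-8", errors="ignore").splitlines():
            if line.strip():
                urls.add(line.strip().lower())
    return urls

site_list = load_urls("global_site_list")    # hypothetical global list folder
already_used = load_urls("project_history")  # hypothetical per-project log
remaining = site_list - already_used
print(f"{len(remaining)} are ready to process...")
```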
Because the global site lists are so important, I think maintaining them should have some priority. While working with this feature I noticed the following points where an improvement would be nice:
- Using all 3 types of site lists produces a huge amount of duplicates. I clean up my site lists regularly (last time about 2 days ago, after importing/identifying a newly scraped list), and today it removed another 670k duplicates. That's a lot! Wouldn't it be possible to auto-clean them from time to time, in a safe way? (A rough de-duplication sketch follows this list.)
- When I import via "identify sites", it would be useful to have an option to import only sites that aren't duplicates. This would give a much better idea of how many were really imported (see the second sketch below).
- An option to recheck the whole site list "database" would be great. These files grow a lot, and I think it would make sense to have the ability to recheck whether the listed sites are still alive (see the third sketch below).
- Because mostly only one post per domain is made, it would make sense for SER to save the PR of each page too and start with the page with the highest PR. To avoid looking up the PR inside SER (and prevent Google bans from the high number of requests), it would be useful to support an import format like [URL]|[PR]; the PR check could then be done with Scrapebox before importing (see the last sketch below).
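On the auto-clean point: a minimal sketch of a safe de-duplication pass, assuming one URL per line in plain .txt files (the folder name is a placeholder). Each file is rewritten to a temp file first and then atomically swapped in, so an interrupted run can't leave a truncated list behind.

```python
# Sketch of a safe periodic de-duplication pass over site-list files.
import os
from pathlib import Path

def dedupe_file(path: Path) -> int:
    """Keep the first occurrence of each URL; return how many were removed."""
    seen, kept, removed = set(), [], 0
    for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
        url = line.strip()
        if not url:
            continue
        if url.lower() in seen:
            removed += 1
        else:
            seen.add(url.lower())
            kept.append(url)
    tmp = path.with_suffix(".tmp")
    tmp.write_text("\n".join(kept) + "\n", encoding="utf-8")
    os.replace(tmp, path)  # atomic swap, so a crash can't corrupt the list
    return removed

for f in Path("sitelists").glob("*.txt"):  # placeholder folder
    print(f"{f.name}: removed {dedupe_file(f)} duplicates")
```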
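For the duplicate-aware import: a sketch that merges freshly identified URLs into an existing list and reports how many were genuinely new. Both file names are made up for the example.

```python
# Sketch: append only genuinely new URLs and report real import numbers.
from pathlib import Path

existing_file = Path("identified/articles.txt")  # hypothetical existing list
import_file = Path("freshly_identified.txt")     # hypothetical new import

existing = {u.strip().lower()
            for u in existing_file.read_text(encoding="utf-8", errors="ignore").splitlines()
            if u.strip()}

fresh, skipped = [], 0
for line in import_file.read_text(encoding="utf-8", errors="ignore").splitlines():
    url = line.strip()
    if not url:
        continue
    if url.lower() in existing:
        skipped += 1
    else:
        existing.add(url.lower())
        fresh.append(url)

with existing_file.open("a", encoding="utf-8") as f:
    f.writelines(u + "\n" for u in fresh)

print(f"imported {len(fresh)} new URLs, skipped {skipped} duplicates")
```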
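For the recheck option: a sketch of a liveness pass over a single list file, using a HEAD request with a short timeout as a cheap probe. The file name, timeout and thread count are arbitrary choices, and the live URLs are written to a separate file so nothing is lost if the run is interrupted.

```python
# Sketch: probe which listed sites still respond and save the live ones.
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def is_alive(url: str) -> bool:
    """Treat any response below HTTP 400 as 'still alive'."""
    try:
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status < 400
    except Exception:
        return False

src = Path("sitelists/articles.txt")  # placeholder file
urls = [u.strip()
        for u in src.read_text(encoding="utf-8", errors="ignore").splitlines()
        if u.strip()]

with ThreadPoolExecutor(max_workers=20) as pool:
    alive = [u for u, ok in zip(urls, pool.map(is_alive, urls)) if ok]

src.with_suffix(".alive.txt").write_text("\n".join(alive) + "\n", encoding="utf-8")
print(f"{len(alive)} of {len(urls)} URLs still respond")
```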
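And for the [URL]|[PR] import format: a sketch that reads such a file (e.g. a Scrapebox export after a PR check), keeps only the highest-PR page per domain, and sorts the result best-first. The file name is a placeholder.

```python
# Sketch: parse [URL]|[PR] lines, keep the best page per domain,
# and order the output by PR descending.
from pathlib import Path
from urllib.parse import urlparse

best = {}  # domain -> (pr, url)
for line in Path("scraped_with_pr.txt").read_text(encoding="utf-8", errors="ignore").splitlines():
    try:
        url, pr_text = line.rsplit("|", 1)
        pr = int(pr_text)
    except ValueError:
        continue  # skip malformed lines
    url = url.strip()
    domain = urlparse(url).netloc.lower()
    if domain and (domain not in best or pr > best[domain][0]):
        best[domain] = (pr, url)

# Highest-PR page per domain first, ready for importing
for pr, url in sorted(best.values(), reverse=True):
    print(f"{url}|{pr}")
```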
It would be interesting to hear what others think about these points. Perhaps there are better ideas?