How to quickly verify a huge identified list (no duplicate URLs, no duplicate domains)?
I am done with scraping, identifying, removing unknowns and de-duping a huge list. Here are the stats of the list I ended up with. Now I want to remove the dead sites: 404s or sites returning any other error. Is there a quick way to do this?
P.S. Sorry if this is a dumb question, I am new to this. Thanks
-------------------------------
Category - Article............: 2653092
Category - Blog Comment.......: 2454838
Category - Directory..........: 46435
Category - Document Sharing...: 1437
Category - Exploit............: 14602
Category - Forum..............: 203295
Category - Guestbook..........: 113352
Category - Image Comment......: 27150
Category - Indexer............: 17
Category - Microblog..........: 32714
Category - Pingback...........: 1546006
Category - Referrer...........: 1264
Category - RSS................: 402
Category - Social Bookmark....: 57015
Category - Social Network.....: 94746
Category - Trackback..........: 262263
Category - URL Shortener......: 165374
Category - Video..............: 48539
Category - Web 2.0............: 19987
Category - Wiki...............: 89975
-------------------------------
Total.........................: 7832503
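For anyone who wants to script the dead-site check asked about above instead of using a tool, here is a minimal sketch in Python using only the standard library. It is not what any particular addon does. It assumes the list is a plain text file with one URL per line. It counts 2xx/3xx responses (after following redirects) as alive and treats 4xx, 5xx, timeouts and DNS/connection failures as dead. Those thresholds, the 10 second timeout, the worker count and the batch size are all assumptions you should tune. It sends a HEAD request first and retries with GET only when the server rejects HEAD. It does not use proxies. URLs are processed in batches so that 7.8 million entries are never queued in memory at once.

```python
import http.client
import itertools
import sys
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Optional

Fetcher = Callable[[str], Optional[int]]


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for url (redirects are followed), or None if unreachable."""
    for method in ("HEAD", "GET"):
        try:
            req = urllib.request.Request(
                url, method=method, headers={"User-Agent": "Mozilla/5.0"})
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # Some servers refuse HEAD (405/501); retry once with GET.
            if method == "HEAD" and err.code in (405, 501):
                continue
            return err.code
        except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
            # DNS failure, refused connection, timeout, TLS error, malformed URL, ...
            return None
    return None


def is_alive(status: Optional[int]) -> bool:
    """Assumed rule: 2xx/3xx is alive; 4xx, 5xx and network failures are dead."""
    return status is not None and 200 <= status < 400


def iter_alive(lines: Iterable[str], fetch: Fetcher = fetch_status,
               workers: int = 50, batch_size: int = 10_000) -> Iterator[str]:
    """Yield the URLs from lines that look alive, checking one batch at a time."""
    stripped = (line.strip() for line in lines)
    urls = (u for u in stripped if u)  # skip blank lines
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch: List[str] = list(itertools.islice(urls, batch_size))
            if not batch:
                break
            # pool.map returns results in input order, so zip pairs each URL with its status.
            for url, status in zip(batch, pool.map(fetch, batch)):
                if is_alive(status):
                    yield url


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_alive.py urls.txt > alive.txt
    with open(sys.argv[1], encoding="utf-8", errors="ignore") as src:
        for alive_url in iter_alive(src):
            print(alive_url)
```

The `fetch` parameter is there so the filtering logic can be tried without touching the network, for example by passing a dictionary's `get` method that maps URLs to fake status codes. Even with 50 threads, a list this size will take a long time, and some hosts will rate-limit or block a single IP, which is usually where proxies come in.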
Comments
@Seljo Are proxies required for that addon or not?