<?php
// Compare every list in GlobalList with the file of the same name in
// IndonesiaList, and remove from the GlobalList file every URL whose host
// also appears in the IndonesiaList file.
$files = glob('D:\GSA\List\GlobalList\*.txt');
$comparepath = 'D:\GSA\List\IndonesiaList';

foreach ($files as $value)
{
    $i = 0;              // duplicates found in this file
    $cleanurl = array(); // reset per file, otherwise URLs from earlier files leak in
    $comparefile = $comparepath . '/' . basename($value);

    if (file_exists($comparefile))
    {
        echo "Removing duplicates from " . $value . " (checked against " . $comparefile . ")\n";

        // Drop the trailing "\n" or "\r\n" from each line and skip blank lines,
        // so the host is parsed from the bare URL and PHP_EOL below does not
        // add a second line break.
        $contenta = file($value, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        $contentb = file($comparefile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

        if (!empty($contenta) && !empty($contentb))
        {
            foreach ($contenta as $cvalue)
            {
                $grhost = parse_url(trim($cvalue), PHP_URL_HOST);
                if (empty($grhost))
                {
                    continue; // no host: the line is not kept
                }

                // preg_quote() escapes the dots, so "abc.com" no longer matches "abcxcom".
                $pattern = '/\b' . preg_quote($grhost, '/') . '\b/i';
                if (empty(preg_grep($pattern, $contentb)))
                {
                    $cleanurl[] = $cvalue . PHP_EOL; // host not in the compare list: keep
                }
                else
                {
                    $i++; // host also in the compare list: remove
                }
            }

            // Overwrite the GlobalList file with the URLs that were kept.
            file_put_contents($value, implode('', $cleanurl));
        }

        echo "Found " . $i . " duplicates in " . $value . "\n";
    }
}
?>
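The script above runs preg_grep() over the whole second list once for every URL in the first list, which gets slow on big GSA lists. A minimal alternative sketch, assuming that "duplicate" means the two URLs have exactly the same host (compared case-insensitively), is to collect the hosts of the second list as array keys and test membership with isset(). This is a swapped-in technique, not the one in the post: unlike the \b regex, it does not treat sub.abc.com as a duplicate of abc.com. The function names host_of() and remove_shared_hosts() are hypothetical.

```php
<?php
// Sketch (hypothetical helpers): exact-host duplicate removal with a hash set.
// Assumption: two URLs are duplicates when their hosts are equal,
// ignoring case. sub.abc.com and abc.com are treated as different hosts.

/** Lower-cased host of a URL line, or null if no host can be parsed. */
function host_of(string $line): ?string
{
    $host = parse_url(trim($line), PHP_URL_HOST);
    return (is_string($host) && $host !== '') ? strtolower($host) : null;
}

/**
 * Returns [kept lines of $listA, number of lines removed].
 * Lines of $listA whose host also occurs in $listB are removed;
 * lines without a host are not kept either.
 */
function remove_shared_hosts(array $listA, array $listB): array
{
    // Build the set of hosts in list B once: host => true.
    $hostsB = [];
    foreach ($listB as $line) {
        $host = host_of($line);
        if ($host !== null) {
            $hostsB[$host] = true;
        }
    }

    $kept = [];
    $removed = 0;
    foreach ($listA as $line) {
        $host = host_of($line);
        if ($host === null) {
            continue; // skip lines that are not URLs
        }
        if (isset($hostsB[$host])) {
            $removed++;      // constant-time lookup instead of a regex scan
        } else {
            $kept[] = $line;
        }
    }
    return [$kept, $removed];
}

$a = ['http://abc.com/page1', 'http://xyz.net/', 'http://sub.abc.com/x'];
$b = ['https://ABC.com/other'];
[$kept, $removed] = remove_shared_hosts($a, $b);
print_r($kept);        // xyz.net and sub.abc.com survive
echo $removed, "\n";   // 1
```

To use it in the script, pass the two arrays returned by file() with FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES, then write the kept lines back joined with PHP_EOL.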
The code I wrote compares each list with the file of the same name in the second folder and removes a URL from the first list if its domain is also found in the second list. Since I'm using 3 lists in GSA, this lets me make sure that no domain appears in more than one list.
For example, the domain abc.com will only be found in List A, never in List B or List C.
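The script above works on one pair of folders at a time and drops the shared host from the first list of each pair. For three lists, a minimal sketch, assuming you fix a priority order up front (here A, then B, then C) and that a host stays in the first list that contains it, is to record an "owner" list for each host in a single pass. The function name dedupe_across_lists() and the priority rule are assumptions for illustration, not part of the original post, and hosts are compared exactly (case-insensitively) rather than with the \b regex.

```php
<?php
// Sketch (hypothetical helper): make every host appear in at most one list.
// Assumption: $lists is in priority order; the first list that contains a
// host keeps it, and later lists lose their copies.

/**
 * $lists maps a list name to its URL lines.
 * Returns the same map with cross-list duplicate hosts removed.
 */
function dedupe_across_lists(array $lists): array
{
    $owner = [];  // host => name of the list that keeps it
    $result = [];
    foreach ($lists as $name => $lines) {
        $result[$name] = [];
        foreach ($lines as $line) {
            $host = parse_url(trim($line), PHP_URL_HOST);
            if (!is_string($host) || $host === '') {
                continue; // skip lines without a host
            }
            $host = strtolower($host);
            if (!isset($owner[$host])) {
                $owner[$host] = $name; // first list to see this host claims it
            }
            if ($owner[$host] === $name) {
                // Keep the line; repeats inside the owning list are kept too.
                $result[$name][] = $line;
            }
        }
    }
    return $result;
}

$lists = [
    'A' => ['http://abc.com/1', 'http://one.org/'],
    'B' => ['http://abc.com/2', 'http://two.org/'],
    'C' => ['https://ABC.COM/3', 'http://one.org/x', 'http://three.org/'],
];
$clean = dedupe_across_lists($lists);
print_r($clean); // abc.com and one.org remain only in list A
```

Doing it in one pass over all lists avoids running the pairwise script several times and makes the "which list keeps the domain" rule explicit.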