PHP Code to Remove Duplicate URLs from One List Compared to Another List
boiler
Indonesia
At first I was looking for a way to remove identical URLs from 2 different URL lists and couldn't find one, so I decided to write it in PHP. The script removes a URL from List 1 if its domain is found in the file with the same filename in List 2.
With this code you can make sure there are no duplicated URLs across your URL lists if you need to run multiple URL lists in your GSA SER.
To make this code work, you first need to identify the raw URLs using GSA Platform Identifier or GSA SER, then put them into 2 different folders and set up the paths (`$files` for List 1 and `$comparepath` for List 2).
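The core check can be sketched as a standalone function that works on two in-memory arrays instead of files. This is a minimal sketch rather than the script itself: the name `removeDuplicateHosts` is hypothetical, and it assumes an exact, case-insensitive host match (so `www.abc.com` and `abc.com` count as different hosts), whereas the script uses a word-boundary regex. Building a set of List 2 hosts once also avoids scanning all of List 2 for every URL in List 1.

```php
<?php
// Hypothetical helper: returns the URLs from $listA whose host does not
// appear in $listB. Hosts are compared exactly, ignoring case (an assumption).
function removeDuplicateHosts(array $listA, array $listB): array
{
    // Build a set of List 2 hosts once, so each lookup is a hash check.
    $hostsB = [];
    foreach ($listB as $url) {
        $host = parse_url(trim($url), PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $hostsB[strtolower($host)] = true;
        }
    }

    $kept = [];
    foreach ($listA as $url) {
        $url = trim($url);
        $host = parse_url($url, PHP_URL_HOST);
        // Skip lines that have no parsable host.
        if (!is_string($host) || $host === '') {
            continue;
        }
        // Keep the URL only if its host is not in List 2.
        if (!isset($hostsB[strtolower($host)])) {
            $kept[] = $url;
        }
    }
    return $kept;
}

$list1 = ['http://abc.com/page1', 'http://xyz.org/post', 'not a url'];
$list2 = ['https://ABC.com/other'];
print_r(removeDuplicateHosts($list1, $list2));
```

With these sample lists the result contains only `http://xyz.org/post`: `abc.com` is found in List 2 (case is ignored) and `not a url` has no host.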
Feel free to modify or improve this code.
```php
<?php
// List 1: the files to clean (duplicates are removed from these files).
$files = glob('D:\GSA\List\GlobalList\*.txt') ?: [];
// List 2: the folder to compare against; files are paired by filename.
$comparepath = 'D:\GSA\List\IndonesiaList';

foreach ($files as $value) {
    clearstatcache();
    $i = 0;
    $cleanurl = [];   // reset for every file so URLs don't leak between files
    $file_path = pathinfo($value);
    $comparefile = $comparepath . DIRECTORY_SEPARATOR . $file_path['basename'];

    if (file_exists($comparefile)) {
        echo "Removing duplicates from " . $value . " and " . $comparefile . PHP_EOL;
        // Read lines without trailing newlines and skip blank lines.
        $contenta = file($value, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        $contentb = file($comparefile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

        if (!empty($contenta) && !empty($contentb)) {
            foreach ($contenta as $cvalue) {
                $cvalue = trim($cvalue);
                $grhost = parse_url($cvalue, PHP_URL_HOST);
                // Lines without a parsable host are not written back.
                if (!empty($grhost)) {
                    // preg_quote() escapes the dots in the host so they match literally.
                    $pattern = '/\b' . preg_quote($grhost, '/') . '\b/i';
                    if (empty(preg_grep($pattern, $contentb))) {
                        $cleanurl[] = $cvalue;   // host not in List 2: keep it
                    } else {
                        $i++;                    // host found in List 2: drop it
                    }
                }
            }
            // Overwrite the List 1 file with the remaining URLs.
            $cleanarr = $cleanurl ? implode(PHP_EOL, $cleanurl) . PHP_EOL : '';
            file_put_contents($value, $cleanarr, LOCK_EX);
        }
    }
    echo "Found " . $i . " duplicates in " . $value . PHP_EOL;
}
?>
```
Comments
The code I wrote compares the contents of each pair of files and removes a URL from the first list if its domain is found in the second list. Since I'm using 3 lists in my GSA SER, this code makes sure that no domain appears in more than one list.
For example, the domain abc.com will only be found in List A, not in List B or List C.
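The same idea can be extended to three or more lists in one pass. This is a minimal sketch under stated assumptions: the lists are processed in the order given, a domain is kept only in the first list where it appears, hosts are compared exactly (case-insensitive), and the function name `dedupeAcrossLists` and the list keys `A`, `B`, `C` are hypothetical.

```php
<?php
// Hypothetical helper: given several URL lists in priority order, keeps each
// host only in the first list where it appears and drops it from later lists.
function dedupeAcrossLists(array $lists): array
{
    $seen = [];      // hosts already claimed by an earlier list
    $result = [];
    foreach ($lists as $name => $urls) {
        $result[$name] = [];
        $claimed = [];   // hosts first seen in this list
        foreach ($urls as $url) {
            $url = trim($url);
            $host = parse_url($url, PHP_URL_HOST);
            if (!is_string($host) || $host === '') {
                continue;   // no parsable host
            }
            $host = strtolower($host);
            if (isset($seen[$host])) {
                continue;   // host belongs to an earlier list
            }
            $claimed[$host] = true;
            $result[$name][] = $url;
        }
        // Mark this list's hosts only after the whole list is processed,
        // so several URLs on the same host can stay together in one list.
        $seen += $claimed;
    }
    return $result;
}

$lists = [
    'A' => ['http://abc.com/1', 'http://abc.com/2'],
    'B' => ['http://abc.com/3', 'http://def.com/1'],
    'C' => ['http://def.com/2', 'http://ghi.com/1'],
];
print_r(dedupeAcrossLists($lists));
```

Here both `abc.com` URLs stay in List A, List B keeps only `http://def.com/1`, and List C keeps only `http://ghi.com/1`.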