DO NOT use the higher metrics of the www. and non-www. versions of a URL

Hi @Sven
I am using the built-in "TF+unique domain" script in the PR emulator, and I get a lot of unwanted domains because the www. version of the domain meets the criteria while the root domain doesn't. For instance:
Clearly the metrics of the root domain matter to me, because the URL is on the root domain. Even if I use "PR of subdomain" in SER, the PR emulator should return the PR of the ROOT domain; that would help me filter spammy sites.
Please fix this; otherwise a lot of spammy sites CANNOT be filtered...

BTW, when will it be possible to use custom scripts in the PR emulator? I am really looking forward to this feature.
Thanks in advance.
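A minimal Python sketch of the filtering rule asked for here, as I read it: reject a target if the root domain's metrics look spammy, however clean its www. host looks. The `Metrics` type, the `looks_spammy` heuristic and both thresholds are hypothetical illustrations, not anything GSA SER or its PR emulator exposes. The sample numbers are the ones quoted in the comments (root TF/CF 5/17 with 655 referring domains, www. TF/CF 21/27 with 7).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metrics:
    trust_flow: int
    citation_flow: int
    ref_domains: int


# Hypothetical thresholds, for illustration only; tune them against your own data.
TF_CEILING = 10          # TrustFlow below this counts as weak...
REF_DOMAINS_FLOOR = 100  # ...when the site also has at least this many referring domains


def looks_spammy(m: Metrics) -> bool:
    """Weak TrustFlow combined with many referring domains suggests an over-spammed link profile."""
    return m.trust_flow < TF_CEILING and m.ref_domains >= REF_DOMAINS_FLOOR


def keep_target(root: Metrics, www: Optional[Metrics]) -> bool:
    """Drop the target if the root domain looks spammy, however clean its www. host looks."""
    if looks_spammy(root):
        return False
    # www is None when the site has no www. host at all
    return www is None or not looks_spammy(www)


root = Metrics(trust_flow=5, citation_flow=17, ref_domains=655)
www = Metrics(trust_flow=21, citation_flow=27, ref_domains=7)
print(looks_spammy(www))       # False: the www. host alone would pass
print(keep_target(root, www))  # False: the root domain's profile gets the site filtered
```

Checking the root domain first means a clean-looking www. host can never rescue a spammed root, which is the behaviour requested above.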


Comments

  • Sven (www.GSA-Online.de)
    Sorry, but I will not add this, as I think a www. version with better factors should always be used over a non-www version. It might be different in your situation, but I don't want to break this behaviour, as it seems important to most of us.
  • No, please, Sven, you are NOT right on this. The example above is an obviously over-spammed site: the root domain has a TF/CF of 5/17 and 655 referring domains, but its www. subdomain has a TF/CF of 21/27 and only 7 referring domains, so it looks like a pretty clean site, which it is NOT.
    There are a lot of similar websites out there...
    I need to filter them; otherwise it's a waste of resources to build links that can't even be indexed.
    Please reconsider this.
  • Sven (www.GSA-Online.de)
    Well, maybe I can do the following: if more links point to the www. version than to the non-www version, I will take the www. version.
  • I think you misunderstand what I am trying to do: I am trying to FILTER OUT spammy sites. TF/CF is only HALF of the factors that decide whether a target is a "clean" site; the number of referring domains matters more, because the larger that number is, the more likely the site is over-spammed.
    If the root domain's metrics show it is over-spammed, it doesn't matter whether the www. subdomain looks clean; the site should be filtered anyway.
    Ideally, the PR emulator should support custom scripts just like PRJacker does.
  • Sven (www.GSA-Online.de)
    Well, I do get you! When I take whichever of the www. or non-www versions has more URLs built to it, I would take the rest of the values from that version as well. That should fix your issues altogether.
  • It doesn't fully resolve the issue, but it is close to what I need. I just don't understand why you force the metrics of the www. subdomain when the URL uses the non-www version. It doesn't make any sense: imagine a site that has NO www. subdomain at all...
     
  • Sven (www.GSA-Online.de)
    Well, people don't always know what they are doing the way you might. They import some URL, or just a domain without www., and wonder why the metrics are bad. This would resolve things for you and others. The next version behaves like that now.
  • Thanks for that.
    When can I expect custom scripts to be supported in the PR emulator?

  • Sven (www.GSA-Online.de)
    Accepted Answer
    Well, not in the next version ;)
  • Waiting for the new version; I am suffering from a ton of spammy sites now...
  • "well I can maybe do the following: if the links placed to www. are higher than the once for none www, I will take that www. version."

    This is a welcome change.
  • Sven (www.GSA-Online.de)
    It's already included in the latest update.
  • I always had an issue with this, as 90% of sites don't redirect their non-www version to their www. version, or vice versa. That means when scraping millions of sites you shouldn't de-dupe, as one of these 'two' versions may have higher metrics than its counterpart, and de-duping can lose you 1000s of targets.

    This is a really cool update, Sven; thanks for sorting it out.
  • @Sven, the issue still persists in the latest version, v1.15. For instance:
    ---------
    Google-PR Emulator for http://www.acc.ac.th/ => EmuPR-0 (TrustFlow=13; CitationFlow=39; RefDomains=1815)
    ----------
    The TF/CF returned by the PR emulator is for the root domain "acc.ac.th", while the URL uses the www. version. The correct TF/CF for "www.acc.ac.th" is 33/56, and the PR should be "3".

    Please resolve this.
  • Sven (www.GSA-Online.de)
    Well, how many RefDomains does the www. version have vs. the non-www version?
  • RefDomains for www. version: 1778
    Refdomains for non-www version: 1815
  • Sven (www.GSA-Online.de)
    So the non-www version is used, as it has more URLs.
  • @Sven, the logic is buggy: with the current script it will miss a lot of decent sites like the sample above.
    I beg you to correct this just for the "TF+unique domain" script and keep the same logic for the other scripts, so that this won't affect other users.
  • Sven (www.GSA-Online.de)
    Come on, your logic is not correct then. One time you want the version with fewer links built to it, and then you want the version with more.

    I would always take the version that more links are built to, because that seems to be the one that G. also sees and uses.
  • Google treats the www and non-www versions as two different domains; in other words, a URL on the www. domain such as "http://www.acc.ac.th/xxxx" is NOT the same as "http://acc.ac.th/xxxx", assuming no 301 redirect is set up. So it doesn't make any sense to retrieve the PR of "acc.ac.th" when the URL is "http://www.acc.ac.th/xxxx".
  • Sven (www.GSA-Online.de)
    So what is this discussion about, then? You don't want any of the things I suggested, even after I added them and you had previously agreed to them. Sorry, it is what it is now and I will not change it. It seems to be the best solution for me and most others.
  • @sashilover, why not send a bulk email to the webmasters asking them to redirect the non-www version to the www. domain? Export your list of targets, email them in bulk with Scrapebox using the contact-form option, and see if they change it; then you won't have this problem. @Sven, you're right. I'd take the version with the most referring domains as the target; even if the smaller number came from better domains, in most cases more is better.
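For comparison, here is a rough Python sketch of the two selection rules debated in this thread: Sven's rule (take whichever of the www. and non-www versions has more referring domains, and use all of its values) and the exact-host lookup argued for by the original poster. The `Metrics` type, both function names and the `BY_HOST` dictionary are hypothetical stand-ins; this is not the PR emulator's actual code. The figures are the ones reported above for acc.ac.th.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Metrics:
    trust_flow: int
    citation_flow: int
    ref_domains: int


def pick_by_ref_domains(host: str, by_host: Dict[str, Metrics]) -> Optional[Metrics]:
    """Sven's rule: use every value from whichever version (www. or non-www) has more RefDomains.

    Ties go to the www. version here (max() keeps the first maximum);
    the thread does not say how ties are handled.
    """
    bare = host[4:] if host.startswith("www.") else host
    candidates = [m for m in (by_host.get("www." + bare), by_host.get(bare)) if m is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.ref_domains)


def metrics_for_exact_host(url: str, by_host: Dict[str, Metrics]) -> Optional[Metrics]:
    """Exact-host rule: use the metrics of the host that actually appears in the URL."""
    host = urlsplit(url).hostname  # lower-cased, without port; None if the URL has no host
    return by_host.get(host) if host else None


# Figures reported in this thread for acc.ac.th
BY_HOST = {
    "www.acc.ac.th": Metrics(trust_flow=33, citation_flow=56, ref_domains=1778),
    "acc.ac.th": Metrics(trust_flow=13, citation_flow=39, ref_domains=1815),
}

url = "http://www.acc.ac.th/"
print(pick_by_ref_domains(urlsplit(url).hostname, BY_HOST))  # non-www wins: 1815 > 1778
print(metrics_for_exact_host(url, BY_HOST))                  # the www. host's own metrics
```

With these numbers the first rule returns the non-www metrics (TF 13, CF 39), matching the v1.15 output quoted above, while the exact-host rule returns TF 33 / CF 56. Neither sketch follows 301 redirects, which the thread notes would change which version counts.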