
Additional filter for indexing services

I noticed yesterday that even though noindex tags are stored with the verified link information ('flag 0'), these URLs still get sent to indexing services. Sending a noindex page for indexing is a waste of resources, so could we get an additional checkbox in the indexing options to choose not to send them?

Comments

  • Sven www.GSA-Online.de
    Hmm, but "doFollow" has nothing to do with "NoIndex". Those are two different things. Some customers might still want a "noIndex" link to be sent to indexing services.
    However, I get your point and will add an option to not send links that are clearly set to "noIndex".
  • Yeah, I neglected to actually put another checkbox for noindex in that image like I thought I had :D
    An option either way would be great.
  • googlealchemist Anywhere I want
    edited May 2022
    Sven said:
    Hmm, but "doFollow" has nothing to do with "NoIndex". Those are two different things. Some customers might still want a "noIndex" link to be sent to indexing services.
    However, I get your point and will add an option to not send links that are clearly set to "noIndex".
    I'd also love that option so it doesn't send noindex links to indexer services... similar to the existing function of only sending T2 links to an indexable URL.
  • Sven www.GSA-Online.de
    In the next update you'll have this option added.
  • googlealchemist Anywhere I want
    Sven said:
    In the next update you'll have this option added.
    Awesome, thanks... I see it's set to on by default now, which makes sense.

    The 'days to index' setting in the main program's global options... is that the same function as the project-specific option 'send to index/ping delayed by'? And if so, will the project-specific option always override the global one?

  • Sven www.GSA-Online.de
    No, this is an option of the indexing service. Some let you define when they should start indexing.
  • googlealchemist Anywhere I want
    Sven said:
    No, this is an option of the indexing service. Some let you define when they should start indexing.
    The drip feed function, is that what you mean?
  • Sven www.GSA-Online.de
    Yes.
    Thanked by 1: googlealchemist
  • googlealchemist Anywhere I want
    Sven said:
    Hmm, but "doFollow" has nothing to do with "NoIndex". Those are two different things. Some customers might still want a "noIndex" link to be sent to indexing services.
    However, I get your point and will add an option to not send links that are clearly set to "noIndex".
    Hey, I don't remember if it was in another thread or in a DM, but I had asked if the software could detect a noindex tag before posting to a site to avoid wasting resources... and you said that wasn't practical, which I understand when I think about it.

    But could we somehow take the noindex detection already used on existing URLs before sending them to indexers or adding them to a tiered project, and use it to send those particular websites to a blacklist, so we avoid posting to those sites again in the future?
  • Sven www.GSA-Online.de
    Noindex via meta is easily detectable before posting. However, most noindex happens via robots.txt, and you really don't want SER to download that file every time it tries to post. That wastes a lot of traffic.
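To make the meta-level check concrete, here is a minimal Python sketch (not SER's actual code, which is not public): the page HTML is already downloaded during posting, so scanning it for a robots meta tag costs no extra request, unlike a robots.txt lookup.

```python
# Minimal sketch of a meta-robots noindex check on already-downloaded HTML.
# Illustrative only; not SER's implementation.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            # Tokenize so multi-value strings like "noindex, follow" count too.
            tokens = {t.strip() for t in (a.get("content") or "").lower().split(",")}
            if "noindex" in tokens or "none" in tokens:  # "none" = noindex,nofollow
                self.noindex = True

def has_meta_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex

print(has_meta_noindex('<meta name="robots" content="noindex, follow">'))  # True
```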
  • the_other_dude
    edited May 2022
    Browser Automation Studio could do the robots.txt checking and list filtering if you can't write your own scripts. BAS is free, and this would be trivial to accomplish.
  • Sven www.GSA-Online.de
    @the_other_dude don't get me wrong, parsing the robots.txt file is easy. I can do that as well, but the traffic and time waste it would generate is huge, as you would need to do this for every URL.
  • the_other_dude
    edited May 2022
    Right, I agree. I was just posting a free solution for the few who will inevitably come across this and believe they need to be checking robots.txt :)
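For anyone who wants the robots.txt side as a standalone script rather than BAS, a rough sketch under these assumptions: the URL list and user agent are placeholders, and robots.txt is fetched once per host, which keeps the traffic Sven mentions to one extra request per domain. Note this checks standard Disallow rules; a page a crawler is not allowed to fetch cannot have its content or links read.

```python
# Rough sketch: filter a verified-URL list against each host's robots.txt,
# caching the parsed file so every domain costs only one extra request.
from urllib import robotparser
from urllib.parse import urlparse

_cache = {}

def allowed(url, agent="Googlebot"):
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    rp = _cache.get(root)
    if rp is None:
        rp = robotparser.RobotFileParser(root + "/robots.txt")
        try:
            rp.read()            # the one extra request, per domain
        except OSError:
            rp.allow_all = True  # unreachable robots.txt: treat as allowed
        _cache[root] = rp
    return rp.can_fetch(agent, url)

# Placeholder URLs: keep only entries a crawler may actually fetch.
urls = ["https://example.com/profile/123", "https://example.com/article/abc"]
indexer_queue = [u for u in urls if allowed(u)]
```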
  • googlealchemist Anywhere I want
    Sven said:
    Noindex via meta is easily detectable before posting. However, most noindex happens via robots.txt, and you really don't want SER to download that file every time it tries to post. That wastes a lot of traffic.
    The new option in the indexing settings, 'skip noindex urls'... if it detects that for a particular created profile or article page and doesn't send that link to indexers because it has the noindex tag, can't that domain also be added at the same time to an internal blacklist or skip list, so the software never builds another link like that again and avoids wasting resources?
  • Sven www.GSA-Online.de
    This can only be detected after submission/verification unless you define it in the engine script.
    Thanked by 1: the_other_dude
  • googlealchemist Anywhere I want
    Sven said:
    This can only be detected after submission/verification unless you define it in the engine script.
    Yeah, that's what I'm saying too... when it detects it at that stage, add that domain to an internal blacklist so the software doesn't use it again to sign up/post to.
    Did this make sense the way I put it, and if so, is it possible to implement?
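Outside of SER, the requested workflow can be approximated as a post-verification pass. A hedged sketch, reusing has_meta_noindex from the earlier example; the file name and URL are made up: re-check each verified URL and collect the domains serving meta noindex into a skip list a project blacklist could import.

```python
# Hypothetical post-verification pass: collect domains whose verified
# pages carry meta noindex into a skip-list file. Illustrative only.
import urllib.request
from urllib.parse import urlparse

def noindex_domains(verified_urls):
    bad = set()
    for url in verified_urls:
        try:
            html = urllib.request.urlopen(url, timeout=15).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable page: leave it undecided
        if has_meta_noindex(html):  # helper from the sketch above
            bad.add(urlparse(url).netloc)
    return bad

# Made-up file name; import it as a blacklist/skip list where needed.
with open("noindex_domains.txt", "w") as f:
    f.write("\n".join(sorted(noindex_domains(["https://example.com/profile/1"]))))
```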
  • cherub
    Further to this, I'm noticing that pages with content="noindex, follow" in their meta robots tags are being marked as indexable, whereas they technically aren't. Could these be filtered too when sending to indexing services and reporting?
  • Sven www.GSA-Online.de
    @cherub SER checks whether a page uses <meta name=robots content=noindex> and also whether an <a ... rel="noindex"> is used.
    You probably mean this is not checked for GSA SEO Indexer sites?
  • cherub
    I mean that a page with <meta name="robots" content="noindex, follow" /> is being marked with an index flag of 1, which means it's being forwarded to indexers and reported as indexable, when it isn't.
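One plausible way such a bug can arise (an assumption, not a confirmed cause): comparing the raw content attribute against the exact string "noindex" misses multi-token values, while splitting into tokens catches them.

```python
# Illustration of the suspected failure mode, not SER's actual logic.
content = "noindex, follow"

naive = content.lower() == "noindex"                    # False -> wrongly flagged indexable
tokens = {t.strip() for t in content.lower().split(",")}
robust = "noindex" in tokens                            # True  -> correctly treated as noindex

print(naive, robust)  # False True
```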
  • Sven www.GSA-Online.de
    Can you send me a sample of that in a private message? SER should have detected that; it would be a bug then.
  • cherub
    Sure, sending in a minute.
  • googlealchemist Anywhere I want
    Sven said:
    Noindex via meta is easily detectable before posting. However, most noindex happens via robots.txt, and you really don't want SER to download that file every time it tries to post. That wastes a lot of traffic.
    I've only ever checked the source code of the specific URL with my content/link on it... I never really thought about sitewide robots.txt files until recently, especially after reading this.

    So my article or profile URL with my content/links on it could be set to index/follow, but it could still be unindexable because of what the main domain's robots.txt file says for those particular types of pages sitewide, or am I still misunderstanding?
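That is the situation the sketches above can test for: a page can pass the meta check yet sit under a robots.txt Disallow. Reusing has_meta_noindex and allowed from the earlier sketches (the URL is a placeholder):

```python
# Combine page-level and domain-level signals for one verified URL.
import urllib.request

url = "https://example.com/articles/my-article"  # placeholder
html = urllib.request.urlopen(url, timeout=15).read().decode("utf-8", "replace")

meta_ok = not has_meta_noindex(html)  # page-level meta robots
crawl_ok = allowed(url)               # domain-level robots.txt
print(meta_ok and crawl_ok)           # treat as indexable only if both hold
```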
  • googlealchemist Anywhere I want
    edited April 2023
    cherub said:
    Further to this, I'm noticing that pages with content="noindex, follow" in their meta robots tags are being marked as indexable, whereas they technically aren't. Could these be filtered too when sending to indexing services and reporting?
    I'm wondering if these would still be worth sending to a basic, cheap crawling service just to make sure Google sees them once, since they are at least set to follow, versus wasting time/money on more expensive full-blown indexer services.

    This ties into the other thread about wanting a function to allocate different indexing services per project/link type rather than just globally.