256 Kilobytes

[PHP] How to Add rel="nofollow", rel="nofollow ugc", rel="nofollow sponsored" and/or Other rel Attributes to Links

Articles in Server-Side | By August R. Garcia

Published Sun, 15 Sep 2019 23:37:47 -0700 | Last update Tue, 08 Oct 2019 07:00:49 -0700

Without writing some garbage RegEx


Until like five minutes ago, the standard way to attempt to disincentivize link spam on forums, article comments, and so on was to add rel="nofollow" to links. However, like five minutes ago, Google announced support for two new rel values, rel="sponsored" and rel="ugc", alongside the existing rel="nofollow":

  • rel="sponsored": Use the sponsored attribute to identify links on your site that were created as part of advertisements, sponsorships or other compensation agreements.
  • rel="ugc": UGC stands for User Generated Content, and the ugc attribute value is recommended for links within user generated content, such as comments and forum posts.
  • rel="nofollow": Use this attribute for cases where you want to link to a page but don’t want to imply any type of endorsement, including passing along ranking credit to another page.

Source: Evolving “nofollow” – new ways to identify the nature of links, Tuesday, September 10, 2019

There are also other rel values, like the XHTML Friends Network (XFN) attributes and a bunch of others, but basically no one uses those.

Commentary on Google's FAQ

If I use nofollow for ads or sponsored links, do I need to change those?
No. You can keep using nofollow as a method for flagging such links to avoid possible link scheme penalties. You don't need to change any existing markup. If you have systems that append this to new links, they can continue to do so. However, we recommend switching over to rel=”sponsored” if or when it is convenient.

If you decide to do this, use rel="nofollow sponsored", not rel="sponsored". There is basically no reason to not include "nofollow" as well, unless you just want to suck off Google, although Bing et al. will probably also support these values, or at least treat them as aliases for "nofollow", in the near future.
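As a rough illustration of the "always keep nofollow" advice, here is a minimal sketch of a helper that builds the rel value for a given kind of link. The function name rel_value_for() and the $kind labels are made up for this example; they are not part of the code later in this post.

```php
<?php
// Hypothetical helper: returns a rel value that always starts with "nofollow"
// and adds "sponsored" or "ugc" when the link kind calls for it.
function rel_value_for(string $kind): string {
    $extra = [
        'sponsored' => 'sponsored', // paid links, ads, affiliate links
        'ugc'       => 'ugc',       // comments, forum posts, and so on
    ];

    $tokens = ['nofollow'];
    if (isset($extra[$kind])) {
        $tokens[] = $extra[$kind];
    }
    return implode(' ', $tokens);
}

echo rel_value_for('sponsored'), "\n";
echo rel_value_for('ugc'), "\n";
```

Any kind that is not recognized falls back to plain "nofollow", which matches how crawlers that ignore the new values would treat the link anyway.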

Why should I bother using any of these new attributes?
Using the new attributes allows us to better process links for analysis of the web. That can include your own content, if people who link to you make use of these attributes.

IDK LOL. These attributes do make sense from the perspective of being able to communicate things to robots. However, it's unclear whether adding them would actually be helpful to you personally or if it's basically a pointless waste of time that only makes things more convenient for robots/Google.

Won’t changing to a “hint” approach encourage link spam in comments and UGC content?
Many sites that allow third-parties to contribute to content already deter link spam in a variety of ways, including moderation tools that can be integrated into many blogging platforms and human review. The link attributes of “ugc” and “nofollow” will continue to be a further deterrent. In most cases, the move to a hint model won’t change the nature of how we treat such links. We’ll generally treat them as we did with nofollow before and not consider them for ranking purposes. We will still continue to carefully assess how to use links within Search, just as we always have and as we’ve had to do for situations where no attributions were provided.

This entire question is irrelevant, since nofollow links are already used as a "hint" for ranking sites.

When do these attributes and changes go into effect?
All the link attributes, sponsored, ugc and nofollow, now work today as hints for us to incorporate for ranking purposes. For crawling and indexing purposes, nofollow will become a hint as of March 1, 2020. Those depending on nofollow solely to block a page from being indexed (which was never recommended) should use one of the much more robust mechanisms listed on our Learn how to block URLs from Google help page.

This question is also irrelevant for the same reason.

rel="ugc" and rel="sponsored", TL;DR

As of the time of this post, it's difficult to predict what the impact of these new rel values will be. Realistically rel="ugc" probably won't have any real impact, since it seems like they're just going to do the same thing they're already doing, but like you can flag it if you want or whatever. Shit that could/would use rel="sponsored" instead of rel="nofollow" is probably much harder for Google to identify; that rel value will probably be more impactful than rel="ugc" will be.

This code is fairly straightforward. The main thing that is notable here is that this code does not attempt to use a FUCKING REGEX to parse HTML. For some reason, basically every other solution to this issue on the Internet seems to use a regular expression or other hack to parse HTML. Instead, this code uses the PHP DOMDocument library to parse the HTML, since that is actually a solution that works correctly.
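To give a feel for why a real parser is less fragile than pattern matching, here is a small sketch that only reads href values out of some deliberately awkward markup: an uppercase tag split across lines, a ">" inside an attribute value, and mixed quote styles. The helper name list_hrefs() and the sample markup are assumptions made up for this illustration.

```php
<?php
// Hypothetical helper: returns the href value of every <a> element in an HTML string.
function list_hrefs(string $html): array {
    if (trim($html) === '') {
        return []; // loadHTML() refuses empty input
    }

    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

// Markup that a naive pattern would likely trip over
$messy = <<<'HTML'
<p>Click <A
  title='x > y' HREF="https://example.com/a">one</A> or
<a data-x="1" href='/b'>two</a>.</p>
HTML;

print_r(list_hrefs($messy));
```

The parser normalizes tag and attribute names, so the lookup does not need to care about case, attribute order, or which quotes were used.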

The PHP Code

<?php
// Replace the existing "rel" attribute (if it exists) with a "rel" attribute set to "nofollow"
// $xml_doc is unused here, but is kept so that existing calls to this function keep working
function add_rel_nofollow($xml_doc, $node){
        // setAttribute() overwrites an existing value or creates the attribute if it is missing
        $node->setAttribute("rel", "nofollow"); // Change this to "nofollow ugc" or "nofollow sponsored" if applicable
}

// Helper function
// If $string ends with $test, returns true; else returns false; 
// die( endsWith("it's nice", 'nice') ); 
function endswith($string, $test) {
    $strlen  = strlen($string);
    $testlen = strlen($test);
    if ($testlen > $strlen) return false;
    return substr_compare($string, $test, $strlen - $testlen, $testlen) === 0;
}

function is_internal_link($href) {
        if (!$href)                        return False; // If the href attribute is empty, then it is not an internal link
        // Root-relative link (and therefore on the current domain); "//host/path" is protocol-relative and is checked below
        if ( strpos($href, '/') === 0 && strpos($href, '//') !== 0 )    return True;
        if ( strpos($href, '#') === 0 )    return True;  // Check if an anchor link (and therefore on the current domain)

        // parse_url() returns false for seriously malformed URLs and omits 'host' for relative paths
        $parts = parse_url( $href );
        if ( !is_array($parts) || !isset($parts['host']) )    return False; // No host to compare, so treat the link as external

        $link_domain = strtolower($parts['host']);
        $site_domain = strtolower($_SERVER['HTTP_HOST']);

        if ($link_domain === $site_domain)                   return True;   // If link host is the same as the current domain
        // Subdomains (www., etcetera); the leading dot keeps "evil-example.com" from matching "example.com"
        if (endswith($link_domain, '.' . $site_domain) )     return True;
        return False;                                                       // If none of these conditions matched, then it is not an internal link
}
// $html_str  -- The string of HTML that should be parsed to add "rel='nofollow'" to any links  
function prevent_link_spam($html_str) { 
        if ($html_str == NULL) return $html_str;

        $xml_doc = new DOMDocument(); 
        libxml_use_internal_errors(true); 
        $xml_doc->loadHTML( $html_str , LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        

        // Loop backwards. A DOMNodeList is "live", so removing/replacing nodes while looping forwards through it can
        // skip elements; looping backwards avoids this.
        $nodeList = $xml_doc->getElementsByTagName('a');
        for($n=$nodeList->length-1;$n>=0;--$n)  {
                // Get the href='' value out of the HTML node  
                $href = $nodeList->item($n)->getAttribute("href");  

                // Ignore internal links 
                if ( !is_internal_link($href) ) {
                        add_rel_nofollow($xml_doc, $nodeList->item($n)); 
                } 
                // Add your own edge cases here if applicable (ex: if user is banned, if user is admin, etc)
        }       
                
        // Convert back to an HTML string from the sanitized output 
        return $xml_doc->saveHTML();
}                       
// Some test string of HTML 
$html = "               
<div>
<p>This is some text. Click <a href='https://www.example.com'>here</a>.</p>
<p>This is an internal link. Click <a href='https://www.256kilobytes.com'>here</a>.</p>
<p>This is a relative link. Click <a href='/prices.php'>here</a>.</p>
<p>This is an anchor link. Click <a href='#faq'>here</a>.</p>
</div>
";
$html = prevent_link_spam($html);
?>                      
                
<html>                  
<head>                  
        <title>Some Test Webpage</title>
        <style>code, div {width:50%;background-color:#eeeeeeee;display:block;margin-left:auto;margin-right:auto;border:1px solid grey;}</style> 
</head>
<body>  
        <h1>Some Webpage</h1>

        <h2>Raw HTML</h2>
        <code><?= htmlspecialchars($html) ; ?></code>

        <hr/>

        <h2>Rendered HTML</h2>
        <div><?= $html; ?></div>
</body>
</html>

Resulting HTML

<div>
   <p>This is some text. Click <a href="https://www.example.com" rel="nofollow">here</a>.</p>
   <p>This is an internal link. Click <a href="https://www.256kilobytes.com">here</a>.</p>
   <p>This is a relative link. Click <a href="/prices.php">here</a>.</p>
   <p>This is an anchor link. Click <a href="#faq">here</a>.</p>
</div>
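
One side effect of add_rel_nofollow() is that it throws away whatever rel value a link already had (for example "noopener"). If that matters for your site, a sketch like the following merges tokens instead of overwriting them. merge_rel() is a hypothetical helper, not part of the code above, and it only works on the attribute string itself.

```php
<?php
// Hypothetical helper: adds rel tokens to an existing rel value without
// dropping or duplicating the tokens that are already there.
function merge_rel(string $existing, array $add): string {
    // rel is a space-separated, case-insensitive list of tokens
    $tokens = preg_split('/\s+/', strtolower(trim($existing)), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($add as $token) {
        if (!in_array($token, $tokens, true)) {
            $tokens[] = $token;
        }
    }
    return implode(' ', $tokens);
}

echo merge_rel('noopener', ['nofollow', 'ugc']), "\n";
```

Inside add_rel_nofollow(), this could be used as $node->setAttribute("rel", merge_rel($node->getAttribute("rel"), ["nofollow", "ugc"])), since getAttribute() returns an empty string when the attribute is missing.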

Comment by August R. Garcia, posted 08 October, 2019 07:01 AM PDT:

Ahrefs now lists data related to the rel="ugc" and rel="sponsored" attribute values. See the newly added cover image on this post.
