ALGORITHMVILLE—Starting this week Google will begin using a new metric (or signal) in its search algorithm that takes into account "the number of valid [DMCA] copyright removal notices we receive for any given site." The news sounded promising at first—after all, who but pirates and freeloaders does not want to see sites with terabytes of stolen content penalized in their rankings—but the announcement was also vague enough to leave many observers unclear about the extent to which sites would be impacted, and whether all sites would be treated fairly.
The Electronic Frontier Foundation, for one, has expressed concern about DMCA notices that mistakenly (or not) target legal content. "In particular," EFF wrote on its blog last week, "we worry about the false positives problem. For example, we’ve seen the government wrongly target sites that actually have a right to post the allegedly infringing material in question or otherwise legally display content. In short, without details on how Google’s process works, we have no reason to believe they won’t make similar, over-inclusive mistakes, dropping lawful, relevant speech lower in its search results without recourse for the speakers [emphasis in original]." AVN has written previously about companies in the adult space that have been accused of filing a significant percentage of false DMCA notices.
Public Knowledge's John Bergmayer, a staff attorney, expressed similar concern, writing, "Because [DMCA] notices to search engines might not be challenged, entities with questionable copyright claims might be more willing to send such notices. And because being highly ranked on Google can be so important, there's a strong incentive for entities to send DMCA notices to search engines to suppress their rivals. There's a danger that, in good faith, Google is setting up a process that can be abused [emphasis in original]."
The MPAA, of course, is pleased with the move by Google, but is still taking a wait-and-see attitude. According to Michael O’Leary, senior executive vice president for global policy and external affairs, “We are optimistic that Google’s actions will help steer consumers to the myriad legitimate ways for them to access movies and TV shows online, and away from the rogue cyberlockers, peer-to-peer sites, and other outlaw enterprises that steal the hard work of creators across the globe. We will be watching this development closely—the devil is always in the details—and look forward to Google taking further steps to ensure that its services favor legitimate businesses and creators, not thieves.”
In the few days since the announcement, in fact, a few more details about the new signal have been revealed. Search Engine Land—which posted a rather comprehensive article on Friday about the change it has dubbed the “'Emanuel Update' in honor of Hollywood mogul Ari Emanuel, who helped prompt it," and then another one later in the day about how YouTube will escape the penalty—spoke with Google again yesterday.
In the latter article, Search Engine Land's Danny Sullivan quoted Google as having told him, "We’re treating YouTube like any other site in search rankings. That said, we don’t expect this change to demote results for popular user-generated content sites." Sullivan responded with incredulity to that comment, writing, "I just don’t see that. There’s no way to treat YouTube—or Blogger—like any other site in the search rankings, when those sites have special takedown forms that don’t allow their alleged infringing activity to [be] measured up against other sites."
Sunday, in his follow-up article titled "Google: Many Popular Sites Will Escape Pirate Penalty, Not Just YouTube," Sullivan reported on further communication he had with Google: "Google told me today that the new penalty will look beyond just the number of notices. It will also take into account other factors, specifics that Google won’t reveal, but with the end result that YouTube—as well as other popular sites beyond YouTube—aren’t expected to be hit.
"What other sites?" he continued. "Examples Google gave me include Facebook, IMDB, Tumblr and Twitter. But it’s not that there’s some type of 'whitelist' of sites. Rather, Google says the algorithm automatically assesses various factors or signals to decide if a site with a high number of copyright infringement notices against it should also face a penalty."
In the end, Sullivan thinks Google will employ a secret formula that will include a number of criteria that will themselves likely change with the times. "Without clarification from Google, we can only make assumptions on how this will work," he writes. "My guess is that Google will be looking at factors to somehow determine if a site seems legitimate. Does it have many reputable links to it? Can Google detect if there’s a lot of sharing of content from those sites? Are there factors that already give the site a good 'reputation' in Google’s algorithms for other types of searches?"
Wherever the truth lies, says Sullivan, the "pirate penalty" or "Emanuel Update" no longer appears to be "purely tied to number of notices acted upon," as suggested by Google's initial announcement. The question is whether anyone is truly surprised by that.
Google's original announcement on Friday is reprinted below in its entirety:
We aim to provide a great experience for our users and have developed over 200 signals to ensure our search algorithms deliver the best possible results. Starting next week, we will begin taking into account a new signal in our rankings: the number of valid copyright removal notices we receive for any given site. Sites with high numbers of removal notices may appear lower in our results. This ranking change should help users find legitimate, quality sources of content more easily—whether it’s a song previewed on NPR’s music website, a TV show on Hulu or new music streamed from Spotify.
Since we re-booted our copyright removals over two years ago, we’ve been given much more data by copyright owners about infringing content online. In fact, we’re now receiving and processing more copyright removal notices every day than we did in all of 2009—more than 4.3 million URLs in the last 30 days alone. We will now be using this data as a signal in our search rankings.
Only copyright holders know if something is authorized, and only courts can decide if a copyright has been infringed; Google cannot determine whether a particular webpage does or does not violate copyright law. So while this new signal will influence the ranking of some search results, we won’t be removing any pages from search results unless we receive a valid copyright removal notice from the rights owner. And we’ll continue to provide "counter-notice" tools so that those who believe their content has been wrongly removed can get it reinstated. We’ll also continue to be transparent about copyright removals.