Exposed: ‘Googlejacking’

If you’ve noticed your Website traffic slipping despite your best efforts to maintain a good page ranking in Google, you may have become the victim of an easily accomplished page-redirect scheme that seems to be gaining in popularity.

Apparently in active use maliciously since at least August 2003, an exploit known commonly as Google 302 Pagejacking or simply Googlejacking allows the careless and the ethically challenged to supplant competitors’ legitimate Google search results with imposter pages. According to a poster known as “japanese” on the mainstream Webmaster forum WebmasterWorld, “All pagerank [sic] sites of 5 or below are susceptible. If your site is 3 or 4, then be very alarmed.”

Google and MSN Search appear to be particularly vulnerable to the exploit, although other search engines are subject to it as well. That’s because the exploit turns on the way core search engine technologies react to PHP and other redirect scripts that return the 302 redirect code. Unlike the more properly used 301 redirect code, which means a page has been moved permanently, the 302 code indicates a temporary relocation. The way search engine spiders react to a 302, combined with the way they share information with one another, can lead them to treat a legitimate page reached through a 302 link as a duplicate of, or a supplement to, the page that bears the link. In a worst-case scenario that has resulted in disaster for at least one Adult industry Website, the “imposter” page can literally assume the identity of the target.
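The difference between the two codes comes down to a single status line in the HTTP response a spider receives. The sketch below is illustrative only (the domain is a placeholder, and the function is not drawn from any real crawler):

```python
# Minimal sketch of what a spider receives from the two redirect
# codes; example.com is a placeholder domain.

def redirect_response(location, permanent):
    """Build the status line and headers of an HTTP redirect."""
    status = "301 Moved Permanently" if permanent else "302 Found"
    return f"HTTP/1.1 {status}\r\nLocation: {location}\r\n\r\n"

# A 301 tells the spider the old URL is gone for good: index the
# destination page under its own address.
permanent_move = redirect_response("http://www.example.com/", True)

# A 302 says the move is only temporary. A spider may keep the *old*
# URL in its index while attaching the *new* URL's content to it --
# the behavior the hijack exploits.
temporary_move = redirect_response("http://www.example.com/", False)
```

Because a 302 promises the spider that the original URL will be back, the redirect-script URL, not the destination page, can end up owning the destination’s content in the index.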

Daze Reader’s Sex News Blog became a casualty of the exploit in early March, according to its Webmaster. “Over four-plus years, Daze Reader had built up a solid ranking and steady stream of traffic from Google,” he posted on March 4. “Then, on December 17 … my daily Google traffic dropped by more than 90 percent. At first I just waited for a correction, but nothing changed…. My best guess is that Daze Reader has fallen victim to [a] scam involving bogus 302 redirects…. My Google referrals for March 1-2 were approximately 1 percent of my average level of last November.

“Figure I might as well make it official: Daze Reader is on sabbatical.”

It’s unknown how widely the exploit is employed within and to target the Adult industry. Neither noted search engine expert Scott “Traffic Dude” Rabinowitz nor Adult industry technology consultant Brandon Shalton was aware of any specific industry insiders who have been plagued by mysteriously disappearing page rankings. Sex.com owner Gary Kremen noted, “There’s no doubt in my mind it’s happening. I know it’s going on. I’ve heard people talking about it, but they’re keeping mum about it because they’re making money from it.”

Perhaps the worst news for Webmasters is that there isn’t much that can be done about the page hijacking, except to convince the search engines to change the way their spiders behave. Yahoo! did just that last October, after admitting publicly that the problem existed. According to a post at SearchEngineWatch by a Yahoo! representative who goes by the online handle TheOtherTim, “We recently revamped the way we handle redirects in Yahoo! search. We have documented the behaviors of 301s, 302s, and meta refresh redirects…. The presentation is available for download on the Yahoo! Search Blog and is called ‘Search Engines and Webmasters.’”

It should be noted that at least one search engine expert believes the majority of page hijackings accomplished through 302 redirection of search engine spiders are unintentional. Claus Schmidt, founder of the Internet consulting firm clsc.net, wrote in his analysis of the situation, “This is a flaw on the technical side of the search engines. Some Webmasters do of course exploit this flaw, but almost all cases I've seen are not a deliberate attempt at hijacking. The hijacker and the target are equally innocent as this is something that happens internally in the search engines, and in almost all cases the hijacker does not even know that s/he is hijacking another page.

“It is important to stress that this is a search engine flaw,” Schmidt continued. “It affects innocent and unknowing Webmasters as these Webmasters go about doing their normal routines, maintaining their pages and links as usual. It is not so that you have to take steps that are in any way outside of the ‘normal’ or ‘default’ in order to either become hijacked or hijack others. On the contrary, page hijacks are accomplished using everyday standard procedures and techniques used by most Webmasters.”

How can a Webmaster tell if his or her page has been Googlejacked? For starters, type “allinurl:yourdomain.com” in the search box at Google to see what comes up. The subject domain should lead the list. If there are other entries on the page bearing the correct page title and excerpt – and in some cases, cached result – but incorrect URLs, chances are good the page has been targeted by a Googlejack, accidental or otherwise.

Fixing the problem is not as easy as determining if it exists. Webmasters can’t ban 302 referrers or most redirect scripts because servers don’t receive that information during connection requests. Click-throughs from the redirect script-bearing page can be banned, but that will only affect surfers and not search engine spiders, where the problem resides. Webmasters can request the removal of pages from Google, but that’s a lengthy, tedious process that only works within specific parameters.

Schmidt suggests several steps Webmasters can take to minimize the chances that their pages will be hijacked:

  • Always redirect “non-www” domains (yourdomain.com) to the www version (www.yourdomain.com) or vice-versa, and do it using a 301 code instead of a 302 code.
  • Always use absolute internal linking on Websites (include the full domain name in links that are pointing from one page to another page within the same site).
  • Include a bit of constantly updated content on all pages, like a time stamp, a random quote, or a page counter.
  • Use the meta tag on all pages.
  • Make all pages confirm their URL “artificially” by inserting a 302 redirect from any URL to the exact same URL and then serving a “200 OK” status code.
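Schmidt’s first recommendation — one canonical host, reached by a 301 — can be sketched as WSGI middleware in Python. This is an illustration under assumed hostnames, not Schmidt’s own code or a production-ready implementation:

```python
def www_canonicalize(app):
    """WSGI middleware (sketch): answer requests for the bare domain
    with a 301 to the www host, so spiders see one canonical URL."""
    def wrapped(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        if host and not host.startswith("www."):
            location = "http://www." + host + environ.get("PATH_INFO", "/")
            # 301, not 302: the bare-domain URL is permanently gone.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)  # already canonical
    return wrapped
```

The same rule is more commonly written as a Web server rewrite directive, but the effect is identical: every request for the bare domain is answered with a permanent redirect rather than a temporary one.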

Schmidt also suggests that Webmasters take precautions to avoid becoming inadvertent hijackers:

  • Always use 301 redirects instead of 302 redirects or disallow redirect scripts in the “robots.txt” file or both.
  • Request removal of all redirect script URLs from Google’s index. Simply including the URLs in the robots.txt file won’t remove them from Google. That move just ensures the URLs are not revisited by Google spiders.
  • If you discover that one of your pages has hijacked someone else’s in Google’s index accidentally, make the script in question return a 404 (page not found) error and then request removal of the script from Google’s index.
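The first and last of those points amount to two states of a hypothetical redirect script: while live, it should answer with a 301 so the destination gets the credit; once it has accidentally hijacked a page, it should answer 404 so its URL can be removed from Google’s index. A minimal Python sketch, with placeholder URLs:

```python
def outbound_redirect(location):
    """Live redirect script (sketch): a 301 credits the destination
    page rather than leaving the script's own URL in the index."""
    return "301 Moved Permanently", [("Location", location)]

def retired_redirect(_location=None):
    """Retired script (sketch): a 404 lets the Webmaster request
    removal of the script's URL from Google's index."""
    return "404 Not Found", []
```

Swapping one return value for the other is the entire remediation on the script side; the rest is the removal request to Google described above.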