Finding Confidential Info By Ordinary Web Search?

Improperly configured servers, holes in security systems, human error, and other factors are feeding an Internet subculture of people who use ordinary search engines to find a growing volume of private or supposedly secret documents, from individuals and government agencies alike. Some techies even post directions on how to find that material with the simplest of searches.

"There's a whole subculture that's doing this," said security consultancy Netsec hacking expert Chris O'Ferrell to The Washington Post – even as he showed a Post reporter how to get what looked like a sensitive government report on Taliban membership with nothing more elaborate than a Google search.

And the Post said security engineers believe there is a growing, if informal, group they call Googledorks who surf the Net specifically to seek out confidential information and other materials. Computer Sciences Corp. researcher Johnny Long, himself a veteran hacker, told the paper the number of affected sites could already be in the tens of thousands.

"Search strings including "xls," or "cc," or "ssn" often brings up spread sheets, credit card numbers, and Social Security numbers linked to a customer list," the paper continued. "Adding the word 'total' in searches often pulls up financial spreadsheets totaling dollar figures. A hacker with enough time and experience recognizing sensitive content can find an alarming amount of supposedly private information."

Google is said to be an especially popular search engine for those purposes because of its simplicity and effectiveness. "Its powerful computer crawls over every Web page on the Internet at least every couple of weeks," the Post said, "which means surfing every public server on the globe, grabbing every page and every link attached to every page. Those results are then catalogued using complex mathematical systems."
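That description boils down to a simple loop: fetch a page, record it, follow its links, repeat. The Python sketch below is a toy illustration of that fetch-and-follow mechanism, nothing like Google's actual code:

    # Minimal toy crawler: fetch a page, index its text, queue its links.
    # Purely illustrative of the mechanism the Post describes.
    import re
    import urllib.request
    from collections import deque

    def crawl(seed_url, max_pages=10):
        seen, queue = {seed_url}, deque([seed_url])
        index = {}                            # url -> page text (the "catalogue")
        while queue and len(index) < max_pages:
            url = queue.popleft()
            # A well-behaved crawler would first check the site's
            # robots.txt (see the next paragraph) before fetching.
            try:
                with urllib.request.urlopen(url) as resp:
                    html = resp.read().decode("utf-8", errors="replace")
            except OSError:
                continue                      # unreachable page; skip it
            index[url] = html
            # Grab every absolute link on the page and queue the new ones.
            for link in re.findall(r'href="(https?://[^"]+)"', html):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        return index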

How do you keep Google from reaching information on a Web server? Security experts say the answer is a digital gatekeeper "in the form of an instruction sheet for the search engine's crawler," the paper continued. "That file, called robots.txt, defines what is open to the crawler and what is not. But if the robots.txt file is not properly configured, or is left off inadvertently, a hole is opened where Google gets in. And because Google's crawlers are legal, no alarms will go off."
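A minimal robots.txt, placed at the root of a Web server, looks like the sketch below; the directory names here are made up for illustration:

    User-agent: *
    Disallow: /reports/
    Disallow: /customers/

A compliant crawler fetches this file before anything else and skips the listed paths. The file enforces nothing on its own, which is why a missing or badly written one quietly leaves a server wide open.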

Long said the "scariest thing" is that this kind of activity could involve government documents and materials without the government even knowing it. "If there's a chink in the armor," he told the Post, "the hackers will find it." Google, for its part, told the paper that while it is sensitive to the problem, it can't control it, and doesn't really want to police or censor what goes on the Internet if it can help it. But Google does offer a tool on its site for removing pages from its index, including its deep warehouse of cached pages that may no longer be online.

One critical problem, according to the Post: nobody knows exactly what the law says when someone digs up a confidential document through an ordinary Internet search. The FBI told the paper it hasn't been taking action against people who find secure documents by way of ordinary Web searches, but "if they use it for some sinister purpose, that's another issue," according to spokesman Paul Bresson.