A META Tag Primer

Near the edges of the brave new world of dynamic Web languages like DHTML, XHTML, WXML, and others lurk some trusted old friends who often are overlooked in the ever-raging battle for Website popularity.

I speak, of course, of the lowly META tags.

Hardly the belles of the Internet ball, META tags nevertheless should be on the guest list of any well-heeled Website, for it is they that enable you to keep pages up to date, make them easier to find, prevent them from being framed or cached, and either attract or repel robots. META tags can reveal a page's creator and the software used to build it, identify the HTML specifications the page's code follows, broadcast keywords and descriptions of content to the nosy and search engines, and specify refresh parameters, among other things. Although a few META tags are the young, brash, upstart types, most have been around since the first browsers set byte upon the face of the Web.

While it may seem cruel, segregation is a necessary evil among META tags. Although the opposing clans of HTTP-EQUIV and NAME tags remain civil to each other in close proximity, they do not behave well at all when forced to coexist within the same set of brackets. Hence, what follows is a brief who's who of META tags, and a guide to the etiquette of their use.

HTTP-EQUIV

In most cases, META HTTP-EQUIV tags control or direct the actions of Web browsers. They are called "HTTP-EQUIV" because each one is meant to be the equivalent of a header in the hypertext transfer protocol (HTTP) response that a Web server sends along with a page. Like those headers, they give the browser information about how the page should be handled and displayed.

Although there are differences in the ways Web server software translates HTTP-EQUIV tags, those differences have little, if any, effect on the final product.

Two of the most useful HTTP-EQUIV tags are "expires" and "pragma." The tags can be used together or separately, and both control browser caching: the browser's ability to store Web pages on the user's hard disk so they can be redisplayed quickly, for instance when the user clicks the "Back" button to return to a previous page. Caching is a great feature where static pages are concerned, but on pages that change frequently it can damage the user experience, because an out-of-date version of the page may be displayed inadvertently.

The "expires" tag tells browsers when a page should be considered obsolete. If the content should expire at a specific date and time, enter that date and time in Greenwich Mean Time (GMT) format, like so: <meta http-equiv="expires" content="Tue, 20 Aug 1996 14:25:27 GMT">. Otherwise, to keep Navigator from caching a page at all, give the tag an "illegal" date such as "0," which the browser interprets to mean "immediately": <meta http-equiv="expires" content="0">.

"Pragma" is similar in concept, but always carries the CONTENT value "no-cache," which prevents Navigator from storing page contents locally: <meta http-equiv="pragma" content="no-cache">.
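Since the two tags can be used together, a page that should always be fetched fresh might carry both in its head. The sketch below assumes a frequently updated page; the title is a placeholder, and whether a given browser honors either tag varies, as this section explains.

```html
<head>
  <title>Today's Updates</title>
  <!-- "0" is an illegal date, so the page counts as already expired -->
  <meta http-equiv="expires" content="0">
  <!-- Ask the browser not to keep a local copy of the page -->
  <meta http-equiv="pragma" content="no-cache">
</head>
```

For a page that should go stale at a known moment rather than immediately, the "0" would be replaced with a full date in GMT format.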

For reasons known only to Microsoft engineers (but widely guessed at by the rest of us), Internet Explorer versions up to and including 5.x provide incomplete support for the "expires" and "pragma" tags; in other words, sometimes IE pays attention to the tags, and sometimes it ignores them entirely. Microsoft's browser can be reliably prevented from caching pages only by using technology designed to work on Microsoft's server platforms, like HTML+TIME, Active Server Pages, and certain Dynamic HTML commands and cookies.

"Refresh," a META tag that can be used in much the same way as "expires" and "pragma," is recognized by all popular browsers, but using it to keep cached pages from reappearing is much less elegant. Usually employed to redirect a browser to another URL after a page has moved, "refresh" specifies the time, in seconds, that a browser displays a given document before reloading its contents or automatically sending the user to another page. To refresh the content of a locally stored page immediately, give the tag a CONTENT value of zero seconds and no URL: <meta http-equiv="refresh" content="0">. (Because the reloaded page carries the same tag, the browser will keep reloading it for as long as it is displayed.) To forward a surfer to another page, use the tag like this: <meta http-equiv="refresh" content="X; URL=http://www.newurl.com">, where "X" is the number of seconds to wait before the browser begins the redirection and "www.newurl.com" is the address of the page to which the surfer should be transferred.
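A common way to use "refresh" for redirection is a small forwarding page left at the old address. This is a sketch under assumptions: the five-second delay is arbitrary, www.newurl.com is the article's own placeholder address, and the visible link is a courtesy fallback for surfers whose browsers ignore the tag.

```html
<html>
<head>
  <title>This page has moved</title>
  <!-- Wait 5 seconds (an arbitrary choice), then load the new address -->
  <meta http-equiv="refresh" content="5; URL=http://www.newurl.com/">
</head>
<body>
  <!-- Fallback link for browsers that ignore the refresh tag -->
  <p>This page has moved to
     <a href="http://www.newurl.com/">www.newurl.com</a>.</p>
</body>
</html>
```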

Another time-related HTTP-EQUIV tag is "set-cookie," which sets a cookie in the user's browser as soon as he or she enters the page on which the tag resides. If the tag includes an expiration date, the cookie remains on the user's hard drive until that date arrives, when it expires automatically. If no expiration date is defined, the cookie is valid for the current session only and is discarded when the user closes his or her Web browser; if the date given has already passed, the browser deletes the cookie right away. Cookies like these are especially useful for allowing members to move around among a network of sites without having to log in at each one.

To set a cookie that expires when the browser is closed, leave out the expiration date: <meta http-equiv="Set-Cookie" content="xxx=value">, where "xxx" is the name of the cookie and "value" is the data it stores.

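To set a cookie that expires at a specific time, add an expiration date in GMT: <meta http-equiv="Set-Cookie" content="xxx=value; expires=Fri, 31-Dec-1999 23:59:59 GMT">.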
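The two cookie variants might sit side by side in a page header like this. The cookie names "visitor" and "member," their values, and the expiration date are hypothetical placeholders; the date follows the GMT style that cookies traditionally use.

```html
<head>
  <title>Members Area</title>
  <!-- Session cookie (hypothetical name/value): no expires, so it is
       discarded when the browser closes -->
  <meta http-equiv="Set-Cookie" content="visitor=guest; path=/">
  <!-- Persistent cookie (hypothetical name/value): kept until the
       GMT date given in expires -->
  <meta http-equiv="Set-Cookie"
        content="member=12345; expires=Fri, 31-Dec-1999 23:59:59 GMT; path=/">
</head>
```

Note that the HTML standard has since dropped Set-Cookie from the META tag and current browsers ignore it; today the same values would be sent as a real HTTP header by the server.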

The "Window-Target" tag is a favorite of those who can't stand to find their pages "framed" within the pages of another site. This tag forces the page on which it resides out of any frame and into the full browser window. The correct syntax for the tag is <meta http-equiv="Window-target" content="_top">. Substituting "_blank" for the CONTENT value achieves a similar escape by opening the page in a new window.

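The "Content-Type" family of tags specifies browser default settings for a variety of features, notably the character set, the scripting language, and the style sheet language (as in cascading style sheets). "Content-Type" can specify a particular character set when included like so: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">. On pages that make use of scripts, the "Content-Script-Type" tag can be especially useful, because it sets the default scripting language for the page: <meta http-equiv="Content-Script-Type" content="text/javascript">. "Content-Style-Type" specifies the default style sheet language for a document using the syntax <meta http-equiv="Content-Style-Type" content="text/css">.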
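Taken together, the three defaults might be declared as follows. The choice of ISO-8859-1 (Western European), JavaScript, and CSS is an assumption for illustration; a page would declare whatever it actually uses. Placing the character-set tag before the title is a common precaution, so the browser learns the encoding as early as possible.

```html
<head>
  <!-- Character set of the document (ISO-8859-1 assumed here) -->
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <!-- Default language for scripts in event attributes such as onclick -->
  <meta http-equiv="Content-Script-Type" content="text/javascript">
  <!-- Default style sheet language for inline style attributes -->
  <meta http-equiv="Content-Style-Type" content="text/css">
  <title>Example Page</title>
</head>
```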

The "Content-Language" tag, in conjunction with the "Vary" tag, can be helpful when a site is available in more than one language. Search engine robots often use these tags to categorize sites and pages by natural language. The correct syntax for the "Content-Language" tag is <meta http-equiv="Content-Language" content="en-US">, where "en-US" is the language-dialect pair (in this case, English as used in the United States). The "Vary" tag indicates that the page's content varies according to the Accept-Language header sent with the browser's request; its syntax is <meta http-equiv="Vary" content="Accept-Language">.
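For a site offered in both English and French, the English version of a page might declare itself like this. The French counterpart would be identical except for a CONTENT value such as "fr-FR"; the language codes and title here are illustrative.

```html
<head>
  <title>Welcome</title>
  <!-- Natural language of this version: English as used in the US -->
  <meta http-equiv="Content-Language" content="en-US">
  <!-- Signal that the version served depends on the browser's
       Accept-Language request header -->
  <meta http-equiv="Vary" content="Accept-Language">
</head>
```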

The last of the HTTP-EQUIV META tags is the Platform for Internet Content Selection, or "PICS-Label," tag. Designed primarily as a content ratings label similar to that used by the motion picture industry, "PICS-Label" tags grew out of an effort by the World Wide Web Consortium to standardize site content ratings as a means for site owners to comply with the Communications Decency Act (since struck down). Anything that has a URL can be labeled, and labels can be assigned in two ways. Both of them involve a third-party labeling service, per the W3C's standard.

In the first method, a site owner contacts a labeling service, which rates his or her site and then stores the ratings at the labeling bureau. In the second method, a site owner contacts a rating service and fills out the proper forms. The rating service then provides the site owner or developer with the "PICS-Label" META tags to place on his or her site. One excellent free service of the second type is offered by the Internet Content Rating Association (www.rsac.org). Instructions and codes are presented in four languages: English, French, German, and Spanish.

"PICS-Label" tags also can be used for code signing, privacy, and intellectual property rights management. For more in-depth information about PICS-Labels and their uses, see the W3C's Website at www.w3.org/PICS.

A complete "PICS-Label" tag for an entire site with mild adult content might look like this: .

NAME

Although NAME tags are used for META data types that do not correspond to standard HTTP headers, there remains some controversy and confusion, largely due to the lack of standards among search engine robots and spiders, about which META data is best presented by which type of tag. Primary among the issues is the placement of the "Keywords" attribute, which, unfortunately, is one of the most important META types. Fortunately, most search engines now recognize it whether it is categorized as NAME or HTTP-EQUIV.

The "Keywords" attribute allows search engines to index pages using keywords the designer specifies, like this: <meta name="keywords" content="keyword1, keyword2, keyword3">. Although the tag can contain as many words as the designer desires, most search engines ignore anything after the fifteenth; however, more than one "Keywords" tag can be used in a page header.

Although early in the Web's history designers could get away with "spiking" their "Keywords" tags by repeating the same words over and over, that trick no longer works: search engines have refined their spidering techniques, and they often ignore such pages and may even blacklist a site that tries it. That's where the "Description" tag comes into play. Using this attribute, designers can reuse keywords in a descriptive sentence or phrase about their site: <meta name="description" content="A sentence or phrase describing the site">. It's best to use only one "Description" tag per page.
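Keywords and description usually travel together. The values below are hypothetical, written for an imaginary furniture workshop, and show the description reusing the keywords in a readable sentence rather than repeating them.

```html
<head>
  <title>Handmade Oak Furniture</title>
  <!-- Hypothetical keywords: distinct terms, no repetition -->
  <meta name="keywords"
        content="oak furniture, handmade tables, wooden chairs, custom cabinets">
  <!-- One description per page, reusing the keywords in a sentence -->
  <meta name="description"
        content="Handmade oak tables, wooden chairs, and custom cabinets built to order in a family workshop.">
</head>
```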

Many sites (those with members-only areas, for example) may not want all of their pages indexed by search engine spiders. The "Robots" META tag was designed to address this issue. Its syntax is <meta name="robots" content="...">, where the CONTENT value is "all," "none," "index," "noindex," "follow," or "nofollow"; the last four may be used in combination, separated by commas. The default value is "all," meaning that spiders may index the page itself and follow its links to the files it points to: <meta name="robots" content="all">. "None" tells spiders and robots not to index the page and not to follow any of its hyperlinks. "Index" indicates that the page on which it appears may be indexed, and "follow" indicates that spiders are free to follow the links on the page; "noindex" and "nofollow" are their inverses. For example, the META tag <meta name="robots" content="noindex,follow"> allows robots to follow all links from the page, but not to index the page itself.
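Because the values combine, a handful of variants covers most needs. Each line below is an alternative, not a set to use together; a page would normally carry only one "robots" tag. The pairings follow the meanings described above, with "all" equivalent to index,follow and "none" to noindex,nofollow.

```html
<!-- Index this page and follow its links (same as "all") -->
<meta name="robots" content="index,follow">

<!-- Neither index this page nor follow its links (same as "none") -->
<meta name="robots" content="noindex,nofollow">

<!-- Index this page, but do not follow its links -->
<meta name="robots" content="index,nofollow">

<!-- Follow the links, but keep this page itself out of the index -->
<meta name="robots" content="noindex,follow">
```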

Other NAME-type META tags are rarely used, if at all. "Author," as its name indicates, may be used to specify the author of a page: <meta name="author" content="Author's Name">. Similarly, the "Copyright" tag indicates copyright information: <meta name="copyright" content="Copyright notice">.

Placement

META tags always should reside in the head of an HTML document, between the <head> and </head> tags and before the <body> tag. Ideally, META data should be presented on each page, and it should reflect the page's unique nature (unless the site has a bunch of pages that are virtually identical in description, and that's not likely). That may seem like a lot of extra work, but consider this: META tags are the single biggest contributor to a site's non-paid ranking in search engines, and most search-engine surfers don't proceed past the first or second page of search results. With that in mind, doesn't it make sense to spend the extra time creating META tags that do a good job for the site they represent?
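Putting it all together, a complete page header might be laid out as follows. Every value (title, author, copyright holder, keywords, and description) is a hypothetical placeholder, and the HTML 4.01 Transitional document type declaration is just one example of identifying the specification a page follows.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <!-- HTTP-EQUIV tags: instructions for the browser -->
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <meta http-equiv="Content-Language" content="en-US">
  <title>Handmade Oak Furniture</title>
  <!-- NAME tags: information about the page (hypothetical values) -->
  <meta name="author" content="Jane Example">
  <meta name="copyright" content="Copyright 2000 Example Workshop">
  <meta name="keywords" content="oak furniture, handmade tables, wooden chairs">
  <meta name="description" content="Handmade oak tables and chairs built to order.">
  <meta name="robots" content="index,follow">
</head>
<body>
  <!-- Visible content begins only after the head, and all META tags, close -->
  <p>Welcome to the workshop.</p>
</body>
</html>
```

Each tag carries either HTTP-EQUIV or NAME, never both within the same brackets.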

Don't use META tags only on "normal" pages, either. Because a significant number of surfers still are unable to view documents in frames, make sure that pages designed to be displayed within frames carry META data too, to attract the maximum number of hits.

In addition, statistics indicate that only about 21 percent of Web pages employ "Keywords" and "Description" META tags. Although that figure may be higher in the adult community (after all, the adult Net leads the way in most other areas), imagine the difference a few well-chosen META tags could make in one site's ranking versus that of its competition.

One final word of caution: Be especially careful about the "Keywords" and "Description" terms chosen. Copyright and trademark violations, even if they occur in something surfers can't see (META tags are invisible to surfers), can expose a site's owner to high-dollar lawsuits. At least five such suits have been resolved so far, with the largest settlement topping $3 million.