FEATURE 200506 - Who's Visiting Your Website?: Use Your Server's Access Logs to Help Increase and Manage Traffic

What started as a trickle has become a flood, letting you know your hard work and diligence have paid off. You have found the holy grail of the adult Web: traffic. Any experienced webmaster will tell you, though, that it’s not just acquiring a nice flow of traffic that’s important for success. It’s how you direct and control that traffic that’s going to make you money.

As webmasters, and adult webmasters in particular, it’s beaten into our heads from day one that having a site on the Web does one absolutely no good if surfers aren’t visiting it. We tweak, we test, we promote, we optimize for search engines, we test some more and tweak some more until finally we come up with a formula that works for us. Our server logs begin to show a steady – or even sudden – increase in visitors. Acquiring the traffic is just the tip of the iceberg, though. Without something to do with website visitors, all you’ve created is a traffic jam that costs you money each month in bandwidth and hosting fees. This is where server logs can become a webmaster’s best friends.

There are two primary types of server logs with which webmasters should become familiar: error logs and access logs. Error logs are records of diagnostic information and errors encountered by the server during its normal course of operation. If you’re having a problem with your server, this is the first place to look, as the error log usually contains details about the problem and often points toward a fix.

More important for our purposes here is the access log. Access logs record all requests processed by the server. Far from just providing a head count of site visitors, access logs are veritable treasure troves containing all sorts of information about the people who visit Web sites. Waded through, digested, and evaluated either manually or with the help of a software program designed especially for the purpose, access logs can be very effective traffic cops. Among other things, they can tell a webmaster where his or her traffic is coming from, how surfers are finding the site, how they navigate through the site, and, in some cases, where they go when they leave.

Reading Access Logs

You can, of course, employ software tools to parse server log data, analyze it, and present the statistics in a way customized to make sense to you. In fact, that’s the most common way of handling the task these days, because performing the chore manually is nearly impossible, especially in the adult industry where the sheer volume of data is overwhelming. That said, however, it’s a good idea to be able to read and understand access logs in their native format in case you need to look for something specific that your analysis software isn’t set to check or in case the analysis software develops a bug.

Access logs are records of every request for a file on the server. They are configurable on both Linux and Windows machines to provide as much or as little information as the system administrator wants to know. Most access log files "in the raw" are presented in what is called Common Log Format (CLF). This standard format is produced by many servers and can be read by most log-analysis tools. Each line in a CLF file represents one file request. A CLF entry in a log file with default settings might look like this:

127.0.0.1 - Joe [13/Apr/2005:13:00:57 -0400] "GET /islandnewspage.htm HTTP/1.1" 200 1318

In order from left to right, all that gibberish means:

- "127.0.0.1" is the IP address of the remote host (client) that made the request to the server. It may also be presented as a hostname (e.g., wkstn237-142.pcs.georgetown.edu). Note that if a proxy server is sitting between the surfer and the Internet, this entry will be the proxy server’s IP address.

- "-" (hyphen) indicates no information was available for that variable. In this case, it’s the RFC 1413 identity of the client determined by "identd" on the client’s machine. This value hardly ever shows up in a log, and when it does it’s virtually useless (unless the server is on a tightly controlled internal network).

- "Joe" is the user ID of the person requesting the document. This slot will only contain a value when password-protected documents have been requested, as in members’ sections of pay sites.

- "[13/Apr/2005:13:00:57 -0400]" represents the time the local server finished processing the request. It appears in day/month/year:hour:minute:second zone format, where the zone is the server’s offset from Greenwich Mean Time (here, four hours behind). Although it is possible to have the time displayed in other formats by specifying that in the log config file, this is the format most often used.

- "GET /islandnewspage.htm HTTP/1.1" indicates exactly what document the client requested and how it was requested. In this case, the client issued the GET command to retrieve the resource /islandnewspage.htm using the HTTP/1.1 protocol. This field is useful in determining which pages within a site are the most popular.

- "200" is the status code the server sent back to the client. Successful responses begin with a 2, redirection codes begin with a 3, codes for client-side errors (like a request for a nonexistent page) begin with a 4, and codes for server-side errors begin with a 5. A complete list of status codes can be found in the HTTP specification at the World Wide Web Consortium’s site (www.w3.org).

- "1318" indicates the size, in bytes, of the object returned to the client.
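
The field-by-field breakdown above can be sketched in a few lines of Python. This is only an illustration, not the way any particular analysis package does it: the pattern, the function name parse_clf, and the field names are assumptions made for the example, and the pattern covers only the default seven-field layout described here.

```python
import re
from datetime import datetime

# One Common Log Format line: host, identity, user, [time], "request", status, size.
# The group names are illustrative labels, not part of any standard.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Return a dict of the seven CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # %b expects English month abbreviations, which holds under the default C locale.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A hyphen in the size field means no content was returned.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = '127.0.0.1 - Joe [13/Apr/2005:13:00:57 -0400] "GET /islandnewspage.htm HTTP/1.1" 200 1318'
entry = parse_clf(sample)
print(entry["host"], entry["user"], entry["status"], entry["size"])
print(entry["time"].isoformat())
```

A line that does not fit the pattern returns None rather than raising an error, so a damaged entry in a large log will not stop the run.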

More commonly, especially in the adult Web industry, access logs are presented in a slightly different format known as the Combined Log Format (because it combines three previously separate logs: access, referer [sic], and agent). It is substantially the same as the Common Log Format, but it adds two fields that give the webmaster or server administrator more information about the users visiting the site. Combined Log Format entries look like this:

127.0.0.1 - Joe [13/Apr/2005:13:00:57 -0400] "GET /islandnewspage.htm HTTP/1.0" 200 1318 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

The additional gibberish means:

- "http://www.example.com/start.html" is the URL of the site that referred the client. This field can tell you from whence the site’s traffic comes. It’s a good piece of information to watch, especially over time and especially if you run a pay site or engage in traffic trading. The information in this field can tell you what marketing strategies are working, who’s becoming a valuable ally, and who’s falling by the wayside. It also can indicate traffic patterns within the site, helping webmasters improve navigation and pointing out pages that could stand increased traffic.

- "Mozilla/4.08 [en] (Win98; I ;Nav)" is the identifying information the client browser reports about itself. In this case, the client machine is using the English-language version of a Mozilla browser on a PC running Windows 98.
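
To pull the two extra fields out of a Combined Log Format line, the same kind of pattern can be extended with two quoted groups. As before, this is a sketch under assumptions: the names COMBINED_PATTERN and parse_combined are invented for the example, and the pattern does not handle quotation marks embedded inside a field.

```python
import re

# Combined Log Format: the seven Common Log Format fields followed by
# the quoted referer and the quoted browser-identification string.
# Assumption: no field contains an embedded double quote.
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of the nine fields as strings, or None if the line does not match."""
    match = COMBINED_PATTERN.match(line)
    return match.groupdict() if match else None

sample = (
    '127.0.0.1 - Joe [13/Apr/2005:13:00:57 -0400] '
    '"GET /islandnewspage.htm HTTP/1.0" 200 1318 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
fields = parse_combined(sample)
print(fields["referer"])
print(fields["agent"])
```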

In the case of search-engine-inspired visits, the referer [sic] string might look like this:

http://ink.yahoo.com/bin/query?p="adult+video+news"&b=21&hc=0&hs=0

In this case, the visitor was referred by Yahoo after searching for the term "Adult Video News." This is the kind of information webmasters need in order to fine-tune their search-engine optimization strategies. The referer [sic] string also may help you determine if someone is stealing your bandwidth by hot-linking to your content, while the browser-identification string will tell you how often spiders visit your site.
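
Recovering the search phrase from a referer string like the Yahoo example is mostly a matter of reading its query string. The sketch below uses Python’s standard urllib.parse module. The function name search_terms is hypothetical, and the parameter name "p" is taken from the example above; other search engines may use a different parameter, so treat it as an assumption to adjust.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referer, param="p"):
    """Return the search phrase carried in the referer's query string, or None."""
    query = urlsplit(referer).query
    values = parse_qs(query).get(param, [])
    if not values:
        return None
    # parse_qs turns "+" into spaces; the surrounding quotes are stripped here.
    return values[0].strip('"')

referer = 'http://ink.yahoo.com/bin/query?p="adult+video+news"&b=21&hc=0&hs=0'
print(search_terms(referer))
```

A referer with no query string, such as a link from an ordinary page, simply returns None.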

For a variety of reasons, adult webmasters sometimes choose to create custom log files and multiple log files by adjusting the configuration values in the log config file. Multiple access logs might be desirable, for example, if you’re considering geo-targeting and want to determine what countries deserve your attention first. The same can be said for translating pages into other languages. If your site receives an abundance of visitors from Japan, then it may make sense to dedicate your translation resources to Japanese before, say, Farsi. This can be accomplished by telling the server to log all English-language page requests in one file, and all foreign-language requests in another (or each foreign-language request in its own). That’s where the client browser information comes in handy.

In addition, access log files get very large very quickly, so breaking them down according to other criteria sometimes is wise, depending upon the types of information a webmaster wants to analyze and how he or she plans to analyze it. Most log analysis tools provide a mechanism for backing up and storing log data at regular intervals, leaving the server free to purge the original files.

Of course, this is a very basic and simplistic view of all the things log files can track. Cookie data, for example, is important to track (or there wouldn’t be cookies, right?) in order to determine how long a single user spends on the site and on each page within the site, as well as how he or she navigates through the site and whether he or she is a regular or first-time user. At sites with several thousand users per hour, this becomes extremely important because individual log file strings are recorded as they occur, not by user. In addition, usability issues can be pointed out by log file analysis: If a user loops back and forth between a main page and subordinate pages, for example, navigation between those pages might not be as good as it could be.

Knowledge in Action

It’s not as important how a webmaster reads or analyzes his or her server logs as it is how he or she uses that information. For example, comparing the number of unique visits to a sign-up page to the number of actual joins results in a conversion ratio for the site under observation. If it seems low, ask yourself "Why?" Are you offering too much free content before the surfer gets to the join page? Is the sign-up process too difficult? Are you attracting the wrong kind of traffic? The join page is one of the most common "bail-out" points in a website. Others include unattractive or confusing home pages and any page with so much content that it becomes boring or tiresome. If you’re a free site owner and you’re having bail-out problems, solicit critical input from other webmasters and your sponsor, and engage in some serious trial-and-error experimentation until you find a formula that works for you—and keep an eye on your access log data while you’re doing it.
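
The conversion ratio described above is simple arithmetic once the unique visits have been counted. The sketch below counts distinct client addresses that requested a sign-up page and divides the number of joins by that count. Everything specific in it is invented for illustration: the page name /join.html, the sample log lines, and the join count (which in practice would come from billing records, not the access log). Counting by IP address is also only a rough stand-in for unique visitors, since a proxy server can hide many surfers behind one address.

```python
def unique_visitors(lines, path):
    """Count distinct client addresses that issued a GET for `path`."""
    hosts = set()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request field; skip the line
        request = parts[1].split()
        if len(request) >= 2 and request[0] == "GET" and request[1] == path:
            hosts.add(line.split()[0])
    return len(hosts)

def conversion_ratio(visitors, joins):
    """Joins divided by unique visitors; 0.0 when there were no visitors."""
    return joins / visitors if visitors else 0.0

# Hypothetical log lines; /join.html is an assumed sign-up page name.
log_lines = [
    '10.0.0.1 - - [13/Apr/2005:13:00:57 -0400] "GET /join.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [13/Apr/2005:13:01:10 -0400] "GET /join.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [13/Apr/2005:13:02:00 -0400] "GET /join.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [13/Apr/2005:13:03:00 -0400] "GET /index.html HTTP/1.1" 200 2048',
    '10.0.0.4 - - [13/Apr/2005:13:04:00 -0400] "GET /join.html HTTP/1.1" 200 512',
    '10.0.0.5 - - [13/Apr/2005:13:05:00 -0400] "GET /join.html HTTP/1.1" 200 512',
]
visitors = unique_visitors(log_lines, "/join.html")
joins = 1  # assumed figure for the same period
ratio = conversion_ratio(visitors, joins)
print(f"{visitors} unique visitors, {joins} join, conversion {ratio:.0%}")
```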

Pay-site owners can, and often do, use referer [sic] stats from their logs to determine who their best- and worst-performing affiliates are. They may offer bonuses to the stellar performers and incentives to the underachievers. They may redesign promotional materials and offer advice based on what’s working for their top affiliates. They also keep an eye on their referer [sic] stats to tell them if they’ve become the victim of a bait-and-switch tactic unique to adult entertainment. Certain purveyors of illegal content may sign up as affiliates of large sponsor programs and send potential buyers of their illicit wares through the sponsor program’s sign-up process. After the sponsor has issued a user name and password to the buyer, the buyer forwards them to the underground operator, who uses them to grant the buyer access to the prohibited content. In the mind of the purveyor of illegal content, this puts a "shield" between him and the authorities. Sponsor programs have caught on to the gambit, though: An unusually high number of joins who never use their user name or password on the sponsor’s site, all coming from the same affiliate, is a dead giveaway to the scam.
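
Ranking affiliates by referer [sic] comes down to tallying the host name in each referer string. The sketch below does that with Python’s standard library. The function name referring_hosts and the affiliate host names are made up for the example, and it assumes that a hyphen marks a request that arrived with no referer.

```python
from collections import Counter
from urllib.parse import urlsplit

def referring_hosts(referers):
    """Tally requests by the host name found in each referer string."""
    counts = Counter()
    for referer in referers:
        host = urlsplit(referer).hostname
        if host:  # "-" and other non-URL values have no host and are skipped
            counts[host] += 1
    return counts

# Hypothetical referer values taken from requests for a sign-up page.
referers = [
    "http://affiliate-a.example/gallery1.html",
    "http://affiliate-a.example/gallery2.html",
    "http://affiliate-b.example/links.html",
    "-",
    "http://affiliate-a.example/gallery1.html",
]
for host, hits in referring_hosts(referers).most_common():
    print(host, hits)
```

Comparing each affiliate’s tally with the number of joins credited to that affiliate, and with how many of those new members ever log in, is what exposes the pattern described above.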

Pay-site owners also carefully watch the IP address field in their access logs, especially in conjunction with user IDs. If any one user ID logs in frequently from a variety of IP addresses, there’s a good chance that ID and password have been shared and it’s time to shut off that user.

Pay-site operators and free-site operators alike use their log stats to tell them which pages on their sites are the most popular and which are the least visited. By creating more pages that mimic the popular ones and redesigning the unpopular ones they can increase the amount of time users spend on their sites, the number of repeat visits, and the number of months members stick with them. This sort of analysis in action – reducing customer "churn" – has been shown to increase profit by between 30 percent and 85 percent, according to an article by Karl Long in Design Management Review. In addition, Long points out that "increasing customer retention by 2 percent is equivalent to reducing operating expenses by 10 percent." That alone is a powerful argument for server log analysis!
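
The shared-password check mentioned above can be sketched by grouping IP addresses under each user ID and flagging any ID seen from too many of them. The function name, the sample data, and the threshold of three addresses are all assumptions chosen for illustration; a real cutoff would depend on the site, since legitimate members may connect from home, work, and a dial-up account with changing addresses.

```python
from collections import defaultdict

def shared_login_suspects(entries, max_addresses=3):
    """Return user IDs seen from more than `max_addresses` distinct addresses."""
    addresses = defaultdict(set)
    for host, user in entries:
        if user != "-":  # a hyphen means the request carried no user ID
            addresses[user].add(host)
    return sorted(
        user for user, hosts in addresses.items() if len(hosts) > max_addresses
    )

# Hypothetical (IP address, user ID) pairs pulled from access log lines.
entries = [
    ("10.0.0.1", "Joe"),
    ("10.0.0.2", "Joe"),
    ("10.0.0.3", "Joe"),
    ("10.0.0.4", "Joe"),
    ("10.0.0.5", "Joe"),
    ("10.0.1.1", "Ann"),
    ("10.0.1.1", "Ann"),
    ("10.0.2.9", "-"),
]
print(shared_login_suspects(entries))
```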