PHP TIPS AND TRICKS - Maintaining State On The Web - An Overview

This article is geared toward readers who are experienced in writing HTML code and have a basic understanding of PHP and its common uses. Seasoned PHP programmers will also find a lot of useful information in it.

(For more about Zend Technologies, please visit www.zend.com)

Introduction

Prior to the advent of the World Wide Web, if you wrote a computer program, you could be assured that when you sent data to or retrieved data from a user session, the data was actually being transferred to that session.

This assurance was primarily due to the fact that computer terminals were hard-wired to the mainframe and were not likely to be moved. Even programs designed to operate on a modern network rely on application protocols that have state maintenance built in, so you don't have to worry about this aspect of the environment.

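In contrast, when programming Web applications you are limited to HTTP, which is stateless by its very nature: each request stands on its own, and the protocol gives the server no way of knowing that two requests came from the same user.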

This article will explore the fundamentals of maintaining session state so you can deliver a richer, more interactive Web environment to your users. It will attempt to bridge the gap between sessionless, static Web pages and today's interactive environment of computer operating systems and programs.

I'll look at how session state is handled in non-Web environments and how that differs in HTTP environments. I will examine specific methodologies for maintaining session state, and briefly compare PHP's session functionality with the functionality of other languages. Finally, I will demonstrate a very simple method of maintaining state using a Web server's default document feature.

Challenges in maintaining state

The problem with programming highly interactive applications for the Web is two-fold:

  • HTTP -- the voice of the Web -- is a stateless protocol. Using HTTP, you are unable to automatically maintain a session with a user. Every time a user sends or receives data, the server handles the request independently of any earlier ones (in early versions of HTTP, a brand new connection was even created for each request). The user is, in effect, a new and different entity with each transfer of data.

  • Because the Internet is a non-centralized network that relies on dynamic routing and IP addressing, it is an anonymous and ambiguous transport medium. If you try to identify a user by some unique characteristic, such as the user's IP address, in order to create a unique session, you will find it completely unreliable: many users can share a single address behind a proxy server, and one user's address can change from one visit, or even one request, to the next.

The early days of the Web

In the early days of the Web, it was difficult or impossible to send data to a user unless the user specifically requested it, so creating a unique ID for the user was unreliable at best, impossible at worst. (It is true that some first-generation browsers did offer a method for creating a unique ID, but not all browsers supported it, and browser vendors did not implement it in the same way.)

Also, there were no simple server-side languages like ASP or PHP that could easily send the necessary HTTP headers to create unique IDs on the client, or that could automatically maintain state by means of URL-encoded variables.

Perl and C

Perl was one of the first programming languages widely adopted for creating interactive Web applications, usually run as CGI (Common Gateway Interface) scripts. However, Perl was originally designed as a text-processing and report-generation tool, and its Web functionality was added over the years. Perl paved the way for more modern and higher-level Web languages such as PHP, and it is still widely used today.

The main drawback of Perl, and of lower-level languages such as C, is that Web programs written in them are generally more complex to code and lack the conveniences of richer Web programming languages like PHP. Also, because a Perl or C program must run as a separate program rather than being embedded in the page, it is much harder to quickly integrate its functionality into existing HTML code.

Methods of state maintenance

The standard method for creating a unique ID is, of course, the cookie. Most people who use the Internet are familiar with cookies, and most Web programmers bent on creating good interactive applications know that cookies are the easiest way to maintain session state. Not only can you create a unique user ID for each user, but you can also store other variables on their machine for later retrieval. These capabilities make life much easier for both the programmer and the end-user.
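As a rough illustration, the sketch below assumes PHP 7 or later and shows one way to issue a unique ID in a cookie: it reuses the ID the browser sends back and otherwise creates a new random one. The cookie name "uid" and the 32-character hexadecimal format are illustrative choices, not anything PHP requires.

```php
<?php
// Sketch: give each visitor a unique ID stored in a cookie.
// The cookie name "uid" is a hypothetical choice.

// random_bytes() (PHP 7+) gives 128 unpredictable bits, so one visitor
// cannot easily guess another visitor's ID.
function newUserId(): string
{
    return bin2hex(random_bytes(16)); // 32 hexadecimal characters
}

// Reuse the ID the browser sent back, but only if it looks like one of ours.
$uid = $_COOKIE['uid'] ?? null;
if (!is_string($uid) || preg_match('/^[0-9a-f]{32}$/', $uid) !== 1) {
    $uid = newUserId();
    // No expiry date: a non-persistent cookie that lasts until the
    // browser is closed. Like header(), it must be sent before any output.
    setcookie('uid', $uid);
}
```

Other variables can be stored the same way with further setcookie() calls; on the next request they come back to the script in the $_COOKIE array.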

Before I continue discussing the use of cookies to maintain state, I will explore some other ways of maintaining session state that are -- in my opinion -- less desirable than cookies.

URL-encoded variables

One way to maintain state is through the use of variables appended to URLs. Although this works -- to an extent -- it has several disadvantages, including the creation of some very unattractive URLs. This can confuse the user, and increase the possibility that the user may change the variables -- either accidentally or on purpose. It is not a good thing to let the user tinker with the variables that you've worked so hard to set up!

Another problem with appending variables to the URL is that, if you are using an older language such as Perl, it takes a good deal of programming to parse the variables. In newer languages like PHP, parsing is simpler, but you're still left with unattractive URLs and a very insecure method of transferring user data. Also, if you use the URL to send unique session IDs, one user can guess another user's ID and change the URL in order to hijack the other user's session. Obviously this is not a good method for creating secure sessions.
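The sketch below illustrates the technique with PHP's http_build_query() and parse_str() functions. The page name next.php and the variables user and step are hypothetical. Note that every value arrives as a string that the user may have edited, so the receiving side must validate it.

```php
<?php
// Sketch: carrying state between pages in URL-encoded variables.
// The page name and variable names are hypothetical.

// Build a link that carries the current state to the next page.
function linkWithState(string $page, array $state): string
{
    // http_build_query() URL-encodes every key and value.
    return $page . '?' . http_build_query($state);
}

// Recover the state from a query string. In a live request PHP has
// already done this and put the result in $_GET.
function stateFromQuery(string $query): array
{
    parse_str($query, $state);
    return $state;
}

$url = linkWithState('next.php', ['user' => 'mike', 'step' => 2]);
// $url is "next.php?user=mike&step=2": readable, and editable by the user.

$state = stateFromQuery((string) parse_url($url, PHP_URL_QUERY));
// Every value arrives as a string, and anyone could have changed it,
// so validate before use.
$step = filter_var($state['step'] ?? null, FILTER_VALIDATE_INT);
```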

Converting variables to paths

A variation on sending data in the URL -- and one that PHP and Apache are particularly good at -- is to turn each variable into a segment of the URL path, embedding the session ID or other data in the path itself. This is a bit harder for a user to modify, and more aesthetically attractive, but no more secure than appending the variables to the URL after a question mark. It also adds considerably to the work of parsing the URL. This is, however, a very good way to make a dynamic site look like a static site, allow search engines to index the site, and make the URLs easier to remember. Many large dynamic sites use this method (see CNN.com for a very good example).

Converting the standard URL-encoded variables into paths requires URL rewriting, which is a function of the Web server (in Apache, the mod_rewrite module). You must also include a parser script that converts the path back into usable variables. Unfortunately, many hosting companies do not allow their customers to rewrite URLs, but if you have direct access to the Web server, you should have no problem with this. If you want search engines to be able to index your dynamic pages, and you want URLs that your users can easily read and remember, then you will want to convert the variables to paths.

Here is an example of a dynamic page using URL encoded variables, followed by the same URL converted to a path:

http://www.mydomain.com/index.php?year=2001&month=09&day=24&edition=europe&corr=mike

http://www.mydomain.com/2001/09/24/europe/mike/index.php

As you can see, the second URL, though still not simple, is more attractive than the first, is easier to read and remember, and looks to a search engine like an ordinary static page. The content that this URL points to could have been extracted from a database or any other source.
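The parser script mentioned above might look something like the following sketch, which turns the example path back into the same variables as the query-string version. It assumes the Web server has already been set up (for example, with URL rewriting) to route such requests to this script; the rewrite configuration itself is not shown, and the function name varsFromPath() is hypothetical.

```php
<?php
// Sketch of a parser that turns a path such as
//   /2001/09/24/europe/mike/index.php
// back into named variables. The names match the example URL; routing the
// request to this script (e.g. with URL rewriting) is assumed.

const PATH_KEYS = ['year', 'month', 'day', 'edition', 'corr'];

function varsFromPath(string $path): ?array
{
    // Split the path into its segments.
    $segments = explode('/', trim($path, '/'));

    // Ignore the trailing script name, if present.
    if (end($segments) === 'index.php') {
        array_pop($segments);
    }

    // Reject paths that do not have exactly one segment per variable.
    if (count($segments) !== count(PATH_KEYS)) {
        return null;
    }

    // Pair each name with its segment, decoding any %xx escapes.
    return array_combine(PATH_KEYS, array_map('rawurldecode', $segments));
}

// In a live request, strip any query string from the requested URI first.
$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);
$vars = varsFromPath(is_string($path) ? $path : '/');
```

Each value is still chosen by whoever typed the URL, so it needs the same validation as a query-string variable before it is used, for example, in a database query.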

Hidden form variables

Another, slightly more secure, way of maintaining state is through the use of hidden form variables. This method, obviously, relies on HTML forms, and that means that it relies on the client to properly send the form data to the server. A session ID can be stored in the pages that the user requests, and other variables can also be stored this way.
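Here is a minimal sketch of the hidden-field approach. It assumes the session ID is a 32-character hexadecimal string generated by your own code; the field name "sid" and the target script step2.php are hypothetical.

```php
<?php
// Sketch: carrying a session ID between pages in a hidden form field.
// The field name "sid" and the script "step2.php" are hypothetical.

// Render a form that posts the session ID back along with the user's input.
function formWithHiddenId(string $sessionId): string
{
    // Escape the value so it cannot break out of the HTML attribute.
    $sid = htmlspecialchars($sessionId, ENT_QUOTES);
    return <<<HTML
<form method="post" action="step2.php">
  <input type="hidden" name="sid" value="$sid">
  <input type="text" name="comment">
  <input type="submit" value="Next">
</form>
HTML;
}

// On the receiving page, read the ID back from the POST data. It arrives
// exactly as the browser sent it, so accept only the format we issue.
function hiddenIdFromPost(array $post): ?string
{
    $sid = $post['sid'] ?? null;
    return is_string($sid) && preg_match('/^[0-9a-f]{32}$/', $sid) === 1
        ? $sid
        : null;
}

$id   = bin2hex(random_bytes(16)); // a fresh 32-hex-character ID
$html = formWithHiddenId($id);
// In step2.php: $sid = hiddenIdFromPost($_POST);
```

The format check rejects obviously altered values, but it cannot stop a user from submitting a different well-formed ID.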

Though it is more difficult for your users to modify the data in a hidden field, it can be done. To change the data in a posted form, a user only has to save the page to his local machine, edit the HTML, load the edited page into a browser, and submit the form to the server.

One way to help prevent a user from maliciously changing the form data is to check the HTTP referrer (sent in the "Referer" header, as the HTTP specification spells it) and only accept the data if the form was submitted from the expected page on your server. Unfortunately, even the referrer can be faked, because this header is created by the client. Again, don't use this method for secure sessions, but it can certainly be used for trivial, non-critical protection.
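A minimal sketch of such a check, with www.yourdomain.com standing in for your own host name. This version compares only the host; also comparing the path would tie the form to a single page. A locally saved copy of the form usually arrives with no Referer header at all and is rejected, but a forged header will pass, which is why the check is only suitable for non-critical data.

```php
<?php
// Sketch: accept form data only if the Referer header points at our host.
// The client controls this header, so this discourages casual tampering
// but cannot prevent a deliberately forged request.

function cameFromOurSite(?string $referer, string $ourHost): bool
{
    if ($referer === null || $referer === '') {
        return false; // no header: e.g. a locally saved copy of the form
    }
    $host = parse_url($referer, PHP_URL_HOST);
    // parse_url() returns null (no host) or false (malformed URL) on failure.
    return is_string($host) && strcasecmp($host, $ourHost) === 0;
}

// In the script that receives the form (the host name is a placeholder):
$ok = cameFromOurSite($_SERVER['HTTP_REFERER'] ?? null, 'www.yourdomain.com');
```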

This method is only applicable to forms that use the POST method. If a form uses the GET method, its data, hidden fields included, is URL-encoded and appended to the URL anyway.

Magic cookies -- the preferred way

This brings us back to cookies -- originally known as "Magic Cookies". A cookie is simply a small bit of data that the browser stores either in RAM (until the browser is closed) or on the client's disk drive (until the cookie expires). This makes your state solution much more elegant, attractive and reliable, though not much more secure than the other methods.

PHP, ASP and other server-side Web languages use RAM-based (non-persistent) cookies to propagate session IDs to maintain state. PHP also has a feature that lets you use the URL appending method as an alternative when a user doesn't allow cookies. (Using the URL appending method, unfortunately, puts us back in the Dark Ages again.)

User issues with cookies

The main problem with cookies is that the user's computer might be configured not to accept them. To overcome this, I prefer to warn the user to expect cookies. I explain that these cookies are required in order to benefit from all the features of the site. I also tell the user the exact purpose of the cookies, and that I do not use them to violate privacy.

A very good example of this is a chat room that I wrote in ASP many years ago, and a PHP chat room I am currently finishing. These rely heavily on non-persistent (RAM-based) cookies. If a user does not accept cookies, they don't get into the chat room.

I also give the user the option of accepting persistent cookies to remember their login info and preferences. If they don't accept the persistent cookies, it is only a slight inconvenience for them, and of no consequence to the proper functioning of the site. This use of persistent cookies is also state-maintenance, but it is longer term, and stores some of the program data on the client's machine.
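The two kinds of cookies described above can be sketched as follows, assuming PHP 7.3 or later (the version that added the options-array form of setcookie()). The cookie names and values, the 30-day lifetime, and the $rememberMe flag are hypothetical; in a real login form the flag would come from a checkbox.

```php
<?php
// Sketch: a non-persistent (RAM-based) cookie next to a persistent one.
// Cookie names, values, and the 30-day lifetime are illustrative only.
// The options-array form of setcookie() needs PHP 7.3 or later, and like
// header() it must be called before any page output.

function cookieOptions(bool $persistent, int $days = 30): array
{
    return [
        // An expiry of 0 means "until the browser is closed".
        'expires'  => $persistent ? time() + $days * 86400 : 0,
        'path'     => '/',
        'httponly' => true, // hide the cookie from client-side scripts
    ];
}

// Required for the chat room: a non-persistent cookie with no expiry date.
setcookie('chat_user', 'mike', cookieOptions(false));

// Optional: remember login preferences only if the user asked for it.
$rememberMe = true; // hypothetical; e.g. a "remember me" checkbox
if ($rememberMe) {
    setcookie('chat_prefs', 'color=blue', cookieOptions(true));
}
```

The only difference between the two is the expiry date: without one, the browser treats the cookie as non-persistent and discards it when it is closed.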

Automatic state maintenance in PHP

Much of PHP's usefulness comes from the fact that it combines many of the best features of other programming languages. One of these features that I am particularly enamored with is the state maintenance functionality built into PHP. This state maintenance method -- which is a more functional version of ASP's state maintenance -- is very easy to use and very reliable. It uses session IDs, which uniquely identify each Web user to the PHP scripts.

PHP can pass session IDs in two ways: in a cookie or in the URL. The cookie-based method relies on a non-persistent cookie to store the user's session ID on their computer; the ID is then linked to that user's personalized variables and content on the server.

The URL-based method simply appends the user's session ID to the URL of subsequent pages, which is necessary if the user doesn't accept cookies. The cookie-based version of session ID storage is preferred by most programmers; it makes it more difficult for users to alter their session ID.
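A minimal sketch of PHP's built-in session handling: session_start() resumes the session whose ID arrives with the request, or starts a new one and sends the new ID to the browser in a cookie, and $_SESSION then holds that user's variables on the server. The visits counter is a hypothetical example. Whether PHP also accepts a session ID from the URL depends on configuration settings such as session.use_only_cookies and session.use_trans_sid, so the sketch relies on the cookie.

```php
<?php
// Sketch: counting a user's page views with PHP sessions.
// The "visits" key is a hypothetical example variable.

// Keep the logic in a function so it can be tried without a Web server.
function recordVisit(array &$session): int
{
    $session['visits'] = ($session['visits'] ?? 0) + 1;
    return $session['visits'];
}

// Only start a real session when running under a Web server.
if (PHP_SAPI !== 'cli') {
    session_start(); // must be called before any output is sent
    $visits = recordVisit($_SESSION);
    echo 'You have viewed this page ' . $visits . ' time(s) this session.';
}
```

Each visitor sees only his own count, because the counter lives on the server under that visitor's session ID rather than in the page or the URL.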

Session IDs are certainly the predominant method of state maintenance on PHP-enabled Web sites. PHP sessions are extremely versatile, reliable, and make life very simple for the programmer who wants to create rich and personalized content.

A parlor trick

Finally, in an attempt to find an alternative to the above methods, I figured out a nifty (though admittedly not very useful) way to maintain a tiny amount of state -- specifically, a single boolean variable. I wanted to simulate a JavaScript expanding tree (that is about where the usefulness ends, and there are much better ways to do the same thing).

The core of this method relies on the default index document of the Web server, and the ability to refer to it as either "www.yourdomain.com" or "www.yourdomain.com/index.php". (A tri-state method could also be used by adding a third URL that reaches the same page; note that "www.yourdomain.com/" will not do, because browsers request exactly the same path, "/", for it as for "www.yourdomain.com". And if you have control of your DNS server, you could even add subdomains -- but now we're getting beyond the scope of a simple trick.)

Anyway, here's the trick: Create a page called index.php. In the page, create two sections of HTML, one with the desired tree collapsed, and one with it expanded. Branch to each section based on the URL that was requested (in PHP, $_SERVER['REQUEST_URI']). If the request was for "www.yourdomain.com/index.php", display the expanded tree with a link to "www.yourdomain.com". If the request was for "www.yourdomain.com", display the collapsed tree with a link to "www.yourdomain.com/index.php". Each click on the link therefore requests the other URL and flips the tree. This doesn't require any fancy parsing, cookies, form variables, URL rewriting, or anything complicated at all.
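One possible PHP sketch of the trick follows. It decides the tree state from the URL that was requested, available as $_SERVER['REQUEST_URI']: the page served at /index.php shows the expanded tree and links to /, while the page served at / shows the collapsed tree and links to /index.php, so each click toggles the tree. It assumes index.php is the server's default document; the one-branch tree and the function name treeState() are placeholders.

```php
<?php
// Sketch of the default-document trick: one script, two URLs, one bit of state.
// Assumes index.php is the Web server's default document, so "/" and
// "/index.php" both run this same script.

function treeState(string $requestUri): array
{
    // Ignore any query string; only the path carries the state.
    $path = parse_url($requestUri, PHP_URL_PATH);
    $expanded = ($path === '/index.php');
    return [
        'expanded' => $expanded,
        // Always link to the *other* URL, so clicking flips the state.
        'link'     => $expanded ? '/' : '/index.php',
    ];
}

$state = treeState($_SERVER['REQUEST_URI'] ?? '/');
$link  = htmlspecialchars($state['link']);

if ($state['expanded']) {
    echo "<ul><li><a href=\"$link\">[-] Products</a>",
         "<ul><li>Widgets</li><li>Gadgets</li></ul></li></ul>";
} else {
    echo "<ul><li><a href=\"$link\">[+] Products</a></li></ul>";
}
```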

Security issues

A note about security: all of these methods of maintaining state are insecure. Cookies and other data can be encrypted, but encryption alone will not keep a determined attacker out. If you need to process credit cards or handle sensitive data, the only way to do it is with SSL authentication and encryption, preferably 128-bit. Even SSL is not foolproof, but while large distributed computing efforts have broken 56-bit keys by brute force, breaking a 128-bit key that way remains far beyond the reach of any known attacker. Also, firewalls, packet filtering, intrusion detection, and other network measures must be implemented to create a seriously secure site.

What we have learned

In this article, we have covered the basics of how session state is maintained in non-Web environments, how it can be maintained in interactive Web applications, and the reasons for doing this. We also explored the origins of interactivity on the Web, and the concepts that allow you to make your users' experience richer and more interactive. And finally, we looked at an extremely simple way to maintain session state by exploiting multiple URLs pointing to the same page.

About the author

Michael Frey is a network engineer and has been programming interactive Web applications since the early days of the World Wide Web. He has worked on many large Internet, intranet and extranet projects, and is currently working on the rollout of a large, highly interactive Web site and portal system.

You can contact Michael by e-mail and visit his Web site at http://www.allhttp.com.
