The Performance Puzzle: Performance Improvement, Caching Solutions Are The First Steps In The Cycle Of Growth

Your Web presence is growing. Traffic and usage are escalating, and you have to maintain a superior level of performance in order to keep your current customers and gain new ones. The state of growth encompasses many opportunities, but it also holds challenges as a result of change.

Not only does performance affect your operating costs, it also reflects directly on the end-user experience. The marketing and sales departments are probably saying that happier users directly increase sales. An unsatisfactory user experience can be traced back to any of several causes:

* Slower response times due to an increased number of requests per second on the HTTP server.

* Slower response times due to increased hits to the database.

* Slow download speed due to users dialing up with slow connections.

* Undesired visitors hacking the site.

* Errors resulting from adding software to the site.

* Etc.

Successful growth doesn't happen immediately: It is a constant cycle of evaluation and change, identifying points of fault, reporting, decision making, and implementing solutions.

Implementing homegrown solutions to deal with identified issues often results in a patchwork of disparate systems, some developed internally and others purchased separately. Tackling all issues effectively at once requires a major investment in critical resources such as capital and personnel.

This article will only attempt to tackle performance improvement, the first step in the cycle of growth. There are several techniques available to address the issue.

Cache Explained

A cache is storage, in memory or on disk, that is set aside as a specialized buffer and continually updated. Temporary files, such as HTML documents, are stored in the cache, which is optimized for fast read-and-write access to short-lived data.

Caching is a word widely used to describe various solutions that rely on the basic concept of cache. These solutions provide increased performance and scalability for Websites. Caching solutions can be grouped into three main subcategories: proxy caching, server-side caching, and client-side caching. The type of content to be stored is the primary factor for determining the best caching option for a Website. While proxy caching is the traditional caching technique, server-side caching is becoming increasingly popular as a dynamic information caching method.

When the World Wide Web was in its infancy, it consisted primarily of static information. Dial-up connections were slow, which was fine because online traffic was light. Since those early years, the Internet has changed dramatically, with content becoming increasingly dynamic. Dynamic content evolved as a result of the desire to increase interactivity on the Internet. Dynamic content reflects information that can change with every user request. The content is a result of the server-side execution of a set of instructions hidden within the requested page. Once the code is executed and a static HTML page is created and received by the browser, the Web server discards the results from memory.

Database-driven Websites, such as news and e-commerce portals, are typical examples of dynamic content sites. Dynamic content puts a heavy load on the Web server due to the immense processing power required to produce dynamic results for thousands of requests in real time. Database servers are also hard at work providing the dynamic data to these Web servers.

Proxy Caching

Forward proxy caching is deployed in front of the Web browsers of an enterprise's internal users. This type of cache stores frequently requested content; when users access external Web information, the forward proxy cache delivers the stored content from an internal location instead of going across the Internet to get it. For example, if www.php.net, www.yahoo.com, and www.google.com are widely accessed by employees, the forward proxy cache would hold copies of the frequently requested pages from these sites.

This technique provides accelerated performance for the company's internal users and conserves the external bandwidth cost related to retrieving the documents from the Internet. The forward proxy cache is not related at all to Web server performance.

Reverse proxy caching is used by enterprises when dealing with static, unsecured Web content. Reverse proxy caching is deployed in front of the Web server for external user access. Requests for content to an internal Web server are filtered through the reverse proxy cache before they reach the source Web server, thereby offloading traffic bound for the Web server. Reverse proxy caches, which store the frequently requested data, are optimized to serve static data quickly.

When a client browser makes an HTTP request, DNS resolves the site's host name to the reverse proxy machine, not the actual Web server. The reverse proxy checks its cache to see if it contains the requested item. If it does, the proxy serves the item directly; if not, it connects to the real Web server and downloads the requested item to its disk cache. The reverse proxy can only serve cacheable URLs (such as HTML pages and images) from its cache. Therefore, reverse proxies are ideal for caching unsecured, static content.
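The lookup flow described above can be sketched in a few lines of Python. This is only an illustration, not how any particular proxy product works: the class name, the suffix-based rule for deciding what is cacheable, and the in-memory dictionary standing in for a disk cache are all assumptions made for the example.

```python
from typing import Callable, Dict


class ReverseProxyCache:
    """Toy reverse proxy: serve cacheable URLs from a local store, else ask the origin."""

    # Assumed rule for "cacheable": static pages and images, judged by suffix.
    CACHEABLE_SUFFIXES = (".html", ".gif", ".jpg", ".png")

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # stands in for the real Web server
        self._store: Dict[str, bytes] = {}   # stands in for the disk cache
        self.origin_hits = 0                 # how often the origin was contacted

    def handle(self, url: str) -> bytes:
        # Cache hit: the request never reaches the Web server.
        if url in self._store:
            return self._store[url]
        # Cache miss: fetch from the origin and keep a copy if the URL is cacheable.
        self.origin_hits += 1
        body = self._fetch(url)
        if url.endswith(self.CACHEABLE_SUFFIXES):
            self._store[url] = body
        return body
```

Requests for a dynamic URL such as a hypothetical /cart.php always pass through to the origin, which is why this approach suits static content.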

Server-side Caching

As content becomes more dynamic, more powerful Web servers are required to handle increasing user demands. Bandwidth requirements also continue to grow as businesses move more and more of their applications online. With these ever-increasing needs, server-side caching emerges as the best response to the tremendous performance and scalability requirements of the modern-day Website. The simple technique of dynamic content caching can improve both Website performance and scalability quickly and efficiently, because only the necessary code is executed on the Web server. A specially designed PHP filter between the server and the client caches the output of dynamic requests. Once the PHP code is executed and the HTML result is available, it is saved to the Web content cache. The cache filter detects subsequent requests for the same dynamic page and immediately responds with the cached output before these requests arrive at the Web server. The benefit is threefold:

* Clients can receive requested HTML files much faster.

* The Web server does not need to execute the dynamic request repeatedly, saving processor cycles for more productive work.

* The cache reduces the number of database queries executed, conserving overloaded resources.
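The idea of a filter that saves the output of a dynamic request and replays it for later requests can be sketched as follows. The example is written in Python rather than PHP purely for illustration, and the function names are hypothetical; a real cache filter sits in front of the scripting engine rather than wrapping a function.

```python
from typing import Callable, Dict


def make_cached_handler(render: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a page-rendering function so each URL is rendered only once."""
    cache: Dict[str, str] = {}  # URL -> saved HTML output

    def handler(url: str) -> str:
        if url not in cache:
            # First request: run the dynamic code and save its HTML result.
            cache[url] = render(url)
        # Subsequent requests: reply with the saved output, skipping the script.
        return cache[url]

    return handler
```

Every request after the first costs a dictionary lookup instead of a script execution and its database queries, which is where the three benefits listed above come from.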

Code acceleration, sometimes referred to as code caching, is often confused with content caching. In fact the two are very different, because code acceleration caches the code and not the content. Code acceleration is at work when the server runs a script inside the PHP script engine. By default, every time a PHP script is accessed, the script is parsed and compiled before it is executed. As long as the script does not change, parsing and compiling it again is redundant. Code caching maintains the compiled bytecode version of the PHP script in memory, eliminating the need to parse and compile each time.

Code caching actively maintains the cached compiled scripts in a shared memory model, serving each process that is generated by the Web server. For this reason, it is critical that the code accelerator be tightly integrated with the scripting engine, to ensure proper memory utilization and prevent failure.
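The principle of keeping compiled scripts around can be illustrated with Python's own compile step. This is a rough analogy only: the sketch below keeps compiled code in a per-process dictionary, not in the shared memory model described above, and the function names and the use of the file's modification time to detect changes are assumptions of the example.

```python
import os
from types import CodeType
from typing import Dict, Tuple

# Script path -> (modification time when compiled, compiled code object).
_compiled: Dict[str, Tuple[float, CodeType]] = {}


def load_compiled(path: str) -> CodeType:
    """Return compiled code for a script, recompiling only if the file changed."""
    mtime = os.path.getmtime(path)
    cached = _compiled.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # script unchanged: skip parsing and compiling
    with open(path, "r", encoding="utf-8") as source_file:
        code = compile(source_file.read(), path, "exec")
    _compiled[path] = (mtime, code)
    return code


def run_script(path: str) -> dict:
    """Execute the script in a fresh namespace; the code still runs on every call."""
    namespace: dict = {}
    exec(load_compiled(path), namespace)
    return namespace
```

Note that the script still executes on every request; only the parse-and-compile step is saved, which is the difference from content caching.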

In dynamic Websites, the database server is often the most likely bottleneck, since every query has to be processed and the quality of service drops dramatically under heavy load. In query caching, the cache contains a list of recently executed queries and their results. Whenever possible, a new query is satisfied by results already stored in the cache, thereby avoiding potentially large data scans. This technique benefits Web applications that run complex, processing-intensive queries. For Web applications that run frequent small-scale queries, the latency caused by the travel time over the network will most likely overshadow the benefit from query caching.
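A minimal sketch of query caching, using Python's built-in sqlite3 module as a stand-in database, might look like this. The class name, the choice to key the cache on the exact SQL text and parameters, and the blunt invalidate-everything method are assumptions of the example rather than features of any specific product.

```python
import sqlite3
from typing import Dict, List, Tuple


class QueryCache:
    """Remember the results of recently executed SELECT statements."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._results: Dict[Tuple[str, tuple], List[tuple]] = {}
        self.executed = 0  # number of queries actually sent to the database

    def select(self, sql: str, params: tuple = ()) -> List[tuple]:
        key = (sql, params)
        if key not in self._results:
            # Not seen before: run the query and store its result set.
            self.executed += 1
            self._results[key] = self._conn.execute(sql, params).fetchall()
        return self._results[key]

    def invalidate(self) -> None:
        """Drop all cached results, for example after the data is modified."""
        self._results.clear()
```

The cache must be invalidated whenever the underlying tables change; otherwise it would keep returning stale rows.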

Client-side Caching

Client-side caching is a feature that stores frequently used information on the client's machine. It provides performance enhancements on the client side by allowing the client to quickly access a file that normally would be accessed from a server. Client caching is especially effective when the client disconnects from the server. In that case files can still be accessed from the local cache. Client-side caching is usually defined in the client's browser settings.

Dynamic Content Caching

Dynamic content caching solutions address the scaling and performance issues of Web servers directly. They work well for heavily visited sites containing many dynamic features that must be tuned to respond quickly. For a growing company that anticipates reaching its Web server's maximum processing power, dynamic content caching solutions will expand the Web server's capacity more effectively than adding hardware to manage the load. In certain cases, server-side dynamic caching may be 20-100 times faster than processing the page normally.

Factors to keep in mind when considering whether to cache dynamic pages include frequency of content changes and demand level for current content (hit count). Anticipated hit count determines the prioritization of pages to be cached. More popular pages take precedence over less frequently accessed pages. Some caching solutions provide analysis of script popularity on the site.
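Prioritizing by hit count can be as simple as counting requests per page and caching the most popular ones first. The sketch below assumes the requested URLs are already available as a plain list (for example, extracted from an access log); the function name and the fixed limit are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def pages_to_cache(requested_urls: Iterable[str], limit: int) -> List[str]:
    """Return the most frequently requested URLs, most popular first."""
    hits = Counter(requested_urls)  # URL -> hit count
    return [url for url, _count in hits.most_common(limit)]
```

A fuller analysis would also weigh how often each page's content changes, the other factor mentioned above.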

Full-page caching is the simplest and most straightforward dynamic caching mechanism. The user defines which page to cache and the server stores it in the cache disk space. When subsequent requests for the page arrive, instead of parsing the page, running the script, and building the HTML page all over again, the server locates the existing cached page and sends it to the client. The home page for a Website is an excellent example of a popular page for full-page caching. Generally it is the first page that users see and the most frequently accessed, and it therefore has the greatest impact on the user experience.

As mentioned, when using dynamic content caching the Web server no longer creates pages for each request, but sends pre-built copies to the clients. Therefore, users do not see content changes once a page has been cached. Introducing the concept of "lifetime" settings can solve this problem while still maintaining huge performance gains, even when set as low as two minutes. Lifetime settings permit the server to discard cached pages once their defined lifetime has expired, forcing the application server to reprocess the pages.
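A lifetime setting amounts to storing a timestamp next to each cached page and rebuilding the page once it is too old. The following Python sketch shows the idea; the class name and the injectable clock (included so the behavior can be checked without waiting) are assumptions of the example.

```python
import time
from typing import Callable, Dict, Tuple


class LifetimeCache:
    """Full-page cache whose entries are discarded after a fixed lifetime."""

    def __init__(
        self,
        render: Callable[[str], str],
        lifetime_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._pages: Dict[str, Tuple[float, str]] = {}  # URL -> (stored at, HTML)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._pages.get(url)
        if entry is not None and now - entry[0] < self._lifetime:
            return entry[1]  # still within its lifetime: serve the saved page
        # Missing or expired: reprocess the page and restart its lifetime.
        html = self._render(url)
        self._pages[url] = (now, html)
        return html
```

With a lifetime of 120 seconds, a page requested thousands of times in two minutes is rendered once, and users see changes at most two minutes late.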

Some sites use various parameters to recognize which version of the same page to serve. For example, the "Request" parameter can be set so that it serves the appropriate page for Internet Explorer and Netscape browsers. Full page caching with conditions allows you to save distinct output based on the requesting URL query string or parameters such as Get, Cookie, Request, Server, and Session. Caching conditions can also be used to determine whether or not to serve identical requests from the cache. For example, Zend.com has some pages that only should be cached when the end-user is not logged in. When the end-user is logged in, Zend.com sets a cookie named Zend_In, signifying that the user has logged in. A caching condition can be set to cache the relevant pages only when the user is not logged in - in other words, 'Zend_In' is not set: -

COOKIE:Zend_In.
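The logged-in example can be expressed as a simple check before consulting the cache. The sketch below does not reproduce any product's condition syntax; it only mirrors the logic in Python, with the cookie name Zend_In taken from the example above and everything else (function names, the cookies mapping) assumed for illustration.

```python
from typing import Callable, Dict, Mapping


def make_conditional_handler(
    render: Callable[[str, Mapping[str, str]], str],
) -> Callable[[str, Mapping[str, str]], str]:
    """Cache pages only for requests that do not carry the Zend_In cookie."""
    cache: Dict[str, str] = {}

    def handler(url: str, cookies: Mapping[str, str]) -> str:
        if "Zend_In" in cookies:
            # Logged-in user: always run the script, never touch the cache.
            return render(url, cookies)
        if url not in cache:
            cache[url] = render(url, cookies)
        return cache[url]

    return handler
```

Because logged-in requests bypass the cache entirely, personalized output is never stored and never served to an anonymous visitor.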

It is important to note that some pages cannot be cached at all. Server-side script pages that process information submitted via an HTML form must be executed for every request and should never reside in a Web content cache. It may be useful to cache the form itself, especially if fields are populated from a database table.

Partial caching provides greater flexibility and control for pages that contain both static and dynamic content. By selecting portions of the page to cache, only relevant parts of the page are cached while the dynamic content remains untouched.
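As a rough illustration of partial caching, the sketch below caches one expensive fragment of a page (a navigation menu, chosen arbitrarily) while the per-user greeting is generated fresh on every request. The class and function names are hypothetical.

```python
import html
from typing import Callable, Dict


class FragmentCache:
    """Cache individual named portions of a page rather than whole pages."""

    def __init__(self) -> None:
        self._fragments: Dict[str, str] = {}

    def get(self, name: str, build: Callable[[], str]) -> str:
        if name not in self._fragments:
            self._fragments[name] = build()  # built once, then reused
        return self._fragments[name]


def render_page(
    fragments: FragmentCache, build_menu: Callable[[], str], user_name: str
) -> str:
    menu = fragments.get("menu", build_menu)  # cached portion of the page
    greeting = "<p>Hello, " + html.escape(user_name) + "</p>"  # always dynamic
    return menu + greeting
```

Each visitor receives a personalized greeting, yet the menu, which might require several database queries to build, is produced only once.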

Some server-side dynamic content caching solutions also implement page compression. This functionality removes the comments and unnecessary white space inside the HTML that is sent to the clients. Additionally, the file can be compressed (zipped), reducing its size by as much as 80 percent. Compressed pages download much faster and further enhance the user's experience.
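Both steps, stripping comments and white space and then compressing the result, can be sketched with Python's standard library. The regular expressions here are deliberately naive and are an assumption of the example: they would also alter white space inside preformatted text or inline scripts, which a production tool would need to leave alone.

```python
import gzip
import re


def minify_html(page: str) -> str:
    """Remove HTML comments and collapse white space (naive approach)."""
    without_comments = re.sub(r"<!--.*?-->", "", page, flags=re.DOTALL)
    between_tags = re.sub(r">\s+<", "><", without_comments)  # gaps between tags
    return re.sub(r"\s{2,}", " ", between_tags).strip()      # runs of white space


def compress_page(page: str) -> bytes:
    """Minify the page, then gzip it for transfer to the client."""
    return gzip.compress(minify_html(page).encode("utf-8"))
```

The actual size reduction depends heavily on the page; repetitive markup compresses far better than short or already compact pages.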

Performance improvement is the first step in the cycle of growth. Even by itself it is a broad topic, encompassing concerns across many network elements, each one calling for a different solution. As you and your site go through the growth cycle, you will add to and modify your infrastructure. At each point, choose among the various caching solutions depending on the type of data your site serves and the location of the bottleneck.

Rinat Gersch is the Product Manager for Zend Technologies Ltd., the internationally recognized PHP authority founded by the designers of PHP and of the upcoming PHP5. Zend provides a complete platform of products that enables PHP-based businesses to develop, protect, and scale their PHP applications. Ms. Gersch may be reached at [email protected].