About Me

"Enoughtheory.com" had its humble beginning in the year 2011 by ( Founder of Enoughtheory.com ) Mr Ravi Kant Soni , an Enterprise Java and Spring Framework Specialist, with a bachelor degree (B.E) in Information Science and Engineering from Reva Institute of Technology at Bangalore. He has been into the software development discipline for many years now. Ravi has worn many hats throughout his tenure, ranging from software development, designing multi-tenant applications, integration of new technology into an existing system, to his current love of writing a Spring Framework book. Currently, he is a lead engineer at HCL Technology. Ravi has focused on Web and Enterprise development using Spring Framework for most of his career and has been extensively involved in application design and implementation. He has developed applications for Core-Bank, HR and Payroll System, and e-Commerce systems using Spring Framework. Ravi Kant Soni is author of book "Learning Spring Application development" http://learningspringapplicationdevelopment.com

Tuesday 17 July 2012

J2EE Application Performance Optimization


How to extract maximum performance from your J2EE Web applications

 

In today's world of larger-than-ever software applications, users still expect real-time data and software that processes it at blazing speed. With the advent of broadband access, users have grown accustomed to receiving responses in seconds, irrespective of how much data the server handles. It is becoming increasingly important for Web applications to provide the shortest possible response times. The most obvious and simple way to improve a Website's performance is to scale up the hardware, but scaling the hardware does not work in all cases and is definitely not the most cost-effective approach. Other solutions can improve performance without extra hardware costs. This article provides some suggestions that prove helpful when trying to maximize a J2EE Web application's performance.
If you decide to try this article's recommendations, keep in mind this article provides suggestions only. Performance tuning is as much an art as it is a science. Changes that often result in improvement might not make any difference in some cases, or, in rare scenarios, they can result in degradation. For best results with performance tuning, take a holistic approach.

Overview

Figure 1 illustrates at a broad level how a J2EE application appears when deployed in a production environment. To get the best performance from a J2EE application, all the underlying layers must be tuned, including the application itself.


Figure 1. J2EE application architecture.
For maximum performance, all the components in Figure 1—operating system, Web server, application server, and so on—need to be optimized. This article will give a glimpse into tuning a J2EE application server, Web server, relational database, and your J2EE application. For maximum payoff from this article, follow these guidelines:
  • Set a goal: Before you begin tuning your J2EE application's performance, set a goal. Often this goal addresses the maximum concurrent users the application will support for a given limit on response times. But the goal can also focus on other variables—for example, the response times should not increase more than 10 percent during the peak hour of user load.
  • Identify problem areas: It is important to identify the bottlenecks when you start making changes to improve performance. A little investigation into problems might reveal the specific component that causes poor performance. For example, if the CPU usage on an application server is high, you will want to focus on tuning the application server first.
  • Follow a methodical and focused path: Once the goal is set, make the changes that are expected to have the biggest impact on performance first. Your time is better spent tuning a method that takes 10 seconds but gets called 100 times than tuning a method that takes one minute but gets called only once. Ideally, you test one change at a time before using it in a production environment: make one change, stress-test it, and make it permanent only if it has a positive impact.

Identify bottlenecks

The goal of performance tuning is to identify bottlenecks and remove them. It is an iterative process: once one area of the application improves, another becomes the bottleneck. You must repeat the cycle of identifying the bottleneck, resolving it, and identifying the next one until the desired goal has been reached. Two kinds of tools prove helpful in this process. First, you need stress tools that generate load for your application. Second, you need monitoring tools that collect data on various performance indicators.

Stress tools

Many different stress tools are available in the market today. Some of the popular ones are:
  • Mercury Interactive's LoadRunner
  • Segue's SilkPerformer
  • RadView Software's WebLoad
For a comprehensive list of stress tools see this article's Resources section. You must choose a tool that best fits your needs based on the tool's features and associated price tag. Some of the important features you should consider before choosing a tool are:
  • Support for a large number of concurrent Web users, each running through a predefined set of URL requests
  • Ability to record test scripts automatically from a browser
  • Support for cookies, HTTP request parameters, HTTP authentication mechanisms, and HTTP over SSL (Secure Socket Layer)
  • Option to simulate requests from multiple client IP addresses
  • Flexible reporting module that lets you specify the log level and analyze the results long after the tests were run
  • Option to specify the test duration, schedule test for a later time, and specify the total load
For the sample tests mentioned in this article, LoadRunner was used.

Performance monitors

Using a monitoring tool, you collect data for various system performance indicators for all the appropriate nodes in your network topology. Many stress tools also provide monitoring tools. Windows OS also has a built-in performance monitor sufficient for many purposes. This Windows performance monitor can be started from the Administrative Tools menu, accessed from the Control Panel menu, or by typing "perfmon" in the Run window (accessed from the Start menu). You can display the performance counters data in real time, but usually, you'll want to log this data into a file so it can be viewed later. To log the data into a file, go to the Counter Logs selection in the left-hand side of the Performance window, right click with your mouse, and select New Log Settings as shown in Figure 2.


Figure 2. Windows performance monitor.
You can also set the file path/name for where the data should be logged, as well as a schedule of when to collect this data. It is possible to log the data and view it in real time, but the performance counters must be added twice—first, in the System Monitor link on the left-hand side and second, in Counter Logs, as shown above.
Many performance counters are available in Windows OS. The following list groups, by system resource, some of the important counters that you should always monitor:
CPU
  • System: Processor queue length. Indicates the number of threads in the server's processor queue waiting to be executed by the CPU. As a rule of thumb, if the processor queue remains at a value higher than 2 for an extended period of time, the CPU is most likely a bottleneck.
  • Processor: Percent of processor time. Provides the total overall CPU utilization for the server. A value that exceeds 70-80 percent for long periods indicates a CPU bottleneck.
  • System: Percent of total privileged time. Measures what percentage of the total CPU execution time is used for running in kernel/privileged mode. All I/O operations run under kernel mode, so a high value (about 20-30 percent) usually indicates problems with the network, disk, or another I/O interface.
RAM
  • Memory: Pages per second. The number of pages read from or written to the disk to resolve hard page faults. A value that continuously exceeds 25-30 indicates a memory bottleneck.
  • Memory: Available bytes. The amount of physical memory, in bytes, available to processes running on the computer. A low value (less than 10 MB) usually means the machine requires more RAM.
Hard disk
  • Physical disk: Average disk queue length. The average number of requests queued for the selected disk. A sustained value above 2 indicates an I/O bottleneck.
  • Physical disk: Percent of disk time. The percentage of elapsed time that the selected disk drive is busy servicing requests. A continuous value above 50 percent indicates a bottleneck with the hard disks.
Network
  • Network interface: Total bytes per second. Shows the byte transfer rate (sent and received) on the selected network interface. Depending on the network interface bandwidth, this counter can tell you whether the network is the problem.
  • Network interface: Output queue length. Indicates the length of the output packet queue, in packets. A sustained value higher than 2 indicates a bottleneck in the network interface.


You should add the above counters (and any others as appropriate) in your counter log and collect this data while you stress-test your application using the stress tool. A file generated by the counter log can be opened later by clicking on the View Log File Data button on the right-hand side toolbar. Looking at these counters should give you some hint as to where the problem exists—application server, Web server, or database server.

After identifying the bottleneck this way, you should try to resolve it. You can use two different strategies—either tune the hosting environment in which your application runs or tune the application itself.

Environment tuning

In this section, we look at the possible tuning options in a typical J2EE Web application hosting environment. As already discussed above, a J2EE application environment usually consists of an application server, Web server, and a backend database.

Web server/application server tuning

Most application servers and Web servers provide similar kinds of configuration options though they have different mechanisms to set them. In this article, I cover Tomcat 4.1.x and Apache 1.3.27, which talk to each other using the JK connector. The configuration options presented here exist in most of the other application servers and Web servers, but you will need to locate the correct place to set them.
Apache
Probably the most important setting for the Apache HTTP Server on Windows is the number of server threads. This value should be high enough to handle the maximum number of concurrent users, but not so high that it starts adding its own overhead from too many context switches. The optimum value can be determined by monitoring the number of threads in use during peak hours. To monitor the threads in use, make sure you have the following configuration directives present in the Apache configuration file (httpd.conf):
LoadModule status_module modules/mod_status.so
<Location /server-status>
    SetHandler server-status
    Allow from all
</Location>


Now, from your browser, make an HTTP request to your Apache server with this URL: http://<apache_machine>/server-status. It displays how many requests are being processed and their status (reading request, writing response, etc.). Monitor this page during peak load on the server to ensure the server is not running out of idle threads. After you come up with the optimum number of threads for your application, change the ThreadsPerChild directive in the configuration file to an appropriate value.
A few other items that improve performance in the Apache HTTP Server are listed below (a sample configuration excerpt follows the list):
  • DNS reverse lookups are inefficient. Switch off DNS lookups by setting HostnameLookups in the configuration file to off.
  • Do not load unnecessary modules. Apache allows dynamic modules that extend basic functionality of the Apache HTTP Server. Comment out all the LoadModule directives you don't need.
  • Try to minimize logging as much as possible. Look for directives LogLevel, CustomLog, and LogFormat in the configuration file for changing logging level.
  • Minimize the JK connector's logging also by setting the JkLogLevel directive to emerg.
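As a reference, the corresponding httpd.conf excerpt might look like the sketch below. The ThreadsPerChild value is purely illustrative and should come from your own server-status monitoring, and the JkLogLevel directive assumes mod_jk is loaded:

# Illustrative values only; derive ThreadsPerChild from server-status data
ThreadsPerChild 250
HostnameLookups Off
LogLevel warn
JkLogLevel emerg
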
Tomcat

The two most important configuration options for Tomcat are its heap size and number of threads. Unfortunately, there is no good way to determine the heap size needed, because in most cases the JVM doesn't start cleaning up memory until it reaches the maximum memory allocated. One good rule of thumb is to allocate half of the total physical RAM as the Tomcat heap size. If you still run out of memory, look into your application design to reduce memory usage, identify any memory leaks, or try various garbage collector options in the JVM. To change the heap size, add -Xms<size> -Xmx<size> as JVM parameters in the command line that starts Tomcat, where <size> is the heap size, usually specified in megabytes by appending the suffix m (for example, 512m). -Xms sets the initial heap size and -Xmx the maximum; for server applications, both should be set to the same value.
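On most installations, the simplest place to put these flags is the JAVA_OPTS environment variable, which the stock catalina.bat/catalina.sh startup scripts append to the java command line. A minimal sketch, with purely illustrative sizes (roughly half of 1 GB of physical RAM):

rem Windows, before invoking catalina.bat
set JAVA_OPTS=-Xms512m -Xmx512m

# Unix, before invoking catalina.sh
export JAVA_OPTS="-Xms512m -Xmx512m"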

The number of threads in Tomcat can be modified by changing the values of minProcessors and maxProcessors attributes for the appropriate connector in <Tomcat>/conf/server.xml. If you are using the JK connector, change the values of its attributes. Again, there is no simple way to decide the optimum value for these attributes. The value should be set such that enough threads are available to handle your Web application's peak load. You can monitor a process's current thread count in the Windows Task Manager, which can assist in determining the correct value of these attributes.
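As a sketch only (the attribute values are illustrative and the exact connector class depends on which connector you use), the relevant part of server.xml for the standalone HTTP/1.1 connector in Tomcat 4.1 looks roughly like this:

<!-- Illustrative sizes; derive them from peak-load thread monitoring -->
<Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
           port="8080" minProcessors="5" maxProcessors="150"
           acceptCount="100" enableLookups="false"/>
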
A few other options you should be aware of:
  • If you are not using the latest JRE (Java Runtime Environment), consider upgrading to the latest one. I have seen up to a 30 percent performance improvement after upgrading from JRE 1.3.1 to JRE 1.4.1.
  • Add the -server option to the JVM options for Tomcat, which should result in better performance for server applications. Note that this option, in some cases, causes the JVM to crash for no apparent reason. If you face this problem, remove the option.
  • Change the default Jasper (JavaServer Pages, or JSP, compiler) settings in <Tomcat>/conf/web.xml by setting development="false", reloading="false", and logVerbosityLevel="FATAL" (see the excerpt after this list).
  • Minimize logging in Tomcat by setting debug="0" everywhere in <Tomcat>/conf/server.xml.
  • Remove any unnecessary resources from the Tomcat configuration file. Some examples include the Tomcat examples Web application and extra <Connector>, <Listener> elements.
  • Set the autodeploy attribute of the <Host> tag to false (unless you need any of the default Tomcat applications like Tomcat Manager).
  • Make sure you have set reloadable="false" for all your Web application contexts in <Tomcat>/conf/server.xml.
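The Jasper settings mentioned in the list above live on the JspServlet declaration in <Tomcat>/conf/web.xml; the sketch below shows only the relevant init parameters, with the surrounding entries abbreviated:

<servlet>
    <servlet-name>jsp</servlet-name>
    <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
    <init-param>
        <param-name>development</param-name>
        <param-value>false</param-value>
    </init-param>
    <init-param>
        <param-name>reloading</param-name>
        <param-value>false</param-value>
    </init-param>
    <init-param>
        <param-name>logVerbosityLevel</param-name>
        <param-value>FATAL</param-value>
    </init-param>
    ...
</servlet>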

Database tuning

In the case of Microsoft SQL Server, more often than not, you do not need to modify any configuration options, since it automatically tunes your database to a great degree. You should change these settings only if your stress tests identify the database as a bottleneck. Some of the configuration options that you can try are:
  • Run your SQL Server on a dedicated server instead of a shared machine.
  • Keep your application database and your temporary database on different hard disks.
  • Consider taking local backups and moving them to a different machine. The backups should complete much faster.
  • Normalize your database to the third normal form. This is usually the best compromise, as the fourth and fifth forms of normalization can result in performance degradation.
  • If you have more than 4 GB of physical RAM available, set the awe enabled configuration option to 1, which will allow SQL Server to use more than 4 GB of memory up to a maximum of 64 GB (depending on the SQL Server edition).
  • In case you have many concurrent queries executing and enough memory is available, you can increase the value of the min memory per query option (default is 1,024 KB).
  • Change the value of the max worker threads option, which sets the maximum number of worker threads available to service user requests. Once this limit is reached, any new user requests wait until one of the existing worker threads finishes its current task. The default value for this option is 255.
  • Set the priority boost option to 1. This allows SQL Server to run with a higher priority than the other applications running on the same server. If you are running on a dedicated server, it is usually safe to set this option. (A sample sp_configure script for these options follows this list.)
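These options are normally changed with the sp_configure stored procedure. A hedged sketch using the options named above (the values are illustrative; several of these are advanced options, so show advanced options must be enabled first, and some changes take effect only after a server restart):

-- Illustrative values only; stress-test each change before keeping it
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'awe enabled', 1
EXEC sp_configure 'priority boost', 1
EXEC sp_configure 'min memory per query', 2048
EXEC sp_configure 'max worker threads', 255
RECONFIGURE
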
If none of the configuration options resolve the bottleneck, you should consider scaling up the database server. Horizontal scaling is not possible in SQL Server, as it does not support true clustering, so usually, the only option is vertical scaling.

Application tuning

After you have tuned your hosting environment, now it is time to get down and dirty inside your application source code and database schema. In this section, we look at one of the many possible ways to tune your Java code and your SQL queries.

Java code optimization

The most popular way to optimize Java code is by using a profiler. Sun's JVM has built-in support for profiling (the Java Virtual Machine Profiler Interface, or JVMPI) that can be switched on at execution time by passing the right JVM parameters. Many commercial profilers are available; some rely on JVMPI, others provide their own custom hooks into Java applications (using bytecode instrumentation or some other method). Be aware that all these profilers add significant overhead, so your application cannot be profiled at a realistic load level. Use these profilers with a single user or a limited number of users; it is still a good idea to run your application through the profiler and analyze the results for any obvious bottlenecks.
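For example, the Sun JDKs of this era bundle the hprof agent, which sits on top of JVMPI and can be switched on from the command line; a minimal sketch, in which the main class and output file name are placeholders:

java -Xrunhprof:cpu=samples,depth=8,file=myapp.hprof.txt com.example.MyApp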
To identify your application's slowest areas in a full-fledged deployed environment, you can add your own timing logs to the application, which can be switched off easily in the production environment. A logging API, such as log4j or J2SE 1.4's Java Logging API, is handy for this purpose. The code below shows a sample utility class that can help you add timing logs to your application:
import java.util.HashMap;
//Import org.apache.log4j.Logger;
public class LogTimeStamp {
    private static HashMap ht = new HashMap();
    // Preferably we should use log4j instead of System.out
//    private static Logger logger = Logger.getLogger("LogTimeStamp");
    private static class ThreadExecInfo {
        long timestamp;
        int stepno;
    }
    public static void LogMsg(String Msg) {
        LogMsg(Msg, false);
    }
    /*
     * Passing true in the second parameter of this function resets the counter for
     * the current thread. Otherwise it keeps track of the last invocation and prints
     * the current counter value and the time difference between the two invocations.
     */
    public static void LogMsg(String Msg, boolean flag) {
        LogTimeStamp.ThreadExecInfo thr;
        long timestamp = System.currentTimeMillis();
        synchronized (ht) {
            thr = (LogTimeStamp.ThreadExecInfo) ht.get(Thread.currentThread().getName());
            if (thr == null) {
                thr = new LogTimeStamp.ThreadExecInfo();
                ht.put(Thread.currentThread().getName(), thr);
            } 
        }
        if (flag == true) {
            thr.stepno = 0;
        }
        if (thr.stepno != 0) {
//            logger.debug(Thread.currentThread().getName() + ":" + thr.stepno + ":" +
//                    Msg + ":" + (timestamp - thr.timestamp));
            System.out.println(Thread.currentThread().getName() + ":" + thr.stepno + ":" +
                    Msg + ":" + (timestamp - thr.timestamp));
        }
        thr.stepno = thr.stepno + 1;
        thr.timestamp = timestamp;
    }
}


After adding the above class in your application, you must invoke method LogTimeStamp.LogMsg() at various checkpoints in your code. This method prints the time (in milliseconds) it took for one thread to get from one checkpoint to the next one. First, call LogTimeStamp.LogMsg("Your Msg", true) at one place in the code that is the start of a user request. Now you can insert the following invocations in your code:


    public void startingMethod() {
        ...
        LogTimeStamp.LogMsg("This is a test message", true); //This is starting point
        ...
        LogTimeStamp.LogMsg("One more test message"); //This will become check point 1
        method1();
        ...
    }
    public void method1() {
        ...
        LogTimeStamp.LogMsg("Yet another test message"); //This will become check point 2
        method2();
        ...
        LogTimeStamp.LogMsg("Oh no another test message"); //This will become check point 4
    }
    public void method2() {
        ...
        LogTimeStamp.LogMsg("Wow! another test message"); //This will become check point 3
        ...
    }


The Perl script analyze.pl, which can be downloaded from Resources, can take the output of the above log messages as input and print the results in the format below. From these results, you now know which part of the code requires the most time and can concentrate on optimizing that part:
Transactions                           Avg. Time  Max Time  Min Time
--------------------------------------------------------------------
[This is a ...] to [One more t...]         14410     20937      7500
[One more t...] to [Yet anothe...]            16        62         0
[Yet anothe...] to [Wow! anoth...]         39860     50844     27703
[Wow! anoth...] to [Oh no anot...]           711      1844        94
[Oh no anot...] to [OK thats e...]         68089    228452     19718


The above approach represents just one of the ways to tune your Java code. You can use whatever methodology works for you. Some excellent resources are available for Java performance tuning. Check Resources for some links. Some general suggestions you should be aware of while developing a J2EE application are:
  • Avoid using synchronized blocks in your code as much as possible. That does not mean you should skip synchronization where your code's multithreaded parts need it, but you should try to limit its use. Synchronized blocks can severely impair your application's scalability.
  • Proper logging proves necessary in serious software development. You should try to use a logging mechanism (like log4j) that lets you switch off logging in the production environment to reduce logging overhead.
  • Instead of creating and destroying resources every time you need them, use a resource pool for every resource that is costly to create. One obvious choice for this is your JDBC (Java Database Connectivity) Connection objects. Threads are also usually good candidates for pooling. Many free APIs are available for pooling various resources.
  • Try to minimize the objects you store in HttpSession. Extra objects in HttpSession not only lead to more memory usage, they also add additional overhead for serialization/deserialization in case of persistent sessions.
  • Wherever possible, use RequestDispatcher.forward() instead of HttpServletResponse.sendRedirect(), as the latter involves an extra round trip to the browser (see the servlet sketch after this list).
  • Minimize the use of SingleThreadModel in servlets so that the servlet container does not have to create many instances of your servlet.
  • Java stream objects perform better than reader/writer objects because they do not have to deal with string conversion to bytes. Use OutputStream in place of PrintWriter.
  • Reduce the default session timeout either by changing your servlet container configuration or by calling HttpSession.setMaxInactiveInterval() in your code.
  • Just as we switched off the DNS lookup in the Web server configuration, try not to use ServletRequest.getRemoteHost(), which involves a reverse DNS lookup.
  • Always add directive <%@ page session="false"%> to JSP pages where you do not need a session.
  • Excessive use of custom tags also may result in poor performance. Keep performance in mind while designing your custom tag library.
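To make two of the points above concrete, here is a minimal servlet sketch that forwards on the server side instead of redirecting and that shortens the session timeout. The target path (/result.jsp) and the 600-second timeout are illustrative:

import java.io.IOException;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class ForwardExampleServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Cap how long an idle session is kept in memory (10 minutes here, illustrative)
        HttpSession session = request.getSession(true);
        session.setMaxInactiveInterval(600);

        // Forward on the server side; no extra round trip to the browser,
        // unlike response.sendRedirect("/result.jsp")
        RequestDispatcher dispatcher = request.getRequestDispatcher("/result.jsp");
        dispatcher.forward(request, response);
    }
}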

SQL query optimization

The optimization of SQL queries is a vast subject in itself, and many books cover only this topic. Queries that return the same results can differ in running time by many orders of magnitude. Here I just show how to identify the slow queries and offer a few suggestions for fixing some of the most common mistakes.

First of all, to identify slow queries, you can use SQL Profiler, a tool from Microsoft that comes standard with SQL Server 2000. This tool should be run on a machine other than your SQL Server database server and the results should be stored in a different database as well. Storing results in a database allows all kinds of reports to be generated using standard SQL queries. Profiling any application inherently adds a lot of overhead, so try to use appropriate filters that can reduce the total amount of data collected.
To start profiling, from SQL Profiler's File menu, select New, then Trace, and give the connection information and the appropriate credentials to connect to the database you want to profile. A Trace Properties window opens, where you should enter a meaningful name you will recognize later. Select the Save To Table option and give the connection information and credentials for the database server (this should differ from the server you are profiling) where you want to store the data collected by the profiler. Next, you are asked to provide the database and table name where the results will be stored. Usually, you will also want to add a filter, so go to the Filters tab and add the appropriate filters (for example, "duration greater than or equal to 500 milliseconds" or "CPU greater than or equal to 20," as shown in Figure 3). Now click the Run button, and profiling starts.


Figure 3. SQL Profiler.
Let's say that, based on SQL Profiler's results, you have identified the following query as the most time consuming:
SELECT [TABLE1].[T1COL1], [TABLE1].[T1COL2], 
       [TABLE1].[T1COL3], [TABLE1].[T1COL4]
FROM ((([TABLE1] LEFT JOIN [TABLE4] ON [TABLE1].[T1COL4] = [TABLE4].[T4COL4]) 
        LEFT JOIN [TABLE3] ON [TABLE1].[T1COL3] = [TABLE3].[T3COL3]) 
        LEFT JOIN [TABLE2] ON [TABLE1].[T1COL2] = [TABLE2].[T2COL2]) 
WHERE [TABLE1].[T1COL5] = 'VALUE1'


Now your task is to optimize this query to improve performance. You look at your database and find that TABLE1 has 700,000 records, TABLE2 has 16, TABLE3 has 100, and TABLE4 happens to have more than 4 million records. You should first understand this query's cost, and Query Analyzer comes in handy for this task. Select Show Execution Plan in the Query menu and execute this query in Query Analyzer. Figure 4 shows the resulting execution plan.


Figure 4. Execution plan before indexes.
From this plan, you can see that SQL Server is doing full-table scans of all four tables, and together they make up around 80 percent of the total query cost. Luckily, another feature in Query Analyzer can analyze a query and recommend appropriate indexes: run the Index Tuning Wizard, also from the Query menu. The wizard analyzes the query and gives recommendations for indexes. As shown in Figure 5, it recommends creating two indexes ([TABLE4].[T4COL4] and [TABLE1].[T1COL5]) and indicates performance will improve by 99 percent!


Figure 5. Index Tuning Wizard.
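If you accept the wizard's recommendations as-is, the two indexes could be created with statements along the following lines (the index names are made up for illustration):

CREATE INDEX IX_TABLE4_T4COL4 ON [TABLE4] ([T4COL4])
CREATE INDEX IX_TABLE1_T1COL5 ON [TABLE1] ([T1COL5])
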
After creating the indexes, the execution time declines from 4,125 milliseconds to 110 milliseconds, and the new execution plan shown in Figure 6 shows only two table scans (not a problem as TABLE2 and TABLE3 both have limited records).


Figure 6. Execution plan after indexes.
This was just one example of what proper tuning can achieve in terms of performance. In general, SQL Server's auto-tuning features handle many tasks for you automatically; for example, reordering the conditions in a WHERE clause will never yield any benefit, because SQL Server handles that internally. Still, here are a few things you should keep in mind while writing SQL queries:
  • Keep the transactions as short as possible. The longer a transaction is open, the longer it holds the locks on the resources acquired, and every other transaction must wait.
  • Do not use DISTINCT clauses unnecessarily. If you know the rows will be unique, do not add DISTINCT in the SELECT clause.
  • When possible, avoid using SELECT *. Select only the columns from which you need the data.
  • Consider adding indexes to those columns causing full-table scans for your queries. Indexes can result in a big performance gain, as shown above, even though they consume extra disk space.
  • Avoid using too many string functions or operators in your queries. Functions such as SUBSTRING, LOWER, and UPPER, and operators such as LIKE, can result in poor performance, especially when applied to indexed columns (see the example after this list).
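For example, the two queries below return the same rows from the TABLE1 used earlier, but the first can seek on the index created on T1COL5, while the second wraps the indexed column in a function and typically forces a scan:

-- Can use the index on T1COL5
SELECT [T1COL1], [T1COL2] FROM [TABLE1] WHERE [T1COL5] = 'VALUE1'

-- UPPER() on the column usually prevents the index from being used
SELECT [T1COL1], [T1COL2] FROM [TABLE1] WHERE UPPER([T1COL5]) = 'VALUE1'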

Summary

In this article, I barely scratched the surface of performance tuning. Just reading this article will not make you an expert on all aspects of performance. But I hope you have gained a better appreciation for the methodology and acquired enough knowledge to be able to start the process. For optimum results, you need to tune every aspect of the application starting with the bottlenecks. Even experienced developers make small mistakes that prove costly in terms of performance. Performance tuning helps to identify these mistakes and makes your product not only more scalable, but also more reliable under stress conditions.






