Tips on Performance Testing and Optimization (J2EE)

2011-01-08  张林 

Tips on Performance Testing and Optimization
http://www.theserverside.com/news/1364725/Tips-on-Performance-Testing-and-Optimization
http://www.theserverside.com/news/1364675/Improving-J2EE-Application-Performance

The purpose of this document is to explain how to go about scalability testing, performance testing, and optimization in a typical Java 2 Enterprise Edition (J2EE) environment.

Definitions:

Response Time - the time between the initial request and the complete download of the response (rendering of the entire web page).

Load - a measurement of the usage of the system. A server is said to experience high load when its supported application is being heavily trafficked.

Scalability - a scalable application has a response time that increases linearly as load increases. Such an application will be able to process more and more volume by adding hardware resources in a linear (not exponential) fashion.

Automation testing tools - tools (Silk from Segue Software, WebLoad, etc.) used to simulate a user by requesting pages or going through a pre-programmed workflow on your site.

Load testing tools - most automation testing tools, such as WebLoad, can also be used as load testing software. These tools simulate any number of users using your site and provide you with important data such as average response times.

Profiler - a program that examines your application as it runs. It provides you with useful run-time information such as time spent in particular code blocks, memory/heap utilization, the number of instances of particular objects in memory, etc.

A Process for Performance Testing

1) Functional Testing. Most applications begin testing by first completing functional tests, that is, ensuring that all the use cases / workflows in your application work.

2) Load and Scalability Testing. Load and scalability testing has two forms:

  • Testing response time as you increase the size of your database
  • Testing response time as you increase concurrent users

3) Interpreting the results.  After measuring response time at varied database sizes and loads, you can now make interpretations based on the average response time of these tests and the resource utilization of the server during the tests.

4) Optimization.  After identifying problems in the last step, you track down their causes and fix them.

Load and Scalability Testing
 
The purpose of load and scalability testing is to ensure that your application will have a good response time during peak usage. You can also test how your application will behave over time (as your website accumulates more and more data in its database).  To begin testing, write some scripts that will populate your database with an average amount of data. Run your performance tests and measure your response times. Then populate your database with an extreme amount of data (3 to 4 times more than you can foresee having in 3 years) and run your performance tests again. If response times are significantly larger for the second test, then something is wrong.
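
As an illustration, here is a minimal sketch of such a population script in plain JDBC. The JDBC URL, table name, and columns (users, username, email) are hypothetical placeholders for your own schema; batching the inserts keeps the run reasonably fast at large volumes.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// A minimal test-data population sketch. Adjust the driver, URL, and schema to your setup.
public class PopulateTestData {

    public static void main(String[] args) throws Exception {
        int rows = Integer.parseInt(args.length > 0 ? args[0] : "50000");

        Class.forName("org.postgresql.Driver"); // needed for older JDBC drivers
        Connection con = DriverManager.getConnection(
            "jdbc:postgresql://localhost/testdb", "test", "test"); // hypothetical database
        con.setAutoCommit(false);

        PreparedStatement ps = con.prepareStatement(
            "INSERT INTO users (username, email) VALUES (?, ?)"); // hypothetical table/columns
        for (int i = 0; i < rows; i++) {
            ps.setString(1, "user" + i);
            ps.setString(2, "user" + i + "@example.com");
            ps.addBatch();
            if (i % 1000 == 0) {   // flush the batch every 1,000 rows
                ps.executeBatch();
            }
        }
        ps.executeBatch();
        con.commit();
        ps.close();
        con.close();

        System.out.println("Inserted " + rows + " test rows.");
    }
}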

To run your performance tests, you will want to simulate server usage at different loads. As a rule of thumb, I simulate low load (1 to 5 concurrent users), medium load (10 to 50 concurrent users), high load (100 concurrent users), and extreme load (1,000+ concurrent users).  Note that these numbers are arbitrary and depend on your business needs. Also, simulating 10 concurrent users with load testing software isn't representative of 10 people, since each robot in the load test may wait just milliseconds before hitting the server again. Thus, using a load tester to simulate 10 users is probably more representative of the web surfing patterns of 30 to 40 people.
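
If you do not have a commercial load tester at hand, the idea can be illustrated with a minimal Java sketch that fires a number of concurrent virtual users at a page and reports the average response time. The URL, user counts, and think time below are placeholders I have made up; real tools such as WebLoad add workflows, ramp-up, and proper reporting.

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A crude load-test sketch: fixed number of concurrent "virtual users", average response time reported.
public class SimpleLoadTest {

    public static void main(String[] args) throws Exception {
        final String url = "http://localhost:8080/index.jsp"; // hypothetical page under test
        int concurrentUsers = 10;                             // adjust for low / medium / high load
        final int requestsPerUser = 20;
        final List<Long> timings = Collections.synchronizedList(new ArrayList<Long>());

        ExecutorService pool = Executors.newFixedThreadPool(concurrentUsers);
        for (int u = 0; u < concurrentUsers; u++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (int i = 0; i < requestsPerUser; i++) {
                        try {
                            long start = System.currentTimeMillis();
                            HttpURLConnection conn =
                                (HttpURLConnection) new URL(url).openConnection();
                            conn.getResponseCode();        // forces the request
                            conn.getInputStream().close(); // read and discard the body
                            timings.add(System.currentTimeMillis() - start);
                            Thread.sleep(500);             // crude "think time" between requests
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);

        long total = 0;
        for (long t : timings) total += t;
        System.out.println("Requests: " + timings.size()
            + ", average response time: " + (total / Math.max(1, timings.size())) + " ms");
    }
}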

Once you have tested at each load level, you can compare average response times to see whether your system scales, that is, whether response time increases linearly with load.
 
Interpreting the results

 The fun part of this process is interpreting the results of your load testing.  Let us examine some of the different possibilities:

  1.  Response time increases too much when the database is heavily populated
    Response time should not increase dramatically when you move from a database with 100 rows in its tables to one with 50,000. Database indexing makes finding a row in a table a matter of milliseconds, even if there are hundreds of thousands of rows. Thus, if your response time increases too much after moving from a moderately populated database to a heavily populated one, you probably haven't indexed the appropriate columns yet.
  2.  Response time increases exponentially as load increases
    If your system becomes unusable as you increase concurrent users, then your system is not scalable. Interpreting these results is difficult, as the problem could be with hardware, deployment configuration, architecture, etc.  Make sure you watch the server resources during the tests:
    1. Watch memory requirements
    2. Watch CPU usage
      If the CPU is overused, you need a faster processor or more processors. If the CPU is underused, then the problem is probably input/output (I/O) related. Check your database connections, your running thread count, and the network configuration of your test boxes.
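
The simplest way to watch these resources is with the platform tools (top, vmstat, PerfMon) or your application server's own console. As a rough illustration only, here is a minimal sketch that, if run inside the server JVM, prints heap usage and the operating system load average once per second; the one-second interval is arbitrary, and the load average is only a coarse CPU proxy.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;

// A crude resource monitor to run alongside a load test (inside the JVM you want to observe).
public class ResourceMonitor {

    public static void main(String[] args) throws Exception {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        while (true) {
            long usedMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            long maxMb = memory.getHeapMemoryUsage().getMax() / (1024 * 1024);
            // getSystemLoadAverage() returns -1 on platforms that do not support it (e.g. Windows).
            System.out.println("heap: " + usedMb + "/" + maxMb + " MB, load average: "
                + os.getSystemLoadAverage());
            Thread.sleep(1000);
        }
    }
}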

If, after checking your configuration, verifying that the slowdown is not a hardware bottleneck, and looking over your architecture for code to optimize, you still have not found the cause, it's time to run a code profiler.

Optimization

The database, your architecture, your configuration, and your hardware will all need to be optimized. As mentioned in the previous section, the easiest way not to scale is to have a database that isn't tuned. A database administrator (DBA) is always a vital person to have on any dev team, but if you don't have one, here is what you can do:

Look through your EJBs and verify that your database isn't doing linear searches for any of the SQL queries you have encoded. To do this, copy the SQL from your code and, in your database's SQL window, run it with an EXPLAIN clause:

EXPLAIN SELECT * FROM table WHERE tablefield = somevalue

Although the EXPLAIN syntax differs from database to database, there is always something similar. After running this statement, your database will tell you whether it is using an index or performing a linear search. Make sure you verify that every piece of SQL in your application is using your database's indexes, and if not, create the indexes.
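
If you prefer to check this from code rather than a SQL window, here is a minimal JDBC sketch that runs EXPLAIN against one of your queries and prints the plan. The JDBC URL, table, and column names are placeholders, and the plan output format assumed here is PostgreSQL's; in the output, look for an index scan rather than a sequential (linear) scan.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Runs EXPLAIN on a query and prints the plan rows so you can spot sequential scans.
public class ExplainQuery {

    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver"); // needed for older JDBC drivers
        Connection con = DriverManager.getConnection(
            "jdbc:postgresql://localhost/testdb", "test", "test"); // hypothetical database

        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery(
            "EXPLAIN SELECT * FROM users WHERE username = 'user42'"); // hypothetical query
        while (rs.next()) {
            // "Index Scan" is good; "Seq Scan" means a missing or unused index.
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}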
     
After optimizing the database and your hardware configuration (as discussed in the previous section), the next step is optimizing your code, and this is done with a profiler.

A profiler is a program that analyzes your application as it runs. A profiler provides you with information you could not otherwise get access to, such as:

  1. How many objects of each class are in memory, and garbage collection behaviour
    • This information can help you identify classes that should be pooled.
    • It can help you tune your Java heap.
  2. How much time your application is spending in particular classes
    This is the most important feature. Your profiler will point its finger at the classes that are your bottlenecks.
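
Even without a profiler you can get a rough sense of where time goes by timing a suspect call by hand, though this does not scale the way a real profiler's view does. Here is a minimal sketch; the label and the timed block are placeholders for whatever code you suspect (a DAO call, an EJB method, a page-rendering step).

// A crude stand-in for a profiler's timing view: wrap a suspect call and log its duration.
public class CrudeTimer {

    public static void time(String label, Runnable suspectCode) {
        long start = System.currentTimeMillis();
        suspectCode.run();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(label + " took " + elapsed + " ms");
    }

    public static void main(String[] args) {
        // Hypothetical usage: time a block you suspect is slow.
        time("findMessages", new Runnable() {
            public void run() {
                // ... call the suspect code here, e.g. messageDao.findMessages(forumId);
            }
        });
    }
}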

One such program that really helped me is called Optimize-It. Optimize-It can be used with any Java program or any Java-based application server.  Configuration with WebLogic is easy, and Optimize-It can be used to profile an application on a remote server.

Optimizing your architecture is extremely project specific, but here are some tips:

  1. Make sure you have minimized your network calls, especially database calls
    • It is better to make one large database call than many small ones.
    • Make sure ejbStore isn't storing anything for read-only operations.
    • Use details objects to get entity bean state.
  2. Make sure you take advantage of caching where possible
    Your application server probably allows you to cache entity beans in memory; make sure you take advantage of this, as it will dramatically reduce database calls and speed up data access.
  3. Make sure you are using session beans as a façade to your entity beans.
    You can encapsulate the workflow of an entire use case in one network call to one method on a session bean (and one transaction), as sketched below.
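
To make the last two tips concrete, here is a minimal EJB 2.x-style sketch of a session façade that carries out one use case in a single call and a single transaction, and returns a serializable details object instead of exposing the entity beans themselves. All class and method names here (ForumFacadeBean, MessageDetails, and so on) are hypothetical, and the home/remote interfaces and deployment descriptor entries are omitted for brevity.

import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

// A session facade: one coarse-grained method per use case, one network call, one transaction.
public class ForumFacadeBean implements SessionBean {

    public MessageDetails postMessage(long forumId, String author, String body)
            throws Exception {
        // Look up and call the entity beans here (e.g. find the forum, create the message),
        // all inside this one container-managed transaction.

        // Copy the new state into a plain serializable details object so the client
        // never makes fine-grained remote calls for each getter.
        return new MessageDetails(forumId, author, body);
    }

    // SessionBean lifecycle callbacks (no-ops for this sketch)
    public void setSessionContext(SessionContext ctx) {}
    public void ejbCreate() {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
}

// A simple serializable holder for entity bean state (a "details object").
class MessageDetails implements java.io.Serializable {
    private final long forumId;
    private final String author;
    private final String body;

    MessageDetails(long forumId, String author, String body) {
        this.forumId = forumId;
        this.author = author;
        this.body = body;
    }

    public long getForumId() { return forumId; }
    public String getAuthor() { return author; }
    public String getBody() { return body; }
}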

Conclusion

Performance testing and optimizing your application can be pretty challenging. Luckily, there are tools on the market that can make the process easier.  By using these tools and following the simple steps in this paper, you should be able to effectively track down the bottlenecks in your system.


Testing and Optimizing TheServerSide.com

TheServerSide.com experienced numerous scalability problems before it was launched. Using the tips suggested in this paper, we fixed all of the problems, resulting in TheServerSide being one of the fastest Java-based portals out there.

The first step in testing TheServerSide was populating the database with test data. After populating it to a moderate amount and then to an extreme amount (adding 16,000 messages and 40,000 users to our database), we found a serious problem: the response time for our top-level pages jumped from 2 seconds to 12 seconds with a single user.

Having not read this document, we made the most common of mistakes: we first doubled the CPU speed and memory on our box. This only brought response time down to 8 seconds, so hardware was not the cause of the bottleneck.

This indicated that something was wrong in the database. After checking how our database handled our queries, we discovered that our primary key columns (and others) were not being indexed properly.  This meant the database had to do linear searches, even for ejbFindByPrimaryKey(), the most common of calls.
 
After making changes to the database and to our primary key strategy (our PostgreSQL database was buggy in handling 8-byte integer indexing), we were able to index all the appropriate columns and bring the response time down to 3 to 4 seconds.

Once we had optimized the database, we began running proper load testing. We used WebLoad, a powerful load testing tool.  The evaluation copy allows testing with 12 concurrent users (probably representative of 30 to 40 real people).  After running the tests at the maximum (12 concurrent users), we found that our site was extremely unscalable. The response time jumped from 3 to 4 seconds for a single user to 15 to 20 seconds per page under heavier load. These numbers were obviously not good enough.

Having optimized the database and upgraded the hardware, we now turned to examine our architecture. Minor modifications were made, but we still couldn't find the cause of the problem. Once we began using a profiler, that all changed.  After running Optimize-It remotely (a window on my local machine showed me the statistics of the server running at our ISP), I discovered the cause of our problem: 30% of the CPU time was being spent in socket communications with our database.  Optimize-It allowed me to trace back and see which objects/methods were initiating these calls, and I identified a design problem that caused us to query the database for a count every time we wanted to display a message on TheServerSide.com.  After fixing this silly problem, I brought the number of database calls needed to display a page down from 15+ to 1, and suddenly our response time went down to about 1 second. This was exactly what we wanted.
