When developing Java-based applications at the enterprise level, performance is one of the most important concerns for the end user. Many times we are surprised when a good-looking application that performs adequately on the desktop does not do so well once deployed to QE environments. This blog discusses a disciplined approach to solving performance issues when Java applications are deployed at the enterprise level. It includes suggestions for code-level, database-level and system-level improvements. It also underlines the need for granular test cases and for isolating problems in the components involved. Finally, it describes the Java profiling tools that are widely accepted and used for performance analysis and improvement.
Introduction
With agile practices widely adopted in the industry, there is strong demand to develop products with excellent performance in a short time. Gone are the days when systems were defined and built for months (sometimes years) before end customers got their hands on them, with extra time then given to improve performance. In the past, performance improvement was a kind of additional patch towards the end of the product delivery life cycle, but thanks to the maturity the software industry has achieved, performance objectives are now defined (or forced to be defined) as part of the system requirements and acceptance criteria (at times included in SOWs).
Needless to say, performance must be excellent for whatever piece of software you deliver. This brings challenges, especially when you are working at the enterprise level and integrating with multiple components. In my recent assignment in the USA, where I was leading one of the components in an SOA at the enterprise level, we went through a lot of pain to reach the performance level we were aiming for. I thought of writing down the lessons learnt, as well as the thought process for tackling performance issues and writing healthy applications in a short amount of time.
In this article, we cover the following points for addressing performance issues:
1. Define the performance requirements, and the load the system is expected to carry, very clearly.
2. Use right tools to measure performance
3. Define testing strategy (Load/ Cluster) and facilitate separate environment for testing.
4. Accept that you have performance problem
5. Divide and Rule (each component)
6. Data layer – Tuning SQL/ defining indexes
7. Transactions, caching & memory management
8. Code refactoring – Recursions/ object lifecycles
9. Celebrate and acknowledge even small improvements
Performance improvement guidelines
Let’s elaborate on these points with the help of a typical enterprise application.
A typical system has several components (independent applications): mainly presentation, process, business logic, data access and database. Being an enterprise system, it spans components owned by different teams and involves testers, developers, build teams, deployment teams, platform teams and management.
1. Define the performance requirements, and the load the system is expected to carry, very clearly.
Everybody across the teams should be crystal clear about what our objectives are and, while we are addressing performance issues, about the point at which we will say we are DONE with the improvement. Here are some examples of performance criteria:
Incomplete performance criteria
User should get the response in two seconds.
Correct performance criteria
The application should support 100 concurrent users with a response time of less than two seconds. Users should also be able to pull 20,000 documents from the system in an hour. The system should be up and running 24x7.
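One way to keep such criteria honest is to write them down in a form that a load-test report can be checked against. The sketch below is only an illustration: the class name PerformanceCriteria and its methods are hypothetical, and it assumes that "response time less than two seconds" applies to every measured request.

```java
import java.util.Arrays;

/** Hypothetical helper that turns the example criteria into a pass/fail check. */
final class PerformanceCriteria {
    private final int concurrentUsers;
    private final long maxResponseMillis;
    private final int documentsPerHour;

    PerformanceCriteria(int concurrentUsers, long maxResponseMillis, int documentsPerHour) {
        this.concurrentUsers = concurrentUsers;
        this.maxResponseMillis = maxResponseMillis;
        this.documentsPerHour = documentsPerHour;
    }

    /** Documents per second the system must sustain to meet the hourly target. */
    double requiredDocumentsPerSecond() {
        return documentsPerHour / 3600.0;
    }

    /**
     * True when the test ran with at least the required number of users,
     * every response was under the limit, and the hourly document target was met.
     */
    boolean isMetBy(long[] responseMillis, int usersInTest, int documentsPulledInHour) {
        boolean fastEnough = Arrays.stream(responseMillis).allMatch(ms -> ms < maxResponseMillis);
        return usersInTest >= concurrentUsers
                && fastEnough
                && documentsPulledInHour >= documentsPerHour;
    }
}
```

For the criteria above, new PerformanceCriteria(100, 2000, 20000) works out to a sustained rate of roughly 5.6 documents per second, a number that is easier to discuss with other teams than "20000 in an hour".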
2. Use the right tools to measure performance
Many tools are available on the market for measuring the performance of different components.
UI-based application – LoadRunner
Middleware application – JProbe, JProfiler, VisualVM
Database – SQL optimizer
Web services – JMeter, SoapUI
3. Define a testing strategy (Load/ Cluster) and provide a separate environment for testing
You need a testing strategy written down in the form of a test plan that is accepted by all stakeholders. Moreover, you must have objectives defined for every load test you run. Performance testing is always an iterative process, and we need to make sure that every test we perform moves us towards our final goal.
It is very important to have a separate environment for performance testing. Firstly, it should mirror your production environment so that system capabilities can be measured accurately. Secondly, it lets us start performance testing early in the product development lifecycle, with less dependency on the normal QE environment where functional and regression testing is being carried out. We also need to make sure that the database is loaded with enough data to come very close to what we expect in production. Setting up this kind of dedicated environment is costly but essential for mission-critical applications. Finally, one needs to accept that performance testing cannot be done on a local machine: you need sufficient horsepower to run the race.
4. Accept that you have a performance problem
To make sure that all the teams are working in the same direction and towards the same goal of improving overall performance, each person in the team should accept that the performance of their own component can be improved. Each component owner should be open to suggestions and look for every opportunity to improve performance. Since you know your component better than anyone else, you can often point to areas of improvement that the enterprise architect cannot. While individual owners take care of their own components, it is also the system architect’s responsibility to address performance issues at the integration level and suggest ways of improvement.
5. Divide and Rule (each component) – Bottom up approach
This is a very important step. While you are testing system performance as a whole, it is very important that each component is also tested separately and in partial isolation. Partial isolation means testing from the downstream components upward (e.g. first the database components, then the data layer, business logic layer, process layer and finally the UI). This is the beauty of a component architecture. Mocking tools help to load individual components and capture performance data. At times, we can write our own thread classes to test how each component behaves under load. If each team certifies, or at least states, the load limits of its component, it is all the more helpful when we test the system for performance as a whole.
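A home-grown thread class for loading one component can be quite small. The following is a minimal sketch, not the harness we used: SimpleLoadDriver is a hypothetical name, and the Runnable stands in for whatever call exercises the component under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical load driver: one thread per simulated user, one call per user. */
final class SimpleLoadDriver {

    /** Runs the task once per simulated user and returns the elapsed milliseconds of each call. */
    static List<Long> run(int users, Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Callable<Long>> calls = new ArrayList<>();
            for (int i = 0; i < users; i++) {
                calls.add(() -> {
                    long start = System.nanoTime();
                    task.run();
                    return (System.nanoTime() - start) / 1_000_000;
                });
            }
            List<Long> elapsedMillis = new ArrayList<>();
            // invokeAll blocks until every simulated user has finished
            for (Future<Long> result : pool.invokeAll(calls)) {
                elapsedMillis.add(result.get());
            }
            return elapsedMillis;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling SimpleLoadDriver.run(100, () -> service.search("abc")) (where service is a placeholder for your component) returns 100 timings that can be compared against the agreed response-time limit. A dedicated tool such as JMeter gives far more control; this is only meant for quick checks on a single component.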
I always recommend writing performance logs at the log4j INFO level when you write the code. Even simple techniques, such as logging the time in milliseconds that a method takes to run, help a lot in analyzing the behavior of the system under load. We went to the extent of consolidating and analyzing these logs programmatically and then using them to produce graphical representations of the performance of each component at different stages of load. This paid off well, both in pinpointing problems in components and in providing up-to-date information to senior management and vendors about how each component in the system actually performs under load. Finally, it is important to understand how much logging is acceptable and to make sure that too much logging is not itself slowing down the system.
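The timing log itself needs very little code. The sketch below uses java.util.logging instead of log4j purely to stay dependency-free; the class name TimedCall, the logger name "perf" and the "PERF name Nms" line format are all assumptions made for illustration. A fixed line format is what makes it practical to consolidate the logs programmatically later.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical wrapper that logs how long a call took, in milliseconds, at INFO level. */
final class TimedCall {
    private static final Logger LOG = Logger.getLogger("perf");

    /** Runs the call, logs "PERF <name> <millis>ms", and returns the call's result. */
    static <T> T timed(String name, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            // logged even when the call throws, so slow failures are visible too
            long millis = (System.nanoTime() - start) / 1_000_000;
            LOG.info("PERF " + name + " " + millis + "ms");
        }
    }
}
```

A call such as TimedCall.timed("findCustomer", () -> dao.findCustomer(id)) (dao being a placeholder) leaves the business code unchanged while producing one parseable line per invocation.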
6. Data layer – Tuning SQL/ defining indexes
The data layer consists of databases, LDAP, legacy systems or mainframe systems. As explained above, a bottom-up approach is the best way to improve system performance. Though companies have expert DBAs to handle database load decisions, as a Java developer it always helps to understand query plans, database structure and their effect on performance. Making sure that indexes are defined appropriately and that deadlocks have been taken care of always helps. Another notable point we learnt during the process was about terminating search queries (or defining timeouts for them) properly. Without this, a lot of resources were being consumed with no output from them, noticeably under high load, and this ended up bringing the system down.
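For plain JDBC, the standard hook for this is Statement.setQueryTimeout(int seconds). Where the search goes through a layer that offers no such setting, the caller can enforce a limit itself. The sketch below shows that caller-side variant with a Future; it is not what our data layer did, and the name BoundedSearch and the fallback-value convention are assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that gives up on a search after a fixed time limit. */
final class BoundedSearch {

    /** Returns the search result, or the fallback if the search does not finish in time. */
    static <T> T runWithTimeout(Callable<T> search, long timeoutMillis, T fallback) throws Exception {
        // One executor per call keeps the sketch short; a real service would share a pool.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> future = worker.submit(search);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker so the abandoned search can stop and release what it holds.
            future.cancel(true);
            return fallback;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Cancellation only helps if the underlying call responds to interruption, so a timeout set on the query itself is still preferable wherever the driver or API supports it.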
7. Transactions, caching & memory management
This is a very important aspect to consider while evaluating performance across components. The length of a transaction, the type of commit (multi-phase or single-phase) and the sequence of commits always matter. Identifying where a synchronous call-out can be replaced by an asynchronous one is a skill.
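As an illustration of the asynchronous option, two call-outs that do not depend on each other can be started together so that the caller waits for the slower one rather than for the sum of both. This is a minimal sketch using CompletableFuture (Java 8 and later); the class name and the two supplier parameters are hypothetical stand-ins for real service calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical example: two independent call-outs issued in parallel. */
final class ParallelCallouts {

    /** Starts both call-outs at once and returns their results joined with '|'. */
    static String fetchBoth(Supplier<String> customerService, Supplier<String> orderService) {
        CompletableFuture<String> customer = CompletableFuture.supplyAsync(customerService);
        CompletableFuture<String> orders = CompletableFuture.supplyAsync(orderService);
        // join() blocks only until the slower of the two call-outs has completed
        return customer.thenCombine(orders, (c, o) -> c + "|" + o).join();
    }
}
```

This only applies when the two calls are truly independent and do not have to share one transaction; otherwise the synchronous sequence may be the correct design.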
Carrying the right amount of data and caching at various levels is another important consideration. An architect must be well versed in caching techniques, simply because caching has a direct impact on the performance of a component and hence on that of the complete system. Caching comes with its own challenges, especially in clustered environments.
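To make the idea concrete, here is one of the simplest in-process caches: a bounded least-recently-used (LRU) map built on LinkedHashMap. LRU is just one possible eviction policy, chosen here for illustration, and the class name LruCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache that evicts the least recently used entry when full. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the "most recently used" end
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // called after each put; returning true drops the least recently used entry
        return size() > maxEntries;
    }
}
```

This map is not thread-safe (wrap it with Collections.synchronizedMap if it is shared), and it lives inside a single JVM. In a cluster each node would hold its own copy, which is exactly where the consistency challenges come from.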
Keeping an eye on the memory utilization of the system is another crucial activity. The underlying operating system (ZLinux/ AIX) plays an important role in how performance problems surface. As Java programmers, we are naturally more interested in object lifecycles, how the garbage collector is running and how our code performs in that respect. Java profiling tools (JProbe) are commonly used to analyze and understand heaps and to tune the memory supplied. Only with experience and experiments can one come up with the right amount of CPU and memory for the JVM of the application under test. Making sure that memory is not leaking and is being used appropriately by each component is one of the important steps in isolating performance issues.
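Between profiling sessions, the JVM can report on its own heap through the standard java.lang.management API, which is handy for logging a figure during a load test. A minimal sketch follows; the class name HeapSnapshot is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that reads heap figures from the running JVM. */
final class HeapSnapshot {

    /** Used heap as a percentage of the heap currently committed to the JVM. */
    static double usedPercentOfCommitted() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return 100.0 * heap.getUsed() / heap.getCommitted();
    }

    /** One-line summary in megabytes, suitable for a periodic log entry. */
    static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return "heap used=" + (heap.getUsed() >> 20) + "MB committed=" + (heap.getCommitted() >> 20) + "MB";
    }
}
```

A used figure that keeps climbing across repeated load runs, even after garbage collection, is a hint to take a heap dump and look for a leak with a profiler.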
8. Code refactoring – Recursions/ object lifecycles/ expensive logics
While you are addressing system-level issues and improvement areas, it is also important to take a closer look at the code being executed. As explained in one of the points above, you can have your own logs measuring the response time of public methods; you can also use JProbe or a similar tool, which will report the methods that take the most time and the object graphs that are not being garbage collected. If we keep object lifecycles in mind while coding, use design patterns appropriately and use the right frameworks, a lot of problems are eliminated at this stage. Unclosed connections, heavy file I/O and native code are typical areas where performance takes a bad hit. With compilers getting smarter every day, there are built-in improvements in the JVM as well. (From Java 1.5 onwards, concatenating two strings with + carries no extra overhead, because the compiler generates StringBuilder calls internally; since Java 9 it emits an invokedynamic call instead. You can verify this from the class file.)
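The compiler's help covers a single concatenation expression. Inside a loop, each += still produces a new intermediate string, so an explicit StringBuilder remains the safer habit there. A small sketch, with a hypothetical class name:

```java
/** Hypothetical example: building one string from many pieces inside a loop. */
final class CsvJoin {

    /** Joins the values with commas using a single StringBuilder for the whole loop. */
    static String join(String[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }
}
```

For this particular job String.join(",", values) does the same thing since Java 8; the loop form is shown because the same pattern applies to any string assembled piece by piece.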
However, we have to be careful when we perform actions recursively. Static analyzers such as FindBugs always help in pointing out such areas. Synchronizing at the correct level, using classes from the java.util.concurrent package (e.g. ConcurrentHashMap) and wrapping debug statements in isDebugEnabled checks are a few important considerations for better coding practice.
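To illustrate one of these considerations, here is a counter shared by many threads that relies on ConcurrentHashMap instead of a synchronized block around a plain HashMap. The class name HitCounter is hypothetical, and merge requires Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-key counter that is safe to call from many threads. */
final class HitCounter {
    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

    /** Atomically adds one to the count for the key, starting from 1 if the key is new. */
    void record(String key) {
        hits.merge(key, 1, Integer::sum);
    }

    /** Current count for the key, or 0 if it was never recorded. */
    int count(String key) {
        return hits.getOrDefault(key, 0);
    }
}
```

Threads updating different keys rarely block each other here, whereas a single synchronized method would serialize every caller under load.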
There were instances where we had to revisit our design strategies. Using value objects to carry the right amount of data, lazy loading, and suggesting to the business that a few workflow steps be changed were the outcomes of the discussions we had on improving the overall design of the system.
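Lazy loading can be as simple as deferring an expensive lookup until the first caller actually asks for the value. The sketch below is a generic holder, not the design we ended up with; the class name Lazy is hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical holder that runs an expensive loader at most once, on first use. */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Loads the value on the first call and returns the same value afterwards. */
    @Override
    public synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}
```

A value object can then carry new Lazy<>(() -> dao.loadAttachments(id)) (dao being a placeholder) instead of the attachments themselves, so the cost is paid only by the workflows that really read them.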
9. Celebrate and acknowledge even small improvements
While everybody is working tirelessly towards improving the performance of the system, it is important to acknowledge even the slightest improvement at the component or system level. This always encourages developers to go the extra mile and look for more opportunities to improve.
SUMMARY
Performance improvement is an art and at enterprise level it’s a battle we fight under pressure.
There are three typical areas where performance can be improved –
1. Code refactoring, design improvement
2. Database level improvements (indexes, query tuning)
3. Environment & platform level improvements (App server, clusters, CPUs, RAM, JVM, OS, network latency etc.)
While expectations of Java developers are ever increasing, one must be ready with Java profiling tools, Java static analysis tools and SQL query optimizers. Focused and well-planned teamwork at the enterprise level is critical for achieving performance goals in a short amount of time and with fewer load-testing iterations.