Tuesday, June 19, 2012

Performance Tuning Hibernate Transaction Flush Mode

I have been using Hibernate on and off for the post ten years and here are some tips and tools that I have used to help me identify, tune, and improve Hibernate performance.

Tip 1 Use a good JDBC Profiler

My personal favorite is Elvyx, which is easy to install, configure, and use. While Hibernate SQL log is useful, it is not easy to read and it won't show the actual parameters sent to the database. Elvyx, on the other hand, has a UI that will show both unbound (similar to Hibernate) and bound SQL, which shows the actual parameters in the SQL. Elvyx UI also allows us to do the following,
  • Sort the queries
  • Total time eclipsed summary graph
  • Drill down to a single query and how execution status
  • Export data into Excel and other formats

A JDBC profiler should be used as part of the development, QA process to catch potential performance issues and in production to help trouble shoot live performance issues.

In Development and QA

In development, JDBC profiler should be used to profile every Web Services call or every single page-turn for web applications, to identify the following potential performance issues,
  • Hibernate is generating the correct SQL (from HQL)
  • Hibernate is loading just right amount of data (use lazy loading whenever possible)
  • Hibernate is generating the correct amount of SQL calls. An abnormal amount of SQL calls per web service call or per web page turn indicates poor design and potential performance issue
  • Look for SQL that is taking long time to execute. Examine the explain plan and make sure the plan makes sense. If the generated sql does not meet requirement, consider rewriting the query or using native SQL or a function or a stored procedure for better performance

In Production

Since Elvyx is not intrusive and does not recompiling application or any other type of special treatment, it is ideal to trouble shoot live production performance issue. Simple deploy, configure, restart, and start troubleshooting.

Tip 2 Understand Transaction Flush Mode

Most people don't understand Hibernate Transaction Flush Mode and what is the most appropriate Flush mode to use. Wrong Transaction Flush mode will lead to huge performance issues.

What is Transaction Flush Mode

Hibernate does not flush every add or update transaction to the database. Rather Hibernate collects them and waits for the right time to flush them all t the database. And the right time is defined by the Transaction Flush mode. There are four Flush mode,

  • Always, the session is flushed every query
  • Commit, the session is flushed when transaction is committed
  • Manuel, the session is flushed manually, i.e., Hibernate will NOT flush session to the database during query or commit time
  • Auto, default Flush mode and yet the most confusion one. The session is flushed before a query is executed or transaction is committed

Why we need transaction Flush mode?

Database transaction is expensive and does not perform well so Hibernate turns auto commit off. Hibernate defers database transaction until the end when all necessary database updates have been made.

For example, in a transaction, we can do the following,
  1. Begin transaction
  2. Create employee A
  3. Create employee B
  4. Associate A with its manager C
  5. Associate B with its manager D
  6. Commit transaction
Instead of 4 separate transactions, we only need a single transaction. Very efficient.

Now, if we change the follow a little,
  1. Begin transaction
  2. Create employee A
  3. Create employee B
  4. Associate A with its manager C
  5. Look up all employees reporting to C
  6. Associate B with its manager D
  7. Look up all employees reporting to D
  8. Commit transaction
If we don't' call transaction flush before step 5 and step 7, we will get incorrect results, because the query results won't include the newly created employee A and B. If we want to include newly created results in the query results before committing them to the database, we must flush the pending transactions (creation of employee A and B) to the database before they can be included in a later query.

Hibernate default Flush mode, AUTO, is designed to be overly cautious and does a database flush every time before executing a query. It is designed to protect novice user but it does come with a hefty performance penalty.

What is the Performance Penalty associated with Database Transaction Flushing

Hibernate does not keep track of which object has been modified in session object. In order to do a proper transaction flush, it must first determine which object has changed in the session by going through ALL the objects in the session and comparing the current object with what's in the database one object at a one. This process is extremely CPU intensive and only gets worse if one has a lot of objects loaded in the session, which is typical in a bulk load/update type of transactions.

Default Flush Mode introduces Performance Problem during Bulk Operations

We had a page that creates a new campaign based on an existing campaign template via a deep copying . A campaign object could contain possibly hundreds of other objects. A typical flow is like the following,

  • Begin Transaction
  • Retrieve the template campaign
  • Shallow copy and save the top level campaign objects
  • For each top level campaign object 
    • Retrieve next level campaign objects
    • Shallow copy and save the secondary level campaign objects
  • Iterate through all nested objects
  • Commit Transaction
A typical copy operation takes 30 minutes. This clearly indicates a performance issue. After further investigation, we traced the problem back to hefty performance cost introduced by database transaction flush.

For each select statement like retrieving the next level campaign objects, Hibernate does a database flush and as the number of objects loaded in the session increases, the time to determine the "dirty" objects increases dramatically. And there is absolute NO need to do database flush, since we are NOT making any changes to existing objects, only creating new ones.

The solution is to switch the default flush mode to COMMIT. This cuts the execution time from 30 minutes to 3 seconds.

So next time if an operation takes abnormal long time to execute and it is not being held up by the database itself, check Hibernate transaction flush mode carefully. Typically I use either MANUEL or COMMIT for any type of bulk operations or read-only operations.

Tip 3 Use Batch Operations

As we have shown before, Hibernate carries huge performance penalty if we execute one query at a time, because of the overhead related to database transaction management. However, we can reduce this cost dramatically if we can batch a set of operations together and carry them out in a single transaction or a single query.
We had a page displaying a grid, which can be sorted or filtered by a set of criteria. The original implementation performs poorly because it is implemented like the following,
  • Select a set of user ids based on the selection criteria
  • Get each user for each returned user id
A must better performant implementation is like,
  • Select a set of user ids based on selection criteria
  • For every 300 user id
    • Select users where user id in (the set of 300 users)
The second implementation is typically 10 to 20 times faster than the first one.

2 comments: