April 17, 2008
In version 10.0 of Informix Dynamic Server we were given the ability configure the page size of the data stored in a dbspace. For my Linux-based servers this means I have the option to store data and index pages on 2K, 4K, 6K, 8K, 10K, 12K, 14K and 16K pages instead of the previous option of just 2K pages. That's all fine and dandy, but how can we put this to good use?
One way would be to store data pages more efficiently. If I have a wide table that takes up 1007 bytes then I can only store 1 data row on a 2K page.
data rows per page = truncate((page size in bytes - 28) / (row size in bytes + 4)) data rows per 2K page = truncate((2048 - 28) / (1007 + 4)) = truncate(1.998) = 1
If I move this table to a 4K page I can store 4 data rows on a 4K page
data rows per 4K page = truncate((4096 - 28) / (1007 + 4)) = truncate(4.023) = 4
Wow. I can store twice as much data in the same amount of space by simply using a dbspace with a larger page size and I can query the same amount of data in half as many IOs. This benefit is a no brainer and you should check your system for similar situations.
What happens if we put our indexes on larger page sizes, can we get any performance benefit? At first glance the answer would be an easy yes. Under the covers the index structure is a balanced tree of pages. At the very top of the tree you have the root node and at the very bottom of the tree you have the leaf nodes and in between you have levels of nodes that connect the two. When you use an index to get to a data page you start at the root node and traverse the tree down through the levels until you reach the leaf node that has the address of the data page or pages that match your search criteria.
When you increase the page size of the pages your index nodes are stored on you can fit more keys per node, resulting in fewer nodes in your index tree. Fewer nodes can mean fewer levels in your index tree, fewer levels in your tree means fewer hops to get from the root node to your leaf nodes, fewer hops should mean fewer IOs and fewer IOs should mean better performance. This kind of thinking has led some (myself included) to suggest that it would be a good idea to always place indexes in a 16K dbspace because you will reduce the height of your index trees and therefore must increase performance.
If we only had some way of testing this. Oh yeah, we can modify the load testing python script from my previous blog to test this theory against production data on our test platform to get some real world results to see if we're right.
Our test table contains 50,127,002 rows stored in a 2K dbspace
Our test table has a primary key consisting of 3 fixed length character fields, we will vary the page size of this index
The load tester will read the 3 fields of the primary key from a flat file generated from production logs to simulate real world access patterns
The load tester will attempt to select a row based on the primary key and if a row is found update the row
The load tester will only run 1 thread
The database engine will be bounced before each run to clear the buffer cache
The load tester will run for 20 minutes before collecting statistics and then collect statistics for 10 minutes in order to simulate a long running system with the appropriate pages cached in the buffers
The amount of memory available for buffers is 512MB
Test #1 - Index Page Size 2K, 262144 2K Buffers
First lets look at an oncheck -pT output to determine the number of levels in the 2K index
Index Usage Report for index index_2k_pk on idxpgsztest:informix.testtable
Average Average
Level Total No. Keys Free Bytes
----- -------- -------- ----------
1 1 29 668
2 29 34 381
3 991 36 246
4 36617 36 244
5 1354786 36 207
----- -------- -------- ----------
Total 1392424 36 208
Each time we use this index to find a data page we will read 5 index pages.
Running the load tester gives us
thread1 complete - 195188 transactions in 599.938441 seconds, 325.346713 transactions per second
Lets see how the same index in a 16K page dbspace against the same data stacks up. The 16K page dbspace will need its own set of buffers, for this test I have given 25% of the 512MB to the 2K page dbspace (data) and 75% of the 512MB to the 16K page dbspace (index)
Test #2 - Index Page Size 16K, 24576 16K buffers, 65536 2K Buffers
Checking the oncheck -pT output we see that a 16K page size for this index has reduced the number of index levels from 5 to 4.
Index Usage Report for index index_16k_pk on idxpgsztest:informix.testtable
Average Average
Level Total No. Keys Free Bytes
----- -------- -------- ----------
1 1 5 16156
2 5 109 11103
3 548 304 1720
4 167093 299 1656
----- -------- -------- ----------
Total 167647 300 1656
Great. When we use this index to find a data page we should now look at only 4 index nodes instead of 5. Surely this will result in a higher transaction per second rate.
Running the load tester gives us
thread1 complete - 187557 transactions in 599.933167 seconds, 312.629823 transactions per second
Uh oh, wait a minute. What is this all about? I was supposed to improve performance because I have fewer levels in my index but I actually processed 12.7 transactions per second fewer when using a 16K page size for my indexes.
We have drastically reduced the number of buffer pages we can cache and this is causing problems. Even though we are still buffering the same amount of data (512MB) we are buffering different, possibly not needed, data because we are bringing it in from the disk in 16K chunks and not the more selective 2K chunks resulting in a lower cache hit rate. Coupled with the fact that a cache miss causing a 16K read is a lot more expensive than a cache miss causing a 2K read you can see how the performance gains of the fewer index levels can become overshadowed if buffer memory is not increased to compensate for the larger page sizes. Seems obvious now.
If this is true, then we should be able to improve performance over the 2K index page if we give the 16K buffer pool enough buffer space.
Test #3 - Index Page Size 16K, 98304 16K buffers, 65536 2K Buffers
Running the load tester gives us
thread1 complete - 212317 transactions in 599.944915 seconds, 353.894157 transactions per second
An increase of 28.54 transactions per second, about and 8% improvement for the cost of an additional 1GB of buffer space.
So what is the moral of the story? Blindly placing all of your indexes in a 16K page dbspace to attempt to reduce the number of index levels and increase performance may not yield the results you think it will so you should test any theories you have before implementing them.
One thing that would be interesting to look at would be to find the smallest page size that still decreases the number of index levels. This would give the best of both worlds, a more selective page size to improve cache hit rate and fewer index levels to reduce the number of IOs. Maybe I can work on that for the next blog.
Trackback Pings
TrackBack URL for this entry:
http://www.ibmdatabasemag.com/blog/main/archives/2008/04/configuragle_pa.html
« DB2 DBA Certification Sample Questions: Backing up a database without affecting workloads; understanding privileges | Main | A Web 2.0 Expo debate: creativity vs. control in the enterprise »
This is a public forum. CMP Media and its affiliates are not responsible for and do not control what is posted herein. CMP Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.
Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Media's Terms of Service.
Important Note: This comment area is NOT intended for commercial messages or solicitations of business.
