Topic(s):   Informix

May 05, 2008

Finding the Right Page Size for Your Indexes on Informix
Posted by Andrew Ford @ 05:31 PM ET | May 5, 2008

In my last blog we took a look at storing indexes in a dbspace with a large page size to try to decrease the number of levels in an index and increase performance by reducing the number of I/Os required to traverse the index. Using the largest page size available, 16K, we were able to decrease the number of levels in our test index from 5 to 4, but only realized a slight performance gain if--and only if--we increased our resident segment (buffers) from 512 MB to 1.5 GB to compensate for the larger, less selective page size and larger cache miss penalty of a 16K disk read vs a 2K disk read. I ended the blog with the suggestion that the best solution would be to find the smallest page size that still reduces the number of levels, giving us the benefits of better caching through smaller index pages and the benefits of reduced index levels when using larger page sizes.

Ok, so we want to identify the smallest page size that is large enough to reduce the number of levels in our b-tree index. How do we find this perfect page size? If you want an exact answer, you'll have to build your index using each of the available page sizes and run some onchecks to see how many levels make up the index. Oh, and you'll have to do this periodically as your table grows to make sure everything still fits in the expected number of index levels.

There is a better way. You can determine an approximate, but still very accurate, answer if you turn to the IDS Performance Guide and follow the steps to estimate the size of an index, which also provides a calculation for the number of levels in an index.

Running these calculations by hand for different page sizes and indexes can become very tedious and time consuming. To make this a little more efficient, I wrote the following python script that will compute the number of index levels, number of pages, and total size of the index for each page size based on the number of rows in the table, the size of the index key, the uniqueness of the indexed data, the FILLFACTOR used to build the index and if the table is fragmented or not.

Note: I have deviated slightly from the Performance Guide's recommendations. I use FILLFACTOR to determine the amount of free space per index page up front and use this throughout my calculations and not at the end of the calculation to convert compact index pages to sparse index pages.

    #! /usr/bin/env python24
    
    import math
    
    def main():
            # number of rows in the table
            rows = 50127002
    
            # sum of column size for the index keys
            colsize = 40
    
            # proportion of unique entries to the total number of rows
            # a unique index has a propunique of 1.0
            propunique = 1.0
    
            # fillfactor used to build this index
            fillfactor = 90
    
            # is the table fragmented?
            fragmented = False
    
            # available page sizes
            pagesizes = (2, 4, 6, 8, 10, 12, 14, 16)
    
            # propunique must be between 0.01 and 1.0
            if propunique > 1.0:
                    propunique = 1.0
            elif propunique < 0.01:
                    propunique = 0.01
    
            # calculate actual size of and index key
            keysize = colsize + 4
    
            # calculate entrysize differently if table is fragmented
            if fragmented == True:
                    entrysize = (keysize * propunique) + 9 + 4
            else:
                    entrysize = (keysize * propunique) + 5 + 4
    
            print
            print "number of rows: %d" % (rows)
            print "index key size: %d bytes" % (colsize)
            print "FILLFACTOR    : %d" % (fillfactor)
            print "fragmented    : %s" % (fragmented)
            print "prop unique   : %f" % (propunique)
            print
    
            # for each available page size, calcuate number of levels 
            # and number of index pages
            for pagesize in pagesizes:
                    # free space in each page
                    pagefree = ((pagesize * 1024) - 28) * (fillfactor / 100.0)
    
                    # number of index keys that can fit on a page
                    pagents = math.floor(pagefree / entrysize)
    
                    # number of leaf nodes is the number of page entries it takes to hold all the rows
                    nodes = math.ceil(rows / pagents)
    
                    # initialize counts for total number of nodes and total number of levels
                    num_nodes = nodes
                    num_levels = 1
    
                    # calculate the number of node entries that fit on a branch page
                    node_ents = math.floor(pagefree / (keysize + 4) + 4)
    
                    # calculate the number of branch pages in the next level of the index
                    # until we reach the root node
                    while nodes > 1:
                            nodes = math.ceil(nodes / node_ents)
    
                            num_nodes = num_nodes + nodes
                            num_levels = num_levels + 1
    
                    print "page size: %2dK - index levels: %1d - index pages: %9d - index size: %9dK" % \
                    (pagesize, num_levels, num_nodes, num_nodes * pagesize)
    
    if __name__ == "__main__":
            main()
    

For the table and index we tested in the last blog, we get:

    number of rows: 50127002
    index key size: 40 bytes
    FILLFACTOR    : 90
    fragmented    : False
    prop unique   : 1.000000
    
    page size:  2K - index levels: 5 - index pages:   1511185 - index size:   3022370K
    page size:  4K - index levels: 5 - index pages:    735677 - index size:   2942708K
    page size:  6K - index levels: 4 - index pages:    490831 - index size:   2944986K
    page size:  8K - index levels: 4 - index pages:    365570 - index size:   2924560K
    page size: 10K - index levels: 4 - index pages:    291247 - index size:   2912470K
    page size: 12K - index levels: 4 - index pages:    242037 - index size:   2904444K
    page size: 14K - index levels: 4 - index pages:    207903 - index size:   2910642K
    page size: 16K - index levels: 4 - index pages:    181551 - index size:   2904816K
    

It looks like the best page size for this index is 6K, if I actually build the index in a 6K page dbspace and run oncheck -pT against the new index I get:

    
    Index Usage Report for index index_6k_pk on idxpgsztest:informix.testtable
    
                        Average    Average
        Level    Total No. Keys Free Bytes
        ----- -------- -------- ----------
            1        1       37       4380
            2       37      106       1020
            3     3929      113        648
            4   447565      111        628
        ----- -------- -------- ----------
        Total   451532      112        628
    

4 index levels and 451532 index pages, pretty close to our estimate of 4 index levels and 490831 index pages.

Let's test the performance of a 6K page for indexes against last blog's tests of 512MB of buffer space with 75% going to index buffers and 25% going to data page buffers.

Test #1 - Index Page Size 6K, 65536 6K buffers, Data Page Size 2K, 65536 2K buffers

    thread1 complete - 203115 transactions in 599.941735 seconds, 338.557877 transactions per second
    

If you remember from the last blog, using a 2K page dbspace for the index we processed about 325.34 transactions per second. By simply moving our index from a 2K page dbspace that required 5 index levels to a 6K page dbspace that required only 4 index levels, we have increased our transaction rate by 13.211164 transactions per second or 4%. Not anything really special, but we didn't pay anything for this 4% increase so that's a pretty good ROI.

What would happen if we gave the 6K page buffer pool the same 1.5 GB of buffer space? I'm glad you asked.

Test #2 - Index Page Size 6K, 262144 6K buffers, Data Page Size 2K, 65536 buffers

    thread1 complete - 224913 transactions in 599.939488 seconds, 374.892809 transactions per second
    

Using the same 1.5GB of buffer space for indexes we increased our transactions per second by 20.998652 or 6% over a 16K page dbspace when using a 6K page dbspace.

So what did we prove? We proved that you can get a free performance improvement by simply creating your indexes in a dbspace with a page size that optimizes the b-tree structure without compromising the buffer cache hit ratios. Are you going to be able to do this kind of analysis for every index in your system? Doubtful, but if you focus on the heavily used indexes of your system you can see some real improvement in performance.

Trackback Pings

TrackBack URL for this entry:
http://www.ibmdatabasemag.com/blog/main/archives/2008/05/finding_the_rig.html

« IIUG Power Conference Update: Winds of Change | Main | DB2 Certification: Creating objects from the Control Center and understanding Import command options »





This is a public forum. CMP Media and its affiliates are not responsible for and do not control what is posted herein. CMP Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.

Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Media's Terms of Service.

Important Note: This comment area is NOT intended for commercial messages or solicitations of business.



CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS
RECENT JOB POSTINGS
CAREER NEWS
10 Search Engines You Don't Know About
Go beyond Google and get vertical. These specialized search sites will help you find the business information you need -- fast.

Subscribe to the new digital version of IBM Database Magazine
New Digital Version

Sponsored links:



Subscribe to the IBM Database Magazine Newsletter

Email Address *
First Name
Last Name
HTML Preference
HTML Text
 

Fields with * are required.

 




Visit these other IBM and TechWeb Partner Sites: :
Maximizing ROI Through Business Process Management (BPM) and Service-Oriented Architecture (SOA)
Internet Evolution – The Macrosite for News, Analysis, & Opinion About the Future of the Internet
Business Innovation – Technology Strategies and Solutions for Driving Business Success