September 02, 2008
One of the engines that I manage contains a particularily nasty table. Let me know if this sounds familiar: This troublemaker is a history table that stores the details of each of the 2 million transactions our platform processes each day. The rows are wide, weighing in at over 900 bytes per row, letting me fit a whopping 2 rows per 2K page. The table contains the most recent six months of history; anything older than six months must be removed nightly to make room for tomorrow's data.
Without data partitioning this table could not exist.
When you partition data, you store your data in multiple partitions inside of one or more dbspaces. Which partition your data is stored in is typically determined by a partitioning scheme.
For example, I can tell IDS to store the transactions for 08/05/2008 in partition ptn20080805 of dbspace transdbs01, 08/06/2008 in partition ptn20080806 of dbspace transdbs02 and 08/07/2008 in partition ptn20080807 of transdbs01.
You may be saying "Whoop-i-dee-doooo, that just seems like more things for me to manage. Why would I bother with this partitioning business?"
And to that I would say:
There still exists a 16 million page limit per table per dbspace in IDS for non partitioned tables.
Without partitioning, my transaction table would be limited to 16 million pages. I can store 2 rows per 2K page, giving me a maximum capacity of 32 million rows. If I add 2 million rows per day to this table, I would be only be able to store 16 days of transaction history--well short of my six-month requirement. Even if I were to take advantage of the Configurable Page Size feature added to IDS 10.0 and used a 16K page size, I would still max out at 256 million rows, leaving me 55 days short of my 183 day goal.
When I partition my data, the 16 million-page limit applies to the partition and not the dbspace. Each partition can hold 16 million pages. If I'm partitioning my data by day, I can now store 32 million transactions per day.
Partition elimination for queries based on transaction dates
I have told the engine to store transactions for each day in their own partition. When my users run a report against last week's data (most reports are over a date range) the engine will eliminate the partitions that don't hold last week's data from the disk scan.
If it takes N seconds to light scan a day's worth of data and I hadn't partitioned the data by day, the report would return in about 183 * N seconds. Partition elimination will only scan the seven paritions that contain the days I'm reporting on allowing the report to return in 7 * N seconds. That's a pretty good ROI for a little bit of partition management overhead.
Partitions can be quickly detached and attached from a table.
The final requirement for my transaction table is that data older than six months must be purged to make room for new data, keeping my table size and disk costs relatively constant.
Partitioning gives me the ability to detach a partition from a table, effectively deleting the 2 million rows for the partition I am detaching from my transaction table in seconds. The detached partition is converted to a table that can then be archived and dropped.
I can then attach a new empty partition to my transaction table to make room for tomorrow's data in seconds.
The benefits of partitioning don't end here. When PDQ is used, multiple partitions can be scanned in parallel and indexes that are partitioned can have fewer levels (an index tree is created for the data in each partition, fewer nodes in the tree can mean fewer index levels), which results in fewer hops between the root node and the leaf nodes and increases performance.
Creating a Partitioned Table
Partitioning a table is simple: a slight modification to the CREATE TABLE statement to tell IDS where the data partitions live and what data lives in those partitions is all that is needed.
create table trans ( trans_id integer, trans_date date, user_id integer, trans_info char(100) ) fragment by expression partition transptn20080808 (trans_date = "08/08/2008") in transdbs01, partition transptn20080807 (trans_date = "08/07/2008") in transdbs01, partition transptn20080806 (trans_date = "08/06/2008") in transdbs01, partition transptn20080805 (trans_date = "08/05/2008") in transdbs01, partition transptn20080804 (trans_date = "08/04/2008") in transdbs01, partition transptn20080803 (trans_date = "08/03/2008") in transdbs01, partition transptn20080802 (trans_date = "08/02/2008") in transdbs01 extent size 1024 next size 1024 lock mode row; create unique index trans_uix1 on trans (trans_id, trans_date); create index trans_ix2 on trans (user_id);
My trans table is now partitioned. The trans_uix1 unique index i have created to support my primary key and the trans_ix2 index will inherit the trans table's partitioning strategy, so the index pages will also live in the transdbs01 partitions. This is an important step that will allow me to quickly detach and attach partitions later.
You do have the ability to store the index pages in different partitions in different dbspaces if you need to and still want to quickly attach and detach partitions. To do this you must explicitly partition the index storage with the same partition expression as the table. Consult the Fragmentation Guidelines of the IDS Performance Guide for more detail.
You know what? Do that anyway. There's far too much information in the guide to be covered here.
Let's take a look at some partition elimination in action.
When I execute the following query with set explain on:
select {+full (trans)}
count(*)
from
trans
where
trans_date between "08/02/2008" and "08/04/2008"
We will get the following sqexplain.out file:
QUERY:
------
select {+full (trans)}
count(*)
from
trans
where
trans_date between "08/02/2008" and "08/04/2008"
DIRECTIVES FOLLOWED:
FULL ( trans )
DIRECTIVES NOT FOLLOWED:
Estimated Cost: 4
Estimated # of Rows Returned: 1
1) informix.trans: SEQUENTIAL SCAN (Serial, fragments: 4, 5, 6)
Filters: (informix.trans.trans_date <= 08/04/2008 AND
informix.trans.trans_date >= 08/02/2008 )
Look at the SEQUENTIAL SCAN. IDS is nice enough to let us know it will only be looking at fragments (partitions) 4, 5 and 6. because fragments 0, 1, 2 and 3 do not contain any data that will meet the trans_date between "08/02/2008" and "08/04/2008" filter condition.
Partition Detach and Attach
What will happen if we attempt to insert a trans row with a trans_date of 08/09/2008? We will get an "Record/Key doesn't qualify for any table/index fragment" error.
What we want to do is detach and drop the oldest data from our trans table and add a new partition to the trans table that will accept this date. Note: You are not required to detach a partition before you attach a new partition, but this is usually the way this functionality is used.
There are two small gotchas regarding unique indexes and primary keys that you need to deal with if you want to take advantage of the partition attach and detach functionality.
If you have a unique index on your table, the columns included in the partition expression (trans_date in our case) must be part of the unique index, which is why we have included trans_date as part of our unique index along with trans_id. This does open us up to the possibility of duplicate trans_ids occurring in our table but this is a risk we are willing to take in order to use the attach and detach functionality.
The second gotcha is that you can not attach to a table with a primary key. If a primary key is absolutely necessary, you can drop the key, attach a partition, and then recreate the primary key.
To detach and drop the transptn20080802 partition
alter fragment on table trans detach partition transptn20080802 transtbl20080802; drop table transtbl20080802;
The detach fragment will remove the transptn20080802 partition from the trans table and convert the partition to a table names transtbl20080802. This table can then be dropped to free up disk space for a new partition.
To attach a new empty partition for 08/09/2008 data
create table transptn20080809 ( trans_id integer, trans_date date, user_id integer, trans_info char(100) ) in transdbs01 extent size 1024; alter table transptn20080809 add constraint check (trans_date = "08/09/2008") constraint transptn20080809_ck1; create unique index transptn20080809_uix1 on transptn20080809 (trans_id, trans_date); create index transptn20080809_ix2 on transptn20080809 (user_id); alter fragment on table trans attach transptn20080809 as partition transptn20080809 (trans_date = "08/09/2008") before transptn20080808;
Voila. You are now ready to accept transaction data for 08/09/2008 in your trans table.
In the next blog, we'll look at a python script that will do all of this partition detaching, dropping, and attaching for you so you can take advantage of partition detach/attach without having to actively manage the day to day dropping of old data and adding empty paritions for tomorrow's data.
Trackback Pings
TrackBack URL for this entry:
http://www.ibmdatabasemag.com/blog/main/archives/2008/09/data_partitioni.html
« DB2 9 DBA certification sample questions: Dual logging and LBAC protection | Main | DB2 9 DBA certification sample questions: Table locks and table space allocation »
This is a public forum. CMP Media and its affiliates are not responsible for and do not control what is posted herein. CMP Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.
Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Media's Terms of Service.
Important Note: This comment area is NOT intended for commercial messages or solicitations of business.
