June 18, 2008
Last month we explored some of the basic, but powerful, funtionality of the Informix High Performance Loader (HPL) and the CLI configuration utility onpladm. Today we will be looking at some advanced uses of HPL that will allow you to simultaneously unload and load data without the need for large intermediate data files.
In the last HPL blog we performed an unload of a large table to an unload file on the filesystem and then used that unload file to load data into a new table. If you wanted to reorg a table to reclaim unused space and minimize extents and you have enough filesystem space to hold the unload file, this would be one of your options.
What do you do if you don't have a lot of filesystem space but you want to use HPL to perform a table reorg? If you are running on a Unix like OS you can take advantage of named pipes and avoid these large files.
A named pipe is a special file created with the mkfifo command that can take the output of one process and redirect it to another. Here is an example of a named pipe.
In one shell run:
> mkfifo /home/informix/blog/blog.p > ll /home/informix/blog/blog.p prw-r--r-- 1 informix informix 0 Jun 18 08:29 /home/informix/blog/blog.p > > echo "Output from Process 1" > /home/informix/blog/blog.p
The echo command will not complete until you run the following in another shell:
> cat /home/informix/blog/blog.p Output from Process 1
We can use these named pipes to pass data between two seperate onpload processes. The first onpload process will unload a stream of data to the named pipe and the second onpload process will read the stream of data from the named pipe and load it into our target table.
The key is to use 2 device arrays of type PIPE and different PIPECOMMANDs. The unload PIPECOMMAND will send data to the named pipe via "cat - > /home/informix/blog/blog.p". Instead of sending the output of the onpload unload to a file onpload will send the output as input to the cat command which will then pass the output to the named pipe (much like our echo "Output from Process 1" example.) The load PIPECOMMAND will pull the data from the named pipe via "cat /home/informix/blog/blog.p".
> cat blog1.spec
BEGIN OBJECT DEVICEARRAY blog1_unl_devarrary
BEGIN SEQUENCE
TYPE PIPE
FILE
TAPEBLOCKSIZE 0
TAPEDEVICESIZE 0
PIPECOMMAND "cat - > /home/informix/blog/blog.p"
END SEQUENCE
END OBJECT
BEGIN OBJECT DEVICEARRAY blog1_load_devarrary
BEGIN SEQUENCE
TYPE PIPE
FILE
TAPEBLOCKSIZE 0
TAPEDEVICESIZE 0
PIPECOMMAND "cat /home/informix/blog/blog.p"
END SEQUENCE
END OBJECT
BEGIN OBJECT PROJECT blog1
END OBJECT
BEGIN OBJECT QUERY blog1_query
PROJECT blog1
DATABASE blog
SELECTSTATEMENT "select * from unload_table"
END OBJECT
BEGIN OBJECT FASTUNLOADJOB blog1_unload
PROJECT blog1
DEVICE blog1_unl_devarrary
SERVER cheetah03
DATABASE blog
QUERY blog1_query
END OBJECT
BEGIN OBJECT FASTLOADJOB blog1_load
PROJECT blog1
DEVICE blog1_load_devarrary
SERVER cheetah03
DATABASE blog
TABLE load_table
END OBJECT
> onpladm create object -F blog1.spec
Successfully created object DEVICEARRAY blog1_unl_devarrary
Successfully created object DEVICEARRAY blog1_load_devarrary
Successfully created object PROJECT blog1
Successfully created object QUERY blog1_query
Successfully created object FASTUNLOADJOB blog1_unload
Successfully created object FASTLOADJOB blog1_load
In seperate shells, start the onpload unload job and then start the onpload load job.
> onpload -p blog1 -j blog1_unload -fu Tue Jun 17 10:39:22 2008 SHMBASE 0x44000000 CLIENTNUM 0x49010000 Session ID 1 Unload Database -> blog Query Name -> AUTO.1 Device Array -> blog1_unl_devarrary Query Mapping -> AUTO.1 Query -> select * from unload_table 10:39:24 Records Processed -> 10634 10:39:34 Records Processed -> 21268 10:39:35 Records Processed -> 31902 10:39:35 Records Processed -> 42536 10:39:36 Records Processed -> 53170 10:39:36 Records Processed -> 63804 10:39:37 Records Processed -> 74438 10:39:37 Records Processed -> 85072 10:39:37 Records Processed -> 95706 10:39:37 Records Processed -> 106340 10:39:37 Records Processed -> 116974 ... 10:42:51 Records Processed -> 9602502 10:42:51 Records Processed -> 9613136 10:42:52 Records Processed -> 9623770 10:42:52 Records Processed -> 9634404 10:42:52 Records Processed -> 9645038 10:42:52 Records Processed -> 9655672 10:42:52 Records Processed -> 9666306 10:42:53 Records Processed -> 9676940 Database Unload Completed -- Unloaded 9680403 Records Detected 0 Errors Tue Jun 17 10:42:53 2008 |
> onpload -p blog1 -j blog1_load -fl Tue Jun 17 10:39:34 2008 SHMBASE 0x44000000 CLIENTNUM 0x49020000 Session ID 2 Load Database -> blog Load Table -> load_table Device Array -> blog1_load_devarrary Record Mapping -> AUTO.2 10:39:36 Records Processed -> 10634 10:39:36 Records Processed -> 21268 10:39:36 Records Processed -> 31902 10:39:37 Records Processed -> 42536 10:39:37 Records Processed -> 53170 10:39:37 Records Processed -> 63804 10:39:38 Records Processed -> 74438 10:39:38 Records Processed -> 85072 10:39:38 Records Processed -> 95706 10:39:38 Records Processed -> 106340 10:39:38 Records Processed -> 116974 ... 10:42:51 Records Processed -> 9602502 10:42:51 Records Processed -> 9613136 10:42:52 Records Processed -> 9623770 10:42:52 Records Processed -> 9634404 10:42:52 Records Processed -> 9645038 10:42:52 Records Processed -> 9655672 10:42:52 Records Processed -> 9666306 10:42:52 Records Processed -> 9676940 Table 'load_table' will be read-only until level 0 archive Database Load Completed -- Processed 9680403 Records Records Inserted-> 9680403 Detected Errors--> 0 Engine Rejected--> 0 Tue Jun 17 10:42:54 2008 |
Ta-da! The PIPE device array type opens up a whole new world of HPL wonderfulness. We can take this example a step furture and send data across a network to a different server. This is a common task for moving production data to test systems or performing migrations using HPL.
The magic here is similar to the previous example. 1 device array setup on the target server will pull data from the blog.p named pipe for the load just like before. The device array on the source server for the unload will redirect the unload output to ssh which will connect to the target server and write the ssh input (the output of the unload) to the named pipe on the target server.
For the following example to work, your login from the source server to the target server must be trusted, i.e. you can ssh, rsh, whateversh from the source to the target without being prompted for a password. I'm using ssh in these examples with trusts setup using ssh-keygen and the ~/.ssh/authorized_keys2 file. Talk to your SysAdmin about setting up trusted logins in your environment.
On the source server
source> cat blog2.source.spec
BEGIN OBJECT DEVICEARRAY blog2_unl_devarrary
BEGIN SEQUENCE
TYPE PIPE
FILE
TAPEBLOCKSIZE 0
TAPEDEVICESIZE 0
PIPECOMMAND "ssh cheetah04 'cat - > /home/informix/blog/blog.p'"
END SEQUENCE
END OBJECT
BEGIN OBJECT PROJECT blog2
END OBJECT
BEGIN OBJECT QUERY blog2_query
PROJECT blog2
DATABASE blog
SELECTSTATEMENT "select * from unload_table"
END OBJECT
BEGIN OBJECT FASTUNLOADJOB blog2_unload
PROJECT blog2
DEVICE blog2_unl_devarrary
SERVER cheetah03
DATABASE blog
QUERY blog2_query
END OBJECT
source> onpladm create object -F blog2.source.spec
Successfully created object DEVICEARRAY blog2_unl_devarrary
Successfully created object PROJECT blog2
Successfully created object QUERY blog2_query
Successfully created object FASTUNLOADJOB blog2_unload
On the target server
target> cat blog2.target.spec
BEGIN OBJECT DEVICEARRAY blog2_load_devarrary
BEGIN SEQUENCE
TYPE PIPE
FILE
TAPEBLOCKSIZE 0
TAPEDEVICESIZE 0
PIPECOMMAND "cat /home/informix/blog/blog.p"
END SEQUENCE
END OBJECT
BEGIN OBJECT PROJECT blog2
END OBJECT
BEGIN OBJECT FASTLOADJOB blog2_load
PROJECT blog2
DEVICE blog2_load_devarrary
SERVER cheetah04
DATABASE blog
TABLE load_table
END OBJECT
target> onpladm create object -F blog2.target.spec
Successfully created object DEVICEARRAY blog2_load_devarrary
Successfully created object PROJECT blog2
Successfully created object FASTLOADJOB blog2_load
Start the onpload unload job on the source server and then start the onpload load job on the target server.
source> onpload -p blog2 -j blog2_unload -fu Wed Jun 18 04:11:21 2008 SHMBASE 0x44000000 CLIENTNUM 0x49010000 Session ID 1 Unload Database -> blog Query Name -> AUTO.1 Device Array -> blog2_unl_devarrary Query Mapping -> AUTO.1 Query -> select * from unload_table 04:11:23 Records Processed -> 10634 04:12:00 Records Processed -> 21268 04:12:00 Records Processed -> 31902 04:12:01 Records Processed -> 42536 04:12:01 Records Processed -> 53170 04:12:01 Records Processed -> 63804 04:12:02 Records Processed -> 74438 04:12:02 Records Processed -> 85072 04:12:02 Records Processed -> 95706 04:12:03 Records Processed -> 106340 04:12:03 Records Processed -> 116974 ... 04:15:34 Records Processed -> 9591868 04:15:34 Records Processed -> 9602502 04:15:35 Records Processed -> 9613136 04:15:35 Records Processed -> 9623770 04:15:35 Records Processed -> 9634404 04:15:35 Records Processed -> 9645038 04:15:36 Records Processed -> 9655672 04:15:36 Records Processed -> 9666306 04:15:36 Records Processed -> 9676940 Database Unload Completed -- Unloaded 9680403 Records Detected 0 Errors Wed Jun 18 04:15:36 2008 |
target> onpload -p blog2 -j blog2_load -fl Wed Jun 18 04:13:07 2008 SHMBASE 0x44000000 CLIENTNUM 0x49010000 Session ID 1 Load Database -> blog Load Table -> load_table Device Array -> blog2_load_devarrary Record Mapping -> AUTO.1 04:13:10 Records Processed -> 10634 04:13:10 Records Processed -> 21268 04:13:10 Records Processed -> 31902 04:13:11 Records Processed -> 42536 04:13:11 Records Processed -> 53170 04:13:11 Records Processed -> 63804 04:13:12 Records Processed -> 74438 04:13:12 Records Processed -> 85072 04:13:13 Records Processed -> 95706 04:13:13 Records Processed -> 106340 04:13:14 Records Processed -> 116974 ... 04:16:42 Records Processed -> 9591868 04:16:43 Records Processed -> 9602502 04:16:43 Records Processed -> 9613136 04:16:43 Records Processed -> 9623770 04:16:43 Records Processed -> 9634404 04:16:43 Records Processed -> 9645038 04:16:44 Records Processed -> 9655672 04:16:44 Records Processed -> 9666306 04:16:44 Records Processed -> 9676940 Table 'load_table' will be read-only until level 0 archive Database Load Completed -- Processed 9680403 Records Records Inserted-> 9680403 Detected Errors--> 0 Engine Rejected--> 0 Wed Jun 18 04:16:46 2008 |
If you are dealing with a slow network you can gain some performance by compressing the unload data before you send it across the network and uncompress it before writing to the named pipe. A simple change to the unload device array command line is all that is needed
PIPECOMMAND "gzip -9c | ssh cheetah04 'gunzip > /home/informix/blog/blog.p'"
In Part 3 of the HPL blog series we will learn how to use maps and formats, necessary objects if we want to move data between tables with different structures.
Trackback Pings
TrackBack URL for this entry:
http://www.ibmdatabasemag.com/blog/main/archives/2008/06/the_informix_hi_1.html
« DB2 LUW Performance: The DNA Test of Performance Accountability | Main | DB2 LUW Performance: Statement Analysis Introduction - Hunting Elephants and Mosquitoes »
This is a public forum. CMP Media and its affiliates are not responsible for and do not control what is posted herein. CMP Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.
Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Media's Terms of Service.
Important Note: This comment area is NOT intended for commercial messages or solicitations of business.
