Topic(s):   Informix

June 18, 2008

The Informix High Performance Loader - Part 2
Posted by Andrew Ford @ 07:52 PM ET | Jun 18, 2008

Last month we explored some of the basic, but powerful, funtionality of the Informix High Performance Loader (HPL) and the CLI configuration utility onpladm. Today we will be looking at some advanced uses of HPL that will allow you to simultaneously unload and load data without the need for large intermediate data files.

In the last HPL blog we performed an unload of a large table to an unload file on the filesystem and then used that unload file to load data into a new table. If you wanted to reorg a table to reclaim unused space and minimize extents and you have enough filesystem space to hold the unload file, this would be one of your options.

What do you do if you don't have a lot of filesystem space but you want to use HPL to perform a table reorg? If you are running on a Unix like OS you can take advantage of named pipes and avoid these large files.

A named pipe is a special file created with the mkfifo command that can take the output of one process and redirect it to another. Here is an example of a named pipe.

In one shell run:

    > mkfifo /home/informix/blog/blog.p
    > ll /home/informix/blog/blog.p
    prw-r--r-- 1 informix informix 0 Jun 18 08:29 /home/informix/blog/blog.p
    >
    > echo "Output from Process 1" > /home/informix/blog/blog.p
    

The echo command will not complete until you run the following in another shell:

    > cat /home/informix/blog/blog.p
    Output from Process 1
    

We can use these named pipes to pass data between two seperate onpload processes. The first onpload process will unload a stream of data to the named pipe and the second onpload process will read the stream of data from the named pipe and load it into our target table.

The key is to use 2 device arrays of type PIPE and different PIPECOMMANDs. The unload PIPECOMMAND will send data to the named pipe via "cat - > /home/informix/blog/blog.p". Instead of sending the output of the onpload unload to a file onpload will send the output as input to the cat command which will then pass the output to the named pipe (much like our echo "Output from Process 1" example.) The load PIPECOMMAND will pull the data from the named pipe via "cat /home/informix/blog/blog.p".

    > cat blog1.spec
    BEGIN OBJECT DEVICEARRAY blog1_unl_devarrary
            BEGIN SEQUENCE
                    TYPE                    PIPE
                    FILE
                    TAPEBLOCKSIZE           0
                    TAPEDEVICESIZE          0
                    PIPECOMMAND             "cat - > /home/informix/blog/blog.p"
            END SEQUENCE
    END OBJECT
    
    BEGIN OBJECT DEVICEARRAY blog1_load_devarrary
            BEGIN SEQUENCE
                    TYPE                    PIPE
                    FILE
                    TAPEBLOCKSIZE           0
                    TAPEDEVICESIZE          0
                    PIPECOMMAND             "cat /home/informix/blog/blog.p"
            END SEQUENCE
    END OBJECT
    
    BEGIN OBJECT PROJECT blog1
    END OBJECT
    
    BEGIN OBJECT QUERY blog1_query
            PROJECT                 blog1
            DATABASE                blog
            SELECTSTATEMENT         "select * from unload_table"
    END OBJECT
    
    BEGIN OBJECT FASTUNLOADJOB blog1_unload
            PROJECT                 blog1
            DEVICE                  blog1_unl_devarrary
            SERVER                  cheetah03
            DATABASE                blog
            QUERY                   blog1_query
    END OBJECT
    
    BEGIN OBJECT FASTLOADJOB blog1_load
            PROJECT                 blog1
            DEVICE                  blog1_load_devarrary
            SERVER                  cheetah03
            DATABASE                blog
            TABLE                   load_table
    END OBJECT
    
    > onpladm create object -F blog1.spec
    Successfully created object DEVICEARRAY   blog1_unl_devarrary
    Successfully created object DEVICEARRAY   blog1_load_devarrary
    Successfully created object PROJECT   blog1
    Successfully created object QUERY   blog1_query
    Successfully created object FASTUNLOADJOB   blog1_unload
    Successfully created object FASTLOADJOB   blog1_load
    
    

In seperate shells, start the onpload unload job and then start the onpload load job.

    > onpload -p blog1 -j blog1_unload -fu
    Tue Jun 17 10:39:22 2008
    
    SHMBASE        0x44000000
    CLIENTNUM      0x49010000
    Session ID 1
    
    Unload Database -> blog
    Query Name      -> AUTO.1
    Device Array    -> blog1_unl_devarrary
    Query Mapping   -> AUTO.1
    Query           -> select * from unload_table
    10:39:24 Records Processed ->  10634
    10:39:34 Records Processed ->  21268
    10:39:35 Records Processed ->  31902
    10:39:35 Records Processed ->  42536
    10:39:36 Records Processed ->  53170
    10:39:36 Records Processed ->  63804
    10:39:37 Records Processed ->  74438
    10:39:37 Records Processed ->  85072
    10:39:37 Records Processed ->  95706
    10:39:37 Records Processed ->  106340
    10:39:37 Records Processed ->  116974
    ...
    10:42:51 Records Processed ->  9602502
    10:42:51 Records Processed ->  9613136
    10:42:52 Records Processed ->  9623770
    10:42:52 Records Processed ->  9634404
    10:42:52 Records Processed ->  9645038
    10:42:52 Records Processed ->  9655672
    10:42:52 Records Processed ->  9666306
    10:42:53 Records Processed ->  9676940
    
    Database Unload Completed -- 
    Unloaded 9680403 Records Detected 0 Errors
    Tue Jun 17 10:42:53 2008
    
    > onpload -p blog1 -j blog1_load -fl
    Tue Jun 17 10:39:34 2008
    
    SHMBASE        0x44000000
    CLIENTNUM      0x49020000
    Session ID 2
    
    Load Database   -> blog
    Load Table      -> load_table
    Device Array    -> blog1_load_devarrary
    Record Mapping  -> AUTO.2
    10:39:36 Records Processed ->  10634
    10:39:36 Records Processed ->  21268
    10:39:36 Records Processed ->  31902
    10:39:37 Records Processed ->  42536
    10:39:37 Records Processed ->  53170
    10:39:37 Records Processed ->  63804
    10:39:38 Records Processed ->  74438
    10:39:38 Records Processed ->  85072
    10:39:38 Records Processed ->  95706
    10:39:38 Records Processed ->  106340
    10:39:38 Records Processed ->  116974
    ...
    10:42:51 Records Processed ->  9602502
    10:42:51 Records Processed ->  9613136
    10:42:52 Records Processed ->  9623770
    10:42:52 Records Processed ->  9634404
    10:42:52 Records Processed ->  9645038
    10:42:52 Records Processed ->  9655672
    10:42:52 Records Processed ->  9666306
    10:42:52 Records Processed ->  9676940
    Table 'load_table' will be read-only 
    until level 0 archive
    
    Database Load Completed -- 
    Processed 9680403 Records
    Records Inserted-> 9680403
    Detected Errors--> 0
    Engine Rejected--> 0
    
    Tue Jun 17 10:42:54 2008
    

Ta-da! The PIPE device array type opens up a whole new world of HPL wonderfulness. We can take this example a step furture and send data across a network to a different server. This is a common task for moving production data to test systems or performing migrations using HPL.

The magic here is similar to the previous example. 1 device array setup on the target server will pull data from the blog.p named pipe for the load just like before. The device array on the source server for the unload will redirect the unload output to ssh which will connect to the target server and write the ssh input (the output of the unload) to the named pipe on the target server.

For the following example to work, your login from the source server to the target server must be trusted, i.e. you can ssh, rsh, whateversh from the source to the target without being prompted for a password. I'm using ssh in these examples with trusts setup using ssh-keygen and the ~/.ssh/authorized_keys2 file. Talk to your SysAdmin about setting up trusted logins in your environment.

On the source server

    source> cat blog2.source.spec
    BEGIN OBJECT DEVICEARRAY blog2_unl_devarrary
            BEGIN SEQUENCE
                    TYPE                    PIPE
                    FILE
                    TAPEBLOCKSIZE           0
                    TAPEDEVICESIZE          0
                    PIPECOMMAND             "ssh cheetah04 'cat - > /home/informix/blog/blog.p'"
            END SEQUENCE
    END OBJECT
    
    BEGIN OBJECT PROJECT blog2
    END OBJECT
    
    BEGIN OBJECT QUERY blog2_query
            PROJECT                 blog2
            DATABASE                blog
            SELECTSTATEMENT         "select * from unload_table"
    END OBJECT
    
    BEGIN OBJECT FASTUNLOADJOB blog2_unload
            PROJECT                 blog2
            DEVICE                  blog2_unl_devarrary
            SERVER                  cheetah03
            DATABASE                blog
            QUERY                   blog2_query
    END OBJECT
    
    source> onpladm create object -F blog2.source.spec
    
    Successfully created object DEVICEARRAY   blog2_unl_devarrary
    Successfully created object PROJECT   blog2
    Successfully created object QUERY   blog2_query
    Successfully created object FASTUNLOADJOB   blog2_unload
    

On the target server

    target> cat blog2.target.spec
    BEGIN OBJECT DEVICEARRAY blog2_load_devarrary
            BEGIN SEQUENCE
                    TYPE                    PIPE
                    FILE
                    TAPEBLOCKSIZE           0
                    TAPEDEVICESIZE          0
                    PIPECOMMAND             "cat /home/informix/blog/blog.p"
            END SEQUENCE
    END OBJECT
    
    BEGIN OBJECT PROJECT blog2
    END OBJECT
    
    BEGIN OBJECT FASTLOADJOB blog2_load
            PROJECT                 blog2
            DEVICE                  blog2_load_devarrary
            SERVER                  cheetah04
            DATABASE                blog
            TABLE                   load_table
    END OBJECT
    
    target> onpladm create object -F blog2.target.spec
    
    Successfully created object DEVICEARRAY   blog2_load_devarrary
    Successfully created object PROJECT   blog2
    Successfully created object FASTLOADJOB   blog2_load
    

Start the onpload unload job on the source server and then start the onpload load job on the target server.

    source> onpload -p blog2 -j blog2_unload -fu
    Wed Jun 18 04:11:21 2008
    
    SHMBASE        0x44000000
    CLIENTNUM      0x49010000
    Session ID 1
    
    Unload Database -> blog
    Query Name      -> AUTO.1
    Device Array    -> blog2_unl_devarrary
    Query Mapping   -> AUTO.1
    Query           -> select * from unload_table
    04:11:23 Records Processed ->  10634
    04:12:00 Records Processed ->  21268
    04:12:00 Records Processed ->  31902
    04:12:01 Records Processed ->  42536
    04:12:01 Records Processed ->  53170
    04:12:01 Records Processed ->  63804
    04:12:02 Records Processed ->  74438
    04:12:02 Records Processed ->  85072
    04:12:02 Records Processed ->  95706
    04:12:03 Records Processed ->  106340
    04:12:03 Records Processed ->  116974
    ...
    04:15:34 Records Processed ->  9591868
    04:15:34 Records Processed ->  9602502
    04:15:35 Records Processed ->  9613136
    04:15:35 Records Processed ->  9623770
    04:15:35 Records Processed ->  9634404
    04:15:35 Records Processed ->  9645038
    04:15:36 Records Processed ->  9655672
    04:15:36 Records Processed ->  9666306
    04:15:36 Records Processed ->  9676940
    
    Database Unload Completed -- 
    Unloaded 9680403 Records  Detected 0 Errors
    Wed Jun 18 04:15:36 2008
    
    target> onpload -p blog2 -j blog2_load -fl
    Wed Jun 18 04:13:07 2008
    
    SHMBASE        0x44000000
    CLIENTNUM      0x49010000
    Session ID 1
    
    Load Database   -> blog
    Load Table      -> load_table
    Device Array    -> blog2_load_devarrary
    Record Mapping  -> AUTO.1
    04:13:10 Records Processed ->  10634
    04:13:10 Records Processed ->  21268
    04:13:10 Records Processed ->  31902
    04:13:11 Records Processed ->  42536
    04:13:11 Records Processed ->  53170
    04:13:11 Records Processed ->  63804
    04:13:12 Records Processed ->  74438
    04:13:12 Records Processed ->  85072
    04:13:13 Records Processed ->  95706
    04:13:13 Records Processed ->  106340
    04:13:14 Records Processed ->  116974
    ...
    04:16:42 Records Processed ->  9591868
    04:16:43 Records Processed ->  9602502
    04:16:43 Records Processed ->  9613136
    04:16:43 Records Processed ->  9623770
    04:16:43 Records Processed ->  9634404
    04:16:43 Records Processed ->  9645038
    04:16:44 Records Processed ->  9655672
    04:16:44 Records Processed ->  9666306
    04:16:44 Records Processed ->  9676940
    Table 'load_table' will be read-only 
    until level 0 archive
    
    Database Load Completed -- 
    Processed 9680403 Records
    Records Inserted-> 9680403
    Detected Errors--> 0
    Engine Rejected--> 0
    
    Wed Jun 18 04:16:46 2008
    

If you are dealing with a slow network you can gain some performance by compressing the unload data before you send it across the network and uncompress it before writing to the named pipe. A simple change to the unload device array command line is all that is needed

    PIPECOMMAND             "gzip -9c | ssh cheetah04 'gunzip > /home/informix/blog/blog.p'"
    

In Part 3 of the HPL blog series we will learn how to use maps and formats, necessary objects if we want to move data between tables with different structures.

Trackback Pings

TrackBack URL for this entry:
http://www.ibmdatabasemag.com/blog/main/archives/2008/06/the_informix_hi_1.html

« DB2 LUW Performance: The DNA Test of Performance Accountability | Main | DB2 LUW Performance: Statement Analysis Introduction - Hunting Elephants and Mosquitoes »





This is a public forum. CMP Media and its affiliates are not responsible for and do not control what is posted herein. CMP Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.

Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Media's Terms of Service.

Important Note: This comment area is NOT intended for commercial messages or solicitations of business.



CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS
RECENT JOB POSTINGS
CAREER NEWS
10 Search Engines You Don't Know About
Go beyond Google and get vertical. These specialized search sites will help you find the business information you need -- fast.

Subscribe to the new digital version of IBM Database Magazine
New Digital Version

Sponsored links:



Subscribe to the IBM Database Magazine Newsletter

Email Address *
First Name
Last Name
HTML Preference
HTML Text
 

Fields with * are required.

 




Visit these other IBM and TechWeb Partner Sites: :
Maximizing ROI Through Business Process Management (BPM) and Service-Oriented Architecture (SOA)
Internet Evolution – The Macrosite for News, Analysis, & Opinion About the Future of the Internet
Business Innovation – Technology Strategies and Solutions for Driving Business Success