Topic(s):   Informix

May 16, 2008

The Informix High Performance Loader – Part 1
Posted by Andrew Ford @ 01:03 PM ET | May 16, 2008

The High Performance Loader is the absolute best utility available when it comes to inserting or unloading large amounts of data from your systems. Up until version 9.4 you had two options if you wanted to configure an HPL load or unload job. You could use the ipload GUI or you could reverse engineer the steps required to correctly populate the onpload database.

The GUI option posed some problems for those that couldn’t use a GUI because they were looking at a text only ssh window and for those that didn’t want to use the GUI because they would rather be productive. The reverse engineering option worked well, but not everyone is interested in manually populating the device or session tables and even if you were you probably didn’t want to mess with more complicated things like column maps, unload formats and filters. In 9.4 we were given a proper command line utility, onpladm, that could be used to create our HPL jobs. If you take a little bit of time to understand how this tool works you will be rewarded with the awesome powers of HPL to do your bidding.

Creating and Running a Simple Unload Job

In this blog entry, I want to focus on using onpladm to set up a simple HPL scenario of unloading a large table to a single flat file and loading this data into a new table with the same schema. In future blogs we’ll tackle more interesting HPL topics that will make your life easier, but for now we’ve got to understand the basics.

To create a simple unload job you really only need 3 things:

Device Array – The device array defines where the unloaded data will go, a single flat file on disk in our case. The device array gives you the ability write to multiple files in parallel by defining multiple devices in the device array, very useful if you have multiple physical disks. Today we’re only defining a single device in the device array.

Query – The query defines what we’re unloading and is simply a select statement. Today we’re unloading all of the data from our table to be inserted into a new table of the exact same schema so our query is a very simple select * from unload_table. If you wanted to add a where clause, you could. If you wanted to perform a join of multiple tables, you could. If you wanted to select only certain fields, you could, but you would need to configure column maps and formats (we’ll take a look at this in later blogs.)

Unload Job – The unload job ties the query to the device array and defines an object that you can ask onpload to run for you.

If you’re the neat and tidy type there is a fourth object called a Project that you can use to keep the query and unload job objects logically grouped together. I’m the neat and tidy type when it comes to my database engines (does extent interleaving keep you up at night?) so I use projects and I recommend you use them too. Notice that device arrays do not belong to a project, they are globally available to all projects.

Alrighty then. We now know what we need to create to perform our unload, lets do it. onpladm does provide commands that allow you to create jobs and projects via the command line but creating a device array and a query are not as straight forward. This is why I prefer to use the create object onpladm command to create all of my objects based on object definitions in a single specification file.

For our unload, our specification file is named blog_unload.spec and looks like:

    # unload device array of single file
    BEGIN OBJECT DEVICEARRAY blog_array
       BEGIN SEQUENCE
          TYPE            FILE
          FILE            /home/informix/blog/blog.unl
          TAPEBLOCKSIZE   0
          TAPEDEVICESIZE  0
          PIPECOMMAND
       END SEQUENCE
    END OBJECT
    
    # project to hold our query and job objects
    BEGIN OBJECT PROJECT blog
    END OBJECT
    
    # query that simply selects all fields and all rows
    # from the unload_table table in the blogdb database
    BEGIN OBJECT QUERY blog_query
       PROJECT                 blog
       DATABASE                blogdb
       SELECTSTATEMENT         "select * from unload_table"
    END OBJECT
    
    # no conversion unload job object
    BEGIN OBJECT FASTUNLOADJOB blog_unload
       PROJECT                 blog
       DEVICE                  blog_array
       SERVER                  test1
       DATABASE                blogdb
       QUERY                   blog_query
    END OBJECT
    

Using the onpladm create object command to create our objects based on this specifications file looks like this:

    > onpladm create object –F blog_unload.spec
    
    Successfully created object DEVICEARRAY   blog_array
    Successfully created object PROJECT   blog
    Successfully created object QUERY   blog_query
    Successfully created object FASTUNLOADJOB   blog_unload
    

Notice that we have created a FASTUNLOADJOB, this is also called a no conversion job. A no conversion job does not unload the data in a delimited format, it unloads the raw informix data page to the flat file. We prefer to perform a no conversion job here because no conversion load jobs run faster. onpload does not need to parse out each field of the table from a delimited unload file and can just lay the informix raw page from the file to disk. We can use a no conversion job because we are unloading data to be reloaded into a table with the exact same schema on the same hardware architecture running the same IDS version. We’re actually going to reload to the same engine, similar to what you would do for a table copy to a test system.

Our unload job is created and is ready to run via onpload. I like to nohup my unload, run it in the background and redirect the output to a logfile.

    > nohup onpload -p blog -j blog_unload -fu > onpload.log 2>&1 &
    

-p identifies the project we are working with
-j identifies the job we want to run
-fu tells onpload that we’re running an unload and proves the onpload authors had a sense of humor

The onpload.log output will look like this:

    Fri May 16 10:44:45 2008
    
    SHMBASE        0x44000000
    CLIENTNUM      0x49010000
    Session ID 1
    
    Unload Database -> blogdb
    Query Name      -> AUTO.1
    Device Array    -> blog_array
    Query Mapping   -> AUTO.1
    Query           -> select * from unload_table for read only
    10:44:48 Records Processed ->  10044
    10:44:48 Records Processed ->  20088
    10:44:48 Records Processed ->  30132
    10:44:48 Records Processed ->  40176
    10:44:48 Records Processed ->  50220
    10:44:48 Records Processed ->  60264
    10:44:48 Records Processed ->  70308
    10:44:48 Records Processed ->  80352
    10:44:48 Records Processed ->  90396
    10:44:48 Records Processed ->  100440
    10:44:48 Records Processed ->  110484
    10:44:48 Records Processed ->  120528
    10:44:48 Records Processed ->  130572
    
    …
    
    10:56:06 Records Processed ->  212701788
    10:56:06 Records Processed ->  212711832
    10:56:06 Records Processed ->  212721876
    10:56:06 Records Processed ->  212731920
    10:56:06 Records Processed ->  212741964
    10:56:06 Records Processed ->  212752008
    10:56:06 Records Processed ->  212762052
    10:56:06 Records Processed ->  212772096
    10:56:06 Records Processed ->  212782140
    10:56:06 Records Processed ->  212792184
    10:56:07 Records Processed ->  212802228
    10:56:07 Records Processed ->  212812272
    
    Database Unload Completed -- Unloaded 212812323 Records  Detected 0 Errors
    Fri May 16 10:56:07 2008
    

We unloaded almost 213 million rows (unload_table row size = 151 bytes) in 10 minutes and 22 seconds for a rate of 342,142 rows per second. Wow, that’s a big number. Maybe I have miscalculated…convert minutes to seconds…divide rows by seconds…carry the 2...nope, that is correct. Cool.

Creating and Running a Simple Load Job

We’ve got our data unloaded, now we want to load it into a new table. As you might have guessed this is similar to creating the unload objects with onpladm create object command and a specification file and running an unload job via onpload.

For our load the specification file is named blog_load.spec and looks like this;

    # no conversion load job object
    BEGIN OBJECT FASTLOADJOB blog_load
       PROJECT                 blog
       DEVICE                  blog_array
       SERVER                  test1
       DATABASE                blogdb
       TABLE                   load_table
    END OBJECT
    

Notice we only have one object to define for our load, the no conversion load job object. This is because we will be loading from the same device array we created for the unload and will be creating this load job object inside of the blog project that was previously created. The unload and load objects could have been included in a single specifications file and created at the same time using a single onpladm create object command.

Creating the load object via onpladm looks like this:

    > onpladm create object -F blog.load.spec
    
    Successfully created object FASTLOADJOB   blog_load
    

Run the load job using onpload just like we ran the unload job.

    > nohup onpload -p blog -j blog_load -fl > onpload.log 2>&1 &
    

-fl here tells onpload we’re loading data.

The onpload.log will look like

    Fri May 16 11:48:45 2008
    
    SHMBASE        0x44000000
    CLIENTNUM      0x49010000
    Session ID 2
    
    Load Database   -> blogdb
    Load Table      -> load_table
    Device Array    -> blog_array
    Record Mapping  -> AUTO.2
    11:48:48 Records Processed ->  10044
    11:48:48 Records Processed ->  20088
    11:48:48 Records Processed ->  30132
    11:48:48 Records Processed ->  40176
    11:48:48 Records Processed ->  50220
    11:48:48 Records Processed ->  60264
    11:48:48 Records Processed ->  70308
    11:48:48 Records Processed ->  80352
    11:48:49 Records Processed ->  90396
    11:48:49 Records Processed ->  100440
    11:48:49 Records Processed ->  110484
    11:48:49 Records Processed ->  120528
    11:48:49 Records Processed ->  130572
    11:48:49 Records Processed ->  140616
    11:48:49 Records Processed ->  150660
    11:48:49 Records Processed ->  160704
    
    …
    
    12:11:01 Records Processed ->  212711832
    12:11:01 Records Processed ->  212721876
    12:11:01 Records Processed ->  212731920
    12:11:01 Records Processed ->  212741964
    12:11:01 Records Processed ->  212752008
    12:11:01 Records Processed ->  212762052
    12:11:01 Records Processed ->  212772096
    12:11:01 Records Processed ->  212782140
    12:11:01 Records Processed ->  212792184
    12:11:01 Records Processed ->  212802228
    12:11:01 Records Processed ->  212812272
    Table 'load_table' will be read-only until level 0 archive
    
    Database Load Completed -- Processed 212812323 Records
    Records Inserted-> 212812323
    Detected Errors--> 0
    Engine Rejected--> 0
    
    Fri May 16 12:11:03 2008
    

Almost 213 million rows loaded in 22 minutes and 18 seconds for a rate of 159,053 per second, that is serious fast. The load_table now has the 212 million rows but will be read only until you complete a level 0 backup. I’m not going to tell you to backup to /dev/null if you want to make this table available immediately. If you were to backup to /dev/null just make sure you take a real level 0 backup as soon as possible to minimize the risk of losing this data to a disk failure.

Well, there you have it, a taste of what HPL and onpladm can do for you. Next time we’ll look at some more interesting things you can do with HPL. Now that we know the basics we can look at pipe commands, table maps and unload formats, eliminating big flat files and simultaneous unloading from machine A and loading to machine B over a network.

Trackback Pings

TrackBack URL for this entry:
http://www.ibmdatabasemag.com/blog/main/archives/2008/05/the_informix_hi.html

« DB2 for Linux, UNIX, and Windows Versions 8, 9.1 and 9.5 Security Vulnerabilities and HIPER APARs | Main | DB2 Certification: Restoring a database and understanding database privileges »





This is a public forum. CMP Media and its affiliates are not responsible for and do not control what is posted herein. CMP Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.

Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Media's Terms of Service.

Important Note: This comment area is NOT intended for commercial messages or solicitations of business.



CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS
RECENT JOB POSTINGS
CAREER NEWS
10 Search Engines You Don't Know About
Go beyond Google and get vertical. These specialized search sites will help you find the business information you need -- fast.

Subscribe to the new digital version of IBM Database Magazine
New Digital Version

Sponsored links:



Subscribe to the IBM Database Magazine Newsletter

Email Address *
First Name
Last Name
HTML Preference
HTML Text
 

Fields with * are required.

 




Visit these other IBM and TechWeb Partner Sites: :
Maximizing ROI Through Business Process Management (BPM) and Service-Oriented Architecture (SOA)
Internet Evolution – The Macrosite for News, Analysis, & Opinion About the Future of the Internet
Business Innovation – Technology Strategies and Solutions for Driving Business Success