July 21, 2008
Welcome to the third and final installment of the Informix HPL blog series (at least for now.) In parts 1 and 2 we abandoned the GUI and went old school command line with onpladm (that's the way it was and weeeeeeeee liked it!), we unloaded and loaded lots of data to and from a flat file on disk, we moved data from one table to another in parallel using named pipes and we even moved data between servers without using flat files. This toolset should be enough to cover most of what you would ever need to do with HPL. If not, it will at least give you a solid foundataion to start with. In my final attempt to increase HPL awareness I want to take a look at the more tedious, but sometimes necessary, HPL maps and formats objects.
An HPL format object defines the structure of the unload or load file and a map object associates fields in the unload/load file to columns in your table.
In parts 1 and 2 we had no need for these things. We were performing FASTUNLOADJOB and FASTLOADLOB jobs because the strucutre of our unload and load tables were the same and these job types do not use formats and maps.
If you need to move data between tables with different structures you'll need to create LOADJOB and UNLOADJOB jobs (don't worry, they'll still be fast even if they don't say FAST anymore) and the necessary format and maps to describe the flat file and map flat file fields to table columns.
Say we have two tables, unload table and load table with the following schemas:
create table unload_table (
field1 char(16),
field2 char(12),
field3 char(12),
field4 integer,
field5 integer,
field6 date,
field7 smallint,
field8 smallint
);
create table load_table (
field2 char(12),
field6 date
);
And we want to unload the field2 and field6 columns from unload table and load them into the field2 and field6 columns of load table that is on the same server.
To do this we still need the same type of device arrays that can unload and load from a named pipe and we need a query object to select field2 and field6 from our unload table. We need a format object to describe our unload and load data structure. We need 2 maps, the first map will map the unloaded columns to the fields in the unload data and the second map will map the fields in the load data to columns in our load table. Finally we will need two jobs to unload and load the data.
> cat blog4.spec
BEGIN OBJECT DEVICEARRAY blog4_unl_devarrary
BEGIN SEQUENCE
TYPE PIPE
FILE
TAPEBLOCKSIZE 0
TAPEDEVICESIZE 0
PIPECOMMAND "cat - > /home/informix/blog/blog.p"
END SEQUENCE
END OBJECT
BEGIN OBJECT DEVICEARRAY blog4_load_devarrary
BEGIN SEQUENCE
TYPE PIPE
FILE
TAPEBLOCKSIZE 0
TAPEDEVICESIZE 0
PIPECOMMAND "cat /home/informix/blog/blog.p"
END SEQUENCE
END OBJECT
BEGIN OBJECT PROJECT blog4
END OBJECT
BEGIN OBJECT QUERY blog4_query
PROJECT blog4
DATABASE blog
SELECTSTATEMENT "select field2, field6 from unload_table"
END OBJECT
BEGIN OBJECT DELIMITEDFORMAT blog4_format
PROJECT blog4
CHARACTERSET ASCII
RECORDSTART
RECORDEND newline
FIELDSTART
FIELDEND
FIELDSEPARATOR |
BEGIN SEQUENCE
FIELDNAME field2
FIELDTYPE Chars
END SEQUENCE
BEGIN SEQUENCE
FIELDNAME field6
FIELDTYPE Chars
END SEQUENCE
END OBJECT
BEGIN OBJECT UNLOADMAP blog4_unloadmap
PROJECT blog4
FORMAT blog4_format
DATABASE blog
QUERY blog4_query
BEGIN SEQUENCE
COLUMNNAME field2
FIELDNAME field2
JUSTIFICATION
CASECONVERT
DEFAULTVALUE
TRANSFERBYTES 0
COLUMNOFFSET 0
FIELDOFFSET 0
FIELDMINIMUM
FIELDMAXIMUM
FILLCHARACTER
PICTURE
FUNCTION
STORAGECODING
BLOBCOLUMN
END SEQUENCE
BEGIN SEQUENCE
COLUMNNAME field6
FIELDNAME field6
JUSTIFICATION
CASECONVERT
DEFAULTVALUE
TRANSFERBYTES 0
COLUMNOFFSET 0
FIELDOFFSET 0
FIELDMINIMUM
FIELDMAXIMUM
FILLCHARACTER
PICTURE
FUNCTION
STORAGECODING
BLOBCOLUMN
END SEQUENCE
END OBJECT
BEGIN OBJECT LOADMAP blog4_loadmap
PROJECT blog4
FORMAT blog4_format
DATABASE blog
TABLE load_table
BEGIN SEQUENCE
COLUMNNAME field2
FIELDNAME field2
JUSTIFICATION
CASECONVERT
DEFAULTVALUE
TRANSFERBYTES 0
COLUMNOFFSET 0
FIELDOFFSET 0
FIELDMINIMUM
FIELDMAXIMUM
FILLCHARACTER
PICTURE
FUNCTION
STORAGECODING
BLOBCOLUMN
END SEQUENCE
BEGIN SEQUENCE
COLUMNNAME field6
FIELDNAME field6
JUSTIFICATION
CASECONVERT
DEFAULTVALUE
TRANSFERBYTES 0
COLUMNOFFSET 0
FIELDOFFSET 0
FIELDMINIMUM
FIELDMAXIMUM
FILLCHARACTER
PICTURE
FUNCTION
STORAGECODING
BLOBCOLUMN
END SEQUENCE
END OBJECT
BEGIN OBJECT UNLOADJOB blog4_unload
PROJECT blog4
DEVICE blog4_unl_devarrary
MAP blog4_unloadmap
FILTER
SERVER cheetah04
DATABASE blog
REJECTFILE /tmp/blog4_unload.rej
LOGFILE /tmp/blog4_unload.log
ISOLATIONLEVEL DR
MAXERRORS 0
END OBJECT
BEGIN OBJECT LOADJOB blog4_load
PROJECT blog4
DEVICE blog4_load_devarrary
MAP blog4_loadmap
FILTER
SERVER cheetah04
DATABASE blog
FLTFILE /tmp/blog4_load.flt
REJECTFILE /tmp/blog4_load.rej
LOGFILE /tmp/blog4_load.log
RUNMODE E
GENERATEVIORECS Y
TAPES 0
NUMRECORDS 0
STARTRECORD 0
MAXERRORS 0
END OBJECT
> onpladm create object -F blog4.spec
Successfully created object DEVICEARRAY blog4_unl_devarrary
Successfully created object DEVICEARRAY blog4_load_devarrary
Successfully created object PROJECT blog4
Successfully created object QUERY blog4_query
Successfully created object DELIMITEDFORMAT blog4_format
Successfully created object UNLOADMAP blog4_unloadmap
Successfully created object LOADMAP blog4_loadmap
Successfully created object UNLOADJOB blog4_unload
Successfully created object LOADJOB blog4_load
The blog4_query QUERY object defines what data we are going to unload via a simple select statement.
The blog4_format DELIMITEDFORMAT object defines how our unload/load file will be structured, a standard pipe delimited ASCII file.
The blog4_unloadmap UNLOADMAP object links the field2 and field6 columns we are pulling from blog4_query to the field positions in the blog4_format unload file.
The blog4_loadmap LOADMAP object links the delimited fields in the blog4_format load file to the field2 and field6 columns of our load table.
The blog4_unload UNLOADJOB object defines a job that will unload our data using the blog4_unloadmap object (which is made of up from our query and format objects) and blog4_unldevarray.
And finally, the blog4_load LOADJOB object defines a load job that will load our data using the blog4_loadmap to the blog4_loaddevarray.
Run the unload and load jobs in parallel with the onpload command like we did in part 2 and HPL will pull field2 and field6 from the unload table and load them into field2 and field6 of the load table.
I hope this information has helped shed some light on HPL and it's mysterious voodoo like powers that enable it to quickly load and unload sick amounts of data. The fun doesn't have to end here, everything we've done so far has been with the default HPL configuration. If you want to make things run even faster you can take a look at the IDS docs and fine tune the HPL config paramters stored in the $INFORMIXDIR/etc/pconfig.std and try adding multiple devices to your device array.
Trackback Pings
TrackBack URL for this entry:
http://www.ibmdatabasemag.com/blog/main/archives/2008/07/the_informix_hi_2.html
« DB2 LUW Performance: Statement Analysis Introduction - Hunting Elephants and Mosquitoes | Main | DB2 9 DBA Certification: Create and insert statements, plus dbmtrk »
This is a public forum. CMP Media and its affiliates are not responsible for and do not control what is posted herein. CMP Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.
Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Media's Terms of Service.
Important Note: This comment area is NOT intended for commercial messages or solicitations of business.
