<?xml version="1.0" encoding="iso-8859-1"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>IBM Database Magazine</title><link>http://ibmdatabasemag.com</link><description></description><language>en-us</language><copyright>Copyright 2006, CMP Media.</copyright><item><title><![CDATA[Top Stories on IBMDatabaseMag.com: April]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=207400341&cid=RSSfeed]]></link><description><![CDATA[Oldies But Goodies Join Emerging Technologies]]></description><pubDate>Thu, 17 Apr 2008 14:40:00 EDT</pubDate><keywords><![CDATA[Top Stories]]></keywords><blurb><![CDATA[Oldies But Goodies Join Emerging Technologies]]></blurb><authors><![CDATA[]]></authors><body><![CDATA[<P>
Take a spin through the most popular article statistics for ibmdatabasemag.com and you&rsquo;ll find that good advice has impressive staying power. Top-read articles this month (not including blog posts) include: <br /><br />
&bull; <a href="http://www.ibmdatabasemag.com/showArticle.jhtml?articleID=15300107">Say What? Plans + DBRMS + Packages + Collections + Versions = Confusion</a><br />
No matter how long programmers have worked with DB2 for z/OS and OS/390, they still ask me to tell them the difference between a plan and a package, and what in the heck a collection is. I planned to write a column on this topic so I could just point them to DB2 Magazine, Quarter 4, 2003, and let the column answer the questions. Well, one column became two, and two morphed into three. And if they keep changing DB2 (as they will), the three may one day become four.
<br /><br />
&bull; <a href="http://www.ibmdatabasemag.com/showArticle.jhtml?articleID=193105293">Embedding SQL in Unix Scripts</a><br />
One of the advantages of Unix and Linux is the ability to use scripts for developing systems and programs. Lester Knutsen introduces the concept of using shell scripts with embedded SQL to access Informix databases.
<br /><br />
&bull; <a href="http://www.ibmdatabasemag.com/showArticle.jhtml?articleID=191600717">Configuring DB2 9 with WebSphere Application Server 6.1</a><br />
Properly configuring data and application servers is the first step in any application development project. Deepak Vohra explains how to configure DB2 9 with WebSphere Application Server (WAS) 6.1, a J2EE-based application platform commonly used to develop JDBC applications. <br /><br />
&bull; <a href="http://www.ibmdatabasemag.com/showArticle.jhtml?articleID=202805692">Monitoring and Tuning Memory in DB2 9 for Linux, Unix, and Windows</a><br />
DB2 9's self-tuning memory manager, which dynamically tunes to a specific workload, is one of many memory enhancements in that release. Understanding how DB2 memory usage works lets you get the most out of the new capabilities. <br /><br />
&bull; <a href="http://www.ibmdatabasemag.com/showArticle.jhtml?articleID=206800839">IBM Data Studio: Dawn of a New Era in Enterprise Data Management</a><br />
IBM's ambitious vision for its IBM Data Studio suite, which works with DB2 and Informix today and other vendors' platforms in the future, will change the way data is handled from creation through destruction.
<P>
<br /><br clear="all" />
<P>
</p>]]></body></item><item><title><![CDATA[Editor's Note: DB2 Magazine is now IBM Database Magazine ]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800435&cid=RSSfeed]]></link><description><![CDATA[The magazine's new name reflects its broad coverage of DB2, Informix, U2, and related technologies and solutions. ]]></description><pubDate>Mon, 25 Feb 2008 12:55:00 EST</pubDate><keywords><![CDATA[DB2 Magazine, IBM Database Magazine, Informix Dynamic Server, IBM Data Studio, Kim Moutsos, DB2, Universe, UniData]]></keywords><blurb><![CDATA[The magazine's new name reflects its broad coverage of DB2, Informix, U2, and related technologies and solutions. ]]></blurb><authors><![CDATA[Kim Moutsos]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/moutsos_kim.jpg" alt="Kim Moutsos" class="Image_Float-Left" border="1" height="90" width="90">Welcome to the first issue of <em>IBM Database Magazine</em> &mdash; or, more accurately, the first issue in which that name appears on the cover. Truth be told, this publication (more recently called <em>DB2 Magazine</em>) has been about more than just DB2 for years.<br />
<br />
When the magazine launched in 1996, DB2 on the distributed platforms was brand new, the future of the mainframe and IMS seemed precarious, and Informix Corp. was flying high. Then came a period of turbulence for Informix and the ground-shaking technology explosion of the e-business boom.<br /><br />
Of course, most of us remember what happened next. The mainframe regained its status as an excellent choice for securely processing the massive volume of transactions the Internet era spawned. DB2 entered a period of innovation and launched early forms of several components that make up IBM's Information On Demand offerings today, including the new InfoSphere Warehouse with Optim Data Retention (just announced at press time), which grew out of DB2 Warehouse. And after Informix Corp. suffered through management crises and marketing lapses that undermined its well-loved technology offerings, IBM acquired the company's database management software assets in 2001.</p>
<P>
Since that time, IBM's portfolio has grown organically and through more than 20 acquisitions to cover a full spectrum of Information On Demand capabilities. It now includes a complementary range of database management software (DB2, Informix, IMS, UniVerse, and UniData).</p>
<P>
Readers have followed many of these DBMS (or data server) options for years. Yet for newcomers, the breadth of the magazine's coverage wasn't obvious. It seemed to be a magazine about DB2, period. The new name more accurately reflects the magazine's content.</p>
<P>
You'll still find your favorite columnists and topics (deep dives into DB2 with Robert Catterall and Roger Sanders and into Informix with Lester Knutsen, for example). But we'll continue to branch out to cover even more topics that are of interest regardless of which data server you're using. For example, one article covers <a href="/story/showArticle.jhtml?articleID=206800839">IBM Data Studio</a>, a new approach to managing data throughout its life cycle that works with DB2 and Informix (and even, eventually, non-IBM platforms). IBM's Deb Jenson gives some universal advice about <a href="/story/showArticle.jhtml?articleID=206800841">data governance tools</a> and best practices along with some specifics for DB2 and Informix.</p>
<P>
I hope you enjoy our expanding coverage. Stop by <a href="http://www.ibmdatabasemag.com">ibmdatabasemag.com</a> to talk about the change (see my blog post) or check out the online-only content. While there, you can participate in one of the monthly polls, post to the wiki, or sign up for the e-newsletter or digital edition. I look forward to exploring new topics with you.</p>
<P>
<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_editpage_poll.jpg" border="0" width="250"></p>]]></body></item><item><title><![CDATA[IBM Data Studio: Dawn of a New Era in Enterprise Data Management]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800839&cid=RSSfeed]]></link><description><![CDATA[IBM's ambitious vision for its IBM Data Studio suite, which works with DB2 and Informix today and other vendors' platforms in the future, will change the way data is handled from creation through destruction. ]]></description><pubDate>Mon, 25 Feb 2008 05:00:14 EST</pubDate><keywords><![CDATA[IBM Data Studio, Reducing Labor Costs, Enforcing Compliance, Application Developers, Data Life Cycle, Data Governance, Enterprise Data Management, DB2, Informix, pureQuery, Developer Collaboration, Database Administration, Database Security]]></keywords><blurb><![CDATA[IBM's ambitious vision for its IBM Data Studio suite, which works with DB2 and Informix today and other vendors' platforms in the future, will change the way data is handled from creation through destruction. ]]></blurb><authors><![CDATA[Cynthia Harvey]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/feature_image1sm.jpg" alt="Dawn of a New Era" class="Image_Float-Left" border="1" width="250" />
<strong>IBM has a grand dream for enterprise data management: to create a single toolset that can manage every aspect of the data life cycle. This toolset would offer built-in compliance and easy integration with other IBM tools. And it would work equally well with IBM data servers (including DB2 and Informix Dynamic Server) and with data management technology from Oracle, Microsoft, Sybase, and other major players.</strong><br />
<br />
The company took a major step toward achieving that dream in October 2007, when it launched IBM Data Studio 1.1 at the IBM Information On Demand conference in Las Vegas. The free download, the first incarnation of the vision, supported DB2 and Informix Dynamic Server and included entity relationship diagramming, an SQL builder, an XML editor, pureQuery for Java, security access tools, data management tools, and other features. But that was just the first step. Work is underway to broaden the Data Studio family to include all the tools needed in every phase of data's existence.</p>
<P>
Of course, tools exist today that tackle each of these areas separately. Why should organizations be interested in a new toolset when they're already choosing from a smorgasbord of data management tools from different vendors? And how will staffers in the many different roles that tend to the life cycle (from design, to development, to deployment, to management, to governance) react to the notion of replacing their individual favorite tools with a new set?</p>
<P>
Curt Cotner, IBM Fellow, vice president, and chief technology officer for IBM Data Servers, acknowledges the challenge but believes the benefits inherent in the comprehensive approach set IBM Data Studio apart: &quot;The value proposition we're bringing to the table is that we have a single set of components that are doing all the work.&quot; Because the components are all related under the covers, collaboration improves, and costs drop because customers no longer need to integrate tools from various vendors.</p>
<P>
Perhaps most importantly, data governance and regulatory compliance are simplified and improved. Why? Because the integrated components enable the rules, set once (for example, in the design stage), to follow the data throughout its life, no matter which team is responsible for the data at any given moment.</p>
<h3>THE DATA LIFE CYCLE DEFINED</h3>
<P>
To understand the benefits of the complete toolset vision, it helps to picture the data life cycle as a whole. As IBM sees it, data goes through five distinct stages during its life: design, development, deployment, management, and governance (see Figure 1).</p>
<P>
<strong>Figure 1: The Data Life cycle. </strong><a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f1_fig1_large.jpg" target="_blank">Click here</a> for larger figure.<br />
IBM's view of the data life cycle includes five distinct stages: design, development, deployment, management, and governance.
Along the way, data passes through the hands of many different team members.<br /><a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f1_fig1_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f1_fig1.jpg" alt="Figure 1: The Data Life cycle." width="450" height="299" border="0"></a></p>
<P>
In the <strong>design</strong> stage, business analysts and database architects create the data models that will be used to develop applications and databases that meet the organization's needs. IBM's Data Studio tools for this stage currently include logical modeling and entity relationship diagramming, with physical modeling capabilities coming soon.</p>
<P>
The next two stages, <strong>develop</strong> and <strong>deploy</strong>, are closely linked, and the tasks associated with both stages are frequently performed by people whose titles include the word &quot;developer.&quot; Under IBM's framework, the development stage refers specifically to application development, often involving Java, .Net, PHP, Ruby, or COBOL programming. The deploy stage covers database development, usually in SQL or another query language.</p>
<P>
<strong>Manage</strong> encompasses the tasks usually performed by a DBA, such as day-to-day administration, configuration, performance tuning, change management, and backup and recovery.</p>
<P>
In the final stage, <strong>govern</strong>, security administrators take responsibility both for securing the organization's data resources and for ensuring that the organization complies with all relevant regulations. This stage also includes auditing, encryption, archiving, and, ultimately, data destruction once the retention period has elapsed.</p>
<P>
Cotner is quick to point out that this is an iterative cycle, not a one-time process: &quot;Typically you go through the cycle and discover changes that need to be made. So you go through the life cycle again making the changes. It's like a constantly evolving wheel.&quot;</p>
<h3>DATA STUDIO AND COLLABORATION</h3>
<P>
According to Cotner, application experts and database experts worked much more closely together in the past than they do today: &quot;If you go back in time 15 years, the DBAs typically knew the COBOL programming language well enough that they could be active consultants to the application developers. They would help developers design their applications and code the SQL in an efficient way, and provide advice on how to get the best throughput.&quot;</p>
<P>
By contrast, much of today's application development work takes place in Java, often using highly complex frameworks like Hibernate and OpenJPA. Most database developers and DBAs aren't familiar with these tools, and as a result, they're unable to offer advice or optimize the database properly.</p>
<P>
&quot;It's really a kind of cultural problem,&quot; says Cotner. &quot;These Java frameworks that everybody uses are sufficiently abstracted away from the database so that when the DBA is talking to the developer, they aren't really speaking the same language any more. The DBA looks at what the developer is coding, and it's not obvious what that sequence of instructions would do to the database.&quot;</p>
<P>
Data Studio's answer to this cultural barrier is a new data access solution, pureQuery, which both application developers and database specialists can easily understand. To assist application developers, pureQuery automatically creates all the necessary SQL statements from within their Java editor, so they don't have to become SQL experts. And pureQuery assists DBAs by highlighting SQL statements within the Java code so that database experts can easily discern what impact the code will have on their databases. Importantly, pureQuery is an enhancement to the standard Eclipse Java editor, so the learning curve is minimal for anyone familiar with that editor. (For more in-depth information on pureQuery, see &quot;<a href="/showArticle.jhtml?articleID=202400140">The Easy Way to Quick Data Access</a>,&quot; Issue 3, 2007, DB2 Magazine.)</p>
<P>
Along with a common language, Data Studio provides a common look and feel to the tools used throughout the data life cycle. When everyone speaks the same language and can easily learn to use the same tools as their colleagues, collaboration becomes much more likely. A shared toolset also makes it easier for IT staff to move from one position to another, giving CIOs the benefit of a more fluid workforce.</p>
<h3>THE KEY TO COMPLIANCE</h3>
<P>
Today, organizations are being asked to comply with more regulations than ever. With HIPAA, Sarbanes-Oxley, GLBA, PCI standards, state and local regulations, plus international regulations all placing demands on enterprise data management, ensuring compliance throughout the data life cycle is a challenge.</p>
<P>
As one example, Cotner points to the PCI standards, which must be followed by any organization that deals with Visa or MasterCard transactions. PCI requires that credit card numbers and PINs be encrypted, and that only people with a business need be able to see the numbers. And if credit card information is used for testing purposes, it must be anonymized. As DBAs conduct backup and recovery operations and developers copy records from a production environment to a test environment, it can be very difficult to ensure that all rules are being followed, particularly when staff use different tools from different vendors.</p>
<P>
With Data Studio, all the pieces of the suite will work together. As the security administrator establishes policies, those rules will be enforced by all components. Cotner explains, &quot;As you configure the software for your system, it learns which columns contain credit card numbers, which contain Social Security numbers, which contain phone numbers that shouldn't be used for marketing purposes. As it learns these things, each subsequent step of the life cycle becomes easier to manage because you don't have to reconfigure and reestablish governance policies.&quot;</p>
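<P>
The idea that a policy set once follows the data into later stages can be illustrated with a minimal Python sketch. It assumes a hypothetical GovernancePolicy registry and column labels invented for this example; it is not Data Studio's API. A security administrator classifies columns once, and a later step that copies production rows to a test environment consults those classifications to mask card numbers (keeping only the last four digits, a common masking convention) and to replace Social Security numbers.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical classification labels, invented for this sketch.
CARD_NUMBER = "card_number"
SSN = "ssn"


def mask_card(value: str) -> str:
    """Keep only the last four digits of a card number; star out the rest."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


def mask_ssn(value: str) -> str:
    """Test data never needs a real Social Security number."""
    return "XXX-XX-XXXX"


MASKERS: Dict[str, Callable[[str], str]] = {CARD_NUMBER: mask_card, SSN: mask_ssn}


@dataclass
class GovernancePolicy:
    """Column classifications recorded once and consulted by every later step."""

    classifications: Dict[str, str] = field(default_factory=dict)

    def classify(self, column: str, label: str) -> None:
        self.classifications[column] = label

    def copy_to_test(self, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
        """Return masked copies of production rows; unclassified columns pass through."""
        masked_rows = []
        for row in rows:
            masked = {}
            for column, value in row.items():
                label = self.classifications.get(column)
                masked[column] = MASKERS[label](value) if label else value
            masked_rows.append(masked)
        return masked_rows


if __name__ == "__main__":
    policy = GovernancePolicy()
    policy.classify("card_no", CARD_NUMBER)  # set once, e.g. by the security administrator
    policy.classify("ssn", SSN)
    production = [{"name": "A. Smith", "card_no": "4111 1111 1111 1111", "ssn": "123-45-6789"}]
    print(policy.copy_to_test(production))
```

<P>
Because the masking rules live with the column classifications rather than in each copy script, classifying a new column changes every downstream copy at once. A real product would also have to deal with encryption keys, consistent masking across related tables, and many more data types.</p>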
<P>
Data Studio's pureQuery also plays a role in helping organizations improve compliance. Many products available today rely heavily on dynamic SQL, which is nearly impossible to audit. By contrast, pureQuery relies heavily on static SQL, which locks in access paths and clearly associates each SQL statement with a specific business application so that auditors can determine exactly which actions were taken and why.</p>
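<P>
The auditability argument can be illustrated by analogy. The following Python sketch uses the standard-library SQLite module and a hypothetical StatementCatalog class: an application may run only statements registered in advance under a name, and each execution is logged with the application and statement name. It does not reproduce DB2 static SQL, where statements are bound into packages and their access paths are fixed at bind time; it only shows why a fixed, named set of statements is easier to audit than arbitrary SQL text assembled at run time.</p>

```python
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


class StatementCatalog:
    """Hypothetical stand-in for a bound package: a fixed set of named SQL
    statements owned by one application. Only catalogued statements run, so
    every execution maps back to a known application and statement."""

    def __init__(self, application: str, statements: Dict[str, str]) -> None:
        self.application = application
        self._statements = dict(statements)
        self.audit_log: List[Tuple[str, str, str, Tuple[Any, ...]]] = []

    def run(self, conn: sqlite3.Connection, statement_id: str, *params: Any) -> List[Tuple[Any, ...]]:
        if statement_id not in self._statements:
            # SQL text assembled at run time is refused instead of executed.
            raise PermissionError(f"{statement_id!r} is not catalogued for {self.application}")
        # Record who ran what, and with which parameters, before executing.
        when = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((when, self.application, statement_id, params))
        return conn.execute(self._statements[statement_id], params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("INSERT INTO orders (amount) VALUES (19.99)")
    billing = StatementCatalog("billing_app", {"GET_ORDER": "SELECT id, amount FROM orders WHERE id = ?"})
    print(billing.run(conn, "GET_ORDER", 1))
    print(billing.audit_log)
```

<P>
An auditor reading the log sees only named, reviewed statements tied to one application, which is the property the article attributes to static SQL.</p>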
<h3>SLASHING LABOR COSTS</h3>
<P>
Every CIO is under constant pressure to reduce IT costs. Data Studio assists with that challenge by greatly reducing the labor associated with each stage in the data life cycle. <br /><br />
Cotner cites industry analysts who report that most organizations spend 70 percent of their IT budget on people costs, with only 30 percent going toward hardware and software. &quot;When you bring together all of the different hardware and software solutions from the array of different vendors that you deal with, there's a tremendous amount of human labor associated with being the systems integrator for all of those piece parts,&quot; Cotner says. &quot;If we can greatly reduce the human labor (and our goal is to cut it by 50 percent for these activities), that would really be a cost savings for the customer.&quot;</p>
<P>
What would companies do with those labor savings? Cotner says most would put their people to work on more valuable projects. &quot;If we can reduce the human labor associated with these activities, then the IT organizations could use the labor savings to work on new projects that would help the business rather than the sort of tedious labor that's associated with configuring all these tools and making them talk with one another.&quot;</p>
<h3>THE NEXT STEP</h3>
<P>
While Data Studio already offers a number of important benefits, it's still a work in progress. &quot;We're not there yet,&quot; acknowledges Cotner. &quot;Nobody has an enterprise data management solution that addresses the entire data life cycle, including us. What we do have is the vision of where we want to be.&quot;</p>
<P>
Today, Data Studio 1.1 is a free solution that addresses some of the needs of each stage in the data life cycle for customers using DB2 and IDS, with limited compatibility with other databases. In the near future, IBM's other existing tools for database management will be integrated with Data Studio. They will continue to offer the functionality that DBAs enjoy today, plus the benefits of being part of a comprehensive suite. Eventually, Cotner and team plan to include an even more extensive lineup of tools, some of which will be available for free and some of which will require an upgrade fee. Ultimately, they hope that Data Studio will give their customers the option of a true enterprise-wide data management solution that spans the data life cycle.</p>
<P>
<em><strong>Cynthia Harvey</strong> is a Boise, Idaho-based freelance writer who specializes in science and technology.</em></p>
<div class="Article_Sidebar_Larger"> 
<h3>CUSTOMERS WEIGH IN</h3>
<P>
According to Anjul Bhambhri, IBM's director of tools and partner enablement for data servers, customers have shown the most interest in three aspects of Data Studio: its end-to-end nature, its use of pureQuery, and its integrated application development environment for SQL, XML, and Java.</p>
<P>
<strong>David Beulke</strong><br>
<em>Pragmatic Solutions Inc.</em></p>
<P>
IBM Gold Consultant (and contributor to this magazine) Dave Beulke of Pragmatic Solutions echoes Bhambhri's assessment of the importance of pureQuery. He was involved in the Data Studio beta trials and has begun testing the product with some of his consulting clients. In conversations with IBM Database Magazine, Beulke sang the praises of pureQuery's ability to use static SQL, improving performance and auditability. &quot;The ability to use static SQL with pureQuery is huge,&quot; said Beulke. &quot;Recently, I worked with a client who could reduce CPU usage by 7 percent thanks to this one feature.&quot;</p>
<P>
<strong>Jean-Marc Blaise</strong><br>
<em>IDB Consulting</em></p>
<P>
Another beta tester, Jean-Marc Blaise of IDB Consulting, told IBM Database Magazine that he has begun using Data Studio when he works on client sites. Blaise has been particularly impressed with the unified debugger in the application development environment. Blaise explained, &quot;If you have a procedure today, debugging that procedure can be very time consuming. But using the unified debugger and the profiling tools in Data Studio, you can pinpoint which SQL statement is going wrong. This makes debugging 40 percent faster.&quot;</p>
</div>
<div class="Article_Sidebar_Larger"> 
<table border="1" cellpadding="20" cellspacing="0" bordercolor="#FF3300">
<tr bgcolor="#FF9966"><td colspan="3" align="center" valign="top" bordercolor="#FF3300">
<h3>IBM DATA STUDIO FAMILY</h3>
<P>
The IBM Data Studio family includes a free download plus additional features and support in the fee-based version.</p></td></tr>
<tr><td width="40%" valign="top" bordercolor="#FF3300" bgcolor="#FFFF99">
<h3>IBM DATA STUDIO TODAY</h3>
<P>
IBM Data Studio is available as a no-charge download. The download includes:</p>
<ul>
<li>Entity Relationship (ER) Diagramming</li>
<li>Data Distribution Viewer</li>
<li>Integrated Query Editor</li>
<li>SQL Builder</li>
<li>SQL Routine Debugger</li>
<li>Java Routine Debugger</li>
<li>XML Editor</li>
<li>XML Schema Editor</li>
<li>Data Web Services</li>
<li>Object Management</li>
<li>Data Management</li>
<li>Update Statistics</li>
<li>Visual Explain</li>
<li>Security Access Control</li>
<li>Project Management</li>
<li>pureQuery for Java (without static SQL feature)</li>
</ul></td>
<td align="center" valign="middle" bordercolor="#FF3300"> <span style="font-size:26px;color:#FF3300;font-weight:bold;">&gt;&gt;&gt;&gt;</span></td>
<td width="40%" valign="top" bordercolor="#FF6633" bgcolor="#FFFF99">
<h3>IBM DATA STUDIO DEVELOPER</h3>
<P>
To make full use of advanced pureQuery features and obtain pureQuery support from IBM, you need to purchase IBM Data Studio Developer and IBM Data Studio pureQuery Runtime. IBM Data Studio Developer is a fully supported product that extends the capabilities of Data Studio to develop and test pureQuery applications. When you're ready to deploy your pureQuery applications, IBM Data Studio pureQuery Runtime is required to extend the capabilities of your Java application servers to host pureQuery solutions.</p>
<P>
IBM Data Studio Developer includes:</p>
<ul>
<li>All capabilities provided by the free IBM Data Studio</li>
<li>Static SQL support for pureQuery; however, the runtime to deploy pureQuery with static SQL is not included</li>
<li>pureQuery tooling, with the ability to bind static SQL.</li>
</ul>
<P>
Deploying applications that use pureQuery's static SQL capabilities requires IBM Data Studio pureQuery Runtime, which includes a fully licensed pureQuery runtime with support from IBM.</p></td></tr></table></div>
<div class="Article_Sidebar_Larger">
<h3>THE DATA STUDIO FUTURE</h3>
<P>
More components will be integrated into the various life-cycle stages IBM Data Studio manages in the coming year. Some will be part of the free download; others will be paid features. Here are the features likely to be added in the near future:</p>
<ul>
<li><strong>Rational Data Architect</strong> will become a fully integrated modeling component of Data Studio and will remain a paid feature.</li>
<li><strong>The Data Studio Administration Console</strong>, which provides monitoring capabilities, will be a component of the free offering and is currently available as a technology preview.</li>
<li>Paid features for performance management, change management, query tuning, problem determination using pureQuery, and a test data generator are all likely additions in 2008.</li>
</ul>
</div>
<P>
<div class="Article_Sidebar_Larger">
<h3>6 Ways Data Studio Saves Time</h3>
<ol>
<li>The new pureQuery language slashes programming time by up to 50 percent.</li>
<li>Web-based monitoring tools allow DBAs to monitor databases anywhere, anytime.</li>
<li>Integrated, industry-specific XML standards simplify development.</li>
<li>Drag-and-drop creation of Web services speeds SOA and Web 2.0 initiatives.</li>
<li>Support for static SQL provides faster database access, makes it easier to find and resolve bugs, and improves auditability.</li>
<li>Eclipse-based environment allows users to get up to speed quickly.</li>
</ol>
</div>]]></body></item><item><title><![CDATA[Enterprise Data Management: Governing for Data Security ]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800841&cid=RSSfeed]]></link><description><![CDATA[Regulatory compliance is important, but it isn't the only factor driving data governance adoption. The need to secure private information and keep financial information accurate should be top of mind for every organization. IBM's Deb Jenson presents data governance best practices hold true for every data server environment, with specifics drawn from DB2 and Informix. ]]></description><pubDate>Mon, 25 Feb 2008 05:00:13 EST</pubDate><keywords><![CDATA[DB2 9 Data Security, Informix Dynamic Server Security, Label Based Access Control, Data Governance, Enterprise Data Management, Data Auditing, Data Encryption, Data Vulnerability, Regulatory Compliance, SOX, HIPAA]]></keywords><blurb><![CDATA[Regulatory compliance is important, but it isn't the only factor driving data governance adoption. The need to secure private information and keep financial information accurate should be top of mind for every organization. IBM's Deb Jenson presents data governance best practices hold true for every data server environment, with specifics drawn from DB2 and Informix. ]]></blurb><authors><![CDATA[Deb Jenson]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/feature_image2sm.jpg" alt="Governing for Data Security" class="Image_Float-Left" border="1" width="250" />
Not so long ago, the always-on phenomenon of Internet computing pushed performance and availability to the top of IT departments' lists. Today, one concern tops even those: data governance. The basic organizational need to keep unauthorized eyes away from sensitive data (corporate secrets, customer data, personnel files, and so on) becomes even more urgent and complicated when government and industry regulations enter the mix. Many of these regulations require specific safeguards, as well as the ability to prove the safeguards are in place. And no organization wants to make the news for a breach in security.<br /><br />Data governance has emerged as a quality control discipline for assessing, managing, using, improving, monitoring, maintaining, and protecting organizational information. How does that discipline apply to the data server? What's the best approach to secure a data server and protect the data it holds? I'll explain what steps you should take and how to combine features found in most data servers today with third-party tools to address security and privacy concerns.</p>
<h3>WHAT IS DATA GOVERNANCE?</h3>
<P>
Corporate financial fraud (typically, misrepresentations of balance sheet figures by executive management) has dominated business headlines in the last seven years. The Sarbanes-Oxley Act, which requires every public company to adhere to strict financial procedures, was passed to address this kind of fraud. Sarbanes-Oxley holds executive officers responsible for any fiscal errors. Executive officers must sign off on all financial reports and face stiff consequences (large monetary fines and imprisonment) if irregularities occur. As a result, many companies now have corporate auditors who review all updates made to financial data to ensure integrity. Nobody wants their CEO to go to prison.</p>
<P>
But data governance isn't just about putting best practices in place to ensure regulatory compliance; it's about protecting data from internal or external security breaches. If you have sensitive data, it's just good business practice to ensure that data is safe. According to a recent survey by PGP Research, the average cost of a data breach is $4.8 million, and the real costs incurred range from $2 million to $22 million.</p>
<P>
Data governance can certainly be used to address the various regulatory compliance challenges a company faces, but the primary threat it should address is a security breach. The two most common threats are exposure of private data and tampering or altering of data.</p>
<P>
Many recent regulatory acts focus on the privacy of personal data &mdash; in other words, information pertaining to a specific individual. Whether that individual is a customer or an employee is irrelevant, as each individual has a right to privacy. Data privacy laws exist worldwide; companies face the strictest laws in Japan, and most companies in the United States are affected by state regulations in addition to industry and federal regulations (see Figure 1, below). Many state privacy regulations require organizations that have exposed personal information due to a security breach to immediately inform the individuals (and government officials) so that affected parties can begin to mitigate damage as quickly as possible. Typically, disclosure of security breaches is not required if the data involved was encrypted &mdash; as a result, more corporations are encrypting personal data.</p>
<P>
<b>Figure 1. Most states have notification requirement laws.</b> <a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_fig1_large.jpg" target="_blank">Click here</a> for larger figure. <br />
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_fig1_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_fig1.jpg" alt="Figure 1. Most states have notification requirement laws." width="450" height="215" border="0"></a></p>
<P>
Other forms of confidential information (financial data, trade secrets and so on) need protection, too, and should be included in any data protection plan.</p>
<P>
Data protection comes in proactive and reactive forms (see Figure 2, below), and many organizations need a combination of both. Proactive strategies prevent data breaches; reactive strategies detect breaches that have already occurred, ideally as quickly as possible.</p>
<P>
<b>Figure 2. Forms of data server protection.</b><br/><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_fig2.jpg" alt="Figure 2. Forms of data server protection." width="450" height="350"></p>
<h3>ACCESS CONTROL</h3>
<P>
One common approach to proactive data protection is to enforce strict access-control policies that grant access to the data server only to specific individuals and lock out everyone else.</p>
<P>
Most data servers offer an additional layer of protection that restricts access to specific rows or columns rather than to a whole table. In DB2 and Informix Dynamic Server, this feature is called label-based access control (LBAC). LBAC lets an organization restrict access at the row or column level by comparing the security labels granted to users and groups with the labels that protect individual rows and columns. Another approach is to create views and grant access only to the views, not to the underlying tables; users can then see only the data the view returns.</p>
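<P>
The label comparison that LBAC performs can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the DB2 or Informix implementation: it assumes a single ordered label component (the hypothetical <code>Level</code> values), a simple read rule (a user may read rows labeled at or below the user's own level), and an in-memory list of rows. The real products enforce labels inside the engine, configure them through SQL security policies, and support richer label components. The hypothetical <code>name_only_view</code> function mimics the view approach by exposing a single column.</p>

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered label component: a higher value is more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3


@dataclass(frozen=True)
class Row:
    employee: str
    salary: int
    label: Level  # label protecting this row


def visible_rows(rows, user_label):
    """Simplified read rule: a user may read rows whose level is at or below the user's level."""
    return [r for r in rows if user_label >= r.label]


def name_only_view(rows, user_label):
    """View-style projection on top of row filtering: the salary column is never exposed."""
    return [r.employee for r in visible_rows(rows, user_label)]


if __name__ == "__main__":
    table = [
        Row("alice", 95000, Level.INTERNAL),
        Row("bob", 120000, Level.CONFIDENTIAL),
        Row("carol", 250000, Level.SECRET),
    ]
    print([r.employee for r in visible_rows(table, Level.CONFIDENTIAL)])  # ['alice', 'bob']
    print(name_only_view(table, Level.INTERNAL))  # ['alice']
```

<P>
Layering the projection on top of the row filter mirrors the combination the paragraph describes: labels decide which rows a user sees, and the view decides which columns.</p>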
<P>
The following is a list of access control practices, some of which are specific to DB2; however, the underlying concepts can be applied to other data servers:</p>
<ul>
<li>Remove all PUBLIC privileges, particularly in production.</li>
<li>Create databases with the RESTRICTIVE keyword (in DB2).</li>
<li>Never grant privileges to PUBLIC for convenience, even in test settings.</li>
<li>Explicitly name the SYSADM group and keep its membership as small as possible (in DB2).</li>
<li>Separate DBA privileges from security privileges by using the SECADM authority (new in DB2 9).</li>
<li>Restrict privileges on a &quot;need to know&quot; basis.</li>
<li>Consider using LBAC to restrict access to specific rows and columns.</li>
<li>Use views to further restrict access.</li>
<li>Grant access only through stored procedures, rather than on the underlying tables or views.</li>
</ul>
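<P>
Some of these practices lend themselves to an automated check. The sketch below reviews a list of grants and flags any grant to PUBLIC and any direct grant on a base table to a non-administrator. The <code>Grant</code> record, the <code>review_grants</code> function, and the <code>ADMIN_GRANTEES</code> set are hypothetical; in practice you would populate the list from your data server's catalog (in DB2 for Linux, Unix, and Windows, table and view privileges appear in the SYSCAT.TABAUTH catalog view), and the mapping from catalog columns to these fields is an assumption left to the reader.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    grantee: str      # user, group, or role; "PUBLIC" means everyone
    privilege: str    # for example "SELECT" or "EXECUTE"
    object_type: str  # "TABLE", "VIEW", or "PROCEDURE"
    object_name: str


# Hypothetical: the only grantees allowed to hold privileges directly on base tables.
ADMIN_GRANTEES = {"DBADMIN"}


def review_grants(grants):
    """Return (grant, reason) pairs that break the least-privilege practices."""
    findings = []
    for g in grants:
        grantee = g.grantee.upper()
        if grantee == "PUBLIC":
            findings.append((g, "privilege granted to PUBLIC"))
        elif g.object_type.upper() == "TABLE" and grantee not in ADMIN_GRANTEES:
            findings.append((g, "direct table access; grant on a view or procedure instead"))
    return findings


if __name__ == "__main__":
    sample = [
        Grant("PUBLIC", "SELECT", "TABLE", "HR.EMPLOYEE"),
        Grant("PAYROLL_APP", "SELECT", "TABLE", "HR.SALARY"),
        Grant("PAYROLL_APP", "EXECUTE", "PROCEDURE", "HR.GET_PAYSLIP"),
        Grant("DBADMIN", "CONTROL", "TABLE", "HR.SALARY"),
    ]
    for grant, reason in review_grants(sample):
        print(grant.grantee, grant.privilege, "ON", grant.object_name, "->", reason)
```

<P>
Run against the sample, the check reports the PUBLIC grant and the application's direct SELECT on HR.SALARY, while the EXECUTE grant on the procedure passes.</p>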
<h3>DATA AUDITING</h3>
<P>
Auditing is a reactive form of security, and many regulations worldwide require it. But companies also implement auditing to guarantee the integrity of financial and other sensitive data by making sure no one has tampered with it, which is particularly important for those companies regulated by Sarbanes-Oxley. Auditing business users (for example, employees in human resources and financial departments), IT users with unlimited access, and sensitive data is becoming a standard business practice.</p>
<P>
IT and auditing departments have already realized that the data from database audit collections contains a wealth of information. This audit data contains a chronological record of all database access, which is important to auditors; it can also provide insight into behavioral aspects of how the data is accessed.</p>
<P>
While audit reports can provide insight into how data is being accessed, they cannot prevent a security breach. Auditing is simply reporting on data that has already been accessed. If a security breach occurs, auditing is a useful mechanism for determining how it occurred and the scope of the incident. Both these pieces of information are necessary for analyzing the impact of the breach and the required disclosures.</p>
<P>
Most data servers provide some level of built-in auditing, usually activated by turning on an audit trace. DB2 9.5, released in late 2007, added significant database auditing functionality. The audit trace generates an audit log record each time an action that has been flagged for auditing occurs. The more information you audit, the higher the overhead, so it makes sense to audit only tables that hold sensitive data and users with significant access privileges. The generated records are written to a log file, which most likely requires some data mining and reporting before it is useful. Most companies deploy third-party audit solutions to minimize the manual effort required to manage the audit process and generate audit reports.</p>
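<P>
As a rough illustration of the mining and reporting step, the following sketch counts accesses to sensitive tables per user and action. It assumes the audit records have already been extracted into a simple comma-separated layout of timestamp, user, action, and object; that layout and the <code>SENSITIVE_TABLES</code> set are hypothetical, and the files DB2 produces with its <code>db2audit</code> extract facility have their own, richer format.</p>

```python
import csv
import io
from collections import Counter

# Hypothetical list of tables that hold sensitive data.
SENSITIVE_TABLES = {"HR.SALARY", "FIN.LEDGER"}


def summarize(audit_csv):
    """Count accesses to sensitive tables per (user, action).

    Assumes each record is: timestamp,user,action,object (a hypothetical layout).
    """
    counts = Counter()
    for record in csv.reader(io.StringIO(audit_csv)):
        if len(record) != 4:
            continue  # skip blank or malformed lines
        _timestamp, user, action, obj = record
        if obj.strip().upper() in SENSITIVE_TABLES:
            counts[(user.strip(), action.strip().upper())] += 1
    return counts


if __name__ == "__main__":
    log = (
        "2008-01-15 09:01:02,alice,SELECT,HR.SALARY\n"
        "2008-01-15 09:05:40,bob,SELECT,SALES.ORDERS\n"
        "2008-01-15 23:47:11,dbadmin,UPDATE,FIN.LEDGER\n"
        "2008-01-15 23:47:15,dbadmin,UPDATE,FIN.LEDGER\n"
    )
    for (user, action), n in summarize(log).most_common():
        print(user, action, n)
```

<P>
Filtering to sensitive tables before counting keeps the report focused on the same narrow scope the audit trace itself should cover.</p>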
<P>
When considering third-party audit solutions, look for those with support for your data server platforms and the specific user and table auditing you require. In addition, look for auditing solutions that provide alerts to indicate any suspicious activity that should be investigated. Finally, your auditing solution should be manageable by non-IT individuals through easy-to-use administration and reporting consoles with security permissions separate from those maintained by the data server or operating system.</p>
<h3>DATA ENCRYPTION</h3>
<P>
Data encryption is a critical component of data security. Although most organizations protect themselves from data intrusion through perimeter and firewall security, data breaches continue to occur. Studies have shown that roughly 80 percent of security breaches happen within the firewall. Encryption protects the data from unauthorized access by rendering it meaningless to anyone who doesn't have the encryption key.</p>
<P>
Several regulations require encryption of sensitive data to safeguard individual privacy, and several state privacy laws waive disclosure requirements if a security breach involves encrypted data. Because disclosing a breach can have huge financial implications, encryption is becoming an attractive option. Encrypting data both over the wire (as it moves across the network) and at rest (on disk) is critical, and offline data such as backups must be encrypted as well. Backups are a frequent target for theft because the media (tapes, for example) are easy to steal, and the risk only increases when backups are transported to a remote site.</p>
<P>
Encryption solutions should support both online (data at rest) and offline (backup) encryption, as well as a variety of encryption algorithms and key types (symmetric, asymmetric, and so on). Consider how transparent the solution is to your applications; modifying every application that needs access to the encrypted data could take years of manual effort. Look for solutions that manage keys outside the database they protect (some regulations require this).</p>
<P>
There is some debate about column-level versus file-level encryption. Some companies feel that encrypting only the sensitive data (column level) is more efficient than file-level encryption (see Figure 3). However, most column-level encryption requires modifying the applications that use the data, which could be a tremendous task. Column-level encryption can also affect performance, because decryption is handled at the application level rather than at the file-system level; be sure to stress test any column-level implementation to fully understand its effect on your application.</p>
<P>
<b>Figure 3. Column- and file-level encryption.</b> <a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_fig3_large.jpg" target="_blank">Click here</a> for larger figure.<br />
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_fig3_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_fig3.jpg" alt="Figure 3. Column- and file-level encryption." width="450" height="129" border="0"></a></p>
<P>
A related technique, data masking, can protect data privacy in test environments. Data masking scrambles sensitive data as it is copied from one environment to another (from production to test, for example). It is typically not used in production environments because masking is one-way: there is no way to unscramble the data. Look for solutions that provide realistic data masking, meaning the scrambled data is designed to mimic the real data. For example, if your company decides to scramble Visa card numbers, the masking tool should generate a scrambled number that follows the pattern of a valid Visa number. Data masking can be a good solution for companies that use offshore or outsourced development and want to protect sensitive data during the development process.</p>
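<P>
One way to produce realistic masked card numbers is sketched below: keep the leading 4 that identifies a Visa number, derive the remaining digits from a keyed hash (HMAC-SHA-256) of the original, and recompute the Luhn check digit so the result still passes the checksum that card-number validators apply. The function names, the 16-digit length, and the use of HMAC are assumptions for illustration, not the behavior of any particular masking product. Because the mapping is keyed and one-way, someone who lacks the key cannot recover the original number from the masked value.</p>

```python
import hashlib
import hmac

DIGITS = set("0123456789")


def luhn_check_digit(payload):
    """Luhn check digit for a digit string that does not yet include the check digit."""
    total = 0
    # Starting with the rightmost payload digit, double every second digit.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number):
    """True if the last digit of number is the correct Luhn check digit."""
    return len(number) > 1 and set(number) <= DIGITS and luhn_check_digit(number[:-1]) == number[-1]


def mask_visa(card_number, key):
    """Replace a card number with a 16-digit, Luhn-valid, Visa-shaped value (one-way)."""
    digits = "".join(ch for ch in card_number if ch in DIGITS)
    digest = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    # 14 pseudo-random digits from the keyed hash; the slight modulo bias is harmless for test data.
    body = "".join(str(b % 10) for b in digest[:14])
    payload = "4" + body  # Visa numbers begin with 4
    return payload + luhn_check_digit(payload)


if __name__ == "__main__":
    key = b"example-masking-key"  # hypothetical; store real keys outside source code
    masked = mask_visa("4111 1111 1111 1111", key)
    print(masked, luhn_valid(masked))
```

<P>
Because the same input and key always produce the same output, a card number masked in one table matches its masked counterpart in another, which keeps joins working in the test copy.</p>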
<h3>ACTIVITY AND ALERT MONITORING</h3>
<P>
Database activity monitoring is another important tool for deterring security breaches. A database activity monitor is typically a network appliance that listens to and analyzes network traffic as it passes from the client or application server to the database server. Because not all database activity comes across a network (stored procedures and local logins are examples), most database activity monitoring solutions provide an agent that resides on the data server to pick up local activity.</p>
<P>
Database activity monitoring is often confused with database auditing, because both collect information about data access. However, because an activity monitor watches live network traffic, it is a more proactive solution than database auditing, whose reports are often produced hours, days, or even weeks after the activity they record. By watching live traffic, database activity monitors can trip alerts while a breach is under way, allowing quick action to stop it or limit its damage. Some activity monitors can even halt suspicious or unauthorized traffic, such as repeated failed password attempts, essentially acting as a database firewall.</p>
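<P>
The kind of rule that lets a monitor halt repeated failed password attempts can be sketched as a sliding-window counter. In this hypothetical policy, failures are tracked per user and source address; three failures within 60 seconds raise an alert and five trigger a block. The class name, thresholds, and window are illustrative assumptions; commercial activity monitors express such rules in their own policy languages.</p>

```python
from collections import defaultdict, deque


class FailedLoginPolicy:
    """Sliding-window rule for repeated failed logins from one user and source address."""

    def __init__(self, alert_after=3, block_after=5, window_seconds=60.0):
        self.alert_after = alert_after
        self.block_after = block_after
        self.window = window_seconds
        # (user, source_ip) -> timestamps of recent failures, oldest first
        self._failures = defaultdict(deque)

    def record_failure(self, timestamp, user, source_ip):
        """Record one failed login (timestamps assumed nondecreasing); return "ok", "alert", or "block"."""
        times = self._failures[(user, source_ip)]
        times.append(timestamp)
        # Forget failures that have slid out of the time window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.block_after:
            return "block"
        if len(times) >= self.alert_after:
            return "alert"
        return "ok"


if __name__ == "__main__":
    policy = FailedLoginPolicy()
    for t in range(6):
        print(t, policy.record_failure(float(t), "payroll_app", "10.0.0.7"))
```

<P>
Keying the window on both user and source address means a burst of failures from one client does not lock out the same account when it connects from elsewhere; a real policy might choose differently.</p>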
<P>
Database activity monitoring is typically driven by policies that specify which network traffic should be analyzed, alerted on, reported, or halted. Some products in this space can watch network traffic and help set policies based on the normal traffic patterns they detect.</p>
<P>
Many companies use their activity monitor as a database auditing tool, because it collects much of the information a database audit requires. However, most database activity monitors don't provide detailed information about the contents of database records or the changes made to them, so most companies will still need some form of database auditing.</p>
<h3>VULNERABILITY TESTING</h3>
<P>
Many regulations require companies to have the ability to test data server vulnerability. This testing involves trying to break into a data server using various techniques. Network-based vulnerability assessment scanners discover and assess the security of data servers in an organization. These scanners essentially mimic the process a hacker would follow to break into a data server to steal data. Most have the ability to locate, probe, report on, and even provide fixes for these security holes. Vulnerability scanners will often seek out vulnerable user IDs and passwords, excessive access, SQL injections, and Trojan horses.</p>
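<P>
One check that scanners commonly run is a search for accounts with guessable passwords. The sketch below tests stored password hashes against a short dictionary of common passwords. It assumes an account store you administer that keeps salted PBKDF2-SHA-256 hashes; that storage format, the iteration count, and the <code>Account</code> record are hypothetical, and data servers that rely on operating-system authentication keep credentials elsewhere.</p>

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

ITERATIONS = 100_000  # hypothetical work factor for the assumed hash format


@dataclass(frozen=True)
class Account:
    user: str
    salt: bytes
    pw_hash: bytes


def hash_password(password, salt):
    """Salted PBKDF2-SHA-256 hash (the assumed storage format)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def find_weak_accounts(accounts, dictionary):
    """Return the users whose stored hash matches any dictionary password."""
    weak = []
    for acct in accounts:
        for candidate in dictionary:
            # Constant-time comparison of the recomputed and stored hashes.
            if hmac.compare_digest(hash_password(candidate, acct.salt), acct.pw_hash):
                weak.append(acct.user)
                break
    return weak


if __name__ == "__main__":
    common = ["password", "123456", "changeme", "admin"]
    accounts = []
    for user, pw in [("reporting", "changeme"), ("payroll", "T7#qv!rL9z")]:
        salt = os.urandom(16)
        accounts.append(Account(user, salt, hash_password(pw, salt)))
    print(find_weak_accounts(accounts, common))
```

<P>
Real scanners pair a check like this with tests for excessive privileges and known vulnerabilities; the dictionary here is deliberately tiny.</p>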
<P>
When evaluating assessment scanners, look for the ability to scan from both inside and outside the firewall. As with antivirus software, regular updates are essential: as new database versions are released and patches are applied, break-in techniques can change from month to month, so make sure the scanner is updated to keep up with new releases and newly discovered vulnerabilities.</p>
<h3>NO EASY ANSWERS</h3>
<P>
There's no magic bullet when it comes to keeping your data server safe from intrusion; it takes a well-thought-out strategy and a portfolio of security tools to meet every specific need (see Table 1). When evaluating your security strategy, analyze and prioritize your privacy, security, and compliance requirements. Implementing a full strategy takes time and, typically, more than one implementation. Knowing what's most important will help in determining what to implement first.</p>
<P>
<b>Table 1. Regulations and associated solutions.</b> <a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_tab1_large.jpg" target="_blank">Click here</a> for larger table.<br />
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_tab1_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f2_tab1.jpg" alt="Table 1. Regulations and associated solutions." width="450" height="290" border="0"></a></p>
<P>
Unfortunately, no database is hacker-safe; if somebody wants the data badly enough, they'll find a way to get it. Access control helps deter criminals; encryption can render the data useless to the hacker; and monitoring and auditing can minimize the effects of any breaches that do occur.</p>
<P>
And, as the final component of your security policy, you should also have a well-defined set of procedures to follow in the event of a breach. Don't wait until a breach occurs.</p>
<hr noshade width="60%">
<P>
<em><strong>Deb Jenson</strong> &#91;<a href="mailto:dejenson@us.ibm.com">dejenson@us.ibm.com</a>&#93; joined IBM in 2004 to drive strategic marketing for the DB2 data server and is now responsible for driving the strategic direction of the IBM Data Studio product. She has more than 10 years of experience in the software industry and extensive experience developing database products.</em></p>
<P>
---</p>]]></body></item><item><title><![CDATA[Workload Management: The Database Traffic Cop in DB2 9]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800845&cid=RSSfeed]]></link><description><![CDATA[DB2 9's Workload Manager lets you set workload priorities that meet service-level agreements and keep all other traffic flowing smoothly. ]]></description><pubDate>Mon, 25 Feb 2008 05:00:12 EST</pubDate><keywords><![CDATA[DB2 9 Workload Manager, Service-Level Agreements, Database Activity, Complex Queries, DB2 Performance, WLM]]></keywords><blurb><![CDATA[DB2 9's Workload Manager lets you set workload priorities that meet service-level agreements and keep all other traffic flowing smoothly. ]]></blurb><authors><![CDATA[Howard Goldberg]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/feature_image3sm.jpg" alt="WLM: The Database Traffic Cop" class="Image_Float-Left" border="1" width="250" />Imagine New York City rush-hour traffic during the holiday season. If you've never been there, suffice it to say that there are far more cars than the streets can accommodate. Horns blare as frustrated drivers get stuck in the middle of an intersection and end up blocking oncoming traffic. A disruption in the normal traffic flow at one intersection can have a ripple effect, causing widespread traffic delays.<br /><br />
<P>
Traffic signals do help manage the use of the limited public road infrastructure, but the system backs up quickly when it's stressed. Adding a traffic cop at busy intersections helps control the flow of cars and prevents blocked intersections. These traffic cops provide an additional level of management to preserve traffic flow, often adjusting dynamically as the volume of cars fluctuates.<br /><br />
<P>
Databases also suffer periods when high demand can cause &quot;traffic jams,&quot; which lead to poor response times and unhappy clients. In DB2 9.5 for Linux, Unix, and Windows, IBM introduced a sophisticated traffic-regulating mechanism called Workload Manager (WLM) that's embedded in the database engine.
<h3>THE RISE OF WLM</h3>
<P>
Without workload management, a database engine processes all work requests with the same urgency, because it lacks the information it needs to classify and prioritize incoming activity. Although this model works well most of the time, the database can make better use of its available resources when it knows more about the workload and about the service-level expectations of its clients (application developers and users).<br /><br />
DB2 9.5's WLM improves database efficiency by serving as the triage point for all activities passing through the database. It systematically prioritizes work and groups work with similar characteristics, using connection attributes and activity types. It controls and monitors database activity and resources, using thresholds and work actions to meet service-level agreements.</p>
<P>
Of course, workload management isn't new. On the mainframe, with its centralized architecture, workload management has been used for years to help applications share resources efficiently, because a single poorly performing, resource-intensive application could degrade every other application. On distributed platforms, workload management was considered less critical because resources were typically dedicated: a single application and its database owned the entire host, which let the application control its own destiny.</p>
<P>
However, the dedicated approach is losing favor. Exploding data growth and ever-longer retention requirements have increased the number of distinct clients accessing the data store concurrently, and their queries have grown more complex. Yet internal and external customers expect response times to stay the same (or even improve).</p>
<P>
Another factor necessitating WLM is the convergence of transactional and decision support workloads into a single enterprise database. The once-clear distinction between transactional databases, which typically support sub-second response times, and decision support databases, which typically support more complex and time-consuming queries, is blurring. Bringing these workloads together helps enterprises make decisions faster using a single consolidated data store.</p>
<P>
Ultimately, data center real estate and power are finite; adding extra capacity is expensive and requires long-term planning. WLM helps organizations make the most of their existing hardware and information technology investments.</p>
<h3>WLM BUILDING BLOCKS</h3>
<P>
Building a WLM strategy or policy is similar to constructing an application database. While the application database is built from database objects such as tables, indexes, and views, a workload management strategy or policy is created using WLM-related database objects. These objects act as the building blocks for creating sophisticated and comprehensive resource governing and monitoring structures. Table 1 describes the WLM building blocks and classifies them by task and scope.</p>
<P>
<b>Table 1. The WLM building blocks.</b> <a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab1_large.jpg" target="_blank">Click here</a> to enlarge table.<br>
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab1_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab1.jpg" alt="Table 1. The WLM building blocks." width="450" height="150" border="0"></a></p>
<P>
These WLM components help answer two key workload management questions: who is doing the work, and what type of work is being executed? The WLM engine answers the &quot;who&quot; question with workloads and the &quot;what&quot; question with work classes. More intrusive actions, such as stopping or queuing work in response to events, are performed with thresholds and work actions. I'll describe each WLM building block in more detail.</p>
<P>
<strong>Service class.</strong> A service class defines the execution environment for the database activities assigned to it. Every service class is either a service superclass or a service subclass within a superclass. A service class manages database activities by controlling each agent's operating system priority and its I/O prefetch priority. Agent priority sets the relative operating system priority of the agents running in the service class, and it can be set either to the keyword <code>default</code> or to an integer. The <code>default</code> value keeps the normal priority assigned by the operating system. An integer value directs the operating system to adjust the agent's dispatching priority: WLM adds the number to the value the operating system would otherwise assign. The allowable range is -20 to +20, and on Unix, negative values give the agent a higher relative priority. The I/O prefetch priority can be set to high, medium, or low. Varying these settings across service classes, or across the subclasses of a superclass, creates a tiered workload management scheme.</p>
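<P>
To make the agent priority arithmetic concrete, here is a minimal Python sketch of how a setting could translate into an adjusted operating system priority. It follows the rule described above: the integer is added to the value the operating system would otherwise assign, and <code>default</code> leaves that value unchanged. The function name and the sample base priority of 0 are hypothetical, and the sketch does not model any clamping the operating system itself may apply.</p>

```python
DEFAULT = "default"


def effective_priority(base_priority: int, agent_priority) -> int:
    """Return the adjusted OS priority for an agent in a service class.

    Hypothetical model: agent_priority is either the keyword "default"
    (keep the OS-assigned value) or an integer offset from -20 to +20
    that is added to the OS-assigned value.
    """
    if agent_priority == DEFAULT:
        return base_priority
    # Reject bools explicitly: bool is a subclass of int in Python.
    if isinstance(agent_priority, bool) or not isinstance(agent_priority, int):
        raise TypeError("agent priority must be 'default' or an integer")
    if not -20 <= agent_priority <= 20:
        raise ValueError("agent priority must be between -20 and +20")
    return base_priority + agent_priority


if __name__ == "__main__":
    # A hypothetical three-tier scheme built from three service classes.
    tiers = {"sc_online": -10, "sc_reports": DEFAULT, "sc_batch": 10}
    for service_class, offset in tiers.items():
        print(service_class, effective_priority(0, offset))
```

<P>
On Unix, a lower resulting value means a higher relative priority. Giving each service class a different offset is the kind of tiering the paragraph above describes.</p>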
<P>
By default, three service classes are predefined: the default user class (<code>sysdefaultuserclass</code>), the default system class (<code>sysdefaultsystemclass</code>), and the default maintenance class (<code>sysdefaultmaintenanceclass</code>). In addition, each service superclass contains a default subclass (<code>sysdefaultsubclass</code>). Out of the box, all user requests run in the default user class and system requests run in the default system class. A service class can be split into subclasses for more granular control over the work assigned to it.</p>
<P>
This example shows how to create a service class:</p>
<P>
<code>create service class sc1<br>
agent priority default<br>
prefetch priority high;</code></p>
<P>
This example shows how to create a service subclass:</p>
<P>
<code>create service class sc2<br>
under sc1<br>
agent priority default<br>
prefetch priority high;</code></p>
<P>
<strong>Workload.</strong> A workload is associated with a service class and uses connection attributes to identify the incoming work that the service class should control. Two workloads are created by default: the default user workload (<code>sysdefaultuserworkload</code>) and the default administration workload (<code>sysdefaultadmworkload</code>). Both are mapped to the default user service class (<code>sysdefaultuserclass</code>). If no custom workloads are created, all non-system work is assigned to the default user workload. This configuration is present regardless of whether WLM is activated in the database. Work is evaluated and assigned to a workload when a connection is established and again at unit-of-work boundaries.</p>
<h3>CREATING A WORKLOAD</h3>
<P>
Listed below is an example of a create workload statement. It identifies connections whose <code>session_user group</code> attribute is &quot;appusers&quot; and assigns them to the sc1 service class. Table 2 shows all the workload attributes that can be used to identify incoming connections.</p>
<P>
<code>Create workload wl1 session_user group ('appusers') service class sc1;</code></p>
<P>
<strong>Workload connection attributes.</strong> Incoming work is assigned to a workload using the workload connection attributes (see Table 2). Each connection attribute can contain one or more entries.</p>
<P>
<b>Table 2. Workload connection attributes.</b> <a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab2_large.jpg" target="_blank">Click here</a> to enlarge table.<br>
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab2_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab2.jpg" alt="Table 2. Workload connection attributes." width="450" height="150" border="0"></a></p>
<P>
<strong>Client information.</strong> Client information is set with the sqleseti API or the wlm_set_client_info stored procedure. Either mechanism lets an application pass additional information to the DB2 server for workload evaluation and reporting. For example, an application server that connects to the database under a single generic ID can pass along the actual client ID or an application-generated report ID, which the database previously could not see. (Table 3 shows the client information attributes.)</p>
<P>
<b>Table 3. Client information attributes.</b> <a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab3_large.jpg" target="_blank">Click here</a> to enlarge table.<br>
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab3_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab3.jpg" alt="Table 3. Client information attributes." width="450" height="129" border="0"></a></p>
<P>
<strong>Workload security. </strong>For WLM to assign a connection to a workload, the session user must hold the USAGE privilege on that workload, granted directly, through a group, or through a database role. For example, you might grant usage on workload wl1 to group appusers. If a database is not created with the restrict option, USAGE on sysdefaultuserworkload is granted to PUBLIC by default. Lax workload security practices (such as granting usage to PUBLIC) may let clients adjust their connection attributes to route their work requests to a different service class, bypassing the intended workload controls.</p>
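<P>
The assignment rules in the last few paragraphs can be summarized in a short Python model: match on connection attributes, require USAGE, and fall back to the default user workload. The model assumes workloads are checked in a fixed evaluation order and that a matching workload on which the session user lacks USAGE is simply skipped. It checks only users, groups, and PUBLIC, not roles. The class and function names are hypothetical, and real DB2 evaluation has more rules than this sketch covers (for example, disabled workloads).</p>

```python
from dataclasses import dataclass, field

# Fallback when no custom workload matches (per the text above).
DEFAULT_ASSIGNMENT = ("sysdefaultuserworkload", "sysdefaultuserclass")


@dataclass
class Workload:
    name: str
    service_class: str
    # Connection attribute name -> accepted values (an attribute may list several).
    attributes: dict
    # Users or groups holding USAGE, or "PUBLIC".
    usage: set = field(default_factory=set)


def matches(workload: Workload, conn: dict) -> bool:
    """Every attribute the workload specifies must share a value with the connection."""
    return all(conn.get(attr, set()) & values
               for attr, values in workload.attributes.items())


def has_usage(workload: Workload, conn: dict) -> bool:
    """The session user, or one of its groups, must hold USAGE (or PUBLIC does)."""
    ids = conn.get("session_user", set()) | conn.get("session_user group", set())
    return "PUBLIC" in workload.usage or bool(ids & workload.usage)


def assign(workloads: list, conn: dict) -> tuple:
    """Return (workload, service class) for a connection or unit of work."""
    for wl in workloads:  # assumed fixed evaluation order
        if matches(wl, conn) and has_usage(wl, conn):
            return wl.name, wl.service_class
    return DEFAULT_ASSIGNMENT


if __name__ == "__main__":
    wl1 = Workload("wl1", "sc1", {"session_user group": {"appusers"}}, {"appusers"})
    conn = {"session_user": {"jdoe"}, "session_user group": {"appusers"}}
    print(assign([wl1], conn))  # ('wl1', 'sc1')
```

<P>
If <code>usage</code> is set to PUBLIC, any client that presents matching attributes lands in sc1. That is the routing loophole the workload security paragraph warns about.</p>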
<P>
<strong>Work class. </strong>While a workload identifies work by connection attributes (such as session user and session user group), a work class identifies work by activity type. The work class activity types are read, write (insert, update, delete, merge), DML (read and write combined), call (stored procedures), DDL (data definition language statements such as create and drop), load (other utilities can be identified with workload attributes), and all (every activity type). Work classes are used with work class sets and work actions to classify, route, and control activities by type.</p>
<P>
<strong>Work class set.</strong> A work class set groups work classes together. Within a work class set, work classes are evaluated in the order in which they were specified in the create statement, or in the order set with the position clause, and an activity is assigned to the first work class that matches it. For example, a read activity is assigned to a DML work class rather than a read work class if the DML work class precedes the read work class in the work class set definition. If no work actions are associated with a work class or work class set, the WLM engine ignores the work class.</p>
<P>
The example below shows how work classes are combined into a work class set. The &quot;all_work_class_types&quot; work class set contains multiple work classes, such as read_wc, which identifies read activities. Combined with work actions, these work classes provide control over groups of activities such as stored procedure calls (call), utilities (load), object creation (DDL), and reads and writes (read, write).</p>
<P>
<code>Create work class set all_work_class_types<br>
</code><code>(work class read_wc work type read<br>
</code><code>, work class write_wc work type write<br>
</code><code>, work class ddl_wc work type ddl<br>
</code><code>, work class call_wc work type call<br>
</code><code>, work class load_wc work type load<br>
</code><code>, work class dml_wc work type dml<br>
</code><code>, work class all_wc work type all position last);</code></p>
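<P>
The first-match rule for work class sets can be illustrated with a small Python model of the set above. It encodes the work types listed earlier (DML covers read and write; all covers everything) and skips work classes that have no work action, as the text notes. It is a conceptual sketch, not DB2's classifier, and it ignores refinements such as timeroncost or cardinality ranges.</p>

```python
# (work class name, work type) in evaluation order, mirroring all_work_class_types.
ALL_WORK_CLASS_TYPES = [
    ("read_wc", "read"),
    ("write_wc", "write"),
    ("ddl_wc", "ddl"),
    ("call_wc", "call"),
    ("load_wc", "load"),
    ("dml_wc", "dml"),
    ("all_wc", "all"),  # position last
]


def covers(work_type: str, activity: str) -> bool:
    """Does a work class of this work type match the given activity type?"""
    if work_type == "all":
        return True
    if work_type == "dml":
        return activity in ("read", "write")
    return work_type == activity


def classify(work_class_set, activity, classes_with_actions):
    """Return the first work class, in position order, that covers the activity.

    Work classes without an associated work action are ignored.
    """
    for name, work_type in work_class_set:
        if name in classes_with_actions and covers(work_type, activity):
            return name
    return None


if __name__ == "__main__":
    # Only dml_wc and all_wc have work actions, so read_wc is skipped.
    print(classify(ALL_WORK_CLASS_TYPES, "read", {"dml_wc", "all_wc"}))  # dml_wc
```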
<P>
<strong>Work action set.</strong> A work action set can be defined at the database level or at the service class level. At the service class level, the allowed work actions are mapping, prevent execution, collect activity data, collect aggregate activity data, and count activity. The mapping action type routes a work request that falls within a specified work class to a particular service subclass. At the database level, the allowed work actions are thresholds (for example, <code>when sqltempspace &gt; 100MB</code>), prevent execution, collect activity data, and count activity. A threshold attribute specifies the trigger point that causes a specific work action to be executed. Table 4 lists the threshold attributes.</p>
<P>
<strong>Table 4. Threshold attributes. <br>
<img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab4.jpg" alt="Table 4. Threshold attributes." width="450" height="237"></strong></p>
<P>
The examples below show how to create a work action set containing either a threshold action (<code>one_query</code>) or a mapping action (<code>map_dml</code>). The activity is identified by a work class set (<code>all_work_class_types</code>) and its associated work classes (<code>read_wc</code>, <code>dml_wc</code>).</p>
<P>
<strong>Threshold example</strong></p>
<P>
<code>create work action set database_actions<br>
</code><code>for database using work class set all_work_class_types<br>
</code><code>(work action one_query on work class read_wc<br>
</code><code>when concurrentdbcoordactivities &gt; 1 and queuedactivities &gt; 1<br>
</code><code>stop execution,<br>
</code><code>work action two_queries on work class write_wc<br>
</code><code>when concurrentdbcoordactivities &gt; 2 and queuedactivities &gt; 2<br>
</code><code>collect activity data continue)</code></p>
<P>
<strong>Mapping example</strong></p>
<P>
<code>create work action set sc1_actions<br>
</code><code>for service class sc1 using<br>
</code><code>work class set all_work_class_types<br>
</code><code>(work action map_dml on work class dml_wc<br>
</code><code>map activity to sc2)</code></p>
<P>
Please note that <code>sc2</code> must be a subclass within the <code>sc1</code> superclass.</p>
<P>
<strong>Work action. </strong>A work action is not a separate object; it is a component of the work action set object. Work actions can be added or changed only with the alter work action set DDL statement. However, changing the association between a work action and its work action set requires dropping and re-creating the work action set. Here is an example of how to add a work action to an existing work action set:</p>
<P>
<code>alter work action set database_actions<br>
</code><code>add work action three_queries on work class all_wc<br>
</code><code>when concurrentdbcoordactivities &gt; 3 and queuedactivities &gt; 3<br>
</code><code>stop execution;</code></p>
<P>
<strong>Threshold.</strong> A threshold defines a limit on a resource and the action to take when that limit is crossed. A threshold can be associated with a service class, a workload, a work action, or all activities in the database. A threshold can be enforced on a single database partition, across all database partitions in a DB2 Enterprise Server Edition Database Partitioning Feature (DPF) environment, or per workload occurrence. In a DPF environment, each workload occurrence has its own threshold counters.</p>
<P>
The following resources can be controlled by a threshold:</p>
<ul>
<li>estimatedsqlcost (timerons)</li>
<li>sqlrowsreturned</li>
<li>sqltempspace (KB, MB, GB)</li>
<li>activitytotaltime (days, hours, or minutes)</li>
<li>connectionidletime (days, hours, or minutes)</li>
<li>concurrentworkloadoccurrences</li>
<li>concurrentdbcoordactivities and queuedactivities</li>
<li>concurrentworkloadactivities and queuedactivities</li>
<li>totaldbpartitionconnections and queuedconnections</li>
<li>totalscpartitionconnections and queuedconnections</li>
</ul>
<P>
After a threshold has been crossed, the following actions can be executed:</p>
<ul>
<li>Collect data, with or without detail</li>
<li>Stop execution</li>
<li>Continue execution</li>
<li>Queue activities (only for concurrentdbcoordactivities and totalscpartitionconnections)</li>
</ul>
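<P>
The queuing behavior behind a condition such as <code>concurrentdbcoordactivities &gt; 1 and queuedactivities &gt; 1</code> (the one_query work action shown earlier) can be modeled in a few lines of Python. Up to the first limit of activities run, up to the second limit wait in a queue, and only the activity after that triggers the threshold action. The first-in, first-out release order and the class name are assumptions of this sketch.</p>

```python
from collections import deque


class ConcurrencyThreshold:
    """Model of 'concurrentdbcoordactivities > max_running and
    queuedactivities > max_queued' with a stop execution action."""

    def __init__(self, max_running: int, max_queued: int):
        self.max_running = max_running
        self.max_queued = max_queued
        self.running = set()
        self.queue = deque()

    def submit(self, activity: str) -> str:
        if len(self.running) < self.max_running:
            self.running.add(activity)
            return "run"
        if len(self.queue) < self.max_queued:
            self.queue.append(activity)
            return "queued"
        return "stop execution"  # both limits exceeded: the action fires

    def finish(self, activity: str):
        """Complete a running activity and start the oldest queued one, if any."""
        self.running.remove(activity)
        if self.queue:
            nxt = self.queue.popleft()
            self.running.add(nxt)
            return nxt
        return None


if __name__ == "__main__":
    gate = ConcurrencyThreshold(max_running=1, max_queued=1)
    print([gate.submit(q) for q in ("q1", "q2", "q3")])
    # ['run', 'queued', 'stop execution']
```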
<P>
The example below combines the scope of the threshold (service class sc1), the condition (more than 100 megabytes of temporary space used), and the threshold action (stop execution). You could create the same threshold with work action statements instead. The difference is that a work action, through its work classes, gives more control over which activities the threshold affects.</p>
<P>
<code>create threshold tempspace<br>
</code><code>for service class sc1 activities<br>
</code><code>enforcement database partition<br>
</code><code>when sqltempspace &gt; 100 m<br>
</code><code>stop execution</code></p>
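<P>
Because sqltempspace is a reactive condition, it can fire only after a running activity has actually used temporary space. Here is a minimal Python sketch of the check. It assumes the K, M, and G suffixes are binary multiples of a kilobyte (1 M = 1,024 K), and its function names are hypothetical.</p>

```python
# Hypothetical unit table: sizes expressed in kilobytes (binary multiples assumed).
UNITS_KB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def limit_in_kb(spec: str) -> int:
    """Parse a limit written like the DDL above, e.g. '100 m', into kilobytes."""
    value, unit = spec.split()
    return int(value) * UNITS_KB[unit.lower()]


def check_tempspace(used_kb: int, spec: str = "100 m") -> str:
    """Reactive check against the observed temp space of a running activity.

    The condition is 'greater than', so using exactly the limit is still allowed.
    """
    return "stop execution" if used_kb > limit_in_kb(spec) else "continue"


if __name__ == "__main__":
    for used in (50_000, 102_400, 150_000):
        print(used, check_tempspace(used))
```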
<h3>WORKLOAD VS. WORK CLASSES CONTROL SCOPE</h3>
<P>
The decision about how to manage or control the work in the database should be driven by the level of control you want. A threshold defined on a service class with an associated workload identifies and manages work by connection attributes. A work action set and its work actions, used with a work class set and its work classes, identify incoming work by activity type, such as read, write, load, and DDL. A workload is the more granular approach: it allows fine-grained control over database activity, down to an individual user. The work action approach offers broader control, because it identifies a class of activities regardless of connection attributes. The two can also be used together to build sophisticated WLM strategies. For example, a workload can route work requests to a specific service superclass, and a work action set can then map those requests to a subclass within that superclass by combining a work class set with mapping actions.</p>
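<P>
Here is a minimal Python sketch of the combined approach described above. A workload decides the superclass (the &quot;who&quot;), and the superclass's work actions map the activity to a subclass (the &quot;what&quot;). The sketch assumes that unmapped activities stay in the superclass's <code>sysdefaultsubclass</code> and that the first matching work action wins. The dictionaries are hypothetical stand-ins for DB2 objects.</p>

```python
def covers(work_type: str, activity: str) -> bool:
    """Work type matching, reduced to the types used here."""
    return (work_type == "all" or work_type == activity
            or (work_type == "dml" and activity in ("read", "write")))


def route(session_group, activity, workloads, work_actions):
    """Return (service superclass, service subclass) for one activity.

    workloads:    session user group -> service superclass
    work_actions: superclass -> [(work type, target subclass), ...] in order
    """
    superclass = workloads.get(session_group, "sysdefaultuserclass")
    for work_type, subclass in work_actions.get(superclass, []):
        if covers(work_type, activity):
            return superclass, subclass
    return superclass, "sysdefaultsubclass"


if __name__ == "__main__":
    workloads = {"appusers": "sc1"}           # like workload wl1
    work_actions = {"sc1": [("dml", "sc2")]}  # like work action map_dml
    print(route("appusers", "write", workloads, work_actions))  # ('sc1', 'sc2')
```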
<h3>PREDICTIVE VS. REACTIVE THRESHOLDS</h3>
<P>
There are two types of thresholds that can be set in WLM: predictive and reactive. Table 5 shows examples of both.</p>
<P>
<strong>Table 5. Threshold types. </strong><a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab5_large.jpg">Click here</a> to enlarge table.<br>
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab5_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_f3_tab5.jpg" alt="Table 5. Threshold types." width="450" height="335" border="0"></a></p>
<P>
A predictive threshold is evaluated before the activity starts consuming resources. One example is estimated SQL cost; as the name implies, this cost is an estimate and may not be accurate. In a create threshold statement, estimatedsqlcost is the only cost-based predictive condition. A work class can instead use timeroncost (equivalent to estimatedsqlcost) or cardinality, but only for DML-type operations. Query concurrency and queuing are another form of predictive threshold: using the concurrentdbcoordactivities and queuedactivities attributes, the WLM engine controls the number of concurrent queries before they start executing.</p>
<P>
A reactive threshold is examined after a work request has begun to consume resources; sqltempspace, sqlrowsreturned, and activitytotaltime are examples. There are far more reactive metrics to examine than predictive ones.</p>
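<P>
The difference in timing can be sketched in Python. In this hypothetical model, a predictive limit on the optimizer's estimated cost is checked once, before the activity starts. A reactive limit on rows returned is checked against usage samples as the activity runs. The function and parameter names are illustrative, not DB2 APIs.</p>

```python
def run_activity(estimated_cost, rows_samples, max_cost, max_rows):
    """Hypothetical lifecycle showing when each threshold type is evaluated.

    estimated_cost: optimizer estimate in timerons (known before execution)
    rows_samples:   cumulative rows returned, observed while the activity runs
    """
    # Predictive: checked before the activity consumes any resources.
    if estimated_cost > max_cost:
        return "stopped before start", 0
    # Reactive: checked only after the activity has begun to consume resources.
    rows = 0
    for rows in rows_samples:
        if rows > max_rows:
            return "stopped while running", rows
    return "completed", rows


if __name__ == "__main__":
    # A low estimate passes the predictive check, but actual rows exceed the limit.
    print(run_activity(500, [50, 150, 300], max_cost=1_000_000, max_rows=100))
    # ('stopped while running', 150)
```

<P>
An activity with a low (and inaccurate) estimate can still be stopped by the reactive limit. That is why the two threshold types complement each other.</p>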
<h3>START SIMPLE</h3>
<P>
WLM can be used to identify and study a database's activity patterns. Once these patterns are understood, a WLM policy can be created to maximize the efficiency of the database environment. This will be an iterative process and may require some fine-tuning.</p>
<hr noshade width="60%">
<P>
<em><strong>Howard Goldberg</strong> &#91;<a href="mailto:howard505@gmail.com">howard505@gmail.com</a>&#93; has more than 25 years of experience with database technology and data warehousing. He is currently a database architect at a major financial services firm, where he's leveraging DB2 to implement large-scale mission critical data warehouses.</em></p>]]></body></item><item><title><![CDATA[DB2 DBA: Elevating the DBA Role]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800831&cid=RSSfeed]]></link><description><![CDATA[Is it time to retire the database administrator title for one that represents the real work DB2 DBAs do? ]]></description><pubDate>Mon, 25 Feb 2008 05:00:10 EST</pubDate><keywords><![CDATA[Robert Catterall, Information Life Cycle Management, Data Change Auditing, Application Enablement, DB2 Autonomics, DB2 for z/OS, DB2 for Linux, Unix, and Windows, DB2 DBA]]></keywords><blurb><![CDATA[Is it time to retire the database administrator title for one that represents the real work DB2 DBAs do? ]]></blurb><authors><![CDATA[Robert Catterall]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/catterall_robert.jpg" alt="Robert Catterall" class="Image_Float-Left" border="1" height="90" width="90" />It's nice when one's job title accurately reflects what one actually does for a living; examples that fit include customer service representative, teacher, pilot, dentist, and electrician. How about database administrator? Is the DBA title appropriate in light of the work you do? More specifically, is it appropriate in light of the most important work you do? Does it indicate the way you deliver value to your organization? To an increasing degree, I'm not so sure.<br /><br />Animal House, the 1978 comedy about a raucous college fraternity, is one of my all-time favorite movies. The opening shot shows the main quadrangle of picturesque (and fictitious) Faber College. There, in the center of the quad, is a statue of the institution's founder. The camera pans down to the base of the statue, where we see the school's motto: &quot;Knowledge is Good.&quot; DB2 people might feel that it goes without saying that &quot;Data is Good,&quot; but let's probe a bit deeper. What is data good for? There's more than one way to answer this question. I'll suggest a couple of ways and share some related thoughts about DBAs and their value-add roles (with implications for a good-fit job title, of course).</p>
<h3>DATA ENABLES APPLICATIONS</h3>
<P>
When you get right down to it, an IT department best serves an organization when it delivers (and supports) applications that enable the larger organization to do what it does (and do it better, faster, more reliably, or more cost-effectively). The application functionality provided by an IT group can be a huge competitive advantage for a company. The leader of an upstart low-fare airline that went bankrupt in the 1980s said later that his company's demise was largely due to the sophisticated yield management system of a large competitor; the application enabled the competing airline to offer at least some of its seats on overlapping routes at prices that matched those of the newcomer, while protecting the established carrier's profit margins by offering other seats (for late-booking passengers, for example) at higher prices. Applications have boosted the fortunes of companies operating in almost every industry.</p>
<P>
Applications, then, are hugely important to an organization's success, and applications don't happen without data. What does that mean for a DBA? It means that you might be of greatest value to your company when you're working in application-enablement mode. Yes, I know the tasks that might be considered &quot;traditional&quot; for a DBA (backing up and reorganizing database objects, monitoring performance, tuning SQL, implementing a physical database design) have merit. But these DBA activities are becoming relatively less important over time, while those activities that help turn application ideas into functional reality are becoming more important.</p>
<P>
Think about this development in terms of advances in DB2 technology. All kinds of things that DB2 DBAs used to do have been made either unnecessary or much less time-consuming thanks to new features delivered in various DB2 releases. The following list focuses on DB2 for z/OS, but the story's the same for DB2 for Linux, Unix, and Windows DBAs:</p>
<ul>
<li>Managing application plans was often an onerous task before package bind was delivered in DB2 version 2.3.</li>
<li>Index contention issues, which took a lot of DBA time, were virtually eliminated via Type 2 indexes provided with DB2 V4 (these also delivered row-level locking functionality).</li>
<li>Stored procedure management got a lot easier when DDL for stored procedures was delivered as part of DB2 V6.</li>
<li>The ability to dynamically rotate the partitions of a table space, provided by DB2 V8, eased management of time-series data.</li>
<li>Clone table support, a feature of DB2 9, enables DBAs to replace one table with another with much less expenditure of effort and time than previously required.</li>
</ul>
<P>
These are just a few of the DBA labor-saving features delivered over the past 20 years in succeeding releases of DB2. This list will keep growing with future DB2 versions. I have a feeling that some DBAs might be a bit concerned about the forward progress of DB2 autonomic computing (in other words, the DB2 subsystem's decreasing need for human intervention in order to run well). They might be afraid that these DB2 enhancements will reduce demand for their skills, thereby dimming employment prospects. I believe that this could in fact happen, depending in large part on whether or not DBAs let it happen. I don't see DB2 autonomic features as a threat to DBAs. Instead, I view them as a key factor in freeing up DBA time that can be reinvested at a higher rate of return (in terms of delivering value to the employing organization) in helping application architects and developers build software that makes the organization a more efficient, agile, and reliable competitor.</p>
<P>
Make no mistake &mdash; if you pull a chair up to the application-building table at your company, you'll be welcomed by the folks working there. Application architects and developers with whom I've spoken very much appreciate the help of database-savvy people when it comes to designing new application systems. Among the ways you can help:
<ul>
<li>Make sure that programmers know of the various ways DB2 data can be accessed. I'm particularly big on the use of DB2 stored procedures.</li>
<li>Make sure programmers know that the use of DB2 as a database manager doesn't mean that they can't use their favorite programming language. I'm not just talking about Java or C# here, but Perl, Python, PHP, and Ruby, as well (see my column &quot;<a href="/showArticle.jhtml?articleID=201200804">News from the Client Side</a>,&quot; in Issue 2, 2007 of <em>DB2 Magazine</em>).</li>
<li>Help determine which application functions should be synchronous or asynchronous in nature. When an asynchronous, message-based pattern is appropriate, keep in mind that WebSphere MQ is preeminent in the area of message queuing and delivery software, and MQ and DB2 work well together.</li>
<li>Be supportive, from a data perspective, of plans to design applications according to the principles of service-oriented architecture (SOA). Get familiar with SOA concepts and lingo (it might be helpful to read my column &quot;<a href="/story/showArticle.jhtml?articleID=191502499">SOA for DBAs</a>&quot; in the Quarter 3, 2006 issue). Keep in mind that certain aspects of SOA tend to increase data server CPU utilization, and try not to succumb to knee-jerk opposition. Sometimes, a modest increase in CPU consumption can be a reasonable price to pay in return for applications that enhance the organization's agility and quality-of-service levels.</li>
<li>Speaking of SOA, you might want to let your application development colleagues know about IBM's new (and free) Data Studio tool. Among other things, Data Studio makes it easy to expose DB2 stored procedure calls or SQL <code>SELECT</code>s (or other DB2 data operations) as Web services. This is big.</li>
<li>Educate programmers regarding DB2 features that support application development. Do they know that DB2 provides best-in-class storage and query capabilities with respect to XML data? What do they know about user-defined functions? About recursive SQL? About dynamic addition and rotation of table partitions? Be their trusted source for this information.</li>
</ul>
<P>
You get the idea. So, does the title &quot;database administrator&quot; seem like a good fit for this kind of work? I think not, but I'll admit that I don't have an ideal substitute. Data architect? Application DBA? Application data specialist? I don't know. I'd be pleased to hear your ideas (my email address is at the end of this article).</p>
<h3>DATA ASSURES COMPLIANCE</h3>
<P>
In addition to being the foundation for success-driving applications, data is the key to meeting many of today's compliance requirements. I'm talking about laws here, with stiff penalties for non-observance. Depending on your company's industry and the country (or countries) in which it operates, you might have to keep data pertaining to certain transactions around for years. You could also be required to capture information about changes to certain data elements: who changed what, and when, along with the nature of said changes (from what to what else).</p>
<P>
As a data-oriented person, you're in a position to be a real leader in this area. Here are a couple of ways in which you can help your organization, big-time:</p>
<ul>
<li>If your company does not yet have an information life-cycle management (ILM) strategy, be a leader in terms of getting one defined and implemented. Your employer might be legally required to keep a lot of data around for a long time, but that doesn't mean that you should keep this data in your database. In fact, it's highly likely that you shouldn't keep all this historical stuff in your database. If you do, get ready for a more costly, poorer performing, harder to recover database &mdash; and who wants that? You want to get the old stuff that application users hardly ever request but that the government folks say you have to retain out of your database and into a lower-cost, secure, archival storage system from which it can be automatically (and reasonably readily) retrieved when needed. Oh, and you want this out-of-the-database-and-into-the-archive process to be automated and based on criteria you define. Software exists to do this kind of thing very well (IBM became an ILM software leader when it acquired Princeton Softech's Optim solutions). Get on the ball and show the way toward effective management of data throughout its life cycle.</li>
<li>Data change auditing is more important than ever these days. Don't just leave this to your company's legal staff; they know the requirements, but you can be the person who identifies a solution. You could start by getting to know more about IBM's DB2 Audit Management Expert, one of the more comprehensive solutions on the market. You know the old saying, &quot;an ounce of prevention is worth a pound of cure&quot;? This saying is really true when the subject is data change auditing.</li>
</ul>
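<P>
The archive step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Optim or any other ILM product works: Python's built-in <code>sqlite3</code> module stands in for DB2, and the <code>orders</code>/<code>orders_archive</code> tables and the cutoff date are hypothetical. What it shows is that the copy and the delete are driven by a criterion you define and run as a single transaction.</p>

```python
import sqlite3

# Hypothetical schema: a live table and an identically shaped archive table.
# sqlite3 stands in for DB2; a real archive would sit on cheaper storage.
SCHEMA = """
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
"""

def archive_older_than(conn, cutoff_date):
    """Move rows dated before cutoff_date ('YYYY-MM-DD') to the archive table.

    The copy and the delete run in one transaction, so a failure leaves
    both tables unchanged. Returns the number of rows moved.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2001-03-15", 120.0), (2, "2004-07-01", 75.5), (3, "2008-01-20", 300.0)],
)
moved = archive_older_than(conn, "2005-01-01")
print(moved, "rows archived")
```

<P>
In practice a scheduler would run a job like this regularly, and the archive would live outside the production database.</p>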
<P>
Once again, the title &quot;database administrator&quot; seems inadequate to capture this very important potential aspect of your work as a DB2 person. Data minder? Data guardian? Data right-placer?</p>
<P>
Hey, how about combining both of the key data-related roles of which I've written herein, and going with Application Enabler/Data Archiver and Auditor? Do you think that AE/DAA could someday supplant DBA?</p>
<P>
Something should.</p>
<hr noshade="noshade" width="60%"> 
<P>
<em><strong>Robert Catterall</strong> &#91;<a href="mailto:rcatterall@catterallconsulting.com">rcatterall@catterallconsulting.com</a>&#93; is president of Catterall Consulting, a firm that helps clients effectively leverage relational database technology (DB2 in particular).</em></p>]]></body></item><item><title><![CDATA[Distributed DBA: DB2 Deep Compression, Part 2]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800834&cid=RSSfeed]]></link><description><![CDATA[Updates in DB2 9.5 improve this space-saving and performance-boosting technology.]]></description><pubDate>Mon, 25 Feb 2008 05:00:09 EST</pubDate><keywords><![CDATA[DB2 9.5 Deep Compression, Automatic Dictionary Creation, Roger E. Sanders, Distributed DBA, DB2 Load Utility, SQL Query Performance, pureXML]]></keywords><blurb><![CDATA[Updates in DB2 9.5 improve this space-saving and performance-boosting technology.]]></blurb><authors><![CDATA[Roger E. Sanders]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/sanders_roger.jpg" alt="Roger Sanders" class="Image_Float-Left" border="1" height="90" width="90" />In my previous column, I mentioned that the enterprise database growth rate is about 125 percent annually. As databases grow, so does the need to preserve, protect, distribute, and derive value from the data stored in them. To help minimize storage requirements for growing databases, IBM introduced deep compression technology in DB2 9. Customer feedback led to enhancements to the original technology, which were delivered with DB2 9.5. In this column, I'll introduce the compression features that are new in DB2 9.5. If you need a refresher on how deep compression works, please read my earlier column (see Resources).
<h3>AUTOMATIC DICTIONARY CREATION (ADC)</h3>
Although a table can be enabled for deep compression when it's first created, a compression dictionary can't be built until the table has been populated. In DB2 9, there were only two ways to build a compression dictionary: One was to reorganize a populated table that had been enabled for compression (by executing the <code>REORG</code> command with the <code>KEEPDICTIONARY</code> or <code>RESETDICTIONARY</code> option specified); the other was to estimate how much a populated compression-enabled table would benefit if compression were used (by executing the <code>INSPECT</code> command with the <code>ROWCOMPESTIMATE</code> option specified).</p>
<P>
Optimal compression ratios are achieved when a compression dictionary is built from an all-inclusive set of data. When a compression dictionary is built by reorganizing a table (or by using the Inspect utility), a high compression ratio results because every row in the table is used. However, IBM testing has shown that a good compression ratio is also possible when just a small amount of representative data is analyzed. (In some cases, evaluating less than 1 percent of the total number of rows available yielded a compression ratio of 45 percent.) This concept forms the basis for a new compression feature called Automatic Dictionary Creation (ADC).</p>
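<P>
The idea that a dictionary built from a small sample can compress the rest of the data well can be illustrated outside DB2. The sketch below is an analogy only: it uses zlib's preset-dictionary feature (<code>zdict</code>) as a stand-in for a DB2 compression dictionary, which works differently, and the row data is invented. It builds the "dictionary" from 50 of 10,000 rows (0.5 percent) and compares total per-row sizes with and without it.</p>

```python
import zlib

# Invented "rows": delimited records that repeat many of the same values,
# as rows in a real table often do.
DEPTS = ["ENGINEERING", "MARKETING", "ACCOUNTING", "OPERATIONS"]
CITIES = ["TORONTO,ONTARIO,CANADA", "SAN JOSE,CALIFORNIA,USA", "ARMONK,NEW YORK,USA"]
ROWS = [
    f"EMP{i:06d},{DEPTS[i % 4]},{CITIES[i % 3]},FULL-TIME,SALARIED".encode()
    for i in range(10_000)
]

def compress_row(row, zdict=None):
    """Compress one row on its own, optionally with a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(row) + comp.flush()

# Build the stand-in "dictionary" from fewer than 1 percent of the rows.
sample = b"".join(ROWS[:50])

raw_total = sum(len(r) for r in ROWS)
plain_total = sum(len(compress_row(r)) for r in ROWS)
dict_total = sum(len(compress_row(r, sample)) for r in ROWS)

print(f"raw={raw_total} per-row={plain_total} with sampled dictionary={dict_total}")
```

<P>
Compressing each short row by itself gains little, because there is almost nothing within a single row to reuse; a shared dictionary built from a representative sample supplies the repeating patterns, which is the same intuition behind ADC.</p>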
<P>
In DB2 9.5, if a table is enabled for compression at the time it's created, ADC will cause a compression dictionary to be built automatically once a sufficient amount of data has been stored in the table. The threshold at which ADC kicks in and begins constructing the compression dictionary depends on the table's row size. Dictionary construction typically begins when 1 to 2MB of pages have been allocated to the table. At that point, ADC checks how much user data the table contains; if at least 700KB of data is present, a compression dictionary is built. (Note that these values are set internally and can't be altered.) Operations that can trigger ADC include inserts, imports, loads, and the redistribution of data across partitions.</p>
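<P>
The ADC trigger described above can be modeled as a simple check. This is a toy model, not DB2 code: the function name is hypothetical, and because the real trigger point varies with row size, the 2MB allocation figure is an assumption taken from the upper end of the 1 to 2MB range.</p>

```python
KB = 1024
MB = 1024 * KB

# Assumed values. The column says the trigger point is roughly 1 to 2MB of
# allocated pages (it varies with row size) and that at least 700KB of user
# data must be present. DB2 sets these internally; they can't be changed.
ALLOCATION_TRIGGER = 2 * MB  # hypothetical: upper end of the 1 to 2MB range
MIN_USER_DATA = 700 * KB

def adc_builds_dictionary(compress_enabled, has_dictionary,
                          allocated_bytes, user_data_bytes):
    """Toy model of the ADC decision described in the text (not DB2 code)."""
    if not compress_enabled or has_dictionary:
        return False  # ADC only acts on compression-enabled tables without a dictionary
    if allocated_bytes < ALLOCATION_TRIGGER:
        return False  # the table hasn't grown enough for ADC to check yet
    return user_data_bytes >= MIN_USER_DATA

print(adc_builds_dictionary(True, False, 2 * MB, 800 * KB))
```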
<P>
Like the compression dictionary built by the Inspect utility, the dictionary created via ADC is stored in the table at the end of the existing data. The table's preexisting records remain uncompressed until an offline table reorganization operation is performed or until the preexisting records are updated (in which case each record modified is compressed when the changes are saved). New records are compressed as they're added. (One of the goals of ADC is to build a compression dictionary that will yield a decent compression ratio without leaving a large amount of uncompressed data in the table.) Figure 1 shows what a table that was enabled for compression during creation would look like before, during, and after a compression dictionary is built by ADC.</p>
<P>
When compression is enabled for a table that is already populated (by setting the table's <code>COMPRESS</code> attribute to <code>YES</code> with <code>ALTER TABLE</code>), a compression dictionary isn't created immediately. Rather, the next time a table growth action occurs, ADC will be triggered and a small number of records at the beginning of the table will be used to construct a compression dictionary for the table. Once the dictionary is created, data added to the table by subsequent insert, import, load, and redistribution operations will be compressed; preexisting data will remain uncompressed.</p>
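<P>
Which rows end up compressed can be traced with a toy model of the rules in the last two paragraphs. The class and method names are hypothetical and say nothing about how DB2 actually stores rows; the model only tracks whether each row is compressed.</p>

```python
from dataclasses import dataclass, field

@dataclass
class ToyTable:
    """Toy model (not DB2 internals) of which rows end up compressed."""
    has_dictionary: bool = False
    rows: list = field(default_factory=list)  # each row: [value, is_compressed]

    def insert(self, value):
        # New rows are compressed only once a dictionary exists.
        self.rows.append([value, self.has_dictionary])

    def build_dictionary(self):
        # ADC builds the dictionary; rows already in the table stay as they are.
        self.has_dictionary = True

    def update(self, index, value):
        # An updated row is compressed when the change is saved.
        self.rows[index] = [value, self.has_dictionary]

    def offline_reorg(self):
        # An offline reorganization rewrites, and therefore compresses, every row.
        if self.has_dictionary:
            for row in self.rows:
                row[1] = True

    def compressed_count(self):
        return sum(1 for _, compressed in self.rows if compressed)

t = ToyTable()
for v in ("a", "b", "c"):
    t.insert(v)            # stored uncompressed: no dictionary yet
t.build_dictionary()
t.insert("d")              # compressed as it is added
t.update(0, "a2")          # preexisting row compressed on update
print(t.compressed_count())  # 2 of 4 rows compressed
t.offline_reorg()
print(t.compressed_count())  # all 4 rows compressed
```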
<P>
As you can see, the automatic creation of a compression dictionary is controlled, in part, by a table's compression attribute. If you want to prevent ADC behavior, don't enable a table for compression until you're ready to manually build a compression dictionary and compress the data. On the other hand, if you elect to take advantage of ADC, remember that the compression ratio achieved with the dictionary it produces may not be as high as that of a dictionary created by an offline table reorganization. Also, because the table remains online while the compression dictionary is built, the transaction that crosses the threshold and triggers ADC will experience a slight negative impact on performance.</p>
<h3>COMPRESSION AND THE LOAD UTILITY</h3>
<P>
Another significant change in DB2 9.5 is how load operations behave when they are performed against tables enabled for deep compression. In DB2 9, if a compression dictionary had been created for a table, the Load utility would use that dictionary to compress data as it was being loaded. However, if no compression dictionary was present, the Load utility would not build one as part of the load operation. In DB2 9.5, the Load utility can construct a compression dictionary provided that the table being loaded has been enabled for compression and that a <code>LOAD REPLACE</code> operation is performed. Such an operation is initiated by executing the <code>LOAD</code> command with either the <code>REPLACE KEEPDICTIONARY</code> or the <code>REPLACE RESETDICTIONARY</code> option specified. (A <code>LOAD INSERT</code> operation can also result in the creation of a compression dictionary if the table being loaded has been enabled for compression and the amount of data coming in triggers ADC.)</p>
<P>
If the <code>LOAD</code> command is executed with either the <code>REPLACE KEEPDICTIONARY</code> or the <code>REPLACE RESETDICTIONARY</code> option specified, and a compression dictionary doesn't exist, a new dictionary will be created. If the <code>KEEPDICTIONARY</code> option is used, the amount of data required to build the compression dictionary is subject to the policies of ADC. Therefore, some of the data will be stored in the table uncompressed; once the dictionary is created, the remaining data that is loaded will be compressed using the new dictionary. On the other hand, if the <code>RESETDICTIONARY</code> option is specified, the amount of data required to build the dictionary is not subject to the policies of ADC, and a compression dictionary can be built after loading just one row.</p>
<P>
If the <code>LOAD</code> command is executed with either the <code>REPLACE KEEPDICTIONARY</code> or the <code>REPLACE RESETDICTIONARY</code> option specified, and a compression dictionary already exists, the existing dictionary will either be re-created (<code>RESETDICTIONARY</code>) or left as it is (<code>KEEPDICTIONARY</code>), and data in the table will be compressed using the new or existing dictionary.
<br /><br />
If you want to create a new compression dictionary for a compression-enabled table named <code>EMPLOYEE</code> while performing a load operation, you would execute a command similar to this one:</p>
<P>
<code>LOAD FROM datafile.del OF DEL REPLACE RESETDICTIONARY INTO employee</code></p>
<P>
When this command is executed, assuming no compression dictionary exists for the <code>EMPLOYEE</code> table, a few records found in the file <code>DATAFILE.DEL</code> will be loaded into the <code>EMPLOYEE</code> table uncompressed. As soon as 1-2MB of data has been loaded, ADC will construct a compression dictionary using that data, and the remaining records will be compressed and written to the table directly behind the compression dictionary as they are loaded.</p>
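<P>
The <code>LOAD REPLACE</code> combinations of option and dictionary state described in this section can be summarized as a small lookup. The function and its return strings are hypothetical reading aids, not a DB2 interface, and the sketch assumes the target table is already enabled for compression.</p>

```python
def load_replace_dictionary_action(option, dictionary_exists):
    """Summarize what LOAD REPLACE does with the compression dictionary.

    Hypothetical helper based on the rules in this column. Assumes the table
    is enabled for compression; option is "KEEPDICTIONARY" or "RESETDICTIONARY".
    """
    if option not in ("KEEPDICTIONARY", "RESETDICTIONARY"):
        raise ValueError(f"unexpected option: {option}")
    if dictionary_exists:
        if option == "KEEPDICTIONARY":
            return "keep existing dictionary"
        return "build new dictionary"
    if option == "KEEPDICTIONARY":
        return "build dictionary once ADC thresholds are met"
    return "build dictionary (one row is enough)"

for opt in ("KEEPDICTIONARY", "RESETDICTIONARY"):
    for exists in (True, False):
        print(opt, exists, "->", load_replace_dictionary_action(opt, exists))
```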
<h3>THE ADMINISTRATIVE VIEW AND TABLE FUNCTION</h3>
<P>
To aid in evaluating the effects of deep compression on a table, the <code>ADMINTABCOMPRESSINFO</code> administrative view and the <code>ADMIN_GET_TAB_COMPRESS_INFO</code> table function were also introduced in DB2 9.5. Table 1 shows the structure of the <code>ADMINTABCOMPRESSINFO</code> administrative view.</p>
<P>
<strong>Table 1. The ADMINTABCOMPRESSINFO administrative view.</strong> <strong><br>
<a href="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_distdba_tab1_large.jpg" target="_blank"><img src="http://i.cmpnet.com/v2.db2mag.com/imgs/2008-issue1/dbt13n1_distdba_tab1.jpg" width="450" height="359" border="0"></a></strong></p>
<P>
The <code>ADMIN_GET_TAB_COMPRESS_INFO</code> table function provides a programmatic interface to the <code>ADMINTABCOMPRESSINFO</code> administrative view that can be embedded in an SQL query. The syntax for this function is:</p>
<P>
<code>ADMIN_GET_TAB_COMPRESS_INFO (<em>TableSchema</em>, <em>TableName</em>, <em>ExecMode</em>)</code></p>
<P>
where:</p>
<ul>
<li><em>TableSchema</em> identifies, by name, the schema of the table for which compression information is to be obtained.</li>
<li><em>TableName</em> identifies by name the table for which compression information is to be obtained.</li>
<li><em>ExecMode</em> identifies the mode to use when executing this function. If this parameter is assigned the value <code>REPORT</code> (the default), existing compression statistics will be retrieved. If this parameter is assigned the value <code>ESTIMATE</code>, new compression statistics will be generated based on current data in the table specified, and the results returned will reflect what the compression statistics would look like if you were to compress the table right now.</li>
</ul>
<P>
To retrieve and display compression statistics that were generated at the time a table named <code>PAYROLL.STAFF</code> was compressed, for example, you would execute a query similar to this one:</p>
<P>
<code>SELECT * FROM TABLE (SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('PAYROLL', 'STAFF', 'REPORT')) AS comp_info</code></p>
<P>
On the other hand, to estimate and display compression statistics for a table named <code>PAYROLL.STAFF</code> using data stored in the table right now, you would execute a query like this one:</p>
<P>
<code>SELECT * FROM TABLE (SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('PAYROLL', 'STAFF', 'ESTIMATE')) AS comp_info</code></p>
<P>
The first query will tell you the effects of the last compression operation performed; the second will tell you whether compression can be improved by generating a new compression dictionary and recompressing the data. By executing both queries on a regular basis, you can gauge the effectiveness of compression as your data evolves and changes over time.</p>
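<P>
One way to act on the two queries is to compare their results. The sketch below assumes each result row has been fetched into a Python dict keyed by view column name (for example, by whatever DB2 driver you use) and that the view includes a <code>PAGES_SAVED_PERCENT</code> column; check Table 1 for the exact column names in your release. The function name and the 10-point threshold are hypothetical, and a partitioned table would return one row per partition, which this sketch ignores.</p>

```python
def recompression_worthwhile(report_row, estimate_row, min_gain_points=10):
    """True if ESTIMATE predicts at least min_gain_points more pages saved than REPORT.

    Rows are dicts keyed by ADMINTABCOMPRESSINFO column names; the
    PAGES_SAVED_PERCENT column is assumed to be present in both.
    """
    gain = estimate_row["PAGES_SAVED_PERCENT"] - report_row["PAGES_SAVED_PERCENT"]
    return gain >= min_gain_points

# Hypothetical results for PAYROLL.STAFF.
report = {"TABSCHEMA": "PAYROLL", "TABNAME": "STAFF", "PAGES_SAVED_PERCENT": 38}
estimate = {"TABSCHEMA": "PAYROLL", "TABNAME": "STAFF", "PAGES_SAVED_PERCENT": 55}
print(recompression_worthwhile(report, estimate))  # True: 17 more points saved
```

<P>
When a check like this returns True, an offline <code>REORG</code> with <code>RESETDICTIONARY</code> is one way to build a fresh dictionary and recompress the existing rows.</p>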
<h3>XML INLINING AND COMPRESSION</h3>
<P>
Beginning with DB2 9.5, XML documents whose size is 32KB or less can be stored in a row of a base table instead of in the default XML storage object. (Larger documents must always be stored in the default XML storage object.) If you work primarily with small XML documents, storing them &quot;inline&quot; can result in increased performance. That's because fewer I/O operations are required to query, insert, update, or delete XML documents stored directly in base table rows.</p>
<P>
Another advantage to inlining XML documents is that the documents themselves can benefit from deep compression. That's because deep compression works at the base table level and &quot;inline&quot; XML documents reside in the base data rows. Because part of the process of creating a compression dictionary involves looking for repeating patterns that are substrings of a given column, XML documents that are similar in content are viable candidates for compression, resulting in reduced storage space requirements and improved I/O efficiency.</p>
<h3>EASIER, BETTER COMPRESSION</h3>
<P>
Deep compression was introduced in DB2 9 to help reduce the amount of storage space required to house a table's data. Because more compressed rows can be stored on a single page, deep compression can also increase query performance on systems that are I/O-bound (that's not always the case for systems that are CPU-bound). DB2 9.5 ushered in a set of enhancements that make it easier to implement compression for new tables and to evaluate the effects compression has on a table. DB2 9.5 also makes it possible to take advantage of compression when bulk loading large amounts of data. With databases growing at a rate of 125 percent annually, it's probably a safe bet that more deep compression functionality will be available in future DB2 releases.</p>
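<P>
The page-count effect behind that I/O benefit is simple arithmetic. The sketch below uses hypothetical row sizes (200 bytes uncompressed, 110 bytes compressed) and a 4KB page, and it ignores page and row overhead and any limit on rows per page, so treat its numbers as illustrative only.</p>

```python
import math

def pages_needed(row_count, row_bytes, page_bytes=4096):
    """Pages needed if rows are packed with no page or row overhead (a simplification)."""
    rows_per_page = page_bytes // row_bytes
    return math.ceil(row_count / rows_per_page)

# Hypothetical sizes: 200-byte rows that compress to 110 bytes (45 percent smaller).
uncompressed_pages = pages_needed(1_000_000, row_bytes=200)
compressed_pages = pages_needed(1_000_000, row_bytes=110)
print(uncompressed_pages, compressed_pages)  # 50000 27028
```

<P>
Under these assumptions the compressed table needs roughly 46 percent fewer pages (27,028 versus 50,000), so a full scan reads correspondingly fewer pages, at the cost of CPU time spent expanding rows.</p>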
<P>
<em>Special thanks to Bill Minor, manager of Data Management Services at the IBM Toronto Lab, for providing me with detailed information on the new deep compression features that were introduced in DB2 9.5 and for reviewing this column.</em></p>
<hr noshade="noshade" width="60%"> 
<P>
<em><strong>Roger E. Sanders </strong>&#91;<a href="mailto:roger_e_sanders@yahoo.com">roger_e_sanders@yahoo.com</a>&#93;, president of Roger Sanders Enterprises, Inc., is the author of sixteen books on DB2 for Linux, Unix, and Windows and teaches classes at many DB2 conferences. His latest book, </em>DB2 9 for Linux, UNIX, and Windows Advanced Database Administration Certification Study Guide (MC Press, 2008)<em>, will be available in May 2008.</em></p>]]></body></item><item><title><![CDATA[Informix DBA: The IDS 11 Sysmaster Database, Part 2 ]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800836&cid=RSSfeed]]></link><description><![CDATA[More database monitoring options for tuning Informix servers.]]></description><pubDate>Mon, 25 Feb 2008 05:00:08 EST</pubDate><keywords><![CDATA[Informix Dynamic Server 11, IDS Cheetah, Lester Knutsen, Sysmaster, Informix Checkpoints, IDS]]></keywords><blurb><![CDATA[More database monitoring options for tuning Informix servers.]]></blurb><authors><![CDATA[Lester Knutsen]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/knutsen_lester.jpg" alt="Lester Knutsen" class="Image_Float-Left" border="1" height="90" width="90" />In my previous column, I discussed five useful new tables in the Informix Dynamic Server (IDS) 11 Sysmaster database. In this column, I'll continue that list.
<br /><br />
You'll recall that the Sysmaster database, a pseudo-database that's part of the IDS installation, provides a peek into the shared memory structures of an IDS server and is very useful for monitoring your server's status and performance (see <a href="http://www.advancedatatools.com/TechInfo/InformixInfo.html" target="_blank">www.advancedatatools.com/TechInfo/InformixInfo.html</a> for more information).<br />
<br /> I'll show you some examples of how to put the new Sysmaster features in IDS 11 (formerly code-named &quot;Cheetah&quot;) to use. I'll cover seven new Sysmaster tables:</p>
<ul>
<li><strong>Sysnetclienttype, Sysnetglobal, and Sysnetworkio</strong>, which show network stats.</li>
<li><strong>Syssqltrace, Syssqltrace_info, and Syssqltrace_iter</strong>, which show SQL profile and trace information.</li>
<li><strong>Systhreads</strong>, which keeps track of threads and their wait stats.</li>
</ul>
<h3>MONITORING NETWORK ACTIVITY</h3>
<P>
Three new tables are designed to let a DBA monitor and track user and client-network activity to the IDS server. These tables are:</p>
<ul>
<li><strong>Sysnetclienttype</strong>, which shows an overview of the network activity for each client type.</li>
<li><strong>Sysnetglobal</strong>, which provides an overview of the system network.</li>
<li><strong>Sysnetworkio</strong>, which provides an overview of the system network I/O. </li>
</ul>
<P>
Sysnetclienttype shows all the client types that can attach to a server and how much network traffic is being generated between the client and the server. One client type that will always show up is sqlexec, the type assigned to users running queries, inserts, deletes, and updates. As time goes by and more activity takes place, the values in the <code>nc_reads</code> and <code>nc_writes</code> columns increase to reflect the number of network reads and writes. Another client type is ontape; for this type, <code>nc_reads</code> and <code>nc_writes</code> increase as a backup is performed. Use the Sysnetclienttype table to view overall network activity by type of client connecting to your server. </p>
<P>
The Sysnetglobal table contains one record showing the global settings and the overall number of network reads and writes. Sysnetworkio contains network I/O by session and is therefore the place to turn to find out which session is consuming the most network resources. The columns <code>net_open_time</code>, <code>net_last_read</code>, and <code>net_last_write</code> hold, respectively, the time the session's network connection was opened and the times of its last network read and write. These values are stored in Unix time (seconds since January 1, 1970 UTC), so you need to convert them to a readable format with the <code>dbinfo</code> function. The following SQL statement shows, for each session, when it opened its network connection and when it last read from and wrote to the network:</p>
<P>
<code>select sid,<br>
&nbsp;&nbsp;dbinfo('utc_to_datetime', net_open_time),&nbsp;&nbsp;-- date/time the session opened its network connection<br>
&nbsp;&nbsp;dbinfo('utc_to_datetime', net_last_read),&nbsp;&nbsp;-- date/time of the session's last network read<br>
&nbsp;&nbsp;dbinfo('utc_to_datetime', net_last_write)&nbsp;&nbsp;-- date/time of the session's last network write<br>
from sysmaster:sysnetworkio;</code></p>
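<P>
If you pull these raw values into a script instead of converting them in SQL, the conversion is straightforward. This sketch is an illustration, not part of IDS: it returns UTC, whereas the server-side <code>dbinfo('utc_to_datetime', ...)</code> function may express the result in the server's local time zone. The <code>idle_seconds</code> helper is hypothetical and shows one use of the raw values: spotting sessions that have not touched the network recently.</p>

```python
from datetime import datetime, timezone

def utc_to_datetime(unix_seconds):
    """Convert Unix time (seconds since 1970-01-01 UTC) to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

def idle_seconds(net_last_read, net_last_write, now_unix):
    """Seconds since the session's most recent network read or write (hypothetical helper)."""
    return now_unix - max(net_last_read, net_last_write)

print(utc_to_datetime(1203915608).isoformat())
print(idle_seconds(net_last_read=1203915000, net_last_write=1203915500, now_unix=1203915608))
```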
<h3>CAPTURING AND TRACING SQL STATEMENTS</h3>
<P>
The new Sysmaster tables that allow you to capture and trace SQL statements after the statements have run are among the most exciting. To use these tables, the SQL trace feature must be turned on, either with the <code>SQLTRACE</code> parameter in the <code>ONCONFIG</code> file or by executing the new dba function <code>task</code>. The three tables are:</p>
<ul>
<li><strong>Syssqltrace</strong>, which shows detailed information about a single SQL statement.</li>
<li><strong>Syssqltrace_info</strong>, which contains information about the SQL profile trace system.</li>
<li><strong>Syssqltrace_iter</strong>, which lists SQL statement iterators. </li>
</ul>
<P>
These are not physical tables but in-memory structures, so you need to configure how much memory to use for storing trace information when you turn tracing on. Once this memory is full, the oldest data is discarded and replaced with current data. Turning tracing on, either in the <code>ONCONFIG</code> file or via the new dba function, requires four parameters:</p>
<ul>
<li>Level is the amount of detail that you would like to capture; the options are off, low, med, or high. The default is off, which is why no information is captured unless you turn it on.</li>
<li>Ntraces is the number of SQL statements that will be traced and stored in memory before the memory storage is reused. The smallest number is 500, and the maximum depends on the amount of memory you want to use for this. If you enter 1000, the first SQL statement storage area will be reused by SQL statement 1,001.</li>
<li>Size is the maximum size of each trace buffer, in KB from 1 to 100.</li>
<li>Mode indicates whether the trace is for all users (global) or a specific user.</li>
</ul>
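<P>
The reuse of trace storage described under Ntraces behaves like a ring buffer. The toy model below (a hypothetical class, not IDS internals) keeps the most recent Ntraces statements; with Ntraces set to 1,000, recording statement 1,001 pushes statement 1 out.</p>

```python
from collections import deque

class TraceRing:
    """Toy model of a fixed number of SQL trace slots (not IDS internals)."""

    def __init__(self, ntraces):
        if ntraces < 500:
            raise ValueError("the column gives 500 as the smallest Ntraces value")
        self.slots = deque(maxlen=ntraces)  # oldest entry drops off when full
        self.count = 0

    def record(self, sql_text):
        self.count += 1
        self.slots.append((self.count, sql_text))

    def oldest_statement_number(self):
        return self.slots[0][0] if self.slots else None

ring = TraceRing(1000)
for n in range(1, 1002):
    ring.record(f"select * from t where id = {n}")
print(len(ring.slots), ring.oldest_statement_number())  # 1000 2
```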
<P>
An example setting in your <code>ONCONFIG</code> file would look like this:</p>
<P>
<code>SQLTRACE Level=low,Ntraces=1000,Size=2k,Mode=global</code></p>
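<P>
To sanity-check an <code>SQLTRACE</code> value before putting it in <code>ONCONFIG</code>, you could parse it against the rules listed above. This parser is a hypothetical helper, not an IBM tool; the ranges come from this column, the default of <code>off</code> for Level comes from the description above, and accepting <code>user</code> as the non-global Mode value is an assumption.</p>

```python
def parse_sqltrace(setting):
    """Parse an SQLTRACE value such as 'Level=low,Ntraces=1000,Size=2k,Mode=global'.

    Hypothetical validator: ranges follow this column; 'user' as the
    non-global Mode value is an assumption.
    """
    fields = {}
    for part in setting.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        fields[key.strip().lower()] = value.strip().lower()

    missing = [k for k in ("ntraces", "size", "mode") if k not in fields]
    if missing:
        raise ValueError(f"missing parameter(s): {', '.join(missing)}")

    level = fields.get("level", "off")  # the column says Level defaults to off
    if level not in ("off", "low", "med", "high"):
        raise ValueError(f"bad Level: {level}")

    ntraces = int(fields["ntraces"])
    if ntraces < 500:
        raise ValueError("Ntraces must be at least 500")

    size_kb = int(fields["size"].rstrip("k"))
    if not 1 <= size_kb <= 100:
        raise ValueError("Size must be between 1 and 100 KB")

    if fields["mode"] not in ("global", "user"):
        raise ValueError(f"bad Mode: {fields['mode']}")

    return {"level": level, "ntraces": ntraces, "size_kb": size_kb, "mode": fields["mode"]}

print(parse_sqltrace("Level=low,Ntraces=1000,Size=2k,Mode=global"))
```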
<P>
The new dba function can also be used to set this parameter temporarily, which comes in handy when you're trying to debug a problem. This approach leaves tracing turned off in the <code>ONCONFIG</code> file (in other words, it won't be permanently on); you turn it on only when you need it. The function is part of, and works only in, the new sysadmin database. Here's how to turn trace activity on with the new dba function:</p>
<P>
<code>database sysadmin;<br>
execute function task(&quot;set sql tracing on&quot;, 1000, &quot;2k&quot;, &quot;high&quot;, &quot;global&quot;);</code></p>
<P>
Once tracing is turned on, SQL statements are captured in the Syssqltrace table, with the most recent 1,000 (or whatever number is specified in Ntraces) retained. The command <code>onstat -g his</code> reads this information and shows the configuration settings as well as all SQL statements that have been captured for tracing. Table 1 (in the sidebar) lists the fields in Syssqltrace.</p>
<P>
Learning about SQL statements that have been executed on the IDS server can be enlightening. The Syssqltrace table contains the SQL statement, the resources used to execute it, the time it took to run, the number of disk, page, and buffer reads and writes, the number of locks used, and the number of sorts and amount of memory used. In addition, it contains the cost the IDS optimizer estimated for running the statement. One very useful feature of this table is that you can compare the number of rows the optimizer estimated the statement would return with the number of rows it actually returned (<code>sql_estrows</code> vs. <code>sql_actualrows</code>). A big difference between these two numbers suggests that the optimizer doesn't have, or isn't getting, accurate statistics about the tables and indexes involved. In that case, you may need to run <code>UPDATE STATISTICS</code> to give the optimizer correct figures.</p>
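<P>
The estimated-versus-actual comparison is easy to automate once trace rows are in hand. The sketch below assumes rows fetched from <code>sysmaster:syssqltrace</code> as dicts keyed by the column names in Table 1; the function name, the sample rows, and the factor-of-10 threshold are all hypothetical.</p>

```python
def misestimated(trace_rows, factor=10):
    """Return (sql_id, estimated, actual) for statements whose row estimate is far off.

    trace_rows: iterable of dicts using Syssqltrace column names. A statement
    is flagged when the estimate and the actual count differ by at least
    `factor` times (10 is an arbitrary starting point).
    """
    flagged = []
    for row in trace_rows:
        est = max(row["sql_estrows"], 1)  # avoid dividing by zero
        actual = max(row["sql_actualrows"], 1)
        ratio = max(est, actual) / min(est, actual)
        if ratio >= factor:
            flagged.append((row["sql_id"], row["sql_estrows"], row["sql_actualrows"]))
    return flagged

sample = [
    {"sql_id": 101, "sql_estrows": 120, "sql_actualrows": 95},
    {"sql_id": 102, "sql_estrows": 3, "sql_actualrows": 48000},
    {"sql_id": 103, "sql_estrows": 50000, "sql_actualrows": 12},
]
print(misestimated(sample))  # [(102, 3, 48000), (103, 50000, 12)]
```

<P>
Statements flagged this way are good candidates for checking whether the tables they reference need fresh statistics.</p>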
<h3>INFORMATION IS EVERYTHING IN IDS TUNING</h3>
<P>
The ability to get good information about your IDS server and its performance is the key factor in monitoring and tuning. These new Sysmaster tables in IDS 11 provide useful information. Try a few select statements on these new tables in dbaccess or Server Studio to get a feel for the type of information they can provide. </p>
<hr noshade="noshade" width="60%"> 
<P>
<em><strong>Lester Knutsen</strong> &#91;<a href="mailto:lester@advancedatatools.com">lester@advancedatatools.com</a>&#93; is president of Advanced DataTools Corporation, an IBM Informix consulting and training partner specializing in data warehouse development, database design, performance tuning, and Informix training and support. He is president of the Washington D.C. Area Informix User Group, a founding member of the International Informix Users Group, and an IBM Gold Consultant.</em></p>
<hr noshade="noshade" width="60%"> 
<div class="Article_Sidebar_Larger"><h3>Refresher Course</h3>
<P>
The previous Informix DBA column, available online, covered five Sysmaster tables new to IDS 11:</p>
<ul>
<li> <strong>Syscheckpoint</strong>, which keeps track of the last set of checkpoints since the server started</li>
<li><strong>Sysenv</strong>, which shows the environment variables in effect when the server was started</li>
<li><strong>Sysenvses</strong>, which shows the environment variables in effect for user sessions</li>
<li><strong>Sysmgminfo</strong>, which shows Parallel Database Query (PDQ) information</li>
<li><strong>Sysonlinelog</strong>, which keeps track of the online log.</li>
</ul></div>
<hr noshade="noshade" width="60%"> 
<div class="Article_Sidebar_Larger">
<h3>Read &quot;<a href="/showArticle.jhtml?articleID=202400460">The IDS 11 Sysmaster Database</a>&quot;</h3></div>
<hr noshade="noshade" width="60%"> 
<div class="Article_Sidebar_Larger">
<h3>Table 1. Syssqltrace columns. </h3>
<P>
<strong><code>sql_id </code></strong>Unique SQL execution ID </p>
<P>
<code><strong>sql_address </strong></code>Address of the statement in the code block </p>
<P>
<code><strong>sql_sid </strong></code>Database session ID of the user running the SQL statement </p>
<P>
<code><strong>sql_uid </strong></code>User ID of the user running the SQL statement </p>
<P>
<code><strong>sql_stmttype </strong></code>Statement type </p>
<P>
<code><strong>sql_stmtname </strong></code>Statement type displayed as a word </p>
<P>
<code><strong>sql_finishtime </strong></code>Time this statement completed (Unix) </p>
<P>
<code><strong>sql_begintxtime </strong></code>Time this transaction started </p>
<P>
<code><strong>sql_runtime </strong></code>Statement execution time </p>
<P>
<code><strong>sql_pgreads </strong></code>Number of disk reads for this SQL statement </p>
<P>
<code><strong>sql_bfreads </strong></code>Number of buffer reads for this SQL statement </p>
<P>
<code><strong>sql_rdcache </strong></code>Percentage of time the page was read from the buffer pool </p>
<P>
<code><strong>sql_bfidxreads </strong></code>Number of index page buffer reads </p>
<P>
<code><strong>sql_pgwrites </strong></code>Number of pages written to disk </p>
<P>
<code><strong>sql_bfwrites </strong></code>Number of pages modified and returned to the buffer pool </p>
<P>
<code><strong>sql_wrcache </strong></code>Percentage of time a page was written to the buffer pool </p>
<P>
<code><strong>sql_lockreq </strong></code>Total number of locks required by this SQL statement </p>
<P>
<code><strong>sql_lockwaits </strong></code>Number of times the SQL statement waited on locks </p>
<P>
<code><strong>sql_lockwttime </strong></code>Time the system waited for locks during SQL statement </p>
<P>
<code><strong>sql_logspace </strong></code>Amount of space the SQL statement used in the logical log </p>
<P>
<code><strong>sql_sorttotal </strong></code>Number of sorts that ran for the statement </p>
<P>
<code><strong>sql_sortdisk </strong></code>Number of sorts that ran on disk </p>
<P>
<code><strong>sql_sortmem </strong></code>Number of sorts that ran in memory </p>
<P>
<code><strong>sql_executions </strong></code>Number of times the SQL statement ran </p>
<P>
<code><strong>sql_totaltime </strong></code>Total amount of time spent running the statement </p>
<P>
<code><strong>sql_avgtime </strong></code>Average amount of time spent running the statement </p>
<P>
<code><strong>sql_maxtime </strong></code>Maximum amount of time spent executing the SQL statement </p>
<P>
<code><strong>sql_numiowaits </strong></code>Number of times an I/O operation had to wait </p>
<P>
<code><strong>sql_avgiowaits </strong></code>Average amount of time that the SQL statement had to wait for I/O</p>
<P>
<code><strong>sql_totaliowaits </strong></code>Amount of time that the SQL statement had to wait for I/O</p>
<P>
<code><strong>sql_rowspersec </strong></code>Average number of rows (per second) produced </p>
<P>
<code><strong>sql_estcost </strong></code>Optimizer's estimated cost for the SQL statement </p>
<P>
<code><strong>sql_estrows </strong></code>Estimated number of rows returned for the SQL statement </p>
<P>
<code><strong>sql_actualrows </strong></code>Number of rows returned for the SQL statement </p>
<P>
<code><strong>sql_sqlerror </strong></code>SQL error number </p>
<P>
<code><strong>sql_isamerror </strong></code>RSAM/ISAM error number </p>
<P>
<code><strong>sql_isollevel </strong></code>Isolation level of the SQL statement</p>
<P>
<code><strong>sql_sqlmemory </strong></code>Number of bytes needed to execute the SQL statement </p>
<P>
<code><strong>sql_numiterators </strong></code>Number of iterators used by the statement </p>
<P>
<code><strong>sql_database </strong></code>Database name </p>
<P>
<code><strong>sql_numtables </strong></code>Number of tables used in executing the SQL statement </p>
<P>
<code><strong>sql_tablelist </strong></code>List of table names directly referenced in the SQL statement</p>
<P>
<code><strong>sql_statement </strong></code>SQL statement that ran </p>
</div>]]></body></item><item><title><![CDATA[i5/OS DBA: New Derived Key Indexes in DB2 for i5/OS V6R1]]></title><link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=206800753&cid=RSSfeed]]></link><description><![CDATA[Used in moderation, this new DB2 feature can speed data access for even complex SQL.]]></description><pubDate>Mon, 25 Feb 2008 05:00:07 EST</pubDate><keywords><![CDATA[DB2 for i5/OS DBA V6R1, Derived Key Indexes, DB2 Access Plan, DB2 Optimizer, DB2 Key Derivation, DB2 Performance, Derived Indexes]]></keywords><blurb><![CDATA[Used in moderation, this new DB2 feature can speed data access for even complex SQL.]]></blurb><authors><![CDATA[Tom McKinley]]></authors><body><![CDATA[<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/mckinley_tom.jpg" alt="Tom McKinley" class="Image_Float-Left" border="1" height="90" width="90" />A good indexing strategy is the key to good performance. The performance gain indexes provide comes from the ability to identify rows of interest without having to scan all rows of the table on every search.<br /><br />
SQL requests don't always contain simple comparisons of existing columns in tables that a normal index can satisfy. Users may want to see information that meets some criteria based on a derivation of the data.<br /><br />Take this statement, for example:<br /><br /><code>Select Ordernum, Name, address, quantity, unitprice, quantity*unitprice as order_amount from orders where quantity*unitprice &gt; 1000000</code><br /><br />
The result of the calculation quantity*unitprice isn't an existing column in the orders table. As a result, the access plan selected by the optimizer might scan all rows of the orders table, calculating quantity*unitprice for each row and then comparing the result to 1000000.</p>
<P>
Alternatively, if an index exists with quantity and unitprice as key columns, the optimizer could scan that index, calculating quantity*unitprice from the data in each index entry, comparing the result to 1000000, and then accessing the corresponding orders table rows. Both of these access methods may be very expensive.</p>
<P>
A more cost-efficient approach might be to probe an index that had the result of quantity*unitprice as its leading key and only process keys and corresponding rows where that leading key value is &gt; 1000000. Starting with DB2 for i5/OS V6R1, such derivations are supported in the key definition for SQL DDL requests.</p>
<h3>KEY DERIVATION</h3>
<P>
Consider the example above. If the quantity*unitprice expression is commonly used, or is used in some performance-critical queries, you could create an index that has the result of quantity*unitprice as its leading key column. This index will provide the optimizer with information about the underlying data. If the local selection predicate quantity*unitprice &gt; 1000000 is very selective, the optimizer may choose to probe the index and ask for keys that are &gt; 1000000. Having an index probe available for highly selective predicates allows the small number of qualifying rows to be quickly identified and processed, resulting in much faster response time than a full table or index scan.</p>
<P>
Even if the predicate isn't highly selective, the index may be used to build a list of relative row numbers (RRNs) or a dynamic bitmap that can be used to identify the selected rows in the table. That RRN list or bitmap can be used to schedule many I/Os ahead of time, resulting in less resource usage as well as better response time than the full table scan or random access to a large number of rows via an index.</p>
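<P>
The RRN-list idea can be illustrated with a bitmap. An index returns rows in key order, which is effectively random table order; setting one bit per selected RRN and then walking the bits in ascending order visits the rows in table order, which is what makes it possible to schedule I/O ahead of time. This is a conceptual sketch with hypothetical index entries, using a Python integer as the bitmap; it is not how DB2 for i5/OS implements dynamic bitmaps.</p>

```python
def build_rrn_bitmap(index_entries, predicate):
    """Set one bit per selected relative row number (RRN).

    index_entries: iterable of (key, rrn) pairs in key order, as an index returns them.
    """
    bitmap = 0
    for key, rrn in index_entries:
        if predicate(key):
            bitmap |= 1 << rrn
    return bitmap

def rrns_in_table_order(bitmap):
    """Yield selected RRNs in ascending order, the order the rows sit in the table."""
    rrn = 0
    while bitmap:
        if bitmap % 2:
            yield rrn
        bitmap >>= 1
        rrn += 1

# Hypothetical index on quantity*unitprice: (derived key, rrn), sorted by key.
index = [(250_000, 7), (900_000, 2), (1_200_000, 40), (3_500_000, 5), (8_000_000, 19)]
bitmap = build_rrn_bitmap(index, lambda amount: amount > 1_000_000)
print(list(rrns_in_table_order(bitmap)))  # [5, 19, 40]
```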
<P>
The following example shows how to use the new SQL support for a derived index based on the original scenario I presented:</p>
<P>
<code>Create Index Orderamt on orders (quantity * unitprice as grossamt asc)</code></p>
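<P>
Other databases offer a similar feature, which makes the effect easy to try without an i5/OS system. The sketch below uses SQLite's indexes on expressions (via Python's <code>sqlite3</code> module) as a stand-in; the syntax differs from DB2 for i5/OS (SQLite does not accept the <code>AS grossamt</code> key name), and the exact plan text varies by SQLite version.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (ordernum INTEGER, name TEXT, address TEXT, "
    "quantity INTEGER, unitprice REAL)"
)

QUERY = (
    "SELECT ordernum, quantity * unitprice AS order_amount "
    "FROM orders WHERE quantity * unitprice > 1000000"
)

def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)  # no usable index yet, so SQLite scans the table

# SQLite's index on an expression, analogous to the derived key index above.
conn.execute("CREATE INDEX orderamt ON orders (quantity * unitprice)")
after = plan(QUERY)   # the planner can now search the index instead

print(before)
print(after)
```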
<P>
You can also combine column names and expressions, in which case a Create Index statement would look something like this:</p>
<P>
<code>Create Index Ordernumoamt on orders (orderkey asc, quantity * unitprice as grossamt asc)</code></p>
<P>
You can specify a function as part of the derivation. And you can combine those derivations with non-derived-key columns as shown in this example:</p>
<P>
<code>Create Index Startest.item_fact_name_ordkey on startest.item_fact (UPPER(custname) asc, orderkey asc);</code></p>
<P>
The use of UPPER in this case provides indexed access for case-insensitive searches. Data in character columns is often a mixture of upper and lower case. To consistently find the correct results, it's common to issue a SELECT with a WHERE clause that uses UPPER of the character column (for example, WHERE UPPER(LASTNAME)='MCKINLEY').</p>
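<P>
The case-insensitive search can be tried outside i5/OS with SQLite's expression indexes as a stand-in for DB2 for i5/OS. The table and sample data are hypothetical, the plan text varies by SQLite version, and SQLite's <code>UPPER</code> folds only ASCII letters.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_fact (custname TEXT, orderkey INTEGER)")
conn.executemany(
    "INSERT INTO item_fact VALUES (?, ?)",
    [("McKinley", 1), ("MCKINLEY", 2), ("mckinley", 3), ("Sanders", 4)],
)
# Expression index on UPPER(custname), then orderkey, mirroring the example above.
conn.execute(
    "CREATE INDEX item_fact_name_ordkey ON item_fact (UPPER(custname), orderkey)"
)

SQL = "SELECT orderkey FROM item_fact WHERE UPPER(custname) = ? ORDER BY orderkey"
matches = [row[0] for row in conn.execute(SQL, ("MCKINLEY",))]
steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL, ("MCKINLEY",))]
print(matches)  # [1, 2, 3]: every capitalization of the name is found
print(steps)
```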
<P>
All arithmetic (+, -, *, /, **) and string (CONCAT, SUBSTR) operators are supported inside the column list on a Create Index SQL statement.</p>
<P>
System functions typically used include:</p>
<P>
<code>CHAR QUARTER<br>
</code><code>COALESCE STRIP<br>
</code><code>CONCAT SUBSTR<br>
</code><code>DAY SUBSTRING<br>
</code><code>DAYOFMONTH TRANSLATE<br>
</code><code>DAYOFWEEK TRIM<br>
</code><code>MONTH UPPER<br>
</co