Data integrity refers to the validity, consistency, and accuracy of the data in a database. It cannot be overstated that the level of accuracy of the information retrieved from the database is in direct proportion to the level of data integrity imposed within the database.
Data integrity is one of the most important aspects of the database design process, and it should not be underestimated, overlooked, or even partially neglected. To make any of these mistakes would result in a high risk of undetectable errors
There are three types of data integrity and are as follows:
Entity Integrity
Referential Integrity
Domain Integrity
This is the “Table-level integrity” which ensures that the field that identifies each record within the table is unique and is never missing its value.
Entity integrity requires the specification of a primary key (PK) for each table.
Key Notes about Primary Key:
Each table can have zero or one primary key.
Primary key should not be Null and if the primary key is a composite key, make sure that each component should not be Null.
Every primary key explicitly defined for a table must be associated with a corresponding unique index.
If you do not create a unique index for a primary key, then an incomplete key is defined for the table, making the table inaccessible.
A unique constraint is similar to a primary key constraint which also enforces unique values on an individual or a group of columns. Each table can have zero, one, or many unique constraints consisting of one or more columns each. The values stored in the unique column, or combination of columns, must be unique within the table. Unique constraint column should not be Null.
A unique index needs to be created on the columns of the unique constraint to ensure uniqueness. The only difference between Primary Key constraint and Unique Constraint is a table can have only one primary key constraint but can have more than one unique constrains.
In addition of creating unique index in primary key column or unique constraint column (which is mandatory), you can create as many unique indexes as we need on any other columns of the table to ensure uniqueness.
The following table will show the difference between unique index on Primary Key/Unique constraint column and other column other than primary/unique constraint column
Primary Key or unique constraint column | Column other than primary/unique constraint but defined with unique index |
---|---|
A table can contain only one primary key constraint and multiple unique constraints | A table can contain multiple unique indexes |
Cannot allow NULL values | Can allow NULL values |
Supports Referential Integrity | Cannot support Referential Integrity |
This is a “Relationship-level integrity” which ensures that the relationship between a pair of tables is sound and that there is synchronization between the two tables whenever data is entered, updated, or deleted.
Referential integrity is achieved through the foreign key.
A foreign key (FK) is a column or combination of columns used to establish and enforce a link between the data in two tables.
A link is created between two tables by adding the column or columns that hold one table's primary key values to the other table. This column becomes a foreign key in the second table.
The table with the primary key is called parent table and the table with the foreign key is called dependent table (or child table).
Referential integrity (RI) means each row in a dependent table must have a foreign key that is equal to a primary key in the parent table.
When inserting a row with a foreign key in the dependent table, DB2 checks the values of the foreign key column against the values of the primary key column in the parent table. If there is a matching primary key column, the insert is allowed. If there is no matching primary key column, the insert will not happen.
A new row can be inserted in the parent table as long as the primary key value of the new row is unique.
When updating foreign key values in the dependent table, DB2 checks whether there is a matching primary key in the parent table or not. If there is a matching primary key, update is allowed. If there is no matching primary key, update is not allowed.
The primary key in the parent table cannot be updated if any rows in the dependent tables refer to it.
Deleting a row with a foreign key in the dependent table is permitted.
When deleting a row with a primary key in the parent table, DB2 takes one of the following actions as indicated while defining the table.
RESTRICT: Disallows the deletion of the primary key row if any foreign keys relate to that row.
CASCADE: Allows the deletion of the primary key row and also deletes the foreign key rows that relate to it.
SET TO NULL: Allows the deletion of the primary key row and, instead of deleting all related foreign key rows, sets the foreign key columns to NULL.
Operation | Child Table | Parent Table |
---|---|---|
Insert | Allowed if the foreign key value matches the value of Primary key of the parent table | Allowed as long as the primary key value is unique |
Update | Allowed if the foreign key value matches the value of Primary key of the parent table | Allowed if there are no foreign key references in the child tables |
Delete | Allowed | Depending upon the action specified during table definition Restrict: not allowed if there are any foreign key references Cascade: allowed and deletes the foreign key references if any in the child tables SET TO NULL: allowed and a Null value will be set in the foreign key references of the child tables |
This is the “Field-level integrity” which ensures that the structure of every field is sound, that the values in each field are valid, consistent, and accurate, and that fields of the same type (such as City fields) are consistently defined throughout the database.
Domain integrity is enforced using:
Data Type and Length
Default Values
NULL Values
Check constraint
By specifying the data type and maximum length for each column when a table is created, the DBMS will automatically ensure that only the correct type of data with the maximum length allowed is stored in that column.
When columns are created within tables, they can be assigned a default value that will be used when inserting or loading the data which do not provide an explicit value for that column. Each column can have only one default value. User can provide a default value for a column. If there is no user specific default value, DB2 will assign the default value based on the data type of that column.
Some columns cannot have a meaningful value in every row. DB2 uses a special value indicator, the null value, to stand for an unknown or missing value. A null value is a special value that DB2 interprets to mean that no data is present.
If you do not specify otherwise,DB2 allows any column to contain null values. Users can create rows in the table without providing a value for the column.
Using the NOT NULL clause enables you to disallow null values in the column. Primary keys must be defined as NOT NULL.
Let see an example to understand clearly
Product ID | Product Desc | Category | SRP | Qty On Hand | Total Value |
---|---|---|---|---|---|
7001 | A | Accessories | 75.00 | ||
7002 | B | 65.00 | 20 | 1,300 | |
7003 | C | Accessories | 36.00 | 33 | 1,118.00 |
7004 | D | Components | 23.50 | 16 | 376.00 |
7005 | E | Bikes | 1,200 | ||
7006 | F | 7.45 | 10 | 74.50 |
If a null value is used in a mathematical expression, that expression will evaluate to Null. In figure Products, Total Value is derived from the mathematical expression "[SRP] * [Qty On Hand]."
Note however, that the value for the Total Value field is missing where the Qty On Hand value is Null, resulting in a null value for the Total Value field as well. This is logically reasonable—if the number is unknown, the value will necessarily be unknown.
Also there is a serious undetected error that occurs if all the values in the Total Value field are then added together: an inaccurate total. The only way to obtain an accurate total is to provide a value for the entries in the Qty On Hand field that are currently null.
The aggregate functions such as COUNT, SUM, AVG, MAX and MIN do not handle NULL in the same way as ordinary functions and operators. Instead of returning NULL, they only take non-NULL fields into consideration while computing the result.
For example, Sum(Qty On Hand) returns the value 79 (20+33+16+10). If all the six rows had been summed, the result would have been NULL.
There is one exception to this rule: COUNT(*) returns the count of all rows, even rows whose fields are NULL. But COUNT(FieldName) behaves like the other aggregate functions in that it only counts rows where the specified field is not NULL. COUNT(*) and COUNT(FieldName) never return NULL.If there are no rows in the output, both functions return 0. COUNT(FieldName) also returns 0 if all FieldName fields in the output are NULL.
The following figure shows the results of a summary query that counts the total number of occurrences of each category in the PRODUCTS table shown above. The value of Total Occurrences in the summary query is the result of the function expression "Count([Total Occurrences])." Notice that the summary query shows 0 occurrences of an unspecified Category, implying that each product has been assigned a category. This information is clearly inaccurate because there are two products in the PRODUCTS table that have not been assigned a category.
Category | Total Occurrences |
---|---|
0 | |
Accessories | 2 |
Bikes | 1 |
Components | 1 |
A check constraint is a rule that specifies the values that are allowed in one or more columns of every row of a table. Check constraint enforces business rules directly into the database without requiring additional application logic. This can be defined during column definition.
If you have any doubts or queries related to this chapter, get them clarified from our Mainframe experts on ibmmainframer Community!