Fault Tolerance

The database server uses the following logging and recovery mechanisms to protect data integrity and consistency in the event of an operating-system or media failure:

Storage-space and logical-log backups
External backup and restore
Fast recovery
Point-in-time restore
Restartable restore
Mirroring
Data replication

Storage Space and Logical-Log Backups of Transaction Records

The database server manages data in logical storage units called storage spaces. The most common storage space is the dbspace, in which the database server stores traditional data such as integer, decimal and floating-point numbers, fixed-length or variable-length character strings, and so forth.

The database server stores transaction records and changes to the database server in logical-log files. You can back up storage spaces and logical-log files while users are accessing databases.

You can also create incremental backups online. Incremental backups enable you to back up only data that has changed since the last backup, which reduces the amount of time required to back up and restore your data and logical logs.
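The idea behind an incremental backup can be illustrated with a small sketch. This is a toy model only, not how the database server or ON-Bar stores backups: the `Page` class, the integer timestamps, and the function names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    data: str
    modified_at: int  # hypothetical logical timestamp of the last change


def full_backup(pages):
    """Copy every page, changed or not."""
    return {p.page_id: p.data for p in pages}


def incremental_backup(pages, last_backup_at):
    """Copy only the pages that changed after the previous backup."""
    return {p.page_id: p.data for p in pages if p.modified_at > last_backup_at}


def restore(full, incrementals):
    """Apply the full backup first, then each incremental in order."""
    restored = dict(full)
    for inc in incrementals:
        restored.update(inc)
    return restored


pages = [Page(1, "a", 5), Page(2, "b", 5), Page(3, "c", 5)]
full = full_backup(pages)                      # assume this runs at time 10

# One page changes after the full backup.
pages[1].data, pages[1].modified_at = "b2", 12

incr = incremental_backup(pages, last_backup_at=10)
restored = restore(full, [incr])
```

In this sketch the incremental backup holds one page instead of three, which is where the time savings come from when most data is unchanged.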

After a media failure, if critical data was not damaged (and the database server remains in online mode), you can restore only the data that was on the failed media, leaving other data available during the restore.

If the failure causes the database server to go offline, you can restore all data, including the critical data and the root dbspace. An offline restore is called a cold restore.

For more information on backup and restore, refer to the IBM Informix: Backup and Restore Guide.

Backup Verification

If you use ON-Bar as your backup tool, you might want to verify the completeness and consistency of a storage-space backup. After you successfully verify a storage-space backup, you can restore it safely. If ON-Bar indicates problems with the backup, contact IBM Technical Support. The IBM Informix: Backup and Restore Guide explains how to use ON-Bar to verify backups.

On Extended Parallel Server, you can issue a single command that verifies the data after it is backed up.

External Backup and Restore

An external backup allows you to make copies of disks that contain storage spaces without using ON-Bar. Later on, you can use an external restore to restore these disks to the database server without using ON-Bar or the IBM Informix Storage Manager. External backups are especially useful if your site has special hardware or software that allows rapid copying of data directly to and from your primary data disks. The IBM Informix: Backup and Restore Guide explains external backup and restore.

Fast Recovery

When the database server starts up, it checks whether the physical log is empty. An empty physical log implies that the database server shut down in a controlled fashion. If the physical log is not empty, the database server automatically performs an operation called fast recovery. Fast recovery restores databases to a state of physical and logical consistency after a system failure that might have left one or more transactions uncommitted. During fast recovery, the database server uses its logical log and physical log to perform the following operations:

Restore the databases to their state at the last checkpoint
Roll forward all committed transactions since the last checkpoint
Roll back any uncommitted transactions

The database server spawns multiple threads to work in parallel during fast recovery. For a detailed explanation of fast recovery, refer to Checkpoints and Fast Recovery.
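The three fast-recovery operations listed above can be sketched in a simplified, single-threaded model. This is an illustration under stated assumptions, not the database server's implementation: the dictionary standing in for disk pages, the before-image format of the physical log, and the tuple layout of logical-log records are all hypothetical. The sketch replays every logged update in order (which rolls forward the committed transactions) and then undoes the updates of transactions that have no commit record.

```python
def fast_recovery(disk, physical_log, logical_log):
    """Toy model of fast recovery.

    disk         -- dict of key -> value as found on disk after the failure
    physical_log -- list of (key, before_image) saved since the last checkpoint
    logical_log  -- list of (tx, "update", key, old, new) or (tx, "commit")
    """
    state = dict(disk)

    # 1. Restore the state at the last checkpoint from the before-images.
    for key, before in reversed(physical_log):
        if before is None:
            state.pop(key, None)
        else:
            state[key] = before

    # 2. Roll forward: replay the logged updates in log order.
    committed = {rec[0] for rec in logical_log if rec[1] == "commit"}
    for rec in logical_log:
        if rec[1] == "update":
            _, _, key, _old, new = rec
            state[key] = new

    # 3. Roll back: undo updates of uncommitted transactions, newest first.
    for rec in reversed(logical_log):
        if rec[1] == "update" and rec[0] not in committed:
            _, _, key, old, _new = rec
            if old is None:
                state.pop(key, None)
            else:
                state[key] = old

    return state


# At the checkpoint: a=1, b=2. T1 commits its change; T2 does not.
disk = {"a": 10, "b": 20}
physical_log = [("a", 1), ("b", 2)]
logical_log = [
    ("T1", "update", "a", 1, 10),
    ("T1", "commit"),
    ("T2", "update", "b", 2, 20),
]
recovered = fast_recovery(disk, physical_log, logical_log)
```

After recovery in this example, the committed change to `a` is kept and the uncommitted change to `b` is undone.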

Point-in-Time Restore

You can restore data from backup media to a specified point in time that is later than the time of the backup. This feature enables you to restore a corrupted database to a point at which you know that the data was reliable. For more information, refer to the IBM Informix: Backup and Restore Guide.
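Conceptually, a point-in-time restore starts from a backup and reapplies logged work up to the chosen moment. The following sketch assumes a hypothetical log format of `(time, tx, op, key, value)` tuples and integer timestamps; it is not the format or procedure that the backup tools use.

```python
def point_in_time_restore(backup, backup_time, log_records, target_time):
    """Rebuild the data as of target_time from a backup plus later log records.

    Only transactions that committed at or before target_time are applied.
    """
    if target_time < backup_time:
        raise ValueError("target time is earlier than the backup")

    # Transactions whose commit record falls inside the restore window.
    committed = {
        tx for time, tx, op, _key, _value in log_records
        if op == "commit" and backup_time < time <= target_time
    }

    state = dict(backup)
    for time, tx, op, key, value in sorted(log_records, key=lambda r: r[0]):
        if op == "update" and tx in committed and time <= target_time:
            state[key] = value
    return state


backup = {"balance": 50}          # backup taken at time 100
log_records = [
    (110, "T1", "update", "balance", 70),
    (115, "T1", "commit", None, None),
    (120, "T2", "update", "balance", -999),   # the change that corrupted data
    (125, "T2", "commit", None, None),
]

before_corruption = point_in_time_restore(backup, 100, log_records, 118)
```

Choosing time 118 in this example keeps the work of `T1` and leaves out the later, unwanted change made by `T2`.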

Mirroring

When you use disk mirroring, the database server writes each piece of data to two locations. Mirroring protects against data loss due to storage-device failures. If mirrored data becomes unavailable for any reason, the mirror of the data is available immediately and transparently to users.

The database server relies on the operating system for bad-sector mapping. When the database server confirms the failure of a disk, it suspends I/O operations on the affected chunk. If the chunk has been mirrored, the database server directs I/O requests to the mirror. If an unmirrored chunk containing critical information fails, the database server shuts down immediately. Critical information includes logical-log files, the physical log, and the root dbspace. The dbspaces that contain this data are referred to as critical dbspaces.
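The failure handling described above amounts to a small decision rule, sketched here. The `Chunk` class and the returned action names are hypothetical. The text does not say what happens when an unmirrored, non-critical chunk fails, so the `"chunk_unavailable"` outcome is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    name: str
    critical: bool = False            # holds logical logs, physical log, or root dbspace
    mirror: Optional["Chunk"] = None  # mirror chunk, if one is configured
    failed: bool = False


def handle_chunk_failure(chunk: Chunk) -> str:
    """Return the action taken once a disk failure on this chunk is confirmed."""
    chunk.failed = True               # I/O to the failed chunk is suspended

    if chunk.mirror is not None and not chunk.mirror.failed:
        return "redirect_to_mirror"   # users continue to work against the mirror
    if chunk.critical:
        return "shutdown"             # critical data lost with no usable mirror
    return "chunk_unavailable"        # assumption: not specified in the text


root = Chunk("rootdbs", critical=True, mirror=Chunk("rootdbs_mirror"))
logs = Chunk("logdbs", critical=True)
data = Chunk("datadbs", mirror=Chunk("datadbs_mirror"))

actions = [handle_chunk_failure(c) for c in (root, logs, data)]
```

The unmirrored critical chunk is the only case that stops the server in this model, which is why mirroring critical dbspaces is the usual recommendation.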

Important: It is recommended that you mirror critical dbspaces that contain logical-log files, the physical log, and the root dbspace.

For more information about mirroring and critical data, refer to Mirroring.