Managing Checkpoints

To manage checkpoints efficiently, the database server designates a set of fuzzy buffers that can be flushed in the background between checkpoints, thus reducing the number of buffers that must be flushed during checkpoints.

A checkpoint occurs in the following circumstances:

When the checkpoint interval that you specify elapses
To specify checkpoint intervals in terms of time or the number of logical log records created, set the configuration parameters described in Specifying the Interval Between Checkpoints.
When the most recent checkpoint record is contained in the logical-log file that will next become the current log file
For information, see Managing the Logical Log.
When the physical log reaches 75 percent of its allocated size
For information, see Managing the Physical Log.
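The three triggers listed above can be summarized as a small decision function. This is only an illustrative model, not the server's internal logic: the function name and all parameter names are hypothetical, and the interval is assumed to be expressed in seconds.

```python
def checkpoint_reasons(seconds_since_checkpoint, ckptintvl_seconds,
                       physical_log_pages_used, physical_log_pages_total,
                       log_file_with_last_checkpoint, next_current_log_file):
    """Return the documented conditions that currently call for a checkpoint.

    All names are hypothetical; the 75 percent threshold comes from the text.
    """
    reasons = []
    # Trigger 1: the configured checkpoint interval has elapsed.
    if seconds_since_checkpoint >= ckptintvl_seconds:
        reasons.append("interval elapsed")
    # Trigger 2: the most recent checkpoint record sits in the logical-log
    # file that will next become the current log file.
    if log_file_with_last_checkpoint == next_current_log_file:
        reasons.append("last checkpoint record is in the next logical-log file")
    # Trigger 3: the physical log has reached 75 percent of its allocated size.
    if physical_log_pages_used >= 0.75 * physical_log_pages_total:
        reasons.append("physical log 75 percent full")
    return reasons


if __name__ == "__main__":
    print(checkpoint_reasons(120, 300, 800, 1000, 3, 5))
```

An empty list means that none of the three conditions holds yet.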

Make sure that pages are cleaned often enough for the sqlexec thread that executes a query or transaction to find available pages in shared memory buffers. If the sqlexec thread cannot find available pages in the buffer pool, it writes its data to disk (a foreground write) and waits for buffer pages to be freed.

Eliminate foreground writes or keep them to a minimum. If foreground writes occur, increase the number of LRU queues or increase the size of the buffer pool. For information, see Specifying the Number of Least Recently Used Queues. To monitor the frequency of foreground writes, use xctl onstat -F. To display checkpoint data, including fuzzy writes, use onstat -g ckp.

The output of onstat -g ckp includes the following information:

The checkpoint record number and the duration of the checkpoint
The number of buffers flushed
The number of fuzzy buffers flushed if any remain from the set that should have been flushed in the background between checkpoints
The number of buffers not flushed
The number of pages in the physical log
The number of log pages and log records written since the last checkpoint
The number of completed transactions since the last checkpoint
The number of open transactions at the checkpoint

At checkpoints, the page cleaners should be writing to chunks. Generally, in the onstat -F output, database servers that run OLTP applications should generate higher numbers in the LRU Writes column, and database servers that run DSS applications should generate higher numbers in the Chunk Writes column.

Specifying the Interval Between Checkpoints

You adjust the interval between checkpoints primarily to manage tradeoffs between performance and fast recovery after emergency shutdown. Backups, restores, and fast recovery after emergency shutdown take less time if checkpoints occur often, but transaction performance improves if frequently used pages in buffers are flushed to disk less often.

To specify the interval between checkpoints, use the CKPTINTVL configuration parameter. Specify this interval as a number of seconds.

In most instances, fuzzy checkpoints are performed instead of full checkpoints to improve transaction throughput.

Fuzzy checkpoints are faster than full checkpoints because the database server flushes fewer pages to disk. Because fuzzy checkpoints take less time to complete, the database server returns more quickly to processing queries and transactions. All changes to the data since the last full checkpoint or fast recovery are recorded in the logical log. In an emergency, fast recovery and rollback can return the database to a consistent state.
Full checkpoints flush all dirty pages in the buffer pool to disk to ensure that the database is physically consistent. The database server can skip a checkpoint if all data is physically consistent at the checkpoint time.

For information about when fuzzy checkpoints are performed and when full checkpoints are performed, refer to the discussion of checkpoints in the IBM Informix: Extended Parallel Server Administrator's Guide.

The database server writes a message to the message log to note the time that it completes a checkpoint. To read these messages, use onstat -m.

Checkpoints also occur whenever the physical log becomes 75 percent full. However, with fuzzy checkpoints the physical log does not fill as rapidly because fuzzy operations in pages are not physically logged. If you set CKPTINTVL to a long interval, you can use physical-log capacity to trigger checkpoints based on actual database activity instead of at a fixed interval. Nevertheless, a long checkpoint interval can increase the time that is needed for recovery if the system fails.

Depending on your throughput and data-availability requirements, you can choose an initial checkpoint interval of five, ten, or fifteen minutes. Checkpoints might still occur more often than the interval that you choose, because a checkpoint must occur whenever the physical log becomes 75 percent full. The default interval is five minutes.

Consider that a normal OLTP workload requires about one minute of recovery time for about five minutes of work. A very heavy OLTP workload might require one minute of recovery time for only three minutes of work. DSS workloads, because they are read-intensive, can achieve a one-minute recovery time with less frequent checkpoints.
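The rules of thumb above can be turned into a rough estimate. The following sketch assumes that recovery time scales linearly with the amount of work done since the last checkpoint and that the worst case is a failure just before the next checkpoint; both are simplifying assumptions, and the function and dictionary names are hypothetical.

```python
# Minutes of work covered by one minute of recovery, from the rules of thumb
# in the text (normal OLTP: about 5 to 1, very heavy OLTP: about 3 to 1).
WORK_MINUTES_PER_RECOVERY_MINUTE = {"normal_oltp": 5.0, "heavy_oltp": 3.0}


def estimated_recovery_minutes(ckptintvl_seconds, workload="normal_oltp"):
    """Worst-case recovery estimate for a given checkpoint interval."""
    work_minutes = ckptintvl_seconds / 60.0
    return work_minutes / WORK_MINUTES_PER_RECOVERY_MINUTE[workload]


def max_interval_seconds(target_recovery_minutes, workload="normal_oltp"):
    """Longest checkpoint interval that stays within a recovery-time target."""
    ratio = WORK_MINUTES_PER_RECOVERY_MINUTE[workload]
    return int(target_recovery_minutes * ratio * 60)


if __name__ == "__main__":
    print(estimated_recovery_minutes(900, "heavy_oltp"))
    print(max_interval_seconds(2, "heavy_oltp"))
```

Treat the result as a starting point only. Actual recovery time depends on the workload, and checkpoints triggered by the physical log can shorten the effective interval.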

Specifying the Number of Least Recently Used Queues

The buffer pool is distributed among least-recently-used (LRU) queues. Each LRU queue consists of a set of dirty pages and a set of unchanged pages. The dirty pages are flushed to disk either by a background flusher thread or during a checkpoint.

The LRUS configuration parameter specifies the number of LRU queue pairs to set up within the shared-memory buffer pool. Configuring more LRU queues allows more page cleaners to operate. When you increase the number of LRU queues, you also reduce the size of each queue unless you increase the value of the BUFFERS parameter correspondingly. For a single-processor system, the recommended setting for the LRUS parameter is a minimum of 4. For multiprocessor systems, set the LRUS parameter to a minimum of 4 or 4 * NUMCPUVPS, whichever is greater.
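As a minimal sketch, the sizing rule above and its effect on queue size can be written as follows. The function names are hypothetical, and the per-queue figure assumes that buffers are spread evenly across the queue pairs, which is a simplification.

```python
def recommended_min_lrus(num_cpu_vps):
    """Minimum LRUS value: at least 4, or 4 * NUMCPUVPS if that is greater."""
    if num_cpu_vps < 1:
        raise ValueError("num_cpu_vps must be at least 1")
    # With one CPU VP this also yields 4, the single-processor minimum.
    return max(4, 4 * num_cpu_vps)


def buffers_per_queue_pair(buffers, lrus):
    """Approximate buffers per LRU queue pair, assuming an even split."""
    return buffers // lrus


if __name__ == "__main__":
    lrus = recommended_min_lrus(4)
    print(lrus, buffers_per_queue_pair(20000, lrus))
```

Doubling LRUS without changing BUFFERS halves the approximate size of each queue pair, which is the tradeoff that the paragraph above describes.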

For information about the function and structure of the LRU queue pairs, see the IBM Informix: Extended Parallel Server Administrator's Reference.

Tuning Write-Cache Rates

To increase write-cache rates and bring them up to at least 85 percent, adjust the LRU_MAX_DIRTY and LRU_MIN_DIRTY configuration parameters. For information about increasing the corresponding read-cache rate, see Tuning BUFFERS to Improve the Read-Cache Rate.
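The section does not define the write-cache rate. The sketch below assumes the conventional definition, the percentage of buffer writes that did not require a disk write, computed from two counters that are assumed to correspond to the bufwrits and dskwrits values reported by onstat -p. The function names and the way the 85 percent target is applied are illustrative.

```python
WRITE_CACHE_TARGET_PCT = 85.0  # target stated in the text


def write_cache_rate(bufwrits, dskwrits):
    """Percentage of buffer writes that did not cause a disk write.

    Assumed formula: 100 * (bufwrits - dskwrits) / bufwrits.
    """
    if bufwrits <= 0:
        return 0.0  # no write activity yet, so no meaningful rate
    return 100.0 * (bufwrits - dskwrits) / bufwrits


def below_target(bufwrits, dskwrits):
    """True if the rate is under the 85 percent target."""
    return write_cache_rate(bufwrits, dskwrits) < WRITE_CACHE_TARGET_PCT


if __name__ == "__main__":
    print(write_cache_rate(1000, 150), below_target(1000, 150))
```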

The LRU_MAX_DIRTY and LRU_MIN_DIRTY configuration parameters control how often pages are flushed to disk between full checkpoints.

To monitor the percentage of dirty pages in LRU queues, use xctl onstat -R. If the percentage of dirty pages consistently exceeds the LRU_MAX_DIRTY limit, you have too few LRU queues or too few page cleaners. First use the LRUS parameter to increase the number of LRU queues. If the percentage of dirty pages still exceeds LRU_MAX_DIRTY, use the CLEANERS parameter to increase the number of page cleaners.
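The tuning order described above can be sketched as a simple decision helper. It assumes that you have collected a series of dirty-page percentages yourself, for example by reading them from repeated xctl onstat -R runs, and it interprets "consistently exceeds" as every sample being over the limit. Both the interpretation and the function name are assumptions made for illustration.

```python
def next_tuning_step(dirty_pct_samples, lru_max_dirty, lrus_already_increased):
    """Suggest the next step from sampled dirty-page percentages."""
    if not dirty_pct_samples:
        return "no data"
    # "Consistently" is modeled here as: every sample is above the limit.
    consistently_over = all(pct > lru_max_dirty for pct in dirty_pct_samples)
    if not consistently_over:
        return "no change"
    # The text recommends raising LRUS first and CLEANERS only afterwards.
    return "increase CLEANERS" if lrus_already_increased else "increase LRUS"


if __name__ == "__main__":
    print(next_tuning_step([72.0, 68.5, 75.1], 60.0, False))
```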

Specifying the Number of Checkpoint Page Cleaner Threads

The CLEANERS configuration parameter specifies the number of page-cleaner threads to run during checkpoints. Because the database server writes to chunks during checkpoints, the number of cleaners is determined by the average number of chunks to which checkpoints write.

For installations that support fewer than 20 disks, one page-cleaner thread is recommended for each disk that contains database server data. For installations that support between 20 and 100 disks, one page-cleaner thread is recommended for every two disks. For larger installations, one page-cleaner thread is recommended for every four disks.
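The guideline above maps directly to a small function. This is a sketch with a hypothetical function name; rounding up when the disk count does not divide evenly is an assumption, because the text does not say how to round.

```python
import math


def recommended_cleaners(data_disks):
    """Suggested CLEANERS value for a given number of disks with server data."""
    if data_disks < 1:
        raise ValueError("data_disks must be at least 1")
    if data_disks < 20:
        return data_disks                 # one cleaner per disk
    if data_disks <= 100:
        return math.ceil(data_disks / 2)  # one cleaner per two disks
    return math.ceil(data_disks / 4)      # one cleaner per four disks


if __name__ == "__main__":
    for disks in (8, 20, 100, 101):
        print(disks, recommended_cleaners(disks))
```

Note that the recommendation drops at the tier boundaries (19 disks suggest 19 cleaners, 20 disks suggest 10), so treat values near a boundary as a range to test rather than an exact setting.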

Tip:

The CKPTINTVL, LRU_MIN_DIRTY and LRU_MAX_DIRTY configuration parameters can be tuned with the onutil SET command while the server is running.