If the onstat -g xqs output shows data skew, use ISA or onstat -g xmf to monitor the high-speed interconnect between coserver nodes for possible bottlenecks.
XMF Information
---------------
Cosvr_id: 1    Domain_Cnt: 1

Poll Information:
Domain_ID  Interval  Current  Average  Cycle      Wk_Cycles  In_DG/Sig
-----------------------------------------------------------------------
0          10        10       10       621918498  113281     N / N
...

Coserver Information:
ID  X_Msgs  X_Bytes  R_Msgs  R_Bytes  X_Rtrns  R_Dupls  XOffs  XO_Cycls
-----------------------------------------------------------------------
1   1529    1109420  1529    1109420  0        0        0      0
2   1244    323321   111658  1399806  3        4        0      0
3   376     315612   334     57054    11       173      0      0
Follow these general guidelines:
The Coserver Information section displays statistics by coserver. The values in these fields indicate whether traffic between coservers is balanced or skewed toward any one coserver.
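As a rough illustration of checking balance, the following Python sketch computes each coserver's share of one column from the sample Coserver Information rows. It assumes that R_Msgs counts messages received, which is inferred from the column name. The function names and the factor-of-two threshold are hypothetical, chosen only for this example.

```python
# Rows copied from the sample Coserver Information output:
# (ID, X_Msgs, X_Bytes, R_Msgs, R_Bytes)
SAMPLE_ROWS = [
    (1, 1529, 1109420, 1529, 1109420),
    (2, 1244, 323321, 111658, 1399806),
    (3, 376, 315612, 334, 57054),
]

R_MSGS = 3  # tuple index of the R_Msgs column


def traffic_shares(rows, column):
    """Return {coserver_id: fraction of the column total} for one column index."""
    total = sum(row[column] for row in rows)
    if total == 0:
        return {row[0]: 0.0 for row in rows}
    return {row[0]: row[column] / total for row in rows}


def skewed_coservers(rows, column, factor=2.0):
    """Hypothetical rule: flag coservers whose share exceeds
    `factor` times an even share (1 / number of coservers)."""
    even_share = 1.0 / len(rows)
    shares = traffic_shares(rows, column)
    return [cid for cid, share in shares.items() if share > factor * even_share]


shares = traffic_shares(SAMPLE_ROWS, R_MSGS)
flagged = skewed_coservers(SAMPLE_ROWS, R_MSGS)

for cid, share in shares.items():
    print(f"coserver {cid}: {share:.1%} of R_Msgs")
print("flagged as skewed:", flagged)
```

With the sample rows, coserver 2 accounts for roughly 98 percent of the R_Msgs total, so it is the only coserver that this rule flags. The threshold is arbitrary; choose one that matches the traffic pattern you expect for your workload.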
Because some high-speed interconnects have limited kernel buffer space for each connection, their buffer space might be exhausted when interconnect traffic is heavy. When the buffer space is exhausted, some packets are dropped and must be retransmitted.
If you see a large number of retransmits (X_Rtrns) with a low number of duplicate packets received (R_Dupls), packets are being transmitted but are not arriving at the remote end. In that case, you might adjust the setting of the SENDEPDS configuration parameter. It is usually not worth altering this parameter otherwise. For more information about the SENDEPDS parameter, consult your machine notes file.
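The retransmit guideline can be sketched as a simple check. The text does not say how many retransmits count as "large" or how many duplicates count as "low", so both thresholds in this Python sketch are hypothetical placeholders, as is the function name.

```python
def should_review_sendepds(x_rtrns, r_dupls,
                           min_retransmits=100, max_dupl_ratio=0.1):
    """Return True when retransmits are high and duplicates are low.

    min_retransmits and max_dupl_ratio are assumed values for
    illustration only; they do not come from the product documentation.
    """
    if x_rtrns < min_retransmits:
        return False  # too few retransmits to be worth acting on
    return r_dupls <= max_dupl_ratio * x_rtrns


# (ID, X_Rtrns, R_Dupls) taken from the sample Coserver Information output
sample = [(1, 0, 0), (2, 3, 4), (3, 11, 173)]
to_review = [cid for cid, rtrns, dupls in sample
             if should_review_sendepds(rtrns, dupls)]
print("coservers to review:", to_review)
```

Under these assumed thresholds, none of the sample rows qualifies, because every retransmit count is small.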
The values in the Average, Cycle, and Wk_Cycles fields of the Poll Information section indicate how often the database server checks the interconnect and finds no work to do. The following formula gives the proportion of poll cycles that result in work:
percent_poll_work = (Wk_Cycles / Cycle) * 100
A low percentage by itself does not indicate a problem. However, if a query is taking a long time to complete, but the percentage of poll time that results in work is low, a problem exists somewhere. The problem might be that a coserver is down, a data skew exists, or the fragmentation strategy is incorrect. To locate the problem, check the status of the nodes and coservers, the statistics in the onstat -g dfm output, and other onstat utility output.
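As a worked example, the following Python sketch applies the poll formula to the Cycle and Wk_Cycles values from the sample Poll Information output. The ratio is multiplied by 100 to express it as a percentage; the function name is only illustrative.

```python
def percent_poll_work(wk_cycles, cycle):
    """Percentage of poll cycles that found work to do."""
    if cycle <= 0:
        raise ValueError("Cycle must be a positive count")
    return 100.0 * wk_cycles / cycle


# Values from the sample output: Cycle = 621918498, Wk_Cycles = 113281
pct = percent_poll_work(113281, 621918498)
print(f"percent_poll_work = {pct:.4f}%")
```

For the sample values the result is about 0.018 percent, which is low. As noted above, that figure is a concern only when a query is also taking a long time to complete.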