Solaris

Solaris Recipe

  1. CPU core(s) should not be consistently saturated.
  2. Program memory should not page out of RAM.
  3. Input/Output interfaces such as network cards and disks should not be saturated, and should not have poor response times.
  4. TCP/IP and network tuning, whilst sometimes complicated to investigate, may have dramatic effects on performance.
  5. Operating system level statistics and optionally process level statistics should be periodically monitored and saved for historical analysis.
  6. Review operating system logs for any errors, warnings, or high volumes of messages.
  7. Review snapshots of process activity, and for the largest users of resources, review per thread activity.
  8. If the operating system is running in a virtualized guest, review the configuration and whether or not resource allotments are changing dynamically.
  9. If there is sufficient network capacity for the additional packets, consider reducing the default TCP keepalive timer (tcp_keepalive_interval) from 2 hours to a value less than intermediate device idle timeouts (e.g. firewalls).
  10. Test disabling delayed ACKs.

Also review the general topics in the Operating Systems chapter.

General

Check the system log for any warnings, errors, or repeated informational messages.

# less /var/adm/messages

Query the help manual for a command:

$ man vmstat # By default, contents are sent to more
$ man -a malloc # There may be multiple manuals matching the name. Use -a to show all of them.

An Analysis of Performance, Scaling, and Best Practices for IBM WebSphere Application Server on Oracle's SPARC T-Series Servers: http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/ibm-websphere-sparc-t5-2332327.pdf

Review the Solaris tuning in the latest SPECjEnterprise results submitted by Oracle:

The Solaris Management Console (smc) is no longer supported in recent releases: http://docs.oracle.com/cd/E26502_01/html/E29010/gltfb.html

Processes

Query basic process information:

$ ps -elf | grep java
 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME      TIME CMD
 0 S noaccess  1089     1   0  40 20        ?  15250        ?   Jan 28 ?  339:02 /usr/java/bin/java -server -Xmx128m...

By default, the process ID (PID) is the number in the fourth column. You can control which columns are printed and in which order using -o.
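
For example, a minimal sketch (column names taken from the ps -o documentation) that prints just the PID, parent PID, CPU percentage, and full arguments for Java processes:

$ ps -eo pid,ppid,pcpu,args | grep java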

The built-in ps command may not show the entire command line. An alternative ps is often available:

$ /usr/ucb/ps auxwww

Central Processing Unit (CPU)

Query physical processor layout:

# psrinfo -pv
The physical processor has 16 cores and 128 virtual processors (0-127)
The core has 8 virtual processors (0-7)...

# prtdiag -v
Memory size: 2GB        
CPU  Freq      Size        Implementation         Mask    Status      Location
0    1503 MHz  1MB         SUNW,UltraSPARC-IIIi    3.4    on-line     MB/P0
1    1503 MHz  1MB         SUNW,UltraSPARC-IIIi    3.4    on-line     MB/P1...

Ensure there are no errant processes using non-trivial amounts of CPU.
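
For example, one way (a sketch using standard ps options) to list the top CPU consumers:

$ ps -eo pcpu,pid,user,args | sort -rn | head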

vmstat

Query processor usage:

$ vmstat 5 2
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s3 s5 s7 --   in   sy   cs us sy id
 0 0 0 4415400 739680 77 859 5  3  4  0  8 -0  3 -1  0  325 1634  476  2  2 96
 0 0 0 4645936 1232224 3  5  0  0  0  0  0  0  0  0  0  285  349  274  0  1 99

The documentation on the first line of vmstat output is unclear:

Without options, vmstat displays a one-line summary of the virtual memory activity since the system was booted. (http://docs.oracle.com/cd/E19683-01/816-0211/6m6nc67ac/index.html)

Experimentation shows that, with options (such as interval or count), the first line also displays statistics since the system was booted:

# vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s3 s5 s7 --   in   sy   cs us sy id
 0 0 0 3932200 329624 79 857 1  1  1  0  2 -0  3 -0  0  351 1970  764  2  3 95
# vmstat 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s3 s5 s7 --   in   sy   cs us sy id
 0 0 0 3932184 329616 79 857 1  1  1  0  2 -0  3 -0  0  351 1970  764  2  3 95
 0 0 0 3527808 70608 2780 25799 3 2 2 0  0  0  2  0  0  445 14699 2383 15 44 41
 0 0 0 3527784 70728 2803 26009 0 0 0 0  0  0  0  0  0  430 14772 2387 15 44 42

Example to capture vmstat in the background:

INTERVAL=1; FILE=vmstat_`hostname`_`date +"%Y%m%d_%H%M"`.txt; date > ${FILE} && echo "VMSTAT_INTERVAL = ${INTERVAL}" >> $FILE && nohup vmstat ${INTERVAL} >> $FILE &

Per-processor utilization

Query per-processor utilization:

$ mpstat 5 2
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  425   0  115    34   26  202    7   51   14    0   838    2   2   0  96
  1  434   0   98   290  185  274    5   52   16    0   797    2   2   0  96
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    1    15    9   93    3   21    0    0   159    0   0   0 100
  1    2   0    3   280  175  181    2   22    0    0   172    0   0   0  99...

pgstat

pgstat: http://docs.oracle.com/cd/E23824_01/html/821-1462/pgstat-1m.html
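
A minimal invocation sketch, assuming the interval and count arguments behave like the other *stat tools (verify against the man page above):

# pgstat 5 2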

prstat

By default, prstat prints a damped average of the % CPU used by individual processes or threads. Without arguments, prstat will periodically update the screen with this relatively accurate 'average' information (which may be at variance with data returned from vmstat due to differences in how it is calculated):

$ prstat

Although the prstat documentation does not explicitly mention this, by default, the reported CPU usage is decayed over time. This can be confirmed with the Java program at https://raw.githubusercontent.com/kgibm/problemdetermination/master/scripts/java/ConsumeCPU.java. For example, if a Java program uses 50% CPU from time T1 to time T2 (after which its CPU usage goes to approximately 0), and you start to take prstat at time T2, the first iteration will report about 50%, and the second iteration may report a decayed value, and so on in the following iterations. Therefore, prstat may not show the "current" processor usage of processes but may include some historical processor usage.

Use the -mv options to gather accurate interval-based statistics:

$ prstat -mv

For example, use prstat in micro-stat mode with the following options: -mv for detailed, interval-accurate statistics, -c to print new reports below previous ones rather than overwriting the screen, -n to limit the number of processes to report, and an interval and iteration count to run in batch mode:

$ prstat -mvcn ${MAXPROCESSES} ${INTERVAL} ${ITERATIONS}
$ prstat -mvcn 5 10 3
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
 26649 root     5.9  17 1.0  12  45 0.0  19 0.1  2K  84 47K   0 prstat/1
 26237 root     0.3 0.1 0.0 0.7 1.3 0.0  98 0.0  72   5 493   0 sshd/1...

The first iteration of prstat includes CPU data from before the start of prstat. In general, for "current" processor usage, review the second and subsequent iterations.

Be careful relying on any interpretation of prstat when it is not operating in -m 'micro-stat' mode, since without it there is no accurate time base for the intervals against which the percentage calculations are made.

Per-thread CPU usage

Use the -L flag along with -p $PID to display accumulated CPU time and CPU usage by thread (light-weight process [LWP]):

$ prstat -mvcLn ${MAXTHREADS} -p ${PID} ${INTERVAL} ${ITERATIONS}
$ prstat -mvcLn 50 -p 1089 10 12
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID      
  1089 noaccess  119M  100M sleep   59    0   3:12:24 0.0% java/14
  1089 noaccess  119M  100M sleep   59    0   1:55:58 0.0% java/35
  1089 noaccess  119M  100M sleep   59    0   0:00:00 0.0% java/38
  1089 noaccess  119M  100M sleep   59    0   0:00:00 0.0% java/36...

prstat -L for threads has similar behavior to prstat for processes. Without -mv, it reports damped average % CPU. With -mv, the first iteration includes CPU data from before the start of prstat.

CPU Statistics

Query available CPU statistics:

# cpustat -h
...
    event specification syntax:
    [picn=]<eventn>[,attr[n][=<val>]][,[picn=]<eventn>[,attr[n][=<val>]],...]

    event0:  Cycle_cnt Instr_cnt Dispatch0_IC_miss IC_ref DC_rd DC_wr...
    event1:  Cycle_cnt Instr_cnt Dispatch0_mispred EC_wb EC_snp_cb...

Query CPU statistics:

# cpustat -c EC_ref,EC_misses 5 2
   time cpu event      pic0      pic1
  5.011   0  tick   2037798     90010
  5.011   1  tick   1754067     85031
 10.011   1  tick   2367524    101481
 10.011   0  tick   4272952    195616
 10.011   2 total  10432341    472138

The cputrack command is basically the same as cpustat but works on a per-process level.
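
For example, a sketch of tracking the same counters for a single process, using the -T (interval in seconds) and -N (count) options from the cputrack man page; ${PID} is a placeholder:

# cputrack -T 5 -N 2 -c EC_ref,EC_misses -p ${PID}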

Interrupts

Interrupt statistics can be queried with intrstat:

$ intrstat 5 2

      device |      cpu0 %tim      cpu1 %tim
-------------+------------------------------
       bge#0 |         0  0.0         4  0.0
       glm#0 |         3  0.0         0  0.0
      uata#0 |         0  0.0         0  0.0

      device |      cpu0 %tim      cpu1 %tim
-------------+------------------------------
       bge#0 |         0  0.0         8  0.0
       glm#0 |        23  0.0         0  0.0
      uata#0 |         0  0.0         0  0.0...

Query interrupts per device:

$ vmstat -i
interrupt         total     rate
--------------------------------
clock        3244127300      100
--------------------------------
Total        3244127300      100

Hardware Encryption

Recent versions of the IBM SDK that run on Solaris support the hardware encryption capabilities of the UltraSPARC T2 CMT processor through the IBMPKCS11Impl security provider, which is first in the java.security provider list.
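
A sketch of the top of a java.security provider list with IBMPKCS11Impl first (the other providers and the PKCS#11 configuration file path are illustrative):

security.provider.1=com.ibm.crypto.pkcs11impl.provider.IBMPKCS11Impl /opt/pkcs11/pkcs11.cfg
security.provider.2=com.ibm.jsse2.IBMJSSEProvider2
security.provider.3=com.ibm.crypto.provider.IBMJCE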

Physical Memory (RAM)

Program memory should not page out of RAM. This can be monitored with the api, apo, and apf columns in vmstat -p. For example:

# vmstat -p 5 3
     memory           page          executable      anonymous      filesystem
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 4902128 1116760 76 851 1   0   0    0    0    0    0    0    0    0    1    1
 4304784 931536 25 31   0   0   0    0    0    0    0    0    0    0    0    0
 4304560 931320 447 5117 0  0   0    0    0    0    0    0    0    0    2    0

The first line of output is a set of statistics from boot and can usually be discarded.

Monitoring Swap Resources: http://docs.oracle.com/cd/E23824_01/html/821-1459/fsswap-52195.html

Input/Output (I/O)

Query disk usage:

$ df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1t0d0s0       63G    60G   3.3G    95%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   4.4G   1.6M   4.4G     1%    /etc/svc/volatile
fd                       0K     0K     0K     0%    /dev/fd
swap                   4.5G    49M   4.4G     2%    /tmp
swap                   4.4G    56K   4.4G     1%    /var/run...

When encountering "too many open files" ulimit issues, list the files opened by the process:

lsof -p <pid>

Use iostat for basic disk monitoring. For example:

$ iostat -xtcn 5 2
   tty         cpu
 tin tout  us sy wt id
   0    1   2  2  0 96
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    1.1   0   0 c0t0d0
    0.5    2.8    4.8    8.6  0.0  0.1    0.0   18.6   0   1 c1t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    4.6   0   0 wassun1:vold(pid463)
   tty         cpu
 tin tout  us sy wt id
   0   98   0  0  0 99
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    2.4    0.0    7.0  0.0  0.0    0.0   19.3   0   1 c1t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 wassun1:vold(pid463)...

An alternative is fsstat:

$ fsstat -F
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
7.11M 3.03M  632K 45.6G 1.97M  35.9G 90.5M 3.97G 1.35T  906M  241G ufs
    0     0     0 1.48G     0  1.43G 13.0M  723M  254G 46.9K 8.20M proc
    0     0     0   255     0     25    22     0     0     0     0 nfs
    0     0     0     0     0      0     0     0     0     0     0 zfs
    0     0     0  785M     0      0     0     0     0     0     0 lofs
 239M 13.5M  225M  272M  105K   549M 23.9K  209K  362M  226M 91.6G tmpfs
    0     0     0 10.3M     0      0     0    30 4.27K     0     0 mntfs
    0     0     0     0     0      0     0     0     0     0     0 nfs3
    0     0     0     0     0      0     0     0     0     0     0 nfs4
    0     0     0   488     0     28    19     0     0     0     0 autofs

Query swap usage:

$ swap -s
total: 876400k bytes allocated + 45488k reserved = 921888k used, 4645872k available

Zettabyte File System (ZFS)

Consider isolating the ZFS intent log to a separate disk.
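
For example, a sketch of dedicating a separate log device to a ZFS pool (the pool and device names are illustrative):

# zpool add pool1 log c0t5d0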

Networking

Query socket information:

$ netstat -an
TCP: IPv4
   Local Address      Remote Address      Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
      *.32772           *.*                0         0 49152      0    LISTEN
127.0.0.1.32833         127.0.0.1.32794   32768      0 32768      0 ESTABLISHED...

When running into "too many open files" errors, count the established connections:

netstat -an | grep ESTA | wc -l

Query network interface statistics periodically:

$ netstat -i 5 2
    input   bge0      output       input  (Total)    output
packets   errs  packets errs  colls  packets   errs  packets  errs  colls
122009930   0  7978053 0     0      152528566   0   38496689 0     0     
33          0         6 0     0      33          0   6        0     0     ...

Starting with Solaris 11, use dlstat for network utilization (http://docs.oracle.com/cd/E23824_01/html/821-1458/ggjew.html):

# dlstat -r -i 1
   LINK   IPKTS  RBYTES   INTRS   POLLS   CH<10 CH10-50   CH>50
e1000g0 101.91K  32.86M  87.56K  14.35K   3.70K     205       5
  nxge1   9.61M  14.47G   5.79M   3.82M 379.98K  85.66K   1.64K
  vnic1       8     336       0       0       0       0       0
e1000g0       0       0       0       0       0       0       0
  nxge1  82.13K 123.69M  50.00K  32.13K   3.17K     724      24
  vnic1       0       0       0       0       0       0       0

# dlstat -t -i 5
   LINK   OPKTS  OBYTES  BLKCNT UBLKCNT
e1000g0  40.24K   4.37M       0       0
  nxge1   9.76M 644.14M       0       0
  vnic1       0       0       0       0
e1000g0       0       0       0       0
  nxge1  26.82K   1.77M       0       0
  vnic1       0       0       0       0

Query detailed per-protocol statistics:

# netstat -s
TCP    tcpRtoAlgorithm     =     4    tcpRtoMin           =   400
    tcpRtoMax           = 60000    tcpMaxConn          =    -1
    tcpActiveOpens      = 2162575    tcpPassiveOpens     = 349052
    tcpAttemptFails     = 1853162    tcpEstabResets      = 19061...

Ping a remote host. In general, and particularly for LANs, ping times should be less than a few hundred milliseconds with little standard deviation.

$ ping -ns 10.20.30.1
PING 10.20.30.1 : 56 data bytes
64 bytes from 10.20.30.1: icmp_seq=0. time=77.9 ms
64 bytes from 10.20.30.1: icmp_seq=1. time=77.2 ms
64 bytes from 10.20.30.1: icmp_seq=2. time=78.3 ms
64 bytes from 10.20.30.1: icmp_seq=3. time=76.9 ms

snoop

Capture network packets using snoop (http://www-01.ibm.com/support/docview.wss?uid=swg21175744, http://docs.oracle.com/cd/E23823_01/html/816-5166/snoop-1m.html).

Capture all traffic:

$ su
# nohup snoop -r -o capture`hostname`_`date +"%Y%m%d_%H%M"`.snoop -q -d ${INTERFACE} &
# sleep 1 && cat nohup.out # verify no errors in nohup.out

Use Wireshark to analyze the network packets gathered (covered in the Major Tools chapter).

Use -s to capture only the first part of each packet.
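
For example, a sketch that limits the capture to the first 128 bytes of each packet (the snap length is illustrative):

# nohup snoop -r -s 128 -o capture`hostname`_`date +"%Y%m%d_%H%M"`.snoop -q -d ${INTERFACE} &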

snoop does not have built-in support for log rollover.
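
One possible workaround (an assumption, not a documented snoop feature) is to restart snoop periodically so that each capture goes to a new timestamped file:

CAPTURE_SECONDS=3600; while true; do snoop -r -q -d ${INTERFACE} -o capture_`date +"%Y%m%d_%H%M%S"`.snoop & SNOOP_PID=$!; sleep ${CAPTURE_SECONDS}; kill ${SNOOP_PID}; done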

Kernel

List available kernel statistics:

# kstat -l
bge:0:bge0:brdcstrcv
bge:0:bge0:brdcstxmt...

Query kernel statistics:

# kstat -p -m cpu_stat -s 'intr*'
cpu_stat:0:cpu_stat0:intr    1118178526
cpu_stat:0:cpu_stat0:intrblk    122410
cpu_stat:0:cpu_stat0:intrthread    828519759
cpu_stat:1:cpu_stat1:intr    823341771
cpu_stat:1:cpu_stat1:intrblk    1671216
cpu_stat:1:cpu_stat1:intrthread    1696737858

KSSL

On older versions of Solaris and older programs linked with older libraries, you may need to enable the KSSL kernel module, if available, to fully utilize hardware encryption (e.g. TLS performance): http://docs.oracle.com/cd/E19253-01/816-5166/6mbb1kq5t/index.html

truss

Truss can be used to attach to a process and print which kernel/system calls are being made:

# truss -p ${PID}

Warning: truss can have a large performance effect when used without filters.
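
For example, sketches that reduce the overhead by filtering or summarizing. Trace only specific system calls (the list is illustrative):

# truss -t open,read,write -p ${PID}

Or count system calls, signals, and faults and print a summary when truss is interrupted:

# truss -c -p ${PID}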

Modifying Kernel Parameters

Some kernel parameters can be set by modifying the /etc/system file and rebooting (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter1-9.html). For example:

set rlim_fd_max = 10000

Some networking parameters can be set using the ipadm set-prop command. These updates persist across reboots (unless the -t option is specified, which makes the change temporary). For example:

# ipadm set-prop -p _time_wait_interval=15000 tcp

ipadm command: http://docs.oracle.com/cd/E26502_01/html/E29031/ipadm-1m.html

The ipadm command replaces the "ndd" command in recent versions of Solaris: http://docs.oracle.com/cd/E26502_01/html/E28987/gmafe.html

Note that Solaris 11 changed the names of some of the network tunable parameters: http://docs.oracle.com/cd/E26502_01/html/E29022/appendixa-28.html

Networking

Update the TIME_WAIT timeout to 15 seconds by running # ipadm set-prop -p _time_wait_interval=15000 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html)

Update the FIN_WAIT_2 timeout to 67.5 seconds by running # ipadm set-prop -p tcp_fin_wait_2_flush_interval=67500 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html)

Update the TCP keepalive interval to 15 seconds by running # ipadm set-prop -p _keepalive_interval=15000 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html, http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)

Update the TCP listen backlog to 511 by running # ipadm set-prop -p _conn_req_max_q=511 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html)

Update the maximum send and receive buffer sizes to 4MB by running # ipadm set-prop -p max_buf=4194304 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)

Update the maximum value of the TCP congestion window to 2MB by running # ipadm set-prop -p _cwnd_max=2097152 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)

Update the default send window size to 1MB by running # ipadm set-prop -p send_buf=1048576 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)

Update the default receive window size to 1MB by running # ipadm set-prop -p recv_buf=1048576 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)

Process Limits

Update the maximum file descriptors to 10,000 by updating these lines in /etc/system and rebooting (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html):

set rlim_fd_max = 10000
set rlim_fd_cur = 10000

dtrace

DTrace is a very powerful dynamic tracing tool. For more information, see http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Intro

Sample 5-level user stack traces for Java processes:

# dtrace -n 'profile-1001 /execname == "java"/ { @[ustack(5)] = count(); }'

Print a user stack trace any time a particular system call is made (in this example, read calls by bash):

# dtrace -n 'syscall::read:entry /execname == "bash"/ { ustack(); }'

List probes:

# dtrace -ln 'proc:::'

Useful scripts:

DTrace scripts sometimes refer to time in Hertz. To convert: secs = 1/hertz
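For example, the profile-1001 probe used above fires at 1001 Hz, i.e. roughly every 1/1001 ≈ 0.001 seconds (about 1 ms) on each CPU.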

FlameGraphs

# git clone https://github.com/brendangregg/FlameGraph
# cd FlameGraph
# dtrace -x ustackframes=100 -n 'profile-99 /arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.stacks
# ./stackcollapse.pl out.stacks > out.folded
# ./flamegraph.pl out.folded > out.svg

Logical Domains, Zones, and Processor Sets/Pinning

Logical domains, or LDOMs, are a way to virtualize the physical hardware to partition it into multiple guest operating system instances. List domains: ldm list-bindings

Non-global zones, or containers, are a way to virtualize an operating system instance further while sharing the base operating system image and runtime (the parent global zone).

Zones can be used to accomplish processor sets/pinning using resource pools. In some benchmarks, one JVM per zone can be beneficial.

  • First, stop the non-global zone: zoneadm -z zone1 halt
  • List zones: zoneadm list -vi
  • Enable resource pools: svcadm enable pools
  • Create resource pool: poolcfg -dc 'create pool pool1'
  • Create processor set: poolcfg -dc 'create pset pset1'
  • Set the maximum CPUs in a processor set: poolcfg -dc 'modify pset pset1 (uint pset.max=32)'
  • Add virtual CPU to a processor set: poolcfg -dc "transfer to pset pset1 (cpu $X)"
  • Associate a resource pool with a processor set: poolcfg -dc 'associate pool pool1 (pset pset1)'
  • Set the resource set for a zone: zonecfg -z zone1 set pool=pool1
  • Restart the zone: zoneadm -z zone1 boot
  • Save to /etc/pooladm.conf: pooladm -s
  • Display processor sets: psrset
  • Show the processor set a process is associated with (PSET column): ps -e -o pid,pset,comm