Solaris
Solaris Recipe
- CPU core(s) should not be consistently saturated.
- Program memory should not page out of RAM.
- Input/Output interfaces such as network cards and disks should not be saturated, and should not have poor response times.
- TCP/IP and network tuning, whilst sometimes complicated to investigate, may have dramatic effects on performance.
- Operating system level statistics and optionally process level statistics should be periodically monitored and saved for historical analysis.
- Review operating system logs for any errors, warnings, or high volumes of messages.
- Review snapshots of process activity, and for the largest users of resources, review per thread activity.
- If the operating system is running in a virtualized guest, review the configuration and whether or not resource allotments are changing dynamically.
- If there is sufficient network capacity for the additional packets, consider reducing the default TCP keepalive timer (tcp_keepalive_interval) from 2 hours to a value less than intermediate device idle timeouts (e.g. firewalls); see the sketch following this list.
- Test disabling delayed ACKs.
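For example, a minimal sketch of lowering the keepalive timer with ipadm on Solaris 11 (the 5-minute value is an assumption; see the Modifying Kernel Parameters section below for details):
# ipadm set-prop -p _keepalive_interval=300000 tcp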
Also review the general topics in the Operating Systems chapter.
General
Check the system log for any warnings, errors, or repeated informational messages.
# less /var/adm/messages
Query the help manual for a command:
$ man vmstat # By default, contents are sent to more
$ man -a malloc # There may be multiple manuals matching the name. Use -a to show all of them.
An Analysis of Performance, Scaling, and Best Practices for IBM WebSphere Application Server on Oracle's SPARC T-Series Servers: http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/ibm-websphere-sparc-t5-2332327.pdf
Review the Solaris tuning in the latest SPECjEnterprise results submitted by Oracle.
The Solaris Management Console (smc) is no longer supported in recent releases: http://docs.oracle.com/cd/E26502_01/html/E29010/gltfb.html
Processes
Query basic process information:
$ ps -elf | grep java
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TIME CMD
0 S noaccess 1089 1 0 40 20 ? 15250 ? Jan 28 ? 339:02 /usr/java/bin/java -server -Xmx128m...
By default, the process ID (PID) is the number in the fourth column. You can control which columns are printed and in which order using -o.
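For example, a sketch selecting a few common columns (column keywords are from the ps(1) manual):
$ ps -eo pid,ppid,pcpu,vsz,args | grep java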
The built-in ps command may not show the entire command line. An alternative ps is often available:
$ /usr/ucb/ps auxwww
Central Processing Unit (CPU)
Query physical processor layout:
# psrinfo -pv
The physical processor has 16 cores and 128 virtual processors (0-127)
  The core has 8 virtual processors (0-7)...
# prtdiag -v
Memory size: 2GB
CPU   Freq      Size  Implementation        Mask  Status   Location
0     1503 MHz  1MB   SUNW,UltraSPARC-IIIi  3.4   on-line  MB/P0
1     1503 MHz  1MB   SUNW,UltraSPARC-IIIi  3.4   on-line  MB/P1...
Ensure there are no errant processes using non-trivial amounts of CPU.
vmstat
Query processor usage:
$ vmstat 5 2
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s3 s5 s7 -- in sy cs us sy id
0 0 0 4415400 739680 77 859 5 3 4 0 8 -0 3 -1 0 325 1634 476 2 2 96
0 0 0 4645936 1232224 3 5 0 0 0 0 0 0 0 0 0 285 349 274 0 1 99
The documentation is unclear about what the first line of vmstat output represents:
Without options, vmstat displays a one-line summary of the virtual memory activity since the system was booted. (http://docs.oracle.com/cd/E19683-01/816-0211/6m6nc67ac/index.html)
Experimentation shows that, with options (such as interval or count), the first line also displays statistics since the system was booted:
# vmstat
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s3 s5 s7 -- in sy cs us sy id
0 0 0 3932200 329624 79 857 1 1 1 0 2 -0 3 -0 0 351 1970 764 2 3 95
# vmstat 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s3 s5 s7 -- in sy cs us sy id
0 0 0 3932184 329616 79 857 1 1 1 0 2 -0 3 -0 0 351 1970 764 2 3 95
0 0 0 3527808 70608 2780 25799 3 2 2 0 0 0 2 0 0 445 14699 2383 15 44 41
0 0 0 3527784 70728 2803 26009 0 0 0 0 0 0 0 0 0 430 14772 2387 15 44 42
Example to capture vmstat in the background:
INTERVAL=1; FILE=vmstat_`hostname`_`date +"%Y%m%d_%H%M"`.txt; date > ${FILE} && echo "VMSTAT_INTERVAL = ${INTERVAL}" >> $FILE && nohup vmstat ${INTERVAL} >> $FILE &
Per-processor utilization
Query per-processor utilization:
$ mpstat 5 2
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 425 0 115 34 26 202 7 51 14 0 838 2 2 0 96
1 434 0 98 290 185 274 5 52 16 0 797 2 2 0 96
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 1 15 9 93 3 21 0 0 159 0 0 0 100
1 2 0 3 280 175 181 2 22 0 0 172 0 0 0 99...
pgstat
pgstat: http://docs.oracle.com/cd/E23824_01/html/821-1462/pgstat-1m.html
prstat
By default, prstat prints damped average %CPU statistics for processor usage by individual processes or threads. Without arguments, prstat periodically updates the screen with relatively accurate 'average' information (which may be at variance with data returned from vmstat due to differences in how each is calculated):
$ prstat
Although the prstat documentation does not explicitly mention this, by default, the reported CPU usage is decayed over time. This can be confirmed with the Java program at https://raw.githubusercontent.com/kgibm/problemdetermination/master/scripts/java/ConsumeCPU.java. For example, if a Java program uses 50% CPU from time T1 to time T2 (after which its CPU usage goes to approximately 0), and you start to take prstat at time T2, the first iteration will report about 50%, and the second iteration may report a decayed value, and so on in the following iterations. Therefore, prstat may not show the "current" processor usage of processes but may include some historical processor usage.
Use the -mv options to gather accurate interval-based statistics:
$ prstat -mv
For example, use prstat in micro-stat mode with the following options: -mv for detailed, interval-accurate statistics; -c to print new reports below previous ones rather than overwriting the screen; -n to limit the number of processes to report; and an interval and iteration count:
$ prstat -mvcn ${MAXPROCESSES} ${INTERVAL} ${ITERATIONS}
$ prstat -mvcn 5 10 3
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
26649 root 5.9 17 1.0 12 45 0.0 19 0.1 2K 84 47K 0 prstat/1
26237 root 0.3 0.1 0.0 0.7 1.3 0.0 98 0.0 72 5 493 0 sshd/1...
The first iteration of prstat includes CPU data from before the start of prstat. In general, for "current" processor usage, review the second and subsequent iterations.
Be careful relying on any interpretation of prstat when it is not operating in -m 'micro-stat' mode, since otherwise there is no accurate timebase for the intervals against which percentage calculations can be maintained.
Per-thread CPU usage
Use the -L flag along with -p $PID to display accumulated CPU time and CPU usage by thread (light-weight process [LWP]):
$ prstat -mvcLn ${MAXTHREADS} -p ${PID} ${INTERVAL} ${ITERATIONS}
$ prstat -mvcLn 50 -p 1089 10 12
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
1089 noaccess 119M 100M sleep 59 0 3:12:24 0.0% java/14
1089 noaccess 119M 100M sleep 59 0 1:55:58 0.0% java/35
1089 noaccess 119M 100M sleep 59 0 0:00:00 0.0% java/38
1089 noaccess 119M 100M sleep 59 0 0:00:00 0.0% java/36...
prstat -L for threads has similar behavior to prstat for processes. Without -mv, it reports damped average % CPU. With -mv, the first iteration includes CPU data from before the start of prstat.
CPU Statistics
Query available CPU statistics:
# cpustat -h
...
event specification syntax:
[picn=]<eventn>[,attr[n][=<val>]][,[picn=]<eventn>[,attr[n][=<val>]],...]
event0: Cycle_cnt Instr_cnt Dispatch0_IC_miss IC_ref DC_rd DC_wr...
event1: Cycle_cnt Instr_cnt Dispatch0_mispred EC_wb EC_snp_cb...
Query CPU statistics:
# cpustat -c EC_ref,EC_misses 5 2
time cpu event pic0 pic1
5.011 0 tick 2037798 90010
5.011 1 tick 1754067 85031
10.011 1 tick 2367524 101481
10.011 0 tick 4272952 195616
10.011 2 total 10432341 472138
The cputrack command is basically the same as cpustat but works on a per-process level.
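For example, a sketch mirroring the cpustat invocation above against a single process (the target PID is an assumption):
# cputrack -T 5 -N 2 -c EC_ref,EC_misses -p ${PID}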
Interrupts
Interrupt statistics can be queried with intrstat:
$ intrstat 5 2
device | cpu0 %tim cpu1 %tim
-------------+------------------------------
bge#0 | 0 0.0 4 0.0
glm#0 | 3 0.0 0 0.0
uata#0 | 0 0.0 0 0.0
device | cpu0 %tim cpu1 %tim
-------------+------------------------------
bge#0 | 0 0.0 8 0.0
glm#0 | 23 0.0 0 0.0
uata#0 | 0 0.0 0 0.0...
Query interrupts per device:
$ vmstat -i
interrupt total rate
--------------------------------
clock 3244127300 100
--------------------------------
Total 3244127300 100
Hardware Encryption
Recent versions of the IBM SDK that run on Solaris support the hardware encryption capabilities of the UltraSPARC T2 CMT processor through the IBMPKCS11Impl security provider, which is first in the java.security provider list:
- http://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.security.component.80.doc/security-component/pkcs11implDocs/supportedcards.html
- http://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.security.component.80.doc/security-component/pkcs11implDocs/cardobservations.html
Physical Memory (RAM)
Program memory should not page out of RAM. This can be monitored with the api, apo, and apf columns in vmstat -p. For example:
# vmstat -p 5 3
memory page executable anonymous filesystem
swap free re mf fr de sr epi epo epf api apo apf fpi fpo fpf
4902128 1116760 76 851 1 0 0 0 0 0 0 0 0 0 1 1
4304784 931536 25 31 0 0 0 0 0 0 0 0 0 0 0 0
4304560 931320 447 5117 0 0 0 0 0 0 0 0 0 0 2 0
The first line of output is a set of statistics from boot and can usually be discarded.
Monitoring Swap Resources: http://docs.oracle.com/cd/E23824_01/html/821-1459/fsswap-52195.html
Input/Output (I/O)
Query disk usage:
$ df -h
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t0d0s0 63G 60G 3.3G 95% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 4.4G 1.6M 4.4G 1% /etc/svc/volatile
fd 0K 0K 0K 0% /dev/fd
swap 4.5G 49M 4.4G 2% /tmp
swap 4.4G 56K 4.4G 1% /var/run...
When encountering "too many open files" ulimit issues, list a process's open files with:
lsof -p <pid>
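The native pfiles utility can also enumerate a process's open file descriptors; for example:
# pfiles ${PID}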
Use iostat for basic disk monitoring. For example:
$ iostat -xtcn 5 2
tty cpu
tin tout us sy wt id
0 1 2 2 0 96
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1 0 0 c0t0d0
0.5 2.8 4.8 8.6 0.0 0.1 0.0 18.6 0 1 c1t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.6 0 0 wassun1:vold(pid463)
tty cpu
tin tout us sy wt id
0 98 0 0 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
0.0 2.4 0.0 7.0 0.0 0.0 0.0 19.3 0 1 c1t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 wassun1:vold(pid463)...
An alternative is fsstat:
$ fsstat -F
new name name attr attr lookup rddir read read write write
file remov chng get set ops ops ops bytes ops bytes
7.11M 3.03M 632K 45.6G 1.97M 35.9G 90.5M 3.97G 1.35T 906M 241G ufs
0 0 0 1.48G 0 1.43G 13.0M 723M 254G 46.9K 8.20M proc
0 0 0 255 0 25 22 0 0 0 0 nfs
0 0 0 0 0 0 0 0 0 0 0 zfs
0 0 0 785M 0 0 0 0 0 0 0 lofs
239M 13.5M 225M 272M 105K 549M 23.9K 209K 362M 226M 91.6G tmpfs
0 0 0 10.3M 0 0 0 30 4.27K 0 0 mntfs
0 0 0 0 0 0 0 0 0 0 0 nfs3
0 0 0 0 0 0 0 0 0 0 0 nfs4
0 0 0 488 0 28 19 0 0 0 0 autofs
Query swap usage:
$ swap -s
total: 876400k bytes allocated + 45488k reserved = 921888k used, 4645872k available
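Individual swap devices can be listed with:
$ swap -l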
Zettabyte File System (ZFS)
Consider isolating the ZFS intent log to a separate disk.
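For example, a sketch of adding a dedicated log device, assuming a pool named tank and a hypothetical device c2t0d0:
# zpool add tank log c2t0d0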
Networking
Query socket information:
$ netstat -an
TCP: IPv4
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -----------
*.32772 *.* 0 0 49152 0 LISTEN
127.0.0.1.32833 127.0.0.1.32794 32768 0 32768 0 ESTABLISHED...
When running into "too many open files" errors, count established TCP connections with:
netstat -an | grep ESTA | wc -l
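A related sketch that groups established connections by local address (the awk field position assumes the netstat layout shown above):
netstat -an | grep ESTABLISHED | awk '{print $1}' | sort | uniq -c | sort -rn | head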
Query socket statistics periodically:
$ netstat -i 5 2
input bge0 output input (Total) output
packets errs packets errs colls packets errs packets errs colls
122009930 0 7978053 0 0 152528566 0 38496689 0 0
33 0 6 0 0 33 0 6 0 0 ...
Starting with Solaris 11, use dlstat for network utilization (http://docs.oracle.com/cd/E23824_01/html/821-1458/ggjew.html):
# dlstat -r -i 1
LINK IPKTS RBYTES INTRS POLLS CH<10 CH10-50 CH>50
e1000g0 101.91K 32.86M 87.56K 14.35K 3.70K 205 5
nxge1 9.61M 14.47G 5.79M 3.82M 379.98K 85.66K 1.64K
vnic1 8 336 0 0 0 0 0
e1000g0 0 0 0 0 0 0 0
nxge1 82.13K 123.69M 50.00K 32.13K 3.17K 724 24
vnic1 0 0 0 0 0 0 0
# dlstat -t -i 5
LINK OPKTS OBYTES BLKCNT UBLKCNT
e1000g0 40.24K 4.37M 0 0
nxge1 9.76M 644.14M 0 0
vnic1 0 0 0 0
e1000g0 0 0 0 0
nxge1 26.82K 1.77M 0 0
vnic1 0 0 0 0
Query detailed socket statistics:
# netstat -s
TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens = 2162575 tcpPassiveOpens = 349052
tcpAttemptFails = 1853162 tcpEstabResets = 19061...
Ping a remote host. In general, and particularly for LANs, ping times should be less than a few hundred milliseconds with little standard deviation.
$ ping -ns 10.20.30.1
PING 10.20.30.1 : 56 data bytes
64 bytes from 10.20.30.1: icmp_seq=0. time=77.9 ms
64 bytes from 10.20.30.1: icmp_seq=1. time=77.2 ms
64 bytes from 10.20.30.1: icmp_seq=2. time=78.3 ms
64 bytes from 10.20.30.1: icmp_seq=3. time=76.9 ms
snoop
Capture network packets using snoop (http://www-01.ibm.com/support/docview.wss?uid=swg21175744, http://docs.oracle.com/cd/E23823_01/html/816-5166/snoop-1m.html).
Capture all traffic:
$ su
# nohup snoop -r -o capture`hostname`_`date +"%Y%m%d_%H%M"`.snoop -q -d ${INTERFACE} &
# sleep 1 && cat nohup.out # verify no errors in nohup.out
Use Wireshark to analyze the network packets gathered (covered in the Major Tools chapter).
Use -s to only capture part of the packet.
snoop does not have built-in support for log rollover.
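A minimal sketch of manual rollover, assuming a 5-minute rotation and the same flags as above:
# while true; do snoop -r -q -d ${INTERFACE} -o capture_`date +"%Y%m%d_%H%M%S"`.snoop & SNOOP_PID=$!; sleep 300; kill ${SNOOP_PID}; done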
Kernel
List available kernel statistics:
# kstat -l
bge:0:bge0:brdcstrcv
bge:0:bge0:brdcstxmt...
Query kernel statistics:
# kstat -p -m cpu_stat -s 'intr*'
cpu_stat:0:cpu_stat0:intr 1118178526
cpu_stat:0:cpu_stat0:intrblk 122410
cpu_stat:0:cpu_stat0:intrthread 828519759
cpu_stat:1:cpu_stat1:intr 823341771
cpu_stat:1:cpu_stat1:intrblk 1671216
cpu_stat:1:cpu_stat1:intrthread 1696737858
KSSL
On older versions of Solaris, and for older programs linked against older libraries, you may need to enable the KSSL kernel module, if available, to fully utilize hardware encryption (e.g. for TLS performance): http://docs.oracle.com/cd/E19253-01/816-5166/6mbb1kq5t/index.html
truss
Truss can be used to attach to a process and print which kernel/system calls are being made:
# truss -p ${PID}
Warning: truss can have a large performance effect when used without filters.
Modifying Kernel Parameters
Some kernel parameters can be set by modifying the /etc/system file and rebooting (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter1-9.html). For example:
set rlim_fd_max = 10000
Some networking parameters can be set using the ipadm set-prop command. These updates are persisted on reboot (unless the -t option is specified). For example:
# ipadm set-prop -p _time_wait_interval=15000 tcp
ipadm command: http://docs.oracle.com/cd/E26502_01/html/E29031/ipadm-1m.html
The ipadm command replaces the "ndd" command in recent versions of Solaris: http://docs.oracle.com/cd/E26502_01/html/E28987/gmafe.html
Note that Solaris 11 changed the names of some of the network tunable parameters: http://docs.oracle.com/cd/E26502_01/html/E29022/appendixa-28.html
Networking
Update the TIME_WAIT timeout to 15 seconds by running # ipadm set-prop -p _time_wait_interval=15000 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html)
Update the FIN_WAIT_2 timeout to 67.5 seconds by running # ipadm set-prop -p tcp_fin_wait_2_flush_interval=67500 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html)
Update the TCP keepalive interval to 15 seconds by running # ipadm set-prop -p _keepalive_interval=15000 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html, http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)
Update the TCP listen backlog to 511 by running # ipadm set-prop -p _conn_req_max_q=511 tcp (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html)
Update the maximum send and receive buffer sizes to 4MB by running # ipadm set-prop -p max_buf=4194304 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)
Update the maximum value of the TCP congestion window to 2MB by running # ipadm set-prop -p _cwnd_max=2097152 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)
Update the default send window size to 1MB by running # ipadm set-prop -p send_buf=1048576 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)
Update the default receive window size to 1MB by running # ipadm set-prop -p recv_buf=1048576 tcp (http://docs.oracle.com/cd/E23824_01/html/821-1450/chapter4-31.html)
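To verify a property after setting it, use ipadm show-prop; for example:
# ipadm show-prop -p max_buf tcp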
Process Limits
Update the maximum file descriptors to 10,000 by updating these lines in /etc/system and rebooting (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunesolaris.html):
set rlim_fd_max = 10000
set rlim_fd_cur = 10000
dtrace
DTrace is a very powerful dynamic tracing tool. For more information, see http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Intro
Sample 5-level user stack traces for Java processes:
# dtrace -n 'profile-1001 /execname == "java"/ { @[ustack(5)] = count(); }'
Print a stack trace any time a function is called:
# dtrace -n 'syscall::read:entry /execname == "bash"/ { ustack(); }'
List probes:
# dtrace -ln 'proc:::'
Useful scripts:
- Sample user and kernel CPU stacks: https://raw.githubusercontent.com/kgibm/problemdetermination/master/scripts/dtrace/stack_samples.d
- Summarize syscalls: https://raw.githubusercontent.com/kgibm/problemdetermination/master/scripts/dtrace/method_times_summary.d
- Track specific syscall times: https://raw.githubusercontent.com/kgibm/problemdetermination/master/scripts/dtrace/method_times_tree.d
DTrace scripts sometimes refer to time in Hertz. To convert: secs = 1/hertz
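For example, the profile-1001 probe used above fires at 1001 Hz, i.e. roughly every 1/1001 ≈ 0.999 ms on each CPU.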
FlameGraphs
# git clone https://github.com/brendangregg/FlameGraph
# cd FlameGraph
# dtrace -x ustackframes=100 -n 'profile-99 /arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.stacks
# ./stackcollapse.pl out.stacks > out.folded
# ./flamegraph.pl out.folded > out.svg
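The resulting out.svg can then be opened in a web browser to interactively explore and zoom into the hottest stacks.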
Logical Domains, Zones, and Processor Sets/Pinning
Logical domains, or LDOMs, are a way to virtualize the physical hardware to partition it into multiple guest operating system instances. List domains: ldm list-bindings
Non-global zones, or containers, are a way to virtualize an operating system instance further while sharing the base operating system image and runtime (the parent global zone).
Zones can be used to accomplish processor sets/pinning using resource pools. In some benchmarks, one JVM per zone can be beneficial.
- First, stop the non-global zone
- List zones: zoneadm list -vi
- Enable resource pools: svcadm enable pools
- Create resource pool: poolcfg -dc 'create pool pool1'
- Create processor set: poolcfg -dc 'create pset pset1'
- Set the maximum CPUs in a processor set: poolcfg -dc 'modify pset pset1 (uint pset.max=32)'
- Add virtual CPU to a processor set: poolcfg -dc "transfer to pset pset1 (cpu $X)"
- Associate a resource pool with a processor set: poolcfg -dc 'associate pool pool1 (pset pset1)'
- Set the resource set for a zone: zonecfg -z zone1 set pool=pool1
- Restart the zone: zoneadm -z zone1 boot
- Save to /etc/pooladm.conf: pooladm -s
- Display processor sets: psrset
- Show the processor set a process is associated with (PSET column): ps -e -o pid,pset,comm