Liberty in OpenShift Get System Dump Recipe

  1. This procedure requires an oc user with cluster-admin privileges.
    1. Ensure you're logged in with oc as that user.
  2. Find the relevant pod(s):
    $ oc get pods --namespace $NAMESPACE
    NAME                            READY   STATUS    RESTARTS   AGE
    mypod-7d57d6599f-tq7vt          1/1     Running   0          12m
  3. Remote into the pod (if it has more than one container, specify the container with -c $CONTAINER):
    $ oc rsh --namespace $NAMESPACE -t $PODNAME
    sh-4.4$ 
  4. Execute the following command to determine the core_pattern:
    cat /proc/sys/kernel/core_pattern
  5. If core_pattern starts with a |, the kernel pipes the core dump to a handler program and the dump ends up on the worker node. Otherwise, ensure the directory specified in core_pattern exists in the container.
  6. Execute server javadump with the system dump option. The $WLP path is usually either /opt/ibm/wlp or /opt/ol/wlp depending on whether it's WebSphere Liberty or OpenLiberty, respectively:
    $WLP/bin/server javadump --include=system
    If core_pattern starts with a |, it is common and expected for this command to report that the core dump was not found, because the dump goes to the worker node; for example:

    The core file created by child process with pid = $PID was not found

  7. If core_pattern did not start with a |, retrieve the core dump from the core_pattern directory inside the container. Otherwise, continue to the next steps.
  8. Exit the remote shell:
    exit
  9. Find the worker node of the pod:
    oc get pod --namespace $NAMESPACE --output "jsonpath={.spec.nodeName}{'\n'}" $PODNAME
  10. Start a debug pod on the worker node, where $NODE is the node name printed by the previous step:
    oc debug node/$NODE -t
  11. If core_pattern ends with systemd-coredump, dumps should be in /var/lib/systemd/coredump/. If it ends with apport, dumps should be in /var/crash/ or /var/lib/apport/coredump/. If it ends with rdp, review /opt/dynatrace/oneagent/agent/conf/original_core_pattern.
  12. List the directory from the last step; the worker node's filesystem is mounted under /host in the debug pod:
    ls -l /host/$DUMPSDIRECTORY
  13. Now we'll use this debug pod to download the file. First, start a looping output so that the debug pod doesn't time out by executing:
    while true; do echo 'Sleeping'; sleep 8; done
  14. Next, open a new terminal and find the debug pod and namespace:
    $ oc get pods --field-selector=status.phase==Running --all-namespaces | grep debug
    openshift-debug-node-pwcn42r47f       worker3-debug       1/1     Running            0                  3m38s
  15. Use the above namespace (first column) and pod name (second column) to download the core dump from the worker node, using the dumps directory from step 11 and making sure to prefix it with /host/; for example:
    oc cp --namespace openshift-debug-node-pwcn42r47f worker3-debug:/host/var/lib/systemd/coredump/core.kernel-command-.1000650000.08b9e28f46b348f3aabdffc6896838e0.2923161.1659552745000000.lz4 core.dmp.lz4
  16. After the download completes, return to the previous terminal window, press Ctrl+C to exit the loop, and then type exit to end the debug pod.
  17. Upload core.dmp.lz4
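
The decision made in steps 5 and 11 can be sketched as a small shell function. This is only an illustration, not part of the recipe: the function name core_dump_location and its "node:"/"container:" output prefixes are hypothetical, and it assumes that "ends with" in step 11 refers to the name of the handler program that follows the |.

```shell
# Hypothetical helper: given the contents of /proc/sys/kernel/core_pattern,
# print where the core dump is expected, following steps 5 and 11.
core_dump_location() {
  pattern=$1
  case $pattern in
    '|'*)
      # Piped pattern: the first word after the | is the handler program,
      # and the dump is handled on the worker node.
      handler=${pattern#\|}
      handler=${handler%% *}
      case ${handler##*/} in
        systemd-coredump) echo "node:/var/lib/systemd/coredump/" ;;
        apport)           echo "node:/var/crash/ or /var/lib/apport/coredump/" ;;
        rdp)              echo "node:see /opt/dynatrace/oneagent/agent/conf/original_core_pattern" ;;
        *)                echo "node:unknown handler ${handler##*/}" ;;
      esac
      ;;
    */*)
      # Path pattern: the dump is written to this directory in the container.
      echo "container:${pattern%/*}/"
      ;;
    *)
      # Bare file name: the kernel writes the dump relative to the
      # current working directory of the dumping process.
      echo "container:current working directory of the process"
      ;;
  esac
}
```

For example, from inside the remote shell of step 3, after pasting the function:

    core_dump_location "$(cat /proc/sys/kernel/core_pattern)"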