How to diagnose a JVM crash on Rhino?


Introduction

This document provides guidance on procedures for diagnosing a JVM crash on Rhino.

What does a JVM crash mean?

A rare issue that some application developers find themselves chasing is a Rhino node that terminates with a JVM crash, or fatal error. There are various possible reasons for a Java Virtual Machine crash. For example, a JVM crash can occur due to a bug in the Java HotSpot VM, in a system library, in a Java SE library or API, in application native code, or even in the operating system. External factors, such as resource exhaustion in the operating system, can also cause a JVM crash.

First thing to check on a JVM crash

In general, when a JVM crash occurs, first check the logs of all Rhino cluster members. If a JVM error message appears in the logs of any node, then a Java Virtual Machine error has occurred. After checking Rhino's logs, determine whether the hardware or operating system is causing the problem. Look at the logs of the local machine and determine whether the machine has a history of restarts, kernel panics, process segmentation faults, and so forth. On Unix machines, system logs can be viewed using the dmesg command or by viewing the logs in /var/log. If the crash appears to be a one-off, the node can simply be restarted. In a production system, the -k flag can be used with ./start-rhino.sh to automatically restart a node in the event of failure (such as a JVM crash).
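
For example, on a Linux host the following commands can be used to check for hardware or operating system problems and to restart a node with automatic restart enabled (a sketch only; log file locations and names vary between distributions):

# Check recent kernel messages for hardware errors, OOM kills or segmentation faults
dmesg | tail -100
grep -i -e "segfault" -e "killed process" /var/log/messages   # or /var/log/syslog

# Restart the node with automatic restart on failure
cd $RHINO_HOME$/$NODE_ID$
./start-rhino.sh -k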

Locating the fatal error log

Locating the fatal error log is also a very important step in diagnosing a JVM crash. The fatal error log is a file named hs_err_pid<pid>.log, where <pid> is the process ID of the crashed process (the file $RHINO_HOME$/$NODE_ID$/work/rhino.pid contains the current process ID of Rhino). Normally this fatal error file is created in the working directory of the Rhino process (i.e. $RHINO_HOME$/$NODE_ID$/). View this file and try to determine which part of the JVM caused the crash. This may provide clues as to how to resolve the situation.
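
A minimal sketch of locating the file, assuming the standard directory layout described above:

cd $RHINO_HOME$/$NODE_ID$
# The PID recorded at startup identifies which hs_err file belongs to this node
cat work/rhino.pid
ls -l hs_err_pid*.log
# If the file is not in the node directory, the JVM may have written it to another
# writable location (such as the system temporary directory); search for recent files
find /tmp . -name 'hs_err_pid*.log' -mmin -60 2>/dev/null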

Sample crashes

This section shows a number of examples and explains how the error logs are used to suggest the cause of a JVM crash.

A JVM crash in the hotspot compiler thread

If the "Current thread" is a JavaThread "CompilerThread0", there might be possible a compiler bug that you have encountered. Here is an example of a such a crash:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  Internal Error (/home/chrisphi/east/ws/6415406/ws5u15/hotspot/src/share/vm/opto/loopopts.cpp, 850), pid=21426, tid=11
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_15ea_TEST_150_15ea+cr6415407_01_chrisphi_2007.10.12_11:18-debug mixed mode)
#
# Error: assert(dom_depth(n_ctrl) <= dom_depth(x_ctrl),"n is later than its clone")

---------------  T H R E A D  ---------------

Current thread (0x082088e0):  JavaThread "CompilerThread0" daemon [_thread_in_native, id=11]

It might be possible to work around the issue temporarily by switching to the HotSpot Client VM, or by excluding the method that provokes the JVM crash from compilation, if one can be identified (see Finding a workaround below).

A JVM crash in compiled code

If the JVM crash is in compiled code (i.e. the problematic frame is marked with a "J"), there might be a compiler bug that has resulted in incorrect code generation. Here is an example of such a crash:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGBUS (0xa) at pc=0xf917bb8c, pid=4952, tid=18
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_09-b01 mixed mode)
# Problematic frame:
# J  com.opencloud.deployed.Profile_Table_192.ProfileIndexVPNOCBB_Persistence.doStore(Lcom/opencloud/resource/memdb/ChangeSet;)V
#

Stack: [0xd2380000,0xd2400000),  sp=0xd23ff430,  free space=509k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J  com.opencloud.deployed.Profile_Table_192.ProfileIndexVPNOCBB_Persistence.doStore(Lcom/opencloud/resource/memdb/ChangeSet;)V
v  ~I2CAdapter
j  com.opencloud.deployed.Profile_Table_192.ProfileIndexVPNOCBB_Persistence.batchStore(Ljava/util/Iterator;)V+41
j  com.opencloud.deployed.Profile_Table_192.ProfileIndexVPNOCBB_Container.persistenceStore(Ljava/util/Iterator;)V+1
...
V  [libjvm.so+0x199d94]
V  [libjvm.so+0x2bffb4]
V  [libjvm.so+0x2df58c]
V  [libjvm.so+0x2db128]
V  [libjvm.so+0x670740]

It might be possible to work around the issue temporarily by switching to the HotSpot Client VM, or by excluding the method that provokes the crash from compilation (see Finding a workaround below).

If you get a reproducible JVM crash in compiled code on the HotSpot Server VM, please collect the following information to help analyse the root cause of the crash (a sketch of the commands appears after this list).

  • Rerun on the HotSpot Server VM with the method that provokes the crash excluded from compilation, for example by adding the -XX:CompileCommand=exclude,com/opencloud/deployed/Profile_Table_192.ProfileIndexVPNOCBB_Persistence,doStore flag to the command line.
  • Provide `uname -a` and the time-stamp of the crash.
  • Provide `java -version` and `java -fullversion` (to identify the exact JDK version).
  • Locate the core file if it is available.
  • Attach the native debugger (pstack, pmap, pldd on Solaris) to the core file (ONLY on the machine where the core was generated).
  • Record the CPU statistics from around the time of the JVM crash.
  • Locate the fatal error log.
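
A minimal sketch of collecting this information on a Unix host (the core file path is a placeholder, and pstack, pmap and pldd are the Solaris tools named above):

# Record the environment and the exact JDK version
uname -a > crash-info.txt
java -version 2>> crash-info.txt
java -fullversion 2>> crash-info.txt

# Inspect the core file ONLY on the machine where it was generated (Solaris tools shown)
pstack /path/to/core > core-pstack.txt
pmap /path/to/core > core-pmap.txt
pldd /path/to/core > core-pldd.txt

# Keep the fatal error log (hs_err_pid<pid>.log) together with the files above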

A JVM crash in native code

If the fatal error log indicates the JVM crash was in a native library (i.e. the problematic frame is marked with a "C"), there might be a bug in native or JNI library code. For example, consider the following extract from the header of a fatal error log:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGBUS (0x7) at pc=0xf73d7a62, pid=19500, tid=688157
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode)
# Problematic frame:
# C  [libzip.so+0xfa62]
#

---------------  T H R E A D  ---------------

Current thread (0xcab30a28):  JavaThread "StageWorker/0" [_thread_in_native, id=19627]
...
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libzip.so+0xfa62]
C  [libzip.so+0x10124]  ZIP_GetEntry+0xd8
C  [libzip.so+0x36b9]  Java_java_util_zip_ZipFile_getEntry+0x14b
...
c5ae4000-c5bae000 r-xs 00000000 00:15 65848888  /home/rhino/work/deployments/101/unit1143400190944/.nfs0000000003ecc63800000010
c5bae000-c5c00000 r-xs 00000000 00:15 65849319  /home/rhino/work/deployments/101/unit1143400190944/.nfs0000000003ecc7e700000011

In this case a SIGBUS arose with a thread executing in the library libzip.so.

In general, if you get a JVM crash in a native library, please collect the following information to help analyse the root cause of the crash (a sketch of the commands appears after this list).

  • Rerun with the -Xcheck:jni option added to the command line.
  • Locate the core file if it is available.
  • Attach the native debugger (dbx or gdb on Solaris) to the core file.
  • Locate the fatal error log.
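
A minimal sketch of these steps, assuming a Unix host (the core file path is a placeholder, and the configuration file referenced for -Xcheck:jni is assumed to be the same one edited in "Finding a workaround" below):

# 1. Rerun with JNI argument checking enabled by adding the option to the JVM
#    options used to start Rhino, e.g. in $RHINO_HOME$/$NODE_ID$/read-config-variables:
#      -Xcheck:jni
# 2. Examine the core file with a native debugger (gdb shown; dbx is similar) on the
#    machine where the core was produced
gdb $JAVA_HOME/bin/java /path/to/core
  (gdb) bt              # stack trace of the crashing thread
  (gdb) info threads    # list all threads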

The solution for this JVM crash is to run the test with Rhino installed on a local file system. The memory map in the error log (note the .nfs entries above) suggests that the Rhino install is mounted over NFS. If you mount your Rhino installation over NFS, this can cause apparent "modifications" which upset Java's libzip library.
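
A quick way to check whether the Rhino installation is on an NFS mount (a sketch; option support and output format vary between systems):

# Show the filesystem type backing the Rhino installation (Linux: -T prints the type)
df -T $RHINO_HOME$
# Alternatively, list NFS mounts and compare them against the installation path
mount | grep -i nfs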

Finding a workaround

If a JVM crash arises with Rhino, and the crash appears to be caused by a bug in the Java HotSpot VM, then it might be desirable to find a temporary workaround. However, if the JVM crash occurs repeatedly with Rhino deployed on the latest JDK, then the crash should be reported to OpenCloud by sending an email to OpenCloud Support.

If the fatal error log indicates that the crash arose in a HotSpot compiler thread or in compiled code, then it is possible that you have triggered a compiler bug. In the case of the example from "A JVM crash in compiled code" above, there are two potential workarounds (a sketch of the edited configuration follows the list):

  1. Use the -client option
    Edit the configuration file (i.e. $RHINO_HOME$/$NODE_ID$/read-config-variables) and change "-server" to "-client", so that Rhino runs with the "-client" option and uses the HotSpot Client VM.
  2. Exclude the method from compilation
    Edit the configuration file (i.e. $RHINO_HOME$/$NODE_ID$/read-config-variables) and add the -XX:CompileCommand=exclude,com/opencloud/deployed/Profile_Table_192.ProfileIndexVPNOCBB_Persistence,doStore flag to it.
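
A minimal sketch of the two edits, assuming the JVM options in read-config-variables are collected in a single shell variable (the variable name and the other options shown are illustrative, not the actual contents of the file):

# $RHINO_HOME$/$NODE_ID$/read-config-variables (illustrative excerpt)
# Original:
#   JVM_OPTIONS="-server -Xmx1024m ..."
# Workaround 1: switch to the HotSpot Client VM
#   JVM_OPTIONS="-client -Xmx1024m ..."
# Workaround 2: keep the Server VM but exclude the offending method from compilation
#   JVM_OPTIONS="-server -Xmx1024m ... -XX:CompileCommand=exclude,com/opencloud/deployed/Profile_Table_192.ProfileIndexVPNOCBB_Persistence,doStore"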

To verify that the HotSpot VM has correctly excluded the method from compilation, look for the following log message on the Rhino console at runtime:

### Excluding compile:	com.opencloud.deployed.Profile_Table_192.ProfileIndexVPNOCBB_Persistence::doStore
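
If console output is also captured in a log file, the message can be checked with grep (the file name here is a placeholder for wherever your system captures console output):

grep "Excluding compile" console.log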

What to collect in order to ask OpenCloud for help?

In general, before you send an email to OpenCloud support, check the OpenCloud Developer Portal or the Sun Bug Database to see whether your issue is already known. When you are reporting a JVM crash, be clear about the following information:

  • Which Rhino version are you running?
  • Which Java version are you running?
  • Which operating system are you running on?
  • A summary of the issue would be helpful to the OpenCloud engineer.
  • Locate the core file if it is available.
  • Attach the native debugger (pstack, pmap, pldd on Solaris) to the core file (ONLY on the machine where core was generated).
  • Locate the fatal error log.