02.09
Today I ran into a weird issue while installing Oracle Grid Control Agent 10.2.0.3 on Linux. Right after I typed “runInstaller”, OUI crashed with a segmentation fault… Let me go through some of the troubleshooting steps you may need should you find yourself in similar trouble.
Here are the relevant details:
- OS: Red Hat Enterprise Linux Server 5.3 x86-64
- GC Agent: Oracle Enterprise Manager 10g Grid Control Release 3 (10.2.0.3) for Linux x86-64
- GC Console: Oracle Enterprise Manager 10g Release 5 (10.2.0.5) Grid Control for Microsoft Windows 32-bit
And here’s the error message (the most interesting portions):
Unexpected Signal : 11 occurred at PC=0xE44F46A7
Function=[Unknown.]
Library=(N/A)
[..]
Current Java thread:
at sun.awt.motif.MToolkit.init(Native Method)
at sun.awt.motif.MToolkit.<init>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[..]
Heap at VM Abort:
Heap
def new generation total 576K, used 84K [0xe6510000, 0xe65b0000, 0xe7090000)
eden space 512K, 4% used [0xe6510000, 0xe65152f8, 0xe6590000)
from space 64K, 100% used [0xe65a0000, 0xe65b0000, 0xe65b0000)
to space 64K, 0% used [0xe6590000, 0xe6590000, 0xe65a0000)
tenured generation total 6212K, used 4461K [0xe7090000, 0xe76a1000, 0xefb10000)
the space 6212K, 71% used [0xe7090000, 0xe74eb5f8, 0xe74eb600, 0xe76a1000)
compacting perm gen total 5632K, used 5398K [0xefb10000, 0xf0090000, 0xf3b10000)
the space 5632K, 95% used [0xefb10000, 0xf00558b0, 0xf0055a00, 0xf0090000)
Local Time = Tue Feb 8 09:45:48 2011
Elapsed Time = 1
#
# The exception above was detected in native code outside the VM
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode)
#
To get past this show-stopper I tried a few things…
The heap report produced by java at crash time seemed to indicate a memory shortage. By editing the “install/oraparam.ini” file, you can tweak how much RAM is available to OUI’s JVM. Just alter the “JRE_MEMORY_OPTIONS” value.
JRE_MEMORY_OPTIONS=" -Xms512m -Xmx2048m"
This is also a safe place to put additional command line parameters: they’ll mostly be passed to java’s command line. I said “mostly” because the OUI wrapper/launcher seems to check some sort of allowed-parameters list and may refuse to go on if something doesn’t look right.
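If you change this value often, the edit can be scripted. What follows is only a minimal sketch: set_ini_key is a made-up helper name (not something OUI ships), it assumes the key already exists on its own line, and it keeps a “.orig” backup next to the file.

```shell
# Sketch: overwrite an existing KEY=VALUE line in an ini-style file.
# set_ini_key is a hypothetical helper, not part of OUI.
# Limitation: the value must not contain "|", "&" or backslashes,
# because it is pasted straight into a sed replacement.
set_ini_key() {
    file=$1 key=$2 value=$3
    # Refuse to touch the file if the key is not there.
    grep -q "^$key=" "$file" || { echo "$key not found in $file" >&2; return 1; }
    # Keep a backup, then rewrite the matching line only.
    cp "$file" "$file.orig" &&
    sed "s|^$key=.*|$key=$value|" "$file.orig" > "$file"
}

# Example (run from the directory that contains install/):
#   set_ini_key install/oraparam.ini JRE_MEMORY_OPTIONS '" -Xms512m -Xmx2048m"'
```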
“-XX:MaxPermSize=32m” is one of the knobs that doesn’t pass the sanity check. To run OUI’s JVM by hand, with the right parameters, just keep the first lines printed by runInstaller (the ones starting with ‘Arg:’):
Arg:1:-Doracle.installer.library_loc=/tmp/OraInstall2011-02-08_04-55-33PM/oui/lib/linux:
Arg:2:-Doracle.installer.oui_loc=/tmp/OraInstall2011-02-08_04-55-33PM/oui:
Arg:3:-Doracle.installer.bootstrap=TRUE:
[..]
Arg:20:-timestamp:
Arg:21:2011-02-08_04-55-33PM:
Arg:22:-nowelcome:
Strip “^Arg:”, “^\d*:” and “:$”, add a trailing “ \”, and you’ll have an OUI-launching shell script you can alter at will.
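The stripping described above can be done with sed. This is a sketch rather than a finished tool: args_to_script is a name I made up, and it assumes the “Arg:” lines were saved to a file or piped in exactly as shown.

```shell
# Sketch: turn "Arg:N:value:" lines into shell continuation lines.
# args_to_script is a hypothetical helper name.
args_to_script() {
    sed -n -E '/^Arg:[0-9]+:/ {
        # drop the "Arg:N:" prefix
        s/^Arg:[0-9]+://
        # drop the trailing colon
        s/:$//
        # append " \" so the next line continues the command
        s/$/ \\/
        p
    }'
}

# Example (oui_args.txt is a placeholder for the saved output):
#   args_to_script < oui_args.txt > launch_oui.sh
```

You still have to put the java binary in front of the first line and remove the backslash from the last one before the script will run.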
Increasing the JVM’s memory had no effect. The heap report looked fine (usage percentages went down) but the crash was still there.
Another useful switch is “-XX:+ShowMessageBoxOnError”. It makes java halt on error, allowing us to attach a debugger and get a stack backtrace, e.g.:
An error has just occurred.
To debug, use 'gdb /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/bin/java 4866'; then switch to thread -136623920
#1 0xf7e462b6 in nanosleep () from /lib/libc.so.6
#2 0xf7e460df in sleep () from /lib/libc.so.6
#3 0xf7bdc6d7 in os::message_box ()
from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#4 0xf7bd9c52 in os::handle_unexpected_exception ()
from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#5 0xf7bddbf6 in JVM_handle_linux_signal ()
from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#6 0xf7bdc9d8 in signalHandler ()
from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#7 <signal handler called>
#8 0x6d4626a7 in ?? ()
#9 0x6d6d75b9 in XtToolkitInitialize () from /usr/lib/libXt.so.6
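If you’d rather not drive gdb interactively, the same backtrace can usually be collected in one shot. The sketch below assumes a gdb recent enough to understand “-batch” and “-ex”; dump_backtrace and the GDB variable are my own names, the latter only there so the command can be swapped out for a dry run.

```shell
# Sketch: attach to the halted JVM, print every thread's backtrace, detach.
# dump_backtrace is a hypothetical helper; GDB can be overridden (e.g. GDB=echo)
# to see the command line without running it.
dump_backtrace() {
    java_bin=$1   # path printed in the "To debug, use 'gdb ...'" hint
    pid=$2        # process id printed in the same hint
    "${GDB:-gdb}" -batch -ex 'thread apply all bt' "$java_bin" "$pid"
}

# Example, using the values from the message above:
#   dump_backtrace /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/bin/java 4866
```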
I also tried to “inject” a couple of newer JVMs into the stage directory. The quickest way is to borrow one from another installer.
./Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre
1.4.2.8.0
./p6810189_10204_Linux-x86-64/Disk1/stage/Components/oracle.swd.jre
1.4.2.14.0
The server has a “working” directory where Oracle patches/products are stored before use. In my case, changing OUI’s JVM from 1.4.2.8 to 1.4.2.14 is a matter of copying:
to:
Then modifying the same “oraparam.ini” file mentioned before:
JRE_LOCATION=../stage/Components/oracle.swd.jre/1.4.2.14.0/1/DataFiles
You could just as well download a specific JRE from http://java.sun.com (sorry: from Oracle) and:
- install the new JRE somewhere
- unzip (-t) the “filegroup1.jar” file that corresponds to OUI’s “factory” JRE. Note how the directories are laid out (something like: “jre/1.4.2”). Modify the new JRE accordingly.
- zip the new JRE, rename the resulting file to “filegroup1.jar”, copy it to the right place.
- modify oraparam.ini and choose the JVM version you’ll boot OUI into.
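The repackaging steps above could look roughly like this. It is a sketch under assumptions: stage_jre is a hypothetical helper, “jre/1.4.2” is only the layout I expect (check yours with “unzip -t”), and every path in the example comments is a placeholder.

```shell
# Sketch: lay out a freshly installed JRE the way the factory filegroup1.jar
# is laid out, so it can be zipped up afterwards.
# stage_jre is a hypothetical helper name.
stage_jre() {
    new_jre=$1               # directory where the new JRE was installed
    work_dir=$2              # scratch directory to build the archive in
    layout=${3:-jre/1.4.2}   # assumption: verify against the factory jar
    mkdir -p "$work_dir/$layout" &&
    cp -R "$new_jre/." "$work_dir/$layout/"
}

# Example run (all paths are placeholders):
#   unzip -t 1.4.2.8.0/1/DataFiles/filegroup1.jar | head   # inspect the layout
#   stage_jre /opt/newjre /tmp/jrework
#   (cd /tmp/jrework && zip -r filegroup1.jar jre)
#   cp /tmp/jrework/filegroup1.jar 1.4.2.19.0/1/DataFiles/
```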
/opt/orastage/Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre
[oracle@racnode01 oracle.swd.jre]$ find . -type f
./1.4.2.8.0/1/DataFiles/filegroup1.jar # <-- factory
./1.4.2.8.0/1/DataFiles/filegroup2.jar
./1.4.2.8.0/1/DataFiles/filegroup3.jar
./1.4.2.8.0/1/DataFiles/filegroup4.jar
./1.4.2.8.0/1/DataFiles/filegroup5.jar
./1.4.2.14.0/1/DataFiles/filegroup1.jar # <-- stolen from patchset p6810189
./1.4.2.14.0/1/DataFiles/filegroup2.jar
./1.4.2.14.0/1/DataFiles/filegroup3.jar
./1.4.2.14.0/1/DataFiles/filegroup4.jar
./1.4.2.14.0/1/DataFiles/filegroup5.jar
./1.4.2.19.0/1/DataFiles/filegroup1.jar # <-- downloaded by hand
Three different JREs, each of them segfaulting in the same spot we saw in the backtrace: a call made from XtToolkitInitialize () in /usr/lib/libXt.so.6.
Who’s the owner of libXt?
libXt-1.0.2-3.1.fc6 i386
After making sure that none of the running processes was using that package’s contents, I decided to remove it (rpm -e --nodeps libXt-1.0.2-3.1.i386) and reinstall it. Surprisingly, OUI worked flawlessly after this last action. Too bad I can’t really explain why. 🙁 The libXt version didn’t change before/after the reinstall. I should diff it anyway against what’s left untouched on the other RAC cluster members. I’ll update the post when I have a more rigorous explanation…
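For the comparison with the other cluster members, something as simple as the following might do. Again just a sketch: compare_lib is a made-up helper, and “racnode02” is an assumed name for a second node (only racnode01 appears above).

```shell
# Sketch: report whether two copies of a library are byte-identical.
# compare_lib is a hypothetical helper name.
compare_lib() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        # Print checksums and sizes so the two copies can be told apart.
        cksum "$1" "$2"
    fi
}

# Example (racnode02 is an assumed host name):
#   scp racnode02:/usr/lib/libXt.so.6 /tmp/libXt.so.6.racnode02
#   compare_lib /usr/lib/libXt.so.6 /tmp/libXt.so.6.racnode02
# On a node that still shows the crash, "rpm -V libXt" would also tell
# whether the installed files still match what the package database expects.
```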