Wednesday, February 11, 2009

Weird IBM RPM Catastrophic Error



Our latest IBM RPM customer opted for a relatively small investment in the solution, so their RPM system configuration involved the following components:

DB2 Express-C 9.5
Tomcat 5.5
JDK 5.0
Windows 2003
IBM RPM 7.1.1.2

According to the official IBM RPM 7.1.1.2 guide, some additional steps are needed for a proper installation of the solution. Refer to the guide for further information.

The installation took almost a day because we usually work with IBM WAS and DB2 ESE and were unfamiliar with this environment, so we went slowly to avoid unnecessary mistakes.

The installation succeeded with no errors in the logs.

However, post-installation testing revealed some drastic and frightening issues, including:

* Clicking Investment Map in the Dashboard raised an EOleException Catastrophic error with a Windows hexadecimal code that translated to a very generic error message.

* Text keyed into the rich text control in the portlet could not be saved.

* More EOleException Catastrophic errors in other features.

Very interesting indeed, because none of the logs (Windows event logs, Tomcat logs and DB2 logs) revealed any useful hints about the error.

We decided to open a case with IBM Support, since the team would be busy with other functional work.

The case went back and forth for almost 2 months with no apparent solution. We did the usual log compilation, scenario descriptions, screenshots, blah blah... In the end, Support even suggested that DB2 Express-C v9.5 is not supported by RPM and that they were therefore unable to escalate the case further. :-S

Since the project was nearing completion and the risk of this error haunting us was becoming more obvious and serious, the team decided to commit more effort to fixing it.

Here is the troubleshooting sequence we followed:

Note: Our VM test environment did not have the EOleException problem, so we believed it was environment-specific rather than a product bug.


1. Disabled all irrelevant services in the OS and tested
- Unchanged. This ruled out a software conflict.
- Re-enabled everything afterwards.

2. Ran the RPM Client from outside the server machine, i.e. from a client workstation
- Unchanged. This confirmed it was a server-side problem. The RPM Standalone client and the plugins behaved the same way.

3. We stumbled on this scenario by accident and found something interesting.
We opened the RPM Client, triggered the EOleException Catastrophic error and clicked OK to continue. Then we recycled Tomcat while the client was still open. After that, the error was no longer triggered and everything worked fine, including saving in the rich text control.
- This was an important hint that the problem was caused by something shared between the client program and Tomcat on the server.
- Since the client program is a Windows executable, this possibly had something to do with DLLs or something else that the client program "loaded" before Tomcat. I can't be sure, since I don't have access to the source code.

4. We checked java.library.path in the Tomcat logs and found another copy of xercesImpl.jar and xmlApi.jar in the path. To play safe, we removed these files and tested again. No luck.
- This ruled out a JAR class-loading problem, since logically the client program and Tomcat don't really use them together (I think).

5. A tip we learnt from Support: the log output of the client program can be redirected to a file with "RPMStdIn.exe > logs.txt". This log contains information such as the stored procedures called, together with their parameters.
- After we triggered the error, logs.txt contained the stored procedure call that caused it.

6. From step 5, we copied the call information and executed it directly in DB2
- DB2 returned no errors. This confirmed that the problem lay in how Tomcat, or something inside Tomcat, interprets or handles the SP call.

7. Confirmed the JDBC driver
- No problem, as we used the drivers directly from the local DB2 Express-C installation.

8. Having ruled out so many things, we started to suspect something more fundamental, i.e. the Java execution environment. A check on the Tomcat JRE showed it was using JRE 1.5.0.2. We cross-referenced the Tomcat 5.5 and RPM 7.1.1.2 manuals, and none of them said anything about a particular level of J2SE 5.0 to use.
- Since we were running out of options, we decided to upgrade the JRE to 1.5.0.16, the version our VM was using (LOL, we realized that too late, I guess).
- Bingo, the error was gone.
- To reconfirm, we installed JRE 1.5.0.2 in our VM, pointed Tomcat at it and managed to reproduce the problem, something we had been unable to do since reporting the case to Support.
- Note: Tomcat in this environment runs in jvm.dll mode. I'm not sure whether the same problem happens in other modes.
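Steps 4 and 8 both came down to checking what the JVM behind Tomcat actually sees. Below is a minimal sketch of such a check in Java, assuming you run it with the same JRE that Tomcat loads (for the jvm.dll service mode, the java.exe in the same JRE as that jvm.dll) and pass the same -Djava.library.path value if Tomcat sets one explicitly. The class name TomcatJreCheck is hypothetical, and the 1.5.0_16 threshold (the form in which java.version reports update 16) is simply the level that worked in our VM, not a documented IBM requirement.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the JRE level and any JARs sitting in java.library.path directories.
public class TomcatJreCheck {

    // Level that worked in our VM; an observation, not an official IBM requirement.
    static final String MINIMUM = "1.5.0_16";

    // Splits a version such as "1.5.0_16" or "1.5.0_16-b02" into numbers: {1, 5, 0, 16}.
    static int[] parse(String version) {
        String core = version.split("-")[0];
        String[] parts = core.split("[._]");
        int[] nums = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            nums[i] = leadingInt(parts[i]);
        }
        return nums;
    }

    // Reads the leading digits of a token, or 0 if it has none.
    static int leadingInt(String token) {
        int end = 0;
        while (end < token.length() && Character.isDigit(token.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(token.substring(0, end));
    }

    // Negative, zero or positive as version a is lower than, equal to or higher than b.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return xi < yi ? -1 : 1;
            }
        }
        return 0;
    }

    // Lists .jar files found directly inside the directories of a path-style string.
    static List<String> findJars(String pathValue) {
        List<String> jars = new ArrayList<String>();
        if (pathValue == null) {
            return jars;
        }
        for (String entry : pathValue.split(File.pathSeparator)) {
            File[] files = entry.length() == 0 ? null : new File(entry).listFiles();
            if (files == null) {
                continue; // empty entry, missing directory or unreadable directory
            }
            for (File f : files) {
                if (f.isFile() && f.getName().toLowerCase().endsWith(".jar")) {
                    jars.add(f.getAbsolutePath());
                }
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println("JRE " + running + " at " + System.getProperty("java.home"));
        if (compare(running, MINIMUM) < 0) {
            System.out.println("WARNING: older than " + MINIMUM);
        }
        for (String jar : findJars(System.getProperty("java.library.path"))) {
            System.out.println("JAR in java.library.path: " + jar);
        }
    }
}
```

On a JRE at 1.5.0_02 this would print the warning line; it only reports what it finds and does not prove the JRE level is the cause, which in our case we confirmed by reproducing the problem in the VM.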


All this havoc just because of insufficient official information about the supported environments for RPM. Aiks.

P/S: The IBM RPM guide never mentions which DB2 editions are supported, and this may be a grey area for the supportability of any RPM deployment.

Professionally, I strongly believe that CA Clarity did a great job of explicitly documenting its supported environments down to the details of levels, fix packs, versions and so on. That kind of detail helps avoid weird problems like the one we encountered here.

