When Node Manager and OPMN Won’t Play Nicely
Who doesn’t love a solid VM workstation? I’m a big fan of these given their scalability and low maintenance (just update that master image and redeploy!). Not to mention the fact that I can use fancy buzzwords like “cloud” and “virtualization”. Jokes aside, VMs are awesome developer tools and here at Red Pill, we recommend using them in BI deployments.
I recently had an issue with VM clones running OBIEE within a domain. Within this domain, no two machines could share the same machine name. That was a problem because when OBIEE was installed on the “Master Image” (the parent VM image that the others are cloned from), it used the machine name as the host name. This is usually fine in a virtualization environment that does not use machine names as unique identifiers (Amazon Web Services comes to mind). Here, however, each machine was renamed when it was cloned, so its new name was out of sync with the host name that OBIEE uses. There are excellent articles on the web that detail the host name change process, and they were certainly a helpful starting point for my adventure in host name changes. But what if you have done all of the config for new host names and you start getting Node Manager and OPMN errors?
Here are the vitals of my system after completing host name changes within the Admin Console, Enterprise Manager, and various config and start/stop scripts:
- Admin Server started in Running Mode
- BI_Server1 failed to start
- All EM components started except for coreapplication and essbaseserver1
- My OBIEE version is 220.127.116.11.0 on Windows 2008R2
A tidy list of things to troubleshoot, but I was up for the challenge.
Let’s start with getting BI_Server1 online. When I attempted to start the server, I saw that there were issues contacting the Node Manager. After double-checking the listening ports and addresses in both the Admin Console and the Node Manager config files, I closed Node Manager and then attempted to boot it from the command line. I used the nmConnect command and was promptly given an error: the connection to Node Manager was refused.
Odd, right? All of my configurations are correct in the Admin Console and the config files, so why would the connection be refused? The secret lies deep within the folders of the user_projects directory of OBIEE. If you navigate to the <middleware_home>\user_projects\domains\bifoundation_domain\servers\bi_server1\data\nodemanager directory, you’ll find a collection of files.
Specifically, we’re interested in the .lck, .pid, and .state files. The .lck (a lock file), .pid (a process identifier file), and .state (a state file) were present and were still referencing the old Node Manager setup. What’s the solution? I created a folder within that directory called “archive” and moved these old files into it. Then, I re-ran the nmConnect command. Voila! The Node Manager booted right up, and it created new .lck, .pid, and .state files. After that, I went back into my Admin Console and started BI_Server1 with no issues.
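If you have a handful of clones to fix, the archive step can be scripted. Here is a minimal Python sketch of what I did by hand, assuming Node Manager and the managed server are stopped and that the stale files end in .lck, .pid, or .state. The function name and the idea of running this from Python are my own illustration, not anything that ships with OBIEE or WebLogic.

```python
import shutil
from pathlib import Path

# Suffixes of the runtime files that were still pointing at the old setup.
STALE_SUFFIXES = {".lck", ".pid", ".state"}


def archive_stale_node_manager_files(nm_dir, archive_name="archive"):
    """Move .lck, .pid and .state files from nm_dir into nm_dir/<archive_name>.

    Hypothetical helper: run it only while Node Manager and the managed
    server are stopped. Returns the names of the files that were moved.
    """
    nm_dir = Path(nm_dir)
    archive = nm_dir / archive_name
    archive.mkdir(exist_ok=True)  # same "archive" folder as the manual fix

    moved = []
    # sorted() takes a snapshot of the listing before anything is moved.
    for entry in sorted(nm_dir.iterdir()):
        if entry.is_file() and entry.suffix.lower() in STALE_SUFFIXES:
            shutil.move(str(entry), str(archive / entry.name))
            moved.append(entry.name)
    return moved


# Example call (substitute your own Middleware home for the placeholder):
# archive_stale_node_manager_files(
#     r"<middleware_home>\user_projects\domains\bifoundation_domain"
#     r"\servers\bi_server1\data\nodemanager"
# )
```

Moving the files rather than deleting them keeps the originals around in case you need to compare them with the freshly created ones.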
Now that I had BI_Server1 started, I went into EM and tried starting coreapplication. No dice. Again I went to the command line, this time to try an opmnctl startall command. This time, I had a little more to work with based on the output.
After checking that all config files were properly set, I headed over to the states folder within the opmn directory: <middleware_home>\instances\instance1\config\OPMN\opmn
Again, we see some files in that directory which are created during the startup and running of the OPMN processes. Just like before, I created an “archive” folder, placed all of the old files in it, and ran my opmnctl startall command again. The OPMN services fired up, and we can now see that all of the services are alive by running an opmnctl status command.
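For the OPMN side I moved everything, not just files with particular extensions. A rough Python sketch of that step follows, assuming all OPMN processes are stopped first. The function name and the dry_run switch are hypothetical additions of mine so you can preview what would be moved before committing to it.

```python
import os
import shutil


def archive_all_files(directory, archive_name="archive", dry_run=False):
    """Move every regular file in directory into directory/<archive_name>.

    Hypothetical helper mirroring the manual OPMN fix. Subfolders
    (including the archive folder itself) are left where they are.
    With dry_run=True nothing is moved; the names are only reported.
    """
    archive = os.path.join(directory, archive_name)
    # Snapshot the names of regular files before touching anything.
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    if dry_run:
        return names

    os.makedirs(archive, exist_ok=True)
    for name in names:
        shutil.move(os.path.join(directory, name), os.path.join(archive, name))
    return names


# Preview first, then move (the path is whatever OPMN directory you are clearing):
# print(archive_all_files(opmn_state_dir, dry_run=True))
# archive_all_files(opmn_state_dir)
```

After the move, opmnctl startall should recreate whatever it needs in the now-empty directory, as it did for me.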
The main point here is that even with proper configurations in both the Admin Console and configuration files, your services may still fail if the state files are referencing an old process. Make sure those files are refreshed by removing the old ones and clearing the directory for new files to be created.
Many thanks to Neal Achord for his help in diagnosing these issues.