SuperUser
  Posts:26
 |
| 01/20/2004 1:21 PM |
|
Not only am I looking for advice here, but I think you all should be aware of this. When an NFS-mounted file system goes stale or fails, the caiUxOs agent appears to use excessive amounts of CPU trying to get file system status. The NFS file system is not being monitored by the agent, but the agent still churns. One example: a system admin temporarily mounted a CD-ROM from his Unix workstation on one of our production servers to do an install. Instead of unmounting the CD-ROM when he was finished, he simply shut down his workstation. This caused Unix commands like df to hang indefinitely, and the caiUxOs agent began using 100% of available CPU. It never sent us any alerts. A second example is a recent SAN communication failure that knocked all SAN-hosted file systems offline. The same conditions occurred. Because a number of our servers are moving to SAN, this CPU issue has a very high impact on our production environment. Does anyone know of a fix, or has anyone faced this issue before? Is there any way to make the caiUxOs agent time out during file system polls and also send an alert when there is a problem with a mounted file system?
|
|
|
|
SuperUser
  Posts:26
 |
| 01/20/2004 3:00 PM |
|
I have never seen either of the examples you have given (our Unix folks ignore CPU messages). If you are getting the CPU critical messages, perhaps you could issue some Unix commands to find the cause and even kill it? |
|
|
|
SuperUser
  Posts:26
 |
| 01/20/2004 3:45 PM |
|
It's not that we are monitoring the CPU utilization; the issue is that the agent actually hangs and does not do much else until the file system is repaired. Stopping and starting the caiUxOs agent does not help, as the agent still tries to determine file system status when it is restarted. To put it in a nutshell, if a "df -tk" seems to hang on the server (it usually lists a few file systems and stops without returning a prompt), then the caiUxOs agent is going to churn a CPU trying to do the same thing. And it increases its usage at each polling period by starting a request on top of a request on top of a request and so on, because the previous one never ended. |
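For anyone who wants to catch the hung "df" condition outside the agent, here is a minimal sketch of an external check. It is not a caiUxOs feature or setting. It assumes Python 3 is available on the server, and the names probe, check_mounts, and report are made up for illustration. Each mount point is probed with "df -k" in a child process that is abandoned after a timeout, so one stale mount cannot stall the whole check.

```python
import subprocess
import sys


def probe(cmd, timeout_s=5.0):
    """Run cmd in a child process and return 'ok', 'error', or 'timeout'."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked on a hard NFS mount may not
        # die, so never wait on it without a time limit.
        proc.kill()
        try:
            proc.wait(timeout=1.0)
        except subprocess.TimeoutExpired:
            pass
        return "timeout"
    return "ok" if rc == 0 else "error"


def check_mounts(mount_points, timeout_s=5.0):
    """Probe each mount point with `df -k <path>`; return {path: status}."""
    return {mp: probe(["df", "-k", mp], timeout_s) for mp in mount_points}


def report(results, out=sys.stderr):
    """Print one line per unhealthy mount point; return how many were found."""
    bad = {mp: st for mp, st in results.items() if st != "ok"}
    for mp, st in sorted(bad.items()):
        print("mount check failed: %s (%s)" % (mp, st), file=out)
    return len(bad)


# Example use (the paths are placeholders, not from this thread):
#   report(check_mounts(["/", "/mnt/cdrom"], timeout_s=10.0))
```

Run from cron, the non-zero count from report could drive whatever alerting you already have. The timeout only protects the checker itself; it does not free the stuck mount.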
|
|
|
SuperUser
  Posts:26
 |
| 01/20/2004 4:00 PM |
|
| We had this issue on our AIX box that was connected to an optical jukebox. At first we thought it was a hardware failure of some type, because the box would hose itself. We finally discovered it was the caiUxOS agent having an issue with the optical drives, even though we never told it to monitor them. This seemed to be caused by the way the jukebox software swapped discs out. So we had to remove the agent; otherwise the box was useless. Recently we changed our methods for this storage process and put the agent back on the box with no issue. |
|
|
|
SuperUser
  Posts:26
 |
| 01/20/2004 4:00 PM |
|
We have not seen this particular problem with the OS agent; however, we have experienced the same results with the SAP agent. We are running the Unicenter SAP agent on AIX 5.1 64-bit, and on occasion this agent will stop releasing the CPU and its usage will keep rising until the box seizes. We wrote a script to recognize this situation and recycle the agent - not a resolution but better than the alternative. |
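The script itself is not posted here, so the following is only a rough sketch of how such a watchdog could recognize a runaway agent. It assumes Python 3 and a ps that accepts "-eo pid,pcpu,comm". The process name "sapagent" in the comments and the threshold values are placeholders, and the actual recycle commands are left out because they are site-specific. Note that on many systems pcpu is averaged over the life of the process, so it reacts slowly.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,pcpu,comm` text into (pid, cpu_percent, command)."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
        except ValueError:
            continue  # ignore lines that do not parse cleanly
    return rows


def find_hogs(rows, name, threshold):
    """Return PIDs whose command contains name and whose CPU% >= threshold."""
    return [pid for pid, cpu, comm in rows if name in comm and cpu >= threshold]


def update_strikes(strikes, hog_pids):
    """Count consecutive samples per PID; PIDs that recovered are dropped."""
    return {pid: strikes.get(pid, 0) + 1 for pid in hog_pids}


def pids_to_recycle(strikes, limit):
    """Return PIDs that stayed over the threshold for `limit` samples in a row."""
    return sorted(pid for pid, count in strikes.items() if count >= limit)


def sample(name, threshold):
    """Take one live sample with ps (hypothetical name, e.g. 'sapagent')."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_hogs(parse_ps(out), name, threshold)
```

A polling loop would call sample every minute or so, feed the result to update_strikes, and recycle the agent only for the PIDs that pids_to_recycle returns. Requiring several consecutive hits avoids restarting the agent on a short, legitimate CPU spike.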
|
|
|