FLCAUG Discussion Forums
Subject: EM Message Processing Limitations?

SuperUser

Advanced Member
Posts:26

09/07/2004 1:34 PM  
We've run into an issue that has us concerned (and a bit confused) about the robustness and inner workings of EM. It appears that if EM receives multiple messages that match a message record, and the message record does any kind of shelling out, some messages may not be processed (with no indication of failure). To reproduce, create a message record that matches an SNMP trap, and give it a COMMAND action that does a cawto to another server.

In our case, I've found that once "x" number of matches occur (each of which fires the cawto), the matching continues, but the cawto messages never appear on the destination server. In my latest testing, I sent 300 SNMP traps from Server1 to Server2. Server2 should have done 300 cawto actions to Server3, which it LOOKS like it did (in the Console Log). Server3 only received an average of 170 cawto messages, though. I tested with both W2K and W2K3 servers as Server2, with the same results. Again, the Console Log at Server2 indicates that all 300 actions fired...
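One way to narrow down a test like this is to put a sequence number in each trap and then compare what was sent against what arrived, so you can see whether the losses start after some threshold or are scattered throughout. The Python below is only a rough sketch of that bookkeeping: `delivery_report` and the sample ID lists are hypothetical, and how you pull the IDs out of the Console Log on each server is left open, since it depends on your message text.

```python
def delivery_report(sent_ids, received_ids):
    """Compare the message IDs that were sent with the IDs that arrived."""
    sent = sorted(set(sent_ids))
    arrived = set(received_ids) & set(sent)
    missing = [i for i in sent if i not in arrived]

    # Collapse consecutive missing IDs into (first, last) ranges so that one
    # long burst of losses is easy to tell apart from scattered single drops.
    gaps = []
    for i in missing:
        if gaps and i == gaps[-1][1] + 1:
            gaps[-1] = (gaps[-1][0], i)
        else:
            gaps.append((i, i))

    pct = round(100.0 * len(arrived) / len(sent), 1) if sent else 0.0
    return {
        "sent": len(sent),
        "received": len(arrived),
        "delivered_pct": pct,
        "gaps": gaps,
    }


if __name__ == "__main__":
    # Hypothetical data: 300 numbered traps sent, only the first 170 arrived.
    report = delivery_report(range(1, 301), range(1, 171))
    print(report)
```

If the gaps come back as one long range at the end, that points to something filling up and staying full; many small gaps would suggest drops under contention instead.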

Has anyone else seen any of these "Bermuda Triangle" situations?

SuperUser

Advanced Member
Posts:26

09/07/2004 2:00 PM  
You might want to try bumping up the maximum number of threads available to Event Management. You can find this in the caugui settings.

SuperUser

Advanced Member
Posts:26

09/07/2004 2:08 PM  
We already have it at 1000, so I don't think that's an issue. We're more concerned with the fact that there was never even an indication that there was a problem.

SuperUser

Advanced Member
Posts:26

09/07/2004 2:22 PM  
Are you sure that the simulate box is not checked?
Also, are you using one CAIOPRDB, or are there any EVAL NODE fields configured? I have seen loops before that happened because there was a forward back to itself from a message action. Can you elaborate on the architecture? Is it 3-tier?

SuperUser

Advanced Member
Posts:26

09/07/2004 2:29 PM  
Simulate is not checked and no EVAL NODE is defined. As I posted in the original, we are getting all messages when not "overloaded" and do receive some even when "overloaded," so the message records/actions are defined correctly. The architecture is as stated above, too. Server1 sends a trap to Server2, and Server2's EM does a cawto to Server3.

SuperUser

Advanced Member
Posts:26

09/07/2004 2:40 PM  
Have you also bumped up the setting below? It could be that you're exceeding the default five-minute wait for a thread to become available. I'm assuming it's not the system receiving the cawtos, because I get "Console daemon not receiving messages" when that happens.

CA_OPR_MAX_WAIT Specifies the number of seconds Event Management will wait for a thread to process a message action. If a thread does not become available in that

SuperUser

Advanced Member
Posts:26

09/07/2004 2:44 PM  
Well, what are your server's hardware specs? I have seen some rough numbers, but depending on hops, NIC speed, RAM and CPU, I think on a modest system it is around 80 to 100 traps processed a second. Since everything runs in RAM for EM, it needs to have plenty of headroom. Have you used the AEC recorder/replay utility to re-run the same messages that EM is seeing? Has CATRAPD died on you at all? I know there are some fixes for that particular event. Also, do you have the CAITRPDB database enabled, and are you using the ANO trap editor? If so, turn it off and see if processing is any better. I once used an SNMP trap loader utility to find my EM server's breaking point on simple and complex traps. I believe that it does not take much to clog CATRAPD due to complex message records and actions, not to mention anything else installed on the server.

SuperUser

Advanced Member
Posts:26

09/07/2004 2:56 PM  
Our CA_OPR_MAX_WAIT is set at the default 300 seconds, but since our max threads is set at 1000 and I'm testing with 300 actions (plus the messages that do make it do so in under a minute), I don't think that's it, either. The problem does seem to be with Server2's EM, though, because we also use SAF, which should eliminate any receipt problems at Server3.
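The thread-limit reasoning above can be sanity-checked with a toy model. The Python below is not how EM works internally. It is only a sketch of a bounded pool in which each action waits up to a maximum time for a free slot, and it assumes that an action which never gets a slot is dropped (the setting description quoted earlier in the thread is cut off before it says what actually happens). `run_actions` and its parameters are hypothetical names.

```python
import threading
import time


def run_actions(num_actions, max_threads, max_wait, action_seconds):
    """Toy model: each action waits at most max_wait seconds for a free slot.

    Assumption (not confirmed for EM): an action that times out is dropped.
    """
    slots = threading.BoundedSemaphore(max_threads)
    lock = threading.Lock()
    counts = {"ran": 0, "timed_out": 0}

    def action():
        # Wait for a free slot, giving up after max_wait seconds.
        if not slots.acquire(timeout=max_wait):
            with lock:
                counts["timed_out"] += 1
            return
        try:
            time.sleep(action_seconds)  # stand-in for the shelled-out command
            with lock:
                counts["ran"] += 1
        finally:
            slots.release()

    workers = [threading.Thread(target=action) for _ in range(num_actions)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counts


if __name__ == "__main__":
    # Figures from the thread: 300 actions, 1000 threads, 300 second wait.
    print(run_actions(300, 1000, 300, 0.01))
    # A deliberately starved pool, to show what a wait timeout would look like.
    print(run_actions(20, 2, 0.05, 0.5))
```

In this model, 300 actions against 1000 slots never wait at all, which fits the point that neither the thread limit nor the wait timeout should come into play here. Losses only show up in the second, starved run.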

As far as the Server goes:
Microsoft Windows 2000 Server
Version 5.0.2195 Service Pack 3 Build 2195
ProLiant DL360
GenuineIntel ~996 MHz
Total Physical Memory 523,808 KB

We don't have AEC installed, and we don't translate traps. CATRAPD doesn't seem to be freaking out either, as subsequent traps go through just fine. It's kind of like shaking up a bottle of beer and squirting it through a keyhole. Some of the beer makes it and some doesn't, and the next beer you squirt does the same thing. NOTE: Neither I (nor my employer) recommend such a shameful waste of beer in a production environment.
Date » 20 July, 2008    Copyright 2002-2008 by RavenSystems & FL CA Users Group