Mar 21, 2011

SCOM 2007 - How To Raise Alerts Using WMI Event Rule (and get the desired variable in alert description)

I am posting this following a request from one of my customers, who asked me to create a rule that raises an alert when a service set to automatic startup is stopped. Here is how I did it:
  • Go to the Authoring pane and create a new rule.

  • Select Alert Generating Rules -> Event Based -> WMI Event (Alert). Store the new rule in a specific Management Pack (not Default one).
     

  • Give a name to your rule and select Windows Server as rule target.

  • Enter root\cimv2 for the WMI Namespace and the following query: select * from __InstanceOperationEvent within 60 where TargetInstance isa 'Win32_Service' and TargetInstance.StartMode = 'Auto' and TargetInstance.State = 'Stopped'. This query catches the WMI events raised each time a Windows service in automatic startup mode enters the stopped state. Enter 60 seconds as the Poll Interval. More info about such WMI queries here

  • Leave the default settings for the alert configuration for the moment.
 
  •  As a test, let's stop the Automatic Updates service on the RMS. We can see that an alert is raised, but nothing in this alert tells us which service has stopped.
 
  •  To see which parameters can be inserted in the alert we have to look inside the alert in the database. Open SQL Management Studio, open the OperationsManager database and open the dbo.Alert table. Find our alert using the TimeRaised column and copy the content of the Context field.
 
  •  Paste that content into an XML editor. By expanding the XML tree we can see that the caption of the stopped service is there, inside the tags EventData -> DataItem -> Collection Name="TargetInstance" -> Property Name="Caption".
 
  • To get this value inside our alert description, open the rule we created and open the alert properties. By comparing with the built-in parameters available for the alert description, I worked out that the following text returns our service's caption: $Data/EventData/DataItem/Collection[@Name="TargetInstance"]/Property[@Name="Caption"]$.

  • A much more useful alert is now raised.
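
To make the mapping between the alert parameter and the Context XML easier to see, here is a minimal Python sketch. The sample XML is hand-written and abridged: only the element and attribute names mentioned above come from the real alert, and the real Context field will carry more attributes and properties. The helper names (parameter_to_xpath, resolve) are made up for this illustration and are not part of Operations Manager.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written, abridged stand-in for the Context field of dbo.Alert.
# Assumption: only the element names described in the post are used here.
SAMPLE_CONTEXT = """
<EventData>
  <DataItem>
    <Collection Name="TargetInstance">
      <Property Name="Caption">Automatic Updates</Property>
      <Property Name="StartMode">Auto</Property>
      <Property Name="State">Stopped</Property>
    </Collection>
  </DataItem>
</EventData>
"""

PARAMETER = (
    '$Data/EventData/DataItem/Collection[@Name="TargetInstance"]'
    '/Property[@Name="Caption"]$'
)


def parameter_to_xpath(parameter: str) -> str:
    """Turn a $Data/EventData/...$ parameter into a path relative to EventData."""
    path = parameter.strip("$")
    prefix = "Data/EventData/"
    if not path.startswith(prefix):
        raise ValueError("expected a $Data/EventData/...$ parameter")
    return "./" + path[len(prefix):]


def resolve(parameter: str, context_xml: str) -> Optional[str]:
    """Return the text the parameter points at, or None if nothing matches."""
    root = ET.fromstring(context_xml.strip())
    return root.findtext(parameter_to_xpath(parameter))


print(resolve(PARAMETER, SAMPLE_CONTEXT))
```

The same approach works for any other property in the TargetInstance collection: swap Caption for the property name you see in the XML.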
 

Solution: The ACS forwarder in Operations Manager 2007 may frequently log connection and disconnection events

Here's an issue I came across recently that I thought would be worth a mention here on our blog.  The issue is one where the ACS forwarder service shows frequent connections and disconnections on Windows XP POS computers but it could happen on any ACS Forwarder if the accounts being used or the permissions are configured incorrectly.  But first a little background:
The ACS Forwarder is a separate service (AdtAgent.exe) called the Operations Manager Audit Forwarding Service.  It is deployed automatically with the Operations Manager 2007 agent but must be explicitly enabled to initiate security log collection. The Operations Manager Audit Forwarding Service listens to the local Windows Event Log service and processes security events, in near real-time, then forwards the events to a central collector. During failover and connectivity outages the local Security log acts as the Forwarding Service queue.
After you install the ACS Collector and database you can then remotely enable this ACS Forwarding Service on agents through the Operations Manager 2007 console by running the Enable Audit Collection task.  By default the ACS Forwarding service runs as the Network Service account. On Windows XP POS that account does not have read permission on the security event log, so you may see frequent connection and disconnection events logged in the Operations Manager event log on the ACS Collector server:


Log Name: Operations Manager
Source: AdtServer
Date:
Event ID: 4628
Task Category: None
Level: Information
Keywords: Classic
User: NETWORK SERVICE
Computer:
Description: An Audit Forwarder connected.
Name: <>
Address: <>
Port: 266
DbId: 2
Value: 1

Log Name: Operations Manager
Source: AdtServer
Date:
Event ID: 4629
Task Category: None
Level: Warning
Keywords: Classic
User: NETWORK SERVICE
Computer: <>
Description: An Audit Forwarder disconnected.
Name: <>
DbId: 2
Value: 1
Reason: Forwarder initiated disconnect or connection broken.

Log Name: Operations Manager
Source: AdtServer
Date:
Event ID: 4628
Task Category: None
Level: Information
Keywords: Classic
User: NETWORK SERVICE
Computer: <>
Description: An Audit Forwarder connected.
Name: <>
Address: <>
Port: 2314
DbId: 2
Value: 1

For detailed troubleshooting you can enable debugging for the ACS Forwarder and gather more information. To enable verbose logging on a local Forwarder, create a new registry value and restart the AdtAgent service as shown below.  Just remember to turn off debugging when completed.
1. Browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AdtAgent\Parameters
2. Create a DWORD value named TraceFlags and set it to a decimal value of 524420.
3. Restart AdtAgent service.
4. Review the log, C:\Windows\Temp\AdtAgent.log.  You'll probably see something similar to this:
[20100420 143003,721][Error  ]EventLogReader::Open(0x1520): 0x00000522
[20100420 143003,721][Error  ]OpenReaders(): EventReader::Open(Security) returned 0x00000522.
[20100420 143003,721][Warning]AgentRun(): Transmit() returned 0x00000522
[20100420 143003,721][Info   ]AgentRun(): Disconnecting after Transmit() returned 0x00000522.

Error number 0x00000522 (decimal 1314) resolves to "A required privilege is not held by the client".
5. Turn off debugging when done by removing the DWORD value created above and restarting the AdtAgent service.
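
If the log is long, a small script can pull out the error codes for you. The following is only a sketch in Python, assuming every line follows the [timestamp][level]message layout of the four lines quoted in step 4; the function names and the one-entry lookup table are made up for this example, and real logs may contain other line shapes that this pattern skips.

```python
import re
from collections import Counter

# The four lines quoted in step 4 above.
SAMPLE_LOG = """\
[20100420 143003,721][Error  ]EventLogReader::Open(0x1520): 0x00000522
[20100420 143003,721][Error  ]OpenReaders(): EventReader::Open(Security) returned 0x00000522.
[20100420 143003,721][Warning]AgentRun(): Transmit() returned 0x00000522
[20100420 143003,721][Info   ]AgentRun(): Disconnecting after Transmit() returned 0x00000522.
"""

# [timestamp][level, padded with spaces]message
LINE = re.compile(r"^\[(?P<stamp>[^\]]+)\]\[(?P<level>\w+)\s*\](?P<message>.*)$")
# Only full 8-digit codes, so short values such as 0x1520 are ignored.
CODE = re.compile(r"0x[0-9A-Fa-f]{8}")

# Only the code discussed in this post; extend as needed.
KNOWN = {0x522: "A required privilege is not held by the client"}


def parse_line(line):
    """Split one log line into its parts, or return None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    codes = [int(code, 16) for code in CODE.findall(match["message"])]
    return {
        "stamp": match["stamp"],
        "level": match["level"],
        "message": match["message"],
        "codes": codes,
    }


def count_codes(lines):
    """Count how often each 8-digit error code appears across the lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts.update(entry["codes"])
    return counts


for code, seen in count_codes(SAMPLE_LOG.splitlines()).items():
    print(f"0x{code:08X} ({code}) seen {seen} time(s): {KNOWN.get(code, 'unknown')}")
```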
Resolution
There are two ways to resolve this issue.
1. Grant Read permission on the security event log for the Network Service account.  For information on this see the following Knowledge Base article:
KB323076 - How to set event log security locally or by using Group Policy in Windows Server 2003
or
2. Change the “Log on As” value for the ACS Forwarding service from “Network Service” to “Local System”.
Hope this helps,

OpsMgr 2007 : How to Generate alerts based on generic CSV log file

Here's a short document I put together outlining how to generate alerts from a CSV log file. The steps are outlined below, and for convenience I have also attached a PDF with screenshots.
Step 1: Go to the Authoring tab, right-click "Rules" and select "Create a new rule"
Step 2: Expand Alert Generating Rules, then expand Event Based
Step 3: Select Generic CSV Text Log (Alert)
Step 4: Enter the rule name and description. Click Select to pick a target class.
Step 5: For testing purposes, choose "Windows Computer"
Step 6: Enter the directory path where the log resides, e.g. "c:\logs"
Step 7: In the pattern field, enter the log file name pattern, e.g. FileDDMMYYY.log,
             or file*.log to match all log files
Step 8: Specify the separator used in the CSV, e.g. , ; /
Step 9: Since each line may hold multiple values separated by a comma (or another separator),
the next step is to specify a condition. Params/Param[1] refers to the first column of the current row in the CSV file.
For this example, use: Params/Param[1] Matches regular expression test
Step 10: Set the alert priority and severity as appropriate
There are some special variables that you can use in the alert description:
===========================================================
Log file directory:  $Data/EventData/DataItem/LogFileDirectory$
Log file name:       $Data/EventData/DataItem/LogFileName$
Column data:         $Data/EventData/DataItem/Params/Param[1]$
===========================================================
In case you decide to use a monitor:
===========================================================
Log file directory:  $Data/Context/LogFileDirectory$
Log file name:       $Data/Context/LogFileName$
Column data:         $Data/Context/Params/Param[1]$
Example: if the line is test,abcd,efgh
$Data/Context/Params/Param[1]$ should contain test
$Data/Context/Params/Param[2]$ should contain abcd, and so on
===========================================================
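
To illustrate how the Param[n] numbering lines up with the columns, here is a small Python sketch. It only mimics the idea: it assumes a plain split on the separator via Python's csv module, which may not match exactly how Operations Manager parses the file (quoting, for instance). The function names are invented for this example.

```python
import csv
import re


def split_params(line, separator=","):
    """Split one log line into its columns."""
    return next(csv.reader([line], delimiter=separator))


def param(params, index):
    """Return Param[index], counting from 1 like Params/Param[n]."""
    return params[index - 1]


def rule_matches(line, pattern, index=1, separator=","):
    """Mimic a 'Params/Param[index] matches regular expression' condition."""
    return re.search(pattern, param(split_params(line, separator), index)) is not None


params = split_params("test,abcd,efgh")
print(param(params, 1))  # first column
print(param(params, 2))  # second column
print(rule_matches("test,abcd,efgh", "test"))
```

Note that the index is 1-based: Param[1] is the first column, not the second.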

How to create a computer group for VMware servers or Hyper-V servers

While tuning an Operations Manager environment that is heavily virtualized on VMware, we started receiving alerts that the “Total Percentage Interrupt Time is too high” on several of the virtualized servers. During discussions with the local virtualization SMEs, they indicated that this metric is handled differently in a virtual versus physical server configuration (see http://www.vmware.com/pdf/vmware_timekeeping.pdf and http://searchservervirtualization.techtarget.com/tip/0,289483,sid94_gci1373898,00.html for details). To properly tune this environment we decided to create two groups:
  • One for VMware servers
  • One for Hyper-V servers
The intent was to then create overrides (or disable alerts) as we identified what needs to be handled differently on the virtual systems. 


Hyper-V Server Group

This process was very simple for the Hyper-V servers, as there is an existing attribute (Virtual Machine) that indicates if the server is virtual in Hyper-V. The steps to create this group from the Operations Manager console: Navigate to the Authoring space -> Groups, right-click and create a new group. We named ours Hyper-V virtual, and for the dynamic members we specified a rule where Windows Computer / Virtual Machine equals True.

Next, we checked the members of the group (right-click, view group members) to verify the group was working.

VMware Server Group

For VMware, we needed to create an attribute that would identify systems virtualized with VMware, and then use the attribute to populate the group membership. From the Operations Manager console -> Authoring space –> Management Pack Objects –> Attributes, we right-clicked and created a new attribute. We labeled the attribute as VMware tools, provided a description, defined it to a target of Windows Computer (which set it to Windows Computer_Extended) based upon a registry key, and stored it in a custom MP (not the default MP shown in the screenshot below).

Next, we defined the registry setting as a check for whether the key HKLM\System\CurrentControlSet\Services\VMTools exists, and set it to run daily (every 86400 seconds).
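
If you want to confirm by hand what the attribute will find on a given server, the same key-exists test can be sketched in Python. This is not how Operations Manager performs the discovery; it is only an illustration. key_exists and its open_key parameter are names invented here, and the optional open_key argument exists only so the logic can be tried on a machine without a Windows registry.

```python
import sys

# Path under HKEY_LOCAL_MACHINE, as used for the attribute above.
VMTOOLS_KEY = r"System\CurrentControlSet\Services\VMTools"


def key_exists(subkey, open_key=None):
    """Return True if the HKLM subkey can be opened, False if it is missing."""
    if open_key is None:
        if sys.platform != "win32":
            raise RuntimeError("reading the registry requires Windows")
        import winreg  # standard library, Windows only

        def open_key(path):
            return winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path)

    try:
        handle = open_key(subkey)
    except FileNotFoundError:
        # OpenKey raises FileNotFoundError when the key does not exist.
        return False
    close = getattr(handle, "Close", None)
    if close is not None:
        close()
    return True


if sys.platform == "win32":
    print("VMware Tools key present:", key_exists(VMTOOLS_KEY))
```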

After adding this attribute, we checked the Monitoring / Discovered Inventory view, which started out blank but soon added records as they were discovered (as shown in this next screenshot).

To complete the process we created a new computer group (Operations Manager console –> Authoring space -> Groups, right-click and create a new group) called VMware Virtual based upon the new attribute created in the Windows Computer_Extended class.
 
The end-result was two different computer groups that now represent the two different virtualization technologies available in this environment. These can be used as part of overrides to set different thresholds for alerts such as the “Total Percentage Interrupt Time is too high” alert discussed in the beginning of this posting.
 
Note: After this was posted, Pete Zerger pointed out the Virtual Machine Discovery MP for Operations Manager 2007 (http://www.systemcentercentral.com/PackCatalog/PackCatalogDetails/tabid/145/IndexID/12572/Default.aspx), which extends existing discovery of virtual machines to include VMware guests. This is a much cleaner solution than discussed above. Pete wrote this to make it easy to override hardware MPs so they leave VMs alone.