Jan 26, 2010

SC Operations Manager 2007 SP1 - Configuration Steps: SCOM2k7 Basic Service Monitor with a Diagnostic and a Recovery for a Stopped Service



We can use SCOM2k7 to solve many of the operational challenges of keeping Windows Servers healthy.  In this scenario I create a Basic Service Monitor.  Its purpose is to have SCOM2k7 raise a 'Critical Alert' when a specific Service stops (here I use the Print Spooler Service, but it could just as well be a core Line of Business Application Service).  I then author a Diagnostic.  A Diagnostic in SCOM2k7 is programmatic logic that captures what else was occurring when the Service stopped (or, as another example, runs a custom diagnostic application).  Finally, I create a Recovery.  A Recovery in SCOM2k7 is programmatic logic that, in this example, restarts the Service; it could do anything else that is programmatically necessary.
Here are the general steps:


  1. Focus the Monitor Console on the 'Windows Computer' object.
  2. Generate a Basic Service Monitor against the Print Spooler Service as a Critical Alert when the Service Stops for all Windows Computers.
  3. Generate a Diagnostic that automatically lists all Applications and Process IDs during the time the Service Stops.
  4. Generate a Recovery that automatically starts the Print Spooler Service and clears the Critical Alert.
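
The four steps above amount to a small piece of control flow.  The Python below is only a toy model of that flow, not SCOM2k7 code: the class name, the three callables and the state strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceMonitorSketch:
    """Toy model of the monitor -> diagnostic -> recovery flow (not SCOM code)."""

    service_name: str
    is_running: Callable[[], bool]     # hypothetical probe for the service state
    run_diagnostic: Callable[[], str]  # hypothetical: returns diagnostic output
    run_recovery: Callable[[], None]   # hypothetical: tries to start the service
    open_alerts: List[str] = field(default_factory=list)
    diagnostic_output: List[str] = field(default_factory=list)

    def evaluate(self) -> str:
        """Check the service once and return the resulting health state."""
        if self.is_running():
            self.open_alerts.clear()
            return "Healthy"
        # Service is not running: raise the alert, then diagnose and recover.
        self.open_alerts.append(f"{self.service_name} is not running")
        self.diagnostic_output.append(self.run_diagnostic())
        self.run_recovery()
        # Recalculate the state once the recovery has finished.
        if self.is_running():
            self.open_alerts.clear()  # alert resolves on return to healthy
            return "Healthy"
        return "Critical"


# Simulated service that starts out stopped and is started by the recovery.
state = {"running": False}
monitor = ServiceMonitorSketch(
    service_name="Spooler",
    is_running=lambda: state["running"],
    run_diagnostic=lambda: "process list captured",
    run_recovery=lambda: state.update(running=True),
)
print(monitor.evaluate())
```

If the recovery fails to start the service, the sketch leaves the alert open and reports a critical state, which mirrors why the Recovery later in this walkthrough is set to recalculate the Monitor state after it finishes.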
 
 1.  Here is the Operations Manager 2007 Main Console focused on the 'Computers' view.  This view shows the general health of the computers monitored by SCOM2k7.


2.  Right-clicking a single computer object lets us open the 'Health Explorer' view.  This view provides detailed insight into the numerous 'Monitors' within SCOM2k7.


 
3.  The Health Explorer provides a good example of the Core Services monitored by the Windows Server Management Pack.  We will add our Basic Service Monitor so that it appears under the 'Availability' section of the Health Rollup Monitor.



4.  Here I move into the 'Authoring' space from the Main Console.



5.  In the Authoring Space I then 're-Scope' to focus attention on the 'Windows Computer' object.  This provides the ability for all Windows Computers to receive this Monitor if chosen.  If only a select 'Group' of Windows Computers should receive this Monitor we would then generate a unique Group and an Override to focus receipt of the Monitor.


 
6.  I move to the 'Availability' Health Rollup, then focus on creating a new 'Unit Monitor'.  In this case our Unit Monitor will be a Basic Service Monitor.



7.  On the 'Create a Unit Monitor' Wizard I select 'Windows Services', then 'Basic Service Monitor'.  Notice also I have shifted the focus to a custom Management Pack (instead of the Default Management Pack).



8.  Confirmation of the unique changes for this Basic Service Monitor.


 
9.  I prefer to 'disable' a Monitor upon initial creation.  Then, upon completion, I review the Monitor to confirm its settings are correct.



10.  I title this Basic Service Monitor 'ITPS Lab - Print Spooler Service Monitor' and disable the Monitor initially.



11.  Again, Basic Service Monitors could be focused at any Service running on a Windows Server.  Here the example includes the Print Spooler Service (titled by its short name as 'Spooler').

 

12.  The default values for a Basic Service Monitor for the Health State are 'Healthy (Green Check Box)' or 'Service Not Running (Red X Box)'.



13.  Next in the Wizard I configure the Alert.  This entails defining whether or not an Alert should be raised, and the associated verbiage for the Alert Description.  Notice the rich number of variables available from the SCOM2k7 attributes for the detail.  I choose the Server DNS Name (FQDN).



14.  Upon completion of the Monitor I move back into the Monitoring space to review the Basic Service Monitor properties.  Notice the positioning of this Monitor under the 'Availability' rollup for all Windows Computers.



15.  I 'walk through' the properties of this Basic Service Monitor to double check the settings prior to enabling.  I prefer having the initial Monitor disabled because it also gives me the chance to create the appropriate Overrides to focus the Monitor at the proper Server Group(s).


 
16.  Focused at the Print Spooler Service.



17.  In a 'Critical State' if the Print Spooler Service is not running.



18.  A detailed Alert Description provides useful insight when attempting to remedy an Alert.



19.  This Alert is set to 'Automatically resolve the alert when the Monitor returns to a healthy state'.  We can use SCOM2k7 Reporting to view the frequency of this Alert, because a resolved Alert no longer shows in the Alert Pane.



20.  The 'Diagnostic and Recovery' tab is currently empty.  I will return to this Tab to configure both a Diagnostic and Recovery.



21.  I could input relevant Product Knowledge here.  This is useful for Help Desk Staff and other System Engineers assigned to assist with a remedy.



22.  No Overrides configured so far.  We could focus this Basic Service Monitor at a Server or a Group of Servers using this Override Screen.



23.  Now I move back to the 'Diagnostic and Recovery' tab to input a unique Diagnostic.  Remember, a Diagnostic runs when an Alert is triggered.  In this example, when the Print Spooler Service stops, this Diagnostic executes, as does the Recovery I will configure.



24.  I have chosen to 'Run a Command' for this Diagnostic.


 
25.   This Diagnostic will run automatically using a basic call to the 'tasklist' executable.  The output will be a list of Tasks running at the time the Alert is triggered.



26.  The 'tasklist' executable with some input variables to spice up the output formatting.
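
The exact 'tasklist' parameters I used are not reproduced here.  As a hedged illustration of working with this kind of output, the Python sketch below assumes the Diagnostic was run as 'tasklist /FO CSV' on an English-language Windows Server (the column headers are localized) and pulls out the application names and Process IDs.  The function name and the sample values are made up for the example.

```python
import csv
import io
from typing import List, Tuple


def parse_tasklist_csv(output: str) -> List[Tuple[str, int]]:
    """Return (image name, PID) pairs from `tasklist /FO CSV` output."""
    reader = csv.DictReader(io.StringIO(output))
    return [(row["Image Name"], int(row["PID"])) for row in reader]


# Sample text shaped like `tasklist /FO CSV` output (values are made up).
SAMPLE = (
    '"Image Name","PID","Session Name","Session#","Mem Usage"\n'
    '"System Idle Process","0","Services","0","24 K"\n'
    '"spoolsv.exe","1432","Services","0","9,812 K"\n'
)

for name, pid in parse_tasklist_csv(SAMPLE):
    print(f"{pid:>6}  {name}")
```

CSV is only one of the formats 'tasklist' can emit; it is used here because the quoted fields survive values such as '9,812 K' that contain a comma.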



27.  Now I move to configure the 'Recovery'.  This is actually what makes the Service restart.


 
28.  Again, like the Diagnostic, I choose to execute a basic command.  Any form of Script would be appropriate here as well.



29.  The selection for this Recovery is to 'Run Recovery Automatically' and 'Recalculate Monitor State after Recovery Finishes'.  I am inputting (but do not show the screen) a command of '%windir%\system32\net.exe start spooler' to provide this Recovery.
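
For anyone who prefers a Script over a plain command for the Recovery, here is a minimal Python sketch of how the same command line could be assembled.  The function name and the explicit 'windir' parameter are assumptions for illustration; in the Recovery itself, '%windir%' is expanded on the Server.

```python
import ntpath
from typing import List


def build_recovery_command(windir: str, service: str = "spooler") -> List[str]:
    """Build the argument list for `<windir>\\system32\\net.exe start <service>`."""
    # ntpath joins with Windows separators regardless of the host platform.
    net_exe = ntpath.join(windir, "system32", "net.exe")
    return [net_exe, "start", service]


command = build_recovery_command(r"C:\Windows")
print(" ".join(command))
# On a Windows host, subprocess.run(command, check=True) would execute it.
```

Passing the command as a list rather than one string avoids quoting problems if the same helper is reused for a Service whose short name needs escaping.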



30.  Now I validate that both the Diagnostic and the Recovery are in place.  Since they are, I am ready to test.



31.  I 'enable' the Basic Service Monitor in preparation for testing.



32.  I validate the 'Alert View' is clear of all Alerts.



33.  I then move to the Services MMC and Stop the Print Spooler Service.



35.  In under a minute my Basic Service Monitor Alert is triggered.  In the 'background' my Diagnostic triggers automatically, as does my Recovery, which restarts the Print Spooler Service.



36.  Notice the Alert Description includes the full DNS Server Name for this Server as configured in the Alert Description properties.



37.  In no time the Print Spooler Service is restarted and the Basic Service Monitor Alert is cleared (as configured).  I could view the number, frequency and Server detail in the Operations Manager Reports for Alerts or Events.



38.  Here I move to the Computers View and select the Server used for testing.  When I open Health Explorer I will be able to see the 'history' of the Monitor and associated Health Status.



39.  Notice several important details on the Health Explorer screen.  First, we can see the Monitor by name under the Availability Rollup for this Server (the Monitor is titled 'ITPS Lab - Print Spooler Service Monitor').  Second, we can view the State Change Events for this Monitor and examine the time/date when the Monitor was last triggered in the upper right window pane.  Finally, we can see the Diagnostic (titled 'ITPS Lab - Print Spooler Service - Lists Applications Running When Print Spooler Stopped') and its output, which helps with persistent problems and provides forensic detail on why the Service stopped.
