SCCM SCOM WSUS: 2011-10-16

Randomly, you might see a single MonitoringHost.exe process on an agent consuming 100% CPU (or 50%, or 25%, depending on how many cores you have). This process will stay at this level and will not recover on its own. If you restart the OpsMgr HealthService, the problem goes away, and might not return for days or even weeks.

This particular symptom might be due to an XML spinlock issue. This is a core Windows OS issue, and there is a hotfix available, which I have on my HOTFIX LINK

The KB is 968967:
“The CPU usage of an application or a service that uses MSXML 6.0 to handle XML requests reaches 100% in Windows Server 2008, Windows Vista, Windows XP Service Pack 3, or other systems that have MSXML 6.0 installed”
I have seen that most customers are affected by this issue from time to time. I see it very commonly in my lab, on Server 2008 domain controllers and on my Server 2008 Hyper-V hosts.

A note on patching Server 2008:

When you go to download this hotfix for a Server 2008 machine, it is not obvious which hotfix to get. Here is the list of all available fixes:

For patching Server 2008 – you need to download the “Windows Vista” hotfix – in either x86 or x64, depending on your OS version:

Monitoring for this condition:
You can easily write a threshold monitor targeting Agent or HealthService to track the Process \ % Processor Time counter for the MonitoringHost process, and set it to alert when multiple consecutive samples are above a defined threshold.

Here is an example of creating this monitor:
Authoring Pane > Monitors > New Unit Monitor > Windows Performance Counters > Static Thresholds > Single Threshold > Consecutive Samples over Threshold.

Give it a custom name that follows your documented custom Monitor naming standard, target “Health Service”, and put this under Performance rollup.

Hit the “Select” button (in SP1, select “Browse”). In the perf counter picker, choose a server with an installed agent, choose the Object “Process”, the Counter “% Processor Time”, and the Instance “MonitoringHost”, and click OK.

Since there are multiple MonitoringHost processes, we will add a wildcard to the Instance name in the monitor. This will monitor ANY MonitoringHost process for high CPU. Set the Interval to every 1 minute.

For the number of consecutive samples and the threshold, that is up to you. For me, I will say that if I detect a single MonitoringHost process using more than 50% CPU across 5 consecutive samples (5 minutes), then I consider that bad.
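To make the "consecutive samples over threshold" idea concrete, here is a minimal Python sketch of the logic. It is not how OpsMgr implements the monitor internally; the class and function names are hypothetical, and the strict "greater than" comparison is an assumption on my part.

```python
from typing import Iterable, Optional


class ConsecutiveThresholdMonitor:
    """Hypothetical model of a 'consecutive samples over threshold' monitor."""

    def __init__(self, threshold: float = 50.0, required_samples: int = 5) -> None:
        self.threshold = threshold
        self.required_samples = required_samples
        self._streak = 0  # number of over-threshold samples seen in a row

    def observe(self, cpu_percent: float) -> bool:
        """Feed one sample; return True while the unhealthy condition is met."""
        if cpu_percent > self.threshold:  # assumption: strictly over the threshold
            self._streak += 1
        else:
            self._streak = 0  # a single sample at or below the threshold resets the streak
        return self._streak >= self.required_samples


def first_alert_index(samples: Iterable[float],
                      threshold: float = 50.0,
                      required_samples: int = 5) -> Optional[int]:
    """Return the index of the sample that would first trigger the alert, or None."""
    monitor = ConsecutiveThresholdMonitor(threshold, required_samples)
    for index, value in enumerate(samples):
        if monitor.observe(value):
            return index
    return None


if __name__ == "__main__":
    # One-minute samples: a short spike that recovers, then a stuck process.
    readings = [99, 99, 99, 99, 10, 99, 99, 99, 99, 99]
    print(first_alert_index(readings))
```

The point of requiring a streak is that a brief, legitimate burst of CPU does not alert; only a process that stays pegged for the whole window does.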

At this point, you can simply alert on the condition, or even try adding a recovery script that will bounce the health service. Bouncing the HealthService when one of the processes is using all the CPU is not always 100% reliable, especially from a “NET STOP & NET START” type command. I have found it more reliable to just kill the MonitoringHost process in this condition and allow it to respawn, but your mileage may vary.
http://blogs.technet.com/kevinholman/archive/2008/03/26/using-a-recovery-in-opsmgr-basic.aspx
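As a rough illustration of the "kill the stuck process" approach, here is a small Python sketch. It only selects the offending process and builds the command; it does not run anything. The `ProcessSample` type, the function names, and the way the samples are collected are all hypothetical, and the 50% default simply mirrors the threshold used above. `taskkill /PID <pid> /F` is the standard Windows command for force-terminating a process by ID.

```python
from typing import Iterable, List, NamedTuple


class ProcessSample(NamedTuple):
    """Hypothetical snapshot of one process, however you choose to collect it."""
    pid: int
    name: str
    cpu_percent: float


def find_runaway_pids(samples: Iterable[ProcessSample],
                      threshold: float = 50.0) -> List[int]:
    """Return PIDs of MonitoringHost.exe processes whose CPU is over the threshold."""
    return [
        s.pid
        for s in samples
        if s.name.lower() == "monitoringhost.exe" and s.cpu_percent > threshold
    ]


def build_kill_command(pid: int) -> List[str]:
    """Build (but do not run) a force-kill command for a single process ID."""
    return ["taskkill", "/PID", str(pid), "/F"]


if __name__ == "__main__":
    snapshot = [
        ProcessSample(100, "MonitoringHost.exe", 3.0),
        ProcessSample(200, "MonitoringHost.exe", 99.0),
        ProcessSample(300, "sqlservr.exe", 99.0),
    ]
    for pid in find_runaway_pids(snapshot):
        # Pass this list to subprocess.run(...) on the agent if you decide to act on it.
        print(build_kill_command(pid))
```

Killing by PID rather than by image name matters here: there are usually several MonitoringHost.exe processes, and only the stuck one should be terminated.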

In general - you should evaluate all hotfixes available, and only apply those applicable to your environment. However, some of the ones below I have seen impact almost every environment, and they should be heavily considered.
This list is nothing official. It is just a general list of the recommended hotfixes I end up proactively applying to most environments. It is not a complete list of ALL hotfixes, and you may be affected by other issues.

Before we get to the lists – some general guidance on hotfixes to make you more successful:

ALWAYS - on Server 2008 OS, run the hotfix MSI from an elevated command prompt window. This will launch the install of the hotfix, and then launch the boot-strapper window in an elevated process – which is required. Do this regardless of the UAC configuration of the 2008 OS.
ALWAYS - make sure you read the instructions to understand if the hotfix is a SQL update, installed to the RMS, MS, and/or Gateway, AND/OR applies to agents as well.
ALWAYS - make sure you double-check the DLL version of the updated files to make sure the hotfix successfully applied after installing.
ALWAYS - make sure you double-check the \AgentManagement directory of the management servers and gateways, to make sure if there is an agent update, the x86 and x64 MSP was copied over correctly.
ALWAYS - when installing a hotfix/cumulative update on an OpsMgr server role, run the downloaded MSI, such as “SystemCenterOperationsManager2007-SP1-KB954049-X86-X64-ENU.MSI”, install the “System Center 2007 Hotfix Utility” to the DEFAULT location, and then kick off the update FROM THE UI that comes up by clicking “Run Software Update”. This is critical: not following this process is the cause of many failures to apply the hotfix DLLs, or failures to copy the agent MSP update files to the \AgentManagement directory. NEVER run the MSP files manually on a SCOM server role, because the additional steps run by the boot-strapper will not execute if you do that. The only exception to this is running from the command line. See: http://blogs.technet.com/b/kevinholman/archive/2010/10/12/command-line-and-software-distribution-patching-scenarios-for-applying-an-opsmgr-cumulative-update.aspx
ALWAYS - check the language version of the hotfix, and make sure it is the same language version as your SCOM base install. For instance, if you have an English base SCOM install, do not download a localized German version of a hotfix and apply it, or it can break the English SCOM base install.
ALWAYS log on to your OpsMgr role servers using a domain user account that meets the following requirements:

SCOM administrator role
Member of the Local Administrators group on all SCOM role servers (RMS, MS, GW, Reporting)
SA privileges on the SQL server instances hosting the Operations DB and the Warehouse DB.

These rights (especially the user account having SA priv on the DB instances) are often overlooked. These are the same rights required to install SCOM, and must be granted to apply major hotfixes and upgrades (like RTM>SP1, SP1>R2, etc.). Most of the time the issue I run into is that the SCOM admin logs on with his account, which holds the SCOM Administrator role on the SCOM servers, but his DBAs do not allow him to have SA priv over the DB instances. This must be granted temporarily to his user account while performing the updates, and can then be removed, just like for the initial installation of SCOM as documented HERE. At NO time do your service accounts for MSAA or SDK need SA priv to the DB instances, unless you decide to log in as those accounts to perform an update (which I do not recommend).
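The "double-check the DLL version" step in the list above boils down to comparing dotted version strings numerically rather than as text. Here is a minimal Python sketch of that comparison. The function names are hypothetical, and the build numbers are just the ones mentioned in this post, used as sample input; the expected file versions for any given hotfix come from its KB article.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '6.1.7221.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least(installed: str, required: str) -> bool:
    """True if the installed version is the same as or newer than the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # Sample values only: R2 RTM base build versus the R2 MP update build.
    print(is_at_least("6.1.7221.0", "6.1.7695.0"))  # file was not updated
    print(is_at_least("6.1.7695.0", "6.1.7695.0"))  # file is at the expected level
```

Comparing integer tuples avoids the classic trap of plain string comparison, where "6.1.10000.0" sorts before "6.1.7221.0".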

Common OpsMgr 2007 Post-R2 hotfixes:
This list ABSOLUTELY assumes you are at OpsMgr R2-RTM level as a base (6.1.7221.0).

Hotfix	Update Files	Resolves	Applies to:	Comments
MP Update	Microsoft.SystemCenter.2007.mp 6.1.7695.0 Microsoft.SystemCenter.OperationsManager.2007.mp 6.1.7695.0 Microsoft.SystemCenter.OperationsManager.AM.DR.2007.mp 6.1.7695.0 Microsoft.SystemCenter.OperationsManager.Reports.2007.mp 6.1.7695.0 ODR.mp 6.1.7695.0	New reports, knowledge, monitors, rules. See MP Guide.	MP import only	I recommend this update for ALL OpsMgr R2 environments.
2495674 R2 CU5	OpsMgr 2007 R2 CU5 Cumulative Update http://www.microsoft.com/download/en/details.aspx?id=26938 Multiple. See KB Article. Note this is a DLL update, MP updates, and SQL scripts update.	Many updates. See KB article for all Cumulative updates at LINK	RMS MS GW Agents AuditCollector Console WebConsole MP Import TSQL Script	This hotfix includes a SQL script, which you execute on the database in a query window.
971233	none	The console shows customized subscriptions SMTP{`GUID`} after you upgrade to OpsMgr R2 from OpsMgr SP1	Operations Database (TSQL only)	I recommend this hotfix only if you are impacted with this issue.

Common OpsMgr 2007 Post-SP1 hotfixes:
This list ABSOLUTELY assumes you are at OpsMgr SP1 level as a base (6.0.6278.0). DO NOT APPLY these to OpsMgr R2.

Hotfix	Update Files	Resolves	Applies to:	Comments
MP Update	Microsoft.SystemCenter.2007.mp 6.0.6709.0 Microsoft.SystemCenter.OperationsManager.2007.mp 6.0.6709.0 Microsoft.SystemCenter.OperationsManager.AM.DR.2007.mp 6.0.6709.0	Agent restarts, many other critical enhancements	Management Pack Import only (Import via console once extracted)	I recommend this update for ALL OpsMgr SP1 environments.
2028594	SP1 Cumulative Update 1. Multiple files. See KB article	Many. See KB article	RMS MS GW Agents Consoles MP Import SQL Scripts (OpsDB and DW)	I recommend this update for ALL OpsMgr SP1 environments. ***Note: This update REQUIRES 971541 as a prerequisite
971541	SP1 Rollup hotfix. Multiple files. See KB article	Many. See KB article	RMS MS GW Reporting Agents Console MP Import	I recommend this update for ALL OpsMgr SP1 environments.
972881	Managedentitychange.sp.sql	The changes to the display name of a managed entity are not synchronized in the Operations Manager Data Warehouse database	Data Warehouse Database (T-SQL only)	I recommend this update for ALL OpsMgr SP1 environments. This hotfix includes a SQL script, which you execute on the database in a query window.
954643	Managementpackinstall.sp.sql	Event ID 31569 is logged after you install a management pack that includes reports on a System Center Operations Manager 2007 SP1 server	Data Warehouse Database (T-SQL only)	I recommend this hotfix only if you are impacted with these events. This hotfix includes a SQL script, which you execute on the database in a query window.
974254	Autotablecreation.sql Viewcreatesprocs.sql	1. Unable to create large number of groups. 2. Import fails when importing an MP or when creating a MP from a template	Operations Database (TSQL only)	I recommend this hotfix only if you are impacted with this issue. This hotfix includes a SQL script, which you execute on the database in a query window.
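Since 2028594 requires 971541 as a prerequisite, the order in which you apply the updates in the table above matters. Here is a small Python sketch of working out a safe install order from a prerequisite map. The only dependency taken from this post is 2028594 requiring 971541; the other entries are listed as independent purely for illustration, and the function name is hypothetical. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hotfix KB -> set of KBs that must be installed first.
# Only the 2028594 -> 971541 dependency comes from the table above;
# the rest are treated as independent for the sake of the example.
PREREQUISITES: Dict[str, Set[str]] = {
    "2028594": {"971541"},
    "971541": set(),
    "972881": set(),
    "954643": set(),
}


def install_order(prereqs: Dict[str, Set[str]]) -> List[str]:
    """Return the KBs in an order where every prerequisite comes first."""
    return list(TopologicalSorter(prereqs).static_order())


if __name__ == "__main__":
    for kb in install_order(PREREQUISITES):
        print(kb)
```

The relative order of independent updates is not significant; the only guarantee is that 971541 comes out ahead of 2028594.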

Common related Windows Operating System Hotfixes:
This list is not sorted by OS or anything special – just a collection of OS related hotfixes that SCOM might require, or might fix an issue with the OS that impacts OpsMgr. These can apply to SP1 or R2 environments.

Hotfix	Resolves	Applies to:	Comments
2470949	The RegQueryValueEx function returns a very large incorrect value for the "Avg. Disk sec/Transfer" performance counter in Windows Server 2008 R2 or in Windows 7	Any OpsMgr Agent Managed or Server role running on Windows 2008 R2 or Windows 2008 R2 SP1, or Win7	I recommend this hotfix to be applied to any Server 2008R2 or Win7 machine, if it is agent managed or holds a SCOM server role.
2495300	Invalid "Avg. Disk sec/Transfer" value returned by the RegQueryValueEx function in Windows Server 2008 or in Windows Vista	Any OpsMgr Agent Managed or Server role running on Windows 2008 or Vista	I recommend this hotfix to be applied to any Server 2008 or Vista machine, if it is agent managed or holds a SCOM server role.
981314	The "Win32_Service" WMI class leaks memory in Windows Server 2008 R2 and in Windows 7	RMS MS GW Agent (only if running on Windows 2008 R2 or Win7)	I recommend this hotfix to be applied to any Server 2008R2 or Win7 machine, if it is agent managed or holds a SCOM server role. This hotfix is already included in Server 2008 R2 Service Pack 1
981263	Management servers or assigned agents unexpectedly appear as unavailable in the Operations Manager console in Windows Server 2003 or Windows Server 2008 (ESE jet database corruption)	RMS MS GW Agent	I recommend this hotfix for all RMS, MS, and GW roles running Windows Server 2003 SP2, or Windows Server 2008 SP2. Apply to agent machines if you feel you are impacted by this issue.
933061	WMI Stability in Server 2003	Agent (2003 OS only)	I recommend this hotfix for all agent managed computers running Windows Server 2003, SP1 or SP2, x86 or x64
955360	Cscript 5.7 update for Server 2003	Agent (2003 OS only)	I recommend this hotfix for all agent managed computers running Windows Server 2003, SP1 or SP2, x86 or x64
968760	High handle count on the RMS: a managed application has a high number of thread handles and of event handles in the Microsoft .NET Framework 2.0	RMS	I recommend this hotfix if you are experiencing high handle count on the RMS. This hotfix requires SP2 for the OS and .NET 2.0 SP2.
968967	The CPU usage of an application or a service that uses MSXML 6.0 to handle XML requests reaches 100% in Windows Server 2008, Windows Vista, Windows XP Service Pack 3, or other systems that have MSXML 6.0 installed (Spinlock)	RMS MS GW Agent	I recommend this hotfix if you are impacted with this issue, which is very common. You might find a MonitoringHost.exe process randomly *stuck* at 100% CPU. If so – this hotfix might be applicable.
951327	The System Center Operations Manager 2007 console may crash in Windows Server 2008 or in Windows Vista when you open the Health Explorer window	Any Vista or Server 2008 computer with a SCOM console installed	I recommend this hotfix only if you run the console on Server 2008 or Vista. This hotfix is already included in Server 2008 SP2.
952664	The Event Log service may stop responding because of a deadlock on a Windows Server 2008-based or Windows Vista-based computer	RMS MS GW Agent	I recommend this hotfix only if you host an OpsMgr server or agent role on Vista or Server 2008. *This hotfix is already included in Server 2008 SP2.*
953290	An application may crash when it uses legacy methods to query performance counter values in Windows Vista or in Windows Server 2008	RMS MS GW Agent	I recommend this hotfix only if you host an OpsMgr server or agent role on Vista or Server 2008. *This hotfix is already included in Server 2008 SP2.*
958661	FIX: Small memory leaks may occur when you use RSCA to query runtime statistics in IIS 7.0	Any OpsMgr Agent/Server role with IIS 7.0 installed	I recommend this hotfix in all cases where you are monitoring servers with IIS 7.0 installed, and use the IIS Management pack. *This hotfix is already included in Server 2008 SP2.*
958807	Windows Server 2008 Failover Clustering WMI provider does not correctly handle invalid characters in the private property names causing WMI queries to fail	Any Server 2008 agent managed cluster node	I recommend this hotfix only if you are impacted with this issue, and use the current Cluster MP. *This hotfix is already included in Server 2008 SP2.*
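Several rows in the table above note that the fix is already included in a later service pack. Here is a minimal Python sketch of filtering a candidate list against those notes. The mapping is transcribed from the comments column above; the function name and the OS family labels are hypothetical. It does not check whether a hotfix applies to the OS in the first place, only whether a service pack has already superseded it.

```python
from typing import Dict, Iterable, List, Tuple

# Hotfix KB -> (OS family, service pack level that already contains the fix),
# transcribed from the "already included in ..." notes in the table above.
INCLUDED_IN_SP: Dict[str, Tuple[str, int]] = {
    "981314": ("Server 2008 R2", 1),
    "951327": ("Server 2008", 2),
    "952664": ("Server 2008", 2),
    "953290": ("Server 2008", 2),
    "958661": ("Server 2008", 2),
    "958807": ("Server 2008", 2),
}


def still_needed(candidates: Iterable[str], os_family: str, sp_level: int) -> List[str]:
    """Drop hotfixes that the machine's service pack level already includes."""
    needed = []
    for kb in candidates:
        entry = INCLUDED_IN_SP.get(kb)
        superseded = (
            entry is not None
            and entry[0] == os_family
            and sp_level >= entry[1]
        )
        if not superseded:
            needed.append(kb)
    return needed


if __name__ == "__main__":
    # A Server 2008 SP2 machine: 952664 is already covered, 968967 is not.
    print(still_needed(["952664", "968967"], "Server 2008", 2))
```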

Make sure you see these additional posts on the subject of hotfixes:
http://blogs.technet.com/kevinholman/archive/2008/06/25/a-little-tidbit-on-hot-fixes-for-opsmgr.aspx
http://blogs.technet.com/kevinholman/archive/2008/06/24/how-do-i-know-which-hotfixes-have-been-applied-to-which-agents.aspx
http://blogs.technet.com/kevinholman/archive/2008/06/27/a-report-to-show-all-agents-missing-a-specific-hotfix.aspx
http://blogs.technet.com/kevinholman/archive/2009/02/25/applying-an-opsmgr-hotfix-to-a-rms-cluster-node-some-things-to-be-aware-of.aspx

Oct 17, 2011

Do you randomly see a MonitoringHost.exe process consuming lots of CPU?

Which hotfixes should I apply?
