Jul 1, 2010

SCCM Create a collection of computers that need a restart after patches

Sometimes after patching, a machine should restart during its maintenance window.  There are always cases where the client is told to restart but never does.
This query will create a collection of such computers.  From here I normally set an advertisement to run: shutdown -r -f -c "Patch restart" -t 300
This way the computer restarts during the normal maintenance window (MW).

select SMS_R_SYSTEM.ResourceID, SMS_R_SYSTEM.ResourceType, SMS_R_SYSTEM.Name,
  SMS_R_SYSTEM.SMSUniqueIdentifier, SMS_R_SYSTEM.ResourceDomainORWorkgroup,
  SMS_R_SYSTEM.Client
from SMS_R_System
inner join SMS_G_System_PatchStatusEx
  on SMS_G_System_PatchStatusEx.ResourceID = SMS_R_System.ResourceId
where SMS_G_System_PatchStatusEx.LastStateName = "reboot pending"

Jun 29, 2010

Video TechDays 2010: WMI for the SCCM Admin

Since its debut, System Center Configuration Manager and its predecessors have relied heavily on the Windows Management Instrumentation (WMI) architecture. WMI is omnipresent in System Center Configuration Manager: from queries and dynamic collections, through hardware inventory, to storing client and Management Point settings and policies, you will find WMI under the hood just about anywhere. Given this omnipresence, it should come as no surprise that the stability of WMI on your site systems and clients is crucial to a stable System Center Configuration Manager implementation. Knowing WMI is consequently a great asset to any System Center Configuration Manager administrator. In this session you will learn the ins and outs of the WMI architecture in general and how it applies to System Center Configuration Manager. You’ll learn about the available namespaces and classes and the extended WMI Query Language (WQL) that is specific to System Center Configuration Manager. This session will cover the tools available to take a peek at WMI yourself, as well as the WMI-related tool called Policy Spy that comes with the System Center Configuration Manager Toolkit. By the end of this session you’ll know what the WMI architecture looks like, how System Center Configuration Manager uses it, and how you can use that knowledge to your advantage, whether to troubleshoot System Center Configuration Manager issues, to better understand the product, or to automate tasks through scripting or programming. In the end this session will make you a better System Center Configuration Manager administrator.

OpsMgr - Cross Platform Discovery Errors

The key to being able to monitor a server is being able to discover that server :).  Until you can get the server into Operations Manager, you aren't going to be able to do much with it.  While the discovery process for Unix and Linux servers seems simple enough, there is a lot going on behind the scenes that is hidden by the wizard.  In a previous entry I went over a successful discovery path (OpsMg and Cross Plat-Getting Started); for this post I'm going to go over some of the errors that can occur and how to resolve them.
The first one I'll talk about is Not Enough Entropy, which required a little digging to figure out.  The exact error is: Failed to allocate resource of type random data: Failed to get random data - not enough entropy.


Entropy
I've had this issue when discovering both RHEL and SLES servers, and it is related to certificate generation.
There are two ways to solve this problem: you can recreate the /dev/random file or do a manual agent install.
For both fixes, first clean off the partially installed agent using the commands:
  1. rpm -e scx
  2. rm -rf /etc/opt/microsoft/scx
Then, if you want discovery to work from the wizard, recreate /dev/random as the non-blocking urandom device (major 1, minor 9) using the commands:
  1. rm /dev/random
  2. mknod -m 644 /dev/random c 1 9
  3. chown root:root /dev/random
A manual install requires copying the appropriate package from %ProgramFiles%\System Center Operations Manager 2007\AgentManagement\UnixAgents to the Unix/Linux machine and installing it directly.
After fixing the install issue, switch /dev/random back to the standard blocking random device (major 1, minor 8) using the commands:
  1. rm /dev/random
  2. mknod -m 644 /dev/random c 1 8
  3. chown root:root /dev/random
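Before and after swapping the device node it can help to confirm which device /dev/random currently points at. The following is only a minimal sketch, assuming a Linux host with GNU stat. The function names classify_random_dev and dev_numbers are my own and are not part of the agent or of Operations Manager.

```shell
# Map a "major minor" pair (as passed to mknod) to the device it selects.
classify_random_dev() {
    case "$1" in
        "1 8") echo "random" ;;    # blocking device
        "1 9") echo "urandom" ;;   # non-blocking device
        *)     echo "unknown" ;;
    esac
}

# Print the major and minor numbers of a device node in decimal.
# Assumption: GNU stat, whose %t and %T report the numbers in hex.
dev_numbers() {
    hex=$(stat -c '%t %T' "$1") || return 1
    printf '%d %d\n' "0x${hex% *}" "0x${hex#* }"
}
```

Running classify_random_dev "$(dev_numbers /dev/random)" should report urandom while the workaround is in place and random once the node has been switched back.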
Next, let's look at Unspecified Problem.  I am sure there is a whole gamut of reasons why this one occurs.  The text is: Starting Microsoft SCX CIM Server:  Unspecified Problem.
Unspecified 
The key here is that the statement "Generating certificate with hostname..." shows the certificate was generated, so we know we need to look at what happens after certificate creation.  The only cause I have found for this error is the firewall: after installation and certificate generation there is a validation step.  If you watch the steps in the wizard, the error pops up almost immediately, so the wizard is unable to verify the agent, which suggests a communication issue.  Ensure that port 1270 has been opened on the firewall and try to discover again.
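A quick way to check whether port 1270 is reachable is a plain TCP connect from another Unix/Linux box on the same side of the firewall as the management server. This is a rough sketch, assuming a bash built with /dev/tcp support and the coreutils timeout command. The function name port_open is mine, and the five-second limit is an arbitrary choice.

```shell
# Return 0 if a TCP connection to host ($1) on port ($2) succeeds within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device; host and port are passed as
# positional arguments to the inner shell so they are not re-parsed.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, port_open myserver 1270 && echo reachable, where myserver is a placeholder for the host being discovered. A failure only tells you the connection did not succeed; it cannot distinguish a blocked port from an agent that is not listening.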
Some of the other errors I've run into over time are:
Access is Denied: this one pops up from time to time when an agent installation failed for some reason, you fixed the underlying cause, and you tried again.  The problem is that the partially installed agent is blocking the re-install.  The fix is to clean off the agent and do a fresh install, the same way we did for Not Enough Entropy.
Cannot connect to port 1270: this one typically occurs when there is a library path issue on the monitored server.  If you go to the server, you'll likely see that the service failed to start.  Trying to restart the service will give you the name of the library that cannot be found.
The typical resolution path for Linux is:
  1. scxadmin -restart all
  2. See what library is missing 
  3. find / -name   
  4. vi /etc/ld.so.conf 
  5. add path to missing library  
  6. ldconfig to reload dynamic loader  
  7. scxadmin -restart all   
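Steps 2 through 5 can be partly scripted. The sketch below is a hedged illustration rather than anything shipped with the agent: missing_libs and add_lib_dir are names I made up, and the path of the agent binary you feed to ldd depends on your install, so it is left to the caller.

```shell
# Read ldd output on stdin and print the libraries it could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Append a directory ($1) to a loader config file ($2, default
# /etc/ld.so.conf) unless that exact line is already present.
add_lib_dir() {
    conf="${2:-/etc/ld.so.conf}"
    grep -qxF -- "$1" "$conf" 2>/dev/null || printf '%s\n' "$1" >> "$conf"
}
```

Typical use would be ldd on the agent binary piped into missing_libs, then add_lib_dir with the directory that find turned up. You still need to run ldconfig and restart the agent afterwards, as in steps 6 and 7.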

The path for Solaris is the same for steps 1 - 3 but differs when it comes to setting the library path:
  1. crle to see the current path
  2. crle -l to update the path (include the old path plus the new path because the command is a replacement, not an append) 
  3. scxadmin -restart all  

Can not resign certificate, /etc/opt/microsoft/ssl/scx-host-.pem already exists: in this situation the re-creation of a certificate was attempted but failed because there was a previously generated certificate on the target host.  If you want to generate a new certificate, simply delete the contents of the /etc/opt/microsoft/ssl directory.  Alternatively, you can export the existing certificate and trust it on the management server.

winrm failed to connect in a timely manner: this can happen if the target server is overloaded.  OpenPegasus will time out after 20 seconds or so, and this can result in a failure to validate that the agent was properly installed.  The fix here is to confirm on the target server that the agent was in fact installed, using scxcimcli ei -n root/scx CIM_ManagedElement, and then retry the discovery.
 
There are many other things that could go wrong during discovery, but in most cases the error message you receive should help you determine how to fix the problem.  One thing to watch is the phase in which the error occurred: Initial discovery (name resolution issues), Installation (user account issues), Signing (certificate issues), or Validation (configuration issues).  Knowing where to start looking is half the battle in getting our servers successfully discovered.
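As a quick reference, the errors covered in this post and their fixes can be collected into a small lookup. This is just a sketch of my own notes in script form: suggest_fix is a made-up helper, and the match strings are fragments of the messages quoted above, not an exhaustive list of what the wizard can return.

```shell
# Map a discovery error message ($1) to the fix described in this post.
suggest_fix() {
    case "$1" in
        *"not enough entropy"*)          echo "recreate /dev/random or install the agent manually" ;;
        *"Unspecified Problem"*)         echo "open port 1270 on the firewall" ;;
        *"Access is Denied"*)            echo "remove the partially installed agent and reinstall" ;;
        *"Cannot connect to port 1270"*) echo "fix the library path and restart the agent" ;;
        *"already exists"*)              echo "delete the old certificate or export and trust it" ;;
        *"timely manner"*)               echo "verify the agent with scxcimcli and retry discovery" ;;
        *)                               echo "no known fix; check which discovery phase failed" ;;
    esac
}
```

For example, suggest_fix "Failed to get random data - not enough entropy" prints the entropy fix.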