In-Depth
Avoiding Active Directory Disasters
Active Directory can crash your whole computing environment if it goes down, but following a few tips and best practices will keep it running smoothly.
- By Greg Shields
- 02/01/2010
There's an evil lurking in your network.
It's an evil of always-on services relying on hundreds of settings you likely haven't looked at in nearly a decade. With its tendrils intertwined into every part of your business's network operations, the great Cthulhu himself is nowhere near as gargantuan, nor as terrifying, as the evil that could be your Active Directory.
Got your attention? Good -- because while AD itself isn't evil, its outage could create a disaster of such massive proportions that your entire business would be at risk.
The problem with AD isn't that it's unstable. Nor is it poorly designed. And there's plenty of information and detailed instructions about its services freely available on the Internet. AD is an exceptionally well-built platform for hosting a business's computing infrastructure.
AD's central problem today has much to do with its longevity and the fact that it rarely goes down. But when it does go down, it takes every single part of your computing environment with it.
Worsening this problem is IT pros' lack of personal experience with their own AD infrastructures. Many corporate AD infrastructures were built before any of us joined our companies. Many still run on Windows 2000 Server or Windows Server 2003 and include design elements that are out of date, no longer necessary or badly in need of modernization. While these configurations likely aren't causing you problems today, they can become a nightmare should they fail down the road.
While AD itself has been updated with the release of Windows Server 2008 R2, it's unlikely that R2's bleeding-edge technology has made it to your data center. So, if your AD still runs atop an older OS, spread this article out in front of you, open a Terminal Services connection to your domain controller (DC) and follow along. Fixing these settings now might just save your bacon in the future.
Morphed Sysvol Folders
With the release of Windows Server 2008, Microsoft moved Sysvol replication off of the much-reviled File Replication Service (FRS). Replacing it, in domains running at the Windows Server 2008 domain functional level, is the newer and more robust Distributed File System Replication (DFS-R).
The older FRS split administrators into two camps: You either hated it, or you didn't know anything about it. FRS worked well in environments with only a few DCs, but quickly became problematic as the number of DCs went up.
For environments that still use FRS, Sysvol folder morphing occurs when a folder is updated at nearly the same time on two different DCs. Because replication occurs after the fact, the replication service doesn't know which change is authoritative and renames the conflicting folder to folderName_NTFRS_guidName.
Getting rid of morphed folders eliminates the name conflict and ensures that the correct configuration is properly replicated across your domain. You can do this either by moving the morphed folders out of Sysvol and then returning them, or by renaming them and allowing the renamed folders to replicate. Microsoft publishes step-by-step guidance for both approaches.
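Before fixing anything, you first have to find the morphed folders. The following is a minimal Python sketch that walks a Sysvol tree and flags folder names matching the folderName_NTFRS_suffix shape described above. It assumes only that the suffix after "_NTFRS_" is hexadecimal; the example path is a hypothetical local Sysvol location, so adjust it for your DCs.

```python
import os
import re

# Assumption: FRS renames a conflicting folder to "<name>_NTFRS_<hex suffix>".
# The pattern checks only for that shape; it does not assume a suffix length.
MORPHED = re.compile(r"^(?P<base>.+)_NTFRS_(?P<suffix>[0-9A-Fa-f]+)$")


def find_morphed(names):
    """Return (morphed_name, original_name) pairs for names that look morphed."""
    pairs = []
    for name in names:
        match = MORPHED.match(name)
        if match:
            pairs.append((name, match.group("base")))
    return pairs


def scan_sysvol(root):
    """Walk a Sysvol tree and return the full paths of folders that look morphed."""
    hits = []
    for dirpath, dirnames, _files in os.walk(root):
        for morphed, _base in find_morphed(dirnames):
            hits.append(os.path.join(dirpath, morphed))
    return hits


if __name__ == "__main__":
    # Hypothetical path: point this at the Sysvol folder on one of your DCs.
    for path in scan_sysvol(r"C:\Windows\SYSVOL\domain"):
        print("Possible morphed folder:", path)
```

Running it against each DC's copy of Sysvol gives you a list to compare with the original folder names before you move or rename anything.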
Broken Group Policy Linkages
Did you know that your Group Policy is actually split into two different halves? One half, the Group Policy Container, lives in your AD and replicates through AD replication. The other half, the Group Policy Template, is stored in the Sysvol and is replicated through Sysvol replication (FRS or DFS-R).
While these two halves are stored separately, they must remain consistent to be useful. If one half becomes unlinked from the other, Group Policy may be unable to apply its settings. This most commonly happens when administrators attempt to manually adjust Group Policy Objects (GPOs) directly within AD or the Sysvol instead of using the Group Policy Management Console (GPMC).
Available in both the Windows 2000 Server and Windows Server 2003 resource kits is a tool called GPOTOOL.EXE. This tool, which still works today atop Windows 7 and against a Windows Server 2008 R2 AD, verifies the consistency of both halves of Group Policy. Running it from your desktop quickly reports on each of your GPOs, hopefully returning a "Policy OK" message for every one. If it reports anything else, start troubleshooting or consider re-creating the identified policy.
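One of the consistency checks a tool like GPOTOOL performs is comparing the version number stored on the Group Policy Container (its versionNumber attribute) with the Version value in the Group Policy Template's GPT.INI file. The Python sketch below illustrates only that one check, under stated assumptions: it expects you to have already collected the versionNumber values from AD and the GPT.INI contents from Sysvol (collection is not shown), and the GUID keys in the usage are hypothetical. In both places the version packs the user-side counter into the high 16 bits and the computer-side counter into the low 16 bits.

```python
import configparser


def gpt_version(gpt_ini_text):
    """Read Version from the [General] section of a GPT.INI file's text."""
    # interpolation=None so a '%' in displayName doesn't raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gpt_ini_text)
    return parser.getint("General", "Version", fallback=0)


def split_version(version):
    """Split a GPO version into (user, computer): high 16 bits, low 16 bits."""
    return version >> 16, version & 0xFFFF


def check_gpos(gpc_versions, gpt_texts):
    """Compare GPC versions (from AD) with GPT.INI versions (from Sysvol).

    Both arguments are dicts keyed by GPO GUID; returns {guid: problem}.
    """
    problems = {}
    for guid in sorted(set(gpc_versions) | set(gpt_texts)):
        if guid not in gpt_texts:
            problems[guid] = "GPT missing from Sysvol"
        elif guid not in gpc_versions:
            problems[guid] = "GPC missing from AD"
        else:
            gpc = gpc_versions[guid]
            gpt = gpt_version(gpt_texts[guid])
            if gpc != gpt:
                problems[guid] = (f"version mismatch: GPC {split_version(gpc)} "
                                  f"vs GPT {split_version(gpt)}")
    return problems


if __name__ == "__main__":
    ini = "[General]\nVersion=65539\ndisplayName=Test Policy\n"
    print(check_gpos({"{A}": 65539, "{B}": 2}, {"{A}": ini}))
```

A mismatch here doesn't always mean breakage (replication may simply be in progress), so treat it as a prompt to re-check after replication settles.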
DNS Aging and Scavenging
Remember the Domain Name System (DNS) versus Windows Internet Name Service (WINS) wars of not too many years ago? At that time, DNS was a newfangled technology not entirely trusted by IT. Today, most administrators see DNS as a set-it-and-forget-it service. Today's properly configured dynamic DNS seamlessly handles name registration and resolution for even the most complex of networks with little extra effort.
DNS's problem is similar to the others in this article: Most of us stopped keeping an eye on this service once it no longer needed our regular attention, and some of its configuration was forgotten as we moved on to bigger and more interesting projects.
One setting in particular that plagues more domains than you'd think relates to the aging and scavenging of DNS records. The problem lies in the two separate settings required to turn on this activity, one of which often gets missed.
A dynamic DNS service needs some mechanism to remove records when they're no longer relevant. Microsoft's mechanism for this is scavenging. Each dynamic record is automatically given an "age" through its Record Time Stamp. You can view the time stamps of your existing dynamic records by turning on Advanced view (View > Advanced) in DNS Manager and opening the properties of a record.
DNS clients automatically refresh their time stamp whenever they start up, when they renew their Dynamic Host Configuration Protocol (DHCP) lease, and every 24 hours. If a record isn't refreshed within the configured no-refresh and refresh intervals (seven days each by default), the DNS server will scavenge, or remove, the record from the database.
Most environments' problems here stem from the fact that scavenging is configured in two places. First, using DNS Manager, scavenging must be enabled on the Advanced tab of the DNS server's properties. Second, aging must also be enabled on each zone by clicking the Aging button on the General tab of that zone's properties. Miss either setting and stale records are never removed.
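The interplay of the two switches and the two intervals is easy to get wrong, so here is a simplified Python model of the decision. It assumes the default seven-day no-refresh and refresh intervals and treats a missing time stamp as a static record; it deliberately ignores the server's scavenging schedule and the zone's "can be scavenged after" time, so it shows when a record becomes eligible rather than exactly when it disappears.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_scavengeable(record_timestamp: Optional[datetime],
                    now: datetime,
                    server_scavenging_enabled: bool,
                    zone_aging_enabled: bool,
                    no_refresh: timedelta = timedelta(days=7),
                    refresh: timedelta = timedelta(days=7)) -> bool:
    """Simplified model of whether a dynamic DNS record is eligible for scavenging."""
    # Both settings must be on; missing either one is the common misconfiguration.
    if not (server_scavenging_enabled and zone_aging_enabled):
        return False
    # Static records carry no time stamp and are never scavenged.
    if record_timestamp is None:
        return False
    # Eligible only once both intervals have passed without a refresh.
    return now > record_timestamp + no_refresh + refresh


if __name__ == "__main__":
    now = datetime(2010, 2, 1)
    stale = now - timedelta(days=15)
    print(is_scavengeable(stale, now, True, True))   # both switches on
    print(is_scavengeable(stale, now, True, False))  # zone aging forgotten
```

The second call is the scenario this section warns about: the server is ready to scavenge, but because the zone was never configured for aging, the stale record stays forever.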
Enabled Unused Network Cards
A common configuration item in times gone by, but one less often remembered today, is disabling unused network cards. Unused cards on member servers can cause connectivity problems but tend not to have a big impact on operations. The same unused network cards on DCs, however, can cause intermittent connection failures.
The problem is the propensity of network cards to register themselves in DNS. An unused network card can often obtain an inappropriate DHCP address or an Automatic Private IP Address (APIPA), and then register that address in DNS. Because clients locate a DC's services through DNS SRV records that point to the DC's host name, an address registered by an unused card can send clients down a route that's inaccessible.
With today's computers using greater numbers of network cards than in years past, this configuration item actually grows more problematic as it gets more forgotten. Check yours today.
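One way to check is to review the addresses each DC has registered in DNS. The Python sketch below assumes you've already exported those registrations into a dictionary (the export step is not shown, and the host names, addresses and subnet in the usage are hypothetical). It flags APIPA (169.254.0.0/16 link-local) addresses and anything outside the subnets your DCs are supposed to live in.

```python
import ipaddress


def suspicious_dc_addresses(registrations, expected_subnets):
    """Flag DC address registrations that are APIPA or outside expected subnets.

    registrations: dict of DC host name -> list of registered IP address strings.
    expected_subnets: list of CIDR strings where DC addresses should live.
    """
    networks = [ipaddress.ip_network(s) for s in expected_subnets]
    findings = []
    for host, addresses in sorted(registrations.items()):
        for text in addresses:
            addr = ipaddress.ip_address(text)
            if addr.is_link_local:
                # 169.254.x.x: typically a card that never got a real address.
                findings.append((host, text, "APIPA / link-local"))
            elif not any(addr in net for net in networks):
                findings.append((host, text, "outside expected subnets"))
    return findings


if __name__ == "__main__":
    regs = {"DC01": ["10.0.0.10", "169.254.12.7"],
            "DC02": ["10.0.0.11", "192.168.50.4"]}
    for finding in suspicious_dc_addresses(regs, ["10.0.0.0/24"]):
        print(finding)
```

Each flagged address points you to a network card worth disabling, or at least configuring not to register its address in DNS.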
Orphaned Domains, DCs, Sites and Connection Objects
While "the forest" always endures as a whole entity, internally your Windows forest can be quite malleable over time. New domains, DCs, sites and site links find themselves being created and destroyed for testing, demonstration, special projects or other reasons. As a result, all but the best-governed Windows forests have orphaned objects lingering around.
Domains and DCs are the most obvious of these, as well-meaning IT pros connect test or evaluation domains to the production domain. While leaving these domains in place doesn't often cause painful outages, their presence as orphans can clog up the Event Log with unnecessary errors and complicate domain decommissioning. If you have orphaned domains and DCs in your forest, consider removing them with a combination of NTDSUTIL, ADSIEDIT (needed for very old forests) and the DNS Manager console.
The more insidious of these orphaned objects are extra sites and site links. As businesses grow, new sites are commonly added in Active Directory Sites and Services. The problem with these manual additions is that they're not always removed when those sites are later decommissioned.
Manually created site links and connection objects can also be a problem. These were typically created for one of two reasons. First, well-meaning IT pros may have created them long ago, later forgetting that they were no longer relevant as the site structure evolved. These links today should be reevaluated and possibly removed.
The second reason was a long-ago limitation of AD's Knowledge Consistency Checker (KCC), the service that automatically creates and manages connection objects. In early versions of AD, the KCC experienced problems as the number of connection objects grew past a certain point, requiring administrators of larger AD structures to create and manage their replication topology manually. While that ancient problem has been resolved, many domains still run today with manually created connection objects. Those manual connection objects can complicate the KCC's automated administration, and can be the source of replication failures or other difficult-to-troubleshoot errors.
Many of those manual connections are artifacts from the time when recently christened MCSEs felt the need to use their new skills to "customize" their domains. If any of these situations applies to your domain, take another look at your site settings to verify that they're up-to-date with your current corporate structure.
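Telling manual connection objects apart from KCC-created ones is straightforward, because the KCC sets a flag (NTDSCONN_OPT_IS_GENERATED, value 0x1) in the options attribute of the connection objects it creates. The Python sketch below assumes you've already read each nTDSConnection object's site, server, name and options into a list of dictionaries (the directory query is not shown, and the sample data is hypothetical). It groups the manual ones by site so you can review them.

```python
from collections import defaultdict

# Bit in nTDSConnection.options that the KCC sets on connections it generated.
NTDSCONN_OPT_IS_GENERATED = 0x00000001


def manual_connections_by_site(connections):
    """Group connection objects lacking the KCC-generated flag by site.

    connections: iterable of dicts with "site", "server", "name", "options".
    Returns {site: ["server: connection name", ...]}.
    """
    by_site = defaultdict(list)
    for conn in connections:
        if not conn["options"] & NTDSCONN_OPT_IS_GENERATED:
            by_site[conn["site"]].append(f'{conn["server"]}: {conn["name"]}')
    return dict(by_site)


if __name__ == "__main__":
    sample = [
        {"site": "HQ", "server": "DC01", "name": "conn-1", "options": 1},
        {"site": "HQ", "server": "DC02", "name": "Manual-DC01", "options": 0},
        {"site": "Branch", "server": "DC03", "name": "Custom", "options": 8},
    ]
    print(manual_connections_by_site(sample))
```

Not every manual connection object is a mistake, so treat the output as a review list rather than a deletion list.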
DSRM Passwords Unknown
Your DC's Directory Services Restore Mode (DSRM) passwords are probably the "great forgotten passwords" in your domain. These passwords are set when you create a new DC, but they're rarely revisited after that initial event.
DSRM passwords are required when rebooting a DC into DSRM, a mode you typically need only when something has already gone wrong, such as when you must restore lost data. Complicating their administration, they've also historically been set on a per-DC basis, so different DCs can be configured with different passwords. Not having the correct DSRM password turns a small problem into a big one at the very moment you need it most.
You can reset your DSRM passwords the traditional way with NTDSUTIL's "set DSRM password" command. This process needs to be completed on each and every DC in your forest. Alternatively, the hotfix described in Knowledge Base article 961320 gives Windows Server 2008 a mechanism for synchronizing a DC's DSRM password with a domain account's password (Windows Server 2008 R2 includes this capability natively). Running the synchronization on every DC ensures that all DSRM passwords are equivalent across the board, making your administration significantly easier. Because the synchronization is a one-time copy, repeat it whenever the domain account's password changes. Knowledge Base article 961320 describes the process in more detail.
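Because the synchronization has to run locally on every DC, it's a natural candidate for scripting. The Python sketch below builds the NTDSUTIL command line that performs the sync ("set dsrm password", then "sync from domain account", then two quits) and only executes it when you turn off the dry-run default. The account name DSRMSync is a hypothetical example, and the assumption that a non-zero exit code signals failure is unverified, so check the output on your own DCs.

```python
import subprocess


def dsrm_sync_command(account):
    """Build the ntdsutil argument list that copies an account's password to DSRM."""
    if not account or any(c in account for c in '"\r\n'):
        raise ValueError("account name must be non-empty with no quotes or newlines")
    return ["ntdsutil",
            "set dsrm password",
            f"sync from domain account {account}",
            "q",
            "q"]


def sync_dsrm_password(account, dry_run=True):
    """Return the command; run it on the local DC only when dry_run is False."""
    cmd = dsrm_sync_command(account)
    if not dry_run:
        # Assumption: ntdsutil reports failure through a non-zero exit code.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical account name; prints the command without running it.
    print(sync_dsrm_password("DSRMSync"))
```

Scheduling the non-dry-run version on each DC after every password change for that account keeps the DSRM passwords aligned.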
Painfully Restorable GPOs
Your GPOs are yet another lurking evil. These objects ensure a consistent configuration of servers and workstations. Yet for many, they do so with an unrecognized dark side: They themselves are not backed up in a way that makes them easily restorable.
The problem is that most environments back up their domain's System State data with the expectation that this data can easily restore a lost GPO. This couldn't be further from the truth. Returning a lost GPO using a System State restore is often more difficult than simply remembering the GPO's settings and creating it again from scratch.
Thankfully, two options are available that significantly improve the backup and restore process. The first is a set of scripts in the GPMC Scripts download from Microsoft's Web site: BackupAllGPOs.wsf and RestoreGPO.wsf back up and restore GPOs from the command line. Scheduling a regular backup of your GPOs with these scripts will greatly speed your restore time should someone accidentally delete a GPO.
Windows PowerShell users running Windows Server 2008 R2 AD infrastructures can also use the Backup-GPO and Restore-GPO cmdlets. Similar in result to the GPMC Scripts, these cmdlets benefit from the added flexibility of the rest of the PowerShell environment. Microsoft's documentation for the Group Policy module covers these cmdlets, along with the other new Group Policy cmdlets available in R2.
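A scheduled backup is only useful if it runs regularly and doesn't fill the disk. Here's a minimal Python sketch of a wrapper for a scheduled task: it creates a dated folder, calls BackupAllGPOs.wsf through cscript with that folder as the backup location, and prunes the oldest dated folders once the backup succeeds. The root folder, the script's install path, the "GPO-" folder prefix and the 14-copy retention are all hypothetical choices, and whether the script reports failure via its exit code is an assumption.

```python
import os
import shutil
import subprocess
from datetime import date


def backup_folder_name(day):
    """Name dated backup folders so that they sort chronologically."""
    return f"GPO-{day.isoformat()}"


def folders_to_prune(existing, keep):
    """Return the oldest dated backup folders beyond the newest `keep`."""
    dated = sorted(name for name in existing if name.startswith("GPO-"))
    return dated[:-keep] if keep > 0 else dated


def run_backup(root, script, keep=14, today=None):
    """Back up all GPOs into a dated folder, then prune old copies."""
    target = os.path.join(root, backup_folder_name(today or date.today()))
    os.makedirs(target, exist_ok=True)
    # check=True raises on a non-zero exit, so old backups are not pruned
    # after a run that signalled failure.
    subprocess.run(["cscript", "//nologo", script, target], check=True)
    for name in folders_to_prune(os.listdir(root), keep):
        shutil.rmtree(os.path.join(root, name))


if __name__ == "__main__":
    # Hypothetical paths: adjust to where you keep backups and the GPMC scripts.
    run_backup(r"D:\GPOBackups",
               r"C:\Program Files\GPMC Scripts\BackupAllGPOs.wsf")
```

Pruning only after a successful run means a string of failed backups never deletes the last good copy.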
DCs as Virtual Machines
Finally, there's an insidious malevolence that cloaks itself within a helpful new capability.
IT environments today are all about virtualization. Every IT environment is either virtualized, partially virtualized or thinking very, very hard about starting its project soon. However, there are a number of problems associated with virtualizing DCs, and many of these problems are not understood by IT pros. For a rundown of the concerns, see Microsoft's excellent guidance on running DCs in virtual machines.
Two of these concerns deserve highlighting because they can harm your entire forest with only a few mouse clicks. First, virtualized DCs should never be paused. Pausing a DC is quite different from simply powering it off. If a DC stays paused for an extended period, in particular longer than the forest's tombstone lifetime, its database can become inconsistent through the creation of lingering objects. Those objects will eventually cause replication to fail.
The second problem with virtualized DCs stems from one of the benefits of virtualization itself: single-file restore of failed servers. With a virtualized server, it's easy to restore that server's .VHD or .VMDK file to return the server to a point in time. The issue with DCs has to do with AD's method of replication. AD doesn't support restoring a "snapshot" of a DC. Doing so can cause a situation called Update Sequence Number (USN) rollback, in which the replication partners of the incorrectly restored DC end up with inconsistent database objects. Particularly problematic is that there's no automatic mechanism to make these objects consistent again. If you value the consistency of your AD database, the only supported way to restore a DC is through Microsoft's traditional, backup-based restore processes.
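A practical safeguard is to watch how long each DC has gone without replicating and compare that against the tombstone lifetime. The Python sketch below assumes you've gathered each DC's last successful replication time (for example from repadmin output, not shown). It uses the 60-day default that applies when the forest's tombstoneLifetime attribute isn't set; newer forests commonly set it to 180 days, so pass your forest's actual value. The 50 percent warning threshold is an arbitrary, hypothetical choice.

```python
from datetime import datetime, timedelta

DEFAULT_TSL_DAYS = 60  # used when the tombstoneLifetime attribute is not set


def replication_risk(last_success, now, tombstone_lifetime_days=None,
                     warn_fraction=0.5):
    """Classify a DC by time since its last successful replication."""
    days = (DEFAULT_TSL_DAYS if tombstone_lifetime_days is None
            else tombstone_lifetime_days)
    tsl = timedelta(days=days)
    age = now - last_success
    if age >= tsl:
        # Beyond the tombstone lifetime: lingering objects are likely.
        return "expired"
    if age >= tsl * warn_fraction:
        return "warning"
    return "ok"


def check_dcs(last_success_by_dc, now, tombstone_lifetime_days=None):
    """Apply replication_risk to every DC in a {name: last_success} mapping."""
    return {dc: replication_risk(ts, now, tombstone_lifetime_days)
            for dc, ts in last_success_by_dc.items()}


if __name__ == "__main__":
    now = datetime(2010, 2, 1)
    print(check_dcs({"DC01": now - timedelta(days=1),
                     "DC02": now - timedelta(days=200)}, now, 180))
```

A paused virtual DC shows up here as a steadily growing age, which gives you a chance to intervene before it crosses the tombstone lifetime.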
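Why a snapshot revert is so dangerous becomes clearer with a toy model. The Python sketch below is a deliberately simplified illustration, not AD's real replication engine: each partner remembers the highest USN it has received from a source DC's invocation ID and asks only for newer changes. Reverting a snapshot drops the source's USN counter back while keeping the same invocation ID, so new changes reuse USNs the partner believes it has already seen. A supported restore, by contrast, gives the DC a new invocation ID. All names and values are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DomainController:
    name: str
    invocation_id: str
    usn: int = 0
    objects: dict = field(default_factory=dict)  # object -> (value, USN)

    def write(self, obj, value):
        # Every local change consumes the next USN.
        self.usn += 1
        self.objects[obj] = (value, self.usn)


@dataclass
class Partner:
    watermarks: dict = field(default_factory=dict)  # invocation ID -> highest USN
    objects: dict = field(default_factory=dict)

    def pull_from(self, source):
        seen = self.watermarks.get(source.invocation_id, 0)
        for obj, (value, usn) in source.objects.items():
            if usn > seen:  # only changes newer than the watermark are pulled
                self.objects[obj] = value
        self.watermarks[source.invocation_id] = max(seen, source.usn)


def simulate(supported_restore):
    """Return the partner's object names after a restore of DC01."""
    dc = DomainController("DC01", "inv-1")
    partner = Partner()
    dc.write("userA", "v1")          # USN 1
    snapshot = copy.deepcopy(dc)     # VM snapshot taken here
    dc.write("userB", "v1")          # USN 2
    dc.write("userC", "v1")          # USN 3
    partner.pull_from(dc)            # partner's watermark for inv-1 becomes 3
    dc = snapshot                    # revert: USN counter is back at 1
    if supported_restore:
        dc.invocation_id = "inv-2"   # supported restore: new invocation ID
    dc.write("userD", "v1")          # reuses USN 2
    dc.write("userE", "v1")          # reuses USN 3
    partner.pull_from(dc)
    return set(partner.objects)


if __name__ == "__main__":
    print("snapshot revert:  ", sorted(simulate(False)))
    print("supported restore:", sorted(simulate(True)))
```

In the snapshot case the partner silently never receives userD or userE, and nothing in the model ever notices, which mirrors why USN rollback is so hard to detect and repair.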
Upgrading to Windows Server 2008 R2
Keeping up-to-date with the latest and greatest in DC technology is one of the best ways to ensure a consistent AD. Windows Server 2008 R2 adds a new domain and forest functional level; vast amounts of command-line power with its built-in PowerShell cmdlets; the new Active Directory Web Services and Active Directory Administrative Center; a useful Health Model for seeing when components experience issues; a Best Practices Analyzer for ensuring your architecture is smartly designed; managed service accounts; and the highly compelling AD Recycle Bin for quickly restoring deleted AD objects.
All of these features come with upgrading your forest, but most don't truly take effect until you've completed that task across the whole forest. For many, that process will take a while. Until then, take another look at your AD configuration. While many of the items in this article might feel like ancient problems, you'd be surprised at how often they're still found in AD structures. In fact, these misconfigurations are so common that Microsoft has been known to offer an Active Directory Health Check as one of its consulting services.
So, while AD itself isn't evil, there can be plenty of problems buried within its hundreds of different settings. If you haven't sat down with your AD in a while, consider spending a day verifying your configuration. You might find more evil there than you thought.