Knowing how AD replication works in Windows 2000 can help you tune it for optimal system performance.
Active Directory Answers: Active Directory Updating
- By Curt Simmons
- 02/01/2001
Consider this scenario: You’re running Windows 2000. You sit down at
a domain controller within an Active Directory site and create a new user
account. In another building, the new user logs on to a computer and is
immediately authenticated by a different domain controller within the
AD site. How did it happen? The answer is AD replication.
In order for all domain controllers in an Active Directory site to continually
have the same database information, there must be replication among them.
Each domain controller in the AD network maintains its own AD database.
Without replication, each domain controller’s database copy would quickly
become a hopeless collection of inaccurate data.
AD has two types of replication: intra-site and inter-site. Intra-site
replication occurs within a site while inter-site replication occurs between
sites. The two types are different animals, and this article explores
AD intra-site replication: how it’s created and how it works.
The Concepts To Understand
Before getting into AD replication, let’s make sure you’re up to speed
with basic AD concepts. Active Directory is Microsoft’s answer to distributed
networking. AD provides cohesiveness to a distributed network by storing
information about network resources and making those resources easy for
users to find. All resources stored in AD are called objects. User accounts,
group accounts, computer accounts, shared folders, printers, and all other
resources are AD objects. For each object, there’s a set of AD attributes.
An attribute helps define an object.
For example, a user account may have attributes like username, password,
email address, telephone number, and so on. Resources are organized on
a domain basis, often using the Organizational Units (OUs) that are new
to Win2K. Domains and OUs give you a logical view of the network. Sites,
on the other hand, are used by AD to manage replication and user traffic
over WAN links, which are often more expensive and less reliable (see
Figure 1).
Figure 1. Domains are used to group resources logically, often using
Organizational Units. By contrast, sites are physical groupings. Domains
within a site typically share fast, inexpensive network connections.
By definition, a domain is a logical grouping of resources, which serves
as a security and administrative boundary. A site, on the other hand,
is a physical grouping. Sites can contain multiple domains and are built
on inexpensive and fast network connections. A Win2K site can be built
on one or more IP subnets. The network connections can be either inexpensive
LAN technologies or a high-speed backbone.
AD uses site information to configure AD replication, so the importance
of planning your sites can’t be overstated. As you’ll see in the remainder
of this article, AD builds its own intra-site topology and that topology
assumes your sites have fast, inexpensive bandwidth. When you plan AD
sites, closely examine the sites’ available bandwidth. A collection of
subnets without fast, inexpensive bandwidth shouldn’t be configured as
one site. Within a site, AD assumes you have adequate bandwidth, which
it will use freely—it assumes there’s plenty of it, and that it isn’t
costly.
The Basics of Intra-site Replication
The process of updating AD information in Win2K is actually quite interesting.
As I mentioned, when a change is made to the AD database on one domain
controller, it must be made to other domain controllers in order to keep
the information current. This is what we mean by intra-site replication.
As you know, there are no longer PDCs and BDCs in Win2K networks. All
domain controllers function as peers, and AD replication works in the
same way.
There’s no single, master replicator. Multi-master replication is used,
so all Win2K domain controllers are responsible for the replication of
AD database information using IP remote procedure calls (RPCs). In terms
of replication, the domain controllers function as peers, and each domain
controller has a write-able copy of the AD database. This design alone
provides replication fault tolerance. Because there isn’t a single master
replicator, the failure of one domain controller within the AD environment
doesn’t affect replication with other domain controllers. When a user
or administrator makes a change to an AD object, the change is made on
one of the domain controllers in the AD environment. After the change
is made, all other domain controllers have outdated information, so that
change must be replicated to all domain controllers. The domain controllers
automatically handle this job, which is transparent to users and administrators.
The Challenge of Latency
There are two important points you should remember about intra-site replication.
Replication within a site is typically frequent in order to reduce latency—the
time delay that occurs when data between domain controllers isn’t accurate.
For example, let’s say you create a new user account in a particular site.
You create the account on a single domain controller, and now that account
data must be replicated to all other domain controllers in the site. If
the user tries to log on before the data is replicated across the site,
the logon may fail because a domain controller that hasn’t received the
replication data would refuse the user access to the network—even though
the user actually has a valid account. This latency period is the time
during which data is inaccurate across the site.
AD replication must work quickly to avoid as much latency as possible
so that database information across the site is accurate. Because AD assumes
that connections within a site are fast and inexpensive, intra-site replication
occurs frequently, automatically, without any compression, and without
a schedule. AD, in other words, favors up-to-date information over
bandwidth savings, since it assumes there’s plenty of bandwidth to use.
Inter-site replication is different. Since data in those cases must travel
from site to site, frequently over expensive or unreliable WAN connections,
replication schedules and latency management become much larger issues.
Replication is always a tradeoff between latency and the expense of connections
required for inter-site communication. For inter-site replication, you
can use the Sites and Services tool to configure the frequency of replication,
depending on your available bandwidth, and adjust the replication schedule
to find a balance between bandwidth and latency for your network.
What’s the Effect?
You might logically wonder what happens to intra-site network bandwidth
if replication occurs frequently and without a schedule. The full answer
remains to be seen as Win2K is rolled out in large, distributed networks.
Theoretically, however, replication traffic shouldn’t cause a bandwidth
problem because replication occurs at the attribute level. For example,
say you change a user account phone number. When that change is replicated,
only the phone number attribute is replicated — not all the data for the
entire object. With this approach, replication traffic should be minimal,
although I’m not willing to bet my career on that just yet.
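As a rough illustration of attribute-level replication, here is a minimal
Python sketch. The function name and the sample user data are hypothetical
and are not part of AD; the point is only that the payload contains the
changed attribute rather than the whole object.

```python
# Hypothetical illustration: none of these names come from Active Directory.
_MISSING = object()


def changed_attributes(before: dict, after: dict) -> dict:
    """Return only the attributes whose values differ between two versions."""
    return {
        name: value
        for name, value in after.items()
        if before.get(name, _MISSING) != value
    }


user_before = {
    "username": "jsmith",
    "email": "jsmith@example.com",
    "telephone": "555-0100",
}
user_after = dict(user_before, telephone="555-0199")

# Only the phone number would need to travel to the other domain controllers.
delta = changed_attributes(user_before, user_after)
print(delta)
```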
Replication Topology
Now that you understand multi-master replication, you might be wondering
how to set it up for your site. Actually, you don’t need to — AD automatically
creates its own replication topology within a site. This is done with
the Knowledge Consistency Checker (KCC) service in AD. The KCC creates
a topology, or a series of pathways, between domain controllers within
the site, using replication partners. When AD is installed on the first
domain controller, it creates a default site first. As domain controllers
are installed and added to the site, the KCC determines how to include
them in the replication map. Domain controllers receive replication data
either directly from replication partners or transitively through indirect
replication partners. Regardless of the relationship, AD always tries
to create two pathways to every domain controller. That way, if one domain
controller fails, the "loop" isn’t broken, and an alternative route can
be used (see Figure 2).
Figure 2. The Knowledge Consistency Checker (KCC) service in Active
Directory creates at least two pathways to every domain controller. If
one DC fails, an alternative route can be used.
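The "two pathways" idea can be sketched as a simple bidirectional ring.
This is only an illustration of the loop described above, not the KCC's
actual algorithm, which is more involved; the function names and DC names
are assumptions made for the example.

```python
def ring_topology(dcs):
    """Connect each domain controller to its two neighbours in a ring."""
    n = len(dcs)
    return {
        dc: {dcs[(i - 1) % n], dcs[(i + 1) % n]} - {dc}
        for i, dc in enumerate(dcs)
    }


def reachable(topology, start, failed=frozenset()):
    """Return every DC that replication from `start` can reach, skipping failed DCs."""
    seen, stack = {start}, [start]
    while stack:
        for neighbour in topology[stack.pop()]:
            if neighbour not in seen and neighbour not in failed:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


dcs = ["DC1", "DC2", "DC3", "DC4", "DC5"]
topology = ring_topology(dcs)

# With DC3 down, a change made on DC1 still reaches every surviving DC
# by travelling the other way around the ring.
survivors = reachable(topology, "DC1", failed={"DC3"})
print(sorted(survivors))
```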
As your environment changes, for example, with the addition or removal
of domain controllers, the KCC adjusts its topology to accommodate the
change. The KCC can make adjustments dynamically as needed to ensure that
replication can reach each domain controller in the site.
Optimal Performance
As an AD administrator, what do you need to configure? Actually, nothing.
Since the KCC automatically generates and makes changes to the topology,
AD takes care of intra-site topology for you. You can force replication
to occur, although it really isn’t necessary. You can also tell AD to
check its replication topology using the Sites and Services tool shown
in Figure 3. Aside from such tasks, AD takes care of the replication topology
by itself.
The key to optimal performance is to plan your network infrastructure
carefully before deploying AD. A major portion of that planning process
should be the examination of available bandwidth at each site. Remember
that AD uses site information that you configure in the Sites and Services
tool to determine how replication should occur. Within a site, AD assumes
that fast and inexpensive bandwidth is available. If that assumption is
incorrect, you need to back up and look at your site configuration.
Figure 3. You can check replication topology with the Sites and Services
tool.
How AD Replication Works
So how do the domain controllers replicate information and keep up with
each other’s database changes? All AD domain controllers in a domain are
aware of each other’s presence due to the replication topology generated
and managed by the KCC. Since they’re aware of each other, they simply
have to make certain that replication data gets to each domain controller.
This process begins with an "originating update."
For example, let’s say you change a user account’s password on a particular
domain controller. All other domain controllers now have outdated information
regarding the password, so the change must be replicated. The domain controller
on which you made the password change issues an originating update to
the other domain controllers. Depending on which change you make, a certain
kind of originating update is issued:
- Add — When you add a
new object to AD, such as a user, group, printer, and
so on, an Add originating update is issued.
- Modify — When you modify
an object, for example, when you change a user’s password,
the Modify originating update is issued.
- ModifyDN — When you
change the name of an object or an object’s parent,
or when you move an object into a new parent’s domain,
the ModifyDN originating update is issued.
- Delete — When you delete
an object, the Delete originating update is issued.
Thus, when you make a change to the database, the domain controller on
which the change was made issues a particular type of originating update
to the other domain controllers. This originating update lets the other
domain controllers know that changed data needs to be replicated. The originating
update becomes a replicated update on those domain controllers once the
replication process is completed. (See Figure 4.)
Figure 4. When you make a change to a database, the domain controller
on which the change was made issues an originating update so that
replication can occur.
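The four originating update types described above can be modeled as a small
enumeration. The action names in the lookup table ("create", "rename", and
so on) are hypothetical labels chosen for this sketch, not AD operations.

```python
from enum import Enum


class UpdateType(Enum):
    ADD = "Add"
    MODIFY = "Modify"
    MODIFY_DN = "ModifyDN"
    DELETE = "Delete"


def originating_update_for(action: str) -> UpdateType:
    """Map a hypothetical administrative action name to its update type."""
    actions = {
        "create": UpdateType.ADD,
        "change_attribute": UpdateType.MODIFY,
        "rename": UpdateType.MODIFY_DN,
        "move": UpdateType.MODIFY_DN,
        "delete": UpdateType.DELETE,
    }
    return actions[action]


# Changing a user's password is an attribute change, so it maps to Modify.
print(originating_update_for("change_attribute").value)
```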
How do domain controllers know if the replicated changes are new? This
is determined through the use of Update Sequence Numbers (USNs). Each
domain controller has a USN table that contains a USN for each
attribute. When an attribute is changed on a domain controller, that attribute’s
USN is updated. Now, all other domain controllers have an outdated
USN. When replication occurs, the change to the object is replicated with
the new USN. All other domain controllers make this change to their databases
and update their USN tables, so they’re accurate. USNs work well because
they do away with the need for specific timestamps, although timestamps
are still maintained in order to break replication ties. For example,
let’s say an administrator changes a user’s password on one domain controller
and a different administrator changes the same user’s password on a different
one. AD will use the timestamp to break the tie between the two, with
the latest timestamp "winning."
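To make the USN idea concrete, here is a toy model in Python. It is a
simplified sketch under stated assumptions: the class, its fields, and the
pull-based exchange are invented for illustration and do not mirror AD's
internal data structures. It shows how a counter, rather than a clock,
lets a partner ask only for changes it hasn't seen.

```python
class DomainController:
    """Toy domain controller that stamps each attribute write with a local USN."""

    def __init__(self, name):
        self.name = name
        self.usn = 0           # local update sequence number, only ever increases
        self.attributes = {}   # (object, attribute) -> (value, local USN of the write)
        self.seen_from = {}    # partner name -> highest partner USN already pulled

    def write(self, obj, attribute, value):
        """Record a change and stamp it with the next local USN."""
        self.usn += 1
        self.attributes[(obj, attribute)] = (value, self.usn)

    def changes_since(self, usn):
        """Return writes stamped with a local USN greater than `usn`."""
        return {
            key: (value, stamp)
            for key, (value, stamp) in self.attributes.items()
            if stamp > usn
        }

    def pull_from(self, partner):
        """Pull only the changes this DC has not yet seen from `partner`."""
        last_seen = self.seen_from.get(partner.name, 0)
        changes = partner.changes_since(last_seen)
        for (obj, attribute), (value, stamp) in changes.items():
            self.write(obj, attribute, value)
            last_seen = max(last_seen, stamp)
        self.seen_from[partner.name] = last_seen
        return len(changes)


dc1, dc2 = DomainController("DC1"), DomainController("DC2")
dc1.write("jsmith", "password", "hash-1")
dc1.write("jsmith", "telephone", "555-0199")

first = dc2.pull_from(dc1)    # both changes are new to DC2
second = dc2.pull_from(dc1)   # nothing new, so nothing is transferred
print(first, second)
```

No clocks are compared anywhere in this exchange, which is why the two
domain controllers don't need synchronized time for ordinary replication.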
How AD Solves Replication Problems
AD replication is more precise because it relies primarily on USNs rather
than timestamps. You don’t have to worry about exact time synchronization
between domain controllers, a frustrating and frequently difficult
configuration problem.
However, as with any process, the potential for problems exists, and AD’s
replication process is no exception. AD contains built-in mechanisms designed
to solve certain kinds of replication problems when they occur. Let’s
consider the two major ones — unnecessary replication and replication
collisions.
As mentioned earlier, AD automatically creates a replication topology
loop. This loop ensures that replication reaches all domain controllers
in the site in a timely manner and that replication can continue if a
domain controller fails. However, a potential problem with the loop could
occur with unnecessary replication traffic, such as when a domain controller
receives replication updates more than once. To make sure this doesn’t
happen, AD uses a process called propagation dampening, which allows a
domain controller to detect when replication has already reached another
domain controller in the loop. When the domain controller detects this,
the change won’t be replicated to that other domain controller (see Figure
5).
Figure 5. "Propagation dampening" allows a domain controller to detect
when replication has already reached another DC in the loop, so that
the change is not repeated.
Propagation dampening works by using two vectors: an Up-to-Date Vector
and a High Watermark Vector. Vectors are pairs of data that contain a
globally unique identifier (GUID) and the USN. The Up-to-Date Vector is
made up of server-USN pairs, held by each server, that record the highest
originating update received from each domain controller. In a like manner,
the High Watermark Vector contains the highest attribute USN for any given
object. By using both of these vectors, domain controllers can detect
when replication has already reached another domain controller and then
kill the replication. Without propagation dampening, replication could
continue to flow around the loop over and over.
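The following Python sketch shows the dampening idea using only an
up-to-date vector; the High Watermark Vector is left out to keep it short.
The class, its fields, and the single-letter GUIDs are assumptions for
illustration, not AD's real structures.

```python
class DC:
    """Toy domain controller that tracks an up-to-date vector (assumed structure)."""

    def __init__(self, guid):
        self.guid = guid
        self.usn = 0
        self.changes = []      # (originating guid, originating USN, attribute, value)
        self.up_to_date = {}   # originating guid -> highest originating USN applied here

    def originate(self, attribute, value):
        """Make a local change, stamped with this DC's next USN."""
        self.usn += 1
        self.changes.append((self.guid, self.usn, attribute, value))
        self.up_to_date[self.guid] = self.usn

    def pull_from(self, source):
        """Pull only the updates not already covered by our up-to-date vector."""
        sent = [c for c in source.changes if c[1] > self.up_to_date.get(c[0], 0)]
        for origin, usn, attribute, value in sent:
            self.changes.append((origin, usn, attribute, value))
            self.up_to_date[origin] = max(self.up_to_date.get(origin, 0), usn)
        return len(sent)


a, b, c = DC("A"), DC("B"), DC("C")
a.originate("telephone", "555-0199")

via_b = b.pull_from(a)     # B receives the update from A
via_c = c.pull_from(b)     # C receives the same update through B
direct = c.pull_from(a)    # A has nothing new for C: the update is dampened
print(via_b, via_c, direct)
```

Because C already records A's update in its vector, the third exchange
transfers nothing, and the change stops circulating around the loop.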
Another potential problem AD automatically resolves is replication collisions.
Replication collisions occur when two different administrators make a
change to the same attribute on the same object at different domain controllers.
For example, two administrators change a user’s password at the same time
on two different domain controllers. When those changes are replicated,
there’s a replication collision. AD tries to reduce the number of collisions
by replicating data on an attribute level rather than on an object level.
For example, for a particular user account, one administrator might change
the password while another admin changes the user’s phone number. The
same object is being changed, but those changes affect different attributes.
This doesn’t result in a collision.
In the event of a collision, AD detects it and takes the necessary steps
to resolve it. AD uses timestamps and version numbers to decide the
winner. Although timestamps have been replaced by USNs, they’re
still maintained to resolve collisions. AD examines an attribute’s timestamp
to see which update has the highest timestamp and also examines the attribute’s
version number. Because each originating update of an attribute increases
the version number, AD examines both. The replicated change with the highest
numbers wins, and that attribute change is replicated. It’s extremely
unlikely that the timestamp and version numbers would be the same, but
in such a case, AD can also use the directory system agent’s (DSA) GUID
to determine the final winner in the collision. As with other replication
features, this process is invisible.
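Collision resolution can be sketched as a comparison over three values.
The text above doesn't state the order in which they are compared, so this
example assumes version number first, then timestamp, then DSA GUID as the
final tie-breaker. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeWrite:
    value: str
    version: int       # incremented by each originating update of the attribute
    timestamp: float   # time of the originating write (hypothetical unit: seconds)
    dsa_guid: str      # GUID of the DSA where the write originated


def resolve(a: AttributeWrite, b: AttributeWrite) -> AttributeWrite:
    """Pick the winning write; the comparison order is an assumption."""
    return max(a, b, key=lambda w: (w.version, w.timestamp, w.dsa_guid))


first = AttributeWrite("hash-1", version=2, timestamp=1000.0, dsa_guid="dsa-aaaa")
second = AttributeWrite("hash-2", version=2, timestamp=1005.0, dsa_guid="dsa-bbbb")

# Same version number, so the later timestamp decides the winner.
winner = resolve(first, second)
print(winner.value)
```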
The Jury is Still Out
Although the final judgment on the effectiveness of AD’s intra-site replication
process, both in terms of latency and network bandwidth, remains to be
made, the process is effective from a design perspective. Because of AD’s
ability to dynamically configure its own intra-site replication needs
without human intervention, administrators can spend their time focusing
on other network tasks — a design that benefits us all.