Kerberos Authentication 101: Understanding the Essentials of the Kerberos Security Protocol
Knowing the basics of this pervasive protocol can be critical in troubleshooting and solving Windows security problems.
While Windows IT professionals deal with security on a daily basis, very few understand the protocol under the hood: Kerberos. Kerberos was introduced in Windows 2000 as the default authentication protocol, replacing the antiquated NTLM used in previous versions of Windows.
Kerberos has several important advantages. For example, it:
- is very secure, preventing various types of intrusion attacks
- uses "tickets" that can be securely presented by a client or a service on the client's behalf to a server for access to services
- permits Cross-Forest Trusts to use transitive properties and eliminate the "full mesh" scenario; all domains in both forests establish a trust with a single Kerberos trust at the root
- permits interoperability with other Kerberos realms such as Unix; this permits non-Windows clients to authenticate to Windows domains and gain access to resources
- provides authentication across the Internet for Web apps
Therefore, it's important to have a good understanding of how the Kerberos protocol works and be familiar with the details of the security functions. This will help with diagnosing a variety of security issues. In addition, IT professionals should understand how Windows Time Service works because Kerberos security is highly dependent on time services.
Kerberos, or Cerberus, is a three-headed dog in Greek mythology that guards the gates of the underworld, preventing inhabitants there from escaping. The Kerberos protocol, by contrast, prevents the bad guys from getting in. There are three components to Kerberos: the client, a service and a third party that both client and service trust. I love the statement made by Fulvio Ricciardi in his Kerberos Protocol Tutorial: Kerberos is "… an authentication protocol for trusted clients on untrusted networks." So, if Kerberos is designed to establish trust on an untrusted network, it should be even more effective on a trusted corporate network.
The Shared Secret
A key feature of Kerberos is the shared secret: a key derived from a password, so the password itself never travels on the network. Both the Authentication Service (on the server) and the client (workstation) can derive the same key from the password. The following scenario describes how this works:
- An account is created on the domain controller, or DC (the Kerberos Key Distribution Center or KDC) and given a password.
- The KDC adds a text string (the salt) to the unencrypted password and runs the result through the "string2key" conversion function. The "shared secret" is created and stored along with a key version number (kvno). In this simplified description, the salt string is the username.
- At the workstation, the user enters the account name and password and requests certain services. The Kerberos client generates the secret key on the client. Because Kerberos uses the same algorithm to generate this secret key as was used on the KDC, the two secret keys will match as long as the username and password entered are the same.
- The user and the Authentication Service (AS) running on the KDC communicate using the shared secret.
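The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: PBKDF2-HMAC-SHA1 stands in for the real string2key function (each Kerberos encryption type defines its own), the username alone is used as the salt as in the simplified scenario, and the account name and password are hypothetical.

```python
import hashlib


def string_to_key(password: str, salt: str, iterations: int = 4096) -> bytes:
    """Derive a fixed-length secret key from a password and a salt.

    PBKDF2-HMAC-SHA1 is a stand-in here, not the exact function any
    particular Kerberos encryption type uses.
    """
    return hashlib.pbkdf2_hmac(
        "sha1",
        password.encode("utf-8"),
        salt.encode("utf-8"),
        iterations,
        dklen=32,
    )


# Hypothetical account, for illustration only.
username = "alice"
password = "correct horse battery staple"

# KDC side: the key is derived when the account is created.
kdc_key = string_to_key(password, salt=username)

# Workstation side: the key is derived from what the user types at logon.
client_key = string_to_key(password, salt=username)
wrong_key = string_to_key("a mistyped password", salt=username)

# Same username and password give the same key; the password never
# has to cross the network for both sides to agree on it.
keys_match = client_key == kdc_key
wrong_matches = wrong_key == kdc_key
```

Because both sides run the same function on the same inputs, `keys_match` is true while `wrong_matches` is false, which is the whole basis of the shared secret.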
Authentication and Authorization
Using the shared secret method, a user can log in and get access to an application or service, as illustrated in Figure 1. The figure shows the Kerberos messages that are exchanged, such as "AS_REQ." The user logs into a workstation with an existing account. The client sends an AS_REQ to the KDC containing the user name, along with data encrypted with the user's shared secret. The KDC uses the shared secret associated with that user to decrypt that data. If successful, the request is honored and a "Ticket Granting Ticket" (TGT) is returned in the AS_REP message. The TGT can then be used by the client to prove the user is who she says she is and is properly authenticated. This ticket is good for a configurable time period.
Figure 1. How a user can log in and access an application using the shared secret method.
If the user wants access to a service or application on a server that requires a service ticket, the client presents the TGT it just obtained to the Ticket Granting Service (TGS) in a TGS_REQ. In a Windows domain, the TGS, like the AS, is hosted on each DC. The TGS looks up the relevant keys in its database, decrypts the TGS_REQ and grants the service ticket, which is returned in the TGS_REP. The service ticket is encrypted with a key shared only by the KDC and the target service, so the client cannot decrypt it, but it can send it on. The client then sends the service ticket to the application server in an AP_REQ.
This works like a pair of locked boxes, where the first holds the key to the second. The service opens the service ticket with the key it shares with the KDC and finds a Session Key inside. It then uses that Session Key to open the Authenticator that the client sent along with the ticket (see the next section). The user is thus validated. The application server would then apply the appropriate permissions to determine whether the action requested (such as read, write or change to a document) is granted to the user. If mutual authentication is required, the application server returns an AP_REP to prove to the client that it really is the service that was requested, as a security measure.
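The message flow can be modeled in Python to show who is able to open what. This is a toy sketch that follows the standard Kerberos v5 exchange, not real cryptography: the `Sealed` class is a hypothetical stand-in for encryption that simply refuses to release its contents unless the right key is presented, and the principal names and keys are made up.

```python
import os


class Sealed:
    """Toy stand-in for encryption: contents are released only for the right key."""

    def __init__(self, key: bytes, contents: dict):
        self._key = key
        self._contents = contents

    def open(self, key: bytes) -> dict:
        if key != self._key:
            raise PermissionError("wrong key: cannot decrypt")
        return dict(self._contents)


# Long-term keys in the KDC database (hypothetical principals).
user_key = os.urandom(16)     # derived from the user's password
tgs_key = os.urandom(16)      # known only to the KDC
service_key = os.urandom(16)  # shared by the KDC and the application server

# AS_REP: a TGT only the TGS can open, plus a session key only the user can open.
tgs_session = os.urandom(16)
tgt = Sealed(tgs_key, {"user": "alice", "session_key": tgs_session})
as_rep = {"tgt": tgt, "enc_part": Sealed(user_key, {"session_key": tgs_session})}
client_tgs_session = as_rep["enc_part"].open(user_key)["session_key"]

# TGS_REQ: the client presents the TGT plus an authenticator.
tgs_req = {
    "tgt": as_rep["tgt"],
    "authenticator": Sealed(client_tgs_session, {"user": "alice"}),
}

# TGS: open the TGT, check the authenticator, issue the service ticket.
tgt_contents = tgs_req["tgt"].open(tgs_key)
requester = tgs_req["authenticator"].open(tgt_contents["session_key"])["user"]
svc_session = os.urandom(16)
service_ticket = Sealed(service_key, {"user": requester, "session_key": svc_session})
tgs_rep = {
    "ticket": service_ticket,
    "enc_part": Sealed(tgt_contents["session_key"], {"session_key": svc_session}),
}

# The client can read its own part of the TGS_REP but not the service ticket.
client_svc_session = tgs_rep["enc_part"].open(client_tgs_session)["session_key"]
try:
    tgs_rep["ticket"].open(user_key)
    client_could_open_ticket = True
except PermissionError:
    client_could_open_ticket = False

# AP_REQ: the client forwards the ticket with a fresh authenticator.
ap_req = {
    "ticket": tgs_rep["ticket"],
    "authenticator": Sealed(client_svc_session, {"user": "alice"}),
}

# Application server: first box (ticket) yields the key to the second (authenticator).
ticket = ap_req["ticket"].open(service_key)
claimed = ap_req["authenticator"].open(ticket["session_key"])["user"]
authenticated_user = claimed if claimed == ticket["user"] else None
```

At the end of the run the application server has established that the request came from `alice` without ever seeing her password or contacting the KDC, and the client never managed to read the ticket it carried.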
The Replay Attack
A replay attack occurs when an intruder steals the packet and presents it to the service as if the intruder were the user. The user's credentials are there -- everything needed to access a resource. This is mitigated by the features of the "Authenticator," which is illustrated in Figure 2. The Authenticator accompanies requests such as the TGS_REQ and the AP_REQ and carries additional data, such as an encrypted IP list, the client's timestamp and the ticket lifetime. When a packet arrives, the timestamp is checked. If the timestamp is earlier than or the same as that of a previous Authenticator, the packet is rejected because it's a replay. In addition, the timestamp in the Authenticator is compared to the server time. It must be within five minutes (by default in Windows).
Figure 2. The Authenticator mitigates the possibility of a replay attack.
If the time skew is greater than five minutes the packet is rejected. This limits the number of possible replay attacks. While it is technically possible to steal the packet and present it to the server before the valid packet gets there, it is very difficult to do.
It's fairly well known that all computers in a Windows domain must have system times within five minutes of each other. This is due to the Kerberos requirement.
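The two checks described in this section can be sketched as follows. This is a minimal illustration, assuming the server simply remembers the latest timestamp it has accepted from each client; the class name, method name and client names are hypothetical, and production implementations keep a fuller replay cache.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # Windows default mentioned above


class AuthenticatorChecker:
    """Rejects authenticators that are replayed or too far from server time."""

    def __init__(self):
        self._latest = {}  # client name -> latest accepted timestamp

    def check(self, client: str, timestamp: datetime, server_now: datetime) -> str:
        # Check 1: the client timestamp must be close to the server clock.
        if abs(server_now - timestamp) > MAX_SKEW:
            return "rejected: time skew"
        # Check 2: a timestamp earlier than or equal to a previous one is a replay.
        last = self._latest.get(client)
        if last is not None and timestamp <= last:
            return "rejected: replay"
        self._latest[client] = timestamp
        return "accepted"


checker = AuthenticatorChecker()
sent_at = datetime(2024, 1, 15, 13, 0, 0, tzinfo=timezone.utc)  # arbitrary instant

# The genuine packet arrives two seconds after it was created.
first = checker.check("alice", sent_at, sent_at + timedelta(seconds=2))

# An intruder presents the same packet thirty seconds later.
replayed = checker.check("alice", sent_at, sent_at + timedelta(seconds=30))

# A packet from a machine whose clock is ten minutes behind the server.
skewed = checker.check("bob", sent_at - timedelta(minutes=10), sent_at)
```

The skew limit is what keeps the remembered state small: anything older than five minutes is rejected on the first check, so the server never needs to remember timestamps for longer than that.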
Pre-Authentication
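In previous versions of Kerberos (v4 and older), the client did not have to prove it knew the password before the KDC replied. A simple request with a valid user name would return a ticket encrypted with that user's key, which gave an attacker material for guessing the password offline. In Kerberos v5, the client must first prove it knows the password, typically by encrypting a timestamp with the key derived from it. This is called Pre-Authentication. It's possible to disable Pre-Authentication in order to provide backward compatibility for old Kerberos v4 libraries, Unix apps and so on.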
Warning: Disabling Pre-Authentication is a serious degradation of security.
One of the components of the Authenticator is the ticket lifetime, also configurable in Group Policy. This permits the user to access server resources without re-authenticating for 10 hours by default, and is renewable without intervention by the user.
Time Services
As noted, the Windows Time Service is critical to proper functioning of the Kerberos security model. To keep system clocks on all computers in the domain within five minutes, Windows has used the Network Time Protocol (NTP) since Windows Server 2003, rather than the old Simple Network Time Protocol (SNTP) used previously. NTP uses a "reference clock" on each computer. The reference clock is set to UTC (think GMT) and doesn't change from computer to computer, no matter what time zone the computer is in. This is often confusing to administrators, as it seems that a computer in Belgium would not be within the five-minute time skew of a computer in Atlanta, six time zones away.
It's important to separate the computer's reference clock from what you see in the Date and Time display in the notification area of the taskbar. The Date and Time display is just a convenient way for users to see the local time, and the time zone it applies plays no part in time synchronization. Note, however, that changing the time (rather than the time zone) in the Date and Time display does change the reference clock by the delta that you choose.
For instance, as shown in Figure 3, if the UTC time is 13:00 and I'm in Atlanta (GMT -5), then the Date and Time display shows the time as 08:00. If I change the Date and Time display to 09:00 (Figure 4), the reference clock is set ahead one hour to 14:00 while the UTC time on all other machines is 13:00. This causes the time skew. It's also why you can fix a computer that has a large time skew by changing the time with the Date and Time feature.
Figure 3. UTC time is 13:00, but Date and Time shows the local time in Atlanta.
Warning: Before changing the time, make sure you are indeed one hour out of sync with the actual time or it will cause authentication failures. You can change the time for certain troubleshooting techniques, but be careful that everything is correct when you finish.
Figure 4. Changing the reference clock on one machine can cause time skew.
Note that you can change the time zone and it will not affect the reference clock time. In my example, if I change my time zone to the U.S. Pacific Time zone, the display will show the time as 05:00, but the reference clock will remain unchanged.
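The difference between changing the time zone and changing the time can be checked with Python's `datetime` module. This is a small sketch using the numbers from the example; the fixed UTC offsets and the calendar date are assumptions chosen for illustration (real zones also observe daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used for illustration only.
ATLANTA = timezone(timedelta(hours=-5), "UTC-5")
PACIFIC = timezone(timedelta(hours=-8), "UTC-8")
MAX_SKEW = timedelta(minutes=5)

# The reference clock is kept in UTC (the date itself is arbitrary).
reference_clock = datetime(2024, 1, 15, 13, 0, tzinfo=timezone.utc)

# The display is just the reference clock rendered in the local zone.
atlanta_display = reference_clock.astimezone(ATLANTA).strftime("%H:%M")
pacific_display = reference_clock.astimezone(PACIFIC).strftime("%H:%M")

# Changing the time zone: the display changes, the instant does not.
zone_change_skew = abs(reference_clock.astimezone(PACIFIC) - reference_clock)

# Changing the displayed time in Atlanta from 08:00 to 09:00 moves the
# reference clock itself.
changed_clock = (
    reference_clock.astimezone(ATLANTA).replace(hour=9).astimezone(timezone.utc)
)
time_change_skew = abs(changed_clock - reference_clock)

zone_change_breaks_kerberos = zone_change_skew > MAX_SKEW
time_change_breaks_kerberos = time_change_skew > MAX_SKEW
```

Here the zone change leaves a skew of zero, while the one-hour change to the displayed time puts the reference clock at 14:00 UTC, well outside the five-minute limit.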
This is demonstrated by a situation I found in our lab some time ago. I had a DC in Brussels that had been installed with the incorrect time zone. Rather than showing the Belgium time zone (UTC + 1:00), it showed Pacific Time (U.S. and Canada). It had actually been like this for a couple of years before we noticed it. The local admin had not noticed the displayed time was off from the actual local time. Yet there were no replication failures, no W32Time errors, and no authentication failures. So we changed the time zone and the display changed, but there was no effect on the reference clock. If the local admin had noticed that the displayed time was nine hours slow and changed the time rather than the time zone, then that DC would have a nine-hour time skew and authentication failures would have resulted.
In an Active Directory domain, time services are pre-configured out of the box. Figure 5 shows the hierarchical time service structure. The PDC of the forest root domain is the authoritative time server for the forest and the root domain. The PDC of each child domain uses the forest root domain PDC as its authoritative time source. DCs in each domain use their domain's PDC as their time source, and clients use their authenticating DC as their time source. Note that while configuring the root PDC to sync with an external time source is a good idea, it's not required.
Figure 5. The hierarchical time service structure.
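One way to picture the hierarchy is as a chain of time sources that always ends at the forest root PDC. The sketch below is only a model of that idea: the machine names are hypothetical, and the mapping is written out by hand rather than read from a real domain.

```python
# Each machine maps to the machine it takes its time from.
# None marks the top of the hierarchy (the forest root PDC).
TIME_SOURCE = {
    "root-pdc": None,             # forest root PDC: authoritative for the forest
    "root-dc2": "root-pdc",       # DC in the root domain
    "child-pdc": "root-pdc",      # PDC of a child domain
    "child-dc1": "child-pdc",     # DC in the child domain
    "workstation7": "child-dc1",  # client syncing with its authenticating DC
}


def sync_chain(machine: str) -> list:
    """Return the path of time sources from a machine up to the forest root PDC."""
    chain = [machine]
    while TIME_SOURCE[chain[-1]] is not None:
        chain.append(TIME_SOURCE[chain[-1]])
    return chain


workstation_chain = sync_chain("workstation7")
root_dc_chain = sync_chain("root-dc2")
```

Walking the chain this way is a useful habit when troubleshooting: a wrong time at any link is inherited by everything below it.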
Troubleshooting Tip: External time servers can harm the domain and forest if they experience errors. In one case, I saw an external time server set the time on the PDC back by a year, logging event 52 in the system event log and causing widespread authentication failure. To prevent this, see Microsoft KB 884776 for a registry value that rejects time changes larger than a pre-defined limit. In this case, I set it at 15 minutes.
Troubleshooting Windows Time Issues
Windows 2000, 2003 and 2008 all contain W32tm.exe, a utility for diagnosing and fixing time-sync issues. However, the Windows 2000 version has different options, which will not be described here. Time sync errors manifest in a number of ways:
- System event log: look for W32time errors. These will be fairly descriptive, so read them. Note that over an unreliable network connection you might see events stating that the time server couldn't be found. Just keep reading to see if it eventually found one.
- For DCs, Repadmin /showrepl or Repadmin /replsum /bysrc /bydest /sort:delta will show time-sync errors.
- Users logging in or accessing network resources will get authentication failures. Logins will sometimes display an error saying the time is out of sync with the DC.
There are several key options in the W32tm.exe utility to resolve time errors:
- W32tm /resync. This forces a clock resync on the local computer. I always try this one first if there are events stating that the sync to the server is lost.
- W32tm /config /syncfromflags:DOMHIER. This forces the DCs to get time in the normal domain hierarchy scheme -- in effect, resetting them all to a default configuration.
- W32tm /monitor /domain:WTEC. This lists the time skew for each DC in the "WTEC" domain, with the PDC being the reference.
- W32tm /stripchart /computer:<target>. This allows comparison of any two computers (as opposed to /monitor, which only does DCs).
The W32tm /monitor command is very handy to see if all DCs are within acceptable time skews of the PDC. To correct the time sync, you could go directly to the DC and set the time, or try the /syncfromflags option of W32tm.exe. In addition, NTP has some self-healing power. It will look at the time difference, divide it by two and reset it. Over time NTP can correct some small time skews.
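The "divide it by two" behavior is easy to picture with a few lines of Python. This follows only the simplified description given here, not the actual NTP clock-discipline algorithm, and the function name, starting offset and tolerance are hypothetical values chosen for illustration.

```python
def halve_until_within(offset_seconds: float, tolerance_seconds: float = 1.0) -> list:
    """Repeatedly halve a clock offset until it is within the tolerance.

    Returns the offset after each correction, starting with the original.
    """
    offsets = [offset_seconds]
    while abs(offsets[-1]) > tolerance_seconds:
        offsets.append(offsets[-1] / 2)
    return offsets


# A clock that starts 64 seconds off converges in a handful of corrections.
offsets = halve_until_within(64.0)
corrections_needed = len(offsets) - 1
```

Each correction removes half of what remains, so small skews shrink quickly, while a skew of hours would take many sync intervals to close and is better fixed by hand.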
Note that if the W32time service is disabled, logins will fail. Make sure the time service is started and its startup type is set to Automatic.
A good understanding of Kerberos and the Windows Time Service is critical to be able to diagnose authentication issues. While this article did not have the space to do an exhaustive description, it did provide the basics. For further study there are some excellent references that I recommend in "Get More Info."