Oops, I’ve just hosed my domain controller


Here are some links I found useful when adding a backup domain controller to our network and then messing up all authentication by ruining the replication process.

In general there’s a lot of good information on Daniel Petri’s site, in particular:
http://www.petri.co.il/how_to_install_active_directory_on_windows_2003.htm

http://www.petri.co.il/transferring_fsmo_roles.htm

However, after I successfully added an additional domain controller, it would not authenticate.  It wouldn’t authenticate because the SYSVOL contents would not replicate.  It took ages to realize the problem was a File Replication Service (ntfrs) failure, even though the information was in the event log all along.  So the first big lesson is: look in the event log!

It wouldn’t replicate because one of the existing replica sets on the original domain controller was corrupt.  There used to be a second domain in the AD forest; it had been broken and forgotten, and replications to it had been failing for so long that the replicas became corrupt.  So I followed the advice in the event log to set:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters]
"Enable Journal Wrap Automatic Restore"=dword:00000001

to clear the existing replicas, which with hindsight *is* the right thing to do.  But by this stage I’d raced ahead, made the DNS server on the new domain controller the master DNS and transferred all the FSMO roles to the new domain controller.  Big mistake, because when the File Replication Service (ntfrs) restarted, it looked to the *new* domain controller for current domain information.  Of course it wasn’t there: the original failure occurred while trying to update the new domain controller!  So ntfrs kindly(!) removed the existing NETLOGON and SYSVOL shares on the original domain controller, and then there was no login information.

Fortunately I came across a posting that referred to a Microsoft knowledgebase article and suggested the original replication problem was caused by my racing ahead.  So I used ntdsutil (the Windows 200x resource kit is your friend) to transfer the FSMO roles back to the original domain controller.
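For anyone who would rather script the role transfer than type it into the interactive prompt, here is a minimal sketch in Python.  It only builds the ntdsutil command line; it does not run anything.  The server name "dc1" is a placeholder, the helper name is made up for illustration, and the role names are the ones the ntdsutil "roles" menu uses on Windows 2000/2003 as far as I know, so check them against your own version first.

```python
import subprocess

# FSMO role names as the ntdsutil "roles" menu spells them on
# Windows 2000/2003 (assumption: later versions may differ slightly).
FSMO_ROLES = (
    "schema master",
    "domain naming master",
    "RID master",
    "PDC",
    "infrastructure master",
)


def ntdsutil_transfer_args(target_dc, roles=FSMO_ROLES):
    """Build an argv list asking ntdsutil to transfer roles to target_dc.

    Hypothetical helper: each list element is one ntdsutil menu command,
    in the same order you would type them at the interactive prompt.
    """
    args = ["ntdsutil", "roles", "connections",
            f"connect to server {target_dc}", "quit"]
    args += [f"transfer {role}" for role in roles]
    args += ["quit", "quit"]  # leave the roles menu, then ntdsutil itself
    return args


# Show the command line that would be run ("dc1" is a placeholder name).
print(subprocess.list2cmdline(ntdsutil_transfer_args("dc1")))
```

ntdsutil still asks for confirmation on each transfer, so treat this as a convenience for getting the typing right rather than an unattended script.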

This knowledgebase article did save the day by showing me how to reset the original replica set by hand.

http://support.microsoft.com/kb/315457

There’s *a lot* of redundant information in this article (stuff about moving files to a temporary location) but the key information is setting the “burflags”.  Good advice.  After following the instructions (well, but not dogmatically) and restarting ntfrs, the SYSVOL and NETLOGON shares reappeared.  Then, when I started the ntfrs service on the new domain controller, the replication succeeded (woohoo!) and the SYSVOL and NETLOGON shares appeared on that server too.
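The BurFlags step boils down to three commands: stop ntfrs, set one registry value, start ntfrs.  Here is a rough Python sketch that just assembles them.  The key path and the D4/D2 values are as Microsoft documents them for FRS (D4 marks this server’s copy as authoritative, D2 makes it pull a fresh copy from a partner); which value goes on which server is for the knowledgebase article to decide, not this sketch.  The helper name is invented for illustration, and using reg.exe instead of regedit is my own assumption.

```python
# Registry location of BurFlags, as documented by Microsoft for FRS.
BURFLAGS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
                r"\Parameters\Backup/Restore\Process at Startup")

# D4 = authoritative restore (this DC holds the good copy),
# D2 = non-authoritative restore (this DC re-syncs from a partner).
BURFLAGS = {"authoritative": 0xD4, "non-authoritative": 0xD2}


def burflags_commands(mode):
    """Return the commands to reset a replica set on one DC.

    Hypothetical helper: it only builds argv lists, nothing is executed.
    The DWORD is passed to reg.exe in decimal (0xD4 == 212, 0xD2 == 210).
    """
    value = BURFLAGS[mode]
    return [
        ["net", "stop", "ntfrs"],
        ["reg", "add", BURFLAGS_KEY, "/v", "BurFlags",
         "/t", "REG_DWORD", "/d", str(value), "/f"],
        ["net", "start", "ntfrs"],
    ]


for argv in burflags_commands("authoritative"):
    print(argv)
```

ntfrs reads the value once at startup, so the restart is what actually triggers the rebuild.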

If you are running Exchange, don’t forget that you may need to change where it looks for the GC, PDC and configuration.  Open the Exchange System Manager console, open the Servers folder, right-click on the server and select properties and finally select the “Directory Access” page.  Now you can include the new domain controller in the list of AD servers Exchange will use to find out about its configuration.

So what have I learned?

Look at the event log!

Disable the firewall on the DC or, at least, make sure File and Printer Sharing is enabled!

There is good information in the ether, but it’s not in one place and is drowned out by poorly informed individuals propagating some piece of “folklore” they’ve heard.  An example of the folklore is that you should wait “a few hours” or “a day” for domain changes to be replicated across domain controllers, as if it’s some heuristic process.  The reality is that if you have to look after hundreds of domain controllers the updates are going to seem mystical, but only if you wait for the ntfrs processes to kick in at their own leisurely pace.  However, you can force replication by issuing repadmin commands, by using the AD Sites and Services snap-in, or just by restarting the ntfrs service on a domain controller.

For the rest of us with just a few domain controllers it is entirely deterministic.  You can force replication and look in the event log to see when a task has completed.
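As a sketch of what “forcing replication” can look like from the command line, the Python below lists the commands and, by default, only prints them.  The server name "dc2" is a placeholder and the function names are hypothetical.  Bear in mind that repadmin drives Active Directory replication, while the SYSVOL files themselves are moved by ntfrs, which is why the service restart is in the list too.  On Windows 2003 repadmin comes with the Support Tools, so this assumes they are installed.

```python
import subprocess


def force_replication_commands(dc):
    """Commands to push replication along for one domain controller.

    /A = all directory partitions, /e = across sites, /d = show servers
    by distinguished name.  The net commands act on the local machine.
    """
    return [
        ["repadmin", "/syncall", dc, "/A", "/e", "/d"],
        ["repadmin", "/showrepl", dc],  # check the result afterwards
        ["net", "stop", "ntfrs"],
        ["net", "start", "ntfrs"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(subprocess.list2cmdline(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run: prints the commands without touching the domain.
run_all(force_replication_commands("dc2"))
```

After running it for real, the event log is still the place to confirm that each step completed.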
