A database availability group (DAG) is the base component of the Mailbox server high availability and site resilience framework built into Microsoft Exchange Server. A DAG is a group of up to 16 Mailbox servers that hosts a set of databases and provides automatic database-level recovery from failures that affect individual servers or databases.

A DAG is a boundary for mailbox database replication, database and server switchovers and failovers, and an internal component called Active Manager. Active Manager, which runs on every Mailbox server, manages switchovers and failovers within DAGs.

Each member of a DAG must run the same operating system and the same version of Exchange Server. Servers running different versions of Exchange cannot be members of the same DAG.

Database availability group quorum models

  • An Exchange DAG is built on Windows failover clustering.
  • Failover clusters use the concept of quorum, which uses a consensus of voters to ensure that only one subset of the cluster members (which could mean all members or a majority of members) is functioning at one time. If the cluster loses quorum, all DAG operations terminate and all mounted databases hosted in the DAG dismount. In this event, administrator intervention is required to correct the quorum problem and restore DAG operations. Quorum is important to ensure consistency, to act as a tie-breaker to avoid partitioning, and to ensure cluster responsiveness.
  • There is no need to install or configure the Windows failover cluster manually; it is installed and configured when you configure the Exchange database availability group. Management is therefore done at the Exchange level, and Failover Cluster Manager is used only when it is really required. For an IP-less DAG, the failover cluster is created without a cluster administrative access point, so it has no cluster name object or IP address.

Node and File Share Majority: DAGs with an even number of members use the failover cluster’s Node and File Share Majority quorum mode. In this mode, an external witness server acts as a tie-breaker: when a DAG member fails, the witness server is used to provide one DAG member with a weighted vote.

Node Majority: DAGs with an odd number of members use the failover cluster’s Node Majority quorum mode. In this mode, each member gets a vote, and each member’s local system disk is used to store the cluster quorum data. If the configuration of the DAG changes, that change is reflected across the different disks. The change is only considered to have been committed and made persistent if that change is made to the disks on half the members (rounding down) plus one.

Note: A file share witness is configured when the database availability group is created, whether or not it is used for voting. The DAG adjusts the quorum model automatically as you add or remove members.
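
The odd/even rule above can be sketched as simple arithmetic. The function name Get-DagQuorumSummary is hypothetical and only illustrates the counting; it does not query a cluster, and it treats the witness as one extra voter, which is a simplification.

```powershell
# Illustrative only: models how many voters exist and how many votes are
# needed for a majority, based on the odd/even rule described above.
function Get-DagQuorumSummary {
    param(
        [ValidateRange(1, 16)]
        [int]$MemberCount
    )

    # Even member counts rely on the file share witness as a tie-breaker.
    $usesWitness = ($MemberCount % 2) -eq 0

    $voters = $MemberCount
    $mode   = 'Node Majority'
    if ($usesWitness) {
        $voters = $MemberCount + 1
        $mode   = 'Node and File Share Majority'
    }

    # Majority = half the voters (rounded down) plus one.
    $majority = [int][math]::Floor([double]$voters / 2) + 1

    [pscustomobject]@{
        Members       = $MemberCount
        QuorumMode    = $mode
        Voters        = $voters
        VotesRequired = $majority
    }
}

# Compare a 2-, 3- and 4-member DAG.
2..4 | ForEach-Object { Get-DagQuorumSummary -MemberCount $_ } | Format-Table -AutoSize
```

For a 3-member DAG this model reports Node Majority with 2 of 3 votes required.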

DAG Network:

A DAG network is a collection of one or more subnets used for either replication traffic or MAPI traffic. Each DAG contains a maximum of one MAPI network and zero or more replication networks.

DAG network configuration can be automatic or manual; automatic is the default. In automatic mode, the DAG sets up the networks itself, depending on which NIC is set to register in DNS.

When the DAG is set to automatic configuration, you cannot edit or view the properties of the networks. That is possible only after setting the DAG to manual configuration.
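
As a minimal sketch, the current mode can be checked from the Exchange Management Shell. "DAG" is the DAG name used later in this post; substitute your own.

```powershell
# Shows whether the DAG networks are managed automatically ($false)
# or manually ($true).
Get-DatabaseAvailabilityGroup -Identity "DAG" |
    Format-List Name, ManualDagNetworkConfiguration
```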

High Availability Terminology

  • High Availability – Solution must provide data availability, service availability, and automatic recovery from failures
  • Disaster Recovery – Process used to manually recover from a failure
  • Site Resilience – Disaster recovery solution used for recovery from site failure
  • *over – Short for switchover/failover; a switchover is a manual activation of one or more databases; a failover is an automatic activation of one or more databases after a failure

High Availability Feature Names

  • Mailbox Resiliency – Name of Unified High Availability and Site Resilience Solution
  • Database Mobility – The ability of a single mailbox database to be replicated to and mounted on other mailbox servers
  • Incremental Deployment – The ability to deploy high availability/site resilience after Exchange is installed
  • Exchange Third Party Replication API – An Exchange-provided API that enables use of third-party replication for a DAG in lieu of continuous replication
  • Database Availability Group – A group of up to 16 Mailbox servers that host a set of replicated databases
  • Mailbox Database Copy – A mailbox database (.edb file and logs) that is either active or passive

Understanding the *overs

  • Within a datacenter
    • Database or server *overs
  • Datacenter level: switchover
  • Between datacenters
    • Database or server *overs
  • Assumptions:
    • Each datacenter is a separate Active Directory site
    • Each datacenter has live, active messaging services
    • Standby datacenter must be active to support single database *over
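
A manual switchover can be sketched with Move-ActiveMailboxDatabase. The database name DB01 is a placeholder, and the server names are the ones from the lab described later in this post; treat this as an illustration, not a runbook.

```powershell
# Database switchover: activate the copy of a single database on another member.
Move-ActiveMailboxDatabase -Identity "DB01" -ActivateOnServer "EXCH02"

# Server switchover: move every active database currently on EXCH01 to EXCH02.
Move-ActiveMailboxDatabase -Server "EXCH01" -ActivateOnServer "EXCH02"
```

Using EXCH03 as the target instead would be a switchover between datacenters in that lab, since EXCH03 sits in the DR site.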

Active Manager

  • Exchange component that manages *overs
    • Runs on every server in the DAG
    • Selects best available copy on failovers
    • Is the definitive source of information on where a database is active
      • Stores this information in cluster database
      • Provides this information to other Exchange components
    • Two Active Manager roles: PAM and SAM
  • Primary Active Manager (PAM)
    • Runs on the node that owns the cluster group
    • Gets topology change notifications
    • Reacts to server failures
    • Selects the best database copy on *overs
  • Standby Active Manager (SAM)
    • Runs on every other node in the DAG
    • Responds to queries about which server hosts the active copy of the mailbox database
  • Both roles are necessary for automatic recovery
    • Active Manager runs inside the Microsoft Exchange Replication service, so if that service is stopped, automatic recovery will not happen
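
The points above can be checked from the Exchange Management Shell roughly as follows. This is a sketch that assumes a DAG named "DAG" and is run on a DAG member.

```powershell
# Which member currently holds the PAM role?
Get-DatabaseAvailabilityGroup -Identity "DAG" -Status |
    Format-List Name, PrimaryActiveManager

# Where is each database currently mounted (active)?
Get-MailboxDatabase -Status |
    Format-Table Name, Mounted, MountedOnServer -AutoSize

# State of the Microsoft Exchange Replication service on the local server.
Get-Service -Name MSExchangeRepl
```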

Active Manager Selection of Active Database Copy

In earlier versions of Exchange, the best copy selection (BCS) process evaluated several aspects of each database copy to determine the best copy to activate. These included:

  • Copy queue length
  • Replay queue length
  • Database status
  • Content index status

In newer versions (Exchange 2013 and later), Active Manager performs the following additional checks, in the order listed:

  • All Healthy – Checks for a server hosting a copy of the affected database that has all health sets in a healthy state
  • Up to Normal Healthy – Checks for a server hosting a copy of the affected database that has all health sets Medium and above in a healthy state
  • All Better than Source – Checks for a server hosting a copy of the affected database that has health sets in a state that is better than the current server hosting the affected copy
  • Same as Source – Checks for a server hosting a copy of the affected database that has health sets in a state that is the same as the current server hosting the affected copy
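
The inputs to these checks can be inspected by hand. The sketch below assumes a database named DB01 (a placeholder) and uses EXCH02 as the candidate server; it only displays the data and does not reproduce Active Manager's selection logic.

```powershell
# Queue lengths, copy status and content index state for every copy of DB01.
Get-MailboxDatabaseCopyStatus -Identity "DB01" |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize

# Health sets on the candidate server that are not currently healthy.
Get-HealthReport -Identity "EXCH02" |
    Where-Object { $_.AlertValue -ne 'Healthy' } |
    Format-Table HealthSet, AlertValue, LastTransitionTime -AutoSize
```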

DAC

  • DAC mode is designed to prevent split brain by adding a protocol called Datacenter Activation Coordination Protocol (DACP). After a catastrophic failure, when the DAG recovers, it won’t automatically mount databases even though the DAG has quorum. Instead, DACP is used to determine the current state of the DAG and whether Active Manager should attempt to mount the databases.
  • With DACP, Active Manager stores a bit in memory (either a 0 or a 1) that tells the DAG member whether it’s allowed to mount local databases that are assigned as active on the server. DAC mode is off by default and has to be enabled on the DAG. When a DAG is running in DAC mode, each time Active Manager starts up the bit is set to 0, meaning it isn’t allowed to mount databases. The bit changes to 1 only after Active Manager has contacted another DAG member whose bit is already 1, or all of the DAG members that are expected to be running.
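
A minimal sketch for checking whether DAC mode is enabled and which members the DAG considers started or stopped, assuming a DAG named "DAG":

```powershell
# DatacenterActivationMode is Off by default and DagOnly when DAC mode is enabled.
Get-DatabaseAvailabilityGroup -Identity "DAG" |
    Format-List Name, DatacenterActivationMode, StartedMailboxServers, StoppedMailboxServers
```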

Monitoring High Availability and Site Resilience

  • You can use the Get-MailboxDatabaseCopyStatus cmdlet to view status information about mailbox database copies.
  • You can use the Test-ReplicationHealth cmdlet to view continuous replication status information about mailbox database copies.
  • Crimson channel event logs for high availability are available in Event Viewer under Applications and Services Logs > Microsoft > Exchange.
  • CollectOverMetrics.ps1 reads DAG member event logs to gather information about database operations (such as database mounts, moves, and failovers) over a specific time period.
  • CollectReplicationMetrics.ps1 collects data from performance counters related to database replication.
  • The CheckDatabaseRedundancy.ps1 script monitors the redundancy of replicated mailbox databases by validating that there are at least two configured, healthy, and current copies, and alerts you when only a single healthy copy of a replicated database exists. Both active and passive copies are counted when determining redundancy.
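
The tools above might be used along these lines. This is a sketch: EXCH01 and "DAG" are the names from the lab later in this post, the seven-day window is arbitrary, and the last command assumes it is run from the Exchange Management Shell, where $exscripts points to the Exchange Scripts folder.

```powershell
# Status and queue lengths for all database copies on one server.
Get-MailboxDatabaseCopyStatus -Server "EXCH01" |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, LastInspectedLogTime -AutoSize

# Replication health checks for one DAG member.
Test-ReplicationHealth -Identity "EXCH01" |
    Format-Table Server, Check, Result, Error -AutoSize -Wrap

# Most recent events from the high availability crimson channel.
Get-WinEvent -LogName "Microsoft-Exchange-HighAvailability/Operational" -MaxEvents 20 |
    Format-Table TimeCreated, Id, LevelDisplayName, Message -Wrap

# *over report for the last seven days.
& "$exscripts\CollectOverMetrics.ps1" -DatabaseAvailabilityGroup "DAG" `
    -StartTime (Get-Date).AddDays(-7) -EndTime (Get-Date) -GenerateHtmlReport
```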

Creating a 3 node IP-Less DAG in 2 sites

Now we will go through the steps to create a 3-node IP-less DAG in Exchange 2016 that spans 2 Active Directory sites. The details of the environment are below.

Active Directory Sites

  • Prod
    • PDC – 10.0.0.2
    • ADC – 10.0.0.3
  • DR
    • ADC – 10.10.10.2

Exchange Servers:

  • EXCH01 – 10.0.0.10
  • EXCH02 – 10.0.0.11
  • EXCH03 – 10.10.10.10

File Share Witness:

  • MGMT – 10.0.0.5

Before creating the DAG, add the “Exchange Trusted Subsystem” group to the local “Administrators” group on the witness server (MGMT). Then create a folder on MGMT and share it.
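
These preparation steps could be scripted on MGMT as sketched below. CONTOSO, the folder path C:\DAGFSW and the share name DAGFSW are placeholders that do not come from this lab; adjust them to your environment.

```powershell
# Run in an elevated PowerShell session on the witness server (MGMT).
# CONTOSO is a placeholder for the domain's NetBIOS name.
Add-LocalGroupMember -Group "Administrators" -Member "CONTOSO\Exchange Trusted Subsystem"

# Create and share the witness folder (placeholder path and share name).
New-Item -Path "C:\DAGFSW" -ItemType Directory
New-SmbShare -Name "DAGFSW" -Path "C:\DAGFSW" -FullAccess "CONTOSO\Exchange Trusted Subsystem"
```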

Now log in to the Exchange Admin Center and go to Servers –> Database availability groups –> click + Add.

Enter the name for the DAG, the witness server, and the witness directory.
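
The same step can be done from the Exchange Management Shell. In this sketch, the witness directory C:\DAGFSW is a placeholder; passing [System.Net.IPAddress]::None for the IP addresses is what makes the DAG IP-less.

```powershell
# Create an IP-less DAG named "DAG" with MGMT as the witness server.
New-DatabaseAvailabilityGroup -Name "DAG" `
    -WitnessServer "MGMT" `
    -WitnessDirectory "C:\DAGFSW" `
    -DatabaseAvailabilityGroupIpAddresses ([System.Net.IPAddress]::None)
```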

DAG has been created successfully. Next step is to add the DAG members. Select the DAG and click on “Manage DAG Membership”

Add the nodes and click on Save
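
A shell equivalent for adding the three members, assuming the DAG is named "DAG":

```powershell
# Add each Mailbox server to the DAG, one at a time.
foreach ($server in "EXCH01", "EXCH02", "EXCH03") {
    Add-DatabaseAvailabilityGroupServer -Identity "DAG" -MailboxServer $server
}
```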

All 3 nodes will be listed as operational servers in the DAG status.
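
The membership and live status can also be confirmed from the shell. This is a sketch; the -Status switch is needed to populate real-time values such as OperationalServers.

```powershell
# Servers = configured members; OperationalServers = members currently up in the cluster.
Get-DatabaseAvailabilityGroup -Identity "DAG" -Status |
    Format-List Name, Servers, OperationalServers, WitnessServer, WitnessShareInUse
```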

It’s recommended to turn on DAC mode when the DAG is split across 2 sites/datacentres. This is done with the Set-DatabaseAvailabilityGroup cmdlet by setting the DatacenterActivationMode parameter to DagOnly.
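
A minimal sketch of enabling DAC mode, assuming the DAG is named "DAG":

```powershell
# Enable DAC mode on the DAG.
Set-DatabaseAvailabilityGroup -Identity "DAG" -DatacenterActivationMode DagOnly

# Confirm the setting.
Get-DatabaseAvailabilityGroup -Identity "DAG" |
    Format-List Name, DatacenterActivationMode
```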

DAG networks are configured automatically using the NICs available in the server. If they need to be adjusted, manual DAG network configuration has to be enabled in the DAG properties:

Set-DatabaseAvailabilityGroup "DAG" -ManualDagNetworkConfiguration:$true

Once done, use the command Set-DatabaseAvailabilityGroupNetwork to configure the DAG networks for MAPI and replication.
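
As an illustration, the networks can be listed first and then adjusted. The network name MapiDagNetwork is an assumption about what automatic configuration created; use the names returned by the first command.

```powershell
# List the DAG networks and their current roles.
Get-DatabaseAvailabilityGroupNetwork -Identity "DAG" |
    Format-Table Name, Subnets, MapiAccessEnabled, ReplicationEnabled -AutoSize

# Example: stop using the MAPI network for replication traffic.
# "MapiDagNetwork" is an assumed name; replace it with the real one.
Set-DatabaseAvailabilityGroupNetwork -Identity "DAG\MapiDagNetwork" -ReplicationEnabled:$false
```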

Happy learning!! 🙂

By Ashok M

A technology enthusiast with 9+ years of experience in Planning, Designing, Implementation, Migration and Operations of various Microsoft Infrastructure & Cloud Services. Extensive knowledge of Cloud Computing, Microsoft Messaging & Collaboration, Digital Transformation, IT Services & Emerging technologies. • One of the Authors of the book – “Reimagine Remote Working with Microsoft Teams : A practical guide to increasing your productivity and enhancing collaboration in the remote world” - https://www.amazon.com/Reimagine-Remote-Working-Microsoft-Teams/dp/1801814163 • Blogger at CloudExchangers - https://cloudexchangers.com/ • Microsoft Community Contributor in Microsoft Q&A - https://docs.microsoft.com/en-us/users/ashokm-8240 • Microsoft Certified Professional in MS Azure, Microsoft365, MS Teams and Skype for Business