Improved Front End pool cold start capability:
Whenever a new user is created, the user is allocated to a routing group. Each routing group has three replicas: a primary, a secondary and a backup secondary. The primary replica server of a routing group handles every service request from the users in that group, so all users in the same routing group are homed on the primary replica. If the primary replica fails, all users homed in that routing group fail over to the secondary replica. Every time a routing group loses a replica, Windows Fabric tries to rebuild another one.
If 2 out of 3 replicas go offline, the routing group goes into a quorum loss state. If the replicas do not come back online, we need to run Reset-CsPoolRegistrarState with the QuorumLossRecovery reset type. A quorum loss recovery reset is carried out at the pool level, but when we execute this command, only the details of the users in the routing groups that have lost quorum are reloaded from the backup store. Routing groups that still have quorum are not affected by this command.
If a routing group is completely down, the Front End service (RTCSRV) will not start. In that case we need to use the Reset-CsPoolRegistrarState cmdlet with the QuorumLossRecovery reset type.
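To make the recovery step concrete, here is a minimal sketch of how the reset is typically run. The pool FQDN is a made-up placeholder, and checking the fabric state first is my own suggestion rather than a required step. In the cmdlet syntax, QuorumLossRecovery is passed as a value of the ResetType parameter.

```powershell
# Hypothetical pool FQDN; replace it with your own Front End pool.
$poolFqdn = "pool01.contoso.com"

# Optional: look at the fabric state first to see which routing groups
# are missing replicas before changing anything.
Get-CsPoolFabricState -PoolFqdn $poolFqdn

# Reload user data from the backup store for the routing groups
# that are in quorum loss. Routing groups with quorum are left alone.
Reset-CsPoolRegistrarState -PoolFqdn $poolFqdn -ResetType QuorumLossRecovery
```

Run this from the Skype for Business Server Management Shell with an account that has rights on the pool.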
Start-CsPool is now used to start a Skype for Business Server pool, and it lets you customize how the pool starts. Every time you start a pool, it tries to populate the information about all the routing groups hosted in that pool. With this cmdlet you can choose what is really required in the current situation.
There are three parameters that you can use with Start-CsPool: QuorumLossRecovery, SkipRoutingGroup and ExcludedServers.
QuorumLossRecovery: If true ($True), user data is reloaded from the backup store for any routing groups currently in quorum loss. The default is false ($False). This does the same thing as the Reset-CsPoolRegistrarState cmdlet with the QuorumLossRecovery reset type.
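As a short illustration, starting a pool with quorum loss recovery enabled could look like the sketch below. The pool FQDN is a hypothetical placeholder.

```powershell
# Hypothetical pool FQDN; replace it with your own Front End pool.
$poolFqdn = "pool01.contoso.com"

# Start the pool and reload user data from the backup store
# for any routing group that is currently in quorum loss.
Start-CsPool -PoolFqdn $poolFqdn -QuorumLossRecovery $true
```

If you leave the parameter out, it defaults to $False and no user data is reloaded during startup.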
SkipRoutingGroup: Specifies one or more routing groups, by GUID, to skip during startup. Use this parameter if one or more routing groups are having problems getting placed on servers. Sometimes when you try to start the Front End service, it gets stuck while waiting for a specific routing group to be placed. That one routing group then delays the service coming online and affects all the other routing groups hosted on the same server. With the skip parameter you can leave out any routing group with known issues during service startup, then troubleshoot and fix the affected routing group separately.
If I run Start-CsPool with -QuorumLossRecovery $true and -SkipRoutingGroup "GUID", the user data is reloaded for all routing groups that are in a quorum loss state, except the ones in the skip list.
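The combination described above can be sketched as follows. Both the pool FQDN and the GUID are placeholders that I made up for illustration. Use the GUID of the routing group that is actually causing trouble in your pool.

```powershell
# Hypothetical values; replace them with your own pool and routing group.
$poolFqdn      = "pool01.contoso.com"
$skippedGroups = "00000000-0000-0000-0000-000000000000"  # placeholder GUID

# Start the pool, recover every routing group in quorum loss,
# but leave the known-bad routing group out of the startup.
Start-CsPool -PoolFqdn $poolFqdn -QuorumLossRecovery $true -SkipRoutingGroup $skippedGroups
```

The skipped routing group is not recovered by this run, so it still has to be investigated and fixed afterwards.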
ExcludedServers: If a server is not functional because of a network or hardware failure, we can skip the lookup of that server during pool startup. Note that the minimum number of servers is still required for the pool to be functional. The cmdlet checks for that condition when it applies this parameter.
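Excluding a failed server could look like the sketch below. It assumes the parameter is named ExcludedServers, as in the Start-CsPool cmdlet reference, and that it takes the server FQDN. The pool and server names are made up.

```powershell
# Hypothetical values; replace them with your own pool and failed server.
$poolFqdn     = "pool01.contoso.com"
$failedServer = "fe03.contoso.com"

# Start the pool without waiting for the server that is known to be down.
# The cmdlet still checks that enough servers remain for the pool to function.
Start-CsPool -PoolFqdn $poolFqdn -ExcludedServers $failedServer
```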
If you have any suggestions or feedback, please feel free to comment below.