Complex Dual-Site Failure

A complex dual-site failure may result in a large number of non-replicated transactions, depending on the order in which the systems restart. This situation occurs when both systems are down, the secondary comes up and assumes primary status for a period, and then fails again while the original primary remains down.

If the original primary (Site A in Figure 7.3) comes up first after the second failure, it becomes primary. When Site B comes up, it acts as the secondary and is forced to roll back all transactions it performed while it was primary and Site A was down. These transactions become non-replicated transactions. If Site B comes up first instead, the non-replicated transactions arise when Site A restarts: they are the transactions Site A performed while it was primary after Site B failed.

Figure 7.3. Complex Dual-Site Failure

When recovering from these situations, the primary is always the current system of record when the secondary comes up. The secondary must roll its database back to the transaction with the highest journal sequence number for which both systems are known to be in agreement, and "catch up" from there. All transactions backed out of the database as part of the rollback must then be reconciled on the primary.
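
The rollback and catch-up described above can be illustrated with a minimal sketch. It is not the product's actual recovery mechanism: the journal model (a list of sequence number and transaction pairs, sorted and contiguous), the function names `resync_point` and `roll_back`, and the sample data are all hypothetical and chosen only to show the logic.

```python
from typing import List, Tuple

# Hypothetical journal entry: (journal sequence number, transaction id).
Entry = Tuple[int, str]


def resync_point(primary: List[Entry], secondary: List[Entry]) -> int:
    """Return the highest sequence number at which both journals agree.

    Assumes both journals are sorted by sequence number and start at the
    same point. Returns 0 if the journals agree on nothing.
    """
    agreed = 0
    for p, s in zip(primary, secondary):
        if p != s:
            break  # first divergence: everything after this is in doubt
        agreed = p[0]
    return agreed


def roll_back(secondary: List[Entry], agreed: int) -> Tuple[List[Entry], List[Entry]]:
    """Split the secondary's journal into kept and backed-out entries."""
    kept = [e for e in secondary if e[0] <= agreed]
    backed_out = [e for e in secondary if e[0] > agreed]
    return kept, backed_out


# Site A restarts first and is primary; Site B rejoins as secondary.
# Entries 1-3 were replicated; B performed b4 and b5 while it was primary.
site_a = [(1, "a1"), (2, "a2"), (3, "a3"), (4, "a4")]
site_b = [(1, "a1"), (2, "a2"), (3, "a3"), (4, "b4"), (5, "b5")]

agreed = resync_point(site_a, site_b)
kept, backed_out = roll_back(site_b, agreed)

# The secondary then catches up from the primary past the agreed point.
catch_up = [e for e in site_a if e[0] > agreed]

print("resync at:", agreed)
print("to reconcile on primary:", backed_out)
print("to replay on secondary:", catch_up)
```

In this sketch, the entries in `backed_out` correspond to the non-replicated transactions that must be reconciled on the primary, and `catch_up` holds the primary's transactions that the secondary still has to apply.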