EDB Postgres Database High Availability and Disaster Recovery Design

Option 1 – EDB Database Architecture on EDB Postgres Failover Manager (EFM)

HA and DR with EFM (with Primary DB HA is configured at Primary and DR site and Disclosure/Data warehouse (DWH) DB HA is configured only at DR site)


·     On Primary DC Site, Between Primary node and Hot standby nodes, Asynchronous streaming replication is configured between Master (Primary DB) and two slave nodes (i.e. for Replication1 and Replication 2 DB).

·     Between Primary DC site to DR site over WAN, Asynchronous streaming replication is configured between Primary Master node and DR site slave nodes (i.e. for Primary DB and Replication DB).

·     EFM tool is used for Primary DB Master slave’s configuration for HA at Primary site and DR at DR site. In this case total 5 nodes EFM cluster for Primary DB (i.e. one Master and two slaves at Primary site and 2 slaves at DR site).

·     For Disclosure/Data Warehouse DB, considered one node (i.e. Master) at Primary site and slave node at DR. Not considered HA at Primary site (Configured with HA but at DR site).

·     Between Primary DC site to DR site over WAN, Asynchronous streaming replication is configured between DWH Primary Master node and DR site DWH slave node.

·     EFM tool is used for DWH DB Master slave’s configuration for HA at Primary site and DR at DR site. In this case total 2 nodes EFM cluster for DWH DB (i.e. one Master at Primary site and one slave at DR site).

·     Data will be extracted from one Replication DB which will further use for ETL processing at Disclosure/DWH database.

·     In this simpler scenario, Primary DB is configured using Master-Slave configuration using EDB EFM tool for failover with all nodes having the same configuration. At Primary site, one master, two slaves (i.e. read only DB for replication DB) and two slaves (i.e. read only) at DR site (i.e. one for Primary DB and other for replication DB). Failure of Master DB (Read/Write) DB at Primary site would result in first standby/save in priority order being promoted to Master. If we set the priority at Primary site then one of the replication DB will turn on as Master and in case of failure of entire primary site and based on priority defined, one of the slave at DR site will turn on as Master for Primary DB and other node will work as replication DB. Same configuration for Disclosure DB (i.e. one master at Primary site and one slave at DR site for HA) except slave node is not configured at Primary site for Disclosure DB. Failover can be achieved through DR site.

·     Automatic reconfiguration would balance load between remaining nodes.

Pros:

·         Failover and reconfiguration after service restoration are automatic

·         Load balancing can be an integral part

·         Model works in bare metal, VM, cloud and container architectures

 Cons:

·         Management of network and pooling needs addressing

·         Replication of data with no co-ordination will be quickly become unmanageable

·         Failover processes would be cumbersome

·         EFM instances are hot (licensable)

 

Option 2 – EDB Database on VMWare – HA and DR without EFM with OS Cluster


On Primary DC Site, Primary node is configured in Active/Passive in VMWare environment (with OS cluster) for HA and standby nodes, Asynchronous streaming replication is configured between Primary DB and Replication DB1 & Replication DB2.

·     Between Primary DC site to DR site over WAN, Asynchronous streaming replication is configured between Primary Active node and DR site for DR Primary DB and replication DB.

·     For Disclosure/DWH DB configuration, on Primary DC Site, nodes are configured in Active/Passive in VMWare environment for HA (with OS cluster).

·     Between Primary DC site to DR site over WAN, Asynchronous streaming replication is configured between DWH Primary Master node and DR site DWH DB.

·     Data will be extracted from one Replication DB which will further use for ETL processing at Disclosure/DWH database.

·     In this scenario, failure at the Primary site of the primary node would result in passive node to activate as Master for Primary and DWH DB and failure of Primary site would result in failover to DR site.

Pros:

·         Clustering of OS resources

·         Reduced Database license count

Cons:

·         Model works only in bare metal, VM

·         Split Brain potential

·         Failover mechanism will need to be implemented and tested (non-trivial)

·         Single point of failure for disk storage

·         Replication instances will need to be manually coordinated

·         No automatic recovery and reconfiguration after restoration of configuration from failover

·         Failover processes would be cumbersome



Author is Mr. Shashank Shekhar (Technical/Solution Architect), possessing 18+ years of experience in IT industry. Shashank is certified Exadata, Oracle Database professional, having strong technical skills in Digital technologies. Also, TOGAF certified.

Comments

Post a Comment