repmgr: automatic failover with a witness

Updated: 2024-10-11 09:19:14


Continuing with the environment from the previous post, the deployment is as follows:

db1 10.4.9.166 primary
db2 10.4.9.218 standby
db3 10.4.9.250 witness

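What the witness does:

A witness server is an ordinary PostgreSQL instance that is not part of streaming replication. Its purpose is to provide evidence, when a failover situation arises, that the primary itself is unavailable, rather than the nodes merely being cut off from each other by a network outage between physical locations, which would risk split-brain.

A typical use case for a witness is a two-node streaming replication cluster in which the primary and the standby are in different locations. With a witness created on the same network as the primary, the standby can decide, when the primary becomes unavailable, whether it can promote itself without risking split-brain. If the standby can reach neither the witness nor the primary, a network outage is the likely cause and it should not promote itself. If it can reach the witness but not the primary, the network is fine and the primary itself is down, so it can promote itself.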
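The decision rule described above can be sketched as a small shell function. This only illustrates the reasoning and is not how repmgrd is implemented; the function name `decide_promotion` and its 0/1 arguments are made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative only: repmgrd makes this decision internally.
# decide_promotion is a hypothetical helper, not part of repmgr.
# Arguments, both as seen from the standby:
#   $1 = 1 if the primary is reachable, 0 otherwise
#   $2 = 1 if the witness is reachable, 0 otherwise
decide_promotion() {
    local primary_ok="$1" witness_ok="$2"
    if [ "$primary_ok" -eq 1 ]; then
        # The primary is still visible, so there is nothing to do.
        echo "stay-standby"
    elif [ "$witness_ok" -eq 1 ]; then
        # Witness visible but primary not: the primary itself is down.
        echo "promote"
    else
        # Neither is visible: probably a network outage on the standby's side.
        echo "wait"
    fi
}
```

For example, `decide_promotion 0 1` prints `promote`, while `decide_promotion 0 0` prints `wait`. In a manual check the two inputs could be derived from something like `pg_isready -h db1 -p 1921 -t 2`, although repmgrd does its own connection checks.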

# Check the current status
repmgr -f /home/postgres/repmgr/repmgr5.2.1.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | db1  | primary | * running |          | default  | 100      | 3        | host=db1 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 2  | db2  | standby |   running | db1      | default  | 100      | 3        | host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 3  | db3  | witness | * running | db2      | default  | 0        | n/a      | host=db3 user=repmgr dbname=repmgr port=1922 connect_timeout=2
# Stop the original primary
pg_ctl stop 
waiting for server to shut down.... done
server stopped
# The original standby is promoted to primary automatically; its log shows:
tail -f repmgr_melot.log 
[2021-05-06 19:40:39] [NOTICE] STANDBY PROMOTE successful
[2021-05-06 19:40:39] [DETAIL] server "db2" (ID: 2) was successfully promoted to primary
[2021-05-06 19:40:39] [INFO] checking state of node 2, 1 of 3 attempts
[2021-05-06 19:40:39] [NOTICE] node 2 has recovered, reconnecting
[2021-05-06 19:40:39] [INFO] connection to node 2 succeeded
[2021-05-06 19:40:39] [INFO] original connection is still available
[2021-05-06 19:40:39] [INFO] 0 followers to notify
[2021-05-06 19:40:39] [INFO] switching to primary monitoring mode
[2021-05-06 19:40:39] [NOTICE] monitoring cluster primary "db2" (ID: 2)
[2021-05-06 19:40:39] [INFO] child node "db3" (ID: 3) is not yet attached
[2021-05-06 19:41:39] [NOTICE] new witness "db3" (ID: 3) has connected
# Check the status on the original standby; the failover is still in progress:
repmgr -f /home/postgres/repmgr/repmgr5.2.1.conf cluster show
 ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+----------------------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | db1  | primary | ? unreachable        | ?        | default  | 100      |          | host=db1 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 2  | db2  | standby | ! running as primary |          | default  | 100      | 4        | host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 3  | db3  | witness | * running            | db2      | default  | 0        | n/a      | host=db3 user=repmgr dbname=repmgr port=1922 connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "db1" (ID: 1)
  - node "db1" (ID: 1) is registered as an active primary but is unreachable
  - node "db2" (ID: 2) is registered as standby but running as primary
HINT: execute with --verbose option to see connection error messages
# Check again a few minutes later; the failover has completed:
repmgr -f /home/postgres/repmgr/repmgr5.2.1.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | db1  | primary | - failed  | ?        | default  | 100      |          | host=db1 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 2  | db2  | primary | * running |          | default  | 100      | 4        | host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 3  | db3  | witness | * running | db2      | default  | 0        | n/a      | host=db3 user=repmgr dbname=repmgr port=1922 connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "db1" (ID: 1)
HINT: execute with --verbose option to see connection error messages
# Rejoin the original primary to the cluster as the new standby:
repmgr node rejoin -f /home/postgres/repmgr/repmgr5.2.1.conf -d 'host=10.4.9.218 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/pgsql12/bin/pg_rewind -D '/opt/pgdata/pg_root' --source-server='host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2'"
NOTICE: 0 files copied to /opt/pgdata/pg_root
INFO: creating replication slot as user "repmgr"
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=db1 user=repmgr dbname=repmgr port=1921 connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "pg_ctl start -D /opt/pgdata/pg_root"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
repmgr -f /home/postgres/repmgr/repmgr5.2.1.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | db1  | standby |   running | db2      | default  | 100      | 3        | host=db1 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 2  | db2  | primary | * running |          | default  | 100      | 4        | host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 3  | db3  | witness | * running | db2      | default  | 0        | n/a      | host=db3 user=repmgr dbname=repmgr port=1922 connect_timeout=2
# Run a switchover to restore the original primary/standby roles
repmgr standby switchover -f /home/postgres/repmgr/repmgr5.2.1.conf --siblings-follow
NOTICE: executing switchover on node "db1" (ID: 1)
NOTICE: local node "db1" (ID: 1) will be promoted to primary; current primary "db2" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "db2" (ID: 2)
NOTICE: issuing CHECKPOINT on node "db2" (ID: 2) 
DETAIL: executing server command "/opt/pgsql12/bin/pg_ctl stop -D /opt/pgdata/pg_root -m fast"
INFO: checking for primary shutdown; 1 of 10 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 4/94000028
NOTICE: promoting standby to primary
DETAIL: promoting server "db1" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "db1" (ID: 1) was successfully promoted to primary
INFO: local node 2 can attach to rejoin target node 1
DETAIL: local node's recovery point: 4/94000028; rejoin target node's fork point: 4/940000A0
INFO: creating replication slot as user "repmgr"
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/opt/pgsql12/bin/pg_ctl start -D /opt/pgdata/pg_root"
NOTICE: replication slot "repmgr_slot_1" deleted on node 2
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
NOTICE: node  "db1" (ID: 1) promoted to primary, node "db2" (ID: 2) demoted to standby
NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings
INFO:  node 3 received notification to follow node 1
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "db1" is now primary and node "db2" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
# Check the cluster status; the original roles have been restored
repmgr -f /home/postgres/repmgr/repmgr5.2.1.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | db1  | primary | * running |          | default  | 100      | 5        | host=db1 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 2  | db2  | standby |   running | db1      | default  | 100      | 4        | host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2
 3  | db3  | witness | * running | db1      | default  | 0        | n/a      | host=db3 user=repmgr dbname=repmgr port=1922 connect_timeout=2
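When repeating the status check during a failover, it can help to pull out just the node that is currently registered and running as primary. The following is a minimal sketch that reads the `cluster show` table from standard input; `running_primaries` is a hypothetical helper name, and the parsing assumes the column layout shown in the outputs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of repmgr: print the name of every node whose
# Role column is "primary" and whose Status column ends in "running".
# Input: the table printed by `repmgr cluster show`, on stdin.
running_primaries() {
    awk -F'|' '
        NF >= 4 {
            name = $2; role = $3; status = $4
            # Trim the padding around each column value.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", role)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            # "! running as primary" does not match: that node is still
            # registered as a standby, so the failover is not finished yet.
            if (role == "primary" && status ~ /running$/) print name
        }'
}
```

It could be used as `repmgr -f /home/postgres/repmgr/repmgr5.2.1.conf cluster show | running_primaries`. For scripting, the machine-readable output of `cluster show --csv` is likely a sturdier input than the human-readable table; check the documentation for your repmgr version.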

The configuration files of the three nodes are listed below.

#db1
cat  repmgr5.2.1.conf 
node_id=1
node_name='db1'
conninfo='host=db1 user=repmgr dbname=repmgr port=1921 connect_timeout=2'
data_directory='/opt/pgdata/pg_root'
pg_bindir='/opt/pgsql12/bin'
use_replication_slots=true
failover=automatic
shutdown_check_timeout=10
monitoring_history=yes
reconnect_interval=5
reconnect_attempts=3
service_start_command='/opt/pgsql12/bin/pg_ctl start -D /opt/pgdata/pg_root'
service_stop_command='/opt/pgsql12/bin/pg_ctl stop -D /opt/pgdata/pg_root -m fast'
service_restart_command='/opt/pgsql12/bin/pg_ctl restart -D /opt/pgdata/pg_root'
service_reload_command='/opt/pgsql12/bin/pg_ctl reload -D /opt/pgdata/pg_root'
repmgrd_service_start_command='repmgrd -f /home/postgres/repmgr/repmgr5.2.1.conf --verbose --monitoring-history'
repmgrd_service_stop_command='kill `cat /tmp/repmgrd.pid`'
log_file='/home/postgres/repmgr/repmgr_melot.log'
promote_command='/opt/pgsql12/bin/repmgr standby promote -f /home/postgres/repmgr/repmgr5.2.1.conf --log-to-file'
follow_command='/opt/pgsql12/bin/repmgr standby follow -f /home/postgres/repmgr/repmgr5.2.1.conf --log-to-file'
#db2
node_id=2
node_name='db2'
conninfo='host=db2 user=repmgr dbname=repmgr port=1921 connect_timeout=2'
data_directory='/opt/pgdata/pg_root'
pg_bindir='/opt/pgsql12/bin'
log_file='/home/postgres/repmgr/repmgr_melot.log'
use_replication_slots=true
failover=automatic
monitoring_history=yes
reconnect_interval=5
reconnect_attempts=3
service_start_command='/opt/pgsql12/bin/pg_ctl start -D /opt/pgdata/pg_root'
service_stop_command='/opt/pgsql12/bin/pg_ctl stop -D /opt/pgdata/pg_root -m fast'
service_restart_command='/opt/pgsql12/bin/pg_ctl restart -D /opt/pgdata/pg_root'
service_reload_command='/opt/pgsql12/bin/pg_ctl reload -D /opt/pgdata/pg_root'
repmgrd_service_start_command='repmgrd -f /home/postgres/repmgr/repmgr5.2.1.conf --verbose --monitoring-history'
repmgrd_service_stop_command='kill `cat /tmp/repmgrd.pid`'
promote_command='/opt/pgsql12/bin/repmgr standby promote -f /home/postgres/repmgr/repmgr5.2.1.conf --log-to-file'
follow_command='/opt/pgsql12/bin/repmgr standby follow -f /home/postgres/repmgr/repmgr5.2.1.conf --log-to-file'
shutdown_check_timeout=10
#db3
node_id=3
node_name='db3'
conninfo='host=db3 user=repmgr dbname=repmgr port=1922 connect_timeout=2'
data_directory='/opt/pgdata_witness/pg_root'
pg_bindir='/opt/pgsql12/bin'
log_file='/home/postgres/repmgr/repmgr_witness.log'
use_replication_slots=true
failover=automatic
shutdown_check_timeout=10
monitoring_history=yes
reconnect_interval=5
reconnect_attempts=3
service_start_command='/opt/pgsql12/bin/pg_ctl start -D /opt/pgdata_witness/pg_root'
service_stop_command='/opt/pgsql12/bin/pg_ctl stop -D /opt/pgdata_witness/pg_root -m fast'
service_restart_command='/opt/pgsql12/bin/pg_ctl restart -D /opt/pgdata_witness/pg_root'
service_reload_command='/opt/pgsql12/bin/pg_ctl reload -D /opt/pgdata_witness/pg_root'
repmgrd_service_start_command='repmgrd -f /home/postgres/repmgr/repmgr5.2.1.conf --verbose --monitoring-history'
repmgrd_service_stop_command='kill `cat /tmp/repmgrd.pid`'
promote_command='/opt/pgsql12/bin/repmgr standby promote -f /home/postgres/repmgr/repmgr5.2.1.conf --log-to-file'
follow_command='/opt/pgsql12/bin/repmgr standby follow -f /home/postgres/repmgr/repmgr5.2.1.conf --log-to-file'
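In the files above, `reconnect_attempts=3` and `reconnect_interval=5` control how long repmgrd keeps retrying an unreachable primary before it starts a failover, which matches the "1 of 3 attempts" line in the log. As a rough estimate (an approximation that ignores the time each connection attempt itself takes), the window is about attempts × interval seconds. The sketch below computes it from a config file; `failover_window` is a hypothetical helper and assumes `key=value` lines without spaces around `=`, as in the files above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of repmgr: read a repmgr.conf-style file on
# stdin and print reconnect_attempts * reconnect_interval (in seconds).
# Assumes plain key=value lines with no spaces around "=".
# Exits with status 1 if either setting is missing.
failover_window() {
    awk -F'=' '
        $1 == "reconnect_attempts" { attempts = $2 }
        $1 == "reconnect_interval" { interval = $2 }
        END {
            if (attempts == "" || interval == "") exit 1
            print attempts * interval
        }'
}
```

With the settings used here, `failover_window < /home/postgres/repmgr/repmgr5.2.1.conf` prints `15`.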

Reference:
.1/repmgrd-automatic-failover.html

Published: 2024-03-11 20:38:42

Article link: https://www.elefans.com/category/jswz/34/1729849.html