Notes on Troubleshooting an Unexpected Debezium Exit

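Author: Highgo PG Lab (瀚高PG实验室) - Xu Yunhe (徐云鹤)

Problem description

Today, in a test environment, I ran into the following problem: after an UPDATE was executed on the database, Debezium aborted with this error: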

[FATAL] Replication stream was unexpectedly terminated: ERROR:  no known snapshots
CONTEXT:  slot "myslot", output plugin "debezium", in the change callback, associated LSN 19/B6A4A448

Troubleshooting process

First, check the database server log.

2021-08-09 11:31:45.370 CST [62063] postgres@mydb app=debezium LOG:  received replication command: IDENTIFY_SYSTEM
2021-08-09 11:31:45.373 CST [62063] postgres@mydb app=debezium LOG:  received replication command: START_REPLICATION SLOT "myslot" LOGICAL 19/B22FE808 ("error_policy" 'exit',"table_list_path" '/home/postgres/app/table_list.conf')
2021-08-09 11:31:45.373 CST [62063] postgres@mydb app=debezium LOG:  starting logical decoding for slot "myslot"
2021-08-09 11:31:45.373 CST [62063] postgres@mydb app=debezium DETAIL:  streaming transactions committing after 19/B3000000, reading WAL from 19/B22FE808
2021-08-09 11:31:45.373 CST [62063] postgres@mydb app=debezium LOG:  logical decoding found consistent point at 19/B22FE808
2021-08-09 11:31:45.373 CST [62063] postgres@mydb app=debezium DETAIL:  There are no running transactions.
Mon Aug  9 11:31:46 CST 2021
2021-08-09 11:31:46.629 CST [62063] postgres@mydb app=debezium ERROR:  no known snapshots
2021-08-09 11:31:46.629 CST [62063] postgres@mydb app=debezium CONTEXT:  slot "myslot", output plugin "debezium", in the change callback, associated LSN 19/B6A4A448
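
The positions in this log (19/B22FE808, 19/B3000000, 19/B6A4A448) are 64-bit WAL locations printed as two hexadecimal halves. To compare them, for example to see how far past the start point the failing change lies, they can be folded back into single integers. The following is a minimal sketch; parse_lsn and lsn_distance are hypothetical helper names chosen for illustration, not PostgreSQL or Debezium APIs.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse an LSN written as "hi/lo" in hexadecimal (the form used in the
 * server log) into a single 64-bit position.
 * Returns 0 on success, -1 if the text does not match that form.
 */
int
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	char		trailing;

	/*
	 * Each half is at most 8 hex digits. The final %c only matches if
	 * something follows the LSN, so a well-formed input yields exactly 2.
	 */
	if (sscanf(text, "%8X/%8X%c", &hi, &lo, &trailing) != 2)
		return -1;

	*lsn = ((uint64_t) hi << 32) | lo;
	return 0;
}

/* Byte distance from one LSN to a later one (0 if "to" is not later). */
uint64_t
lsn_distance(uint64_t from, uint64_t to)
{
	return (to > from) ? to - from : 0;
}
```

Applied to the log above, parse_lsn("19/B22FE808", ...) and parse_lsn("19/B6A4A448", ...) give two integers whose difference is the amount of WAL between the point where reading started and the change that triggered the error.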

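My first suspicion was the replication slot. I looked up the slot on the primary and dropped it manually so that Debezium would create a new one. That did not help: the same error came back, and restarting the database made no difference either.
Going back to the log, the message "DETAIL: There are no running transactions." leads to the SnapBuildFindSnapshot function in snapbuild.c. The relevant source is as follows.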

	/*
	 * a) No transaction were running, we can jump to consistent.
	 *
	 * This is not affected by races around xl_running_xacts, because we can
	 * miss transaction commits, but currently not transactions starting.
	 *
	 * NB: We might have already started to incrementally assemble a snapshot,
	 * so we need to be careful to deal with that.
	 */
	if (running->oldestRunningXid == running->nextXid)
	{
		if (builder->start_decoding_at == InvalidXLogRecPtr ||
			builder->start_decoding_at <= lsn)
			/* can decode everything after this */
			builder->start_decoding_at = lsn + 1;

		/* As no transactions were running xmin/xmax can be trivially set. */
		builder->xmin = running->nextXid;	/* < are finished */
		builder->xmax = running->nextXid;	/* >= are running */

		/* so we can safely use the faster comparisons */
		Assert(TransactionIdIsNormal(builder->xmin));
		Assert(TransactionIdIsNormal(builder->xmax));

		builder->state = SNAPBUILD_CONSISTENT;
		SnapBuildStartNextPhaseAt(builder, InvalidTransactionId);

		ereport(LOG,
				(errmsg("logical decoding found consistent point at %X/%X",
						(uint32) (lsn >> 32), (uint32) lsn),
				 errdetail("There are no running transactions.")));

		return false;
	}

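The condition running->oldestRunningXid == running->nextXid evaluated to true, meaning the oldest running transaction ID equals the next transaction ID to be assigned. In other words, no transactions were running, so the log message above was emitted. Note that it is reported at LOG level: it is informational and not the failure itself.
Searching the source further, the error itself is raised by the init_toast_snapshot function in tuptoaster.c.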
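
The check can be illustrated with a small stand-alone model. This is a sketch under simplifying assumptions, not PostgreSQL code: RunningXactsModel, SnapBuilderModel and try_jump_to_consistent are hypothetical names, and only the fields involved in the comparison are kept.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, trimmed-down stand-in for the running-transactions record. */
typedef struct RunningXactsModel
{
	TransactionId oldestRunningXid;	/* oldest XID still running */
	TransactionId nextXid;			/* next XID to be assigned */
} RunningXactsModel;

typedef enum SnapStateModel
{
	MODEL_START,
	MODEL_CONSISTENT
} SnapStateModel;

/* Hypothetical stand-in for the snapshot builder state. */
typedef struct SnapBuilderModel
{
	SnapStateModel state;
	TransactionId xmin;				/* XIDs below this are finished */
	TransactionId xmax;				/* XIDs at or above this are running */
} SnapBuilderModel;

/*
 * If no transaction is running (oldest running XID equals the next XID to
 * assign), set xmin/xmax trivially and mark the builder consistent.
 * Returns true when that fast path was taken, false otherwise.
 */
bool
try_jump_to_consistent(SnapBuilderModel *builder,
					   const RunningXactsModel *running)
{
	if (running->oldestRunningXid != running->nextXid)
		return false;

	builder->xmin = running->nextXid;
	builder->xmax = running->nextXid;
	builder->state = MODEL_CONSISTENT;
	return true;
}
```

With oldestRunningXid and nextXid both equal to, say, 100, the model reaches the consistent state at once; with an older transaction still open (95 versus 100) it does not, which corresponds to the other branches of SnapBuildFindSnapshot that are not shown in the excerpt.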

/* ----------
 * init_toast_snapshot
 *
 *	Initialize an appropriate TOAST snapshot.  We must use an MVCC snapshot
 *	to initialize the TOAST snapshot; since we don't know which one to use,
 *	just use the oldest one.  This is safe: at worst, we will get a "snapshot
 *	too old" error that might have been avoided otherwise.
 */
static void
init_toast_snapshot(Snapshot toast_snapshot)
{
	Snapshot	snapshot = GetOldestSnapshot();

	if (snapshot == NULL)
		elog(ERROR, "no known snapshots");

	InitToastSnapshot(*toast_snapshot, snapshot->lsn, snapshot->whenTaken);
}

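This code initializes a suitable TOAST snapshot from the oldest known snapshot. No snapshot could be found, so the snapshot variable was NULL and the error was raised.
For now, the workaround is to alter the table definition, changing the column from out-of-line storage to inline storage. SET STORAGE only sets the strategy for values written afterwards, so a VACUUM FULL follows to rewrite the table.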
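
The failure mode can be reproduced in a stand-alone model. This is only a sketch: SnapshotModel, oldest_snapshot_model and init_toast_snapshot_model are hypothetical names, and the assumption that the oldest snapshot is chosen by LSN from a registered and an active candidate is a simplification rather than a copy of GetOldestSnapshot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a snapshot; only the LSN matters in this model. */
typedef struct SnapshotModel
{
	uint64_t	lsn;
} SnapshotModel;

/*
 * Pick the older of two optional candidates (either may be NULL).
 * Returns NULL only when no candidate exists at all.
 */
const SnapshotModel *
oldest_snapshot_model(const SnapshotModel *registered,
					  const SnapshotModel *active)
{
	if (registered == NULL)
		return active;
	if (active == NULL)
		return registered;
	return (active->lsn < registered->lsn) ? active : registered;
}

/*
 * Mirrors the control flow of init_toast_snapshot: succeed (0) when some
 * snapshot is available, otherwise fail (-1) with the message from the log.
 */
int
init_toast_snapshot_model(const SnapshotModel *registered,
						  const SnapshotModel *active,
						  const char **error)
{
	const SnapshotModel *snapshot = oldest_snapshot_model(registered, active);

	if (snapshot == NULL)
	{
		*error = "no known snapshots";
		return -1;
	}

	*error = NULL;
	return 0;
}
```

Calling init_toast_snapshot_model(NULL, NULL, &error) is the case seen here: with no candidate at all, the only possible outcome is the "no known snapshots" error. Why the decoding session had no snapshot available at that point is not established by this investigation.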

alter table complain_info alter linknum SET STORAGE plain;
vacuum full complain_info;

After verification, the problem no longer occurred.
With that, the problem is provisionally resolved.
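
For context on why PLAIN was chosen, the four column storage strategies differ in whether a value may be compressed and whether it may be moved out of line into the TOAST table. The sketch below encodes them by the single-letter codes PostgreSQL records per column; StoragePolicy and storage_policy are hypothetical names used only for illustration. PLAIN is the only strategy that never stores a value out of line, which presumably is why the TOAST snapshot path is no longer reached for this column.

```c
#include <stdbool.h>

/* What a storage strategy permits for a column's values. */
typedef struct StoragePolicy
{
	bool		may_compress;
	bool		may_store_out_of_line;
} StoragePolicy;

/*
 * Map a storage code ('p', 'e', 'm', 'x') to what it permits.
 * Returns 0 on success, -1 for an unknown code.
 */
int
storage_policy(char attstorage, StoragePolicy *policy)
{
	switch (attstorage)
	{
		case 'p':				/* PLAIN: inline, uncompressed */
			policy->may_compress = false;
			policy->may_store_out_of_line = false;
			return 0;
		case 'e':				/* EXTERNAL: out of line, uncompressed */
			policy->may_compress = false;
			policy->may_store_out_of_line = true;
			return 0;
		case 'm':				/* MAIN: compressed, out of line as last resort */
			policy->may_compress = true;
			policy->may_store_out_of_line = true;
			return 0;
		case 'x':				/* EXTENDED: compressed and out of line */
			policy->may_compress = true;
			policy->may_store_out_of_line = true;
			return 0;
		default:
			return -1;
	}
}
```

One trade-off worth keeping in mind: with PLAIN, the whole row has to fit in a single page, so the workaround only suits columns whose values stay small.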

Published: 2024-02-06 15:33:52
Article link: https://www.elefans.com/category/jswz/34/1750006.html