How do Narayana/XA recover from TM failures?

Updated: 2024-10-27 02:22:42

I was trying to reason about failure recovery actions that can be taken by systems/frameworks which guarantee synchronous data sources. I've been unable to find a clear explanation of Narayana's recovery mechanism.

Q1: Does Narayana essentially employ a 2-phase commit to ensure distributed transactions across 2 datasources?

Q2: Can someone explain Narayana's behavior in this scenario?

1. Application wants to save X to 2 data stores.
2. Narayana's transaction manager (TM) generates a transaction ID and writes info to disk.
3. TM now sends a prepare message to both data stores.
4. Each data store responds back with prepare_success.
5. TM updates the local transaction log and sends a commit message to both data stores.
6. TM fails (permanently). Because of packet loss on the network, one data store never receives the commit message, while the other data store receives and successfully processes it.

The two data stores are now out of sync with each other (one source has an additional transaction that is not present in the other source).

When a new TM is brought up, it does not have access to the old transaction state records. So the TM cannot initiate the recovery of the missing transaction in one of the data stores.

So how can 2PC/Narayana/XA claim that they guarantee distributed transactions that can maintain 2 data stores in sync? From where I stand, they can only maintain synchronous data stores with a very high probability, but they cannot guarantee it.

Q3: Another scenario where I'm unclear on the behavior of the application/framework. Consider the following interleaved transactions (both on the same record - or at least with a partially overlapping set of records):

Di = Data source i
Ti = Transaction i
Pi = prepare message for transaction i

D1 receives P1; responds P1_success

D2 receives P2; responds P2_success

D1 receives P2; responds P2_failure

D2 receives P1; responds P1_failure

The order in which the network packets arrive at the different data sources can determine which prepare request succeeds. Does this not mean that, at a high transaction rate on a contended record, all transactions may keep failing (until the request rate on that record drops)?

One could argue that we are choosing consistency over availability but unlike ACID systems there is no guarantee that at least one of the transactions will succeed (thus avoiding a potentially long-lasting deadlock).

Best answer

I would refer you to my article on how Narayana 2PC works https://developer.jboss.org/wiki/TwoPhaseCommit2PC

To your questions

Q1: As you already mentioned in the comment: yes, Narayana uses 2PC. Narayana implements the XA specification (pubs.opengroup.org/onlinepubs/009680699/toc.pdf).

Q2: The steps in the scenario are not precise. Narayana writes to disk at the time prepare is called, not at the time the transaction is started.

1. Application saves X to 2 data stores.
2. TM now sends a prepare message to both data stores.
3. Each data store responds back with prepare_success.
4. TM permanently saves info about the prepared transaction and its ID to the transaction log store.
5. TM sends a commit message to both data stores.
6. ...
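The corrected ordering can be illustrated with a minimal Java sketch. This is not Narayana code: `Participant`, `TracingStore` and `TwoPhaseCoordinator` are hypothetical names invented for the illustration, and the transaction log is modelled as entries in an in-memory trace rather than a record forced to disk. It only shows that the log write sits between the last successful prepare and the first commit.

```java
import java.util.List;

// Hypothetical participant; a real resource manager is driven through
// javax.transaction.xa.XAResource rather than this interface.
interface Participant {
    boolean prepare(String txId);
    void commit(String txId);
    void rollback(String txId);
}

// Toy data store that records every call in a shared trace.
class TracingStore implements Participant {
    private final String name;
    private final List<String> trace;
    private final boolean voteYes;

    TracingStore(String name, List<String> trace, boolean voteYes) {
        this.name = name;
        this.trace = trace;
        this.voteYes = voteYes;
    }

    public boolean prepare(String txId) {
        trace.add(name + " prepare " + txId);
        return voteYes;
    }

    public void commit(String txId) {
        trace.add(name + " commit " + txId);
    }

    public void rollback(String txId) {
        trace.add(name + " rollback " + txId);
    }
}

// Toy coordinator following the step order listed above.
class TwoPhaseCoordinator {
    private final List<String> trace;

    TwoPhaseCoordinator(List<String> trace) {
        this.trace = trace;
    }

    boolean run(String txId, List<Participant> participants) {
        // Phase 1: collect votes. Nothing has been logged by the TM yet.
        for (Participant p : participants) {
            if (!p.prepare(txId)) {
                // Any "no" vote aborts the whole transaction.
                for (Participant q : participants) {
                    q.rollback(txId);
                }
                return false;
            }
        }
        // All voted yes: record the decision before any commit is sent.
        // (Assumption: a real log store forces this record to disk.)
        trace.add("TM log write " + txId);
        // Phase 2: deliver the commit decision.
        for (Participant p : participants) {
            p.commit(txId);
        }
        // The record is no longer needed once every commit has returned.
        trace.add("TM log remove " + txId);
        return true;
    }
}
```

With two stores that both vote yes, the trace reads: both prepares, then the log write, then both commits, then the log removal. A crash anywhere between the log write and the log removal leaves a record that a later recovery pass can act on.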

I don't agree that 2PC claims to guarantee that 2 data stores are kept in sync. I was wondering about this too (e.g. I asked here: https://developer.jboss.org/message/954043). 2PC claims to guarantee ACID properties. Having 2 stores in sync is closer to what CAP consistency is about.

In this respect Narayana strictly depends on the capabilities of the particular resource managers (the data stores or the JDBC drivers of the data stores). ACID declares:

atomicity - the whole transaction is committed or rolled back (nothing is said about when that happens, and nothing about the resources being in sync)
consistency - before the transaction and after it ends, the system is in a consistent state
durability - everything is stored even when a crash occurs
isolation - (the tricky one, left for the end) to be ACID we have to be serializable. That is, you can observe the transactions as happening "one by one".

Take a heavily simplified example to show my point. Assume the DB is implemented in the naive way of locking the whole database when a transaction starts. You have committed the JMS message, it has been processed, and the DB record has not been committed yet. When the DB works at the serializable isolation level (that's what ACID requires!), your next write/read operation has to wait until the 'in-flight prepared' transaction is resolved. The DB is simply stuck and waiting. If you read, you won't get an answer, so you can't say what the value is. Narayana's recovery manager then comes to that prepared transaction once the connection is established and commits it. Your read then returns information that is 'correct'.
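What the recovery manager does with a prepared transaction can be sketched as follows. This is a simplified model, not Narayana's implementation: `InDoubtStore` and `RecoverySketch` are hypothetical names, `recover()` only mimics the idea behind `XAResource.recover` (ask the resource which branches are still prepared), and the sketch assumes that the restarted TM reads the same transaction log store the crashed TM wrote to and that every data store is reachable during the pass.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy resource manager that remembers prepared (in-doubt) transactions.
class InDoubtStore {
    final Set<String> prepared = new LinkedHashSet<>();
    final Set<String> committed = new LinkedHashSet<>();

    // Same idea as XAResource.recover: report branches still in doubt.
    Set<String> recover() {
        return new HashSet<>(prepared);
    }

    void commit(String txId) {
        if (prepared.remove(txId)) {
            committed.add(txId);
        }
    }

    void rollback(String txId) {
        prepared.remove(txId);
    }
}

class RecoverySketch {
    // txLog holds the IDs the TM logged after all prepares succeeded,
    // i.e. transactions whose decision was "commit".
    static void recoveryPass(Set<String> txLog, List<InDoubtStore> stores) {
        for (InDoubtStore store : stores) {
            for (String txId : store.recover()) {
                if (txLog.contains(txId)) {
                    // Decision was logged: finish the commit.
                    store.commit(txId);
                } else {
                    // No log record: treat the branch as aborted
                    // (a simplification of presumed abort).
                    store.rollback(txId);
                }
            }
        }
        // Assumption: every store was reachable, so nothing is pending.
        txLog.clear();
    }
}
```

Applied to the Q2 scenario, the store that already processed the commit reports nothing in doubt, while the store that missed the commit still reports the transaction as prepared and gets it committed during the pass. If the log store itself is lost together with the TM, this sketch has nothing to work from, which is why the log has to survive the crash.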

Q3: I don't understand the question, sorry. But if you are stating that "The order in which the network packets arrive at the different data sources can determine which prepare request succeeds", then you are right: you are doomed to get failing transactions until the network becomes more stable.
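The interleaving from Q3 can be reproduced with a toy model. The `LockingStore` and `Interleaving` classes below are hypothetical, and the model makes two loud assumptions: a store takes an exclusive per-record lock at prepare time, and a conflicting prepare fails immediately. Real databases usually take locks earlier and tend to block, time out or run deadlock detection instead of failing at once.

```java
import java.util.HashMap;
import java.util.Map;

// Toy data store: a record is locked by the first transaction that
// prepares on it and stays locked until that transaction rolls back.
class LockingStore {
    private final Map<String, String> lockOwner = new HashMap<>();

    boolean prepare(String txId, String record) {
        String owner = lockOwner.putIfAbsent(record, txId);
        // Success if the record was free or is already held by txId.
        return owner == null || owner.equals(txId);
    }

    void rollback(String txId) {
        // Release every lock held by txId.
        lockOwner.values().removeIf(txId::equals);
    }
}

class Interleaving {
    // Returns {T1 succeeded, T2 succeeded, T1 retry succeeded}.
    static boolean[] run() {
        LockingStore d1 = new LockingStore();
        LockingStore d2 = new LockingStore();

        boolean p1AtD1 = d1.prepare("T1", "r"); // D1 receives P1
        boolean p2AtD2 = d2.prepare("T2", "r"); // D2 receives P2
        boolean p2AtD1 = d1.prepare("T2", "r"); // D1 receives P2
        boolean p1AtD2 = d2.prepare("T1", "r"); // D2 receives P1

        // A transaction may commit only if every prepare succeeded.
        boolean t1 = p1AtD1 && p1AtD2;
        boolean t2 = p2AtD1 && p2AtD2;

        // Both coordinators roll back, which releases the locks.
        for (LockingStore d : new LockingStore[] {d1, d2}) {
            d.rollback("T1");
            d.rollback("T2");
        }

        // Retried without a competitor, T1 prepares on both stores.
        boolean retry = d1.prepare("T1", "r") && d2.prepare("T1", "r");
        return new boolean[] {t1, t2, retry};
    }
}
```

Under these assumptions both transactions fail in the contended round and the retry succeeds once the competing transaction is gone, which matches the answer: the failures persist only as long as the contention does.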

Published: 2023-07-23 11:58:00

Article link: https://www.elefans.com/category/jswz/34/1231607.html