Background: I'm working on a system where the developers seem to be using a function which executes a MySQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).
I know this is a horrible practice (because concurrent requests will mess up the records), and I'm trying to communicate that to the non-tech / management team, to which their response is...
"Oh okay, we'll only face this problem when we have (a) a lot of users, or (b) it'll only happen when two people try doing something at _exactly_ the same time"
I don't disagree with either point, and I think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or find a way to estimate) how many users would need to be using the system before we start seeing messed-up links.
Any mathematical insights into that? Again, I KNOW it's a horrible practice, I just want to understand the variables in this situation...
Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!
Recommended answer:
The point is not whether potential bad situations are likely. The point is whether they are possible. As long as there's a non-trivial probability of the issue occurring, if it's known it should be avoided.
It's not like we're talking about changing a one-line function call into a 5,000-line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct, usage.
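To make the race concrete without a MySQL server, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. The key assumption is that SQLite's per-connection `cursor.lastrowid` plays the role of MySQL's per-session `LAST_INSERT_ID()`. Two connections act as two concurrent requests. The table `items` and the function `demo_race` are hypothetical names for illustration.

```python
import os
import sqlite3
import tempfile


def demo_race() -> tuple[int, int]:
    """Interleave two 'requests' and compare MAX(id) with the per-connection id."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    # isolation_level=None means autocommit, so each INSERT is visible to the
    # other connection as soon as it finishes (like two separate web requests).
    a = sqlite3.connect(path, isolation_level=None)
    b = sqlite3.connect(path, isolation_level=None)
    try:
        a.execute(
            "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, owner TEXT)"
        )
        # Request A inserts its row...
        cur_a = a.execute("INSERT INTO items (owner) VALUES ('A')")
        # ...but request B inserts before A gets around to asking for the id.
        b.execute("INSERT INTO items (owner) VALUES ('B')")

        # Fragile pattern from the question: the largest id in the whole table.
        (max_id,) = a.execute("SELECT MAX(id) AS id FROM items").fetchone()
        # Connection-scoped pattern: the id this connection generated itself.
        own_id = cur_a.lastrowid
        return own_id, max_id
    finally:
        a.close()
        b.close()
        os.remove(path)


if __name__ == "__main__":
    own_id, max_id = demo_race()
    print(f"lastrowid={own_id}, MAX(id)={max_id}")  # lastrowid=1, MAX(id)=2
```

Request A's `MAX(id)` returns B's row, while the connection-scoped id is still A's own. The two inserts never happen "at exactly the same time". They only have to interleave between A's INSERT and A's SELECT.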
I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, LAST_INSERT_ID() will be faster (since it's just reading a session variable), but only by a trivial amount.
And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). Suppose the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction-safe DB engine). Then you only need 20 requests per second, which is on average one new request per 50 ms window, to start hitting this issue consistently. The point is that the window for error is non-trivial. Take 20 requests per second (which in reality is not a lot) and assume the average person visits one page per minute. That is only about 1,200 users (20 requests/s × 60 s), and that's for it to happen regularly. It could happen once with only 2 users.
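The arithmetic above can be reproduced in a few lines. The 50 ms window and the one page per user per minute come from the answer. The Poisson arrival model in `p_collision` is an added assumption, not part of the original reasoning. It is only a rough guide, and it treats every request as an insert. All function names here are hypothetical.

```python
import math


def rate_for_one_per_window(window_s: float) -> float:
    # One request per window on average: rate = 1 / window.
    return 1.0 / window_s


def users_for_rate(rate_per_s: float, pages_per_user_per_min: float = 1.0) -> float:
    # Each user produces pages_per_user_per_min / 60 requests per second.
    return rate_per_s * 60.0 / pages_per_user_per_min


def p_collision(rate_per_s: float, window_s: float) -> float:
    # ASSUMPTION: other inserts arrive as a Poisson process at rate_per_s.
    # Probability that at least one of them lands inside this request's window.
    return 1.0 - math.exp(-rate_per_s * window_s)


if __name__ == "__main__":
    window = 0.050  # 50 ms between start of INSERT and start of SELECT
    rate = rate_for_one_per_window(window)
    print(round(rate), round(users_for_rate(rate)))  # 20 1200
    for r in (1, 5, 20):
        print(f"{r:>2} req/s -> {p_collision(r, window):.1%} chance of another insert in the window")
```

Under that assumption, even 1 request per second with a 50 ms window puts roughly 5% of requests at risk. At 20 requests per second, the figure is about 63%. This fits the point that it could happen with only 2 users.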
Straight from the MySQL documentation on the subject:
You can generate sequences without calling LAST_INSERT_ID(), but the utility of using the function this way is that the ID value is maintained in the server as the last automatically generated value. It is multi-user safe because multiple clients can issue the UPDATE statement and get their own sequence value with the SELECT statement (or mysql_insert_id()), without affecting or being affected by other clients that generate their own sequence values.