MySQL检查时间范围内值变化的存在

编程入门 行业动态 更新时间:2024-10-24 01:56:31
本文介绍了MySQL检查时间范围内值变化的存在的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

为了简单起见,我在MySQL中有以下表格:

Suppose for simplicity that I have the following table in MySQL:

CREATE TABLE `events` ( `pv_name` varchar(60) NOT NULL, `time_stamp` bigint(20) unsigned NOT NULL, `value` text, `value_valid` tinyint(1) NOT NULL, PRIMARY KEY (`pv_name`,`time_stamp`), ) ENGINE=InnoDB;

我试图找到最有效的查询来实现下面的等效:

I am trying to find the most efficient query to implement the equivalent of the following:

给定一对时间戳 t0 和 t1 :

每个 pv_name :

  • 从 pv_name 的行获取值,最大的 time_stamp < t0 (如果存在)。这是在时间范围开始时的过程变量的值。如果此值无效,请舍弃它。

  • Get the value from the row with this pv_name and the largest time_stamp <= t0 (if one exists). This is the value of the process variable at the beginning of the time range. If this value is not valid then discard it.

    从此 pv_name

    如果在1和2的组合值集合中有多个不同的值,则返回pv_name。

    If there is more than one distinct value among the combined set of values from 1 and 2 then return the pv_name.

    实质上,我试图找出哪些过程变量在给定时间范围内有值的变化,包括从其在时间范围开始时的值的变化。

    In essence I am trying to find which process variables had a change in value in the given time range, including a change from the value that it had at the beginning of the time range.

    表中有数十亿行的顺序,它将继续增长。在表中有大约100,000个不同的pv_names,它们将保持相当静态。

    There are on the order of billions of rows in the table and it will continue to grow. There are on the order of 100,000 distinct pv_names in the table and they will remain fairly static. The vast majority of adjacent values (ordered by time_stamp for each pv_name) are expected to be distinct.

    EDIT

    如果我要从头开始实现这个,我会做以下:pv_names的集合将被存储在一个trie。特里结构中每个pv_name的值将是到二叉搜索树的链接。二叉搜索树将存储键,值对(time_stamp,value)。这些对中的每个对的值将是在相应的time_stamp的pv_name的值。

    If I was going to implement this from scratch I would do the following: The set of pv_names would be stored in a trie. The value for each pv_name in the trie would be a link to a binary search tree. The binary search tree would store key, value pairs of (time_stamp, value). The value in each of these pairs would be the value of the pv_name at the corresponding time_stamp.

    要找出哪些pv_names在给定time_range的值有变化,t1)我将执行以下操作:遍历trie中的每个pv_name,并按照链接到其二叉搜索树。在此树中查找最小的time_stamp小于或等于t0。如果不存在,则在此树中找到小于t1的最小time_stamp。如果没有这些,则进入到trie中的下一个pv_name。否则,以递增顺序遍历time_stamps,将与当前time_stamp相关联的值与与上一个相关联的值进行比较。如果它们不同,请打印出pv_name。停止迭代time_stamps。转到trie中的下一个pv_name并重复。如果达到大于或等于t1的time_stamp,并且没有找到差异,则转到trie中的下一个pv_name并重复。不要在比较中使用time_stamp t1的值。

    To find out which pv_names had a change in value for a given time_range (t0, t1) I would do the following: Iterate through each pv_name in the trie and follow the link to its binary search tree. Find the greatest time_stamp in this tree less than or equal to t0. If none exists, find the smallest time_stamp in this tree less than t1. If none of these exist go to the next pv_name in the trie. Otherwise, iterate through the time_stamps in increasing order comparing the value associated with the current time_stamp to the value associated with the previous. If they differ, print out the pv_name. Stop iterating through the time_stamps. Go to the next pv_name in the trie and repeat. If a time_stamp greater than or equal to t1 is arrived at, and no differences have been found, then go to the next pv_name in the trie and repeat. Do not use the value for the time_stamp t1 in the comparisons.

    Simplified example: pv_name | time_stamp | value A | 1.0 | 1.15 B | 2.0 | 1.00 A | 3.0 | 1.12 B | 4.0 | 1.00 A | 5.0 | 1.00 B | 6.0 | 1.00 A | 7.0 | 3.15 B | 8.0 | 9.13 A | 9.0 | 4.30 B | 10.0 | 1.00 A | 11.0 | 9.00 B | 12.0 | 1.00 time range | values of A | values of B | result (0.0,0.5) | NULL | NULL | NULL (1.5,2.0) | 1.15 | NULL | NULL (1.5,5.0) | 1.15, 1.12 | NULL, 1.00, 1.00 | A (4.0,9.0) | 1.12, 1.00, 3.15 | 1.00, 1.00, 9.13 | A, B (13.0,14.0) | 9.00 | 1.00 | NULL

    我可以在MySQL或其他数据库中以相同或更高的效率执行相同的操作?

    Can I do the equivalent with the same or better efficiency in MySQL or another database, relational or otherwise?

    推荐答案

    SELECT pv_name FROM ( -- Query for step 1 SELECT e1.pv_name, e3.value FROM (SELECT pv_name, MAX(time_stamp) AS start_time FROM events WHERE time_stamp <= @t0 GROUP BY pv_name) AS e1 JOIN events AS e3 ON e3.pv_name = e1.pv_name AND e3.time_stamp = e1.start_time WHERE e3.value_valid UNION DISTINCT -- Query for step 2 SELECT DISTINCT pv_name, value FROM events WHERE time_stamp BETWEEN @t0 AND t1 AND value_valid ) AS x GROUP BY pv_name HAVING COUNT(*) > 1

    联合中的第一个子查询使用 SQL只选择列上具有最大值的行。我认为它应该是非常有效的,因为你的主键,但你可以尝试其中的一种其他技术。

    The first subquery in the union uses one of the techniques in SQL Select only rows with Max Value on a Column. I think it should be pretty efficient because of your primary key, but you could try one of the other techniques there.

    第二个子查询获取所有的值行 t0 - t1 时间范围,删除重复的值。

    The second subquery gets all the valild rows in the t0 - t1 time range, removing duplicate values.

    我不知道是否更有效在子查询和 UNION DISTINCT 中减少值,或者使用 UNION ALL 和 HAVING COUNT(DISTINCT value)> 1 。您需要对这两种方法进行基准化。

    I don't know if it's more efficient to do the duplicate value suppression in the subquery and UNION DISTINCT, or defer it to the end with UNION ALL and HAVING COUNT(DISTINCT value) > 1. You'll need to benchmark the two methods.

  • 更多推荐

    MySQL检查时间范围内值变化的存在

    本文发布于:2023-10-24 19:40:34,感谢您对本站的认可!
    本文链接:https://www.elefans.com/category/jswz/34/1524821.html
    版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
    本文标签:范围内   时间   MySQL

    发布评论

    评论列表 (有 0 条评论)
    草根站长

    >www.elefans.com

    编程频道|电子爱好者 - 技术资讯及电子产品介绍!