MySQL检查时间范围内值变化的存在

编程入门行业动态更新时间:2024-10-24 01:56:31

本文介绍了MySQL检查时间范围内值变化的存在的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

为了简单起见，我在MySQL中有以下表格：

Suppose for simplicity that I have the following table in MySQL:

CREATE TABLE `events` ( `pv_name` varchar(60) NOT NULL, `time_stamp` bigint(20) unsigned NOT NULL, `value` text, `value_valid` tinyint(1) NOT NULL, PRIMARY KEY (`pv_name`,`time_stamp`), ) ENGINE=InnoDB;

我试图找到最有效的查询来实现下面的等效：

I am trying to find the most efficient query to implement the equivalent of the following:

给定一对时间戳 t0 和 t1 ：

每个 pv_name ：

从 pv_name 的行获取值，最大的 time_stamp < t0 （如果存在）。这是在时间范围开始时的过程变量的值。如果此值无效，请舍弃它。

Get the value from the row with this pv_name and the largest time_stamp <= t0 (if one exists). This is the value of the process variable at the beginning of the time range. If this value is not valid then discard it.

从此 pv_name 。

如果在1和2的组合值集合中有多个不同的值，则返回pv_name。

If there is more than one distinct value among the combined set of values from 1 and 2 then return the pv_name.

实质上，我试图找出哪些过程变量在给定时间范围内有值的变化，包括从其在时间范围开始时的值的变化。

In essence I am trying to find which process variables had a change in value in the given time range, including a change from the value that it had at the beginning of the time range.

表中有数十亿行的顺序，它将继续增长。在表中有大约100,000个不同的pv_names，它们将保持相当静态。

There are on the order of billions of rows in the table and it will continue to grow. There are on the order of 100,000 distinct pv_names in the table and they will remain fairly static. The vast majority of adjacent values (ordered by time_stamp for each pv_name) are expected to be distinct.

EDIT

如果我要从头开始实现这个，我会做以下：pv_names的集合将被存储在一个trie。特里结构中每个pv_name的值将是到二叉搜索树的链接。二叉搜索树将存储键，值对（time_stamp，value）。这些对中的每个对的值将是在相应的time_stamp的pv_name的值。

If I was going to implement this from scratch I would do the following: The set of pv_names would be stored in a trie. The value for each pv_name in the trie would be a link to a binary search tree. The binary search tree would store key, value pairs of (time_stamp, value). The value in each of these pairs would be the value of the pv_name at the corresponding time_stamp.

要找出哪些pv_names在给定time_range的值有变化，t1）我将执行以下操作：遍历trie中的每个pv_name，并按照链接到其二叉搜索树。在此树中查找最小的time_stamp小于或等于t0。如果不存在，则在此树中找到小于t1的最小time_stamp。如果没有这些，则进入到trie中的下一个pv_name。否则，以递增顺序遍历time_stamps，将与当前time_stamp相关联的值与与上一个相关联的值进行比较。如果它们不同，请打印出pv_name。停止迭代time_stamps。转到trie中的下一个pv_name并重复。如果达到大于或等于t1的time_stamp，并且没有找到差异，则转到trie中的下一个pv_name并重复。不要在比较中使用time_stamp t1的值。

To find out which pv_names had a change in value for a given time_range (t0, t1) I would do the following: Iterate through each pv_name in the trie and follow the link to its binary search tree. Find the greatest time_stamp in this tree less than or equal to t0. If none exists, find the smallest time_stamp in this tree less than t1. If none of these exist go to the next pv_name in the trie. Otherwise, iterate through the time_stamps in increasing order comparing the value associated with the current time_stamp to the value associated with the previous. If they differ, print out the pv_name. Stop iterating through the time_stamps. Go to the next pv_name in the trie and repeat. If a time_stamp greater than or equal to t1 is arrived at, and no differences have been found, then go to the next pv_name in the trie and repeat. Do not use the value for the time_stamp t1 in the comparisons.

Simplified example: pv_name | time_stamp | value A | 1.0 | 1.15 B | 2.0 | 1.00 A | 3.0 | 1.12 B | 4.0 | 1.00 A | 5.0 | 1.00 B | 6.0 | 1.00 A | 7.0 | 3.15 B | 8.0 | 9.13 A | 9.0 | 4.30 B | 10.0 | 1.00 A | 11.0 | 9.00 B | 12.0 | 1.00 time range | values of A | values of B | result (0.0,0.5) | NULL | NULL | NULL (1.5,2.0) | 1.15 | NULL | NULL (1.5,5.0) | 1.15, 1.12 | NULL, 1.00, 1.00 | A (4.0,9.0) | 1.12, 1.00, 3.15 | 1.00, 1.00, 9.13 | A, B (13.0,14.0) | 9.00 | 1.00 | NULL

我可以在MySQL或其他数据库中以相同或更高的效率执行相同的操作？

Can I do the equivalent with the same or better efficiency in MySQL or another database, relational or otherwise?

推荐答案

SELECT pv_name FROM ( -- Query for step 1 SELECT e1.pv_name, e3.value FROM (SELECT pv_name, MAX(time_stamp) AS start_time FROM events WHERE time_stamp <= @t0 GROUP BY pv_name) AS e1 JOIN events AS e3 ON e3.pv_name = e1.pv_name AND e3.time_stamp = e1.start_time WHERE e3.value_valid UNION DISTINCT -- Query for step 2 SELECT DISTINCT pv_name, value FROM events WHERE time_stamp BETWEEN @t0 AND t1 AND value_valid ) AS x GROUP BY pv_name HAVING COUNT(*) > 1

联合中的第一个子查询使用 SQL只选择列上具有最大值的行。我认为它应该是非常有效的，因为你的主键，但你可以尝试其中的一种其他技术。

The first subquery in the union uses one of the techniques in SQL Select only rows with Max Value on a Column. I think it should be pretty efficient because of your primary key, but you could try one of the other techniques there.

第二个子查询获取所有的值行 t0 - t1 时间范围，删除重复的值。

The second subquery gets all the valild rows in the t0 - t1 time range, removing duplicate values.

我不知道是否更有效在子查询和 UNION DISTINCT 中减少值，或者使用 UNION ALL 和 HAVING COUNT（DISTINCT value）> 1 。您需要对这两种方法进行基准化。

I don't know if it's more efficient to do the duplicate value suppression in the subquery and UNION DISTINCT, or defer it to the end with UNION ALL and HAVING COUNT(DISTINCT value) > 1. You'll need to benchmark the two methods.

更多推荐

MySQL检查时间范围内值变化的存在

本文发布于:2023-10-24 19:40:34，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1524821.html