What is the performance difference between MySQL relational division (IN AND instead of IN OR) implementations?

Updated: 2024-10-14 20:26:45

Problem description


Because MySQL does not have a built-in relational division operator, programmers must implement their own. Two leading example implementations can be found in this answer.

For posterity I'll list them below:

Using GROUP BY/HAVING

    SELECT t.documentid
    FROM TABLE t
    WHERE t.termid IN (1,2,3)
    GROUP BY t.documentid
    HAVING COUNT(DISTINCT t.termid) = 3

The caveat is that you have to use HAVING COUNT(DISTINCT ...) because a duplicate row (for example, termid 2 stored twice for the same documentid) would otherwise produce a false positive. The COUNT must also equal the number of termid values in the IN clause.
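
To make the caveat concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL purely so that it runs without a server, and the table name doc_term and its rows are hypothetical. Document 20 stores termid 2 twice and has no termid 3, so a plain COUNT(*) wrongly reports it as a match.

```python
import sqlite3

# Hypothetical table and data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_term (documentid INTEGER, termid INTEGER)")
conn.executemany(
    "INSERT INTO doc_term VALUES (?, ?)",
    [
        (10, 1), (10, 2), (10, 3),   # document 10 has all three terms
        (20, 1), (20, 2), (20, 2),   # document 20 has term 2 twice, no term 3
    ],
)

def matching_docs(count_expr):
    """Run the GROUP BY / HAVING division with the given COUNT expression."""
    sql = (
        "SELECT documentid FROM doc_term "
        "WHERE termid IN (1, 2, 3) "
        "GROUP BY documentid "
        f"HAVING {count_expr} = 3 "
        "ORDER BY documentid"
    )
    return [row[0] for row in conn.execute(sql)]

naive = matching_docs("COUNT(*)")                   # counts duplicate rows too
correct = matching_docs("COUNT(DISTINCT termid)")   # counts distinct terms only
print("COUNT(*):", naive)
print("COUNT(DISTINCT termid):", correct)
```

If the table has a unique key on (documentid, termid), duplicates cannot occur and the two expressions agree, but DISTINCT is the safer default.
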
Using JOINs

    SELECT t.documentid
    FROM TABLE t
    JOIN TABLE x ON x.documentid = t.documentid AND x.termid = 1
    JOIN TABLE y ON y.documentid = t.documentid AND y.termid = 2
    JOIN TABLE z ON z.documentid = t.documentid AND z.termid = 3

But this one can be a pain when the criteria change a lot.
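
One way to ease that pain is to generate the self-joins from the list of required terms. The following is only a sketch under assumptions: build_join_division, the doc_term table, and the sample rows are hypothetical, and sqlite3 is used instead of MySQL so the example is self-contained. Values are passed as bound parameters rather than spliced into the SQL text.

```python
import sqlite3

def build_join_division(term_ids):
    """Build a self-join query that requires every termid in term_ids.

    Returns (sql, params). Table and column names are hypothetical.
    """
    if not term_ids:
        raise ValueError("need at least one termid")
    joins = []
    for i in range(1, len(term_ids)):
        joins.append(
            f"JOIN doc_term t{i} "
            f"ON t{i}.documentid = t0.documentid AND t{i}.termid = ?"
        )
    sql = (
        "SELECT DISTINCT t0.documentid FROM doc_term t0 "
        + " ".join(joins)
        + " WHERE t0.termid = ? ORDER BY t0.documentid"
    )
    # Placeholders appear in the JOIN clauses first, then in the WHERE clause.
    params = list(term_ids[1:]) + [term_ids[0]]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_term (documentid INTEGER, termid INTEGER)")
conn.executemany(
    "INSERT INTO doc_term VALUES (?, ?)",
    [(10, 1), (10, 2), (10, 3), (20, 1), (20, 2), (30, 2), (30, 3)],
)

sql, params = build_join_division([1, 2, 3])
all_three = [row[0] for row in conn.execute(sql, params)]

sql, params = build_join_division([1, 2])
first_two = [row[0] for row in conn.execute(sql, params)]

print(all_three, first_two)
```

The number of joins grows with the number of terms, so this keeps the query text manageable but does not remove the underlying cost of one join per criterion.
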

Of these two implementation techniques, which one would offer the best performance?
Solution
I made some improvements in the JOIN version; see below.

I vote for the JOIN approach for speed. Here's how I determined it:

HAVING, version 1
    mysql> FLUSH STATUS;
    mysql> SELECT city
        ->     FROM us_vch200
        ->     WHERE state IN ('IL', 'MO', 'PA')
        ->     GROUP BY city
        ->     HAVING count(DISTINCT state) >= 3;
    +-------------+
    | city        |
    +-------------+
    | Springfield |
    | Washington  |
    +-------------+
    mysql> SHOW SESSION STATUS LIKE 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_external_lock      | 2     |
    | Handler_read_first         | 1     |
    | Handler_read_key           | 2     |
    | Handler_read_last          | 1     |
    | Handler_read_next          | 4175  |  -- full index scan
    (etc)

    +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
    | id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                            |
    +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
    |  1 | SIMPLE      | us_vch200 | range | state_city,city_state | city_state | 769     | NULL | 4176 | Using where; Using index for group-by (scanning) |
    +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+

The 'Extra' points out that it decided to tackle the GROUP BY and use INDEX(city, state) even though INDEX(state, city) might make sense.

HAVING, version 2

Making it switch to INDEX(state, city) yields:
    mysql> FLUSH STATUS;
    mysql> SELECT city
        ->     FROM us_vch200 IGNORE INDEX(city_state)
        ->     WHERE state IN ('IL', 'MO', 'PA')
        ->     GROUP BY city
        ->     HAVING count(DISTINCT state) >= 3;
    +-------------+
    | city        |
    +-------------+
    | Springfield |
    | Washington  |
    +-------------+
    mysql> SHOW SESSION STATUS LIKE 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_commit             | 1     |
    | Handler_external_lock      | 2     |
    | Handler_read_key           | 401   |
    | Handler_read_next          | 398   |
    | Handler_read_rnd           | 398   |
    (etc)

    +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
    | id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                    |
    +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
    |  1 | SIMPLE      | us_vch200 | range | state_city,city_state | state_city | 2       | NULL |  397 | Using where; Using index; Using filesort |
    +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+

JOIN
    mysql> SELECT x.city
        ->     FROM us_vch200 x
        ->     JOIN us_vch200 y ON y.city = x.city AND y.state = 'MO'
        ->     JOIN us_vch200 z ON z.city = x.city AND z.state = 'PA'
        ->     WHERE x.state = 'IL';
    +-------------+
    | city        |
    +-------------+
    | Springfield |
    | Washington  |
    +-------------+
    2 rows in set (0.00 sec)

    mysql> SHOW SESSION STATUS LIKE 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_commit             | 1     |
    | Handler_external_lock      | 6     |
    | Handler_read_key           | 86    |
    | Handler_read_next          | 87    |
    (etc)

    +----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
    | id | select_type | table | type | possible_keys         | key        | key_len | ref                | rows | Extra                    |
    +----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
    |  1 | SIMPLE      | y     | ref  | state_city,city_state | state_city | 2       | const              |   81 | Using where; Using index |
    |  1 | SIMPLE      | z     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
    |  1 | SIMPLE      | x     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
    +----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+

Only INDEX(state, city) is needed. The Handler numbers are the smallest for this formulation, so I deduce that it is the fastest.

Notice how the optimizer made up its own mind which table to start with, probably due to

    +-------+----------+
    | state | COUNT(*) |
    +-------+----------+
    | IL    |      221 |
    | MO    |       81 |  -- smallest
    | PA    |       96 |
    +-------+----------+

Conclusions

JOIN (without the unnecessary t table) is probably the fastest. Plus this composite index is needed: INDEX(state, city).

To translate back to your use case:
    city  --> documentid
    state --> termid
Caveat: YMMV, because the distribution of values for documentid and termid could be quite different from the one in the test case I used.
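
As a rough illustration of that translation, the sketch below rewrites the JOIN formulation (without the extra t table) in terms of documentid and termid and checks that it returns the same documents as the GROUP BY/HAVING form. The doc_term table, the index name, and the rows are hypothetical, and it assumes each (documentid, termid) pair is stored once. It runs on Python's built-in sqlite3 rather than MySQL, so it demonstrates only that the two queries agree; it says nothing about MySQL's Handler counters or timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_term (documentid INTEGER, termid INTEGER)")
# Composite index mirroring INDEX(state, city) -> (termid, documentid).
conn.execute("CREATE INDEX termid_documentid ON doc_term (termid, documentid)")
conn.executemany(
    "INSERT INTO doc_term VALUES (?, ?)",
    [
        (10, 1), (10, 2), (10, 3),
        (20, 1), (20, 3),
        (30, 1), (30, 2), (30, 3),
        (40, 2),
    ],
)

# JOIN formulation: one alias per required term, no extra driving table.
join_sql = """
SELECT x.documentid
FROM doc_term x
JOIN doc_term y ON y.documentid = x.documentid AND y.termid = 2
JOIN doc_term z ON z.documentid = x.documentid AND z.termid = 3
WHERE x.termid = 1
ORDER BY x.documentid
"""

# GROUP BY / HAVING formulation of the same relational division.
having_sql = """
SELECT documentid
FROM doc_term
WHERE termid IN (1, 2, 3)
GROUP BY documentid
HAVING COUNT(DISTINCT termid) = 3
ORDER BY documentid
"""

join_docs = [row[0] for row in conn.execute(join_sql)]
having_docs = [row[0] for row in conn.execute(having_sql)]
print(join_docs, having_docs)
```

To compare performance on real data, the same pair of queries would need to be run on MySQL with FLUSH STATUS and SHOW SESSION STATUS LIKE 'Handler%' around each one, as in the measurements above.
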


Published: 2023-11-24 13:23:56

Article link: https://www.elefans.com/category/jswz/34/1625374.html
