使用 IN(子查询)时性能损失很大.为什么?

编程入门行业动态更新时间:2024-10-27 05:31:50

本文介绍了使用 IN(子查询)时性能损失很大.为什么?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

我使用的是 SQL Server 2005，当我想在 IN 子句中使用子查询时过滤一些结果时，我注意到了一些奇怪的事情.例如，这是我当前的查询，它平均运行 70 秒:

I am using SQL Server 2005 and I have noticed something strange when I want to filter some results when using a subquery within an IN clause. For instance, this is my current query and it runs in 70 seconds on average:

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales from Archive where CustomerID = 20 and ReportDate = '2/3/2011' and Phone in ( select Phone from PlanDetails where Phone is not null and Length is not null and PlannedImp > 0 and CustomerID = 20 and (StatusID <> 2 and StatusID <> 7) and SubcategoryID = 88 ) group by Phone, ZipCode

但是，如果我将它们分解为 2 个单独的查询，则每个查询的运行时间不到 1 秒.

However, if I break them down into 2 separate queries, they take under 1 second each to run.

select Phone from PlanDetails where Phone is not null and Length is not null and PlannedImp > 0 and CustomerID = 20 and (StatusID <> 2 and StatusID <> 7) and SubcategoryID = 88

和

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales from Archive where CustomerID = 20 and ReportDate = '2/3/2011' group by Phone, ZipCode

最后，如果我这样做，它会返回与第一个查询相同的结果，但大约需要 2-3 秒:

Finally, if I do this it returns the same results as the first query, but in about 2-3 seconds:

select Phone into #tempTable from PlanDetails where Phone is not null and Length is not null and PlannedImp > 0 and CustomerID = 20 and (StatusID <> 2 and StatusID <> 7) and SubcategoryID = 88 select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales from Archive where CustomerID = 20 and ReportDate = '2/3/2011' and Phone in ( select Phone from #tempTable ) group by Phone, ZipCode

在过去的几周里，我一直注意到不仅这个查询很慢，而且任何在 IN 子句中使用(有点复杂)子查询的查询只会破坏性能.这是什么原因?

For the past few weeks I have been noticing that not just this query is slow, but any query using a (somewhat complicated) subquery inside of an IN clause just ruins the performance. What is the reason for that?

可用于这些查询中的任何一个的唯一索引是两个表的 CustomerID 上的非聚集索引.我查看了慢速查询和快速查询的执行计划，发现存档表上的非聚集索引查找是迄今为止成本最高的百分比 (80-90%).但是，唯一的区别是慢查询中的这一步的 CPU 成本为 7.1，而快速查询的 CPU 成本为 1.7.

The only indexes that are available for either of these queries to use is a nonclustered index on CustomerID for both tables. I looked at the execution plans of both the slow query and the fast ones, and saw that the nonclustered index seek on the Archive table is what takes the highest percentage of cost by far (80-90%). However, the only difference is that that step in the slow query has a CPU cost of 7.1 while the fast ones have a CPU cost of 1.7.

推荐答案

这取决于数据库系统、版本、设置等，但通常最终发生的是数据库无法(或拒绝)缓存该内部查询，因此它会在外部查询的每次迭代中执行.您正在将问题从 O(n) 效率等级更改为 O(n^2).

It depends upon the database system, the version, the settings, etc. but generally what ends up happening is the database fails (or refuses) to cache that inner query, so it is being executed every single iteration of the outer query. You are changing your problem from a O(n) efficiency class to O(n^2).