MS SQL查询性能

编程入门 行业动态 更新时间:2024-10-26 23:29:11
MS SQL查询性能 - 在vs表变量中(MS SQL query performance - in vs table variable)

我有一些字符串列表(字符串的数量从10到100不等),我需要尽可能高效地从一些大表(100K-5M记录)中选择值。 在我看来,我基本上有3个选项 - 使用'in'子句,使用表变量或使用临时表。 像这样的东西:

select col1, col2, col3, name from my_large_table where index_field1 = 'xxx' and index_field2 = 'yyy' and name in ('name1', 'name2', 'name3', ... 'nameX')

要么

declare @tbl table (name nvarchar(50)) insert @tbl(name) values ('name1', 'name2', 'name3', ... 'nameX') select col1, col2, col3, name from my_large_table inner join @tbl as tbl on (tbl.name = my_large_table.name) where index_field1 = 'xxx' and index_field2 = 'yyy'

大表具有聚簇索引(index_field1,index_field2,name,index_field3)。

实际上,对于每组名称,我从大表中有4-5个查询:根据某些逻辑选择,然后更新和/或插入和/或删除 - 每次限制查询这组名称。

名称集和查询是在.net客户端中动态构建的,因此不存在可读性,代码简单性或类似问题。 唯一的目标是达到最佳性能,因为这批次将被执行很多次。 所以问题是 - 我应该使用'in'子句,表变量或其他东西来写我的条件吗?

I have some list of strings (the number of string varies from 10 to 100) and I need to select values from some large table (100K-5M records) as efficiently as I can. It seems to me that I have 3 options basically - to use 'in' clause, to use table variable or to use temp table. something like this:

select col1, col2, col3, name from my_large_table where index_field1 = 'xxx' and index_field2 = 'yyy' and name in ('name1', 'name2', 'name3', ... 'nameX')

or

declare @tbl table (name nvarchar(50)) insert @tbl(name) values ('name1', 'name2', 'name3', ... 'nameX') select col1, col2, col3, name from my_large_table inner join @tbl as tbl on (tbl.name = my_large_table.name) where index_field1 = 'xxx' and index_field2 = 'yyy'

The large table has clustered index on (index_field1, index_field2, name, index_field3).

Actually for each set of names I have 4-5 queries from the large table: select, then update and/or insert and/or delete according to some logic - each time constraining the query on this set of names.

The name set and the queries are built dynamically in .net client, so there is no problems of readability, code simplicity or similar. The only goal is to reach the best performance, since this batch will be executed a lot of times. So the question is - should I use 'in' clause, table variable or something else to write my condition?

最满意答案

如前所述,您应该避免将表变量用于相当大的数据,因为它们不允许索引( 此处有更多详细信息)。

如果我说得对,你有多个查询使用同一组名称,所以我建议采用以下方法:

1)创建一个持久表( BufferTable )来保存单词列表: PkId, SessionId, Word 。

2)对于每个会话使用一些单词集:在这里批量插入你的单词(SessionId对于每批查询都是唯一的)。 对于数以百计的单词来说,这应该非常快。

3)写下你的查询,如下所示:

select col1, col2, col3, name from my_large_table LT join BufferTable B ON B.SessionId = @SessionId AND name = B.Word where LT.index_field1 = 'xxx' and LT.index_field2 = 'yyy'

4)SessionId的索引是名称,以获得最佳性能。

这样,您就不必为每个查询推送单词。

BufferTable最好定期清空,因为删除是昂贵的(当没有人在做某些事情时截断它是一个选项)。

As already mentioned you should avoid using table variables for pretty large data, as they do not allow indexes (more details here).

If I got it correctly, you have multiple queries using the same set of names, so I would suggest the following approach:

1) create a persistent table (BufferTable) to hold words list: PkId, SessionId, Word.

2) for each session using some set of words: bulk insert your words here (SessionId will be unique to each batch of queries). This should be very fast for tens-hundreds of words.

3) write your queries like the one below:

select col1, col2, col3, name from my_large_table LT join BufferTable B ON B.SessionId = @SessionId AND name = B.Word where LT.index_field1 = 'xxx' and LT.index_field2 = 'yyy'

4) An index on SessionId is name is required for best performance.

This way, you do not have to push the words for every query.

BufferTable is best emptied periodically, as deletes are expensive (truncate it when nobody is doing somehting on it is an option).

更多推荐

本文发布于:2023-08-02 22:03:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1381537.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:性能   MS   SQL

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!