非标识列上的聚集索引可加速批量插入？(Clustered indexes on non

编程入门行业动态更新时间:2024-10-09 03:19:52

非标识列上的聚集索引可加速批量插入？(Clustered indexes on non-identity columns to speed up bulk inserts?)

我的两个问题是：

我可以使用聚簇索引来加速大表中的大容量插入吗？如果我的IDENTITY列不再是聚簇索引，那么我能否仍然有效地使用外键关系？

详细说来，我有一个包含公司数据的非常大的数据库（在100-1000毫升行之间）。通常，在这样的表格中存在关于20-40家公司的数据，每个数据都是由“CompanyIdentifier”（INT）标记的他们自己的“块”。此外，每个公司都有大约20个部门，每个部门都有自己的“子部门”，由“部门标识”（INT）标注。

经常发生的是，整个“块”或“子块”被添加或从表中删除。我的第一个想法是在这些块上使用表分区，但是因为我使用的是SQL Server 2008标准版，所以我无权使用它。尽管如此，大多数查询都是在“块”或“子块”上执行的，而不是作为一个整体执行。

我一直在努力为以下功能优化这些表格：

在子弹上运行的查询 “Benchmarking”作为一个整体在桌面上运行的查询插入/删除大块数据。

对于1）和2）我没有遇到很多问题。我在关键字段（也包含CompanyIdentifier和DepartmentIdentifier在哪里有用）创建了几个索引，查询运行良好。

但是，3）我一直在努力寻找一个好的解决方案。我的第一个策略是始终禁用索引，批量插入大块和重建索引。这在开始时非常快，但是现在数据库中有很多公司，每次重建索引需要很长时间。

目前我的策略已经改变，只是在插入时离开索引，因为现在看起来更快。但我想进一步优化插入速度。

我似乎已经注意到，通过添加在CompanyIdentifier + DepartmentIdentifier上定义的聚簇索引，向表中加载新“块”的速度更快。在放弃此策略之前，我倾向于在IDENTITY列上添加聚簇索引，因为有几篇文章指出聚簇索引包含在所有其他索引中，因此聚簇索引应该尽可能小。但现在我正在考虑重振这一旧策略来加速插入。我的问题是，这是明智的，还是我会在其他领域遭受打击？这会真的加快我的插入速度，还是只是我的想象？

我也不确定在我的情况下是否真的需要IDENTITY列。我希望能够与其他表建立外键关系，但是我也可以使用类似CompanyIdentifier + DepartmentIdentifier + [uniquifier]的方案吗？或者它是否必须是一个表格范围，零碎的IDENTITY号码？

任何建议或解释都非常感谢。

My two questions are:

Can I use clustered indexes to speed up bulk inserts in big tables? Can I then still efficiently use foreign key relationships if my IDENTITY column is not the clustered index anymore?

To elaborate, I have a database with a couple of very big (between 100-1000 mln rows) tables containing company data. Typically there is data about 20-40 companies in such a table, each as their own "chunk" marked by "CompanyIdentifier" (INT). Also, every company has about 20 departments, each with their own "subchunk" marked by "DepartmentIdentifier" (INT).

It frequently happens that a whole "chunk" or "subchunk" is added or removed from the table. My first thought was to use Table Partitioning on those chunks, but since I am using SQL Server 2008 Standard Edition I am not entitled to it. Still, most queries I have are executed on a "chunk" or "subchunk" rather than on the table as a whole.

I have been working to optimize these tables for the following functions:

Queries that are run on subchunks "Benchmarking" queries that are run on the table as a whole Inserting/removing big chunks of data.

For 1) and 2) I haven't encountered a lot of problems. I have created several indexes on key fields (also containing CompanyIdentifier and DepartmentIdentifier where useful) and the queries are running fine.

But for 3) I have struggled to find a good solution. My first strategy was to always disable indexes, bulk insert a big chunk and rebuild indexes. This was very fast in the beginning, but now that there are a lot of companies in the database, it takes a very long time to rebuild the index each time.

At the moment my strategy has changed to just leaving the index on while inserting, since this seems to be faster now. But I want to optimize the insert speed even further.

I seem to have noticed that by adding a clustered index defined on CompanyIdentifier + DepartmentIdentifier, the loading of new "chunks" into the table is faster. Before I had abandoned this strategy in favour of adding a clustered index on an IDENTITY column, as several articles pointed out to me that the clustered index is contained in all other indexes and so the clustered index should be as small as possible. But now I am thinking of reviving this old strategy to speed up the inserts. My question, would this be wise, or will I suffer performance hits in other areas? And will this really speed up my inserts or is that just my imagination?

I am also not sure whether in my case an IDENTITY column is really needed. I would like to be able to establish foreign key relationships with other tables, but can I also use something like a CompanyIdentifier+DepartmentIdentifier+[uniquifier] scheme for that? Or does it have to be a table-wide, fragmented IDENTITY number?

Thanks a lot for any suggestions or explanations.

最满意答案

好吧，我已经对它进行了测试，并且在两个“块定义”列上放置聚簇索引可以提高我的表的性能。

插入一个块现在相对比我有一个群集IDENTITY键的情况要快，并且与没有任何聚簇索引时一样快。删除块比使用或不使用聚簇索引快。

我认为我希望删除或插入的所有记录都保存在硬盘的某个部分，这样可以使表格更快 - 这对我来说似乎是合乎逻辑的。

更新：经过一年的设计经验，我可以说为了这种工作方式，有必要安排定期重建所有索引（我们每周做一次）。否则，索引很快就会碎片化，性能也会丢失。尽管如此，我们正在使用分区表迁移到新的数据库设计，除了Enterprise Server许可证费用之外，这在各方面都基本上更好，但我们现在已经忘记了这一点。至少我有。

Well, I've put it to the test, and putting a clustered index on the two "chunk-defining" columns increases the performance of my table.

Inserting a chunk is now relatively fast compared to the situation where I had a clustered IDENTITY key, and about as fast as when I did not have any clustered index. Deleting a chunk is faster than with or without clustered index.

I think the fact that all the records I want to delete or insert are guaranteed to be all together on a certain part of the harddisk makes the tables faster - it would seem logical to me.

Update: After a year of experience with this design I can say that for this approach to work, it is necessary to schedule regular rebuilding of all the indexes (we do it once a week). Otherwise, the indexes become fragmented very soon and performance is lost. Nevertheless, we are in a process of migration to a new database design with partitioned tables, which is basically better in every way - except for the Enterprise Server license cost, but we've already forgotten about it by now. At least I have.

更多推荐

本文发布于:2023-07-18 07:41:00，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1157099.html