What is the best way to delete duplicate values from a MySQL table?

Updated: 2024-10-23 09:30:02

Problem description

I have the following SQL to delete duplicate values from a table:

DELETE p1
FROM `ProgramsList` p1, `ProgramsList` p2
WHERE p1.CustId = p2.CustId
  AND p1.CustId = 1
  AND p1.`Id` > p2.`Id`
  AND p1.`ProgramName` = p2.`ProgramName`;

Id is auto-incrementing. For a given CustId, ProgramName must be unique (currently it is not). The above SQL takes about 4 to 5 hours to complete on about 1,000,000 records.

Could anyone suggest a quicker way of deleting duplicates from a table?

Recommended answer

First, you might try adding indexes on the ProgramName and CustId columns if you don't already have them.
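As a minimal sketch of that suggestion, a single composite index covering both columns might look like the following. The index name `idx_cust_program` is made up for illustration, and the sketch assumes `ProgramName` is a type that can be indexed in full; a long `VARCHAR` or a `TEXT` column may need a prefix length instead.

```sql
-- Hypothetical index name. CustId comes first because the original
-- query filters on a single CustId value before comparing ProgramName.
CREATE INDEX idx_cust_program ON ProgramsList (CustId, ProgramName);

-- List the table's indexes afterwards to confirm the new one exists.
SHOW INDEX FROM ProgramsList;
```

Building the index on a table of this size takes some time by itself, so it is probably worth doing once, before any of the delete statements are run.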

De-Duping

You can group your records to identify duplicates and, as you do so, grab the minimum Id for each group. Then delete every record whose Id is not one of those minimum Ids.
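Before deleting anything, it may help to look at the groups themselves. The query below is only a read-only preview of the grouping step described above; the aliases `MinID` and `RowsInGroup` are illustrative names.

```sql
-- One row per (CustId, ProgramName) pair that occurs more than once,
-- with the Id that would be kept and the size of the group.
SELECT CustId,
       ProgramName,
       MIN(Id)  AS MinID,
       COUNT(*) AS RowsInGroup
FROM ProgramsList
GROUP BY CustId, ProgramName
HAVING COUNT(*) > 1;
```

If this returns no rows, there are no duplicates left to remove.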

In-Clause Method

DELETE FROM ProgramsList
WHERE Id NOT IN (
    -- the derived table sidesteps MySQL error 1093
    SELECT MinID FROM (
        SELECT MIN(Id) AS MinID FROM ProgramsList GROUP BY ProgramName, CustId
    ) AS keep_ids
);

Join Method

You may have to run this more than once if any group has more than two members, because each run removes only the row with the highest Id from each duplicated group.

DELETE P
FROM ProgramsList AS P
INNER JOIN (
    SELECT COUNT(*) AS Count, MAX(Id) AS MaxID
    FROM ProgramsList
    GROUP BY ProgramName, CustId
) AS A ON A.MaxID = P.Id
WHERE A.Count >= 2;
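If repeated runs are inconvenient, one possible variant (not part of the original answer) joins on the grouping columns and keeps the lowest Id, so every extra row in a group is removed in a single pass. Treat it as a sketch: it assumes `CustId` and `ProgramName` are never NULL, since rows with a NULL in either column would not match the join and would be left alone.

```sql
-- Variant of the join method: keep MIN(Id) per group, delete the rest.
DELETE P
FROM ProgramsList AS P
INNER JOIN (
    SELECT CustId, ProgramName, MIN(Id) AS MinID
    FROM ProgramsList
    GROUP BY CustId, ProgramName
) AS A
    ON  A.CustId = P.CustId
    AND A.ProgramName = P.ProgramName
WHERE P.Id > A.MinID;
```

This ends up with the same surviving rows as the In-Clause method, but expressed as a join against the grouped result rather than a `NOT IN` list.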

Some people have performance issues with the In-Clause; others don't. It depends heavily on your indexes and the like. If one method is too slow, try the other.
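One way to compare the two methods without actually running them is to ask MySQL for the execution plan. The sketch below assumes MySQL 5.6.3 or later, where `EXPLAIN` accepts `DELETE` statements; on older versions the inner `SELECT` can be explained separately instead.

```sql
-- Shows the plan only; no rows are deleted by EXPLAIN.
EXPLAIN
DELETE P
FROM ProgramsList AS P
INNER JOIN (
    SELECT COUNT(*) AS Count, MAX(Id) AS MaxID
    FROM ProgramsList
    GROUP BY ProgramName, CustId
) AS A ON A.MaxID = P.Id
WHERE A.Count >= 2;
```

A plan that scans the whole table for the grouping step, with no index listed in the `key` column, is a hint that an index on the grouping columns is missing or not being used.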