SQL Server 2008 filling gaps with dimension

I have a data table as below

    #data
    ---------------------
    Account  AccountType
    ---------------------
    1        2
    2        0
    3        5
    4        2
    5        1
    6        5

AccountType 2 marks a header and 5 marks a total. An account of type 2 has to look down to the next account of type 1 or 0 to determine whether its Dim value is 1 or 0. A total of type 5 has to look up to the nearest account of type 1 or 0 to determine its Dim value. Accounts of type 1 or 0 use their own type as Dim.

Accounts of type 2 appear as islands, so it's not enough to just check RowNumber + 1; the same goes for accounts of type 5.
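To make the island idea concrete, here is a minimal sketch of the `ROW_NUMBER() - Account` grouping trick on a small made-up sample. The table variable `@sample` and its seven rows are assumptions for illustration only, chosen so that two headers sit next to each other. The trick relies on Account values being contiguous, as they are with the IDENTITY column here.

```sql
-- Hypothetical sample: two consecutive headers (type 2), then two type-0 rows, etc.
DECLARE @sample TABLE (Account INT PRIMARY KEY, AccountType INT);
INSERT INTO @sample (Account, AccountType)
VALUES (1, 2), (2, 2), (3, 0), (4, 0), (5, 5), (6, 2), (7, 1);

WITH g AS
(
    -- Within one AccountType, consecutive accounts share the same
    -- (row number - Account) value, so that difference identifies an island.
    SELECT Account,
           AccountType,
           ROW_NUMBER() OVER (PARTITION BY AccountType ORDER BY Account) - Account AS GroupId
    FROM @sample
)
SELECT MIN(Account) AS StartRow,
       MAX(Account) AS EndRow,
       AccountType
FROM g
GROUP BY GroupId, AccountType
ORDER BY StartRow;

-- Result:
-- StartRow EndRow AccountType
-- 1        2      2
-- 3        4      0
-- 5        5      5
-- 6        6      2
-- 7        7      1
```

Accounts 1 and 2 collapse into one header island, which is why a plain "row + 1" check on individual rows would miss the Dim for account 1.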

I have arrived at the following table using CTEs, but I can't find a quick way to get from here to my final result of Account, AccountType, Dim for all accounts.

    T3
    ------------------------------------
    StartRow  EndRow  AccountType  Dim
    ------------------------------------
    1         1       2            0
    2         2       0            0
    3         3       5            0
    4         4       2            1
    5         5       1            1
    6         6       5            1

The code below is MS T-SQL; copy and paste all of it to see it run. The final join on the CTE select statement is extremely slow: even for 500 rows it takes 30 seconds, and I have 100,000 rows to handle. I have written a cursor-based solution that does it in 10-20 seconds, which is workable, and a fast recursive CTE solution that does it in 5 seconds for 100,000 rows, but it depends on the fragmentation of the #data table. I should add that this is simplified: the real problem has a lot more dimensions that need to be taken into account, but it works the same way as this simple problem.

Anyway, is there a fast way to do this using joins or another set-based solution?

    SET NOCOUNT ON

    IF OBJECT_ID('tempdb..#data') IS NOT NULL
        DROP TABLE #data

    CREATE TABLE #data
    (
        Account INTEGER IDENTITY(1,1),
        AccountType INTEGER,
    )

    BEGIN -- TEST DATA
        DECLARE @Counter INTEGER = 0
        DECLARE @MaxDataRows INTEGER = 50 -- Change here to check performance
        DECLARE @Type INTEGER

        WHILE(@Counter < @MaxDataRows)
        BEGIN
            SET @Type = CASE
                            WHEN @Counter % 10 < 3 THEN 2
                            WHEN @Counter % 10 >= 8 THEN 5
                            WHEN @Counter % 10 >= 3 THEN (CASE WHEN @Counter < @MaxDataRows / 2.0 THEN 0 ELSE 1 END)
                            ELSE 0
                        END

            INSERT INTO #data VALUES(@Type)
            SET @Counter = @Counter + 1
        END
    END -- TEST DATA END

    ;WITH groupIds_cte AS
    (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY AccountType ORDER BY Account) - Account AS GroupId
        FROM #data
    ),
    islandRanges_cte AS
    (
        SELECT MIN(Account) AS StartRow,
               MAX(Account) AS EndRow,
               AccountType
        FROM groupIds_cte
        GROUP BY GroupId, AccountType
    ),
    T3 AS
    (
        SELECT I.*, J.AccountType AS Dim
        FROM islandRanges_cte I
        INNER JOIN islandRanges_cte J
            ON (I.EndRow + 1 = J.StartRow AND I.AccountType = 2)
        UNION ALL
        SELECT I.*, J.AccountType AS Dim
        FROM islandRanges_cte I
        INNER JOIN islandRanges_cte J
            ON (I.StartRow - 1 = J.EndRow AND I.AccountType = 5)
        UNION ALL
        SELECT *, AccountType AS Dim
        FROM islandRanges_cte
        WHERE AccountType = 0 OR AccountType = 1
    ),
    T4 AS
    (
        SELECT Account, Dim
        FROM
        (
            SELECT FlattenRow AS Account, StartRow, EndRow, Dim
            FROM T3 I
            CROSS APPLY (VALUES(StartRow),(EndRow)) newValues (FlattenRow)
        ) T
    )
    --SELECT * FROM T3 ORDER BY StartRow
    --SELECT * FROM T4 ORDER BY Account

    -- Final correct result but very very slow
    SELECT D.Account, D.AccountType, I.Dim
    FROM T3 I
    INNER JOIN #data D
        ON D.Account BETWEEN I.StartRow AND I.EndRow
    ORDER BY Account

EDIT: some timing tests

    SET NOCOUNT ON

    IF OBJECT_ID('tempdb..#times') IS NULL
        CREATE TABLE #times
        (
            RecId INTEGER IDENTITY(1,1),
            Batch INTEGER,
            Method NVARCHAR(255),
            MethodDescription NVARCHAR(255),
            RunTime INTEGER
        )

    IF OBJECT_ID('tempdb..#batch') IS NULL
        CREATE TABLE #batch
        (
            Batch INTEGER IDENTITY(1,1),
            Bit BIT
        )

    INSERT INTO #batch VALUES(0)

    IF OBJECT_ID('tempdb..#data') IS NOT NULL
        DROP TABLE #data

    CREATE TABLE #data
    (
        Account INTEGER
    )
    CREATE NONCLUSTERED INDEX data_account_index ON #data (Account)

    IF OBJECT_ID('tempdb..#islands') IS NOT NULL
        DROP TABLE #islands

    CREATE TABLE #islands
    (
        AccountFrom INTEGER,
        AccountTo INTEGER,
        Dim INTEGER,
    )
    CREATE NONCLUSTERED INDEX islands_from_index ON #islands (AccountFrom, AccountTo, Dim)

    BEGIN -- TEST DATA
        INSERT INTO #data
        SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY t1.number) AS N
        FROM master..spt_values t1
        CROSS JOIN master..spt_values t2

        INSERT INTO #islands
        SELECT MIN(Account) AS Start, MAX(Account), Grp
        FROM (SELECT *, NTILE(10) OVER (ORDER BY Account) AS Grp FROM #data) T
        GROUP BY Grp
        ORDER BY Start
    END -- TEST DATA END

    --SELECT * FROM #data
    --SELECT * FROM #islands
    --PRINT CONVERT(varchar(20),DATEDIFF(MS,@RunDate,GETDATE()))+' ms Sub Query'

    DECLARE @RunDate datetime

    -- Method 1: correlated subquery
    SET @RunDate=GETDATE()
    SELECT Account,
           (SELECT Dim FROM #islands WHERE Account BETWEEN AccountFrom AND AccountTo) AS Dim
    FROM #data
    INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch), 'subquery', '', DATEDIFF(MS,@RunDate,GETDATE()))

    -- Method 2: CROSS APPLY
    SET @RunDate=GETDATE()
    SELECT D.Account, V.Dim
    FROM #data D
    CROSS APPLY
    (
        SELECT Dim
        FROM #islands V
        WHERE D.Account BETWEEN V.AccountFrom AND V.AccountTo
    ) V
    INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch), 'crossapply', '', DATEDIFF(MS,@RunDate,GETDATE()))

    -- Method 3: range join
    SET @RunDate=GETDATE()
    SELECT D.Account, I.Dim
    FROM #data D
    JOIN #islands I
        ON D.Account BETWEEN I.AccountFrom AND I.AccountTo
    INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch), 'join', '', DATEDIFF(MS,@RunDate,GETDATE()))

    -- Method 4: recursive CTE that expands each island into its rows
    SET @RunDate=GETDATE()
    ;WITH cte AS
    (
        SELECT Account, AccountFrom, AccountTo, Dim, 1 AS Counting
        FROM #islands
        CROSS APPLY (VALUES(AccountFrom),(AccountTo)) V (Account)
        UNION ALL
        SELECT Account + 1, AccountFrom, AccountTo, Dim, Counting + 1
        FROM cte
        WHERE (Account + 1) > AccountFrom AND (Account + 1) < AccountTo
    )
    SELECT Account, Dim, Counting
    FROM cte
    OPTION(MAXRECURSION 32767)
    INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch), 'recursivecte', '', DATEDIFF(MS,@RunDate,GETDATE()))

You can select from the #times table to see the run times :)
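For example, a minimal sketch of a summary query over #times, assuming the timing script above has been run at least once in the same session. The column aliases are illustrative, and RunTime is in milliseconds because it was recorded with DATEDIFF(MS, ...).

```sql
-- Summarize the recorded run times per method, fastest first.
SELECT Method,
       COUNT(*)     AS Runs,
       MIN(RunTime) AS MinMs,
       AVG(RunTime) AS AvgMs,
       MAX(RunTime) AS MaxMs
FROM #times
GROUP BY Method
ORDER BY AvgMs;
```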




I think you want a join, but using an inequality rather than an equality:

    select tt.id, tt.dim1, it.dim2
    from TallyTable tt
    join IslandsTable it
        on tt.id between it."from" and it."to"

This works for the data that you provide in the question.

Here is another idea that might work. Here is the query:

    select d.*,
           (select top 1 AccountType
            from #data d2
            where d2.Account > d.Account
              and d2.AccountType not in (2, 5)
           ) nextAccountType
    from #data d
    order by d.account;

I just ran this on 50,000 rows and this version took 17 seconds on my system. Changing the table to:

    CREATE TABLE #data
    (
        Account INTEGER IDENTITY(1,1) primary key,
        AccountType INTEGER,
    );

has actually slowed it down to about 1:33, quite to my surprise. Perhaps one of these will help you.
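As a rough sketch of how the correlated-lookup idea could cover both rules from the question (headers look down, totals look up), the query below uses one OUTER APPLY per direction. The table variable `@data` stands in for #data, and the aliases `nxt` and `prv` are made up for illustration. Each lookup has an explicit ORDER BY, since TOP without ORDER BY does not guarantee which row comes back. This has not been benchmarked, so treat it as a correctness illustration rather than a performance fix.

```sql
-- Sample data from the question, in a table variable standing in for #data.
DECLARE @data TABLE (Account INT PRIMARY KEY, AccountType INT);
INSERT INTO @data (Account, AccountType)
VALUES (1, 2), (2, 0), (3, 5), (4, 2), (5, 1), (6, 5);

SELECT d.Account,
       d.AccountType,
       CASE d.AccountType
           WHEN 2 THEN nxt.AccountType  -- header: next 0/1 account below
           WHEN 5 THEN prv.AccountType  -- total: nearest 0/1 account above
           ELSE d.AccountType           -- 0/1 accounts use their own type
       END AS Dim
FROM @data d
OUTER APPLY
(
    SELECT TOP (1) d2.AccountType
    FROM @data d2
    WHERE d2.Account > d.Account
      AND d2.AccountType IN (0, 1)
    ORDER BY d2.Account
) nxt
OUTER APPLY
(
    SELECT TOP (1) d2.AccountType
    FROM @data d2
    WHERE d2.Account < d.Account
      AND d2.AccountType IN (0, 1)
    ORDER BY d2.Account DESC
) prv
ORDER BY d.Account;

-- Result:
-- Account AccountType Dim
-- 1       2           0
-- 2       0           0
-- 3       5           0
-- 4       2           1
-- 5       1           1
-- 6       5           1
```

OUTER APPLY is used rather than CROSS APPLY so that a header with no 0/1 account after it, or a total with none before it, still appears in the output with a NULL Dim.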


