我正在建立一个数据仓库。每个事实都有其时间戳。我需要按天,月,季度但按小时创建报告。通过查看示例,我看到日期倾向于保存在维表中。 (来源: etl-tools.info )
I'm building a data warehouse. Each fact has it's timestamp. I need to create reports by day, month, quarter but by hours too. Looking at the examples I see that dates tend to be saved in dimension tables. (source: etl-tools.info)
但是我认为,时间没有意义。尺寸表会越来越大。另一方面,具有日期维表的JOIN比在 SQL 中使用日期/时间函数更有效。
But I think, that it makes no sense for time. The dimension table would grow and grow. On the other hand JOIN with date dimension table is more efficient than using date/time functions in SQL.
您有什么意见/解决方案?
What are your opinions/solutions ?
(我正在使用Infobright)
(I'm using Infobright)
推荐答案我的猜测是,这取决于您的报告要求。 如果您需要类似
My guess is that it depends on your reporting requirement. If you need need something like
WHERE "Hour" = 10表示每天10:00:00到10:59:59之间的时间,那么我会使用时间维度,因为它快于
meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than
WHERE date_part('hour', TimeStamp) = 10因为date_part()函数将针对每一行进行评估。 您仍应在事实表中保留时间戳,以便在几天的边界内进行汇总,例如:
because the date_part() function will be evaluated for every row. You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:
WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15'使用维度字段时会很尴尬。
which gets awkward when using dimension fields.
通常,时间维度具有分钟分辨率,因此有1440行。
Usually, time dimension has a minute resolution, so 1440 rows.
更多推荐
数据仓库中的时间和日期维度
发布评论