AWS Redshift pivot table all dimensions

Updated: 2024-10-23 04:37:05
Problem description

I am following this method for pivoting a large table in Redshift:

Pivot a table with Amazon RedShift / PostgreSQL

However, I have a large number of groups to pivot, i.e. m1, m2, ... How can I loop through all distinct values, apply the same logic to each of them, and alias the resulting column names?

Solution

If you want to be able to pivot to an arbitrary number of groups, you can combine the groups into a JSON string and then extract the groups you are interested in with the Redshift JSON functions. You probably do not want to do this for very large data sets, because each extracted column re-parses the JSON text for every row.

Here is the basic idea based on the sample data in the question linked above:

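```sql
select DimensionA,
       DimensionB,
       json_extract_path_text(json_pivot, 'm1') as m1,
       json_extract_path_text(json_pivot, 'm2') as m2
from (
    -- One JSON object per (DimensionA, DimensionB), e.g. {"m1":"v11","m2":"v12"}.
    -- Names and values are quoted explicitly: QUOTE_IDENT may leave simple names
    -- such as m1 unquoted, which is not valid JSON. Assumes no " or \ in the data.
    select DimensionA,
           DimensionB,
           '{' || listagg('"' || MetricName || '":"' || MetricValue || '"', ',')
                  within group (order by MetricName) || '}' as json_pivot
    from to_pivot
    group by DimensionA, DimensionB
) as p;  -- alias the derived table
```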

In practice you would not run it like that. The inner select is what you would use to generate your "pivoted" table (for example, to materialize it once). The outer select shows how to reference specific group values.
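The question also asks how to loop through all distinct values and alias the resulting columns. One way is to avoid hand-writing one `json_extract_path_text` column per group and generate the outer select from the list of distinct metric names instead. You could fetch that list first with something like `SELECT DISTINCT MetricName FROM to_pivot`. Below is a minimal Python sketch under these assumptions: the table name `pivoted` (holding the materialized output of the inner select) and the helper names are hypothetical. The quoting helpers only double embedded quote characters and are not a substitute for validating the input.

```python
# Sketch: generate the outer SELECT for every distinct MetricName.
# Hypothetical assumptions: the inner select was materialized as a table
# named "pivoted" with a json_pivot column; metric names come from a
# query such as: SELECT DISTINCT MetricName FROM to_pivot;


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(value: str) -> str:
    """Quote a value as a SQL identifier, doubling embedded double quotes."""
    return '"' + value.replace('"', '""') + '"'


def build_outer_select(metric_names, table="pivoted",
                       dims=("DimensionA", "DimensionB")):
    """Return a SELECT with one aliased json_extract_path_text column per name."""
    columns = list(dims)
    for name in sorted(set(metric_names)):  # dedupe and keep column order stable
        columns.append(
            f"json_extract_path_text(json_pivot, {sql_literal(name)}) "
            f"AS {sql_identifier(name)}"
        )
    return "SELECT " + ",\n       ".join(columns) + f"\nFROM {table};"


if __name__ == "__main__":
    # In practice the names would come from the DISTINCT query above.
    print(build_outer_select(["m2", "m1", "m1"]))
```

The generated statement can then be executed from whatever client runs the rest of the pipeline. A new metric name only requires re-running the generator, not editing SQL by hand.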

This does not account for duplicate group records for the same dimension combination, like the following:

```
DimensionA  DimensionB  MetricName  MetricValue
----------  ----------  ----------  -----------
dimA1       dimB2       m1          v13
dimA1       dimB2       m1          v23
```

If that is possible in your data, you will have to decide how to handle it. I am not sure how the query behaves as implemented; my guess is that the first occurrence would be extracted.
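One way to make the behavior explicit is to deduplicate before aggregating. For example, you could keep only the first value per (DimensionA, DimensionB, MetricName). In Redshift SQL this would typically be a `ROW_NUMBER() OVER (PARTITION BY ...) = 1` filter applied before the LISTAGG. The Python sketch below only simulates the inner and outer selects on a few illustrative rows (not data from the original question) so the policy is easy to see. Two parts of it are assumptions: "first" means first in input order, and `extract` is only a rough stand-in for `json_extract_path_text`.

```python
import json

# Illustrative rows (DimensionA, DimensionB, MetricName, MetricValue);
# the two m1 rows for (dimA1, dimB2) are the duplicate case.
ROWS = [
    ("dimA1", "dimB1", "m1", "v11"),
    ("dimA1", "dimB1", "m2", "v12"),
    ("dimA1", "dimB2", "m1", "v13"),
    ("dimA1", "dimB2", "m1", "v23"),
]


def build_json_pivot(rows):
    """Simulate the inner select: one compact JSON string per dimension pair.

    Explicit duplicate policy (an assumption): the first value seen for a
    MetricName wins and later duplicates are ignored.
    """
    groups = {}
    for dim_a, dim_b, name, value in rows:
        metrics = groups.setdefault((dim_a, dim_b), {})
        metrics.setdefault(name, value)  # keeps the first occurrence only
    # sort_keys roughly mimics WITHIN GROUP (ORDER BY MetricName);
    # separators match the unspaced '{"m1":"v11",...}' built in SQL.
    return {
        key: json.dumps(metrics, sort_keys=True, separators=(",", ":"))
        for key, metrics in groups.items()
    }


def extract(json_pivot, name):
    """Rough stand-in for json_extract_path_text; returns None if absent."""
    return json.loads(json_pivot).get(name)


if __name__ == "__main__":
    pivot = build_json_pivot(ROWS)
    for (dim_a, dim_b), doc in sorted(pivot.items()):
        print(dim_a, dim_b, doc, extract(doc, "m1"), extract(doc, "m2"))
```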

This could probably also be done with a combination of LISTAGG and REGEXP_SUBSTR, using two custom delimiters (one between pairs and one between a name and its value) instead of JSON.
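Here is a rough Python analogue of the delimiter idea. It assumes `;` separates pairs and `=` separates a name from its value; both are hypothetical choices and must never occur in the data. The packed string plays the role of the LISTAGG output, and a regular expression plays the role of the REGEXP_SUBSTR lookup. This simulates the idea only; it is not Redshift SQL.

```python
import re

PAIR_SEP = ";"  # hypothetical delimiter between name/value pairs
KV_SEP = "="    # hypothetical delimiter between a name and its value


def listagg_pairs(metrics):
    """Mimic LISTAGG(name || '=' || value, ';') WITHIN GROUP (ORDER BY name)."""
    return PAIR_SEP.join(f"{name}{KV_SEP}{value}" for name, value in sorted(metrics))


def regexp_extract(packed, name):
    """Return the first value stored for `name`, or None if it is absent.

    The name must follow the start of the string or a pair separator, and
    must be followed directly by KV_SEP, so "m1" matches neither "xm1" nor "m10".
    """
    pattern = (
        f"(?:^|{re.escape(PAIR_SEP)}){re.escape(name)}{re.escape(KV_SEP)}"
        f"([^{re.escape(PAIR_SEP)}]*)"
    )
    match = re.search(pattern, packed)
    return match.group(1) if match else None


if __name__ == "__main__":
    packed = listagg_pairs([("m2", "v12"), ("m1", "v11"), ("m10", "v20")])
    print(packed)
    print(regexp_extract(packed, "m1"), regexp_extract(packed, "m10"))
```

Anchoring on the separators is what lets one packed string hold names that are prefixes of each other. This scheme is no more robust than JSON if a value can contain a delimiter.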

Using VARCHAR(MAX) for the JSON column type gives 65535 bytes, which should be room for a couple thousand categories (LISTAGG itself also fails if its result exceeds 65535 bytes).
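The "couple thousand" estimate can be checked with a little arithmetic. Suppose each entry is serialized as `"name":"value"` (5 bytes of punctuation plus the name and value), entries are comma-separated, the object is wrapped in braces, and all text is ASCII. Then n entries whose name and value together take k bytes need n(k + 6) + 1 bytes. The sketch below encodes that. The 8-byte names and 12-byte values used in the example are arbitrary assumptions.

```python
MAX_BYTES = 65535  # VARCHAR(MAX) size in bytes, as stated in the answer


def json_pivot_size(entries):
    """Bytes needed for '{"n1":"v1","n2":"v2",...}' (ASCII text assumed)."""
    if not entries:
        return 2  # just "{}"
    body = sum(len(n.encode()) + len(v.encode()) + 5 for n, v in entries)  # "n":"v"
    return body + (len(entries) - 1) + 2  # commas between entries, plus braces


def max_categories(name_len, value_len, limit=MAX_BYTES):
    """How many equally sized entries fit: solve n * (k + 6) + 1 <= limit."""
    per_entry = name_len + value_len + 6  # "n":"v" plus one comma
    return (limit - 1) // per_entry


if __name__ == "__main__":
    print(max_categories(8, 12))
```

For example, `max_categories(8, 12)` gives 2520. Longer names or values shrink that number proportionally.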

Explained slightly differently here.

Published: 2023-07-08 20:11:32
Article link: https://www.elefans.com/category/jswz/34/1080230.html