Excel Power Query 中的复杂转换

编程入门 行业动态 更新时间:2024-10-26 10:26:37
本文介绍了Excel Power Query 中的复杂转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我有 2 个输入表.输入表1是源数据,输入表2是标准表.

I have 2 input Tables. Input Table1 is the source data and input Table 2 is a criteria table.

+--------------------------+----------+ +--------------------------+-------+ | TABLE 1 (Source data) | | TABLE 2 (Criterias) | +-------------------------------------+ +----------------------------------+ +-------------------------------------+ +----------------------------------+ | DESCRIPTION | VALUE | | PREFIX | CODE | +-------------------------------------+ +----------------------------------+ | ID | 0 | | 7235 | ABX1 | | NAME | JFMSC | | 3553 | POWQ | | TYPE | UHELQ | | 7459 | UWEER | | DFRUL | F4 | | 10012 | ABX1 | | ADDR | 10012002 | | 430 | ABX1 | | RRUL | P1 | +--------------------------+-------+ | ADDR | 723 | | RRUL | P1 | | ID | 2 | | NAME | PLLSJS | | TYPE | UHELQ | | DFRUL | P3 | | ID | 4 | | NAME | AAAARR | | TYPE | UHELQ | | DFRUL | T7 | | ADDR | 35531156 | | RRUL | P1 | | ADDR | 72358 | | RRUL | P1 | | ADDR | 86401 | | RRUL | K9 | | ID | 0 | | NAME | PPROOA | | TYPE | RRHN | | DFRUL | P1 | | ADDR | 43001 | | RRUL | T8 | | ADDR | 7459001 | | RRUL | D4 | | ADDR | 430457 | | RRUL | W2 | | ADDR | 745913 | | RRUL | P1 | | ADDR | 74598001 | | RRUL | Y5 | +--------------------------+----------+

我的目标是获得如下所示的输出表(将是表 #4),即显示了与基于表 2"的标准的字段ADDR"的每个数量相比最相似的代码.如果每个ID都有重复的CODE,我只想展示一个(唯一代码列表).

My goal is to get the an output table like below (Would be the Table #4), that shows the CODE that is THE MOST similar compared with each number of field "ADDR" based on criterias of "TABLE 2". If there are repeated CODEs for each ID, I only want to show one (unique codes list).

我在此处附加的示例文件中进行了更详细的解释 SampleV1.xlsx.

I explain in more detail in Sample file attached here SampleV1.xlsx.

我想转换基于输入表 1 和 2 中的数据以获得这样的输出表(附加文件中的所需输出表 #2):

I want to Transform the data based in Input Table 1 and 2 to get an output table like this (Desired OUTPUT TABLE #2 in file attached):

+----+--------+-------+-------+-------+------+ | ID | NAME | TYPE | DFRUL | CODE | RRUL | +----+--------+-------+-------+-------+------+ | 0 | JFMSC | UHELQ | P1 | ABX1 | P1 | | 2 | PLLSJS | UHELQ | P3 | | | | 4 | AAAARR | UHELQ | T7 | POWQ | P1 | | | | | | ABX1 | P1 | | | | | | 86401 | K9 | | 0 | PPROOA | RRHN | P1 | ABX1 | P1 | | | | | | UWEER | P1 | +----+--------+-------+-------+-------+------+

我希望有人能帮我解决这个问题.提前致谢.

I hope someone could help me with this. Thanks in advance.

推荐答案

以下是更新后的解决方案.

Below is the UPDATED solution.

总的来说,我编译了解决方案是为了尽可能减少数据问题的影响.

In general, I compiled the solution in order to be as less vulnerable to problems with data, as possible.

对数据的唯一限制是:

  • 字段集合必须有 ID 字段,ID 字段必须是集合的第一个字段.

  • Field sets must have ID field, which must be the first field of set.

    所有的 RRUL 和 ADDR 必须成对,

    all the RRUL and ADDR have to be in pairs,

    一个 ID 内的 RRUL/ADDR 对重复是可以接受的,也可以不存在.

    Duplicates of RRUL/ADDR pairs inside one ID are acceptable or absent.

    我还以某种方式编译了解决方案,以便在 ADDR 和 PREFIX 的所有可能变体中正确找到最接近的值.顺便说一句 - 有一种情况,没有包含在你的大样本中 - 当 PREFIX 比 ADDR 短但不等于它时.如果存在此类情况 - 我的解决方案会正确处理它们,但需要针对这种特定情况进行一些性能开销.

    I also compiled the solution in a way to correctly find the closest value in all possible variants of ADDR and PREFIX. By the way - there is one case, not covered in your bigsample - when PREFIX is shorter then ADDR but not equal to it. If there are such cases - my solution handles them correctly but demands some performance overhead for this particular situation.

    let Source = #"Source data", #"Added Index1" = Table.AddIndexColumn(Source, "Index", 0, 1), #"Added Custom" = Table.AddColumn(#"Added Index1", "Main Key", each if [DESCRIPTION] = "ID" then [Index] else null, type number), #"Added Custom10" = Table.AddColumn(#"Added Custom", "Last notADDR", each if [DESCRIPTION] <> "ADDR" and [DESCRIPTION] <> "RRUL" then [Index] else null), #"Filled Down" = Table.FillDown(#"Added Custom10",{"Main Key", "Last notADDR"}), #"Added Custom2" = Table.AddColumn(#"Filled Down", "Key", each [Main Key] + ( if [DESCRIPTION] = "RRUL" then [Index] - [Last notADDR] - 2 else if [DESCRIPTION] = "ADDR" then [Index] - [Last notADDR] - 1 else 0)), #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Index", "Main Key", "Last notADDR"}), #"Pivoted Column1" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[DESCRIPTION]), "DESCRIPTION", "VALUE"), #"Added Custom3" = Table.AddColumn(#"Pivoted Column1", "CODE", each if [ADDR] = null then null else let t = Table.AddIndexColumn(Table.SelectRows(Criterias, (x)=> let s=List.Sort({x[PREFIX], [ADDR]}, each Text.Length(_)) in Text.StartsWith(s{1}, s{0})), "Index") in if Table.RowCount(t) > 0 then Table.First(Table.Sort(t, (y)=> Number.BitwiseShiftLeft(Number.Abs(Text.Length([ADDR]) - Text.Length(y[PREFIX])), 16) + y[Index]))[CODE] else "Not Found"), #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3",{"Key", "ADDR"}), #"Filled Down1" = Table.FillDown(#"Removed Columns1",{"ID", "NAME", "TYPE", "DFRUL"}) in #"Filled Down1"
  • 更多推荐

    Excel Power Query 中的复杂转换

    本文发布于:2023-10-30 07:56:18,感谢您对本站的认可!
    本文链接:https://www.elefans.com/category/jswz/34/1542247.html
    版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
    本文标签:Excel   Power   Query

    发布评论

    评论列表 (有 0 条评论)
    草根站长

    >www.elefans.com

    编程频道|电子爱好者 - 技术资讯及电子产品介绍!