Efficiently organizing data into a readable .csv format

Updated: 2024-10-12 22:33:04

I am trying to figure out a better way to extract data from a large number of files, run additional calculations on some of the data, and finally format it into something readable by a spreadsheet application. Below is how I am currently doing this, but I am convinced there must be an easier way.

First I create a function that searches the files and extracts the data. I use a function because the data is sometimes compiled from several locations. I do this with a mixture of regular expressions and findall. The output can be formatted differently if needed. The result looks something like this:

    RawData = [
        [M1, A1, 305.91, 288.12, Variable_M1_A1],
        [M1, A2, 319.07, 303.70, Variable_M1_A2],
        [M2, A1, 237437.32, 191460.91, Variable_M2_A1],
        [M2, A2, 270773.28, 192581.05, Variable_M2_A2],
    ]

What I want to do with some of the data is organize it so that a grid is created from the variables A and M, and the data from one column is placed in the correct grid location. In practice the result looks like a simple two-dimensional matrix whose first row and first column hold the variable labels.

What the csv file should look like:

    Placeholder,A1,A2
    M1,Variable_M1_A1,Variable_M1_A2
    M2,Variable_M2_A1,Variable_M2_A2

What I currently do is create an empty matrix (3x3 in this case) called Result and then run the following code. It iterates over all rows and all variables; when the variables match, it assigns a value from RawData to the Result matrix.

    MVar = [M1, M2]
    AVar = [A1, A2]
    # Result is the pre-built 3x3 matrix; row 0 and column 0 hold the labels.
    for a in range(len(RawData)):
        for b in range(len(MVar)):
            for c in range(len(AVar)):
                if RawData[a][0] == MVar[b] and RawData[a][1] == AVar[c]:
                    Result[b+1][c+1] = RawData[a][4]
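One way the triple loop above might be simplified is to index each record by its (M, A) pair in a dictionary and then build the grid row by row. The following is only a sketch under assumptions: the labels are taken to be strings, the last field is a string placeholder for the real value, and the names `raw_data` and `build_grid` are hypothetical, not part of the original code.

```python
import csv
import io

# Hypothetical stand-in for the extracted data: labels are strings and the
# last field is a string placeholder for the real value.
raw_data = [
    ["M1", "A1", 305.91, 288.12, "Variable_M1_A1"],
    ["M1", "A2", 319.07, 303.70, "Variable_M1_A2"],
    ["M2", "A1", 237437.32, 191460.91, "Variable_M2_A1"],
    ["M2", "A2", 270773.28, 192581.05, "Variable_M2_A2"],
]


def build_grid(rows, value_index=4):
    """Pivot rows of [M, A, ..., value] into a header row plus one row per M."""
    # Index every record by its (M, A) pair so lookups need no nested loops.
    by_key = {(row[0], row[1]): row for row in rows}
    # dict.fromkeys keeps the first occurrence of each label, in order.
    m_labels = list(dict.fromkeys(row[0] for row in rows))
    a_labels = list(dict.fromkeys(row[1] for row in rows))
    grid = [["Placeholder"] + a_labels]
    for m in m_labels:
        # Missing (M, A) combinations become empty cells.
        cells = [
            by_key[(m, a)][value_index] if (m, a) in by_key else ""
            for a in a_labels
        ]
        grid.append([m] + cells)
    return grid


grid = build_grid(raw_data)

# Write to an in-memory buffer; a real file opened with newline="" can be
# passed to csv.writer in the same way.
buffer = io.StringIO()
csv.writer(buffer).writerows(grid)
print(buffer.getvalue())
```

Because the dictionary is built once, each grid cell is a single lookup rather than a scan over every row, and no pre-sized Result matrix is needed.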

I wrote a chunk of code that takes the RawData matrix and creates the Result matrix from all possible combinations of M and A values. If I want to export this to .csv, I simply use csv.writer. This works great if I just want to organize the data that is already there. However, when I want to do calculations with the data, such as generating another column based on values in RawData that did not end up in the matrix, it becomes difficult. For example, the csv file output could look like this:

    Placeholder,A1,A2,NewA
    M1,Variable_M1_A1,Variable_M1_A2,(RawData[0][3]*RawData[1][2])
    M2,Variable_M2_A1,Variable_M2_A2,(RawData[2][3]*RawData[3][2])

Notice that each calculation requires data with the same M value but different A values. While this can be done, it quickly becomes very convoluted.
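A calculation that pairs the same M value with different A values might be easier to express when records are looked up by label instead of by position. The sketch below assumes string labels and uses hypothetical names (`raw_data`, `by_key`, `new_a`); it reproduces the NewA column from the example, where the fourth field of the A1 record is multiplied by the third field of the A2 record.

```python
# Hypothetical stand-in for the extracted data, with string labels.
raw_data = [
    ["M1", "A1", 305.91, 288.12, "Variable_M1_A1"],
    ["M1", "A2", 319.07, 303.70, "Variable_M1_A2"],
    ["M2", "A1", 237437.32, 191460.91, "Variable_M2_A1"],
    ["M2", "A2", 270773.28, 192581.05, "Variable_M2_A2"],
]

# Index each full record by its (M, A) pair.
by_key = {(row[0], row[1]): row for row in raw_data}


def new_a(m):
    """Return field 3 of the (m, A1) record times field 2 of the (m, A2) record."""
    return by_key[(m, "A1")][3] * by_key[(m, "A2")][2]


rows = [["Placeholder", "A1", "A2", "NewA"]]
for m in ["M1", "M2"]:
    rows.append([m, by_key[(m, "A1")][4], by_key[(m, "A2")][4], new_a(m)])

for row in rows:
    print(row)
```

The formula now names the labels it depends on, so it keeps working if the order of the records in the raw data changes.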

Is there a simpler way to do this?

Edit: Auto generation of the M and A list using:

[MethodList[i] for i,x in enumerate(MethodList) if x not in MethodList[i+1:]]
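For comparison, a shorter way to deduplicate a list of labels is `dict.fromkeys`, which preserves insertion order in Python 3.7 and later. The sketch below uses a hypothetical `method_list` and shows one behavioral difference worth knowing: the list comprehension above keeps the last occurrence of each label, while `dict.fromkeys` keeps the first, so the resulting order can differ.

```python
# Hypothetical list of labels with repeats.
method_list = ["M1", "M1", "M2", "M2", "M1"]

# Keeps the first occurrence of each label, in a single pass.
unique_first = list(dict.fromkeys(method_list))

# The expression from the question: keeps the last occurrence of each label
# and rescans the rest of the list for every element.
unique_last = [
    method_list[i]
    for i, x in enumerate(method_list)
    if x not in method_list[i + 1:]
]

print(unique_first)  # ['M1', 'M2']
print(unique_last)   # ['M2', 'M1']
```

When each label's occurrences are contiguous, as in the RawData example, both approaches give the same order.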

This seems to make it easier to work with; however, it is still a convoluted process!

Best answer

I completed this with the for loops and corresponding if conditions. I am still convinced there is a better way, but it is entirely possible to do it this way.

Published: 2023-04-29 02:55:00

Article link: https://www.elefans.com/category/jswz/34/1334467.html