读取多个Excel文件，添加一列，然后进行绑定

编程入门行业动态更新时间:2024-10-11 19:13:47

本文介绍了读取多个Excel文件，添加一列，然后进行绑定的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

我有一系列要读入R的Excel文件，基于文件名添加日期列，然后绑定在一起.

I have a series of Excel files that I want to read into R, add in a date column based on the file name, then bind together.

文件的命名约定为User_Info_Jan，User_Info_Feb，User_Info_Mar.仅在文件名中引用月份，而在实际文件本身中并未实际提及.User_Info_Jan文件的示例如下:

The naming convention of the files is User_Info_Jan, User_Info_Feb, User_Info_Mar. The month is only referenced in the name of the file and not actually mentioned in the actual file itself. An example of what the User_Info_Jan files looks like:

ID Name ABC Joe Smith DEF Henry Cooper ZCS Kelly Ma

有没有一种方法可以使用文件名中的模式(模式= User_Info_)读取文件，然后添加一个名为"Month"的列，指示文件绑定到哪个月之前?

Is there a way I can read the files in using the pattern in the file name (pattern = User_Info_), then add a column called "Month" indicating what month the file is for, before binding together?

月份"列之后的示例数据"框:

Sample Data frame after month column:

ID Name Month ABC Joe Smith January DEF Henry Cooper January ZCS Kelly Ma January

绑定在一起后的示例数据帧:

Sample data frame after binding together:

ID Name Usage Month ABC Joe Smith January DEF Henry Cooper January ZCS Kelly Ma January KFY Lisa Schwartz February LFG Alex Shah March

推荐答案

我将使用伪造的文件名进行演示，但是我建议您运行的实际命令以相同的结构被注释掉.我假设 .xlsx 用于"excel文件"，但这与 .csv 一样好(只需更新模式).

I'll demonstrate with fake filenames, but the real commands I suggest you run are commented out with the same structure. I'm assuming .xlsx for "excel files", but this works equally well with .csv (just update the pattern).

# files <- list.files(path = ".", pattern = "User_Info_.*\\.xlsx$", full.names = TRUE) files <- c("./User_Info_Jan.xlsx", "./User_Info_Feb.xlsx", "./User_Info_Mar.xlsx") monthnames <- strcapture("User_Info_(.*)\\.xlsx", files, list(month = "")) monthnames # month # 1 Jan # 2 Feb # 3 Mar

至此，我们已经从每个文件名中提取月份名称.我发现 strcapture (在R中)比 gsub 好，因为如果没有匹配项，后者会返回整个字符串.基本R中的另一个替代方法是 regmatches(files，gregexpr(...))，但这似乎比这里需要的要复杂一些.另一个替代方法是 stringr :: str_extract ，如果您已经在使用 stringr 和/或其他tidyverse程序包，则可能更直观.

At this point, we've extracted the month name from each filename. I find strcapture (in base R) better than gsub, as the latter returns the entire string if there are no matches; another alternative in base R is regmatches(files, gregexpr(...)), but that seems a bit more complicated than it needs to be here. Another alternative is stringr::str_extract which might be more intuitive if you're already using stringr and/or other tidyverse packages.

从这里开始，我们可以遍历文件以读取它们.

From here, we can iterate over the files to read them in.

# out <- Map(function(mn, fn) transform(readxl::read_excel(fn), month = mn), monthnames$month, files) set.seed(42) out <- Map(function(mn, fn) transform(mtcars[sample(32,size=2),], month = mn), monthnames$month, files) out # $Jan # mpg cyl disp hp drat wt qsec vs am gear carb month # Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4 Jan # Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Jan # $Feb # mpg cyl disp hp drat wt qsec vs am gear carb month # Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Feb # Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2 Feb # $Mar # mpg cyl disp hp drat wt qsec vs am gear carb month # Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Mar # Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Mar

将这些帧列表合并为一个帧是直接的:

Combining that list-of-frames into a single frame is direct:

do.call(rbind, out) # mpg cyl disp hp drat wt qsec vs am gear carb month # Jan.Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Jan # Jan.Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Jan # Feb.Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Feb # Feb.Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Feb # Mar.Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Mar # Mar.Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Mar

所有方法的替代方法可以使用 data.table :: rbindlist 或 dplyr :: bind_rows 并分配"id"；直接列:

An alternative to all of that can use data.table::rbindlist or dplyr::bind_rows, and assign the "id" column directly:

# out <- Map(function(mn, fn) readxl::read_excel(fn), monthnames$month, files) set.seed(42) out <- Map(function(mn, fn) mtcars[sample(32,size=2),], monthnames$month, files) data.table::rbindlist(out, idcol = "month") # month mpg cyl disp hp drat wt qsec vs am gear carb # <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> # 1: Jan 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 # 2: Jan 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 # 3: Feb 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 # 4: Feb 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 # 5: Mar 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 # 6: Mar 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 dplyr::bind_rows(out, .id = "month") # month mpg cyl disp hp drat wt qsec vs am gear carb # Chrysler Imperial Jan 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 # Hornet Sportabout Jan 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 # Mazda RX4 Feb 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 # Pontiac Firebird Feb 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 # Merc 280 Mar 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 # Hornet 4 Drive Mar 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1

后两个起作用是因为当我较早调用 Map 时，传递给内部函数的第一个参数( monthnames $ month )被用作的名称.> list 输出，这就是为什么您将 $ Jan 等视为返回列表的元素的原因. rbindlist 和 bind_rows 都使用这些名称作为"id"，使用 idcol = / .id = 时的列.(如果实际上没有名称"，则两个函数都沿它们计数.)

The latter two work because when I called Map earlier, the first argument (monthnames$month) passed to the inner function is used as the names for the list output, which is why you see $Jan etc as the elements of the returned list. Both rbindlist and bind_rows use those names as "id" columns when idcol=/.id= are used. (If no "names" are actually present, both functions count along them.)

更多推荐

读取多个Excel文件,添加一列,然后进行绑定

本文发布于:2023-10-27 04:16:27，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1532283.html