Optimization of R data.table combination with for loop function

Updated: 2024-06-14 16:57:18

I have an 'Agency_Reference' table with an 'agency_lookup' column holding 200 strings such as:

alpha, beta, gamma, etc.

I have a data frame 'TEST' with a million rows and a 'Campaign' column with entries such as:

Alpha_xt2010, alpha_xt2014, Beta_xt2016, etc.

I want to loop through every entry in the reference table, find which lookup string appears in each Campaign entry, and store it in a new agency_identifier column.

My current code is below and is slow to execute. I would appreciate guidance on how to optimize it, and I would like to learn how to do this the data.table way.

    Agency_Reference <- data.frame(agency_lookup = c('alpha', 'beta', 'gamma', 'delta', 'zeta'))
    TEST <- data.frame(Campaign = c('alpha_xt123', 'ALPHA345', 'Beta_xyz_34',
                                    'BETa_testing', 'code_delta_'))

    # Placeholder value for campaigns that match no agency
    TEST$agency_identifier <- 0

    # One full pass over TEST$Campaign per lookup string
    for (agency_lookup in as.vector(Agency_Reference$agency_lookup)) {
      TEST$agency_identifier <- ifelse(
        grepl(tolower(agency_lookup), tolower(TEST$Campaign)),
        agency_lookup,
        TEST$agency_identifier
      )
    }

Expected output:

    Campaign        agency_identifier
    alpha_xt123     alpha
    ALPHA345        alpha
    Beta_xyz_34     beta
    BETa_testing    beta
    code_delta_     delta

Best answer

Try

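    TEST <- data.frame(Campaign = c('alpha_xt123', 'ALPHA345', 'Beta_xyz_34',
                                    'BETa_testing', 'code_delta_'))

    # Lower-case lookup strings, joined below into a single alternation
    pattern <- tolower(c('alpha', 'Beta', 'gamma', 'delta', 'zeta'))

    # Replace each campaign with the lookup string it contains.
    # A campaign that matches none of the strings is returned unchanged (lower-cased).
    TEST$agency_identifier <- sub(
      pattern = paste0('.*(', paste(pattern, collapse = '|'), ').*'),
      replacement = '\\1',
      x = tolower(TEST$Campaign)
    )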
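Since the question asks for a data.table version, here is a minimal sketch of two ways the same idea might look with data.table. It is not part of the original answer. It assumes the lookup strings are plain text with no regex metacharacters; the row 'no_match_here', the column name agency_loop, and the helper variables (lookup, campaign_lower, matched, rows) are made up for illustration.

```r
library(data.table)

Agency_Reference <- data.table(
  agency_lookup = c('alpha', 'beta', 'gamma', 'delta', 'zeta')
)
TEST <- data.table(
  Campaign = c('alpha_xt123', 'ALPHA345', 'Beta_xyz_34',
               'BETa_testing', 'code_delta_',
               'no_match_here')  # hypothetical row with no agency in it
)

lookup <- tolower(Agency_Reference$agency_lookup)
campaign_lower <- tolower(TEST$Campaign)  # lower-case once, not once per lookup

# Variant 1: a single regex pass over all rows.
# Assumes the lookup strings contain no regex metacharacters.
pattern <- paste(lookup, collapse = '|')
m <- regexpr(pattern, campaign_lower)   # position of the first match, -1 if none
matched <- which(m > 0)

TEST[, agency_identifier := NA_character_]   # NA for campaigns with no match
TEST[matched, agency_identifier := regmatches(campaign_lower, m)]

# Variant 2: keep the loop, but update only the matching rows by reference
# instead of rebuilding the whole column with ifelse() on every iteration.
TEST[, agency_loop := NA_character_]
for (key in lookup) {
  rows <- which(grepl(key, campaign_lower, fixed = TRUE))
  if (length(rows)) set(TEST, i = rows, j = 'agency_loop', value = key)
}

print(TEST)
```

On these sample rows both columns hold alpha, alpha, beta, beta, delta, and NA for the unmatched row. The two variants can disagree when a campaign contains more than one lookup string: the regex keeps the match that occurs first in the campaign, while the loop keeps whichever matching lookup string comes last in the reference table.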

Published: 2023-04-12 20:30:00
Article link: https://www.elefans.com/category/dzcp/5cbbed90738bec4642290a2ad9e92e35.html