I have an 'Agency_Reference' table with a column 'agency_lookup' that contains 200 string entries such as:
alpha beta gamma etc.
I also have a data frame 'TEST' with a million rows and a 'Campaign' column with entries such as:
Alpha_xt2010 alpha_xt2014 Beta_xt2016 etc.
For each entry in the reference table, I want to find which Campaign values contain that string (ignoring case) and record the match in a new agency_identifier column in TEST.
My current code is below and is slow to execute. How can I optimize it? I would like to learn how to do this the data.table way.
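Agency_Reference <- data.frame(agency_lookup = c('alpha','beta','gamma','delta','zeta'))
TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))
# Placeholder value for rows not matched yet; the column name must be spelled
# the same way here and inside the loop (R names are case-sensitive)
TEST$agency_identifier <- 0
# For each lookup string, fill in the rows whose Campaign contains it (case-insensitive)
for (agency_lookup in as.vector(Agency_Reference$agency_lookup)) {
  TEST$agency_identifier <- ifelse(grepl(tolower(agency_lookup), tolower(TEST$Campaign)), agency_lookup, TEST$agency_identifier)
}
Expected output: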
Campaign ---- agency_identifier
alpha_xt123 ---- alpha
ALPHA345 ---- alpha
Beta_xyz_34 ---- beta
BETa_testing ---- beta
code_delta_ ---- delta
Best answer
Try
TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))
# Lowercase the lookup strings so they can match the lowercased campaigns
pattern <- tolower(c('alpha','Beta','gamma','delta','zeta'))
# Join the lookup strings into one alternation and keep only the captured match
TEST$agency_identifier <- sub(pattern = paste0('.*(', paste(pattern, collapse = '|'), ').*'), replacement = '\\1', x = tolower(TEST$Campaign))
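Since the question asks for the data.table way, here is a minimal sketch of the same single-pattern idea using data.table's `:=` to add the column by reference. It is an illustration, not part of the original answer. Assumptions: the lookup strings contain no regex metacharacters, the row 'no_agency_here' is a hypothetical addition to show the no-match case, and unmatched rows should become NA. With `sub()` as above, an unmatched row would instead keep its whole lowercased campaign string.

```r
library(data.table)

# Lookup strings and sample campaigns from the question.
# 'no_agency_here' is a hypothetical extra row to show the no-match case.
agency_lookup <- c('alpha', 'beta', 'gamma', 'delta', 'zeta')
TEST <- data.table(Campaign = c('alpha_xt123', 'ALPHA345', 'Beta_xyz_34',
                                'BETa_testing', 'code_delta_', 'no_agency_here'))

# One alternation pattern, so each campaign is scanned once instead of once
# per lookup entry. Assumes the lookup strings hold no regex metacharacters.
pattern <- paste(tolower(agency_lookup), collapse = '|')

# `:=` adds the column by reference, without copying the table.
TEST[, agency_identifier := {
  x <- tolower(Campaign)
  m <- regexpr(pattern, x)             # start of the first match, -1 if none
  out <- rep(NA_character_, length(x)) # unmatched rows stay NA
  out[m > 0] <- regmatches(x, m)       # regmatches returns matched elements only
  out
}]

print(TEST)
```

Note that `regexpr()` reports the leftmost match in each campaign, whereas the greedy `.*` in the `sub()` pattern captures the last one, so the two approaches can differ when a campaign contains more than one lookup string.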