Filtering observations in dplyr in combination with grepl

Updated: 2024-10-24 18:24:02

Question

I am trying to work out how to filter some observations from a large dataset using dplyr and grepl. I am not wedded to grepl if another solution would be better.

Take this sample df:

    df1 <- data.frame(fruit = c("apple", "orange", "xapple", "xorange",
                                "applexx", "orangexx", "banxana", "appxxle"),
                      group = c("A", "B"))
    df1
    #      fruit group
    # 1    apple     A
    # 2   orange     B
    # 3   xapple     A
    # 4  xorange     B
    # 5  applexx     A
    # 6 orangexx     B
    # 7  banxana     A
    # 8  appxxle     B

I want to:

filter out those cases beginning with 'x'

filter out those cases ending with 'xx'

I have managed to work out how to get rid of everything that contains 'x' or 'xx', but not how to drop only the values that begin with 'x' or end with 'xx'. Here is how to get rid of everything with 'xx' anywhere inside (not just at the end):

    df1 %>% filter(!grepl("xx", fruit))
    #     fruit group
    # 1   apple     A
    # 2  orange     B
    # 3  xapple     A
    # 4 xorange     B
    # 5 banxana     A

This obviously 'erroneously' (from my point of view) filtered 'appxxle'.

I have never fully got to grips with regular expressions. I've been trying to modify code such as: grepl("^(?!x).*$", df1$fruit, perl = TRUE) to try and make it work within the filter command, but am not quite getting it.

Expected output:

    #     fruit group
    # 1   apple     A
    # 2  orange     B
    # 3 banxana     A
    # 4 appxxle     B

I'd like to do this inside dplyr if possible.

Solution

I didn't understand your second regex, but this more basic regex seems to do the trick:

    df1 %>% filter(!grepl("^x|xx$", fruit))
    #     fruit group
    # 1   apple     A
    # 2  orange     B
    # 3 banxana     A
    # 4 appxxle     B

And I assume you know this, but you don't have to use dplyr here at all:

    df1[!grepl("^x|xx$", df1$fruit), ]
    #     fruit group
    # 1   apple     A
    # 2  orange     B
    # 7 banxana     A
    # 8 appxxle     B

The regex is looking for strings that start with x OR end with xx. The ^ and $ are regex anchors for the beginning and ending of the string respectively. | is the OR operator. We're negating the results of grepl with the ! so we're finding strings that don't match what's inside the regex.
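
To make the explanation above concrete, here is a minimal sketch in base R that checks each half of the pattern separately and then tries a regex-free variant. The variable names (starts_with_x, ends_with_xx, combined, drop, kept) are only illustrative, and the startsWith()/endsWith() variant is an alternative I am assuming is acceptable here, not something the answer itself proposed. It needs base R 3.3.0 or later and a character (not factor) column, hence stringsAsFactors = FALSE.

```r
# Same sample data as in the question; stringsAsFactors = FALSE keeps
# fruit as a character vector on R versions before 4.0.
df1 <- data.frame(
  fruit = c("apple", "orange", "xapple", "xorange",
            "applexx", "orangexx", "banxana", "appxxle"),
  group = c("A", "B"),
  stringsAsFactors = FALSE
)

# Each half of the pattern on its own.
starts_with_x <- grepl("^x", df1$fruit)   # TRUE for xapple, xorange
ends_with_xx  <- grepl("xx$", df1$fruit)  # TRUE for applexx, orangexx

# The combined pattern matches exactly when either half matches.
combined <- grepl("^x|xx$", df1$fruit)

# Regex-free alternative: fixed-string prefix and suffix tests.
drop <- startsWith(df1$fruit, "x") | endsWith(df1$fruit, "xx")

# Keep only the rows that match neither condition.
kept <- df1[!drop, ]
kept$fruit
```

The same logical vector should also work inside dplyr, for example filter(df1, !(startsWith(fruit, "x") | endsWith(fruit, "xx"))), as long as fruit is a character column.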


Published: 2023-11-22 08:23:38

Article link: https://www.elefans.com/category/jswz/34/1616676.html