Learning Linux: Regular Expressions

Updated: 2024-10-19 16:29:56


A regular expression is a way of describing patterns for processing strings.

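1. How do regular expressions differ from wildcards?

Wildcards: a feature of the bash shell interface, used to expand file names.

Regular expressions: a notation for describing strings, interpreted by the tools that support it (grep, sed, and so on).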
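
The difference is easiest to see by feeding the same characters to both mechanisms. The following is a minimal sketch, assuming a scratch directory created with mktemp and three made-up file names; neither appears in the original article. The shell expands the wildcard before the command runs, whereas grep interprets the regular expression itself.

```shell
# Scratch directory with three hypothetical files, so no real files are touched.
tmpdir=$(mktemp -d)
touch "$tmpdir/apple.txt" "$tmpdir/avocado.txt" "$tmpdir/banana.log"

(
  cd "$tmpdir" || exit 1

  # Wildcard: the shell expands a* to every file name that starts with "a".
  echo a*              # apple.txt avocado.txt

  # Regular expression: in 'a*' the star repeats the preceding "a" zero or
  # more times, so the pattern matches every line, all three file names.
  ls | grep -c 'a*'    # 3

  # Anchoring the regex at the start of the line selects the same names
  # as the wildcard a*.
  ls | grep '^a'
)

rm -rf "$tmpdir"
```

The same star therefore means "any run of characters" to the shell but "repeat the previous character" to grep.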

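2 Basic regular expressions

Special symbol	Meaning
[:alnum:]	Uppercase and lowercase letters and digits, i.e. A-Z, a-z, 0-9
[:alpha:]	Any uppercase or lowercase letter, i.e. A-Z, a-z
[:blank:]	The space and [Tab] characters
[:cntrl:]	Control characters, including CR, LF, Tab, Del, etc.
[:digit:]	Digits, i.e. 0-9
[:graph:]	All printable characters except whitespace (space and tab)
[:lower:]	Lowercase letters, i.e. a-z
[:print:]	Any character that can be printed
[:punct:]	Punctuation symbols
[:upper:]	Uppercase letters, i.e. A-Z
[:space:]	Any character that produces whitespace
[:xdigit:]	Hexadecimal digits, i.e. 0-9, A-F, a-f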
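
These class names are only valid inside a bracket expression, which is why the later examples write them with double brackets, as in [[:lower:]]. A small sketch with made-up input (the sample helper and its three lines are assumptions for illustration):

```shell
# Hypothetical sample data: a word, a number, and a mixed line.
sample() { printf 'hello\n12345\nRoom 42\n'; }

# Lines containing at least one digit.
sample | grep '[[:digit:]]'            # 12345, Room 42

# Lines made up only of lowercase letters.
sample | grep '^[[:lower:]]*$'         # hello

# Several classes can share one bracket expression:
# lines containing an uppercase letter or a blank.
sample | grep '[[:upper:][:blank:]]'   # Room 42
```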

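2.1 Advanced use of grep

grep examines its input line by line and prints every line that contains the search string.

[mcb@localhost ~]$ grep [-A] [-B] [--color=auto] 'search string' filename
-A: followed by a number n; the n lines after each matching line are printed as well
-B: followed by a number n; the n lines before each matching line are printed as well
// List the kernel messages with dmesg, then use grep to find the lines containing SMB, showing the two lines before and the three lines after each match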
[mcb@localhost ~]$ dmesg | grep -n -A3 -B2 --color=auto 'SMB'
18-[    0.000000] BIOS-e820: [mem 0x00000000fffe0000-0x00000000ffffffff] reserved
19-[    0.000000] NX (Execute Disable) protection: active
20:[    0.000000] SMBIOS 2.7 present.
21-[    0.000000] DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
22-[    0.000000] Hypervisor detected: VMware
23-[    0.000000] vmware: TSC freq read from hypervisor : 2592.000 MHz
--
652-[    0.436387] pci 0000:00:07.3: [8086:7113] type 00 class 0x068000
653-[    0.437339] pci 0000:00:07.3: quirk: [io  0x1000-0x103f] claimed by PIIX4 ACPI
654:[    0.437351] pci 0000:00:07.3: quirk: [io  0x1040-0x104f] claimed by PIIX4 SMB
655-[    0.437693] pci 0000:00:07.7: [15ad:0740] type 00 class 0x088000
656-[    0.437983] pci 0000:00:07.7: reg 0x10: [io  0x1080-0x10bf]
657-[    0.438233] pci 0000:00:07.7: reg 0x14: [mem 0xfebc0000-0xfebfffff 64bit]
--
1777-[    3.369974] random: crng init done
1778-[    3.402418] systemd-journald[374]: Received request to flush runtime journal from PID 1
1779:[    4.406985] piix4_smbus 0000:00:07.3: SMBus Host Controller not enabled!
1780-[    4.445207] vmw_vmci 0000:00:07.7: Found VMCI PCI device at 0x11080, irq 16
1781-[    4.445269] vmw_vmci 0000:00:07.7: Using capabilities 0x8000003c
1782-[    4.445435] vmw_vmci 0000:00:07.7: irq 56 for MSI/MSI-X
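
The dmesg output differs from machine to machine, so the same options can be tried on predictable input. This sketch assumes ten made-up lines in place of the kernel log:

```shell
# Ten numbered lines stand in for the dmesg output (hypothetical data).
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 | grep -n -B2 -A3 'line 5'
# 3-line 3
# 4-line 4
# 5:line 5
# 6-line 6
# 7-line 7
# 8-line 8
```

As in the dmesg output above, a colon after the line number marks a matching line and a hyphen marks a context line; when there are several separate groups, grep prints -- between them.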

2.2 Basic regular expressions in practice

// 1. Find a specific string: the, regardless of case
[mcb@localhost ~]$ grep -in 'the' regular_express.txt
8:I can't finish the test.^M
9:Oh! The soup taste good.^M
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.^M
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
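Note: -v inverts the selection, so only the lines that do not contain the are shown; it can be combined with -n as -vn.

2.2.1 Matching a set of characters with [ ] (important)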

[mcb@localhost ~]$ grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.^M
9:Oh! The soup taste good.^M

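No matter how many characters are listed inside [ ], the brackets stand for exactly one character. The example above therefore matches only two strings: tast or test.

// To exclude matches where oo is preceded by g, use the negation ^ inside the brackets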
[mcb@localhost ~]$ grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
// Exclude matches where oo is preceded by a lowercase letter
[mcb@localhost ~]$ grep -n '[^a-z]oo' regular_express.txt
3:Football game is not use feet only.
[mcb@localhost ~]$ grep -n '[^[:lower:]]oo' regular_express.txt
3:Football game is not use feet only.
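
It can be surprising that lines 18 and 19 still appear in the [^g]oo output even though both start with goo. The pattern only requires some non-g character followed by oo anywhere in the line. The sketch below uses made-up words and the -o option (supported by GNU and BSD grep, not used in the original article) to print just the part that matched:

```shell
# Hypothetical word list for illustration.
words() { printf 'google\ngood\nfood\ngooogle\n'; }

# Which lines match?
words | grep -n '[^g]oo'
# 3:food
# 4:gooogle

# Which part of each line matched? In gooogle the first o plays the role
# of the "non-g" character, followed by two more o's.
words | grep -o '[^g]oo'
# foo
# ooo

# A negated class works the same way: only F (not lowercase) before oo matches.
printf 'Football\nfood\n' | grep '[^[:lower:]]oo'
# Football
```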

2.2.2 Start-of-line and end-of-line anchors ^ $

// List only the lines that begin with the
[mcb@localhost ~]$ grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.
// List the lines that begin with a lowercase letter
[mcb@localhost ~]$ grep -n '^[[:lower:]]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
// List the lines that do not begin with a letter
[mcb@localhost ~]$ grep -n '^[^[:alpha:]]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

End-of-line anchor: $

// Find the lines that end with a period (.)
[mcb@localhost ~]$ grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.
// Find the empty lines
[mcb@localhost ~]$ grep -n '^$' regular_express.txt
22:
23:

grep -v 'search string' prints the lines that do not contain the search string.
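
Lines 8, 9 and 14 end with a period in the file but are missing from the '\.$' output above. They are the lines shown with ^M, a DOS carriage return, so their last character is the carriage return rather than the period. A minimal sketch with made-up input (the sample helper is an assumption for illustration):

```shell
# Hypothetical sample: the second line ends with a carriage return (\r).
sample() { printf 'the end.\nThe end.\r\n\n# note\n'; }

sample | grep -n '^the'     # 1:the end.
sample | grep -n '\.$'      # 1:the end.  (line 2 ends with \r, not with ".")
sample | grep -n '^$'       # 3:
sample | grep -vc '^$'      # 3 (every line except the empty one)

# Stripping the carriage returns first lets both sentences match.
sample | tr -d '\r' | grep -c '\.$'   # 2
```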

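2.2.3 Any single character . and repetition *

. (period) stands for exactly one character, whatever it is
* (asterisk) repeats the preceding character zero or more times; it only has meaning in combination with the character before it
.* therefore stands for zero or more of any character

// Find strings of the form g??d; 'g..d' requires exactly four characters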
[mcb@localhost ~]$ grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.^M
16:The world <Happy> is the same with "glad".
// Find the lines containing at least two consecutive o's
[mcb@localhost ~]$ grep -n 'ooo*' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
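
The pattern for "at least two o's" is 'ooo*' rather than 'oo*' because the star also allows zero repetitions. Counting matches on a few made-up words (the sample helper is an assumption for illustration) makes the difference visible:

```shell
# Hypothetical sample with zero, one, two and three o's.
sample() { printf 'gd\ngod\ngood\ngoood\n'; }

sample | grep -c 'o*'     # 4: zero o's is allowed, so every line matches
sample | grep -c 'oo*'    # 3: one o, then zero or more
sample | grep -c 'ooo*'   # 2: two o's, then zero or more
sample | grep -c 'g.*d'   # 4: g, any run of characters (even none), then d
sample | grep -c 'g..d'   # 1: exactly two characters between g and d (good)
```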

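2.2.4 Limiting the number of repetitions with { }

// Find strings containing two consecutive o's (the braces must be escaped as \{ \} in basic regular expressions)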
[mcb@localhost ~]$ grep -n 'o\{2\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
// Find strings where g is followed by 2 to 5 o's and then another g
[mcb@localhost ~]$ grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.
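
An unanchored 'o\{2\}' also matches longer runs such as goooooogle, because any run of six o's contains two consecutive o's. The count only becomes a real limit when the pattern fixes what comes before and after. A sketch on made-up words (the sample helper is an assumption for illustration):

```shell
# Hypothetical sample with one, two, three and four o's between the g's.
sample() { printf 'gog\ngoog\ngooog\ngoooog\n'; }

sample | grep -c 'o\{2\}'       # 3: every line with at least two o's
sample | grep 'go\{2\}g'        # goog: exactly two o's between the g's
sample | grep -c 'go\{2,3\}g'   # 2: goog and gooog
sample | grep -c 'go\{2,\}g'    # 3: two or more o's

# With extended regular expressions (grep -E) the braces are not escaped.
sample | grep -cE 'go{2,3}g'    # 2
```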

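Summary of basic regular expression characters

RE character	Meaning and example
^word	Meaning: the search string (word) is at the start of the line. Example: find the lines that start with # and print their line numbers `grep -n '^#' regular_express.txt`
word$	Meaning: the search string (word) is at the end of the line. Example: print the lines that end with !, with line numbers `grep -n '!$' regular_express.txt`
.	Meaning: exactly one character, whatever it is. Example: the matched string can be (eve), (eae), (eee) or (e e), but not just (ee); there must be exactly one character between the two e's, and a space counts as a character `grep -n 'e.e' regular_express.txt`
\	Meaning: the escape character, which removes the special meaning of the symbol that follows. Example: find the lines containing a single quote ' `grep -n \' regular_express.txt`
*	Meaning: repeat the preceding RE character zero or more times. Example: find strings such as (es), (ess), (esss); because zero repetitions are allowed, es also matches. Since * repeats the preceding RE character, it must come directly after one; to repeat any character, write .* `grep -n 'ess*' regular_express.txt`
[list]	Meaning: a character set listing the characters to match. Example: find the lines containing (gl) or (gd). Note that [ ] stands for exactly one character: a[afl]y matches aay, afy or aly, because [afl] means a or f or l `grep -n 'g[ld]' regular_express.txt`
[n1-n2]	Meaning: a character set listing a range of characters to match. Example: find the lines containing any uppercase letter. Inside [ ] the minus sign - is special: it stands for all consecutive characters between the two endpoints. Which characters count as consecutive depends on the character encoding and collation order, so the locale must be set correctly (in bash, check variables such as LANG and LC_ALL). All uppercase letters, for example, is [A-Z] `grep -n '[A-Z]' regular_express.txt`
[^list]	Meaning: a character set listing the characters or ranges that must not appear. Example: the matched string can be (oog) or (ood) but not (oot). Inside [ ], ^ means negation; for example, "not an uppercase letter" is [^A-Z]. Be careful: grep -n [^A-Z] regular_express.txt lists every line of the file, because [^A-Z] means "one character that is not uppercase" and every line contains such a character; the first line, "Open Source", has p, e, n, o and other lowercase letters `grep -n 'oo[^t]' regular_express.txt`
\{n,m\}	Meaning: n to m consecutive repetitions of the preceding RE character; \{n\} means exactly n repetitions and \{n,\} means n or more. Example: strings with 2 to 3 o's between g and g, i.e. (goog) or (gooog) `grep -n 'go\{2,3\}g' regular_express.txt`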

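2.3 The sed tool

sed is itself a pipeline command that can process standard input. It can also replace, delete, add, and select specific lines of data.

[mcb@localhost ~]$ sed [-nefr] [action]
Options and parameters:
-n  : quiet (silent) mode. Normally sed prints all of its input to the screen; with -n, only the lines (or actions) that sed specifically processes are printed.
-e  : give the sed action directly on the command line;
-f  : read the sed actions from a file; -f filename runs the sed actions stored in filename;
-r  : use extended regular expression syntax in the sed actions (the default is basic regular expression syntax).
-i  : modify the file being read directly instead of printing to the screen.
Action format: [n1[,n2]]function
n1, n2 : optional; they usually select the lines the action applies to. For example, to act on lines 10 through 20, write 10,20[function].
The function can be one of the following:
a   : append; a can be followed by text, which appears on a new line after the current line;
c   : change; c can be followed by text, which replaces the lines between n1 and n2;
d   : delete; since it deletes, d is usually not followed by anything;
i   : insert; i can be followed by text, which appears on a new line before the current line;
p   : print the selected data; p is usually used together with sed -n;
s   : substitute; it performs replacement directly and is usually combined with regular expressions, for example 1,20s/old/new/g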

// List /etc/passwd with line numbers and delete lines 2-5
[mcb@localhost ~]$ nl /etc/passwd | sed '2,5d'
1  root:x:0:0:root:/root:/bin/bash
6  sync:x:5:0:sync:/sbin:/bin/sync
7  shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
// Delete from line 2 through the last line
[mcb@localhost ~]$ nl /etc/passwd | sed '2,$d'
1  root:x:0:0:root:/root:/bin/bash
// Append two lines of text after line 2
[mcb@localhost ~]$ nl /etc/passwd | sed '2a drink tea or....\
drink tea?'
1  root:x:0:0:root:/root:/bin/bash
2  bin:x:1:1:bin:/bin:/sbin/nologin
drink tea or....
drink tea?
To insert before line 2 instead, change 2a to 2i.
// Replace lines 2-5 with the new text No 2-5 number
[mcb@localhost ~]$ nl /etc/passwd | sed '2,5c No 2-5 number'
1  root:x:0:0:root:/root:/bin/bash
No 2-5 number
6  sync:x:5:0:sync:/sbin:/bin/sync

// Print only lines 5-7
[mcb@localhost ~]$ nl /etc/passwd | sed -n '5,7p'
5  lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
6  sync:x:5:0:sync:/sbin:/bin/sync
7  shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
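
Because /etc/passwd differs between systems, the same functions can be tried on fixed input. This sketch assumes five made-up lines and writes a\ and c\ in the form with the text on the following line, which should work in sed implementations other than GNU sed as well as the one-line form used above:

```shell
# Five hypothetical lines stand in for /etc/passwd.
sample() { printf 'one\ntwo\nthree\nfour\nfive\n'; }

# d: delete lines 2-4, leaving "one" and "five".
sample | sed '2,4d'

# p with -n: print only lines 2-3 ("two" and "three").
sample | sed -n '2,3p'

# c: replace lines 2-4 with a single new line.
sample | sed '2,4c\
replaced'

# a: append a new line after line 1.
sample | sed '1a\
added after line 1'

# s: replace every o with 0 on every line (0ne, tw0, three, f0ur, five).
sample | sed 's/o/0/g'
```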

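Search and replace within lines

sed 's/string to replace/new string/g'

First use grep to pull out the lines containing the keyword MAN, then delete everything from the comment marker onward, and finally delete the empty lines:
[mcb@localhost ~]$ cat /etc/man_db.conf | grep 'MAN' | sed 's/#.*$//g' | sed '/^$/d'

The sed -i option modifies the file contents directly, which is very useful.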
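
Since -i overwrites the file, it is worth keeping a copy while experimenting. The sketch below is an illustration on a temporary file with made-up contents, not the article's /etc/man_db.conf; it relies on the -i.bak form, in which sed saves the original under the given suffix before editing.

```shell
# Temporary file with hypothetical contents: a comment, an empty line, a setting.
tmpfile=$(mktemp)
printf '# comment\n\nkey=value\n' > "$tmpfile"

# Strip comments and then delete empty lines, editing the file in place.
# The untouched original is kept as "$tmpfile.bak".
sed -i.bak -e 's/#.*$//' -e '/^$/d' "$tmpfile"

cat "$tmpfile"        # key=value
cat "$tmpfile.bak"    # the three original lines

rm -f "$tmpfile" "$tmpfile.bak"
```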

3 Extended regular expressions

Example: remove the empty lines and the lines that begin with #

Method 1: use a pipe and search twice:

[mcb@localhost ~]$ grep -v '^$' regular_express.txt | grep -v '^#'

Method 2: use an extended regular expression:

[mcb@localhost ~]$ egrep -v '^$|^#' regular_express.txt
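
egrep is equivalent to grep -E, and recent GNU grep releases print a warning that egrep is obsolescent, so the sketch below uses grep -E. The input is made up for illustration; all three commands print the same three lines.

```shell
# Hypothetical sample: a comment, an empty line, and some content.
sample() { printf '# comment\n\nfirst\n  # indented comment\nsecond\n'; }

# Method 1: two basic-regex filters in a pipeline.
sample | grep -v '^$' | grep -v '^#'

# Method 2: one extended regex with alternation.
sample | grep -Ev '^$|^#'

# Same thing with the shared ^ anchor factored out by grouping.
sample | grep -Ev '^(#|$)'
```

The indented comment survives in every variant because its first character is a space, not #.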

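Special characters in extended regular expressions:

RE character	Meaning and example
+	Meaning: one or more repetitions of the preceding RE character. Example: find strings such as god, good, goood. Here o+ means "one or more o's", so the command lists lines 1, 9 and 13. `egrep -n 'go+d' regular_express.txt` The basic regular expression equivalent is grep -n 'goo*d' regular_express.txt
?	Meaning: zero or one of the preceding RE character. Example: find the two strings gd and god. Here o? means "nothing or one o", so the command lists lines 13 and 14. Notice that the results of the two examples ('go+d' and 'go?d') taken together are the same as those of 'go*d'. Why? Because o+ covers one or more o's and o? covers zero or one, so together they cover zero or more, which is exactly what o* means. Taken separately they differ: o? matches zero or one o, while o* matches zero to infinitely many. `egrep -n 'go?d' regular_express.txt`
|	Meaning: find any of several strings, joined with "or". Example: find the strings gd or good. Because it is an "or", lines 1, 9 and 14 are all printed. What if dog should be found as well? `egrep -n 'gd|good' regular_express.txt` `egrep -n 'gd|good|dog' regular_express.txt`
( )	Meaning: a "group" string. Example: find the strings glad or good. Since g and d are common to both, la and oo can be placed inside ( ) and separated by | `[mcb@localhost ~]$ egrep -n 'g(la|oo)d' regular_express.txt` `1:"Open Source" is a good mechanism to develop programs.` `9:Oh! The soup taste good.^M` `16:The world <Happy> is the same with "glad".`
( )+	Meaning: one or more repetitions of a group. Example: print AxyzxyzxyzxyzC with echo and then search it as follows `echo 'AxyzxyzxyzxyzC' | egrep 'A(xyz)+C'`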
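
The operators in this table can be checked against a short word list. This is a minimal sketch on made-up input (the sample helper is an assumption for illustration), again using grep -E in place of egrep:

```shell
# Hypothetical word list.
sample() { printf 'gd\ngod\ngood\nglad\ndog\n'; }

sample | grep -cE 'go+d'          # 2: god, good (one or more o's)
sample | grep -cE 'go?d'          # 2: gd, god (zero or one o)
sample | grep -cE 'go*d'          # 3: gd, god, good (the two sets combined)
sample | grep -cE 'gd|good|dog'   # 3: any of the three alternatives
sample | grep -E 'g(la|oo)d'      # good, glad

# A group followed by + repeats the whole group.
echo 'AxyzxyzxyzxyzC' | grep -E 'A(xyz)+C'   # AxyzxyzxyzxyzC
```

In extended syntax the alternation bar is written as a plain |; with a backslash in front, GNU grep -E treats it as a literal | character.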