Using Regular Expressions and sed in Shell Scripts

Updated: 2024-10-18 03:37:48


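1. Using regular expressions

1.1 Concepts
A regular expression (RE) is a syntax for describing character sequences and matching patterns. It is mainly used to split, match, search, and replace strings. In other words, a regular expression is a text pattern that describes one or more strings to match when searching text.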

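That definition is dry and hard to digest. Put simply, a regular expression is a way of matching strings in a file. The tool first splits the text into lines, then searches each line for a string that fits the rules of the regular expression; if one is found the line matches, otherwise it does not.
Note: regular expressions differ from wildcards. A regular expression matches strings inside a file, whereas a wildcard matches file names. This distinction really only applies in the shell: commands that search for strings in files, such as grep, awk, sed, and vi, support regular expressions, while commands that look up files, such as ls, find, and cp, do not interpret their file arguments as regular expressions, so file names are matched with the shell's own wildcards instead.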
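The difference between a wildcard and a regular expression can be seen with the same pattern text, "a*". The following is a minimal sketch; the scratch directory and the three file names are made up for illustration.

```shell
# Hypothetical scratch directory with three files
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/aa.txt" "$demo_dir/b.log"

# Wildcard: the shell expands a* to the file names that start with "a"
glob_count=$(cd "$demo_dir" && set -- a* && echo "$#")

# Regular expression: a* means "zero or more a", so every name matches
regex_count=$(ls "$demo_dir" | grep -c 'a*')

echo "wildcard a* matched $glob_count names"
echo "regex a* matched $regex_count names"
rm -r "$demo_dir"
```

The wildcard selects only the two names beginning with "a", while the regular expression matches all three names because it also accepts zero a's.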
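In a regular expression, the special symbols used for matching are called metacharacters. In the shell they are divided into basic metacharacters (BRE) and extended metacharacters (ERE).

Basic metacharacters

Metacharacter and its purpose
* Matches the preceding character or subexpression zero or more times. For example, a*hello matches lines in which zero or more a's are immediately followed by hello; that is, hello may be preceded by any number of a's.
. Matches any single character except the newline (grep processes its input line by line). For example, l..e matches lines containing an l, followed by any two characters, followed by an e.
.* Matches a string of any length.
^ Matches the start of a line. For example, ^hello matches lines that begin with hello.
$ Matches the end of a line. For example, hello$ matches lines that end with hello.
^$ Matches blank lines.
[] Matches any one of the characters listed in the brackets, and only one character. For example, [aoeiu] matches any one vowel, [0-9] matches any one digit, [a-z][0-9] matches a two-character string made of a lowercase letter followed by a digit, and [a-zA-Z] matches any one English letter.

[^] Matches any single character other than those listed in the brackets. For example, [^0-9] matches any one non-digit character, and [^a-z] matches any one character that is not a lowercase letter.
Note: ^ can be used as a prefix inside [] to mean any character except those in the brackets. For example, to find lines containing oo that is not preceded by g, use '[^g]oo' as the search string. The ^ symbol means negation only at the very start of []; anywhere else inside [] it is an ordinary character. [^ab^c] matches any single character other than a, b, ^, and c.
\ Escape character. Cancels the special meaning of a symbol so that it is treated as an ordinary character. For example, ^\.[0-9][0-9] matches lines that begin with a period followed by two digits.
\{n\} The preceding character appears exactly n times. For example, [0-9]\{4\} matches four digits, and 1[35-9][0-9]\{9\} matches a mobile phone number. (With grep -E the braces are written without backslashes, as {n}.)
\{n,\} The preceding character appears at least n times. For example, [0-9]\{2,\} matches numbers of two or more digits.
\{n,m\} The preceding character appears at least n times and at most m times. For example, [a-z]\{6,8\} matches 6 to 8 lowercase letters.
\< \> Match the start (\<) and end (\>) of a word. For example, the regular expression \<the\> matches "the" in the string "for the wise" but not the "the" inside "otherwise". Note: not every program supports these metacharacters.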
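A few of the basic metacharacters from the table can be tried on a handful of sample lines. This is only a sketch: the sample lines are invented for illustration, and `\<` and `\>` are assumed to be available, as they are in GNU grep.

```shell
sample=$(printf '%s\n' 'for the wise' 'otherwise' 'hello' 'aaahello' \
  '.12 starts with a dot' 'x12 does not')

# \<the\> matches "the" only as a whole word
word_hits=$(printf '%s\n' "$sample" | grep -c '\<the\>')

# a*hello: hello preceded by zero or more a's
star_hits=$(printf '%s\n' "$sample" | grep -c 'a*hello')

# ^\.[0-9][0-9]: a literal period and two digits at the start of a line
dot_hits=$(printf '%s\n' "$sample" | grep -c '^\.[0-9][0-9]')

echo "whole-word the: $word_hits, a*hello: $star_hits, leading .NN: $dot_hits"
```

Only "for the wise" contains "the" as a whole word, both "hello" and "aaahello" match a*hello, and only the first of the last two lines starts with a literal period.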
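Extended regular expressions
Readers who already know regular expressions may wonder about a few more metacharacters that ought to be supported, such as "+", "?", "|", and "()".
Linux does support them; it is just that grep does not by default, because it treats these extended metacharacters as ordinary characters. To enable them, use egrep or grep -E. That is why they are called extended metacharacters.
The help text for egrep describes it as the same command as grep -E (recent GNU grep releases mark egrep as obsolescent in favor of grep -E).
The extended metacharacters supported in the shell are listed below.
Extended metacharacter and its description

+ Matches the preceding character or subexpression one or more times.
For example, "go+gle" matches "gogle", "google", and "gooogle", and it still matches when there are more o's.
egrep "go+gle" filename or grep -E "go+gle" filename
? Matches the preceding character or subexpression zero or one time. For example, "colou?r" matches "colour" and "color".
| Alternation (or). For example, "was|his" matches lines containing "was" as well as lines containing "his".
() Treats the content in parentheses as a single unit. It can be thought of as one big character made up of several single characters.
For example, "(dog)+" matches "dog", "dogdog", "dogdogdog", and so on, because the characters enclosed in () are treated as a unit, while "hello (world|earth)" matches "hello world" and "hello earth".
Note: the extended metacharacters "+", "?", "|", "()", and "{}" are supported by egrep and grep -E. With GNU grep and no -E option, the same operators can be written with a leading backslash: "\+", "\?", "\|", "\(\)", and "\{\}".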
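The extended metacharacters can be compared side by side on a short word list. This is a minimal sketch with an invented word list; the backslash form `go\+gle` relies on a GNU grep extension and may not work with other grep implementations.

```shell
words=$(printf '%s\n' gogle google ggle color colour dogdog)

# "+" with -E, and the backslash form without -E (GNU extension)
ere_plus=$(printf '%s\n' "$words" | grep -E -c 'go+gle')
bre_plus=$(printf '%s\n' "$words" | grep -c 'go\+gle')

# "?" makes the u optional
ere_opt=$(printf '%s\n' "$words" | grep -E -c 'colou?r')

# "()" groups dog so that "+" repeats the whole group
ere_group=$(printf '%s\n' "$words" | grep -E -c '^(dog)+$')

# "|" matches either alternative
ere_alt=$(printf '%s\n' "$words" | grep -E -c 'ggle|color')

echo "$ere_plus $bre_plus $ere_opt $ere_group $ere_alt"
```

Both forms of "+" select "gogle" and "google" but not "ggle", which has no o at all.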
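Regular expression examples
The examples below illustrate what these metacharacters do. grep, which we have already covered, supports regular expressions, so all of the exercises use it. Before starting, it is a good idea to define the following alias in ~/.bashrc:

[root@localhost ~]# vi /root/.bashrc
alias grep='grep --color=auto'
Run source to apply the change:
[root@localhost ~]# source /root/.bashrc
With this alias, grep highlights the matched characters in color, which makes it easier to see exactly which string a regular expression matched.
Creating the practice file:
Since regular expressions match strings in files, we need a test file for the experiments that follow. Its content is: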
[root@localhost ~]# cat test_rule.txt
Mr. Li Ming said:
he was the most honest man in LampBrother.
123despise him.

But since Mr. shen Chao came,
he never saaaid those words.
5555nice!

because,actuaaaally,
Mr. Shen Chao is the most honest man
Later,Mr. Li ming soid his hot body.
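1. "*": matches the preceding character zero or more times
Note that "*" here does not mean the same as the wildcard "*": it means the preceding character is repeated zero or more times. For example, "a*" does not match "a" followed by arbitrary characters; it matches everything, including blank lines.

Why? "a*" matches zero a's or any number of a's. Matching zero a's succeeds on every line, so every line is matched, blank lines included. A regular expression like "a*" on its own is therefore meaningless.
Written as "aa*", the expression requires one a, which may or may not be followed by more a's. In other words, it matches lines containing at least one a.

Note: "*" applies only to the single character a in front of it, not to the string aa.
The regular expression "aaa*" matches strings containing at least two consecutive a's.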
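The three patterns can be checked against the practice file by counting matching lines with grep -c. This is a sketch that recreates test_rule.txt in a temporary file; the variable names are chosen here for illustration.

```shell
export LC_ALL=C
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
Mr. Li Ming said:
he was the most honest man in LampBrother.
123despise him.

But since Mr. shen Chao came,
he never saaaid those words.
5555nice!

because,actuaaaally,
Mr. Shen Chao is the most honest man
Later,Mr. Li ming soid his hot body.
EOF

# a*   : zero or more a's, so every line matches, blank lines included
any_a=$(grep -c 'a*' "$rule_file")
# aa*  : one a followed by zero or more a's, i.e. at least one a
one_a=$(grep -c 'aa*' "$rule_file")
# aaa* : two a's followed by zero or more a's, i.e. at least "aa"
two_a=$(grep -c 'aaa*' "$rule_file")

echo "a*: $any_a lines, aa*: $one_a lines, aaa*: $two_a lines"
rm -f "$rule_file"
```

"a*" selects all 11 lines of the file, "aa*" selects the 7 lines that contain an a, and "aaa*" selects only the two lines with "saaaid" and "actuaaaally".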
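2. ".": matches any single character except the newline
The regular expression "." matches exactly one character, and that character can be anything.

# "s..d" matches words that have exactly two characters between the letters s and d
How would we match words with any number of characters between s and d? "s*d" will not work: "s*" can match zero s's, so the expression matches every line that contains a d. The correct form is "s.*d".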
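The following sketch runs "s..d" and "s.*d" against a temporary copy of the practice file, so the difference between a fixed number of characters and any number of characters is visible.

```shell
export LC_ALL=C
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
Mr. Li Ming said:
he was the most honest man in LampBrother.
123despise him.

But since Mr. shen Chao came,
he never saaaid those words.
5555nice!

because,actuaaaally,
Mr. Shen Chao is the most honest man
Later,Mr. Li ming soid his hot body.
EOF

# s..d : exactly two characters between s and d ("said", "soid")
two_between=$(grep -c 's..d' "$rule_file")
# s.*d : any number of characters between s and d (also "saaaid")
any_between=$(grep -c 's.*d' "$rule_file")

grep 's.*d' "$rule_file"
echo "s..d: $two_between lines, s.*d: $any_between lines"
rm -f "$rule_file"
```

The line with "saaaid" is matched only by "s.*d", because four characters sit between its s and d.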

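3. "^": matches the start of a line; "$": matches the end of a line
"^" anchors the match at the start of a line. For example, "^M" matches lines that begin with an uppercase "M".

"$" anchors the match at the end of a line. For example, "n$" matches lines that end with a lowercase "n".

Note that if the file was written on Windows, "n$" will not work as expected, because a Windows line ending appears as "^M$" while a Linux line ending is "$" [...] this RPM package is all that is needed. "^$", in turn, matches blank lines.

Without the "-n" option, blank lines produce no visible output; with "-n", their line numbers are shown.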
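The anchors can be tried on a temporary copy of the practice file. The sketch also imitates a Windows line ending with printf; stripping the carriage return with tr is just one way to illustrate the problem and is an assumption of this example, not a step taken from the text.

```shell
export LC_ALL=C
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
Mr. Li Ming said:
he was the most honest man in LampBrother.
123despise him.

But since Mr. shen Chao came,
he never saaaid those words.
5555nice!

because,actuaaaally,
Mr. Shen Chao is the most honest man
Later,Mr. Li ming soid his hot body.
EOF

starts_with_M=$(grep -c '^M' "$rule_file")   # lines beginning with M
ends_with_n=$(grep -c 'n$' "$rule_file")     # lines ending with n
blank_lines=$(grep -n '^$' "$rule_file")     # blank lines with line numbers

# A line ending in CR LF: the last character before the newline is \r, not n
crlf_hits=$(printf 'man\r\n' | grep -c 'n$')
fixed_hits=$(printf 'man\r\n' | tr -d '\r' | grep -c 'n$')

echo "^M: $starts_with_M, n\$: $ends_with_n, CRLF: $crlf_hits, LF only: $fixed_hits"
echo "$blank_lines"
rm -f "$rule_file"
```

With -n, the blank lines show up as "4:" and "8:", a line number followed by nothing.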
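4. "[]": matches any one of the characters listed in the brackets, and only one
"[]" matches any single character listed inside the brackets. For example, [ao] matches either an a or an o.

"[0-9]" matches any single digit:

# list the lines that contain a digit
"[A-Z]" matches any single uppercase letter, and the regular expression "^[a-z]" matches lines that begin with a lowercase letter.
"[^]": matches any single character other than those listed in the brackets
Note that "^" outside [] means the start of a line, while inside [] it means negation. For example, "^[a-z]" matches lines that begin with a lowercase letter, whereas "^[^a-z]" matches lines that do not begin with a lowercase letter (blank lines have no first character, so they are not matched).

"^[^a-zA-Z]" matches lines that do not begin with a letter.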
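The bracket expressions above can be counted against a temporary copy of the practice file. In this sketch LC_ALL=C is set as a precaution, since in some locales a range such as [a-z] may also cover uppercase letters.

```shell
export LC_ALL=C   # keep ranges such as [a-z] strictly ASCII
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
Mr. Li Ming said:
he was the most honest man in LampBrother.
123despise him.

But since Mr. shen Chao came,
he never saaaid those words.
5555nice!

because,actuaaaally,
Mr. Shen Chao is the most honest man
Later,Mr. Li ming soid his hot body.
EOF

a_or_o=$(grep -c 's[ao]id' "$rule_file")          # "said" or "soid"
has_digit=$(grep -c '[0-9]' "$rule_file")         # lines containing a digit
lower_start=$(grep -c '^[a-z]' "$rule_file")      # start with a lowercase letter
not_lower=$(grep -c '^[^a-z]' "$rule_file")       # start with anything else
not_letter=$(grep -c '^[^a-zA-Z]' "$rule_file")   # start with a non-letter

echo "$a_or_o $has_digit $lower_start $not_lower $not_letter"
rm -f "$rule_file"
```

The counts for "^[a-z]" and "^[^a-z]" add up to 9 rather than 11, because the two blank lines match neither pattern.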
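5. "\": escape character
The escape character cancels the special meaning of a symbol. To match lines that end with ".", the regular expression ".$" will not do, because "." is itself a metacharacter that matches any character; it has to be escaped, as in "\.$".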
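The effect of escaping the period can be seen by comparing ".$" with "\.$" on a temporary copy of the practice file; this is a small sketch using line counts.

```shell
export LC_ALL=C
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
Mr. Li Ming said:
he was the most honest man in LampBrother.
123despise him.

But since Mr. shen Chao came,
he never saaaid those words.
5555nice!

because,actuaaaally,
Mr. Shen Chao is the most honest man
Later,Mr. Li ming soid his hot body.
EOF

# .$  : any character at the end of the line, so every non-blank line matches
any_last=$(grep -c '.$' "$rule_file")
# \.$ : a literal period at the end of the line
dot_last=$(grep -c '\.$' "$rule_file")

echo "unescaped: $any_last lines, escaped: $dot_last lines"
rm -f "$rule_file"
```

Unescaped, the pattern selects all 9 non-blank lines; escaped, it selects only the 4 lines that really end with a period.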

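6. "\{n\}": the preceding character appears n times
In "\{n\}", n is a number, and the expression matches strings in which the preceding character appears n times in a row. For example, "zo\{3\}m" matches only "zooom", and "a\{3\}" matches three consecutive a's. (The backslashes are required in grep's basic regular expressions; with grep -E the same interval is written "{n}".)

The regular expression "[0-9]\{3\}" matches strings containing three consecutive digits.

Although "5555" has four consecutive digits, it still contains three consecutive digits, so that line is listed too.
To match only lines that begin with three consecutive digits immediately followed by a lowercase letter, use "^[0-9]\{3\}[a-z]".

"\{n,\}" means the preceding character appears at least n times.
# "^[0-9]\{3,\}[a-z]" matches lines that begin with at least three consecutive digits

"\{n,m\}" means the preceding character appears at least n times and at most m times.

# "sa\{1,3\}i" matches strings with at least one and at most three a's between the letters s and i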
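The interval expressions can be verified against a temporary copy of the practice file. The sketch uses the backslash form for plain grep and, for comparison, the unescaped form with grep -E.

```shell
export LC_ALL=C
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
Mr. Li Ming said:
he was the most honest man in LampBrother.
123despise him.

But since Mr. shen Chao came,
he never saaaid those words.
5555nice!

because,actuaaaally,
Mr. Shen Chao is the most honest man
Later,Mr. Li ming soid his hot body.
EOF

three_a=$(grep -c 'a\{3\}' "$rule_file")                 # "aaa" somewhere
three_a_ere=$(grep -E -c 'a{3}' "$rule_file")            # same, ERE syntax
three_digits=$(grep -c '[0-9]\{3\}' "$rule_file")        # 123... and 5555...
exactly_three=$(grep -c '^[0-9]\{3\}[a-z]' "$rule_file") # only 123despise
at_least_three=$(grep -c '^[0-9]\{3,\}[a-z]' "$rule_file")
one_to_three=$(grep -c 'sa\{1,3\}i' "$rule_file")        # said, saaaid

echo "$three_a $three_a_ere $three_digits $exactly_three $at_least_three $one_to_three"
rm -f "$rule_file"
```

"5555nice!" fails "^[0-9]\{3\}[a-z]" because its fourth character is a digit rather than a letter, but it passes once the interval becomes "\{3,\}".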
Example 1: show the active settings in the sshd_config file
[root@localhost ~]# egrep -v "^#|^$" /etc/ssh/sshd_config
[root@localhost ~]# grep -E -v "^#|^$" /etc/ssh/sshd_config
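To see what this filter does without touching the real /etc/ssh/sshd_config, the sketch below applies the same expression to a small made-up configuration file; the file content is hypothetical.

```shell
# Hypothetical configuration file: comments, a blank line, two settings
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
# This is a comment
Port 22

#PermitRootLogin yes
PasswordAuthentication yes
EOF

# -v inverts the match: drop lines that start with # and lines that are empty
active=$(grep -E -v '^#|^$' "$conf_file")

echo "$active"
rm -f "$conf_file"
```

Only the two uncommented settings remain in the output.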
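Example 2: filter IP addresses
The file /tmp/hosts records IP address information.

Use a regular expression to filter out the IP addresses.
There are many possible answers.
Hint: each part of an IP address ranges from 0 to 255, so split 0-255 into 0-99, 100-199, 200-249, and 250-255, then match the four parts one after another.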
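One possible answer that follows the hint is sketched below. The sample addresses stand in for the contents of /tmp/hosts, which are not shown, and the sketch assumes one candidate address per line so that the expression can be anchored with ^ and $.

```shell
# Hypothetical sample data, one candidate address per line
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.1.10
10.0.0.256
255.255.255.255
1.2.3
172.16.0.1
EOF

# One part of the address: 250-255, 200-249, 100-199, or 0-99
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Three parts each followed by a period, then a fourth part, and nothing else
valid_ips=$(grep -E "^($octet\.){3}$octet\$" "$hosts_file")

echo "$valid_ips"
rm -f "$hosts_file"
```

"10.0.0.256" is rejected because 256 fits none of the four ranges, and "1.2.3" is rejected because it has only three parts.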

Example 3: filter mobile phone numbers
The file /tmp/phone_number.txt stores phone numbers, with the following content:
# cat /tmp/phone_number.txt
13421391020
13521591629
1354243919046
15421391020
16421391020
42456789898
198384732947
Use a regular expression to filter out the valid phone numbers.
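A possible solution is sketched below. It assumes that "valid" means the pattern given earlier in the metacharacter table (a 1, then 3 or 5-9, then nine more digits) applied to the whole line; real numbering rules may differ.

```shell
phone_file=$(mktemp)
cat > "$phone_file" <<'EOF'
13421391020
13521591629
1354243919046
15421391020
16421391020
42456789898
198384732947
EOF

# Anchoring with ^ and $ rejects numbers that are too long
valid_numbers=$(grep -E '^1[35-9][0-9]{9}$' "$phone_file")

echo "$valid_numbers"
rm -f "$phone_file"
```

The 13-digit and 12-digit entries fail the length check, and "42456789898" fails because it does not start with 1.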

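2. The sed command
2.1 Concepts
sed stands for stream editor. It edits text non-interactively.
vim/vi edit text interactively: you use keyboard commands to insert, delete, or replace text in the data. sed, the subject of this section, works differently. It is a stream editor, and its most obvious trait is that you supply a set of rules before it processes any data; sed then edits the data according to those rules, with no interaction.
sed also supports regular expressions; to use extended regular expressions, add the -r option.
How sed executes:
sed processes a file (or its input) line by line and sends the result to the screen.
The steps are:
1. sed first stores the line currently being processed in a temporary buffer (also called the pattern space).
2. It then processes the line in the buffer and, when finished, sends it to the screen.
3. After each line is processed, sed removes it from the buffer, then reads the next line in to process and display it.
4. After the last line of the input has been processed, sed exits.
Note that by default sed does not modify the source file. It copies the data into the buffer, changes only the buffered copy, and prints the result to the screen; only with the "-i" option does it modify the file directly.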
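That sed leaves the source file alone by default can be checked directly. The sketch below uses a one-line temporary file and compares sed's output with the file content afterwards.

```shell
src_file=$(mktemp)
printf '%s\n' 'when the cat is away,the mice will play' > "$src_file"

# The edited line goes to standard output only
edited=$(sed 's/cat/dog/' "$src_file")
# The file itself still holds the original text
on_disk=$(cat "$src_file")

echo "sed output: $edited"
echo "file:       $on_disk"
rm -f "$src_file"
```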
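Using sed
The basic format of the sed command is:
[root@localhost ~]# sed [options] '[command]' filename
Options:
-n By default sed prints the processed content after the commands have run; this option suppresses that default output.
-e Runs multiple sed commands.
-i Modifies the source file directly. Use with care, and back up the file first.
-i.bak Edits the source file and creates a .bak backup of it.
-r Uses extended regular expressions.
Commands:
p Print: outputs the specified lines.
s Substitute: replaces the specified string.
d Delete: deletes lines.
a Append: adds new text below the current line.
i Insert: adds new text above the current line.
c Change: replaces the selected lines with the specified new text.
r Read: inserts the contents of a separate file at the specified position in the data stream.
w Write: saves lines to another file.
Note: commands must be enclosed in single or double quotes.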
Common sed commands:
Printing (the p command)
First prepare a test file:
[root@localhost ~]# cat test.txt
my cat's name is betty
This is your dog
my dog's name is frank
This is your fish
my fish's name is george
This is your goat
my goat's name is adam

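The p command searches for lines that meet a condition, or lines in a given range, and prints them.
Example 1: find and display the lines containing the keyword betty
Note: the search pattern must be enclosed in "/.../".

Note: sed prints every line by default, so with the p command alone the line containing betty is printed twice.
To output only specific lines, add the "-n" option, which suppresses the default output so that only the line containing betty is printed.

As you can see, combining the -n option with the p command suppresses the other lines and prints only the lines that match the text pattern.

Example 2: to view the second line of test.txt, simply specify the line number.

To run several commands against the same file or line, use the "-e" option.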
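The p examples described above can be sketched as follows, using a temporary copy of test.txt. The exact commands are a reconstruction from the descriptions, not necessarily the ones the author ran.

```shell
pets_file=$(mktemp)
cat > "$pets_file" <<'EOF'
my cat's name is betty
This is your dog
my dog's name is frank
This is your fish
my fish's name is george
This is your goat
my goat's name is adam
EOF

# Without -n the matching line is printed twice (default output plus p)
betty_twice=$(sed '/betty/p' "$pets_file" | grep -c 'betty')
# With -n only the matching line is printed
betty_only=$(sed -n '/betty/p' "$pets_file")
# An address can also be a line number
second_line=$(sed -n '2p' "$pets_file")
# -e chains several commands: print line 1 and line 3
first_and_third=$(sed -n -e '1p' -e '3p' "$pets_file")

echo "$betty_twice"
echo "$betty_only"
echo "$second_line"
echo "$first_and_third"
rm -f "$pets_file"
```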

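Substitution (the s command)
First prepare a test file with the following content:
[root@localhost ~]# cat test1.txt
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown

The s command searches for and replaces part of the data, working line by line. Search and replace in sed is very similar to that in vi.
Format: sed 's/string to replace/new string/g'
For example: [root@localhost ~]# sed 's/hello/never/g' test1.txt
s the "substitute" command
/.../.../ the delimiter
hello the search string (the string to be replaced)
never the replacement string (the new string)
Note: the delimiter "/" can be replaced by another symbol, such as ",", "|", or "@".
For example: sed 's/\/usr\/local\/bin/\/usr\/bin/' filename
is equivalent to sed 's@/usr/local/bin@/usr/bin@' filename
Clearly, "@" is a much better delimiter than "/" here.
Note: by default the first hello on each line of test1.txt is replaced with never; adding the /g flag replaces every match on each line.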
[root@localhost ~]# sed 's/hello/never/' test1.txt   # replaces only the first occurrence on each line
it is never too old to learn
it is never too hello to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed 's/hello/never/2' test1.txt   # replaces only the second occurrence on each line
it is hello too old to learn
it is hello too never to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed 's/hello/never/g' test1.txt   # global replacement
it is never too old to learn
it is never too never to learn never
when the cat is away,the mice will play
no cross,no crown
Note: / is the delimiter, and the delimiter can be customized.
[root@localhost ~]# head -5 /etc/passwd > mima
[root@localhost ~]# head -2 mima
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
[root@localhost ~]# head -2 mima | sed 's@/sbin/nologin@/bin/bash@'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/bin/bash
[root@localhost ~]# head -2 mima | sed 's/\/sbin\/nologin/\/bin\/bash/'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/bin/bash
Substituting by line
Line ranges are given as numbers.
Example 1: single-line substitution
[root@localhost ~]# cat test2.txt
it is never too old to learn
it is never too never to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed '2s/never/dog/' test2.txt
it is never too old to learn
it is dog too never to learn never
when the cat is away,the mice will play
no cross,no crown
Example 2: multi-line substitution
[root@localhost ~]# cat test3.txt
Doing is better than saying
Doing is better than saying
Doing is better than saying
Doing is better than saying
Doing is better than saying
[root@localhost ~]# sed '3,$s/better/saying/' test3.txt
Doing is better than saying
Doing is better than saying
Doing is saying than saying
Doing is saying than saying
Doing is saying than saying
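To comment out a line so that it no longer takes effect, use the s command to put a "#" at the start of that line, for example on line 4 of test2.txt.

To remove a string (such as never) altogether, replace it with an empty string.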
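Both operations can be sketched with the s command on a temporary copy of test2.txt. The two ways of adding the "#" shown here are common idioms chosen for illustration; the author's original commands are not shown in the text.

```shell
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
it is never too old to learn
it is never too never to learn never
when the cat is away,the mice will play
no cross,no crown
EOF

# Comment out line 4 by replacing the (empty) start of the line with "#"
commented=$(sed '4s/^/#/' "$demo_file" | sed -n '4p')
# The same result using a group and a back-reference
commented_alt=$(sed '4s/\(.*\)/#\1/' "$demo_file" | sed -n '4p')

# Replace never with nothing on every line; show the first line
without_never=$(sed 's/never//g' "$demo_file" | sed -n '1p')

echo "$commented"
echo "$without_never"
rm -f "$demo_file"
```

Removing "never" leaves the spaces that surrounded it, so the first line ends up with two spaces between "is" and "too".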

Getting the IP address of the ens33 network interface:
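One way to do this is to let sed keep only the address part of the "inet" line. The sketch below works on a hard-coded sample line in the style of `ip addr show ens33` output, so the address and layout are assumptions; on a real system the same sed expression would be fed from that command instead. It uses -E, which GNU sed accepts as a synonym of -r.

```shell
# Hypothetical line in the style of `ip addr show ens33` output
sample_line='    inet 192.168.1.10/24 brd 192.168.1.255 scope global ens33'

# Capture the dotted address after "inet " and print only lines where
# the substitution happened; "inet6" lines do not match "inet "
ens33_ip=$(printf '%s\n' "$sample_line" |
  sed -n -E 's/^ *inet ([0-9.]+)\/.*/\1/p')

echo "$ens33_ip"
```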

Example 3: using multiple commands
[root@localhost ~]# cat test2.txt
it is never too old to learn
it is never too never to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed 's/cat/dog/' test2.txt
it is never too old to learn
it is never too never to learn never
when the dog is away,the mice will play
no cross,no crown
[root@localhost ~]# sed 's/never/@@/' test2.txt
it is @@ too old to learn
it is @@ too never to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed 's/never/@@/;s/cat/dog/' test2.txt   # first way: separate the sed commands with ";"
it is @@ too old to learn
it is @@ too never to learn never
when the dog is away,the mice will play
no cross,no crown
[root@localhost ~]# sed -e 's/never/@@/' -e 's/cat/dog/' test2.txt   # second way: use the -e option
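Deleting lines (the d command)
To delete specific lines from the text, use the d command, which deletes the entire content of the addressed lines. Be especially careful with it: if you forget to specify which lines to delete, every line of the file is deleted.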
[root@localhost ~]# cat test1.txt
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown

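Running the d command with no address prints nothing, which shows that every line was deleted from the output.
[root@localhost ~]# sed '2d' test1.txt   # delete by line number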
it is hello too old to learn
when the cat is away,the mice will play
no cross,no crown
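A range of lines can be given, for example to delete lines 2 and 3 of test1.txt.

The special end-of-file address "$" can also be used, for example to delete everything from line 3 of test1.txt onward.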
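The address forms just described can be sketched as follows on a temporary copy of test1.txt.

```shell
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown
EOF

# No address: every line is deleted from the output
delete_all=$(sed 'd' "$demo_file")
# Range 2,3: delete the second and third lines
delete_2_3=$(sed '2,3d' "$demo_file")
# Range 3,$: delete from line 3 to the last line
delete_tail=$(sed '3,$d' "$demo_file")

echo "[$delete_all]"
echo "$delete_2_3"
echo "$delete_tail"
rm -f "$demo_file"
```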

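[root@localhost ~]# sed '/cat/d' test1.txt   # delete by matching content
it is hello too old to learn
it is hello too hello to learn never
no cross,no crown
Note: to stress the point again, by default sed does not modify the original file. The deleted lines merely disappear from sed's output; the original file is unchanged.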
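Adding lines (the i and a commands)
The i (insert) command inserts a line before the specified line.
The a (append) command adds a line after the specified line.
Both have exactly the same basic format: a (or i)\new text
Example 1: insert
Insert a new line before the third line of the file.

Example 2: append
Add a new line after the third line of the file.

Example 3: add a new line at the end of the file

A new line can also be added after each of lines 2 to 4 (that is, after lines 2, 3, and 4 individually).

Example 4: to append or insert several lines of data, every line except the last must end with "\" to indicate that the data continues.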
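The four cases can be sketched as follows on a temporary copy of test1.txt. The inserted text is made up for illustration, and the one-line form `3i\text` relies on GNU sed; other sed implementations may require a newline after the backslash, as in the multi-line case.

```shell
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown
EOF

# Insert before line 3: the new text becomes line 3
inserted=$(sed '3i\inserted line' "$demo_file" | sed -n '3p')
# Append after line 3: the new text becomes line 4
appended=$(sed '3a\appended line' "$demo_file" | sed -n '4p')
# $ addresses the last line, so the text lands at the end of the output
at_end=$(sed '$a\last line' "$demo_file" | tail -n 1)
# A range appends the text after each of lines 2, 3 and 4
after_each=$(sed '2,4a\---' "$demo_file" | grep -c '^---$')
# Several lines: every line except the last ends with a backslash
multi=$(sed '3i\
first new line\
second new line' "$demo_file" | sed -n '3,4p')

echo "$inserted / $appended / $at_end / $after_each"
echo "$multi"
rm -f "$demo_file"
```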

Changing lines (the c command)
The c command replaces the entire content of the specified lines with the string that follows it.
"c" replaces whole lines; to replace only part of a line, use "s" instead.
Format of the c command: c\replacement text
[root@localhost ~]# cat test1.txt
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed '2,$c\hello world' test1.txt
it is hello too old to learn
hello world
[root@localhost ~]# sed '2c\hello world' test1.txt
it is hello too old to learn
hello world
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed '/when/c\hello world' test1.txt
it is hello too old to learn
it is hello too hello to learn never
hello world
no cross,no crown
# sed '/SELINUX=enforcing/c\SELINUX=disabled' /etc/selinux/config

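Saving to and reading from files
Example 1: reading
The r command inserts the data of a separate file at a specified position in the current file. Its basic format is: [address]r filename
sed inserts the contents of filename after the line given by address, for example:
[root@localhost ~]# cat test1.txt
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# sed '3r /etc/hosts' test1.txt

To insert the data of the other file at the end of the current file, use the $ address.

When the address is a pattern, the contents of /etc/hosts are read in and displayed after the matching line of test1.txt; if several lines match, the contents appear below every matching line.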
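The three address forms for r can be sketched as follows. To keep the result reproducible, a one-line temporary file stands in for /etc/hosts; its content is made up.

```shell
main_file=$(mktemp)
extra_file=$(mktemp)
cat > "$main_file" <<'EOF'
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown
EOF
# Hypothetical stand-in for /etc/hosts
printf '%s\n' '127.0.0.1 localhost' > "$extra_file"

# Line number: the file is read in after line 3, so it shows up as line 4
after_line3=$(sed "3r $extra_file" "$main_file" | sed -n '4p')
# $: the file is read in after the last line
at_end=$(sed "\$r $extra_file" "$main_file" | tail -n 1)
# Pattern: the file is read in after every line containing hello
after_matches=$(sed "/hello/r $extra_file" "$main_file" | grep -c 'localhost')

echo "$after_line3 / $at_end / $after_matches copies"
rm -f "$main_file" "$extra_file"
```

Two lines of the main file contain "hello", so the pattern form produces two copies of the extra file's content.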

Example 2: writing
The lines of test1.txt that were modified can be written to the file test11.txt.

All lines of test1.txt that contain hello can be written to test12.txt.
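Both write cases can be sketched as follows, with temporary files standing in for test1.txt, test11.txt, and test12.txt. The substitution used to produce "modified" lines (cat to dog) is an assumption made for this example.

```shell
src_file=$(mktemp)
modified_out=$(mktemp)   # plays the role of test11.txt
hello_out=$(mktemp)      # plays the role of test12.txt
cat > "$src_file" <<'EOF'
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown
EOF

# The w flag of s writes only the lines on which a substitution happened
sed -n "s/cat/dog/w $modified_out" "$src_file"
# The w command writes every line selected by the address
sed -n "/hello/w $hello_out" "$src_file"

modified_lines=$(cat "$modified_out")
hello_count=$(grep -c '' "$hello_out")

echo "$modified_lines"
echo "$hello_count lines contain hello"
rm -f "$src_file" "$modified_out" "$hello_out"
```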

Modifying the original file in place
The -i option modifies the source file directly.
[root@localhost ~]# sed -i 's/cat/dog/' test1.txt
[root@localhost ~]# cat test1.txt
it is hello too old to learn
it is hello too hello to learn never
when the dog is away,the mice will play
no cross,no crown
[root@localhost ~]# sed -i.bak 's/dog/cat/' test1.txt
[root@localhost ~]# cat test1.txt
it is hello too old to learn
it is hello too hello to learn never
when the cat is away,the mice will play
no cross,no crown
[root@localhost ~]# cat test1.txt.bak
it is hello too old to learn
it is hello too hello to learn never
when the dog is away,the mice will play
no cross,no crown
Removing digits from the start of lines
[root@localhost ~]# cat test2.txt
1234 it is never too old to learn
999 it is never too never to learn never
when the cat is away,the mice will play
353463463 no cross,no crown
ni hao xx nisdfsdf
[root@localhost ~]# sed -r 's/^[0-9]+//g' test2.txt
it is never too old to learn
it is never too never to learn never
when the cat is away,the mice will play
no cross,no crown
ni hao xx nisdfsdf