How to remove the 'BODY' tag in parsed XML text?

Updated: 2024-06-14 16:57:40

I'm a novice programmer. I ran into a problem parsing some XML files with Python 3 and BeautifulSoup4: the parsed text comes out as

"BODY { MARGIN: 0px; FONT-FAMILY: Malgun Gothic; COLOR: #000000; FONT-SIZE: 10pt}P { LINE-HEIGHT: 1.2; MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px}LI { LINE-HEIGHT: 1.2; MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px} blar - blar - blar "

'blar - blar - blar' is the text I actually want to extract.

How can I remove the unwanted words from that text?

Best answer

I'd use a regex for this. If you narrow down the format of the string you want a bit, you could write a nicer regex.

    import re

    text = "BODY { MARGIN: 0px; FONT-FAMILY: Malgun Gothic; COLOR: #000000; FONT-SIZE: 10pt}P { LINE-HEIGHT: 1.2; MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px}LI { LINE-HEIGHT: 1.2; MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px} blar - blar - blar"

    # Lazily skip everything up to the third "}" and capture what follows;
    # strip() drops the space left after the last "}".
    print(re.findall(r"(?:(?:(.*?)}){3})(.*)", text)[0][1].strip())

Here's a regex101 for you to look at:

https://regex101.com/r/m0Q3hL/1
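
As one possible take on the "nicer regex" mentioned above, here is a minimal sketch that does not hard-code the number of CSS rules. It assumes the unwanted prefix always has the shape `selector { declarations }` repeated some number of times, with no nested braces, and that the wanted text follows the last rule. The names `LEADING_CSS_RULES` and `strip_leading_css` are made up for this illustration.

```python
import re

# Assumption: the prefix is one or more "selector { declarations }" rules
# without nested braces, followed by the text we want to keep.
LEADING_CSS_RULES = re.compile(r"^\s*(?:[^{}]+\{[^{}]*\}\s*)+")


def strip_leading_css(text: str) -> str:
    """Remove CSS-like rules from the start of text and trim whitespace."""
    return LEADING_CSS_RULES.sub("", text).strip()


sample = (
    "BODY { MARGIN: 0px; FONT-FAMILY: Malgun Gothic; COLOR: #000000; FONT-SIZE: 10pt}"
    "P { LINE-HEIGHT: 1.2; MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px}"
    "LI { LINE-HEIGHT: 1.2; MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px} blar - blar - blar"
)

print(strip_leading_css(sample))  # prints: blar - blar - blar
```

Text that has no such prefix is returned unchanged apart from the whitespace trim, so the helper should be safe to apply to every parsed string. It will not help if the CSS appears in the middle of the text or if the wanted text itself begins with something followed by a brace block.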


Published: 2023-04-13 12:27:00
