如何使用Python为现有HTML添加一致的空格?(How can I add consistent whitespace to existing HTML using Python?)

编程入门 行业动态 更新时间:2024-10-27 06:26:36
如何使用Python为现有HTML添加一致的空格?(How can I add consistent whitespace to existing HTML using Python?)

我刚刚开始在一个网页上工作,这个网站上的所有HTML都在一行中,这对于阅读和使用来说真的很痛苦。 我正在寻找一个工具(最好是一个Python库),它将接受HTML输入并返回相同的HTML,除了添加换行符和适当的缩进。 (所有标签,标记和内容都应保持不变。)

该库不必处理格式错误的HTML; 我首先通过html5lib传递HTML,因此它将获得格式良好的HTML。 但是,如上所述,我宁愿它不改变任何实际的标记本身; 我相信html5lib,宁愿让它处理正确性方面。

首先,有没有人知道这是否可能与html5lib? (不幸的是,他们的文档看起来有点稀疏。)如果没有,你会建议使用什么工具? 我见过有人推荐HTML Tidy,但我不确定它是否可以配置为只改变空格。 (除非插入空格,否则它会执行任何操作,如果它通过格式良好的HTML开始?)

I just started working on a website that is full of pages with all their HTML on a single line, which is a real pain to read and work with. I'm looking for a tool (preferably a Python library) that will take HTML input and return the same HTML unchanged, except for adding linebreaks and appropriate indentation. (All tags, markup, and content should be untouched.)

The library doesn't have to handle malformed HTML; I'm passing the HTML through html5lib first, so it will be getting well-formed HTML. However, as mentioned above, I would rather it didn't change any of the actual markup itself; I trust html5lib and would rather let it handle the correctness aspect.

First, does anyone know if this is possible with just html5lib? (Unfortunately, their documentation seems a bit sparse.) If not, what tool would you suggest? I've seen some people recommend HTML Tidy, but I'm not sure if it can be configured to only change whitespace. (Would it do anything except insert whitespace if it were passed well-formed HTML to start with?)

最满意答案

算法

将html解析为一些表示 将表示序列化为html

使用BeautifulSoup树构建器的示例html5lib解析器

#!/usr/bin/env python from html5lib import HTMLParser, treebuilders parser = HTMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup")) c = """<HTML><HEAD><TITLE>Title</TITLE></HEAD><BODY>...... </BODY></HTML>""" soup = parser.parse(c) print soup.prettify()

输出:

<html> <head> <title> Title </title> </head> <body> ...... </body> </html>

Algorithm

Parse html into some representation Serialize the representation back to html

Example html5lib parser with BeautifulSoup tree builder

#!/usr/bin/env python from html5lib import HTMLParser, treebuilders parser = HTMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup")) c = """<HTML><HEAD><TITLE>Title</TITLE></HEAD><BODY>...... </BODY></HTML>""" soup = parser.parse(c) print soup.prettify()

Output:

<html> <head> <title> Title </title> </head> <body> ...... </body> </html>

更多推荐

本文发布于:2023-08-07 05:25:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1459791.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:空格   如何使用   HTML   Python   existing

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!