How to know which text Jsoup removed?

Updated: 2024-10-25 19:20:38
Tags: java

I have the following text:

text<html/>text

I use the Jsoup library to strip HTML content from the text, with code like the following:

Document clean = new Cleaner(none()).clean(myDirtyDoc);

I want to log an error for the user, such as: Malicious content was specified: "<html/>". But I don't know how to properly identify the text that Jsoup removed.

I've tried StringUtils.difference(cleanedValue, value), but this method works differently from what I need. Its documentation says:

Compares two Strings, and returns the portion where they differ. (More precisely, return the remainder of the second String, starting from where it's different from the first.)

As a result, it returns a string like this: <html/>text, that is, the removed tag plus everything after it.
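
To illustrate why that call returns the trailing text as well, here is a minimal plain-Java sketch that mimics the documented behavior quoted above. It is a reimplementation for illustration only, not the Apache Commons Lang source; the class name DifferenceSketch is hypothetical, null handling is omitted, and it assumes the cleaned value is "texttext".

```java
// Hypothetical sketch of the documented StringUtils.difference behavior.
// Not the library implementation; null inputs are not handled here.
final class DifferenceSketch {

    /** Returns the first index where the strings differ, or -1 if they are equal. */
    static int indexOfDifference(String first, String second) {
        int limit = Math.min(first.length(), second.length());
        for (int i = 0; i < limit; i++) {
            if (first.charAt(i) != second.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are identical.
        return first.length() == second.length() ? -1 : limit;
    }

    /** Returns the remainder of 'second' starting where it differs from 'first'. */
    static String difference(String first, String second) {
        int at = indexOfDifference(first, second);
        return at == -1 ? "" : second.substring(at);
    }

    public static void main(String[] args) {
        // Assumed cleaned value "texttext" versus the original input.
        System.out.println(difference("texttext", "text<html/>text")); // prints <html/>text
    }
}
```

The method only looks for the first point of divergence and never compares the ends of the strings, so the shared trailing "text" is returned along with the tag.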

It would be good to know of any diff tools that can be easily used in Java for comparing strings.

Best answer

google-diff-match-patch

The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

Diff: Compare two blocks of plain text and efficiently return a list of differences.

Match: Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.

Patch: Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.

Currently available in Java, JavaScript, Dart, C++, C#, Objective C, Lua and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.

There is a Line or word diffs wiki page which describes how to do line-by-line diffs.
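
For the simple case in the question, where one contiguous piece of text was removed, a full diff library may not be needed. The following is a minimal plain-Java sketch, not the diff-match-patch API: it trims the common prefix and the common suffix of the two strings and reports what is left of the original. The class and method names are hypothetical, and it assumes the cleaner changed only one region; if several separate fragments were removed, it returns the whole span from the first change to the last, so a real diff would be the better choice there.

```java
// Hypothetical sketch: isolate a single removed span by trimming the
// common prefix and common suffix of the original and cleaned strings.
final class RemovedSpanSketch {

    static String removedSpan(String original, String cleaned) {
        int max = Math.min(original.length(), cleaned.length());

        // Count the characters both strings share at the start.
        int prefix = 0;
        while (prefix < max && original.charAt(prefix) == cleaned.charAt(prefix)) {
            prefix++;
        }

        // Count the characters both strings share at the end,
        // without overlapping the prefix already matched.
        int suffix = 0;
        while (suffix < max - prefix
                && original.charAt(original.length() - 1 - suffix)
                   == cleaned.charAt(cleaned.length() - 1 - suffix)) {
            suffix++;
        }

        // Whatever remains in the middle of the original was changed or removed.
        return original.substring(prefix, original.length() - suffix);
    }

    public static void main(String[] args) {
        String removed = removedSpan("text<html/>text", "texttext");
        System.out.println("Malicious content was specified: \"" + removed + "\"");
    }
}
```

With the input from the question and an assumed cleaned value of "texttext", removedSpan returns <html/>, which is the fragment to put in the log message.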

Published: 2023-08-01 08:01:00
Source: https://www.elefans.com/category/jswz/34/1356735.html