I have the following text:
text<html/>text
I use the Jsoup library to strip HTML content from it, with code like this:
Document clean = new Cleaner(none()).clean(myDirtyDoc);
I want to log an error for the user such as: Malicious content was specified: "<html/>". But I don't know how to properly identify the part that Jsoup removed.
I've tried StringUtils.difference(cleanedValue, value), but this method works differently. Its documentation says:
Compares two Strings, and returns the portion where they differ. (More precisely, return the remainder of the second String, starting from where it's different from the first.)
As a result, it returns a string like this: <html/>text.
It would be good to know of any diff tools that can easily be used in Java for comparing strings.
Best answer
google-diff-match-patch
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.
Diff: Compare two blocks of plain text and efficiently return a list of differences.
Match: Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
Patch: Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.
Currently available in Java, JavaScript, Dart, C++, C#, Objective C, Lua and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.
There is a Line or word diffs wiki page which describes how to do line-by-line diffs.
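For the single-tag case in the question, the core idea can be sketched without any library: trim the common prefix and the common suffix of the two strings, and what remains of the original is the removed part. This is only a minimal sketch, not the library's algorithm. It assumes the cleaner removed exactly one contiguous span, and the class and method names (RemovedSpan, removedSpan) are made up for illustration.

```java
public class RemovedSpan {

    /**
     * Returns the single contiguous span that is present in dirty but
     * missing from clean. Assumes at most one span was removed; for
     * several removals a real diff library is the better tool.
     */
    static String removedSpan(String dirty, String clean) {
        int max = Math.min(dirty.length(), clean.length());

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < max && dirty.charAt(prefix) == clean.charAt(prefix)) {
            prefix++;
        }

        // Length of the common suffix, bounded so it never overlaps the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && dirty.charAt(dirty.length() - 1 - suffix)
                   == clean.charAt(clean.length() - 1 - suffix)) {
            suffix++;
        }

        // Whatever sits between prefix and suffix in the dirty text was removed.
        return dirty.substring(prefix, dirty.length() - suffix);
    }

    public static void main(String[] args) {
        String removed = removedSpan("text<html/>text", "texttext");
        System.out.println("Malicious content was specified: \"" + removed + "\"");
    }
}
```

StringUtils.difference only does the prefix half of this, which is why it returned <html/>text. When the cleaner may strip several separate fragments, the library's diff_main(dirty, clean) returns a list of differences, and the entries marked as deletions should correspond to the removed fragments.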