Splitting an English paragraph into sentences in Java
Java split a paragraph by periods [duplicate]
I'm trying to build a regular expression that splits a paragraph into sentences separated by a period (.). This should work:
String[] str = text.split("\\.");
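To make the need for robustness concrete, here is a minimal sketch of what the bare period split does to a sentence that contains a decimal number. The class name and the sample sentence are made up for illustration. The split happens inside `3.14`, and the periods themselves are discarded.

```java
import java.util.Arrays;

// Illustrative class name, not from the original question.
public class NaiveSplit {
    // Splits on every literal period, exactly like text.split("\\.") above.
    static String[] naiveSplit(String text) {
        return text.split("\\.");
    }

    public static void main(String[] args) {
        // Hypothetical input: the decimal point in 3.14 is not a sentence end.
        String text = "Pi is about 3.14. It is irrational.";
        String[] parts = naiveSplit(text);
        // String.split drops trailing empty strings, so three pieces remain:
        // [Pi is about 3, 14,  It is irrational]
        System.out.println(Arrays.toString(parts));
    }
}
```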
However, I need to add a minimum of robustness, for example checking that the period is followed by a space and an uppercase letter. So here's my next attempt:
String text = "The pen is on the table. The table has a pen upon it.";
String[] arr = text.split("\\. [A-Z]");
for (String s : arr)
    System.out.println(s);
Output:
The pen is on the table
he table has a pen upon it.
Unfortunately, I'm missing the first character after the period. Can you see any way it can be fixed?
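The lost character comes from how `String.split` treats its argument: the whole regex match is the delimiter, and delimiters are left out of the result. The following minimal sketch uses `java.util.regex.Matcher` (class and method names are illustrative) to show that `\\. [A-Z]` matches three characters, including the capital letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name, not from the original question.
public class ConsumedDelimiter {
    // Returns the first piece of text the pattern matches, i.e. what split() would discard.
    static String firstDelimiter(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "The pen is on the table. The table has a pen upon it.";
        // The match is ". T": period, space AND the capital letter, so "T" is lost.
        System.out.println("[" + firstDelimiter(text, "\\. [A-Z]") + "]");
    }
}
```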
Updated: 2020-02-15 02:27
Best answer
You can use a lookahead to check what comes next in the string without consuming it, so the uppercase letter is not removed along with the delimiter.
text.split("\\. (?=[A-Z])");
{ "The pen is on the table", "The table has a pen upon it." }
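Below is a self-contained, runnable version of the lookahead split. Class and method names are illustrative. `(?=[A-Z])` is a zero-width assertion: it requires an uppercase letter at that position but does not include it in the match, so `split` only discards the period and the space.

```java
// Illustrative class name, not from the original answer.
public class LookaheadSplit {
    // Split at ". " only when the next character is an uppercase ASCII letter.
    // The lookahead checks that letter without consuming it.
    static String[] splitSentences(String text) {
        return text.split("\\. (?=[A-Z])");
    }

    public static void main(String[] args) {
        String text = "The pen is on the table. The table has a pen upon it.";
        for (String s : splitSentences(text)) {
            System.out.println(s);
        }
        // Prints:
        // The pen is on the table
        // The table has a pen upon it.
    }
}
```

Note that `[A-Z]` only covers ASCII capitals. A sentence starting with a digit, a quotation mark, or a letter such as `É` will not trigger a split. If that matters for your input, `\\p{Lu}` matches any Unicode uppercase letter.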
If you want to keep the periods as well, you can also use a lookbehind, so that only the space is consumed:
text.split("(?<=\\.) (?=[A-Z])");
{ "The pen is on the table.", "The table has a pen upon it." }
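One possible generalization of the lookbehind pattern is sketched below under two assumptions that are not part of the answer: `!` and `?` also end sentences, and sentences may be separated by any run of whitespace, including line breaks. Only the whitespace is consumed, so each sentence keeps its terminator. Names are illustrative.

```java
import java.util.Arrays;

// Illustrative class name, not from the original answer.
public class LookbehindSplit {
    // Variant of "(?<=\\.) (?=[A-Z])". Assumptions: '!' and '?' also end sentences,
    // and any run of whitespace may separate them. Only the whitespace is matched,
    // so the terminator stays with its sentence.
    private static final String SENTENCE_BOUNDARY = "(?<=[.!?])\\s+(?=[A-Z])";

    static String[] splitKeepingPunctuation(String text) {
        return text.split(SENTENCE_BOUNDARY);
    }

    public static void main(String[] args) {
        String text = "The pen is on the table.  Is it? Yes!\nThe table has a pen upon it.";
        System.out.println(Arrays.toString(splitKeepingPunctuation(text)));
        // [The pen is on the table., Is it?, Yes!, The table has a pen upon it.]
    }
}
```

Like the original pattern, this is a heuristic. It still splits after an abbreviation such as `Mr.` when a capitalized name follows.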
2018-02-12
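If a single regex heuristic is not enough, the JDK also ships a sentence splitter, `java.text.BreakIterator.getSentenceInstance`. This is a different technique from the answer above: it applies locale-sensitive boundary rules rather than one pattern. The sketch below, with illustrative names, collects the sentences and trims the trailing whitespace that each segment includes. Whether its rules suit your text, for example around abbreviations, is something to verify on your own data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative class name; BreakIterator itself is part of the JDK.
public class BreakIteratorSplit {
    // Uses the JDK's locale-aware sentence boundary analysis instead of a regex.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each segment includes the whitespace after the terminator, so trim it.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The pen is on the table. The table has a pen upon it.";
        for (String s : sentences(text, Locale.US)) {
            System.out.println(s);
        }
    }
}
```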