Splitting a string into sentences

Updated: 2024-10-12 22:30:54
This article explains how to split a string into sentences without splitting at abbreviations.

Problem description

I'm trying to split this string into sentences, but I need to handle abbreviations, which always have the fixed format x.y. and should count as a single word:

content = "This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool."

I tried this regex:

content.replace(/([.?!])\s+(?=[A-Za-z])/g, "$1|").split("|");
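To see how this attempt behaves, here is a minimal sketch that wraps the same regex in a function and runs it on a shorter input. The name naiveSplit and the sample sentence are illustrative assumptions, not part of the original question.

```javascript
// Naive approach from the question: treat every ".", "?" or "!" that is
// followed by whitespace and an ASCII letter as a sentence boundary.
// naiveSplit is a hypothetical helper name used only for this sketch.
function naiveSplit(text) {
  // Insert a "|" marker after the punctuation, then split on the marker.
  return text.replace(/([.?!])\s+(?=[A-Za-z])/g, "$1|").split("|");
}

const naiveParts = naiveSplit("Problems occur, i.e. in this one. here too.");
console.log(naiveParts);
// → ["Problems occur, i.e.", "in this one.", "here too."]
```

The final period of "i.e." is followed by a space and a letter, so the regex cannot tell it apart from the end of a sentence.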

But as you can see, abbreviations cause problems. Since all abbreviations have the format x.y., it should be possible to treat them as a single word without splitting the string at that point. This is what I currently get:

"This is a long string with some numbers 123.456,78 or 100.000 and e.g.", "some abbreviations in it, which shouldn't split the sentence.", "Sometimes there are problems, i.e.", "in this one.", "here and abbr at the end x.y..", "cool."

The result should be:

"This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence.", "Sometimes there are problems, i.e. in this one.", "here and abbr at the end x.y..", "cool."

Recommended answer

The solution is to match and capture the abbreviations and build the replacement using a callback:

var re = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;
var str = 'This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn\'t split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.';
// Keep abbreviations (group 1) unchanged; after sentence-ending
// punctuation (group 2), insert "\r" as a split marker.
var result = str.replace(re, function (m, g1, g2) {
  return g1 ? g1 : g2 + "\r";
});
// Split on the marker to get the sentences.
var arr = result.split("\r");
document.body.innerHTML = "<pre>" + JSON.stringify(arr, null, 4) + "</pre>";

Regex explanation:

  • \b(\w\.\w\.) - match and capture into Group 1 an abbreviation (a word character, then a ., then another word character and a .) as a whole word
  • | - or...
  • ([.?!])\s+(?=[A-Za-z]):
    • ([.?!]) - match and capture into Group 2 either ., ? or !
    • \s+ - match one or more whitespace characters...
    • (?=[A-Za-z]) - ...that are followed by an ASCII letter.
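The pieces explained above can be combined into a small reusable function. This is a sketch under a few assumptions: the name splitSentences is hypothetical, the input is assumed not to contain "\r" already (it is used as the split marker, as in the answer), and the result is logged with console.log instead of being written to the page.

```javascript
// Hypothetical helper built on the regex from the answer.
function splitSentences(text) {
  const re = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;
  const marked = text.replace(re, (match, abbr, punct) =>
    // Group 1 matched: an abbreviation such as "i.e.", keep it untouched.
    // Otherwise group 2 matched: keep the punctuation and replace the
    // following whitespace with the "\r" split marker.
    abbr ? abbr : punct + "\r"
  );
  return marked.split("\r");
}

const sample =
  "Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.";
console.log(splitSentences(sample));
// → ["Sometimes there are problems, i.e. in this one.",
//    "here and abbr at the end x.y..", "cool."]
```

Because the abbreviation branch consumes the final period of x.y., a sentence that ends with an abbreviation is only split when an extra period follows it, as in "x.y.. cool." in the sample.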


Published: 2023-11-29 22:30:35
Article link: https://www.elefans.com/category/jswz/34/1647812.html