Efficiently removing specific characters (some punctuation) from a String in Java?

Updated: 2024-10-15 16:23:46
Question

In Java, what is the most efficient way of removing given characters from a String? Currently, I have this code:

    private static String processWord(String x) {
        String tmp;
        tmp = x.toLowerCase();
        tmp = tmp.replace(",", "");
        tmp = tmp.replace(".", "");
        tmp = tmp.replace(";", "");
        tmp = tmp.replace("!", "");
        tmp = tmp.replace("?", "");
        tmp = tmp.replace("(", "");
        tmp = tmp.replace(")", "");
        tmp = tmp.replace("{", "");
        tmp = tmp.replace("}", "");
        tmp = tmp.replace("[", "");
        tmp = tmp.replace("]", "");
        tmp = tmp.replace("<", "");
        tmp = tmp.replace(">", "");
        tmp = tmp.replace("%", "");
        return tmp;
    }

Would it be faster if I used some sort of StringBuilder, or a regex, or maybe something else? Yes, I know: profile it and see, but I hope someone can provide an answer off the top of their head, as this is a common task.

Recommended answer

Here's a late answer, just for fun.

In cases like this, I would suggest aiming for readability over speed. Of course you can be super-readable but too slow, as in this super-concise version:

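    private static String processWord(String x) {
        // '[' and ']' are escaped: in Java, an unescaped '[' inside a character
        // class starts a nested class, so "[][...]" is rejected with a PatternSyntaxException.
        return x.replaceAll("[\\[\\](){},.;!?<>%]", "");
    }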

This is slow because every time you call this method, the regex is compiled again. So you can pre-compile the regex instead.
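
To make the cost concrete: the Javadoc of String.replaceAll states that str.replaceAll(regex, repl) yields exactly the same result as Pattern.compile(regex).matcher(str).replaceAll(repl). The sketch below puts the two forms side by side; the class and method names are hypothetical, chosen only for illustration. Note that it escapes [ and ] in the character class, since in Java an unescaped [ inside a class opens a nested class.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows what String.replaceAll does on every call.
final class ReplaceAllCost {
    // '[' and ']' escaped so the whole set is one flat character class.
    static final String UNDESIRABLES_REGEX = "[\\[\\](){},.;!?<>%]";

    // Convenient, but compiles UNDESIRABLES_REGEX on each invocation...
    static String viaStringReplaceAll(String x) {
        return x.replaceAll(UNDESIRABLES_REGEX, "");
    }

    // ...because it is documented to behave like this expression,
    // which builds a fresh Pattern every time it runs.
    static String viaExplicitCompile(String x) {
        return Pattern.compile(UNDESIRABLES_REGEX).matcher(x).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "Hello, [world]! (ok?)";
        System.out.println(viaStringReplaceAll(s)); // Hello world ok
        System.out.println(viaExplicitCompile(s));  // Hello world ok
    }
}
```

Hoisting the Pattern.compile call into a static final field removes that per-call compilation.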

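    import java.util.regex.Pattern;

    // Compiled once, when the class is initialized, and reused on every call.
    private static final Pattern UNDESIRABLES = Pattern.compile("[\\[\\](){},.;!?<>%]");

    private static String processWord(String x) {
        return UNDESIRABLES.matcher(x).replaceAll("");
    }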

This should be fast enough for most purposes, assuming the JVM's regex engine optimizes the character class lookup. This is the solution I would use, personally.
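
One detail worth noting: the question's method also lowercases its input, which the one-line regex versions drop. A minimal sketch of a drop-in replacement that keeps the toLowerCase() step, plus a cross-check against the question's chained replace() calls (condensed into a loop), might look like this. The class name is hypothetical; it assumes the default-locale toLowerCase() of the original is acceptable.

```java
import java.util.regex.Pattern;

// Hypothetical class name; processWord mirrors the question's signature.
final class PrecompiledWordProcessor {
    // Compiled once, at class initialization. '[' and ']' are escaped because
    // an unescaped '[' inside a Java character class opens a nested class.
    private static final Pattern UNDESIRABLES =
            Pattern.compile("[\\[\\](){},.;!?<>%]");

    // Lowercase first (as the question does), then remove all 14
    // punctuation characters in a single regex pass.
    static String processWord(String x) {
        return UNDESIRABLES.matcher(x.toLowerCase()).replaceAll("");
    }

    // The question's chained replace() calls, condensed into a loop;
    // kept only to cross-check the regex version.
    static String chainedReplace(String x) {
        String tmp = x.toLowerCase();
        for (String p : new String[] {",", ".", ";", "!", "?", "(", ")",
                                      "{", "}", "[", "]", "<", ">", "%"}) {
            tmp = tmp.replace(p, "");
        }
        return tmp;
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, World!")); // hello world

        // Cross-check on every printable ASCII character (codes 32..126).
        StringBuilder ascii = new StringBuilder();
        for (char c = 32; c < 127; c++) {
            ascii.append(c);
        }
        String s = ascii.toString();
        System.out.println(processWord(s).equals(chainedReplace(s))); // true
    }
}
```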

Now without profiling, I wouldn't know whether you could do better by making your own character (actually codepoint) lookup table:

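    // One flag per UTF-16 char value (0 to 65535); true means "keep this character".
    private static final boolean[] CHARS_TO_KEEP = new boolean[Character.MAX_VALUE + 1];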

Fill this once, then iterate over the input, building the resulting string. I'll leave the code to you. :)
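
For readers who want to try it anyway, here is one possible sketch of the lookup-table idea. It is not the answerer's code: the class name is hypothetical, and the table is indexed by UTF-16 char values rather than full code points. That is enough here because all 14 target characters are ASCII, so surrogate halves are always kept and supplementary characters pass through intact.

```java
import java.util.Arrays;

// Hypothetical class name; one way to fill and use a keep/remove table.
final class LookupTableWordProcessor {
    // One flag per UTF-16 char value (0 to 65535); true means "keep".
    private static final boolean[] CHARS_TO_KEEP = new boolean[Character.MAX_VALUE + 1];

    static {
        // Filled once at class initialization: keep everything except the punctuation.
        Arrays.fill(CHARS_TO_KEEP, true);
        for (char c : ",.;!?(){}[]<>%".toCharArray()) {
            CHARS_TO_KEEP[c] = false;
        }
    }

    static String processWord(String x) {
        String lower = x.toLowerCase();
        // Sized for the worst case (nothing removed), so it never has to grow.
        StringBuilder sb = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (CHARS_TO_KEEP[c]) { // one array read per character
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, World!")); // hello world
    }
}
```

The table costs about 64 KB of memory; a 128-entry table with an explicit c < 128 check would be a smaller alternative.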

Again, I wouldn't dive into this kind of optimization: the code becomes much harder to read. Is performance really that much of a concern? Also remember that Java code is JIT-compiled and typically performs better after the JVM has warmed up, so use a good profiler.

One thing that should be mentioned is that the code in the original question is quite inefficient: each of the 14 replace() calls scans the whole string again and may create yet another temporary string. Unless a compiler optimizes all that away, that particular solution will likely perform the worst.
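
As for the StringBuilder option raised in the question, a minimal single-pass sketch could look like the following. It is an illustration, not a benchmarked recommendation: the class name is hypothetical, and it checks each character against a 14-character String with String.indexOf instead of a lookup table.

```java
// Hypothetical class name; single pass with one StringBuilder.
final class StringBuilderWordProcessor {
    // The 14 characters the question removes.
    private static final String UNDESIRABLES = ",.;!?(){}[]<>%";

    static String processWord(String x) {
        String lower = x.toLowerCase();
        // One output buffer instead of a possible temporary String per replace() call.
        StringBuilder sb = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            // indexOf does a linear scan of the 14-character set.
            if (UNDESIRABLES.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(processWord("Wait... (really?) Yes; {50%}!")); // wait really yes 50
    }
}
```

Whether this beats the precompiled regex on a given JVM is exactly the kind of question a profiler should answer.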

Published: 2023-10-08 15:14:15
Source: https://www.elefans.com/category/jswz/34/1473001.html