In Java, what is the most efficient way of removing given characters from a String? Currently, I have this code:
```java
private static String processWord(String x) {
    String tmp;
    tmp = x.toLowerCase();
    tmp = tmp.replace(",", "");
    tmp = tmp.replace(".", "");
    tmp = tmp.replace(";", "");
    tmp = tmp.replace("!", "");
    tmp = tmp.replace("?", "");
    tmp = tmp.replace("(", "");
    tmp = tmp.replace(")", "");
    tmp = tmp.replace("{", "");
    tmp = tmp.replace("}", "");
    tmp = tmp.replace("[", "");
    tmp = tmp.replace("]", "");
    tmp = tmp.replace("<", "");
    tmp = tmp.replace(">", "");
    tmp = tmp.replace("%", "");
    return tmp;
}
```
Would it be faster if I used some sort of StringBuilder, or a regex, or maybe something else? Yes, I know: profile it and see, but I hope someone can provide an answer off the top of their head, as this is a common task.
Recommended answer:
Here's a late answer, just for fun.
In cases like this, I would suggest aiming for readability over speed. Of course you can be super-readable but too slow, as in this super-concise version:
```java
private static String processWord(String x) {
    // '[' and ']' are escaped: in Java an unescaped '[' inside a character class starts a nested class
    return x.toLowerCase().replaceAll("[\\[\\](){},.;!?<>%]", "");
}
```
This is slow because every time you call this method, `String.replaceAll` compiles the regex again. So you can pre-compile the regex once:
```java
import java.util.regex.Pattern;

// Compiled once, when the class is initialized
private static final Pattern UNDESIRABLES = Pattern.compile("[\\[\\](){},.;!?<>%]");

private static String processWord(String x) {
    return UNDESIRABLES.matcher(x.toLowerCase()).replaceAll("");
}
```
This should be fast enough for most purposes, assuming the JVM's regex engine optimizes the character class lookup. This is the solution I would use, personally.
Now without profiling, I wouldn't know whether you could do better by making your own character (actually codepoint) lookup table:
```java
// One flag per possible char value; true means "keep this char"
private static final boolean[] CHARS_TO_KEEP = new boolean[Character.MAX_VALUE + 1];
```
Fill this once and then iterate, making your resulting string. I'll leave the code to you. :)
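The lookup-table idea left to the reader above could look like the following minimal sketch. It is an illustration under stated assumptions, not the answer author's code: the class name `LookupTableFilter` and the `main` demo are hypothetical. The table is indexed by UTF-16 `char` rather than full code points, which is enough here because every character to remove is ASCII (surrogate halves are simply kept). It also lowercases first, to match the question's `processWord`.

```java
import java.util.Arrays;

public final class LookupTableFilter {
    // Hypothetical sketch: one flag per possible char value (0..65535); true = keep
    private static final boolean[] CHARS_TO_KEEP = new boolean[Character.MAX_VALUE + 1];

    static {
        // Fill the table once: keep everything except the punctuation from the question
        Arrays.fill(CHARS_TO_KEEP, true);
        for (char c : ",.;!?(){}[]<>%".toCharArray()) {
            CHARS_TO_KEEP[c] = false;
        }
    }

    public static String processWord(String x) {
        String lower = x.toLowerCase(); // same lowercasing as the question's code
        StringBuilder sb = new StringBuilder(lower.length()); // pre-sized, so no regrowth
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (CHARS_TO_KEEP[c]) { // one array lookup per char
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, (World)! [100%]")); // prints "hello world 100"
    }
}
```

This makes a single pass over the lowercased string and never touches the regex engine; whether it is measurably faster than the precompiled `Pattern` is exactly the kind of question only profiling can answer.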
Again, I wouldn't dive into this kind of optimization: the code becomes too hard to read. Is performance really that much of a concern? Also remember that modern runtimes such as the JVM JIT-compile hot code, so it performs better after warming up; use a good profiler.
One thing worth mentioning: the code in the original question is highly non-performant, because it makes 15 passes over the string (one `toLowerCase` plus 14 `replace` calls) and can create a temporary string on each pass. Unless a compiler optimizes all that away, that solution will perform the worst of those discussed here.