Let's say you are given an input that could look like this: (identifier1 identifier_2 23 4).
I want to add a # symbol after every identifier. An identifier can contain letters, digits and underscores, but it must start with a letter, followed by any combination of letters, digits and underscores. My approach was something like this:
input.replaceAll("[A-Za-z0-9_]+", "$0#");
However, this also puts a # symbol after every number, which I wanted to exclude. The result should be (identifier1# identifier_2# 23 4). Is it possible to solve this problem with regex?
Best answer
Your current regex says
one or more upper or lower-case letters, digits, or underscores, in whatever order.
According to that regex, 54 is a valid identifier.
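To see this concretely, here is a minimal sketch that applies the pattern from the question to the sample input. The class name OriginalPatternCheck and the helper tagAll are made up for illustration; only the pattern and the replacement string come from the question.

```java
import java.util.regex.Pattern;

public class OriginalPatternCheck {
    // Pattern from the question: any run of letters, digits, or underscores.
    static final Pattern ORIGINAL = Pattern.compile("[A-Za-z0-9_]+");

    // Hypothetical helper: appends '#' after every match of the original pattern.
    static String tagAll(String input) {
        return ORIGINAL.matcher(input).replaceAll("$0#");
    }

    public static void main(String[] args) {
        // Numbers are tagged too, because a run of digits satisfies the pattern.
        System.out.println(tagAll("(identifier1 identifier_2 23 4)"));
        // prints (identifier1# identifier_2# 23# 4#)
    }
}
```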
You actually wanted to write
a letter, followed by any number of letters, digits or underscores, in whatever order
That would be written in code as:
input.replaceAll("[A-Za-z][A-Za-z0-9_]*", "$0#");
Wiktor notes that this regex will still match identifier-like runs embedded in a token that is not itself an identifier, such as the ab123 inside 123ab123. To solve this, you could add word boundaries:
input.replaceAll("\\b([A-Za-z][A-Za-z0-9_]*)\\b", "$1#");
This leaves 123ab123 untouched, but still matches ab123 in 123 ab123.
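The two variants can be compared side by side with a small sketch. The class IdentifierTagger and both method names are hypothetical; the regexes are the ones given in this answer. Note that replaceAll returns a new string rather than modifying the input, so the result has to be used or assigned.

```java
public class IdentifierTagger {
    // Letter-first pattern without boundaries: also matches inside larger tokens.
    static String tagWithoutBoundaries(String input) {
        return input.replaceAll("[A-Za-z][A-Za-z0-9_]*", "$0#");
    }

    // Same pattern wrapped in \b word boundaries, so a match must start and end
    // at the edge of a run of word characters (letters, digits, underscore).
    static String tagIdentifiers(String input) {
        return input.replaceAll("\\b([A-Za-z][A-Za-z0-9_]*)\\b", "$1#");
    }

    public static void main(String[] args) {
        System.out.println(tagIdentifiers("(identifier1 identifier_2 23 4)"));
        // prints (identifier1# identifier_2# 23 4)
        System.out.println(tagWithoutBoundaries("123ab123")); // prints 123ab123#
        System.out.println(tagIdentifiers("123ab123"));       // prints 123ab123
        System.out.println(tagIdentifiers("123 ab123"));      // prints 123 ab123#
    }
}
```

One consequence of using \b: the underscore counts as a word character, so a token such as _abc is left untouched by the boundary version. That matches the stated rule that identifiers must start with a letter.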