I'm working on some code for generating random strings. The resulting string appears to contain invalid char combinations. Specifically, I find high surrogates that are not followed by a low surrogate.
Can anyone explain why this is happening? Do I have to explicitly generate a random low surrogate to follow a high surrogate? I had assumed this wasn't needed, as I was using the int variants of the Character class.
Here's the test code, which on a recent run produced the following bad pairings:
    Bad pairing: d928 - d863
    Bad pairing: da02 - 7bb6
    Bad pairing: dbbc - d85c
    Bad pairing: dbc6 - d85c

    import java.util.Random;

    public class RandomStringTest {
        public static void main(String[] args) {
            Random r = new Random();
            StringBuilder builder = new StringBuilder();
            int count = 500;
            while (count > 0) {
                int codePoint = r.nextInt(Character.MAX_CODE_POINT + 1);
                // Skip unassigned and private-use code points
                if (!Character.isDefined(codePoint)
                        || Character.getType(codePoint) == Character.PRIVATE_USE) {
                    continue;
                }
                builder.appendCodePoint(codePoint);
                count--;
            }
            String result = builder.toString();

            // Test the result: report every high surrogate not followed by a low surrogate
            char lastChar = 0;
            for (int i = 0; i < result.length(); i++) {
                char c = result.charAt(i);
                if (Character.isHighSurrogate(lastChar) && !Character.isLowSurrogate(c)) {
                    System.out.println(String.format("Bad pairing: %s - %s",
                            Integer.toHexString(lastChar), Integer.toHexString(c)));
                }
                lastChar = c;
            }
        }
    }

Best answer
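Nothing in the loop stops it from picking a surrogate code point (U+D800 to U+DFFF) at random: Character.isDefined returns true for them, and appendCodePoint writes any code point below 0x10000 as a single char without checking it. Using the int variants only guarantees a proper pair for code points >= 0x10000. If the loop produces a lone low surrogate, or a high surrogate not followed by a low surrogate, the resulting string is invalid. The solution is simply to exclude all surrogates: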
    // Character.isSurrogate only accepts a char, so test the general category of the int code point
    if (!Character.isDefined(codePoint)
            || Character.getType(codePoint) == Character.SURROGATE
            || Character.getType(codePoint) == Character.PRIVATE_USE) {
        continue;
    }

(Technically, you could also allow randomly generated high surrogates and append another random low surrogate after each one, but this would only create other random code points >= 0x10000, which might in turn be undefined or for private use.)
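To see the filter in context, here is a minimal sketch of the generator with surrogates excluded, together with a stricter check than the one in the question, which only reports a high surrogate followed by something else and so misses lone low surrogates and a trailing high surrogate. The class name RandomUnicodeString and the helpers randomString and isWellFormed are hypothetical names chosen for this illustration; they are not part of the original code.

```java
import java.util.Random;

// Hypothetical helper class for illustration only.
class RandomUnicodeString {

    /** Builds a string of count random code points, skipping undefined, surrogate and private-use ones. */
    static String randomString(Random r, int count) {
        StringBuilder builder = new StringBuilder();
        while (count > 0) {
            int codePoint = r.nextInt(Character.MAX_CODE_POINT + 1);
            int type = Character.getType(codePoint);
            // SURROGATE covers U+D800..U+DFFF, which isDefined() alone lets through.
            if (!Character.isDefined(codePoint)
                    || type == Character.SURROGATE
                    || type == Character.PRIVATE_USE) {
                continue;
            }
            builder.appendCodePoint(codePoint);
            count--;
        }
        return builder.toString();
    }

    /** Returns true if every surrogate char in s is part of a high + low pair. */
    static boolean isWellFormed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false; // high surrogate with no low surrogate after it
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no high surrogate before it
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String result = randomString(new Random(), 500);
        System.out.println("well formed: " + isWellFormed(result));
        System.out.println("code points: " + result.codePointCount(0, result.length()));
    }
}
```

With the surrogate check in place, every char pair in the result comes from appendCodePoint splitting a code point >= 0x10000, so isWellFormed should report true on every run. Removing the SURROGATE condition brings back the behaviour described in the question.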