Normalize strings before validating them
Original article from: .htm
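Applications frequently validate strings against various rules. Strings in Java are Unicode text: Java 6 is based on Unicode 4.0, while Java 7 is based on Unicode 6.0.0. Because Unicode allows the same text to be represented by more than one code point sequence, validation is only reliable when it runs on a normalized form of the string.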
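Unicode defines several normalization forms, and each one processes text slightly differently.
NFC
Unicode Normalization Form C: canonical decomposition, followed by canonical composition. Some APIs that take an optional normalization-type argument fall back to this form when none is given; Java's Normalizer.normalize always requires an explicit form.
NFD
Unicode Normalization Form D: canonical decomposition.
NFKC
Unicode Normalization Form KC: compatibility decomposition, followed by canonical composition.
NFKD
Unicode Normalization Form KD: compatibility decomposition.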
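To make the difference between the four forms concrete, here is a minimal sketch that normalizes one sample string with each form and prints the resulting code points. The class name `NormalizationFormsDemo` and the helper `codePoints` are hypothetical and exist only for this illustration; `java.text.Normalizer` is the real JDK API. It assumes Java 8 or later for `String.codePoints()`.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Hypothetical demo class; only java.text.Normalizer is a real JDK API here.
final class NormalizationFormsDemo {

    // Renders the code points of s as "U+XXXX U+YYYY ..." so that
    // differences invisible on screen become visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Precomposed e with acute accent (U+00E9) followed by the fi ligature (U+FB01).
        String input = "\u00E9\uFB01";
        for (Form form : Form.values()) {
            String normalized = Normalizer.normalize(input, form);
            System.out.println(form + ": " + codePoints(normalized));
        }
    }
}
```

For this input, the canonical forms (NFD, NFC) leave the ligature U+FB01 untouched and only decompose or recompose the accented letter, while the compatibility forms (NFKD, NFKC) also replace the ligature with the two plain letters U+0066 U+0069. This is why the compatibility forms are the ones that matter when filtering look-alike characters.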
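If the input string is validated first and normalized afterwards, the check is ineffective. Normalizer.normalize converts Unicode text into an equivalent normalized form, so in the following example the Pattern.compile("[<>]") check finds nothing in the raw input, and the angle brackets only appear after validation has already passed: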
    import java.text.Normalizer;
    import java.text.Normalizer.Form;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // String s may be user controllable
    // \uFE64 is normalized to < and \uFE65 is normalized to > using NFKC
    String s = "\uFE64" + "script" + "\uFE65";

    // Validate (too early: s does not contain < or > yet)
    Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // Found a blacklisted character
        throw new IllegalStateException();
    } else {
        // . . .
    }

    // Normalize (s now becomes "<script>", after the check has passed)
    s = Normalizer.normalize(s, Form.NFKC);
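If the input string is normalized first and validated afterwards, the Pattern.compile("[<>]") check detects the angle brackets and throws IllegalStateException, so the problematic input is correctly filtered out: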
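    import java.text.Normalizer;
    import java.text.Normalizer.Form;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "\uFE64" + "script" + "\uFE65";

    // Normalize (s becomes "<script>")
    s = Normalizer.normalize(s, Form.NFKC);

    // Validate the normalized text
    Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // Found a blacklisted character
        throw new IllegalStateException();
    } else {
        // . . .
    }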
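The two orderings can also be compared side by side in one runnable class. This is only a sketch: the class name `AngleBracketCheck` and its two method names are hypothetical, chosen for this illustration, and the blacklist pattern is the same `[<>]` used in the snippets.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

// Hypothetical helper class contrasting the two orderings.
final class AngleBracketCheck {

    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Unsafe ordering: the check runs on the raw input, so compatibility
    // variants of '<' and '>' pass and only turn into real brackets afterwards.
    static String validateThenNormalize(String s) {
        if (ANGLE_BRACKETS.matcher(s).find()) {
            throw new IllegalStateException("angle bracket found");
        }
        return Normalizer.normalize(s, Form.NFKC);
    }

    // Safe ordering: normalize first, then check the text that will actually be used.
    static String normalizeThenValidate(String s) {
        String normalized = Normalizer.normalize(s, Form.NFKC);
        if (ANGLE_BRACKETS.matcher(normalized).find()) {
            throw new IllegalStateException("angle bracket found after normalization");
        }
        return normalized;
    }

    public static void main(String[] args) {
        String s = "\uFE64" + "script" + "\uFE65";

        // The unsafe ordering lets the input through and returns "<script>".
        System.out.println(validateThenNormalize(s));

        // The safe ordering rejects the same input.
        try {
            normalizeThenValidate(s);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Returning the normalized string from the safe method is a deliberate choice in this sketch: the caller then continues with exactly the text that was checked, instead of the raw input.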
The Normalizer class in Java (java.text.Normalizer)
    public final class Normalizer {

        private Normalizer() {};

        /**
         * This enum provides constants of the four Unicode normalization forms
         * that are described in
         * <a href=".html">
         * Unicode Standard Annex #15 — Unicode Normalization Forms</a>
         * and two methods to access them.
         *
         * @since 1.6
         */
        public static enum Form {

            /**
             * Canonical decomposition.
             */
            NFD,

            /**
             * Canonical decomposition, followed by canonical composition.
             */
            NFC,

            /**
             * Compatibility decomposition.
             */
            NFKD,

            /**
             * Compatibility decomposition, followed by canonical composition.
             */
            NFKC
        }

        /**
         * Normalize a sequence of char values.
         * The sequence will be normalized according to the specified normalization
         * from.
         * @param src The sequence of char values to normalize.
         * @param form The normalization form; one of
         * {@link java.text.Normalizer.Form#NFC},
         * {@link java.text.Normalizer.Form#NFD},
         * {@link java.text.Normalizer.Form#NFKC},
         * {@link java.text.Normalizer.Form#NFKD}
         * @return The normalized String
         * @throws NullPointerException If <code>src</code> or <code>form</code>
         * is null.
         */
        public static String normalize(CharSequence src, Form form) {
            return NormalizerBase.normalize(src.toString(), form);
        }

        /**
         * Determines if the given sequence of char values is normalized.
         * @param src The sequence of char values to be checked.
         * @param form The normalization form; one of
         * {@link java.text.Normalizer.Form#NFC},
         * {@link java.text.Normalizer.Form#NFD},
         * {@link java.text.Normalizer.Form#NFKC},
         * {@link java.text.Normalizer.Form#NFKD}
         * @return true if the sequence of char values is normalized;
         * false otherwise.
         * @throws NullPointerException If <code>src</code> or <code>form</code>
         * is null.
         */
        public static boolean isNormalized(CharSequence src, Form form) {
            return NormalizerBase.isNormalized(src.toString(), form);
        }
    }
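The class therefore exposes two static methods, normalize and isNormalized. One possible way to combine them is sketched here: call isNormalized first and only normalize when needed. The class name `NormalizeIfNeeded` and the method `toNfkc` are hypothetical names for this illustration, and whether the extra check pays off in practice is an assumption that would need measuring.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Hypothetical helper showing normalize and isNormalized used together.
final class NormalizeIfNeeded {

    // Returns s itself when it is already in NFKC, otherwise its NFKC form.
    // Like the underlying JDK methods, this throws NullPointerException for null input.
    static String toNfkc(String s) {
        if (Normalizer.isNormalized(s, Form.NFKC)) {
            return s;
        }
        return Normalizer.normalize(s, Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(toNfkc("script"));                  // already normalized
        System.out.println(toNfkc("\uFE64script\uFE65"));      // becomes <script>
    }
}
```

Validation should then be applied to the value returned by `toNfkc`, not to the original argument.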