Javascript regex with non-capturing group as two alternatives

Updated: 2024-10-08 04:33:50

I would like to create a regex that validates input values such as a name and surname, with some restrictions:

1. A capital first letter (only one), followed by lowercase letters.
2. After that, the user may enter ', - or a whitespace character, and then the rule from point 1 applies again.

I have almost achieved this, but something still doesn't work properly. Here is my attempt:

/^[A-ZÀ-ž]{1}[a-zà-ž]+[\s\'-]{0,1}(?:(?=[\s\'-]{0,1})[A-ZÀ-ž]{1}[a-zà-ž]+|(?=[\s\'-]{0,1})[a-zà-ž]+)$/i

I want to use it in JavaScript with .test(value). Unfortunately, it also accepts these:

Test Test - Test- test Test Test-test TTest Test'test

What I want to accept are these:

Test
Test-Test
Test Test
Test'Test

I have no idea what I'm doing wrong or how to fix it. What am I missing here?

Best answer

You need to match lower- and uppercase letters separately. Currently, your À-ž range for European letters includes all lower- and uppercase letters, and even some non-letters.
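As a quick check of that claim, the sketch below tests the À-ž range from the question against a few hand-picked characters (the `wideRange` and `samples` names are illustrative, not from the question). It also shows a second factor that appears to be at play: the question's pattern ends in the i flag, which makes even [A-Z] match lowercase letters.

```javascript
// Character class copied from the question's pattern, tested one character at a time.
const wideRange = /^[À-ž]$/;

// Uppercase letters, lowercase letters and two math signs (× U+00D7, ÷ U+00F7)
// all fall inside the single À-ž range.
const samples = ['À', 'à', 'Ž', 'ž', '×', '÷'];
for (const ch of samples) {
  console.log(ch, '=>', wideRange.test(ch)); // true for every sample
}

// The i flag removes the case distinction even for plain ASCII classes.
console.log(/^[A-Z]$/i.test('t')); // true
console.log(/^[A-Z]$/.test('t'));  // false
```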

Here are the ranges you need:

Uppercase (basic European)

Basic Latin, uppercase Latin alphabet: [A-Z]
Latin-1 Supplement, uppercase letters: [À-ÖØ-Þ]
Latin Extended-A, European Latin uppercase letters: [ĀĂĄĆĈĊČĎĐĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİĲĴĶĹĻĽĿŁŃŅŇŊŌŎŐŒŔŖŘŚŜŞŠŢŤŦŨŪŬŮŰŲŴŶŸŹŻŽ]

Lowercase (basic European)

Basic Latin, lowercase Latin alphabet: [a-z]
Latin-1 Supplement, lowercase letters: [ß-öø-ÿ]
Latin Extended-A, European Latin lowercase letters: [žſāăąćĉċčďđēĕėęěĝğġģĥħĩīĭįıĳĵķĸĺļľŀłńņňŋōŏőœŕŗřśŝşšţťŧũūŭůűųŵŷźż]

The pattern you need is

/^[UPPER][lower]+(?:[\s'-][UPPER][lower]+)*$/

where UPPER and lower are uppercase and lowercase letter ranges/sets.
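To see the shape of this pattern in isolation, here is a minimal sketch that substitutes plain ASCII classes for the UPPER and lower placeholders (an assumption made for brevity) and runs it against sample strings taken from the question. The names `asciiName`, `shouldPass` and `shouldFail` are illustrative only.

```javascript
// ASCII-only stand-ins for the UPPER and lower placeholders (assumption for brevity).
// One capitalized word, then zero or more groups of: separator + capitalized word.
const asciiName = /^[A-Z][a-z]+(?:[\s'-][A-Z][a-z]+)*$/;

// Sample strings from the question, split by the expected outcome.
const shouldPass = ['Test', 'Test-Test', 'Test Test', "Test'Test"];
const shouldFail = ['Test ', 'Test - ', 'Test-', ' test', 'Test-test', 'TTest', "Test'test"];

const passed = shouldPass.filter((s) => asciiName.test(s));
const leaked = shouldFail.filter((s) => asciiName.test(s));

console.log(passed.length, leaked.length); // 4 0
```

Because the separator sits inside the repeated group together with the word that follows it, a trailing separator can never match on its own.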

So, let's build the pattern.

// Uppercase and lowercase letters are kept in two separate character classes
var upper = '[A-ZÀ-ÖØ-ÞĀĂĄĆĈĊČĎĐĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİĲĴĶĹĻĽĿŁŃŅŇŊŌŎŐŒŔŖŘŚŜŞŠŢŤŦŨŪŬŮŰŲŴŶŸŹŻŽ]';
var lower = '[a-zß-öø-ÿžſāăąćĉċčďđēĕėęěĝğġģĥħĩīĭįıĳĵķĸĺļľŀłńņňŋōŏőœŕŗřśŝşšţťŧũūŭůűųŵŷźż]';
// One capitalized word, then any number of "separator + capitalized word" groups
var rx = new RegExp("^" + upper + lower + "+(?:[\\s'-]" + upper + lower + "+)*$");
// Let's test
var tests = ['Test ','Test - ','Test-',' test','Test-test','TTest','Test\'test','Test','Test-Test','Test Test','Test\'Test', 'Łóźćż\'żłóźćęą'];
for (var s of tests) {
  console.log(s, '=>', rx.test(s));
}

NOTE: European languages use more letters than the ones listed here. See the Unicode Utilities for details.

NOTE 2: if you plan to only support Chrome and other ECMAScript 2018 compatible browsers, you may use

console.log(  // ONLY WORKS IN ECMASCRIPT 2018 COMPATIBLE JS ENVIRONMENTS
  /^\p{Lu}\p{Ll}+(?:[\s'-]\p{Lu}\p{Ll}+)*$/u.test("Test'Ťĕśţ")
); 
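If both kinds of environments have to be supported, one option is to feature-detect the property escapes at runtime. The sketch below is only one way to do that: `buildNameRegex` is a hypothetical helper, and the ASCII-only fallback is a placeholder assumption where the explicit letter classes built earlier would normally go.

```javascript
// Hypothetical helper: prefer \p{Lu} / \p{Ll} when the engine supports them.
function buildNameRegex() {
  try {
    // Engines without ES2018 property escapes throw a SyntaxError here.
    return new RegExp("^\\p{Lu}\\p{Ll}+(?:[\\s'-]\\p{Lu}\\p{Ll}+)*$", 'u');
  } catch (e) {
    // Placeholder fallback (assumption): ASCII only. Swap in explicit
    // uppercase/lowercase classes for wider coverage.
    return /^[A-Z][a-z]+(?:[\s'-][A-Z][a-z]+)*$/;
  }
}

const nameRx = buildNameRegex();
console.log(nameRx.test('Test-Test')); // true
console.log(nameRx.test('Test-test')); // false
```

Building the pattern from a string matters here: a regex literal containing \p{...} with the u flag would fail at parse time in an older engine, before the try block could catch anything.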
  
 

Java definition:

String pattern = "(?U)^\\p{Lu}\\p{Ll}+(?:[\\s'-]\\p{Lu}\\p{Ll}+)*$";

If you use it with Java's matches() method, you can remove ^ and $, since matches() already requires the whole string to match and the anchors are redundant there.

Published: 2023-07-27 04:03:00
Link: https://www.elefans.com/category/jswz/34/1284871.html