Which regex method is best for validating user input? (for /f with delims vs. echo %var%|Findstr /ri)

Updated: 2024-10-27 00:23:36

I would like to validate a user's input and limit it to alphanumeric characters only (underscores may be allowed as well), but I'm not sure which method is best for this.

I've seen various examples on SA, and the first one that raises some questions for me is the following:

:input
set "in="
set /p "in=Please enter your username: "
ECHO(%in%|FINDSTR /ri "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" >nul || (
    goto input
)

I see a second character class that is identical to the first one (the only differences being the leading ^ and the trailing *$).

Why are the extra class and ^ ... *$ needed when the following also works?

:input
set "in="
set /p "in=Please enter your username: "
ECHO(%in%|FINDSTR /ri "[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >nul || (
    goto input
)

Finally, I've also noticed the FOR /F loop method on here:

for /f "delims=1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%a in ("%in%") do goto :input

Is there any (dis)advantage in using this over the aforementioned FINDSTR regex one?

Best answer

For safely validating user input, both methods are reliable, but you must improve them:


findstr method

First, let us focus on a search string like ^[...][...]*$, where ... stands for a set of characters and [...] is a character class. A character class [...] matches any one character from the set ...; * means repetition, matching zero or more occurrences, hence [...]* matches zero or more characters from the set ...; therefore, [...][...]* matches one or more characters from the set .... The leading ^ anchors the match to the beginning of the line and the trailing $ anchors it to the end; when both anchors are specified, the entire line must match the search string. Without the anchors, a single allowed character anywhere in the line is enough for findstr to report a match, so the unanchored pattern also accepts input that contains forbidden characters.

Concerning character classes [...]: according to the thread What are the undocumented features and limitations of the Windows FINDSTR command?, ranges within classes are buggy. For instance, the class [A-Z] also matches the small letters b to z, and [a-z] also matches the capital letters A to Y (this does not matter for a case-insensitive search, that is, when /I is given); the class [0-9] may match ² or ³, depending on the current code page; [A-Z] and [a-z] may also match special letters like Á or á, again depending on the current code page. Hence, to safely match certain characters only, do not use ranges, but specify each character individually, like [0123456789], [ABCDEFGHIJKLMNOPQRSTUVWXYZ] or [abcdefghijklmnopqrstuvwxyz].

All this leads us to the following findstr command line:

findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"
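
To see why the anchors and the repeated class matter, the two patterns can be compared on a fixed sample. This is only a sketch: the variable name CHARS and the sample text user name (which contains a space, a character outside the allowed set) are made up for illustration.

```batch
@echo off
rem Hypothetical helper variable holding the allowed characters.
set "CHARS=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

rem Unanchored: findstr reports a match as soon as ONE character
rem of the line belongs to the set, so the sample is accepted.
echo(user name| findstr /R /I "[%CHARS%]" > nul && echo unanchored: accepted

rem Anchored: every character between line start and line end must
rem belong to the set, so the space makes the match fail.
echo(user name| findstr /R /I "^[%CHARS%][%CHARS%]*$" > nul || echo anchored: rejected
```

In other words, the unanchored pattern only answers the question "does the input contain at least one allowed character?", which is not a validation.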
 

Nevertheless, the whole approach with the piped echo might still fail, because special characters like ", &, ^, %, !, (, ), <, >, | could lead to syntax errors or other unintended behaviour. To avoid that, we need to use delayed expansion, so that the special characters are hidden from the command parser. However, since a pipe (|) starts a new cmd instance for each side (both inherit the current environment), we need to make sure that the actual variable expansion happens in the left child cmd instance rather than in the parent one, like this:

:INPUT
set "IN="
set /P IN="Please enter your username: "

cmd /V /C echo(^^!IN^^!| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul || goto :INPUT
 

The extra explicit cmd instance is needed to enable delayed expansion (/V), because the instances initiated by the pipe have delayed expansion disabled.

The doubled escaping of the exclamation marks (^^!) is only needed if delayed expansion is also enabled in the parent cmd instance; if it is not, single escaping (^!) is sufficient, but doubled escaping does no harm.
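
Since the question mentions that underscores may be allowed as well, the findstr variant could be extended roughly as follows. Treat it as a sketch under assumptions: adding _ to both character classes is a guess at the desired rule, and the final echo line is only there to show where the script continues after a valid input.

```batch
@echo off
:INPUT
set "IN="
set /P IN="Please enter your username: "

rem Same command line as above, with _ added to both character classes.
rem /I makes the match case-insensitive, so small letters are accepted too.
cmd /V /C echo(^^!IN^^!| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_]*$" > nul || goto :INPUT

rem Reached only after findstr matched the whole line.
echo Input accepted.
```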


for /F method

This approach makes life easier, because there is no pipe involved, so you do not have to deal with multiple cmd instances; there is still room for improvement, though. Again, special characters may cause trouble, so delayed expansion needs to be enabled.

The for /F loop ignores empty lines and lines beginning with the default eol character, the semicolon (;). To disable the eol option, simply set it to one of the delimiter characters, so that eol becomes hidden behind delims. Empty lines are not iterated, so the goto command in your approach would never execute in case of empty user input. Therefore, we must catch empty user input explicitly, using an if statement. All this leads to the following code:

setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "

if not defined IN goto :INPUT
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do goto :INPUT

endlocal
 

This approach detects capital letters only; to include small letters as well, you have to add them to the delims option: delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
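
Put together, a variant that accepts both letter cases and, as the question suggests, the underscore might look like the sketch below. The underscore in delims and the echo line are assumptions added for illustration; everything else follows the code above.

```batch
@echo off
setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "

rem Empty input is never iterated by for /F, so reject it explicitly.
if not defined IN goto :INPUT

rem Every allowed character is a delimiter; the loop body runs only if
rem at least one other character is left over as a token.
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_ eol=0" %%Z in ("!IN!") do goto :INPUT

rem Reached only when the input consists of allowed characters.
echo Accepted: !IN!
endlocal
```

Note that delims is case-sensitive, which is why both alphabets have to be listed here, whereas findstr gets by with /I.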

Note that the variable IN is no longer available after endlocal, but this should be the very last command of your script anyway.

To detect whether or not a for /F loop iterated, there is an undocumented feature we can make use of: for /F returns a non-zero exit code if it does not iterate, hence the conditional execution operators && and || can be used. When the loop does not iterate (for example, when the user input is empty), the command after || runs; when it does iterate, the command after && runs. For this to work, the for /F loop must be enclosed within parentheses:

setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "

if not defined IN goto :INPUT
(for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do rem/) && goto :INPUT

endlocal

                    
                     
          


Published: 2023-07-22 02:13:00
Article link: https://www.elefans.com/category/jswz/34/1216007.html