Extract part of text in PowerShell

Updated: 2024-10-24 09:27:38

This is my input file. Its content is random: the digits can be any numbers, not just 9999, and the letters can be any letters. The format below always comes after a - (dash).

-
9999 99AKDSLY9ZWSRK99999
9999 99BGRPOE99FTRQ99999

Expected output:

AKDSLY9ZSRK
BGRPOE99TRQ

So I need to remove the first part of each line, always numbers:

9999 99
9999 99

Then remove the unwanted character:

99AKDSLY9ZW → in this case it is the W, but it could be any letter
99BGRPOE99F → in this case it is the F, but it could be any letter

And finally remove the last 5 digits, always numbers:

99999
99999

What I'm trying to use is a regex (first time using it):

$result = [regex]::Matches($InputFile, '(^\d{4}\s\d{2}[A-Z0-9]\d{5}$)') -replace '\d{4}\s\d{2}', ''
$result

It's not giving me an error message, but it's not showing me the characters I'm expecting to see in $result.

I was expecting to see something in $result to then start the formatting, deleting the characters I don't need.

What could be missing here, please?

Best answer

Try something like this:

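# Read the whole file as one string and strip carriage returns (CR)
$str = (Get-Content ... -Raw) -replace '\r'
# Callback: per line, drop the first 7 characters, then replace the last 9 with their 2nd through 4th
$cb = {
    $args[0].Groups[1].Value -replace '(?m)^.{7}' -replace '(?m).(.{3}).{5}$', '$1'
}
# One or more data lines that directly follow a hyphen and a newline
$re = [regex]'(?m)^(?<=-\n)((?:\d{4}\s\d{2}[^\n]*\d{5}(?:\n|$))+)'
$re.Replace($str, $cb)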

The regular expression $re matches multiline substrings that consist of one or more lines with your digit/letter combinations and that are preceded by a hyphen and a newline. The (?<=...) is a positive lookbehind assertion: it ensures that you only get a match when the lines with the digit/letter combinations directly follow a line ending in a hyphen, without making that line part of the actual match.
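
To see the lookbehind in isolation, here is a minimal sketch. It assumes an in-memory string with LF line endings; $sample and $m are hypothetical names used only for illustration, and the "unrelated line" is made-up filler to show where the match stops.

```powershell
# Hypothetical sample standing in for the file content (LF line endings).
$sample = "-`n9999 99AKDSLY9ZWSRK99999`n9999 99BGRPOE99FTRQ99999`nunrelated line"

# Same pattern as in the answer.
$re = [regex]'(?m)^(?<=-\n)((?:\d{4}\s\d{2}[^\n]*\d{5}(?:\n|$))+)'

$m = $re.Match($sample)

# The match begins right after "-<LF>", so the hyphen line is not part of it.
$m.Index    # 2
# The value holds both data lines (each including its trailing LF) and nothing else.
$m.Value

# Without a preceding hyphen line the lookbehind fails and nothing matches.
$re.IsMatch('9999 99AKDSLY9ZWSRK99999')    # False
```

Because the hyphen line is only asserted and not consumed, Regex.Replace() leaves it in place and rewrites only the data lines.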

The scriptblock $cb is an anonymous callback function that the Regex.Replace() method calls on each match. For each line in a match, it removes the first 7 characters of the line and replaces the last 9 characters of the line with the 2nd through 4th of those characters.
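
The two replacements inside the callback can be tried on a single line. This is only a sketch: $line, $step1 and $step2 are hypothetical names, and the input is one of the sample lines from the question.

```powershell
# One data line from the question (hypothetical variable name).
$line = '9999 99AKDSLY9ZWSRK99999'

# Step 1: drop the first 7 characters ("9999 99").
$step1 = $line -replace '(?m)^.{7}'

# Step 2: of the last 9 characters ("WSRK99999"), keep only the 2nd through 4th ("SRK").
$step2 = $step1 -replace '(?m).(.{3}).{5}$', '$1'

$step1    # AKDSLY9ZWSRK99999
$step2    # AKDSLY9ZSRK
```

The (?m) option makes ^ and $ anchor at every line boundary, which is what lets the same two replacements work on a multi-line match value.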

For simplicity, the sample code removes carriage return characters (CR, \r) from the string, so that all newlines are plain linefeed characters (LF, \n) instead of the Windows default CR-LF.
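
The effect of stripping the carriage returns can be checked with a small end-to-end sketch. It assumes a hand-built CR-LF string in place of the file; $crlf and $result are hypothetical names, while $str, $cb and $re are the ones from the answer.

```powershell
# Hypothetical sample with Windows-style CR-LF line endings.
$crlf = "-`r`n9999 99AKDSLY9ZWSRK99999`r`n9999 99BGRPOE99FTRQ99999`r`n"

$cb = { $args[0].Groups[1].Value -replace '(?m)^.{7}' -replace '(?m).(.{3}).{5}$', '$1' }
$re = [regex]'(?m)^(?<=-\n)((?:\d{4}\s\d{2}[^\n]*\d{5}(?:\n|$))+)'

# With CR-LF the hyphen is followed by CR, not directly by LF, so the lookbehind fails.
$re.IsMatch($crlf)    # False

# Strip every CR so only LF remains, then run the replacement.
$str = $crlf -replace '\r'
$result = $re.Replace($str, $cb)

# The hyphen line is kept; the data lines are reduced to the wanted parts.
$result
```

With this sample, $result contains the hyphen line followed by AKDSLY9ZSRK and BGRPOE99TRQ, each on its own line.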

Published: 2023-07-20 02:56:00
Article link: https://www.elefans.com/category/jswz/34/1192092.html