How can I find the percentage of a regex match on a string? [closed]

Updated: 2024-10-11 15:21:07

I'm involved in a digital radio propagation study where a remote transmitter sends a predefined beacon at a defined time that's easily matched with a regex.

But due to solar and atmospheric conditions it's not always 100% decoded. What I want to do is calculate the percentage of the beacon that was decoded.

The beacon format is as follows:

de va6shs va6shs va6shs Loc DO46gs Olivia-4-250 NBEMS test 2218Z 
     |                        |          |                   |
 (Station)               (Location) (Digital Mode)       (UTC Time)
 
Can I actually figure out the percentage with Perl, or should I be looking for another solution?

Edit: Because the data mode we are using has limited error correction, random characters often end up in the decoded string, or characters are not decoded at all. These are strings received from the same station at different times of the same day as solar conditions degraded.

100% decode 
de ve6rfm ve6rfm ve6rfm Loc DO46gs Olivia-4-250 NBEMS test 0218Z

93.75% 
P!de ve6rfm ve6rfm ve6rfm Loc DO46gs Olivia-4-250 NBEMS <TAB>est F248Z

9.375% 
de ve6rfmr&
 
The only difference there should be between the two beacon strings is the UTC time at the end of the string, but as you can see, a few characters didn't decode correctly.

The correctly decoded string has 64 characters. The first incorrectly decoded string has 60 correct characters, so 60/64 * 100 = 93.75% decoded.

My regex for the station call sign (the three repeated words) is
 /[vV][aAeEyY][15678]\w{2,3}/
 
There are several different stations involved in the study across western Canada so I need to capture them as propagation permits, and using the above regex saves me from having to update my script every time a new station comes on the air.
Best answer

The problem is one of partial or fuzzy matching. There are modules out there that may help. They mostly use Levenshtein distance, the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one string into the other, but there are other methods. See a partial list in Text::Levenshtein. See this post for search phrases that will offer far more.
Here are examples using String::Approx, String::Similarity, and Text::Fuzzy. None gives exactly what you ask, but all return similar measures and have options that may let you reach your target.
use warnings 'all';
use strict;

my $beacon = 
    'de va6shs va6shs va6shs Loc DO46gs Olivia-4-250 NBEMS test 2218Z';
my $received = 
    'P!de ve6rfm ve6rfm ve6rfm Loc DO46gs Olivia-4-250 NBEMS <TAB>est F248Z';

# Can use an object, or the functional interface
use Text::Fuzzy qw(fuzzy_index distance_edits);
my $tf = Text::Fuzzy->new ($beacon);   

my ($offset, $edits, $distance);
# Different distance/edits
$distance = $tf->distance($received);
($offset, $edits, $distance) = fuzzy_index    ($received, $beacon);
($distance, $edits)          = distance_edits ($received, $beacon);

# Provides "similarity", in terms of edit distance
use String::Similarity;  
my $similarity = similarity $beacon, $received;

# Can be tuned, but is more like regex in some sense. See docs.
use String::Approx qw(amatch);
my @matches = amatch($beacon, $received);  # within 10% 
# amatch($beacon, ["20%"], $received);     # within 20%
# amatch($beacon, ["S0"], $received);      # no "substitutions"
 
Please look through their documentation. 
By default, String::Approx considers a string a "match" if it is within an edit distance of 10% of the pattern's length. The module allows that parameter to be adjusted. For example,
amatch($beacon, ["20%"], $received);
 
would make that 20%. Other refinements that may be of use to you are possible. Newer versions of the module are written in C and perform much better.
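
Since none of the modules above returns a decode percentage directly, one possible way to get one is to divide the edit distance by the length of the expected beacon. The following is only a minimal sketch under stated assumptions, not the API of any of those modules: `levenshtein` and `decode_percent` are hypothetical helper names written in plain Perl. It assumes that the expected beacon, including the scheduled UTC time (taken here to be 0248Z), is known, and that `<TAB>` in the received string stands for a single tab character.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: plain dynamic-programming Levenshtein distance.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. @t);    # distances from the empty prefix of $s
    for my $i (1 .. @s) {
        my @cur = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: percentage of the expected beacon that survived,
# taken as 100 * (1 - distance / expected length), clamped at 0.
sub decode_percent {
    my ($expected, $received) = @_;
    my $len = length $expected;
    return 0 unless $len;
    my $pct = 100 * (1 - levenshtein($expected, $received) / $len);
    return $pct < 0 ? 0 : $pct;
}

# Assumed inputs: scheduled time 0248Z, and "\t" standing in for <TAB>.
my $expected = 'de ve6rfm ve6rfm ve6rfm Loc DO46gs Olivia-4-250 NBEMS test 0248Z';
my $received = "P!de ve6rfm ve6rfm ve6rfm Loc DO46gs Olivia-4-250 NBEMS \test F248Z";

printf "%.2f%% decoded\n", decode_percent($expected, $received);  # 93.75% decoded
```

Under these assumptions the distance is 4 (two inserted characters and two substitutions), which reproduces the 60/64 figure from the question. Note that this measure penalizes inserted garbage as well as missing or altered characters, so it may not agree with a count of correct characters in every case.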