执行Boyer

编程入门行业动态更新时间:2024-10-13 08:25:03

本文介绍了执行Boyer-Moore模式匹配时是否需要考虑编码?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

我即将实现Boyer-Moore模式匹配算法(具体来说是周日算法)的变体，我在问自己:我的字母大小是多少?

I'm about to implement a variation of the Boyer-Moore pattern matching algorithm (the Sunday algorithm to be specific) and I was asking myself: What is my alphabet size?

这是否取决于编码(=可能的字符数)，或者我是否可以仅假设我的字母由256个符号组成(=可以由一个字节表示的符号数)?

Does it depend on the encoding (= number of possible characters) or can I just assume my alphabet consists of 256 symbols (= number of symbols which can be represented by a byte)?

在许多其他情况下，将字符视为字节是一个问题，因为取决于编码，一个字符可以包含多个字节，但是如果在我的情况下，两个字符串都具有相同的编码，则相等的字符由相等的字节序列表示，因此我认为这没关系.

In many other situations treating characters as bytes would be a problem because depending on the encoding a character can consist of multiple bytes, but if in my case both strings have the same encoding then equal characters are represented by equal byte sequences, so I would assume it doesn't matter.

那么:我是否必须考虑编码并假定一个由实际字符组成的字母(对于Unicode，> 90000)，还是可以将文本和模式作为字节流来处理?

So: Do I have to take the encoding into account and assume an alphabet consisting of the actual characters (> 90000 for Unicode) or can I just handle the text and the pattern as a stream of bytes?

推荐答案

多字节编码可与面向字节的搜索例程 IF 一起使用，它是自我同步.

A multi-byte encoding can be used with a byte-oriented search routine IF it is self-synchronizing.

因此，您可以安全地将Boyer-Moore用于:

So, you can safely use Boyer-Moore with: