I am trying to write a scanner in Go that handles continuation lines and cleans each line up before returning it, so that it returns logical lines. Given the following split function (Play):
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

func ScanLogicalLines(data []byte, atEOF bool) (int, []byte, error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Look for a newline; while it is escaped by a backslash, keep searching.
	i := bytes.IndexByte(data, '\n')
	for i > 0 && data[i-1] == '\\' {
		fmt.Printf("i: %d, data[i] = %q\n", i, data[i])
		i = i + bytes.IndexByte(data[i+1:], '\n')
	}
	var match []byte
	advance := 0
	switch {
	case i >= 0:
		advance, match = i+1, data[0:i]
	case atEOF:
		advance, match = len(data), data
	}
	// Remove the backslash-newline pairs to form one logical line.
	token := bytes.Replace(match, []byte("\\\n"), []byte(""), -1)
	return advance, token, nil
}

func main() {
	simple := `
Just a test.

See what is returned. \
when you have empty lines.

Followed by a newline.
`
	scanner := bufio.NewScanner(strings.NewReader(simple))
	scanner.Split(ScanLogicalLines)
	for scanner.Scan() {
		fmt.Printf("line: %q\n", scanner.Text())
	}
}
I expected the code to return something like:
line: "Just a test."
line: ""
line: "See what is returned, when you have empty lines."
line: ""
line: "Followed by a newline."
However, it stops after returning the first line. The second call returns 1, "", nil.
Does anybody have any ideas, or is this a bug?
Recommended answer
I would regard this as a bug, because an advance value > 0 is not intended to trigger a further read call, even when the returned token is nil (bufio.SplitFunc):
If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, SplitFunc can return (0, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.
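To illustrate the contract quoted above, here is a minimal sketch of a split function that asks for more data when it has no complete token yet. The names splitOnSemicolon and tokens are hypothetical, and iotest.OneByteReader is used only to force the input to arrive one byte at a time.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
	"testing/iotest"
)

// splitOnSemicolon is a hypothetical split function that yields
// semicolon-separated tokens.
func splitOnSemicolon(data []byte, atEOF bool) (int, []byte, error) {
	if i := bytes.IndexByte(data, ';'); i >= 0 {
		// A complete token is available: consume it and the delimiter.
		return i + 1, data[:i], nil
	}
	if atEOF && len(data) > 0 {
		// No delimiter will follow, so return the remainder as the last token.
		return len(data), data, nil
	}
	// No complete token yet: advance 0 and a nil token request more data.
	return 0, nil, nil
}

// tokens scans input one byte per read and collects every token.
func tokens(input string) []string {
	scanner := bufio.NewScanner(iotest.OneByteReader(strings.NewReader(input)))
	scanner.Split(splitOnSemicolon)
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out
}

func main() {
	fmt.Println(tokens("a;bc;d")) // prints: [a bc d]
}
```

Because the reader hands over a single byte per call, the split function is called repeatedly with a growing slice until a delimiter shows up.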
What happens is this
The input buffer of the bufio.Scanner defaults to 4096 bytes. That means it reads up to this amount at once if it can, and then executes the split function. In your case the scanner can read your input all at once, as it is well below 4096 bytes. This means that the next read it performs results in EOF, which is the main problem here.
How to circumvent
Any token that is non-nil will prevent this. As long as you return non-nil tokens, the scanner will not check for EOF and will continue executing your tokenizer.
The reason your code returns nil tokens is that bytes.Replace returns nil when there is nothing to be done on an empty input: append([]byte(nil), nil...) == nil. You could prevent this by returning a slice with a capacity and no elements, as this would be non-nil: make([]byte, 0, 1) != nil.
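One way to apply this is sketched below, under a few assumptions. The names scanLogicalLines and logicalLines are hypothetical, and the sample input is chosen for illustration. The sketch also computes the continuation index as i + 1 + j and waits for more data when a continuation is not yet terminated, both of which differ from the code in the question.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanLogicalLines is a hypothetical variant of the question's split function
// that never returns a nil token for a line it has matched.
func scanLogicalLines(data []byte, atEOF bool) (int, []byte, error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Find the first newline that is not escaped by a backslash.
	i := bytes.IndexByte(data, '\n')
	for i > 0 && data[i-1] == '\\' {
		j := bytes.IndexByte(data[i+1:], '\n')
		if j < 0 {
			i = -1 // the continued line is not terminated in this slice
			break
		}
		i += 1 + j
	}
	var match []byte
	advance := 0
	switch {
	case i >= 0:
		advance, match = i+1, data[:i]
	case atEOF:
		advance, match = len(data), data
	default:
		return 0, nil, nil // ask the Scanner for more data
	}
	joined := bytes.Replace(match, []byte("\\\n"), []byte{}, -1)
	// Appending to an empty, non-nil slice keeps the token non-nil even
	// when the logical line is empty.
	token := append(make([]byte, 0, len(joined)), joined...)
	return advance, token, nil
}

// logicalLines collects every logical line found in input.
func logicalLines(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(scanLogicalLines)
	var lines []string
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines
}

func main() {
	for _, line := range logicalLines("\nJust a test.\n\nSee what is returned, \\\nwhen you have empty lines.\n\nFollowed by a newline.\n") {
		fmt.Printf("line: %q\n", line)
	}
}
```

With this input the leading newline yields an empty first line, so six lines are printed in total, and the scan runs through to the end of the input.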