qdap package: bug in converting zero digits to “zero” words

Updated: 2024-10-18 08:34:33
Before (as a rookie) I go submitting this as an R package bug, let me run it by y'all. I think all of the following are good:

    replace_number("123 0 boogie")
    [1] "one hundred twenty three boogie"
    replace_number("1;1 foo")
    [1] "one;one foo"
    replace_number("47 bar")
    [1] "forty seven bar"
    replace_number("0")
    [1] "zero"

I think all of the following are bad because "zero" is missing from the output:

    replace_number("1;0 foo")
    [1] "one; foo"
    replace_number("00 bar")
    [1] "bar"
    replace_number("0x")
    [1] "x"

Basically, I'd say that replace_number() is incapable of handling strings that contain the digit 0 (except for "0"). Is it a real bug?

Best answer

If you dig into the guts of replace_number:

    unlist(lapply(lapply(gsub(",([0-9])", "\\1", text.var), function(x) {
        if (!is.na(x) & length(unlist(strsplit(x, "([0-9])", perl = TRUE))) > 1) {
            num_sub(x, num.paste = num.paste)
        } else {
            x
        }
    }), function(x) mgsub(0:9, ones, x)))

you can see that the problem occurs in qdap:::num_sub:

    qdap:::num_sub("101", num.paste = "combine")
    ## "onehundredone"
    qdap:::num_sub("0", num.paste = "combine")
    ## ""
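Which inputs reach num_sub at all is decided by the strsplit() test in the replace_number snippet quoted earlier. Here is a minimal base-R sketch that reuses only that test. The helper name takes_num_sub_branch is hypothetical and not part of qdap, and nothing here calls qdap itself.

```r
# Hypothetical helper (not part of qdap): reproduces only the branch test
# from the replace_number snippet quoted earlier in this answer.
takes_num_sub_branch <- function(x) {
  # Split on every single digit; a leading digit leaves an empty first piece.
  pieces <- unlist(strsplit(x, "([0-9])", perl = TRUE))
  !is.na(x) && length(pieces) > 1
}

# Check a few of the inputs from the question.
inputs <- c("0", "0x", "00 bar", "1;0 foo")
vapply(inputs, takes_num_sub_branch, logical(1))
```

Under this sketch only "0" skips the num_sub branch, because splitting it yields a single empty piece. That would be consistent with "0" coming back as "zero" while the other inputs lose their zeros.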

Digging within num_sub, the issue occurs in numb2word, which contains this internal code

    ones <- c("", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine")
    names(ones) <- 0:9

which converts zero values to blanks. If I were facing this problem myself, I would fork the qdap repo, go to replace_number.R, and try to change this in a backward-compatible way, so that replace_number could take a logical argument blank_zeros=TRUE that is passed down to numb2word and does the right thing, e.g.

ones <- c(if (blank_zeros) "" else "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine")
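To see what such a flag would change, here is a small standalone sketch of the lookup on its own. It is not qdap's implementation: digit_words and spell_digits are hypothetical names, and the sketch spells out each digit independently rather than building "hundred"/"thousand" words the way numb2word does.

```r
# Hypothetical stand-in for the lookup vector inside numb2word; not qdap code.
digit_words <- function(blank_zeros = TRUE) {
  ones <- c(if (blank_zeros) "" else "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine")
  names(ones) <- 0:9
  ones
}

# Spell out each digit of a digit-only string via the named lookup.
spell_digits <- function(x, blank_zeros = TRUE) {
  ones <- digit_words(blank_zeros)
  digits <- strsplit(x, "", fixed = TRUE)[[1]]  # one element per character
  unname(ones[digits])                          # index by name "0".."9"
}

spell_digits("101")                        # default: the zero maps to ""
spell_digits("101", blank_zeros = FALSE)   # the zero maps to "zero"
```

Keeping blank_zeros = TRUE as the default preserves the current behaviour for existing callers, which is the backward-compatibility point made above.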

In the meantime I have posted this on the qdap issues list.

Published: 2023-08-07 12:22:00
Link: https://www.elefans.com/category/jswz/34/1464481.html