Before (as a rookie) I go submitting this as an R package bug, let me run it by y'all. I think all of the following are good:

    replace_number("123 0 boogie")
    [1] "one hundred twenty three boogie"
    replace_number("1;1 foo")
    [1] "one;one foo"
    replace_number("47 bar")
    [1] "forty seven bar"
    replace_number("0")
    [1] "zero"

I think all of the following are bad because "zero" is missing from the output:

    replace_number("1;0 foo")
    [1] "one; foo"
    replace_number("00 bar")
    [1] "bar"
    replace_number("0x")
    [1] "x"

Basically, I'd say that replace_number() is incapable of handling strings that contain the digit 0 (except for "0"). Is it a real bug?
Best answer
If you dig into the guts of replace_number:

    unlist(lapply(lapply(gsub(",([0-9])", "\\1", text.var), function(x) {
        if (!is.na(x) & length(unlist(strsplit(x, "([0-9])", perl = TRUE))) > 1) {
            num_sub(x, num.paste = num.paste)
        } else {
            x
        }
    }), function(x) mgsub(0:9, ones, x)))

you can see that the problem occurs in qdap:::num_sub:

    qdap:::num_sub("101", num.paste = "combine")
    ## "onehundredone"
    qdap:::num_sub("0", num.paste = "combine")
    ## ""

Digging within that function, the issue occurs in numb2word, which contains this internal code:

    ones <- c("", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine")
    names(ones) <- 0:9

which converts zero values to blanks. If I were facing this problem myself I would fork the qdap repo, go to replace_number.R, and try to change this in a backward-compatible way, so that replace_number could take a logical argument blank_zeros=TRUE that would be passed down to numb2word and do the right thing, e.g.

    ones <- c(if (blank_zeros) "" else "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine")

In the meantime I have posted this on the qdap issues list.
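To make the effect of the lookup vector concrete, here is a minimal standalone sketch in base R. It is not qdap code: `digits_to_words` is a hypothetical helper that only spells out digits one at a time (it does not build "one hundred one" the way num_sub does), and `blank_zeros` is the proposed argument from above, not an existing qdap option. It just shows how a `""` entry for zero makes the digit vanish and how the suggested switch would restore it.

```r
# Hypothetical helper, NOT part of qdap: replace each digit in a string
# with its English word, one digit at a time.
digits_to_words <- function(x, blank_zeros = TRUE) {
  # Same lookup idea as in numb2word; blank_zeros = TRUE mimics the
  # current behavior (zero becomes an empty string).
  ones <- c(if (blank_zeros) "" else "zero",
            "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine")
  names(ones) <- 0:9

  vapply(strsplit(x, "", fixed = TRUE), function(chars) {
    is_digit <- chars %in% names(ones)
    # Look the digits up by name ("0".."9") and leave other characters alone.
    chars[is_digit] <- ones[chars[is_digit]]
    paste(chars, collapse = "")
  }, character(1))
}

digits_to_words("1;0 foo")                       # "one; foo"     (zero lost)
digits_to_words("1;0 foo", blank_zeros = FALSE)  # "one;zero foo"
digits_to_words("0x", blank_zeros = FALSE)       # "zerox"
```

Defaulting `blank_zeros` to TRUE keeps the old output for existing callers, which is the backward-compatibility point made above; anyone who wants zeros spelled out opts in with `blank_zeros = FALSE`.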