An old slash-escaping bug left us with some messed-up data, like so:
```javascript
{
  suggestions: [
    "ok",
    "not ok /////////// ... 10s of KBs of this ... //////",
  ]
}
```

I would like to just pull those bad values out of the array. My first idea was to `$pull` based on a regex that matches four "/" characters, but it appears that regexes do not work on large strings:
```javascript
db.notes.count({suggestions: /\/\/\/\//})        // returns 0
db.notes.count({suggestions: {$regex: "////"}})  // returns 0
```

My next idea was to use a `$where` query to find documents that have suggestion strings longer than 1000 characters. That query works:
```javascript
db.notes.count({
  suggestions: {$exists: true},
  $where: function() {
    return !!this.suggestions.filter(function (item) {
      return (item || "").length > 1000;
    }).length;
  }
}) // returns a plausible number
```

But a `$where` query can't be used as the condition in a `$pull` update:
```javascript
db.notes.update({
  suggestions: {$exists: true},
}, {
  $pull: {
    suggestions: {
      $where: function() {
        return !!this.suggestions.filter(function (item) {
          return (item || "").length > 1000;
        }).length;
      }
    }
  }
})
```

throws:
```
WriteResult({
  "nMatched" : 0,
  "nUpserted" : 0,
  "nModified" : 0,
  "writeError" : {
    "code" : 81,
    "errmsg" : "no context for parsing $where"
  }
})
```

I'm running out of ideas. Will I have to iterate over the entire collection and `$set: {suggestions: suggestions.filter(...)}` for each document individually? Is there no better way to clean bad values out of an array of large strings in MongoDB?
(I'm only adding the "javascript" tag to get SO to format the code correctly)
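The two criteria tried in the question (a run of four slashes, or a length over 1000 characters) can be sanity-checked in plain JavaScript outside MongoDB. This is only an illustrative sketch: `isBadSuggestion` is a hypothetical helper, and the check runs in JavaScript's own regex engine, so it says nothing definitive about how the MongoDB server evaluates the query. It does show that the pattern itself has no trouble with a string tens of kilobytes long.

```javascript
// Sketch: the question's two "bad value" criteria in plain JavaScript.
// isBadSuggestion is a hypothetical helper, not a MongoDB API.
function isBadSuggestion(item, maxLength = 1000) {
  // Same null-safe length check the $where query uses: (item || "").length
  const tooLong = (item || "").length > maxLength;
  // Same pattern as the question's regex: four consecutive slashes
  const hasSlashRun = typeof item === "string" && /\/\/\/\//.test(item);
  return tooLong || hasSlashRun;
}

// A corrupted value of roughly 30 KB, like the ones described above
const corrupted = "not ok " + "/".repeat(30000);

console.log(isBadSuggestion("ok"));       // false
console.log(isBadSuggestion(corrupted));  // true
console.log(/\/\/\/\//.test(corrupted));  // true: the regex matches the 30 KB string
console.log(["ok", corrupted].filter((s) => !isBadSuggestion(s))); // [ 'ok' ]
```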
Best answer
The simple solution pointed out in the question comments should have worked. It does work on a test case that recreates the original problem. Regexes can match large strings; there is no special restriction there.
```javascript
// updateOne modifies only the first matching document; updateMany would clean every match
db.notes.updateOne({suggestions: /\/\//}, {"$pull": {suggestions: /\/\//}})
```

Since this didn't work for me, I ended up going with what the question discussed: updating every document individually by filtering its array elements on string length:
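```javascript
db.notes.find({ suggestions: {$exists: true} }).forEach(function(doc) {
  // Keep only elements of at most 1000 characters (null/undefined count as length 0)
  doc.suggestions = doc.suggestions.filter(function(item) {
    return (item || "").length <= 1000;
  });
  db.notes.save(doc); // write the whole document back
});
```

It ran slowly, but that wasn't really a problem in this case.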
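For comparison, a server-side alternative (not what the answer used) is an update whose second argument is an aggregation pipeline, which MongoDB accepts from version 4.2 on. A minimal sketch, assuming MongoDB 4.2+ and the same 1000-character cutoff: `buildCleanupUpdate` is a hypothetical helper that only builds the filter and pipeline objects, while `$set`, `$filter`, `$cond`, `$type` and `$strLenCP` are standard aggregation operators.

```javascript
// Sketch, assuming MongoDB 4.2+ (pipeline-style updates).
// buildCleanupUpdate is a hypothetical helper name.
function buildCleanupUpdate(maxLength) {
  // Only documents that actually have the array need to be touched
  const filter = { suggestions: { $exists: true } };
  const pipeline = [
    {
      $set: {
        suggestions: {
          $filter: {
            input: "$suggestions",
            as: "s",
            cond: {
              // $strLenCP errors on non-strings, so only strings reach it
              $cond: {
                if: { $eq: [{ $type: "$$s" }, "string"] },
                then: { $lte: [{ $strLenCP: "$$s" }, maxLength] },
                else: true, // keep nulls and other non-string elements
              },
            },
          },
        },
      },
    },
  ];
  return { filter, pipeline };
}

const { filter, pipeline } = buildCleanupUpdate(1000);
// In the mongo shell this would be run as:
//   db.notes.updateMany(filter, pipeline)
```

Because the filtering happens on the server in a single `updateMany`, this avoids fetching each document and writing it back with `save()`, which replaces the whole document and can therefore overwrite concurrent changes to other fields. Unlike the JavaScript filter above, it leaves non-string elements untouched.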