Iterating through a huge JSON in PowerShell

Problem description

I have a 19 GB JSON file: a huge array of rather small objects.

[{ "name":"Joe Blow", "address":"Gotham, CA" "log": [{},{},{}] }, ... ]

I want to iterate through the root array of this JSON. Each object, including its log, takes no more than 2 MB of memory, so it is possible to load one object into memory, work with it, and throw it away.

Yet the file itself is 19 GB and contains millions of those objects. I found that it is possible to iterate through such an array using C# and the Newtonsoft.Json library: you read the file as a stream and, as soon as you see a finished object, deserialize it and spit it out.
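A minimal sketch of that streaming pattern, driven from PowerShell via .NET interop, assuming Newtonsoft.Json.dll is available at a known path (the path and the per-object work below are placeholders):

Add-Type -Path 'C:\libs\Newtonsoft.Json.dll'   # assumed location of the library

$streamReader = [System.IO.StreamReader]::new('file.json')
$jsonReader   = [Newtonsoft.Json.JsonTextReader]::new($streamReader)
$serializer   = [Newtonsoft.Json.JsonSerializer]::new()

try {
    while ($jsonReader.Read()) {
        # Each StartObject surfaced here is one element of the root array;
        # nested objects (the log entries) are consumed by Deserialize below.
        if ($jsonReader.TokenType -eq [Newtonsoft.Json.JsonToken]::StartObject) {
            $obj = $serializer.Deserialize($jsonReader, [Newtonsoft.Json.Linq.JObject])
            # work with $obj here, then let it go out of scope
        }
    }
}
finally {
    $jsonReader.Close()
    $streamReader.Dispose()
}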

But I want to see whether PowerShell can do the same: not read the whole thing as one chunk, but rather iterate over whatever is in the hopper right now.

Any ideas?

Recommended answer

As far as I know, ConvertFrom-Json doesn't have a streaming mode, but jq does (see "Processing huge json-array files with jq"). The pipeline below turns the giant array into just the contents of the array, which can then be output piece by piece. Otherwise, a 6 MB, 400,000-line JSON file can use 1 GB of memory after conversion (400 MB in PowerShell 7).

get-content file.json | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' | % { $_ | convertfrom-json }
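As a usage sketch, assuming the object shape from the question (name and log are the properties shown there), each reconstructed object can be handled inside the ForEach-Object block and then dropped, so only one object is alive at a time:

Get-Content file.json |
    jq -cn --stream 'fromstream(1|truncate_stream(inputs))' |
    ForEach-Object {
        $obj = $_ | ConvertFrom-Json
        # placeholder per-object work: emit a one-line summary instead of keeping the object
        '{0}: {1} log entries' -f $obj.name, $obj.log.Count
    } |
    Set-Content summary.txt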

So, for example, this:

[ {"name":"joe"}, {"name":"john"} ]

becomes this:

{"name":"joe"} {"name":"john"}

The streaming format of jq looks very different from JSON. For example, the array above looks like this, with the path to each value plus object and array end-markers:

'[{"name":"joe"},{"name":"john"}]' | jq --stream -c [[0,"name"],"joe"] [[0,"name"]] # end object [[1,"name"],"john"] [[1,"name"]] # end object [[1]] # end array

And then, after truncating one level of "parent folder" from the path of each of the two values:

'[{"name":"joe"},{"name":"john"}]' | jq -cn --stream '1|truncate_stream(inputs)' [["name"],"joe"] [["name"]] # end object [["name"],"john"] [["name"]] # end object # no more end array

"fromstream()";将其转换回json ...

"fromstream()" turns it back into json...

'[{"name":"joe"},{"name":"john"}]' | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' {"name":"joe"} {"name":"john"}
