Fast way of retrieving data from XML


I have a sample XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <tag_1>
        <tag_2>A</tag_2>
        <tag_3>B</tag_3>
        <tag_4>C</tag_4>
        <tag_5>D</tag_5>
    </tag_1>

I am interested in extracting only specific data from it.

For example:

    tag_1/tag_5 -> D

Here tag_1/tag_5 is my data definition (the only data I want). It is dynamic, which means that tomorrow tag_1/tag_4 could be my data definition instead.

In reality my XML is a large data set, and these XML payloads arrive at a rate of 50,000 to 80,000 per hour.

I would like to know whether there is already a high-performance XML reader tool, or some special logic I could implement, that extracts data according to the data definition.

Currently I have an implementation using a StAX parser, but it takes nearly a day to parse 80,000 XML documents.

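    import com.ximpleware.AutoPilot;
    import com.ximpleware.NavException;
    import com.ximpleware.ParseException;
    import com.ximpleware.VTDGen;
    import com.ximpleware.VTDNav;
    import com.ximpleware.XPathEvalException;
    import com.ximpleware.XPathParseException;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class VTDParser {
        private final Logger LOG = LoggerFactory.getLogger(VTDParser.class);
        private final VTDGen vg;

        public VTDParser() {
            vg = new VTDGen();
        }

        public String parse(final String data, final String xpath) {
            vg.setDoc(data.getBytes());
            try {
                vg.parse(true); // true = namespace-aware parsing
            } catch (final ParseException e) {
                LOG.error(e.toString());
                return null; // nothing to navigate if parsing failed
            }
            final VTDNav vn = vg.getNav();
            final AutoPilot ap = new AutoPilot(vn);
            try {
                ap.selectXPath(xpath); // the XPath is recompiled on every call
            } catch (final XPathParseException e) {
                LOG.error(e.toString());
                return null;
            }
            try {
                // Return the text of the first matching element that has any.
                while (ap.evalXPath() != -1) {
                    final int val = vn.getText();
                    if (val != -1) {
                        return vn.toNormalizedString(val);
                    }
                }
            } catch (XPathEvalException | NavException e) {
                LOG.error(e.toString());
            }
            return null;
        }
    }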

Best answer

This is my modification of your code. It compiles the XPath once, without binding it to a VTDNav instance, and reuses it many times. It also calls resetXPath before exiting the parse method. However, I have not shown how to pre-index the XML documents with VTD to avoid repetitive parsing, and I suspect that might be the difference maker for your project. Here is a paper on the capabilities of vtd-xml:

http://recipp.ipp.pt/bitstream/10400.22/1847/1/ART_BrunoOliveira_2013.pdf

    import com.ximpleware.*;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class VTDParser {
        private final Logger LOG = LoggerFactory.getLogger(VTDParser.class);
        private final VTDGen vg;
        private final AutoPilot ap;

        public VTDParser() throws VTDException {
            vg = new VTDGen();
            ap = new AutoPilot();
            // This is how you compile an XPath without binding it to an XML document.
            ap.selectXPath("/a/b/c");
        }

        public String parse(final String data) {
            vg.setDoc(data.getBytes());
            try {
                vg.parse(true);
            } catch (final ParseException e) {
                LOG.error(e.toString());
                return null; // nothing to navigate if parsing failed
            }
            final VTDNav vn = vg.getNav();
            ap.bind(vn); // bind the precompiled XPath to this document
            try {
                while (ap.evalXPath() != -1) {
                    final int val = vn.getText();
                    if (val != -1) {
                        return vn.toNormalizedString(val);
                    }
                }
            } catch (XPathEvalException | NavException e) {
                LOG.error(e.toString());
            } finally {
                ap.resetXPath(); // reset the XPath here, on every exit path, so it can be reused
            }
            return null;
        }
    }
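Since the data definition in the question changes over time, the compile-once idea can be extended to keep one compiled expression per definition. The following is only a minimal sketch of that caching pattern: it uses the JDK's own javax.xml.xpath API rather than VTD-XML so that it runs without extra dependencies, the class name CompiledXPathCache and its methods are hypothetical, and it makes no claim about matching VTD-XML's performance.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: keeps one compiled XPath expression per data definition. */
final class CompiledXPathCache {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> compiled = new HashMap<>();

    /** Returns the string value of the first matching node, or "" if nothing matches. */
    String extract(final String xml, final String expression) {
        // Compile each distinct expression only once; later calls reuse it.
        final XPathExpression expr = compiled.computeIfAbsent(expression, this::compile);
        try {
            return expr.evaluate(new InputSource(new StringReader(xml)));
        } catch (final XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    private XPathExpression compile(final String expression) {
        try {
            return xpath.compile(expression);
        } catch (final XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    /** Number of distinct expressions compiled so far. */
    int size() {
        return compiled.size();
    }

    public static void main(final String[] args) {
        final String xml = "<tag_1><tag_4>C</tag_4><tag_5>D</tag_5></tag_1>";
        final CompiledXPathCache cache = new CompiledXPathCache();
        System.out.println(cache.extract(xml, "/tag_1/tag_5"));
        System.out.println(cache.extract(xml, "/tag_1/tag_4"));
    }
}
```

This sketch is single-threaded: the JDK's XPathExpression objects are not thread-safe, so a concurrent version would need one cache per worker thread. With VTD-XML the same pattern would presumably hold one AutoPilot per data definition, but that variant is not shown here.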
