I have my data in a proprietary format, not one of those supported by Apache Drill. Is there a tutorial on how to write my own storage plugin to handle such data?
Recommended answer
This is something that really should be in the docs but currently is not. The interface isn't too complicated, but it can be a bit much to look at one of the existing plugins and understand everything that is going on.
There are two major components to writing a storage plugin: exposing information to the query planner and schema management system, and then actually implementing the translation from the data source's API to Drill's record representation.
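To make that split a bit more concrete, here is a rough sketch in plain Java. Every name in it (`PluginSketch`, `SchemaProvider`, `BatchReader`, `PipeFormatSource`, and the toy "id|name" row format) is made up for illustration and is not Drill's actual API; the real interfaces are what you would find in the existing plugins. It only shows the shape of the two halves: one side answers "what tables and columns exist", the other reads from the source and hands back column-oriented batches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these types are Drill's real API.
final class PluginSketch {

    /** Planning side: tells the planner which tables exist and what columns they have. */
    interface SchemaProvider {
        List<String> tableNames();
        List<String> columnNames(String table);
    }

    /** Execution side: pulls rows from the source and fills column-oriented batches. */
    interface BatchReader {
        /** Returns the next batch as column name -> values, or an empty map when exhausted. */
        Map<String, List<Object>> nextBatch(int maxRows);
    }

    /** Toy proprietary source: well-formed rows of "id|name" strings (an assumed format). */
    static final class PipeFormatSource implements SchemaProvider, BatchReader {
        private final List<String> rows;
        private int cursor = 0;

        PipeFormatSource(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public List<String> tableNames() {
            return List.of("records");
        }

        @Override
        public List<String> columnNames(String table) {
            return List.of("id", "name");
        }

        @Override
        public Map<String, List<Object>> nextBatch(int maxRows) {
            if (maxRows <= 0) {
                throw new IllegalArgumentException("maxRows must be positive");
            }
            Map<String, List<Object>> batch = new LinkedHashMap<>();
            if (cursor >= rows.size()) {
                return batch; // source exhausted: signal with an empty batch
            }
            List<Object> ids = new ArrayList<>();
            List<Object> names = new ArrayList<>();
            int end = Math.min(rows.size(), cursor + maxRows);
            for (; cursor < end; cursor++) {
                // Translate one source row into typed column values.
                String[] parts = rows.get(cursor).split("\\|", 2);
                ids.add(Integer.parseInt(parts[0]));
                names.add(parts[1]);
            }
            batch.put("id", ids);
            batch.put("name", names);
            return batch;
        }
    }
}
```

In a real plugin the execution half would write into Drill's own record representation rather than Java lists, but the row-to-column translation loop is the part you end up writing either way.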
The Kudu plugin was added recently and is a reasonable model for a storage system with a lot of the elements Drill can take advantage of. One thing I would note is that if your storage system is not distributed and you just plan on making all reads remote, you don't have to do as much work around affinities, work lists, and assignments in the group scan. If I have some time soon, I'll try to write up a doc on the different parts of the interface and maybe write a tutorial about one of the existing plugins.
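For the non-distributed case, the assignment step can be as simple as dealing work units out round-robin, since no node is any closer to the data than another. The helper below is a hypothetical standalone sketch (`RoundRobinAssignment` and `assign` are my own names, not Drill classes); in an actual group scan this logic would sit wherever the plugin maps its work units onto the fragments Drill gives it, so check the Kudu plugin's group scan for the real method names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: spreads work units evenly over
// fragments with no locality (affinity) information at all.
final class RoundRobinAssignment {

    /** Returns one list of work units per fragment, filled in round-robin order. */
    static <T> List<List<T>> assign(List<T> workUnits, int fragmentCount) {
        if (fragmentCount <= 0) {
            throw new IllegalArgumentException("fragmentCount must be positive");
        }
        List<List<T>> perFragment = new ArrayList<>();
        for (int i = 0; i < fragmentCount; i++) {
            perFragment.add(new ArrayList<>());
        }
        for (int i = 0; i < workUnits.size(); i++) {
            // Every read is remote anyway, so any fragment is as good as any other.
            perFragment.get(i % fragmentCount).add(workUnits.get(i));
        }
        return perFragment;
    }
}
```

A distributed store would replace the `i % fragmentCount` choice with something that prefers fragments running on the node that holds the data, which is where the affinity bookkeeping comes in.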
https://github.com/apache/drill/tree/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu