Where does Hive store files in HDFS?

Updated: 2024-10-15 12:35:25

Problem description

I'd like to know how to find the mapping between Hive tables and the actual HDFS files (or rather, directories) that they represent. I need to access the table files directly.

Where does Hive store its files in HDFS?

Solution

Where they are stored on HDFS is fairly easy to figure out once you know where to look. :)

If you go to http://NAMENODE_MACHINE_NAME:50070/ in your browser, it should take you to a page with a "Browse the filesystem" link.
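If clicking through the browser is inconvenient, the same NameNode HTTP port usually also serves the WebHDFS REST API, provided WebHDFS is enabled on the cluster. Below is a minimal sketch, not a definitive client: it only builds a LISTSTATUS URL and parses a response body. The host name `namenode.example.com`, the helper names, and the trimmed sample response are assumptions made for illustration.

```python
import json
from urllib.parse import quote


def liststatus_url(namenode_host, hdfs_path, port=50070):
    """Build a WebHDFS URL that lists the entries of an HDFS directory."""
    # quote() keeps "/" as-is and percent-encodes other special characters.
    return f"http://{namenode_host}:{port}/webhdfs/v1{quote(hdfs_path)}?op=LISTSTATUS"


def entry_names(liststatus_json):
    """Return (name, type) pairs from a LISTSTATUS response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


# Hypothetical, heavily trimmed response; real responses carry more fields
# (owner, permission, length, modificationTime, ...).
SAMPLE_RESPONSE = """
{"FileStatuses": {"FileStatus": [
  {"pathSuffix": "tmp", "type": "DIRECTORY"},
  {"pathSuffix": "usr", "type": "DIRECTORY"}
]}}
"""

print(liststatus_url("namenode.example.com", "/"))
print(entry_names(SAMPLE_RESPONSE))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose, since it depends on how authentication is set up on your cluster.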

In the $HIVE_HOME/conf directory there is a hive-default.xml and/or hive-site.xml file, which contains the hive.metastore.warehouse.dir property. That value is the path you will want to navigate to after clicking the "Browse the filesystem" link.
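If you would rather not read the XML by eye, the property can be pulled out with a few lines of Python. This is only a sketch: the sample document below is a made-up, stripped-down stand-in for a real hive-site.xml, and `warehouse_dir_from_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real hive-site.xml has many more properties.
SAMPLE_HIVE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/hive/warehouse</value>
  </property>
</configuration>
"""


def warehouse_dir_from_xml(xml_text, default=None):
    """Return the value of hive.metastore.warehouse.dir, or `default` if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hive.metastore.warehouse.dir":
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


print(warehouse_dir_from_xml(SAMPLE_HIVE_SITE))
```

To use it on a real installation, read the file contents first and pass them in, e.g. the text of $HIVE_HOME/conf/hive-site.xml. If the property is not set there, the function returns `default`.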

In mine, it's /usr/hive/warehouse. Once I navigate to that location, I see the names of my tables. Clicking on a table name (which is just a folder) then exposes the partitions of the table. In my case, I currently only have it partitioned on date. When I click on the folder at this level, I see the files (more partitioning means more levels). These files are where the data is actually stored on HDFS.
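The folder layout described above can be sketched as a small path builder. It assumes the usual layout for managed tables: partition folders named `column=value`, and tables outside the default database placed under a `<database>.db` folder. The table name `page_views`, the database name `sales`, and the date value are made up for illustration. Tables created with an explicit LOCATION (external tables, for instance) can live elsewhere, in which case `DESCRIBE FORMATTED <table>` in Hive shows the actual location.

```python
import posixpath


def table_dir(warehouse_dir, table, partitions=(), database="default"):
    """Guess the HDFS directory of a managed Hive table or one of its partitions.

    `partitions` is an ordered sequence of (column, value) pairs, outermost first.
    Special characters in partition values are not escaped in this sketch.
    """
    parts = [warehouse_dir]
    if database != "default":
        # Non-default databases usually get their own "<name>.db" folder.
        parts.append(database + ".db")
    parts.append(table)
    parts.extend(f"{col}={val}" for col, val in partitions)
    return posixpath.join(*parts)


# Hypothetical table partitioned on a single "date" column.
print(table_dir("/usr/hive/warehouse", "page_views", [("date", "2013-01-15")]))
```

Each additional partition column simply adds one more folder level, which matches what the file browser shows.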

I have not attempted to access these files directly, but I'm assuming it can be done. I would take GREAT care if you are thinking about editing them. :) For me, I'd figure out a way to do what I need without direct access to the Hive data on disk. If you need access to the raw data, you can use a Hive query and output the result to a file. The output will have exactly the same structure (column delimiter, etc.) as the files on HDFS. I do queries like this all the time and convert them to CSVs.
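As a rough illustration of that last step, here is one way the conversion to CSV might look. It assumes the query output is plain text using Hive's default field delimiter (Ctrl-A, `\x01`) and the default `\N` marker for NULL; if your table or query uses other delimiters, or a non-text format such as ORC or Parquet, this does not apply as written. The function name and the sample rows are invented for the example.

```python
import csv
import io

HIVE_FIELD_DELIM = "\x01"  # Ctrl-A, Hive's default field delimiter for text output
HIVE_NULL = "\\N"          # default textual representation of NULL


def hive_text_to_csv(src, dst, delimiter=HIVE_FIELD_DELIM):
    """Copy delimited text lines from `src` to `dst` as CSV; return the row count."""
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for line in src:
        fields = line.rstrip("\n").split(delimiter)
        # Write NULL markers as empty CSV fields.
        writer.writerow(["" if f == HIVE_NULL else f for f in fields])
        count += 1
    return count


# Hypothetical two-row sample standing in for a file produced by a Hive query.
sample = io.StringIO("1\x01hello, world\x01\\N\n2\x01bye\x01x\n")
out = io.StringIO()
hive_text_to_csv(sample, out)
print(out.getvalue(), end="")
```

Using the `csv` module rather than a plain string replace means fields that contain commas or quotes are quoted correctly in the output.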

The section about how to write data from queries to disk is cwiki.apache/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries

Published: 2023-11-24 01:10:16