Backup of Data Lake Store

Updated: 2024-10-12 20:21:53
Question

I am working on a backup strategy for Data Lake Store (DLS). My plan is to create two DLS accounts and copy data between them. I have evaluated several approaches, but none of them satisfies the requirement to preserve the POSIX ACLs (permissions, in DLS parlance).

PowerShell cmdlets require the data to be downloaded from the primary DLS onto a VM and re-uploaded to the secondary DLS.

The AdlCopy tool works only on Windows 10, does not preserve permissions, and does not support copying data across regions (not that this is a hard requirement).

Data Factory seemed like the most sensible approach until I realized that it does not preserve permissions either.

That leaves my last option, DistCp. According to the DistCp guide (hadoop.apache/docs/current/hadoop-distcp/DistCp.html), the tool supports preserving permissions. However, the downside of DistCp is that it must be run from HDInsight. Although it supports both intra- and inter-cluster copying, I would rather not keep an HDInsight cluster running just for backup operations.

Am I missing something? Does anyone have any better suggestions?
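For the DistCp option, the sketch below shows how the copy could be invoked with DistCp's `-p` preserve flags (per the DistCp guide: `u` = user, `g` = group, `p` = permission, `a` = ACL). It is a minimal sketch, assuming it runs on a node that has a Hadoop client with the ADLS connector configured (for example an HDInsight head node). The account names `primaryaccount` and `backupaccount`, the `/data` path, and the helpers `build_distcp_command` and `run_backup` are hypothetical. Whether ACLs actually carry over also depends on both stores supporting ACLs and on the job having enough privileges to set owners and groups on the target.

```python
import shlex
import subprocess
from typing import List

# Hypothetical ADLS Gen1 source and backup locations; substitute your own.
PRIMARY = "adl://primaryaccount.azuredatalakestore.net/data"
BACKUP = "adl://backupaccount.azuredatalakestore.net/data"

# Attribute letters DistCp accepts after -p:
# r=replication, b=block size, u=user, g=group, p=permission,
# c=checksum type, a=ACL, x=XAttr, t=timestamp.
ALLOWED_PRESERVE_FLAGS = set("rbugpcaxt")


def build_distcp_command(src: str, dst: str, preserve: str = "ugpa",
                         update: bool = True) -> List[str]:
    """Build (but do not run) a `hadoop distcp` argument list.

    `preserve` is appended to -p; the default keeps owner, group,
    permission bits and ACLs. `update` adds -update so repeated runs
    only copy missing or changed files.
    """
    if not set(preserve) <= ALLOWED_PRESERVE_FLAGS:
        raise ValueError(f"unsupported -p flags: {preserve!r}")
    cmd = ["hadoop", "distcp"]
    if preserve:
        cmd.append("-p" + preserve)
    if update:
        cmd.append("-update")
    cmd += [src, dst]
    return cmd


def run_backup(src: str, dst: str) -> int:
    """Launch DistCp and return its exit code (requires a Hadoop client)."""
    cmd = build_distcp_command(src, dst)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(shlex.join(build_distcp_command(PRIMARY, BACKUP)))
```

Running the script directly only prints the command; calling `run_backup(PRIMARY, BACKUP)` would launch the copy. The `-update` flag makes DistCp copy files that are missing at the destination or differ in size, block size, or checksum, which keeps repeated backups incremental rather than full copies.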

Solution

Your assessment is comprehensive. Those are indeed the options available if you want to copy over permissions, so you will have to choose one of them, sorry. If you truly want a serverless option that copies over permissions, Azure Data Factory would have to be it. Could you please create a feedback item here: feedback.azure/forums/270578-data-factory

Thanks,
Sachin Sheth
Program Manager, Azure Data Lake

Published: 2023-11-28 14:40:58