Problem description
I'm trying to download a subset of files from a public s3 bucket that contains millions of IRS files. I can download the entire repository with the command:
aws s3 sync s3://irs-form-990/ ./
But it takes way too long!
I know I should be using the --include / --exclude flags, but I don't know how to use them with a list of values. I have a CSV that contains unique identifiers for all the files from 2017 that I'd like, but how do I use it with the AWS CLI? The list itself is half a million IDs long.
Help is greatly appreciated. Thanks.
Recommended answer
The bash script below reads filenames from a file, filename.txt, and downloads each one. All you have to do is convert those IDs into filenames.
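#!/bin/bash
# Stop at the first failed download.
set -e
# Read one object key per line from filename.txt and copy each object.
while IFS= read -r line
do
    aws s3 cp "s3://bucket-name/$line" dest-path/
done < filename.txt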
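The step of converting IDs into filenames could look roughly like the sketch below. It rests on assumptions that are not in the original post: the IDs sit in the first column of the CSV, the CSV has a header row, and each object key is the ID followed by a fixed suffix. The function name ids_to_filenames and the default suffix _public.xml are hypothetical, so check a few real keys in the bucket before relying on them.

```shell
#!/bin/bash
# Sketch: turn a CSV of IDs into the filename list that the download loop reads.
# Assumptions (not from the original post): IDs are in the first CSV column,
# the file has one header row, and each object key is "<ID><suffix>".

ids_to_filenames() {
    local csv_file="$1"
    local suffix="${2:-_public.xml}"   # hypothetical key suffix; adjust to the bucket
    # Skip the header row, keep column 1, strip quotes and carriage returns,
    # drop blank lines, and append the suffix to each remaining ID.
    tail -n +2 "$csv_file" \
        | cut -d, -f1 \
        | tr -d '"\r' \
        | awk -v s="$suffix" 'NF { print $0 s }'
}

# Example (hypothetical file name):
#   ids_to_filenames ids_2017.csv > filename.txt
```

Writing the result to filename.txt lets the download loop run unchanged. If the CSV has no header row, replace tail -n +2 with cat.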
This question has been asked before, and you can find the answer here