Reading files or folders in Python (with traversal)

Updated: 2024-10-16 02:26:10


Contents

  • Example
  • os.listdir
  • os.walk
  • Traversal code

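Machine learning and deep learning work often requires walking a directory to read files, especially files with particular extensions; for images, that might be jpg, png, or bmp. Python's standard library functions are not tailored to this, so it helps to wrap them in a few helper functions.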

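Example

Suppose we have the following directory structure, where names ending in .bmp are files and everything else is a folder. All the programs below use this structure as their example.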

data
  - a
      - a1.bmp
  - b
      - b1.bmp
  - 1.bmp
  - 2.bmp

os.listdir

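os.listdir reads only the files and folders directly under the given path and returns their names as a list. The code and result for the demo directory structure are as follows: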

import os

path = r'D:\data'
items = os.listdir(path)  # ==> ['1.bmp', '2.bmp', 'a', 'b'] (order is not guaranteed)
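
Because os.listdir returns bare names and mixes files with folders, a common follow-up is to join each name with path and test what it is. The sketch below rebuilds the example tree in a temporary directory and separates the top-level files from the folders. The helper name build_demo_tree is hypothetical and exists only for this illustration; sorted() is applied because os.listdir makes no promise about order.

```python
import os
import tempfile


def build_demo_tree(root):
    """Create the example layout under root (hypothetical helper for this demo)."""
    data = os.path.join(root, 'data')
    for sub in ('a', 'b'):
        os.makedirs(os.path.join(data, sub))
    for rel in ('1.bmp', '2.bmp',
                os.path.join('a', 'a1.bmp'), os.path.join('b', 'b1.bmp')):
        # Create an empty file at each location
        open(os.path.join(data, rel), 'w').close()
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = build_demo_tree(tmp)
    # Names only, sorted for a stable order
    names = sorted(os.listdir(path))
    # os.path.isfile / isdir need the joined path, not the bare name
    top_files = [n for n in names if os.path.isfile(os.path.join(path, n))]
    top_dirs = [n for n in names if os.path.isdir(os.path.join(path, n))]

print(names)      # ['1.bmp', '2.bmp', 'a', 'b']
print(top_files)  # ['1.bmp', '2.bmp']
print(top_dirs)   # ['a', 'b']
```

Note that a1.bmp and b1.bmp never appear: os.listdir does not descend into a or b.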

os.walk

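os.walk already traverses recursively and covers every sub-folder and file. Unlike os.listdir, however, it does not return a list: it returns a generator that yields one tuple per directory, which is awkward to use directly, so the result usually needs some further processing. We can print it with a for loop to see what it contains: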

import os

path = r'D:\data'
# part 1
for items in os.walk(path):
    print(items)
# part 2
for main_dir, sub_dir_list, sub_file_list in os.walk(path):
    print(main_dir, sub_dir_list, sub_file_list)

Output:

# part 1
('D:\\data', ['a', 'b'], ['1.bmp', '2.bmp'])
('D:\\data\\a', [], ['a1.bmp'])
('D:\\data\\b', [], ['b1.bmp'])
# part 2
D:\data ['a', 'b'] ['1.bmp', '2.bmp']
D:\data\a [] ['a1.bmp']
D:\data\b [] ['b1.bmp']

Iterating over the result of os.walk() shows that each item has three parts (part 1). In part 2 we name these parts main_dir, sub_dir_list, and sub_file_list; each is briefly explained below:

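  • main_dir: the directory currently being visited; over the whole traversal it takes the value of path itself and of every folder beneath it
  • sub_dir_list: the names of the folders directly under main_dir
  • sub_file_list: the names of the files directly under main_dir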

Joining main_dir with each name in sub_file_list gives the path of every file under path.

sub_dir_list is not needed here: there is no need to traverse it ourselves, because each of those folders will appear as main_dir in a later iteration.
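
The joining step described above can be sketched as follows. This is a minimal illustration that assumes the example tree is rebuilt in a temporary directory; the paths are shown relative to path with forward slashes only so the printed result looks the same on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Rebuild the example layout: data/{a/a1.bmp, b/b1.bmp, 1.bmp, 2.bmp}
    path = os.path.join(tmp, 'data')
    for sub in ('a', 'b'):
        os.makedirs(os.path.join(path, sub))
    for rel in ('1.bmp', '2.bmp', 'a/a1.bmp', 'b/b1.bmp'):
        open(os.path.join(path, *rel.split('/')), 'w').close()

    all_files = []
    # sub_dir_list is ignored: each sub-folder shows up later as main_dir
    for main_dir, sub_dir_list, sub_file_list in os.walk(path):
        for filename in sub_file_list:
            all_files.append(os.path.join(main_dir, filename))

    # Express each result relative to path, with '/' as the separator
    relative = sorted(
        os.path.relpath(f, path).replace(os.sep, '/') for f in all_files
    )

print(relative)  # ['1.bmp', '2.bmp', 'a/a1.bmp', 'b/b1.bmp']
```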

Traversal code

The code is built around the following requirements:

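  • It must filter by extension and accept several extensions at once
  • It must support both recursive and non-recursive reading
  • The returned paths are prefixed with the path argument, so they are full paths only if path itself is a full path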
# -*- coding: utf-8 -*-
import os


def file_ext(filename, level=1):
    """
    return extension of filename

    Parameters:
    -----------
    filename: str
        name of file, path can be included
    level: int
        level of extension.
        for example, if filename is 'sky.png.bak', the 1st level extension
        is 'bak', and the 2nd level extension is 'png'

    Returns:
    --------
    extension of filename, or '' if the name has no extension at this level
    """
    # only look at the file name, so dots in folder names are ignored
    parts = os.path.basename(filename).split('.')
    return parts[-level] if len(parts) > level else ''


def _contain_file(path, extensions):
    """
    check whether path contains any file whose extension is in extensions list

    Parameters:
    -----------
    path: str
        path to be checked
    extensions: str or list/tuple of str
        extension or extensions list

    Returns:
    --------
    return True if contains, else return False
    """
    assert os.path.exists(path), 'path must exist'
    assert os.path.isdir(path), 'path must be dir'
    if isinstance(extensions, str):
        extensions = [extensions]
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            if (extensions is None) or (file_ext(file) in extensions):
                return True
    return False


def _process_extensions(extensions=None):
    """
    preprocess and check extensions, if extensions is str, convert it to list.

    Parameters:
    -----------
    extensions: str or list/tuple of str
        file extensions

    Returns:
    --------
    extensions: list/tuple of str
        file extensions
    """
    if extensions is not None:
        if isinstance(extensions, str):
            extensions = [extensions]
        assert isinstance(extensions, (list, tuple)), \
            'extensions must be str or list/tuple of str'
        for ext in extensions:
            assert isinstance(ext, str), 'extension must be str'
    return extensions


def get_files(path, extensions=None, is_recursive=True):
    """
    read files in path. if extensions is None, read all files, if extensions
    are specified, only read the files that have one of the extensions. if
    is_recursive is True, recursively read all files, if is_recursive is False,
    only read files in current path.

    Parameters:
    -----------
    path: str
        path to be read
    extensions: str or list/tuple of str
        file extensions
    is_recursive: bool
        whether to read files recursively. read recursively if True, only
        read files in current path if False

    Returns:
    --------
    files: the obtained files in path
    """
    extensions = _process_extensions(extensions)
    files = []
    # get files in current path
    if not is_recursive:
        for name in os.listdir(path):
            fullname = os.path.join(path, name)
            if os.path.isfile(fullname):
                if (extensions is None) or (file_ext(fullname) in extensions):
                    files.append(fullname)
        return files
    # get files recursively
    for main_dir, _, sub_file_list in os.walk(path):
        for filename in sub_file_list:
            fullname = os.path.join(main_dir, filename)
            if (extensions is None) or (file_ext(fullname) in extensions):
                files.append(fullname)
    return files


def get_folders(path, extensions=None, is_recursive=True):
    """
    read folders in path. if extensions is None, read all folders, if
    extensions are specified, only read the folders that contain any file that
    has one of the extensions. if is_recursive is True, recursively read all
    folders, if is_recursive is False, only read folders in current path.

    Parameters:
    -----------
    path: str
        path to be read
    extensions: str or list/tuple of str
        file extensions
    is_recursive: bool
        whether to read folders recursively. read recursively if True, only
        read folders in current path if False

    Returns:
    --------
    folders: the obtained folders in path
    """
    extensions = _process_extensions(extensions)
    folders = []
    # get folders in current path
    if not is_recursive:
        for name in os.listdir(path):
            fullname = os.path.join(path, name)
            if os.path.isdir(fullname):
                if (extensions is None) or \
                        (_contain_file(fullname, extensions)):
                    folders.append(fullname)
        return folders
    # get folders recursively (the result includes path itself)
    for main_dir, _, _ in os.walk(path):
        if (extensions is None) or (_contain_file(main_dir, extensions)):
            folders.append(main_dir)
    return folders


if __name__ == '__main__':
    path = r'D:\data'
    files = get_files(path)
    print(files)  # ==> ['D:\\data\\1.bmp', 'D:\\data\\2.bmp', 'D:\\data\\a\\a1.bmp', 'D:\\data\\b\\b1.bmp']
    folders = get_folders(path)
    print(folders)  # ==> ['D:\\data', 'D:\\data\\a', 'D:\\data\\b']
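
For comparison, the same extension filter can be sketched with the standard library's pathlib module instead of os.walk. This is an alternative, not the code above: the function name list_files_pathlib is hypothetical, the file notes.txt is added to the demo tree only to show the filter rejecting something, and unlike file_ext this version compares Path.suffix (which includes the leading dot) without regard to case.

```python
import tempfile
from pathlib import Path


def list_files_pathlib(path, extensions=None, is_recursive=True):
    """Hypothetical pathlib-based counterpart to get_files (illustration only)."""
    if isinstance(extensions, str):
        extensions = [extensions]
    wanted = None
    if extensions is not None:
        # Normalise 'bmp' or '.BMP' to '.bmp' so it can be compared with Path.suffix
        wanted = {'.' + ext.lower().lstrip('.') for ext in extensions}
    root = Path(path)
    # rglob('*') descends into sub-folders; iterdir() stays at the top level
    candidates = root.rglob('*') if is_recursive else root.iterdir()
    return sorted(
        str(p) for p in candidates
        if p.is_file() and (wanted is None or p.suffix.lower() in wanted)
    )


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / 'data'
    (data / 'a').mkdir(parents=True)
    (data / 'a' / 'a1.bmp').touch()
    (data / '1.bmp').touch()
    (data / 'notes.txt').touch()  # extra file, not part of the original example

    recursive_bmp = [Path(f).name for f in list_files_pathlib(data, 'bmp')]
    top_level_all = [Path(f).name
                     for f in list_files_pathlib(data, is_recursive=False)]

print(recursive_bmp)  # ['1.bmp', 'a1.bmp']
print(top_level_all)  # ['1.bmp', 'notes.txt']
```

As with get_files, the returned strings are prefixed with whatever path was passed in.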

Published: 2024-03-10 11:21:33
Article link: https://www.elefans.com/category/jswz/34/1727834.html