Importing vendored dependencies in a Python package without modifying sys.path or third-party packages

Updated: 2024-10-25 10:27:58

Question

Summary

I am working on a series of add-ons for Anki, an open-source flashcard program. Anki add-ons are shipped as Python packages, with the basic folder structure looking as follows:

anki_addons/
    addon_name_1/
        __init__.py
    addon_name_2/
        __init__.py

anki_addons is appended to sys.path by the base app, which then imports each add-on with import <addon_name>.

The problem I have been trying to solve is to find a reliable way to ship packages and their dependencies with my add-ons while not polluting global state or falling back to manual edits of the vendored packages.

Specifics

Specifically, given an add-on structure like this...

addon_name_1/
    __init__.py
    _vendor/
        __init__.py
        library1
        library2
        dependency_of_library2
        ...

...I would like to be able to import any arbitrary package that is included in the _vendor directory, e.g.:

from ._vendor import library1

The main difficulty with relative imports like this is that they do not work for packages that in turn depend on other packages imported through absolute references (e.g. import dependency_of_library2 in the source code of library2).
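To make the difficulty concrete, here is a small reproduction that lays out a throwaway add-on in a temporary directory and imports from its _vendor package. The module contents are hypothetical stand-ins; only the names mirror the structure above, and inserting the temporary directory into sys.path plays the role of the host app adding anki_addons.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_addon(root: Path) -> None:
    """Lay out addon_name_1/_vendor/ where library2 depends on dependency_of_library2."""
    vendor = root / "addon_name_1" / "_vendor"
    vendor.mkdir(parents=True)
    (root / "addon_name_1" / "__init__.py").write_text("")
    (vendor / "__init__.py").write_text("")
    (vendor / "dependency_of_library2.py").write_text("VALUE = 42\n")
    # library2 was written to be installed normally, so it uses an absolute import
    (vendor / "library2.py").write_text(
        "import dependency_of_library2\n"
        "VALUE = dependency_of_library2.VALUE\n"
    )


def try_vendored_import() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        build_addon(Path(tmp))
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)  # stands in for the host app adding anki_addons
        try:
            importlib.import_module("addon_name_1._vendor.library2")
            return "imported"
        except ModuleNotFoundError as exc:
            # the absolute import is resolved against sys.path, not against _vendor
            return f"failed: {exc.name}"
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "addon_name_1"]:
                del sys.modules[name]


print(try_vendored_import())  # -> failed: dependency_of_library2
```

Reaching library2 through its package works, but the plain import dependency_of_library2 inside it only searches sys.path, not the _vendor package that contains library2. That is why every workaround in the question either rewrites that import or changes sys.path.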

Solution attempts

So far I have explored the following options:

1. Manually updating the third-party packages so that their import statements point to the fully qualified module path within my Python package (e.g. import addon_name_1._vendor.dependency_of_library2). But this is tedious work that does not scale to larger dependency trees and is not portable to other packages.
2. Adding _vendor to sys.path via sys.path.insert(1, <path_to_vendor_dir>) in my package's __init__ file. This works, but it introduces a global change to the module look-up path that will affect other add-ons and even the base app itself. It just seems like a hack that could open a Pandora's box of issues later down the line (e.g. conflicts between different versions of the same package).
3. Temporarily modifying sys.path for my imports; but this fails for third-party modules with method-level imports.
4. Writing a PEP 302-style custom importer based on an example I found in setuptools, but I just couldn't make head or tail of it.
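The failure mode of option 3 can be reproduced in a few lines. In this sketch, vendor_on_path is a hypothetical context manager that prepends the vendor directory only while importing, and lazy_lib and helper_dep are made-up stand-ins for a vendored library and its dependency. The module-level import succeeds, but the function-level import inside lazy_lib.compute() runs later, after sys.path has already been restored.

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path


@contextlib.contextmanager
def vendor_on_path(vendor_dir: str):
    """Option 3: put the vendor directory on sys.path only while importing."""
    sys.path.insert(0, vendor_dir)
    try:
        yield
    finally:
        sys.path.remove(vendor_dir)


def demo_temporary_sys_path() -> str:
    with tempfile.TemporaryDirectory() as vendor_dir:
        vendor = Path(vendor_dir)
        (vendor / "helper_dep.py").write_text("def answer():\n    return 42\n")
        # lazy_lib only imports its dependency when compute() is first called
        (vendor / "lazy_lib.py").write_text(
            "def compute():\n"
            "    import helper_dep  # function-level import\n"
            "    return helper_dep.answer()\n"
        )
        importlib.invalidate_caches()
        with vendor_on_path(vendor_dir):
            lazy_lib = importlib.import_module("lazy_lib")  # works: path is patched
        try:
            return f"ok: {lazy_lib.compute()}"
        except ModuleNotFoundError as exc:
            # sys.path has been restored by now, so helper_dep cannot be found
            return f"failed: {exc.name}"
        finally:
            sys.modules.pop("lazy_lib", None)


print(demo_temporary_sys_path())  # -> failed: helper_dep
```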


I've been stuck on this for quite a few hours now and I'm beginning to think that I'm either completely missing an easy way to do this, or that there is something fundamentally wrong with my entire approach.

Is there no way I can ship a dependency tree of third-party packages with my code, without having to resort to sys.path hacks or modifying the packages in question?


Edit:

Just to clarify: I don't have any control over how add-ons are imported from the anki_addons folder. anki_addons is just the directory provided by the base app into which all add-ons are installed. It is added to sys.path, so the add-on packages therein behave pretty much like any other Python package located on Python's module look-up path.

Solution

First of all, I'd advise against vendoring; a few major packages used vendoring in the past but have switched away to avoid the pain of having to handle it. One such example is the requests library. If you are relying on people using pip install to install your package, then just declare dependencies and tell people about virtual environments. Don't assume you need to shoulder the burden of keeping dependencies untangled or need to stop people from installing dependencies in the global Python site-packages location.

At the same time, I appreciate that the plug-in environment of a third-party tool is something different, and if adding dependencies to the Python installation used by that tool is cumbersome or impossible, vendorizing may be a viable option. I see that Anki distributes extensions as .zip files without setuptools support, so that's certainly such an environment.

So if you choose to vendor dependencies, then use a script to manage your dependencies and update their imports. This is your option #1, but automated.

This is the path the pip project has chosen; see their tasks subdirectory for their automation, which builds on the invoke library. See the pip project's vendoring README for their policy and rationale (chief among those is that pip needs to bootstrap itself, i.e. have its dependencies available to be able to install anything).

You should not use any of the other options; you already enumerated the issues with #2 and #3.

The issue with option #4, using a custom importer, is that you still need to rewrite imports. Put differently, the custom importer hook used by setuptools doesn't solve the vendorized namespace problem at all; instead, it makes it possible to dynamically import top-level packages if the vendorized packages are missing (a problem that pip solves with a manual debundling process). setuptools actually uses option #1: they rewrite the source code of the vendorized packages. See for example these lines in the packaging project in the setuptools vendored subpackage; the setuptools.extern namespace is handled by the custom import hook, which then redirects either to setuptools._vendor or to the top-level name if importing from the vendorized package fails.
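For reference, here is a minimal sketch of that kind of redirecting hook, written against the standard importlib finder/loader protocol. It is not setuptools' actual code: the class name RedirectImporter and the packages mypkg, mypkg.extern and mypkg._vendor are hypothetical, and json merely stands in for a dependency installed at top level. It also illustrates the point above: vendored code only benefits from the hook if its own imports go through the mypkg.extern alias, so those imports still have to be rewritten.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class RedirectImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve <alias>.<name> from <vendor_pkg>.<name>, else from top-level <name>."""

    def __init__(self, alias, vendor_pkg, names):
        self.alias = alias            # hypothetical alias namespace, e.g. "mypkg.extern"
        self.vendor_pkg = vendor_pkg  # where vendored copies live, e.g. "mypkg._vendor"
        self.names = set(names)       # top-level names this hook is responsible for
        self._real_specs = {}

    def _target(self, fullname):
        """Map 'mypkg.extern.x.y' to 'x.y' if x is one of our names, else None."""
        prefix = self.alias + "."
        if not fullname.startswith(prefix):
            return None
        target = fullname[len(prefix):]
        return target if target.split(".")[0] in self.names else None

    def find_spec(self, fullname, path=None, target=None):
        if self._target(fullname) is None:
            return None  # not an alias import; let the regular finders handle it
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        target = self._target(spec.name)
        for candidate in (f"{self.vendor_pkg}.{target}", target):
            try:
                module = importlib.import_module(candidate)
            except ImportError:
                continue  # try the next location
            # module_from_spec() will stamp the alias spec onto this module; keep the real one
            self._real_specs[spec.name] = module.__spec__
            return module
        raise ImportError(f"{target!r} is neither vendored nor installed", name=spec.name)

    def exec_module(self, module):
        # Nothing to execute: create_module() returned an already-imported module.
        # Put back the module's own __spec__ (its __spec__.name is the alias here).
        module.__spec__ = self._real_specs.pop(module.__spec__.name)

    def install(self):
        if self not in sys.meta_path:
            sys.meta_path.insert(0, self)


def demo_redirect_hook():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp, "mypkg")
        for sub in ("extern", "_vendor"):
            (root / sub).mkdir(parents=True)
            (root / sub / "__init__.py").write_text("")
        (root / "__init__.py").write_text("")
        (root / "_vendor" / "vendored_lib.py").write_text("WHERE = 'vendored'\n")
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        # json is not vendored here, so the hook falls back to the top-level module
        hook = RedirectImporter("mypkg.extern", "mypkg._vendor", ["vendored_lib", "json"])
        hook.install()
        try:
            from mypkg.extern import vendored_lib, json as json_alias
            return (vendored_lib.__name__, vendored_lib.__spec__.name,
                    vendored_lib.WHERE, json_alias is sys.modules["json"])
        finally:
            sys.meta_path.remove(hook)
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "mypkg"]:
                del sys.modules[name]


print(demo_redirect_hook())
```

Restoring __spec__ in exec_module() matters because importlib's module_from_spec() assigns the alias spec to whatever object create_module() returns, which would otherwise leave the real module carrying the alias name.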

The pip automation to update vendored packages takes the following steps:

1. Delete everything in the _vendor/ subdirectory except the documentation, the __init__.py file and the requirements text file.
2. Use pip to install all vendored dependencies into that directory, using a dedicated requirements file named vendor.txt, avoiding compilation of .pyc bytecode cache files and ignoring transitive dependencies (these are assumed to be listed in vendor.txt already); the command used is pip install -t pip/_vendor -r pip/_vendor/vendor.txt --no-compile --no-deps.
3. Delete everything that was installed by pip but is not needed in a vendored environment, i.e. *.dist-info, *.egg-info, the bin directory, and a few things from installed dependencies that pip would never use.
4. Collect all installed directories and added files sans .py extension (so anything not in the whitelist); this is the vendored_libs list.
5. Rewrite imports; this is simply a series of regexes, where every name in vendored_libs is used to replace import <name> occurrences with from pip._vendor import <name> and every from <name>(.*) import occurrence with from pip._vendor.<name>(.*) import.
6. Apply a few patches to mop up the remaining changes needed; from a vendoring perspective, only the pip patch for requests is interesting here, in that it updates the requests library's backwards-compatibility layer for the vendored packages that the requests library had removed; this patch is quite meta!

So in essence, the most important part of the pip approach, the rewriting of vendored package imports, is quite simple; paraphrased to simplify the logic and with the pip-specific parts removed, it is simply the following process:

import shutil
import subprocess
import re

from functools import partial
from itertools import chain
from pathlib import Path

WHITELIST = {'README.txt', '__init__.py', 'vendor.txt'}

def delete_all(*paths, whitelist=frozenset()):
    for item in paths:
        if item.is_dir():
            shutil.rmtree(item, ignore_errors=True)
        elif item.is_file() and item.name not in whitelist:
            item.unlink()

def iter_subtree(path):
    """Recursively yield all files in a subtree, depth-first"""
    if not path.is_dir():
        if path.is_file():
            yield path
        return
    for item in path.iterdir():
        if item.is_dir():
            yield from iter_subtree(item)
        elif item.is_file():
            yield item

def patch_vendor_imports(file, replacements):
    text = file.read_text('utf8')
    for replacement in replacements:
        text = replacement(text)
    file.write_text(text, 'utf8')

def find_vendored_libs(vendor_dir, whitelist):
    vendored_libs = []
    paths = []
    for item in vendor_dir.iterdir():
        if item.is_dir():
            vendored_libs.append(item.name)
        elif item.is_file() and item.name not in whitelist:
            vendored_libs.append(item.stem)  # without extension
        else:  # a whitelisted file (or neither a file nor a directory): skip it
            continue
        paths.append(item)
    return vendored_libs, paths

def vendor(vendor_dir):
    # target package is <parent>.<vendor_dir>; foo/_vendor -> foo._vendor
    pkgname = f'{vendor_dir.parent.name}.{vendor_dir.name}'

    # remove everything except the whitelisted files
    delete_all(*vendor_dir.iterdir(), whitelist=WHITELIST)

    # install with pip; check=True aborts the script if the install fails
    subprocess.run([
        'pip', 'install', '-t', str(vendor_dir),
        '-r', str(vendor_dir / 'vendor.txt'),
        '--no-compile', '--no-deps'
    ], check=True)

    # delete stuff that's not needed
    delete_all(
        *vendor_dir.glob('*.dist-info'),
        *vendor_dir.glob('*.egg-info'),
        vendor_dir / 'bin')

    vendored_libs, paths = find_vendored_libs(vendor_dir, WHITELIST)

    replacements = []
    for lib in vendored_libs:
        name = re.escape(lib)  # match the library name literally
        replacements += (
            partial(  # import bar -> from foo._vendor import bar
                re.compile(r'(^\s*)import {}\n'.format(name), flags=re.M).sub,
                r'\1from {} import {}\n'.format(pkgname, lib)
            ),
            partial(  # from bar -> from foo._vendor.bar
                re.compile(r'(^\s*)from {}(\.|\s+)'.format(name), flags=re.M).sub,
                r'\1from {}.{}\2'.format(pkgname, lib)
            ),
        )

    # only rewrite Python sources; data files and compiled extensions are left untouched
    for file in chain.from_iterable(map(iter_subtree, paths)):
        if file.suffix == '.py':
            patch_vendor_imports(file, replacements)

if __name__ == '__main__':
    # this assumes the script sits in the directory that contains the foo package (foo/_vendor)
    here = Path(__file__).resolve().parent
    vendor_dir = here / 'foo' / '_vendor'
    assert (vendor_dir / 'vendor.txt').exists(), '_vendor/vendor.txt file not found'
    assert (vendor_dir / '__init__.py').exists(), '_vendor/__init__.py file not found'
    vendor(vendor_dir)
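The two substitutions can be checked in isolation, without running pip. This sketch rebuilds the same pair of patterns the script creates for each vendored library and applies them to a few lines of source text; foo._vendor and bar are hypothetical names. It also exposes a limitation of these simple patterns: import bar as b (and likewise import bar.sub) is not rewritten, because the first pattern expects the line to end right after the module name.

```python
import re
from functools import partial


def build_replacements(pkgname, vendored_libs):
    """Build the two substitutions per vendored library used by the script above."""
    replacements = []
    for lib in vendored_libs:
        name = re.escape(lib)
        replacements += (
            # import bar -> from foo._vendor import bar
            partial(re.compile(r'(^\s*)import {}\n'.format(name), flags=re.M).sub,
                    r'\1from {} import {}\n'.format(pkgname, lib)),
            # from bar[.sub] import x -> from foo._vendor.bar[.sub] import x
            partial(re.compile(r'(^\s*)from {}(\.|\s+)'.format(name), flags=re.M).sub,
                    r'\1from {}.{}\2'.format(pkgname, lib)),
        )
    return replacements


def rewrite(text, replacements):
    for replacement in replacements:
        text = replacement(text)
    return text


SOURCE = (
    "import bar\n"
    "from bar import baz\n"
    "from bar.sub import qux\n"
    "import bar as b\n"
    "import barn\n"
)

print(rewrite(SOURCE, build_replacements("foo._vendor", ["bar"])))
```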



Published: 2023-03-31 17:23:58