TesseractNotFound issue when containerizing in Docker

Updated: 2024-10-23 23:29:33
Problem description

Problem:

I had tesseract installed on my local machine at /usr/local/Cellar/tesseract/4.1.1/bin/tesseract. Everything worked perfectly until I containerized the app in Docker, which then fails with this error message: pytesseract.pytesseract.TesseractNotFoundError: is not installed or it's not your PATH

What I've tried:

Based on the error message, this is what I've tried:

1) Added /usr/local to the paths under File Sharing in the Docker Desktop app and mounted the local file path into the container. Still getting the error message (doesn't work).

2) Moved tesseract.exe from where it resides to the current local working directory. Still getting the error message (of course it doesn't work; what was I even thinking back then?).

3) Modified the Dockerfile to install tesseract along with its dependencies. Here is the Dockerfile:

FROM python:3.7-alpine
RUN apk update && apk add --no-cache tesseract-ocr
WORKDIR /app
COPY ./requirements.txt ./
RUN pip3 install --upgrade pip
# install dependencies
RUN pip3 install -r requirements.txt
RUN pip3 install --upgrade PyMuPDF
# bundle app source
COPY . /app
COPY ./ChaseOCR.py /app
COPY ./BancAmericaOCR.py /app
COPY ./WellsFargoOCR.py /app
EXPOSE 8080
CMD ["python3", "MainBankClass.py"]

The requirements.txt file also includes the pytesseract and tesseract dependencies. Still getting the error message (doesn't work). I've been stuck on this issue for the past two days and am running out of options. The answers at this link and this link don't work in my case either. Any help is much appreciated. Thanks in advance.

Thanks to Neo's solution. I'm testing it now, but it's running very slowly, so I thought it would be better to share the requirements.txt file here in case other issues are unrelated to tesseract.

requirements.txt:

numpy
pandas
opencv-python
Pillow
Image
pytesseract
tesseract
PyMuPDF
python-levenshtein
tabula-py

Local file directory:

testdockerfile
├─ .vscode
│  └─ settings.json
├─ BankofAmericaOCR.py
├─ ChaseOCR.py
├─ Dockerfile
├─ MainBankClass.py
├─ __init__.py
├─ WellsFargoOCR.py
└─ requirements.txt

Just for future reference, in case anyone hits the same issue I did (still getting TesseractNotFound after installing tesseract in Docker): comment out the line pytesseract.pytesseract.tesseract_cmd = r'/path/to/your/tesseract' if you set that path to run the app locally. The hardcoded host path does not exist inside the container, so pytesseract cannot find the binary there even though the image has tesseract installed. After that, rebuild the image and run it in Docker again. It should work.
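Instead of commenting the line in and out between local runs and Docker runs, one option is to resolve the path at runtime. The following is a minimal sketch under stated assumptions, not the asker's code: the function name resolve_tesseract_cmd and the TESSERACT_CMD environment variable are hypothetical (pytesseract does not read that variable on its own), and the only hardcoded candidate is the Homebrew path from the question. Inside the container the package manager puts the binary on PATH, so the PATH lookup takes over there.

```python
import os
import shutil
from typing import Optional, Sequence

# Hypothetical host-only locations to try first; this one is the Homebrew
# path from the question and will simply not exist inside the container.
LOCAL_CANDIDATES = ("/usr/local/Cellar/tesseract/4.1.1/bin/tesseract",)


def resolve_tesseract_cmd(candidates: Sequence[str] = LOCAL_CANDIDATES) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found.

    Order: a TESSERACT_CMD environment variable (hypothetical name), then each
    hardcoded candidate that exists and is executable, then a PATH search.
    """
    override = os.environ.get("TESSERACT_CMD")
    if override:
        return override
    for path in candidates:
        # Skip host paths that are absent in the current environment.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # In the Docker image, apt/apk installs tesseract on PATH (e.g. /usr/bin).
    return shutil.which("tesseract")
```

In the OCR modules one would then write something like cmd = resolve_tesseract_cmd() and, only if cmd is not None, assign it to pytesseract.pytesseract.tesseract_cmd. Leaving the attribute untouched otherwise keeps pytesseract's default of calling tesseract from PATH, so the same code runs both locally and in the container.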

Recommended answer

Edit 3: Some of the Python packages in requirements.txt have additional system prerequisites. With this Dockerfile, the entire build process completes successfully.

The trickiest part was building OpenCV. Credits to github/janza/docker-python3-opencv/blob/master/Dockerfile

.
├── Dockerfile
└── requirements.txt

Dockerfile:

FROM python:3.7

RUN apt-get update \
    && apt-get install -y \
        build-essential \
        cmake \
        git \
        wget \
        unzip \
        yasm \
        pkg-config \
        libswscale-dev \
        libtbb2 \
        libtbb-dev \
        libjpeg-dev \
        libpng-dev \
        libtiff-dev \
        libavformat-dev \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*

RUN pip install numpy

WORKDIR /
ENV OPENCV_VERSION="4.1.1"
RUN wget github/opencv/opencv/archive/${OPENCV_VERSION}.zip \
    && unzip ${OPENCV_VERSION}.zip \
    && mkdir /opencv-${OPENCV_VERSION}/cmake_binary \
    && cd /opencv-${OPENCV_VERSION}/cmake_binary \
    && cmake -DBUILD_TIFF=ON \
        -DBUILD_opencv_java=OFF \
        -DWITH_CUDA=OFF \
        -DWITH_OPENGL=ON \
        -DWITH_OPENCL=ON \
        -DWITH_IPP=ON \
        -DWITH_TBB=ON \
        -DWITH_EIGEN=ON \
        -DWITH_V4L=ON \
        -DBUILD_TESTS=OFF \
        -DBUILD_PERF_TESTS=OFF \
        -DCMAKE_BUILD_TYPE=RELEASE \
        -DCMAKE_INSTALL_PREFIX=$(python3.7 -c "import sys; print(sys.prefix)") \
        -DPYTHON_EXECUTABLE=$(which python3.7) \
        -DPYTHON_INCLUDE_DIR=$(python3.7 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
        -DPYTHON_PACKAGES_PATH=$(python3.7 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
        .. \
    && make install \
    && rm /${OPENCV_VERSION}.zip \
    && rm -r /opencv-${OPENCV_VERSION}

RUN ln -s \
    /usr/local/python/cv2/python-3.7/cv2.cpython-37m-x86_64-linux-gnu.so \
    /usr/local/lib/python3.7/site-packages/cv2.so

RUN apt-get --fix-missing update \
    && apt-get --fix-broken install \
    && apt-get install -y poppler-utils \
    && apt-get install -y tesseract-ocr \
    && apt-get install -y libtesseract-dev \
    && apt-get install -y libleptonica-dev \
    && ldconfig \
    && apt install -y libsm6 libxext6 \
    && apt install -y python-opencv

COPY ./requirements.txt ./
RUN pip3 install --upgrade pip
# install dependencies
RUN pip3 install -r requirements.txt

Build:

docker image build -t my-awesome-py .

Run:

docker run --rm my-awesome-py tesseract

Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
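Running the bare tesseract command as above only confirms that the binary is on the container's PATH. A hedged sketch of the same check done from Python, using only the standard library (shutil.which and subprocess.run) so it does not rely on pytesseract internals; the function name tesseract_version is hypothetical. It merges stderr into stdout in case a given build prints its version banner to stderr.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None if it cannot be run."""
    path = shutil.which(cmd)
    if path is None:
        # Not on PATH: the situation that surfaces as TesseractNotFoundError.
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture the banner whichever stream it uses
        text=True,                 # decode output to str (Python 3.7+)
        check=False,               # inspect the return code ourselves
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0]
```

One way to use it is to call tesseract_version() once before any OCR work starts and exit with a clear message if it returns None, so a misconfigured image fails at startup rather than on the first bank statement.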


Published: 2023-10-07 12:55:25
Link: https://www.elefans.com/category/jswz/34/1469444.html