Extracting and parsing specific layout info from OCR engine

Updated: 2024-10-17 02:46:52

I'm attempting to parse layout information from OCR engines with PHP, but the engines are not giving me any layout details.

I have both Tesseract (with Leptonica) and Cuneiform installed. Supposedly Cuneiform is excellent at detecting layout (i.e. what is text, what is a picture, etc.). The input is PNG files containing both text and pictures (the text is, of course, part of the image).

Both engines seem to assume I want the output as txt, html, or hocr, when what I actually want are the coordinates of the regions the engine thinks are text and the regions it thinks are images.

Cuneiform has a "native" output option, the Cuneiform 2000 format. Opening such a file in Notepad++, I can see that it's compressed. I've tried extracting it with zip and gzip, but neither recognizes it. There is no info on Google about the native Cuneiform format either.

Does anyone have any idea how to extract the layout information from Tesseract or Cuneiform, or a better idea for figuring out the layout of an image that contains text blocks and pictures?
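
One note on the hocr output mentioned in the question: the hOCR format stores a bounding box for each element in its `title` attribute, written as `bbox x0 y0 x1 y1`, so coordinates can be read from it. Below is a minimal sketch of pulling those boxes out of an hOCR string. It is written in Python rather than PHP and is not part of the original post. The names `HocrBoxParser` and `extract_boxes` and the hand-written `SAMPLE_HOCR` are hypothetical. Which element classes actually appear (for example, whether picture regions show up as `ocr_photo`) depends on the engine and its version, so treat that part as an assumption.

```python
import re
from html.parser import HTMLParser

# hOCR keeps coordinates in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrBoxParser(HTMLParser):
    """Collect (class, (x0, y0, x1, y1)) for every hOCR element with a bbox."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        css_class = attr.get("class") or ""
        match = BBOX_RE.search(attr.get("title") or "")
        # Only keep elements whose class looks like an hOCR class (ocr_*, ocrx_*).
        if css_class.startswith("ocr") and match:
            box = tuple(int(v) for v in match.groups())
            self.boxes.append((css_class, box))


def extract_boxes(hocr_text):
    """Return the list of (class, bbox) pairs found in an hOCR document."""
    parser = HocrBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.boxes


# Hand-written sample for illustration only; real engine output will differ.
SAMPLE_HOCR = """
<div class='ocr_page' title='image "page.png"; bbox 0 0 800 600'>
  <div class='ocr_carea' title='bbox 40 30 760 200'>
    <span class='ocr_line' title='bbox 40 30 760 60'>Some text</span>
  </div>
  <div class='ocr_photo' title='bbox 100 250 500 550'></div>
</div>
"""

for css_class, box in extract_boxes(SAMPLE_HOCR):
    print(css_class, box)
```

Filtering the returned pairs by class then separates text regions from picture regions, provided the engine labels them. The same approach carries over to PHP with any HTML parser and a regular expression on the `title` attribute.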

Best answer

Have a look at ABBYY FineReader Engine. It has a very capable API that provides detailed information about the recognized text, including its coordinates. It's not free, but for business software, ABBYY OCR technologies can add serious value to your product.

Since you are working on a web application in PHP, you may want to use the ABBYY OCR Engine web API at www.ocrsdk.com. It is currently in closed beta, so for now it is free to use.

Published: 2023-08-07 15:53:00

Article link: https://www.elefans.com/category/jswz/34/1464543.html