[Paper Reading] Hybrid model for Chinese character recognition based on Tesseract-OCR

Updated: 2023-04-05 23:17:48

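Summary: OpenCV (image preprocessing) + KNN (phrase processing) + Tesseract-OCR engine
Personal impression: this paper is not of high quality. Experimental details are not discussed, the results have no statistical analysis, the wording is repetitive, and there are basic errors.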

Introduction

Chinese OCR is more difficult

The English alphabet has only 26 letters, but about 2,500 Chinese characters are in common use.
The strokes of Chinese characters are complex, and many characters look similar.
Different Chinese fonts differ greatly from one another.

OCR engines

Tesseract-OCR engine

Presented as the first OCR engine; supports more than 100 languages (tesseract-ocr/tessdata, https://github.com/tesseract-ocr/tessdata).
Tesseract version 4.0 uses an OCR engine based on Long Short-Term Memory (LSTM).
In the Tesseract-OCR Simplified Chinese language library, recognition of individual characters is based on the features of standard Chinese characters.
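
To make the Tesseract step concrete, here is a minimal sketch of calling the `tesseract` command-line tool with the Simplified Chinese model (`chi_sim`) and the LSTM engine (`--oem 1`). The paper does not say how it invokes Tesseract, so the helper names `build_tesseract_command` and `recognize`, and the input file `page.png`, are assumptions for illustration only. Running `recognize` needs the `tesseract` binary and the `chi_sim` traineddata installed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="chi_sim", use_lstm=True):
    """Build the argument list for the tesseract CLI (Tesseract 4.x options)."""
    # "stdout" as the output base makes tesseract print the text instead of
    # writing a .txt file.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if use_lstm:
        # --oem 1 selects the LSTM engine only.
        cmd += ["--oem", "1"]
    return cmd


def recognize(image_path, lang="chi_sim"):
    """Run tesseract and return the recognized text (binary must be on PATH)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "page.png" is a hypothetical input file.
    print(build_tesseract_command("page.png"))
```

Keeping the command construction separate from the subprocess call makes it easy to check the options without having Tesseract installed.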

OCRopus

Also an OCR engine based on LSTM.

Ocular OCR engine

Mostly used for recognising text in historical artefacts.

Swift OCR

A simple and fast OCR library written in Swift.

Simple-ocr-openCV

A simple Python OCR engine based on OpenCV and NumPy.

Background

Process of OCR

The main work of this study includes image preprocessing and phrase processing.

2.4.1 Image preprocessing

Image preprocessing methods include binarisation, noise reduction, and image tilt correction, among others.
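
As an illustration of the binarisation step, the sketch below picks a global threshold with Otsu's method and maps each pixel to 0 or 255. The notes do not say which binarisation method the paper uses, so Otsu is an assumption here, and the function names are made up for this example. It is written in plain Python so the logic is visible; with OpenCV the same step would normally be done with `cv2.threshold` and the `cv2.THRESH_OTSU` flag.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for an iterable of 8-bit grey levels (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarise(image, threshold=None):
    """Map a 2-D list of grey levels to 0/255 using the given or Otsu threshold."""
    if threshold is None:
        threshold = otsu_threshold(p for row in image for p in row)
    return [[255 if p > threshold else 0 for p in row] for row in image]


if __name__ == "__main__":
    tiny = [[10, 12, 200], [11, 210, 205]]  # toy image: dark and bright pixels
    print(binarise(tiny))
```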

3 OCR hybrid recognition model

3.1 Image correction

3.2 KNN phrase detection and correction
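
These notes do not record how the paper's KNN step works. Purely as an illustration of the general idea, the sketch below does a nearest-neighbour lookup over a phrase lexicon using edit distance: an OCR output phrase that is not in the lexicon is replaced by its closest entry if that entry is near enough. The lexicon, the distance measure, the `max_distance` cut-off, and all function names are assumptions, not the paper's method.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def nearest_phrases(phrase, lexicon, k=3):
    """Return the k lexicon phrases closest to `phrase` as (distance, phrase)."""
    scored = sorted((edit_distance(phrase, entry), entry) for entry in lexicon)
    return scored[:k]


def correct_phrase(phrase, lexicon, k=3, max_distance=1):
    """Replace `phrase` by its nearest lexicon neighbour if it is close enough."""
    if phrase in lexicon:
        return phrase
    neighbours = nearest_phrases(phrase, lexicon, k)
    if neighbours and neighbours[0][0] <= max_distance:
        return neighbours[0][1]
    return phrase


if __name__ == "__main__":
    # Hypothetical toy lexicon; "文宇" is a plausible misreading of "文字".
    lexicon = ["中国", "中间", "文字", "识别"]
    print(correct_phrase("文宇", lexicon))
```

The other neighbours returned by `nearest_phrases` could be kept as alternative candidates, for example to be ranked by phrase frequency.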

Published: 2023-04-05 23:17:00
