Updated: 2024-10-15 04:25:01

Captcha recognition with Baidu PP-OCRv3

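The captcha dataset used in this project consists of random combinations of the most basic characters (digits plus uppercase and lowercase letters), with random noise pixels added and the characters drawn at random positions.
For example:

Dataset:

Download link:

Dataset preprocessing: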

# Build the master label file and split the dataset
import os
import random

train_path = r"E:\code\PaddleOCR\work\Verification_code"
SUM = []
# os.walk yields (root directory, sub-directories, files)
for root, dirs, files in os.walk(train_path):
    for file in files:
        imgpath = os.path.join(root, file)
        # The label is the file name without its extension
        SUM.append(imgpath + "\t" + file.split(".")[0] + "\n")

# Write the master label file
allstr = ''.join(SUM)
with open('work/total_list.txt', 'w', encoding='utf-8') as f:
    f.write(allstr)
print("Dataset size: {}".format(len(SUM)))

# Shuffle, then split 80% / 20%
random.shuffle(SUM)
train_len = int(len(SUM) * 0.8)
test_list = SUM[train_len:]
train_list = SUM[:train_len]
print('Training set size: {}, validation set size: {}'.format(len(train_list), len(test_list)))

# Write the label file for the training set
train_txt = ''.join(train_list)
with open('work/train_list.txt', 'w', encoding='utf-8') as f_train:
    f_train.write(train_txt)

# Write the label file for the test set
test_txt = ''.join(test_list)
with open('work/test_list.txt', 'w', encoding='utf-8') as f_test:
    f_test.write(test_txt)

# Build the dictionary
import codecs

class_set = set()
lines = []
# Source file: here we use the dataset's master label file
with open("work/total_list.txt", "r", encoding="utf-8") as file:
    for i in file:
        a = i.strip('\n').split('\t')[-1]
        lines.append(a)

# Collect every distinct character that appears in a label
for line in lines:
    for e in line:
        class_set.add(e)
class_list = list(class_set)
class_list.sort()
print("class num: {0}".format(len(class_list)))

# Write one character per line
with codecs.open("work/new_dict.txt", "w", encoding='utf-8') as label_list:
    for id, c in enumerate(class_list):
        label_list.write("{0}\n".format(c))

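The dictionary used to train a text recognition model must contain every character that should be recognized. It is written one character per line and saved with UTF-8 encoding. This project uses 10 digits (0-9), 26 uppercase letters (A-Z), and 26 lowercase letters (a-z), 62 characters in total. Here the dictionary is generated by collecting the label contents of the whole dataset into a set. This approach works in the vast majority of cases and is especially handy when the text content of the dataset is not known in advance. The label files generated here are in SimpleDataSet format: each line holds the image file path and its label, separated by a tab ('\t').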
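Before training, the label files and the dictionary can be checked against each other. The following is a minimal sketch and not part of PaddleOCR: the helper names `parse_label_line` and `check_labels` are hypothetical, and the default `max_text_length=6` is simply the value used in the config later in this post.

```python
import string


def parse_label_line(line):
    """Split one SimpleDataSet line into (image_path, label)."""
    path, sep, label = line.rstrip("\n").rpartition("\t")
    if not sep:
        raise ValueError("missing tab separator: {!r}".format(line))
    return path, label


def check_labels(lines, dictionary, max_text_length=6):
    """Return (image_path, problem) pairs for labels that use characters
    outside the dictionary or that exceed max_text_length."""
    allowed = set(dictionary)
    problems = []
    for line in lines:
        path, label = parse_label_line(line)
        unknown = sorted(set(label) - allowed)
        if unknown:
            problems.append((path, "characters not in dictionary: " + "".join(unknown)))
        if len(label) > max_text_length:
            problems.append((path, "label longer than {}".format(max_text_length)))
    return problems


if __name__ == "__main__":
    # 10 digits + 26 uppercase + 26 lowercase letters
    dictionary = sorted(string.digits + string.ascii_letters)
    sample = ["imgs/W45G.png\tW45G\n", "imgs/bad.png\tW4-G\n"]
    print(len(dictionary))
    print(check_labels(sample, dictionary))
```

In practice the `lines` argument would come from reading work/train_list.txt or work/test_list.txt, and `dictionary` from the lines of work/new_dict.txt.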

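Environment setup:

Windows 10, paddlepaddle-gpu==2.3.0 (CUDA 10.1), Python 3.7, paddleocr==2.6
(For the image packaging later on: Ubuntu 16.04, pulling a CUDA 10.1 Paddle base image)

You can train on the dataset yourself. Only the config file needs to change: set the dataset paths and the directory where the trained model is saved. The modified yml file is pasted here: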

Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/v3_en_mobile
  save_epoch_step: 50
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: true
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./work/new_dict.txt
  max_text_length: &max_text_length 6
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_en.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./work/Verification_code
    ext_op_transform_idx: 1
    label_file_list:
    - ./work/train_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./work/Verification_code
    label_file_list:
    - ./work/test_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 32
    num_workers: 4
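The config asks for a Cosine learning-rate schedule with learning_rate 0.001, warmup_epoch 5, and epoch_num 500. As a rough picture of what such a schedule looks like, here is a simplified per-epoch sketch. It is an assumption-laden illustration and not PaddleOCR's scheduler, which works per training step and may differ in detail; `lr_at_epoch` is a hypothetical helper.

```python
import math


def lr_at_epoch(epoch, base_lr=0.001, warmup_epoch=5, epoch_num=500):
    """Illustrative learning rate at a given (possibly fractional) epoch."""
    if epoch < warmup_epoch:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * epoch / warmup_epoch
    # Cosine decay from base_lr towards 0 over the remaining epochs
    progress = (epoch - warmup_epoch) / (epoch_num - warmup_epoch)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 1, 5, 100, 250, 500):
        print(e, lr_at_epoch(e))
```

The point of the warmup is that the first few epochs run with a small learning rate, so the pretrained weights are not disturbed too much at the start of fine-tuning.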

Training can be started directly with Paddle's stock command line:
python3 tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml

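Accuracy reaches 0.93, which is a decent result but still leaves room for improvement. Easily confused characters are still misrecognized: 0 and O, l and 1, w and W, x and X, and z, Z and 2. Beyond knowledge-distillation training and data augmentation, accuracy could be pushed further by synthesizing more data, that is, generating captchas in a wide variety of styles.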
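To see how much of the remaining error comes from the look-alike characters listed above, predictions can be tallied against labels. This is a minimal sketch under stated assumptions: `summarize_errors` and `is_confusable` are hypothetical helpers (not PaddleOCR functions), the confusable groups are just the ones named in this post, and character substitutions are only counted when label and prediction have the same length.

```python
from collections import Counter

# Groups of look-alike characters named in the text
CONFUSABLE_GROUPS = ["0O", "l1", "wW", "xX", "zZ2"]


def is_confusable(a, b):
    """True if two different characters belong to the same look-alike group."""
    return a != b and any(a in g and b in g for g in CONFUSABLE_GROUPS)


def summarize_errors(pairs):
    """pairs: iterable of (label, prediction) strings.

    Returns (exact_match_accuracy, Counter of confusable substitutions).
    """
    pairs = list(pairs)
    correct = 0
    confusions = Counter()
    for label, pred in pairs:
        if label == pred:
            correct += 1
        elif len(label) == len(pred):
            # Compare position by position and keep only look-alike mix-ups
            for a, b in zip(label, pred):
                if is_confusable(a, b):
                    confusions[(a, b)] += 1
    accuracy = correct / len(pairs) if pairs else 0.0
    return accuracy, confusions


if __name__ == "__main__":
    sample = [("W45G", "W45G"), ("w36a", "W36a"), ("O0l1", "00l1")]
    print(summarize_errors(sample))
```

If most errors fall into these groups, synthesizing extra samples that contain the confusable characters is likely a better use of effort than generic augmentation.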

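The key step here is converting the training model into an inference model:
The *.pdparams, *.pdopt, and *.states files hold the model parameters, optimizer state, and intermediate training information saved during training. They are mostly used for evaluating metrics and resuming training, so for real use the model has to be converted into an inference model for the prediction engine (inference.pdmodel and inference.pdiparams), which then serves as the basis for deployment.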
python tools/export_model.py -c ./en_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/v3_en_mobile/best_accuracy Global.save_inference_dir=./inference/en_PP-OCRv3_rec/
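After the export it is worth confirming that the two inference files named above actually landed in the target directory. A minimal sketch, assuming the directory from the command above; `missing_inference_files` is a hypothetical helper:

```python
import tempfile
from pathlib import Path

# File names taken from the description of the inference model above
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_inference_files(inference_dir):
    """Return the expected inference files that are absent from inference_dir."""
    root = Path(inference_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo on a throwaway directory; point this at ./inference/en_PP-OCRv3_rec/ in practice
    with tempfile.TemporaryDirectory() as d:
        print(missing_inference_files(d))
```

An empty list means both files are present and the directory can be passed as rec_model_dir.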

Model deployment: Flask framework

import socket
from flask import Flask, request

app = Flask(__name__)


def host_ip():
    """
    Look up the local IP address.
    :return: ip
    """
    ip = '0.0.0.0'
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends nothing; it only selects the outgoing interface
        s.connect(('8.8.8.8', 80))
        ip = s.getsockname()[0]
    except OSError as ex:
        hostname = socket.gethostname()
        ip = socket.gethostbyname(hostname)
    finally:
        s.close()
    return ip


@app.route('/', methods=["GET"])
def hello_world():
    return 'hello world'


@app.route("/captcha_ocr", methods=["POST"])
def ocr_html_post():
    data = request.files
    file = data['file']
    print(file.filename)
    # Save the upload to a fixed cache file, overwriting the previous one
    file.save('cache.png')
    ocr_str = ocr('cache.png')
    return str(ocr_str)


def ocr(img_path):
    from paddleocr import PaddleOCR, draw_ocr
    from PIL import Image

    # Loads (and on first use downloads) the models; note that this runs on every call
    engine = PaddleOCR(use_angle_cls=True, lang="en", use_gpu=True,
                       rec_image_shape="3, 48, 320",
                       rec_char_dict_path="./work/new_dict.txt", rec_char_type='en',
                       rec_algorithm='SVTR_LCNet',
                       rec_model_dir='./inference/en_PP-OCRv3_rec/',
                       cls_model_dir='./output/inference/ch_ppocr_mobile_v2.0_cls_infer/',
                       det_model_dir='./output/inference/en_PP-OCRv3_det_infer/')
    # img_path = './test/W30J.png'
    result = engine.ocr(img_path, cls=True)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)

    # Draw the boxes and recognized text for the first (only) image
    result = result[0]
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result.jpg')
    # Prints the recognized text ("识别结果为") and its confidence score ("准确率为")
    print('识别结果为：', txts, '准确率为：', scores)
    return txts, scores


if __name__ == '__main__':
    # app.run(port=5067, debug=True)
    hostip = host_ip()
    app.run(debug=True, port=5067, host=hostip)

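A note on the model paths: text detection and angle classification use the ready-made PaddleOCR inference models, which can be downloaded from the official site. The main difference is the recognition model, which is the custom model trained above. When setting the parameters, make sure the dictionary is the same one used for training; otherwise the trained model will produce results that are far off.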
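The service code above constructs the OCR engine inside the function that handles each request, so the models are reloaded every time. One common way to avoid that is to build the engine lazily and cache it. The sketch below shows the pattern only: `load_once` is a hypothetical helper, and the factory passed to it stands in for the real engine construction.

```python
import functools


def load_once(factory):
    """Wrap a zero-argument factory so its result is built on first use and then reused."""
    return functools.lru_cache(maxsize=1)(factory)


if __name__ == "__main__":
    build_count = []

    def build_engine():
        # Stand-in for the expensive model loading done in the service
        build_count.append(1)
        return object()

    get_engine = load_once(build_engine)
    first = get_engine()
    second = get_engine()
    print(first is second, len(build_count))
```

In the service, the factory would be a function returning the configured engine, and the request handler would call the cached getter instead of constructing a new engine. Note that if several first requests arrive at the same moment the factory may still run more than once, so loading the engine at startup is a reasonable alternative.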

Client code:

import requests

url = "http://X.X.X.X:5067/captcha_ocr"
# Open the image in a with-block so the file handle is closed after the upload
with open('./test/W45G.png', 'rb') as f:
    r = requests.post(url, files={'file': f})
print(r.text)

The run log looks like this:

[2023/06/15 15:38:53] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='./output/inference/ch_ppocr_mobile_v2.0_cls_infer/', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_box_type='quad', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='./output/inference/en_PP-OCRv3_det_infer/', det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_id=0, gpu_mem=500, help='==SUPPRESS==', image_dir=None, image_orientation=False, ir_optim=True, kie_algorithm='LayoutXLM', label_list=['0', '180'], lang='en', layout=True, layout_dict_path=None, layout_model_dir=None, layout_nms_threshold=0.5, layout_score_threshold=0.5, max_batch_size=10, max_text_length=25, merge_no_span_structure=True, min_subgraph_size=15, mode='structure', ocr=True, ocr_order_method=None, ocr_version='PP-OCRv3', output='./output', page_num=0, precision='fp32', process_id=0, re_model_dir=None, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='./work/new_dict.txt', rec_char_type='en', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_model_dir='./inference/en_PP-OCRv3_rec/', recovery=False, save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ser_model_dir=None, show_log=True, sr_batch_num=1, sr_image_shape='3, 32, 
128', sr_model_dir=None, structure_version='PP-StructureV2', table=True, table_algorithm='TableAttn', table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_npu=False, use_onnx=False, use_pdf2docx_api=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_visual_backbone=True, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
[2023/06/15 15:38:57] ppocr DEBUG: dt_boxes num : 1, elapse : 1.6441783905029297
[2023/06/15 15:38:57] ppocr DEBUG: cls num  : 1, elapse : 0.036865234375
[2023/06/15 15:38:57] ppocr DEBUG: rec_res num  : 1, elapse : 0.02038741111755371
[[[6.0, 1.0], [115.0, 3.0], [114.0, 42.0], [5.0, 40.0]], ('w36a', 0.9875728487968445)]
识别结果为： ['w36a'] 准确率为： [0.9875728487968445]
172.22.188.43 - - [15/Jun/2023 15:38:57] "POST /captcha_ocr HTTP/1.1" 200 -
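The result line in the log has the shape [box, (text, score)], where box is four corner points. For a captcha service it is usually enough to return a single string, so the result can be reduced to the highest-scoring text. This is a minimal sketch: `best_text` and its `min_score` threshold are hypothetical, and the handling of an empty or None entry is an assumption about what the engine returns when nothing is detected.

```python
def best_text(result, min_score=0.5):
    """Pick the highest-scoring text from a result shaped like the log line above.

    `result` is expected to be a list with one entry per image; each entry is a
    list of [box, (text, score)] items, or None/empty when nothing was found.
    """
    lines = result[0] if result else None
    if not lines:
        return None
    text, score = max((line[1] for line in lines), key=lambda ts: ts[1])
    return text if score >= min_score else None


if __name__ == "__main__":
    box = [[6.0, 1.0], [115.0, 3.0], [114.0, 42.0], [5.0, 40.0]]
    sample = [[[box, ("w36a", 0.9875728487968445)]]]
    print(best_text(sample))
```

Returning only the text keeps the HTTP response simple for the client, which otherwise has to parse the stringified tuple of lists.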

okk~

Image packaging:

First, export the dependencies:
pip freeze > requirements.txt

Transfer all the files to the server.
Then write a Dockerfile:

# Pull the base image
FROM registry.baidubce/ais-public/ais2.3:cuda10.1_cudnn7-ubuntu16.04-py37
# Set environment variables
ENV PATH=/home/bml/anaconda3/envs/py3.7.4/bin:${PATH}
# Create the working directory
RUN mkdir -p /home/captcha-gpu
WORKDIR /home/captcha-gpu
# Copy contents
COPY . /home/captcha-gpu
# Install Python dependencies
RUN pip install --index-url  --requirement requirements.txt
RUN python -m pip install paddlepaddle-gpu==2.3.2.post101 -f .html
RUN pip install protobuf==3.20.0  -i 
RUN pip install onnx==1.12  -i 
# Start the Flask service
CMD ["python","flask-ocr.py"]

okk~ time to put our Docker command knowledge to use:
Build the image:
docker build -f Dockerfile -t yolov5:v0 .
Export the image:
docker save yolov5:v0 -o /home/yolov5_v0.tar