Invoice Key Information Extraction with SER

Updated: 2024-10-13 22:24:39

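SER (Semantic Entity Recognition): given a text line, semantic entity recognition determines which category it belongs to (for example, name or address). This article uses the multimodal SER method based on VI-LayoutXLM.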
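In practice SER is trained as token classification: each labeled text line is expanded into per-token tags in a BIO scheme (B- for the first token of an entity, I- for the rest, O for tokens that belong to no entity). The sketch below illustrates that expansion, roughly the job of the VQATokenLabelEncode step in the configs. It treats each character as one token, whereas the real pipeline uses the LayoutXLM tokenizer; the helper name `line_to_bio` and the labels are hypothetical.

```python
def line_to_bio(text, label):
    """Expand one labeled text line into (token, tag) pairs.

    Hypothetical helper: characters stand in for tokens here; the real
    LayoutXLM tokenizer produces different token boundaries.
    """
    chars = list(text)
    name = label.upper()
    if name == "OTHER":
        # Lines that are not part of any entity get the single "O" tag.
        return [(c, "O") for c in chars]
    # First token opens the entity (B-), the remaining tokens continue it (I-).
    tags = ["B-" + name] + ["I-" + name] * (len(chars) - 1)
    return list(zip(chars, tags))


print(line_to_bio("购买方", "question"))
# [('购', 'B-QUESTION'), ('买', 'I-QUESTION'), ('方', 'I-QUESTION')]
```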

1. VAT invoice dataset
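The article does not show the contents of my/zzsfp/train.json, val.json or class_list.txt. The sketch below assumes they follow the XFUND-style layout that PaddleOCR KIE uses, as I understand it: one line per image, "image path, tab, JSON list of text lines", each text line carrying `transcription`, `label`, `points`, `id` and `linking`. It also shows why `num_classes: 5` appears in both configs: every class except OTHER gets a B- and an I- tag and OTHER gets a single O tag, so num_classes = 2N - 1, which implies N = 3 classes in class_list.txt. The annotation values and class names below are hypothetical.

```python
import json

# Hypothetical annotation for one invoice image (XFUND-style fields, assumed).
annotation = [
    {"transcription": "购买方", "label": "question",
     "points": [[10, 10], [80, 10], [80, 30], [10, 30]], "id": 0, "linking": []},
    {"transcription": "某某公司", "label": "answer",
     "points": [[90, 10], [200, 10], [200, 30], [90, 30]], "id": 1, "linking": []},
]


def make_label_line(image_name, items):
    # One line per image: "<image path>\t<JSON list of text lines>"
    return image_name + "\t" + json.dumps(items, ensure_ascii=False)


def parse_label_line(line):
    image_name, payload = line.rstrip("\n").split("\t", 1)
    return image_name, json.loads(payload)


def num_classes_from_class_list(text):
    # class_list.txt: one class name per line, OTHER included.
    # B-/I- per non-OTHER class plus one O tag gives 2 * N - 1.
    names = [n.strip() for n in text.splitlines() if n.strip()]
    return 2 * len(names) - 1


line = make_label_line("invoice_0001.jpg", annotation)
name, items = parse_label_line(line)
print(name, [it["label"] for it in items])  # invoice_0001.jpg ['question', 'answer']
print(num_classes_from_class_list("OTHER\nQUESTION\nANSWER\n"))  # 5
```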

2. Model training

python tools/train.py -c my/ser_vi_layoutxlm_xfund_zh_udml.yml

Configuration file ser_vi_layoutxlm_xfund_zh_udml.yml:

Global:
  use_gpu: true
  epoch_num: &epoch_num 200
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/ser_vi_layoutxlm_xfund_zh_udml
  save_epoch_step: 2000
  # evaluation is run every 19 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: my/b201.jpg
  save_res_path: ./output/ser_layoutxlm_xfund_zh/res

Architecture:
  model_type: &model_type "kie"
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: &algorithm "LayoutXLM"
      Transform:
      Backbone:
        name: LayoutXLMForSer
        pretrained: True
        # one of base or vi
        mode: vi
        checkpoints:
        num_classes: &num_classes 5
    Student:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: *algorithm
      Transform:
      Backbone:
        name: LayoutXLMForSer
        pretrained: True
        # one of base or vi
        mode: vi
        checkpoints:
        num_classes: *num_classes

Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationVQASerTokenLayoutLMLoss:
      weight: 1.0
      model_name_list: ["Student", "Teacher"]
      key: backbone_out
      num_classes: *num_classes
  - DistillationSERDMLLoss:
      weight: 1.0
      act: "softmax"
      use_log: true
      model_name_pairs:
      - ["Student", "Teacher"]
      key: backbone_out
  - DistillationVQADistanceLoss:
      weight: 0.5
      mode: "l2"
      model_name_pairs:
      - ["Student", "Teacher"]
      key: hidden_states_5
      name: "loss_5"
  - DistillationVQADistanceLoss:
      weight: 0.5
      mode: "l2"
      model_name_pairs:
      - ["Student", "Teacher"]
      key: hidden_states_8
      name: "loss_8"

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Linear
    learning_rate: 0.00005
    epochs: *epoch_num
    warmup_epoch: 10
  regularizer:
    name: L2
    factor: 0.00000

PostProcess:
  name: DistillationSerPostProcess
  model_name: ["Student", "Teacher"]
  key: backbone_out
  class_path: &class_path my/zzsfp/class_list.txt

Metric:
  name: DistillationMetric
  base_metric_name: VQASerTokenMetric
  main_indicator: hmean
  key: "Student"

Train:
  dataset:
    name: SimpleDataSet
    data_dir: my/zzsfp/imgs
    label_file_list:
      - my/zzsfp/train.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          # one of [None, "tb-yx"]
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 4
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: my/zzsfp/imgs
    label_file_list:
      - my/zzsfp/val.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 4
    num_workers: 4
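This training config distills a Teacher and a Student VI-LayoutXLM against each other. CombinedLoss sums four weighted terms: the token classification loss on the listed models (weight 1.0), a DML term between their softmax outputs (weight 1.0), and two L2 distance terms on hidden states 5 and 8 (weight 0.5 each). The sketch below illustrates the generic deep-mutual-learning idea behind DistillationSERDMLLoss as a symmetric KL divergence between per-token class distributions. It is a plain-Python illustration under that assumption, not PaddleOCR's code, and `dml_loss` is a hypothetical name.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions over the same classes
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def dml_loss(student_logits, teacher_logits):
    """Symmetric KL between the two models' per-token class distributions,
    averaged over tokens (hypothetical sketch of the DML idea)."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        ps, pt = softmax(s), softmax(t)
        total += 0.5 * (kl(ps, pt) + kl(pt, ps))
    return total / len(student_logits)


# Two tokens, 5 classes each (num_classes: 5 in the config).
same = [[1.0, 2.0, 3.0, 0.0, 0.0], [0.5, 0.1, 0.0, 2.0, 1.0]]
print(dml_loss(same, same))  # 0.0: identical predictions give no mutual-learning loss
```

The symmetric form pushes each model toward the other, which is what distinguishes mutual learning from one-way teacher-to-student distillation.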

3. Model evaluation

python tools/eval.py -c my/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./output/ser_vi_layoutxlm_xfund_zh_udml/best_accuracy
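With `main_indicator: hmean`, the reported score is the F1 score (the harmonic mean of precision and recall), and it is the value used to pick the `best_accuracy` checkpoint loaded above. A minimal sketch of entity-level scoring, assuming strict matching where a predicted entity counts only if both its type and its token span equal a ground-truth entity; `prf` is a hypothetical helper and PaddleOCR's `VQASerTokenMetric` may differ in details.

```python
def prf(gold_entities, pred_entities):
    """Entity-level precision, recall and hmean (F1).

    Each entity is a (type, start, end) tuple; a prediction is correct
    only if type and boundaries both match a gold entity exactly.
    """
    gold, pred = set(gold_entities), set(pred_entities)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    hmean = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, hmean


gold = [("QUESTION", 0, 3), ("ANSWER", 3, 7), ("QUESTION", 8, 12)]
pred = [("QUESTION", 0, 3), ("ANSWER", 3, 6), ("QUESTION", 8, 12)]
print(prf(gold, pred))  # all three are about 0.667: the ANSWER span is one token short
```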

Configuration file ser_vi_layoutxlm_xfund_zh.yml:

Global:
  use_gpu: True
  epoch_num: &epoch_num 200
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/ser_vi_layoutxlm_xfund_zh
  save_epoch_step: 2000
  # evaluation is run every 19 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: my/b201.jpg
  d2s_train_image_shape: [3, 224, 224]
  # if you want to predict using the groundtruth ocr info,
  # you can use the following config
  # infer_img: train_data/XFUND/zh_val/val.json
  # infer_mode: False
  save_res_path: ./output/ser/xfund_zh/res
  kie_rec_model_dir:
  kie_det_model_dir:
  amp_custom_white_list: ['scale', 'concat', 'elementwise_add']

Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForSer
    pretrained: True
    checkpoints:
    # one of base or vi
    mode: vi
    num_classes: &num_classes 5

Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Linear
    learning_rate: 0.00005
    epochs: *epoch_num
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0.00000

PostProcess:
  name: VQASerTokenLayoutLMPostProcess
  class_path: &class_path my/zzsfp/class_list.txt

Metric:
  name: VQASerTokenMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: my/zzsfp/imgs
    label_file_list:
      - my/zzsfp/train.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: &use_textline_bbox_info True
          # one of [None, "tb-yx"]
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 1
    num_workers: 1

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: my/zzsfp/imgs
    label_file_list:
      - my/zzsfp/val.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1
    num_workers: 1
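Both configs resize the page image to 224x224 and then apply NormalizeImage with `scale: 1`, so the mean and std are expressed on the 0 to 255 pixel scale: they are the common ImageNet statistics multiplied by 255 (0.485 x 255 = 123.675, 0.229 x 255 = 58.395, and so on). ToCHWImage then moves channels first. A plain-Python sketch of the normalize and HWC-to-CHW steps on a tiny image, with resizing omitted; the function names are hypothetical and this only mirrors the arithmetic, not PaddleOCR's operators.

```python
MEAN = [123.675, 116.28, 103.53]  # ImageNet RGB mean x 255
STD = [58.395, 57.12, 57.375]     # ImageNet RGB std x 255


def normalize_hwc(img, scale=1.0):
    # img[y][x] = [R, G, B] in 0..255; per channel: (value * scale - mean) / std
    return [[[(px[c] * scale - MEAN[c]) / STD[c] for c in range(3)] for px in row]
            for row in img]


def hwc_to_chw(img):
    # Reorder from [height][width][channel] to [channel][height][width]
    h, w = len(img), len(img[0])
    return [[[img[y][x][c] for x in range(w)] for y in range(h)] for c in range(3)]


img = [[[255, 255, 255], [0, 0, 0]],
       [[128, 128, 128], [123.675, 116.28, 103.53]]]
chw = hwc_to_chw(normalize_hwc(img))
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2
```

With `scale: 1` the pixels are not divided by 255 first, which is why the mean and std are given in pixel units rather than as fractions.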

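4. Model prediction

The first command below predicts on my/zzsfp/val.json with Global.infer_mode=False, which uses the ground-truth OCR results stored in that label file. The second command runs on the image my/b201.jpg with Global.infer_mode=True, so text detection and recognition are performed before SER.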

python tools/infer_kie_token_ser.py -c my/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./output/ser_vi_layoutxlm_xfund_zh_udml/best_accuracy Global.infer_img=./my/zzsfp/val.json Global.infer_mode=False

python tools/infer_kie_token_ser.py -c my/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./output/ser_vi_layoutxlm_xfund_zh_udml/best_accuracy Global.infer_img=./my/b201.jpg Global.infer_mode=True
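The model predicts BIO tags per token, and post-processing turns them back into labeled fields. PaddleOCR's VQASerTokenLayoutLMPostProcess works per OCR text line and may differ from this; the sketch below only illustrates the generic idea of grouping BIO-tagged tokens into (label, text) entities. `bio_to_entities` is a hypothetical helper, and treating a stray I- tag as the start of a new entity is a lenient design choice made here.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entities (hypothetical sketch).

    An I- tag that does not continue an entity of the same type starts a new
    entity (lenient); a strict decoder would drop it instead.
    """
    entities = []
    cur_label, cur_tokens = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_label):
            if cur_label is not None:
                entities.append((cur_label, "".join(cur_tokens)))
            cur_label, cur_tokens = tag[2:], [tok]
        elif tag.startswith("I-"):
            cur_tokens.append(tok)
        else:  # "O" closes any open entity
            if cur_label is not None:
                entities.append((cur_label, "".join(cur_tokens)))
            cur_label, cur_tokens = None, []
    if cur_label is not None:
        entities.append((cur_label, "".join(cur_tokens)))
    return entities


tokens = list("购买方某某公司")
tags = ["B-QUESTION", "I-QUESTION", "I-QUESTION",
        "B-ANSWER", "I-ANSWER", "I-ANSWER", "I-ANSWER"]
print(bio_to_entities(tokens, tags))
# [('QUESTION', '购买方'), ('ANSWER', '某某公司')]
```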

Published: 2023-12-03 11:30:22

Article link: https://www.elefans.com/category/jswz/34/1654861.html