Extracting tables from PowerPoint
Problem description
I am trying to extract a table from a PPT using python-pptx, but I am not sure how to do that using shape.table.
```python
from pptx import Presentation

prs = Presentation(path_to_presentation)

# text_runs will be populated with a list of strings,
# one for each text run in presentation
text_runs = []

for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_table:
            tbl = shape.table
            rows = tbl.rows.count  # reported error: no 'count' attribute
            cols = tbl.columns.count
```
I found a post here, but the accepted solution does not work: it fails with an error saying that the count attribute is not available.
How do I modify the above code so I can get a table in a dataframe?
EDIT
Please see the image of the slide below
Solution
This appears to work for me.
```python
from pptx import Presentation

prs = Presentation(path_to_presentation)

# text_runs will be populated with a list of strings,
# one for each text run found in a table cell of the presentation
text_runs = []

for slide in prs.slides:
    for shape in slide.shapes:
        # skip every shape that does not contain a table
        if not shape.has_table:
            continue
        tbl = shape.table
        # the row and column collections support len(), not .count
        row_count = len(tbl.rows)
        col_count = len(tbl.columns)
        for r in range(0, row_count):
            for c in range(0, col_count):
                cell = tbl.cell(r, c)
                paragraphs = cell.text_frame.paragraphs
                for paragraph in paragraphs:
                    for run in paragraph.runs:
                        text_runs.append(run.text)

print(text_runs)
```
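The loop above collects every run into one flat list, so the row and column structure is lost, while the question asks for a dataframe. The following is a minimal sketch of one way to keep that structure, not a definitive implementation. It assumes that pandas is installed and that the first row of each table is a header row. The helper names `table_to_rows`, `table_to_dataframe` and `iter_table_dataframes` are hypothetical and introduced only for illustration.

```python
def table_to_rows(tbl):
    """Return the cell text of a table as a list of rows (lists of str).

    Relies only on len(tbl.rows), len(tbl.columns) and tbl.cell(r, c).text,
    all of which python-pptx table objects provide.
    """
    row_count = len(tbl.rows)
    col_count = len(tbl.columns)
    return [
        [tbl.cell(r, c).text for c in range(col_count)]
        for r in range(row_count)
    ]


def table_to_dataframe(tbl):
    """Build a DataFrame from a table.

    Assumption: the first row holds the column headers.
    """
    import pandas as pd  # imported lazily; only needed for this step

    rows = table_to_rows(tbl)
    if not rows:
        return pd.DataFrame()
    return pd.DataFrame(rows[1:], columns=rows[0])


def iter_table_dataframes(path_to_presentation):
    """Yield one DataFrame per table found in the presentation."""
    from pptx import Presentation  # imported lazily; requires python-pptx

    prs = Presentation(path_to_presentation)
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_table:
                yield table_to_dataframe(shape.table)
```

Calling `list(iter_table_dataframes(path_to_presentation))` would then give one dataframe per table. If a table has no header row, passing all of `rows` to `pd.DataFrame` without the `columns` argument is the likely adjustment. Merged cells are not handled specially in this sketch.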