Simplify/clean up the XML of a DOCX Word document

Updated: 2024-10-12 18:16:54

Question

I have a Microsoft Word document (docx), and I use the Open XML SDK 2.0 Productivity Tool to generate C# code from it.

I want to programmatically insert some database values into the document. To do this, I typed simple text such as [[place holder 1]] at the points where my program should replace the placeholders with database values.

Unfortunately, the XML output is somewhat messy. For example, I have a table with two neighboring cells that should not differ from each other apart from their placeholder text. Yet one of the placeholders is split into several runs.

[[good place holder]]

<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:tcPr>
    <w:tcW w:w="1798" w:type="dxa" />
    <w:shd w:val="clear" w:color="auto" w:fill="auto" />
  </w:tcPr>
  <w:p w:rsidRPr="008C2E16" w:rsidR="001F54BF" w:rsidP="000D7B67" w:rsidRDefault="0009453E">
    <w:pPr>
      <w:spacing w:after="0" w:line="240" w:lineRule="auto" />
      <w:rPr>
        <w:rFonts w:cstheme="minorHAnsi" />
        <w:sz w:val="20" />
        <w:szCs w:val="20" />
      </w:rPr>
    </w:pPr>
    <w:r w:rsidRPr="0009453E">
      <w:rPr>
        <w:rFonts w:cstheme="minorHAnsi" />
        <w:sz w:val="20" />
        <w:szCs w:val="20" />
      </w:rPr>
      <w:t>[[good place holder]]</w:t>
    </w:r>
  </w:p>
</w:tc>

versus [[bad place holder]]

<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:tcPr>
    <w:tcW w:w="1799" w:type="dxa" />
    <w:shd w:val="clear" w:color="auto" w:fill="auto" />
  </w:tcPr>
  <w:p w:rsidRPr="008C2E16" w:rsidR="001F54BF" w:rsidP="000D7B67" w:rsidRDefault="00EA211A">
    <w:pPr>
      <w:spacing w:after="0" w:line="240" w:lineRule="auto" />
      <w:rPr>
        <w:rFonts w:cstheme="minorHAnsi" />
        <w:sz w:val="20" />
        <w:szCs w:val="20" />
      </w:rPr>
    </w:pPr>
    <w:r w:rsidRPr="00EA211A">
      <w:rPr>
        <w:rFonts w:cstheme="minorHAnsi" />
        <w:sz w:val="20" />
        <w:szCs w:val="20" />
      </w:rPr>
      <w:t>[[</w:t>
    </w:r>
    <w:proofErr w:type="spellStart" />
    <w:r w:rsidRPr="00EA211A">
      <w:rPr>
        <w:rFonts w:cstheme="minorHAnsi" />
        <w:sz w:val="20" />
        <w:szCs w:val="20" />
      </w:rPr>
      <w:t>bad</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd" />
    <w:r w:rsidRPr="00EA211A">
      <w:rPr>
        <w:rFonts w:cstheme="minorHAnsi" />
        <w:sz w:val="20" />
        <w:szCs w:val="20" />
      </w:rPr>
      <w:t xml:space="preserve"> place holder]]</w:t>
    </w:r>
  </w:p>
</w:tc>
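The split can be confirmed without Word: the placeholder text is still intact at paragraph level, only spread over several w:t elements. The following is a minimal sketch using System.Xml.Linq; the class name ParagraphText, the method Of, and the trimmed-down sample paragraph (run properties omitted) are illustrative assumptions, not part of the original document or of the Open XML SDK.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ParagraphText
{
    // WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Concatenates the text of all w:t elements below the given element.
    public static string Of(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
}

static class Demo
{
    static void Main()
    {
        // Hypothetical, shortened version of the "bad" paragraph above.
        XElement p = XElement.Parse(
            @"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
                <w:r><w:t>[[</w:t></w:r>
                <w:proofErr w:type=""spellStart"" />
                <w:r><w:t>bad</w:t></w:r>
                <w:proofErr w:type=""spellEnd"" />
                <w:r><w:t xml:space=""preserve""> place holder]]</w:t></w:r>
              </w:p>");

        // No single w:t contains the whole placeholder...
        Console.WriteLine(p.Descendants().Any(e => e.Value == "[[bad place holder]]"
                                                   && e.Name.LocalName == "t")); // False
        // ...but the paragraph text as a whole does.
        Console.WriteLine(ParagraphText.Of(p)); // [[bad place holder]]
    }
}
```

A search that looks at individual w:t elements therefore misses this placeholder, which is why the runs need to be merged before replacing.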

Is there any way to have Microsoft Word clean up my document so that all placeholders are easy to identify in the generated XML?

Recommended answer

I found a solution: the Open XML PowerTools Markup Simplifier.

I followed the steps described at ericwhite/blog/2011/03/09/getting-started-with-open-xml-powertools-markup-simplifier/, but they did not work exactly as written (perhaps because PowerTools is now at version 2.2). So I compiled PowerTools 2.2 in "Release" mode and added a reference to OpenXmlPowerTools.dll in my TestMarkupSimplifier.csproj. In Program.cs I changed only the path to my DOCX file. I ran the program once, and my document now seems fairly clean.

Code quoted from Eric's blog at the link above:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using OpenXmlPowerTools;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        // Open the document for editing (second argument: isEditable = true).
        using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", true))
        {
            // Choose which kinds of markup the simplifier should strip.
            SimplifyMarkupSettings settings = new SimplifyMarkupSettings
            {
                RemoveComments = true,
                RemoveContentControls = true,
                RemoveEndAndFootNotes = true,
                RemoveFieldCodes = false,
                RemoveLastRenderedPageBreak = true,
                RemovePermissions = true,
                RemoveProof = true,
                RemoveRsidInfo = true,
                RemoveSmartTags = true,
                RemoveSoftHyphens = true,
                ReplaceTabsWithSpaces = true,
            };
            MarkupSimplifier.SimplifyMarkup(doc, settings);
        }
    }
}
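Once the markup has been simplified, each placeholder should sit inside a single w:t element, so the replacement the question asks about becomes a plain string substitution. The following is a rough sketch under that assumption, using the Open XML SDK; the names PlaceholderFiller, ReplaceAll, Fill, and FillDemo, as well as the sample values, are hypothetical and do not come from the blog post. It only touches the main document part, so headers and footers would need the same treatment, and a placeholder whose pieces carry different formatting may still be split across runs.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class PlaceholderFiller
{
    // Replaces every known placeholder that occurs in one text string.
    public static string ReplaceAll(string text, IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> pair in values)
        {
            text = text.Replace(pair.Key, pair.Value);
        }
        return text;
    }

    // Applies ReplaceAll to every w:t element of the main document part.
    public static void Fill(string path, IDictionary<string, string> values)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Document document = doc.MainDocumentPart.Document;
            foreach (Text t in document.Descendants<Text>())
            {
                t.Text = ReplaceAll(t.Text, values);
            }
            document.Save();
        }
    }
}

class FillDemo
{
    static void Main()
    {
        // Hypothetical values; in the question these would come from a database.
        var values = new Dictionary<string, string>
        {
            { "[[good place holder]]", "value A" },
            { "[[bad place holder]]", "value B" },
        };
        PlaceholderFiller.Fill("Test.docx", values);
    }
}
```

Keeping ReplaceAll separate from the file handling makes the substitution logic easy to check without a DOCX file at hand.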


Published: 2023-11-16 14:38:45
Article link: https://www.elefans.com/category/jswz/34/1605185.html