Change font programmatically in Acrobat Pro 9.1

Updated: 2024-10-26 11:20:31

I have a large PDF file that uses a number of fonts. I have to export it to another application that only recognizes Arial or Times New Roman fonts. Is it possible to do this in JavaScript? I tried this with no luck:

```javascript
/* Changes font to Arial */
var ckWord, numWords;
for (var i = 0; i < this.numPages; i++) {
    numWords = this.getPageNumWords(i);
    for (var j = 0; j < numWords; j++) {
        // getPageNthWord() returns a plain String, not a text object,
        // so ckWord.font is undefined and the assignment below has
        // no effect on the page.
        ckWord = this.getPageNthWord(i, j);
        if (ckWord.font != "Arial") {
            ckWord.font = "Arial";
        }
    }
}
```

Best answer


Acrobat's JS object model won't let you change page contents, no.

Kludging one font into another is generally a bad idea anyway, visually speaking. The appropriate spacing between letters can vary enough from one font to another that your output would look... well... suboptimal. This distorted spacing can also throw off "word finder" algorithms, causing them to see word breaks where there are none, or to treat two or more words as one big word.
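To make the spacing problem concrete, here is a small sketch in plain JavaScript (runnable in Node; it does not use Acrobat's API). It assumes one simplified model: each glyph keeps the slot its original font's widths gave it, but is drawn with a substitute font's width. The width tables and the 150-unit gap threshold are made-up illustrative values, not real Arial or Times New Roman metrics, and `findWords` is a deliberately naive stand-in for real word-finding heuristics.

```javascript
// Made-up advance widths (1/1000 em) for illustration only;
// these are NOT real Arial or Times New Roman metrics.
const originalWidths = { m: 900, i: 250, n: 550, d: 550 };
const substituteWidths = { m: 700, i: 300, n: 500, d: 500 };

// Simplification: each glyph keeps the x position computed from the
// ORIGINAL font's widths when the PDF was written.
function layout(text, widths) {
  const xs = [];
  let x = 0;
  for (const ch of text) {
    xs.push(x);
    x += widths[ch];
  }
  return xs;
}

// Gap between each glyph's right edge and the next glyph's left edge when
// the glyphs are drawn with different widths at the original positions.
// Positive = extra space, negative = overlap.
function gapsAfterSubstitution(text, positions, drawnWidths) {
  const chars = [...text];
  const gaps = [];
  for (let k = 0; k + 1 < chars.length; k++) {
    gaps.push(positions[k + 1] - (positions[k] + drawnWidths[chars[k]]));
  }
  return gaps;
}

// Naive word finder: any gap wider than `threshold` is treated as a word break.
function findWords(text, gaps, threshold) {
  const chars = [...text];
  if (chars.length === 0) return [];
  const words = [];
  let current = chars[0];
  for (let k = 0; k < gaps.length; k++) {
    if (gaps[k] > threshold) {
      words.push(current);
      current = "";
    }
    current += chars[k + 1];
  }
  words.push(current);
  return words;
}

const text = "mind";
const positions = layout(text, originalWidths);
const gaps = gapsAfterSubstitution(text, positions, substituteWidths);
console.log(gaps);                       // [ 200, -50, 50 ]
console.log(findWords(text, gaps, 150)); // [ 'm', 'ind' ]
```

Here the substitute "m" is narrower than its slot, so a gap opens after it and the naive finder splits "mind" into "m" and "ind", while the wider substitute "i" overlaps the "n" next to it.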

Not pretty.

It's also quite possible that the real problem is the font itself. More likely still, the problem is the font's encoding rather than the font itself: the way bytes in the content stream are interpreted as characters.
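A minimal JavaScript sketch of that point, assuming a four-byte string taken from a content stream (the byte values are arbitrary examples). Read with a simple one-byte encoding (Latin-1 stands in for it here), each byte is one character. Read with Identity-H, every two bytes form one CID, a glyph identifier whose meaning depends entirely on the font.

```javascript
// Four bytes as they might appear in a text-showing operator (example values).
const bytes = [0x00, 0x41, 0x00, 0x42];

// Simple font encoding: one byte = one character code. Latin-1 is used here
// only as a stand-in for whatever single-byte encoding the font declares.
function decodeSingleByte(bytes) {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

// Identity-H: two bytes per code, high-order byte first, and each code is a
// CID (a glyph identifier inside the font), not a Unicode code point.
// Nothing here says CID 65 is the letter "A"; that depends on the font.
function decodeIdentityH(bytes) {
  const cids = [];
  for (let k = 0; k + 1 < bytes.length; k += 2) {
    cids.push((bytes[k] << 8) | bytes[k + 1]);
  }
  return cids;
}

console.log(JSON.stringify(decodeSingleByte(bytes))); // "\u0000A\u0000B"
console.log(decodeIdentityH(bytes));                  // [ 65, 66 ]
```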

You can see the encoding used by each font on the "Fonts" tab of the Document Properties dialog (Ctrl+D). I suspect your non-Arial fonts are using something unusual... "Identity-H" or "Custom", most likely.

Changing the encoding of text in a PDF is a Very Hard Problem.

Finally, to see if it's even theoretically possible to extract the text, try to copy and paste it out of the PDF in Acrobat. If you can do that, then some other program can too. If you cannot (or it comes out as garbage), then other programs are likely to face a similar lack of success.
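Whether copy and paste works usually comes down to whether the PDF still carries a mapping from character codes back to Unicode (typically a ToUnicode CMap). The sketch below is a simplified model, not how Acrobat actually extracts text: the CID values and the `toUnicode` table are hypothetical, and codes with no mapping are emitted as U+FFFD, standing in for the garbage you get when the mapping is missing. Real extractors also try other fallbacks, such as glyph names or standard encodings.

```javascript
// Hypothetical ToUnicode-style table for one embedded font: CID -> text.
// In a real PDF this lives in a CMap stream; the numbers here are arbitrary.
const toUnicode = new Map([
  [36, "H"],
  [72, "e"],
  [79, "l"],
  [82, "o"],
]);

// Turn a run of CIDs back into text. A CID with no entry cannot be recovered,
// so it is emitted as U+FFFD (the replacement character) instead.
function extractText(cids, map) {
  return cids.map((cid) => (map.has(cid) ? map.get(cid) : "\uFFFD")).join("");
}

const cids = [36, 72, 79, 79, 82];
console.log(extractText(cids, toUnicode)); // Hello
console.log(extractText(cids, new Map())); // �����
```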

At that point the only thing you can do is OCR (optical character recognition). I believe Acrobat Pro comes with a simple OCR program, though I could be mistaken; I've never used it.


Published: 2023-07-29 18:26:00

Article link: https://www.elefans.com/category/jswz/34/1318604.html