Rendering the whole media box of a PDF page into a PNG file with Ghostscript

Updated: 2024-10-28 13:16:53

Problem description

I'm trying to render PDF pages into PNG files using Ghostscript v9.02. For that purpose I'm using the following command line:

gswin32c.exe -sDEVICE=png16m -o outputFile%d.png mypdf.pdf

This works fine when the PDF's crop box is the same as its media box. But if the crop box is smaller than the media box, only the crop box is rendered and the border of the PDF page is lost. I know PDF viewers usually display only the crop box, but I need to see the whole media box in my PNG file.

The Ghostscript documentation says that the media box of a document is rendered by default, but this does not work in my case. Does anyone have an idea how I could render the whole media box using Ghostscript? Could it be that the PNG devices render only the crop box? Am I perhaps forgetting a specific option?

For example, this PDF contains some registration marks outside of the crop box, which are not present in the output PNG file. Some more information about this PDF:

  • media box:
    • width: 667 pts
    • height: 908 pts
  • crop box:
    • width: 640 pts
    • height: 851 pts

Recommended answer

OK, now that revers has restated his problem as looking for "generic code", let me try again.

The problem with "generic code" is that there are many legal representations of a CropBox statement that could appear in a PDF. All of the following are possible and correct, and all set the same values for the page's CropBox:

      /CropBox[10 20 500 700]
      /CropBox[ 10 20 500 700 ]
      /CropBox[10 20 500 700 ]
      /CropBox [10 20 500 700]
      /CropBox [ 10 20 500 700 ]
      /CropBox [ 10.00 20.0000 500.0 700 ]
      /CropBox [ 10 20 500 700 ]

The same is true for ArtBox, TrimBox, BleedBox and MediaBox. Therefore you need to "normalize" the *Box representation inside the PDF source code if you want to edit it.
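To illustrate why the variants listed above are equivalent, here is a minimal Python sketch that reads any of them into the same tuple of numbers. The names `BOX_RE` and `parse_box` are made up for this illustration, and the pattern only covers a box written as a direct array in uncompressed text; it is not a PDF parser.

```python
import re

# Hypothetical helper: matches "/<Name>Box [ n n n n ]" with any amount
# of whitespace (including line breaks) around the brackets and numbers.
BOX_RE = re.compile(
    r"/(ArtBox|TrimBox|BleedBox|CropBox|MediaBox)\s*\[\s*([^\]]*?)\s*\]"
)


def parse_box(text):
    """Return (box_name, (x1, y1, x2, y2)) for the first box found, else None."""
    match = BOX_RE.search(text)
    if match is None:
        return None
    name, body = match.group(1), match.group(2)
    # split() without arguments collapses runs of blanks and newlines
    return name, tuple(float(number) for number in body.split())


variants = [
    "/CropBox[10 20 500 700]",
    "/CropBox[ 10 20 500 700 ]",
    "/CropBox [ 10.00 20.0000 500.0 700 ]",
    "/CropBox [\n  10\n  20\n  500\n  700\n]",
]
parsed = [parse_box(variant) for variant in variants]
```

Every entry of `parsed` holds the same name and the same four values, which is exactly why a plain text search for one fixed spelling is not reliable before the file has been normalized.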

Here is how you do it:

  • Download qpdf for your OS platform.
  • Run this command on your input PDF: qpdf --qdf input.pdf output.pdf

The output.pdf will now have a normalized structure (similar to the last example given above), and it will be easier to edit, even with a stream editor like sed.

Next, you need to know that the only essential *Box is MediaBox. This one MUST be present; the others are optional and fall back in a prioritized way: a missing CropBox defaults to MediaBox, and a missing BleedBox, TrimBox or ArtBox defaults to CropBox. So if all the others are missing, they take the same values as MediaBox. Therefore, to achieve your goal, we can simply delete all code that is related to them; the sed command below does this for CropBox. We'll do it with the help of sed.

sed is normally installed on virtually all Linux systems. On Windows, download and install it from gnuwin32.sf. (Should you decide to use the .zip file instead of the Setup .exe, don't forget to also install the packages listed as "dependencies".)
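To make the defaulting rule for the optional boxes concrete, here is a small Python sketch. It assumes a page is given as a plain dict of box name to coordinates, which is a simplification made up for this example (in a real PDF the MediaBox may also be inherited from a parent node). The CropBox offsets are hypothetical; only its size of 640 x 851 pts comes from the question.

```python
def effective_boxes(page):
    """Resolve all five page boxes, applying the defaults for missing ones."""
    if "MediaBox" not in page:
        raise ValueError("MediaBox is the only required box")
    media = page["MediaBox"]
    # A missing CropBox falls back to the MediaBox ...
    crop = page.get("CropBox", media)
    # ... and the remaining boxes fall back to the (effective) CropBox.
    return {
        "MediaBox": media,
        "CropBox": crop,
        "BleedBox": page.get("BleedBox", crop),
        "TrimBox": page.get("TrimBox", crop),
        "ArtBox": page.get("ArtBox", crop),
    }


# Page similar to the one in the question (offsets are assumed values).
with_crop = effective_boxes(
    {"MediaBox": (0, 0, 667, 908), "CropBox": (13.5, 28.5, 653.5, 879.5)}
)
# The same page once the CropBox entry has been removed.
without_crop = effective_boxes({"MediaBox": (0, 0, 667, 908)})
```

In `without_crop` every box equals the MediaBox, which is the state the editing step described next is meant to reach.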

Now run this command:

      sed.exe -i.bak -e "/CropBox/,/]/s#.# #g" output.pdf

Here is what this command is supposed to do:

  • -i.bak tells sed to edit the original file in place, but also to create a backup file with a .bak suffix (in case something goes wrong).
  • /CropBox/ gives the first line of the address range to be processed by sed.
  • /]/ gives the last line of the address range to be processed by sed.
  • s tells sed to perform a substitution on every line from the first to the last addressed line.
  • #.# #g tells sed which substitution to perform: replace each arbitrary character ('.') in the addressed lines by a blank (' '), globally ('g').

We replace all characters with blanks (instead of with 'nothing', i.e. deleting them) because otherwise we'd get complaints about "PDF file corruption", since the byte offsets of the objects and the stream lengths would have changed.
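The length-preserving idea can also be sketched in Python for readers who have no sed at hand. This is not the sed command itself: instead of blanking whole lines it blanks only the `/CropBox [...]` entry, and it keeps line breaks untouched. The name `blank_cropbox` is hypothetical, and the sketch assumes the CropBox is written as a direct array in uncompressed text, as in the normalized output.pdf.

```python
import re

# Matches a CropBox entry with a direct array value, on one or many lines.
CROPBOX_RE = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def blank_cropbox(pdf_bytes):
    """Overwrite every CropBox entry with spaces, keeping the byte count."""

    def blank(match):
        # Replace everything except line breaks, so each object stays at
        # the same byte offset and the file keeps its line structure.
        return re.sub(rb"[^\r\n]", b" ", match.group(0))

    return CROPBOX_RE.sub(blank, pdf_bytes)


sample = (
    b"<<\n"
    b"  /CropBox [\n    13.5\n    28.5\n    653.5\n    879.5\n  ]\n"
    b"  /MediaBox [\n    0\n    0\n    667\n    908\n  ]\n"
    b">>\n"
)
blanked = blank_cropbox(sample)
```

`blanked` has exactly as many bytes as `sample`, so nothing that follows the edited entry moves to a different position in the file.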

This step you already know:

      gswin32c.exe -sDEVICE=png16m -o outputImage_%03d.png output.pdf

All three steps above can easily be scripted, which I'll leave to you for your own pleasure.
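One possible way to script the three steps is sketched below in Python. The commands and flags are the ones given above; the function names and default file names are assumptions made for this example, and it presumes that qpdf, sed and the Ghostscript executable can be found on the PATH.

```python
import subprocess


def build_commands(input_pdf, qdf_pdf, png_pattern, gs="gswin32c.exe", sed="sed.exe"):
    """Return the three command lines of the workflow as argument lists."""
    return [
        # Step 1: normalize the PDF structure with qpdf.
        ["qpdf", "--qdf", input_pdf, qdf_pdf],
        # Step 2: blank out the CropBox entries in place (keeps a .bak copy).
        [sed, "-i.bak", "-e", "/CropBox/,/]/s#.# #g", qdf_pdf],
        # Step 3: render every page to a PNG file with Ghostscript.
        [gs, "-sDEVICE=png16m", "-o", png_pattern, qdf_pdf],
    ]


def render_media_box(input_pdf, qdf_pdf="output.pdf",
                     png_pattern="outputImage_%03d.png"):
    """Run the three steps in order and stop at the first failing one."""
    for command in build_commands(input_pdf, qdf_pdf, png_pattern):
        # Passing a list avoids shell quoting issues with the sed expression.
        # check=True raises CalledProcessError on any nonzero exit code; a
        # tool may also use nonzero codes for mere warnings, so relax this
        # check if that applies to your files.
        subprocess.run(command, check=True)
```

Calling `render_media_box("mypdf.pdf")` would then produce outputImage_001.png, outputImage_002.png and so on, one file per page. On Linux the executables would typically be named `gs` and `sed`, which can be passed to `build_commands` through its `gs` and `sed` parameters.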


Published: 2023-10-30 00:10:42
Article link: https://www.elefans.com/category/jswz/34/1541178.html