I'm trying to render PDF pages into PNG files using Ghostscript v9.02. For that purpose I'm using the following command line:
gswin32c.exe -sDEVICE=png16m -o outputFile%d.png mypdf.pdf
This works fine when the PDF's crop box is the same as the media box, but if the crop box is smaller than the media box, only the crop box is rendered and the border of the PDF page is lost. I know PDF viewers usually display only the crop box, but I need to be able to see the whole media box in my PNG file.
The Ghostscript documentation says that by default the media box of a document is rendered, but this does not work in my case. Does anyone have an idea how I could render the whole media box using Ghostscript? Could it be that for the PNG file device only the crop box is rendered? Am I maybe forgetting a specific option?
For example, this pdf contains some registration marks outside of the crop box, which are not present in the output PNG file. Some more information about this pdf:
- media box:
  - width: 667 pts
  - height: 908 pts
- crop box:
  - width: 640 pts
  - height: 851 pts
OK, now that revers has restated his problem to say that he is looking for "generic code", let me try again.
The problem with "generic code" is that there are many "legal" formal representations of a "CropBox" statement which could appear in a PDF. All of the following are possible and correct, and set the same values for the page's CropBox:
/CropBox[10 20 500 700]
/CropBox[ 10 20 500 700 ]
/CropBox[10 20 500 700 ]
/CropBox [10 20 500 700]
/CropBox [ 10 20 500 700 ]
/CropBox [ 10.00 20.0000 500.0 700 ]
/CropBox [ 10 20 500 700 ]
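To make the point concrete, here is a minimal Python sketch (my own illustration, not something taken from a PDF library) of a whitespace-tolerant pattern that reads all of the spellings listed above as the same box. The names `CROPBOX_RE` and `parse_cropbox` are made up for this example, and it assumes the four values are plain numbers written directly in the array, not indirect object references.

```python
import re

# One PDF-style number such as 10, 10.00 or .5 (a simplification of the real syntax).
NUM = rb"[-+]?\d*\.?\d+"

# Hypothetical pattern: "/CropBox", optional whitespace, then four numbers in brackets.
# \s also matches line breaks, so an array spread over several lines is accepted too.
CROPBOX_RE = re.compile(
    rb"/CropBox\s*\[\s*(" + NUM + rb")\s+(" + NUM + rb")\s+("
    + NUM + rb")\s+(" + NUM + rb")\s*\]"
)


def parse_cropbox(pdf_bytes):
    """Return the first literal CropBox as a tuple of four floats, or None."""
    match = CROPBOX_RE.search(pdf_bytes)
    if match is None:
        return None
    return tuple(float(group.decode("ascii")) for group in match.groups())


variants = [
    b"/CropBox[10 20 500 700]",
    b"/CropBox[ 10 20 500 700 ]",
    b"/CropBox[10 20 500 700 ]",
    b"/CropBox [10 20 500 700]",
    b"/CropBox [ 10 20 500 700 ]",
    b"/CropBox [ 10.00 20.0000 500.0 700 ]",
]

for variant in variants:
    print(variant.decode("ascii"), "->", parse_cropbox(variant))
```

Every variant parses to the same four values, which is why a naive fixed-string search is not enough and why normalizing the file first makes life easier.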
The same is true for ArtBox, TrimBox, BleedBox, CropBox and MediaBox. Therefore you need to "normalize" the *Box representation inside the PDF source code if you want to edit it.
Here is how you do it:
- Download qpdf for your OS platform.
- Run this command on your input PDF: qpdf --qdf input.pdf output.pdf
The output.pdf will now have a kind of normalized structure (similar to the last example given above), and it will be easier to edit, even with a stream editor like sed.
Next, you need to know that the only essential *Box is MediaBox. This one MUST be present; the others are optional (in a certain prioritized way). If the others are missing, they default to the same values as MediaBox. Therefore, in order to achieve your goal, we can simply delete all code that is related to them. We'll do it with the help of sed.
That tool is normally installed on all Linux systems. On Windows, download and install it from gnuwin32.sf. (Don't forget to install the named "dependencies" should you decide to use the .zip file instead of the Setup .exe.)
Now run the following command:
sed.exe -i.bak -e "/CropBox/,/]/s#.# #g" output.pdf
Here is what this command is supposed to do:
- -i.bak tells sed to edit the original file in place, but to also create a backup file with a .bak suffix (in case something goes wrong).
- /CropBox/ marks the first line of the address range to be processed by sed.
- /]/ marks the last line of the address range to be processed by sed.
- s tells sed to do substitutions on all lines from the first to the last addressed line.
- #.# #g tells sed which kind of substitution to do, with # as the delimiter: replace each arbitrary character ('.') in the address range by a blank (' '), globally ('g').
We substitute all characters by blanks (instead of by 'nothing', i.e. deleting them) because otherwise we'd get complaints about "PDF file corruption", since the byte offsets of the objects and the stream lengths would have changed.
You already know this one:
gswin32c.exe -sDEVICE=png16m -o outputImage_%03d.png output.pdf
All three steps above can easily be scripted, which I'll leave to you for your own pleasure.
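If you want a starting point, here is a rough Python sketch of such a script. It is only one possible way to do it, under a few assumptions: qpdf and gswin32c.exe are on your PATH, every CropBox is written as a literal array (not an indirect reference), and the function names (`blank_cropbox`, `qpdf_command`, `ghostscript_command`, `render_media_box`) are made up for this example. Note that it swaps sed for a small Python function, which blanks only the /CropBox [...] entry itself, whereas the sed command blanks every character on the lines in its address range.

```python
import re
import subprocess

# "/CropBox", optional whitespace, then everything up to the next "]" (may span lines).
CROPBOX_ENTRY = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def blank_cropbox(data):
    """Overwrite each literal /CropBox [...] entry with spaces, keeping the byte length."""
    def blank(match):
        # Keep line breaks and replace every other byte with a space.
        return re.sub(rb"[^\r\n]", b" ", match.group(0))
    return CROPBOX_ENTRY.sub(blank, data)


def qpdf_command(src, dst):
    """Step 1: normalize the PDF into qpdf's editable QDF form."""
    return ["qpdf", "--qdf", src, dst]


def ghostscript_command(pdf, pattern="outputImage_%03d.png", gs="gswin32c.exe"):
    """Step 3: the same Ghostscript call as above, as an argument list."""
    return [gs, "-sDEVICE=png16m", "-o", pattern, pdf]


def render_media_box(src, normalized="output.pdf", gs="gswin32c.exe"):
    """Run all three steps: normalize, blank the CropBox entries, render to PNG."""
    subprocess.run(qpdf_command(src, normalized), check=True)
    with open(normalized, "rb") as handle:
        data = handle.read()
    with open(normalized, "wb") as handle:
        handle.write(blank_cropbox(data))
    subprocess.run(ghostscript_command(normalized, gs=gs), check=True)


# Small demonstration of step 2 on a made-up page dictionary fragment.
sample = b"<< /MediaBox [ 0 0 667 908 ]\n/CropBox [ 10 20 500 700 ]\n>>"
cleaned = blank_cropbox(sample)
print(cleaned.decode("ascii"))
print("length unchanged:", len(cleaned) == len(sample))
```

Calling render_media_box("input.pdf") would then run the whole chain. Unlike sed -i.bak, this sketch does not keep a backup of output.pdf, so add one if you need it.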