How can I determine whether a file is a PDF?

Updated: 2024-10-20 16:37:18

Problem description

I am using PdfBox in Java to extract text from PDF files. Some of the input files I receive are not valid PDFs, and PDFTextStripper halts on them. Is there a clean way to check whether a given file is actually a valid PDF?

Recommended answer

You can determine the MIME type of a file (or byte array) from its content, so you do not have to rely blindly on the extension. I do this with Aperture's MimeExtractor (aperture.sourceforge/), and a few days ago I also saw a library dedicated to this (sourceforge/projects/mime-util).
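To illustrate what content-based detection means for PDFs specifically, here is a minimal sketch that checks for the `%PDF-` signature at the start of the data. It is a hand-rolled check, not Aperture's or mime-util's API; the class name `PdfSniffer` and the method `looksLikePdf` are made up for this example. It assumes the signature sits at offset 0, which is stricter than some PDF readers, and the `Path` overload assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: a quick pre-filter, not a full validity check.
final class PdfSniffer {
    // A PDF file begins with "%PDF-" followed by a version number such as 1.7.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {}

    /** Returns true if the data starts with the PDF signature. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Reads only the first few bytes of the file and checks the signature. */
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes(int) is available from Java 11.
            byte[] head = in.readNBytes(PDF_MAGIC.length);
            return looksLikePdf(head);
        }
    }
}
```

A file that passes this check can still be truncated or corrupt, so the exceptions thrown while parsing it with PdfBox still need to be handled; the check only filters out files that are obviously something else, such as an HTML error page saved with a `.pdf` extension.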

I use Aperture to extract text from a variety of file types, not only PDF, but some formats need tweaking, PDFs for example: Aperture uses PdfBox, and I added another library as a fallback for when PdfBox fails.
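The fallback idea can be sketched generically as follows. This is not the answerer's actual code: the `TextExtractor` interface and the `extractWithFallback` method are hypothetical names introduced for illustration, and each extractor would wrap a real library (PdfBox first, then the fallback library) in practice.

```java
import java.util.List;
import java.util.Optional;

final class FallbackExtraction {
    /** Hypothetical abstraction over one text-extraction library. */
    interface TextExtractor {
        String extract(byte[] pdf) throws Exception;
    }

    private FallbackExtraction() {}

    /**
     * Tries each extractor in order and returns the first result.
     * Returns an empty Optional if every extractor fails.
     */
    static Optional<String> extractWithFallback(byte[] pdf, List<TextExtractor> extractors) {
        for (TextExtractor extractor : extractors) {
            try {
                String text = extractor.extract(pdf);
                if (text != null) {
                    return Optional.of(text);
                }
            } catch (Exception e) {
                // This library could not handle the file; move on to the next one.
            }
        }
        return Optional.empty();
    }
}
```

This only covers failures that surface as exceptions. A parser that hangs on a malformed file would additionally need a timeout, for example by running the extraction on a separate thread.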