I'm using Mail_mimeDecode to extract attachments from incoming emails. Everything was working well for a while, until I started receiving attachments with filenames encoded in KOI8, with a section header like this:
Content-Disposition: attachment; filename="=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?="
mimeDecode does a perfectly reasonable thing and returns the filename in KOI8:
$attachmentNameInKOI8 = $part->d_parameters['filename'];
The problem is that I need it in UTF-8. In this specific example, I can run the following to do the conversion:
$attachmentNameInUTF8 = iconv('KOI8-R', 'UTF-8', $attachmentNameInKOI8);
But without trying to parse the message manually, I don't know when the name is in KOI8 and when it's not. I'm also worried that some other encoding will come through soon, so I need a way to handle anything that might come my way.
I had read that mb_detect_encoding is not reliable, and in fact I could not get it to detect the string as KOI8.
Is there a way to tell mimeDecode to do the translation for me? I looked at the source code of mimeDecode.php:_decodeHeader() and I can see that it parses the charset but then does nothing with it, which seems like a wasted opportunity.
UPDATE: To be clear, this is only a problem with headers and not with bodies because mimeDecode exposes the charset of the body, so it's very easy to run iconv yourself like this:
$bodyutf = iconv($textpart->ctype_parameters['charset'], 'UTF-8', $textpart->body);
Best answer
Adding a line to _decodeHeader() before the replace seems to do the trick:
$text = iconv($charset, 'UTF-8', $text);
$input = str_replace($encoded, $text, $input);
Seems weird that they didn't build some such option into the original class, doesn't it?
NOTE: I've since noticed that Subject lines and other headers can also be encoded the same way as filenames (RFC2047). It appears that adding the iconv line into _decodeHeader addresses all these cases.
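To make the idea concrete, here is a minimal standalone sketch of RFC 2047 decoding with the charset conversion included. It is not the code from mimeDecode.php: the function name decodeHeaderToUtf8 and the regular expressions are my own assumptions for illustration, and it assumes the iconv extension is available.

```php
<?php
/**
 * Hypothetical helper (not part of Mail_mimeDecode): decode RFC 2047
 * encoded-words in a header value and convert each one to UTF-8.
 */
function decodeHeaderToUtf8(string $input): string
{
    $word = '=\?[^?]+\?[bq]\?[^?]*\?=';

    // Whitespace between two adjacent encoded-words is not significant, so drop it.
    $input = preg_replace('/(' . $word . ')\s+(?=' . $word . ')/i', '$1', $input);

    return preg_replace_callback(
        '/=\?([^?]+)\?([bq])\?([^?]*)\?=/i',
        function (array $m): string {
            [, $charset, $encoding, $text] = $m;

            if (strtolower($encoding) === 'b') {
                $text = base64_decode($text);
            } else {
                // Q encoding: "_" stands for a space, "=XX" is a hex-escaped byte.
                $text = quoted_printable_decode(str_replace('_', ' ', $text));
            }

            // The extra step discussed above: convert from the declared charset.
            $converted = @iconv($charset, 'UTF-8', $text);

            // Fall back to the decoded bytes if iconv does not know the charset.
            return $converted === false ? $text : $converted;
        },
        $input
    );
}

// The filename from the question, now readable as UTF-8.
echo decodeHeaderToUtf8('=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?='), "\n";
```

Because the conversion happens per encoded-word, a header that mixes several charsets still ends up as a single UTF-8 string.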
Weird that such a feature wasn't already built into mimeDecode--this can't be a rare problem.
EDIT: I now understand that the point of mimeDecode having a decode_headers=false option is to get the raw values so you can decode them yourself. That seems like a waste: there is no point in having mimeDecode decode your headers at all if you can't trust it to return a string in an expected charset. It would make more sense for it to accept a target charset as a parameter, with null meaning no decoding, though I have a feeling they're unlikely to change it for little me. So the point is that you need to do your own decoding. Unfortunately it's not as simple as a straight call to imap_utf8() or imap_mime_header_decode(). You could either take the _decodeHeader() function from mimeDecode and modify it, or use something like this:
http://www.php.net/manual/en/function.imap-mime-header-decode.php#71762
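Another possibility, if the iconv extension is installed, is PHP's built-in iconv_mime_decode(), which takes the target charset as its third argument. The sketch below assumes you already have the raw, still-encoded header value (for example from a run with decode_headers=false); the wrapper name rawHeaderToUtf8 is hypothetical, and I have not checked how it behaves on every malformed header you might receive.

```php
<?php
/**
 * Hypothetical wrapper: turn a raw RFC 2047 header value into UTF-8
 * using the iconv extension's own MIME header decoder.
 */
function rawHeaderToUtf8(string $raw): string
{
    // CONTINUE_ON_ERROR asks iconv to keep going past malformed encoded-words
    // instead of giving up on the whole header.
    $decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

    // On failure, return the input unchanged rather than losing the value.
    return $decoded === false ? $raw : $decoded;
}

// Raw filename parameter as it appears in the Content-Disposition header.
$raw = '=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?=';
echo rawHeaderToUtf8($raw), "\n";
```

Values that contain no encoded-words pass through unchanged, so the same call can be applied to every header and filename.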
EDIT #2: Unbelievably, the mimeDecode guys already incorporated my suggestion into their latest svn:
https://pear.php.net/bugs/bug.php?id=18876
On that version, you can now set decode_headers='UTF-8' and mimeDecode will do all the work for you. Wow!