How to automatically convert email attachment filename to UTF-8 (using Mail_mimeDecode)

Updated: 2024-10-27 09:41:40

I'm using Mail_mimeDecode to extract attachments from incoming emails. Everything was working well for a while, until I started receiving attachments with filenames encoded in KOI8, with a section header like this:

Content-Disposition: attachment; filename="=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?="

mimeDecode does a perfectly reasonable thing and returns the filename in KOI8:

$attachmentNameInKOI8 = $part->d_parameters['filename'];

The problem is that I need it in UTF-8. In this specific example, I can run the following to do the conversion:

$attachmentNameInUTF8 = iconv('KOI8', 'UTF-8', $attachmentNameInKOI8);

But without trying to parse the message manually, I don't know when the name is in KOI8 and when it's not. I'm also worried that some other encoding will come through soon, so I need a way to handle anything that might come my way.

I had read that mb_detect_encoding is not reliable, and in fact I could not get it to detect the string as KOI8.

Is there a way to tell mimeDecode to do the translation for me? I looked at the source code of mimeDecode.php:_decodeHeader() and I can see that it parses the encoding but then does nothing with it, which seems like a wasted opportunity.

UPDATE: To be clear, this is only a problem with headers and not with bodies because mimeDecode exposes the charset of the body, so it's very easy to run iconv yourself like this:

$bodyutf = iconv($textpart->ctype_parameters['charset'], 'UTF-8', $textpart->body);

Best answer

Adding a line to _decodeHeader() before the replace seems to do the trick:

$text = iconv($charset, 'UTF-8', $text);
$input = str_replace($encoded, $text, $input);

Seems weird that they didn't build some such option into the original class, doesn't it?

NOTE: I've since noticed that Subject lines and other headers can also be encoded the same way as filenames (RFC2047). It appears that adding the iconv line into _decodeHeader addresses all these cases.
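
For readers who would rather not patch the PEAR class, the same idea can be sketched as a standalone function: find each RFC 2047 encoded-word, undo the B (base64) or Q transfer encoding, then pass the bytes through iconv using the charset named in the encoded-word. The function name decodeMimeHeaderToUtf8 is made up for this illustration and is not part of Mail_mimeDecode. The sketch assumes the iconv extension recognizes the charset, and it leaves the encoded-word untouched when it does not.

```php
<?php
// Hypothetical helper (not part of Mail_mimeDecode): decode RFC 2047
// encoded-words in a header value and return the result as UTF-8.
function decodeMimeHeaderToUtf8(string $input): string
{
    // RFC 2047: whitespace between two adjacent encoded-words is ignored.
    $input = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $input);

    return preg_replace_callback(
        '/=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/',
        function (array $m): string {
            [$encoded, $charset, $encoding, $payload] = $m;

            if (strtoupper($encoding) === 'B') {
                $text = base64_decode($payload, true);
                if ($text === false) {
                    return $encoded; // malformed base64: keep the raw word
                }
            } else {
                // Q encoding: "_" stands for a space, "=XX" is a hex byte.
                $text = quoted_printable_decode(str_replace('_', ' ', $payload));
            }

            // Same conversion as the one-line patch, using the declared charset.
            $utf8 = @iconv($charset, 'UTF-8', $text);
            return $utf8 === false ? $encoded : $utf8;
        },
        $input
    );
}
```

With decode_headers turned off, a call such as decodeMimeHeaderToUtf8($part->d_parameters['filename']) would then do the conversion outside the library, assuming that parameter still holds the raw encoded-word.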

It's weird that such a feature wasn't already built into mimeDecode; this can't be a rare problem.

EDIT: I now understand that the point of mimeDecode's decode_headers=false option is to give you the raw values so you can decode them yourself. That seems a waste: there is no point in having mimeDecode decode your headers at all if you can't trust it to return a string in an expected charset. (It would make more sense for it to accept a target charset as a parameter, with null meaning no decoding, but I have a feeling they're unlikely to change it for little me.) So the point is that you need to do your own decoding. Unfortunately, it's not as simple as a straight call to imap_utf8() or imap_mime_header_decode(). You could either take the _decodeHeader() function from mimeDecode and modify it, or use something like this:

http://www.php.net/manual/en/function.imap-mime-header-decode.php#71762
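
If the iconv extension is available, PHP's built-in iconv_mime_decode() is another option worth trying on raw header values, since it takes the target charset as its third argument. This is a different technique from the linked user note and is shown only as a sketch. The wrapper name rawHeaderToUtf8 is hypothetical, and falling back to the raw value on failure is an assumption about what you would want.

```php
<?php
// Hypothetical wrapper around PHP's iconv_mime_decode().
function rawHeaderToUtf8(string $raw): string
{
    // ICONV_MIME_DECODE_CONTINUE_ON_ERROR asks iconv to carry on past
    // malformed encoded-words instead of aborting the whole decode.
    $decoded = @iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

    // Assumption: when decoding fails outright, return the input unchanged.
    return $decoded === false ? $raw : $decoded;
}
```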

EDIT #2: Unbelievably, the mimeDecode guys already incorporated my suggestion into their latest svn:

https://pear.php.net/bugs/bug.php?id=18876

On that version, you can now set decode_headers='UTF-8' and mimeDecode will do all the work for you. Wow!
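
A minimal sketch of what that might look like in practice, assuming PEAR Mail_mimeDecode is on the include path at a version that contains the change from the bug report above. The helper names utf8DecodeParams and attachmentFilenames are invented for this example; only the decode_headers value comes from the answer itself.

```php
<?php
// Decode parameters. Passing 'UTF-8' for decode_headers relies on the
// change described in PEAR bug 18876; earlier code only had on/off.
function utf8DecodeParams(): array
{
    return [
        'include_bodies' => true,
        'decode_bodies'  => true,
        'decode_headers' => 'UTF-8',
    ];
}

// Hypothetical helper: collect attachment filenames from a raw message.
function attachmentFilenames(string $rawMessage): array
{
    // Assumes PEAR Mail_mimeDecode is installed and on the include path.
    require_once 'Mail/mimeDecode.php';

    $decoder   = new Mail_mimeDecode($rawMessage);
    $structure = $decoder->decode(utf8DecodeParams());

    $names = [];
    // Walk the MIME tree; nested multiparts expose their children in ->parts.
    $walk = function ($part) use (&$walk, &$names): void {
        if (isset($part->d_parameters['filename'])) {
            $names[] = $part->d_parameters['filename'];
        }
        foreach ($part->parts ?? [] as $child) {
            $walk($child);
        }
    };
    $walk($structure);

    return $names;
}
```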

Published: 2023-07-22 19:33:00
Link: https://www.elefans.com/category/jswz/34/1222789.html