Encoding.UTF8.GetString doesn't take the preamble / BOM into account

Updated: 2024-10-23 15:28:17

Problem description

In .NET, I'm trying to use the Encoding.UTF8.GetString method, which takes a byte array and converts it to a string.

It looks like this method ignores the BOM (Byte Order Mark), which might be a part of a legitimate binary representation of a UTF8 string, and takes it as a character.

I know I can use a TextReader to digest the BOM as needed, but I thought that the GetString method should be some kind of a macro that makes our code shorter.

Am I missing something? Is this behavior intentional?

Here's code that reproduces it:

    using System;
    using System.IO;
    using System.Linq;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string s1 = "abc";

            // Write "abc" through a StreamWriter whose encoding emits the UTF-8 preamble.
            byte[] abcWithBom;
            using (var ms = new MemoryStream())
            using (var sw = new StreamWriter(ms, new UTF8Encoding(true)))
            {
                sw.Write(s1);
                sw.Flush();
                abcWithBom = ms.ToArray();
                Console.WriteLine(FormatArray(abcWithBom)); // ef, bb, bf, 61, 62, 63
            }

            // Same text, but with an encoding that does not emit the preamble.
            byte[] abcWithoutBom;
            using (var ms = new MemoryStream())
            using (var sw = new StreamWriter(ms, new UTF8Encoding(false)))
            {
                sw.Write(s1);
                sw.Flush();
                abcWithoutBom = ms.ToArray();
                Console.WriteLine(FormatArray(abcWithoutBom)); // 61, 62, 63
            }

            var restore1 = Encoding.UTF8.GetString(abcWithoutBom);
            Console.WriteLine(restore1.Length); // 3
            Console.WriteLine(restore1);        // abc

            var restore2 = Encoding.UTF8.GetString(abcWithBom);
            Console.WriteLine(restore2.Length); // 4 (!)
            Console.WriteLine(restore2);        // ?abc
        }

        // Formats the bytes as comma-separated lowercase hex.
        private static string FormatArray(byte[] bytes1)
        {
            return string.Join(", ", from b in bytes1 select b.ToString("x"));
        }
    }

Solution

"It looks like this method ignores the BOM (Byte Order Mark), which might be a part of a legitimate binary representation of a UTF8 string, and takes it as a character."

It doesn't look like it "ignores" it at all - it faithfully converts it to the BOM character. That's what it is, after all.
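To make that concrete, here is a minimal sketch that inspects the first decoded character. The class name BomCharacterDemo and the helper FirstCodePoint are made up for illustration; the byte array is the one printed by the question's repro.

```csharp
using System;
using System.Text;

static class BomCharacterDemo
{
    // Hypothetical helper: code point of the first decoded char, formatted as "U+XXXX".
    public static string FirstCodePoint(byte[] utf8Bytes)
    {
        string decoded = Encoding.UTF8.GetString(utf8Bytes);
        return "U+" + ((int)decoded[0]).ToString("X4");
    }

    static void Main()
    {
        // The bytes from the question: UTF-8 preamble followed by "abc".
        byte[] abcWithBom = { 0xEF, 0xBB, 0xBF, 0x61, 0x62, 0x63 };

        // The three preamble bytes decode to a single char, U+FEFF.
        Console.WriteLine(FirstCodePoint(abcWithBom));                         // U+FEFF
        Console.WriteLine(Encoding.UTF8.GetString(abcWithBom).Length);         // 4
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetPreamble())); // EF-BB-BF
    }
}
```

So the "extra" character is not garbage: it is the preamble decoded as an ordinary character, which many consoles render as "?" or as nothing at all.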

If you want to make your code ignore the BOM in any string it converts, that's up to you to do... or use StreamReader.
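Both options could look roughly like the sketch below. BomStripping, DecodeSkippingBom and DecodeWithReader are hypothetical names, not framework APIs; the sketch assumes the input is UTF-8 and only checks for the UTF-8 preamble.

```csharp
using System;
using System.IO;
using System.Text;

static class BomStripping
{
    // Hypothetical helper: decode UTF-8 bytes, skipping a leading preamble if present.
    public static string DecodeSkippingBom(byte[] bytes)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // EF BB BF
        bool hasBom = bytes.Length >= preamble.Length;
        for (int i = 0; hasBom && i < preamble.Length; i++)
        {
            hasBom = bytes[i] == preamble[i];
        }

        int offset = hasBom ? preamble.Length : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }

    // Alternative: let StreamReader detect and consume the preamble.
    public static string DecodeWithReader(byte[] bytes)
    {
        using (var ms = new MemoryStream(bytes))
        using (var reader = new StreamReader(ms, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        byte[] abcWithBom = { 0xEF, 0xBB, 0xBF, 0x61, 0x62, 0x63 };
        Console.WriteLine(DecodeSkippingBom(abcWithBom).Length); // 3
        Console.WriteLine(DecodeWithReader(abcWithBom).Length);  // 3
    }
}
```

The manual version avoids allocating a stream and a reader; the StreamReader version is shorter and also copes with input that has no BOM.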

Note that if you either use Encoding.GetBytes followed by Encoding.GetString or use StreamWriter followed by StreamReader, both forms will either produce then swallow or not produce the BOM. It's only when you mix using a StreamWriter (which uses Encoding.GetPreamble) with a direct Encoding.GetString call that you end up with the "extra" character.
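A small sketch of the three pairings, for comparison. The class and method names (RoundTripDemo, ViaEncoding, ViaStreams, Mixed, WriteWithStreamWriter) are invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

static class RoundTripDemo
{
    static readonly UTF8Encoding Utf8WithBom = new UTF8Encoding(true);

    // GetBytes followed by GetString: no preamble is ever produced.
    public static string ViaEncoding(string text)
    {
        byte[] bytes = Utf8WithBom.GetBytes(text);
        return Utf8WithBom.GetString(bytes);
    }

    // StreamWriter writes the preamble to the stream before the text.
    public static byte[] WriteWithStreamWriter(string text)
    {
        using (var ms = new MemoryStream())
        using (var sw = new StreamWriter(ms, Utf8WithBom))
        {
            sw.Write(text);
            sw.Flush();
            return ms.ToArray();
        }
    }

    // StreamWriter followed by StreamReader: the preamble is written, then swallowed.
    public static string ViaStreams(string text)
    {
        using (var ms = new MemoryStream(WriteWithStreamWriter(text)))
        using (var sr = new StreamReader(ms, Utf8WithBom))
        {
            return sr.ReadToEnd();
        }
    }

    // Mixed: StreamWriter output decoded directly keeps U+FEFF as the first char.
    public static string Mixed(string text)
    {
        return Utf8WithBom.GetString(WriteWithStreamWriter(text));
    }

    static void Main()
    {
        Console.WriteLine(ViaEncoding("abc").Length); // 3
        Console.WriteLine(ViaStreams("abc").Length);  // 3
        Console.WriteLine(Mixed("abc").Length);       // 4
    }
}
```

Only the mixed pairing returns four characters, which matches what the question's repro prints.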

Published: 2023-11-10 06:23:28
Article link: https://www.elefans.com/category/jswz/34/1574597.html