Serialize and deserialize char(s)

I have a list of chars in my class. Serialization and deserialization work as expected, except when the list contains a char that needs a byte order mark, such as char code 56256. I wrote the simple test below to reproduce the problem.

```csharp
[Test]
public void Utf8CharSerializeAndDeserializeShouldEqual()
{
    UInt16 charCode = 56256;
    char utfChar = (char)charCode;
    using (MemoryStream ms = new MemoryStream())
    {
        // leaveOpen: true, so the stream can be rewound and read after the writer is disposed
        using (StreamWriter writer = new StreamWriter(ms, Encoding.UTF8, 1024, true))
        {
            var serializer = new JsonSerializer();
            serializer.Serialize(writer, utfChar);
        }
        ms.Position = 0;
        using (StreamReader reader = new StreamReader(ms, true))
        {
            using (JsonTextReader jsonReader = new JsonTextReader(reader))
            {
                var serializer = new JsonSerializer();
                char deserializedChar = serializer.Deserialize<char>(jsonReader);
                Console.WriteLine($"{(int)utfChar}, {(int)deserializedChar}");
                Assert.AreEqual(utfChar, deserializedChar);
                Assert.AreEqual((int)utfChar, (int)deserializedChar);
            }
        }
    }
}
```

The test works fine when the char code does not need a BOM. For example, 65 ('A') passes this test.

Best answer

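Your problem is unrelated to Json.NET. Your problem is that U+DBC0 (decimal 56256) is an invalid Unicode character: it is a UTF-16 high surrogate, which is only meaningful when immediately followed by a low surrogate, and it has nothing to do with a byte order mark. As explained in the documentation, the Encoding.UTF8 used by your StreamWriter will not encode such a character: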

Encoding.UTF8 returns a UTF8Encoding object that uses replacement fallback to replace each string that it can't encode and each byte that it can't decode with a question mark ("?") character.
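Both points can be checked without Json.NET. The following is a minimal sketch using only the base class library, written as C# 9 top-level statements; the variable names are illustrative. To my knowledge, the replacement character UTF8Encoding actually substitutes is U+FFFD (decimal 65533) rather than a literal "?", so the test in the question most likely prints "56256, 65533". Either way, the original value is lost.

```csharp
using System;
using System.Text;

// 56256 == 0xDBC0, a UTF-16 high surrogate. It only forms a character together
// with a following low surrogate (0xDC00 to 0xDFFF); on its own it is not valid Unicode.
char lone = (char)56256;
Console.WriteLine(char.IsHighSurrogate(lone)); // True
Console.WriteLine(char.IsSurrogate('A'));      // False

// Encoding.UTF8 does not throw: its replacement fallback substitutes U+FFFD,
// which UTF-8 encodes as the three bytes EF BF BD.
byte[] utf8 = Encoding.UTF8.GetBytes(lone.ToString());
Console.WriteLine(BitConverter.ToString(utf8)); // EF-BF-BD

// Decoding gives back the replacement character, not the original value.
string roundTripped = Encoding.UTF8.GetString(utf8);
Console.WriteLine((int)roundTripped[0]);        // 65533
```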

To confirm this, if you replace Encoding.UTF8 with new UTF8Encoding(true, true) in your test example, you will get the following exception:

EncoderFallbackException: Unable to translate Unicode character \uDBC0 at index 1 to specified code page.
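The same experiment against the strict encoder, as a minimal BCL-only sketch (C# 9 top-level statements): the lone surrogate is rejected with an EncoderFallbackException while 'A' encodes normally. Here the index in the message should be 0, because the string holds only the surrogate; in the question's test it is presumably 1 because the opening quote of the JSON string is written first.

```csharp
using System;
using System.Text;

// Arguments: encoderShouldEmitUTF8Identifier: true, throwOnInvalidBytes: true.
// With the second argument set, unencodable input throws instead of being replaced.
var strict = new UTF8Encoding(true, true);

bool threw = false;
try
{
    strict.GetBytes(((char)0xDBC0).ToString());
}
catch (EncoderFallbackException ex)
{
    threw = true;
    Console.WriteLine(ex.Message);
}

// An ordinary character such as 'A' (65) encodes without complaint.
byte[] ok = strict.GetBytes("A");
Console.WriteLine($"lone surrogate threw: {threw}; 'A' -> {ok.Length} byte(s)");
```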

If you need to serialize invalid Unicode char values, you will have to encode them manually, for example as a byte array, using extension methods such as the following:

```csharp
using System.Collections.Generic;

public static partial class TextExtensions
{
    // Split a UTF-16 code unit into its low and high bytes (little-endian order).
    // No Encoding is involved, so lone surrogates are preserved as-is.
    static void ToBytesWithoutEncoding(char c, out byte lower, out byte upper)
    {
        var u = (uint)c;
        lower = unchecked((byte)u);
        upper = unchecked((byte)(u >> 8));
    }

    public static byte[] ToByteArrayWithoutEncoding(this char c)
    {
        byte lower, upper;
        ToBytesWithoutEncoding(c, out lower, out upper);
        return new byte[] { lower, upper };
    }

    public static byte[] ToByteArrayWithoutEncoding(this ICollection<char> list)
    {
        if (list == null)
            return null;
        var bytes = new byte[checked(list.Count * 2)];
        int to = 0;
        foreach (var c in list)
        {
            ToBytesWithoutEncoding(c, out bytes[to], out bytes[to + 1]);
            to += 2;
        }
        return bytes;
    }

    public static char ToCharWithoutEncoding(this byte[] bytes)
    {
        return bytes.ToCharWithoutEncoding(0);
    }

    // Reassemble a char from bytes[position] (low) and bytes[position + 1] (high);
    // a missing high byte is treated as 0.
    public static char ToCharWithoutEncoding(this byte[] bytes, int position)
    {
        if (bytes == null)
            return default(char);
        char c = default(char);
        if (position < bytes.Length)
            c += (char)bytes[position];
        if (position + 1 < bytes.Length)
            c += (char)((uint)bytes[position + 1] << 8);
        return c;
    }

    public static List<char> ToCharListWithoutEncoding(this byte[] bytes)
    {
        if (bytes == null)
            return null;
        var chars = new List<char>(bytes.Length / 2 + bytes.Length % 2);
        for (int from = 0; from < bytes.Length; from += 2)
        {
            chars.Add(bytes.ToCharWithoutEncoding(from));
        }
        return chars;
    }
}
```
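Because these helpers never go through an Encoding, nothing validates or replaces surrogates. Below is a minimal BCL-only sketch (C# 9 top-level statements) of the same little-endian layout followed by a Base64 step, since Json.NET writes a byte[] as a Base64 string. ToBytes and FromBytes are hypothetical local stand-ins for the extension methods, not part of any library.

```csharp
using System;

// Hypothetical stand-ins for ToByteArrayWithoutEncoding / ToCharWithoutEncoding:
// low byte first, then high byte, with no Encoding involved.
static byte[] ToBytes(char c) => new[] { unchecked((byte)c), unchecked((byte)(c >> 8)) };
static char FromBytes(byte[] b) => (char)(b[0] | (b[1] << 8));

char original = (char)56256;                 // 0xDBC0
byte[] bytes = ToBytes(original);            // { 0xC0, 0xDB }
string base64 = Convert.ToBase64String(bytes);
Console.WriteLine(base64);                   // wNs=

char restored = FromBytes(Convert.FromBase64String(base64));
Console.WriteLine((int)restored);            // 56256
```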

Then modify your test method as follows:

```csharp
[Test]
public void Utf8JsonCharSerializeAndDeserializeShouldEqualFixed()
{
    Utf8JsonCharSerializeAndDeserializeShouldEqualFixed((char)56256);
}

public void Utf8JsonCharSerializeAndDeserializeShouldEqualFixed(char utfChar)
{
    byte[] data;
    using (MemoryStream ms = new MemoryStream())
    {
        // Strict encoder: throws instead of silently substituting characters.
        using (StreamWriter writer = new StreamWriter(ms, new UTF8Encoding(true, true), 1024))
        {
            var serializer = new JsonSerializer();
            serializer.Serialize(writer, utfChar.ToByteArrayWithoutEncoding());
        }
        // MemoryStream.ToArray() still works after the writer has closed the stream.
        data = ms.ToArray();
    }
    using (MemoryStream ms = new MemoryStream(data))
    {
        using (StreamReader reader = new StreamReader(ms, true))
        {
            using (JsonTextReader jsonReader = new JsonTextReader(reader))
            {
                var serializer = new JsonSerializer();
                char deserializedChar = serializer.Deserialize<byte[]>(jsonReader).ToCharWithoutEncoding();
                //Console.WriteLine(string.Format("{0}, {1}", utfChar, deserializedChar));
                Assert.AreEqual(utfChar, deserializedChar);
                Assert.AreEqual((int)utfChar, (int)deserializedChar);
            }
        }
    }
}
```
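Since the parameterized overload accepts any char, it is natural to check more than one value. As a rough BCL-only sanity check (no NUnit or Json.NET; C# 9 top-level statements), the sketch below sweeps all 65,536 char values and compares a trip through Encoding.UTF8 with the raw two-byte split used by the helpers.

```csharp
using System;
using System.Text;

int utf8Failures = 0, byteFailures = 0;
for (int code = 0; code <= char.MaxValue; code++)
{
    char c = (char)code;

    // Path 1: through Encoding.UTF8, as in the original test.
    string viaUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(c.ToString()));
    if (viaUtf8.Length != 1 || viaUtf8[0] != c) utf8Failures++;

    // Path 2: raw little-endian bytes, as in ToByteArrayWithoutEncoding.
    byte lower = unchecked((byte)c), upper = unchecked((byte)(c >> 8));
    char viaBytes = (char)(lower | (upper << 8));
    if (viaBytes != c) byteFailures++;
}

Console.WriteLine($"Encoding.UTF8 failures: {utf8Failures}"); // 2048 (U+D800 to U+DFFF)
Console.WriteLine($"raw-byte failures: {byteFailures}");       // 0
```

The values lost through Encoding.UTF8 are exactly the 2,048 surrogate code units, which matches the question's observation that 65 survives and 56256 does not.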

Or, if you have a List<char> property in some container class, you can create the following converter:

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class CharListConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(List<char>);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
            return null;
        // The list was written as a byte array (Base64 string); read it back and unpack.
        var bytes = serializer.Deserialize<byte[]>(reader);
        return bytes.ToCharListWithoutEncoding();
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        var list = (ICollection<char>)value;
        var bytes = list.ToByteArrayWithoutEncoding();
        serializer.Serialize(writer, bytes);
    }
}
```
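What the converter does with a whole list can be sketched without Json.NET: pack each char into two little-endian bytes, Base64 the result, and reverse the steps on the way back. ToBytes and ToChars are hypothetical local helpers mirroring ToByteArrayWithoutEncoding and ToCharListWithoutEncoding, and the printed {"Characters":...} line is assembled by hand, so it only approximates the shape Json.NET would presumably emit for RootObject.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring ToByteArrayWithoutEncoding: two little-endian bytes per char.
static byte[] ToBytes(IReadOnlyList<char> chars)
{
    var bytes = new byte[checked(chars.Count * 2)];
    for (int i = 0; i < chars.Count; i++)
    {
        bytes[2 * i] = unchecked((byte)chars[i]);
        bytes[2 * i + 1] = unchecked((byte)(chars[i] >> 8));
    }
    return bytes;
}

// Hypothetical helper mirroring ToCharListWithoutEncoding.
static List<char> ToChars(byte[] bytes)
{
    var chars = new List<char>((bytes.Length + 1) / 2);
    for (int i = 0; i < bytes.Length; i += 2)
    {
        // A trailing odd byte gets an upper byte of 0, like ToCharWithoutEncoding.
        int upper = i + 1 < bytes.Length ? bytes[i + 1] : 0;
        chars.Add((char)(bytes[i] | (upper << 8)));
    }
    return chars;
}

var original = new List<char> { 'A', (char)56256, '\u00E9' };
string base64 = Convert.ToBase64String(ToBytes(original));
List<char> restored = ToChars(Convert.FromBase64String(base64));

Console.WriteLine($"{{\"Characters\":\"{base64}\"}}");
Console.WriteLine(string.Join(",", restored.ConvertAll(c => (int)c))); // 65,56256,233
```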

And apply it as follows:

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

public class RootObject
{
    [JsonConverter(typeof(CharListConverter))]
    public List<char> Characters { get; set; }
}
```

In both cases Json.NET will encode the byte array as Base64.
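Base64 text only uses ASCII letters, digits, '+', '/' and '=', which is why the fixed test can afford the strict new UTF8Encoding(true, true). A small BCL-only sketch (C# 9 top-level statements) that Base64-encodes every possible char value, surrogates included, and pushes the resulting text through the strict encoder without an exception:

```csharp
using System;
using System.Linq;
using System.Text;

var strict = new UTF8Encoding(true, true);   // throws on anything unencodable

// Every char value 0x0000 to 0xFFFF, surrogates included, as raw little-endian bytes.
var raw = new byte[2 * 65536];
for (int code = 0; code < 65536; code++)
{
    raw[2 * code] = unchecked((byte)code);
    raw[2 * code + 1] = unchecked((byte)(code >> 8));
}

string base64 = Convert.ToBase64String(raw);
bool allAscii = base64.All(ch => ch < 0x80);
byte[] encoded = strict.GetBytes(base64);     // does not throw: the text is pure ASCII

// ASCII characters take one UTF-8 byte each, so the lengths match.
Console.WriteLine($"all ASCII: {allAscii}, one byte per char: {encoded.Length == base64.Length}");
```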
