Which Unicode encoding (UTF-8, UTF-16, etc.) does Windows use for its Unicode data types?

Updated: 2024-10-08 22:53:14
Problem description

There are different encodings of the same Unicode (standardized) table. For example, in UTF-8 encoding A corresponds to 0x0041, but in UTF-16 encoding the same A is represented as 0xfeff0041.

From this brilliant article I learned that when I program in C++ for the Windows platform and deal with Unicode, I should know that it is represented in 2 bytes. But the article says nothing about the encoding. (It does say that x86 CPUs are little-endian, so I know how those two bytes are stored in memory.) But I also need to know the encoding, so that I have complete information about how the symbols are stored in memory. Is there a fixed Unicode encoding for C++/Windows programmers?

Recommended answer

The values Windows stores in memory are always UTF-16 little-endian. But that's not what you're asking about: you're looking at file contents. Windows itself does not specify the encoding of files; it leaves that to individual applications.
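
To make "UTF-16 little-endian" concrete: each UTF-16 code unit is 16 bits, and little-endian means its low byte is stored first. The minimal sketch below (the function name `to_utf16le_bytes` is hypothetical) writes code units out in that order explicitly, so it behaves the same on any host. It uses the portable `char16_t`/`std::u16string` rather than `wchar_t`, because `wchar_t` is 16 bits on Windows but usually 32 bits on other platforms; on x86 Windows the result should match the in-memory bytes of the equivalent `wchar_t` string.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: lay out UTF-16 code units as little-endian bytes
// (low byte first), the order x86 Windows uses for wchar_t strings in memory.
std::vector<std::uint8_t> to_utf16le_bytes(const std::u16string& units) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(units.size() * 2);
    for (char16_t unit : units) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}
```

For example, `u"A"` (U+0041) yields the bytes 0x41 0x00, and U+4E2D yields 0x2D 0x4E. Note that no BOM is involved here: the 0xfe 0xff from the question only shows up in files.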

The 0xfe 0xff you see at the start of the file is a Byte Order Mark, or BOM. It not only indicates that the file is most probably Unicode, but also tells you which variant of Unicode encoding is used:

BOM bytes         Encoding
0xfe 0xff         UTF-16 big-endian
0xff 0xfe         UTF-16 little-endian
0xef 0xbb 0xbf    UTF-8
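
The BOM values listed above translate directly into a check on a file's first bytes. A minimal sketch, assuming the file (or at least its first three bytes) has already been read into a byte vector; `BomInfo` and `detect_bom` are hypothetical names:

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical result type: which encoding the BOM names and how many
// bytes to skip before the actual text starts.
struct BomInfo {
    std::string encoding;  // "UTF-8", "UTF-16BE", "UTF-16LE", or "" if no BOM
    std::size_t length;    // BOM size in bytes (0 if no BOM)
};

// Compares the first bytes of the data against the three BOMs in the list.
BomInfo detect_bom(const std::vector<unsigned char>& data) {
    auto starts_with = [&data](std::initializer_list<unsigned char> bom) {
        if (data.size() < bom.size()) return false;
        std::size_t i = 0;
        for (unsigned char b : bom) {
            if (data[i++] != b) return false;
        }
        return true;
    };
    if (starts_with({0xEF, 0xBB, 0xBF})) return {"UTF-8", 3};
    if (starts_with({0xFE, 0xFF})) return {"UTF-16BE", 2};
    if (starts_with({0xFF, 0xFE})) return {"UTF-16LE", 2};
    return {"", 0};  // no BOM: the encoding has to be guessed
}
```

Applied to the bytes from the question, 0xfe 0xff 0x00 0x41, it reports UTF-16BE with a 2-byte BOM, and the remaining 0x00 0x41 is the big-endian code unit for A. The list above (and therefore this sketch) does not cover UTF-32; its little-endian BOM, 0xff 0xfe 0x00 0x00, would be misreported here as UTF-16LE.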

A file without a BOM should be assumed to contain 8-bit characters unless you know how it was written. That still doesn't tell you whether it's UTF-8 or some other Windows character encoding; you'll just have to guess.
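
One common way to make that guess is to test whether the bytes are well-formed UTF-8, since text in a legacy single-byte code page that contains non-ASCII characters rarely happens to be valid UTF-8. The sketch below is a heuristic only (`is_valid_utf8` is a hypothetical name): pure-ASCII files pass the test and are equally valid in most Windows code pages, so a `true` result does not settle the question.

```cpp
#include <cstddef>
#include <vector>

// Returns true if the bytes form well-formed UTF-8: correct lead and
// continuation bytes, no overlong forms, no surrogate code points
// (U+D800..U+DFFF), and nothing above U+10FFFF.
bool is_valid_utf8(const std::vector<unsigned char>& s) {
    // Smallest code point allowed for each sequence length (index = length).
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = s[i];
        std::size_t len;
        char32_t cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                      // stray continuation or invalid lead byte
        if (i + len > s.size()) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_for_len[len]) return false;        // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogates are not characters
        if (cp > 0x10FFFF) return false;                // beyond the Unicode range
        i += len;
    }
    return true;
}
```

For example, "café" saved as UTF-8 (ending in 0xc3 0xa9) passes, while the same word saved in Windows-1252 (ending in 0xe9) fails, because 0xe9 starts a three-byte sequence that never completes.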

You can use Notepad as an example of how this is done. If the file has a BOM, Notepad reads it and processes the contents accordingly. Otherwise, you must specify the encoding yourself with the "Encoding" dropdown list.

The reason Windows documentation isn't more specific about the encoding is that Windows was a very early adopter of Unicode, and at the time there was only one encoding, using 16 bits per code point. When 65,536 code points were found to be inadequate, surrogate pairs were invented as a way to extend the range, and UTF-16 was born. Microsoft was already using the name "Unicode" for its encoding and never changed it.
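
The surrogate-pair mechanism is simple arithmetic: for a code point above U+FFFF, subtract 0x10000, then split the remaining 20 bits into a high surrogate 0xD800 + (top 10 bits) and a low surrogate 0xDC00 + (bottom 10 bits). A minimal sketch of encoding one code point this way (`encode_utf16` is a hypothetical name; it rejects surrogate values and anything above U+10FFFF via an assertion):

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
// U+0000..U+FFFF (minus the surrogate range) fit in a single unit;
// U+10000..U+10FFFF become a high/low surrogate pair.
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    char32_t v = cp - 0x10000;                                      // 20-bit value
    char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));     // top 10 bits
    char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));   // bottom 10 bits
    return std::u16string{high, low};
}
```

For instance, U+1F600 becomes the pair 0xD83D 0xDE00, so a Windows wide string holding that single character has a length of two `wchar_t` units.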

Published: 2023-11-05 10:15:51
Article link: https://www.elefans.com/category/jswz/34/1560560.html