How to get the decimal value of a Unicode character in C#?

Updated: 2024-10-25 12:23:24

Question


How do I get the numeric value of a Unicode character in C#?


For example, given the Tamil character அ (U+0B85), the output should be 2949 (i.e. 0x0B85).


C++: How to get decimal value of a unicode character in c++
Java: How can I get a Unicode character's code?


Some characters require multiple code points. In the following examples, every UTF-16 code unit is still within the Basic Multilingual Plane:


(i.e. U+0072 U+0327 U+030C)
(i.e. U+0072 U+0338 U+0327 U+0316 U+0317 U+0300 U+0301 U+0302 U+0308 U+0360)
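A quick way to see this in C# is to compare string.Length, which counts UTF-16 code units (chars), with StringInfo.LengthInTextElements from System.Globalization, which counts user-perceived characters (text elements). The sketch below rebuilds the two sequences above from their code points. It assumes .NET 5 or later, where StringInfo follows the Unicode extended grapheme cluster rules. In that setting, each combining mark attaches to the preceding "r".

```csharp
using System;
using System.Globalization;

// The two example sequences above, written as escaped code points.
string small = "\u0072\u0327\u030C";
string large = "\u0072\u0338\u0327\u0316\u0317\u0300\u0301\u0302\u0308\u0360";

// Length counts UTF-16 code units; LengthInTextElements counts displayed characters.
Console.WriteLine($"{small.Length} chars, {new StringInfo(small).LengthInTextElements} text element(s)");
Console.WriteLine($"{large.Length} chars, {new StringInfo(large).LengthInTextElements} text element(s)");
```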


The larger point is that one "character" can require more than one UTF-16 code unit. It can require more than two, or even more than three.


In fact, one "character" can require dozens of Unicode code points. In C#, where strings are UTF-16, that means more than one char; a single character can require 17 chars.


My question was about converting a char into its UTF-16 encoding value. Even if an entire string of 17 chars represents only one "character", I still want to know how to convert each UTF-16 code unit into a numeric value.

For example:

```csharp
String s = "அ";
int i = Unicode(s[0]);
```


Here Unicode is a function that returns the integer value, as defined by the Unicode standard, of the first character of the input expression.
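The hypothetical Unicode function above needs no library call in C#. A char is a UTF-16 code unit, and converting it to int yields its numeric value. The sketch below defines Unicode as that conversion. The name is the question's placeholder, not a .NET API, and CodeUnits is likewise a hypothetical helper. It then applies Unicode to every code unit of a string, which also covers the case where one "character" spans several chars.

```csharp
using System;

// Hypothetical helper named after the question's placeholder: it is just the
// implicit char-to-int conversion, returning the UTF-16 code unit value.
static int Unicode(char c) => c;

// Hypothetical helper: the numeric value of every UTF-16 code unit in a string.
static int[] CodeUnits(string s)
{
    var values = new int[s.Length];
    for (int i = 0; i < s.Length; i++)
        values[i] = Unicode(s[i]);
    return values;
}

string s = "அ";
int first = Unicode(s[0]);
Console.WriteLine(first);           // 2949
Console.WriteLine($"U+{first:X4}"); // U+0B85

// Each code unit of a multi-char "character" gets its own value.
foreach (int v in CodeUnits("r\u0327\u030C"))
    Console.WriteLine($"{v} (U+{v:X4})");
```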

Answer


It's basically the same as Java. If you've got it as a char, you can just convert to int implicitly:

```csharp
char c = '\u0b85';

// Implicit conversion: char is basically a 16-bit unsigned integer
int x = c;
Console.WriteLine(x); // Prints 2949
```


If you've got it as part of a string, just get that single character first:

```csharp
string text = GetText();
int x = text[2]; // Or whatever...
```


Note that characters outside the Basic Multilingual Plane are represented as two UTF-16 code units (a surrogate pair). There is support in .NET for finding the full Unicode code point, but it's not simple.
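One way to get full code points is sketched below. It assumes .NET Core 3.0 or later. char.ConvertToUtf32(string, int) combines a surrogate pair at a given index into one code point. string.EnumerateRunes() walks a string one System.Text.Rune (Unicode scalar value) at a time. The helper names CodePointsManual and CodePointsRunes are hypothetical. The emoji U+1F600 is an arbitrary example of a character outside the Basic Multilingual Plane.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: code points via char.ConvertToUtf32, stepping over surrogate pairs.
static List<int> CodePointsManual(string s)
{
    var result = new List<int>();
    for (int i = 0; i < s.Length; i++)
    {
        // Throws ArgumentException if s contains an unpaired surrogate.
        int cp = char.ConvertToUtf32(s, i);
        result.Add(cp);
        if (char.IsHighSurrogate(s[i]))
            i++; // skip the low surrogate that belongs to this code point
    }
    return result;
}

// Hypothetical helper: the same result via Rune enumeration
// (unpaired surrogates become U+FFFD instead of throwing).
static List<int> CodePointsRunes(string s)
{
    var result = new List<int>();
    foreach (Rune r in s.EnumerateRunes())
        result.Add(r.Value);
    return result;
}

string text = "அ\U0001F600";
Console.WriteLine(text.Length); // 3 UTF-16 code units
foreach (int cp in CodePointsManual(text))
    Console.WriteLine($"U+{cp:X4}"); // U+0B85, then U+1F600
```

The Rune version is shorter and never throws. The manual loop also works on older .NET Framework versions, where Rune is not available.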