How do I convert a Unicode string into escape sequences in bash?

Updated: 2024-10-25 08:16:01

Problem description


I need a tool that will translate the unicode string into escape characters like \u0230.

For example:

echo ãçé | convert-unicode-tool
\u00e3\u00e7\u00e9

Recommended answer

An all-bash method:

echo ãçé |
while read -n 1 u
do [[ -n "$u" ]] && printf '\\u%04x' "'$u"
done


That leading apostrophe is a printf formatting/interpretation guide.

From the online GNU manual:


If the leading character of a numeric argument is ‘"’ or ‘'’ then its value is the numeric value of the immediately following character. Any remaining characters are silently ignored if the POSIXLY_CORRECT environment variable is set; otherwise, a warning is printed. For example, ‘printf "%d" "'a"’ outputs ‘97’ on hosts that use the ASCII character set, since ‘a’ has the numeric value 97 in ASCII.


That lets us pass the character to printf for numeric interpretations such as %d or %03o, or here, %04x.
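The leading-quote rule can be tried on its own with a plain ASCII character, which avoids any dependence on the locale. This is only a small sketch; `char_code` is a hypothetical helper name that does not appear in the original answer.

```shell
#!/usr/bin/env bash
# char_code is a hypothetical helper, not part of the original answer.
# The leading single quote in the argument tells printf to use the
# numeric value of the character that follows it.
char_code() {
    printf '%d' "'$1"
}

char_code a              # prints 97
echo
printf '%03o\n' "'a"     # prints 141 (octal)
printf '\\u%04x\n' "'a"  # prints \u0061
```

For non-ASCII characters such as ã, the value printed depends on the shell running in a UTF-8 locale, so results may differ under a C/POSIX locale.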


The [[ -n "$u" ]] check is needed because echo emits a trailing newline. read -n 1 treats that newline as the line delimiter and returns an empty $u, which printf would otherwise print as \u0000.

Output:

$: echo ãçé |
> while read -n 1 u
> do [[ -n "$u" ]] && printf '\\u%04x' "'$u"
> done
\u00e3\u00e7\u00e9

Without the [[ -n "$u" ]] check:

$: echo ãçé | while read -n 1 u; do printf '\\u%04x' "'$u"; done
\u00e3\u00e7\u00e9\u0000
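The loop can also be wrapped in a function so that it behaves like the convert-unicode-tool the question asks for. The following is a sketch under a few assumptions rather than the answer's own code: `to_escapes` is a hypothetical name, and the `IFS=` and `-r` options are additions so that spaces and backslashes in the input are not dropped by read.

```shell
#!/usr/bin/env bash
# to_escapes is a hypothetical name; it reads stdin and prints one
# \uXXXX escape per input character, followed by a final newline.
to_escapes() {
    local ch
    # IFS= preserves spaces, -r preserves backslashes,
    # -n 1 reads one character at a time.
    while IFS= read -r -n 1 ch; do
        # An empty $ch means read stopped at a newline delimiter; skip it.
        [[ -n "$ch" ]] && printf '\\u%04x' "'$ch"
    done
    printf '\n'
}

echo 'a b' | to_escapes   # prints \u0061\u0020\u0062
```

In a UTF-8 locale, `echo ãçé | to_escapes` should give the \u00e3\u00e7\u00e9 shown in the question. Note that %04x only sets a minimum width, so code points above U+FFFF come out with more than four hex digits.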