Unicode and Character Sets

Source of the original article:
http://www.joelonsoftware.com/articles/Unicode.html

Original English text (each paragraph is followed by its Chinese translation):

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

每个软件开发人员绝对、肯定地必须了解的关于Unicode和字符集的最基础知识(没有借口!)

Ever wonder about that mysterious Content-Type tag? You know, the one you’re supposed to put in HTML and you never quite know what it should be?

有没有想过那个神秘的内容类型标签?就是你应该在HTML中加入但你不知道它应该是什么?

Did you ever get an email from your friends in Bulgaria with the subject line “??? ??? ??? ???”?

你有没有收到过保加利亚朋友的邮件，主题栏是“??? ??? ??? ???”?
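
One plausible way such a subject line comes about, sketched in Python: somewhere along the way the text is forced into ASCII, and every character ASCII cannot represent is replaced with a question mark. The sample subject and the helper name `to_ascii_lossy` are hypothetical, and the article does not say which mechanism mangled the real email.

```python
def to_ascii_lossy(text: str) -> str:
    """Force text into ASCII, replacing every unrepresentable character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


# Hypothetical Bulgarian subject line ("Hello world"), used only for illustration.
subject = "Здравей свят"
mangled = to_ascii_lossy(subject)

# Each Cyrillic letter is lost; only the space, which ASCII can represent, survives.
print(mangled)
```

The original letters cannot be recovered from the question marks, which is why the conversion has to be done correctly the first time.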

I’ve been dismayed to discover just how many software developers aren’t really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff. A couple of years ago, a beta tester for FogBUGZ was wondering whether it could handle incoming email in Japanese. Japanese? They have email in Japanese? I had no idea. When I looked closely at the commercial ActiveX control we were using to parse MIME email messages, we discovered it was doing exactly the wrong thing with character sets, so we actually had to write heroic code to undo the wrong conversion it had done and redo it correctly. When I looked into another commercial library, it, too, had a completely broken character code implementation. I corresponded with the developer of that package and he sort of thought they “couldn’t do anything about it.” Like many programmers, he just wished it would all blow over somehow.

我沮丧地发现，有多少软件开发人员并没有完全跟上字符集、编码、Unicode等神秘世界的速度。几年前，FogBUGZ的一个测试版测试者想知道它是否能处理日文邮件。日语吗?他们有日文邮件吗?我不知道。当我仔细观察我们用来解析MIME电子邮件消息的商业ActiveX控件时，我们发现它在字符集上做了完全错误的事情，所以我们实际上不得不编写英雄代码来撤销它所做的错误转换并重新正确地进行。当我研究另一个商业库时，它也有一个完全破碎的字符代码实现。我与该程序包的开发者通信，他认为他们“对此无能为力”。和许多程序员一样，他只是希望这一切能以某种方式烟消云散。

But it won’t. When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.

但它不会。当我发现流行的web开发工具PHP几乎完全忽略了字符编码问题，轻率地使用8位字符，这使得开发优秀的国际web应用程序几乎不可能，我想，这就够了。

So I have an announcement to make: if you are a programmer working in 2003 and you don’t know the basics of characters, character sets, encodings, and Unicode, and I catch you, I’m going to punish you by making you peel onions for 6 months in a submarine. I swear I will.

所以我要宣布一件事:如果你是一个在2003年工作的程序员，你不知道基本的字符，字符集，编码和Unicode，而我抓住了你，我要惩罚你，让你在潜艇里剥洋葱6个月。我发誓我会的。

And one more thing:

IT’S NOT THAT HARD.（这并不难）

In this article I’ll fill you in on exactly what every working programmer should know. All that stuff about “plain text = ascii = characters are 8 bits” is not only wrong, it’s hopelessly wrong, and if you’re still programming that way, you’re not much better than a medical doctor who doesn’t believe in germs. Please do not write another line of code until you finish reading this article.

在本文中，我将准确地告诉您每个工作的程序员都应该知道的东西。所有关于“纯文本= ascii =字符是8位”的东西不仅是错误的，而且是无可救药的错误，如果你仍然这样编程，你不比一个不相信细菌的医生好多少。在读完这篇文章之前，请不要再写一行代码。
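
A quick way to see why "characters are 8 bits" fails is to count how many bytes a few characters occupy once encoded. This is a minimal sketch that assumes UTF-8 as the encoding (the text above has not named a specific encoding at this point); `utf8_len` is a hypothetical helper.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One character each: a plain English letter, an accented letter,
# a Chinese character, and an emoji (written as escapes to avoid ambiguity).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    # Every sample is a single character, yet the byte counts differ.
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")
```

Only the unaccented English letter fits in a single byte here; the others need two, three, and four bytes respectively.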

Before I get started, I should warn you that if you are one of those rare people who knows about internationalization, you are going to find my entire discussion a little bit oversimplified. I’m really just trying to set a minimum bar here so that everyone can understand what’s going on and can write code that has a hope of working with text in any language other than the subset of English that doesn’t include words with accents. And I should warn you that character handling is only a tiny portion of what it takes to create software that works internationally, but I can only write about one thing at a time so today it’s character sets.

在开始之前，我应该提醒您，如果您是了解国际化的少数人之一，您会发现我的整个讨论有点过于简单。我只是想在这里设置一个最小的标准，这样每个人都能理解发生了什么，并且能够编写代码，希望能够处理除不包含重音单词的英语子集以外的任何语言的文本。我需要提醒你的是，字符处理只是开发能够在国际上运行的软件的一小部分，但我一次只能写一件事，所以今天我只写字符集。

A Historical Perspective（历史角度）

The easiest way to understand this stuff is to go chronologically.

最简单的方法是按时间顺序来理解。

You probably think I’m going to talk about very old character sets like EBCDIC here. Well, I won’t. EBCDIC is not relevant to your life. We don’t have to go that far back in time.

您可能认为我将在这里讨论非常古老的字符集，如EBCDIC。好吧,我不会的。EBCDIC与你的生活无关。我们没必要回到那么久远的过去。

Back in the semi-olden days, when Unix was being invented and K&R were writing The C Programming Language, everything was very simple. EBCDIC was on its way out. The only characters that mattered were good old unaccented English letters, and we had a code for them called ASCII which was able to represent every character using a number between 32 and 127. Space was 32, the letter “A” was 65, etc. This could conveniently be stored in 7 bits. Most computers in those days were using 8-bit bytes, so not only could you store every possible ASCII character, but you had a whole bit to spare, which, if you were wicked, you could use for your own devious purposes: the dim bulbs at WordStar actually turned on the high bit to indicate the last letter in a word, condemning WordStar to English text only. Codes below 32 were called unprintable and were used for cussing. Just kidding. They were used for control characters, like 7 which made your computer beep and 12 which caused the current page of paper to go flying out of the printer and a new one to be fed in.

And all was good, assuming you were an English speaker.

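在Unix刚被发明出来、K&R还在编写《C程序设计语言》的半古老年代，一切都非常简单。EBCDIC即将出局。唯一重要的字符是古老的无重音英文字母，我们有一种叫做ASCII的编码，可以用32到127之间的数字来表示每个字符。空格是32，字母“A”是65，等等。这可以方便地用7位存储。那时大多数计算机使用8位字节，所以你不仅可以存储每一个可能的ASCII字符，还多出了整整一位。如果你心术不正，可以把这一位用于自己不可告人的目的：WordStar那帮糊涂蛋就真的把最高位置1来表示单词的最后一个字母，结果使WordStar只能处理英文文本。小于32的编码被称为不可打印字符，是用来骂人的。开个玩笑。它们用于控制字符，比如7会让计算机发出蜂鸣声，12会让打印机把当前这页纸弹出并送入一张新纸。

一切都很美好，前提是你说英语。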
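
The ASCII facts above can be checked directly in Python. The sketch below also includes a simplified illustration of the high-bit trick attributed to WordStar; the helper names `fits_in_7_bits`, `mark_last_letter`, and `strip_high_bit` are hypothetical, and the code does not reproduce WordStar's actual file format.

```python
# Code values quoted in the text above.
assert ord(" ") == 32
assert ord("A") == 65
assert "\a" == chr(7)   # BEL: made the computer beep
assert "\f" == chr(12)  # form feed: ejected the current page from the printer


def fits_in_7_bits(text: str) -> bool:
    """True if every character has a code below 128, i.e. is plain ASCII."""
    return all(ord(ch) < 128 for ch in text)


def mark_last_letter(word: bytes) -> bytes:
    """Set the spare eighth bit on the last byte of a non-empty word (illustrative only)."""
    return word[:-1] + bytes([word[-1] | 0x80])


def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, recovering the 7-bit ASCII text."""
    return bytes(b & 0x7F for b in data)


marked = mark_last_letter(b"cat")
print(marked)                  # the last byte is now 0x74 | 0x80 = 0xF4
print(strip_high_bit(marked))  # the original word again
```

Because the eighth bit is already spoken for in this scheme, any character whose code needs that bit, such as an accented letter in an 8-bit character set, can no longer be stored, which is the limitation the paragraph describes.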
