Unicode mystery

Updated: 2024-10-28 04:18:39

Problem description

I recently found out that unicode("\347", "iso-8859-1") is the lowercase c-with-cedilla, so I set out to round up the Unicode numbers of the extra characters you need for French, and I found them all just fine EXCEPT for the o-e ligature (oeuvre, etc). I examined the Unicode characters from 0 to 900 without finding it; then I looked at www.unicode but the numbers I got there (0152 and 0153) didn't work. Can anybody put a help on me wrt this? (Do I need to give a different value for the second parameter, maybe?)

Peace,
STM

PS: I'm considering looking into pyscript as a means of making diagrams for inclusion in LaTeX documents. If anyone can share an opinion about pyscript, I'm interested to hear it.

Peace

Recommended answer

On Mon, Jan 10, 2005 at 07:48:44PM -0800, Sean McIlroy wrote:
> I recently found out that unicode("\347", "iso-8859-1") is the
> lowercase c-with-cedilla, so I set out to round up the unicode numbers
> of the extra characters you need for French, and I found them all just
> fine EXCEPT for the o-e ligature (oeuvre, etc). I examined the unicode
> characters from 0 to 900 without finding it; then I looked at
> www.unicode but the numbers I got there (0152 and 0153) didn't work.
> Can anybody put a help on me wrt this? (Do I need to give a different
> value for the second parameter, maybe?)

œ isn't part of ISO 8859-1, so you can't get it that way. You can do one of

    u'\u0153'

or, if you must,

    unicode("\305\223", "utf-8")

-- 
John Lenton (jo**@grulic.ar) -- Random fortune:
Lisp, Lisp, Lisp Machine,
Lisp Machine is Fun.
Lisp, Lisp, Lisp Machine,
Fun for everyone.
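For readers on Python 3, where the unicode() builtin no longer exists, the reply above can be sketched roughly as follows. This is an illustration rather than part of the original post: it assumes Python 3 string and bytes semantics, and the helper name fits_in_latin1 is made up for the example.

```python
# Python 3 sketch; the thread itself uses Python 2's unicode() builtin.
OE_SMALL = "\u0153"    # LATIN SMALL LIGATURE OE
OE_CAPITAL = "\u0152"  # LATIN CAPITAL LIGATURE OE

# Octal \305\223 is the two-byte UTF-8 encoding of U+0153.
from_utf8 = b"\305\223".decode("utf-8")


def fits_in_latin1(ch):
    """Return True if ch can be encoded as ISO 8859-1 (hypothetical helper)."""
    try:
        ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(from_utf8 == OE_SMALL)      # both routes give the same character
print(fits_in_latin1("\u00e7"))   # c-with-cedilla is in ISO 8859-1
print(fits_in_latin1(OE_SMALL))   # the o-e ligature is not

# ISO 8859-15, a later revision of the Latin-1 table, does include it.
print(b"\xbd".decode("iso-8859-15") == OE_SMALL)
```

If a single-byte encoding is really required for French text, ISO 8859-15 is one candidate, since it carries both ligatures; otherwise the escape form is the simpler route.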

Sean McIlroy wrote:
> I recently found out that unicode("\347", "iso-8859-1") is the
> lowercase c-with-cedilla, so I set out to round up the unicode numbers
> of the extra characters you need for French, and I found them all just
> fine EXCEPT for the o-e ligature (oeuvre, etc). I examined the unicode
> characters from 0 to 900 without finding it; then I looked at
> www.unicode but the numbers I got there (0152 and 0153) didn't work.
> Can anybody put a help on me wrt this? (Do I need to give a different
> value for the second parameter, maybe?)

Characters that are in iso-8859-1 are mapped directly into Unicode. That is, the first 256 characters of Unicode are identical to iso-8859-1. Consider this:

>>> c_cedilla = unicode("\347", "iso-8859-1")
>>> c_cedilla
u'\xe7'
>>> ord(c_cedilla)
231
>>> ord("\347")
231

What you did with c_cedilla "worked" because it was effectively doing nothing. However, if you do unicode(char, encoding) where char is not in encoding, it won't "work".

As John Lenton has pointed out, if you find a character in the Unicode tables, you can just use it directly. There is no need in this circumstance to use unicode().

HTH,
John
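The identity between ISO 8859-1 and the first 256 Unicode code points can be checked directly. The following is a minimal Python 3 sketch (an assumption, since the thread predates Python 3), and the variable names are chosen only for illustration.

```python
# All 256 possible byte values, decoded as ISO 8859-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# Each byte value n comes out as the code point n, so decoding changes nothing.
identity = all(ord(ch) == n for n, ch in enumerate(decoded))
print(identity)

# Octal 347 is 231 decimal (0xE7), before and after decoding.
c_cedilla = b"\347".decode("iso-8859-1")
print(ord(c_cedilla))

# Code chart numbers such as 0153 are hexadecimal: 0x153 is 339 decimal.
oe = chr(0x153)
print(ord(oe))
```

Because 0x153 lies above 255, no byte decoded as ISO 8859-1 can ever produce it, which is why the second parameter is not the thing to change.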

Some poster wrote (in connexion with another topic):
> ... unicode("\347", "iso-8859-1") ...

Well, I haven't had a good rant for quite a while, so here goes:

I'm a bit of a retro specimen, being able (inter alia) to recall octal opcodes from the ICT 1900 series (070=call, 072=exit, 074=branch, ...) but nowadays I regard continued usage of octal as a pox and a pestilence.

1. Octal notation is of use to systems programmers on computers where the number of bits in a word is a multiple of 3. Are there any still in production use? AFAIK word sizes were 12, 24, 36, 48, and 60 bits -- all multiples of 4, so hexadecimal could be used.

2. Consider the effect on the newbie who's never even heard of "octal":

>>> import datetime
>>> datetime.date(2005,01,01)
datetime.date(2005, 1, 1)
>>> datetime.date(2005,09,09)

  File "<stdin>", line 1
    datetime.date(2005,09,09)
                        ^
SyntaxError: invalid token

[straight out of the "BOFH Manual of Po-faced Error Messages"]

3. Consider this extract from the docs for the re module:

"""
\number
Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'the end' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the "[" and "]" of a character class, all numeric escapes are treated as characters.
"""

I helped to straighten out this description a few years ago, but I fear it's still not 100% accurate. Worse, take a peek at the code necessary to implement this.

===

We (un-Pythonically) implicitly take a leading zero (or even just \[0-7]) as meaning octal, instead of requiring something explicit as with hexadecimal. The variable length idea in strings doesn't help: "\18", "\128" and "\1238" are all strings of length 2.

I don't see any mention of octal in GvR's "Python Regrets" or AMK's "PEP 3000". Why not? Is it not regretted?
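The three complaints above can be reproduced in a short sketch. It assumes Python 3, where a leading-zero integer literal such as 09 or 01 is rejected outright and octal needs an explicit 0o prefix, while octal escapes in strings and in re patterns still behave as the post describes. The helper name compiles is hypothetical.

```python
import re


def compiles(source):
    """Return True if source parses as an expression (hypothetical helper)."""
    try:
        compile(source, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


# Integer literals: explicit 0o prefix is accepted, a bare leading zero is not.
print(compiles("0o11"), compiles("2005 + 09"))

# String escapes take up to three octal digits and stop at the first
# non-octal digit, so each of these is one escaped character plus "8".
lengths = [len("\18"), len("\128"), len("\1238")]
print(lengths)

# In a pattern, \1 is a backreference to group 1 ...
backref_hit = re.match(r"(.+) \1", "the the") is not None
backref_miss = re.match(r"(.+) \1", "the end") is not None
print(backref_hit, backref_miss)

# ... but three octal digits are read as a character code: \101 is "A".
octal_char_hit = re.match(r"\101", "A") is not None
print(octal_char_hit)
```

The literal check goes through compile() so that the rejected source stays inside a string; writing 09 directly in the file would stop the whole sketch from parsing.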

Published: 2023-11-08 18:48:23

Article link: https://www.elefans.com/category/jswz/34/1570138.html