We are moving from latin1 to UTF-8 and have 100k lines of Python code.
Plus I'm new to Python (ha-ha-ha!).
I already know that str() fails when it receives a Unicode string containing non-ASCII characters, so we should use unicode() instead, which has almost the same effect.
What other "dangerous" places in the code should we look for?
Are there any basic guidelines/algorithms for moving to UTF-8? Could an automatic 'code transformer' be written?
Best answer
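str and unicode are classes, not functions. When you call str(u'abcd') you are constructing a new str object, with u'abcd' passed in as the argument. In Python 2 this works because str() encodes a unicode argument with the default codec, which is normally ASCII, so it succeeds only when every character is ASCII; anything else raises UnicodeEncodeError.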
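To make the ASCII limitation concrete, here is a minimal sketch. It uses explicit .encode() calls rather than str(), because an explicit encode to ASCII behaves like Python 2's str() on a unicode object under the default codec, and the same lines also run on Python 3. The helper name encodes_as_ascii is hypothetical, chosen only for this illustration.

```python
# -*- coding: utf-8 -*-

def encodes_as_ascii(text):
    """Return True if `text` can be encoded with the ASCII codec.

    Hypothetical helper: it mimics the check that Python 2's str()
    implicitly performs on a unicode object under the default codec.
    """
    try:
        text.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False


plain = u'abcd'
accented = u'caf\xe9'  # "cafe" with an acute accent; U+00E9 is outside ASCII

print(encodes_as_ascii(plain))
print(encodes_as_ascii(accented))

# The same text produces different bytes under each encoding,
# which is why the encoding has to be named explicitly.
latin1_bytes = accented.encode('latin-1')
utf8_bytes = accented.encode('utf-8')
```

Naming the codec at every encode and decode call is one way to avoid depending on the default codec at all.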
Other areas to look out for are reading from a file or other input, and basically anything you get back as a byte string from a function that was not written with unicode in mind.
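One possible way to handle the file boundary is to read raw bytes, decode them once with a known encoding, and encode again only when writing. The sketch below assumes the existing data really is latin1 and the target is UTF-8, as in the question; the function names read_text and transcode_file are hypothetical, not part of any library.

```python
# -*- coding: utf-8 -*-
import io


def read_text(path, encoding):
    """Read a file as bytes and decode it into a text (unicode) object."""
    with io.open(path, 'rb') as f:
        raw = f.read()
    return raw.decode(encoding)


def transcode_file(src, dst, src_encoding='latin-1', dst_encoding='utf-8'):
    """Copy src to dst, re-encoding the contents.

    Assumes src really is in src_encoding. Note that latin-1 maps every
    possible byte to a character, so a wrong guess will not raise an error.
    """
    text = read_text(src, src_encoding)
    # Binary mode keeps line endings exactly as they were in the source.
    with io.open(dst, 'wb') as f:
        f.write(text.encode(dst_encoding))
```

For example, transcode_file('old.txt', 'new.txt') would rewrite a latin1 file as UTF-8, where both file names are placeholders. Keeping the decode and encode steps at the edges of the program means the code in between only ever handles unicode objects.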
Enjoy :)