Python Basics: An Introductory Guide (Part 2)

Updated: 2024-10-25 21:21:52

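Advanced Syntax Techniques

Passing Arguments by Name

Besides passing arguments by position, a function can also receive them by name.

This neatly solves the problem mentioned in the "default arguments" section.

When many parameters have default values, overriding just one of them by position forces you to restate every default that comes before it.

from __future__ import print_function

def func(a, b=5, c=10):
    print('a is', a, 'and b is', b, 'and c is', c)

func(3, 7)
func(25, c=24)
func(c=50, a=100)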

a is 3 and b is 7 and c is 10
a is 25 and b is 5 and c is 24
a is 100 and b is 5 and c is 50

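Mixing Several Passing Styles

Generally, the order that is easiest to reason about is:

Values passed by position are bound to parameters from left to right.
The remaining values are bound by name.
A name that matches no parameter is an error. A name that was already filled by position is also an error.
Any parameter still without a value after that falls back to its default.
A parameter with neither a value nor a default is an error.

From these rules we can draw a few conclusions:

When defining a function, putting a parameter with a default to the left of one without a default is an error.
When calling, writing a by-name argument to the left of a positional argument is an error.
When designing or calling a function, if you have to stop and think to work out what a call means, ask whether the form is too convoluted and whether a simpler one exists.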
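The binding rules above can be checked directly. The following is a minimal sketch; try_call is a hypothetical helper introduced only to capture the error type, and func mirrors the earlier example but returns its arguments instead of printing them.

```python
def func(a, b=5, c=10):
    return (a, b, c)

def try_call(*args, **kwargs):
    """Return func's result, or the name of the exception the call raises."""
    try:
        return func(*args, **kwargs)
    except TypeError as exc:
        return type(exc).__name__

ok = try_call(1, c=3)          # positional first, then by name
unknown = try_call(1, d=4)     # no parameter is called d
duplicate = try_call(1, a=2)   # a was already filled by position
missing = try_call(b=2)        # a has neither a value nor a default
print(ok, unknown, duplicate, missing)
```

The two definition-time and call-time mistakes from the conclusions (a default before a non-default, a by-name argument before a positional one) are syntax errors, so they cannot be caught this way at run time.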

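Variable Arguments

Sometimes a function needs to accept an indeterminate number of arguments, as the sum function does.

Or, for some reason, it does not want to handle every input the caller supplies.

In these cases we can use variable arguments to receive or pass arguments.

The basic form is function_name(arguments, *argument_list, arguments_withname, **arguments_dict).

def any_arguments(*p, **kw):
    print(p)
    print(kw)

any_arguments(1, 2, 3, 4, 5, a=1, b=2, c=3, d=4, e=5)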

(1, 2, 3, 4, 5)
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4}
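The same * and ** markers also work on the calling side, where they unpack an existing sequence or dict into separate arguments. A small sketch follows; describe, values and options are illustrative names that do not appear in the original text.

```python
def describe(*p, **kw):
    # Return what was received so the effect is easy to inspect.
    return p, kw

values = [1, 2, 3]
options = {'a': 1, 'b': 2}

packed = describe(values, options)        # two positional arguments
unpacked = describe(*values, **options)   # three positional, two by name
print(packed)
print(unpacked)
```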

Be wary of using variable arguments as a way to pass a list.

def sum(a, b, *p):
    s = a + b
    for i in p: s += i
    return s

print(sum(1, 2))
print(sum(1, 2, 3))

3
6

def sum(l, initial=0):
    s = initial
    for i in l: s += i
    return s

print(sum(range(101)))

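First-Class Function Objects

What does "first-class" mean?

It means "first-class citizen": the value is used like any other value, with no distinction or discrimination.

First-class function objects means that function objects are used just like any other object.

Have you ever imagined using a function as an ordinary object? Like an ordinary object, it can be defined, passed as an argument, and returned as a result.

op_set = {
    '+': lambda x, y: x+y,
    '-': lambda x, y: x-y,
    '*': lambda x, y: x*y,
    '/': lambda x, y: float(x)/y,
}

print(op_set['+'](1, 2))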

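Closures

The basic scoping rule in Python is that inner scopes can read variables of outer scopes, while outer scopes cannot reach into inner ones. With only two levels, there is just local and global.

Now consider that a function can be defined inside another function, and that the function so defined can be returned as a result.

Besides its own local variables and the global variables, can this inner function also access the local variables of the function that defined it?

A construct that lets a nested function access data from the enclosing scope is called a closure.

def addn(n):
    def add(x):
        return x+n
    return add

add10 = addn(10)
print(add10(50))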

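The Nature of Closures

Normally, local variables die when the function returns.

If a nested inner function still needs them, however, they cannot be destroyed when the outer function returns; otherwise there would be nothing left for the inner function to read.

For a function with a closure, the captured local variables are attached to the newly created function object and live until that function object itself is destroyed.

A closure is therefore an executable object that carries its environment data with it. This is the biggest difference between a closure and an ordinary function.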
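The attached environment can be inspected in CPython through the function's __closure__ attribute. This is a minimal sketch, assuming Python 3 attribute names; add20 and captured are names added here for illustration.

```python
def addn(n):
    def add(x):
        return x + n
    return add

add10 = addn(10)
add20 = addn(20)

# Each returned function carries its own captured n in a closure cell.
captured = [cell.cell_contents for cell in add10.__closure__]
print(add10.__code__.co_freevars, captured)
print(add10(1), add20(1))
```

The two functions share the same code but hold different cells, which is why they give different results.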

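Closures Compared with Classes

A closure is a method that carries data; a class is data that carries methods. The two are very similar, and you can pick whichever is simpler.

When the internal data needs to be exposed, or the methods attached to the data are complex, prefer a class. Otherwise a closure will do.

class addn(object):

    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n

add10 = addn(10)
print(add10(50))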

LEGB

With closures added, we get the complete order of variable lookup.

Local, enclosing (closure), global, built-in. The initials of the four names spell LEGB.

LEGB - Local Enclosing Global Builtins

my_variable = 1

def func():
    my_variable = 2
    print(my_variable)

func()

my_variable = 1

def func():
    my_variable = 2
    def func1():
        my_variable = 3
        print(my_variable)
    return func1

func()()

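Decorators

A decorator lets you rewrite how a function is executed, running some auxiliary code of your own before and after the real function.

This lets you attach extra functionality to other functions, or factor out behavior that several functions have in common.

The rewriting can be implemented with either a closure or a class.

from __future__ import print_function

class print_arg(object):

    def __init__(self, f):
        self.f = f

    def __call__(self, *p, **kw):
        print(p, kw)
        return self.f(*p, **kw)

def add(a, b):
    return a+b

add = print_arg(add)
print(add(10, 20))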

(10, 20) {}
30

def print_arg(f):
    def inner(*p, **kw):
        print(p, kw)
        return f(*p, **kw)
    return inner

def add(a, b):
    return a+b

add = print_arg(add)
print(add(10, 20))

(10, 20) {}
30

Decorator Syntactic Sugar

Because decorators are used so often in Python, the language defines dedicated syntax to make them convenient.

@print_arg
def add(a, b):
    return a+b

print(add(10, 20))

(10, 20) {}
30
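As one more illustration of attaching auxiliary code, here is a sketch of a closure-based decorator that counts calls. count_calls is a hypothetical name, and the use of functools.wraps is an addition not covered in the text above; it copies the wrapped function's name and docstring onto the wrapper.

```python
import functools

def count_calls(f):
    @functools.wraps(f)        # keep f's name and docstring on the wrapper
    def inner(*p, **kw):
        inner.calls += 1       # auxiliary code before the real call
        return f(*p, **kw)
    inner.calls = 0
    return inner

@count_calls
def add(a, b):
    return a + b

add(1, 2)
add(3, 4)
print(add.__name__, add.calls)
```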

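Exercise 1

The fib function below is slow, because computing it for one argument repeats the computation for lower values many times.

Design a decorator that speeds this up, and compare the running time with and without the decorator.

def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

%time fib(35)

CPU times: user 3.66 s, sys: 0 ns, total: 3.66 s
Wall time: 3.66 s
14930352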

Answer

def memorized(f):
    cache = {}
    def inner(n):
        if n not in cache:
            cache[n] = f(n)
        return cache[n]
    return inner

@memorized
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

%time fib(35)

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 104 µs
14930352

Exercise 2

Consider whether the decorator above can be used on other functions.

If not, modify it so that it can, and state the range of functions it applies to.

Answer

def memorized(f):
    cache = {}
    def inner(*p):
        if p not in cache:
            cache[p] = f(*p)
        return cache[p]
    return inner

@memorized
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

%time fib(35)

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 83.9 µs
14930352
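On the question of scope: a cache keyed on the argument tuple only works when every argument is hashable, and it only makes sense for functions whose result depends on nothing but their arguments. For comparison, and assuming Python 3, the standard library ships a decorator with the same restrictions, functools.lru_cache. The sketch below swaps it in for the hand-written memorized.

```python
import functools

@functools.lru_cache(maxsize=None)   # unbounded cache keyed on the arguments
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

print(fib(35))
print(fib.cache_info().currsize)     # one entry per distinct n seen
```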

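Generators

In Python we often need to produce and return a very large sequence of data, such as a list of posts or a series of user IDs.

The usual approach is to fetch the data, build a list, and return it. Often, though, the sequence is practically unbounded.

For performance, I want each item computed only when it is used, without spending resources on the whole list.

For usability, I want the result to behave much like the original list, ideally so that the two are interchangeable.

Fortunately, Python provides generators for exactly this.

To the caller, a generator is roughly equivalent to building a list and returning it.

def fib_seq(n):
    l = []
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
        l.append(a)
    return l

print(fib_seq(20))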

[2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711]

def fib_seq(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a
        yield a

print(fib_seq(20))
print(list(fib_seq(20)))

<generator object fib_seq at 0x7fc8cb49ca50>
[2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711]

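Infinite Generators

Since the generator mechanism hands us items one at a time, we can even define an "infinitely long" list.

As long as we never try to consume all of it, there is no endless loop.

def fib_seq():
    a = b = 1
    while True:
        a, b = a+b, a
        yield b

def size_limited_seq(seq, n):
    for i in seq:
        if n <= 0: return
        yield i
        n -= 1

print(list(size_limited_seq(fib_seq(), 10)))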

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

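A generator produces values only when they are needed.

If a list were used here instead, the program would be guaranteed never to finish.

def number_limited_seq(seq, n):
    for i in seq:
        if i >= n: break
        yield i

print(list(number_limited_seq(fib_seq(), 100000)))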

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025]
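The two limiting helpers have close counterparts in the standard library. As a sketch, itertools.islice cuts a sequence by count and itertools.takewhile cuts it by a condition; fib_seq is repeated here only so that the block runs on its own.

```python
import itertools

def fib_seq():
    a = b = 1
    while True:
        a, b = a+b, a
        yield b

# Stop after ten items, like size_limited_seq(fib_seq(), 10).
first_ten = list(itertools.islice(fib_seq(), 10))
# Stop at the first item that is not below 100, like number_limited_seq.
below_100 = list(itertools.takewhile(lambda x: x < 100, fib_seq()))
print(first_ten)
print(below_100)
```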

# Combined with the sum function defined earlier, which also accepts a generator
print(sum(number_limited_seq(fib_seq(), 100000)))

map/filter/reduce

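map: maps one list to another list
filter: selects the items of a list that meet a condition
reduce: repeatedly combines the items of a list into a single value

# list() is not needed in Python 2, but in Python 3 map returns an iterator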
print(list(map(lambda x: x*x, range(20))))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]

# list() is not needed in Python 2, but in Python 3 filter returns an iterator
print(list(filter(lambda x: x % 2 == 0, range(20))))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# In Python 2 no import is needed.
# In Python 3, reduce was removed from the global namespace, so it has to be imported.
from functools import reduce
reduce(lambda x, y: x*y, range(1, 20))

121645100408832000

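map/filter, List Comprehensions, and for Loops Compared

All three can perform specific operations on a list, such as selecting some of its values or converting each value into another. They differ slightly in convenience and in the details of use.

map/filter and list comprehensions are short and suit relatively simple cases. Truly complex cases need to be written out with for.

map/filter and list comprehensions assume that the computations do not interfere with one another: processing one element is not affected by any other element. for makes no such assumption, so processing one element may depend on another.

List comprehensions are usually a little more concise than map/filter, because map/filter need a lambda. When a ready-made function already exists, however, map/filter are the more convenient choice.

The sieve of Eratosthenes is a good example.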
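Here is one way the sieve could look, as a sketch; primes_below and is_prime are illustrative names. The inner loop crosses out later elements while an earlier one is being processed, which is exactly the kind of interference that map/filter and comprehensions assume away, so the main work is written with for.

```python
def primes_below(n):
    is_prime = [True] * n
    is_prime[0:2] = [False] * min(n, 2)      # 0 and 1 are not prime
    for i in range(2, n):
        if is_prime[i]:
            # Crossing out multiples changes the fate of later elements.
            for j in range(i * i, n, i):
                is_prime[j] = False
    # The final selection is independent per element, so a comprehension fits.
    return [i for i in range(n) if is_prime[i]]

print(primes_below(30))
```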

print([x*x for x in range(20)])

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]

l = []
for x in range(20):
    l.append(x*x)
print(l)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]

print([x for x in range(20) if x % 2 == 0])

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

l = []
for x in range(20):
    if x % 2 == 0:
        l.append(x)
print(l)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Example: A Calculator

Write a calculator that can evaluate simple addition, subtraction, multiplication and division (ignoring operator precedence).

Answer

op_set = {
    '+': lambda x, y: x+y,
    '-': lambda x, y: x-y,
    '*': lambda x, y: x*y,
    '/': lambda x, y: float(x)/y,
}

def parser_exp(exp):
    s = ''
    for c in exp:
        if c in op_set:
            yield s
            s = ''
            yield c
        else:
            s += c
    if s: yield s

print(list(parser_exp('1+2')))

['1', '+', '2']

def eval_exp(exp):
    num = None
    s = parser_exp(exp)
    for e in s:
        if e in op_set:
            r = float(next(s))
            num = op_set[e](num, r)
        else:
            num = float(e)
    return num

print(eval_exp('2*3+1'))
print(eval_exp('1+2*3'))

7.0
9.0

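Syntax Paradigms

References and Copies

In Python you need to distinguish mutable objects from immutable objects, and object references from object copies.

For immutable objects, a reference and a copy make little practical difference: whatever you do, you end up working with the same value.

For mutable objects, a change made through a reference also affects the original object, while a change made to a copy does not.

from __future__ import print_function

print('Simple Assignment')
shoplist = ['apple', 'mango', 'carrot', 'banana']
mylist = shoplist

del shoplist[0]
print('shoplist is', shoplist)
print('mylist is', mylist)
print('Copy by making a full slice')

mylist = shoplist[:]
del mylist[0]
print('shoplist is', shoplist)
print('mylist is', mylist)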

Simple Assignment
shoplist is ['mango', 'carrot', 'banana']
mylist is ['mango', 'carrot', 'banana']
Copy by making a full slice
shoplist is ['mango', 'carrot', 'banana']
mylist is ['carrot', 'banana']

Shallow Copy and Deep Copy

Copying a mutable object is a time-consuming but important operation.

What happens when a mutable object contains another mutable object, for example a list nested inside a list?

Copying only the outermost list is called a shallow copy. Copying every level of nested objects is called a deep copy.

from __future__ import print_function
import copy

orig_list = [1, 2, 3, [4, 5, 6]]

new_list = copy.deepcopy(orig_list)
print(new_list)
new_list[3].append(7)
new_list.append(8)
print(new_list)
print(orig_list)

print('-----------')

new_list = copy.copy(orig_list)
print(new_list)
new_list[3].append(7)
new_list.append(8)
print(new_list)
print(orig_list)

[1, 2, 3, [4, 5, 6]]
[1, 2, 3, [4, 5, 6, 7], 8]
[1, 2, 3, [4, 5, 6]]
-----------
[1, 2, 3, [4, 5, 6]]
[1, 2, 3, [4, 5, 6, 7], 8]
[1, 2, 3, [4, 5, 6, 7]]

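Exercise 1

Write a function that outputs the input list in reverse order.

Built-in reversing functions such as reversed and list.reverse are not allowed.

Answer

l = [1,2,3,4,5]

def my_reverse(l):
    for i in range(1, len(l)+1):
        yield l[len(l)-i]

print(list(my_reverse(l)))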

[5, 4, 3, 2, 1]

Exercise 2

Modify the function above so that it reverses the input list in place and produces no output.

Answer

l = [1,2,3,4,5]

def my_reverse(l):
    t = l[:]
    del l[:]
    for i in range(1, len(t)+1):
        l.append(t[len(t)-i])

my_reverse(l)
print(l)

[5, 4, 3, 2, 1]

Exercise 3

Construct a self-referential binary tree.

Answer

l = []
l.append(l)
l.append(l)
print(l)

[[...], [...]]

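Exercise 4

Think about whether a self-referential tree can be constructed using only tuples.

The Multi-Layer yield Paradigm

As mentioned earlier, list processing can often be reduced to the map/filter pattern, and complex map/filter operations are usually best written out with for.

In real work, however, we often find that a list needs complex processing in which every layer requires its own for loop.

Moreover, each layer is optional, and the overall pipeline has to be assembled according to conditions.

You may remember size_limited_seq and number_limited_seq from the "Infinite Generators" section.

This way of processing a list is called the chain paradigm: a list goes in, is transformed layer by layer along the processing chain, and a list comes out at the end.

The input and output of every stage is a list, or its equivalent, a generator. Every stage can therefore be combined with any other stage.

A similar pattern also appeared in the "Calculator" section.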
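The chain paradigm can be sketched as a handful of small generator stages plus a helper that assembles them. All names below (numbers, only_even, squared, take, build_chain) are illustrative and not taken from the original text.

```python
def numbers():
    n = 0
    while True:
        n += 1
        yield n

def only_even(seq):
    for i in seq:
        if i % 2 == 0:
            yield i

def squared(seq):
    for i in seq:
        yield i * i

def take(seq, n):
    for i in seq:
        if n <= 0: return
        yield i
        n -= 1

def build_chain(source, stages):
    # Each stage takes an iterable and returns an iterable, so they compose.
    for stage in stages:
        source = stage(source)
    return source

chain = build_chain(numbers(), [only_even, squared, lambda s: take(s, 5)])
print(list(chain))
```

Because the list of stages is ordinary data, a stage can be included or left out according to a condition before the chain is built.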

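Design Principles

Object-Oriented Design Principles

Single responsibility principle: an object should have only one specific kind of functionality. Do not try to pile too many features onto it.
Open/closed principle: a software entity is open for extension and closed for modification. As a thought experiment: if, after your code is committed, you could only add files and change a minimal number of lines, how would you design it?
Liskov substitution principle: a program that works with a parent class also works with its subclasses. This often contradicts everyday intuition.
Interface segregation principle: several small, independent interfaces are better than one large interface that does everything.
Dependency inversion principle: program against interfaces, not against implementations.

Exercise 1

Define some classes that describe a parallelogram, a rectangle and a square, and implement the inheritance relationships as you understand them.

There is no need to implement methods or attributes.

Exercise 2

Add the following attributes and methods to the parallelogram, rectangle and square classes:

Attributes: angle, side length
Methods: get angle, set angle, get side length, set side length

Exercise 3

Consider whether the "set angle" method in the classes you just implemented is a reasonable method for the rectangle and the square.

Think about which of these two statements is wrong:

Wherever a parent class can appear, a subclass can always be substituted.
A rectangle is a kind of parallelogram.

Exercise 4

Add a function that computes the area, and consider whether it can be reused.

Exercise 5

Now implement a method that decides whether two objects are equal.

Python Design Principles

Duck typing
Principle of least astonishment
Everything is an interface
Fail fast

Note that attributes, parameter names, list structure and the generator protocol are also interfaces in Python.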

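Character Encoding

Character Sets

GB2312: GB 2312-80, 1 May 1981, 6763 characters.
GBK: 15 December 1995, 21003 characters.
GB18030: GB 18030-2005, 1 May 2006, 70244 characters.
BIG5: an industry standard, incorporated into CNS 11643, 13060 characters.
UNICODE: Unicode 9.0, June 2016, 128237 characters.

Note: the figures come from Wikipedia (the articles on the national standard codes and on Unicode).

GBK has 95 more characters than CP936.

Encoding Schemes

Encoding schemes fall into two basic kinds: variable-length and fixed-length.

GB2312/GBK use variable-length encoding, 1 to 2 bytes per character.
GB18030 uses variable-length encoding, 1, 2 or 4 bytes per character.
UCS-2 uses fixed-length encoding, 2 bytes per character.
UCS-4 uses fixed-length encoding, 4 bytes per character.
UTF-8 uses variable-length encoding, 1 to 3 bytes per character within the BMP and 4 bytes beyond it. (The original design allowed sequences of up to 6 bytes.)

Microsoft refers to the variable-length schemes as MBCS.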
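The byte lengths listed above can be checked with Python's built-in codecs. This is a small sketch; utf-16-le and utf-32-le are used as stand-ins for UCS-2 and UCS-4 because Python has no codecs under those names, and the sample characters are arbitrary choices.

```python
# -*- coding: utf-8 -*-
ch = u'\u4e2d'   # a common CJK character inside the BMP
sizes = {name: len(ch.encode(name))
         for name in ('gbk', 'gb18030', 'utf-8', 'utf-16-le', 'utf-32-le')}
print(sizes)

# ASCII characters keep a single byte in the ASCII-compatible schemes.
ascii_sizes = [len(u'a'.encode(name)) for name in ('gbk', 'gb18030', 'utf-8')]
print(ascii_sizes)

outside_bmp = u'\U00020000'   # a CJK Extension B character, beyond the BMP
print(len(outside_bmp.encode('utf-8')), len(outside_bmp.encode('gb18030')))
```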

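Compatibility

Character set compatibility: GB18030 is backward compatible with GBK, and GBK is backward compatible with GB2312. GB18030 covers the CJK ideograph blocks of Unicode 3.1.

Encoding compatibility: UCS-2 and UCS-4 are not compatible with each other. UTF-8 stands on its own. The GB18030 encoding is backward compatible with GB2312/GBK. Everything except UCS-2/UCS-4 is compatible with ASCII.

Note that in many places GBK is labeled as GB2312.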

Unicode

b"hello world"

'hello world'

type(b"hello world")

str

u"hello world"

u'hello world'

type(u"hello world")

unicode

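Note that the internal representation of unicode objects here is UCS-4, not UTF-8 or UCS-2. This holds for "wide" Python 2 builds; "narrow" builds use UCS-2, and Python 3.3 and later choose the width per string.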

Unicode IO

# encoding=utf-8
import io

f = io.open("abc.txt", "wt", encoding="utf-8")
f.write(u"想象一下这里是某些中文内容，也可能是日文或者韩文")
f.close()

# On Windows this may instead produce garbled text, because the default encoding there is CP936 rather than UTF-8.
text = io.open("abc.txt", encoding="utf-8").read()
print(text)

想象一下这里是某些中文内容，也可能是日文或者韩文

!ls -l abc.txt

-rw-r--r-- 1 shell shell 72 10月 13 11:49 abc.txt

!cat abc.txt

想象一下这里是某些中文内容，也可能是日文或者韩文

# encoding=utf-8
import io

f = io.open("abc.txt", "wt", encoding="gbk")
f.write(u"想象一下这里是某些中文内容，也可能是日文或者韩文")
f.close()

text = io.open("abc.txt", encoding="gbk").read()
print(text)

text = open("abc.txt").read()
print(text)

text = io.open("abc.txt", encoding="utf-8").read()
print(text)

想象一下这里是某些中文内容，也可能是日文或者韩文
����һ��������ĳЩ�������ݣ�Ҳ���������Ļ��ߺ���
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-42-d2ca517a40b2> in <module>()
     12 print(text)
     13 
---> 14 text = io.open("abc.txt", encoding="utf-8").read()
     15 print(text)

/usr/lib/python2.7/codecs.pyc in decode(self, input, final)
    312         # decode input (taking the buffer into account)
    313         data = self.buffer + input
--> 314         (result, consumed) = self._buffer_decode(data, self.errors, final)
    315         # keep undecoded input until the next call
    316         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf8' codec can't decode byte 0xcf in position 0: invalid continuation byte

!ls -l abc.txt

-rw-r--r-- 1 shell shell 48 10月 13 11:49 abc.txt

!cat abc.txt

����һ��������ĳЩ�������ݣ�Ҳ���������Ļ��ߺ���

Exercise

Write a program that converts a file from one specified encoding to another.

Answer

with open('abc.txt', 'rb') as fi:
    data = fi.read()
data = data.decode('gbk').encode('utf-8')
with open('abc.txt', 'wb') as fo:
    fo.write(data)

!cat abc.txt

想象一下这里是某些中文内容，也可能是日文或者韩文

!rm -f abc.txt

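Exercise (optional)

Add encoding detection to the program above.

Note: this exercise requires you to read the documentation of the chardet library yourself. It is a library that guesses the encoding of a string.

Answer

# encoding=utf-8
import chardet

text = u"想象一下这里是某些中文内容，也可能是日文或者韩文".encode('gbk')
print(text)

print(chardet.detect(text))
text = text.decode(chardet.detect(text)['encoding'])
print(text)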
print(type(text))

����һ��������ĳЩ�������ݣ�Ҳ���������Ļ��ߺ���
{'confidence': 0.99, 'encoding': 'GB2312'}
想象一下这里是某些中文内容，也可能是日文或者韩文
<type 'unicode'>

# encoding=utf-8
import chardet

text = u"想象一下这里是某些中文内容，也可能是日文或者韩文".encode('utf-8')
print(text)

print(chardet.detect(text))
text = text.decode(chardet.detect(text)['encoding'])
print(text)
print(type(text))

想象一下这里是某些中文内容，也可能是日文或者韩文
{'confidence': 0.99, 'encoding': 'utf-8'}
想象一下这里是某些中文内容，也可能是日文或者韩文
<type 'unicode'>

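Introduction to Regular Expressions

If we need to render a set of values as text, format does the job conveniently.

But what about the reverse: parsing finished text back into values (an int, for example)? Does format have an inverse that lets us parse content just as conveniently?

The crudest approach is to hand-write a text parser, which costs a great deal of time and adapts poorly to change.

We would like a way to do this with something "roughly like" the template strings used by format.

Fortunately, regular expressions can do a similar job (with substantial differences, of course).

A regular expression is an expression for extracting content from text. You can think of it as a string that extracts strings from a string.

Starting from * and ?

When searching for files, we often use * and ? as wildcards: ? stands for exactly one character and * for any number of characters.

Regular expressions follow a similar idea but are more elaborate. They can not only match one or many characters, but also restrict which character or characters to match, and how many.

Matching Rules

. matches any character except a newline
\w matches a letter, digit, underscore, or Chinese character
\s matches any whitespace character
\d matches a digit
\b matches the start or end of a word
^ matches the start of the string
$ matches the end of the string

Repetition Counts

* repeats zero or more times
+ repeats one or more times
? repeats zero or one time
{n} repeats n times
{n,} repeats n or more times
{n,m} repeats n to m times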

Example

He was carefully disguised but captured quickly by police.

\w+ly

'carefully', 'quickly'
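The rules and repetition counts above can be tried out with re.findall from the standard library. This is a small sketch; the second and third patterns and the digit test string are made up for illustration.

```python
import re

s = 'He was carefully disguised but captured quickly by police.'
ly_words = re.findall(r'\w+ly', s)                       # one or more word characters, then "ly"
c_words = re.findall(r'\bc\w*', s)                       # words that begin with c
digits = re.findall(r'\d{2,4}', 'a1 b22 c333 d55555')    # runs of 2 to 4 digits
print(ly_words)
print(c_words)
print(digits)
```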

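Character Ranges

The [...] syntax matches a custom set of characters. For example, [a-z0-9]* means lowercase letters or digits repeated any number of times.

A ^ at the start of the brackets negates the set: it matches any character except those listed.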

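Capturing

In a regular expression, the (...) syntax captures part of the match.

For example, ([^/]*) matches everything up to the next / and captures it.

Captured content can be used afterwards (for substitution, as in sed), or to obtain several pieces of content from a single match (for example with groups in the re library, covered later).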
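Both uses of capturing can be sketched with the re module. The sample paths and the second capture group are invented for illustration.

```python
import re

# Several pieces of content from one match: groups() returns every capture.
m = re.match(r'([^/]*)/([a-z0-9]*)', 'usr/lib64/python')
print(m.groups())

# Substitution: \1 and \2 refer back to the captured parts, as in sed.
swapped = re.sub(r'([^/]*)/(.*)', r'\2 under \1', 'usr/lib')
print(swapped)
```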

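Non-Greedy Matching

By default, a regular expression matches as much text as it can. For example, suppose we use (h.*o) hoping to match the hello in hello, fox. The actual result is hello, fo.

Adding ? after a repetition makes it non-greedy.

The expression then matches as little text as it can. For example, matching (h.*?o) against hello, fox yields hello.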
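The difference is easy to confirm with re.search; the variable names below are illustrative.

```python
import re

s = 'hello, fox'
greedy = re.search(r'(h.*o)', s).group(1)    # runs on to the last o
lazy = re.search(r'(h.*?o)', s).group(1)     # stops at the first o
print(repr(greedy), repr(lazy))
```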

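Further Reading

A 30-Minute Introduction to Regular Expressions

Exercises

Write a regular expression that matches a mobile phone number.

Write a regular expression that matches an IPv4 address.

Study the two examples below and explain their logic:

Matching a URL: (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*)?

Matching an email address: [a-z]([a-z0-9]*[-_]?[a-z0-9]+)*@([a-z0-9]*[-_]?[a-z0-9]+)+[\.][a-z]{2,3}([\.][a-z]{2})?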

Python Library Examples

Standard library: sys

import sys
print(sys.version_info)
print(sys.version_info.major == 3)

sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
False

Standard library: logging

import os
import platform
import logging

print('normal output')

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

logging.debug("Start of the program")
logging.info("Doing something")
logging.warning("Dying now")

DEBUG:root:Start of the program
INFO:root:Doing something
WARNING:root:Dying now
normal output

re

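re is a library for working with regular expressions.

compile: "compiles" a regular expression. Reusing a compiled pattern object is generally faster than passing the pattern string on every call.
search: searches anywhere in a piece of text.
match: matches at the start of a piece of text.
split: splits text into parts using a regular expression.
findall: finds all substrings that match the regular expression.
sub: performs substitution on text.

import re

s = 'He was carefully disguised but captured quickly by police.'
r = re.compile(r'\w+ly')
r.findall(s)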

['carefully', 'quickly']

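The Difference Between search and match

match always starts matching at the beginning of the string, while search scans the whole text. Even with ^, search may match at the start of a later line (in multiline mode).

So to match the head of a string, or a complete string, use match. To search within text, use search.

Detail: since it has the narrower job, match is naturally the faster of the two.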
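A short sketch of the behavior just described, using a two-line sample string chosen for illustration. re.M is the multiline flag, under which ^ also matches right after a newline.

```python
import re

text = 'first line\nsecond line'

at_start = re.match(r'second', text)                 # None: not at the start
anywhere = re.search(r'second', text)                # found further in
caret_plain = re.search(r'^second', text)            # None without re.M
caret_multiline = re.search(r'^second', text, re.M)  # ^ matches after \n
match_multiline = re.match(r'^second', text, re.M)   # match still starts at 0
print(at_start, anywhere.start(), caret_plain,
      caret_multiline.start(), match_multiline)
```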

Exercise

Write programs that verify the regular expressions written in the "Regular Expressions" section.

Answer

import re

r = re.compile(r'1\d{10}')
print(r.match('13512345678'))
print(r.match('021-10101010'))

<_sre.SRE_Match object at 0x7fc8c9c6e1d0>
None

import re

r = re.compile(r'\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}')
print(r.match('192.168.0.1'))
print(r.match('1200.5.4.3'))
print(r.match('abc.def.ghi.jkl'))

print(r.match('192.168.0.999'))

<_sre.SRE_Match object at 0x7fc8c9c6e370>
None
None
<_sre.SRE_Match object at 0x7fc8c9c6e370>

import re

# source: .html
r = re.compile(r'(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*)?')
print(r.match('.aspx'))
print(r.match('regxlib'))

<_sre.SRE_Match object at 0x7fc8c9c44458>
None

import re

# source: 
r = re.compile(r'[a-z]([a-z0-9]*[-_]?[a-z0-9]+)*@([a-z0-9]*[-_]?[a-z0-9]+)+[\.][a-z]{2,3}([\.][a-z]{2})?')
print(r.match('shell999@gmail'))
print(r.match('000@shell@1'))

<_sre.SRE_Match object at 0x7fc8c9c440c8>
None

pickle

import pickle
shoplistfile = 'shoplist.data'
shoplist = ['apple', 'mango', 'carrot']
f = open(shoplistfile, 'wb')
pickle.dump(shoplist, f)
f.close()
del shoplist
f = open(shoplistfile, 'rb')
storedlist = pickle.load(f)
print(storedlist)

['apple', 'mango', 'carrot']

!ls -l shoplist.data

-rw-r--r-- 1 shell shell 46 10月 13 11:49 shoplist.data

!rm shoplist.data

datetime

datetime.fromtimestamp
datetime.now
datetime.strptime
datetime.strftime
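The four calls listed above can be sketched as follows, assuming Python 3 (timezone comes from the same module). The sample timestamp and format strings are arbitrary.

```python
from datetime import datetime, timezone

# strptime parses text into a datetime; strftime formats it back to text.
parsed = datetime.strptime('2016-09-02 08:30:00', '%Y-%m-%d %H:%M:%S')
formatted = parsed.strftime('%Y/%m/%d %H:%M')

# fromtimestamp converts seconds since the epoch; UTC is passed explicitly
# so that the result does not depend on the local time zone.
epoch = datetime.fromtimestamp(0, timezone.utc)

print(parsed, formatted, epoch.isoformat())
print(datetime.now() > parsed)
```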

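Exercise

Without writing any calculation code of your own, find out on which day of the week the storming of the Bastille took place.

Hint: the storming of the Bastille took place on 14 July 1789 in the Western calendar.

Answer

import datetime
print(datetime.date(1789, 7, 14).weekday())

See the datetime help documentation for details.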

math

math.exp
math.log
math.pow

Exercise

How many digits does 2**15 have? What about 2**1515? And 2**15151515?

Answer

2**15
>> 32768

len(str(2**15))
>> 5

2**1515
>> 1149326528034702581682260802282483154731794952366680598907315749676350303455258941969435226491616272765850432529872896570140083779714087641224676603265178037383658094111263859162703551253166913832114001816457662609862015656425761751871770440149072630404384442015791262449967076912697519498726922973530707535919651560412212344672313695699099755325593751836252686364599363077472034064061672117649734107141736148353976201394067293417855894280067914417745952768L

len(str(2**1515))
>> 457

# log(2**15151515) == log2 * 15151515 == log10 * (log2/log10) * 15151515
# 2**15151515 == 10**((log2/log10) * 15151515)
import math

print((math.log(2)/math.log(10)) * 15)
print(int((math.log(2)/math.log(10)) * 15) + 1)

print((math.log(2)/math.log(10)) * 1515)
print(int((math.log(2)/math.log(10)) * 1515) + 1)

print((math.log(2)/math.log(10)) * 15151515)
print(int((math.log(2)/math.log(10)) * 15151515) + 1)

4.51544993496
5
456.060443431
457
4561060.49475
4561061

import math
f = lambda x: int((math.log(2)/math.log(10)) * x) + 1

print(f(15))
print(f(1515))
print(f(15151515))

5
457
4561061

random

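randint: a random integer within the given range
choice: picks one item at random from a collection
shuffle: shuffles a collection in place
random: a random float between 0 and 1

Exercise

Write a lottery drawing program.

The key point is whether the program allows repeated draws, and in particular repeated draws without replacement. A plain random draw only needs choice, while drawing without replacement needs shuffle.

Answer

s = u'赵钱孙李周吴郑王'
s = list(s)

import random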
print(random.choice(s))
print(random.choice(s))
print(random.choice(s))
print(random.choice(s))
print(random.choice(s))
print(random.choice(s))

孙
郑
郑
孙
吴
钱

s = u'赵钱孙李周吴郑王'
s = list(s)

import random
random.shuffle(s)
print(s.pop())
print(s.pop())
print(s.pop())
print(s.pop())
print(s.pop())
print(s.pop())

赵
钱
李
周
郑
王
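As an alternative to shuffle followed by pop, drawing without replacement can also be sketched with random.sample, which returns distinct items and leaves the original list untouched. The fixed seed is only there to make the sketch repeatable; a real draw would not set one.

```python
import random

names = list(u'赵钱孙李周吴郑王')
random.seed(0)                      # fixed seed, only for repeatability
winners = random.sample(names, 3)   # three distinct winners
print(winners)
print(len(names))                   # the pool itself is unchanged
```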

path

basename
dirname
exists
expanduser
getsize
join
realpath
splitext
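A few of these functions in use, as a sketch. The heading presumably refers to os.path, which is an assumption here, and the file name is invented. Only the pure string functions are shown with fixed inputs; exists, getsize and realpath depend on the local file system.

```python
import os.path

p = os.path.join('demo', 'report.tar.gz')   # uses the platform separator
name = os.path.basename(p)                  # last path component
folder = os.path.dirname(p)                 # everything before it
stem, ext = os.path.splitext(name)          # splits off the last extension only
print(name, folder, stem, ext)

home = os.path.expanduser('~')              # the current user's home directory
print(os.path.exists(home))
```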

subprocess

call
check_output
Popen
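The three entry points differ in what they hand back: call returns the exit code, check_output returns the captured standard output, and Popen gives a process object to talk to. The sketch below runs the current Python interpreter as the child process so that it does not depend on any external tool; the one-line child programs are made up for illustration.

```python
import subprocess
import sys

# call: run the command and return its exit code.
code = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])

# check_output: run the command and return what it wrote to stdout (bytes).
out = subprocess.check_output([sys.executable, '-c', 'print("hello")'])

# Popen: full control, here feeding stdin and reading stdout through pipes.
child = 'import sys; sys.stdout.write(sys.stdin.read().upper())'
p = subprocess.Popen([sys.executable, '-c', child],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
piped, _ = p.communicate(b'abc')

print(code, out.strip(), piped)
```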

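Exercise

Reproduce what grep -v ^# file | wc -l does, without invoking a shell.

Then do the same thing in pure Python code.

Note: grep and wc are not available to Windows users, so on Windows only the pure Python part of this exercise can be done.

Answer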

!cat /etc/rc.local

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

exit 0

!grep -v ^# /etc/rc.local | wc -l

# grep -v ^# file | wc -l
import subprocess

p1 = subprocess.Popen(["grep", "-v", "^#", "/etc/rc.local"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["wc", "-l"], stdin=p1.stdout, stdout=subprocess.PIPE)
output = p2.communicate()[0]
int(output.strip())

import re
import io

r = re.compile('^#')
# Under Python 2 there is no need to think about encoding.
# Under Python 3 this pattern works on str rather than bytes, so the input has to be decoded.
with io.open('/etc/rc.local', 'r', encoding='utf-8') as fi:
    print(len([line for line in fi if not r.match(line)]))

pdb

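Start one of your programs with python -m pdb file.py.

Then trace its execution and see whether your understanding of it is correct.

To stop quickly at a particular point in the program, consider calling set_trace there.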

unittest

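A unit test runs some code and checks a set of assertions. If they all hold, the test passes. If any of them fails, that test item fails.

The benefit of unittest is that each small piece of code gets its own small automated test script, and these scripts can be combined according to the framework's conventions.

As the number of code pieces grows and each change becomes more complex, you can still be confident that these small pieces implement the logic originally intended.

As long as the unit tests pass, the logic of each small piece is broadly correct. This gives strong support to the correctness of the final code.

import unittest

class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)

suite = unittest.TestLoader().loadTestsFromTestCase(TestStringMethods)
unittest.TextTestRunner().run(suite)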

...
----------------------------------------------------------------------
Ran 3 tests in 0.004s

OK

<unittest.runner.TextTestResult run=3 errors=0 failures=0>

Exercise

For the expressions in the "Regular Expressions" section, build positive and negative test cases to check their correctness.

Answer

import re
import unittest, unittest.loader

class TestMobile(unittest.TestCase):
    r = re.compile(r'1\d{10}')

    def test_match(self):
        self.assertTrue(self.r.match('13512345678'))

    def test_notmatch(self):
        self.assertFalse(self.r.match('021-10101010'))

class TestInetAddr(unittest.TestCase):
    r = re.compile(r'\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}')

    def test_ipv4(self):
        self.assertTrue(self.r.match('192.168.0.1'))

    def test_notmatch(self):
        self.assertFalse(self.r.match('1200.5.4.3'))
        self.assertFalse(self.r.match('abc.def.ghi.jkl'))

    def test_wrong(self):
        # this assertion fails, which is expected
        self.assertFalse(self.r.match('192.168.0.999'))

class TestUrl(unittest.TestCase):
    # source: .html
    r = re.compile(r'(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*)?')

    def test_url(self):
        self.assertTrue(self.r.match('.aspx'))

    def test_domain(self):
        self.assertFalse(self.r.match('regxlib'))

class TestEmail(unittest.TestCase):
    # source: 
    r = re.compile(r'[a-z]([a-z0-9]*[-_]?[a-z0-9]+)*@([a-z0-9]*[-_]?[a-z0-9]+)+[\.][a-z]{2,3}([\.][a-z]{2})?')

    def test_email(self):
        self.assertTrue(self.r.match('shell999@gmail'))

    def test_notmatch(self):
        self.assertFalse(self.r.match('000@shell@1'))

suite = unittest.TestSuite()
loader = unittest.TestLoader()
suite.addTest(loader.loadTestsFromTestCase(TestMobile))
suite.addTest(loader.loadTestsFromTestCase(TestInetAddr))
suite.addTest(loader.loadTestsFromTestCase(TestUrl))
suite.addTest(loader.loadTestsFromTestCase(TestEmail))
unittest.TextTestRunner().run(suite)

....F....
======================================================================
FAIL: test_wrong (__main__.TestInetAddr)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-74-ea312ac19d96>", line 25, in test_wrong
    self.assertFalse(self.r.match('192.168.0.999'))
AssertionError: <_sre.SRE_Match object at 0x7fc8c9c8b440> is not false

----------------------------------------------------------------------
Ran 9 tests in 0.006s

FAILED (failures=1)

<unittest.runner.TextTestResult run=9 errors=0 failures=1>

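A Complex Example: Calculator

Remember the "Calculator" exercise we did earlier? Now we will build a more complex version that handles operator precedence and parentheses.

First think about how such a calculator could work, then try to implement your idea. (Assume that only binary operators are supported for now, but that there may be more than two precedence levels.)

I implemented it in about 20 minutes. Any student who can finish it within that time, please do get in touch with me.

The core problem of the complex calculator is operator precedence.

In its simplest form, the problem we face is evaluating an expression of the form a op1 b op2 c.

If op1 has higher precedence, we need to compute a op1 b. If op2 is higher, in principle we should compute b op2 c.

In practice, though, the expression forms a long chain. Even when op2 has the higher precedence, we cannot be sure it can be evaluated first, because a later op3 may have a higher precedence still.

We therefore need to build a list and keep track of a current position.

If the operator at this position has higher precedence than the one before it, keep moving forward. If its precedence is lower or equal, evaluate the earlier part first, remove the evaluated portion from the list, and insert the result in its place.

When the expression ends, repeatedly evaluating and shrinking the earlier data leaves exactly one item. That item is the result we want.

Parentheses are easier to deal with.

On seeing (, we can pass the rest of the expression to the same function as a new expression to evaluate (recursion), and treat ) as a kind of end of expression.

This way the contents of () are absorbed by the evaluation function, reduced to a result, and substituted at the original position. After a finite number of recursive calls, the problem can always be solved.

Answer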

#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
@date: 2016-09-02
@author: Shell.Xu
@copyright: 2016, Shell.Xu <shell909090@gmail>
@license: BSD-3-clause
'''

op_set = {
    '(': None,
    ')': None,
    '+': lambda x, y: x+y,
    '-': lambda x, y: x-y,
    '*': lambda x, y: x*y,
    '/': lambda x, y: float(x)/y,
}

op_priority = {
    '+': 10,
    '-': 10,
    '*': 20,
    '/': 20,
}

def parser_exp(exp):
    s = ''
    for c in exp:
        if c in op_set:
            if s: yield s
            s = ''
            yield c
        else: s += c
    if s: yield s

def find_last_op(l):
    for e in reversed(l):
        if e in op_set: return e

def force_stack(stack):
    r = stack.pop(-1)
    op = stack.pop(-1)
    l = stack.pop(-1)
    result = op_set[op](l, r)
    print('{} {} {} => {}'.format(op, l, r, result))
    stack.append(result)

def eval_exp(exp):
    stack = []
    for e in exp:
        if e == ')': break
        elif e == '(':
            stack.append(eval_exp(exp))
            continue
        if e not in op_set:
            stack.append(float(e))
            continue
        while True:
            last = find_last_op(stack)
            if last is None or op_priority[last] < op_priority[e]: break
            force_stack(stack)
        stack.append(e)
    while len(stack) > 1:
        force_stack(stack)
    return stack[0]

def calc(exp):
    s = parser_exp(exp)
    return eval_exp(s)

print(calc('2*3+1'))
print(calc('1+2*3'))
print(calc('(1.1+2.5/10)*3/4+1'))
print(eval('(1.1+2.5/10)*3/4+1'))

* 2.0 3.0 => 6.0
+ 6.0 1.0 => 7.0
7.0
* 2.0 3.0 => 6.0
+ 1.0 6.0 => 7.0
7.0
/ 2.5 10.0 => 0.25
+ 1.1 0.25 => 1.35
* 1.35 3.0 => 4.05
/ 4.05 4.0 => 1.0125
+ 1.0125 1.0 => 2.0125
2.0125
2.0125

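PEP 8 Code Style

Alignment rules
Import rules
Whitespace rules
Comment rules
Naming rules
Exception rules
Programming details

Alignment Rules

Use spaces rather than tabs, unless the file is already indented with tabs.
Indent with 4 spaces. For continuation lines, the 4-space rule is optional.
A line of code is at most 79 characters long. For long flowing text without structure (such as docstrings or comments), the line length should stay within 72 characters.
Align continuation lines with the first element at the same level, or use the hanging style in which the first element moves to the next line with one extra level of indentation.
If a continuation line would sit at the same indentation as the code that follows, add one more level of indentation to keep them visually distinct.
In an expression that spans several lines, a binary operator goes at the start of the line, before its operand.
Top-level functions and classes should be surrounded by two blank lines. Functions inside a class should be surrounded by one blank line.
Blank lines are encouraged as a way to separate logical sections of code and give it a sense of rhythm.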
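The alignment rules above might look like this in practice. This is an illustrative sketch; the function and parameter names are placeholders.

```python
def long_function_name(var_one, var_two,
                       var_three, var_four):
    # Continuation lines aligned with the first element after the bracket.
    return var_one + var_two + var_three + var_four


def other_long_function_name(
        var_one, var_two,
        var_three, var_four):
    # Hanging style: the extra indentation keeps the arguments visually
    # apart from the body that follows.
    total = (var_one
             + var_two
             - var_three
             - var_four)      # binary operators lead each continuation line
    return total


print(long_function_name(1, 2, 3, 4), other_long_function_name(10, 5, 3, 2))
```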

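Import Rules

Files should be encoded in UTF-8. Identifiers should be in English (pinyin is not ruled out).
One import statement imports one module, but importing several names with from ... import is fine.
Imports should always be at the top of the file.
The recommended import order is: standard library, third-party libraries, the program's own modules.
Use absolute imports. Explicit relative imports are also acceptable.
Wildcard imports (from ... import *) should be forbidden, because they may pollute the namespace.
Module-level dunders (such as __all__ and __version__) should be defined after the docstring and the __future__ imports, but before the other imports.
The recommended order within a file is: shebang, encoding declaration, comments, module docstring, __future__ imports, dunders, imports, globals and constant definitions.

The only places where non-ASCII characters should appear in a file are tests of non-ASCII behavior and authors' names.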

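Whitespace Rules

Do not use spaces in the following cases:
- Padding immediately inside parentheses, brackets or braces.
- Before a colon, comma or semicolon.
- Between a function name and its argument parenthesis, or between a list and its index.
- More than one space around an assignment in order to align several assignments.
- Trailing whitespace at the end of a line.
- Around the = of a parameter default value or a by-name argument.
Always use spaces in the following case:
- On both sides of assignment, comparison and logical operators.
Spaces are recommended in the following case:
- On both sides of the lowest-priority operator in an expression.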
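A short sketch of these whitespace rules applied to real code; the names scale and spam are placeholders.

```python
def scale(values, factor=2, offset=0):     # no spaces around = in defaults
    # Spaces only around +, the lowest-priority operator in the expression.
    return [v*factor + offset for v in values]


spam = {'eggs': [1, 2, 3]}   # no padding inside brackets, none before , or :
x = 1                        # exactly one space on each side of assignment
result = scale(spam['eggs'], offset=x)   # no space before ( or [, none around keyword =
print(result)
```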