结构化数据上的选出运算

编程入门行业动态更新时间:2024-10-14 00:25:49

【摘要】
选出是指在集合中，根据指定条件获取成员。选出与定位计算很相似，不过定位关心的是成员在集合中的位置，而选出关心的是成员记录的详细信息。比如选出入职超过 10 年的员工收入，查询销售额超过一万的客户名称等等。如何简单快捷的实现选出？这里将为你全程剖析，并提供 esProc SPL 示例代码。结构化数据上的选出运算

1. 选出符合条件的成员

【例 1】分栏列出欧洲和非洲人口超 200 万的城市名称及人口（每栏按从多到少排序）。世界城市人口表部分数据如下：

Continent	Country	City	Population
Africa	Egypt	Cairo	6789479
Asia	China	Shanghai	24240000
Europe	Britain	London	7285000
…	…	…	…

期望结果如下：

Europe City	Population	Africa City	Population
Moscow	8389200	Cairo	6789479
London	7285000	Kinshasa	5064000
St Petersburg	4694000	Alexandria	3328196
…	…	…	…

【SPL 脚本】

	A	B
1	=connect("db").query("select * from World where Continent in('Europe','Africa') and Population >= 2000000")	/连接数据库，取出欧洲和非洲超过 200 万人口的记录
2	=A1.select(Continent:"Europe")	/使用 A.select() 函数取出欧洲数据
3	=A1.select(Continent:"Africa")	/使用 A.select() 函数取出非洲数据
4	=create('Europe City',Population,'Africa City', Population)	/按目标结构创建一个空表
5	=A4.paste(A2.(City),A2.(Population),A3.(City),A3.(Population))	/使用 A.paste() 函数将值粘贴到对应列

A4的执行结果如下：

Europe City	Population	Africa City	Population
Moscow	8389200	Cairo	6789479
London	7285000	Kinshasa	5064000
St Petersburg	4694000	Alexandria	3328196
…	…	…	…

【例 2】查询纽约研发部门员工的全名和收入。部分数据如下：

ID	NAME	SURNAME	STATE	DEPT	SALARY
1	Rebecca	Moore	California	R&D	7000
2	Ashley	Wilson	New York	Finance	11000
3	Rachel	Johnson	New Mexico	Sales	9000
4	Emily	Smith	Texas	HR	7000
5	Ashley	Smith	Texas	R&D	16000
…	…	…	…	…	…

【SPL 脚本】

	A	B
1	=connect("db")	/连接数据源
2	=A1.query("select * from Employee")	/导入员工表
3	=A2.select(STATE=="New York"&&DEPT=="R&D")	/使用函数 A.select() 选出纽约研发部门员工的记录。需要同时满足所有条件时，可以使用符号 "&&"。
4	=A3.new(NAME+" "+SURNAME:FULLNAME, SALARY)	/使用符号 + 连接字符串，组成全名

A4的执行结果如下：

FULLNAME	SALARY
Matthew Johnson	6000
Lauren Thomas	12000
Brooke Williams	12000

【例 3】查询各部门 30 岁以下和 50 岁以上的员工的人数。部分数据如下：

ID	NAME	BIRTHDAY	STATE	DEPT	SALARY
1	Rebecca	1974/11/20	California	R&D	7000
2	Ashley	1980/07/19	New York	Finance	11000
3	Rachel	1970/12/17	New Mexico	Sales	9000
4	Emily	1985/03/07	Texas	HR	7000
5	Ashley	1975/05/13	Texas	R&D	16000
…	…	…	…	…	…

【SPL 脚本】

	A	B
1	=connect("db")	/连接数据源
2	=A1.query("select * from Employee")	/导入员工表
3	=A2.select((age=age(BIRTHDAY), age<30 \|\| age>50))	/使用函数 A.select() 选出年龄小于 30 或者年龄大于 50 的员工。需要满足多个条件中任意一个时，可以使用符号 "\|\|"。
4	=A3.groups(DEPT; count(~):Count)	/分组汇总各部门满足条件的人数

A4的执行结果如下：

DEPT	Count
4	9
5	5
6	3
7	6
8	1

2. 查找最大值 / 最小值对应记录

【例 4】根据成绩表，求一班数学成绩最低的学生 ID。部分数据如下：

CLASS	STUDENTID	SUBJECT	SCORE
Class one	1	English	84
Class one	1	Math	77
Class one	1	PE	69
Class one	2	English	81
Class one	2	Math	80
…	…	…	…

【SPL 脚本】

	A	B
1	=connect("db")	/连接数据库
2	=A1.query("select * from Scores where SUBJECT='Math'and CLASS='Class one'")	/查询一班的数学成绩
3	=A2.minp(SCORE)	/使用 A.minp() 函数选出最低分所在记录
4	=A3.STUDENTID	/从记录中取学生 ID

最小值所在记录不一定是唯一的，如果想返回所有记录，可以使用函数 A.minp() 的选项 @a ：

	A	B
3	=A2.minp@a(SCORE)	/使用 A.minp() 函数的 @a 选项，选出所有最高分的记录
4	=A3.(STUDENTID)	/从多条记录中取出学生 ID

A3的执行结果如下：

CLASS	STUDENTID	SUBJECT	SCORE
Class one	5	Math	60
Class one	14	Math	60

A4的执行结果如下：

Member

【例 5】根据奥运会奖牌榜统计表，求总成绩蝉联第一名届数最长的国家。部分数据如下：

Game	Nation	Gold	Silver	Copper
30	USA	46	29	29
30	China	38	27	23
30	UK	29	17	19
30	Russia	24	26	32
30	Korea	13	8	7
…	…	…	…	…

【SPL 脚本】

	A	B
1	=file("Olympic.csv").import@cqt()	/导入奥运会历届排名
2	=A1.sort@z(Game, 1000000Gold+1000Silver+Copper)	/按第几届和总成绩降序排列
3	=A2.group@o1(Game)	/每届取一名，因为有序也就是第一名
4	=A3.group@o(Nation)	/将国家按原序分组
5	=A4.maxp(~.len())	/使用函数 A.maxp() 选出成员数量最多的一组，也就是蝉联次数最多的

A5的执行结果如下：

Game	Nation	Gold	Silver	Copper
10	USA	41	32	30
9	USA	22	18	16
8	USA	45	27	27
7	USA	41	27	28

3. 分区段查找数据

有时候我们需要通过计算数值在区间中的序号，来获取在集合中的对应成员。比如根据考试成绩返回评价（优、良、中、差）；根据家庭年收入返回家庭所处的资产等级（贫困、小康、中产、富裕等）。

【例 6】根据成绩表，统计英语科目优秀、及格和不及格的人数。部分数据如下：

CLASS	STUDENTID	SUBJECT	SCORE
Class one	1	English	84
Class one	1	Math	77
Class one	1	PE	69
Class one	2	English	81
Class one	2	Math	80
…	…	…	…

【SPL 脚本】

	A	B
1	=connect("db").query("select * from Scores where SUBJECT='English'")	/连接数据库，查询英语成绩
2	=create(Assessment,Score).record(["fail",0,"pass",60,"excellent",90])	/创建分数与评价对照表
3	=A1.derive(A2.segp(Score,SCORE).Assessment:Assessment)	/使用 A.segp() 函数根据分数在对照表中的区间序号，获取相应的评价。
4	=A3.groups(Assessment;count(1):Count)	/按评价分组统计人数

A4的执行结果如下：

Assessment	Count
excellent	6
fail	4
pass	18

4. 选出前 N 名 / 后 N 名

【例 7】根据成绩表，求各班各科前两名的学生 ID。部分数据如下：

CLASS	STUDENTID	SUBJECT	SCORE
Class one	1	English	84
Class one	1	Math	77
Class one	1	PE	69
Class one	2	English	81
Class one	2	Math	80
…	…	…	…

【SPL 脚本】

	A	B
1	=connect("db")	/连接数据库
2	=A1.query("select * from Scores")	/查询学生成绩
3	=A2.group(CLASS,SUBJECT;~.top(-2;SCORE):TOP2)	/使用 A.top() 函数获取每个班级各科目分数的前两名。其中 -2 表示从大到小取两名。
4	=A3.conj(TOP2)	/将所有班级各科的前两名记录合并到一起

A4的执行结果如下：

CLASS	STUDENTID	SUBJECT	SCORE
Class one	4	English	96
Class one	9	English	93
Class one	13	Math	97
Class one	10	Math	97
…	…	…	…

5. 根据主键值查找对应记录

根据主键值查找对应记录是很常见的操作，例如根据员工 ID 查找员工记录，根据订单 ID 查询订单详细信息等等。

【例 8】根据相互关联的课程表和选课表，列出所有学生所选课程信息表，每个科目作为一列。课程表部分数据如下：

ID	NAME	TEACHERID
1	Environmental protection and …	5
2	Mental health of College Students	1
3	Computer language Matlab	8
…	…	…

选课表部分数据如下：

ID	STUDENT_NAME	COURSE
1	Rebecca	2,7
2	Ashley	1,8
3	Rachel	2,7,10
…	…	…

希望转换成如下表格：

ID	STUDENT_NAME	COURSE1	COURSE2	COURSE3	…
1	Rebecca	Mental health of College Students	Into Shakespeare		…
2	Ashley	Environmental protection and …	Modern economics		…
3	Rachel	Mental health of College Students	Into Shakespeare	Music appreciation	…
…	…	…	…	…	…

【SPL 脚本】

	A	B
1	=connect("db")	/连接数据库
2	=A1.query("select * from Course").keys(ID)	/读取课程表，并设置主键 ID
3	=A1.query("select * from SelectCourse")	/读取学生选课表
4	=A3.run(COURSE=COURSE.split@cp())	/将选课表中的课程按逗号拆分后赋值给课程字段
5	=A4.max(COURSE.len())	/找到选课最多的数量
6	=create(ID,STUDENT_NAME, ${A5.("COURSE"+string(~)).concat@c()})	/创建空表，课程列按照最多的数量创建
7	>A4.run(A6.record([ID,STUDENT_NAME]\|COURSE.(A2.find(~).Name)))	/循环选课表，将学生 ID、学生姓名以及使用 A.find() 函数查找出的课程名称合并追加到 A6 的表中。

A6的执行结果如下：

ID	STUDENT_NAME	COURSE1	COURSE2	COURSE3
1	Rebecca	Mental health of College Students	Into Shakespeare
2	Ashley	Environmental protection and …	Modern economics
3	Rachel	Mental health of College Students	Into Shakespeare	Music appreciation
…	…	…	…	…