PageRank
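Contents
Routes database
Content
Dataset source
Code
1) Import packages
2) Load the data
3) Explore the data
4) Extract source and destination airports
5) Build the directed graph
6) Output the airport ranking in descending order of PR value
7) Define a function to draw the network graph
8) Plot the airport link graph for a PR threshold of 0.003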
Routes database
As of January 2012, the OpenFlights/Airline Route Mapper Route Database contains 59036 routes between 3209 airports on 531 airlines spanning the globe.
Content
The data is ISO 8859-1 (Latin-1) encoded.
Each entry contains the following information:
- Airline: 2-letter (IATA) or 3-letter (ICAO) code of the airline.
- Airline ID: Unique OpenFlights identifier for the airline (see Airline).
- Source airport: 3-letter (IATA) or 4-letter (ICAO) code of the source airport.
- Source airport ID: Unique OpenFlights identifier for the source airport (see Airport).
- Destination airport: 3-letter (IATA) or 4-letter (ICAO) code of the destination airport.
- Destination airport ID: Unique OpenFlights identifier for the destination airport (see Airport).
- Codeshare: "Y" if this flight is a codeshare (that is, not operated by Airline, but by another carrier), empty otherwise.
- Stops: Number of stops on this flight ("0" for direct).
- Equipment: 3-letter codes for the plane type(s) generally used on this flight, separated by spaces.
The special value \N is used for "NULL" to indicate that no value is available.
Notes:
- Routes are directional: if an airline operates services from A to B and from B to A, both A-B and B-A are listed separately.
- Routes where one carrier operates both its own and codeshare flights are listed only once.
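The field list and the `\N` convention above can be illustrated with a minimal sketch that uses only Python's standard `csv` module. The field names in `FIELDS`, the `parse_routes` helper, and the two sample rows are all made up for illustration; the header of the actual CSV file may differ.

```python
import csv
import io

# Hypothetical field names following the description above; the real header may differ.
FIELDS = ["airline", "airline_id", "source", "source_id",
          "destination", "destination_id", "codeshare", "stops", "equipment"]

# Two made-up rows for illustration only: A-B and B-A are listed separately.
SAMPLE = "XX,\\N,AAA,1,BBB,2,,0,320\nXX,\\N,BBB,2,AAA,1,Y,0,320 321\n"

def parse_routes(text):
    """Parse route rows, mapping the special value \\N to None."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = {k: (None if v == "\\N" else v) for k, v in zip(FIELDS, record)}
        row["stops"] = int(row["stops"])              # "0" means a direct flight
        row["codeshare"] = row["codeshare"] == "Y"    # empty means not a codeshare
        row["equipment"] = row["equipment"].split() if row["equipment"] else []
        rows.append(row)
    return rows

routes = parse_routes(SAMPLE)
for r in routes:
    print(r["source"], "->", r["destination"], r["airline_id"], r["equipment"])
```

Because routes are directional, the two sample rows stay distinct entries rather than being merged into one undirected connection.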
Dataset source
Flight Route Database | Kaggle
Code
1) Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import networkx as nx
2) Load the data
data=pd.read_csv("d:/datasets/Flight Route Database.csv")
3) Explore the data
data.head()
data.info()
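4) Extract source and destination airports
weight_ = 2
# The column names must match the CSV header exactly, including the leading spaces.
edges = [(i, j, weight_) for i, j in data[[" source airport", " destination apirport"]].values]
# i is the source airport, j is the destination airport, and weight_ is the edge weight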
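Since the dataset lists routes per airline, the same (source, destination) pair can appear more than once in the edge list, while a directed graph keeps only one edge per pair. As a possible alternative to the constant weight used in this post (not what the post itself does), the sketch below counts how often each pair is listed and uses that count as the edge weight. The airport codes are made up.

```python
from collections import Counter

# Made-up (source, destination) pairs; a pair may repeat, once per airline.
pairs = [("AAA", "BBB"), ("AAA", "BBB"), ("BBB", "AAA"), ("AAA", "CCC")]

# Count how many times each directed pair is listed.
route_counts = Counter(pairs)

# One weighted edge per distinct pair: (source, destination, number_of_listings)
weighted_edges = [(src, dst, count) for (src, dst), count in route_counts.items()]
print(weighted_edges)
```

A list in this shape can be passed to `G.add_weighted_edges_from` in the same way as `edges`.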
5) Build the directed graph
G = nx.DiGraph()  # create an empty directed graph
G.add_weighted_edges_from(edges)  # add each route as an edge carrying its weight
pagerank = nx.pagerank(G, alpha=0.85)  # compute the PR values with damping factor 0.85
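To make the role of `alpha` in the `nx.pagerank` call above more concrete, here is a minimal pure-Python sketch of PageRank by power iteration on an unweighted edge list. It is not the networkx implementation: the `pagerank_sketch` function, its `tol` and `max_iter` parameters, and the toy airport codes are assumptions made for illustration. With probability `alpha` the random traveller follows an outgoing route, and otherwise jumps to a uniformly chosen airport.

```python
def pagerank_sketch(edges, alpha=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on an unweighted directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: set() for n in nodes}
    for src, dst in edges:
        out_links[src].add(dst)  # repeated pairs collapse into one edge
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += alpha * rank[v] / len(out_links[v])
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank

# Toy route list with made-up airport codes.
toy_edges = [("A", "B"), ("B", "A"), ("C", "A"), ("C", "B"), ("A", "C")]
toy_rank = pagerank_sketch(toy_edges)
for airport, score in sorted(toy_rank.items(), key=lambda x: x[1], reverse=True):
    print(airport, round(score, 4))
```

In this toy graph, airport A receives links from both other airports and ends up with the highest score, and the scores sum to 1.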
6) Output the airport ranking in descending order of PR value
# pagerank is a dict that maps each airport to its PR value
sorted(pagerank.items(),key=lambda x:x[1],reverse=True)
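The full sorted list is long when the graph has thousands of airports. If only the first few entries are of interest, a small helper along the following lines could be used. The `top_airports` function and the scores in `pagerank_example` are made up for illustration; in the post, the real scores come from `nx.pagerank`.

```python
import heapq

# Made-up PR values for illustration only.
pagerank_example = {"AAA": 0.012, "BBB": 0.004, "CCC": 0.0009, "DDD": 0.007}

def top_airports(scores, n=10):
    """Return the n (airport, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])

for airport, score in top_airports(pagerank_example, n=3):
    print(f"{airport}\t{score:.4f}")
```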
7) Define a function to draw the network graph
# Draw the network graph
def show_graph(graph, layout='spring_layout'):
    # Use the spring layout (roughly radiating from the center) unless circular_layout is requested
    if layout == 'circular_layout':
        positions = nx.circular_layout(graph)
    else:
        positions = nx.spring_layout(graph)
    # Node size scales with the pagerank value; the values are very small, so multiply by 200000
    nodesize = [x['pagerank'] * 200000 for v, x in graph.nodes(data=True)]
    # Edge width is taken from each edge's weight
    edgesize = [e[2]['weight'] for e in graph.edges(data=True)]
    # Draw the nodes
    nx.draw(graph, positions, node_size=nodesize, alpha=0.4)
    # Draw the edges
    nx.draw_networkx_edges(graph, positions, width=edgesize, alpha=0.2)
    # Draw the node labels
    nx.draw_networkx_labels(graph, positions, font_size=10)
8) Plot the airport link graph for a PR threshold of 0.003
nx.set_node_attributes(G, name = 'pagerank', values=pagerank)
nx.set_edge_attributes(G, name = 'weight', values=2)
pagerank_threshold = 0.003
small_graph = G.copy()
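# Remove the nodes whose PR value is below pagerank_threshold
for n, p_rank in G.nodes(data=True):
    if p_rank['pagerank'] < pagerank_threshold:
        small_graph.remove_node(n)
# Draw the network graph, using circular_layout so that the remaining nodes form a circle
show_graph(small_graph, 'circular_layout')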
show_graph(small_graph, 'circular_layout')
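The pruning step above keeps the airports whose PR value is at least the threshold, together with the routes between them. The same selection can be sketched without networkx on plain Python data. The `filter_by_threshold` helper, the scores, and the edges below are made up for illustration, and the toy threshold of 0.1 only suits this tiny example.

```python
def filter_by_threshold(scores, edges, threshold):
    """Keep nodes with score >= threshold, plus the edges between kept nodes."""
    kept = {node for node, score in scores.items() if score >= threshold}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges

# Made-up scores and routes for illustration only.
scores = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.15, "DDD": 0.05}
toy_routes = [("AAA", "BBB"), ("BBB", "CCC"), ("CCC", "DDD"), ("DDD", "AAA")]

kept_nodes, kept_routes = filter_by_threshold(scores, toy_routes, threshold=0.1)
print(sorted(kept_nodes), kept_routes)
```

Removing a node also drops every route that starts or ends there, which is why the two routes touching DDD disappear from the result.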