Python Web Crawler (2): Crawling WeChat Friends and Some Fun Analysis
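Contents
- 1. Simulated login
- 2. Crawling friends' gender
- 3. Crawling friends' signatures
- 4. Crawling more
- (1) Crawling the full data
- (2) Creating a table for storage
1. Simulated login
(1) Preparation: pip install the following three libraries
Library | Purpose |
---|---|
itchat | Simulates WeChat web login (QR-code login) |
pymysql | Data storage |
pyecharts | Data visualization and analysis |
(2) Simulated login with itchat
A QR code pops up; scan it with your phone to log in.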
import itchat

itchat.logout()
itchat.login()
# Fetch information about your WeChat friends; the result is a list of dict-like records
friends = itchat.get_friends(update=True)[0:]
2. Crawling friends' gender
See the annotations in the figures for the analysis:
1 Crawl result:
2 Analysis result:
Source code:
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import itchat
import pymysql
from pyecharts import Pie  # import path used by pyecharts 0.x


def getsex():
    male = female = other = 0
    # friends[0] is the logged-in account itself, so skip it
    for i in friends[1:]:
        sex = i["Sex"]
        if sex == 1:
            male += 1
        elif sex == 2:
            female += 1
        else:
            other += 1
    # Total number of friends
    total = len(friends[1:])
    drawsexPie(male, female, other)
    # Print the gender ratio
    print("Male friends: %s, share: %.2f%% \n" % (int(male), (float(male) / total * 100)) +
          "Female friends: %s, share: %.2f%% \n" % (int(female), (float(female) / total * 100)) +
          "Undisclosed: %s, share: %.2f%% \n" % (int(other), (float(other) / total * 100)))


def drawsexPie(male, female, other):
    attr = ["Male friends", "Female friends", "Undisclosed"]
    v1 = [int(male), int(female), int(other)]
    pie = Pie("Gender ratio")
    pie.add("", attr, v1, is_label_show=True)
    pie.show_config()
    pie.render()


if __name__ == "__main__":
    itchat.logout()
    itchat.login()
    # Fetch information about your WeChat friends; the result is a list of dict-like records
    friends = itchat.get_friends(update=True)[0:]
    getsex()
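The counting step can also be tried without logging in. The following is a minimal sketch, assuming each friend record is a dict with a "Sex" key where 1 means male, 2 means female and anything else means undisclosed, as in the script above. The function name sex_ratio and the sample records are hypothetical and only for illustration.

```python
from collections import Counter

# Mapping taken from the article's script: 1 = male, 2 = female, other = undisclosed
SEX_LABELS = {1: "male", 2: "female"}


def sex_ratio(friends):
    """Return {label: (count, percent)} for a list of friend dicts."""
    counts = Counter(SEX_LABELS.get(f.get("Sex"), "undisclosed") for f in friends)
    total = len(friends)
    result = {}
    for label in ("male", "female", "undisclosed"):
        n = counts.get(label, 0)
        # Guard against an empty friend list to avoid dividing by zero
        pct = n / total * 100 if total else 0.0
        result[label] = (n, pct)
    return result


# Hypothetical sample data standing in for itchat.get_friends()[1:]
sample = [
    {"NickName": "a", "Sex": 1},
    {"NickName": "b", "Sex": 2},
    {"NickName": "c", "Sex": 0},
    {"NickName": "d", "Sex": 1},
]

for label, (n, pct) in sex_ratio(sample).items():
    print("%s: %d (%.2f%%)" % (label, n, pct))
```

Keeping the counting separate from the chart drawing makes it easy to check the numbers before passing them to the pie chart.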
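3. Crawling friends' signatures
1 Analysis result (emoji markup is stripped from the signatures with a regular expression, the text is segmented into words, and a word cloud is generated from the word frequencies)
Source code: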
import re
import itchat
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator
import numpy as np
import PIL.Image as Image

# Log in to your personal WeChat account by scanning the QR code
itchat.login()
# Fetch information about your friends
friends = itchat.get_friends(update=False)[0:]

siglist = []
# Matches emoji codes such as 1f60a and leftover tag characters
rep = re.compile(r"1f\d+\w*|[<>/=]")
for i in friends:
    signature = i["Signature"].strip().replace("span", "").replace("class", "").replace("emoji", "")
    signature = rep.sub("", signature)
    siglist.append(signature)
text = "".join(siglist)

# Segment the text into words with jieba
wordlist = jieba.cut(text, cut_all=True)
word_space_split = " ".join(wordlist)

# back.jpg is used both as the mask and as the color source of the word cloud
coloring = np.array(Image.open("back.jpg"))
my_wordcloud = WordCloud(background_color="white", max_words=2000,
                         mask=coloring, max_font_size=60, random_state=42, scale=2,
                         font_path="C:\\Windows\\Fonts\\simsun.ttc").generate(word_space_split)
image_colors = ImageColorGenerator(coloring)
plt.imshow(my_wordcloud.recolor(color_func=image_colors))
plt.axis("off")
plt.show()
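The cleaning step can be tested on its own before running the whole word cloud pipeline. Below is a minimal sketch of an alternative to the chained replace calls: a single regular expression that removes the whole emoji tag. It assumes that itchat renders emoji in signatures as tags of the form <span class="emoji emoji1f60a"></span>, which is what the replacements in the script above suggest; the function name clean_signature and the sample signatures are hypothetical.

```python
import re

# Assumed markup: <span class="emoji emoji1f60a"></span>
EMOJI_SPAN = re.compile(r'<span class="emoji[^"]*"></span>')


def clean_signature(signature):
    """Strip emoji span tags and surrounding whitespace from one signature."""
    return EMOJI_SPAN.sub("", signature).strip()


# Hypothetical sample signatures
samples = [
    'Stay hungry<span class="emoji emoji1f60a"></span>',
    '  <span class="emoji emoji2728"></span>hello world ',
    "no emoji here",
]
cleaned = [clean_signature(s) for s in samples]
print(cleaned)
```

Removing the whole tag in one pass also drops the quote characters that the character-level approach leaves behind, although it only helps if the markup really has the assumed shape.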
4. Crawling more
(1) Crawling the full data
- Create a txt file in the same directory to cache the friends' information
test.py source code:
import itchat

# Fetch the friend information of a personal WeChat account
if __name__ == "__main__":
    # Log in to your personal WeChat account by scanning the QR code
    itchat.login()
    # Fetch information about your friends
    friends = itchat.get_friends(update=False)[0:]
    # Fields to crawl: (itchat key, label written to the file)
    result = [('RemarkName', 'Remark name'), ('NickName', 'WeChat nickname'), ('Sex', 'Sex'),
              ('City', 'City'), ('Province', 'Province'),
              ('ContactFlag', 'Contact flag'), ('UserName', 'User name'),
              ('SnsFlag', 'Channel flag'), ('Signature', 'Signature')]
    # Open the cache file once in append mode and write one block per friend
    with open('myFriends.txt', 'a', encoding='utf8') as fh:
        for user in friends:
            fh.write("-----------------------\n")
            for r in result:
                fh.write(r[1] + ":" + str(user.get(r[0])) + "\n")
    print("Done")
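(2) Creating a table for storage
Creating the database table:
Insert one row as a test:
insert into wechatfriends(remarkname, nickname,sex,city,province,contactflag,username,snsflag,signature) values('cungu', '王yu~', '2', 'Chongqing', 'Yongchuan', 'contact flag', 'user name', 'flag', 'signature')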
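To show how the crawled records could be written to such a table from Python, here is a minimal sketch. Two things in it are assumptions rather than the article's setup: the column types (every column is stored as text, since the original schema is not shown here), and the database itself, where the standard-library sqlite3 module is swapped in for MySQL so that the sketch runs without a server. With pymysql the flow would be similar, but the placeholders are written as %s instead of ?. The function name save_friends and the sample record are hypothetical.

```python
import sqlite3

# Column names taken from the INSERT statement above
COLUMNS = ["remarkname", "nickname", "sex", "city", "province",
           "contactflag", "username", "snsflag", "signature"]
# Matching itchat keys, in the same order as COLUMNS
FIELD_KEYS = ["RemarkName", "NickName", "Sex", "City", "Province",
              "ContactFlag", "UserName", "SnsFlag", "Signature"]

# Assumed schema: every column is plain text
CREATE_SQL = "CREATE TABLE IF NOT EXISTS wechatfriends (%s)" % ", ".join(
    c + " TEXT" for c in COLUMNS)
# Parameterized insert, so quotes inside signatures cannot break the SQL
INSERT_SQL = "INSERT INTO wechatfriends (%s) VALUES (%s)" % (
    ", ".join(COLUMNS), ", ".join("?" for _ in COLUMNS))


def save_friends(conn, friends):
    """Create the table if needed and insert one row per friend dict."""
    conn.execute(CREATE_SQL)
    rows = [tuple(str(f.get(k, "")) for k in FIELD_KEYS) for f in friends]
    conn.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)


# In-memory database and a hypothetical sample record
conn = sqlite3.connect(":memory:")
sample = [{"RemarkName": "cungu", "NickName": "王yu~", "Sex": 2,
           "City": "Chongqing", "Province": "Yongchuan"}]
inserted = save_friends(conn, sample)
rows = conn.execute("SELECT remarkname, nickname, sex FROM wechatfriends").fetchall()
print(inserted, rows)
```

Passing the values as parameters instead of formatting them into the SQL string matters here because nicknames and signatures are free text and may contain quotes.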
Reference:
Alfred数据室