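Crawling Shenzhen Rental Listings from Fangtianxia (房天下) with Python, Storing Them in a Database, and Analyzing and Visualizing the Data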

Updated: 2024-10-26 02:31:26

Overview

  • HTTP requests: requests
  • HTML parsing: BeautifulSoup
  • Word cloud: wordcloud
  • Data visualization: pyecharts
  • Database: MongoDB
  • Database driver: pymongo

Crawler approach and page parsing

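First crawl the rental listings for each district of Shenzhen on Fangtianxia, then store them in a MongoDB database, and finally analyze the data.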
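
As a rough illustration of the final analysis step, the sketch below averages rent per district over records shaped like the documents the crawler stores (the `region` and `price` fields). The function name `average_price_by_region` and the sample records are hypothetical, and the sketch assumes prices arrive as strings that may occasionally be malformed.

```python
from collections import defaultdict


def average_price_by_region(records):
    """Return {region: mean price}, skipping records without a usable price."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum of prices, count]
    for record in records:
        try:
            price = float(record["price"])
        except (KeyError, TypeError, ValueError):
            continue  # ignore records with a missing or non-numeric price
        entry = totals[record.get("region", "unknown")]
        entry[0] += price
        entry[1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


# Hypothetical sample data, not real listings.
sample = [
    {"region": "Nanshan", "price": "6000"},
    {"region": "Nanshan", "price": "8000"},
    {"region": "Bao'an", "price": "4500"},
    {"region": "Bao'an", "price": "n/a"},
]
averages = average_price_by_region(sample)
```

In practice the records would come from a MongoDB query rather than an in-memory list, and the resulting dictionary could feed a pyecharts chart.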

Right-click the page, view its source, and locate the part we want to scrape.
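
Once the relevant block has been located in the page source, pulling the fields out with BeautifulSoup might look like the sketch below. The HTML snippet, its tag and class names, and the `parse_listings` helper are all assumptions made for illustration; the real selectors have to be read from the actual page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one page of listings.
SAMPLE_HTML = """
<div class="houseList">
  <dl class="list">
    <dd>
      <p class="title"><a href="/chuzu/1.htm">Two-bedroom flat near the metro</a></p>
      <p class="info">2 rooms | 75 sqm | south-facing</p>
      <span class="price">5200</span>
    </dd>
  </dl>
  <dl class="list">
    <dd>
      <p class="title"><a href="/chuzu/2.htm">Studio with balcony</a></p>
      <p class="info">1 room | 30 sqm | north-facing</p>
      <span class="price">2800</span>
    </dd>
  </dl>
</div>
"""


def parse_listings(html):
    """Extract one dict per listing from the (assumed) markup above."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for item in soup.select("dl.list"):
        title_tag = item.select_one("p.title a")
        info_tag = item.select_one("p.info")
        price_tag = item.select_one("span.price")
        if title_tag is None or info_tag is None or price_tag is None:
            continue  # skip entries that lack a required element
        parts = [part.strip() for part in info_tag.get_text().split("|")]
        if len(parts) != 3:
            continue  # skip entries whose info line has an unexpected shape
        rooms, area, direction = parts
        listings.append({
            "title": title_tag.get_text(strip=True),
            "rooms": rooms,
            "area": area,
            "direction": direction,
            "price": price_tag.get_text(strip=True),
        })
    return listings


listings = parse_listings(SAMPLE_HTML)
```

Skipping incomplete entries instead of raising keeps one malformed listing from aborting a whole page.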

Crawler source code

    import requests
    from bs4 import BeautifulSoup
    import time
    from pymongo import MongoClient


    class HouseSpider:
        def __init__(self):
            self.client = MongoClient('mongodb://localhost:27017/')
            self.zfdb = self.client.zfdb

        session = requests.Session()
        baseUrl = ""
        # URL path for each district
        urlDir = {
            "不限": "/house/",  # no restriction (all districts)
            "宝安": "/house-a089/",  # Bao'an
            "龙岗": "/house-a090/",  # Longgang
            "南山": "/house-a087/",  # Nanshan
            "福田": "/house-a085/",  # Futian
            "罗湖": "/house-a086/",  # Luohu
            "盐田": "/house-a088/",  # Yantian
            "龙华区": "/house-a013080/",  # Longhua District
            "坪山区": "/house-a013081/",  # Pingshan District
            "光明新区": "/house-a013079/",  # Guangming New District
            "大鹏新区": "/house-a013082/",  # Dapeng New District
            "惠州": "/house-a013058/",  # Huizhou
            "东莞": "/house-a013057/",  # Dongguan
            "深圳周边": "/house-a016375/",  # around Shenzhen
        }
        region = "不限"
        page = 100

        # Build the list of page URLs for a district, looked up by name
        def getRegionUrl(self, name="宝安", page=10):
            urlList = []
            for index in range(page):
                if index == 0:
                    urlList.append(self.baseUrl + self.urlDir[name])
                else:
                    urlList.append(self.baseUrl + self.urlDir[name] + "i3" + str(index + 1) + "/")
            return urlList

        # Structure of the document stored in MongoDB
        def getRentMsg(self, title, rooms, area, price, address, traffic, region, direction):
            return {
                "title": title,  # listing title
                "rooms": rooms,  # number of rooms
                "area": area,  # area in square metres
                "price": price,  # price
                "address": address,  # address
                "traffic": traffic,  # transport description
                "region": region,  # district (e.g. Futian, Nanshan)
                "direction": direction,  # orientation (e.g. south-facing, north-south)
            }

        # Get the database collection for a district
        def getCollection(self, name):
            zfdb = self.zfdb
            if name == "不限":
                return zfdb.rent
            if name == "宝安":
                return zfdb.baoan
            if name == "龙岗":
                return zfdb.longgang
            if name == "南山":
                return zfdb.nanshan
            if name == "福田":
                return zfdb.futian
            if name == "罗湖":
                return zfdb.luohu
            if name == "盐田":
                return zfdb.yantian
            if name == "龙华区":
                return zfdb.longhuaqu
            if name == "坪山区":
                return zfdb.pingshanqu
            if name == "光明新区":
                return zfdb.guangmingxinqu
            if name == "大鹏新区":
                return zfdb.dapengxinqu

        #
        def getAreaList(self):
            return [
                "不限",
                "宝安",
                "龙岗",
                "南山",
                "福田",
                "罗湖",
                "盐田",
                "龙华区",
                "坪山区",
                "光明新区",
                "大鹏新区",
            ]

        def getOnePageData(self, pageUrl, reginon="不限"):
            rent = self.getCollection(self.region)
            self.session.headers.update({'
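
The pagination rule used by `getRegionUrl` (the first page is the bare district path, and page n appends `i3` followed by n) can be tried in isolation. The following is a minimal standalone sketch; the function name `build_page_urls` and the base URL `https://example.com` are placeholders, since the source leaves `baseUrl` empty.

```python
def build_page_urls(base_url, region_path, pages):
    """Return the listing-page URLs for one district, page 1 first."""
    urls = []
    for page_number in range(1, pages + 1):
        if page_number == 1:
            # The first page has no page suffix.
            urls.append(base_url + region_path)
        else:
            # Later pages append "i3" plus the 1-based page number.
            urls.append(f"{base_url}{region_path}i3{page_number}/")
    return urls


# "/house-a089/" is the Bao'an path from the crawler's urlDir;
# the base URL is a placeholder.
urls = build_page_urls("https://example.com", "/house-a089/", 3)
```

Generating the URLs up front keeps the fetch loop simple: it only has to iterate over a list and request each entry.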

Published: 2024-03-13 03:41:13
Article link: https://www.elefans.com/category/jswz/34/1733110.html