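I've been playing online murder-mystery games (剧本杀) lately, and wanted to try writing a scraper that dumps the game's playbook (script) data into an Excel file, so I can see everything on offer and pick what to play. The process and code are described below.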

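The game has a share feature that sends a playbook's link to WeChat. Copying the link in WeChat gives something like https://m.mszmapp/dm/playbook_detail?id=397 (a playbook in DM-hosted mode) or https://m.mszmapp/store/bookdetail/1882 (a normal-mode playbook).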
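Both link forms carry the playbook id: the DM-mode link in its id query parameter, the normal-mode link as the last path segment. A minimal sketch of extracting it with the standard library, assuming only these two URL shapes occur; extract_playbook_id is a hypothetical helper, and whether DM-mode ids and store ids share one numbering is not established here:

```python
from urllib.parse import urlparse, parse_qs


def extract_playbook_id(link):
    """Return the playbook id from a shared link as an int, or None.

    Assumes the two shapes seen in shared links:
      .../dm/playbook_detail?id=397   (DM-hosted mode, id in the query)
      .../store/bookdetail/1882       (normal mode, id as last path segment)
    """
    parts = urlparse(link)
    query_ids = parse_qs(parts.query).get("id")
    if query_ids:
        candidate = query_ids[0]
    else:
        # Fall back to the last non-empty path segment
        segments = [s for s in parts.path.split("/") if s]
        candidate = segments[-1] if segments else ""
    return int(candidate) if candidate.isdigit() else None


print(extract_playbook_id("https://m.mszmapp/dm/playbook_detail?id=397"))  # 397
print(extract_playbook_id("https://m.mszmapp/store/bookdetail/1882"))     # 1882
```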

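Take https://m.mszmapp/store/bookdetail/1882 as an example. Right-click the page in the browser and choose Inspect, open the Network tab and reload the page: a long list of requests appears. Filter by Fetch/XHR, click the detail request (judging from the code below, its URL has the form https://m.mszmapp/api/playbook/1882/detail) and look at its Response. It is the JSON shown below, as displayed by the browser's JSON viewer, and it contains most of what the page shows, including the playbook id ("id"), name ("name"), price ("cost"), number of players ("num_players"), poster image ("image") and more.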


id	"1882"
name	"盖茨比庄园迷案"
series	"无"
series_id	0
image	"https://static.mszmapp/images/Fi9Q3psYwlDQnbMQKLDpbhJGs3v1.jpg"
club_id	1880
estimated_time	"4.0"
story_text_length	"13000"
background_html	"<div>这是一个关于爵士乐时代的故事,就像菲茨杰拉德小说里描绘的一样。香烟、酒精、音乐、狐步舞、飞女郎的细高跟和羽毛披肩、别墅里一场接着一场上演的流动的盛宴、上流社会千金一掷的纸醉金迷。</div><div>而这一切都不过是爵士时代最为浅显的外表,真正能够定义爵士时代的,是在爵士乐停止之后那无尽的寂静和空旷。</div><div>此刻盖茨比庄园正在办一场不眠的宴会,在这里我们诚挚地邀请您一起入局。</div><div><br></div><div>开局必读:</div><div>剧本文本量较大,一共五幕请留足时间。</div><div>游戏中有两对CP(菲茨和泽尔达、威廉和阿丽塔),并设有亲密的互动环节,建议可熟人组队。</div><div>警告阿婆老粉,阿加莎味浓厚!</div>"
background	"这是一个关于爵士乐时代的故事,就像菲茨杰拉德小说里描绘的一样。香烟、酒精、音乐、狐步舞、飞女郎的细高跟和羽毛披肩、别墅里一场接着一场上演的流动的盛宴、上流社会千金一掷的纸醉金迷。而这一切都不过是爵士时代最为浅显的外表,真正能够定义爵士时代的,是在爵士乐停止之后那无尽的寂静和空旷。此刻盖茨比庄园正在办一场不眠的宴会,在这里我们诚挚地邀请您一起入局。\n开局必读:剧本文本量较大,一共五幕请留足时间。游戏中有两对CP(菲茨和泽尔达、威廉和阿丽塔),并设有亲密的互动环节,建议可熟人组队。警告阿婆老粉,阿加莎味浓厚!"
num_players	5
max_player	5
min_player	5
editor_rec	""
author_rec	""
updated_time	"2023-08-11 23:06:52"
time	"西方"
style	"现实"
level	"困难"
price	"999999.00"
ori_price	"999999.00"
cost	29
ori_cost	39
share_cost	139
ori_share_cost	199
onsale	5
share_price	"999999.00"
effect_at	null
chatroom_id	"5188372305"
single_mode	0
user_level	0
mark	"6.9"
mark_cnt	134
publish_date	"2023-08-11"
age_level	0
has_truth	true
parent_playbook_id	0
chapter_id	1
chapter_name	""
chapter_image	""
has_previou_story	0
isbn	""
price_info	Object { cost: 29, ori_cost: 39, share_cost: 139, … }
pay_type	1
discount	"7.4"
vip_free	0
presell	0
series_name	"无"
series_uri	""
authors	[ {…} ]
author_id	310
author	"ZNJ"
signed	1
characters	[ {…}, {…}, {…}, {…}, {…} ]
custom_tag	""
adult_only	false
gift	0
unlock_free_enable	1
unlock_free_cost	0
read_progress	0
share	5
own	0
played	0
share_total_cost	139
purchase	0
playbook_id	"1882"
comment	null
error_description	null
room_count	"2"
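Note that some numeric fields arrive as strings ("mark": "6.9", "story_text_length": "13000", "estimated_time": "4.0") while others are integers ("cost": 29, "num_players": 5). Written to the sheet unchanged, the strings become text cells, so Excel sorts them as text rather than as numbers. A minimal sketch of converting them first, assuming the field names in the response above; to_number, normalize and NUMERIC_FIELDS are hypothetical names:

```python
# Hypothetical list of fields to convert; names taken from the response above
NUMERIC_FIELDS = ("cost", "mark", "num_players", "story_text_length", "estimated_time")


def to_number(value):
    """Convert API values such as "6.9", "13000" or 29 to int/float.

    Returns None for missing, empty or non-numeric values, so the caller can
    leave the cell blank instead of storing text in a numeric column.
    """
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return value
    text = str(value).strip()
    try:
        # "6.9" / "4.0" become floats, "13000" / "-1" become ints
        return float(text) if "." in text else int(text)
    except ValueError:
        return None


def normalize(record):
    """Return a copy of record with the numeric fields converted."""
    out = dict(record)
    for key in NUMERIC_FIELDS:
        if key in out:
            out[key] = to_number(out[key])
    return out


sample = {"name": "盖茨比庄园迷案", "cost": 29, "mark": "6.9", "num_players": 5,
          "story_text_length": "13000", "estimated_time": "4.0"}
print(normalize(sample))
```

In the scrapers below, applying normalize(d) before writing the cells would make the rating, length and time columns sortable as numbers.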

So all that's needed is to loop over the playbook ids, fetch this data for each one, and store it in an Excel sheet.
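A minimal sketch of that loop with the HTTP call factored out, so the skip logic can be tried offline. The endpoint pattern is taken from the scraper code below, with the host kept exactly as written there; detail_url, decode_detail, fetch_detail and collect_rows are hypothetical helpers, and fetch_detail assumes a urllib3.PoolManager (or any object with the same request method):

```python
import json

# Endpoint pattern from the scraper code in this article (host kept as-is)
DETAIL_API = "https://m.mszmapp/api/playbook/{}/detail"


def detail_url(book_id):
    return DETAIL_API.format(book_id)


def decode_detail(body):
    """Decode a response body (bytes) into a dict, or None if it isn't a JSON object."""
    try:
        data = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both JSONDecodeError and UnicodeDecodeError
        return None
    return data if isinstance(data, dict) else None


def fetch_detail(http, book_id):
    """GET one playbook via a urllib3.PoolManager-like object.

    Returns None on a non-200 status or a non-JSON body; connection errors
    are not caught here and still raise.
    """
    r = http.request("GET", detail_url(book_id), timeout=10.0)
    if r.status != 200:
        return None
    return decode_detail(r.data)


def collect_rows(book_ids, fetch, fields):
    """Call fetch(book_id) for each id and keep records containing every field.

    Returns (book_id, [values in `fields` order]) tuples, in id order.
    """
    rows = []
    for book_id in book_ids:
        record = fetch(book_id)
        if not record or not all(key in record for key in fields):
            continue  # unknown id, error page, or incomplete record
        rows.append((book_id, [record[key] for key in fields]))
    return rows


# Offline demo: a dict stands in for the API (made-up records)
fake_api = {1: {"name": "book A", "num_players": 7}, 2: {"name": "book B"}}
print(collect_rows(range(1, 4), fake_api.get, ["name", "num_players"]))
# [(1, ['book A', 7])]
```

Against the live site this would be used as http = urllib3.PoolManager() followed by collect_rows(range(1, 3000), lambda i: fetch_detail(http, i), fields), where fields lists the JSON keys to keep.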

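Environment: Python 3 with the HTTP client urllib3 and the Excel writer xlwt (which writes the legacy .xls format); both install with pip install urllib3 xlwt.

Code: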

# -*- coding: utf-8 -*-
import json
import random
import time

import urllib3
import xlwt

# Column headers written to the sheet: title, price, difficulty, rating, style,
# era, players, script length (characters), estimated time (h), link
HEADERS = ["剧本名", "价格", "难度等级", "评分", "风格", "发生时代",
           "人数", "剧本字数", "预计时间(h)", "链接"]
# JSON keys from the detail API, in the same order as the first nine headers
FIELDS = ["name", "cost", "level", "mark", "style", "time",
          "num_players", "story_text_length", "estimated_time"]


def main():
    http = urllib3.PoolManager()
    # Create a new workbook (i.e. a new Excel file); the encoding only affects
    # byte strings, and utf-8 is a safer choice than ascii
    workbook = xlwt.Workbook(encoding="utf-8")
    # Add a new sheet and write the header row
    worksheet = workbook.add_sheet("百变大侦探全剧本数据")
    for col, title in enumerate(HEADERS):
        worksheet.write(0, col, title)

    row = 1
    for book_id in range(1, 3000):
        url = "https://m.mszmapp/api/playbook/" + str(book_id) + "/detail"
        r = http.request("GET", url, timeout=10.0)
        print(book_id)
        try:
            d = json.loads(r.data.decode("utf-8"))
        except ValueError:
            continue  # not JSON (e.g. an error page): skip this id
        # Check every field before writing anything: writing a few cells and
        # then skipping would leave a partial row, and xlwt raises an error
        # when the next book tries to overwrite those cells.
        if not isinstance(d, dict) or not all(key in d for key in FIELDS):
            continue
        for col, key in enumerate(FIELDS):
            worksheet.write(row, col, d[key])
        detail_url = "https://m.mszmapp/store/bookdetail/" + str(book_id)
        worksheet.write(row, len(FIELDS), detail_url)
        row += 1

        # time.sleep(random.randint(10, 30))  # optional pause between requests
    workbook.save("百变大侦探全剧本数据.xls")


if __name__ == '__main__':
    main()
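xlwt only writes the legacy .xls format. If you would rather not depend on it, the same rows can go to a CSV file that Excel opens directly; this swaps in the standard-library csv module and is not what the script above does. Writing with the utf-8-sig encoding adds a byte-order mark, which helps Excel recognise the file as UTF-8 so the Chinese titles are not garbled. save_csv is a hypothetical helper:

```python
import csv

HEADERS = ["剧本名", "价格", "难度等级", "评分", "风格", "发生时代",
           "人数", "剧本字数", "预计时间(h)", "链接"]


def save_csv(path, rows):
    """Write the header plus data rows to `path` as UTF-8 CSV with a BOM."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(HEADERS)
        writer.writerows(rows)
```

Usage: save_csv("百变大侦探全剧本数据.csv", rows), where each row is a list of the ten values in header order.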

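After the run finishes, open 百变大侦探全剧本数据.xls. Part of the result is shown below; the columns are title, price, difficulty, rating, style, era, number of players, script length (characters), estimated time (hours) and link: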

剧本名价格难度等级评分风格发生时代人数剧本字数预计时间(h)链接
福威镖局59困难6.2现实古代750003.0https://m.mszmapp/store/bookdetail/1
江湖29困难7.3现实古代850002.5https://m.mszmapp/store/bookdetail/2
救赎之城99烧脑6.6魔幻架空730003.0https://m.mszmapp/store/bookdetail/3
狼人之血59困难7.6魔幻古代750002.5https://m.mszmapp/store/bookdetail/4
太空谋杀案59烧脑6.5科幻架空720003.0https://m.mszmapp/store/bookdetail/5
庙堂(江湖续集)59困难6.2奇幻古代840003.0https://m.mszmapp/store/bookdetail/6
血色南宫59烧脑6.0武侠古代830002.5https://m.mszmapp/store/bookdetail/7
三国·率土之滨69烧脑6.7现实古代730003.5https://m.mszmapp/store/bookdetail/8
腥火燎园59烧脑6.3现实古代830002.5https://m.mszmapp/store/bookdetail/9
弈剑诀39困难6.2武侠古代715003.0https://m.mszmapp/store/bookdetail/10
七宗罪99困难4.6现实现代815002.0https://m.mszmapp/store/bookdetail/11
幽凝(血目续集)59困难5.9奇幻古代620002.5https://m.mszmapp/store/bookdetail/12
血目59困难6.6奇幻古代620002.5https://m.mszmapp/store/bookdetail/13
剧本1019999入门6.0现实古代600.1https://m.mszmapp/store/bookdetail/14
待定入门10.0现实古代000.0https://m.mszmapp/store/bookdetail/15
暗影计划39困难6.1武侠古代602.0https://m.mszmapp/store/bookdetail/16
凤求凰59困难6.3奇幻古代730003.0https://m.mszmapp/store/bookdetail/17
消失的制作人0简单7.8现实现代610001.2https://m.mszmapp/store/bookdetail/18
猎狼入门10.0现实古代0https://m.mszmapp/store/bookdetail/19
船长号的裁决(内测)99困难5.7现实现代730003.0https://m.mszmapp/store/bookdetail/20
测试脚本3入门10.0现实古代-12001.0https://m.mszmapp/store/bookdetail/22
测试脚本4入门10.0现实古代-1110.0https://m.mszmapp/store/bookdetail/23
酒吧杀人计划0入门7.2现实现代510001.0https://m.mszmapp/store/bookdetail/24
四大名捕之铁公鸡(迷你)0入门8.2武侠古代42000.5https://m.mszmapp/store/bookdetail/26
孤儿房入门10.0现实古代0https://m.mszmapp/store/bookdetail/27

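This exposes a problem: some entries are not real playbooks but official test data (for example 测试脚本3 and 测试脚本4, "test script 3/4"). So a second filter is added that drops every playbook with fewer than 1 player or no script text: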

# -*- coding: utf-8 -*-
import json
import random
import time

import urllib3
import xlwt

# Column headers written to the sheet: title, price, difficulty, rating, style,
# era, players, script length (characters), estimated time (h), link
HEADERS = ["剧本名", "价格", "难度等级", "评分", "风格", "发生时代",
           "人数", "剧本字数", "预计时间(h)", "链接"]
# JSON keys from the detail API, in the same order as the first nine headers
FIELDS = ["name", "cost", "level", "mark", "style", "time",
          "num_players", "story_text_length", "estimated_time"]


def check(d):
    """Return True if d looks like a real playbook rather than test data."""
    if not isinstance(d, dict) or 'name' not in d:
        return False
    try:
        # Drop entries with no script text or fewer than 1 player
        if int(d['story_text_length']) < 1:
            return False
        if int(d['num_players']) < 1:
            return False
    except (KeyError, TypeError, ValueError):
        # Missing, null or non-numeric values also mark the entry as invalid
        return False
    return True


def main():
    http = urllib3.PoolManager()
    # Create a new workbook (i.e. a new Excel file) and add a sheet
    workbook = xlwt.Workbook(encoding="utf-8")
    worksheet = workbook.add_sheet("百变大侦探全剧本数据")
    for col, title in enumerate(HEADERS):
        worksheet.write(0, col, title)

    row = 1
    for book_id in range(1, 3000):
        url = "https://m.mszmapp/api/playbook/" + str(book_id) + "/detail"
        r = http.request("GET", url, timeout=10.0)
        print(book_id)
        try:
            d = json.loads(r.data.decode("utf-8"))
        except ValueError:
            continue  # not JSON (e.g. an error page): skip this id
        if not check(d):
            continue
        # Write only when every column is present, so no partial rows are left
        if not all(key in d for key in FIELDS):
            continue
        for col, key in enumerate(FIELDS):
            worksheet.write(row, col, d[key])
        detail_url = "https://m.mszmapp/store/bookdetail/" + str(book_id)
        worksheet.write(row, len(FIELDS), detail_url)
        row += 1

        # time.sleep(random.randint(10, 30))  # optional pause between requests
    workbook.save("百变大侦探全剧本数据.xls")


if __name__ == '__main__':
    main()

Now the results look right:

Title | Price | Difficulty | Rating | Style | Era | Players | Script length (chars) | Est. time (h) | Link
福威镖局 | 59 | 困难 | 6.2 | 现实 | 古代 | 7 | 5000 | 3.0 | https://m.mszmapp/store/bookdetail/1
江湖 | 29 | 困难 | 7.3 | 现实 | 古代 | 8 | 5000 | 2.5 | https://m.mszmapp/store/bookdetail/2
救赎之城 | 99 | 烧脑 | 6.6 | 魔幻 | 架空 | 7 | 3000 | 3.0 | https://m.mszmapp/store/bookdetail/3
狼人之血 | 59 | 困难 | 7.6 | 魔幻 | 古代 | 7 | 5000 | 2.5 | https://m.mszmapp/store/bookdetail/4
太空谋杀案 | 59 | 烧脑 | 6.5 | 科幻 | 架空 | 7 | 2000 | 3.0 | https://m.mszmapp/store/bookdetail/5
庙堂(江湖续集) | 59 | 困难 | 6.2 | 奇幻 | 古代 | 8 | 4000 | 3.0 | https://m.mszmapp/store/bookdetail/6
血色南宫 | 59 | 烧脑 | 6.0 | 武侠 | 古代 | 8 | 3000 | 2.5 | https://m.mszmapp/store/bookdetail/7
三国·率土之滨 | 69 | 烧脑 | 6.7 | 现实 | 古代 | 7 | 3000 | 3.5 | https://m.mszmapp/store/bookdetail/8
腥火燎园 | 59 | 烧脑 | 6.3 | 现实 | 古代 | 8 | 3000 | 2.5 | https://m.mszmapp/store/bookdetail/9
弈剑诀 | 39 | 困难 | 6.2 | 武侠 | 古代 | 7 | 1500 | 3.0 | https://m.mszmapp/store/bookdetail/10
七宗罪 | 99 | 困难 | 4.6 | 现实 | 现代 | 8 | 1500 | 2.0 | https://m.mszmapp/store/bookdetail/11
幽凝(血目续集) | 59 | 困难 | 5.9 | 奇幻 | 古代 | 6 | 2000 | 2.5 | https://m.mszmapp/store/bookdetail/12
血目 | 59 | 困难 | 6.6 | 奇幻 | 古代 | 6 | 2000 | 2.5 | https://m.mszmapp/store/bookdetail/13
凤求凰 | 59 | 困难 | 6.3 | 奇幻 | 古代 | 7 | 3000 | 3.0 | https://m.mszmapp/store/bookdetail/17
消失的制作人 | 0 | 简单 | 7.8 | 现实 | 现代 | 6 | 1000 | 1.2 | https://m.mszmapp/store/bookdetail/18
船长号的裁决(内测) | 99 | 困难 | 5.7 | 现实 | 现代 | 7 | 3000 | 3.0 | https://m.mszmapp/store/bookdetail/20
酒吧杀人计划 | 0 | 入门 | 7.2 | 现实 | 现代 | 5 | 1000 | 1.0 | https://m.mszmapp/store/bookdetail/24
四大名捕之铁公鸡(迷你) | 0 | 入门 | 8.2 | 武侠 | 古代 | 4 | 200 | 0.5 | https://m.mszmapp/store/bookdetail/26
张府悬案 |  | 简单 | 7.9 | 现实 | 古代 | 5 | 800 | 1.0 | https://m.mszmapp/store/bookdetail/28
南越王陵 | 69 | 烧脑 | 6.7 | 现实 | 现代 | 7 | 3000 | 3.0 | https://m.mszmapp/store/bookdetail/29
消失的爱人 | 0 | 简单 | 7.9 | 现实 | 现代 | 7 | 1500 | 2.0 | https://m.mszmapp/store/bookdetail/31
流水线惨案(迷你) |  | 入门 | 8.1 | 现实 | 现代 | 4 | 300 | 0.5 | https://m.mszmapp/store/bookdetail/35
罪恶班级 | 0 | 简单 | 8.0 | 现实 | 现代 | 5 | 500 | 1.0 | https://m.mszmapp/store/bookdetail/37
死亡数字 | 0 | 简单 | 7.5 | 现实 | 现代 | 6 | 500 | 1.5 | https://m.mszmapp/store/bookdetail/38
消失的客人 | 0 | 简单 | 7.4 | 现实 | 现代 | 5 | 500 | 1.0 | https://m.mszmapp/store/bookdetail/39
环彩·命 | 0 | 简单 | 7.4 | 现实 | 古代 | 4 | 1000 | 1.5 | https://m.mszmapp/store/bookdetail/42
米夏尔庄园 | 0 | 简单 | 7.1 | 现实 | 现代 | 5 | 800 | 1.0 | https://m.mszmapp/store/bookdetail/43
疯狂博士 | 0 | 困难 | 7.7 | 科幻 | 架空 | 4 | 500 | 1.0 | https://m.mszmapp/store/bookdetail/45
平凡客栈 | 0 | 简单 | 7.4 | 武侠 | 古代 | 6 | 1800 | 1.2 | https://m.mszmapp/store/bookdetail/46
商海忍法帖 | 0 | 简单 | 7.4 | 现实 | 现代 | 5 | 1000 | 1.5 | https://m.mszmapp/store/bookdetail/47
恩不爱 | 0 | 简单 | 7.6 | 现实 | 现代 | 5 | 2000 | 1.0 | https://m.mszmapp/store/bookdetail/48
宫心计 | 59 | 烧脑 | 6.8 | 现实 | 古代 | 9 | 5000 | 3.5 | https://m.mszmapp/store/bookdetail/49
百变山庄(新手教学) | 0 | 入门 | 8.2 | 现实 | 现代 | 1 | 1000 | 0.2 | https://m.mszmapp/store/bookdetail/50
酒店谋杀案 |  | 简单 | 8.0 | 现实 | 现代 | 5 | 800 | 0.5 | https://m.mszmapp/store/bookdetail/51
谍影重重 |  | 简单 | 7.2 | 现实 | 近代 | 7 | 2000 | 1.0 | https://m.mszmapp/store/bookdetail/52
我不是医神 | 0 | 简单 | 8.1 | 现实 | 古代 | 5 | 1000 | 1.5 | https://m.mszmapp/store/bookdetail/53
血溅天香楼 | 0 | 简单 | 7.2 | 现实 | 古代 | 6 | 4000 | 1.5 | https://m.mszmapp/store/bookdetail/54
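Since the point of the exercise was to pick something to play, the collected rows can also be filtered and ranked directly in Python. A minimal sketch, assuming each row has been reduced to (title, rating, players) with numeric values (the API returns the rating as a string such as "6.9", so it needs converting first); pick is a hypothetical helper and the rows are copied from the result above:

```python
def pick(rows, players, min_rating=0.0):
    """Return titles for exactly `players` players, best-rated first.

    Each row is (title, rating, players) with rating and players as numbers.
    """
    matches = [r for r in rows if r[2] == players and r[1] >= min_rating]
    matches.sort(key=lambda r: r[1], reverse=True)
    return [title for title, _, _ in matches]


# A few rows from the result above: (title, rating, players)
rows = [
    ("酒吧杀人计划", 7.2, 5),
    ("罪恶班级", 8.0, 5),
    ("消失的客人", 7.4, 5),
    ("死亡数字", 7.5, 6),
]
print(pick(rows, players=5, min_rating=7.3))  # ['罪恶班级', '消失的客人']
```

The same filter is of course a couple of clicks in Excel; doing it in code is mainly useful if the scrape is rerun regularly.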
