Sentiment Evolution Analysis Based on Public Opinion Life-Cycle Theory (Word2Vec)

Updated: 2024-10-06 14:25:46


I. Life-Cycle Segmentation

(1) Result

(2) Implementation

1. Import the source data

import pandas as pd
data = pd.read_excel("源数据.xlsx", index_col=0)

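2. Add a column of ones to use for counting

data["number"] = 1

3. Count posts per day to delimit the time periods

# Sum the counter column per calendar day; the 'date' column must be datetime-typed
df_new = data.groupby(pd.Grouper(key='date', freq='D'))[['number']].sum().reset_index()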
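
To make the daily count concrete, here is a minimal sketch on a made-up table. The `posts` frame and its four timestamps are assumptions for illustration, not the author's data. It shows that `pd.Grouper` with a daily frequency also emits the days that have no posts, with a count of 0, which keeps the time axis continuous when plotting.

```python
import pandas as pd

# Hypothetical sample: four posts spread over three calendar days.
posts = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-11-16 08:00", "2022-11-16 21:30",
        "2022-11-18 10:15", "2022-11-18 23:59",
    ]),
    "comment": ["a", "b", "c", "d"],
})
posts["number"] = 1

# One row per calendar day, summing the counter column.
daily = (
    posts.groupby(pd.Grouper(key="date", freq="D"))[["number"]]
    .sum()
    .reset_index()
)
print(daily)  # 2022-11-17 appears with number == 0
```

If empty days should not appear, they can be filtered out afterwards with `daily[daily["number"] > 0]`.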

4. Plot

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots()
# Plot the daily post count against the date
ax.plot(df_new["date"], df_new["number"])
# Major ticks: one every 10 days
ax.xaxis.set_major_locator(mdates.DayLocator(interval=10))
# Minor ticks: one every 2 days
ax.xaxis.set_minor_locator(mdates.DayLocator(interval=2))
# Axis labels
ax.set_xlabel('date')
ax.set_ylabel('number')
# Rotate and align the date tick labels so they do not overlap
fig.autofmt_xdate()
# Show the figure
plt.show()


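II. Sentiment Evolution Analysis (Word2Vec)

(1) Prerequisite: the data has already been divided, following life-cycle theory, into the latent, development, climax, and decline periods

(2) Result

Latent period

(3) Implementation

1. Extract the latent-period data

data_1 = data[(data['date'] >= pd.to_datetime('20221116')) & (data['date'] <= pd.to_datetime('20230102'))]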
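
One detail of the date filter is worth checking: `pd.to_datetime('20230102')` is midnight at the start of 2 January 2023, so if the `date` column carries a time of day, posts later on that day fall outside the `<=` comparison. The sketch below uses four made-up timestamps (an assumption, not the author's data) to compare the inclusive filter with a half-open filter that keeps the whole end day.

```python
import pandas as pd

# Hypothetical timestamps around the two boundaries of the latent period.
posts = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-11-15 23:00",  # before the period
        "2022-11-16 00:00",  # exactly at the start
        "2023-01-02 00:00",  # exactly at the end boundary
        "2023-01-02 09:30",  # later on the end day
    ]),
    "comment": ["a", "b", "c", "d"],
})

start, end = pd.Timestamp("2022-11-16"), pd.Timestamp("2023-01-02")

# Same result as the >= / <= filter: both boundaries are midnight timestamps.
inclusive = posts[posts["date"].between(start, end)]

# Half-open interval that keeps every post made on the end day.
whole_days = posts[(posts["date"] >= start) & (posts["date"] < end + pd.Timedelta(days=1))]

print(len(inclusive), len(whole_days))
```

If the source column holds dates without a time component, both filters select the same rows.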

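2. Convert float values to strings

comment_1 = [str(a) for a in data_1["comment"].to_list()]

Reason: empty cells are read in as NaN, which is a float, and jieba raises an AttributeError when it is given a float instead of a string. See:

AttributeError: 'float' object has no attribute 'decode' (dingshaoshuai's blog, CSDN)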
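
A side effect of the `str()` conversion is that a missing comment becomes the literal text `nan`, which then enters the corpus as a word. The following sketch, on a made-up three-row column, contrasts that with dropping missing values first. Dropping them is an alternative suggested here, not something the original post does.

```python
import numpy as np
import pandas as pd

# Hypothetical comment column with one empty cell.
sample = pd.DataFrame({"comment": ["很失望", np.nan, "希望尽快找到"]})

# Approach from the post: every value is forced to a string, NaN included.
as_strings = [str(c) for c in sample["comment"].to_list()]

# Alternative: drop missing comments before converting.
cleaned = sample["comment"].dropna().astype(str).to_list()

print(as_strings)
print(cleaned)
```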

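3. Tokenize and remove stop words

import jieba

# stopwords_list must already hold the stop words (the post does not show how it is loaded)
words_list_1 = []
for comment in comment_1:
    # Tokenize with jieba in search-engine mode
    seg_list = jieba.cut_for_search(comment)
    # Drop empty or whitespace-only tokens and stop words
    for seg in seg_list:
        seg = seg.strip()
        if seg and seg not in stopwords_list:
            words_list_1.append(seg)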
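
The post never shows where `stopwords_list` comes from. A minimal sketch, assuming the stop words sit in a UTF-8 text file with one word per line: the helper names `load_stopwords` and `remove_stopwords`, the file name, and the sample tokens are all hypothetical. The tokens are written by hand to stand in for jieba output, so the example runs without jieba.

```python
import tempfile
from pathlib import Path


def load_stopwords(path):
    """Read one stop word per line (UTF-8), ignoring blank lines."""
    with open(path, encoding="utf8") as f:
        return {line.strip() for line in f if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep non-empty tokens that are not stop words."""
    stripped = (t.strip() for t in tokens)
    return [t for t in stripped if t and t not in stopwords]


# Demo with a throwaway stop-word file and hand-written tokens.
with tempfile.TemporaryDirectory() as tmp:
    stop_path = Path(tmp) / "stopwords.txt"
    stop_path.write_text("的\n了\n\n", encoding="utf8")
    stopwords = load_stopwords(stop_path)

tokens = ["大家", "的", "心情", " ", "了", "失望"]
filtered = remove_stopwords(tokens, stopwords)
print(filtered)
```

Storing the stop words in a set rather than a list keeps each membership test constant-time, which matters once the corpus holds many tokens.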

4. Put words_list_1 into a DataFrame

df_data_1 = pd.DataFrame(words_list_1)

5. Join the words with spaces

words_list_1 = " ".join(words_list_1)
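
Step 6 reads a file named `data_1.txt`, but the post does not show how that file is produced. Presumably the space-separated text is saved to it. The sketch below is a variation rather than the author's method: it writes one comment per line, since `LineSentence` treats each line as one sentence. The `tokenized_comments` list is made up for illustration, and a temporary directory is used so nothing is left on disk.

```python
import tempfile
from pathlib import Path

# Hypothetical tokenized comments: one list of tokens per comment.
tokenized_comments = [["希望", "尽快", "找到"], ["很", "失望"]]

with tempfile.TemporaryDirectory() as tmp:
    corpus_path = Path(tmp) / "data_1.txt"
    # One comment per line, tokens separated by single spaces.
    corpus_path.write_text(
        "\n".join(" ".join(tokens) for tokens in tokenized_comments) + "\n",
        encoding="utf8",
    )
    # Read the file back the way a line-based corpus reader would see it.
    lines = corpus_path.read_text(encoding="utf8").splitlines()
    sentences = [line.split() for line in lines]

print(sentences)
```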

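6. Train the Word2Vec model

import logging
from gensim.models.word2vec import Word2Vec, LineSentence

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# LineSentence streams a file of space-separated tokens, one sentence per line, so no corpus has to be built by hand.
# sg=0 selects CBOW. `size` is the gensim 3.x parameter name; gensim 4.x renamed it to `vector_size`.
model_1 = Word2Vec(LineSentence('data_1.txt'), sg=0, size=100, window=5, min_count=1, workers=8)

7. Inspect the results

model_1.wv.most_similar('胡鑫宇', topn=20)  # the 20 words most similar to '胡鑫宇'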
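
`most_similar` ranks words by the cosine similarity between their vectors. The sketch below reproduces that ranking with NumPy on four made-up three-dimensional vectors. The function and the numbers are illustrative assumptions and do not come from a trained model.

```python
import numpy as np


def most_similar(query, vectors, topn=2):
    """Rank every other word by cosine similarity to `query`."""
    words = [w for w in vectors if w != query]
    q = vectors[query] / np.linalg.norm(vectors[query])
    sims = [float(np.dot(q, vectors[w] / np.linalg.norm(vectors[w]))) for w in words]
    order = np.argsort(sims)[::-1][:topn]  # highest similarity first
    return [(words[i], sims[i]) for i in order]


# Hypothetical vectors, made up for illustration only.
toy_vectors = {
    "失望": np.array([1.0, 0.0, 0.0]),
    "难过": np.array([0.9, 0.1, 0.0]),
    "开心": np.array([-1.0, 0.0, 0.0]),
    "天气": np.array([0.0, 1.0, 0.0]),
}

ranked = most_similar("失望", toy_vectors, topn=2)
print(ranked)
```

A vector pointing in nearly the same direction scores close to 1, an orthogonal one scores 0, and an opposite one scores -1.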

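8. PCA dimensionality reduction

import numpy as np
from sklearn.decomposition import PCA

# Project the word vectors onto a two-dimensional space
rawWordVec = []
word2ind = {}
# index2word is the gensim 3.x attribute; gensim 4.x renamed it to index_to_key
for i, w in enumerate(model_1.wv.index2word):
    rawWordVec.append(model_1.wv[w])  # word vector
    word2ind[w] = i  # {word: index}
rawWordVec = np.array(rawWordVec)
X_reduced = PCA(n_components=2).fit_transform(rawWordVec)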
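
To show what the projection does to the data, here is a minimal NumPy sketch of PCA via the singular value decomposition. It is an illustration of the technique, not scikit-learn's implementation, and its output may differ from `PCA(n_components=2)` in the sign of each axis. The random matrix is a stand-in for real word vectors.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
fake_vectors = rng.normal(size=(50, 100))  # stand-in for 50 words x 100 dimensions
reduced = pca_2d(fake_vectors)

print(fake_vectors.shape, "->", reduced.shape)  # (50, 100) -> (50, 2)
```

Each word keeps its row index, so a `{word: index}` dictionary built before the reduction still locates the word's point afterwards.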

9. Comparison before and after dimensionality reduction

10. Scatter plot

import matplotlib.font_manager
import matplotlib.pyplot as plt

# Draw the "starry sky" plot: the 2-D projection of every word vector
fig = plt.figure(figsize=(10, 7))
ax = fig.gca()
ax.set_facecolor('white')
ax.plot(X_reduced[:, 0], X_reduced[:, 1], '.', markersize=1, alpha=0.3, color='black')
# Highlight a few words of interest ('失望' = disappointed, '急切' = eager)
words = ['失望', '急切']
# Use a Chinese font, otherwise the labels render as garbled characters
zhfont1 = matplotlib.font_manager.FontProperties(fname='./华文仿宋.ttf', size=16)
for w in words:
    if w in word2ind:
        ind = word2ind[w]
        xy = X_reduced[ind]
        plt.plot(xy[0], xy[1], '.', alpha=1, color='orange', markersize=10)
        plt.text(xy[0], xy[1], w, fontproperties=zhfont1, alpha=1, color='red')

11. The other periods are handled the same way



Published: 2024-02-27 20:49:29

Article link: https://www.elefans.com/category/jswz/34/1766643.html