How do I manually set document IDs in a corpus?

Updated: 2024-10-27 06:29:34

Problem description

I am creating a corpus from a data frame. I pass it as a VectorSource because there is only one column I want to use as the text source. This works fine, but I need the document ids within the corpus to match the document ids from the data frame. The document ids are stored in a separate column of the original data frame.

library("tm")

# Example data: one column of document ids, one column of text
df <- data.frame(
  ids = c(1, 3, 5, 7, 8, 10),
  textColumn = c("text", "lots of text", "too much text",
                 "where will it end", "give peas a chance", "help"),
  stringsAsFactors = FALSE
)
# Only textColumn is used as the text source
corpus <- Corpus(VectorSource(df[["textColumn"]]))

Running this code creates a corpus, but the document ids run from 1 to 6. Is there any way of creating the corpus with the document ids 1, 3, 5, 7, 8, 10?

Recommended answer

Well, one simple but not very elegant way to assign your ids to your documents afterward is the following:

for (i in seq_along(corpus)) {
   # Copy the id from row i of the data frame onto document i
   attr(corpus[[i]], "ID") <- df$ids[i]
}
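
The loop above stores the id as an attribute named "ID", which matches how older tm releases kept document metadata. In more recent tm versions the id is kept in each document's metadata and is read and written through meta(), so a variant along the following lines may be needed instead. This is only a sketch, assuming a tm version that provides VCorpus and meta(); the switch from Corpus to VCorpus and the doc_ids variable are assumptions added for illustration, not part of the original answer.

```r
library(tm)

# Same example data as in the question
df <- data.frame(
  ids = c(1, 3, 5, 7, 8, 10),
  textColumn = c("text", "lots of text", "too much text",
                 "where will it end", "give peas a chance", "help"),
  stringsAsFactors = FALSE
)

# Assumption: a VCorpus is used so that individual documents can be modified
corpus <- VCorpus(VectorSource(df[["textColumn"]]))

# Overwrite the default ids with the ids from the data frame
for (i in seq_along(corpus)) {
  meta(corpus[[i]], "id") <- as.character(df$ids[i])
}

# Read the ids back from the corpus to check the assignment
# (doc_ids is a hypothetical helper variable for this check)
doc_ids <- vapply(seq_along(corpus),
                  function(i) meta(corpus[[i]], "id"),
                  character(1))
print(doc_ids)
```

If the ids can be decided before the corpus is built, another option that may fit is DataframeSource, which in recent tm versions expects a data frame whose first two columns are named doc_id and text; whether that is available depends on the installed tm version.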

Published: 2023-04-30 05:20:55
