Problem description
I am trying to extract the identified dictionary words from a Quanteda dfm, but have been unable to find a solution.
Does someone have a solution for this?
Sample input:
dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
dfm <- dfm("summer is great", dictionary = dict)
Output:
> dfm
Document-feature matrix of: 1 document, 1 feature.
1 x 1 sparse Matrix of class "dfmSparse"
       features
docs    season
  text1      1
I now know that a word from the season dictionary was found in the sentence, but I would also like to know which word it was.
Ideally, I would extract it in a table like this:
docs   dict    dictWord
text1  season  summer
Recommended answer
You can create a second dfm using the select argument (named keptFeatures in older versions of quanteda), and then cbind() it to the first, dictionary-based dfm.
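library(quanteda)
dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
txt <- "summer is great"
# dfm of dictionary keys: one column per category
season_dfm <- dfm(txt, dictionary = dict, verbose = FALSE)
# dfm keeping only the words that match a dictionary value
dict_dfm <- dfm(txt, select = dict, verbose = FALSE)
# combine them so both the key and the matched word appear
combined_dfm <- cbind(season_dfm, dict_dfm)
combined_dfm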
## Document-feature matrix of: 1 document, 2 features.
## 1 x 2 sparse Matrix of class "dfmSparse"
##        season summer
## text1       1      1
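In quanteda 3 and later, calling dfm() directly on a character vector with dictionary = or select = is deprecated (and may be unavailable in newer releases). A minimal sketch of the same two-dfm idea with the tokens-based functions dfm_lookup() and dfm_select(), assuming a current quanteda version; base_dfm is just an illustrative name:

```r
library(quanteda)

dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
txt <- "summer is great"

# Tokenise once, then build a plain word-count dfm
base_dfm <- dfm(tokens(txt))

# One column per dictionary key (here: "season")
season_dfm <- dfm_lookup(base_dfm, dictionary = dict)

# Only the word columns that match a dictionary value (here: "summer")
dict_dfm <- dfm_select(base_dfm, pattern = dict)

# Key counts and matched words side by side
combined_dfm <- cbind(season_dfm, dict_dfm)
print(combined_dfm)
```

Building both dfms from the same base_dfm means the text is tokenised only once, and both pieces share the same document names, which cbind() needs.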
If you want the output as a table, it would be:
dict_df <- as.data.frame(combined_dfm)
names(dict_df)[2] <- "dictWord"
dict_df
##       season dictWord
## text1      1        1
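In recent quanteda versions the documented route from a dfm to a data frame is convert(x, to = "data.frame"), which puts the document names in a doc_id column instead of the row names, so the matched word ends up in the third column. A sketch assuming quanteda 3 or later:

```r
library(quanteda)

dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
base_dfm <- dfm(tokens("summer is great"))

# Key column(s) followed by matched-word column(s)
combined_dfm <- cbind(dfm_lookup(base_dfm, dictionary = dict),
                      dfm_select(base_dfm, pattern = dict))

# Columns: doc_id, season, summer
dict_df <- convert(combined_dfm, to = "data.frame")
names(dict_df)[3] <- "dictWord"
print(dict_df)
```

As with the original answer, renaming a single column by position only makes sense when there is exactly one matched word.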
But that only works if you have a single dictionary value per text; otherwise dict_dfm will have multiple feature columns.
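One way around this limitation is to skip the dfm and work on the tokens directly: for each dictionary key, keep only the matching tokens with tokens_keep() and stack them into one row per match. This gives the docs / dict / dictWord layout from the question even when a text contains several dictionary words. A sketch assuming quanteda 3 or later; text2 is a made-up second document added only to show multiple matches, and rows / dict_table are illustrative names:

```r
library(quanteda)

dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
# text2 is a hypothetical document with two dictionary matches
toks <- tokens(c(text1 = "summer is great", text2 = "spring and fall"))

# For each dictionary key, keep only the tokens matching that key's values
rows <- lapply(seq_along(dict), function(i) {
  key <- names(dict)[i]
  matched <- tokens_keep(toks, pattern = dict[i])
  words <- as.list(matched)  # named list: document -> matched tokens
  data.frame(
    docs = rep(names(words), lengths(words)),
    dict = rep(key, sum(lengths(words))),
    dictWord = unlist(words, use.names = FALSE),
    stringsAsFactors = FALSE
  )
})

# One row per matched word, across all keys
dict_table <- do.call(rbind, rows)
print(dict_table)
```

Because each match becomes its own row, a text with two season words simply contributes two rows instead of extra columns.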