Hello! Today I am going to read some literature on NLP, data governance, and platform digital enablement… To keep a record, I'll put my reading notes on my CSDN blog. Feel free to get in touch with me!
Table of Contents
- 0 Overview
- 1 Introduction
- 2 Term Set Expansion Algorithm Overview
Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end Workflow. Mamou et al. 2018
0 Overview
Overall, this paper is short and pithy. It has 4 sections and mainly proposes an algorithm that helps expand a set of terms sharing a similar functional meaning.
1 Introduction
To quickly understand what problem this paper addresses, we should first learn what Term Set Expansion (TSE) is. In the following figure, I'll show you two forms of similarity among terms.
- TSE based on topical similarity
Given a word, we find other words that share its topic. For example, we input the word “python” and want to find words expressing the same theme in our corpus, which is represented with word vectors built from linear bag-of-words contexts. As a result, we find “bytecode”, “high-level programming language”… You will see that these words or phrases are descriptions of “python”.
- TSE based on functional similarity
Again, we input the word “python”. This time, term set expansion generates terms like Java, C++, C#… You must grasp the difference: Java plays a similar role to Python, but it is not a description of or a supplement to Python.
Now that we know what term set expansion means, we can consider a more complicated situation. Please look at the following figure. We no longer input a single word. Instead, we input a set of terms and look for its expanded set. We add two new words and build two independent term sets. In the first set, “yellow” and “orange” are both color terms, so the words in the expanded set must also be color terms. The same reasoning applies to the second seed set.
2 Term Set Expansion Algorithm Overview
In this section, the authors describe the term set expansion algorithm in detail. I understand it as a loop; you can look at the following flow chart.
- Build the initial seed set for the first iteration by collecting terms manually.
- Train your word embedding model.
- Set a threshold and find the words that are highly similar to the centroid of the seed set. At this step, the candidates are likely to include words that do not belong in the expanded set.
- Train a binary classification model to filter out the wrong words we don't need.
- Iterate!
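To make the loop more concrete, here is a minimal sketch of how I picture it. It is not the authors' implementation. The 2-D vectors are toy values invented for illustration, the names `expand`, `EMBEDDINGS` and `REJECTED` are hypothetical, and the `accept` callback is only a stand-in for the trained binary classifier.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def expand(seed, embeddings, threshold, accept, max_iters=5):
    """Grow the seed set until no new term passes both the threshold and the filter."""
    current = set(seed)
    for _ in range(max_iters):
        center = centroid([embeddings[t] for t in sorted(current)])
        # Step 3: candidates are terms close enough to the centroid.
        candidates = [t for t in embeddings
                      if t not in current
                      and cosine(embeddings[t], center) >= threshold]
        # Step 4: the filter (a classifier in the real workflow) drops wrong terms.
        accepted = {t for t in candidates if accept(t)}
        if not accepted:
            break
        # Step 5: the accepted terms join the set and the loop repeats.
        current |= accepted
    return current

# Toy embeddings, invented for illustration only (assumption, not real data).
EMBEDDINGS = {
    "python": [1.0, 0.1],
    "java": [0.9, 0.2],
    "c++": [0.95, 0.0],
    "c#": [0.85, 0.25],
    "bytecode": [0.8, 0.5],
    "banana": [0.0, 1.0],
}

# Stand-in for the binary classifier: a fixed set of terms it would reject.
REJECTED = {"bytecode"}

result = expand({"python", "java"}, EMBEDDINGS, threshold=0.9,
                accept=lambda t: t not in REJECTED)
print(sorted(result))  # ['c#', 'c++', 'java', 'python']
```

In this toy run, “bytecode” is close enough to the centroid to pass the threshold, so only the filter keeps it out of the expanded set. That is the role the binary classifier plays in the steps above.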
You can read the following PPT for more details.
I’ll share an example of my own practice later.