
Hello! Today I am going to read some literature on NLP, data governance, platform digital enablement… To keep a record, I’ll post my reading notes on my CSDN blog. Feel free to get in touch with me!

Table of Contents

  • 0 Overview
  • 1 Introduction
  • 2 Term Set Expansion Algorithm Overview


Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end Workflow. Mamou et al. 2018


0 Overview

Overall, this paper is short and pithy. It has four sections and mainly proposes an algorithm for expanding a set of terms with other terms that have a similar functional meaning.

1 Introduction

To quickly understand what problem this paper addresses, we should first learn what Term Set Expansion (TSE) is. The following figure shows two forms of similarity among terms.

  1. TSE based on topical similarity
    Given a word, we find other words that share its topic. For example, we input the word “python” and look for words on the same theme in our corpus, which has been represented as word vectors using linear bag-of-words. As a result, we find “bytecode”, “high-level programming language”… You will see that these words or phrases are descriptions of “python”.
  2. TSE based on functional similarity
    Again, we input the word “python”. This time, term set expansion generates terms like Java, C++, C#… Note the difference: Java has a function similar to python, but it is not a description or supplement of python.
    Knowing what term set expansion means, we can consider a more complicated situation. Please look at the following figure. Now we no longer input a single word; we input a set of terms and want to find its expanded set. We add two new words to make two independent term sets. In the first set, “yellow” and “orange” both describe colors, so the words in the expanded set must also be color terms. The same holds for the second seed set.

2 Term Set Expansion Algorithm Overview

In this section, the authors describe the term set expansion algorithm in detail. I understand it as a loop; you can look at the following flow chart.

  • Manually collect the original seed set used for the first iteration.
  • Train your word embedding model.
  • Set a threshold and find the words that have high similarity with the centroid of the seed set. At this step, the candidates are likely to include words that do not belong in the expanded set.
  • Train a binary classification model to screen out the wrong words we don’t need.
  • Iterate!
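
To make the loop more concrete, here is a minimal sketch of the steps above in plain Python. It is my own illustration, not the authors' implementation, and it rests on several assumptions: the embeddings are tiny hand-made 2-D vectors instead of a trained model, the function names (`cosine`, `centroid`, `expand`) and the `threshold` and `max_iters` parameters are hypothetical, and the binary classifier is replaced by a simple `keep` callable that decides whether a candidate stays.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def expand(seed, embeddings, threshold=0.8, keep=lambda term: True, max_iters=3):
    """Grow the seed set until no new term is accepted or max_iters is reached.

    `keep` is a stand-in (assumption) for the binary classifier that
    screens out candidates we do not want in the expanded set.
    """
    expanded = set(seed)
    for _ in range(max_iters):
        # Centroid of the current set (the seed set in the first iteration).
        center = centroid([embeddings[term] for term in expanded])
        # Candidates: unseen terms whose similarity to the centroid passes the threshold.
        candidates = {
            term
            for term, vector in embeddings.items()
            if term not in expanded and cosine(vector, center) >= threshold
        }
        # Filtering step: drop the candidates the "classifier" rejects.
        accepted = {term for term in candidates if keep(term)}
        if not accepted:
            break
        expanded |= accepted
    return expanded


# Toy, hand-made vectors (hypothetical values, not trained embeddings).
EMBEDDINGS = {
    "python": [1.0, 0.1],
    "java": [0.9, 0.2],
    "c++": [0.95, 0.15],
    "c#": [0.85, 0.25],
    "bytecode": [0.2, 1.0],
    "snake": [0.1, 0.9],
}

result = expand({"python", "java"}, EMBEDDINGS, threshold=0.95)
print(sorted(result))
```

With these toy vectors, the programming-language terms sit close to the seed centroid while “bytecode” and “snake” fall well below the threshold. In a real setting, the vectors would come from the trained embedding model and `keep` would wrap the trained classifier's prediction.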


You can read the following PPT for more details.


I’ll share an example of my own practice later.
