Sem2Vec: Semantic Word Vectors with Bidirectional Constraint Propagations

Taygun Kekec, David M.J. Tax

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Word embeddings learn a vector representation of words that can be utilized in a large number of natural language processing applications. Learning these vectors shares the drawback of unsupervised learning: the representations are not specialized for semantic tasks. In this work, we propose a full-fledged formulation to effectively learn semantically specialized word vectors (Sem2Vec) by creating shared representations of online lexical sources such as thesauri and lexical dictionaries. These shared representations are treated as semantic constraints for learning the word embeddings. Our methodology addresses the limited size and weak informativeness of these lexical sources by employing a bidirectional constraint propagation step. Unlike raw unsupervised embeddings, which exhibit low stability and are easily subject to change under randomness, our semantic formulation learns word vectors that are quite stable. An extensive empirical evaluation on the word similarity task, comprising 11 word similarity datasets, is provided, where our vectors show notable performance gains over state-of-the-art competitors. We further demonstrate the merits of our formulation on a document text classification task over large collections of documents.
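
The two ingredients named in the abstract, thesaurus-derived constraints on word vectors and a propagation step that densifies them, can be illustrated with a toy sketch. This is not the paper's formulation: the vocabulary, the synonym pairs, the quadratic attract penalty, and the single symmetric graph-diffusion step standing in for the paper's bidirectional propagation are all illustrative assumptions.

    import numpy as np

    # Toy vocabulary and randomly initialized word vectors. In practice
    # these would be embeddings learned from a corpus.
    rng = np.random.default_rng(0)
    vocab = ["car", "automobile", "vehicle", "banana"]
    idx = {w: i for i, w in enumerate(vocab)}
    W = rng.normal(scale=0.1, size=(len(vocab), 50))

    # Adjacency matrix of direct thesaurus constraints (hypothetical pairs).
    A = np.zeros((len(vocab), len(vocab)))
    for a, b in [("car", "automobile"), ("automobile", "vehicle")]:
        A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0

    # Constraint propagation, sketched as one step of graph diffusion:
    # two-hop neighbours (car ~ vehicle) receive a weaker, derived
    # constraint, partially compensating for the sparsity of the source.
    C = np.clip(A + 0.5 * (A @ A), 0.0, 1.0)
    np.fill_diagonal(C, 0.0)

    # Treat the propagated constraints as an attract penalty
    # sum_ij C_ij * ||w_i - w_j||^2 and take plain gradient steps on it,
    # pulling constrained words together while leaving "banana" alone.
    L = np.diag(C.sum(axis=1)) - C   # graph Laplacian of the constraints
    lr = 0.1
    for _ in range(100):
        W -= lr * 2.0 * (L @ W)      # gradient of the quadratic penalty

    def cosine(a, b):
        u, v = W[idx[a]], W[idx[b]]
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine("car", "vehicle"))  # near 1 after the attract steps
    print(cosine("car", "banana"))   # unconstrained pair stays unrelated

In this sketch the constrained words collapse toward a shared representation, while an unconstrained word is untouched; the paper additionally balances such semantic constraints against a corpus-fitting objective, which is omitted here.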

Original language: English
Article number: 8840868
Pages (from-to): 1750-1762
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 33
Issue number: 4
DOIs
Publication status: Published - 2021

Keywords

  • constraint propagation
  • embedding stability
  • semantic embeddings
  • thesaurus
  • word embeddings
