Sem2Vec: Semantic Word Vectors with Bidirectional Constraint Propagations

Taygun Kekec; David M.J. Tax

doi:10.1109/TKDE.2019.2942021

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Word embedding methods learn vector representations of words that can be used in a large number of natural language processing applications. Learning these vectors shares the drawback of unsupervised learning: the representations are not specialized for semantic tasks. In this work, we propose a full-fledged formulation to effectively learn semantically specialized word vectors (Sem2Vec) by creating shared representations of online lexical sources such as thesauri and lexical dictionaries. These shared representations are treated as semantic constraints for learning the word embeddings. Our methodology addresses the limited size and weak informativeness of these lexical sources by employing a bidirectional constraint propagation step. Unlike raw unsupervised embeddings, which exhibit low stability and are easily affected by randomness, our semantic formulation learns word vectors that are quite stable. We provide an extensive empirical evaluation on the word similarity task, covering 11 word similarity datasets, in which our vectors show notable performance gains over state-of-the-art competitors. We further demonstrate the merits of our formulation on a document text classification task over large collections of documents.
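The abstract reports results on 11 word similarity datasets but does not state the scoring protocol. A common convention for such benchmarks is to score each word pair by the cosine similarity of its two vectors and report Spearman's rank correlation against the human ratings. The sketch below assumes that convention; it is not taken from the paper. It skips pairs containing out-of-vocabulary words, and all names (`evaluate_word_similarity`, `average_ranks`, the toy vectors and ratings) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical types for this sketch: word -> vector, and (word1, word2, human score) triples.
Vectors = Dict[str, List[float]]
Dataset = List[Tuple[str, str, float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one side has no variation
    return cov / (sx * sy)


def evaluate_word_similarity(vectors: Vectors, dataset: Dataset) -> Tuple[float, int]:
    """Return (Spearman rho, number of pairs scored); out-of-vocabulary pairs are skipped."""
    model_scores: List[float] = []
    human_scores: List[float] = []
    for w1, w2, gold in dataset:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(gold)
    if len(model_scores) < 2:
        raise ValueError("need at least two in-vocabulary pairs")
    return spearman(model_scores, human_scores), len(model_scores)


if __name__ == "__main__":
    # Toy vectors and ratings, invented purely for illustration.
    toy_vectors = {
        "car": [1.0, 0.0, 0.0],
        "automobile": [0.9, 0.1, 0.0],
        "road": [0.5, 0.5, 0.0],
        "banana": [0.0, 0.0, 1.0],
    }
    toy_dataset = [
        ("car", "automobile", 9.0),
        ("car", "road", 5.0),
        ("car", "banana", 1.0),
        ("car", "zebra", 0.5),  # "zebra" is out of vocabulary, so this pair is skipped
    ]
    rho, used = evaluate_word_similarity(toy_vectors, toy_dataset)
    print(f"Spearman rho = {rho:.3f} over {used} pairs")
```

In the toy run, the cosine scores rank the three in-vocabulary pairs in the same order as the ratings, so rho is 1 up to floating-point rounding. Because Spearman's rho compares only rank orders, it is unaffected by the fact that cosine similarities and human ratings are on different scales.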

Original language	English
Article number	8840868
Pages (from-to)	1750-1762
Number of pages	13
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	33
Issue number	4
DOIs	https://doi.org/10.1109/TKDE.2019.2942021
Publication status	Published - 2021

Keywords

constraint propagation
embedding stability
semantic embeddings
thesaurus
Word embeddings

Cite this

@article{270bc579abbf4cfbbea9d7c1c3821318,
  title = "{Sem2Vec}: Semantic Word Vectors with Bidirectional Constraint Propagations",
  keywords = "constraint propagation, embedding stability, semantic embeddings, thesaurus, Word embeddings",
  author = "Kekec, Taygun and Tax, {David M.J.}",
  year = "2021",
  doi = "10.1109/TKDE.2019.2942021",
  language = "English",
  volume = "33",
  pages = "1750--1762",
  journal = "IEEE Transactions on Knowledge and Data Engineering",
  issn = "1041-4347",
  publisher = "IEEE",
  number = "4",
}

TY  - JOUR
T1  - Sem2Vec: Semantic Word Vectors with Bidirectional Constraint Propagations
AU  - Kekec, Taygun
AU  - Tax, David M.J.
PY  - 2021
Y1  - 2021
KW  - constraint propagation
KW  - embedding stability
KW  - semantic embeddings
KW  - thesaurus
KW  - Word embeddings
UR  - http://www.scopus.com/inward/record.url?scp=85102289837&partnerID=8YFLogxK
U2  - 10.1109/TKDE.2019.2942021
DO  - 10.1109/TKDE.2019.2942021
M3  - Article
AN  - SCOPUS:85102289837
SN  - 1041-4347
VL  - 33
SP  - 1750
EP  - 1762
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
IS  - 4
M1  - 8840868
ER  -