Abstract
Learning probability densities for natural language representations is a difficult problem because language is inherently sparse and high-dimensional. Negative sampling is a popular and effective way to avoid intractable maximum likelihood problems, but it requires correct specification of the sampling distribution. Previous state-of-the-art methods rely on heuristic distributions that appear to do well in practice. In this work, we define conditions for optimal sampling distributions and demonstrate how to approximate them using Quadratically Constrained Entropy Maximization (QCEM). Our analysis shows that state-of-the-art heuristics are restrictive approximations to our proposed framework. To demonstrate the merits of our formulation, we apply QCEM to matching synthetic exponential family distributions and to finding high-dimensional word embedding vectors for English. We achieve faster inference on the synthetic experiments and improve the correlation on semantic similarity evaluations on the Rare Words dataset by 4.8%.
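The abstract only names the ingredients; the precise optimality conditions and the QCEM construction are given in the paper itself. As a rough, hypothetical illustration of the contrast between a heuristic sampler and an entropy-maximizing one, the sketch below compares the word2vec-style unigram^(3/4) negative-sampling distribution with a generic maximum-entropy distribution over a toy vocabulary subject to a single quadratic moment constraint. The score function `scores`, the constraint target `c`, and the one-dimensional dual solve are illustrative stand-ins, not the formulation from the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import entropy

rng = np.random.default_rng(0)
V = 1000                                   # toy vocabulary size
counts = rng.zipf(1.5, size=V).astype(float)

# Heuristic negative-sampling distribution popularized by word2vec:
# unigram counts raised to the 3/4 power, then normalized.
heuristic = counts ** 0.75
heuristic /= heuristic.sum()

# Generic maximum-entropy sampler under one quadratic moment constraint
# E_q[s(w)^2] = c, with s(w) a per-word score (illustrative stand-in).
scores = rng.normal(size=V)
c = 0.5

def max_ent(lam):
    # Under a single quadratic constraint, the maximum-entropy solution
    # has the exponential-family form q(w) ∝ exp(-lam * s(w)^2).
    logits = -lam * scores ** 2
    logits -= logits.max()                 # numerical stability
    q = np.exp(logits)
    return q / q.sum()

def constraint_gap(lam):
    # Signed violation of E_q[s^2] = c for a given dual variable lam.
    return float(max_ent(lam) @ (scores ** 2)) - c

# Solve the one-dimensional dual problem so the constraint is satisfied.
lam_star = brentq(constraint_gap, -50.0, 50.0)
qcem_like = max_ent(lam_star)

print("entropy of heuristic sampler:  ", entropy(heuristic))
print("entropy of max-entropy sampler:", entropy(qcem_like))
print("constraint value E_q[s^2]:     ", qcem_like @ scores ** 2)
```

Because only one quadratic constraint is imposed here, the dual problem reduces to a one-dimensional root-finding step; the paper's actual constraint set and solver may differ.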
Original language | English |
---|---|
Pages (from-to) | 310-317 |
Number of pages | 8 |
Journal | Pattern Recognition Letters |
Volume | 125 |
DOIs | |
Publication status | Published - 2019 |
Keywords
- Contrastive learning
- Entropy maximization
- Negative sampling
- Semantic similarity
- Word embeddings