Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications

Daniel Vliegenthart, Sepideh Mesbah*, Christoph Lofi, Akiko Aizawa, Alessandro Bozzon

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume, Conference contribution, Scientific, peer-review


Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, because the extensive training and test data typically needed to fine-tune NER algorithms are lacking. Recent approaches have presented promising solutions that train NER algorithms in an iterative, weakly-supervised fashion, limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific NER extractors for rare long-tail entities to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.
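The abstract describes the feedback mechanism only at a high level. As a rough illustration, the Python sketch below shows one way human relevance ratings could be folded into a single iteration of a weakly-supervised training cycle. Candidates with enough "relevant" votes become positive training examples, candidates rated irrelevant become negatives, and the rest are left to the algorithm's own heuristics. The class `FeedbackLoop`, its thresholds `min_votes` and `accept_ratio`, and the example terms are all hypothetical assumptions made for this sketch. They are not taken from the paper and do not describe Coner's actual implementation.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Hypothetical filter that turns human relevance votes into training labels."""

    seeds: set[str]              # initial seed terms supplied by a domain expert
    min_votes: int = 2           # assumed: minimum number of ratings before deciding
    accept_ratio: float = 0.5    # assumed: share of "relevant" votes needed to accept
    votes: dict[str, Counter] = field(default_factory=dict)

    def record(self, entity: str, relevant: bool) -> None:
        """Store one user's rating of an extracted entity."""
        self.votes.setdefault(entity, Counter())[relevant] += 1

    def verdict(self, entity: str) -> bool | None:
        """Return True/False once enough votes exist, otherwise None (undecided)."""
        counts = self.votes.get(entity)
        if counts is None:
            return None
        total = counts[True] + counts[False]
        if total < self.min_votes:
            return None
        return counts[True] / total > self.accept_ratio

    def next_training_set(self, candidates: list[str]) -> tuple[set[str], set[str]]:
        """Split extracted candidates into positive and negative examples for retraining."""
        positives = set(self.seeds)  # copy, so the original seed set is not mutated
        negatives: set[str] = set()
        for entity in candidates:
            decision = self.verdict(entity)
            if decision is True:
                positives.add(entity)
            elif decision is False:
                negatives.add(entity)
            # decision is None: no conclusive feedback yet, so the entity is left
            # to the iterative algorithm's own heuristics (assumed behaviour)
        return positives, negatives


if __name__ == "__main__":
    loop = FeedbackLoop(seeds={"LSTM"})
    loop.record("BiLSTM-CRF", True)
    loop.record("BiLSTM-CRF", True)
    loop.record("Table 3", False)
    loop.record("Table 3", False)
    loop.record("word2vec", True)  # only one vote, so still undecided
    pos, neg = loop.next_training_set(["BiLSTM-CRF", "Table 3", "word2vec"])
    print(sorted(pos), sorted(neg))
```

In this sketch, undecided candidates are neither promoted nor rejected. This reflects the idea that feedback accumulates incrementally while the system is in use, so a term's label can change in a later iteration once more ratings arrive.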

Original language: English
Title of host publication: Digital Libraries for Open Knowledge
Subtitle of host publication: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Proceedings
Editors: Antoine Doucet, Antoine Isaac, Koraljka Golub, Trond Aalberg, Adam Jatowt
Place of publication: Cham
Publisher: Springer
Pages: 3-17
Number of pages: 15
ISBN (Electronic): 978-3-030-30760-8
ISBN (Print): 978-3-030-30759-2
Publication status: Published - 2019
Event: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway
Duration: 9 Sept 2019 to 12 Sept 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11799 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019
Country/Territory: Norway
City: Oslo
Period: 9/09/19 to 12/09/19
