Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications

Daniel Vliegenthart; Sepideh Mesbah; Christoph Lofi; Akiko Aizawa; Alessandro Bozzon

doi:10.1007/978-3-030-30760-8_1

Corresponding author: Sepideh Mesbah

Web Information Systems

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data typically needed for fine-tuning NER algorithms are lacking. Recent approaches presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, thus limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific rare long-tail NER extractors to be trained at low cost, with ever-increasing performance while the algorithm is actively used in an application.

Original language	English
Title of host publication	Digital Libraries for Open Knowledge
Subtitle of host publication	23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Proceedings
Editors	Antoine Doucet, Antoine Isaac, Koraljka Golub, Trond Aalberg, Adam Jatowt
Place of Publication	Cham
Publisher	Springer
Pages	3-17
Number of pages	15
ISBN (Electronic)	978-3-030-30760-8
ISBN (Print)	978-3-030-30759-2
DOIs	https://doi.org/10.1007/978-3-030-30760-8_1
Publication status	Published - 2019
Event	23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, 9 to 12 September 2019

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11799 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019
Country/Territory	Norway
City	Oslo
Period	9 to 12 September 2019

Access to Document

10.1007/978-3-030-30760-8_1

2019TPDL_Coner (accepted author manuscript, 364 KB)

Cite this

Vliegenthart, D., Mesbah, S., Lofi, C., Aizawa, A., & Bozzon, A. (2019). Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. In A. Doucet, A. Isaac, K. Golub, T. Aalberg, & A. Jatowt (Eds.), Digital Libraries for Open Knowledge : 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Proceedings (pp. 3-17). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11799 LNCS). Springer. https://doi.org/10.1007/978-3-030-30760-8_1

Vliegenthart, Daniel ; Mesbah, Sepideh ; Lofi, Christoph et al. / Coner : A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. Digital Libraries for Open Knowledge : 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Proceedings. editor / Antoine Doucet ; Antoine Isaac ; Koraljka Golub ; Trond Aalberg ; Adam Jatowt. Cham : Springer, 2019. pp. 3-17 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{e3d634e46ba94ab4b80affaefc527091,
  title = "{Coner}: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications",
  abstract = "Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data typically needed for fine-tuning NER algorithms are lacking. Recent approaches presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, thus limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific rare long-tail NER extractors to be trained at low cost, with ever-increasing performance while the algorithm is actively used in an application.",
  author = "Daniel Vliegenthart and Sepideh Mesbah and Christoph Lofi and Akiko Aizawa and Alessandro Bozzon",
  year = "2019",
  doi = "10.1007/978-3-030-30760-8_1",
  language = "English",
  isbn = "978-3-030-30759-2",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  volume = "11799",
  publisher = "Springer",
  address = "Cham",
  pages = "3--17",
  editor = "Antoine Doucet and Antoine Isaac and Koraljka Golub and Trond Aalberg and Adam Jatowt",
  booktitle = "Digital Libraries for Open Knowledge",
  note = "23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway; conference date: 9--12 September 2019",
}

Vliegenthart, D, Mesbah, S, Lofi, C, Aizawa, A & Bozzon, A 2019, Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. in A Doucet, A Isaac, K Golub, T Aalberg & A Jatowt (eds), Digital Libraries for Open Knowledge : 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11799 LNCS, Springer, Cham, pp. 3-17, 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, 9/09/19. https://doi.org/10.1007/978-3-030-30760-8_1

Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. / Vliegenthart, Daniel; Mesbah, Sepideh; Lofi, Christoph et al.
Digital Libraries for Open Knowledge : 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Proceedings. ed. / Antoine Doucet; Antoine Isaac; Koraljka Golub; Trond Aalberg; Adam Jatowt. Cham: Springer, 2019. p. 3-17 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11799 LNCS).

TY  - GEN
T1  - Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications
T2  - 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019
AU  - Vliegenthart, Daniel
AU  - Mesbah, Sepideh
AU  - Lofi, Christoph
AU  - Aizawa, Akiko
AU  - Bozzon, Alessandro
PY  - 2019
Y1  - 2019
AB  - Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data typically needed for fine-tuning NER algorithms are lacking. Recent approaches presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, thus limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific rare long-tail NER extractors to be trained at low cost, with ever-increasing performance while the algorithm is actively used in an application.
UR  - http://www.scopus.com/inward/record.url?scp=85072851105&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-30760-8_1
DO  - 10.1007/978-3-030-30760-8_1
M3  - Conference contribution
AN  - SCOPUS:85072851105
SN  - 978-3-030-30759-2
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
VL  - 11799
SP  - 3
EP  - 17
BT  - Digital Libraries for Open Knowledge
A2  - Doucet, Antoine
A2  - Isaac, Antoine
A2  - Golub, Koraljka
A2  - Aalberg, Trond
A2  - Jatowt, Adam
PB  - Springer
CY  - Cham
Y2  - 9 September 2019 through 12 September 2019
ER  - 

Vliegenthart D, Mesbah S, Lofi C, Aizawa A, Bozzon A. Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. In Doucet A, Isaac A, Golub K, Aalberg T, Jatowt A, editors, Digital Libraries for Open Knowledge : 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Proceedings. Cham: Springer. 2019. p. 3-17. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-30760-8_1