TY - GEN
T1 - Coner
T2 - 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019
AU - Vliegenthart, Daniel
AU - Mesbah, Sepideh
AU - Lofi, Christoph
AU - Aizawa, Akiko
AU - Bozzon, Alessandro
PY - 2019
Y1 - 2019
N2 - Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data needed for fine-tuning NER algorithms is typically lacking. Recent approaches presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, thus limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific rare long-tail NER extractors to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.
AB - Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data needed for fine-tuning NER algorithms is typically lacking. Recent approaches presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, thus limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific rare long-tail NER extractors to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.
UR - http://www.scopus.com/inward/record.url?scp=85072851105&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-30760-8_1
DO - 10.1007/978-3-030-30760-8_1
M3 - Conference contribution
AN - SCOPUS:85072851105
SN - 978-3-030-30759-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 17
BT - Digital Libraries for Open Knowledge
A2 - Doucet, Antoine
A2 - Isaac, Antoine
A2 - Golub, Koraljka
A2 - Aalberg, Trond
A2 - Jatowt, Adam
PB - Springer Science+Business Media
CY - Cham
Y2 - 9 September 2019 through 12 September 2019
ER -