Abstract
Event detection remains a difficult task due to the complexity and ambiguity of such entities. On the one hand, we observe low inter-annotator agreement among experts when annotating events, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve lower measured performance in terms of F1-score than systems extracting other entity types, such as people or locations. In this paper, we study the consistency and completeness of expert-annotated datasets for events and time expressions. We propose a data-agnostic methodology for validating such datasets in terms of consistency and completeness. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated event datasets. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system. Our results show that the crowd-annotated events increase the performance of the system by at least 5.3%.
| Original language | English |
|---|---|
| Title of host publication | 2nd Conference on Language, Data and Knowledge, LDK 2019 |
| Editors | Gerard de Melo, Bettina Klimek, Christian Fäth, Paul Buitelaar, Milan Dojchinovski, Maria Eskevich, John P. McCrae, Christian Chiarcos |
| Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
| Pages | 1-15 |
| Number of pages | 15 |
| Volume | 70 |
| ISBN (Electronic) | 9783959771054 |
| DOIs | |
| Publication status | Published - 1 May 2019 |
| Event | 2nd Conference on Language, Data and Knowledge, LDK 2019, Leipzig, Germany. Duration: 20 May 2019 → 23 May 2019 |
Conference
| Conference | 2nd Conference on Language, Data and Knowledge, LDK 2019 |
|---|---|
| Country/Territory | Germany |
| City | Leipzig |
| Period | 20/05/19 → 23/05/19 |
Keywords
- Crowdsourcing
- Event extraction
- Human-in-the-loop
- Time extraction