The task of author verification is to decide whether a given person is the author of a given piece of text. Algorithms extract writing-style features from texts and use them to measure how close in style different documents are. Current evaluations of author verification algorithms are restricted to small-scale corpora, usually with fewer than one hundred test cases. In this work, we present a methodology for deriving a large-scale author verification corpus from Wikipedia Talkpages. We create a corpus based on the English Wikipedia that is significantly larger than existing corpora. On this corpus, we investigate two dimensions that have so far not received sufficient attention: the influence of topic and the influence of time on author verification accuracy.
|Title of host publication||Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval|
|Number of pages||4|
|Publication status||Published - 2014|
|Event||SIGIR '14: 37th international ACM SIGIR conference on Research and development in information retrieval - Gold Coast, Australia|
Duration: 6 Jul 2014 → 11 Jul 2014