Tekstkenmerken en tekstkwaliteit van leerlingteksten: Een annotatiestudie

Translated title of the contribution: Text features and quality of learner text: an annotation study

Henk Pander Maat, Kay Raaijmakers, Dennis Vermeulen, Kees de Glopper

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Manually annotated corpora of writing products can contribute greatly to writing research: they offer detailed insights into the quality of these texts, into the text features that human raters actually attend to, into the possibilities and difficulties of using automatic writing analytics and writing tools, and into the relations between different text quality dimensions. This paper presents the Utrecht System for Annotation of Learner text (USALT), which covers both general features (orthography, punctuation, wording, coherence) and genre-specific elements (such as openings, endings, structuring devices and politeness). Each annotation contains up to three items: the annotation unit, the problem type and a part-of-speech tag. USALT reflects various text quality dimensions, notably correctness, comprehensibility and appropriateness (both stylistically and in terms of genre conventions).

We present a USALT analysis of 371 texts produced by Dutch students in grades 7-9 (aged 12-15 years), taken from the so-called Schrijfmeters corpus. The assignment was to write a letter about ‘typically Dutch things’ to a Swedish girl about to emigrate to the Netherlands. The reliability of the USALT annotations was adequate. In terms of problem frequency, we were struck by the pervasiveness of punctuation problems. Furthermore, the orthography and punctuation problems together present considerable difficulties for the automatic analysis of original learner texts at this level. A remarkable result regarding the relations between text quality dimensions is that the frequency of orthography problems correlates more strongly with genre convention problems than with lexico-grammatical problems. We also used the annotations as predictors of the holistic scores assigned to the texts by human raters. Standardized annotation frequencies by themselves account for 45% of the score variance, with a prominent role for annotations regarding genre elements; text length by itself explains 52%. The best model includes both text length and annotations (65% explained variance). In ongoing work, USALT is being extended to handle argumentative writing assignments.
Original language: Dutch
Pages (from-to): 331–361
Number of pages: 32
Journal: Tijdschrift voor Taalbeheersing
Volume: 41
Issue number: 2
Publication status: Published - 2019
Externally published: Yes

Keywords

  • learner text
  • writing quality
  • annotation
  • text quality ratings
