Abstract
Code completion is an essential feature of IDEs, yet current auto-completers are restricted to either grammar-based or NLP-based single-token completions. Both approaches have significant drawbacks: grammar-based autocompletion is restricted in dynamically-typed language environments, whereas NLP-based autocompleters struggle to understand the semantics of the programming language and the developer's code context. In this work, we present CodeFill, a language model for autocompletion that combines learned structure and naming information. Using a parallel Transformer architecture and multi-task learning, CodeFill consumes sequences of source code token names and their equivalent AST token types. Uniquely, CodeFill is trained both for single-token and multi-token (statement) prediction, which enables it to learn long-range dependencies among grammatical and naming elements. We train CodeFill on two datasets, consisting of 29M and 425M lines of code, respectively. To make the evaluation more realistic, we develop a method to automatically infer points in the source code at which completion matters. We compare CodeFill against four baselines and two state-of-the-art models, GPT-C and TravTrans+. CodeFill surpasses all baselines in single-token prediction (MRR: 70.9% vs. 66.2% and 67.8%) and outperforms the state of the art for multi-token prediction (ROUGE-L: 63.7% vs. 52.4% and 59.2%, for n=4 tokens). We publicly release our source code and datasets.
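
For readers unfamiliar with the two metrics reported above, the following is a minimal Python sketch of how they are commonly computed. It assumes MRR is the mean of 1/rank of the first correct candidate (0 when the target is absent) and that ROUGE-L is the F1 score over the longest common subsequence of tokens; the paper's exact evaluation settings may differ. The function names are hypothetical and are not taken from the CodeFill release.

```python
from typing import List, Sequence


def mean_reciprocal_rank(ranked_candidates: Sequence[Sequence[str]],
                         targets: Sequence[str]) -> float:
    """Mean of 1/rank of the correct token per query (0 if it is not ranked)."""
    if not targets:
        return 0.0
    total = 0.0
    for candidates, target in zip(ranked_candidates, targets):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == target:
                total += 1.0 / rank
                break
    return total / len(targets)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev: List[int] = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L as F1 of LCS-based precision and recall (assumed variant)."""
    if not predicted or not reference:
        return 0.0
    lcs = lcs_length(predicted, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(predicted)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, a predicted 4-token completion `["self", ".", "x", "="]` scored against the reference `["self", ".", "y", "="]` shares a common subsequence of three tokens, so both precision and recall are 3/4.
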
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 ACM/IEEE 44th International Conference on Software Engineering, ICSE 2022 |
Publisher | IEEE |
Pages | 401-412 |
Number of pages | 12 |
ISBN (Electronic) | 9781450392211 |
Publication status | Published - 2022 |
Event | 44th ACM/IEEE International Conference on Software Engineering, ICSE 2022: Software Engineering in Practice (ICSE-SEIP), Pittsburgh, United States. Duration: 22 May 2022 → 27 May 2022. Conference number: 44th |
Publication series
Name | Proceedings - International Conference on Software Engineering |
---|---|
Volume | 2022-May |
ISSN (Print) | 0270-5257 |
Conference
Conference | 44th ACM/IEEE International Conference on Software Engineering, ICSE 2022 |
---|---|
Abbreviated title | ICSE 2022 |
Country/Territory | United States |
City | Pittsburgh |
Period | 22/05/22 → 27/05/22 |
Keywords
- Automatic Code Completion
- Dynamically-typed Languages
- Multi-Task Learning
- Transformers
- Types