Type4Py: Practical Deep Similarity Learning-Based Type Inference for Python

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

Abstract

Dynamic languages, such as Python and JavaScript, trade static typing for developer flexibility and productivity. Lack of static typing can cause run-time exceptions and is a major factor in weak IDE support. To alleviate these issues, PEP 484 introduced optional type annotations for Python. As retrofitting types to existing codebases is error-prone and laborious, machine learning (ML)-based approaches have been proposed to enable automatic type inference based on existing, partially annotated codebases. However, previous ML-based approaches are trained and evaluated on human-provided type annotations, which might not always be sound, and this may limit their practicality for real-world usage. In this paper, we present Type4Py, a deep similarity learning-based hierarchical neural network model. It learns to discriminate between similar and dissimilar types in a high-dimensional space, which results in clusters of types. Likely types for arguments, variables, and return values can then be inferred through nearest-neighbor search. Unlike previous work, we trained and evaluated our model on a type-checked dataset and used mean reciprocal rank (MRR) to reflect the performance perceived by users. The obtained results show that Type4Py achieves an MRR of 77.1%, which is a substantial improvement of 8.1% and 16.7% over the state-of-the-art approaches Typilus and TypeWriter, respectively. Finally, to aid developers with retrofitting types, we released a Visual Studio Code extension, which uses Type4Py to provide ML-based type auto-completion for Python.
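
The two ideas named in the abstract, ranking candidate types by nearest-neighbor search over type clusters and scoring the rankings with MRR, can be illustrated with a toy sketch. This is not the Type4Py implementation: the 2-D vectors, the value of `k`, the inverse-distance weighting, and the function names `rank_types` and `mean_reciprocal_rank` are all assumptions made for illustration, and the real model learns its embeddings with a neural network in a much higher-dimensional space.

```python
import math
from collections import defaultdict


def rank_types(query, labeled_points, k=3):
    """Rank candidate types for `query` using its k nearest labeled vectors.

    Each neighbor votes for its type with weight 1/distance (an assumed
    weighting scheme, chosen here only for illustration).
    """
    neighbors = sorted(labeled_points, key=lambda p: math.dist(query, p[0]))[:k]
    scores = defaultdict(float)
    for vector, type_name in neighbors:
        scores[type_name] += 1.0 / (math.dist(query, vector) + 1e-10)
    # Highest-scoring type first.
    return sorted(scores, key=scores.get, reverse=True)


def mean_reciprocal_rank(ranked_predictions, true_types):
    """Average of 1/rank of the correct type; 0 if it is not in the ranking."""
    total = 0.0
    for ranking, truth in zip(ranked_predictions, true_types):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(true_types)


# Hypothetical 2-D "type clusters" standing in for learned embeddings.
labeled_points = [
    ((0.0, 0.0), "int"),
    ((0.1, 0.0), "int"),
    ((1.0, 1.0), "str"),
    ((1.1, 1.0), "str"),
    ((5.0, 5.0), "List[int]"),
]

queries = [(0.05, 0.0), (1.0, 0.9), (0.6, 0.6)]
true_types = ["int", "str", "int"]

rankings = [rank_types(q, labeled_points) for q in queries]
mrr = mean_reciprocal_rank(rankings, true_types)

for q, r in zip(queries, rankings):
    print(q, "->", r)
print(f"MRR = {mrr:.3f}")
```

In this toy data the third query lies closer to the `str` cluster, so its correct type `int` is ranked second and contributes a reciprocal rank of 1/2. This shows how MRR gives partial credit when the right type appears below the top suggestion, which is the situation a user of an auto-completion list would experience.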

Original language: English
Title of host publication: Proceedings - 2022 ACM/IEEE 44th International Conference on Software Engineering, ICSE 2022
Publisher: IEEE
Pages: 2241-2252
Number of pages: 12
ISBN (Electronic): 978-1-4503-9221-1
Publication status: Published - 2022
Event: 44th ACM/IEEE International Conference on Software Engineering, ICSE 2022: Software Engineering in Practice (ICSE-SEIP) - Pittsburgh, United States
Duration: 22 May 2022 to 27 May 2022
Conference number: 44th

Publication series

Name: Proceedings - International Conference on Software Engineering
Volume: 2022-May
ISSN (Print): 0270-5257

Conference

Conference: 44th ACM/IEEE International Conference on Software Engineering, ICSE 2022
Abbreviated title: ICSE 2022
Country/Territory: United States
City: Pittsburgh
Period: 22/05/22 to 27/05/22

Keywords

  • Machine Learning
  • Mean Reciprocal Rank
  • Python
  • Similarity Learning
  • Type Inference
