PathMiner: A library for mining of path-based representations of code

Vladimir Kovalenko, Egor Bogomolov, Timofey Bryksin, Alberto Bacchelli

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

Abstract

One recent, significant advance in modeling source code for machine learning algorithms is the path-based representation: an approach that represents a snippet of code as a collection of paths from its syntax tree. This representation efficiently captures the structure of code, which in turn carries its semantics and other information. Building a path-based representation involves parsing the code and extracting the paths from its syntax tree; together, these steps amount to a substantial engineering effort. Because no common reusable toolkit exists for this task, the burden of mining diverts researchers from the essential work and hinders newcomers to the field of machine learning on code. In this paper, we present PathMiner, an open-source library for mining path-based representations of code. PathMiner is fast, flexible, well-tested, and easily extensible to support input code in any common programming language. Preprint [https://doi.org/10.5281/zenodo.2595271]; released tool [https://doi.org/10.5281/zenodo.2595257].
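
The two steps named in the abstract, parsing the code and extracting paths from its syntax tree, can be illustrated with a minimal sketch. It does not use PathMiner or reproduce its API; it relies only on Python's standard `ast` module (Python 3.8 or later is assumed), and the function names `leaf_chains`, `token`, and `path_contexts` are hypothetical. A path here connects two leaves through their lowest common ancestor, with `^` marking a step up and `v` a step down; this notation is an assumption made for the example.

```python
import ast
from itertools import combinations


def leaf_chains(tree):
    """Collect, for every leaf, the list of nodes from the root down to it."""
    chains = []

    def visit(node, ancestors):
        chain = ancestors + [node]
        # Skip Load/Store/Del markers so that identifiers count as leaves.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        if not children:
            chains.append(chain)
        for child in children:
            visit(child, chain)

    visit(tree, [])
    return chains


def token(node):
    """Text attached to a leaf: identifier, literal, or node type name."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return type(node).__name__


def path_contexts(source):
    """Return (start_token, path, end_token) for every pair of leaves."""
    chains = leaf_chains(ast.parse(source))
    contexts = []
    for a, b in combinations(chains, 2):
        # Length of the shared prefix; its last node is the lowest common ancestor.
        shared = 0
        while shared < min(len(a), len(b)) and a[shared] is b[shared]:
            shared += 1
        up = [type(n).__name__ for n in reversed(a[shared:])]
        top = type(a[shared - 1]).__name__
        down = [type(n).__name__ for n in b[shared:]]
        path = " ^ ".join(up + [top]) + "".join(" v " + d for d in down)
        contexts.append((token(a[-1]), path, token(b[-1])))
    return contexts


if __name__ == "__main__":
    for context in path_contexts("x = y + 1"):
        print(context)
```

Enumerating every pair of leaves grows quadratically with snippet size, so a practical miner would likely cap path length or width; the sketch omits such limits for brevity.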

Original language: English
Title of host publication: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)
Publisher: IEEE
Pages: 13-17
Number of pages: 5
ISBN (Electronic): 978-1-7281-3412-3
ISBN (Print): 978-1-7281-3370-6
DOIs: https://doi.org/10.1109/MSR.2019.00013
Publication status: Published - 1 May 2019
Event: 16th IEEE/ACM International Conference on Mining Software Repositories, MSR 2019 - Montreal, Canada
Duration: 26 May 2019 to 27 May 2019

Conference

Conference: 16th IEEE/ACM International Conference on Mining Software Repositories, MSR 2019
Country: Canada
City: Montreal
Period: 26/05/19 to 27/05/19

Keywords

  • AST path
  • Code2Vec
  • Machine learning on code
  • Mining tool
  • Path-based representation

Cite this

Kovalenko, V., Bogomolov, E., Bryksin, T., & Bacchelli, A. (2019). PathMiner: A library for mining of path-based representations of code. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) (pp. 13-17). [8816777] IEEE. https://doi.org/10.1109/MSR.2019.00013