Abstract
Large Language Models (LLMs) are gaining popularity in Natural Language Processing (NLP) due to their remarkable accuracy on a wide range of NLP tasks. LLMs designed for coding are trained on massive datasets, which enables them to learn the structure and syntax of programming languages. These datasets are scraped from the web, and LLMs memorise information contained in them. LLMs for code are also growing in size, making them more challenging to run and making users increasingly reliant on external infrastructure. We aim to explore the challenges faced by LLMs for code and propose techniques to measure and prevent memorisation. Additionally, we suggest methods to compress models and run them locally on consumer hardware.
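As a concrete illustration of what measuring memorisation can look like, the sketch below prompts a causal LLM with a prefix taken from a suspected training sample and checks whether greedy decoding reproduces the original continuation verbatim. This is a minimal sketch using the Hugging Face `transformers` API, not the technique proposed in the paper; the model name, token budgets, and the `verbatim_memorised` helper are assumptions chosen for illustration.

```python
# Minimal sketch: does a causal LM reproduce a suspected training sample verbatim
# when prompted with its prefix (greedy decoding)? Illustrative only; the model
# name, token budgets, and helper name are assumptions, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumed stand-in; swap in the code LLM of interest

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def verbatim_memorised(sample: str, prefix_len: int = 32, suffix_len: int = 32) -> bool:
    """Return True if greedy decoding of the prefix reproduces the true suffix exactly."""
    ids = tokenizer(sample, return_tensors="pt").input_ids[0]
    if ids.shape[0] < prefix_len + suffix_len:
        return False  # sample too short to split into prefix + suffix
    prefix = ids[:prefix_len].unsqueeze(0)
    target = ids[prefix_len:prefix_len + suffix_len]
    with torch.no_grad():
        out = model.generate(
            prefix,
            max_new_tokens=suffix_len,
            do_sample=False,  # greedy decoding
            pad_token_id=tokenizer.eos_token_id,
        )
    completion = out[0][prefix_len:prefix_len + suffix_len]
    return completion.shape == target.shape and bool(torch.equal(completion, target))

# Hypothetical input: a snippet suspected to appear in the training data.
snippet = "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr"
print(verbatim_memorised(snippet))
```

A higher verbatim-reproduction rate over many such samples would indicate stronger memorisation of the training data.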
Original language | English |
---|---|
Title of host publication | Proceedings - 2024 ACM/IEEE 46th International Conference on Software Engineering |
Subtitle of host publication | Companion, ICSE-Companion 2024 |
Publisher | IEEE |
Pages | 258-260 |
Number of pages | 3 |
ISBN (Electronic) | 9798400705021 |
DOIs | |
Publication status | Published - 2024 |
Event | ACM/IEEE 46th International Conference on Software Engineering, Lisbon, Portugal; Duration: 14 Apr 2024 → 20 Apr 2024; Conference number: 46; https://conf.researchr.org/home/icse-2024 |
Publication series
Name | Proceedings - International Conference on Software Engineering |
---|---|
ISSN (Print) | 0270-5257 |
Conference
Conference | ACM/IEEE 46th International Conference on Software Engineering |
---|---|
Abbreviated title | ICSE '24 |
Country/Territory | Portugal |
City | Lisbon |
Period | 14/04/24 → 20/04/24 |
Internet address | https://conf.researchr.org/home/icse-2024 |
Keywords
- compression
- data leakage
- large language models
- memorisation
- privacy