Continual learning by subnetwork creation and selection

Research output: Thesis › Dissertation (TU Delft)


Abstract

Deep learning models have made enormous strides over the past decade. However, they still struggle when dealing with changing data streams. One such flaw is the phenomenon called catastrophic forgetting: it occurs when a model learns multiple tasks sequentially, having access only to the data of the current task. This scenario has strong implications for real-world machine learning and engineering problems, where new information is introduced into the system over time. Continual learning is the subfield of deep learning that aims to enable effective learning in this setting. This thesis therefore presents a general continual learning paradigm to tackle catastrophic forgetting in deep learning models, regardless of architecture.

Following ideas from the neuroscience literature, we create task-specific regions in the network, i.e. subnetworks, and encode task information there. A dedicated subset of parameters is thus responsible for solving each task, which mitigates forgetting compared to conventional training, where all trainable parameters are updated for every task. At inference time, the appropriate subnetwork must either be selected by the algorithm or be specified by the user. Subnetworks can share connections to transfer knowledge between tasks and facilitate future learning.
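To make the idea concrete, the sketch below shows a mask-based layer in the spirit of parameter-isolation methods. It is an illustrative assumption rather than the exact algorithm developed in the thesis: the class MaskedLinear, the magnitude-based assignment rule, and the keep_ratio parameter are all hypothetical.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """One shared weight matrix; each task owns a binary mask over it."""

    def __init__(self, in_features, out_features, n_tasks):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One mask per task; masks may overlap, which is how subnetworks share connections.
        self.register_buffer("masks", torch.zeros(n_tasks, out_features, in_features))
        # Weights already claimed by earlier tasks are never updated again.
        self.register_buffer("frozen", torch.zeros(out_features, in_features, dtype=torch.bool))

    def assign_subnetwork(self, task_id, keep_ratio=0.5):
        # Hypothetical assignment rule: claim the largest-magnitude free weights.
        free = ~self.frozen
        scores = self.weight.abs() * free.float()
        k = max(1, int(keep_ratio * free.sum().item()))
        idx = torch.topk(scores.flatten(), k).indices
        new = torch.zeros_like(free).flatten()
        new[idx] = True
        new = new.view_as(free)
        # The task's subnetwork = newly claimed weights plus previously frozen (shared) ones.
        self.masks[task_id] = (new | self.frozen).float()
        self.frozen |= new

    def forward(self, x, task_id):
        # Only the selected subnetwork's parameters contribute to the prediction.
        return nn.functional.linear(x, self.weight * self.masks[task_id], self.bias)
```

During training of a new task, gradients on frozen entries would additionally be zeroed so that earlier subnetworks stay intact; this step is omitted here for brevity.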

In the first part of the thesis, we describe the proposed methodology: task-specific subnetwork creation during training and subnetwork selection during inference. We examine different subnetwork selection strategies, outlining their advantages and disadvantages. We validate the proposed algorithms on a series of well-known computer vision datasets for classification and semantic segmentation tasks. The proposed solution significantly outperforms current state-of-the-art methods, by 10-20% in accuracy.
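As one example of a possible selection strategy (an assumption for illustration, not necessarily the strategy adopted in the thesis), the subnetwork can be chosen at inference time as the one producing the most confident prediction. The helper below builds on the hypothetical MaskedLinear sketch above.

```python
import torch

def select_subnetwork(layer, x, n_tasks):
    # Evaluate the input under every subnetwork and keep the most confident one
    # (lowest predictive entropy). `layer` is the MaskedLinear sketch above.
    best_task, best_entropy = 0, float("inf")
    with torch.no_grad():
        for t in range(n_tasks):
            probs = torch.softmax(layer(x, task_id=t), dim=-1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean().item()
            if entropy < best_entropy:
                best_task, best_entropy = t, entropy
    return best_task
```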

The second part of the thesis illustrates the benefits of cooperative learning via continual learning on examples from the physical sciences and solid mechanics. We demonstrate that, by sharing parameters, a subsequent subnetwork can be trained with lower prediction error, with fewer training data points, or both, compared to conventional training with one network per task. Importantly, the model does not forget any of the acquired knowledge, since once a parameter is assigned to a subnetwork it is not changed when training new tasks. We highlight the potential importance of further developing continual learning methods in engineering to improve the generalization capabilities of models.

The thesis concludes by discussing the main results and findings. We also outline the main limitations of the work and directions for improvement. Further development of continual learning models will lead to more advanced artificial intelligence systems that should contribute to solving a wider range of problems.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Delft University of Technology
Supervisors/Advisors
  • Sluiter, M.H.F., Promotor
  • Tax, D.M.J., Copromotor
Award date: 25 Jun 2024
Print ISBNs: 978-94-6469-983-8
DOIs
Publication status: Published - 2024

Keywords

  • deep learning
  • continual learning
  • catastrophic forgetting
  • scientific machine learning
  • cooperative modeling
