Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks

Ahmad Hesam*, Sofia Vallecorsa, Gulrukh Khattak, Federico Carminati

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

1 Citation (Scopus)

Abstract

The increased availability of High-Performance Computing resources can enable data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on single machine instances because of memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improving performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h 16 min on a single GPU to 2 h 14 min on all 12 GPUs. Taking only the training process into consideration, we achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs.
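
The end-to-end timings quoted in the abstract can be turned into a speedup and a scaling efficiency with a few lines of arithmetic. The sketch below assumes the common definition of efficiency as speedup divided by the number of workers; the abstract does not state which definition the authors used, and the function names are illustrative only. Because these timings include evaluation, the result is not directly comparable to the 98.9% figure, which covers the training process alone.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H h M min' duration to whole minutes."""
    return hours * 60 + minutes


def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-worker time to multi-worker time."""
    return t_single / t_parallel


def scaling_efficiency(t_single: float, t_parallel: float, n_workers: int) -> float:
    """Speedup divided by worker count (assumed definition, see lead-in)."""
    return speedup(t_single, t_parallel) / n_workers


# Timings from the abstract: full training duration, including evaluation.
t1 = to_minutes(20, 16)   # single GPU
t12 = to_minutes(2, 14)   # all 12 GPUs

s = speedup(t1, t12)
e = scaling_efficiency(t1, t12, 12)
print(f"speedup: {s:.2f}x, end-to-end efficiency: {e:.1%}")
# prints: speedup: 9.07x, end-to-end efficiency: 75.6%
```
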

Original language: English
Title of host publication: High Performance Computing - ISC High Performance 2019 International Workshops, Revised Selected Papers
Editors: Michèle Weiland, Guido Juckeland, Sadaf Alam, Heike Jagode
Publisher: Springer
Pages: 432-440
Number of pages: 9
ISBN (Print): 9783030343552
Publication status: Published - 2019
Event: 34th International Conference on High Performance Computing, ISC High Performance 2019 - Frankfurt, Germany
Duration: 16 Jun 2019 → 20 Jun 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11887 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 34th International Conference on High Performance Computing, ISC High Performance 2019
Country/Territory: Germany
City: Frankfurt
Period: 16/06/19 → 20/06/19

Keywords

  • Distributed training
  • Generative adversarial network
  • GPU
  • High Performance Computing
  • POWER8
