Spartus: A 9.4 TOp/s FPGA-Based LSTM Accelerator Exploiting Spatio-Temporal Sparsity

Chang Gao, Tobi Delbruck, Shih Chii Liu

Research output: Contribution to journalArticleScientificpeer-review

10 Citations (Scopus)

Abstract

Long short-term memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data, such as speech recognition. Unlike previous LSTM accelerators that either exploit spatial weight sparsity or temporal activation sparsity, this article proposes a new accelerator called 'Spartus' that exploits spatio-temporal sparsity to achieve ultralow latency inference. Spatial sparsity is induced using a new column-balanced targeted dropout (CBTD) structured pruning method, producing structured sparse weight matrices for a balanced workload. The pruned networks running on Spartus hardware achieve weight sparsity levels of up to 96% and 94% with negligible accuracy loss on the TIMIT and the Librispeech datasets. To induce temporal sparsity in LSTM, we extend the previous DeltaGRU method to the DeltaLSTM method. Combining spatio-temporal sparsity with CBTD and DeltaLSTM saves on weight memory access and associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 μs. Exploiting spatio-temporal sparsity on our test LSTM network using the TIMIT dataset leads to 46 × speedup of Spartus over its theoretical hardware performance to achieve 9.4-TOp/s effective batch-1 throughput and 1.1-TOp/s/W power efficiency.

Original languageEnglish
Pages (from-to)1098-1112
Number of pages15
JournalIEEE Transactions on Neural Networks and Learning Systems
Volume35
Issue number1
DOIs
Publication statusPublished - 2022
Externally publishedYes

Keywords

  • Delta network
  • dropout
  • edge computing
  • recurrent neural network (RNN)
  • spiking neural network
  • structured pruning

Fingerprint

Dive into the research topics of 'Spartus: A 9.4 TOp/s FPGA-Based LSTM Accelerator Exploiting Spatio-Temporal Sparsity'. Together they form a unique fingerprint.

Cite this