Abstract
Long short-term memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data, such as speech recognition. Unlike previous LSTM accelerators that exploit either spatial weight sparsity or temporal activation sparsity, this article proposes a new accelerator called 'Spartus' that exploits spatio-temporal sparsity to achieve ultralow-latency inference. Spatial sparsity is induced using a new column-balanced targeted dropout (CBTD) structured pruning method, which produces structured sparse weight matrices for a balanced workload. The pruned networks running on Spartus hardware achieve weight sparsity levels of up to 96% and 94% with negligible accuracy loss on the TIMIT and LibriSpeech datasets. To induce temporal sparsity in LSTM, we extend the previous DeltaGRU method to the DeltaLSTM method. Combining spatio-temporal sparsity through CBTD and DeltaLSTM saves weight memory accesses and the associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. The per-sample latency of Spartus for a single 1024-neuron DeltaLSTM layer averages 1 μs. Exploiting spatio-temporal sparsity on our test LSTM network with the TIMIT dataset leads to a 46× speedup of Spartus over its theoretical hardware performance, achieving a 9.4-TOp/s effective batch-1 throughput and a 1.1-TOp/s/W power efficiency.
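
To make the CBTD idea concrete, below is a minimal NumPy sketch of column-balanced targeted dropout as the abstract describes it: each column of a weight matrix nominates its lowest-magnitude weights as pruning candidates, and candidates are stochastically dropped during training. Because every column nominates the same number of candidates, the eventual pruned matrix has an equal nonzero count per column, which is what balances the workload across hardware processing elements. The function and parameter names here are illustrative assumptions, not taken from the Spartus codebase.

```python
import numpy as np

def cbtd_mask(W, sparsity, drop_prob, rng):
    """Illustrative column-balanced targeted dropout mask (a sketch,
    not the authors' implementation).

    Each column of W nominates its `sparsity` fraction of smallest-
    magnitude weights as pruning candidates; each candidate is dropped
    with probability `drop_prob` during a training step.
    """
    rows, cols = W.shape
    n_cand = int(round(sparsity * rows))           # candidates per column (same for all columns)
    order = np.argsort(np.abs(W), axis=0)          # row indices, ascending |w|, per column
    cand_rows = order[:n_cand, :]                  # lowest-|w| entries of each column
    drop = rng.random((n_cand, cols)) < drop_prob  # stochastic targeted dropout
    cand_cols = np.broadcast_to(np.arange(cols), (n_cand, cols))
    mask = np.ones_like(W, dtype=bool)
    mask[cand_rows[drop], cand_cols[drop]] = False
    return mask

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
W_step = W * cbtd_mask(W, sparsity=0.75, drop_prob=0.5, rng=rng)  # masked forward pass
```

At the end of training, setting `drop_prob` to 1 prunes all candidates, leaving exactly the same number of nonzeros in every column, i.e., the structured sparse mask that the accelerator can exploit with a balanced per-column workload.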
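
The temporal-sparsity side can be sketched just as briefly. Under the delta-network principle that DeltaLSTM inherits from DeltaGRU, an input (or hidden-state) component is propagated only when it has changed by at least a threshold Θ since its last propagated value, so the matching weight columns are skipped for all other components. The following is a hedged sketch of one such update, with illustrative names; it simplifies the full DeltaLSTM gate structure to a single matrix-vector product.

```python
import numpy as np

def delta_step(W, x_t, x_ref, M, theta):
    """One delta-network update (illustrative sketch).

    x_ref holds the last value of each component that was actually
    propagated. Components whose change is below theta contribute a
    zero delta, so the corresponding columns of W can be skipped; M
    accumulates the pre-activation W @ x_t incrementally over time.
    """
    delta = x_t - x_ref
    active = np.abs(delta) >= theta        # components that changed "enough"
    delta = np.where(active, delta, 0.0)
    x_ref = np.where(active, x_t, x_ref)   # update references only where propagated
    M = M + W @ delta                      # dense here; hardware touches only active columns
    return M, x_ref

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
x_ref, M = np.zeros(6), np.zeros(4)
for t in range(3):
    M, x_ref = delta_step(W, rng.standard_normal(6), x_ref, M, theta=0.5)
```

The sparser the delta vectors, the fewer weight columns must be fetched and multiplied, which is where the savings in weight memory access and arithmetic operations described in the abstract come from.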
Original language | English |
---|---|
Pages (from-to) | 1098-1112 |
Number of pages | 15 |
Journal | IEEE Transactions on Neural Networks and Learning Systems |
Volume | 35 |
Issue number | 1 |
DOIs | |
Publication status | Published - 2022 |
Externally published | Yes |
Keywords
- Delta network
- dropout
- edge computing
- recurrent neural network (RNN)
- spiking neural network
- structured pruning