GPGPU Linear Complexity t-SNE Optimization

Nicola Pezzotti; Julian Thijssen; Alexander Mordvintsev; Thomas Höllt; Baldur Van Lew; Boudewijn Lelieveldt; Elmar Eisemann; Anna Vilanova

doi:10.1109/TVCG.2019.2934307

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

In recent years the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. It reveals clusters of high-dimensional data points at different scales while only requiring minimal tuning of its parameters. However, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of t-SNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the t-SNE embedding for large datasets. In this work, we present a novel approach to the minimization of the t-SNE objective function that heavily relies on graphics hardware and has linear computational complexity. Our technique decreases the computational cost of running t-SNE on datasets by orders of magnitude and retains or improves on the accuracy of past approximated techniques. We propose to approximate the repulsive forces between data points by splatting kernel textures for each data point. This approximation allows us to reformulate the t-SNE minimization problem as a series of tensor operations that can be efficiently executed on the graphics card. An efficient implementation of our technique is integrated and available for use in the widely used Google TensorFlow.js, and an open-source C++ library.
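
The field-based view of the repulsive forces mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' GPU implementation: the function names (`repulsive_pairwise`, `fields_at`, `repulsive_from_fields`) and the field symbols `S` and `V` are assumptions introduced here for illustration, and the fields are evaluated exactly at the embedding points instead of being splatted onto a texture and interpolated. The sketch only shows that one scalar field and one vector field are enough to recover the normalization term and the repulsive forces of the standard t-SNE gradient.

```python
import numpy as np

def repulsive_pairwise(Y):
    """Exact t-SNE repulsive forces from the O(N^2) pairwise sum."""
    diff = Y[:, None, :] - Y[None, :, :]           # (N, N, 2): y_i - y_j
    w = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))   # Student-t kernel values
    np.fill_diagonal(w, 0.0)                       # exclude the i == j terms
    Z = w.sum()                                    # normalization term
    F = (w[:, :, None] ** 2 * diff).sum(axis=1) / Z
    return F, Z

def fields_at(P, Y):
    """Scalar field S and vector field V (hypothetical names) at positions P."""
    diff = P[:, None, :] - Y[None, :, :]           # p - y_j for every query p
    w = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    S = w.sum(axis=1)                              # sum of kernel values
    V = (w[:, :, None] ** 2 * diff).sum(axis=1)    # sum of squared-kernel directions
    return S, V

def repulsive_from_fields(Y):
    """Same forces, recovered by sampling the two fields at the points."""
    S, V = fields_at(Y, Y)
    Z = np.sum(S - 1.0)   # each point adds a kernel value of 1 at its own position
    return V / Z, Z

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2))                       # toy 2D embedding
F_pair, Z_pair = repulsive_pairwise(Y)
F_field, Z_field = repulsive_from_fields(Y)
print(np.allclose(F_pair, F_field), np.isclose(Z_pair, Z_field))
```

Here the fields are still computed with a sum over all points, so the cost stays quadratic. The linear complexity claimed in the abstract would come from accumulating the fields once per iteration on a fixed-resolution texture and then sampling them at each point; that grid step is deliberately left out of this sketch.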

Original language	English
Article number	8811606
Pages (from-to)	1172-1181
Number of pages	10
Journal	IEEE Transactions on Visualization and Computer Graphics
Volume	26
Issue number	1
DOIs	https://doi.org/10.1109/TVCG.2019.2934307
Publication status	Published - 2020

Bibliographical note

Accepted author manuscript

Keywords

High Dimensional Data
Dimensionality Reduction
Progressive Visual Analytics
Approximate Computation
GPGPU

Access to Document

10.1109/TVCG.2019.2934307

08811606.1 (Accepted author manuscript, 14.2 MB)

Cite this

@article{52823dd9a8c1470ea4ec77a1a59577dc,

title = "GPGPU Linear Complexity t-SNE Optimization",

abstract = "In recent years the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. It reveals clusters of high-dimensional data points at different scales while only requiring minimal tuning of its parameters. However, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of t-SNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the t-SNE embedding for large datasets. In this work, we present a novel approach to the minimization of the t-SNE objective function that heavily relies on graphics hardware and has linear computational complexity. Our technique decreases the computational cost of running t-SNE on datasets by orders of magnitude and retains or improves on the accuracy of past approximated techniques. We propose to approximate the repulsive forces between data points by splatting kernel textures for each data point. This approximation allows us to reformulate the t-SNE minimization problem as a series of tensor operations that can be efficiently executed on the graphics card. An efficient implementation of our technique is integrated and available for use in the widely used Google TensorFlow.js, and an open-source C++ library.",

keywords = "High Dimensional Data, Dimensionality Reduction, Progressive Visual Analytics, Approximate Computation, GPGPU",

author = "Nicola Pezzotti and Julian Thijssen and Alexander Mordvintsev and Thomas H{\"o}llt and {Van Lew}, Baldur and Boudewijn Lelieveldt and Elmar Eisemann and Anna Vilanova",

note = "Accepted author manuscript",

year = "2020",

doi = "10.1109/TVCG.2019.2934307",

language = "English",

volume = "26",

pages = "1172--1181",

journal = "IEEE Transactions on Visualization and Computer Graphics",

issn = "1077-2626",

publisher = "IEEE",

number = "1",

}

TY - JOUR

T1 - GPGPU Linear Complexity t-SNE Optimization

AU - Pezzotti, Nicola

AU - Thijssen, Julian

AU - Mordvintsev, Alexander

AU - Höllt, Thomas

AU - Van Lew, Baldur

AU - Lelieveldt, Boudewijn

AU - Eisemann, Elmar

AU - Vilanova, Anna

N1 - Accepted author manuscript

PY - 2020

Y1 - 2020

N2 - In recent years the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. It reveals clusters of high-dimensional data points at different scales while only requiring minimal tuning of its parameters. However, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of t-SNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the t-SNE embedding for large datasets. In this work, we present a novel approach to the minimization of the t-SNE objective function that heavily relies on graphics hardware and has linear computational complexity. Our technique decreases the computational cost of running t-SNE on datasets by orders of magnitude and retains or improves on the accuracy of past approximated techniques. We propose to approximate the repulsive forces between data points by splatting kernel textures for each data point. This approximation allows us to reformulate the t-SNE minimization problem as a series of tensor operations that can be efficiently executed on the graphics card. An efficient implementation of our technique is integrated and available for use in the widely used Google TensorFlow.js, and an open-source C++ library.

AB - In recent years the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. It reveals clusters of high-dimensional data points at different scales while only requiring minimal tuning of its parameters. However, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of t-SNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the t-SNE embedding for large datasets. In this work, we present a novel approach to the minimization of the t-SNE objective function that heavily relies on graphics hardware and has linear computational complexity. Our technique decreases the computational cost of running t-SNE on datasets by orders of magnitude and retains or improves on the accuracy of past approximated techniques. We propose to approximate the repulsive forces between data points by splatting kernel textures for each data point. This approximation allows us to reformulate the t-SNE minimization problem as a series of tensor operations that can be efficiently executed on the graphics card. An efficient implementation of our technique is integrated and available for use in the widely used Google TensorFlow.js, and an open-source C++ library.

KW - High Dimensional Data

KW - Dimensionality Reduction

KW - Progressive Visual Analytics

KW - Approximate Computation

KW - GPGPU

UR - http://www.scopus.com/inward/record.url?scp=85075604395&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2019.2934307

DO - 10.1109/TVCG.2019.2934307

M3 - Article

SN - 1077-2626

VL - 26

SP - 1172

EP - 1181

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

IS - 1

M1 - 8811606

ER -