FMCW Radar-Based Hand Gesture Recognition using Spatiotemporal Deformable and Context-Aware Convolutional 5D Feature Representation

Xichao Dong; Zewei Zhao; Yupei Wang; Tao Zeng; Jianping Wang; Yi Sui

doi:10.1109/TGRS.2021.3122332

Microwave Sensing, Signals & Systems

Research output: Contribution to journal › Article › Scientific › peer-review

12 Citations (Scopus)

309 Downloads (Pure)

Abstract

Recently, frequency-modulated continuous-wave (FMCW) radar-based hand gesture recognition (HGR) using deep learning has achieved favorable performance. However, many existing methods use the extracted features separately, i.e., they train convolutional neural networks (CNNs) on one of the range, Doppler, azimuth, or elevation angle information, or on a combination of any two, and thus ignore the interrelation within the 5-D time-varying range-Doppler-azimuth-elevation feature space. Although some methods do use the 5-D information, they do not sufficiently exploit the interrelation within this feature space, and there is still room for improvement. This article proposes a new HGR processing scheme based on 5-D feature cubes that are jointly encoded by a 3-D fast Fourier transform (3-D-FFT)-based method. A CNN is then built from two novel blocks: the spatiotemporal deformable convolution (STDC) block and the adaptive spatiotemporal context-aware convolution (ASTCAC) block. STDC is designed to cope with the large spatiotemporal geometric transformations of hand gestures in the 5-D feature space. ASTCAC is designed to model long-distance global relationships, e.g., relationships between feature pixels at the upper left and lower right corners, and to explore the global spatiotemporal context, in order to enhance the target feature representation and suppress interference. Finally, the presented method is verified on a large radar dataset comprising 19,760 sets of 16 common hand gestures collected from 19 subjects. The method obtains a recognition rate of 99.53% on the validation dataset and 97.22% on the test dataset, which is significantly better than state-of-the-art methods.
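The abstract does not spell out the 3-D-FFT-based encoding, so the following is only a minimal sketch of the generic idea, not the authors' pipeline. It assumes one frame of raw FMCW data arrives as a complex array of shape (chirps, receive antennas, samples per chirp) from a single linear array, so that one FFT per axis yields Doppler, angle, and range bins. The function names `range_doppler_angle_cube` and `synth_frame`, the array sizes, and the single-target signal model are all hypothetical choices made for illustration; a real system would need a 2-D antenna layout to separate azimuth from elevation.

```python
import numpy as np


def range_doppler_angle_cube(frame):
    """Jointly encode one radar frame with a 3-D FFT (illustrative only).

    frame: complex ndarray of shape (n_chirps, n_rx, n_samples).
    Returns the magnitude cube indexed as (doppler, angle, range).
    """
    # One FFT per axis: chirps -> Doppler, antennas -> angle, samples -> range.
    spectrum = np.fft.fftn(frame, axes=(0, 1, 2))
    # Centre zero Doppler and zero angle; range bins stay in natural order.
    spectrum = np.fft.fftshift(spectrum, axes=(0, 1))
    return np.abs(spectrum)


def synth_frame(n_chirps, n_rx, n_samples, range_bin, doppler_bin, angle_bin):
    """Hypothetical noise-free point target that falls exactly on FFT bins."""
    chirp = np.arange(n_chirps)[:, None, None]
    rx = np.arange(n_rx)[None, :, None]
    sample = np.arange(n_samples)[None, None, :]
    phase = 2 * np.pi * (
        range_bin * sample / n_samples
        + doppler_bin * chirp / n_chirps
        + angle_bin * rx / n_rx
    )
    return np.exp(1j * phase)


# A short gesture "clip": the target moves one range bin per frame.
frames = [
    synth_frame(16, 8, 32, range_bin=5 + t, doppler_bin=3, angle_bin=2)
    for t in range(4)
]
# Stacking the per-frame cubes adds the time axis: (time, doppler, angle, range).
sequence = np.stack([range_doppler_angle_cube(f) for f in frames])
print(sequence.shape)  # (4, 16, 8, 32)
```

Stacking the per-frame cubes along a leading time axis is what makes the representation time-varying. With separate azimuth and elevation axes the same stacking would give a five-axis array of the kind the abstract refers to.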

Original language	English
Number of pages	11
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	60
DOIs	https://doi.org/10.1109/TGRS.2021.3122332
Publication status	Published - 2022

Keywords

Azimuth
Convolution
Doppler effect
Estimation
Feature extraction
Frequency-modulated continuous wave (FMCW) radar
hand gesture recognition
spatiotemporal context modeling
spatiotemporal deformable convolution
Spatiotemporal phenomena
Three-dimensional displays

Access to Document

10.1109/TGRS.2021.3122332

FMCW_Radar-Based_Hand_Gesture_Recognition_using_Spatiotemporal_Deformable_and_Context-Aware_Convolutional_5D_Feature_Representation (Accepted author manuscript, 3.76 MB)

Cite this

@article{4b73ce157c524ef99729372c0ca88400,
  title = "FMCW Radar-Based Hand Gesture Recognition using Spatiotemporal Deformable and Context-Aware Convolutional 5D Feature Representation",
  abstract = "Recently, frequency-modulated continuous-wave (FMCW) radar-based hand gesture recognition (HGR) using deep learning has achieved favorable performance. However, many existing methods use extracted features separately, i.e., using one of the range, Doppler, azimuth, or elevation angle information, or a combination of any two, to train convolutional neural networks (CNNs), which ignore the interrelation among the 5-D time-varying-range-Doppler-azimuth-elevation feature space. Although there have been methods using the 5-D information, their mining of the interrelation among the 5-D feature space is not sufficient, and there is still room for improvements. This article proposes a new processing scheme of HGR based on 5-D feature cubes that are jointly encoded by a 3-D fast Fourier transform (3-D-FFT)-based method. Then, a CNN is proposed by building two novel blocks, i.e., the spatiotemporal deformable convolution (STDC) block and the adaptive spatiotemporal context-aware convolution (ASTCAC) block. Concretely, STDC is designed to cope with hand gestures' large spatiotemporal geometric transformations in the 5-D feature space. Moreover, ASTCAC is designed for modeling long-distance global relationships, e.g., relationships between pixels of the feature at the upper left corner and lower right corner, and exploring the global spatiotemporal context, in order to enhance the target feature representation and suppress interference. Finally, our presented method is verified on a large radar dataset, including 19 760 sets of 16 common hand gestures, collected by 19 subjects. Our method obtains a recognition rate of 99.53% on the validation dataset and that of 97.22% on the test dataset, which is significantly better than state-of-the-art methods.",
  keywords = "Azimuth, Convolution, Doppler effect, Estimation, Feature extraction, Frequency-modulated continuous wave (FMCW) radar, hand gesture recognition, spatiotemporal context modeling, spatiotemporal deformable convolution, Spatiotemporal phenomena, Three-dimensional displays",
  author = "Xichao Dong and Zewei Zhao and Yupei Wang and Tao Zeng and Jianping Wang and Yi Sui",
  year = "2022",
  doi = "10.1109/TGRS.2021.3122332",
  language = "English",
  volume = "60",
  journal = "IEEE Transactions on Geoscience and Remote Sensing",
  issn = "0196-2892",
  publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
}

TY  - JOUR
T1  - FMCW Radar-Based Hand Gesture Recognition using Spatiotemporal Deformable and Context-Aware Convolutional 5D Feature Representation
AU  - Dong, Xichao
AU  - Zhao, Zewei
AU  - Wang, Yupei
AU  - Zeng, Tao
AU  - Wang, Jianping
AU  - Sui, Yi
PY  - 2022
Y1  - 2022
N2  - Recently, frequency-modulated continuous-wave (FMCW) radar-based hand gesture recognition (HGR) using deep learning has achieved favorable performance. However, many existing methods use extracted features separately, i.e., using one of the range, Doppler, azimuth, or elevation angle information, or a combination of any two, to train convolutional neural networks (CNNs), which ignore the interrelation among the 5-D time-varying-range-Doppler-azimuth-elevation feature space. Although there have been methods using the 5-D information, their mining of the interrelation among the 5-D feature space is not sufficient, and there is still room for improvements. This article proposes a new processing scheme of HGR based on 5-D feature cubes that are jointly encoded by a 3-D fast Fourier transform (3-D-FFT)-based method. Then, a CNN is proposed by building two novel blocks, i.e., the spatiotemporal deformable convolution (STDC) block and the adaptive spatiotemporal context-aware convolution (ASTCAC) block. Concretely, STDC is designed to cope with hand gestures' large spatiotemporal geometric transformations in the 5-D feature space. Moreover, ASTCAC is designed for modeling long-distance global relationships, e.g., relationships between pixels of the feature at the upper left corner and lower right corner, and exploring the global spatiotemporal context, in order to enhance the target feature representation and suppress interference. Finally, our presented method is verified on a large radar dataset, including 19 760 sets of 16 common hand gestures, collected by 19 subjects. Our method obtains a recognition rate of 99.53% on the validation dataset and that of 97.22% on the test dataset, which is significantly better than state-of-the-art methods.
KW  - Azimuth
KW  - Convolution
KW  - Doppler effect
KW  - Estimation
KW  - Feature extraction
KW  - Frequency-modulated continuous wave (FMCW) radar
KW  - hand gesture recognition
KW  - spatiotemporal context modeling
KW  - spatiotemporal deformable convolution
KW  - Spatiotemporal phenomena
KW  - Three-dimensional displays
UR  - http://www.scopus.com/inward/record.url?scp=85118552397&partnerID=8YFLogxK
U2  - 10.1109/TGRS.2021.3122332
DO  - 10.1109/TGRS.2021.3122332
M3  - Article
AN  - SCOPUS:85118552397
SN  - 0196-2892
VL  - 60
JO  - IEEE Transactions on Geoscience and Remote Sensing
JF  - IEEE Transactions on Geoscience and Remote Sensing
ER  -
