Unsupervised learning used in automatic detection and classification of ambient-noise recordings from a large-N array

Michał Chamarczuk; Yohei Nishitsuji; Michał Malinowski; Deyan Draganov

doi:10.1785/0220190063

Corresponding author: Michał Chamarczuk

Applied Geophysics and Petrophysics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

We present a method for automatic detection and classification of seismic events from continuous ambient-noise (AN) recordings using an unsupervised machine-learning (ML) approach. We combine classic and recently developed array-processing techniques with ML, enabling the use of unsupervised techniques in the routine processing of continuous data. We test our method on a dataset from a large-number (large-N) array deployed over the Kylylahti underground mine (Finland) and show its potential to automatically process and cluster large volumes of AN data. Automatic sorting of detected events into different classes allows faster data analysis and facilitates the selection of desired parts of the wavefield for imaging (e.g., using seismic interferometry) and monitoring. First, using array-processing techniques, we obtain directivity, location, velocity, and frequency representations of AN data. Next, we transform these representations into vector-shaped matrices. The transformed data are input into a clustering algorithm (k-means) to define groups of similar events, and optimization methods (the elbow and silhouette tests) are used to obtain the optimal number of clusters. In this way, we determine the optimal number of classes that characterize the AN recordings and assign the proper class membership (cluster) to each data sample. For the Kylylahti AN, the unsupervised clustering produced 40 clusters. After visual inspection of events belonging to different clusters that were quality controlled by the silhouette method, we confirm the reliability of 10 clusters with a prediction accuracy higher than 90%. The resulting division into separate seismic-event classes demonstrates the feasibility of the unsupervised ML approach for advancing the automation of processing and the utilization of array AN data. Our workflow is flexible and can be easily adapted to other input features and classification algorithms.
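The clustering step named in the abstract (k-means, with the elbow and silhouette tests used to choose the number of clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic 2-D points stand in for the vector-shaped feature matrices, the function names (`kmeans`, `mean_silhouette`, `make_blobs`) are hypothetical, and the deterministic farthest-point seeding is an assumption made here only to keep the example reproducible.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch, not from the paper)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest already-chosen seed.
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (labels, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(c) / len(members) for c in zip(*members)])
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    # Inertia: within-cluster sum of squared distances (the elbow-test quantity).
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def mean_silhouette(points, labels, k):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for j, c in enumerate(clusters) if j != lab and c)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def make_blobs(centers, n_per_center, spread, seed=1):
    """Synthetic 2-D stand-in for real feature vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(cx, spread), rng.gauss(cy, spread)]
            for cx, cy in centers for _ in range(n_per_center)]


points = make_blobs([(0, 0), (10, 0), (0, 10)], n_per_center=30, spread=0.5)
results = {}
for k in range(2, 7):
    labels, inertia = kmeans(points, k)
    results[k] = (inertia, mean_silhouette(points, labels, k))
    print(f"k={k}  inertia={results[k][0]:9.2f}  silhouette={results[k][1]:.3f}")

# Silhouette test: pick the k with the highest mean silhouette.
best_k = max(results, key=lambda k: results[k][1])
print("k selected by silhouette:", best_k)
```

In the printed table, the elbow test corresponds to the value of k beyond which the inertia stops dropping sharply, and the silhouette test to the k with the highest mean score. On real array data the two criteria need not agree as cleanly as on these well-separated synthetic groups.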

Original language	English
Pages (from-to)	370-389
Number of pages	20
Journal	Seismological Research Letters
Volume	91
Issue number	1
DOIs	https://doi.org/10.1785/0220190063
Publication status	Published - 2019

Cite this

@article{59fefff6a14449a3902f09796a151228,
title = "Unsupervised learning used in automatic detection and classification of ambient-noise recordings from a large-{N} array",
abstract = "We present a method for automatic detection and classification of seismic events from continuous ambient-noise (AN) recordings using an unsupervised machine-learning (ML) approach. We combine classic and recently developed array-processing techniques with ML enabling the use of unsupervised techniques in the routine processing of continuous data. We test our method on a dataset from a large-number (large-N) array, which was deployed over the Kylylahti underground mine (Finland), and show the potential to automatically process and cluster the volumes of AN data. Automatic sorting of detected events into different classes allows faster data analysis and facilitates the selection of desired parts of the wavefield for imaging (e.g., using seismic interferometry) and monitoring. First, using array-processing techniques, we obtain directivity, location, velocity, and frequency representations of AN data. Next, we transform these representations into vector-shaped matrices. The transformed data are input into a clustering algorithm (called k-means) to define groups of similar events, and optimization methods are used to obtain the optimal number of clusters (called elbow and silhouette tests). We use these techniques to obtain the optimal number of classes that characterize the AN recordings and consequently assign the proper class membership (cluster) to each data sample. For the Kylylahti AN, the unsupervised clustering produced 40 clusters. After visual inspection of events belonging to different clusters that were quality controlled by the silhouette method, we confirm the reliability of 10 clusters with a prediction accuracy higher than 90\%. The obtained division into separate seismic-event classes proves the feasibility of the unsupervised ML approach to advance the automation of processing and the utilization of array AN data. Our workflow is very flexible and can be easily adapted for other input features and classification algorithms.",
author = "Micha{\l} Chamarczuk and Yohei Nishitsuji and Micha{\l} Malinowski and Deyan Draganov",
year = "2019",
doi = "10.1785/0220190063",
language = "English",
volume = "91",
pages = "370--389",
journal = "Seismological Research Letters",
issn = "0895-0695",
publisher = "Seismological Society of America",
number = "1",
}

TY - JOUR
T1 - Unsupervised learning used in automatic detection and classification of ambient-noise recordings from a large-N array
AU - Chamarczuk, Michał
AU - Nishitsuji, Yohei
AU - Malinowski, Michał
AU - Draganov, Deyan
PY - 2019
Y1 - 2019
N2 - We present a method for automatic detection and classification of seismic events from continuous ambient-noise (AN) recordings using an unsupervised machine-learning (ML) approach. We combine classic and recently developed array-processing techniques with ML enabling the use of unsupervised techniques in the routine processing of continuous data. We test our method on a dataset from a large-number (large-N) array, which was deployed over the Kylylahti underground mine (Finland), and show the potential to automatically process and cluster the volumes of AN data. Automatic sorting of detected events into different classes allows faster data analysis and facilitates the selection of desired parts of the wavefield for imaging (e.g., using seismic interferometry) and monitoring. First, using array-processing techniques, we obtain directivity, location, velocity, and frequency representations of AN data. Next, we transform these representations into vector-shaped matrices. The transformed data are input into a clustering algorithm (called k-means) to define groups of similar events, and optimization methods are used to obtain the optimal number of clusters (called elbow and silhouette tests). We use these techniques to obtain the optimal number of classes that characterize the AN recordings and consequently assign the proper class membership (cluster) to each data sample. For the Kylylahti AN, the unsupervised clustering produced 40 clusters. After visual inspection of events belonging to different clusters that were quality controlled by the silhouette method, we confirm the reliability of 10 clusters with a prediction accuracy higher than 90%. The obtained division into separate seismic-event classes proves the feasibility of the unsupervised ML approach to advance the automation of processing and the utilization of array AN data. Our workflow is very flexible and can be easily adapted for other input features and classification algorithms.

UR - http://www.scopus.com/inward/record.url?scp=85077123140&partnerID=8YFLogxK
U2 - 10.1785/0220190063
DO - 10.1785/0220190063
M3 - Article
AN - SCOPUS:85077123140
SN - 0895-0695
VL - 91
SP - 370
EP - 389
JO - Seismological Research Letters
JF - Seismological Research Letters
IS - 1
ER -
