Susceptible-infected-spreading-based network embedding in static and temporal networks

Xiu Xiu Zhan; Ziyu Li; Naoki Masuda; Petter Holme; Huijuan Wang

doi:10.1140/epjds/s13688-020-00248-5

Xiu Xiu Zhan, Ziyu Li, Naoki Masuda, Petter Holme, Huijuan Wang^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

19 Citations (Scopus)

88 Downloads (Pure)

Abstract

Link prediction can be used to extract missing information, identify spurious interactions as well as forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task. Nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from the trajectory paths. The node pair set is further used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes by a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two susceptible-infected-spreading-based algorithms, i.e., Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. The performance of our algorithms is evaluated by the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that the performance of SINE is mostly better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. The better performance of SINE and TSINE requires a smaller sampling size in comparison with the baseline algorithms. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks.
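The abstract does not spell out the sampling procedure, so the following is only a minimal Python sketch of the general idea, under assumed details. It runs a discrete-time SI process from a seed node, records who infected whom, and turns the resulting infection tree into node pairs of the kind a Skip-Gram model could consume. The function names, the infection probability `beta`, the `max_nodes` budget, the `window` parameter, and the use of cosine similarity as the link score are all illustrative assumptions, not taken from the paper.

```python
import math
import random


def si_infection_tree(adj, seed, beta, max_nodes, rng):
    """Discrete-time SI spreading from `seed` (hypothetical helper).

    Returns a dict mapping each infected node to the node that infected it
    (the seed maps to None). Stops once `max_nodes` nodes are infected or
    no susceptible neighbour is left.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    parent = {seed: None}
    while len(parent) < max_nodes:
        # All infected-susceptible contacts at the start of this time step.
        contacts = [(u, v) for u in parent for v in adj[u] if v not in parent]
        if not contacts:
            break  # the seed's whole component is infected
        for u, v in contacts:
            # v may already have been infected earlier in this same step.
            if v not in parent and len(parent) < max_nodes and rng.random() < beta:
                parent[v] = u
    return parent


def node_pairs(parent, window):
    """Pair every infected node with its ancestors up to `window` hops away
    along the infection tree (an assumed pairing rule), in both directions."""
    pairs = []
    for node in parent:
        ancestor, hops = parent[node], 0
        while ancestor is not None and hops < window:
            pairs.append((node, ancestor))
            pairs.append((ancestor, node))
            ancestor, hops = parent[ancestor], hops + 1
    return pairs


def cosine(a, b):
    """Cosine similarity between two embedding vectors; a higher score means
    the corresponding node pair is ranked as a more likely link."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In use, one would call `si_infection_tree` from many seed nodes with `rng = random.Random(...)`, concatenate the outputs of `node_pairs`, train a Skip-Gram model on those pairs, and rank unobserved node pairs by `cosine` of their learned vectors. The Skip-Gram training step is omitted here because the abstract gives no details about it.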

Original language	English
Article number	30
Number of pages	20
Journal	EPJ Data Science
Volume	9
Issue number	1
DOIs	https://doi.org/10.1140/epjds/s13688-020-00248-5
Publication status	Published - 2020

Keywords

Link prediction
Network embedding
SI spreading process

Access to Document

10.1140/epjds/s13688-020-00248-5

s13688-020-00248-5 (Final published version, 2.34 MB, Licence: CC BY)

Cite this

@article{98b069102a4e44f6b341ed144ffbd484,
  title = "Susceptible-infected-spreading-based network embedding in static and temporal networks",
  abstract = "Link prediction can be used to extract missing information, identify spurious interactions as well as forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task. Nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from the trajectory paths. The node pair set is further used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes by a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two susceptible-infected-spreading-based algorithms, i.e., Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. The performance of our algorithms is evaluated by the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that the performance of SINE is mostly better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. The better performance of SINE and TSINE requires a smaller sampling size in comparison with the baseline algorithms. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks.",
  keywords = "Link prediction, Network embedding, SI spreading process",
  author = "Zhan, {Xiu Xiu} and Ziyu Li and Naoki Masuda and Petter Holme and Huijuan Wang",
  year = "2020",
  doi = "10.1140/epjds/s13688-020-00248-5",
  language = "English",
  volume = "9",
  journal = "EPJ Data Science",
  issn = "2193-1127",
  publisher = "Springer",
  number = "1",
}

TY - JOUR
T1 - Susceptible-infected-spreading-based network embedding in static and temporal networks
AU - Zhan, Xiu Xiu
AU - Li, Ziyu
AU - Masuda, Naoki
AU - Holme, Petter
AU - Wang, Huijuan
PY - 2020
Y1 - 2020
AB - Link prediction can be used to extract missing information, identify spurious interactions as well as forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task. Nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from the trajectory paths. The node pair set is further used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes by a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two susceptible-infected-spreading-based algorithms, i.e., Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. The performance of our algorithms is evaluated by the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that the performance of SINE is mostly better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. The better performance of SINE and TSINE requires a smaller sampling size in comparison with the baseline algorithms. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks.
KW - Link prediction
KW - Network embedding
KW - SI spreading process
UR - http://www.scopus.com/inward/record.url?scp=85092762066&partnerID=8YFLogxK
U2 - 10.1140/epjds/s13688-020-00248-5
DO - 10.1140/epjds/s13688-020-00248-5
M3 - Article
AN - SCOPUS:85092762066
SN - 2193-1127
VL - 9
JO - EPJ Data Science
JF - EPJ Data Science
IS - 1
M1 - 30
ER -
