Network embedding aims at learning node representation by preserving the network topology. Previous embedding methods do not scale for large real-world networks which usually contain millions of nodes. They generally adopt a one-size-fits-all strategy to collect information, resulting in a large amount of redundancy. In this paper, we propose DiaRW, a scalable network embedding method based on a degree-biased random walk with variable length to sample context information for learning. Our walk strategy can well adapt to the scale-free feature of real-world networks and extract information from them with much less redundancy. In addition, our method can greatly reduce the size of context information, which is efficient for large-scale network embedding. Empirical experiments on node classification and link prediction prove not only the effectiveness but also the efficiency of DiaRW on a variety of real-world networks. Our algorithm is able to learn the network representations with millions of nodes and edges in hours on a single machine, which is tenfold faster than previous methods.
- Network embedding
- Random walks