So-called tap-in–tap-off smart card data have become increasingly available and popular as a result of the deployment of automatic fare collection systems on transit systems in many cities and areas worldwide. This has opened an opportunity for both researchers and practitioners to obtain much more accurate transit demand data than before. However, because travelers can in some cases choose among different origin and destination stations, as well as different transit lines, depending on their personally acceptable walking distances, aggregating the demand of spatially close stations becomes essential when transit demand matrices are constructed. To investigate this problem with a data-driven approach, this paper proposes a k-means-based station aggregation method that quantitatively determines the partitioning by considering both flow and spatial distance information. The method was applied to a case study of Haaglanden, Netherlands, with the objective of maximizing the ratio of average intra-cluster flow to average inter-cluster flow while maintaining the spatial compactness of all clusters. Clustering was first performed on the distance feature for a range of values of K; a distance-based metric and a flow-based metric were then computed for each K and combined to determine the optimal number of clusters. The transit demand matrices constructed with this method lay a foundation for further studies on short-term transit demand prediction and demand assignment.
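The selection procedure summarized above can be illustrated with a minimal sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: stations are given as planar coordinates with Euclidean distance, k-means uses a deterministic farthest-point initialization, the distance-based metric is the mean station-to-centroid distance, the flow-based metric is the ratio of mean intra-cluster to mean inter-cluster flow over ordered station pairs, and the two metrics are combined by min-max normalization and subtraction. The toy `stations` and `flows` data and all function names are hypothetical.

```python
import math

def farthest_point_init(points, k):
    # Deterministic initialization (an assumption, chosen for reproducibility).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    # Plain Lloyd iterations on the spatial (distance) feature only.
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster happens to be empty.
            new.append(tuple(sum(x) / len(members) for x in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

def compactness(points, labels, centroids):
    # Distance-based metric (assumed form): mean distance to assigned centroid.
    return sum(math.dist(p, centroids[l])
               for p, l in zip(points, labels)) / len(points)

def flow_ratio(flows, labels):
    # Flow-based metric (assumed form): mean intra- over mean inter-cluster flow.
    intra, inter = [], []
    for i in range(len(labels)):
        for j in range(len(labels)):
            if i != j:
                (intra if labels[i] == labels[j] else inter).append(flows[i][j])
    if not intra or not inter:
        raise ValueError("ratio undefined: need both intra and inter pairs")
    mean_inter = sum(inter) / len(inter)
    return (sum(intra) / len(intra)) / mean_inter if mean_inter else math.inf

def minmax(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def select_k(points, flows, candidate_ks):
    results = {}
    for k in candidate_ks:
        labels, centroids = kmeans(points, k)
        results[k] = (labels, compactness(points, labels, centroids),
                      flow_ratio(flows, labels))
    comp = minmax([results[k][1] for k in candidate_ks])
    ratio = minmax([results[k][2] for k in candidate_ks])
    # Hypothetical combination: reward a high flow ratio, penalize spread.
    scores = {k: r - c for k, r, c in zip(candidate_ks, ratio, comp)}
    best_k = max(scores, key=scores.get)
    return best_k, results[best_k][0], scores

# Toy data: three pairs of nearby stations with heavy flow inside each pair.
stations = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0),
            (11.0, 0.0), (0.0, 10.0), (0.0, 12.0)]
close_pairs = {(0, 1), (2, 3), (4, 5)}
flows = [[0 if i == j else
          (100 if (min(i, j), max(i, j)) in close_pairs else 5)
          for j in range(6)] for i in range(6)]

best_k, best_labels, scores = select_k(stations, flows, [2, 3, 4])
print(best_k, best_labels)  # best_k is 3 for this toy data
```

In this toy setting K = 3 groups each pair of neighboring stations, which yields the highest flow ratio while keeping clusters compact. With real smart card data, the candidate range of K, the distance measure, and the way the two metrics are weighted would all need to be chosen for the network at hand.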