Sparse Matrix Vector Multiplication (SpMV) is a key kernel in various domains, that is known to be difficult to parallelize efficiently due to the low spatial locality of data. This is problematic for computing large-scale SpMV due to limited cache sizes but also in achieving speedups through parallel execution. To address these issues, we present 1) sparstition, a novel partitioning scheme that enables computing SpMV without the need to do any major post-processing steps, and 2) a corresponding HLS-based hardware design that is able to perform large-scale SpMV efficiently. The design is pipelined so the matrix size is limited only by the size of the off-chip memory (DRAM) and not by the available on-chip memory (BRAMs). Our experimental results, performed on a ZedBoard, show that we achieve a computational throughput of up to 300 MFLOPS in single-precision and 108 MFLOPS in double-precision, an improvement of 2.6X on average compared to current state-of-the-art HLS results. Finally, we predict that sparstition can boost the computational throughput of HLS-based SpMV kernel to over 10 GFLOPS when using High Bandwidth Memories.