Randomized benchmarking (RB) is an efficient and robust method to characterize gate errors in quantum circuits. Averaging over random sequences of gates leads to estimates of gate errors in terms of the average fidelity. These estimates are isolated from the state preparation and measurement errors that plague other methods such as channel tomography and direct fidelity estimation. A decisive factor in the feasibility of randomized benchmarking is the number of sampled sequences required to obtain rigorous confidence intervals. Previous bounds were either prohibitively loose or required the number of sampled sequences to scale exponentially with the number of qubits in order to obtain a fixed confidence interval at a fixed error rate. Here, we show that, with a small adaptation to the randomized benchmarking procedure, the number of sampled sequences required for a fixed confidence interval is dramatically smaller than could previously be justified. In particular, we show that the number of sampled sequences required is essentially independent of the number of qubits and scales favorably with the average error rate of the system under investigation. We also investigate the fitting procedure inherent to randomized benchmarking in light of our results and find that standard methods such as ordinary least squares optimization can give misleading results. We therefore recommend moving to more sophisticated fitting methods such as iteratively reweighted least squares optimization. Our results bring rigorous randomized benchmarking on systems with many qubits into the realm of experimental feasibility.