TY - THES
T1 - Approximations and transformations of piecewise deterministic Monte Carlo algorithms
AU - Bertazzi, A.
PY - 2023
Y1 - 2023
N2 - This thesis studies methods to improve the applicability and performance of Markov chain Monte Carlo (MCMC) algorithms based on piecewise deterministic Markov processes (PDMPs). First, we discuss the key ideas that lay the foundations of the field of MCMC, spanning from the Metropolis-Hastings algorithm to piecewise deterministic Monte Carlo (PDMC) methods, emphasising a common structure underlying most non-reversible MCMC algorithms studied in the literature. The rest of the thesis is divided into two parts, treating respectively approximations and transformations of PDMC algorithms. In the first part we introduce several discretisation schemes that approximate a given PDMP and study the properties of the proposed algorithms in detail. This area is of fundamental importance for making PDMPs widely applicable: the PDMPs considered in the MCMC literature typically cannot be simulated exactly, either because of complicated deterministic dynamics or because the random event times follow an exponential distribution with a non-homogeneous rate. In the latter case, existing approaches to simulating the random event times apply only when the rate has a simple form, a requirement met only by toy models from the MCMC literature. In this thesis we introduce and study a wide variety of time discretisations of PDMPs of any order of accuracy, which can now be used as a basis for MCMC algorithms. We study two types of discretisations: the first is obtained by generalising the principle behind classical Euler schemes, while the second is based on splitting schemes. In both settings, we establish how the error depends on the step size of the discretisation. For suitable Euler schemes we prove uniform-in-time estimates on the weak error, a particularly challenging result which shows that the error is fully controlled by the step size and does not depend on the time horizon.
Moreover, for approximations of PDMPs obtained with Euler-based schemes we obtain error bounds in Wasserstein and total variation distance using the coupling approach. For our approximations based on splitting schemes we focus mainly on the Zig-Zag sampler (ZZS) and the Bouncy Particle Sampler (BPS) and study the best splitting scheme in terms of the bias in the invariant measure. For both samplers we obtain conditions ensuring existence and uniqueness of a stationary distribution for the approximating process, as well as exponential convergence to this distribution. Importantly, we show that symmetric splitting schemes are of second order, even though they require only one computation of the gradient of the negative log-likelihood per iteration. Another important novelty we introduce is the possibility of correcting the introduced bias via a skew-reversible Metropolis-Hastings acceptance-rejection step. This allows us to design the first unbiased, PDMP-based MCMC algorithms that can be applied effortlessly to sample from any target probability distribution. Our numerical experiments show that the remarkable properties of PDMPs give their approximations excellent convergence properties, improving over benchmark methods such as Hamiltonian Monte Carlo and the unadjusted Langevin algorithm. The second part of the thesis concerns transformations of PDMPs. First, we discuss space transformations of PDMPs, where the main goal is to improve the performance of PDMC algorithms when the target distribution $\pi$ is anisotropic. Our proposal is to design PDMC algorithms that adaptively learn the covariance structure of $\pi$ and use this information to tune the velocity of the underlying PDMP, i.e. the directions that the PDMP is more likely to explore. Finding a good set of directions requires knowledge of the target $\pi$, and hence information on previous positions of the process needs to be used.
In a similar fashion, we introduce adaptive PDMC algorithms which automatically tune the refreshment rate of the process, i.e. the frequency at which the current velocity vector is replaced with an independent draw from a suitable distribution. For these algorithms we carefully study convergence to the target distribution by establishing ergodicity, which is challenging for such non-homogeneous Markov processes. Moreover, we test our algorithms on benchmark examples, on which we observe substantial improvements over the standard, non-adaptive samplers. In the last chapter of the thesis we consider time transformations of (piecewise deterministic) Markov processes, with an emphasis on improving the convergence of MCMC algorithms. In particular, we study how a change of the speed of time affects the properties of a Markov process, where, importantly, the change in speed depends on the state of the process. This notion can prove helpful in the context of multimodal target distributions, where we argue that communication between different modes can be improved by increasing the speed of time when the process is located in low-density regions. We connect various properties of a process to those of a related time-changed process, including the stationary distributions, the generators, non-explosivity, ergodicity and the rate of convergence to the limiting distribution. For PDMPs we show that suitable time transformations can make a geometrically ergodic Markov process uniformly ergodic, a remarkable property which means that the initialisation of the process does not affect the speed of convergence. We apply our theorem to time transformations of the Zig-Zag process, demonstrating the applicability of our conditions. By applying this framework to PDMPs we define several novel processes whose dynamics depend on a user-chosen, interpretable speed function.
KW - MCMC algorithms
KW - non-reversibility
KW - Piecewise deterministic Markov processes
KW - Bayesian statistics
KW - computational statistics
DO - 10.4233/uuid:c53da6a5-948a-490d-9061-1f650f7a6125
M3 - Dissertation (TU Delft)
ER -