Topics in Causal Inference and Privacy

Research output: Thesis, Dissertation (TU Delft)

Abstract

Chapter 1 (Caliper Matching): Caliper matching is used to estimate causal effects of a binary treatment from observational data by comparing matched treated and control units. Units are matched when their propensity scores (the conditional probabilities of receiving treatment given pretreatment covariates) are within a certain distance of each other, called the caliper. So far, theoretical results on caliper matching have been lacking, leaving practitioners with ad hoc caliper choices and inference procedures. We bridge this gap by proposing a caliper that balances the quality and the number of matches. We prove that the resulting estimators of the average treatment effect and of the average treatment effect on the treated are asymptotically unbiased and asymptotically normal at the parametric rate. We describe the conditions under which semiparametric efficiency is attainable, and show that when a parametric propensity score is estimated, the variance increases for both estimands. Finally, we construct asymptotic confidence intervals for the two estimands.
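
To make the matching rule concrete, the following is a minimal sketch of a caliper (radius) matching estimate of the average treatment effect. It is an illustration under stated assumptions, not the estimator analysed in the thesis: the function name `caliper_match_ate` is hypothetical, the propensity scores are taken as given, unmatched units are simply dropped, and the caliper is a user-supplied number rather than the data-driven caliper proposed in Chapter 1.

```python
import numpy as np


def caliper_match_ate(y, t, ps, caliper):
    """Illustrative caliper-matching estimate of the average treatment effect.

    y: outcomes, t: 0/1 treatment indicators, ps: propensity scores,
    caliper: maximum allowed propensity-score distance (assumed given).
    Returns the estimate and the number of units with at least one match.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=int)
    ps = np.asarray(ps, dtype=float)

    # close[i, j] is True when unit j is in the opposite treatment arm
    # from unit i and its propensity score is within the caliper of i's.
    within = np.abs(ps[:, None] - ps[None, :]) <= caliper
    opposite = t[:, None] != t[None, :]
    close = within & opposite

    n_matches = close.sum(axis=1)
    matched = n_matches > 0
    if not matched.any():
        raise ValueError("no unit has a match within the caliper")

    # Impute each matched unit's missing potential outcome by the mean
    # outcome of its matches in the opposite arm.
    imputed = (close @ y)[matched] / n_matches[matched]

    # Treated units contribute y - imputed, controls contribute imputed - y.
    sign = np.where(t[matched] == 1, 1.0, -1.0)
    effects = sign * (y[matched] - imputed)
    return effects.mean(), int(matched.sum())


# Toy data: only the first two units lie within 0.05 of each other.
estimate, n_matched = caliper_match_ate(
    y=[3.0, 1.0, 5.0, 2.0],
    t=[1, 0, 1, 0],
    ps=[0.50, 0.52, 0.80, 0.30],
    caliper=0.05,
)
print(estimate, n_matched)
```

The sketch makes the trade-off mentioned above visible: a smaller caliper gives closer matches but leaves more units unmatched, while a larger caliper matches more units at the cost of comparing less similar ones.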

Chapter 2 (Combining Experimental And Observational Data: The APOLLO Trial): In the APOLLO trial (Tol et al., 2022, 2024), we inferred the causal effects of two hip fracture treatments, Posterolateral Approach (PLA) and Direct Lateral Approach (DLA), on health outcomes of patients in the Netherlands. The starting point of the inference was a Randomised Experiment (RE), where patients were randomly assigned to PLA or DLA, independently of their baseline characteristics. In addition, data from a 'Natural Experiment' (NE, or observational data) were collected, under the plausible assumption that therein the allocation to PLA or DLA can be considered as good as random conditional on the patients' baseline characteristics. We estimated the average treatment effects of DLA versus PLA in the RE and the NE data separately, using a flexible and asymptotically efficient estimation strategy. We found no significant difference between PLA and DLA in either the RE or the NE data. To improve the precision of the inference by increasing the sample size, we tested whether the RE and the NE datasets could be combined. Having found no evidence against combination, we also estimated the average treatment effect on the combined dataset. Despite the improved precision, there was still no significant difference between PLA and DLA. Our conclusions were weakened by missing data, but they proved robust to our approach to handling the missingness and to our estimation strategy.

Chapter 3 (Private Double Robust Inference): Privacy mechanisms preserve the privacy of individuals in a sample by injecting noise into their sensitive data in a controlled manner, revealing only the noisy, privatised data to the statistician for inference purposes. The inference of a parameter exhibits a rate double robustness property when the large-sample bias of an estimator of the parameter is characterised by the product of the estimation errors of two other, auxiliary (or nuisance), often infinite-dimensional, parameters. We propose a novel class of rate double robust parameters whose novelty lies in the potentially nonlinear but smooth dependence on a low-dimensional regression parameter. Among others, this class includes average treatment effects. We show that the properties of the sensitive-data model carry over to the privatised-data model under a suitable choice of the privacy mechanism, which, in general, means a total-variationally private mechanism. In particular, the double robustness property is retained, enabling efficient estimation from the privatised sample. We also find that, for a given, suitable privacy mechanism, estimating the nuisance parameters from the privatised sample is not harder than from the sensitive sample, albeit possibly less efficient and computationally more demanding. Indeed, if the estimation is feasible from the sensitive sample by some procedure, we can directly transport that procedure to the privatised setting. Lastly, we develop a private method of moments estimator for parametric models. This shows that in the private setting a parametric assumption about one nuisance parameter affords more flexible modelling and slower estimation of the other one, as is well known to be the case in the nonprivate setting.
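
As a reminder of what rate double robustness means in the familiar nonprivate case, consider the textbook augmented inverse probability weighting functional for the mean potential outcome under treatment. This is a standard illustration, not the class of parameters proposed in the thesis, and the notation is assumed here: $T$ is the treatment indicator, $\pi(X)$ the propensity score, $\mu_1(X) = E[Y \mid T = 1, X]$ the outcome regression, and $\hat\pi$, $\hat\mu_1$ are estimates treated as fixed, with $\hat\pi \ge \varepsilon > 0$.

```latex
\begin{align*}
E\!\left[\hat\mu_1(X) + \frac{T\,\bigl(Y - \hat\mu_1(X)\bigr)}{\hat\pi(X)}\right] - E\bigl[\mu_1(X)\bigr]
  &= E\!\left[\bigl(\mu_1(X) - \hat\mu_1(X)\bigr)\,\frac{\pi(X) - \hat\pi(X)}{\hat\pi(X)}\right], \\
\left|\,E\!\left[\bigl(\mu_1(X) - \hat\mu_1(X)\bigr)\,\frac{\pi(X) - \hat\pi(X)}{\hat\pi(X)}\right]\right|
  &\le \frac{1}{\varepsilon}\,\bigl\|\mu_1 - \hat\mu_1\bigr\|_{L_2(P)}\,\bigl\|\pi - \hat\pi\bigr\|_{L_2(P)} .
\end{align*}
```

The first line follows by conditioning on $X$, and the second from the Cauchy-Schwarz inequality. The bias is thus controlled by the product of the two nuisance estimation errors, so a slower rate for one nuisance estimate can be compensated by a faster rate for the other.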
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Delft University of Technology
Supervisors/Advisors
  • van der Vaart, A.W., Promotor
  • van der Pas, S. L., Copromotor, External person
Award date: 5 Jun 2025
Publication status: Published - 2025

Keywords

  • caliper matching
  • radius matching
  • privacy
  • double robust inference
  • privacy-preserving inference
  • semiparametric efficiency
  • data fusion
  • APOLLO trial
