Genotype imputation estimates missing genotypes in individual genetic sequences from a haplotype or genotype reference panel, which boosts the potential of genome-wide association studies and is essential in genetic data analysis. However, genetic sequences are privacy-sensitive: they can reveal an individual's identity and even disease information. This work proposes a secure genotype imputation model that uses a linear regression model and a homomorphic encryption scheme to impute missing genotypes over ciphertext. The inference model is trained with floating-point plaintext parameters, which are then rounded to integers to avoid the high complexity of homomorphic evaluation of floating-point operations, without requiring bootstrapping operations. Even though the rounded parameters in the inference model are not the same as those in the trained model, we find that this has no effect on the outcome of the homomorphic prediction. Thus, a highly efficient genotype imputation inference model over ciphertext is obtained while maintaining a high security level. The simulation results indicate that the accuracy of the secure inference model is almost the same as that of the original model trained with floating-point parameters. The secure inference model's accuracy is 98.6% for a single genotype.
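The parameter-rounding idea in the abstract can be illustrated with a minimal plaintext sketch. It is not the authors' implementation: the scaling factor `SCALE`, the weights, the bias, the genotype encoding in {0, 1, 2}, and the decoding rule are all assumptions made for illustration, and no encryption is performed. The point is only that the integer model uses additions and multiplications by integer constants, which is the kind of arithmetic an integer-based homomorphic scheme evaluates.

```python
# Hypothetical example: all numbers and names below are illustrative assumptions.
SCALE = 1000  # assumed fixed-point scaling factor applied before rounding


def quantize(weights, bias, scale=SCALE):
    """Scale the float parameters and round them to integers."""
    return [round(w * scale) for w in weights], round(bias * scale)


def predict_float(weights, bias, x):
    """Float linear score, decoded to the nearest genotype in {0, 1, 2}."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return min(2, max(0, round(score)))


def predict_int(int_weights, int_bias, x, scale=SCALE):
    """Integer-only linear score; the scale is removed only when decoding."""
    score = sum(w * xi for w, xi in zip(int_weights, x)) + int_bias
    return min(2, max(0, round(score / scale)))


# Made-up trained parameters for three neighbouring genotypes.
weights, bias = [0.412, -0.137, 0.655], 0.081
int_weights, int_bias = quantize(weights, bias)

x = [2, 1, 0]  # observed genotypes, encoded as minor-allele counts (assumed)
print(predict_float(weights, bias, x), predict_int(int_weights, int_bias, x))
```

In an encrypted setting, the sum inside `predict_int` would be computed over ciphertexts of `x`, and the final decoding step would run after decryption. With these made-up parameters, the float and integer predictions agree on every genotype vector.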
|Number of pages||7|
|Journal||Journal of Information Security and Applications|
|Publication status||Published - 2023|
Bibliographical note: Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
- Genetic security
- Genotype imputation
- Homomorphic encryption
- Privacy computing