HASE: Framework for efficient high-dimensional association analyses

G. V. Roshchupkin, H. H. H. Adams, M. W. Vernooij, A. Hofman, C. M. Van Duijn, M. A. Ikram*, W. J. Niessen

*Corresponding author for this work

doi:10.1038/srep36076

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

High-throughput technology can now provide rich information on a person's biological makeup and environmental surroundings. Important discoveries have been made by relating these data to various health outcomes in fields such as genomics, proteomics, and medical imaging. However, cross-investigations between several high-throughput technologies remain impractical due to demanding computational requirements (hundreds of years of computing resources) and unsuitability for collaborative settings (terabytes of data to share). Here we introduce the HASE framework, which overcomes both of these issues. Our approach dramatically reduces computational time from years to only hours and requires only several gigabytes of data to be exchanged between collaborators. We implemented a novel meta-analytical method that yields the same power as pooled analyses without the need to share individual participant data. The efficiency of the framework is illustrated by associating 9 million genetic variants with 1.5 million brain imaging voxels in three cohorts (total N = 4,034), followed by meta-analysis, on a standard computational infrastructure. These experiments indicate that HASE facilitates high-dimensional association studies, enabling large multicenter association studies for future discoveries.
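The abstract's central claim, that a meta-analysis can match a pooled analysis without sharing individual participant data, can be illustrated for the simplest case: one predictor, no covariates, ordinary least squares. Each cohort shares only a handful of sums; because sums add across cohorts, the combined slope, standard error and t statistic equal those of the pooled fit up to floating-point rounding. This is a generic sufficient-statistics sketch under those assumptions, not the HASE implementation; the function names and simulated data are hypothetical. The scale problem is also clear from the abstract's numbers: testing every variant-voxel pair means 9 × 10^6 × 1.5 × 10^6 = 1.35 × 10^13 regressions.

```python
import math
import random


def summary_stats(x, y):
    """Aggregate sums a cohort could share instead of individual-level data."""
    return {
        "n": len(x),
        "sx": sum(x),
        "sy": sum(y),
        "sxx": sum(u * u for u in x),
        "sxy": sum(u * v for u, v in zip(x, y)),
        "syy": sum(v * v for v in y),
    }


def combine(stats_list):
    """All statistics are sums, so cohorts combine by element-wise addition."""
    return {k: sum(s[k] for s in stats_list) for k in stats_list[0]}


def ols_from_stats(s):
    """Slope, standard error and t statistic of y = a + b*x from aggregate sums."""
    n = s["n"]
    sxx_c = s["sxx"] - s["sx"] ** 2 / n  # centered sum of squares of x
    sxy_c = s["sxy"] - s["sx"] * s["sy"] / n  # centered cross-product
    syy_c = s["syy"] - s["sy"] ** 2 / n  # centered sum of squares of y
    slope = sxy_c / sxx_c
    rss = syy_c - slope * sxy_c  # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx_c)
    return slope, se, slope / se


def ols_pooled(x, y):
    """Reference fit computed directly on concatenated individual-level data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((v - intercept - slope * u) ** 2 for u, v in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Three simulated cohorts (hypothetical data): y depends weakly on x.
random.seed(0)
cohorts = []
for size in (120, 200, 80):
    x = [random.gauss(0, 1) for _ in range(size)]
    y = [0.3 * xi + random.gauss(0, 1) for xi in x]
    cohorts.append((x, y))

# Meta-analysis: only six sums per cohort leave each site.
meta = ols_from_stats(combine([summary_stats(x, y) for x, y in cohorts]))

# Pooled analysis: requires all individual-level data in one place.
pooled = ols_pooled([v for x, _ in cohorts for v in x],
                    [v for _, y in cohorts for v in y])

print("meta  :", meta)
print("pooled:", pooled)
print("identical:", all(math.isclose(m, p, rel_tol=1e-9)
                        for m, p in zip(meta, pooled)))
```

Because only the sums leave each site, the amount of data exchanged per cohort in this sketch does not grow with sample size. Extending it to covariates would replace the six scalar sums with the matrices X^T X, X^T y and y^T y, which are likewise additive across cohorts.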

Original language	English
Article number	36076
Journal	Scientific Reports
Volume	6
DOI	https://doi.org/10.1038/srep36076
Publication status	Published - 26 Oct 2016

Keywords

Genome-wide association studies
Software

Access to Document

10.1038/srep36076

srep36076: Final published version, 725 KB, Licence: CC BY

Cite this

@article{14bdab7a6b744f8cb51e57da140b151f,
title = "HASE: Framework for efficient high-dimensional association analyses",
abstract = "High-throughput technology can now provide rich information on a person's biological makeup and environmental surroundings. Important discoveries have been made by relating these data to various health outcomes in fields such as genomics, proteomics, and medical imaging. However, cross-investigations between several high-throughput technologies remain impractical due to demanding computational requirements (hundreds of years of computing resources) and unsuitability for collaborative settings (terabytes of data to share). Here we introduce the HASE framework, which overcomes both of these issues. Our approach dramatically reduces computational time from years to only hours and requires only several gigabytes of data to be exchanged between collaborators. We implemented a novel meta-analytical method that yields the same power as pooled analyses without the need to share individual participant data. The efficiency of the framework is illustrated by associating 9 million genetic variants with 1.5 million brain imaging voxels in three cohorts (total N = 4,034), followed by meta-analysis, on a standard computational infrastructure. These experiments indicate that HASE facilitates high-dimensional association studies, enabling large multicenter association studies for future discoveries.",
keywords = "Genome-wide association studies, Software",
author = "Roshchupkin, {G. V.} and Adams, {H. H. H.} and Vernooij, {M. W.} and Hofman, {A.} and {Van Duijn}, {C. M.} and Ikram, {M. A.} and Niessen, {W. J.}",
year = "2016",
month = oct,
day = "26",
doi = "10.1038/srep36076",
language = "English",
volume = "6",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature",
}

TY  - JOUR
T1  - HASE: Framework for efficient high-dimensional association analyses
AU  - Roshchupkin, G. V.
AU  - Adams, H. H. H.
AU  - Vernooij, M. W.
AU  - Hofman, A.
AU  - Van Duijn, C. M.
AU  - Ikram, M. A.
AU  - Niessen, W. J.
PY  - 2016/10/26
Y1  - 2016/10/26
AB  - High-throughput technology can now provide rich information on a person's biological makeup and environmental surroundings. Important discoveries have been made by relating these data to various health outcomes in fields such as genomics, proteomics, and medical imaging. However, cross-investigations between several high-throughput technologies remain impractical due to demanding computational requirements (hundreds of years of computing resources) and unsuitability for collaborative settings (terabytes of data to share). Here we introduce the HASE framework, which overcomes both of these issues. Our approach dramatically reduces computational time from years to only hours and requires only several gigabytes of data to be exchanged between collaborators. We implemented a novel meta-analytical method that yields the same power as pooled analyses without the need to share individual participant data. The efficiency of the framework is illustrated by associating 9 million genetic variants with 1.5 million brain imaging voxels in three cohorts (total N = 4,034), followed by meta-analysis, on a standard computational infrastructure. These experiments indicate that HASE facilitates high-dimensional association studies, enabling large multicenter association studies for future discoveries.
KW  - Genome-wide association studies
KW  - Software
UR  - http://resolver.tudelft.nl/uuid:14bdab7a-6b74-4f8c-b51e-57da140b151f
UR  - http://www.scopus.com/inward/record.url?scp=84992562845&partnerID=8YFLogxK
U2  - 10.1038/srep36076
DO  - 10.1038/srep36076
M3  - Article
AN  - SCOPUS:84992562845
SN  - 2045-2322
VL  - 6
JO  - Scientific Reports
JF  - Scientific Reports
M1  - 36076
ER  - 
