Development and Extended Applications of Standardized Processes for Genome-Wide Association Studies

doi:10.3969/j.issn.1673-5765.2025.06.002

Chinese Journal of Stroke, 2025, Vol. 20, Issue 6: 664-674.

XU Zhe^1,2, SHI Yanfeng^1,2, ZHANG Jie^1,2, JIANG Minghui^1,2, LIU Yang^1,2, LI Hao^1,2,3, LIAO Xiaoling^4, CHENG Si^1,2,3

1 Center of Excellence for Omics Research, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
2 Beijing Key Laboratory of Drug and Device Research and Development for Cerebrovascular Diseases, Beijing 100070, China
3 Clinical Center for Precision Medicine in Stroke, Capital Medical University, Beijing 100070, China
4 Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China

Received: 2025-05-04 Revised: 2025-05-25 Accepted: 2025-06-02 Online: 2025-06-20 Published: 2025-06-20
Corresponding authors: CHENG Si, E-mail: sicheng@ncrcnd.org.cn; LIAO Xiaoling, E-mail: liao828@sina.com
Funding:
National Key Research and Development Program of China (2022YFE0209600)
National Natural Science Foundation of China (82471304)
Young Elite Scientists Sponsorship Program by the China Association for Science and Technology (2023QNRC001)

Abstract

Article highlights: The standardized, extensible GWAS workflow built in this study will accelerate pharmaceutical reverse engineering for cerebrovascular diseases and improve the translational efficiency of multi-omics big data in that effort.

Abstract: Objective To develop a standardized workflow for genome-wide association studies (GWAS) and a multi-omics analysis framework, providing an efficient analytical pipeline for pharmaceutical reverse engineering in cerebrovascular diseases using multi-omics cohorts.
Methods A modular analysis system was constructed based on international GWAS quality control standards and multi-omics integration and analysis strategies. Pre-GWAS data quality control module: stringent quality control was applied to sample and variant call rates, population genetic structure and stratification, and kinship. Among the samples that passed quality control, genetic variants with a minor allele frequency >0.5% were retained for GWAS. Association analysis module: GWAS was performed with PLINK, SAIGE, and Regenie using generalized linear models and generalized linear mixed models, and its quality was evaluated with the genomic inflation factor and quantile-quantile plots. The module was tested using whole-genome sequencing and clinical data from the Third China National Stroke Registry (CNSR-Ⅲ). Multi-omics analysis module: polygenic risk score, cross-cohort meta-analysis, Mendelian randomization, and colocalization analysis procedures were integrated to support molecular mechanism interpretation and target screening based on GWAS results.
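The minor allele frequency filter described in the Methods can be illustrated with a minimal Python sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the function names and the toy variant IDs are hypothetical and are not taken from the study's pipeline, which relies on dedicated tools such as PLINK.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for one variant, or None if no genotype was called.

    genotypes: alternate-allele counts (0, 1 or 2) per sample, None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(variants, threshold=0.005):
    """Keep variants whose MAF is strictly above the threshold (0.5% here)."""
    kept = {}
    for variant_id, genotypes in variants.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and maf > threshold:
            kept[variant_id] = maf
    return kept


if __name__ == "__main__":
    # Toy data: one polymorphic variant and one monomorphic variant.
    toy = {
        "var_common": [0, 1, 2, 1, None, 0],
        "var_monomorphic": [0, 0, 0, 0],
    }
    print(filter_by_maf(toy))
```

Missing genotypes are excluded from the denominator, so the frequency is computed over called alleles only; this is one common convention, assumed here for illustration.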
Results The pre-GWAS data quality control module assesses data along two dimensions, genetic data quality and population genetics. After quality control, 9632 and 7265 samples were included in the GWAS of baseline triglyceride (TG) levels and 3-month post-stroke mortality, respectively. The Manhattan plots produced by the different software packages showed similar trends. However, when the case-control samples were strongly biased, SAIGE provided more appropriate correction and relatively more robust statistical testing than PLINK and Regenie. In the multi-omics analysis module, standardized procedures for polygenic risk score, meta-analysis, Mendelian randomization, and colocalization analysis were established to enable in-depth exploration of GWAS results.
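As a rough illustration of how the genomic inflation factor and quantile-quantile points mentioned above are commonly computed, the following Python sketch works from a list of association p-values. It assumes each p-value comes from a 1-degree-of-freedom test and lies in (0, 1]; the function names are hypothetical, and the study's own implementation may differ.

```python
import math
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation_factor(p_values):
    """Lambda GC: median observed chi-square divided by the expected median."""
    nd = NormalDist()
    # For a 1-df test, chi-square = z**2 with z the two-sided normal quantile.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Perfectly uniform p-values should give lambda close to 1.
    n = 1001
    uniform_p = [(i - 0.5) / n for i in range(1, n + 1)]
    print(round(genomic_inflation_factor(uniform_p), 4))
```

A value near 1 suggests little systematic inflation of the test statistics, while values well above 1 may point to residual population stratification or relatedness, and values below 1 to over-correction.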
Conclusions The standardized GWAS workflow established in this study is modular and highly extensible, meets the needs of complex phenotype analysis and multi-omics data integration and analysis, and provides a methodological foundation for pharmaceutical reverse engineering based on genetic associations.
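To make the cross-cohort meta-analysis step of the multi-omics module more concrete, here is a minimal Python sketch of a fixed-effect, inverse-variance-weighted combination of per-cohort effect estimates for a single variant. This is a generic textbook scheme, shown under the assumption that each cohort reports an effect size and its standard error on the same allele and scale; the abstract does not state which weighting scheme the authors used, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def ivw_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("betas and standard_errors must be non-empty and equal length")
    weights = [1.0 / se ** 2 for se in standard_errors]  # precision of each cohort
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z_score = pooled_beta / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z_score))
    return pooled_beta, pooled_se, z_score, p_value


if __name__ == "__main__":
    # Two hypothetical cohorts with equal precision.
    print(ivw_meta([0.10, 0.20], [0.05, 0.05]))
```

With equal standard errors the pooled estimate is the simple average of the cohort effects, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.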

Key words: Genome-wide association study; Multi-omics; Stroke; Drug development; Bioinformatics

Cite this article: XU Zhe, SHI Yanfeng, ZHANG Jie, JIANG Minghui, LIU Yang, LI Hao, LIAO Xiaoling, CHENG Si. Development and Extended Applications of Standardized Processes for Genome-Wide Association Studies[J]. Chinese Journal of Stroke, 2025, 20(6): 664-674.

