Chinese Journal of Stroke ›› 2025, Vol. 20 ›› Issue (6): 664-674. DOI: 10.3969/j.issn.1673-5765.2025.06.002

• Special Topic Forum •

全基因组关联分析标准化流程的构建与扩展应用

许喆1,2,石延枫1,2,张杰1,2,姜明慧1,2,刘阳1,2,李昊1,2,3,廖晓凌4,程丝1,2,3   

    1 北京 100070 首都医科大学附属北京天坛医院,国家神经系统疾病临床医学研究中心,卒中多组学创新中心
    2 脑血管病药械研发北京市重点实验室
    3 首都医科大学卒中精准临床诊疗与研究中心
    4 首都医科大学附属北京天坛医院神经病学中心
  • 收稿日期:2025-05-04 出版日期:2025-06-20 发布日期:2025-06-20
  • 通讯作者: 程丝 sicheng@ncrcnd.org.cn 廖晓凌 liao828@sina.com
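  • Funding:
    National Key Research and Development Program of China (2022YFE0209600)
    National Natural Science Foundation of China (82471304)
    Young Elite Scientists Sponsorship Program by CAST (2023QNRC001)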

Development and Extended Applications of Standardized Processes for Genome-Wide Association Studies

XU Zhe1,2, SHI Yanfeng1,2, ZHANG Jie1,2, JIANG Minghui1,2, LIU Yang1,2, LI Hao1,2,3, LIAO Xiaoling4, CHENG Si1,2,3   

    1 Center of Excellence for Omics Research, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
    2 Beijing Key Laboratory of Drug and Device Research and Development for Cerebrovascular Diseases, Beijing 100070, China
    3 Clinical Center for Precision Medicine in Stroke, Capital Medical University, Beijing 100070, China
    4 Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
  • Received: 2025-05-04 Online: 2025-06-20 Published: 2025-06-20
  • Contact: CHENG Si, E-mail: sicheng@ncrcnd.org.cn LIAO Xiaoling, E-mail: liao828@sina.com

摘要: 目的 构建GWAS标准化流程及多组学分析体系框架,为基于多组学队列的脑血管病药物逆向研发提供高效分析方法。 
方法 基于国际GWAS质量控制标准与多组学整合与分析策略,构建模块化的分析体系。GWAS前数据质量控制模块:对样本和变异检出率、群体遗传结构与分层、亲缘关系等进行严格质量控制。在合格样本组成的群体中,保留次要等位基因频率>0.5%的遗传变异用于GWAS。关联分析模块:利用PLINK、SAIGE和Regenie等软件,使用广义线性模型与广义线性混合模型进行GWAS操作。通过基因组膨胀系数和分位数-分位数图评估GWAS质量。使用中国国家卒中登记Ⅲ的全基因组测序和临床数据,对该模块进行测试。多组学分析模块:整合多基因风险评分、跨队列meta分析、孟德尔随机化及共定位分析等流程,为利用GWAS结果进行分子机制解析和靶点筛选提供支持。
结果 本研究搭建的GWAS前数据质量控制模块主要从遗传数据质量和群体遗传两方面对数据进行GWAS前质量控制和评估。经过质量控制,有9632例和7265例样本分别被纳入基线TG水平、卒中后3个月死亡两个表型的GWAS。GWAS结果显示,不同软件得到的曼哈顿图趋势较为接近,但在病例-对照样本存在较大偏倚时,SAIGE软件相比于PLINK和Regenie软件校正适度、统计检验方法相对稳健。在多组学分析模块中,构建了包含多基因风险评分、meta分析、孟德尔随机化和共定位分析等多个标准化分析流程,用以开展对GWAS结果的深入挖掘。
结论 本研究建立的GWAS标准化流程具有模块化、扩展性强等特点,能够满足复杂表型分析和多组学数据整合与分析的需求,为基于遗传关联的药物逆向研发提供了方法学基础。

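Article highlights: The standardized, extensible GWAS workflow developed in this study will accelerate pharmaceutical reverse engineering for cerebrovascular diseases and improve the translational efficiency of multi-omics big data in that effort.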

关键词: 全基因组关联分析; 多组学; 卒中; 药物研发; 生物信息学

Abstract: Objective  To develop a standardized GWAS workflow and a multi-omics analysis framework, providing an efficient analytical approach for pharmaceutical reverse engineering of cerebrovascular diseases based on multi-omics cohorts.
Methods  A modular analysis system was constructed based on international GWAS quality control standards and multi-omics integration and analysis strategies. Pre-GWAS data quality control module: this module applied stringent quality control to sample and variant call rates, population genetic structure and stratification, and kinship. Among the samples that passed quality control, genetic variants with a minor allele frequency >0.5% were retained for GWAS. Association analysis module: GWAS was performed with PLINK, SAIGE, and Regenie, using generalized linear models and generalized linear mixed models. GWAS quality was evaluated with the genomic inflation factor and quantile-quantile plots. The module was tested using whole-genome sequencing and clinical data from the China National Stroke Registry Ⅲ. Multi-omics analysis module: this module integrated polygenic risk score, cross-cohort meta-analysis, Mendelian randomization, and colocalization analysis procedures to support molecular mechanism interpretation and target screening based on GWAS results.
Results  The pre-GWAS data quality control module performed quality control and assessment from two perspectives: genetic data quality and population genetics. After quality control, 9632 and 7265 samples were included in the GWAS of baseline triglyceride (TG) levels and mortality at 3 months after stroke, respectively. Manhattan plots obtained from the different software packages showed similar trends. However, when the case-control samples were markedly imbalanced, SAIGE provided more appropriate correction and relatively more robust statistical testing than PLINK and Regenie. In the multi-omics analysis module, standardized analysis procedures for polygenic risk score, meta-analysis, Mendelian randomization, and colocalization analysis were established to enable in-depth exploration of GWAS results.
Conclusions  The standardized GWAS workflow established in this study is modular and highly extensible, and can meet the needs of complex phenotype analysis and of multi-omics data integration and analysis. It provides a methodological foundation for pharmaceutical reverse engineering based on genetic associations.
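
Two of the quality-control quantities named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relies on PLINK, SAIGE, and Regenie. The function names are hypothetical, the variant is assumed to be biallelic in diploid samples, and p-values are assumed to come from 1-degree-of-freedom tests and to lie in (0, 1]. The sketch shows the minor allele frequency >0.5% filter and the genomic inflation factor, computed as the median observed chi-square statistic divided by the expected median under the null.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom
# (the square of the 75th percentile of the standard normal, about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def minor_allele_frequency(alt_allele_count, n_samples):
    """Minor allele frequency of a biallelic variant in diploid samples."""
    alt_freq = alt_allele_count / (2 * n_samples)
    return min(alt_freq, 1 - alt_freq)


def passes_maf_filter(alt_allele_count, n_samples, threshold=0.005):
    """True if the variant's MAF exceeds the threshold (0.5% by default)."""
    return minor_allele_frequency(alt_allele_count, n_samples) > threshold


def genomic_inflation_factor(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1].

    Each p-value is converted to a 1-df chi-square statistic through the
    squared normal quantile; the median is compared with its null value.
    """
    norm = NormalDist()
    chi2_stats = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1 suggests little inflation of the test statistics, while values well above 1 can point to residual population stratification or cryptic relatedness; in practice the value is read alongside the quantile-quantile plot.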

Key words: Genome-wide association study; Multi-omics; Stroke; Drug development; Bioinformatics

CLC number: