thesis

Méthodes Statistiques pour l’analyse de données génétiques d’association à grande échelle

Defense date:

Jan. 1, 2007

Institution:

Evry-Val d'Essonne

Disciplines:

Applied mathematics

Authors:

Mickaël Guedj

Directors:

Grégory Nuel

Abstract EN:

The increasing availability of dense Single Nucleotide Polymorphism (SNP) maps, driven by rapid improvements in molecular biology and genotyping technologies, has recently led geneticists towards genome-wide association studies, in the hope of better understanding the genetic basis of complex diseases. The analysis of such high-throughput data raises new statistical and computational challenges, which constitute the main topic of this thesis. After a brief description of the main questions raised by genome-wide association studies, we address single-marker approaches through a power study of the main association tests. We then consider multi-marker approaches, focusing on a method we developed that relies on the Local Score. Finally, this thesis deals with the multiple-testing problem: our Local Score-based approach circumvents this problem by reducing the number of tests; in parallel, we present an estimation of the Local False Discovery Rate using a simple Gaussian mixture model.

Abstract FR (English translation):

Advances in molecular biology have accelerated the development of high-throughput genotyping techniques and thereby made possible the launch of the first large-scale genetic association studies. The size and complexity of the data produced by this new type of study now raise new statistical and computational questions that must be addressed for their analysis, and these form the main research focus of this thesis. After an introductory description of the main issues related to large-scale association studies, we address in particular single-marker approaches, with a power study of the main association tests; multi-marker approaches, with the development of a method based on the Local Score statistic; and finally the multiple-testing problem, with the estimation of the Local False Discovery Rate through a simple Gaussian mixture model.