A9: Genetic and molecular similarity of autoimmune disease.

Background. Autoimmune diseases are known to have a strong genetic background. Genome-wide association studies by our groups (Exp Dermatol. 24:510-5; Nat Genet. 45:670-5) and others have identified a large number of loci, within and outside the MHC region, that are associated with autoimmune diseases. Interestingly, many of these loci overlap between different disease entities. At the same time, autoimmune diseases are highly heterogeneous in presentation and outcome. It is hoped that genetics will make it possible to address common and specific etiologies by classifying patients into more homogeneous subgroups based on their genetic underpinnings (Nat Rev Rheumatol. 13:421), and to identify common mechanisms that drive the transition from health to autoimmune disease. Indeed, recent studies suggest that disease is more complex than dependence on a subset of core genes. Instead, it is hypothesized that both common mutations and rare, private mutations contribute to disease in the individual (Cell 173:1573-80). Previous approaches investigated colocalization or pleiotropic effects of associated loci, thus treating the diagnosis as a fixed starting point in supervised analyses. In contrast, our approach disregards the given diagnoses and instead clusters patients based on their genetic similarity and the molecular consequences of each individual's mutatome, using unsupervised techniques (Fig. A9-1).

Objectives. (i) Identify unsupervised learning strategies, based on different variant effect predictors, that are suitable for large-scale genetic data. (ii) Link mutation effects with molecular pathways using network propagation methods. (iii) Investigate whether genetically homogeneous clusters of autoimmune disease patients can be identified and validated.

Work program. (i) Available unsupervised learning methods for genotype and whole-exome/whole-genome data are reviewed with regard to required data and underlying assumptions. Various approaches to preselecting and preprocessing genotypes and to handling population stratification are compared. At the individual patient level, suitable approaches for mapping mutations to pathways are reviewed, and aggregation strategies for clustering and patient stratification are evaluated. (ii) Suitable genotype and genome data sets of autoimmune patients are selected, ideally together with phenotypic data from publicly available sources. The genetic and molecular pathway similarity of these patients is used to cluster them into genetically more homogeneous subgroups, and the resulting clusters are validated in independent data. (iii) Including healthy controls who carry autoantibodies (with A1 and A4) will make it possible to distinguish the genetic architecture of healthy, pre-diseased and clinically manifest individuals.
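
Illustrative sketch. The idea of propagating mutations over a molecular network and then clustering patients without reference to their diagnoses can be shown on toy data. The following is a minimal sketch under stated assumptions and not the project's actual pipeline: the six-gene network, the four patients, random walk with restart as the propagation method, the restart weight alpha = 0.7, cosine similarity, and the small average-linkage routine are all hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical toy network: two gene modules (0-1-2 and 3-4-5) joined by edge 2-3.
adjacency = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adjacency[i, j] = adjacency[j, i] = 1.0

# Hypothetical patients x genes matrix; no two patients share a mutated gene.
mutations = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
])

def propagate(adjacency, mutations, alpha=0.7, tol=1e-8, max_iter=1000):
    """Random walk with restart: F <- alpha * F W + (1 - alpha) * F0."""
    inv_sqrt = 1.0 / np.sqrt(adjacency.sum(axis=1))  # assumes no isolated genes
    w = adjacency * inv_sqrt[:, None] * inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    f0 = mutations.astype(float)
    f = f0.copy()
    for _ in range(max_iter):
        f_next = alpha * f @ w + (1 - alpha) * f0
        if np.abs(f_next - f).max() < tol:
            return f_next
        f = f_next
    return f

def cosine_similarity(profiles):
    """Pairwise cosine similarity between patient profiles (rows)."""
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return unit @ unit.T

def average_linkage(similarity, n_clusters):
    """Greedy agglomerative clustering: merge the most similar pair of clusters."""
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = similarity[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(similarity), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

smoothed = propagate(adjacency, mutations)
similarity = cosine_similarity(smoothed)
labels = average_linkage(similarity, n_clusters=2)
print(labels)
```

In this toy setting the raw mutation profiles do not overlap at all, so the patients cannot be grouped on shared mutations alone. After propagation, patients whose mutations fall into the same network module acquire similar profiles and can be grouped together. Real data would additionally require variant effect scores in place of binary entries, correction for population stratification, and validation of the resulting clusters in independent data.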