Homozygosity mapping was first proposed in 1987 (1) as an approach for identifying genes underlying recessive diseases. It is based on the premise that affected subjects will show an enrichment of homozygosity in the genomic region harboring the disease gene. Since then, hundreds of papers applying this mapping approach have been published, and it has led to the successful identification of a large number of novel genes mutated in recessive disorders.
For many years, it relied on the analysis of hundreds of microsatellite markers. With advances in high-throughput genotyping, it is now mainly based on data generated by commercially available chips that analyze hundreds of thousands of SNPs (2). Processing and analyzing these large datasets represents a new computational challenge. Several freely available tools facilitate the identification of homozygous regions in data generated by SNP chips.
HomozygosityMapper (3) is a web server that can analyze data from Affymetrix or Illumina genotyping platforms to identify homozygous regions. Genes in those candidate regions can be automatically identified through its interaction with GeneDistiller (4).
IBDFinder (5) is a program for identifying homozygous regions in genotype data generated by Affymetrix platforms. Data from Affymetrix 6.0 need to be annotated with SNPAnnotator.
PLINK (6), a commonly used program for the analysis of genome-wide association studies, has an option for the identification of runs of homozygosity. Its output is composed of the IDs of the individuals and the start and end of the candidate regions.
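To make the idea of a run of homozygosity concrete, here is a minimal Python sketch that scans one individual's genotype calls along a chromosome and reports the start and end of each run. It is only an illustration and not PLINK's own algorithm; the function name, the two-character genotype encoding, the use of "00" for a missing call, and the `min_snps` threshold are all assumptions made for this example.

```python
def find_homozygous_runs(positions, genotypes, min_snps=3):
    """Return (start_bp, end_bp, n_snps) for each run of consecutive homozygous calls.

    positions: base-pair positions, sorted along one chromosome.
    genotypes: two-character calls such as "AA" or "AG"; "00" means missing
               (an assumed encoding for this sketch).
    min_snps:  hypothetical minimum number of homozygous SNPs needed to report a run.
    """
    runs = []
    start = end = None
    count = 0
    for pos, gt in zip(positions, genotypes):
        if gt == "00":
            continue  # a missing call neither extends nor breaks the current run
        if gt[0] == gt[1]:
            if count == 0:
                start = pos  # first homozygous SNP of a new run
            end = pos
            count += 1
        else:
            # a heterozygous call closes the current run
            if count >= min_snps:
                runs.append((start, end, count))
            count = 0
    if count >= min_snps:
        runs.append((start, end, count))  # run that reaches the end of the data
    return runs


# Toy data, invented for illustration
positions = [100, 200, 300, 400, 500, 600, 700]
genotypes = ["AA", "GG", "AG", "CC", "00", "TT", "CC"]
runs = find_homozygous_runs(positions, genotypes, min_snps=3)
print(runs)
```

Real analyses use much longer minimum run lengths and usually tolerate occasional heterozygous calls caused by genotyping error; this sketch omits that tolerance to stay short.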
These tools can process the large datasets from SNP chips in a matter of minutes. The programs differ in how they implement the algorithms for identifying candidate regions, as well as in their input and output formats, which makes them potentially complementary.
IGG3 (7), a Java program, and Linkdatagen (8), a Perl script, are two useful tools that convert genotype data from SNP chips to the formats used by PLINK (.ped and .map files). These converted files are also useful for further examination in programs designed for linkage analysis. TableButler (9) is a Windows program that can be used to open and edit these very large genotype files, complementing the tasks that can be carried out in the R environment (10).
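For readers who have not seen the .ped and .map layouts, the following Python sketch builds one line of each. It follows the standard PLINK field order (family ID, individual ID, father, mother, sex, phenotype, then two alleles per SNP for .ped; chromosome, SNP ID, genetic distance, base-pair position for .map). The helper names, sample IDs and SNP IDs are made up for illustration, and this is not how IGG3 or Linkdatagen work internally.

```python
def to_ped_line(family_id, individual_id, sex, phenotype, genotypes):
    """Build one .ped line for one individual.

    genotypes: list of (allele1, allele2) tuples, one per SNP, in .map order.
    Father and mother IDs are written as 0 (unknown) in this sketch.
    """
    fields = [family_id, individual_id, "0", "0", str(sex), str(phenotype)]
    for allele1, allele2 in genotypes:
        fields.extend([allele1, allele2])
    return " ".join(fields)


def to_map_line(chrom, snp_id, bp_position, genetic_distance=0):
    """Build one .map line; genetic distance defaults to 0 when it is unknown."""
    return f"{chrom}\t{snp_id}\t{genetic_distance}\t{bp_position}"


# Invented example: one affected male (sex=1, phenotype=2) typed at two SNPs
ped_line = to_ped_line("FAM1", "IND1", 1, 2, [("A", "A"), ("A", "G")])
map_line = to_map_line("1", "rs_example_1", 12345)
print(ped_line)
print(map_line)
```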
Given the relatively low average heterozygosity of the SNPs on these chips (around 0.26 in European populations) and their high density (2), stretches of consecutive homozygous genotypes can arise by chance, so comparison with results obtained in unaffected individuals can help rule out false-positive regions. The latest versions of SNP chips also include thousands of probes for CNV identification; these additional data can help determine whether a large deletion, rather than true homozygosity, explains a candidate homozygous region.
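A rough Python sketch of both points follows. The first function estimates how often a stretch of consecutive SNPs would be homozygous by chance, under the simplifying (and unrealistic) assumption that SNPs are independent and all share the same heterozygosity; linkage disequilibrium makes real runs more likely than this. The second drops candidate regions that overlap regions seen in unaffected individuals. All function names, the (chromosome, start, end) region format and the example coordinates are assumptions for illustration.

```python
def chance_homozygous_run(n_snps, heterozygosity=0.26):
    """Probability that n_snps consecutive SNPs are all homozygous,
    assuming independent SNPs with equal heterozygosity (a simplification)."""
    return (1.0 - heterozygosity) ** n_snps


def overlaps(region_a, region_b):
    """True if two (chrom, start, end) regions share at least one position."""
    return (region_a[0] == region_b[0]
            and region_a[1] <= region_b[2]
            and region_b[1] <= region_a[2])


def drop_regions_seen_in_controls(candidates, control_regions):
    """Keep only the candidate regions that overlap no region found in controls."""
    return [c for c in candidates
            if not any(overlaps(c, k) for k in control_regions)]


# Invented example coordinates
candidates = [("1", 1000, 5000), ("2", 100, 900)]
control_regions = [("1", 4000, 8000)]
kept = drop_regions_seen_in_controls(candidates, control_regions)
p10 = chance_homozygous_run(10)
print(kept, round(p10, 3))
```

Under these assumptions a run of ten homozygous SNPs is expected about 5% of the time, which is why short runs on dense chips carry little evidence on their own.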
Recent data show that several regions of extended homozygosity are present in healthy individuals from outbred populations (11) and that such regions can be associated with the risk for complex diseases (12).
Future implementations of programs for homozygosity mapping will benefit from incorporating additional information, such as local LD patterns or the location of common extended homozygous regions previously identified in systematic studies of outbred populations.
(1) Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 1987 Jun 19;236(4808):1567-70.
(2) Li M, Li C, Guan W. Evaluation of coverage variation of SNP chips for genome-wide association studies. Eur J Hum Genet. 2008 May;16(5):635-43.
(3) Seelow D, Schuelke M, Hildebrandt F, Nürnberg P. HomozygosityMapper--an interactive approach to homozygosity mapping. Nucleic Acids Res. 2009 Jul 1;37(Web Server issue):W593-9.
(4) Seelow D, Schwarz JM, Schuelke M. GeneDistiller--distilling candidate genes from linkage intervals. PLoS One. 2008;3(12):e3874.
(5) Carr IM, Sheridan E, Hayward BE, Markham AF, Bonthron DT. IBDfinder and SNPsetter: tools for pedigree-independent identification of autozygous regions in individuals with recessive inherited disease. Hum Mutat. 2009 Jun;30(6):960-7.
(6) Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007 Sep;81(3):559-75.
(7) Li MX, Jiang L, Kao PY, Sham PC, Song YQ. IGG3: a tool to rapidly integrate large genotype datasets for whole-genome imputation and individual-level meta-analysis. Bioinformatics. 2009 Jun 1;25(11):1449-50.
(8) Bahlo M, Bromhead CJ. Generating linkage mapping files from Affymetrix SNP chip data. Bioinformatics. 2009 Aug 1;25(15):1961-2.
(9) Schwager C, Wirkner U, Abdollahi A, Huber PE. TableButler - a Windows based tool for processing large data tables generated with high-throughput methods. BMC Bioinformatics. 2009 Jul 29;10:235.
(10) Eglen SJ. A quick guide to teaching R programming to computational biology students. PLoS Comput Biol. 2009 Aug;5(8):e1000482.
(11) McQuillan R, Leutenegger AL, Abdel-Rahman R, ..., Wright AF, Campbell H, Wilson JF. Runs of homozygosity in European populations. Am J Hum Genet. 2008 Sep;83(3):359-72.
(12) Lencz T, Lambert C, DeRosse P, Burdick KE, Morgan TV, Kane JM, Kucherlapati R, Malhotra AK. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc Natl Acad Sci U S A. 2007 Dec 11;104(50):19942-7.
R environment for statistical computing
R reference card
Diego Forero, MD