Two loci, A and B, are said to be in linkage (or gametic) disequilibrium if their respective alleles do not associate independently in the studied population.
Measures of linkage disequilibrium (some examples out of many measures):
D:
xAB = pA qB + D
xaB = pa qB - D
xAb = pA qb - D
xab = pa qb + D
The range of D depends on the allelic frequencies, so a measure that has the same range for all allelic frequencies is desirable.
D' (normalized measure of Lewontin):
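The formula for D' is not reproduced here. As a minimal sketch, assuming the commonly used definition D' = D / Dmax, where Dmax = min(pA qb, pa qB) when D is positive and min(pA qB, pa qb) when D is negative, both measures can be computed from the four haplotype frequencies. The function name `ld_measures` and its argument names are illustrative only.

```python
def ld_measures(x_AB, x_Ab, x_aB, x_ab):
    """Return (D, D') from the four haplotype frequencies (which must sum to 1)."""
    p_A = x_AB + x_Ab          # frequency of allele A at the first locus
    p_a = x_aB + x_ab
    q_B = x_AB + x_aB          # frequency of allele B at the second locus
    q_b = x_Ab + x_ab

    # D as defined by xAB = pA qB + D
    d = x_AB - p_A * q_B

    # Largest |D| compatible with the allele frequencies (assumed definition)
    if d >= 0:
        d_max = min(p_A * q_b, p_a * q_B)
    else:
        d_max = min(p_A * q_B, p_a * q_b)

    # A monomorphic locus leaves no room for disequilibrium
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


if __name__ == "__main__":
    print(ld_measures(0.4, 0.1, 0.2, 0.3))
```

With this normalization D' lies between -1 and 1 whatever the allelic frequencies, which is the property asked for above.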
For loci having more than two alleles, a total linkage disequilibrium measure can be obtained from:
Lambda (Terwilliger, 1995):
Terwilliger (1995) introduced a measure of linkage disequilibrium (λ) and used it to develop a maximum likelihood association test for "case-control" studies.
Assume that a disease-causing mutation D* appeared in the population, by mutation or migration, n generations ago on a chromosome harboring allele "1" at a nearby marker locus. The frequency of the "1-D" haplotype in the present generation (where D includes a series of disease-causing mutations in this very same gene, of which a fraction α corresponds to D*) corresponds to:
From this, one can generate the following expected gametic frequencies:
g D1 = pD (p1 + λ (1 - p1))
g +1 = p1 - g D1
g D0 = pD - g D1
g +0 = p+ - g +1
Where D and + are the disease-causing and wild-type alleles at the disease locus, and 1 and 0 are marker allele 1 and all marker alleles other than 1, respectively.
In a case-control study one would rather use the derived conditional probability model, i.e. the probability of the marker genotypes (1 or 0) given the genotype at the disease locus (D or +):
P(1 | D) = g D1 / pD
P(1 | +) = g +1 / p+
P(0 | D) = g D0 / pD
P(0 | +) = g +0 / p+
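The gametic and conditional frequencies above can be sketched in a few lines of Python. This is only an illustration of the formulas as written; the function name `terwilliger_frequencies` and the dictionary keys are hypothetical choices, not part of the original method.

```python
def terwilliger_frequencies(p_dis, p_1, lam):
    """Expected gametic and conditional frequencies for marker allele 1.

    p_dis : population frequency of the disease allele D (pD)
    p_1   : population frequency of marker allele 1 (p1)
    lam   : association parameter lambda, between 0 and 1
    """
    p_wt = 1.0 - p_dis                         # frequency of the + allele (p+)
    g_D1 = p_dis * (p_1 + lam * (1.0 - p_1))
    g_w1 = p_1 - g_D1
    g_D0 = p_dis - g_D1
    g_w0 = p_wt - g_w1
    gametic = {"D1": g_D1, "+1": g_w1, "D0": g_D0, "+0": g_w0}
    conditional = {
        "1|D": g_D1 / p_dis,
        "1|+": g_w1 / p_wt,
        "0|D": g_D0 / p_dis,
        "0|+": g_w0 / p_wt,
    }
    return gametic, conditional


if __name__ == "__main__":
    print(terwilliger_frequencies(p_dis=0.01, p_1=0.2, lam=0.5))
```

With lam = 0 the marker allele has the same frequency among D and + chromosomes, and with lam = 1 every D chromosome carries allele 1.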
Statistical significance of the observed linkage disequilibrium:
Monte-Carlo approximation of Fisher's exact test:
The statistical significance (α) of the observed allelic association, under the null hypothesis of random allelic assortment, was estimated by Monte-Carlo approximation of Fisher’s exact test as described by Weir (1996). Briefly, assume a sample of n gametes genotyped for marker loci A and B having u and v alleles, respectively. The sample is fully characterized by allele counts ni. (locus A) and n.j (locus B) and haplotype counts nij, as illustrated in the following table for a simple example where loci A and B have three and two alleles, respectively.
The statistical significance α of the observed linkage disequilibrium could be tested by applying a chi-squared test of independence to the gametic table. However, especially when dealing with poly-allelic loci, several cells in the gametic table may contain very few observations. It is then recommended to apply Fisher's exact test, knowing that the probability of a given gametic table is given by:
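The expression for the table probability is not reproduced here. The sketch below assumes the standard form for a contingency table with fixed margins, P = (Π ni.! Π n.j!) / (n! Π nij!), and estimates the significance level as the proportion of randomly generated same-margin tables whose probability is equal to or lower than that of the observed table. Tables are generated by shuffling the B alleles relative to the A alleles, which preserves all allele counts. The function names, the `n_sim` default, and the representation of the sample as a list of (A allele, B allele) pairs are illustrative assumptions.

```python
import math
import random
from collections import Counter


def log_table_prob(pairs):
    """Log-probability of the gametic table built from (A allele, B allele) pairs,
    conditional on the allele counts at both loci (assumed standard formula)."""
    n = len(pairs)
    n_ij = Counter(pairs)
    n_i = Counter(a for a, _ in pairs)
    n_j = Counter(b for _, b in pairs)

    def log_fact(k):
        return math.lgamma(k + 1)

    return (sum(log_fact(c) for c in n_i.values())
            + sum(log_fact(c) for c in n_j.values())
            - log_fact(n)
            - sum(log_fact(c) for c in n_ij.values()))


def monte_carlo_alpha(pairs, n_sim=2000, seed=0):
    """Estimate the significance level by simulating tables under random assortment."""
    rng = random.Random(seed)
    a_alleles = [a for a, _ in pairs]
    b_alleles = [b for _, b in pairs]
    observed = log_table_prob(pairs)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(b_alleles)                      # random assortment, same margins
        simulated = log_table_prob(list(zip(a_alleles, b_alleles)))
        if simulated <= observed + 1e-9:            # tolerance for floating-point ties
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    sample = [("A1", "B1")] * 12 + [("A1", "B2")] * 3 + [("A2", "B1")] * 4 + [("A2", "B2")] * 11
    print(monte_carlo_alpha(sample))
```

Working on the log scale avoids overflow of the factorials for realistic sample sizes.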
The value of α for a given marker pair corresponds to the proportion of all possible tables with the same allele counts (ni. and n.j) that have equal or lower P. In practice, it is usually impossible to enumerate all possible tables. α can then be estimated by simulating such tables under the hypothesis of random assortment and counting the proportion of tables that have equal or lower P than the real sample.
Terwilliger (1995):
Assume that both a sample of chromosomes containing allele D ("case" cohort) and a sample containing the normal (+) allele ("control" cohort) have been ascertained and that both cohorts have been genotyped for the marker of interest.
Assuming that marker allele "1" is associated with "D", one can compute the likelihood of the data as:
where g D1 , g D0 , … are as previously described and ND1, ND0 are the corresponding gamete counts.
Of course, one doesn't know in advance which of the marker alleles is potentially associated with D. This is addressed by computing the overall likelihood:
where Li corresponds to the likelihood of the data assuming that marker allele "i" is associated with D, and pi corresponds to the population frequency of that allele.
This likelihood can then be maximized over the one free parameter, λ, to provide a maximum-likelihood estimate of this association parameter. Under the null hypothesis of no linkage disequilibrium, λ = 0, the statistic:
(since only values of λ >= 0 are admissible, the test is one-sided).
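The likelihood expressions themselves are not reproduced above, so the following Python sketch rests on explicit assumptions rather than on the original formulas: each Li is taken as the product of the conditional marker allele probabilities raised to the corresponding gamete counts in cases and controls; when allele i is the associated one, every other allele j is assumed to have frequency pj (1 - λ) among D chromosomes; the overall likelihood is the sum of pi Li; and the test statistic is taken to be the usual likelihood ratio 2 ln[L(λ̂) / L(0)]. The maximization is a plain grid search, and all function and parameter names are hypothetical.

```python
import math


def conditional_probs(p, p_dis, lam, i):
    """Marker allele probabilities given D and given +, if allele i is associated."""
    given_d = [pj * (1.0 - lam) for pj in p]      # assumption for non-associated alleles
    given_d[i] = p[i] + lam * (1.0 - p[i])        # matches g D1 / pD
    given_wt = [(pj - p_dis * dj) / (1.0 - p_dis) for pj, dj in zip(p, given_d)]
    return given_d, given_wt


def _log_term(count, prob):
    """count * log(prob), treating empty cells and impossible cells safely."""
    if count == 0:
        return 0.0
    return count * math.log(prob) if prob > 0.0 else -math.inf


def log_lik_allele(cases, controls, p, p_dis, lam, i):
    """Log of Li: likelihood of the counts assuming allele i is associated with D."""
    given_d, given_wt = conditional_probs(p, p_dis, lam, i)
    return (sum(_log_term(n, q) for n, q in zip(cases, given_d))
            + sum(_log_term(n, q) for n, q in zip(controls, given_wt)))


def log_lik(cases, controls, p, p_dis, lam):
    """Log of the overall likelihood sum_i p_i * L_i (log-sum-exp for stability)."""
    terms = [math.log(pi) + log_lik_allele(cases, controls, p, p_dis, lam, i)
             for i, pi in enumerate(p)]
    top = max(terms)
    if top == -math.inf:
        return top
    return top + math.log(sum(math.exp(t - top) for t in terms))


def lambda_test(cases, controls, p, p_dis, steps=1000):
    """Grid-search estimate of lambda and the likelihood ratio statistic."""
    grid = [k / steps for k in range(steps)]      # 0 <= lambda < 1
    lam_hat = max(grid, key=lambda lam: log_lik(cases, controls, p, p_dis, lam))
    stat = 2.0 * (log_lik(cases, controls, p, p_dis, lam_hat)
                  - log_lik(cases, controls, p, p_dis, 0.0))
    return lam_hat, stat


if __name__ == "__main__":
    # Made-up counts of three marker alleles on case (D) and control (+) chromosomes
    print(lambda_test([60, 25, 15], [20, 50, 30], p=[0.2, 0.5, 0.3], p_dis=0.01))
```

Here `p` holds the population frequencies of the marker alleles and `p_dis` the frequency of the disease allele; both are treated as known inputs in this sketch.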
Terwilliger (1995) proposes a "brave" multipoint extension (for instance, with M markers) to this single point analysis, by computing:
that is, by multiplying across markers as if these probabilities were independent (which is "brave"). This can be accomplished in a sensible way by replacing λ by α(1-θ)^g, maximizing over the unknown α and g (number of generations) parameters (see above), while fixing the θ parameter for each marker according to its known map position.
An example of the application of this test is shown in the following figure (1). The multipoint approach has proven valuable in several circumstances, but suffers from the fact that the corresponding likelihood ratio statistic does not follow an "orthodox" distribution, so its significance has to be evaluated by permutation.
The level of linkage disequilibrium observed is the net result of opposing forces:
Forces creating linkage disequilibrium:
- Random drift
- Selection (LD involving the loci underlying the genetic variance for the selected trait = "Bulmer effect")
Forces causing linkage disequilibrium decay = recombination:
The level of linkage disequilibrium, D, between a pair of alleles at two loci decays by a fraction θ each generation, where θ is the recombination rate between the two loci (1). This can easily be demonstrated as follows: assume that the frequency of an A1B1 haplotype equals (pq + D) at generation i, where p and q are the frequencies of alleles A1 and B1. The A1B1 haplotypes of the next generation (i+1) result either from A1B1/AxBx meioses without recombination or from A1Bx/AxB1 meioses with recombination. Therefore, the frequency of A1B1 haplotypes at generation (i+1) equals: (pq + D)(1-θ) + pqθ = pq + D(1-θ). Therefore:
D(i+1) = (1-θ) D(i), and after t generations D(t) = (1-θ)^t D(0)
Useful levels of linkage disequilibrium extend over distances that may vary dramatically between populations and species:
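The one-generation argument above implies that D is multiplied by (1 - θ) at every generation, θ being the recombination rate between the two loci. A minimal Python sketch of that recursion follows; the function names are illustrative only.

```python
import math


def next_haplotype_freq(p, q, d, theta):
    """Frequency of A1B1 in the next generation: non-recombinant A1B1 haplotypes
    plus recombinants that bring A1 and B1 together."""
    return (p * q + d) * (1.0 - theta) + p * q * theta


def ld_after(d0, theta, t):
    """Disequilibrium remaining after t generations of recombination at rate theta."""
    return d0 * (1.0 - theta) ** t


def generations_to_halve(theta):
    """Number of generations needed for D to drop to half its initial value."""
    return math.log(0.5) / math.log(1.0 - theta)


if __name__ == "__main__":
    for theta in (0.5, 0.1, 0.01):
        print(theta, ld_after(0.25, theta, 10), generations_to_halve(theta))
```

For unlinked loci (θ = 0.5) D halves every generation, whereas for tightly linked loci it persists for many generations.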
In most human populations, useful levels of LD extend over regions of the order of 10,000 to 100,000 bp.
In Holstein-Friesian dairy cattle, useful levels of LD extend over several tens of centimorgans, and gametic associations between non-syntenic loci are common (1).
The Transmission Disequilibrium Test (TDT) (Spielman et al., 1993)
Finding an association between a marker allele and a disease allele may suggest, but is no proof, that both are "genetically linked". Association in the absence of linkage can arise for a whole series of reasons, including population stratification and, for instance in dairy cattle populations, the fact that gametic associations between non-syntenic loci are common.
The TDT has been designed to distinguish association without linkage from association with (due to?) linkage. Assume that one wants to test the effect of marker allele "1" on the probability of developing a given disease. Rather than simply comparing the frequency of marker allele "1" in a case cohort of affected individuals with that in a control cohort, the TDT consists of sampling affected individuals that have one or both parents heterozygous "1x" for the marker locus. The essence of the TDT is then to test whether these heterozygous parents transmit the "1" allele to their affected offspring more often than expected by chance alone.
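The test statistic is not spelled out above. As a hedged sketch, the TDT is usually computed as a McNemar-type chi-squared statistic with one degree of freedom, (b - c)^2 / (b + c), where b is the number of heterozygous "1x" parents transmitting "1" to an affected offspring and c the number transmitting the other allele. The function name `tdt` and its arguments are illustrative.

```python
import math


def tdt(transmitted, not_transmitted):
    """McNemar-type TDT statistic and its p-value (chi-squared, 1 df).

    transmitted     : heterozygous parents transmitting allele "1" (b)
    not_transmitted : heterozygous parents transmitting the other allele (c)
    """
    b, c = transmitted, not_transmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(tdt(60, 40))
```

Under the null hypothesis each heterozygous parent transmits "1" with probability one half, so b and c are expected to be equal.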