Difference between revisions of "Haplotype"

From Opengenome.net
Line 1: Line 1:
<font size="2"><strong>Haplotype</strong><br /><br />A haplotype is a combination of genotypes on the same chromosome that tend to be inherited as a group.&nbsp; In other words, it is the genotype for a group of genes.<br /><br />[http://www.hapmap.org/ Hapmap Project]<br /><br /></font>
+
<font size="2"><br />
 +
<p>A <strong>haplotype</strong> is the <a title="Genetics" href="http://en.wikipedia.org/wiki/Genetics">genetic</a> constitution of an individual <a title="Chromosome" href="http://en.wikipedia.org/wiki/Chromosome">chromosome</a>. <em>Haplotype</em> may refer to only one <a title="Locus (genetics)" href="http://en.wikipedia.org/wiki/Locus_%28genetics%29">locus</a> or to an entire <a title="Genome" href="http://en.wikipedia.org/wiki/Genome">genome</a>. In the case of <a title="Diploid" href="http://en.wikipedia.org/wiki/Diploid">diploid</a> organisms such as humans, a genome-wide haplotype comprises one member of the pair of <a title="Allele" href="http://en.wikipedia.org/wiki/Allele">alleles</a> for each locus (that is, half of a <a title="Diploid" href="http://en.wikipedia.org/wiki/Diploid">diploid</a> genome). An organism's haplotype is studied using a <a title="Genealogical DNA test" href="http://en.wikipedia.org/wiki/Genealogical_DNA_test">genealogical DNA test</a>. The term <em>haplotype</em> is a <a title="Contraction (linguistics)" href="http://en.wikipedia.org/wiki/Contraction_%28linguistics%29">contraction</a> of <em>&quot;<a title="Ploidy" href="http://en.wikipedia.org/wiki/Ploidy">haploid</a> <a title="Genotype" href="http://en.wikipedia.org/wiki/Genotype">genotype</a>&quot;</em>.</p>
 +
<p>In a second meaning, haplotype is a set of <a title="Single nucleotide polymorphism" href="http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism">single nucleotide polymorphisms</a> (SNPs) on a single <a title="Chromatid" href="http://en.wikipedia.org/wiki/Chromatid">chromatid</a> that are <a title="Association (statistics)" href="http://en.wikipedia.org/wiki/Association_%28statistics%29">statistically associated</a>. It is thought that these associations, and the identification of a few alleles of a haplotype block, can unambiguously identify all other polymorphic sites in its region. Such information is very valuable for investigating the genetics behind common <a title="Diseases" href="http://en.wikipedia.org/wiki/Diseases">diseases</a> and is collected by the <a title="International HapMap Project" href="http://en.wikipedia.org/wiki/International_HapMap_Project"><font color="#800080">International HapMap Project</font></a>.</p>
 +
<p>
 +
 +
<a id="Relation_to_genotypes" name="Relation_to_genotypes"></a></p>
 +
<h2><span class="mw-headline">Relation to genotypes</span></h2>
 +
<p>An organism's genotype may not uniquely define its haplotype. For example, consider two <a title="Locus (genetics)" href="http://en.wikipedia.org/wiki/Locus_%28genetics%29">loci</a> on the same chromosome, each locus with two possible alleles: the first locus being either <em>A</em> or <em>a</em>, the second locus being <em>B</em> or <em>b</em>. If the organism's genotype is <em>AaBb</em>, there are two possible sets of haplotypes, corresponding to which pairs occur on the same <a title="Chromosome" href="http://en.wikipedia.org/wiki/Chromosome">chromosome</a>:</p>
 +
<table>
 +
    <tbody>
 +
        <tr>
 +
            <th>&nbsp;</th>
 +
            <th>haplotype at<br />
 +
            chromosome 1</th>
 +
            <th>haplotype at<br />
 +
            chromosome 2</th>
 +
        </tr>
 +
        <tr>
 +
            <td>haplotype set 1</td>
 +
            <td align="center"><em>AB</em></td>
 +
            <td align="center"><em>ab</em></td>
 +
        </tr>
 +
        <tr>
 +
            <td>haplotype set 2</td>
 +
            <td align="center"><em>Ab</em></td>
 +
            <td align="center"><em>aB</em></td>
 +
        </tr>
 +
    </tbody>
 +
</table>
 +
<p>In this case, more information is required to determine which particular set of haplotypes occur in the organism (i.e. which alleles appear on the same chromosome).</p>
 +
<p>Given the genotypes of a number of individuals, the haplotypes can be inferred by haplotype resolution or haplotype phasing techniques. These methods rely on the observation that certain haplotypes are common in certain genomic regions. Therefore, given a set of possible haplotype resolutions, they choose those that use fewer distinct haplotypes overall. The specifics vary: some methods are based on combinatorial approaches (e.g., <a title="Parsimony" href="http://en.wikipedia.org/wiki/Parsimony">parsimony</a>), while others use likelihood functions based on different models and assumptions, such as the <a title="Hardy-Weinberg principle" href="http://en.wikipedia.org/wiki/Hardy-Weinberg_principle"><font color="#800080">Hardy-Weinberg principle</font></a>, the <a title="Coalescent theory" href="http://en.wikipedia.org/wiki/Coalescent_theory">coalescent theory</a> model, or perfect phylogeny. These models are combined with optimization algorithms such as the <a title="Expectation-maximization algorithm" href="http://en.wikipedia.org/wiki/Expectation-maximization_algorithm">expectation-maximization algorithm</a> (EM) or <a title="Markov chain Monte Carlo" href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov chain Monte Carlo</a> (MCMC).</p>
 +
<p><a id="Y-DNA_haplotypes_from_genealogical_DNA_tests" name="Y-DNA_haplotypes_from_genealogical_DNA_tests"></a></p>
 +
<h2><span class="mw-headline">Y-DNA haplotypes from genealogical DNA tests</span></h2>
 +
<dl><dd>
 +
<div class="noprint"><em>Main article: <a title="Genealogical DNA test" href="http://en.wikipedia.org/wiki/Genealogical_DNA_test">Genealogical DNA test</a></em></div>
 +
</dd></dl>
 +
<p>Unlike other chromosomes, Y chromosomes do not come in pairs: every human male has only one copy. There is therefore no lottery over which copy is inherited and, for much of the chromosome, no shuffling between copies by <a title="Recombination" href="http://en.wikipedia.org/wiki/Recombination">recombination</a>. Unlike <a title="Autosomal DNA" href="http://en.wikipedia.org/wiki/Autosomal_DNA">autosomal</a> haplotypes, the Y-chromosome haplotype is thus effectively not randomised between generations, and a human male should largely share the same Y chromosome as his father, give or take a few mutations.</p>
 +
<p>In particular, the numbered results of a <a title="Genealogical DNA test" href="http://en.wikipedia.org/wiki/Genealogical_DNA_test">Y-DNA genealogical DNA test</a> should match between father and son, barring mutations. Within genealogical and popular discussion this set of results is sometimes referred to as the &quot;DNA signature&quot; of a particular male human, or of his paternal bloodline.</p>
 +
<p><a id="UEP_results_.28SNP_results.29" name="UEP_results_.28SNP_results.29"></a></p>
 +
<h3><span class="mw-headline">UEP results (SNP results)</span></h3>
 +
<p>The results which make up the full Y-DNA haplotype from the Y chromosome DNA test can be divided into two parts: the results for <a title="Unique event polymorphism" href="http://en.wikipedia.org/wiki/Unique_event_polymorphism">unique event polymorphisms</a> (UEPs), sometimes loosely called the SNP results as most UEPs are <a title="Single nucleotide polymorphisms" href="http://en.wikipedia.org/wiki/Single_nucleotide_polymorphisms">single nucleotide polymorphisms</a>; and the results for <a title="Microsatellite" href="http://en.wikipedia.org/wiki/Microsatellite">microsatellite</a> <a title="Short tandem repeat" href="http://en.wikipedia.org/wiki/Short_tandem_repeat">short tandem repeat</a> sequences (<a title="Y-STR" href="http://en.wikipedia.org/wiki/Y-STR">Y-STRs</a>), often designated by <a title="DYS (DNA)" href="http://en.wikipedia.org/wiki/DYS_%28DNA%29">DYS numbers</a>.</p>
 +
<p>The UEP results reflect the inheritance of events that are assumed to have happened only once in all human history. These can be used to directly identify the individual's <a title="Human Y-chromosome DNA haplogroups" href="http://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroups">Y-DNA haplogroup</a>, his place on the broad family tree of the whole of humanity. Different Y-DNA haplogroups identify genetic populations that are often closely associated with particular geographical regions, reflecting the migrations of current individuals' direct <a title="Patrilineal" href="http://en.wikipedia.org/wiki/Patrilineal">patrilineal</a> ancestors tens of thousands of years ago.</p>
 +
<p><a id="Y-STR_haplotypes" name="Y-STR_haplotypes"></a></p>
 +
<h3><span class="mw-headline">Y-STR haplotypes</span></h3>
 +
<p>The other possible part of the genetic results is the <strong>Y-STR haplotype</strong>, the set of results from the Y-STR markers tested.</p>
 +
<p>Unlike the UEPs, the Y-STRs mutate much more easily, which gives them much more resolution to distinguish recent genealogy. But it also means that, rather than the population of descendants of a genetic event all sharing the <em>same</em> result, the Y-STR haplotypes are likely to have spread apart to form a <em>cluster</em> of more or less similar results. Typically, this cluster will have a definite most probable centre, the <strong>modal haplotype</strong> (presumably close to the haplotype of the original founding event), and also a <strong>haplotype diversity</strong>, the degree to which it has become spread out. The longer ago the defining event occurred, and the more of the subsequent population growth occurred early, the greater the haplotype diversity will be for a particular number of descendants. Conversely, if the haplotype diversity is smaller for a particular number of descendants, this may indicate a more recent common ancestor, or that a population expansion has occurred more recently.</p>
 +
<p>Importantly, unlike for UEPs, there is no guarantee that two individuals with a similar Y-STR haplotype will necessarily share a similar ancestry. There is no uniqueness about Y-STR events. Instead, the clusters of Y-STR haplotype results inheriting from different events and different histories all tend to overlap.</p>
 +
<p>Thus, although a Y-STR haplotype may sometimes be directly indicative of a particular Y-DNA haplogroup, in most cases a long time has passed since the haplogroup's defining event. The cluster of Y-STR haplotype results associated with descendants of that event has therefore typically become rather broad, and tends to overlap significantly with the (similarly broad) clusters of Y-STR haplotypes associated with other haplogroups. This makes it impossible to predict with absolute certainty which Y-DNA haplogroup a Y-STR haplotype points to. If the UEPs are not actually tested, all that can be done from the Y-STRs is to predict probabilities for haplogroup ancestry (as this <a class="external text" title="https://home.comcast.net/~whitathey/hapest5/" href="https://home.comcast.net/~whitathey/hapest5/" rel="nofollow">online program</a> does), not certainties.</p>
 +
<p>The same applies to surnames. A cluster of similar Y-STR haplotypes may indicate a shared common ancestor, with an identifiable modal haplotype, but only if the cluster is sufficiently distinct from what may have arisen by chance from different individuals historically having adopted the same name independently. Establishing this may require typing quite an extensive haplotype, which has prompted DNA testing companies to offer ever larger sets of markers: 24, then 37, then 63, and perhaps soon even more.</p>
 +
<p>Plausibly establishing relatedness between different surnames data-mined from a database is significantly harder, because now one must establish not that a <em>randomly selected</em> member of the population is unlikely to have such a close match by accident; but rather that the <em>very nearest</em> member of the population in question, chosen purposely from the population for that very reason, would even under those circumstances be unlikely to match by accident. This is for the foreseeable future likely to be impossible except in special cases where there is further information to drastically limit the size of that population of candidates under consideration.</p>
 +
<p><a id="See_also" name="See_also"></a></p>
 +
<h2><span class="mw-headline">See also</span></h2>
 +
<ul>
 +
    <li><a title="International HapMap Project" href="http://en.wikipedia.org/wiki/International_HapMap_Project"><font color="#800080">International HapMap Project</font></a> </li>
 +
    <li><a title="Genealogical DNA test" href="http://en.wikipedia.org/wiki/Genealogical_DNA_test">genealogical DNA test</a> </li>
 +
    <li><a title="Haplogroup" href="http://en.wikipedia.org/wiki/Haplogroup">Haplogroup</a> </li>
 +
</ul>
 +
<p><a id="External_links" name="External_links"></a></p>
 +
<h2><span class="mw-headline">External links</span></h2>
 +
<ul>
 +
    <li><a class="external text" title="http://genome.wellcome.ac.uk/" href="http://genome.wellcome.ac.uk/" rel="nofollow">The Wellcome Trust</a> &mdash; Haplotype mapping </li>
 +
    <li><a class="external text" title="http://ihap.bii.a-star.edu.sg" href="http://ihap.bii.a-star.edu.sg/" rel="nofollow">The integrated Haplotype Analysis Pipeline (iHAP)</a> </li>
 +
</ul>
 +
 +
<div class="printfooter">Retrieved from &quot;<a href="http://en.wikipedia.org/wiki/Haplotype"><font color="#800080">http://en.wikipedia.org/wiki/Haplotype</font></a>&quot;</div>
 +
<br />
 +
<br />
 +
</font>

Revision as of 07:53, 25 March 2007


A haplotype is the genetic constitution of an individual chromosome. Haplotype may refer to only one locus or to an entire genome. In the case of diploid organisms such as humans, a genome-wide haplotype comprises one member of the pair of alleles for each locus (that is, half of a diploid genome). An organism's haplotype is studied using a genealogical DNA test. The term haplotype is a contraction of "haploid genotype".

In a second meaning, haplotype is a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated. It is thought that these associations, and the identification of a few alleles of a haplotype block, can unambiguously identify all other polymorphic sites in its region. Such information is very valuable for investigating the genetics behind common diseases and is collected by the International HapMap Project.


Relation to genotypes

An organism's genotype may not uniquely define its haplotype. For example, consider two loci on the same chromosome, each locus with two possible alleles: the first locus being either A or a, the second locus being B or b. If the organism's genotype is AaBb, there are two possible sets of haplotypes, corresponding to which pairs occur on the same chromosome:

                   haplotype at    haplotype at
                   chromosome 1    chromosome 2
haplotype set 1    AB              ab
haplotype set 2    Ab              aB

In this case, more information is required to determine which particular set of haplotypes occur in the organism (i.e. which alleles appear on the same chromosome).
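The ambiguity in the AaBb example can be made concrete with a short Python sketch. It enumerates every pair of haplotypes compatible with an unphased genotype; the function name and the encoding of a genotype as a list of two-character strings (one per locus) are assumptions made for illustration only.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of 2-character strings, one per locus, e.g. ["Aa", "Bb"]
    (an illustrative encoding, not a standard file format).
    """
    pairs = set()
    # For each locus, choose which of the two alleles sits on the first chromosome;
    # the other allele then sits on the second chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[c] for g, c in zip(genotype, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        # Sort within the pair so that (AB, ab) and (ab, AB) count as the same set.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

print(compatible_haplotype_pairs(["Aa", "Bb"]))
# [('AB', 'ab'), ('Ab', 'aB')]
```

A genotype that is heterozygous at no more than one locus yields a single pair, which is why phase only becomes ambiguous once two or more loci are heterozygous.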

Given the genotypes of a number of individuals, the haplotypes can be inferred by haplotype resolution or haplotype phasing techniques. These methods rely on the observation that certain haplotypes are common in certain genomic regions. Therefore, given a set of possible haplotype resolutions, they choose those that use fewer distinct haplotypes overall. The specifics vary: some methods are based on combinatorial approaches (e.g., parsimony), while others use likelihood functions based on different models and assumptions, such as the Hardy-Weinberg principle, the coalescent theory model, or perfect phylogeny. These models are combined with optimization algorithms such as the expectation-maximization algorithm (EM) or Markov chain Monte Carlo (MCMC).
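As a rough illustration of the parsimony idea only, the following Python sketch tries every combination of compatible haplotype pairs across a few individuals and keeps the combination that uses the fewest distinct haplotypes. It is a brute-force toy under assumed names and an assumed genotype encoding; practical phasing tools use far more scalable algorithms and statistical models.

```python
from itertools import product

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with one unphased genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[c] for g, c in zip(genotype, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

def parsimony_phase(genotypes):
    """Pick one haplotype pair per individual so that the total number of
    distinct haplotypes is minimal (exhaustive search, small inputs only)."""
    options = [compatible_pairs(g) for g in genotypes]
    best_resolution, best_distinct = None, None
    for resolution in product(*options):
        distinct = {hap for pair in resolution for hap in pair}
        if best_distinct is None or len(distinct) < len(best_distinct):
            best_resolution, best_distinct = resolution, distinct
    return list(best_resolution), sorted(best_distinct)

# Hypothetical sample: two homozygous individuals and one double heterozygote.
sample = [["AA", "BB"], ["aa", "bb"], ["Aa", "Bb"]]
resolution, haplotypes = parsimony_phase(sample)
print(resolution)   # phased pair chosen for each individual
print(haplotypes)   # distinct haplotypes used overall
```

In this made-up sample the homozygous individuals already establish AB and ab as haplotypes present in the population, so the double heterozygote is resolved as AB/ab rather than Ab/aB, which would introduce two additional haplotypes.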

Y-DNA haplotypes from genealogical DNA tests

Main article: Genealogical DNA test

Unlike other chromosomes, Y chromosomes do not come in pairs: every human male has only one copy. There is therefore no lottery over which copy is inherited and, for much of the chromosome, no shuffling between copies by recombination. Unlike autosomal haplotypes, the Y-chromosome haplotype is thus effectively not randomised between generations, and a human male should largely share the same Y chromosome as his father, give or take a few mutations.

In particular, the numbered results of a Y-DNA genealogical DNA test should match between father and son, barring mutations. Within genealogical and popular discussion this set of results is sometimes referred to as the "DNA signature" of a particular male human, or of his paternal bloodline.

UEP results (SNP results)

The results which make up the full Y-DNA haplotype from the Y chromosome DNA test can be divided into two parts: the results for unique event polymorphisms (UEPs), sometimes loosely called the SNP results as most UEPs are single nucleotide polymorphisms; and the results for microsatellite short tandem repeat sequences (Y-STRs), often designated by DYS numbers.

The UEP results reflect the inheritance of events that are assumed to have happened only once in all human history. These can be used to directly identify the individual's Y-DNA haplogroup, his place on the broad family tree of the whole of humanity. Different Y-DNA haplogroups identify genetic populations that are often closely associated with particular geographical regions, reflecting the migrations of current individuals' direct patrilineal ancestors tens of thousands of years ago.

Y-STR haplotypes

The other possible part of the genetic results is the Y-STR haplotype, the set of results from the Y-STR markers tested.

Unlike the UEPs, the Y-STRs mutate much more easily, which gives them much more resolution to distinguish recent genealogy. But it also means that, rather than the population of descendants of a genetic event all sharing the same result, the Y-STR haplotypes are likely to have spread apart to form a cluster of more or less similar results. Typically, this cluster will have a definite most probable centre, the modal haplotype (presumably close to the haplotype of the original founding event), and also a haplotype diversity, the degree to which it has become spread out. The longer ago the defining event occurred, and the more of the subsequent population growth occurred early, the greater the haplotype diversity will be for a particular number of descendants. Conversely, if the haplotype diversity is smaller for a particular number of descendants, this may indicate a more recent common ancestor, or that a population expansion has occurred more recently.
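The two cluster summaries can be sketched in Python. The modal haplotype is computed here as the most common repeat count at each marker. For haplotype diversity the text gives no formula, so the sketch assumes one commonly used estimator, n/(n-1) * (1 - sum of squared haplotype frequencies). The repeat counts are invented for illustration.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Most common repeat count at each marker, taken marker by marker."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*haplotypes))

def haplotype_diversity(haplotypes):
    """Assumed estimator: n/(n-1) * (1 - sum(p_i^2)), where p_i is the
    sample frequency of the i-th distinct haplotype. Requires n >= 2."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

# Hypothetical Y-STR results: each tuple is one man's repeat counts at three markers.
cluster = [
    (13, 24, 14),
    (13, 24, 14),
    (13, 25, 14),
    (13, 24, 15),
    (14, 24, 14),
]
print(modal_haplotype(cluster))      # (13, 24, 14)
print(haplotype_diversity(cluster))  # close to 1 when most haplotypes differ
```

A cluster in which every man carries the same haplotype gives a diversity of 0 under this estimator, and a cluster in which every haplotype is different gives 1.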

Importantly, unlike for UEPs, there is no guarantee that two individuals with a similar Y-STR haplotype will necessarily share a similar ancestry. There is no uniqueness about Y-STR events. Instead, the clusters of Y-STR haplotype results inheriting from different events and different histories all tend to overlap.

Thus, although a Y-STR haplotype may sometimes be directly indicative of a particular Y-DNA haplogroup, in most cases a long time has passed since the haplogroup's defining event. The cluster of Y-STR haplotype results associated with descendants of that event has therefore typically become rather broad, and tends to overlap significantly with the (similarly broad) clusters of Y-STR haplotypes associated with other haplogroups. This makes it impossible to predict with absolute certainty which Y-DNA haplogroup a Y-STR haplotype points to. If the UEPs are not actually tested, all that can be done from the Y-STRs is to predict probabilities for haplogroup ancestry (as this online program does), not certainties.
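One way to see why the Y-STRs yield only probabilities is a small Bayesian sketch in Python. It assumes that repeat-count frequencies at each marker are known for each haplogroup and that markers are independent. The haplogroup labels, marker names, and frequencies below are all invented, and this is not a description of how the online program mentioned above works.

```python
def haplogroup_posteriors(haplotype, allele_freqs, priors):
    """Posterior probability of each haplogroup given a Y-STR haplotype.

    haplotype:    dict marker -> observed repeat count
    allele_freqs: dict haplogroup -> marker -> repeat count -> frequency
    priors:       dict haplogroup -> prior probability
    Markers are treated as independent (a simplifying assumption).
    """
    scores = {}
    for group, marker_freqs in allele_freqs.items():
        likelihood = 1.0
        for marker, repeats in haplotype.items():
            likelihood *= marker_freqs[marker].get(repeats, 0.0)
        scores[group] = priors[group] * likelihood
    total = sum(scores.values())
    if total == 0:
        raise ValueError("haplotype has zero probability under every haplogroup")
    return {group: score / total for group, score in scores.items()}

# Invented frequencies for two hypothetical haplogroups and two hypothetical markers.
freqs = {
    "HG1": {"M1": {13: 0.7, 14: 0.3}, "M2": {24: 0.6, 25: 0.4}},
    "HG2": {"M1": {13: 0.4, 14: 0.6}, "M2": {24: 0.5, 25: 0.5}},
}
priors = {"HG1": 0.5, "HG2": 0.5}
print(haplogroup_posteriors({"M1": 13, "M2": 24}, freqs, priors))
```

Because the invented frequency distributions overlap, the observed haplotype is more probable under HG1 but remains quite possible under HG2, mirroring the overlapping clusters described above.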

The same applies to surnames. A cluster of similar Y-STR haplotypes may indicate a shared common ancestor, with an identifiable modal haplotype, but only if the cluster is sufficiently distinct from what may have arisen by chance from different individuals historically having adopted the same name independently. Establishing this may require typing quite an extensive haplotype, which has prompted DNA testing companies to offer ever larger sets of markers: 24, then 37, then 63, and perhaps soon even more.

Plausibly establishing relatedness between different surnames data-mined from a database is significantly harder, because now one must establish not that a randomly selected member of the population is unlikely to have such a close match by accident; but rather that the very nearest member of the population in question, chosen purposely from the population for that very reason, would even under those circumstances be unlikely to match by accident. This is for the foreseeable future likely to be impossible except in special cases where there is further information to drastically limit the size of that population of candidates under consideration.
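The difference between a randomly selected member and the nearest member of a database can be illustrated numerically. The Python sketch below assumes that each database entry matches by chance independently with the same probability p, so that the chance of at least one accidental match among N entries is 1 - (1 - p)^N. The probability and database sizes are made-up values.

```python
def prob_any_chance_match(p_single, database_size):
    """Probability that at least one of `database_size` unrelated entries
    matches by chance, assuming independent entries with equal match
    probability `p_single` (a simplifying assumption)."""
    return 1.0 - (1.0 - p_single) ** database_size

p = 0.001  # hypothetical chance-match probability for one random individual
for size in (1, 100, 10_000):
    print(size, prob_any_chance_match(p, size))
```

With these made-up numbers a match that would be surprising for one randomly chosen individual becomes nearly certain to occur somewhere in a database of 10,000, which is why the best match found by searching is much weaker evidence of relatedness.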

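See also

International HapMap Project
genealogical DNA test
Haplogroup

External links

The Wellcome Trust (http://genome.wellcome.ac.uk/): Haplotype mapping
The integrated Haplotype Analysis Pipeline (iHAP) (http://ihap.bii.a-star.edu.sg/)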