Evaluation of the DAPC paper by Laurent Excoffier on the Faculty of 1000:
This paper elegantly combines a series of existing multivariate methods to detect the genetic structure of populations from genomic data. I found the validation of the methodology extremely convincing, as it shows that one can clearly recover very complex patterns and even hierarchical genetic structures.
The authors propose the analysis of large genomic data sets by combining three model-free multivariate analysis techniques. The idea is to perform a discriminant analysis (DA) on genomic data, after the main axes of variation have been extracted by a principal component (PC) analysis, and genetic clusters of individuals have been identified by a sequential K-means and model selection method. The first use of a PC analysis solves two problems preventing the normal use of DA on genomic data. It reduces the number of genetic dimensions to a number smaller than that of the sampled individuals, and it ensures that information provided by each dimension is uncorrelated, thus removing potential effects of linkage disequilibrium. The Discriminant Analysis of Principal Components (DAPC) is blazingly fast and allows one to nicely visualize the respective positions of clusters and individuals in a reduced number of dimensions. In a few simulated cases, the resulting clustering of the populations matched exactly the underlying genetic structure, even in complex cases with isolation by distance, or when there was a hierarchical structure of the clusters, which is difficult to recover by other multivariate analyses (see e.g. {1}).
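The three steps described above (PC analysis, K-means with model selection, then DA on the retained PCs) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the simulated genotypes, the choice of 10 retained PCs, and the simple BIC form n*log(WSS/n) + k*log(n) are all assumptions made for illustration only.

```python
import numpy as np

def pca_scores(X, n_pcs):
    """Scores of the rows of X on the first n_pcs principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * s[:n_pcs]

def kmeans(Z, k, n_starts=50, n_iter=100, seed=0):
    """Lloyd's K-means with random restarts; returns (within-cluster SS, labels)."""
    rng = np.random.default_rng(seed)
    best_wss, best_labels = np.inf, None
    for _ in range(n_starts):
        centres = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # An empty cluster keeps its previous centre.
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        wss = d2[np.arange(len(Z)), labels].sum()
        if wss < best_wss:
            best_wss, best_labels = wss, labels
    return best_wss, best_labels

def discriminant_scores(Z, labels):
    """Project Z onto the axes maximising between- relative to within-group scatter."""
    groups = np.unique(labels)
    mean = Z.mean(axis=0)
    p = Z.shape[1]
    W = np.zeros((p, p))  # within-group scatter
    B = np.zeros((p, p))  # between-group scatter
    for g in groups:
        Zg = Z[labels == g]
        mg = Zg.mean(axis=0)
        W += (Zg - mg).T @ (Zg - mg)
        B += len(Zg) * np.outer(mg - mean, mg - mean)
    # Whiten with the Cholesky factor of W, then solve a symmetric eigenproblem.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    M = L_inv @ B @ L_inv.T
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(vals)[::-1][:min(p, len(groups) - 1)]
    return (Z - mean) @ (L_inv.T @ vecs[:, order])

# Hypothetical data: 3 populations x 30 individuals, 200 biallelic loci (0/1/2).
rng = np.random.default_rng(1)
true_pop = np.repeat(np.arange(3), 30)
freqs = rng.uniform(0.1, 0.9, size=(3, 200))
X = rng.binomial(2, freqs[true_pop]).astype(float)

# Step 1: PCA. With 200 loci and 90 individuals, DA on X itself would need
# the inverse of a singular within-group matrix; 10 uncorrelated PCs avoid this.
Z = pca_scores(X, n_pcs=10)

# Step 2: K-means for K = 1..6, compared with an assumed BIC-like criterion.
n = len(Z)
bic = {}
fits = {}
for k in range(1, 7):
    wss, lab = kmeans(Z, k)
    fits[k] = lab
    bic[k] = n * np.log(wss / n) + k * np.log(n)
best_k = min(bic, key=bic.get)
labels = fits[best_k]

# Step 3: DA on the retained PCs, using the inferred clusters as groups.
scores = discriminant_scores(Z, labels)
print("selected K:", best_k, "| discriminant scores shape:", scores.shape)
```

The rows of `scores` are the coordinates one would plot to visualize clusters and individuals in a reduced number of dimensions. The number of retained PCs and the criterion used to choose K both affect the result, so in practice they would need to be checked rather than fixed as here.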
Among a growing number of new multivariate techniques {1}, DAPC seems to be a very simple, robust, and useful tool for the finding and the visualization of clusters of individuals. When such clusters make little sense, like in admixture analysis for instance, a simple PC analysis may seem more appropriate {2}. While being a model-free approach, DAPC efficiency in recovering true genetic structure may be due to the fact that principal components are actually related to demographic parameters via coalescence times {3}, making the distinction between model-based and model-free approaches more subtle than previously anticipated.
References:
{1} Engelhardt and Stephens, PLoS Genet 2010, 6:e1001117 [PMID:20862358].
{2} Bryc et al., Proc Natl Acad Sci U S A 2010, 107(2):786-91 [PMID:20080753].
{3} McVean, PLoS Genet 2009, 5:e1000686 [PMID:19834557].