Evaluation of the DAPC paper by Laurent Excoffier on the Faculty of 1000:

This paper elegantly combines a series of existing multivariate methods to detect the genetic structure of populations from genomic data. I found the validation of the methodology extremely convincing, as it shows that one can clearly recover very complex patterns and even hierarchical genetic structures.

The authors propose analyzing large genomic data sets by combining three model-free multivariate techniques. The idea is to perform a discriminant analysis (DA) on genomic data after the main axes of variation have been extracted by a principal component (PC) analysis and genetic clusters of individuals have been identified by a sequential K-means and model selection method. Running the PC analysis first solves two problems that prevent the normal use of DA on genomic data: it reduces the number of genetic dimensions to fewer than the number of sampled individuals, and it ensures that the information provided by each dimension is uncorrelated, thus removing potential effects of linkage disequilibrium. The Discriminant Analysis of Principal Components (DAPC) is blazingly fast and allows one to visualize the respective positions of clusters and individuals in a reduced number of dimensions. In a few simulated cases, the resulting clustering of the populations matched the underlying genetic structure exactly, even in complex cases with isolation by distance or with a hierarchical structure of the clusters, which is difficult to recover by other multivariate analyses (see e.g. {1}).
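The two-step idea described above (PC analysis first, then DA on the retained components) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, assumes that group labels are already available (the K-means and model selection step is omitted), and uses hypothetical names (dapc_sketch, simulate_genotypes) and simulated allele counts purely for illustration.

```python
import numpy as np


def dapc_sketch(X, groups, n_pcs):
    """Hypothetical helper: PCA, then discriminant analysis on the PC scores.

    X      : (individuals x loci) array, e.g. allele counts
    groups : group label for each individual (assumed known here)
    n_pcs  : number of principal components to retain
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)

    # Step 1: PCA via SVD of the column-centered data. The retained scores
    # are mutually uncorrelated and fewer than the number of individuals.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pcs] * s[:n_pcs]

    # Step 2: within-group (W) and between-group (B) scatter of the scores.
    grand_mean = scores.mean(axis=0)
    W = np.zeros((n_pcs, n_pcs))
    B = np.zeros((n_pcs, n_pcs))
    for g in labels:
        Z = scores[groups == g]
        mu = Z.mean(axis=0)
        W += (Z - mu).T @ (Z - mu)
        d = (mu - grand_mean)[:, None]
        B += len(Z) * (d @ d.T)

    # Step 3: solve B a = lambda W a. Whitening with the Cholesky factor
    # of W turns this into an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(W)
    L_inv = np.linalg.inv(L)
    evals, vecs = np.linalg.eigh(L_inv @ B @ L_inv.T)  # ascending order

    # Keep at most (number of groups - 1) discriminant functions.
    n_da = min(n_pcs, len(labels) - 1)
    order = np.argsort(evals)[::-1][:n_da]
    axes = L_inv.T @ vecs[:, order]
    return scores @ axes, evals[order]


def simulate_genotypes(seed=0, n_groups=3, n_per_group=20, n_loci=200):
    """Hypothetical toy data: allele counts (0, 1, 2) drawn from
    group-specific allele frequencies. Not data from the paper."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(n_groups, n_loci))
    labels = np.repeat(np.arange(n_groups), n_per_group)
    genotypes = rng.binomial(2, freqs[labels])
    return genotypes, labels


X, labels = simulate_genotypes(seed=0)
coords, evals = dapc_sketch(X, labels, n_pcs=5)
# coords holds one row per individual and one column per discriminant function.
```

In this toy setting there are more loci (200) than individuals (60), so DA could not be applied to the raw data directly. Retaining 5 PCs keeps the within-group scatter matrix invertible, which is the dimensionality point made above. The number of retained PCs is a free choice in this sketch and should stay small relative to the number of individuals.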

Among a growing number of new multivariate techniques {1}, DAPC seems to be a very simple, robust, and useful tool for finding and visualizing clusters of individuals. When such clusters make little sense, as in admixture analysis for instance, a simple PC analysis may seem more appropriate {2}. Although DAPC is a model-free approach, its efficiency in recovering true genetic structure may be due to the fact that principal components are actually related to demographic parameters via coalescence times {3}, making the distinction between model-based and model-free approaches more subtle than previously anticipated.

References:
{1} Engelhardt and Stephens. PLoS Genet 2010, 6:e1001117 [PMID:20862358].
{2} Bryc et al. Proc Natl Acad Sci U S A 2010, 107(2):786-91 [PMID:20080753].
{3} McVean G. PLoS Genet 2009, 5:e1000686 [PMID:19834557].

Competing interests: None declared

To cite this evaluation

Excoffier L: "This paper elegantly combines a series of existing multivariate methods to detect the genetic structure..." Evaluation of: [Jombart T et al. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 2010; 11:94; doi: 10.1186/1471-2156-11-94]. Faculty of 1000, 14 Dec 2010. F1000.com/6949956

Short form
Excoffier L: 2010. F1000.com/6949956