Genetic diversity provides the variation on which selection can act, and it determines how a population responds to environmental change. Knowledge of genetic diversity is therefore crucial for developing breeding strategies and for managing endangered species. Understanding the determinants of genetic diversity is also essential for resolving a long-standing riddle in population genetics: the disproportionately narrow range of genetic diversity compared with the orders-of-magnitude differences in population size across the tree of life, known as Lewontin’s Paradox (Lewontin, 1974).
Genetic polymorphism varies across the genome and is thought to result from the balance between mutational input, allele loss, and fixation. Diversity is thus partly explained by differences in mutation rate (such as the elevated transition rate in CpG-rich regions of mammalian genomes (Hodgkinson & Walker, 2011)), but it can also be understood in terms of variance in effective population size (Ne). This essay reviews the determinants of genomic diversity, with a primary focus on the effects of linked selection, life history, genomic architecture and mode of reproduction, and on how each contributes to the dynamics of Ne. Briefly, effective population size governs both neutral site diversity and the effectiveness of selection, which together shape overall genome polymorphism (Charlesworth, 2009).
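The dependence of neutral diversity on Ne can be made concrete with the standard neutral expectation for a diploid population, θ = 4Neμ. This formula is not stated in the essay and is introduced here as an assumption. The following is a minimal sketch: the function name is hypothetical, and the single mutation rate applied to all three species is purely illustrative. The Ne values are the ones quoted later in this essay.

```python
def expected_neutral_diversity(ne: float, mu: float) -> float:
    """Expected neutral diversity for a diploid population: theta = 4 * Ne * mu."""
    if ne <= 0 or mu < 0:
        raise ValueError("ne must be positive and mu must be non-negative")
    return 4.0 * ne * mu


# Hypothetical per-site, per-generation mutation rate (assumption, not from the essay).
MU = 1e-8

# Ne values as quoted in this essay; Drosophila is given only as "over a million".
for label, ne in [("human", 10_400), ("C. elegans", 80_000), ("Drosophila", 1_000_000)]:
    theta = expected_neutral_diversity(ne, MU)
    print(f"{label}: Ne = {ne}, expected theta = {theta:.2e}")
```

Under this simple model diversity scales linearly with Ne, so a narrow observed range of diversity despite very large differences in population size is what makes Lewontin’s Paradox a paradox.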
Linked Selection Tuned by Recombination Rate Affects Genetic Diversity
When a new beneficial mutation emerges, it can sweep to fixation, carrying linked genomic regions (including neutral loci) along with it. Polymorphism in the surrounding region is thereby reduced, which is equivalent to a local reduction in effective population size within the genome. This phenomenon, called a hard sweep, leaves a signature of excess low-frequency variants, or a negative skew in the site frequency spectrum. Gossman et al. (2011) suggested that in a population with large Ne, random fluctuations in allele frequency due to episodic hard sweeps (genetic draft) contribute more to neutral diversity than genetic drift does.
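One common way to quantify an excess of low-frequency variants is Tajima's D, which compares pairwise diversity with the diversity expected from the number of segregating sites. The essay does not name this statistic, so its use here is an assumption. The following is a minimal sketch with a hypothetical function name and made-up input spectra. It expects an unfolded site frequency spectrum in which entry i holds the number of sites where the derived allele appears i times in the sample.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS, where sfs[i-1] = sites with derived count i."""
    n = len(sfs) + 1  # sample size (number of sequences)
    s = sum(sfs)      # number of segregating sites
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")

    # Mean pairwise differences (pi) computed from the spectrum.
    pi = sum(c * 2 * i * (n - i) for i, c in enumerate(sfs, start=1)) / (n * (n - 1))

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Made-up spectra for a sample of 5 sequences.
neutral_like = [12, 6, 4, 3]  # counts proportional to 1/i, the neutral expectation
sweep_like = [25, 0, 0, 0]    # all variants are singletons
print(tajimas_d(neutral_like), tajimas_d(sweep_like))
```

A spectrum dominated by singletons gives a negative D, which is the direction of skew described above, while a spectrum proportional to 1/i gives a D of approximately zero.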
The hard sweep model assumes that adaptation is limited by the input of new beneficial mutations (mutation-limited adaptation), which might not be common in nature (Pritchard et al., 2010), as adaptation can also draw on standing genetic variation. A soft sweep is a relaxed version of the sweep in which adaptive evolution uses variation already present in the population rather than a new mutation. In this type of sweep, fixation occurs on a more diverse set of genetic backgrounds, so linked selection does not leave as strong an imprint as a hard sweep.
Early work examining the evolutionary forces responsible for linked selection focused on Drosophila, which has a large effective population size (over a million; Charlesworth, 2009). Working with Drosophila genomes, Sattath et al. (2011) showed that, after controlling for mutation rate using species divergence, there is a deep trough in neutral diversity around non-synonymous substitutions but not around synonymous substitutions, which is substantial evidence for the action of selective sweeps.
Interestingly, following observations from organisms with lower Ne, background selection is increasingly appreciated as a driver of linked selection. Background selection is the counterpart of the sweep, in which negative selection against deleterious mutations also removes variation at adjacent neutral loci (Charlesworth et al., 1993). For example, analyses in Caenorhabditis elegans (Ne: 80000) and in humans (Ne: 10400) indicated that the effect of linked selection is the outcome of both hitchhiking and background selection (Andersen et al., 2012; Hernandez et al., 2011). Stephan (2010) argued that separating the effects of background selection and hitchhiking might rest on a false dichotomy, as most mutational input is deleterious and background selection therefore operates perpetually. Lohmueller et al. (2011) suggested including background selection in the null model of molecular evolution.
The effect of linked selection depends on the level of genetic recombination, which can uncouple neutral sites from selected sites and thus weaken the diversity-reducing effect of linked selection. Smukowski et al. (2011) noted that recombination rate is heterogeneous across the genome and between species, so a valid analysis of linked selection must account for differences in recombination rate. Begun and Aquadro (1992) showed that silent site diversity scales positively with recombination rate in Drosophila, which was interpreted as the effect of episodic positive selection. Further, Corbett-Detig et al. (2015) showed that neutral site diversity in a large population has a relatively stronger correlation with recombination than in a smaller population. The reasoning is that a large Ne intensifies selection and therefore eliminates more linked neutral diversity. Another issue is that recombination is itself mutagenic, so it promotes higher diversity and thus confounds the effect of linked selection. However, the correlation between species divergence (a proxy for mutation rate) and recombination rate is insufficient to explain the full pattern of genetic diversity.
Genomic Architecture, Mating System and Demographic Fluctuations
Mutations in functional elements of the genome affect fitness more strongly than those in non-functional regions, so functional regions experience selection more frequently. As a result of linked selection, silent site diversity is expected to be lower in gene-rich regions. However, recombination rate must also be taken into account. Some plant species show a weaker signature of linked selection in genic regions because gene density and recombination rate are positively correlated (Flowers et al., 2012).
Comparisons across chromosomes show that genetic diversity is lower on sex chromosomes than on autosomes, and this again can be understood in terms of Ne. For example, for a diploid organism with an X-Y sex determination system, the expected ratio of Ne for autosomes, the X chromosome and the Y chromosome (A:X:Y) is 4:3:1. Another factor is that hemizygosity of the Y chromosome exposes recessive mutations directly to selection, and this is exacerbated by the lack of recombination, which results in a more profound effect of linked selection (Ellegren & Ellegren, 2016).
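The 4:3:1 ratio can be recovered from the classical expressions for Ne as a function of the numbers of breeding females and males. These expressions are not given in the essay and are used here as an assumption, and the function name is hypothetical. The sketch below shows that the ratio holds when the breeding sex ratio is equal and shifts when it is not.

```python
def ne_by_chromosome(n_females: float, n_males: float):
    """Return (Ne_autosome, Ne_X, Ne_Y) for an X-Y system with separate sexes."""
    if n_females <= 0 or n_males <= 0:
        raise ValueError("both sexes must have a positive number of breeders")
    # Classical formulas for unequal numbers of breeding females and males.
    ne_a = 4.0 * n_females * n_males / (n_females + n_males)
    ne_x = 9.0 * n_females * n_males / (4.0 * n_males + 2.0 * n_females)
    ne_y = n_males / 2.0
    return ne_a, ne_x, ne_y


# Equal sex ratio: the values stand in the ratio 4:3:1.
print(ne_by_chromosome(500, 500))

# Hypothetical female-biased breeding population: the X:A ratio rises above 3/4.
ne_a, ne_x, ne_y = ne_by_chromosome(900, 100)
print(ne_x / ne_a, ne_y / ne_a)
```

The second call illustrates why observed X-to-autosome diversity ratios need not equal 3/4 even before linked selection is considered.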
The mating system also contributes to variance in genetic polymorphism. Species with asexual reproduction have a smaller Ne, so they experience a stronger effect of drift and therefore harbour less diversity. Genetic recombination is also absent from this mode of reproduction, so these species suffer a more profound effect of linked selection (Campos et al., 2012). The absence of recombination also makes selection less effective because of strong Hill-Robertson interference, which reduces adaptive potential. This might account for the lack of evolutionary success of asexual organisms, indicated by rapid species turnover and thus the relatively recent origin of asexual lineages (Charlesworth, 2009).
Demographic fluctuations cause Ne to vary over time, which in turn affects genetic diversity. For example, the bottleneck that followed human migration out of Africa left the genomes of humans on other continents as a less diverse subset of African genomes (Yu et al., 2002). Demography also influences the outcome of linked selection. The signal of linked selection is expected to be weaker in a subdivided population (one with a high population differentiation index, Fst). For example, if a hard sweep occurs locally (within a subpopulation), the signature of linked selection will be masked at the global scale. It is also predicted that in an expanding population many newly arisen mutations become subject to selection, so the effect of linked selection will be more prevalent than in a population at equilibrium (Cutter & Payseur, 2013).
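The effect of a bottleneck on long-term Ne can be illustrated with the harmonic mean of population sizes across generations, a standard approximation that the essay does not state explicitly and that is assumed here. The function name and the population sizes below are hypothetical.

```python
def long_term_ne(sizes):
    """Harmonic mean of per-generation population sizes."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty sequence of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical trajectory: one generation at size 10 in an otherwise stable population.
stable = [1000, 1000, 1000, 1000]
bottleneck = [1000, 1000, 10, 1000]

print(long_term_ne(stable))
print(long_term_ne(bottleneck))
print(sum(bottleneck) / len(bottleneck))  # arithmetic mean, for comparison
```

Because the harmonic mean is dominated by the smallest values, a single generation at low size pulls long-term Ne far below the arithmetic mean, which is why a brief bottleneck can leave a lasting reduction in diversity.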
Life History, Not Population History, Provides a More Universal Explanation for the Determinants of Genomic Diversity Across Taxa
Romiguier et al. (2014) analysed transcriptomes from 76 non-model animal species and showed that factors related to demographic history (e.g. endemism) have only a minor relationship with genetic diversity. They also observed that species in the same family tend to have similar genomic diversity, indicating that the pattern is governed by similar mechanisms. Another interesting finding is that the best predictors of genetic diversity are factors related to life history: propagule size (an indicator of parental care) scales negatively with synonymous diversity (πs), and longevity correlates positively with the ratio of non-synonymous to synonymous diversity (πn/πs).
Differences in parental investment and longevity reflect differences in life history strategy, which can also be interpreted in terms of Ne. Over the long term, average Ne is best represented by the minimum population size following environmental perturbations. Species with high fecundity (r strategy) are adapted to recover quickly after disturbance, so they have a higher probability of survival and thus a larger Ne and higher neutral diversity. Meanwhile, species with fewer offspring (K strategy) invest more in parental care, which is associated with a smaller Ne and less capacity to adapt under stress. With regard to the pattern of non-synonymous diversity, drift outweighs selection in long-lived species with smaller Ne, allowing weakly deleterious mutations to segregate at appreciable frequencies in the population and resulting in an elevated level of non-synonymous polymorphism (Romiguier et al., 2014).
In conclusion, heterogeneity in genomic polymorphism is partly explained by effective population size, whose magnitude results from the interplay among several variables such as linked selection, demography, mating system and life history (Figure 1). Finally, with the availability of whole-genome data across the tree of life, we are closer than ever to understanding the ‘dark matter’ of population genetics, Lewontin’s paradox.