Running title: Evolution of Arabian mangroves

Guillermo Friis*, Edward G. Smith, Catherine E. Lovelock, Alejandra Ortega, Alyssa Marshell, Carlos M. Duarte, John A. Burt

*Corresponding author: Center for Genomics and Systems Biology, New York University Abu Dhabi, PO Box 129188, Abu Dhabi, United Arab Emirates; Email: guillefriis@gmail.com; Tel: +97126286739.

Abstract

Biological systems occurring in ecologically heterogeneous and spatially discontinuous habitats provide an ideal opportunity to investigate the relative roles of neutral and selective factors in driving lineage diversification. The gray mangroves (Avicennia marina) of Arabia occur at the northern edge of the species’ range and are subject to variable, often extreme, environmental conditions, as well as large historical fluctuations in habitat availability and connectivity resulting from Quaternary glacial cycles. Here, we analyze fully sequenced genomes sampled from 19 locations across the Red Sea, the Arabian Sea and the Persian/Arabian Gulf (PAG) to reconstruct the evolutionary history of the species in the region, and to identify adaptive mechanisms of lineage diversification. Population structure and phylogenetic analyses revealed marked genetic structure correlating with geographic distance and highly supported clades among and within the seas surrounding the Arabian Peninsula. Demographic modelling showed times of divergence consistent with recent periods of geographic isolation and low marine connectivity during glaciations, suggesting the presence of (cryptic) glacial refugia in the Red Sea and the PAG. Significant migration was detected within the Red Sea and the PAG, and across the Strait of Hormuz to the Arabian Sea, suggesting gene flow upon secondary contact among populations. Genetic-environment association analyses revealed high levels of adaptive divergence, and detected signs of multilocus local adaptation driven by temperature extremes and hypersalinity. 
These results support a process of rapid diversification resulting from the combined effects of historical factors and ecological selection, and reveal mangrove peripheral environments as relevant drivers of lineage diversity.

Introduction

Lineage diversification involves both neutral and selective factors, and elucidating their relative roles in the process of evolutionary divergence is essential to understand the mechanisms underlying the early stages of speciation (Coyne & Orr, 2004; Nosil, 2012). Evolutionary divergence may result from the accumulation of genetic differences caused by drift in geographic isolation or isolation-by-distance (IBD, Wright, 1943, 1946), a mode of divergence driven by neutral factors (Mayr, 1954, 1963). In turn, geographic variation in environmental conditions can result in divergent selection, the diversifying process that drives ecological speciation (Coyne & Orr, 2004; Darwin, 1859; Nosil, 2012). In ecological speciation models, reproductive barriers arise as a by-product of cumulative, ecologically adaptive changes (Mayr, 1947; Rundle & Nosil, 2005; Schluter, 2000), enabling genome-wide differentiation at both neutral and selected loci (Funk, Egan, & Nosil, 2011; Nosil, Egan, & Funk, 2008; Shafer & Wolf, 2013; Wang & Bradburd, 2014). Ecological speciation in geographic isolation is theoretically uncontroversial, and deemed common in nature as a mechanism maintaining lineage diversity upon secondary contact (Keller & Seehausen, 2012; Nosil, 2012; Rundle & Nosil, 2005). However, whether environment-driven processes of lineage diversification occur frequently in nature in the absence of long-term geographic isolation and reduced gene flow remains debated in evolutionary research (Bolnick & Fitzpatrick, 2007; Fitzpatrick, Fordyce, & Gavrilets, 2008; Foote, 2018). 
The interactions between selection and the stochastic effects derived from processes such as founder events, bottlenecks and genetic drift also remain unclear, and difficult to assess in natural systems (Barton & Charlesworth, 1984; Burri et al., 2015; Kliber & Eckert, 2005).

Biological systems occurring at the edges of a species' range, which are frequently extreme and environmentally diverse habitats, are suitable models to investigate questions related to lineage diversification. The environment at range edges tends to be stressful and spatially discontinuous, as well as temporally unstable (Lesica & Allendorf, 1995), often resulting in dynamic settings of multiple isolated populations subject to strong differential selection. The severe and stochastic character of peripheral environments is hypothesized to generate a strong interplay between adaptation and neutral processes (Hardie & Hutchings, 2010), providing an ideal opportunity for speciation research. One such system is provided by gray mangrove populations in the Arabian Peninsula (Avicennia marina var. marina). The gray mangrove has the broadest distribution of any mangrove species (Hogarth, 2015; Spalding, Kainuma, & Collins, 2010; Tomlinson, 2016), extending across the Indian Ocean and into the West Pacific as far as Japan and New Zealand (Fouda & Al-Muharrami, 1996; Khalil, 2015; C. Sheppard et al., 2010; Spalding et al., 2010). Gray mangroves present several morphological and physiological adaptations to their harsh intertidal habitat (Tomlinson, 2016), which makes them a compelling model for the study of functional genes and biological pathways involved in selection and stress tolerance (Urashi, Teshima, Minobe, Koizumi, & Inomata, 2013; Xu et al., 2017). 
The Arabian Peninsula represents one of the northernmost edges of the species’ distribution (N. C. Duke, 1991; Spalding et al., 2010; Tomlinson, 2016), as well as a stressful habitat characterized by extreme temperatures, aridity, and often extreme salinity, factors known to be limiting for mangrove growth (Ball, 1988; Lovelock, Krauss, Osland, Reef, & Ball, 2016; Charles Sheppard, Price, & Roberts, 1992). Arabian marine domains are also environmentally diverse both within and between the main water bodies bordering the peninsula, which define three main biogeographic regions: (i) the Red Sea, where the marine system presents opposing gradients of salinity and temperature, with the highest temperature and lowest salinity in the shallow southern basin, while the north has cooler temperatures but high salinity as a result of limited precipitation and high evaporation (Anton et al., 2020; Carvalho, Kürten, Krokos, Hoteit, & Ellis, 2019); (ii) the Persian/Arabian Gulf (referred to as ‘PAG’ hereafter) to the northeast of the Arabian Peninsula, where populations are subject to arid (<250 mm/year) to hyper-arid (<100 mm/year) rainfall regimes, and experience the widest range of air temperatures in the region throughout the year (Böer, 1997; Whitford & Duval, 2019); and (iii) the Arabian Sea and Sea of Oman, which in contrast with the former two biogeographic regions have normal oceanic salinity, and summer temperatures that are buffered by cold-water upwelling as a result of the Indian Ocean monsoon, resulting in more moderate environmental conditions (Claereboudt, 2019).

The Arabian Peninsula has experienced large fluctuations in spatial and environmental conditions throughout glacio-eustatic cycles that largely impacted the biodiversity of the region, in particular the enclosed water bodies of the Red Sea and the PAG (DiBattista, Choat, et al., 2016). 
Throughout the last 400,000 years the Red Sea has remained connected to the Indian Ocean, yet the cross-sectional area along the Strait of Bab al Mandab that connects these water bodies was, at times of glacial maxima, as low as 2% of that today, resulting in major increases in salinity and temperature within the Red Sea as well as near-complete isolation at times (Lambeck et al., 2011). For several sustained periods during the last two glacial cycles, the minimum width of the channel connecting the Red Sea to the Arabian Sea was less than 4 km, and the channel remained narrow whenever local sea levels were 50 meters below current levels (Lambeck et al., 2011). In contrast, models show that the PAG was nearly completely drained during the peak of the last glaciation until ca. 14,000 years ago (Lambeck, 1996). A marine incursion into the southern PAG basin started approximately 12,500 years ago, extending towards the northern basin over the following millennia, with the present-day PAG shorelines forming just 6,000 years ago (Lambeck, 1996). The Arabian Sea coast, as an open ocean habitat, has only experienced vertical migration of sea levels during these glacial periods, without geographic isolation. The combination of extreme environmental conditions, differential changes in habitat and dynamic barriers to gene flow makes the seas bordering the Arabian Peninsula one of the most variable marine environments in the world, with a high potential for speciation driven by both neutral and selective factors (DiBattista, Roberts, et al., 2016).

Although the phylogenetic relationships for the varieties of A. marina and congeneric species have been reported for other regions (N. C. Duke, Benzie, Goodall, & Ballment, 1998; X. Li et al., 2016; Nettel, Dodd, Afzal-Rafii, & Tovilla-Hernández, 2008), the extensive gray mangrove populations from the Arabian coasts have rarely been included in reported DNA sequence-based analyses (see Al-Qthanin & Alharbi, 2020; N. C. 
Duke et al., 1998; X. Li et al., 2016; Maguire, Saenger, Baverstock, & Henry, 2000), so that their evolutionary origin and relationships remain largely unexplored. The specific drivers and molecular basis of local adaptation and lineage diversification in A. marina also remain understudied both in Arabia and across its entire distribution.

Here, we used the Arabian gray mangrove complex to examine how extreme habitat conditions and heterogeneous spatial settings have shaped genetic diversity at the highly variable edge of the species’ range using whole genome and georeferenced environmental data. First, we analyzed patterns of population structure and reconstructed the evolutionary and demographic history of the species in the Arabian Peninsula. Two general competing hypotheses about the evolutionary history of the Arabian mangroves were tested in this study: (i) mangroves from the Red Sea and PAG were extirpated during the glacial cycles of the Pleistocene, followed by a recolonization after the last glacial maximum (LGM); and (ii) mangroves remained within the enclosed seas in glacial refugia during glacial periods, and expanded once sea levels rose. Second, we studied patterns of adaptive variability applying genotype-environment association (GEA) analysis. We used redundancy analysis combining environmental and single nucleotide polymorphism (SNP) data to survey the genome and jointly identify environmental variables and functional genes potentially involved in local adaptation and lineage divergence.

Materials and Methods

Population sampling

We sampled a total of 200 individuals of Avicennia marina from 19 sites of the Arabian Peninsula coasts (var. marina, N = 190), and one site from Australia (var. australasica, N = 10) to be used as outgroup (Fig. 1A; Table S1, Supplementary Information). Leaf tissue was collected from trees separated by at least 20 meters and preserved in silica beads for up to ten days before extraction. 
Geographic coordinates for each one of the trees were recorded. Genomic DNA was extracted from ground leaf tissue using the DNeasy 96 plant kit (Qiagen, Valencia, CA) according to the manufacturer’s protocol.

Genome resequencing and variant calling

Illumina paired-end 150 bp libraries with insert size equal to 350 bp were prepared and sequenced on a NovaSeq platform. A total of 8 billion reads were produced, resulting in a mean coverage per site and sample of 28X before filtering. Read quality was evaluated using FASTQC (Andrews, 2010) after sorting reads by individual with AXE version 0.3.3 (Murray & Borevitz, 2017). Trimming and quality filtering were conducted using Trim Galore version 0.6.6 (Krueger, 2015) with parameters --stringency 1 --clip_R1 12 --clip_R2 12 --length 90, resulting in a set of reads ranging between 90 and 138 bp long. Reads were then mapped against the previously published reference genome for A. marina (Friis et al., 2020) using the mem algorithm in the Burrows-Wheeler Aligner (BWA; H. Li & Durbin, 2009) version 0.7.1.7. Read groups were assigned and BAM files generated with Picard Tools version 1.126 (http://broadinstitute.github.io/picard). Duplicates were also marked with Picard Tools v1.126. We used the HaplotypeCaller + GenotypeGVCFs tools from the Genome Analysis Toolkit (GATK; McKenna et al., 2010) version 4.1.8.1 to produce a set of SNPs in the variant call format (vcf). Using vcftools version 0.1.16 (Danecek et al., 2011), we retained biallelic SNPs, excluding those with coverage outside the range of 4 to 50, or with a genotyping phred quality score below 40. Twenty-two samples presenting more than 25% missing data were discarded at this point. 
We then applied GATK generic hard-filtering recommendations, retaining variants with QualByDepth (QD) > 2.0; FisherStrand (FS) < 60.0; RMSMappingQuality (MQ) > 40; MappingQualityRankSumTest (MQRankSum) > -12.5; ReadPosRankSum (RPRK) > -8.0; and StrandOddsRatio (SOR) < 3.0 (GATK Best Practices; Auwera et al., 2013; DePristo et al., 2011). The resulting dataset (hereafter referred to as ‘Full Dataset’) consisted of 178 individuals and 15,702,886 SNPs (Table 1; Table S2, Supporting Information) with a per-individual average coverage of 16.8 and a missing data rate of 0.11. The program KING version 2.2.7 (Manichaikul et al., 2010) was used to confirm the absence of close relatives in our set of sampled individuals. Widespread self-fertilization was also ruled out using the program RMES and 300 randomly selected variant positions (Table S3, Supporting Information). The ‘Full Dataset’ was further filtered and customized for downstream analyses (Table S2, Supporting Information).

Population structure analyses

To explore genome-wide population structure in Arabian mangroves, we conducted a principal components analysis (PCA). After excluding the samples from Australia, SNP loci under linkage disequilibrium were filtered out from the ‘Full Dataset’ with bcftools version 1.12 (Danecek & McCarthy, 2017), applying an r² limit of 0.2 in windows of 10 kb. Using vcftools, SNPs showing highly significant deviations from Hardy-Weinberg equilibrium (HWE; p-value threshold of 10⁻⁴) were also removed to filter out false variants arising from the alignment of paralogous loci. Positions with less than 75% of individuals genotyped for each population were also removed from the data matrix, along with those presenting a minor allele frequency (MAF) below 0.02. 
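The two site-level filters just described (per-population genotyping rate and MAF) can be illustrated with a minimal Python sketch. This is not the vcftools procedure used in the study; the function names, the genotype encoding (0, 1 or 2 alternate alleles, `None` for missing) and the choice to compute MAF over all pooled samples are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical encoding: number of alternate alleles (0, 1, 2); None = missing call.
Genotype = Optional[int]


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(
    genotypes_by_population: Dict[str, List[Genotype]],
    min_call_rate: float = 0.75,
    min_maf: float = 0.02,
) -> bool:
    """Keep a SNP only if every population meets the call-rate threshold
    and the pooled MAF (an assumption of this sketch) reaches min_maf."""
    if any(call_rate(g) < min_call_rate for g in genotypes_by_population.values()):
        return False
    pooled = [g for pop in genotypes_by_population.values() for g in pop]
    return minor_allele_frequency(pooled) >= min_maf


# Toy site with two hypothetical populations of four individuals each.
site = {"pop_A": [0, 1, 2, None], "pop_B": [0, 0, 1, 1]}
print(keep_site(site))
```

Applying the call-rate check per population, rather than over the pooled sample, prevents a well-genotyped population from masking a poorly genotyped one.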
To fulfil neutrality assumptions in population structure analyses, we used PCADAPT version 5.2 (Luu, Bazin, & Blum, 2017) to detect and exclude sites putatively under selection, applying a q-value threshold of 0.05, resulting in a final data matrix of 143,900 SNPs and 170 samples. The PCA was conducted with the R package SNPRelate version 1.26.0 (Zheng, 2012).

Using the same dataset as in the PCA, we examined patterns of population divergence in Arabian mangroves using a sparse non-negative matrix factorization method (SNMF) as implemented in the R package LEA version 2.0.0 (Frichot, Mathieu, Trouillon, Bouchard, & François, 2014). We selected SNMF for inferring genetic structure due to its efficiency and shorter computational time. We ran the program five times per K value, with K ranging from 2 to 20. Similarity scores among runs and graphics were computed with CLUMPAK version 1.1.2 (Kopelman et al., 2015).

To further explore patterns of geographic variation in Arabian mangroves, the dataset used in the PCA and the SNMF analysis was also used to test for isolation-by-distance. We computed pairwise Nei’s genetic distance values with the StAMPP R package version 1.6.0 (Pembleton, Cogan, & Forster, 2013). Pairwise geographic distances by sea were measured based on GIS data. A Mantel test was implemented using the vegan R package version 2.5-7 (Oksanen, Blanchet, Kindt, Legendre, & O’Hara, 2016), and significance was computed through 9,999 matrix permutations. A linear regression between pairwise Nei’s genetic distances and geographic distances was also implemented and plotted for visual inspection. In addition, we used StAMPP to compute pairwise FST (Weir & Cockerham, 1984) among sampled populations. Significance was tested by conducting 100 permutations.

A hierarchical analysis of molecular variance (AMOVA) as described in Excoffier et al. (1992) was used to assess the genetic structure of Arabian mangroves using the pegas R package version 1.2 (Paradis, 2010). 
We grouped individuals into two levels: major clades identified in phylogenetic analyses (see results below), and sampled populations within clades. We also computed the observed (HO) and expected heterozygosity (HE), and the nucleotide diversity (π) for each sampling site using the populations program from Stacks version 2.63 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013). For this analysis, we did not exclude the Brisbane population belonging to the Australian variety of the gray mangrove (A. marina var. australasica), yet applied the same filters as for the analyses above (SNP matrix = 113,706).

Phylogenetic analysis

A maximum likelihood phylogeny was produced using the program IQ-TREE version 2.1.3 (Nguyen, Schmidt, Von Haeseler, & Minh, 2015) based on the SNP dataset used for heterozygosity and diversity analysis. In this case, ambiguously constant sites (positions lacking homozygous representatives of at least one of the two alleles) were excluded (SNP matrix = 29,433) to enable ascertainment bias correction. The generalized time-reversible (GTR) model was implemented. The Brisbane population was used as outgroup. Branch support was estimated using the ultrafast bootstrap approximation by Hoang, Chernomor, and Von Haeseler (2018) with 1,000 iterations. Phylogenetic relationships were also inferred using the SVDQuartets model (Chifman & Kubatko, 2014) as implemented in PAUP version 4a169. We evaluated all the possible quartets and assessed branch support using 1,000 bootstrap replicates. The Brisbane population was used to root the tree. 
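As an aside on the isolation-by-distance test described under the population structure analyses, the logic of a permutation-based Mantel test can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the vegan routine used in the study: it uses Pearson correlation, a one-sided test, a fixed random seed, and toy distance matrices invented for the example.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix: Matrix, order: Sequence[int]) -> List[float]:
    """Pairwise distances above the diagonal, with rows/columns reordered."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel(genetic: Matrix, geographic: Matrix,
           permutations: int = 9999, seed: int = 0) -> Tuple[float, float]:
    """Return (r, p) where p is the one-sided permutation p-value."""
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute populations in one matrix only
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: four hypothetical sites along a coastline at 0, 1, 2 and 4 units.
positions = [0.0, 1.0, 2.0, 4.0]
geo_dist = [[abs(a - b) for b in positions] for a in positions]
gen_dist = [[0.01 * d for d in row] for row in geo_dist]  # perfect IBD by construction
r, p = mantel(gen_dist, geo_dist, permutations=199)
print(r, p)
```

Permuting whole rows and columns together, rather than shuffling individual distances, keeps the dependence among pairwise values sharing a population intact, which is the reason a Mantel test is used instead of an ordinary correlation test.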
Population and demographic history analyses

We performed model comparisons under the likelihood framework developed in fastSIMCOAL2 version 2.7 (Laurent Excoffier, Dupanloup, Huerta-Sánchez, Sousa, & Foll, 2013) to estimate demographic parameters and date cladogenetic events among mangrove populations, and to test the competing hypotheses of colonization after the LGM versus potential isolation in glacial refugia in the enclosed seas around Arabia. Three sets of models based on the evolutionary relationships inferred in the phylogenetic analysis with IQ-TREE and SVDQuartets (see Results) were independently analyzed for (i) the Red Sea, (ii) the PAG plus the Sea of Oman, and (iii) the entire Arabian Peninsula. Three populations were used as lineage representatives for each set of models. The Brisbane population was included as outgroup in all models to calibrate times of divergence (He et al., 2019; He et al., 2020; X. Li et al., 2016). Although more comprehensive models including a higher number of populations could have been tested, we chose this approach to avoid the multiplicity of overly complex models resulting from the combination of the alternative scenarios for each one of the biogeographic regions. To avoid confounding effects from divergent evolutionary histories, we opted for a limited number of representative populations per geographic region instead of merging populations based on clustering and phylogenetic analyses (Hansen et al., 2018; Pedersen et al., 2018).

For the Red Sea, the populations of Duba, Al Kharrar and Farasan Banks 2 (hereafter FB2) were included in three models with different topologies: simultaneous cladogenesis of the three lineages for a scenario of fast diversification; dichotomic cladogenesis from north to south; and a third scenario in which Al Kharrar from the central Red Sea would have originated by admixture of the northern and southern populations. 
Because the Red Sea was never completely drained through the glacial cycles during the last 400,000 years, no assumptions were made about putative ancestral barriers to gene flow and coalescence times were allowed to vary freely.

In the case of the PAG, two general hypotheses for lineage differentiation were tested: a scenario of recent differentiation following the colonization of the PAG after the last glacial maximum (LGM); and a scenario of early lineage diversification in which populations within the PAG became geographically isolated in glacial refugia. To test these two competing hypotheses, we constrained the coalescence times either to 700 generations or fewer (14,000 years) or to more than 900 generations (18,000 years), assuming a generation time of 20 years (X. Li et al., 2016). Representative populations included in the model were Dammam from the northern basin, Ras Ghurab from the southern basin, and Shinas from the Sea of Oman. Three different tree topologies were modeled: a case of simultaneous cladogenesis corresponding to a fast diversification process; and two more models in which the most recent split corresponded either to Dammam-Ras Ghurab or to Ras Ghurab-Shinas. Each one of these topologies was tested under the postglacial colonization and divergence in glacial refugia scenarios.

In the test for the entire Arabian Peninsula, the populations of FB2, Taqah and Ras Ghurab were used as representatives of the Red Sea, the Arabian Sea and the PAG, respectively. A single topology matching the IQ-TREE and SVDQuartets phylogenies was tested, and divergence times were allowed to vary freely. Time of coalescence of the Arabian lineages with Brisbane was set to 2.7 million years ago in all models (X. Li et al., 2016). Every model was compared under two different gene migration scenarios: a ‘strict isolation’ scenario with migration rates set to zero; and an ‘isolation with migration’ scenario where migration rates could vary freely (Table S4, Fig. 
S1, Supporting Information).

As input data we used the folded site frequency spectra (SFS) generated from resequencing data. From the ‘Full Dataset’, we retained the samples corresponding to the study groups of each analysis. In these analyses, SNP loci under linkage disequilibrium were filtered out applying a less strict r² limit of 0.4 in windows of 10 kb. A HWE filter for SNPs with a p-value threshold of 10⁻⁴ was implemented. Because singletons are important for estimating parameters and likelihoods, no MAF filters were applied. Positions with less than 50% of individuals genotyped for each taxon/population were removed from the data matrices. Final matrices were of 58,613; 72,119 and 69,113 SNPs for the Red Sea, the PAG and the entire Arabian Peninsula analyses, respectively. The SFS were generated with easySFS version 0.0.1 (https://github.com/isaacovercast/easySFS; Gutenkunst, Hernandez, Williamson, & Bustamante, 2009) maximizing the number of segregating sites as recommended by the author (pers. comm.).

Parameters with the highest likelihood were estimated under each of the models after 50 cycles of the algorithm, with 150,000 coalescent simulations per cycle. This procedure was replicated 100 times and the set of parameters with the highest final likelihood was retained as the best point estimate. To identify the model that best fit the data, we applied the Akaike information criterion (AIC; Akaike, 1998). For estimating 95% CIs of the parameters under the best model, we applied a non-parametric bootstrap procedure. Bootstrapping was carried out by splitting the SNP matrices into 100 SNP blocks and randomly combining them for each one of the repetitions. A total of 100 analyses were run for confidence interval estimation.

TreeMix version 1.13 (Pickrell & Pritchard, 2012) was used to model historical patterns of gene flow between mangrove populations. 
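For readers unfamiliar with the model-selection step mentioned above, the AIC comparison can be sketched as follows. The sketch assumes that the maximum likelihood of each model is available in log10 units (our understanding of how fastsimcoal2 reports it, stated here as an assumption) and that the number of free parameters is known; the model names, likelihood values and parameter counts are invented for illustration only.

```python
import math
from typing import Dict, List, Tuple


def aic(log10_likelihood: float, n_parameters: int) -> float:
    """AIC = 2k - 2 ln(L), converting a log10 likelihood to natural log first."""
    ln_likelihood = log10_likelihood * math.log(10)
    return 2 * n_parameters - 2 * ln_likelihood


def rank_models(models: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Return (model, delta AIC) pairs sorted from best to worst.

    `models` maps a model name to (log10 likelihood, number of free parameters).
    """
    scores = {name: aic(lhood, k) for name, (lhood, k) in models.items()}
    best = min(scores.values())
    return sorted(((name, score - best) for name, score in scores.items()),
                  key=lambda item: item[1])


# Hypothetical values, not results from the study.
candidates = {
    "strict_isolation": (-1000.0, 5),
    "isolation_with_migration": (-990.0, 9),
}
for name, delta in rank_models(candidates):
    print(f"{name}: delta AIC = {delta:.2f}")
```

Because AIC penalizes each additional parameter by two units, a model with migration is preferred only when its gain in log-likelihood outweighs the cost of the extra migration rates.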
The corresponding SNP dataset was built applying the same filters as for heterozygosity and diversity analyses, with the exception of linkage disequilibrium filters (SNP matrix = 797,949), as linkage can be controlled for in the TreeMix command line. We ran TreeMix for 0–15 migrations, grouping SNPs in blocks of 50. Migration edges were plotted until 99.8% of the variance in ancestry between populations was explained by the model (Pickrell & Pritchard, 2012). The consistency of migration edges was evaluated by running TreeMix with 50 total replicates for each added migration edge number using a different, randomly generated seed. Results from the seed that yielded the highest likelihood are reported.

Candidate gene identification with genotype-environment association analysis

We used genotype-environment association (GEA) analysis to identify candidate genes evolving under specific environmental pressures, and to estimate their contribution to patterns of local adaptation in mangroves from the Arabian Peninsula. We applied a redundancy analysis approach (Borcard, Gillet, & Legendre, 2011; Legendre & Legendre, 1998; Van Den Wollenberg, 1977) as implemented in the vegan R package. As explanatory variables, we used georeferenced environmental data extracted for the coordinates corresponding to the sampling sites and averaged over populations. Despite the availability of remote sensing data for a high number of both terrestrial and marine environmental parameters, we opted for a hypothesis-oriented, ecologically informed approach to build our dataset of explanatory variables. Environmental parameters presenting particularly variable and extreme gradients across the Arabian Peninsula were specifically selected according to their relevance to mangrove ecology (Norman Duke, Ball, & Ellison, 1998; Naidoo, 2016). 
This dataset included two marine variables from the MARSPEC database (Sbrocco & Barber, 2013) approximating the maximum sea surface salinity and the minimum sea surface temperature averaged over months (MS_biogeo10_sss_max_5m, MS_biogeo14_sst_min_5m), two parameters of particular importance for mangrove physiology (Hogarth, 2015). It also included four terrestrial parameters from WorldClim2 (Fick & Hijmans, 2017; Hijmans, Cameron, Parra, Jones, & Jarvis, 2005): isothermality (WC_bio3), which quantifies how the day-to-night temperature range compares with the summer-to-winter range and is generally useful for tropical and maritime environments (Nix, 1986); maximum air temperature of the warmest month (WC_bio5) and minimum air temperature of the coldest month (WC_bio6), both relevant ecological extremes in the Arabian habitats (Martínez-Díaz & Reef, 2022); and annual precipitation (WC_bio12) as a proxy of aridity (Table S5, Supporting Information). Minimum sea surface and air temperatures showed high correlation, and the latter was removed when applying a correlation cutoff of 0.75. Annual precipitation was also filtered out by applying a forward selection method, implemented with a p-value threshold for adding variables of 0.05 and 1,000 permutations (Blanchet, Legendre, & Borcard, 2008; Capblancq, Luu, Blum, & Bazin, 2018). The remaining variables were retained.

As response variables, we used population allele frequencies at each variant position. We chose to analyze population frequencies over individual genotypes because environmental scores were nearly identical for individuals from each population, given the spatial resolution of the remote sensing data. After excluding Brisbane samples from the ‘Full Dataset’, the HWE and MAF filters previously used were applied. Positions with less than 75% of individuals genotyped for each population were removed for a final dataset of 2,488,560 SNPs. 
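The correlation-based pruning of environmental predictors mentioned above can be illustrated with a short sketch. It assumes a simple greedy rule (keep variables in the order given and drop any later variable whose absolute Pearson correlation with an already retained one exceeds the cutoff); the variable names and per-population values below are hypothetical and are not taken from Table S5.

```python
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def drop_correlated(variables: Dict[str, Sequence[float]],
                    cutoff: float = 0.75) -> List[str]:
    """Greedy filter: retain a variable only if |r| <= cutoff with all kept ones."""
    kept: List[str] = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-population values for three predictors at five sites.
env = {
    "sst_min": [18.0, 20.0, 22.0, 24.0, 26.0],
    "air_tmin": [10.0, 12.1, 13.9, 16.0, 18.2],  # nearly collinear with sst_min
    "sss_max": [40.0, 36.0, 41.0, 37.0, 39.0],
}
print(drop_correlated(env))
```

With a greedy rule of this kind the outcome depends on the order of the variables, so the predictor judged more relevant on ecological grounds should be listed first.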
Allele frequencies were then computed over populations to be used as the matrix of response variables. The genotyping quality filters employed here are commonly utilized in GEA studies based on RDA (e.g. Brauer, Hammer, & Beheregaray, 2016; Chang, Fridman, Mascher, Himmelbach, & Schmid, 2022; Faske et al., 2021; Friis et al., 2022; Friis et al., 2018; Laporte et al., 2016; Ortiz et al., 2023; Ruiz Miñano et al., 2022; Vu et al., 2020). However, overly conservative quality filters may limit outlier detection based on genotype-environment associations. To investigate the impact of these filters on our GEA test, we also conducted the analysis described here using an SNP dataset with less stringent quality filtering parameters.

Two GEA analyses were implemented: a simple redundancy analysis (RDA) to test for genotype-environment associations between allele frequencies and environmental predictors; and a partial redundancy analysis (pRDA), in which we additionally controlled for population structure effects. Covariates accounting for population structure consisted of the first two PCs of a PCA based on the allele frequencies matrix after filtering out positions putatively under selection, identified using the approach previously described with PCADAPT. Following the procedure described in Capblancq et al. (2018), we used the redundancy analyses to identify candidate genes potentially involved in divergent selection based on Mahalanobis distances estimated between each SNP and the center of the RDA space (Capblancq & Forester, 2021). Controlling for population structure is a common approach to reduce the rate of false positives due to historical processes. 
However, it may be overly conservative when neutral genetic variation correlates with environmental divergence, resulting in a higher frequency of false negatives and reduced power to detect signals of selection associated with environmental parameters (Ahrens et al., 2021; Capblancq, Lachmuth, Fitzpatrick, & Keller, 2022). Furthermore, in
Guillermo Friis, Edward G. Smith, Catherine E. Lovelock, Alejandra Ortega, Alyssa Marshell, Carlos M. Duarte, John A. Burt (2023). Rapid diversification of gray mangroves (Avicennia marina) driven by geographic isolation and extreme environmental conditions in the Arabian Peninsula. DOI: https://doi.org/10.22541/au.165451878.80994363/v4.
Type
Preprint
Year
2023
Language
en
DOI
https://doi.org/10.22541/au.165451878.80994363/v4