Phylogeographic analysis of spatial diffusion

Last updated on Mar 2, 2023 4 min read Genetics, Phylogeography

Introduction

The analyzes implemented in phylogeographic studies have traditionally allowed inferring the genetic structure and demographic processes that have shaped the current distribution of genetic lineages. Many times these demographic changes have been implicitly associated with range expansions of the species. However, these analyzes do not explicitly test these hypotheses. To do this, in this post we are going to introduce ourselves to the use of the SPREAD application. This software allows the explicit reconstruction of the distribution of lineages in space and time, taking into account the phylogenetic uncertainty (posterior probability of the nodes), and the spatial diffusion process (Kms in Ma).

This post is an adaptation of BEAST tutorial.

Software needed

BEAUTi
BEAST v. 1.10.4
TRACER v. 1.7.2
Tree Annotator
spreaD3

How do I perform spatial diffusion analysis?

Starting files

We start with a fasta file that contains the sequences, and a nexus file containing geographical coordinates of each sequence. AdemásIn addition, it is necessary to know the evolutionary model that best fits the sequence matrix, which can be calculated in software such as jModelTest.

In addition to these files, in order to geographically represent and visualise the spatial dynamics of the species, a vector geographic file of the type geojson will be needed. This can be obtained in R using the following code:

library(sf)

## Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

library(rnaturalearth)

# Download polygons of South American country boundaries:
Sudam = ne_states(iso_a2 = c("AR", "BO", "PY", "BR", "CL", "UY", "PE", "EC", "VE", "CO", "GY", "SR", "FR"),
                                     returnclass = "sf")
write_sf(obj = Sudam, dsn = "./ProvSudam.geojson")

## Warning in CPL_write_ogr(obj, dsn, layer, driver,
## as.character(dataset_options), : GDAL Error 6: DeleteLayer() not supported by
## this dataset.

BEAUTi v. 1.10.4

In this programme we are going to prepare our data to generate an xml file, which can be read by the BEAST programme. To do this:: 1. Import the file sequence.fas. Once imported, check in the Partitions tab that the data is correct. 2. In the Traits tab, add a new data partition with geographical coordinates. Click Import traits to import coordinates.nex. Then, click Create partition from trait... and name the partition as you wish, for example location. 3. Sites tab. 4. Clocks tab. 5. Trees tab. 6. States tab. 7. Priors tab. 8. MCMC tab.

BEAST v 1.10.4

You can run the analysis in the software or BEAST, or you can use an internet server like CIPRES to save work for our machine.

Tracer 1.7.2

After the run, we must check that the analysis has had a chain length long enough to reach ESS values greater than 200 (convergence). For this we open the .log file generated in the Tracer program. If those of ESS have posterior probabilities greater than 200, we can check the diffusion rate which is found to be location.diffusionRate. This is indicated in km/year. In species, where data on dispersal rates are available, this value is important for comparison with previous studies.

TreeAnnotator v. 1.10.4

This software allows to generate trees that summarise the information of the 10000 trees retrieved by Bayesian inference. In this software you must define the Burn-in of low probability trees. Usually 12-25% of the trees are discarded or “burned” (usually aiming for a total of 10,000 trees before burning). Leave other options by default, and load the file generated after BEAST run with the extention .trees. As output you can generate a file .tree that can be read in Figtree.

spreaD3

Finally, to generate the spatial patterns of haplotypes and their ancestors, the spreaD3 software must be opened. * Data tab. Load the .tree file obtained with TreeAnontator. Next, select location1 as latitude and location2 as longitude. After that, you must load a geojson file with the baseline map you will use, for eg. South American bordes. Click Generate JSON and name the json file of output, including the extention .json. * Rendering tab. Load the json file generated in the previous tab. Render it in kml form, defining the name followed by the extension. * This will generate a .kml file in the folder which can then be opened using google earth.

BEAST