How to conduct a MrBayes MCMC analysis on Gyra

For this to work, you need:

To determine if X-forwarding is on, open a terminal, log-on to gyra, and issue the command:

$ xclock

You should see a clock if its working.

If you need to convert your data to Nexus format, issue the command:

$ convertDataToNexusFormat.py -o <output Nexus file>  <your data file>

Running the MrBayes MCMC analysis

Here we are going to run a MrBayes MCMC analysis with a single data partition (i.e. a data-homogeneous model) for 1 million generations, sampling every 1000 generations, under a GTR+I+G model. To set a different nucleotide model, or a protein model, and to adjust other parameters, consult the MrBayes manual.

Open the data file in a text editor and add the following block to the bottom of the file, and save it:

(comments are in [])

begin mrbayes;
    set autoclose=yes nowarnings=yes;
    lset nst=6 rates=invgamma; [specifies a GTR+I+G]
    mcmc ngen=1000000 samplefreq=1000 printfreq=1000 savebrlens=yes filename=mymb;
    quit;
end;

Write a queue submission file called mrbayes.sh as follows:

#!/bin/bash
#$ -N <job name>
#$ -cwd
#$ -o mb-log
#$ -j y
#$ -S /bin/bash
#$ -M <your email address>
#$ -m bae
#$ -pe orte 8
source ~/.bash_profile
mpirun -np 8 mb32 <your Nexus data file>

By default, MrBayes will run 2 MCMC analyses at once, each with 4 chains (1 cold, 3 hot - Metropolis-coupling), so here we request 8 processors, one for each chain.

Submit the job to the queue:

$ qsub mrbayes.sh

Analysing the MrBayes MCMC

Here we assume that the MCMC run was a success: that is, that the chains converged and were run for a sufficient number of generations to sample adequately from the posterior probability distribution. For convergenece diagnostics and their interpretation, refer to the MrBayes manual.

For additional diagnostics and an alternative to constructing the consensus tree, see the MrBayes commands named sump and sumt.

First, the number of samples to discard as “burnin” needs to be determined for each run. Here we plot the log likelihood values at each sampling point versus the number of generations.

Issue the commands:

$ gnuplot

gnuplot> plot 'mymb.run1.p' u 1:2
gnuplot> plot 'mymb.run1.p' u 1:2

Note the number of generations before the log likelihoods plateau for each run - divide the number of generations by the sampling frequency (i.e. 1000 above) to calculate the number of samples to burnin for each chain.

To make a 50% majority rule consensus tree of the trees sampled from the posteriors of both runs combined, issue the following command:

$ makeConsensusTree.py -d <your data file> -t mymb.run1.t mymb.run2.t -b <burnin run1> <burnin run2>

Where <burnin run1> <burnin run2> are the integer values of the numbers of samples to burnin for run1 and run2, respectively. The consensus tree will be saved in a file name consensus.tree.

For additional options when making the consensus tree, issue the command:

$ makeConsensusTree.py -h

The consensus tree can be manipulated and saved in various formats using Figtree:

$ figtree consensus.tree