Notes on running Crux 1.2.0

Crux is written by Jason Evans

redpoint

redpoint usage:

redpoint -h | --help
redpoint -V | --version
redpoint [<options>]

General options:

-h, --help Print usage and exit.
-V, --version Print version information and exit.
-v, --verbose Enable verbose output (*disabled).
-q, --quiet Disable verbose output (*enabled).
-t, --threaded Enable thread parallelism (*enabled).
-u, --unthreaded
 Disable thread parallelism.
-s <uint>, --seed=<uint>
 Set pseudo-random number generator seed (*based on system time, microsecond resolution).
-G <float>, --gcmult=<float>
 Factor by which to modify Python’s garbage collection thresholds (*1.0).
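
To make the --gcmult description concrete, here is a minimal sketch of what scaling Python's garbage collection thresholds by a factor could look like. The helper names scaled_thresholds and apply_gcmult are hypothetical, and scaling all three thresholds with integer truncation is an assumption; how redpoint actually applies the factor has not been checked.

```python
import gc


def scaled_thresholds(thresholds, gcmult):
    """Scale every collection threshold by gcmult, truncating to integers.

    Assumption: all thresholds are scaled alike; redpoint may do otherwise.
    """
    return tuple(int(t * gcmult) for t in thresholds)


def apply_gcmult(gcmult):
    """Hypothetical helper: apply scaled thresholds to the running interpreter."""
    new = scaled_thresholds(gc.get_threshold(), gcmult)
    gc.set_threshold(*new)
    return new


# Pure calculation on an example threshold triple; the interpreter is left untouched.
example = scaled_thresholds((700, 10, 10), 2.0)
print(example)
```

Under this reading, a factor above 1.0 raises the thresholds, so collections run less often, and a factor below 1.0 lowers them.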

Input/output options:

-g <stages>, --stages=<stages>
 Comma-separated list composed of:
   • mc3: Run Mc3 analysis (*enabled).
   • post: Compute statistics on posterior distribution (*enabled).
-i <file>, --input-file=<file>
 Input DNA alignment, in FASTA format.
-p <prefix>, --prefix=<prefix>
 Input/output path prefix, used as the base name for Mc3 output files.

Mc3 options (see Crux.Mc3.Mc3.optName for documentation):

General:

--graphDelay=<float>          .
--cvgSampStride=<uint>        .
--cvgAlpha=<float>            .
--cvgEpsilon=<float>          .
--minStep=<uint>              .
--maxStep=<uint>              .
--stride=<uint>               .
--nruns=<uint>                .
--ncoupled=<uint>             .
--heatDelta=<float>           .
--swapStride=<uint>           .

Proposal parameters:

--ncat=<uint>                   .
--catMedian=<bool>              .
--invar=<bool>                  .
--weightLambda=<float>          .
--freqLambda=<float>            .
--rmultLambda=<float>           .
--rateLambda=<float>            .
--rateShapeInvLambda=<float>    .
--invarLambda=<float>           .
--brlenLambda=<float>           .
--etbrPExt=<float>              .
--etbrLambda=<float>            .

Model parameter priors:

--rateShapeInvPrior=<float>         .
--invarPrior=<float>                .
--brlenPrior=<float>                .
--rateJumpPrior=<float>             .
--polytomyJumpPrior=<float>         .
--rateShapeInvJumpPrior=<float>     .
--invarJumpPrior=<float>            .
--freqJumpPrior=<float>             .
--mixtureJumpPrior=<float>          .

Proposal probabilities:

--weightProp=<float>            .
--freqProp=<float>              .
--rmultProp=<float>             .
--rateProp=<float>              .
--rateShapeInvProp=<float>      .
--invarProp=<float>             .
--brlenProp=<float>             .
--etbrProp=<float>              .
--rateJumpProp=<float>          .
--polytomyJumpProp=<float>      .
--rateShapeInvJumpProp=<float>  .
--invarJumpProp=<float>         .
--freqJumpProp=<float>          .
--mixtureJumpProp=<float>       .

Fixed parameter overrides (appropriate proposals are implicitly disabled):

--fixed-topology=<file>
 Use the Newick tree in <file> to fix the tree topology.
--fixed-tree=<file>
 Use the Newick tree in <file> to fix the tree topology and branch lengths.
--fixed-nmodels=<uint>
 Fix the number of models in the mixture.
--fixed-rclass=<rclass>
 Fix the comma-separated rclass. Semantics are strange unless --fixed-nmodels is also specified.
--fixed-rates=<rates>
 Fix the comma-separated rates associated with the fixed rclass. Semantics are strange unless --fixed-nmodels is also specified.
--fixed-shape=<alpha>
 Fix the +G shape parameter. Semantics are strange unless --fixed-nmodels is also specified.
--fixed-pinvar=<pinvar>
 Fix the +I proportion of invariable sites. Semantics are strange unless --fixed-nmodels is also specified.
--fixed-freqs=<freqs>
 Fix the comma-separated state frequencies. Semantics are strange unless --fixed-nmodels is also specified.

Posterior distribution statistics options (*all enabled by default):

--burnin=<burnin>
 Set number of burn-in samples. Specify --burnin=half for a burn-in equal to half the total samples (*default).
--sum=<bool> Write summary statistics to <prefix>.sum.
--trprobs=<bool>
 Write tree topology frequencies with mean branch lengths to <prefix>.trprobs.
--parts=<bool> Write partition frequencies and branch length statistics to <prefix>.parts.
--con=<bool> Write consensus tree (first tree with mean branch lengths, second with bipartition support values) to <prefix>.con.

Example command lines

# Default starting conditions: GTR, 1 run, no Metropolis-coupling, no proportion of invariable sites, and no Gdasrv:

redpoint -i ../concatg_set5.fa --prefix=run1

# GTR+I+G, 2 runs of 4 chains each, with pInvar and 4 Gdasrv categories, submitted with MPI on 8 CPUs (1 per chain):

mpirun -np 8 redpoint -i ../concatg_set5.fa --nruns=2 --ncoupled=4 --prefix=run1 --ncat=4 --catMedian=True --invar=True
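
When the number of runs or chains changes, the MPI process count in the second example has to change with it. The sketch below assembles the same command line from its parts, using only the flags listed in these notes. The function redpoint_argv and its parameter names are hypothetical, and the rule of one MPI process per chain (nruns x ncoupled) is taken from the comment on the example above, not from the Crux documentation.

```python
import shlex


def redpoint_argv(input_file, prefix, nruns=1, ncoupled=1, ncat=1,
                  cat_median=False, invar=False, mpi_procs=None):
    """Assemble a redpoint argument list (hypothetical helper)."""
    argv = []
    if mpi_procs is not None:
        # Launch through MPI, as in the second example.
        argv += ["mpirun", "-np", str(mpi_procs)]
    argv += ["redpoint", "-i", input_file,
             f"--nruns={nruns}", f"--ncoupled={ncoupled}",
             f"--prefix={prefix}", f"--ncat={ncat}",
             f"--catMedian={cat_median}", f"--invar={invar}"]
    return argv


# One MPI process per chain: 2 runs x 4 coupled chains = 8.
nruns, ncoupled = 2, 4
argv = redpoint_argv("../concatg_set5.fa", "run1", nruns=nruns,
                     ncoupled=ncoupled, ncat=4, cat_median=True, invar=True,
                     mpi_procs=nruns * ncoupled)
command = shlex.join(argv)
print(command)
```

The list form can be passed directly to subprocess.run; the joined string is only for display or for pasting into a job script.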

Output files

<prefix>.con     - 2 trees: a) phylogram, b) cladogram with posterior probabilities (PP)
<prefix>.l       - Starting configuration
<prefix>.lnL.R   - R script to print lnL plots
<prefix>.p       - Sampled parameter values
<prefix>.parts   - Partition table
<prefix>.s       - lnL, proposal probabilities, swap rates
<prefix>.sum     - Summary
<prefix>.t       - All sampled trees (Newick)
<prefix>.trprobs - Tree posteriors
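
A quick way to check a finished run is to list which of these files exist for a given prefix. The sketch below does that with the suffixes copied from the table above; the helper names are hypothetical. Not every file is necessarily written: according to the option descriptions in these notes, <prefix>.lnL.R depends on graphDelay being positive, and the .sum, .trprobs, .parts and .con files depend on the corresponding posterior-statistics options.

```python
from pathlib import Path

# Suffixes copied from the output file table in these notes.
OUTPUT_SUFFIXES = ["con", "l", "lnL.R", "p", "parts", "s", "sum", "t", "trprobs"]


def expected_outputs(prefix):
    """Return the output paths a run with this prefix may produce."""
    return [Path(f"{prefix}.{suffix}") for suffix in OUTPUT_SUFFIXES]


def missing_outputs(prefix):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_outputs(prefix) if not path.exists()]


names = [str(path) for path in expected_outputs("run1")]
print(names)
```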

Typical options to change

Default values:

burnin=half              Burn-in: an integer number of samples, or "half"
nruns=1                  Number of independent runs
ncoupled=1               Number of chains in Metropolis-coupling
stride=100               Sampling frequency
minStep=100000           Minimum chain length
maxStep=~inf             Maximum chain length (essentially set to inf)
swapStride=1             Presumably how often swaps are proposed between hot and cold chains (not verified)
heatDelta=0.05           Heating temperature for Metropolis-coupled chains
ncat=1                   Number of Gdasrv rate categories
catMedian=False          Use Gdasrv category medians (False: use category means)
invar=False              Include a proportion of invariable sites (+I)
cvgSampStride=1          How often to run convergence diagnostics (cvgSampStride x stride)
graphDelay=-1.0          If positive, output R program (<prefix>.lnL.R) every <value> seconds
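
The defaults above translate into sample counts as follows. This is a rough sketch under stated assumptions: the chain is taken to stop at exactly minStep steps (it may run longer), one sample is taken every stride steps, and burnin=half is rounded down. The function names are hypothetical and are not part of Crux.

```python
def sample_counts(n_steps=100000, stride=100, burnin="half"):
    """Return (total, burn-in, retained) sample counts for one run."""
    n_samples = n_steps // stride      # one sample every `stride` steps
    if burnin == "half":
        n_burnin = n_samples // 2      # assumed rounding: floor
    else:
        n_burnin = int(burnin)         # explicit number of burn-in samples
    return n_samples, n_burnin, n_samples - n_burnin


def diagnostic_interval(cvg_samp_stride=1, stride=100):
    """Steps between convergence diagnostics: cvgSampStride x stride."""
    return cvg_samp_stride * stride


counts = sample_counts()
interval = diagnostic_interval()
print(counts, interval)
```

With the default values this gives 1000 samples, of which 500 are discarded as burn-in, and a convergence check every 100 steps.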