How to analyze your molecular data


Alignment (Final)

Total # of Sites:
Ambiguous at beginning:
Ambiguous at end:
Total # sites excluding ambiguous ends:
Ambiguous in Alignment (conservative):
Ambiguous in Alignment (tolerant):


Look at the informative sites (in PAUP, 'show data matrix' in the 'DATA' menu) and compare the trees from both data sets. Does one make more sense? Are all the taxa contributing to the answer? This is a good time to make the hard decisions about which alignment, exclusions, etc. you will use, and to write them into the PAUP file you will work from.

Use PAUP to check that your data contain phylogenetic information: evaluate random trees (10,000 or more, depending on how many taxa you have). The distribution of tree lengths should be skewed, so that only a few trees are among the shortest; i.e., the curve should differ significantly from a bell shape. Measure the skew with Hillis's g1 statistic.
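The skewness check above can be sketched outside PAUP as follows. This is a minimal illustration, assuming the random-tree lengths have been copied out of PAUP into a list; the function name and the example lengths are hypothetical. It computes g1 from the second and third central moments; a clearly negative value means a long tail of short trees, which is the pattern described above.

```python
import math

def g1_skewness(tree_lengths):
    """Skewness g1 = m3 / m2**1.5, from the second and third central moments."""
    n = len(tree_lengths)
    if n < 3:
        raise ValueError("need at least three tree lengths")
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n
    if m2 == 0:
        raise ValueError("all tree lengths are identical; skewness is undefined")
    return m3 / m2 ** 1.5

# Hypothetical lengths of ten random trees (a real run would use 10,000 or more).
lengths = [40, 47, 48, 49, 49, 50, 50, 50, 51, 51]
print("g1 =", round(g1_skewness(lengths), 3))
```

Whether a given g1 is significant depends on the number of taxa and characters, so the value should be compared with published critical values rather than judged by sign alone.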

Length of most parsimonious tree:
Decay index: find all trees up to one percent longer than the shortest tree, and note which branches collapse in their consensus.
Alternative hypotheses: using constraint trees, what is the most parsimonious tree that fits each constraint, and how much longer is it than the unconstrained most parsimonious tree?
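The bookkeeping for the two items above is simple arithmetic, sketched here under stated assumptions: the tree lengths are typed in by hand from PAUP output, and the function names and numbers are hypothetical.

```python
def length_cutoff(shortest_length, percent=1):
    """Longest whole-step tree length within `percent` of the shortest tree."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return shortest_length + (shortest_length * percent) // 100

def extra_steps(unconstrained_length, constrained_lengths):
    """Steps added by each constraint relative to the unconstrained tree."""
    extras = {}
    for name, length in constrained_lengths.items():
        if length < unconstrained_length:
            # A constrained tree cannot be shorter; the unconstrained search
            # probably missed the shortest tree.
            raise ValueError(f"constraint {name!r} is shorter than the unconstrained tree")
        extras[name] = length - unconstrained_length
    return extras

# Hypothetical lengths read from PAUP output.
shortest = 200
print("keep trees up to length", length_cutoff(shortest))
print(extra_steps(shortest, {"both_genera": 204}))
```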

Bootstrap tree: what branches are supported and at what level?
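For orientation, the resampling step behind a bootstrap can be sketched as below. PAUP does this internally; the code only illustrates how one pseudoreplicate is built by drawing alignment columns with replacement. The function name is hypothetical, and the two sequences are the first 18 sites of Dros and Apis from the example matrix.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: the same randomly drawn columns for every taxon."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(alignment[name]) != n_sites for name in names):
        raise ValueError("all sequences must have the same length")
    # Draw n_sites column indices with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}

alignment = {
    "Dros": "TACTATCCTGCTCTTTCT",
    "Apis": "TACTTTCCCTCATTATTT",
}
replicate = bootstrap_alignment(alignment, random.Random(1))
for name, sequence in replicate.items():
    print(name, sequence)
```

Each pseudoreplicate is then searched like the original data, and the support for a branch is the percentage of replicate trees that contain it.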

Maximum likelihood: what is the log likelihood?
Input user trees for the alternative topologies: are they significantly different by the Kishino-Hasegawa test? Use Excel to make a table of the different substitution rates and adjacent nucleotides.
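As a rough illustration of what the Kishino-Hasegawa comparison does, the sketch below applies the usual normal approximation to per-site log likelihoods of two trees. It assumes the site log likelihoods have been exported from the likelihood program; the function name and the numbers are hypothetical, and the program's own test output should be preferred for anything reported.

```python
import math

def kh_test(site_lnl_a, site_lnl_b):
    """Normal-approximation KH test; returns (z, two-sided p-value)."""
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need one log likelihood per site")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("need at least two sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    total = sum(diffs)                      # difference in total log likelihood
    mean = total / n
    # Variance of the total difference, estimated from the per-site differences.
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if variance == 0:
        raise ValueError("per-site differences are constant; test is undefined")
    z = total / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical per-site log likelihoods for two user trees.
tree_a = [-1.0, -2.0, -3.0, -4.0]
tree_b = [-1.5, -2.1, -3.4, -4.2]
z, p = kh_test(tree_a, tree_b)
print("z =", round(z, 3), "p =", round(p, 4))
```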

Are the substitutions saturated?
Use PAUP to generate a distance matrix of the data corrected for missing data; then run PHYLIP dnadist with the Jukes-Cantor correction. Graph the two against each other in Excel, for all pairs of taxa sorted by increasing distance.
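The two distances being compared can be sketched as follows. This is an illustration only, not a replacement for PAUP or dnadist: it assumes that any site where either sequence is not an upper-case A, C, G or T (lower case, N, gap) is skipped, and the function names are hypothetical. The three sequences are copied from the example matrix.

```python
import math
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of differing sites among sites scored in both sequences."""
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a not in "ACGT" or b not in "ACGT":
            continue  # lower case, N or gap: treated as missing
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the correction is undefined (saturated)")
    return -0.75 * math.log(1 - 4 * p / 3)

sequences = {
    "Dros": "TACTATCCTGCTCTTTCTTTATTATTAGTAAGAAGA",
    "YMU09206": "TATTATCCATCCTTAACacTATTAATTTCTAGAAGA",
    "Apis": "TACTTTCCCTCATTATTTATACTTTTATTAAGAAAT",
}
rows = []
for name_a, name_b in combinations(sequences, 2):
    p = p_distance(sequences[name_a], sequences[name_b])
    rows.append((p, jukes_cantor(p), name_a, name_b))
for p, d, name_a, name_b in sorted(rows):
    print(f"{name_a}-{name_b}\t{p:.4f}\t{d:.4f}")
```

If the corrected distances keep rising while the uncorrected ones level off, the most distant pairs are likely saturated.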


DNAdist: make a table of pairwise distances, using the Kimura 2-parameter correction.
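For reference, the Kimura 2-parameter correction treats transitions and transversions separately. The sketch below is a minimal illustration with a hypothetical function name; it skips any site where either sequence is not an upper-case A, C, G or T, and dnadist remains the tool to use for the actual table.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def kimura_2p(seq_a, seq_b):
    """K2P distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    compared = transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a not in "ACGT" or b not in "ACGT":
            continue  # lower case, N or gap: treated as missing
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    arg1 = 1 - 2 * p - q
    arg2 = 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg1 * math.sqrt(arg2))

# Two taxa from the example matrix.
dros = "TACTATCCTGCTCTTTCTTTATTATTAGTAAGAAGA"
apis = "TACTTTCCCTCATTATTTATACTTTTATTAAGAAAT"
print("K2P distance Dros-Apis:", round(kimura_2p(dros, apis), 4))
```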

Neighbor joining (NJ): bootstrap
# of Unambiguously Aligned Sites:



#NEXUS
[! CO1 data. Same file with the respectcase option, which allows individual, ambiguously aligned
 nucleotides to be ignored in the data set]

 begin data;
 dimensions  ntax=6 nchar=36 ;
FORMAT     
    MISSING=N   respectcase  
  [Enclose the "equate...  =N" in brackets and re-execute file to produce data matrix in PAUP]
   equate="a=N"     equate="c=N" equate=".=N"     equate="n=N"         
   equate="g=N"     equate="t=N"   equate="I=N" 
   [equate="A=R" equate="G=R"   equate="T=Y" equate="C=Y" ][Allows Transversion Parsimony]
   SYMBOLS="ACGTacgtI"]] INTERLEAVE  [Don't interleave if using a PHYLIP sequential file] 
   GAP=-;  OPTIONS IGNOR=INVAR  [gapmode=newstate][treats gaps as extra characters];  
  matrix
DROMTTGNC  TACTACCCTGCTCTTTCT TTATTATTAGTAAGAAGA
     Dros  TACTATCCTGCTCTTTCT TTATTATTAGTAAGAAGA
 YMU09206  TATTATCCATCCTTAACa cTATTAATTTCTAGAAGA
LUCMTPIEA  TTTTATCCTGCATTAACT TTACTATtagtaagtagt [lower case ignored if using the respectcase format]
 MSQNCATR  TATTACCCCTCTTTAACT CTTCTAATTTCTAGAAGT
     Apis  TACTTTCCCTCATTATTT ATACTTTTATTAAGAAAT   ;
  end;
[[begin assumptions;
charset begin = 1-5;  [Character sets (for excluding/including etc.)]
charset various = 8 10 13-16;
charset first = 1-36\3;   [For protein-coding sequences, every third base (codon positions)]
charset second = 2-36\3;
charset third = 3-36\3;
taxset  one = 1 3 5;    [Taxa sets, note taxa may be referred to by number or name]
taxset Diptera = DROMTTGNC LUCMTPIEA MSQNCATR Dros;    end;
begin PAUP;
outgroup Apis; [Automatically roots trees at Apis instead of the first taxon when file is executed]
delete DROMTTGNC  ;  [Automatically excludes taxa or taxa sets]
exclude various  third ; [Automatically excludes characters or character sets]

  constraints   both_genera =  ((Dros,DROMTTGNC),(LUCMTPIEA, MSQNCATR)); [Constraint tree, only 
	enforced when ticked in the search box]   end;
begin trees;    uTREE both_genera =  (((2,3),6),(4,5)) [Places tree in memory upon execution];     end;
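The character-set notation used in the assumptions block above (single sites, ranges such as 13-16, and stepped ranges such as 3-36\3) can be unpacked with a short sketch like the one below. It is only an illustration of how the notation reads; the function name is hypothetical and it handles just the forms that appear in this file.

```python
def expand_charset(spec):
    # Expand a character list such as "8 10 13-16" or "3-36\3"
    # into explicit 1-based site numbers.
    positions = []
    for token in spec.split():
        step = 1
        if "\\" in token:
            # "start-end\step" means every step-th site in the range.
            token, step_text = token.split("\\")
            step = int(step_text)
        if "-" in token:
            start_text, end_text = token.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(token)
        positions.extend(range(start, end + 1, step))
    return positions

print("various:", expand_charset("8 10 13-16"))
print("third:  ", expand_charset(r"3-36\3"))
```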