Software  
Biology «
Chemistry »
Mathematics »
Physics »
Visualization »
 
 
 
 
 
 
 
 
 
 


PUZZLE icon PPUZZLE icon NSC   TREE-PUZZLE:
  Maximum likelihood analysis for nucleotide, amino acid, and two-state data

Copyright 1995-1999 by Korbinian Strimmer and Arndt von Haeseler
Parallel version 2000 by Mikael Hansson and Torbjörn Holmgren

A parallel version of the bioinformatics program Tree-Puzzle

Local usage on the Beowulf Ingvar at NSC
Source code and executable files
Extensive report in Swedish PDF
Documentation in Swedish PDF
News
Tree-Puzzle 5.0
Some recommendations

Tree-Puzzle is a computer program used by molecular biologists to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. Phylogenetics is the study of the evolutionary history and relationship of different species; to study phylogenetics is to recreate the history of life on earth from the existing species.

The program, which is written in C, uses maximum likelihood methods and a fast tree search algorithm, quartet puzzling. For additional information on the original program, see the Tree-puzzle web site, where also a manual for version 5 is available. Locally we have the manual for version 4.0.2 available. That manual (4.0.2) is also valid for the parallel version at NSC.

The computations often require a lot of CPU time and the need of a parallel version is therefore large. The National Supercomputer Centre (NSC) in Linköping and the Linnaeus Centre for Bioinformatics in Uppsala therefore decided to develop a parallel version of Tree-Puzzle. The work was done by Mikael Hansson and Torbjörn Holmgren at NSC. In parallel (sic!) also the original version was parallelized (TREE-PUZZLE 5.0, October 2000) by Heiko A. Schmidt. We now also offer that version, somewhat impoved, see Tree-Puzzle 5.0.

The program was parallelized with MPI (Message Passing Interface) which is a standard for parallel programming based on message passing. The MPI standard is described in the MPI Forum web site.

That part of the program where the tree reconstruction is done was easy to parallelize. It was really an "embarassingly parallel" problem, that is the processes work quite independently of one another. Communication between the processes only happens at initiation and termination. This implies that this part of the program scales almost linearly.

The first part of the program, parameter estimation, is however completely serial; and at several levels, with iterative algorithms that have to converge. In some of these there are also calls to other iterative algorithms. The parameter estimation phase therefore had to be parallelized granularly, which in practice means that the parallelization is done on the matrix level. Each processor is allotted its part of the matrices and usually have to communicate in order to assemble the results. A lot of communication between the processes means that they may have to stop the calculation in order to exchange data. The processors can therefore not work as efficiently, which means less scalability. The scalability will also vary with the input data and with the actual system. The variation due to the system is mainly dependent on the properties of the communication net. We therefore compile the source code in two different ways depending on how the parameter estimation is to be parallelized and how the communication between the processors is performed. For more information we refer to the documentation and the recommendations below.

An optional calculation of a molecular clock is also parallelized granularly (since it uses some functions common to the parameter estimation). Another function which has been parallelized is calculations on user-defined trees. This parallelization scales quite well.

The parallel version of Tree-Puzzle has been tested and benchmarked on three of the NSC computers: Cray T3E, the Beowulf cluster Ingvar (both with the 100 MHz Ethernet and the fast SCI based network) and the interim system SGI Origin 2000. The tests showed the Beowulf cluster to be the best environment for Tree-Puzzle, if the fast network is used.

An extensive report of the parallelization work and a documentation are available in Swedish as PDF files.

News

Parallel Puzzle 4.0.2 has now been updated with better load balancing in the parameter estimation phase and in the analysis of user defined trees.

Tree Puzzle 5.0 features parallel reconstruction in its original edition. A further parallelized version where also the parameter estimation, ML branch length calculations, and analysis of user defined trees, are all parallelized, is now available as Tree-Puzzle 5.0.

Some recommendations

Tree-Puzzle can be compiled with two different parallelizations. In general the default parallelization gives best performance, but on computers with very fast networks (e.g. Cray T3E and SGI 3800, but not Beowulf) compilation with -DPARLEVEL2 gives better performance on amino acid files when the number of patterns in the file are less than about 500. PARLEVEL2 has no effect on DNA files. Rule of thumb: default compilation is the preferred option.

Usually the scalability is improved if the parameter estimation corresponds to a minor part of the total execution time. However, at least on the cluster, some DNA files scale super-linear in phase 1 (the parameter estimation).

Usually calculations with many categories scale better and the difference between 8 and 16 categories is not large if many processors are used. However, if you double the number of categories you have to double the memory.

The quartet puzzling scales well and if the number of sequences is less than 50 the puzzling is fast compared to the rest of the computations. If you suspect that the result will benefit frome more puzzling steps you may increase this number, for example with a factor 10. If the input file contains more than 50 sequences this is an indication that the tree reconstruction (quartet calculations plus quartet puzzling) may require very long time (much more than the parameter estimation). In such a case as many processors as possible should be used, and the number of puzzling steps should not be increased too much.

If the assumption of a molecular clock is relevant it may be advisable to include this calculation if it scales well on the used system If many processors are used the additional time will be small.






Page last modified: 2003-03-05 10:56
For more information contact us at info@nsc.liu.se.