Subject: CVS commit: pkgsrc/biology/phylip
From: OBATA Akio
Date: 2010-07-10 13:26:32
Message id: 20100710112633.274AC175DD@cvs.netbsd.org

Log Message:
Update phylip to 3.69.
Based on PR#43388 by Wen Heping.

version 3.69 (September, 2009)

        * If there are more than about 50 species in the tree, Treedist can
	  fail to compute distances among the trees. This is due to an overflow
	  problem inadvertently introduced in version 3.68. There is no
	  workaround with the 3.68 executable, but if you can recompile you can
	  fix it by replacing line 1179 of treedist.c, which is currently

            maxgrp = pow(2,tip_count);

          by

            maxgrp = 100000;

          This is fixed in version 3.69. Versions prior to 3.68 will not have
	  this problem.
        * In Dnacomp, Pars, and Dollop, if the Shimodaira-Hasegawa test is
	  performed and there are trees perfectly tied with the best tree, the
	  P values were incorrect (being 0 instead of 1).
        * A team from Iowa State University noticed that Dnapenny was wasting
	  time in its bound calculations. This has now been remedied and it
	  should be noticeably faster.
        * In the molecular likelihood programs, ancestral state probabilities
	  were being incorrectly calculated for user trees that had internal
	  multifurcations. This has been corrected.
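For illustration only, regarding the Treedist item above: the workaround
replaces a power of two with a fixed constant. A minimal C sketch of a bounded
alternative follows, assuming the overflow comes from storing a very large
power of two in an integer. The helper name bounded_pow2 and its cap parameter
are hypothetical and are not part of treedist.c.

```c
#include <assert.h>

/* Hypothetical helper, not part of PHYLIP: returns min(2^exponent, cap)
   by repeated doubling, stopping before the value could pass cap, so
   the arithmetic can never overflow a long. */
long bounded_pow2(int exponent, long cap)
{
    long value = 1;
    int i;

    assert(exponent >= 0 && cap >= 1);
    for (i = 0; i < exponent; i++) {
        if (value > cap / 2)    /* doubling again would exceed cap */
            return cap;
        value *= 2;
    }
    return value;
}
```

With a cap of 100000, as in the workaround, a 60-species call returns 100000,
while small trees still get the exact power of two.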

version 3.68 (August, 2008)

        * We received some reports that Dnaml was freezing on some data sets in
	  the Windows executables. This seems to have been because of incorrect
	  handling of small increases in the log-likelihood, causing the
	  algorithm to fall into loops. It was temporarily cured in version 3.67
	  by changing the compiler optimization level, downwards from -O3 to
	  -O1. Now the underlying problem of small differences of log-likelihood
	  has been addressed too, so you should use the new Windows executables
	  (3.68) to avoid having these problems on Windows systems.
        * We found that the .DMG (disk image) archive for Mac OS X contained
	  executables for the Intel Mac but not universal binaries that would
	  work on both Intel Mac and PowerPC systems. Oops. We recompiled and
	  reposted the archives (on 23 August 2007). They should work on both
	  kinds of systems now.
        * We were told that on a Linux computer with a 64-bit Intel Itanium chip
	  the bootstrapping program Seqboot creates blatantly wrong bootstrap
	  samples with characters sampled too many times (or none). On a 64-bit
	  AMD processor the program works fine. The problem is in the random
	  number function "randum" in phylip.c. It seems to be a problem with
	  optimization on the GCC compiler. It is cured by dropping the compiler
	  optimization level from -O3 to -O2.
        * In Protdist the program would blow up if it computes a distance
	  greater than 100.0. This is owing to a subscript error in the code
	  that writes out the distances, in line 1874 where

                      else if (d[j][k] < 1000.0)

          should have been

                      else if (d[i][j-1] < 1000.0)

          If you have this problem and cannot upgrade to version 3.68 or
	  recompile the program with this change, and your data comes from
	  bootstrapping, try omitting just that replicate, or else rerunning
	  the bootstrapping with a different random number seed (which might not
	  happen to drop as many of the sites that caused these two sequences to
	  be so distant).
        * When Dnadist is used and the lower-triangular output format is chosen,
	  the resulting file has headers at the top of columns and is human-
	  readable but is not machine readable. The (temporary) solution is not
	  to use this option for the time being.
        * In Mac OS X, Drawgram produces some alarming lines of text at the top
	  of its terminal window when it first runs. These are just scripting
	  commands that were not erased because we do not clear the screen at
	  the right moment. The workaround is simply to ignore these commands.

version 3.67 (July, 2007)

        * We had our first reports on the behavior of PHYLIP Windows executables
	  on Windows Vista. The programs work fine. The only thing that did not
	  work is the self-extraction program that unpacks the archives. For
	  some reason it did not work on Vista. The work-around was that, after
	  you got an archive file like phylipwx.exe onto your system, you had to
	  change the file extension from "exe" to "zip". Then you had to click
	  on the file. You were presented with options including "Extract all
	  files". If you chose that the archive was unpacked. The programs would
	  then work. Although we provided "zip" archive versions of the package,
	  we have now got a new version of WinZip which is supposed to have a
	  self-extractor that works on Windows Vista, and it was used to produce
	  the self-extracting archive since 27 August 2007.
        * On Mac OS X systems, if our distributed executables are placed in a
	  folder whose path contains a name with an internal blank, such as
	  /Users/ianr/the files/ then the script that causes each of our
	  programs to run when you click on the corresponding icon does not
	  work, and there is an error message. This is a scripting error in our
	  Mac OS X setup, and it was corrected in version 3.67. In the meantime,
	  if you have this problem, the solution is to put PHYLIP in a folder
	  whose path does not have any folder that has a blank in its name. In
	  the above example, all that would be necessary is to rename the folder
	  the files to the_files
        * We are still getting reports of stickiness of the tree, and
	  occasionally of negative branch lengths, in Dnamlk and Promlk which
	  do not do as good a job of searching for best trees as they should.
	  This has turned out to be an issue of nodes getting stuck when they
	  collide in moving them on the "time" scale. Some major changes were
	  made in the code in the 3.67 release to eliminate this stickiness and
	  give a good search.
        * An error was made in putting together the matrices for the PAM
	  mutation model in Protdist, Proml, and Promlk. These programs will
	  give PAM calculations inconsistent with earlier (v3.65 and before)
	  versions, and with other programs. The matrices were corrected in
	  version 3.67. This does not affect JTT or PMB models.
        * The W (within-species variation) option of CONTRAST uses somewhat
	  incorrect equations to infer within-species covariances and
	  phylogenetic covariances. These were corrected in version 3.67.
	  Anyone severely impacted by the problem in the meantime should contact
	  me.
        * Protdist sometimes results in distances greater than or equal to
	  100.000. When this happens, the distance can run together with the
	  previous number in the output file. For example, a distance of 0.31766
	  followed by one which is 127.43986 might look like this:
	  "0.31766127.43986". This causes trouble in any program that tries to
	  use this distance matrix. One symptom of this may be the program
	  reporting that two distances which are expected to be equal are
	  unequal -- but then printing them both out, and they appear to be
	  equal! In this case it would print out a message warning you that
	  0.31766 was not equal to 0.31766. It is doing so because one of them
	  is actually seen by it as 0.31766127 and the other 0.31766. In all
	  future versions, there will be a blank printed between the two
	  numbers. For the present, use an editor to find them and insert the
	  blank by hand. If this is difficult, a Sed script (which can be used
	  on Linux or Unix machines) has been written by Doug Scofield, and is
	  available from him at: this link. Many thanks to him for this. As you
	  can see, this problem is the result of us not thinking of what happens
	  when the distances are big, and the fix in the code is trivial -- just
	  ensuring that there is at least one blank between successive
	  distances.
        * Contml, with gene frequencies, has a bug in the transformation to
	  variables that have approximate Brownian motion as their evolutionary
	  process. This can lead to weird trees. It might be preferable to go
	  back to the 3.5c version if you need to use Contml for this. We
	  believe that this will be correctly fixed in the 3.67 version. If
	  people can recompile the source code, they can replace the function
	  transformgfs with this one and recompile (you should be able to save
	  it from your browser using the Save As choice in its File menu).
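For illustration only, regarding the Protdist item above about distances
running together: the described fix is to guarantee at least one blank between
successive distances. A minimal C sketch of that idea follows. The function
name format_distance_row and the " %.5f" format are assumptions made for this
example and are not the actual Protdist output code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not the actual Protdist code: writes n distances
   into buf, each preceded by a single blank, so that a value of 100.0
   or more can never run together with the number before it.
   Returns the number of characters written, or -1 if buf is too small. */
int format_distance_row(char *buf, size_t size, const double *dist, size_t n)
{
    size_t used = 0;
    size_t i;

    assert(buf != NULL && size > 0);
    assert(dist != NULL || n == 0);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        /* The leading blank in the format is the whole point. */
        int len = snprintf(buf + used, size - used, " %.5f", dist[i]);
        if (len < 0 || (size_t)len >= size - used)
            return -1;          /* output would have been truncated */
        used += (size_t)len;
    }
    return (int)used;
}
```

For the two distances in the example above, 0.31766 and 127.43986, this yields
" 0.31766 127.43986" instead of "0.31766127.43986".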

version 3.66 (August, 2006)

        * Program Treedist was found to compute the Branch Score Distance
	  incorrectly. It will, in most cases, get the branch lengths in
	  terminal branches incorrect and then be likely to find a nonzero
	  distance between trees when they are really identical, and incorrect
	  distances when they are not identical. Alas, there is no workaround to
	  avoid this. All distances done with this option before version 3.66
	  should be regarded as incorrect unless all terminal branches have the
	  same length, or unless the order of species in the tree is the same as
	  in the first tree in the file. The Symmetric Difference option, which
	  does not use branch lengths, works properly.
        * Program Dnamlk, when run on Linux or Windows systems, sometimes gave
	  negative branch lengths for some branches on the tree. This is bad.
	  Although we at first thought that this was a compiler bug, it seems to
	  be a lack of initialization of some pointers. Program Promlk may have
	  the same problem, as they share code. If you have this problem you can
	  work around it by not using the Global menu option when running Dnamlk
	  (or Promlk). If you need more extensive tree search the J (Jumble)
	  option may be your best bet.
        * On Windows (at least, on Windows xp), our executables for version 3.65
	  produce output files (outfile) and output tree files (outtree) that
	  have end-of-line characters that result in their being hard to read on
	  the Notepad editor. They appear as one big line. If you use the
	  Wordpad editor, or Microsoft Word itself, the files will be readable.
	  This is an end-of-line compiler setting we got wrong when compiling
	  the programs.
        * Programs Dnaml and Proml sometimes failed to iterate branch lengths in
	  trees enough -- this can result in them failing to find as good a tree
	  as the molecular clock versions Dnamlk and Promlk, a phenomenon that
	  is not supposed to occur. The problem results from the iteration code
	  in function makenewv giving up too easily when branch lengths are very
	  short. The resulting branches get "stuck" at length 0 when they should
	  not. If you can recompile the programs, the problem can be solved by
	  the following changes:
              o In file phylip.h change the value of the constant iterations to
		8 instead of 4.
              o In files dnaml.c and proml.c, change function makenewv to
		replace

                   done = fabs(y-yold) < epsilon;

                by

                   done = fabs(y-yold) < 0.1*epsilon;

              o In dnaml.c, in function makenewv, also replace

                     if (yold < epsilon)
                        yold = epsilon;

                by

                     if (y < epsilon)
                        y = epsilon;

          We think these fix the problem. Some more thorough fixes are
	  implemented in the 3.66 code.
        * The Mac OS X archives (in .dmg form) appeared at first sight not to
	  have any executables directory in the package. This is owing to
	  strange placement of icons once we package the files. The OS X
	  executables are there -- their folder is just way down the window. Use
	  the scroll bar to look for them. You should be able to use the
	  View/Rearrange menus to make the folder icons appear in a more
	  reasonable place. (Or this can be done once all of the contents of the
	  .dmg archive are copied out to another folder).
        * Programs Dnaml and Proml (but not Dnamlk or Promlk), from version 3.64
	  on, crashed if the Categories (C) option is used, even if all
	  categories are given the same rate of change. This unpleasant behavior
	  does not occur if the menu option for "Speedier but rougher analysis"
	  is changed to "No, not rough". That slows down the run but allows it
	  to succeed.

          The fix turns out to be that all instances in dnaml.c of calls to
	  function copynode (or all instances in proml.c of calls to
	  prot_copynode) that involve an argument lrsaves should have the third
	  argument be rcategs instead of categs.
        * In Seqboot, when menu item J is set to Permute species within
	  characters it is impossible to change menu item W (character weights).
	  This is a glitch in the menuing code. If you can change the source
	  code and recompile, change at line 215 of seqboot.c:

                  ((permute || ild || lockhart)
                    && (strchr("ACDEFSJPRXNI%1.20",ch) != NULL)) ||
          to be:
                  (permute && (strchr("ACDEFSJPRWXNI%1.20",ch) != NULL)) ||
                  ((ild || lockhart) && (strchr("ACDEFSJPRXNI%1.20",ch) != NULL)) ||

          If you are stuck with our executables and need this feature, you can
	  also work around it in the following devious way:
             1. Set menu item J to some other setting where menu item W appears
		in the menu, such as Bootstrap,
             2. Change menu item W
             3. Then change item J to Permute species within characters
        * Our Makefile for Unix had some problem finding some of the
	  X-windows libraries on Mac OS X systems on Intel Macs. This
	  prevented the compilation of Drawtree and Drawgram. You might
	  have had to use those two programs by using their PowerMac Mac
	  OS X executables. All the other programs did compile and run
	  correctly on Intel Macs.
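For illustration only, regarding the Seqboot menu item above: the corrected
clause can be exercised in isolation. The sketch below wraps just that clause
in a hypothetical function, key_allowed, assuming the clause decides which
menu keys are accepted. The real test in seqboot.c has further alternatives
that the changelog does not quote.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper around the corrected clause quoted above; the
   name and signature are not from seqboot.c. Returns nonzero when the
   menu key ch passes this one clause. */
int key_allowed(char ch, int permute, int ild, int lockhart)
{
    assert(ch != '\0');     /* strchr would match the string terminator */
    return (permute && strchr("ACDEFSJPRWXNI%1.20", ch) != NULL) ||
           ((ild || lockhart) && strchr("ACDEFSJPRXNI%1.20", ch) != NULL);
}
```

With the split, 'W' is accepted when permute is set, but is still rejected when
only ild or lockhart is set.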

version 3.65 (August, 2005)

        * Protpars sometimes gave the result "0 trees found" or else simply
	  hung and did not complete its run. This was a bug. The program should
	  always get at least one tree -- if it does not, that is a bug and not
	  a judgement on your data, provided the data file is in our format!
        * Proml and Restml, and maybe some others, seg-faulted when run on
	  enough multiple data sets, as in bootstrapping. If you have a version
	  that has this problem and can recompile the programs, here is a fix
	  for Proml and Restml. In function "inputdata", replace the lines

            makeweights();
            if ( firstset ) alloclrsaves();
            else resetlrsaves();

          by

            if ( !firstset ) freelrsaves();
            makeweights();
            alloclrsaves();

          and you can also eliminate the now-unnecessary function
	  "resetlrsaves". (Thanks to Jacques Rougemont for this).
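For illustration only, regarding the Proml and Restml fix above: the new
ordering frees the storage left over from the previous data set and then
always allocates afresh. A minimal C sketch of that pattern follows. The names
saves, free_saves, alloc_saves and begin_data_set are hypothetical stand-ins,
and the makeweights() call between the two steps is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-data-set storage; the real lrsaves
   structures are not shown in the changelog. */
static double *saves = NULL;
static size_t saves_len = 0;

static void free_saves(void)
{
    free(saves);
    saves = NULL;
    saves_len = 0;
}

static int alloc_saves(size_t sites)
{
    assert(sites > 0);
    assert(saves == NULL);      /* the previous set must be freed first */
    saves = calloc(sites, sizeof *saves);
    if (saves == NULL)
        return -1;
    saves_len = sites;
    return 0;
}

/* Mirrors the ordering of the fix: free unless this is the first data
   set, then always allocate for the current number of sites. */
int begin_data_set(int firstset, size_t sites)
{
    if (!firstset)
        free_saves();
    return alloc_saves(sites);
}
```

Because every data set gets storage sized for itself, a later and larger data
set no longer writes past a buffer sized for an earlier one.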

version 3.64 (July, 2005)

        * Treedist had trouble on Windows systems reading trees. This was due to
	  problems with the ftell command on CygWin. It has been fixed by having
	  the files read as binary files.
        * Trees with branch lengths compared using Treedist may have incorrect
	  distances when evaluated as unrooted trees, owing to miscalculation of
	  branch lengths for the bottommost branches.
        * Runs of Seqboot on Mac OS X systems with gene frequencies data have
	  shown incorrect results -- wrong numbers of loci sampled, for
	  example. This is due to bad code generated by the Metrowerks
	  Codewarrior compiler when set to higher levels of optimization (our
	  source code is OK). We will recompile the program at a lower level of
	  optimization in the next bug-fixing release. If you can follow our
	  compiling instructions and have this compiler, you can produce a
	  correctly working executable. Alternatively you can use the gcc
	  compiler and use our Unix Makefile to recompile this program (by
	  typing "make seqboot"). This is quite easy to do and all Mac OS X
	  releases have the gcc compiler in them -- it only needs to be
	  installed.
        * In runs of Proml, Dnaml or Restml with user trees, if one puts in a
	  user tree with an internal multifurcation and asks the program to re-
	  estimate the branch lengths for that tree, the branch lengths in only
	  two of the furcs will be re-estimated if they already have branch
	  lengths. This is due to a bug in the function "initrav" causing it to
	  fail to enter one or more of the subtrees. A workaround until the next
	  release is as follows: Use Retree to remove all branch lengths on the
	  tree. The tree's branch lengths will then all be re-estimated when it
	  is used as a user tree.
        * The example output in the Treedist documentation gives distances
	  computed by version 3.62 or earlier, in which the tree distance is not
	  square-rooted.

version 3.63 (December, 2004)

        * The DNA and protein likelihood programs could have problems with
	  underflow if very large numbers of sequences were analyzed. Underflow
	  protection code was needed to make this much less likely to happen.
        * A number of programs had the problem that when M (multiple data set)
	  runs are done, if the data sets differ in the number of characters
	  from data set to data set, they only allocate enough memory for the
	  first data set, and then can crash on subsequent, larger, data sets.
	  For bootstrap and permutation runs this should not be a problem, but
	  for jackknife runs it might be. One work-around until we fixed this
	  was to move the data set with the most characters to the front, so
	  that enough space is allocated. The programs we think had this problem
	  are: Clique, Dnacomp, Proml, Promlk, Protdist, Dollop, Gendist, Pars,
	  Restml, and Restdist.
        * When the Branch Score distances are computed in program Treedist, the
	  sum of squares of differences between branches was not square-rooted,
	  as the documentation web page says it is.
        * Fitch and Contml may die when asked to do Jumbling, in some cases.
        * Dnaml had inconsistencies in results when branch lengths of a user
	  tree were estimated, and when the same numbers were provided in the
	  user tree.
        * Trees fed into Contrast could cause trouble if they contained
	  unifurcations (forks with only one descendant). The program did not
	  complain about this, as it should have.
        * End-of-line characters in input files in certain cases caused trouble
	  in Mac OS X (for example when the files came over from Windows).
        * When printing a rooted tree out in Kitsch, the root was not placed
	  intermediate between its two descendants.
        * The variable numtrees was sometimes used when still uninitialized in
	  Pars.
        * Restdist had a site-aliasing bookkeeping bug that could lead to
	  incorrect results.
        * Restml would not allow site lengths greater than 8, because an array
	  was of fixed size when it should have been dynamically allocated.
        * The variable name howmany conflicts with predefined names in some
	  older Sun compilers. It will henceforth be deliberately misspelled to
	  avoid this.
        * With larger data sets being analyzed, Proml, Promlk, Dnaml, and
	  Dnamlk have had to have underflow protection code installed, as
	  likelihoods were getting too small.
        * Treedist was giving wrong answers when asked to compute all distances
	  between trees in two files that had unequal numbers of trees. This
	  was a bookkeeping error.
        * The variable scanned was uninitialized in the Drawtree and Drawgram
	  programs, which could sometimes cause problems.
        * The lack of initialization of the variable delta in Dnadist meant that
	  different results could be obtained from interactive runs than were
	  obtained in runs under the control of a command file.
        * Dnadist was sometimes stopping when encountering sequences that had
	  an infinite or indeterminate distance (i.e. when the sequences were
	  too different or when they had no sites in common), when it should
	  have printed out "-1" and continued. When it was supposed to print
	  "-1" in some recent versions of PHYLIP it printed "1.0000" instead.

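For illustration only, regarding the underflow items above: the changelog does
not show PHYLIP's underflow protection code, so the sketch below is one common
technique and not the PHYLIP implementation. It rescales a running product of
likelihood terms whenever it becomes too small and counts the rescalings. The
function name scaled_product and both constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants: rescale by 2^256 whenever the running product
   falls below 2^-256. Powers of two keep the rescaling exact. */
#define RESCALE_FACTOR    0x1p256
#define RESCALE_THRESHOLD 0x1p-256

/* Multiplies n likelihood terms, each assumed to lie in (0, 1] and not
   itself vanishingly small. On return the product is represented as
       (*mantissa) * 2^(-256 * (*rescales)),
   so the caller never has to form the tiny number itself. */
void scaled_product(const double *terms, size_t n,
                    double *mantissa, long *rescales)
{
    size_t i;

    assert(mantissa != NULL && rescales != NULL);
    assert(terms != NULL || n == 0);
    *mantissa = 1.0;
    *rescales = 0;
    for (i = 0; i < n; i++) {
        assert(terms[i] > 0.0 && terms[i] <= 1.0);
        *mantissa *= terms[i];
        if (*mantissa < RESCALE_THRESHOLD) {
            *mantissa *= RESCALE_FACTOR;    /* pull back into range */
            (*rescales)++;
        }
    }
}
```

For 2000 terms of 0.5 the plain product 2^-2000 is far below the smallest
positive double, while the sketch reports a mantissa of 2^-208 and 7
rescalings, which together represent the same value.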
version 3.62 (September, 2004)

        * The ftp link used by our "Get Me PHYLIP" page to fetch the version
	  3.62 Linux gzip'ed sources and documentation archive was incorrect
	  until recently (I hadn't updated it to fetch version 3.62). If you had
	  trouble fetching this archive in version 3.62, please try one more
	  time. It will work now.
        * A number of people have found, with Fitch and with Contml, that
	  version 3.61 crashes on multiple Jumbling (option J) or on bootstrap
	  runs. This is fairly serious. It does not happen with versions of
	  these programs earlier than 3.6 (such as 3.6a3 or 3.573c). This
	  release fixes these problems.

Files:
Revision  Action  File
1.23      modify  pkgsrc/biology/phylip/Makefile
1.7       modify  pkgsrc/biology/phylip/PLIST
1.6       modify  pkgsrc/biology/phylip/distinfo
1.3       modify  pkgsrc/biology/phylip/patches/patch-aa