Recently while implementing the Small Parsimony Problem I had the need to generate a binary tree from a string in Python.

The pseudo-code in the question implicitly assumes you have some functionality that will generate a DNA sequence like CAAATCCC into a binary tree, then run SmallParsimony on it. I couldn’t find that functionality anywhere, so I decided to implement something simple in networkx. Hopefully it’s useful.

#!/usr/bin/env python
 """
 Shane Dowling, 27 Oct 2015
 To run simply call generate_leaf_graph and pass in s string.
 """
def count_predecessors(T):
 pred = 0
 for v in T.nodes():
 pred += 1
 return pred

def generate_leaf_graph(s):
 """
 Will generate a leaf graph and return it along
 with the Character dictionary representing the characters
 at each leaf node position
 """
 T = nx.DiGraph()
 Character = {}
 i = 0
 parI = len(s)
 for (son, daughter) in zip(s[0::2], s[1::2]):
   Character[i] = son
   Character[i+1] = daughter
   T.add_edge(parI, i)
   T.add_edge(parI, i+1)
   i += 2
   parI += 1

while count_predecessors(T) > 1:
  for v in T.nodes():
   T.add_edge(parI, v)
   T.add_edge(parI, v+1)
   parI += 1
 return [T, Character]