Generate Binary Tree From String

Posted on Tue 27 October 2015 in Python

Recently while implementing the Small Parsimony Problem I had the need to generate a binary tree from a string in Python.

The pseudo-code in the question implicitly assumes you have some functionality that will generate a DNA sequence like CAAATCCC into a binary tree, then run SmallParsimony on it. I couldn't find that functionality anywhere, so I decided to implement something simple in networkx. Hopefully it's useful.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Shane Dowling, 27 Oct 2015
To run simply call generate_leaf_graph and pass in s string.
"""

def count_predecessors(T):
    pred = 0
    for v in T.nodes():
        if len(T.predecessors(v)) == 0:
            pred += 1
    return pred


def generate_leaf_graph(s):
    """
    Will generate a leaf graph and return it along
    with the Character dictionary representing the characters
    at each leaf node position
    """
    T = nx.DiGraph()
    Character = {}
    i = 0
    parI = len(s)
    for (son, daughter) in zip(s[0::2], s[1::2]):
        Character[i] = son
        Character[i+1] = daughter
        T.add_edge(parI, i)
        T.add_edge(parI, i+1)
        i += 2
        parI += 1

    while count_predecessors(T) > 1:
        for v in T.nodes():
            if len(T.predecessors(v)) == 0:
                T.add_edge(parI, v)
                T.add_edge(parI, v+1)
                parI += 1
    return [T, Character]