I am trying to write a function called common_ancestor() that takes two inputs: a list of taxon names (strings) and a phylogenetic tree dictionary. It should return a string giving the name of the taxon that is the closest common ancestor of all the species in the input list. I have already written a separate function called list_ancestors() that gives me the ancestors of a given taxon, and I have a dictionary to work with:
tax_dict = {
'Pan troglodytes': 'Hominoidea', 'Pongo abelii': 'Hominoidea',
'Hominoidea': 'Simiiformes', 'Simiiformes': 'Haplorrhini',
'Tarsius tarsier': 'Tarsiiformes', 'Haplorrhini': 'Primates',
'Tarsiiformes': 'Haplorrhini', 'Loris tardigradus':'Lorisidae',
'Lorisidae': 'Strepsirrhini', 'Strepsirrhini': 'Primates',
'Allocebus trichotis': 'Lemuriformes', 'Lemuriformes': 'Strepsirrhini',
'Galago alleni': 'Lorisiformes', 'Lorisiformes': 'Strepsirrhini',
'Galago moholi': 'Lorisiformes'
}
import random

def halfroot(tree):
    taxon = random.choice(list(tree))
    result = [taxon]
    for i in range(0,len(tree)):
        result.append(tree.get(taxon))
        taxon = tree.get(taxon)
    return result

def root(tree):
    rootlist = halfroot(tree)
    rootlist2 = rootlist[::-1]
    newlist = []
    for e in range(0,len(rootlist)):
        if rootlist2[e] != None:
            newlist.append(rootlist2[e])
    return newlist[0]

def list_ancestors(taxon, tree):
    result = [taxon]
    while taxon != root(tree):
        result.append(tree.get(taxon))
        taxon = tree.get(taxon)
    return result
def common_ancestor(inputlist, tree):
    biglist1 = []
    for i in range(0, len(inputlist)):
        biglist1.append(list_ancestors(inputlist[i], tree))
    # TODO: continue so that I get three separate lists where I can cross-reference
    # all elements from the first list against every other list to find a common ancestor

The result should look something like this:
print(common_ancestor(['Hominoidea', 'Pan troglodytes', 'Lorisiformes'], tax_dict))
Output: 'Primates'
One way would be to collect all the ancestors of each species in a set, and then intersect the sets to get what they have in common:
def common_ancestor(species_list, tree):
    result = None  # initiate a `None` result
    for species in species_list:  # loop through each species in the species_list
        ancestors = {species}  # initiate the ancestors set with the species itself
        while True:  # rinse & repeat until the root (a taxon with no parent entry) is reached
            try:
                species = tree[species]  # get the species' ancestor
                ancestors.add(species)  # store it in the ancestors set
            except KeyError:
                break
        # initiate the result or intersect it with ancestors from the previous species
        result = ancestors if result is None else result & ancestors
    # finally, return the ancestor if there is only one in the result, or None
    return result.pop() if result and len(result) == 1 else None

print(common_ancestor(["Hominoidea", "Pan troglodytes", "Lorisiformes"], tax_dict))
# Primates
You can use the 'middle' part of this function for the list_ancestors(), too - there is no need to complicate it by trying to find the tree's root:
def list_ancestors(species, tree, include_self=True):
    ancestors = [species] if include_self else []
    while True:
        try:
            species = tree[species]
            ancestors.append(species)
        except KeyError:
            break
    return ancestors
Of course, both rely on a valid ancestral tree dictionary: if some of the ancestors were to recurse on themselves (a cycle), the loop would never end, and if there is a break in the chain, the walk stops early. Also, if you were to do a lot of these operations, it might be worth turning your flat dictionary into a proper tree.
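One caveat about the set intersection in common_ancestor above: it can hold more than one taxon whenever the closest common ancestor is not the root (for example, 'Pan troglodytes' and 'Pongo abelii' share four ancestors), and in that case the function returns None. A minimal sketch of one way around that, assuming the same child-to-parent dictionary: walk up from the first species in order and return the first taxon that also appears in every other species' lineage. The names lineage, closest_common_ancestor and sample_tree are made up for this illustration.

```python
def lineage(taxon, tree):
    """Return [taxon, parent, grandparent, ...] up to the root."""
    chain = [taxon]
    while taxon in tree:          # stop once the taxon has no parent entry
        taxon = tree[taxon]
        chain.append(taxon)
    return chain

def closest_common_ancestor(species_list, tree):
    """First taxon on the first species' lineage shared by all the others."""
    if not species_list:
        return None
    first, *rest = species_list
    other_lineages = [set(lineage(s, tree)) for s in rest]
    for candidate in lineage(first, tree):   # ordered from nearest to farthest
        if all(candidate in other for other in other_lineages):
            return candidate
    return None

# Hypothetical subset of the question's dictionary, just for the demo
sample_tree = {
    'Pan troglodytes': 'Hominoidea', 'Pongo abelii': 'Hominoidea',
    'Hominoidea': 'Simiiformes', 'Simiiformes': 'Haplorrhini',
    'Haplorrhini': 'Primates', 'Lorisiformes': 'Strepsirrhini',
    'Strepsirrhini': 'Primates',
}

print(closest_common_ancestor(['Pan troglodytes', 'Pongo abelii'], sample_tree))
# Hominoidea
```

Like the original, this treats a taxon as its own ancestor, so a list containing both 'Hominoidea' and 'Pan troglodytes' yields 'Hominoidea'.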
Related
I am struggling to figure out how to build a dependency order between values in a Python list.
I have a list like the one below, and I pass in an input value such as TABLE_VIEW.
Based on that input value, I want to generate the dependency list in order.
INPUT
1.EXP_TABLE_NAME_STG,TARGET_TABLE
2.SQ_TABLE_NAME,EXP_TABLE_NAME_STG
3.TABLE_VIEW,SQ_TABLE_NAME
4.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_3
5.SQ_TABLE_NAME,LKP_NEW_TABLE_1
6.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_2
7.LKP_NEW_TABLE_1,TARGET_TABLE
For example, the third entry is TABLE_VIEW,SQ_TABLE_NAME. Based on its second value, SQ_TABLE_NAME, I want to find the next dependencies, which in this case are:
SQ_TABLE_NAME,EXP_TABLE_NAME_STG
SQ_TABLE_NAME,LKP_NEW_TABLE_1
Then, from those two entries, take the second value of each and find the dependencies again:
EXP_TABLE_NAME_STG,LKP_NEW_TABLE_3
EXP_TABLE_NAME_STG,LKP_NEW_TABLE_2
EXP_TABLE_NAME_STG,TARGET_TABLE
LKP_NEW_TABLE_1,TARGET_TABLE
I may have up to 50 entries like this, and I want to put them in dependency order based on the second value.
OUTPUT:
1.TABLE_VIEW,SQ_TABLE_NAME
2.SQ_TABLE_NAME,EXP_TABLE_NAME_STG
3.SQ_TABLE_NAME,LKP_NEW_TABLE_1
4.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_3
5.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_2
6.EXP_TABLE_NAME_STG,TARGET_TABLE
7.LKP_NEW_TABLE_1,TARGET_TABLE
I have tried a static approach with multiple list variables, deleting already-processed entries from the original list, but I can never know in advance when the chain of values ends. Can you please share some thoughts on how to implement this dynamically?
sq_order_dependency=[]
for sq_dep in job_dependent_details:
    if 'SQ' in sq_dep.split(',')[0] :
        sq_order_dependency.append(sq_dep)
        job_dependent_details.remove(sq_dep)
sq_order_dependency1=[]
for sq_depenent_order in sq_order_dependency:
    next_dependency=sq_depenent_order.split(',')[1]
    #print(next_dependency)
    for job_dependent_details_list in job_dependent_details:
        if next_dependency in job_dependent_details_list.split(','[0]):
            #print(job_dependent_details_list)
            sq_order_dependency.append(job_dependent_details_list)
for i in sq_order_dependency:
    job_dependent_details.remove(i)
I would follow a different approach, i.e. build a tree-like structure of dependency pairs, and then print out the tree.
In the following code I defined a simple Dep class and chose a depth first traversal for showing the tree, both for readability; and since we meet the dependencies in an unspecified order, I used a helper dictionary. Oh, and I abbreviated the table names out of laziness :)
class Dep():
    def __init__(self, name, children = None):
        self.name = name
        if children:
            self.children = [children]
        else:
            self.children = []
    def add_child(self, child):
        self.children.append(child)
    def show(self, level=0):
        # depth-first: print each (parent, child) pair, then descend into the child
        for c in self.children:
            print ('\t'*level, self.name, c.name)
            c.show(level+1)

def show_dependencies(deps):
    out = {}  # helper dictionary: name -> Dep node
    root = deps[0][0]
    for d in deps:
        pname, cname = d
        if cname in out:
            c = out[cname]
        else:
            c = Dep(cname)
            out[cname] = c
        if pname in out:
            out[pname].add_child(c)
        else:
            out[pname] = Dep(pname, c)
        # if the current root turns out to be someone's child, move the root up
        if root == cname:
            root = pname
    out[root].show()
>>> show_dependencies([('EXP','TARGET'),('SQ','EXP'),('TABLE','SQ'),('EXP','LKP3'),('SQ','LKP1'),('EXP','LKP2'),('LKP1','TARGET')])
TABLE SQ
    SQ EXP
        EXP TARGET
        EXP LKP3
        EXP LKP2
    SQ LKP1
        LKP1 TARGET
Well, according to your examples and data there should be no duplicates, so I think this should work.
The main function is based on a recursive call (something like DFS).
It works only for directed edges, and without self-loops (edges from a node to itself).
from collections import defaultdict

a=['EXP_TABLE_NAME_STG, TARGET_TABLE'
  ,'SQ_TABLE_NAME, EXP_TABLE_NAME_STG'
  ,'TABLE_VIEW, SQ_TABLE_NAME'
  ,'EXP_TABLE_NAME_STG, LKP_NEW_TABLE_3'
  ,'SQ_TABLE_NAME, LKP_NEW_TABLE_1'
  ,'EXP_TABLE_NAME_STG, LKP_NEW_TABLE_2'
  ,'LKP_NEW_TABLE_1, TARGET_TABLE']
a = [i.replace(" ",'') for i in a]
sql_data = [tuple(i.split(',')) for i in a]

class CustomGraphDependency:
    #desired for question
    def __init__(self,data:list):
        self.graph = defaultdict(set) # no self edge
        self.add_dependency(data)
        self.count = 1

    def add_dependency(self,data:list):
        for node1,node2 in data: #directed edge only
            self.graph[node1].add(node2)

    def dependency_finder_with_count(self,node: str, sq_order_dependency: list, flag: list):
        # (e.x 1.TABLE_VIEW,SQ_TABLE_NAME )
        flag.append(node)
        for item in self.graph[node]:
            sq_order_dependency.append((self.count,node,item))
            if item not in flag:
                self.count+=1
                self.dependency_finder_with_count(item, sq_order_dependency, flag)
        return sorted(sq_order_dependency,key=lambda x: x[0])

obj_test = CustomGraphDependency(sql_data).dependency_finder_with_count('TABLE_VIEW', [], [])
for i in obj_test:
    print(i)
'''
(1, 'TABLE_VIEW', 'SQ_TABLE_NAME')
(2, 'SQ_TABLE_NAME', 'EXP_TABLE_NAME_STG')
(3, 'EXP_TABLE_NAME_STG', 'LKP_NEW_TABLE_3')
(4, 'EXP_TABLE_NAME_STG', 'TARGET_TABLE')
(5, 'EXP_TABLE_NAME_STG', 'LKP_NEW_TABLE_2')
(6, 'SQ_TABLE_NAME', 'LKP_NEW_TABLE_1')
(7, 'LKP_NEW_TABLE_1', 'TARGET_TABLE')
'''
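For comparison, a breadth-first walk visits the edges level by level, which is the grouping used by the expected output in the question. A minimal sketch, assuming the edges arrive as (parent, child) tuples; the function name ordered_edges is invented here, the table names are abbreviated, and siblings come out in input order, which may differ from the question's exact listing within one level.

```python
from collections import defaultdict, deque

def ordered_edges(edges, start):
    """Return the edges reachable from `start`, level by level (BFS)."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)   # keeps the input order of siblings

    result = []
    seen = {start}                       # nodes already queued, guards against revisits
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            result.append((node, child))
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return result

edges = [('EXP', 'TARGET'), ('SQ', 'EXP'), ('TABLE', 'SQ'), ('EXP', 'LKP3'),
         ('SQ', 'LKP1'), ('EXP', 'LKP2'), ('LKP1', 'TARGET')]
for number, (parent, child) in enumerate(ordered_edges(edges, 'TABLE'), start=1):
    print(f'{number}.{parent},{child}')
```

Because each node is queued only once, a node with two parents (TARGET here) still has both incoming edges listed, but its own children are expanded a single time.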
I am writing code to find the height of a binary search tree. My first thought was to traverse both the right and left subtrees, append the values of the nodes in each subtree to a list, and then take the lengths of the lists; the length of the longer list would be the height of the tree. Here is my code:
def getHeight(self,root):
    lstr = []
    lstl = []
    if root.right is not None:
        lstr.append(root.right.data)
        if root.right.right is not None:
            lstr.append(root.right.right.data)
    if not root.left == None:
        lstl.append(root.left.data)
        if root.left.left is not None:
            lstr.append(root.left.left.data)
    return lstr
    return lstl
However, is it possible to use a for loop to keep iterating through the .right.right.right and just continue using the .right attribute in the if statement until the for loop ends?
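One way to look at it: a loop can keep following .right for as long as there is a node, but that only measures the right-hand spine, not the height, because the deepest leaf may sit under a left child. A minimal sketch, assuming a node class with data, left and right attributes as in the question (the Node class and both function names are made up for illustration), and counting height in nodes:

```python
class Node:
    """Hypothetical node type mirroring the attributes used in the question."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def right_spine_length(root):
    """Follow .right in a loop until it runs out; counts nodes on that path only."""
    count = 0
    node = root
    while node is not None:
        count += 1
        node = node.right
    return count

def height(root):
    """Height in nodes: an empty tree is 0, a single node is 1."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

#       3
#      / \
#     1   5
#      \
#       2
tree = Node(3, Node(1, None, Node(2)), Node(5))
print(right_spine_length(tree))  # 2
print(height(tree))              # 3
```

The recursion is what lets the left and right branches be explored at every level, which a single chain of .right lookups cannot do.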
Given a basic class Item:
class Item(object):
    def __init__(self, val):
        self.val = val
a list of objects of this class (the number of items can be much larger):
items = [ Item(0), Item(11), Item(25), Item(16), Item(31) ]
and a function compute that processes a value and returns a result.
How can I find two items in this list for which compute returns the same value when applied to the attribute val? If nothing is found, an exception should be raised. If more than two items match, simply return any two of them.
For example, let's define compute:
def compute( x ):
    return x % 10
The expected pair would be: (Item(11), Item(31)).
You can check the length of the set of resulting values:
class Item(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Item({self.val})'

def compute(x):
    return x%10

items = [ Item(0), Item(11), Item(25), Item(16), Item(31)]
c = list(map(lambda x:compute(x.val), items))
if len(set(c)) == len(c): #no two or more equal values exist in the list
    raise Exception("All elements have unique computational results")
To find the items whose computed results are equal, a dictionary can be used:
from collections import Counter
new_d = {i:compute(i.val) for i in items}
d = Counter(new_d.values())
multiple = [a for a, b in new_d.items() if d[b] > 1]
Output:
[Item(11), Item(31)]
A slightly more efficient way to check whether multiple objects share a computed value is to use all, which needs a single pass over the Counter object, whereas the set-and-len check builds an extra set first:
if all(b == 1 for b in d.values()):
    raise Exception("All elements have unique computational results")
Assuming the values returned by compute are hashable (e.g., float values), you can use a dict to store results.
And you don't need to do anything fancy, like a multidict storing all items that produce a result. As soon as you see a duplicate, you're done. Besides being simpler, this also means we short-circuit the search as soon as we find a match, without even calling compute on the rest of the elements.
def find_pair(items, compute):
    results = {}
    for item in items:
        result = compute(item.val)
        if result in results:
            return results[result], item
        results[result] = item
    raise ValueError('No pair of items')
A dictionary val_to_it that contains Items keyed by computed val can be used:
val_to_it = {}
for it in items:
    computed_val = compute(it.val)
    # Check if an Item in val_to_it has the same computed val
    dict_it = val_to_it.get(computed_val)
    if dict_it is None:
        # If not, add it to val_to_it so it can be referred to
        val_to_it[computed_val] = it
    else:
        # We found the two elements!
        res = [dict_it, it]
        break
else:
    raise Exception( "Can't find two items" )
The for block can be rewritten to handle any number n of elements:
# val_to_it starts out empty again; assumes n >= 2
for it in items:
    computed_val = compute(it.val)
    dict_lit = val_to_it.get(computed_val)
    if dict_lit is None:
        val_to_it[computed_val] = [it]
    else:
        dict_lit.append(it)
        # Check if we have the expected number of elements
        if len(dict_lit) == n:
            # Found n elements!
            res = dict_lit
            break
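The same idea can be wrapped in a function so that the "nothing found" case raises instead of silently leaving res undefined. A minimal sketch, assuming items expose a val attribute as in the question; the name find_group and the use of defaultdict are choices made for this illustration, not part of the answers above.

```python
from collections import defaultdict

class Item:
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Item({self.val})'

def find_group(items, compute, n=2):
    """Return the first n items whose compute(item.val) results are equal."""
    groups = defaultdict(list)           # computed value -> items seen so far
    for item in items:
        group = groups[compute(item.val)]
        group.append(item)
        if len(group) == n:              # stop as soon as one group is full
            return group
    raise ValueError(f'No {n} items share a computed value')

items = [Item(0), Item(11), Item(25), Item(16), Item(31)]
print(find_group(items, lambda x: x % 10))
# [Item(11), Item(31)]
```

It still requires the computed values to be hashable, and it stops calling compute as soon as a group reaches n items.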
It's a question from leetcode as linked here.
https://leetcode.com/problems/binary-tree-level-order-traversal/
So here is my code, but I really don't know what the input is and its relationship to TreeNode defined below.
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None
class Solution:
    # @param {TreeNode} root
    # @return {integer[][]}
    def levelOrder(self, root):
        if not root:
            return []
        else:
            printout, CurrLevel = [], [root]
            while CurrLevel:
                printout.extend(CurrLevel)
                NextLevel = []
                for item in CurrLevel:
                    temp = []
                    if item.left != None:
                        temp.append(item.left)
                    if item.right != None:
                        temp.append(item.right)
                    NextLevel = NextLevel.extend(temp)
                CurrLevel = NextLevel
            return printout
Input: [1,2]
Output: [[1,2]]
Expected: [[1],[2]]
So I finally ran into this: what is the TreeNode anyway? Why does it appear as a list?
This is an online judge (OJ) problem and I am unsure about the test case, because I can't understand how the TreeNode is presented here. As described in the problem, it seems that only one "root" element is passed in each time.
I am confused because it seems that for each test case (each LeetCode problem contains tens or hundreds of test cases to check the program), only one TreeNode (named root) is passed in. If so, I can simply reference its .val, .left, or .right, and traverse the tree from the root downwards. BUT in the failing test case shown above (a report is produced whenever the program returns a wrong answer, which is why I can show the input [1, 2]), I see a list, or at least something list-like, containing two elements. I expected a single TreeNode called root as input, but I see a list. I do understand that I am supposed to output a list of lists; that is separate from the confusion I am describing here.
So in [1, 2], I can't tell what 1 and 2 represent: an integer? A TreeNode? My guess is that [1, 2] itself is a TreeNode where 1 and 2 stand for its .val and .left.
Importantly, my code is based on the assumption that each input is one node, and sets aside whatever an input like [1,2,3] means (it may stand for the .val, .left, .right of the TreeNode). I later found that I did do something wrong and revised my code as follows:
class Solution:
    # @param {TreeNode} root
    # @return {integer[][]}
    def levelOrder(self, root):
        if not root:
            return []
        else:
            printout, CurrLevel = [], [root]
            while CurrLevel:
                values = []
                for node in CurrLevel:
                    values = values.append(node.val)
                printout.append(values)
                NextLevel = []
                for item in CurrLevel:
                    temp = []
                    if item.left != None:
                        temp.append(item.left)
                    if item.right != None:
                        temp.append(item.right)
                    NextLevel = NextLevel.extend(temp)
                CurrLevel = NextLevel
            return printout
This time it seems the following bug occurs:
Input: [1]
Output: [null]
Expected: [[1]]
Well I have checked my code, and found that node.val produces 1 when Input:[1], while values produces [null]! That's so weird!
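The [null] most likely comes from the line values = values.append(node.val): list.append changes the list in place and returns None, so values gets rebound to None, and the same applies to NextLevel = NextLevel.extend(temp). A minimal sketch of the traversal with those two assignments dropped, using a local TreeNode that mirrors the commented definition; the free function name level_order is an assumption made for illustration. As far as I can tell, the judge only displays the tree as a level-order list of values, so [1,2] would be a root 1 with a left child 2, while the function itself still receives a single TreeNode.

```python
class TreeNode:
    def __init__(self, x):
        self.val = x
        self.left = None
        self.right = None

def level_order(root):
    """Return the node values grouped by depth, top level first."""
    if not root:
        return []
    result, current = [], [root]
    while current:
        # collect the values of this level without reassigning from append()
        result.append([node.val for node in current])
        next_level = []
        for node in current:
            if node.left is not None:
                next_level.append(node.left)
            if node.right is not None:
                next_level.append(node.right)
        current = next_level
    return result

root = TreeNode(1)
root.left = TreeNode(2)
print(level_order(root))  # [[1], [2]]
```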
I've made a custom class for nodes:
class NodeTree(object):
    def __init__(self, name = None, children = None):
        self.name = name
        self.children = children
and defined a function that makes a tree (a node containing its child nodes):
def create_tree(d):
    x = NodeTree()
    for a in d.keys():
        if type(d[a]) == str:
            x.name = d[a]
        if type(d[a]) == list:
            if d[a] != []:
                for b in d[a]:
                    x.add_child(create_tree(b))
    return x
The input is a dict with one entry for the node name and one for a list of its children, each in the same form as the parent.
The function works fine, and I've written a method that proves it, but I can't find a way to traverse the tree correctly and get its height. I don't know if "height" is the right term, because I know it may be ambiguous; I need to count nodes as the unit of measure, like this:
      parent
        |
        |
    ---------
    |       |
  child   child
The height of this tree is 2. I've tried everything, from counters to tags in the class, but everything seems to degenerate and I never get the right height.
How should I approach that?
To create a recursive height method for your tree that determines the height of a node (that is, the maximum number of nodes on a path from that node to a leaf):
def height(self):
    if not self.children: # base case
        return 1
    else: # recursive case
        return 1 + max(child.height() for child in self.children)
Other tree traversals can also be done recursively. For example, here's a generator method that yields the names of the tree's nodes in "pre-order" (that is, with each parent preceding its children and descendants):
def preorder(self):
    yield self.name
    for child in self.children:
        yield from child.preorder() # Python 3.3+ only!
The yield from syntax in that loop is new in Python 3.3. You can get the same results in earlier versions with this:
for descendant in child.preorder():
    yield descendant
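Putting the two methods together, here is a minimal runnable sketch. It assumes children defaults to an empty list rather than None (iterating over None in preorder would fail) and adds an add_child method, which the question calls but does not show; both are assumptions made for this example.

```python
class NodeTree:
    def __init__(self, name=None, children=None):
        self.name = name
        # assumption: normalise a missing children argument to an empty list
        self.children = list(children) if children else []

    def add_child(self, child):
        self.children.append(child)

    def height(self):
        """Number of nodes on the longest path from this node down to a leaf."""
        if not self.children:            # base case: a leaf counts as 1
            return 1
        return 1 + max(child.height() for child in self.children)

    def preorder(self):
        """Yield node names with each parent before its descendants."""
        yield self.name
        for child in self.children:
            yield from child.preorder()

parent = NodeTree('parent')
parent.add_child(NodeTree('child_a'))
parent.add_child(NodeTree('child_b'))
print(parent.height())           # 2
print(list(parent.preorder()))   # ['parent', 'child_a', 'child_b']
```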