I need to find all projects and shared projects within a GitLab group that has subgroups. I managed to list the names of all projects like this:
group = gl.groups.get(11111, lazy=True)
# find all projects, also in subgroups
projects = group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")
What I am missing is a way to also list the shared projects within the group. Or, to put it in other words: how do I find all projects where my group is a member? Is this possible with the Python API?
So, inspired by sytech's answer, I found out that my code was not working in the first place, as the shared projects were still hidden in the subgroups. So I came up with the following code that digs through all the levels of subgroups to find all shared projects. I assume this can be written much more elegantly, but it works for me:
# group definition
main_group_id = 11111

# create empty list that will contain the final result
list_subgroups_id_all = []
# create empty list that acts as temporary storage of the results outside the function
list_subgroups_id_stored = []

# function to create a list of subgroups of a group (id)
def find_subgroups(group_id):
    # retrieve group object
    group = gl.groups.get(group_id)
    # create empty list to store ids of subgroups
    list_subgroups_id = []
    # iterate through group to find ids of all subgroups
    for sub in group.subgroups.list():
        list_subgroups_id.append(sub.id)
    return list_subgroups_id

# function to iterate over the various groups for subgroup detection
def iterate_subgroups(group_id, list_subgroups_id_all):
    # for a given id, find existing subgroups (id) and store them in a list
    list_subgroups_id = find_subgroups(group_id)
    # add the found items to the list storage variable, so that the results are not overwritten
    list_subgroups_id_stored.append(list_subgroups_id)
    # for each found subgroup_id, test if it is already part of the total id list
    # if not, store it and test for more subgroups
    for test_id in list_subgroups_id:
        if test_id not in list_subgroups_id_all:
            # add it to the total subgroup id list (final results list)
            list_subgroups_id_all.append(test_id)
            # check whether test_id contains more subgroups
            list_subgroups_id_tmp = iterate_subgroups(test_id, list_subgroups_id_all)
            # if so, append to the stored subgroup list that is currently checked
            list_subgroups_id_stored.append(list_subgroups_id_tmp)
    return list_subgroups_id_all

# find all subgroups, subsubgroups, etc. and store their ids in a list
list_subgroups_id_all = iterate_subgroups(main_group_id, list_subgroups_id_all)
print("***ids of all subgroups***")
print(list_subgroups_id_all)
print("")

print("***names of all subgroups***")
list_names = []
for ids in list_subgroups_id_all:
    group = gl.groups.get(ids)
    group_name = group.attributes['name']
    list_names.append(group_name)
print(list_names)
print("")

# print all directly integrated projects of the main group, including those in subgroups
print("***integrated projects***")
group = gl.groups.get(main_group_id)
projects = group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")

# print all shared projects
print("***shared projects***")
for sub in list_subgroups_id_all:
    group = gl.groups.get(sub)
    for shared_prj in group.shared_projects:
        print(shared_prj['path_with_namespace'])
print("")
One question remains: at the very beginning I retrieve the main group by its id (here: 11111), but can I actually also get this id by looking up the name of the group? Something like group_id = gl.group.get(attribute={'name','foo'}) (which does not work)?
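On that follow-up question: one option I'm aware of is the search parameter on the list call, which python-gitlab passes through to the API. A minimal sketch (the group name 'foo' is the placeholder from the question; search matches substrings, so compare exactly):

matches = gl.groups.list(search='foo')
group_id = None
for g in matches:
    if g.name == 'foo':  # search is a substring match, so verify the exact name
        group_id = g.id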
You can get the shared projects via the .shared_projects attribute:
group = gl.groups.get(11111)
for proj in group.shared_projects:
    print(proj['path_with_namespace'])
However, you cannot use the lazy=True argument to gl.groups.get here, because a lazy object does not fetch the group's attributes (including shared_projects):
>>> group = gl.groups.get(11111, lazy=True)
>>> group.shared_projects
AttributeError: shared_projects
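For completeness, combining the question's include_subgroups listing with shared_projects gives the full picture for a single group; a sketch built from the same calls:

group = gl.groups.get(11111)  # no lazy=True, so shared_projects is populated
names = [p.attributes['name'] for p in
         group.projects.list(include_subgroups=True, all=True)]
names += [p['path_with_namespace'] for p in group.shared_projects]
# note: shared_projects covers only this group, not its subgroups
# (hence the recursive walk in the follow-up above)
print(names)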
I have generated a list of genes
genes = ['geneName1', 'geneName2', ...]
and a set of their interactions:
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3'),...}
I want to find out how many interactions each gene has and put that in a vector (or dictionary) but I struggle to count them. I tried the usual approach:
interactionList = []
for gene in genes:
    interactions = geneInt.count(gene)
    interactionList.append(interactions)
but of course the code fails: sets have no .count method, and in any case the set's elements are pairs of values, while I need to count occurrences of the individual gene names within them.
I would argue that you are using the wrong data structure to hold interactions. You can represent interactions as a dictionary keyed by gene name, whose values are a set of all the genes it interacts with.
Let's say you currently have a process that does something like this at some point:
geneInt = set()
...
geneInt.add((gene1, gene2))
Change it to
import collections

geneInt = collections.defaultdict(set)
...
geneInt[gene1].add(gene2)
If the interactions are symmetrical, add a line
geneInt[gene2].add(gene1)
Now, to count the number of interactions, you can do something like
intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
Counting from your original set directly is also simple if the interactions are one-way:
intCounts = dict.fromkeys(genes, 0)
for gene, _ in geneInt:
    intCounts[gene] += 1
If each interaction is two-way, there are three possibilities:
Both interactions are represented in the set: the above loop will work.
Only one interaction of a pair is represented: change the loop to
for gene1, gene2 in geneInt:
    intCounts[gene1] += 1
    if gene1 != gene2:
        intCounts[gene2] += 1
Some reverse interactions are represented, some are not. In this case, transform geneInt into a dictionary of sets as shown in the beginning.
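For that last case, a minimal sketch of the transformation (the pair set is the example from the question):

import collections

# example pair set from the question
geneInt_pairs = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}

# dictionary-of-sets form; adding both directions makes the counts
# correct regardless of which reverse pairs happen to be present
geneInt = collections.defaultdict(set)
for g1, g2 in geneInt_pairs:
    geneInt[g1].add(g2)
    geneInt[g2].add(g1)

intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
# {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}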
Try something like this,
interactions = {}
for gene in genes:
    interactions_count = 0
    for tup in geneInt:
        interactions_count += tup.count(gene)
    interactions[gene] = interactions_count
Use a dictionary, and keep incrementing the value for every gene you see in each tuple in the set geneInt.
interactions_counter = dict()
for interaction in geneInt:
    for gene in interaction:
        interactions_counter[gene] = interactions_counter.get(gene, 0) + 1
The dict.get(key, default) method returns the value at the given key, or the specified default if the key doesn't exist.
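For example:

counts = {'geneName1': 2}
counts.get('geneName1', 0)  # -> 2
counts.get('geneName9', 0)  # -> 0 (key absent, default returned)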
For the set geneInt={('geneName1', 'geneName2'), ('geneName1', 'geneName3')}, we get:
interactions_counter = {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}
I would like to do a FVA only for selected reactions, in my case on transport reactions between compartments (e.g. between the cytosol and mitochondrion). I know that I can use selected_reactions in doFVA like this:
import cbmpy as cbm
mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')
cbm.doFVA(mod, selected_reactions=['R_FORtm', 'R_CO2tm'])
Is there a way to get the entire list of transport reactions, not only the two I added manually? I thought about selecting the reactions based on their IDs ending in tm, but that fails for 'R_ORNt3m' (and probably other reactions, too).
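For illustration, that naive ID-based filter would look like this (it misses 'R_ORNt3m' and similar IDs, which is exactly the problem):

# naive filter on reaction IDs ending in 'tm'; misses e.g. 'R_ORNt3m'
tm_reactions = [rid for rid in mod.getReactionIds() if rid.endswith('tm')]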
I want to share this model with others. What is the best way of storing the information in the SBML file?
Currently, I would store the information in the reaction annotation, as in this answer. For example
mod.getReaction('R_FORtm').setAnnotation('FVA', 'yes')
which could be parsed.
There is no built-in function for this kind of task. As you already mentioned, relying on the IDs is generally not a good idea, as those can differ between databases, models and groups (e.g. if someone decides simply to enumerate reactions from r1 to rn and/or metabolites from m1 to mm, any filtering based on IDs fails). Instead, one can make use of the compartment field of the species. In CBMPy you can access a species' compartment like this:
import cbmpy as cbm
import pandas as pd
mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')
mod.getSpecies('M_atp_c').getCompartmentId()
# will return 'c'
# run a FBA
cbm.doFBA(mod)
This can be used to find all fluxes between compartments as one can check for each reaction in which compartment their reagents are located. A possible implementation could look as follows:
def get_fluxes_associated_with_compartments(model_object, compartments, return_values=True):
    # check whether the provided compartment IDs are valid
    if not isinstance(compartments, (list, set)) or not set(compartments).issubset(model_object.getCompartmentIds()):
        raise ValueError("Please provide valid compartment IDs as a list!")
    else:
        compartments = set(compartments)
    # all reactions in the model
    model_reactions = model_object.getReactionIds()
    # keep the reactions whose reagents' compartments exactly match the provided compartments
    return_reaction_ids = [ri for ri in model_reactions
                           if compartments == set(si.getCompartmentId() for si in
                                                  model_object.getReaction(ri).getSpeciesObj())]
    # return each reaction along with its value
    if return_values:
        return {ri: model_object.getReaction(ri).getValue() for ri in return_reaction_ids}
    # return only a list of reaction IDs
    return return_reaction_ids
So you pass your model object and a list of compartments, and for each reaction it is checked whether the compartments of its reagents exactly match the specified set, i.e. whether the reaction connects exactly those compartments.
In your case you would use it as follows:
# compartment IDs for cytosol and mitochondrion
comps = ['c', 'm']
# you only want the reaction IDs; drop 'return_values=False' if you also want the corresponding values
trans_cyt_mit = get_fluxes_associated_with_compartments(mod, comps, return_values=False)
The list trans_cyt_mit will then contain all desired reaction IDs (also the two you specified in your question) which you can then pass to the doFVA function.
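That is, mirroring the doFVA call from the question (a sketch; the shape of doFVA's return value is shown further below):

# FVA restricted to the detected cytosol/mitochondrion transport reactions
fva_res, rea_names = cbm.doFVA(mod, selected_reactions=trans_cyt_mit)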
About the second part of your question: I highly recommend storing those reactions in a group rather than using an annotation:
# create an empty group
mod.createGroup('group_trans_cyt_mit')
# get the group object so that we can manipulate it
cyt_mit = mod.getGroup('group_trans_cyt_mit')
# we can only add objects to a group so we get the reaction object for each transport reaction
reaction_objects = [mod.getReaction(ri) for ri in trans_cyt_mit]
# add all the reaction objects to the group
cyt_mit.addMember(reaction_objects)
When you now export the model, e.g. by using
cbm.CBWrite.writeSBML3FBCV2(mod, 'iMM904_with_groups.xml')
this group will be stored in the SBML as well. If a colleague reads the SBML again, they can then easily run an FVA for the same reactions by accessing the group members, which is far easier than parsing the annotation:
# do an FVA; fva_res: Reaction, Reduced Costs, Variability Min, Variability Max, abs(Max-Min), MinStatus, MaxStatus
fva_res, rea_names = cbm.doFVA(mod, selected_reactions=mod.getGroup('group_trans_cyt_mit').getMemberIDs())
fva_dict = dict(zip(rea_names, fva_res.tolist()))
# store results in a dataframe which makes the selection of reactions easier
fva_df = pd.DataFrame.from_dict(fva_dict, orient='index')
fva_df = fva_df.rename({0: "flux_value", 1: "reduced_cost_unscaled", 2: "variability_min", 3: "variability_max",
4: "abs_diff_var", 5: "min_status", 6: "max_status"}, axis='columns')
Now you can easily query the dataframe and find the flexible and inflexible reactions within your group:
# filter the reactions with flexibility
fva_flex = fva_df.query("abs_diff_var > 10 ** (-4)")
# filter the reactions that are not flexible
fva_not_flex = fva_df.query("abs_diff_var <= 10 ** (-4)")
What I want to do is duplicate a controller to the other side, renaming/replacing _L with _R. So I just have to select a controller; the script creates a group, then another group to mirror it on the right side, renames that group to _R, and finally unparents the first group to world. That's all I want to do, but I'm stuck on renaming. I know I have to sort the list in reverse order to rename it, but whenever I do, Maya says:
More than one object matches name
The duplicated objects have different parent names but the same child names. Please tell me how I should do this and what I'm missing.
import maya.cmds as cmds
list = cmds.ls(sl=1)
grp = cmds.group(em=1, name=("grp" + list[0]))
# creating constraint to match transform and deleting it
pc = cmds.pointConstraint(list, grp, o=[0,0,0], w=1)
oc = cmds.orientConstraint(list, grp, o=[0,0,0], w=1)
cmds.delete(pc, oc)
# parenting it to controller
cmds.parent(list, grp)
# creating new group to reverse it to another side
Newgrp = cmds.group(em=1)
cmds.parent(grp, Newgrp)
Reversedgrp = cmds.duplicate(Newgrp)
cmds.setAttr(Reversedgrp[0] +'.sx', -1)
selection = cmds.ls(Reversedgrp, long=1)
selection.sort(key=len, reverse=1)
Renaming in Maya is very annoying, because the names are your only handle to the objects themselves.
The usual trick is basically:
1. Duplicate the items with the rr flag, so you only get the top nodes.
2. Use listRelatives with the ad and f flags to get all the children of the duplicated top node as long names like |Parent|Child|Grandchild, where the entire hierarchy above the name is listed in order (you can get this form with cmds.ls(l=True) on objects as well).
3. Sort that list and then reverse it. This puts the longest path names first, so you can start with the leaf nodes and work your way upwards.
4. Now loop through the items and apply your renaming pattern.
So something like this, though you probably want to replace the selection here with something you control:
import maya.cmds as cmds
dupes = cmds.duplicate(cmds.ls(sl=True), rr=True) # duplicate, return only roots
dupes += cmds.listRelatives(dupes, ad=True, f=True) # add children as long names
longnames = cmds.ls(dupes, l=True) # make sure we have long name for root
longnames.sort() # usually these sort automatically, but it's good to be safe
for item in longnames[::-1]: # walk through the list backwards
    shortname = item.rpartition("|")[-1] # get the last bit of the name
    cmds.rename(item, shortname.replace("r", "l")) # at last, rename the item
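For clarity, rpartition("|") splits at the last | in the path, so index [-1] is just the leaf name:

# rpartition splits on the *last* separator; [-1] keeps the leaf name
"|Parent|Child|Grandchild".rpartition("|")
# -> ('|Parent|Child', '|', 'Grandchild')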
thanks "theodox" it was very usefull. but still little bit confused in sorting, long names, short names and .rpartition... but anyway i have created this script finally.
import maya.cmds as cmds

_list = cmds.ls(sl=1)
grp = cmds.group(em=1, name=("grp_" + _list[0]))
# creating constraint to match transform and deleting it
pc = cmds.pointConstraint(_list, grp, o=[0, 0, 0], w=1)
oc = cmds.orientConstraint(_list, grp, o=[0, 0, 0], w=1)
cmds.delete(pc, oc)
cmds.parent(_list, grp)
Newgrp = cmds.group(em=1)
cmds.parent(grp, Newgrp)
# duplicating the new group and reversing it to the negative side
dupes = cmds.duplicate(cmds.ls(Newgrp, s=0), rr=True)  # duplicate, return only roots
cmds.setAttr(dupes[0] + '.sx', -1)
# renaming
dupes += cmds.listRelatives(dupes, ad=True, f=True)  # add children as long names
longnames = cmds.ls(dupes, l=True, s=0)  # make sure we have the long name for the root
longnames.sort()  # usually these sort automatically, but it's good to be safe
print longnames
for item in longnames[::-1]:  # walk through the list backwards
    shortname = item.rpartition("|")[-1]  # get the last bit of the name
    cmds.rename(item, shortname.replace("_L", "_R"))  # at last, rename the item
# ungrouping back to world and deleting unused nodes
cmds.parent(grp, world=True)
duplicatedGrp = cmds.listRelatives(dupes[0], c=True)
cmds.parent(duplicatedGrp, world=True)
cmds.delete(dupes[0], Newgrp)
Anyone can use this code for mirroring controllers; just change the "_L", "_R" strings in the rename command to match your naming. Thank you.
I am trying to work with a many-to-many mapping, finding subsets of one set that map to specific subsets of the other set.
I have many genes. Each gene is a member of one or more COGs (and vice versa), eg.
gene1 is member of COG1
gene1 is member of COG1003
gene2 is member of COG2
gene3 is member of COG273
gene4 is member of COG1
gene5 is member of COG273
gene5 is member of COG71
gene6 is member of COG1
gene6 is member of COG273
I have a short set of COGs that represents an enzyme, eg. COG1,COG273.
I want to find all sets of genes that between them have membership of every COG in the enzyme, but without unnecessary overlaps (in this case, for instance, 'gene1 and gene6' would be spurious as gene6 is already a member of both COGs).
In this example, the answers would be:
gene1 and gene3
gene1 and gene5
gene3 and gene4
gene4 and gene5
gene6
Although I could get all members of each COG and create a 'product', this would contain spurious results (as mentioned above) where more genes than necessary are in the set.
My mappings are currently contained in a dictionary where the key is the gene ID and the value is a list of the COG IDs of which that gene is a member. However, I accept that this might not be the best way to store the mapping.
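For concreteness, here is that mapping for the example above, plus the inverted COG-to-genes form that comes up later in this thread:

gene_cog_dict = {'gene1': ['COG1', 'COG1003'], 'gene2': ['COG2'],
                 'gene3': ['COG273'], 'gene4': ['COG1'],
                 'gene5': ['COG273', 'COG71'], 'gene6': ['COG1', 'COG273']}

# inverted mapping: COG ID -> genes that are members of it
cog_gene_dict = {}
for gene, cogs in gene_cog_dict.items():
    for cog in cogs:
        cog_gene_dict.setdefault(cog, []).append(gene)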
One basic attack:
Keep your representation as it is for now.
Initialize a dictionary with the COGs as keys; each value is an initial count of 0.
Now start building your list of enzyme coverage sets (ecs_list), one ecs at a time. Do this by starting at the front of the gene list and working your way to the end, considering all combinations.
Write a recursive routine to solve the remaining COGs in the enzyme. Something like this:
def pick_a_gene(gene_list, cog_list, solution_set, cog_count_dict):
    pick the first gene in the list that is in at least one cog in the list;
    let the rest of the list be remaining_gene_list.
    add the gene to the solution set.
    for each of the gene's cogs:
        increment the cog's count in cog_count_dict.
        remove the cog from cog_list (if it's still there).
    is there anything left in cog_list?
    yes:
        pick_a_gene(remaining_gene_list, cog_list, solution_set, cog_count_dict)
    no:  # we have a solution: check it for minimality
        from every non-zero entry in cog_count_dict, subtract 1. This gives us a list of excess coverage.
        while the excess list is not empty:
            pick the next gene in the solution set, starting from the *end* (if none, break the loop).
            if the gene's cogs are all covered by the excess:
                remove the gene from the solution set.
                decrement the excess count of each of its cogs.
        the remaining set of genes is an ECS; add it to ecs_list.
Does this work for you? I believe that it covers the minimal sets properly, given the well-behaved example you have. Note that starting from the high end when we check minimality guards against a case like this:
gene1: cog1, cog5
gene2: cog2, cog5
gene3: cog3
gene4: cog1, cog2, cog4
enzyme: cog1 - cog5
We can see that we need gene3, gene4, and either gene1 or gene2. If we eliminate from the low end, we'll toss out gene1 and never find that solution. If we start from the high end, we'll eliminate gene2, but find that solution in a later pass of the main loop.
It's possible to construct a case in which there is a 3-way conflict of this ilk. In that case, we'd have to write an extra loop in the minimality check to find them all. However, I gather that your data aren't that nasty.
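The minimality notion used above can also be checked directly on a candidate solution. A small sketch (the function name is my own; gene_cog_dict is the dict-of-lists mapping from the question):

def is_minimal(solution, gene_cog_dict, target_cogs):
    """True if solution covers target_cogs and no single gene can be dropped."""
    def covers(genes):
        hit = set()
        for g in genes:
            hit.update(gene_cog_dict[g])
        return target_cogs <= hit
    return covers(solution) and not any(
        covers([g for g in solution if g != gene]) for gene in solution)

# e.g. is_minimal(['gene1', 'gene6'], gene_cog_dict, {'COG1', 'COG273'}) -> False
# (gene6 alone already covers both COGs, so gene1 is redundant)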
def findGenes(seq1, seq2, llist):
    from collections import OrderedDict
    from collections import Counter
    from itertools import product

    od = OrderedDict()
    for b, a in llist:
        od.setdefault(a, []).append(b)
    llv = []
    for k, v in od.items():
        if seq1 == k or seq2 == k:
            llv.append(v)
    # flat list needed for counting gene frequencies
    flatL = [x for sublist in llv for x in sublist]
    cFlatl = Counter(flatL)
    # this will gather genes that, like gene6, belong to both sequences
    l_lonely = []
    for k in cFlatl:
        if cFlatl[k] > 1:
            l_lonely.append(k)
    newL = []
    temp = []
    for sublist in llv:
        for el in sublist:
            if el not in l_lonely:
                newL.append(el)
        temp.append(newL)
        newL = []
    # temp contains only genes that do not belong to both sequences
    # product will connect genes from different sequence groups
    p = product(*temp)
    for el in list(p):
        print(el)
    print(l_lonely)
Usage:
lt = [('gene1', 'COG1'), ('gene1', 'COG1003'), ('gene2', 'COG2'), ('gene3', 'COG273'), ('gene4', 'COG1'),
      ('gene5', 'COG273'), ('gene5', 'COG71'), ('gene6', 'COG1'), ('gene6', 'COG273')]
findGenes('COG1', 'COG273', lt)
Output:
('gene1', 'gene3')
('gene1', 'gene5')
('gene4', 'gene3')
('gene4', 'gene5')
['gene6']
Does this do it for you? Note that since you said you had a short set of COGs, I went ahead and used nested for loops; there may be ways to optimize this...
For future reference, please post any code that you've got along with your question.
import itertools

d = {'gene1': ['COG1', 'COG1003'], 'gene2': ['COG2'], 'gene3': ['COG273'],
     'gene4': ['COG1'], 'gene5': ['COG273', 'COG71'], 'gene6': ['COG1', 'COG273']}
COGs = [set(['COG1', 'COG273'])]  # example list of COGs containing only one enzyme; NOTE: your data should be a list of multiple sets

# create all pair-wise combinations of our data
gene_pairs = [l for l in itertools.combinations(d.keys(), 2)]

found = set()
for pair in gene_pairs:
    join = set(d[pair[0]] + d[pair[1]])  # set of COGs for the gene pair
    for COG in COGs:
        # check if a single gene already covers the whole enzyme
        if sorted(d[pair[0]]) == sorted(list(COG)):
            found.add(pair[0])
        elif sorted(d[pair[1]]) == sorted(list(COG)):
            found.add(pair[1])
        # check if the gene combination covers the enzyme
        if COG <= join and pair[0] not in found and pair[1] not in found:
            found.add(pair)

for l in found:
    if isinstance(l, tuple):  # pair of genes
        print l[0], l[1]
    else:  # single gene
        print l
Thanks for the suggestions, they have inspired me to hack something together using recursion. I want to deal with arbitrary gene-cog relationships, so it needs to be a general solution. This should yield all sets of genes (enzymes) that between them are members of all required COGs, without duplicate enzymes and without redundant genes:
from copy import deepcopy

def get_enzyme_cogs(enzyme, target_enzyme_cogs, gene_cog_dict):
    """Get all target COGs of which there is at least one member gene in the enzyme."""
    cog_list = []
    for gene in enzyme:
        cog_list.extend(gene_cog_dict[gene])
    return set(cog_list) & target_enzyme_cogs

def get_gene_by_gene_cogs(enzyme, target_enzyme_cogs, gene_cog_dict):
    """Get the (target-restricted) COG memberships for each gene in the enzyme."""
    cogs_list = []
    for gene in enzyme:
        cogs_list.append(set(gene_cog_dict[gene]) & target_enzyme_cogs)
    return cogs_list

def add_gene(target_enzyme_cogs, gene_cog_dict, cog_gene_dict, proposed_enzyme=None, fulfilled_cogs=None):
    """Generator for all enzymes with membership of all target_enzyme_cogs, without duplicate enzymes or redundant genes."""
    target_enzyme_cogs = set(target_enzyme_cogs)
    base_enzyme_genes = proposed_enzyme or []
    fulfilled_cogs = get_enzyme_cogs(base_enzyme_genes, target_enzyme_cogs, gene_cog_dict)

    ## Which COG will we try to find a member of?
    next_cog_to_fill = sorted(list(target_enzyme_cogs - fulfilled_cogs))[0]
    gene_members_of_cog = cog_gene_dict[next_cog_to_fill]

    for gene in gene_members_of_cog:
        ## Check whether any already-present gene's COG set is a subset of the proposed gene's COG set; if so, skip the addition
        subset_found = False
        proposed_gene_cogs = set(gene_cog_dict[gene]) & target_enzyme_cogs
        for gene_cogs_set in get_gene_by_gene_cogs(base_enzyme_genes, target_enzyme_cogs, gene_cog_dict):
            if gene_cogs_set.issubset(proposed_gene_cogs):
                subset_found = True
                break
        if subset_found:
            continue

        ## Add gene to proposed enzyme
        proposed_enzyme = deepcopy(base_enzyme_genes)
        proposed_enzyme.append(gene)

        ## Determine which COG memberships are fulfilled by the genes in the proposed enzyme
        fulfilled_cogs = get_enzyme_cogs(proposed_enzyme, target_enzyme_cogs, gene_cog_dict)

        if (fulfilled_cogs & target_enzyme_cogs) == target_enzyme_cogs:
            ## Proposed enzyme has members of every required COG, so yield
            enzyme = deepcopy(proposed_enzyme)
            proposed_enzyme.remove(gene)
            yield enzyme
        else:
            ## Proposed enzyme is still missing some COG members
            for enzyme in add_gene(target_enzyme_cogs, gene_cog_dict, cog_gene_dict, proposed_enzyme, fulfilled_cogs):
                yield enzyme
Input:
gene_cog_dict = {'gene1':['COG1','COG1003'], 'gene2':['COG2'], 'gene3':['COG273'], 'gene4':['COG1'], 'gene5':['COG273','COG71'], 'gene6':['COG1','COG273']}
cog_gene_dict = {'COG2': ['gene2'], 'COG1': ['gene1', 'gene4', 'gene6'], 'COG71': ['gene5'], 'COG273': ['gene3', 'gene5', 'gene6'], 'COG1003': ['gene1']}
target_enzyme_cogs = ['COG1','COG273']
Usage:
for enzyme in add_gene(target_enzyme_cogs, gene_cog_dict, cog_gene_dict):
    print enzyme
Output:
['gene1', 'gene3']
['gene1', 'gene5']
['gene4', 'gene3']
['gene4', 'gene5']
['gene6']
I have no idea about its performance though.
I have a dataset containing historical transaction records for real estate properties. Each property has an ID number. To check if the data is complete, for each property I am identifying a "transaction chain": I take the original buyer, and go through all intermediate buyer/seller combinations until I reach the final buyer of record. So for data that looks like this:
Buyer|Seller|propertyID
Bob|Jane|23
Tim|Bob|23
Karl|Tim|23
The transaction chain will look like: [Jane, Bob, Tim, Karl]
I am using three datasets to do this. The first contains the names of only the first buyer of each property. The second contains the names of all intermediate buyers and sellers, and the third contains only the final buyer for each property. I use three datasets so I can follow the process given in vikramls' answer here.
In my version of the graph dictionary, each seller is a key to its corresponding buyer, and the oft-cited find_path function finds the path from first seller to last buyer. The problem is that the dataset is very large, so I get a 'maximum recursion depth exceeded' error. I think I can solve this by nesting the graph dictionary inside another dictionary where the key is the property ID number, and then searching for the path within ID groups. However, when I tried:
graph = {}
propertyIDgraph = {}
with open('buyersAndSellers.txt', 'r') as f:
    for row in f:
        propertyid, seller, buyer = row.strip('\n').split('|')
        graph.setdefault(seller, []).append(buyer)
        propertyIDgraph.setdefault(propertyid, []).append(graph)
It assigned every buyer/seller combination to every property id. I would like it to assign the buyers and sellers to only their corresponding property ID.
You might attempt something like the following. I adapted it from the essay at https://www.python.org/doc/essays/graphs/
from collections import namedtuple

Transaction = namedtuple('Transaction', ['Buyer', 'PropertyId'])

graph = {}
## maybe this is a db or a file
for data in datasource:
    graph.setdefault(data.seller, []).append(Transaction(data.buyer, data.property_id))
## builds something like
## graph = {'Jane': [Transaction('Bob', 23)],
##          'Bob':  [Transaction('Tim', 23)],
##          'Tim':  [Transaction('Karl', 23)]}

def find_transaction_path(graph, original_seller, current_owner, target_property_id, path=[]):
    assert target_property_id is not None
    path = path + [original_seller]
    if original_seller == current_owner:
        return path
    if original_seller not in graph:
        return None
    shortest = None
    for node in graph[original_seller]:
        if node.Buyer not in path and node.PropertyId == target_property_id:
            newpath = find_transaction_path(graph, node.Buyer, current_owner, target_property_id, path)
            if newpath:
                if not shortest or len(newpath) < len(shortest):
                    shortest = newpath
    return shortest
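A hypothetical run against the sample rows from the question (building the graph by hand here):

graph = {'Jane': [Transaction('Bob', 23)],
         'Bob':  [Transaction('Tim', 23)],
         'Tim':  [Transaction('Karl', 23)]}
print(find_transaction_path(graph, 'Jane', 'Karl', 23))
# -> ['Jane', 'Bob', 'Tim', 'Karl']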
I wouldn't recommend appending to the graph unconditionally; it will append to every node. Better to check whether the key exists first, and only then append to the already-existing object.
Try this:
graph = {}
propertyIDgraph = {}
with open('buyersAndSellers.txt', 'r') as f:
    for row in f:
        propertyid, seller, buyer = row.strip('\n').split('|')
        if seller in graph:
            graph[seller] = graph[seller] + [buyer]
        else:
            graph[seller] = [buyer]
        if propertyid in propertyIDgraph:
            propertyIDgraph[propertyid] = propertyIDgraph[propertyid] + [graph]
        else:
            propertyIDgraph[propertyid] = [graph]
Here is a link that may be useful:
syntax for creating a dictionary into another dictionary in python
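Following that idea, a minimal sketch of the nesting the question is after: one small seller-to-buyers graph per property ID, so nothing is shared between properties (file name as in the question):

propertyIDgraph = {}
with open('buyersAndSellers.txt', 'r') as f:
    for row in f:
        propertyid, seller, buyer = row.strip('\n').split('|')
        # one inner seller -> [buyers] dict per property id
        inner = propertyIDgraph.setdefault(propertyid, {})
        inner.setdefault(seller, []).append(buyer)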