I am trying to work with a many-to-many mapping, finding subsets of one set that map to specific subsets of the other set.
I have many genes. Each gene is a member of one or more COGs (and vice versa), e.g.:
gene1 is member of COG1
gene1 is member of COG1003
gene2 is member of COG2
gene3 is member of COG273
gene4 is member of COG1
gene5 is member of COG273
gene5 is member of COG71
gene6 is member of COG1
gene6 is member of COG273
I have a short set of COGs that represents an enzyme, e.g. COG1, COG273.
I want to find all sets of genes that between them have membership of every COG in the enzyme, but without unnecessary overlaps (in this case, for instance, 'gene1 and gene6' would be spurious as gene6 is already a member of both COGs).
In this example, the answers would be:
gene1 and gene3
gene1 and gene5
gene3 and gene4
gene4 and gene5
gene6
Although I could get all members of each COG and create a 'product', this would contain spurious results (as mentioned above) where more genes than necessary are in the set.
My mappings are currently contained in a dictionary where the key is the gene ID and the value is a list of the COG IDs of which that gene is a member. However, I accept that this might not be the best way to store the mapping.
One basic attack:
Keep your representation as it is for now.
Initialize a dictionary with the COGs as keys; each value is an initial count of 0.
Now start building your list of enzyme coverage sets (ecs_list), one ecs at a time. Do this by starting at the front of the gene list and working your way to the end, considering all combinations.
Write a recursive routine to solve the remaining COGs in the enzyme. Something like this:
def pick_a_gene(gene_list, cog_list, solution_set, cog_count_dict):
    pick the first gene in the list that is in at least one cog in the list.
    let the rest of the list be remaining_gene_list.
    add the gene to the solution set.
    for each of the gene's cogs:
        increment the cog's count in cog_count_dict
        remove the cog from cog_list (if it's still there).
    is there anything left in the cog_list?
    yes:
        pick_a_gene(remaining_gene_list, cog_list, solution_set, cog_count_dict)
    no:  # we have a solution: check it for minimality
        from every non-zero entry in cog_count_dict, subtract 1. This gives us a list of excess coverage.
        while the excess list is not empty:
            pick the next gene in the solution set, starting from the *end* (if none, break the loop)
            if the gene's cogs are all covered by the excess:
                remove the gene from the solution set.
                decrement the excess count of each of its cogs.
        The remaining set of genes is an ECS; add it to ecs_list
Does this work for you? I believe that it covers the minimal sets properly, given the well-behaved example you have. Note that starting from the high end when we check minimality guards against a case like this:
gene1: cog1, cog5
gene2: cog2, cog5
gene3: cog3
gene4: cog1, cog2, cog4
enzyme: cog1 - cog5
We can see that we need gene3, gene4, and either gene1 or gene2. If we eliminate from the low end, we'll toss out gene1 and never find that solution. If we start from the high end, we'll eliminate gene2, but find that solution in a later pass of the main loop.
It's possible to construct a case in which there is a 3-way conflict of this ilk. In that case, we'd have to write an extra loop in the minimality check to find them all. However, I gather that your data aren't that nasty.
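Here is a rough, self-contained Python sketch of the routine above (this is my reading of the pseudocode, so the names find_coverage_sets / pick_a_gene and the exact bookkeeping are illustrative rather than a definitive implementation):

from collections import defaultdict

def find_coverage_sets(gene_cogs, enzyme_cogs):
    """Enumerate gene sets covering every COG in enzyme_cogs, pruning redundant genes."""
    genes = [g for g in gene_cogs if set(gene_cogs[g]) & enzyme_cogs]
    ecs_list = []

    def pick_a_gene(start, remaining_cogs, solution, cog_count):
        if not remaining_cogs:
            # minimality check: count excess coverage, then walk the solution
            # from the *end* and drop genes whose COGs are all covered elsewhere
            excess = {c: n - 1 for c, n in cog_count.items() if n > 0}
            kept = list(solution)
            for gene in reversed(solution):
                cogs = set(gene_cogs[gene]) & enzyme_cogs
                if all(excess.get(c, 0) >= 1 for c in cogs):
                    kept.remove(gene)
                    for c in cogs:
                        excess[c] -= 1
            if sorted(kept) not in ecs_list:
                ecs_list.append(sorted(kept))
            return
        for i in range(start, len(genes)):
            gene = genes[i]
            useful_cogs = set(gene_cogs[gene]) & remaining_cogs
            if not useful_cogs:
                continue
            covered = set(gene_cogs[gene]) & enzyme_cogs
            for c in covered:
                cog_count[c] += 1
            pick_a_gene(i + 1, remaining_cogs - useful_cogs, solution + [gene], cog_count)
            for c in covered:
                cog_count[c] -= 1

    pick_a_gene(0, set(enzyme_cogs), [], defaultdict(int))
    return ecs_list

On the example data from the question, find_coverage_sets(gene_cog_dict, {'COG1', 'COG273'}) should yield the five sets listed above.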
def findGenes(seq1, seq2, llist):
    from collections import OrderedDict
    from collections import Counter
    from itertools import product

    od = OrderedDict()
    for b, a in llist:
        od.setdefault(a, []).append(b)
    llv = []
    for k, v in od.items():
        if seq1 == k or seq2 == k:
            llv.append(v)
    # flat list needed for counting gene frequencies
    flatL = [x for sublist in llv for x in sublist]
    cFlatl = Counter(flatL)
    # this will gather genes that, like gene6, belong to both sequences
    l_lonely = []
    for k in cFlatl:
        if cFlatl[k] > 1:
            l_lonely.append(k)
    newL = []
    temp = []
    for sublist in llv:
        for el in sublist:
            if el not in l_lonely:
                newL.append(el)
        temp.append(newL)
        newL = []
    # temp contains only genes that do not belong to both sequences
    # product will connect genes from different sequence groups
    p = product(*temp)
    for el in list(p):
        print(el)
    print(l_lonely)
Usage and output:
lt = [('gene1', 'COG1'), ('gene1', 'COG1003'),('gene2', 'COG2'), ('gene3', 'COG273'), ('gene4', 'COG1'),
('gene5', 'COG273'),('gene5', 'COG71'), ('gene6' ,'COG1'),('gene6', 'COG273')]
findGenes('COG1', 'COG273', lt )
('gene1', 'gene3')
('gene1', 'gene5')
('gene4', 'gene3')
('gene4', 'gene5')
['gene6']
Does this do it for you? Note that since you said you had a short set of COGs, I went ahead and did nested for loops; there may be ways to optimize this...
For future reference, please post any code that you've got along with your question.
import itertools

d = {'gene1':['COG1','COG1003'], 'gene2':['COG2'], 'gene3':['COG273'], 'gene4':['COG1'], 'gene5':['COG273','COG71'], 'gene6':['COG1','COG273']}
COGs = [set(['COG1','COG273'])]  # example list of COGs containing only one enzyme; NOTE: your data should be a list of multiple sets

# create all pair-wise combinations of our data
gene_pairs = [l for l in itertools.combinations(d.keys(), 2)]

found = set()
for pair in gene_pairs:
    join = set(d[pair[0]] + d[pair[1]])  # set of COGs for gene pairs
    for COG in COGs:
        # check if gene already part of enzyme
        if sorted(d[pair[0]]) == sorted(list(COG)):
            found.add(pair[0])
        elif sorted(d[pair[1]]) == sorted(list(COG)):
            found.add(pair[1])
        # check if gene combinations are part of enzyme
        if COG <= join and pair[0] not in found and pair[1] not in found:
            found.add(pair)

for l in found:
    if isinstance(l, tuple):  # a pair of genes
        print(l[0], l[1])
    else:
        print(l)
Thanks for the suggestions, they have inspired me to hack something together using recursion. I want to deal with arbitrary gene-cog relationships, so it needs to be a general solution. This should yield all sets of genes (enzymes) that between them are members of all required COGs, without duplicate enzymes and without redundant genes:
from copy import deepcopy

def get_enzyme_cogs(enzyme, gene_cog_dict):
    """Get all COGs of which there is at least one member gene in the enzyme."""
    cog_list = []
    for gene in enzyme:
        cog_list.extend(gene_cog_dict[gene])
    return set(cog_list)

def get_gene_by_gene_cogs(enzyme, gene_cog_dict):
    """Get COG memberships for each gene in enzyme."""
    cogs_list = []
    for gene in enzyme:
        cogs_list.append(set(gene_cog_dict[gene]))
    return cogs_list

def add_gene(target_enzyme_cogs, gene_cog_dict, cog_gene_dict, proposed_enzyme=None, fulfilled_cogs=None):
    """Generator for all enzymes with membership of all target_enzyme_cogs, without duplicate enzymes or redundant genes."""
    base_enzyme_genes = proposed_enzyme or []
    fulfilled_cogs = get_enzyme_cogs(base_enzyme_genes, gene_cog_dict)

    ## Which COG will we try to find a member of?
    next_cog_to_fill = sorted(list(target_enzyme_cogs - fulfilled_cogs))[0]
    gene_members_of_cog = cog_gene_dict[next_cog_to_fill]

    for gene in gene_members_of_cog:
        ## Check whether any already-present gene's COG set is a subset of the proposed gene's COG set; if so, skip addition
        subset_found = False
        proposed_gene_cogs = set(gene_cog_dict[gene]) & target_enzyme_cogs
        for gene_cogs_set in get_gene_by_gene_cogs(base_enzyme_genes, gene_cog_dict):
            if gene_cogs_set.issubset(proposed_gene_cogs):
                subset_found = True
                break
        if subset_found:
            continue

        ## Add gene to proposed enzyme
        proposed_enzyme = deepcopy(base_enzyme_genes)
        proposed_enzyme.append(gene)

        ## Determine which COG memberships are fulfilled by the genes in the proposed enzyme
        fulfilled_cogs = get_enzyme_cogs(proposed_enzyme, gene_cog_dict)

        if (fulfilled_cogs & target_enzyme_cogs) == target_enzyme_cogs:
            ## Proposed enzyme has members of every required COG, so yield
            enzyme = deepcopy(proposed_enzyme)
            proposed_enzyme.remove(gene)
            yield enzyme
        else:
            ## Proposed enzyme is still missing some COG members
            for enzyme in add_gene(target_enzyme_cogs, gene_cog_dict, cog_gene_dict, proposed_enzyme, fulfilled_cogs):
                yield enzyme
Input:
gene_cog_dict = {'gene1':['COG1','COG1003'], 'gene2':['COG2'], 'gene3':['COG273'], 'gene4':['COG1'], 'gene5':['COG273','COG71'], 'gene6':['COG1','COG273']}
cog_gene_dict = {'COG2': ['gene2'], 'COG1': ['gene1', 'gene4', 'gene6'], 'COG71': ['gene5'], 'COG273': ['gene3', 'gene5', 'gene6'], 'COG1003': ['gene1']}
target_enzyme_cogs = set(['COG1', 'COG273'])
Usage:
for enzyme in add_gene(target_enzyme_cogs, gene_cog_dict, cog_gene_dict):
    print(enzyme)
Output:
['gene1', 'gene3']
['gene1', 'gene5']
['gene4', 'gene3']
['gene4', 'gene5']
['gene6']
I have no idea about its performance though.
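For anyone reusing this: cog_gene_dict above is just the inverse of gene_cog_dict, so it does not need to be maintained by hand. A minimal sketch of how to derive it (invert_gene_cog_dict is an illustrative name, not part of the code above):

from collections import defaultdict

def invert_gene_cog_dict(gene_cog_dict):
    """Build a COG -> genes mapping from a gene -> COGs mapping."""
    cog_gene_dict = defaultdict(list)
    for gene, cogs in gene_cog_dict.items():
        for cog in cogs:
            cog_gene_dict[cog].append(gene)
    return dict(cog_gene_dict)

# e.g. invert_gene_cog_dict(gene_cog_dict)['COG273'] == ['gene3', 'gene5', 'gene6']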
I have generated a list of genes
genes = ['geneName1', 'geneName2', ...]
and a set of their interactions:
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3'),...}
I want to find out how many interactions each gene has and put that in a vector (or dictionary) but I struggle to count them. I tried the usual approach:
interactionList = []
for gene in genes:
    interactions = geneInt.count(gene)
    interactionList.append(interactions)
but of course the code fails because my set contains elements that are made out of two values while I need to iterate over the single values within.
I would argue that you are using the wrong data structure to hold interactions. You can represent interactions as a dictionary keyed by gene name, whose values are a set of all the genes it interacts with.
Let's say you currently have a process that does something like this at some point:
geneInt = set()
...
geneInt.add((gene1, gene2))
Change it to
geneInt = collections.defaultdict(set)
...
geneInt[gene1].add(gene2)
If the interactions are symmetrical, add a line
geneInt[gene2].add(gene1)
Now, to count the number of interactions, you can do something like
intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
Counting your original list is simple if the interactions are one-way as well:
intCounts = dict.fromkeys(genes, 0)
for gene, _ in geneInt:
    intCounts[gene] += 1
If each interaction is two-way, there are three possibilities:
Both interactions are represented in the set: the above loop will work.
Only one interaction of a pair is represented: change the loop to
for gene1, gene2 in geneInt:
    intCounts[gene1] += 1
    if gene1 != gene2:
        intCounts[gene2] += 1
Some reverse interactions are represented, some are not. In this case, transform geneInt into a dictionary of sets as shown in the beginning.
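A minimal sketch of that transformation (geneInt_pairs is an illustrative name for your original set of tuples), assuming the interactions are symmetrical:

import collections

geneInt_pairs = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}

geneInt = collections.defaultdict(set)
for gene1, gene2 in geneInt_pairs:
    geneInt[gene1].add(gene2)
    geneInt[gene2].add(gene1)  # drop this line if interactions are one-way

intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
# -> {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}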
Try something like this,
interactions = {}
for gene in genes:
    interactions_count = 0
    for tup in geneInt:
        interactions_count += tup.count(gene)
    interactions[gene] = interactions_count
Use a dictionary, and keep incrementing the value for every gene you see in each tuple in the set geneInt.
interactions_counter = dict()
for interaction in geneInt:
    for gene in interaction:
        interactions_counter[gene] = interactions_counter.get(gene, 0) + 1
The dict.get(key, default) method returns the value at the given key, or the specified default if the key doesn't exist.
For the set geneInt={('geneName1', 'geneName2'), ('geneName1', 'geneName3')}, we get:
interactions_counter = {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}
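As a compact alternative (not part of the answer above, just the standard library), collections.Counter does the same counting in one expression:

from collections import Counter

interactions_counter = Counter(gene for interaction in geneInt for gene in interaction)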
I need to find all projects and shared projects within a Gitlab group with subgroups. I managed to list the names of all projects like this:
group = gl.groups.get(11111, lazy=True)
# find all projects, also in subgroups
projects = group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")
What I am missing is to list also the shared projects within the group. Or maybe to put this in other words: find out all projects where my group is a member. Is this possible with the Python API?
So, inspired by sytech's answer, I found out that my first attempt was not working because the shared projects were still hidden in the subgroups. So I came up with the following code, which digs through all levels of subgroups to find all shared projects. I assume this can be written more elegantly, but it works for me:
# group definition
main_group_id = 11111

# create empty list that will contain the final result
list_subgroups_id_all = []

# create empty list that acts as temporary storage of the results outside the function
list_subgroups_id_stored = []

# function to create a list of subgroups of a group (id)
def find_subgroups(group_id):
    # retrieve group object
    group = gl.groups.get(group_id)
    # create empty list to store ids of subgroups
    list_subgroups_id = []
    # iterate through group to find ids of all subgroups
    for sub in group.subgroups.list():
        list_subgroups_id.append(sub.id)
    return list_subgroups_id

# function to iterate over the various groups for subgroup detection
def iterate_subgroups(group_id, list_subgroups_id_all):
    # for a given id, find existing subgroups (id) and store them in a list
    list_subgroups_id = find_subgroups(group_id)
    # add the found items to the list storage variable, so that the results are not overwritten
    list_subgroups_id_stored.append(list_subgroups_id)
    # for each found subgroup_id, test if it is already part of the total id list
    # if not, store it and test for more subgroups
    for test_id in list_subgroups_id:
        if test_id not in list_subgroups_id_all:
            # add it to the total subgroup id list (final results list)
            list_subgroups_id_all.append(test_id)
            # check whether test_id contains more subgroups
            list_subgroups_id_tmp = iterate_subgroups(test_id, list_subgroups_id_all)
            # if so, append to the stored subgroup list that is currently checked
            list_subgroups_id_stored.append(list_subgroups_id_tmp)
    return list_subgroups_id_all

# find all subgroups, sub-subgroups, etc. and store their ids in a list
list_subgroups_id_all = iterate_subgroups(main_group_id, list_subgroups_id_all)

print("***ids of all subgroups***")
print(list_subgroups_id_all)
print("")

print("***names of all subgroups***")
list_names = []
for ids in list_subgroups_id_all:
    group = gl.groups.get(ids)
    group_name = group.attributes['name']
    list_names.append(group_name)
print(list_names)
print("")

# print all directly integrated projects of the main group, also those in subgroups
print("***integrated projects***")
group = gl.groups.get(main_group_id)
projects = group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")

# print all shared projects
print("***shared projects***")
for sub in list_subgroups_id_all:
    group = gl.groups.get(sub)
    for shared_prj in group.shared_projects:
        print(shared_prj['path_with_namespace'])
print("")
One question that remains - at the very beginning I retrieve the main group by its id (here: 11111), but can I actually also get this id by looking for the name of the group? Something like: group_id = gl.group.get(attribute={'name','foo'}) (not working)?
You can get the shared projects by the .shared_projects attribute:
group = gl.groups.get(11111)
for proj in group.shared_projects:
    print(proj['path_with_namespace'])
However, you cannot use the lazy=True argument to gl.groups.get.
>>> group = gl.groups.get(11111, lazy=True)
>>> group.shared_projects
AttributeError: shared_projects
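Putting the two parts together, a minimal sketch (assuming gl is an authenticated python-gitlab client, as in the question) that lists both the group's own projects and the projects shared with it:

group = gl.groups.get(11111)  # no lazy=True, so shared_projects is populated

# projects owned by the group, including those in subgroups
for prj in group.projects.list(include_subgroups=True, all=True):
    print(prj.attributes['name'])

# projects shared with the group
for shared in group.shared_projects:
    print(shared['path_with_namespace'])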
Update: This cannot be solved 100%, since the number of merchants each user must receive is different, so some users might end up getting the same merchants as before. However, is it possible to let them get the same merchants only if no other, different merchants are available?
I have the following excel file:
What I would like to do is to redistribute the merchants (Mer_id) so each user (Origin_pool) gets the same number of merchants as before, but a different set of merchants. For example, after the redistribution, Nick will receive 3 Mer_id's but not: 30303, 101020, 220340. Anna will receive 4 merchants but not 23401230,310231, 2030230, 2310505 and so on. Of course, one merchant can not be assigned to more than one person.
What I did so far is to find the total number of merchants each user must receive and randomly give them one mer_id that is not previously assigned to them. After I find a different mer_id I remove it from the list, so the other users won't receive the same merchant:
import pandas as pd
import numpy as np
import random

df = pd.read_excel('dup_check_origin.xlsx')
dfcounts = df.groupby(['Origin_pool']).size().reset_index(name='counts')
Origin_pool = list(dfcounts['Origin_pool'])
counts = list(dfcounts['counts'])
dict_counts = dict(zip(Origin_pool, counts))

dest_name = []
dest_mer = []
for pool in Origin_pool:
    pername = 0
    while pername <= dict_counts[pool]:
        rn = random.randint(0, df.shape[0] - 1)
        rid = df['Mer_id'].iloc[rn]
        if pool != df['Origin_pool'].iloc[rn]:
            pername += 1
            dest_name.append(pool)
            dest_mer.append(rid)
            df = df.drop(df.loc[df['Mer_id'] == rid].index[0])
But it is not efficient at all, given the fact that in the future I might have more data than 18 rows.
Is there any library that does this or a way to make it more efficient?
Several days after your question, but I think this code is bulletproof. You can wrap the entire thing in a function or class if you like. I only created one function, a recursive one, to handle the leftovers.
There are 3 lists, initialized at the beginning of the code:
pairs -> it returns your pool list (final one)
reshuffle -> it returns the pairs pool generated randomly and already appeared at pool pairs in the excel
still -> to handle the repeated pool pairs inside the function pullpush
The pullpush function comes first, because it will be called in different situations.
The first part of the program is a random algorithm that makes pairs from mer_id (merchants) and origin_pool (poolers).
If a pair is not in the excel, it goes to the pairs list; otherwise it goes to the reshuffle list.
Depending on the characteristics of reshuffle, either another random algorithm is called or it is processed by the pullpush function.
If you execute the code once, as it is, and print(pairs), you may find a list with 15, 14 or some other number of pool pairs, fewer than 18.
Then, if you print(reshuffle) you will see the rest of the pairs to make 18.
To get the full 18 matchings in the pairs variable you must run:
pullpush(reshuffle).
The output here was obtained running the code followed by:
pullpush(reshuffle)
If you want to ensure that mer_id and origin_pool do not repeat for 3 rounds, you can load two other excel files and split them into oldpair2 and oldpair3.
[[8348201, 'Anna'], [53256236, 'Anna'], [9295, 'Anna'], [54240, 'Anna'], [30303, 'Marios'], [101020, 'Marios'], [959295, 'Marios'], [2030230, 'George'], [310231, 'George'], [23401230, 'George'], [2341134, 'Nick'], [178345, 'Marios'], [220340, 'Marios'], [737635, 'George'], [[2030230, 'George'], [928958, 'Nick']], [[5560503, 'George'], [34646, 'Nick']]]
The code:
import pandas as pd
import random

df = pd.read_excel('dup_check_origin.xlsx')

oldpair = df.values.tolist()                 # previous pooling pairs from the excel
merchants = df['Mer_id'].values.tolist()     # convert mer_id to a list
poolers = df['Origin_pool'].values.tolist()  # convert origin_pool to a list

random.shuffle(merchants)  # 1st step shuffle

pairs = []      # empty pairs list
reshuffle = []  # try again
still = []      # same as reshuffle, used inside pullpush

def pullpush(repetition):
    replacement = repetition  # reshuffle transfer
    for re in range(len(replacement)):
        replace = next(r for r in pairs if r not in replacement)
        repair = [[replace[0], replacement[re][1]],
                  [replacement[re][0], replace[1]]]
        if repair not in oldpair:
            iReplace = pairs.index(replace)  # get index of pair
            pairs.append(repair)
            del pairs[iReplace]  # remove from pairs
        else:
            still.append(repair)
    if still:
        pullpush(still)  # recursive call

for p in range(len(poolers)):  # avoid more merchants than poolers
    pair = [merchants[p], poolers[p]]
    if pair not in oldpair:
        pairs.append(pair)
    else:
        reshuffle.append(pair)

if reshuffle:
    merchants_bis = [x[0] for x in reshuffle]
    poolers_bis = [x[1] for x in reshuffle]
    if len(reshuffle) > 2:  # shuffle needs 3 or more elements
        random.shuffle(merchants_bis)
        reshuffle = []  # clean before the loop
        for n in range(len(poolers_bis)):
            new_pair = [merchants_bis[n], poolers_bis[n]]
            if new_pair not in oldpair:
                pairs.append(new_pair)
            else:
                reshuffle.append(new_pair)
        if len(reshuffle) == len(poolers_bis):  # infinite loop
            pullpush(reshuffle)
    # double pairs and different poolers
    elif (len(reshuffle) == 2 and not [i for i in reshuffle[0] if i in reshuffle[1]]):
        merchants_bis = [merchants_bis[1], merchants_bis[0]]
        new_pair = [[merchants_bis[1], poolers_bis[0]],
                    [merchants_bis[0], poolers_bis[1]]]
        if new_pair not in oldpair:
            pairs.append(new_pair)
        else:
            reshuffle.append(new_pair)
            pullpush(reshuffle)
    else:  # one left or same poolers
        pullpush(reshuffle)
My solution uses dictionaries and lists; I print the result, but you can create a new dataframe from it.
from random import shuffle
import pandas as pd

df = pd.read_excel('dup_check_origin.xlsx')

dpool = {}
mers = list(df.Mer_id.unique())
shuffle(mers)

for pool in df.Origin_pool.unique():
    dpool[pool] = list(df.Mer_id[df.Origin_pool == pool])

for key in dpool.keys():
    inmers = dpool[key]
    cnt = len(inmers)
    new = [x for x in mers if x not in inmers][:cnt]
    mers = [x for x in mers if x not in new]
    print(key, new)
I would like to do a FVA only for selected reactions, in my case on transport reactions between compartments (e.g. between the cytosol and mitochondrion). I know that I can use selected_reactions in doFVA like this:
import cbmpy as cbm
mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')
cbm.doFVA(mod, selected_reactions=['R_FORtm', 'R_CO2tm'])
Is there a way to get the entire list of transport reactions, not only the two I manually added? I thought about
selecting the reactions based on their ending tm but that fails for 'R_ORNt3m' (and probably other reactions, too).
I want to share this model with others. What is the best way of storing the information in the SBML file?
Currently, I would store the information in the reaction annotation as in
this answer. For example
mod.getReaction('R_FORtm').setAnnotation('FVA', 'yes')
which could be parsed.
There is no built-in function for this kind of task. As you already mentioned, relying on the IDs is generally not a good idea, as those can differ between databases, models and groups (e.g. if someone decided to simply number reactions from r1 to rn and/or metabolites from m1 to mm, filtering based on IDs fails). Instead, one can make use of the compartment field of the species. In CBMPy you can access a species' compartment by doing
import cbmpy as cbm
import pandas as pd
mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')
mod.getSpecies('M_atp_c').getCompartmentId()
# will return 'c'
# run a FBA
cbm.doFBA(mod)
This can be used to find all fluxes between compartments as one can check for each reaction in which compartment their reagents are located. A possible implementation could look as follows:
def get_fluxes_associated_with_compartments(model_object, compartments, return_values=True):
    # check whether provided compartment IDs are valid
    if not isinstance(compartments, (list, set)) or not set(compartments).issubset(model_object.getCompartmentIds()):
        raise ValueError("Please provide valid compartment IDs as a list!")
    else:
        compartments = set(compartments)

    # all reactions in the model
    model_reactions = model_object.getReactionIds()

    # check whether the provided compartments are identical with those of the reagents of a reaction
    return_reaction_ids = [ri for ri in model_reactions if compartments == set(si.getCompartmentId() for si in
                                                                                model_object.getReaction(ri).getSpeciesObj())]

    # return each reaction along with its value
    if return_values:
        return {ri: model_object.getReaction(ri).getValue() for ri in return_reaction_ids}
    # return only a list with reaction IDs
    return return_reaction_ids
So you pass your model object and a list of compartments, and for each reaction the function checks whether the compartments of its reagents exactly match the specified compartments.
In your case you would use it as follows:
# compartment IDs for mitochondria and cytosol
comps = ['c', 'm']
# you only want the reaction IDs; remove the ', return_values=False' part if you also want the corresponding values
trans_cyt_mit = get_fluxes_associated_with_compartments(mod, comps, return_values=False)
The list trans_cyt_mit will then contain all desired reaction IDs (also the two you specified in your question) which you can then pass to the doFVA function.
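For example, the doFVA call from your question then becomes (a sketch, simply reusing the selected_reactions argument shown above):

fva_res, rea_names = cbm.doFVA(mod, selected_reactions=trans_cyt_mit)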
About the second part of your question. I highly recommend to store those reactions in a group rather than using annotation:
# create an empty group
mod.createGroup('group_trans_cyt_mit')
# get the group object so that we can manipulate it
cyt_mit = mod.getGroup('group_trans_cyt_mit')
# we can only add objects to a group so we get the reaction object for each transport reaction
reaction_objects = [mod.getReaction(ri) for ri in trans_cyt_mit]
# add all the reaction objects to the group
cyt_mit.addMember(reaction_objects)
When you now export the model, e.g. by using
cbm.CBWrite.writeSBML3FBCV2(mod, 'iMM904_with_groups.xml')
this group will be stored as well in SBML. If a colleague reads the SBML again, he/she can then easily run a FVA for the same reactions by accessing the group members which is far easier than parsing the annotation:
# do an FVA; fva_res: Reaction, Reduced Costs, Variability Min, Variability Max, abs(Max-Min), MinStatus, MaxStatus
fva_res, rea_names = cbm.doFVA(mod, selected_reactions=mod.getGroup('group_trans_cyt_mit').getMemberIDs())
fva_dict = dict(zip(rea_names, fva_res.tolist()))
# store results in a dataframe which makes the selection of reactions easier
fva_df = pd.DataFrame.from_dict(fva_dict, orient='index')
fva_df = fva_df.rename({0: "flux_value", 1: "reduced_cost_unscaled", 2: "variability_min", 3: "variability_max",
4: "abs_diff_var", 5: "min_status", 6: "max_status"}, axis='columns')
Now you can easily query the dataframe and find the flexible and not flexible reactions within your group:
# filter the reactions with flexibility
fva_flex = fva_df.query("abs_diff_var > 10 ** (-4)")
# filter the reactions that are not flexible
fva_not_flex = fva_df.query("abs_diff_var <= 10 ** (-4)")
I have a ticker that grabs current information of multiple elements and adds it to a list in the format: trade_list.append([[trade_id, results]]).
Say we're tracking trade_id's 4555, 5555, 23232, the trade_list will keep ticking away adding their results to the list, I then want to find the averages of their results individually.
The code works as such:
Find accounts
for a in accounts:
    find open trades of accounts
    for t in range(len(trades)):
        do some math
        trades_list.append([trade_id, result])
        avernum = 0
        average = []
        for r in range(len(trades_list)):
            average.append(trades_list[r][1])  # This is the value attached to the trade_id
            avernum += 1
        results = float(sum(average) / avernum)
        results_list.append([[trade_id, results]])
This fills out really quickly. This is after two ticks:
print(results_list)
[[[53471, 28.36432]], [[53477, 31.67835]], [[53474, 32.27664]], [[52232, 1908.30604]], [[52241, 350.4758]], [[53471, 28.36432]], [[53477, 31.67835]], [[53474, 32.27664]], [[52232, 1908.30604]], [[52241, 350.4758]]]
These averages will move and change very quickly. I want to use results_list to track and watch them, then compare previous averages to current ones
Thinking:
for r in range(len(results_list)):
    if results_list[r][0] == trade_id:
        restick.append(results_list[r][1])
        resnum = len(restick)
        if restick[resnum] > restick[resnum-1]:
            do fancy things
Here is some short code that does what I think you have described, although I might have misunderstood. It basically does exactly what you say: it selects everything that has a certain trade_id and returns the average:
TID_INDEX = 0
DATA_INDEX = 1

def id_average(t_id, arr):
    filt_arr = [i[DATA_INDEX] for i in arr if i[TID_INDEX] == t_id]
    return sum(filt_arr) / len(filt_arr)
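For example, with the results_list from the question (its entries are wrapped in an extra list, so they are unwrapped first; flat_results is just an illustrative name):

results_list = [[[53471, 28.36432]], [[53477, 31.67835]], [[53474, 32.27664]],
                [[52232, 1908.30604]], [[52241, 350.4758]], [[53471, 28.36432]],
                [[53477, 31.67835]], [[53474, 32.27664]], [[52232, 1908.30604]],
                [[52241, 350.4758]]]

flat_results = [entry[0] for entry in results_list]   # -> [[53471, 28.36432], ...]
print(id_average(53471, flat_results))                # average of all results for trade_id 53471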