So, I have a huge input file that looks like this: (you can download here)
1. FLO8;PRI2
2. FLO8;EHD3
3. GRI2;BET2
4. HAL4;AAD3
5. PRI2;EHD3
6. QLN3;FZF1
7. QLN3;ABR5
8. FZF1;ABR5
...
Think of it as a two-column table in which the element before ";" points to the element after ";".
I want to print simple strings iteratively that show the three elements that constitute a feedforward loop.
The example numbered list from above would output:
"FLO8 PRI2 EHD3"
"QLN3 FZF1 ABR5"
...
Explaining the first output line as a feedforward loop:
A -> B (FLO8;PRI2)
B -> C (PRI2;EHD3)
A -> C (FLO8;EHD3)
Only the circled one from this link
So, I have this, but it is terribly slow...Any suggestions to make a faster implementation?
import csv
TF = []
TAR = []
# READING THE FILE
with open("MYFILE.tsv") as tsv:
    for line in csv.reader(tsv, delimiter=";"):
        TF.append(line[0])
        TAR.append(line[1])
# I WANT A BETTER WAY TO RUN THIS.. All these for loops are killing me
for i in range(len(TAR)):
    for j in range(len(TAR)):
        if (TAR[j] != TF[j] and TAR[i] != TF[i] and TAR[i] != TAR[j] and TF[j] == TF[i]):
            for k in range(len(TAR)):
                if (not (k == i or k == j) and TF[k] == TAR[j] and TAR[k] == TAR[i]):
                    print("FFL: " + TF[i] + " " + TAR[j] + " " + TAR[i])
NOTE: I don't want self-loops...from A -> A, B -> B or C -> C
I use a dict of sets to allow very fast lookups, like so:
Edit: prevented self-loops:
from collections import defaultdict
INPUT = "RegulationTwoColumnTable_Documented_2013927.tsv"
# load the data as { "ABF1": set(["ABF1", "ACS1", "ADE5,7", ... ]) }
data = defaultdict(set)
with open(INPUT) as inf:
    for line in inf:
        a,b = line.rstrip().split(";")
        if a != b: # no self-loops
            data[a].add(b)
# find all triplets such that A -> B -> C and A -> C
found = []
for a,bs in list(data.items()):  # list(): data[b] may add keys to the defaultdict
    bint = bs.intersection
    for b in bs:
        for c in bint(data[b]):
            found.append("{} {} {}".format(a, b, c))
On my machine, this loads the data in 0.36s and finds 1,933,493 solutions in 2.90s; results look like
['ABF1 ADR1 AAC1',
'ABF1 ADR1 ACC1',
'ABF1 ADR1 ACH1',
'ABF1 ADR1 ACO1',
'ABF1 ADR1 ACS1',
Edit2: not sure this is what you want, but if you need A -> B and A -> C and B -> C but not B -> A or C -> A or C -> B, you could try
found = []
for a,bs in list(data.items()):  # list(): data[b] may add keys to the defaultdict
    bint = bs.intersection
    for b in bs:
        if a not in data[b]:
            for c in bint(data[b]):
                if a not in data[c] and b not in data[c]:
                    found.append("{} {} {}".format(a, b, c))
but this still returns 1,380,846 solutions.
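As a quick sanity check, the dict-of-sets idea can be run on the eight sample edges from the question. This is only a sketch: the `edges` list is typed in by hand instead of being read from the file, and `data.get(b, set())` is used so that looking up a node without outgoing edges does not add keys to the dict while it is being iterated.

```python
from collections import defaultdict

# The eight sample edges from the question, written as "source;target".
edges = [
    "FLO8;PRI2", "FLO8;EHD3", "GRI2;BET2", "HAL4;AAD3",
    "PRI2;EHD3", "QLN3;FZF1", "QLN3;ABR5", "FZF1;ABR5",
]

# Map every source node to the set of its targets, skipping self-loops.
data = defaultdict(set)
for line in edges:
    a, b = line.split(";")
    if a != b:
        data[a].add(b)

# A feedforward loop is A -> B, B -> C and A -> C, so C must be a target
# of both A and B: intersect the two target sets.
found = []
for a, bs in data.items():
    for b in bs:
        for c in bs & data.get(b, set()):
            found.append("{} {} {}".format(a, b, c))

found.sort()
print(found)  # ['FLO8 PRI2 EHD3', 'QLN3 FZF1 ABR5']
```

The two lines printed match the expected output listed in the question.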
Test set
targets = {'A':['B','C','D'],'B':['C','D'],'C':['A','D']}
And the function
for i in targets.keys():
    try:
        for y in targets.get(i):
            #compares the dict values of two keys and saves the overlapping ones to diff
            diff = list(set(targets.get(i)) & set(targets.get(y)))
            #if there is at least one element overlapping from key.values i and y
            #take up those elements and style them with some arrows
            if (len(diff) > 0 and not i == y):
                feed = i +'->'+ y + '-->'
                forward = '+'.join(diff)
                feedForward = feed + forward
                print (feedForward)
    except:
        pass
The output is
A->B-->C+D
A->C-->D
C->A-->D
B->C-->D
Greetings to the Radboud Computational Biology course, Robin (q1/2016).
I need to change the value of two random variables out of four to '—'. How do I do it with maximum effectiveness and readability?
Code below is crap just for reference.
from random import choice
a = 10
b = 18
c = 15
d = 92
choice(a, b, c, d) = '—'
choice(a, b, c, d) = '—'
print(a, b, c, d)
>>> 12 — — 92
>>> — 19 — 92
>>> 10 18 — —
I've tried choice(a, b, c, d) = '—' but ofc it didn't work. There's probably a solution using list functions and methods but it's complicated and almost impossible to read, so I'm searching for an easier solution.
Taking into consideration that you need to create 4 separate variables. Here's what you can do:
from random import sample
a = 10
b = 18
c = 15
d = 92
for i in sample(['a','b','c','d'], k=2):
    exec(f"{i} = '-'")
print(a,b,c,d)
Using sample ensures non-repeating values.
However, this approach is not recommended but is just provided to help you understand the problem better. I recommend using a list or dictionary as stated by other fellow developers.
Variable names are not available when you run your code, so you cannot change a "random variable". Instead, I recommend that you use a list or a dictionary. Then you can choose a random element from the list or a random key from the dictionary.
Given the constraint of four named variables, I might do:
from random import sample
a = 10
b = 18
c = 15
d = 92
v = sample("abcd", 2)
if "a" in v:
    a = "_"
if "b" in v:
    b = "_"
if "c" in v:
    c = "_"
if "d" in v:
    d = "_"
print(a, b, c, d)
This is readable, but it's also extremely repetitive. It's much easier to do it if there aren't four individual variables:
from random import sample
nums = {
    "a": 10,
    "b": 18,
    "c": 15,
    "d": 92,
}
for v in sample(list(nums), 2):  # sample() needs a sequence, so pass the list of keys
    nums[v] = "_"
print(*nums.values())
import random
data = [10, 18, 15, 92]
already_changed = []
for i in range(2): # change 2 to however many numbers you want to change
    index = random.randint(0, len(data) - 1) # randomly select an index to change
    while index in already_changed: # makes sure not to change the same index
        index = random.randint(0, len(data) - 1)
    already_changed.append(index) # remember the index so it is not picked again
    data[index] = "_"
print(data)
You cannot do choice(a, b, c, d) = '-' because the result of a function call is not a valid assignment target.
You should store the variables in a list then replace random elements with your string. That could be done like so:
from random import randint
a = 10
b = 18
c = 15
d = 92
replacement = "--"
nums = [a, b, c, d]
while nums.count(replacement) < 2:
    # create random index
    index = randint(0, len(nums)-1)
    # replace value at index with the replacement
    nums[index] = replacement
print(*nums) # print each element in the list
However, be aware that this code doesn't change the values of a, b, c, d. If you want that, you need to reassign the variables.
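If the four names themselves must end up changed, one option is to unpack the list back into them after the replacement. The following is a minimal sketch of that idea, using `random.sample` on the index range so the two positions are guaranteed to differ; the `"--"` placeholder is taken from the answer above.

```python
from random import sample

a, b, c, d = 10, 18, 15, 92
replacement = "--"

# Work on a list copy of the values; the names a..d are untouched so far.
nums = [a, b, c, d]

# Pick two distinct positions and overwrite them.
for index in sample(range(len(nums)), 2):
    nums[index] = replacement

# Unpack the modified list back into the four names.
a, b, c, d = nums
print(a, b, c, d)
```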
You can achieve this by picking any two indices from a list of items at random, and replacing the element at that index position with _:
import random
x = [10,18,15,92]
for i in random.sample(range(len(x)), 2):
    x[i] = '_'
print(x)
I'm simplifying an engineering problem as much as possible for this question:
I have a working code like this:
import numpy as np
# FUNCTION DEFINITION
def Calculations(a, b): # a function defined to work based on 2 arguments, a and b
    A = a * b - a
    B = a * b - b
    d = A - B
    return(A, B, d, a, b)
# STORE LIST CREATION
A_list = []
B_list = []
d_list = []
a_list = []
b_list = [] # I will need this list later
# 1st sequential iteration in a for loop
length = np.arange(60, 62.5, 0.5)
for l in length:
    lower = 50 # this is what I want the program to update based on d
    upper = 70.5 # this is what I want the program to update based on d
    step = 0.5
    width = np.arange(lower, upper, step)
    # 2nd for loop, but here I wouldn't like a sequential iteration
    for w in width:
        A_list.append(Calculations(l, w)[0])
        B_list.append(Calculations(l, w)[1])
        d_list.append(Calculations(l, w)[2])
        a_list.append(Calculations(l, w)[3])
        b_list.append(Calculations(l, w)[4])
print(A_list, " \n")
print(B_list, " \n")
print(d_list, " \n")
print(a_list, " \n")
print(b_list, " \n")
This is the way I have it now, but not how I want it to work.
Here, the program iterates through the values of length (l) sequentially: it evaluates everything for l=60, then for l=60.5, and so on. That part is fine, but for l=60 it also evaluates w=50 first, then w=50.5, and so on.
What I want is that, for l=60, it evaluates a random value (call it n) between 50 (lower) and 70.5 (upper) on a grid of 0.5 (step) and obtains a particular d as one of the returned results. If d is negative, the n it used becomes the new upper; if d is positive, that n becomes the new lower; and it keeps doing this until d is zero.
I will keep trying to figure it out by myself, but any help would be appreciated.
PS:
As I said this example is a simplification of my real problem, as side questions I would like to ask:
The real condition of while loop to break is not when d is zero, but the closest possible to zero, or phrased in other way, the min() of the abs() values composing the d_list. I tried something like:
for value in d_list:
    if value = min(abs(d_list)):
        print(A_list, " \n")
        print(B_list, " \n")
        print(d_list, " \n")
        print(a_list, " \n")
        print(b_list, " \n")
but that's not correct.
I don't want to use a conditions such as if d < 0.2 because sometimes I will get d's like 0.6 and that may be ok, neither do I want a condition like if d < 1 because then if for example d = 0.005 I would get a lot of d's before that, satisfying the condition of being < 1, but I only want one for each l.
I also need to find the associated values in the returned lists, for that specific d
EDIT
I made a mistake earlier in the conditions for new upper and lower based on the obtained value of d, I fixed that.
Also, I tried solving the problem like this:
length = np.arange(60, 62.5, 0.5)
for l in length:
    lower_w = 59.5 # this is what I want the program to update based on d
    upper_w = 63 # this is what I want the program to update based on d
    step = 0.5
    width = np.arange(lower_w, upper_w, step)
    np.random.shuffle(width)
    for w in width:
        while lower_w < w < upper_w:
            A_list.append(Calculations(l,w)[0])
            B_list.append(Calculations(l,w)[1])
            d_list.append(Calculations(l,w)[2])
            a_list.append(Calculations(l,w)[3])
            b_list.append(Calculations(l,w)[4])
            for element in d_list:
                if element < 0:
                    upper = w
                else:
                    lower = w
                if abs(element) < 1 :
                    break
But the while loop does not get to break...
Use np.random.shuffle to pick the elements of width in a random order:
width = np.arange(lower, upper, step)
np.random.shuffle(width)
But here you don't really want the second loop; you just want to pick one element from it at random, so use np.random.choice(width):
length = np.arange(60, 62.5, 0.5)
lower = 50 # this is what I want the program to update based on d
upper = 70.5 # this is what I want the program to update based on d
step = 0.5
for l in length:
    width = np.arange(lower, upper, step)
    if len(width) == 0:
        width = [lower]
    w = np.random.choice(width)
    (A, B, d, a, b) = Calculations(l, w)
    A_list.append(A)
    B_list.append(B)
    d_list.append(d)
    a_list.append(l)
    b_list.append(w)
    if d < 0:
        lower = w
    elif d:
        upper = w
No need to pass a and b in the return of the Calculations function, you can just append the original parameters to a_list and b_list.
Note that you will run into an error if your lower and upper bound are identical, because the list will just be empty, so you need to fill in the list with a bound if it returns [].
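A deterministic alternative to the random pick is plain bisection over the width grid, keeping track of the value with the smallest |d| seen so far. The sketch below relies on an assumption that holds for the simplified Calculations from the question (where d = w - l) but may not hold for the real problem: d grows monotonically with w. The names `calculations` and `bisect_width` are hypothetical, and the grid is built with plain Python so the example has no dependencies.

```python
def calculations(a, b):
    # Same arithmetic as the simplified function in the question.
    A = a * b - a
    B = a * b - b
    return A, B, A - B


def bisect_width(l, lower=50.0, upper=70.5, step=0.5):
    """Return (w, d) with the smallest |d| on the grid.

    Assumes d increases monotonically with w.
    """
    n = int(round((upper - lower) / step))
    width = [lower + i * step for i in range(n)]  # same grid as np.arange
    lo, hi = 0, len(width) - 1
    best_w, best_d = None, None
    while lo <= hi:
        mid = (lo + hi) // 2
        w = width[mid]
        d = calculations(l, w)[2]
        # Remember the candidate closest to zero so far.
        if best_d is None or abs(d) < abs(best_d):
            best_w, best_d = w, d
        if d < 0:
            lo = mid + 1   # w is too small, search the upper half
        elif d > 0:
            hi = mid - 1   # w is too large, search the lower half
        else:
            break          # exact zero found
    return best_w, best_d


for l in [60.0, 60.5, 61.0, 61.5, 62.0]:
    print(l, bisect_width(l))
```

This returns exactly one (w, d) pair per l, which also covers the side question about wanting the d closest to zero without choosing a fixed threshold.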
I have a Dataframe of common information among users composed by:
user class
A X
B Y
C Z
D Y
E Y
F X
and another Dataframe of their activity composed by:
fing fer
A B
A E
B D
B C
B F
C A
D E
E B
F D
The question is:
How many users that have a certain class are fer of other users that have another class?
For example, if the question is: how many users of class X are fer of users of class Y, the result should be: 3 because there are just A, F that have a class X and their relations are:
fing fer
A B
A E
F D
I have tried for now the following:
fing_table = pd_ci.merge(pd_f, how="right", left_on="user", right_on="fing")
fing_table.dropna(inplace=True)
fer_table = pd_ci.merge(pd_f, how="right", left_on="user", right_on="fer")
fer_table.dropna(inplace=True)
fs = fing_table.merge(fer_table, how="right", left_on="fing", right_on="fer").drop_duplicates(keep="first")
res = fs[fs["class"] == category_to and fs["class"] == category_from]
return res["user_x"].count()
But it crashes: each comparison yields a Series, and combining two Series with and raises an error asking for a.any(), a.all(), etc. in the condition used to build res.
Avoiding the usage of an explicit for, how may I solve this problem?
Thanks!
I am posting the solution: I solved the problem using the piece of code proposed by @anky, that is:
def fs_from_class_to_class(
    pd_ci: pd.DataFrame,
    pd_f: pd.DataFrame,
    class_from: str,
    class_to: str
) -> int:
    pd_f = pd_f.assign(fing_class=pd_f["fing"].map(pd_ci.set_index("user")["class"]))\
        .assign(fer_class=pd_f["fer"].map(pd_ci.set_index("user")["class"]))
    counter = pd_f.loc[(pd_f["fer_class"] == class_from) & (pd_f["fing_class"] == class_to)]
    counter = counter["fing"].count()
    return counter
Thank you for the answer!
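For reference, here is a small self-contained sketch of the same map-and-mask idea on the sample tables from the question. Which column plays which role is an assumption taken from the worked example in the question (fing has class X, fer has class Y, expected count 3); swap the two comparisons if the relation is meant the other way round. The element-wise `&` replaces the `and` that caused the crash.

```python
import pandas as pd

# Sample tables from the question.
pd_ci = pd.DataFrame({"user": list("ABCDEF"), "class": list("XYZYYX")})
pd_f = pd.DataFrame({"fing": list("AABBBCDEF"), "fer": list("BEDCFAEBD")})

# Lookup Series: user -> class.
user_class = pd_ci.set_index("user")["class"]

# Class of each side of every relation, aligned row by row with pd_f.
fing_class = pd_f["fing"].map(user_class)
fer_class = pd_f["fer"].map(user_class)

# Element-wise AND on boolean Series needs "&" (with parentheses), not "and".
mask = (fing_class == "X") & (fer_class == "Y")
count = int(mask.sum())

print(pd_f[mask])
print(count)  # 3
```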
I need to get all descendant points of the links represented by side_a - side_b (in one dataframe), following them for each side_a until its end_point (in another dataframe) is reached. So:
df1:
side_a side_b
a b
b c
c d
k l
l m
l n
p q
q r
r s
df2:
side_a end_point
a c
b c
c c
k m
k n
l m
l n
p s
q s
r s
The point is to get all points for each side_a value until the end_point from df2 for that value is reached.
If a side_a has two end_point values (as "k" does), there should be two lists.
I have some code but it's not written with this approach, it drops all rows from df1 if df1['side_a'] == df2['end_points'] and that causes certain problems. But if someone wants me to post the code I will, of course.
The desired output would be something like this:
side_a end_point
a [b, c]
b [c]
c [c]
k [l, m]
k [l, n]
l [m]
l [n]
p [q, r, s]
q [r, s]
r [s]
And one more thing, if there is the same both side, that point doesn't need to be listed at all, I can append it later, whatever it's easier.
import pandas as pd
import numpy as np
import itertools
def get_child_list(df, parent_id):
    list_of_children = []
    list_of_children.append(df[df['side_a'] == parent_id]['side_b'].values)
    for c_, r_ in df[df['side_a'] == parent_id].iterrows():
        if r_['side_b'] != parent_id:
            list_of_children.append(get_child_list(df, r_['side_b']))
    # to flatten the list
    list_of_children = [item for sublist in list_of_children for item in sublist]
    return list_of_children
new_df = pd.DataFrame(columns=['side_a', 'list_of_children'])
for index, row in df1.iterrows():
    temp_df = pd.DataFrame(columns=['side_a', 'list_of_children'])
    temp_df['list_of_children'] = pd.Series(get_child_list(df1, row['side_a']))
    temp_df['side_a'] = row['side_a']
    new_df = new_df.append(temp_df)
So, the problem with this code is that it only works if I drop the rows where side_a equals the end_point from df2. I don't know how to implement a condition that stops the traversal, without going any further, once the end_point from df2 shows up in the side_b column.
Any help or hint is welcomed here, truly.
Thanks in advance.
You can use the networkx library and graphs:
import networkx as nx
G = nx.from_pandas_edgelist(df1, source='side_a',target='side_b')
df2.apply(lambda x: [nx.shortest_path(G, x.side_a,x.end_point)[0],
                     nx.shortest_path(G, x.side_a,x.end_point)[1:]], axis=1)
Output:
side_a end_point
0 a [b, c]
1 b [c]
2 c []
3 k [l, m]
4 k [l, n]
5 l [m]
6 l [n]
7 p [q, r, s]
8 q [r, s]
9 r [s]
Your rules are inconsistent and your definitions are unclear so you may need to add some constraints here and there because it is unclear exactly what you are asking. By organizing the data-structure to fit the problem and building a more robust function for traversal (shown below) it will be easier to add/edit constraints as needed - and solve the problem completely.
Transform the df to a dict to better represent a tree structure
This problem is a lot simpler if you transform the data structure to be more intuitive to the problem, instead of trying to solve the problem in the context of the current structure.
## Example dataframe
df = pd.DataFrame({'side_a':['a','b','c','k','l','l','p','q','r'],'side_b':['b','c','d','l','m','n','q','r','s']})
## Instantiate blank tree with every item
all_items = set(list(df['side_a']) + list(df['side_b']))
tree = {ii : set() for ii in all_items}
## Populate the tree with each row
for idx, row in df.iterrows():
    tree[row['side_a']].add(row['side_b'])  # also works for multi-character names
Traverse the Tree
This is much more straightforward now that the data structure is intuitive. Any standard Depth-First-Search algorithm w/ path saving will do the trick. I modified the one in the link to work with this example.
Edit: Reading again, it looks like you have a search-termination condition in end_point (you need to be clearer in your question about what is input and what is output). You can adjust the function to dfs_paths(tree, target, root) and change the termination condition to return only the correct paths.
## Standard DFS pathfinder
def dfs_paths(tree, root):
    stack = [(root, [root])]
    while stack:
        (node, path) = stack.pop()
        for nextNode in tree[node] - set(path):
            # Termination condition.
            ### I set it to terminate search at the end of each path.
            ### You can edit the termination condition to fit the
            ### constraints of your goal
            if not tree[nextNode]:
                # set(path + [nextNode]) also works for multi-character names
                yield set(path + [nextNode]) - {root}
            else:
                stack.append((nextNode, path + [nextNode]))
Build a dataframe from the generators we yielded
If you're not super comfortable with generators, you can structure the DFS traversal so that it outputs a list instead of a generator.
set_a = []
end_points = []
gen_dict = [{ii:dfs_paths(tree,ii)} for ii in all_items]
for gen in gen_dict:
    for row in list(gen.values()).pop():
        set_a.append(list(gen.keys()).pop())
        end_points.append(row)
## To dataframe
df_2 = pd.DataFrame({'set_a':set_a,'end_points':end_points}).sort_values('set_a')
Output
df_2[['set_a','end_points']]
set_a end_points
a {b, c, d}
b {c, d}
c {d}
k {n, l}
k {m, l}
l {n}
l {m}
p {s, r, q}
q {s, r}
r {s}
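The termination condition mentioned in the edit above can be sketched as a target-aware DFS that stops as soon as the requested end point is reached and returns the path as an ordered list. This is a dependency-free illustration: `edges` and `queries` are hand-typed copies of df1 and df2 from the question, and `children` and `path_to` are hypothetical names.

```python
# Rows of df1 (side_a, side_b) and df2 (side_a, end_point) from the question.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("k", "l"), ("l", "m"),
         ("l", "n"), ("p", "q"), ("q", "r"), ("r", "s")]
queries = [("a", "c"), ("b", "c"), ("c", "c"), ("k", "m"), ("k", "n"),
           ("l", "m"), ("l", "n"), ("p", "s"), ("q", "s"), ("r", "s")]

# Adjacency dict: node -> list of direct children.
children = {}
for src, dst in edges:
    children.setdefault(src, []).append(dst)


def path_to(start, target):
    """Return the nodes after `start` on a path to `target`, or None."""
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == target:
            return path  # stop here, do not descend past the end point
        for nxt in children.get(node, []):
            if nxt != start and nxt not in path:  # guard against cycles
                stack.append((nxt, path + [nxt]))
    return None


result = [(start, path_to(start, target)) for start, target in queries]
for start, path in result:
    print(start, path)
```

A start that equals its own end point (the "c c" row) yields an empty list, which matches the note in the question that such a point does not need to be listed.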
If you're OK with an extra import, this can be posed as a path problem on a graph and solved in a handful of lines using NetworkX:
import networkx
g = networkx.DiGraph(zip(df1.side_a, df1.side_b))
outdf = df2.apply(lambda row: [row.side_a,
                               set().union(*networkx.all_simple_paths(g, row.side_a, row.end_point)) - {row.side_a}],
                  axis=1)
outdf looks like this. Note that this contains sets instead of lists as in your desired output - this allows all the paths to be combined in a simple way.
side_a end_point
0 a {c, b}
1 b {c}
2 c {}
3 k {l, m}
4 k {l, n}
5 l {m}
6 l {n}
7 p {r, q, s}
8 q {r, s}
9 r {s}
Edit: I removed my explanation part since it was wrong but I still could not be able to convert it.
I was studying lists and dictionaries in Python and I came across this code.
x = min(minValue, key=lambda b: min([a( \
myFunction(5,b),c) for c in something]))
What is the logical equivalent of this? It seems simple, but I do not get the same result when I try to write it differently. How can I write this without the key and lambda parts?
Seems like my explanation was wrong. Here is the updated code I try.
for b in minValue:
    for c in something:
        minimum=min(myFunction(5,b),c)
result=min(minimum)
return result
Note: By logical equivalent I do not mean the provided code should calculate this exactly like the code I gave but it should have the same output.
I'm not sure, but it could be something like this:
def example(minValue):
    data = []
    for b in minValue:
        values = []
        for c in something:
            values.append( a(myFunction(5,b),c) )
        result = min(values)
        # keep `result` and `b` which gives this `result`
        data.append( [result, b] )
    # find minimal `result` and `b` which gives this `result`
    x = min(data) # x = [result, b]
    # return `b`
    return x[1]
#---
x = example(minValue)
EDIT: there can be a problem with min(data), because min will compare both result and b, whereas the original version compares only result. It may need a version without min() that instead uses:
if result < min_result:
    min_result = result
    min_b = b
EDIT:
def key(val):
    min_c = something[0]
    min_result = a(myFunction(5,val),min_c)
    for c in something[1:]:
        result = a(myFunction(5,val),c)
        if result < min_result:
            min_result = result
            #min_c = c
    return min_result
def example(minValue):
    min_b = minValue[0]
    min_result = key(min_b)
    for b in minValue[1:]:
        result = key(b)
        if result < min_result:
            min_result = result
            min_b = b
    return min_b
#---
x = example(minValue)
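To check the equivalence on concrete data, here is a small sketch with stand-in definitions. The bodies of `a` and `myFunction` and the contents of `something` and `minValue` are hypothetical placeholders, since the question does not show them. The point it illustrates is that `min(..., key=...)` returns the element `b` with the smallest key, not the key value itself, and that ties go to the first such element.

```python
def myFunction(n, b):
    # Hypothetical stand-in for the function in the question.
    return n * b


def a(x, c):
    # Hypothetical stand-in: distance between x and c.
    return abs(x - c)


something = [7, 12, 30]   # hypothetical data
minValue = [1, 2, 3, 4]   # hypothetical data

# Original form: key computes, for each b, the smallest a(...) over all c.
x_key = min(minValue,
            key=lambda b: min([a(myFunction(5, b), c) for c in something]))


def explicit_min(candidates):
    """Loop version: return the b whose smallest a(...) is lowest."""
    best_b = None
    best_score = None
    for b in candidates:
        score = None
        for c in something:
            value = a(myFunction(5, b), c)
            if score is None or value < score:
                score = value
        # Strict "<" keeps the first b on ties, like min() does.
        if best_score is None or score < best_score:
            best_score, best_b = score, b
    return best_b


x_loop = explicit_min(minValue)
print(x_key, x_loop)  # 1 1
```

With these placeholders both b=1 and b=2 have a key of 2, and both versions return 1, the first of the tied candidates.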