How to make an if statement produce only one output in Python? - python

I am trying to use if statements to classify items into 3 categories in Python. My code is as follows.
WBS4_ELEMENT_list_0 = ['F.1122023.117.2.001', 'F.1122012.024.2.001', 'F.1622016.AET.2.001', 'F.1622015.137.2.001', 'F.1622015.034.2.001', 'F.1622032.100.2.001', 'F.1622016.040.2.001', 'F.1622016.017.1.002', 'F.1622015.084.2.001', 'F.1622015.548.1.001', 'F.1622015.918.1.001', 'F.1122012.606.2.001', 'F.1622015.311.1.007','F.1622016.091.1.013']
print(len(WBS4_ELEMENT_list_0))
WBS4_ELEMENT_list =[]
for i in WBS4_ELEMENT_list_0:
    ii=str(i)
    WBS4_ELEMENT_list.append(ii)
Child_or_Parent_based_on_WBS4_element_list = []
for h in WBS4_ELEMENT_list:
    pos = WBS4_ELEMENT_list.index(h)
    if WBS4_ELEMENT_list[pos][13:19]==".1.001":
        Child_or_Parent_based_on_WBS4_element_list.append(WBS4_ELEMENT_list[pos]+"_Parent")
    if WBS4_ELEMENT_list[pos][13:19]==".2.001":
        Child_or_Parent_based_on_WBS4_element_list.append(WBS4_ELEMENT_list[pos]+"_Facility")
    if WBS4_ELEMENT_list[pos][13:19]!=".1.001" or WBS4_ELEMENT_list[pos][13:19]!=".2.001":
        Child_or_Parent_based_on_WBS4_element_list.append(WBS4_ELEMENT_list[pos]+"_Child")
print(len(Child_or_Parent_based_on_WBS4_element_list))
print(Child_or_Parent_based_on_WBS4_element_list)
However, this produces 25 outputs, which is more than the 14 items in WBS4_ELEMENT_list_0. How can I make the if statements produce only one output per item?

You can do it in a cleaner and faster way by using list comprehensions and a dict:
WBS4_ELEMENT_list_0 = ['F.1122023.117.2.001', 'F.1122012.024.2.001', 'F.1622016.AET.2.001', 'F.1622015.137.2.001', 'F.1622015.034.2.001', 'F.1622032.100.2.001', 'F.1622016.040.2.001', 'F.1622016.017.1.002', 'F.1622015.084.2.001', 'F.1622015.548.1.001', 'F.1622015.918.1.001', 'F.1122012.606.2.001', 'F.1622015.311.1.007','F.1622016.091.1.013']
d = {'.1.001': '_Parent', '.2.001': '_Facility'}
Child_or_Parent_based_on_WBS4_element_list = [s + d.get(s[-6:], '_Child') for s in WBS4_ELEMENT_list_0]
Output:
['F.1122023.117.2.001_Facility', 'F.1122012.024.2.001_Facility', 'F.1622016.AET.2.001_Facility', 'F.1622015.137.2.001_Facility', 'F.1622015.034.2.001_Facility', 'F.1622032.100.2.001_Facility', 'F.1622016.040.2.001_Facility', 'F.1622016.017.1.002_Child', 'F.1622015.084.2.001_Facility', 'F.1622015.548.1.001_Parent', 'F.1622015.918.1.001_Parent', 'F.1122012.606.2.001_Facility', 'F.1622015.311.1.007_Child', 'F.1622016.091.1.013_Child']
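For reference, the loop in the question produces 25 entries because its three if statements are tested independently, and the third condition (suffix != ".1.001" or suffix != ".2.001") is true for every string, so "_Child" is appended for all 14 items. A minimal sketch of the same classification with an if/elif/else chain, where exactly one branch runs per item; the function name classify and the shortened sample list are only for illustration:

```python
def classify(wbs):
    # Look at the last six characters, e.g. ".2.001".
    suffix = wbs[-6:]
    if suffix == ".1.001":
        return wbs + "_Parent"
    elif suffix == ".2.001":
        return wbs + "_Facility"
    else:
        # Reached only when neither of the tests above matched.
        return wbs + "_Child"

sample = ['F.1122023.117.2.001', 'F.1622015.548.1.001', 'F.1622016.017.1.002']
labelled = [classify(w) for w in sample]
print(labelled)
```

The result has one entry per input item, which is the same guarantee the dict lookup with a default value gives.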

Related

python - how to create a more compact group for dictionary

Hi, this is part of my code for a biology project:
# choosing and loading the file:
df = pd.read_csv('Dafniyot_Data.csv',delimiter=',')
#grouping data by C/I groups:
CII = df[df['group'].str.contains('CII')]
CCI = df[df['group'].str.contains('CCI')]
CCC = df[df['group'].str.contains('CCC')]
III = df[df['group'].str.contains('III')]
CIC = df[df['group'].str.contains('CIC')]
ICC = df[df['group'].str.contains('ICC')]
IIC = df[df['group'].str.contains('IIC')]
ICI = df[df['group'].str.contains('ICI')]
#creating a dictionary of the groups:
dict = {'CII':CII, 'CCI':CCI, 'CCC':CCC,'III':III,'CIC':CIC,'ICC':ICC,'IIC':IIC,'ICI':ICI}
#T test
#FERTUNITY
#using ttest for checking FERTUNITY - grandmaternal(F0)
t_F0a = stats.ttest_ind(CCC['N_offspring'],ICC['N_offspring'],nan_policy='omit')
t_F0b = stats.ttest_ind(CCI['N_offspring'],ICI['N_offspring'],nan_policy='omit')
t_F0c = stats.ttest_ind(IIC['N_offspring'],CIC['N_offspring'],nan_policy='omit')
t_F0d = stats.ttest_ind(CCI['N_offspring'],III['N_offspring'],nan_policy='omit')
t_F0 = {'FERTUNITY - grandmaternal(F0)':[t_F0a,t_F0b,t_F0c,t_F0d]}
I need to repeat the t-test part 6 more times, changing either the groups (CCC, etc.) or the column from the df ('N_offspring', survival), which takes a lot of lines in the project.
I'm trying to find a way to still end up with a dictionary for each group of tests:
t_F0 = {'FERTUNITY - grandmaternal(F0)':[t_F0a,t_F0b,t_F0c,t_F0d]}
because it's very useful for me later, but in a less repetitive way with fewer lines.
Use itertools.product to generate all the keys, and a dict comprehension to generate the values:
from itertools import product
keys = [''.join(items) for items in product("CI", repeat=3)]
the_dict = { key: df[df['group'].str.contains(key)] for key in keys }
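Similarly, you can generate the latter part of your test keys. Note that this pairs every C-prefixed group with its I-prefixed counterpart (CCC/ICC, CCI/ICI, CIC/IIC, CII/III), which is not exactly the list of pairs in your code (your last test is CCI vs III, and your third is ordered IIC, CIC), so adjust it if those differences are intentional:
half_keys = [''.join(items) for items in product("CI", repeat=2)]
t_F0 = {
    'FERTUNITY - grandmaternal(F0)': [
        stats.ttest_ind(
            the_dict[f"C{half_key}"]['N_offspring'],
            the_dict[f"I{half_key}"]['N_offspring'],
            nan_policy='omit'
        ) for half_key in half_keys
    ],
}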
As an aside, you should not use dict as a variable name: it already has a meaning (the type of dict objects).
As a second aside, this deals with the literal question of how to DRY up creating a dictionary. However, do consider what Chris said in comments; this may be an XY problem.
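To see what these comprehensions generate without needing the DataFrame, here is a small sketch that uses only itertools. The name pairs is illustrative and not part of the code above:

```python
from itertools import product

# All three-letter combinations of "C" and "I", in product() order.
keys = [''.join(items) for items in product("CI", repeat=3)]

# Two-letter tails, each paired as (C + tail, I + tail).
half_keys = [''.join(items) for items in product("CI", repeat=2)]
pairs = [(f"C{half}", f"I{half}") for half in half_keys]

print(keys)
print(pairs)
```

If a different set of comparisons is needed, an explicit list of tuples in the same shape as pairs can be looped over in exactly the same way.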

ASCII-alphabetical topological sort

I have been trying to figure out an issue in which my program does the topological sort, but not in the order I'm trying to get. For instance, if I give it the input:
Learn Python
Understand higher-order functions
Learn Python
Read the Python tutorial
Do assignment 1
Learn Python
It should output:
Read the Python tutorial
Understand higher-order functions
Learn Python
Do assignment 1
Instead, the first two entries come out swapped. The same happens in some of my other test cases, where two seemingly random entries are swapped. Here's my code:
import sys
graph={}
def populate(name,dep):
    if name in graph:
        graph[name].append(dep)
    else:
        graph[name]=[dep]
    if dep not in graph:
        graph[dep]=[]
def main():
    last = ""
    for line in sys.stdin:
        lne=line.strip()
        if last == "":
            last=lne
        else:
            populate(last,lne)
            last=""
def topoSort(graph):
    sortedList=[] #result
    zeroDegree=[]
    inDegree = { u : 0 for u in graph }
    for u in graph:
        for v in graph[u]:
            inDegree[v]+=1
    for i in inDegree:
        if(inDegree[i]==0):
            zeroDegree.append(i)
    while zeroDegree:
        v=zeroDegree.pop(0)
        sortedList.append(v)
        #selection sort for alphabetical sort
        for x in graph[v]:
            inDegree[x]-=1
            if (inDegree[x]==0):
                zeroDegree.insert(0,x)
    sortedList.reverse()
    #for y in range(len(sortedList)):
    #    min=y
    #    for j in range(y+1,len(sortedList)):
    #        if sortedList[min]>sortedList[y]:
    #            min=j
    #    sortedList[y],sortedList[min]=sortedList[min],sortedList[y]
    return sortedList
if __name__=='__main__':
    main()
    result=topoSort(graph)
    if (len(result)==len(graph)):
        print(result)
    else:
        print("cycle")
Any ideas as to why this may be occurring?
Sets are unordered, and a dict only remembers insertion order (guaranteed since Python 3.7; before that the order was arbitrary). Neither gives you alphabetical order, and topoSort never chooses among the ready nodes alphabetically, so ties come out in whatever order the nodes happened to be inserted. I think that is why your sorting algorithm gives seemingly random results. I guess it has something to do with inDegree, but I didn't debug very much.
I can't offer you a specific fix for your code, but according to the wanted input and output it should look like this:
# read tuples from stdin until ctrl+d is pressed (on linux) or EOF is reached
graph = set()
while True:
    try:
        graph |= { (input().strip(), input().strip()) }
    except EOFError:
        break
# apply topological sort and print it to stdout
print("----")
while graph:
    z = { (a,b) for a,b in graph if not [1 for c,d in graph if b==c] }
    print ( "\n".join ( sorted ( {b for a,b in z} )
        + sorted ( {a for a,b in z if not [1 for c,d in graph if a==d]} ) ) )
    graph -= z
The great advantage of Python (here 3.9.1) is how short the solution can be. Instead of lists I would use sets because they are easier to edit: graph |= {elements} inserts items into the set and graph -= {elements} removes them. Duplicates are ignored.
First, pairs of lines are read from stdin with input(), input() and added to the graph set as tuples.
The line z = {...} selects the edges that can be printed in the current round; they are then subtracted from the graph set.
Sets are unordered, so the printed output has to be converted to sorted lists, which are joined with newlines.
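One way to get a deterministic, ASCII-alphabetical order is to run Kahn's algorithm with a min-heap as the ready queue, so the smallest ready name is always emitted first. This is a sketch of a different technique from the set-based code above, not a fix of it. The function name and the (task, prerequisite) pair format are assumptions based on the sample input:

```python
import heapq

def topo_sort_alphabetical(pairs):
    """pairs: iterable of (task, prerequisite) tuples (assumed input format)."""
    dependents = {}  # prerequisite -> tasks that are waiting for it
    in_degree = {}   # task -> number of prerequisites not yet emitted
    for task, prereq in pairs:
        dependents.setdefault(prereq, []).append(task)
        dependents.setdefault(task, [])
        in_degree[task] = in_degree.get(task, 0) + 1
        in_degree.setdefault(prereq, 0)

    # Min-heap of nodes whose prerequisites have all been emitted.
    ready = [node for node, degree in in_degree.items() if degree == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)  # smallest name in ASCII order
        order.append(node)
        for nxt in dependents[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                heapq.heappush(ready, nxt)

    # If some nodes were never emitted, the graph contains a cycle.
    return order if len(order) == len(in_degree) else None

pairs = [
    ("Learn Python", "Understand higher-order functions"),
    ("Learn Python", "Read the Python tutorial"),
    ("Do assignment 1", "Learn Python"),
]
result = topo_sort_alphabetical(pairs)
print("\n".join(result) if result is not None else "cycle")
```

Because ties are broken by the heap rather than by insertion order, the result does not depend on how the input lines happened to be stored.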

Saving multiple return values from a Python function

I have a function that takes 5 input parameters and returns 4 values after doing some calculations.
I run the function below 6 times with 6 different sets of input parameters to obtain 6 different outputs.
def id_match(zcosmo,zphot,zmin,zmax,mlim):
    data_zcosmo_lastz = zcosmo[(data_m200>mlim)*(zcosmo>zmin)*(zcosmo<zmax)]
    data_zphot_lastz = zphot[(data_m200>mlim)*(zphot>zmin)*(zphot<zmax)]
    halo_id_zcosmo = data_halo_id[(data_m200>mlim)*(zcosmo>zmin)*(zcosmo<zmax)]
    halo_id_zphot = data_halo_id[(data_m200>mlim)*(zphot>zmin)*(zphot<zmax)]
    idrep_zcosmo = data_idrep[(data_m200>mlim)*(zcosmo>zmin)*(zcosmo<zmax)]
    idrep_zphot = data_idrep[(data_m200>mlim)*(zphot>zmin)*(zphot<zmax)]
    file2freq1 = Counter(zip(halo_id_zcosmo,idrep_zcosmo))
    file2freq2 = Counter(zip(halo_id_zphot,idrep_zphot))
    set_a = len(set(file2freq1) & set(file2freq2)) # this has the number of common objects
    difference = 100.0 - (set_a*100.0)/len(data_zcosmo_lastz)
    print(difference)
    return (len(data_zcosmo_lastz),len(data_zphot_lastz),set_a,difference)
zmin_limits = [0.1,0.4,0.7,1.0,1.3,1.6]
zmax_limits = [0.4,0.7,1.0,1.3,1.6,2.1]
mlim_limits = [5e13,5e13,5e13,5e13,5e13,5e13]
for a,b,c in zip(zmin_limits,zmax_limits,mlim_limits):
    id_match(data_zcosmo_lastz,data_zphot_lastz,a,b,c)
The above code prints the difference for each of the 6 different input parameters.
But I would like to know how I can save the output from the function into an array so that I can save it as a CSV file.
I know that doing
a,b,c,d = id_match(input params)
assigns the outputs of one id_match call to a, b, c and d. But I want to store all the return values inside a single array.
id_match() already returns a tuple. You don't need to convert it to anything because the writerow() method of a csv.writer accepts any sequence, including a tuple. (csv.DictWriter is not suitable here: it requires fieldnames and expects a dict for each row.) All you need to do is assign what id_match() returns to a variable and write it to a CSV file:
import csv
with open(myfilename, 'w') as csvfile:
    writer = csv.writer(csvfile)
    for a,b,c in zip(zmin_limits,zmax_limits,mlim_limits):
        info = id_match(data_zcosmo_lastz,data_zphot_lastz,a,b,c)
        writer.writerow(info)
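If the results are also needed in memory as a single array-like object, one option is to collect the returned tuples in a list and write them all at once. A minimal sketch, assuming Python 3; fake_id_match is a hypothetical stand-in for id_match() that simply returns a fixed 4-tuple, and the header names are invented for illustration:

```python
import csv
import io

def fake_id_match(zmin, zmax, mlim):
    # Hypothetical stand-in: returns four values, like id_match() does.
    n_zcosmo, n_zphot, n_common = 10, 8, 6
    difference = 100.0 - (n_common * 100.0) / n_zcosmo
    return (n_zcosmo, n_zphot, n_common, difference)

zmin_limits = [0.1, 0.4, 0.7]
zmax_limits = [0.4, 0.7, 1.0]
mlim_limits = [5e13, 5e13, 5e13]

# One 4-tuple per call; the list of tuples is the "single array".
rows = [fake_id_match(a, b, c)
        for a, b, c in zip(zmin_limits, zmax_limits, mlim_limits)]

# An in-memory buffer keeps the sketch self-contained; for a real file,
# use open("results.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["n_zcosmo", "n_zphot", "n_common", "difference"])
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping rows around also makes it easy to hand the data to other tools later, for example by converting it to a NumPy array.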

Merging lists obtained by a loop

I've only started Python recently but am stuck on a problem.
# function that tells how to read the urls and how to process the data the
# way I need it.
def htmlreader(i):
    # makes variable websites because it is used in a loop.
    pricedata = urllib2.urlopen(
        "http://website.com/" + (",".join(priceids.split(",")[i:i + 200]))).read()
    # here my information processing begins but that is fine.
    pricewebstring = pricedata.split("},{")
    # results in [[1234,2345,3456],[3456,4567,5678]] for example.
    array1 = [re.findall(r"\d+", a) for a in pricewebstring]
    # writes obtained array to my text file
    itemtxt2.write(str(array1) + '\n')
i = 0
while i <= totalitemnumber:
    htmlreader(i)
    i = i + 200
See the comments in the script as well.
This is in a loop and will each time give me an array (defined by array1).
Because I print this to a txt file it results in a txt file with separate arrays.
I need one big array so it needs to merge the results of htmlreader(i).
So my output is something like:
[[1234,2345,3456],[3456,4567,5678]]
[[6789,4567,2345],[3565,1234,2345]]
But I want:
[[1234,2345,3456],[3456,4567,5678],[6789,4567,2345],[3565,1234,2345]]
Any ideas how I can approach this?
Since you want to gather all the elements in a single list, you can collect them in another list by extending it on every call, like this:
def htmlreader(i, result):
    ...
    result.extend([re.findall(r"\d+", a) for a in pricewebstring])
i, result = 0, []
while i <= totalitemnumber:
    htmlreader(i, result)
    i = i + 200
itemtxt2.write(str(result) + '\n')
In this case, the list of lists built from the re.findall results is added, item by item, to the result list. Finally, you write the entire list to the file in one go.
If the method shown above is confusing, change it like this:
def htmlreader(i):
    ...
    return [re.findall(r"\d+", a) for a in pricewebstring]
i, result = 0, []
while i <= totalitemnumber:
    result.extend(htmlreader(i))
    i = i + 200
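The difference between extend and writing each chunk separately can be tried offline with a small sketch. The function parse_chunk and the two raw strings are hypothetical stand-ins for htmlreader() and the downloaded data; note that re.findall returns the numbers as strings:

```python
import re

def parse_chunk(raw):
    # Hypothetical stand-in for htmlreader(): parse one downloaded chunk
    # into a list of lists of digit strings.
    return [re.findall(r"\d+", part) for part in raw.split("},{")]

# Made-up responses, one per request.
chunks = [
    '{"a":1234,"b":2345,"c":3456},{"a":3456,"b":4567,"c":5678}',
    '{"a":6789,"b":4567,"c":2345},{"a":3565,"b":1234,"c":2345}',
]

result = []
for raw in chunks:
    # extend() adds each inner list to result, so the nesting stays one level deep.
    result.extend(parse_chunk(raw))

print(result)
```

Using append instead of extend would nest each chunk's list inside result, which reproduces the separate arrays from the question.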

Matching strings for multiple data set in Python

I am working in Python and I need to match the strings of several data files. First I use pickle to unpack my files and then I place them into lists. I only want to match strings that have the same conditions. These conditions are indicated at the end of the string.
My working script looks approximately like this:
import pickle
f = open("data_a.dat")
list_a = pickle.load( f )
f.close()
f = open("data_b.dat")
list_b = pickle.load( f )
f.close()
f = open("data_c.dat")
list_c = pickle.load( f )
f.close()
f = open("data_d.dat")
list_d = pickle.load( f )
f.close()
for a in list_a:
    for b in list_b:
        for c in list_c:
            for d in list_d:
                if a.GetName()[12:] in b.GetName():
                    if a.GetName()[12:] in c.GetName():
                        if a.GetName()[12:] in d.GetName():
                            "do whatever"
This seems to work fine for these 2 lists. The problems begin when I try to add 8 or 9 more data files for which I also need to match the same conditions. The script simply won't finish; it gets stuck. I appreciate your help.
Edit: Each of the lists contains histograms named after the parameters that were used to create them. The name of the histograms contains these parameters and their values at the end of the string. In the example I did it for 2 data sets, now I would like to do it for 9 data sets without using multiple loops.
Edit 2. I just expanded the code to reflect more accurately what I want to do. Now if I try to do that for 9 lists, it does not only look horrible, but it also doesn't work.
Off the top of my head:
files = ["file_a", "file_b", "file_c"]
sets = []
for name in files:
    # open each file in the list (binary mode for pickle)
    with open(name, "rb") as f:
        sets.append(set(pickle.load(f)))
intersection = sets[0].intersection(*sets[1:])
EDIT: Well I overlooked your mapping to x.GetName()[12:], but you should be able to reduce your problem to set logic.
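A sketch of how that reduction could look with the x.GetName()[12:] mapping: index each list by the name suffix, then intersect the key sets. The Hist class and the sample names below are hypothetical stand-ins for the unpickled histogram objects, and the sketch matches on an exactly equal suffix rather than the substring test used in the question:

```python
class Hist:
    """Hypothetical stand-in for a histogram object with a GetName() method."""
    def __init__(self, name):
        self._name = name

    def GetName(self):
        return self._name

def index_by_suffix(hists, prefix_len=12):
    # Map the part of the name after the sample prefix to the object itself.
    return {h.GetName()[prefix_len:]: h for h in hists}

samples = [
    [Hist("datasample1-par1=1-par2=2"), Hist("datasample1-par1=1-par2=3")],
    [Hist("datasample2-par1=1-par2=3"), Hist("datasample2-par1=1-par2=2")],
    [Hist("datasample3-par1=1-par2=2"), Hist("datasample3-par1=9-par2=9")],
]

indexes = [index_by_suffix(hists) for hists in samples]

# Suffixes present in every sample: one pass per list, no nested loops.
common = set(indexes[0]).intersection(*indexes[1:])

for suffix in sorted(common):
    matched = [index[suffix] for index in indexes]
    print(suffix, [h.GetName() for h in matched])
```

The work grows with the total number of histograms instead of with the product of the list lengths, which is what makes nine nested loops impractical.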
Here is a small piece of code you can take inspiration from. The main idea is a function that advances one position counter per list, odometer-style, so that it visits every combination (it is named recursive_match, although it does not actually call itself).
For simplicity's sake, I assume the data is already loaded into lists, but you can load it from the files first:
data_files = [
    'data_a.dat',
    'data_b.dat',
    'data_c.dat',
    'data_d.dat',
    'data_e.dat',
]
lists = [pickle.load(open(f)) for f in data_files]
And because I don't really get the details of what you need to do, my goal here is to find the matches on the first four characters:
def do_wathever(string):
    print("I have matched the string '%s'" % string)
lists = [
    ["hello", "world", "how", "grown", "you", "today", "?"],
    ["growl", "is", "a", "now", "on", "appstore", "too bad"],
    ["I", "wish", "I", "grow", "Magnum", "mustache", "don't you?"],
]
positions = [0 for i in range(len(lists))]
def recursive_match(positions, lists):
    # current string of each list (a real list, so .pop() works on Python 3 too)
    strings = [l[p] for p, l in zip(positions, lists)]
    match = True
    searched_string = strings.pop(0)[:4]
    for string in strings:
        if searched_string not in string:
            match = False
            break
    if match:
        do_wathever(searched_string)
    # increment positions:
    new_positions = positions[:]
    lists_len = len(lists)
    for i, l in enumerate(reversed(lists)):
        max_position = len(l)-1
        list_index = lists_len - i - 1
        current_position = positions[list_index]
        if max_position > current_position:
            new_positions[list_index] += 1
            break
        else:
            new_positions[list_index] = 0
            continue
    return new_positions, not any(new_positions)
search_is_finished = False
while not search_is_finished:
    positions, search_is_finished = recursive_match(positions, lists)
Of course you can optimize a lot of things here, since this is draft code, but take a look at recursive_match: stepping through all combinations with a list of positions is the main concept.
In the end I used the built-in map function. I realize now I should have been even more explicit than I was (which I will be in the future).
My data files are histograms with 5 parameters, some with 3 or 4. Something like this,
par1=["list with some values"]
par2=["list with some values"]
par3=["list with some values"]
par4=["list with some values"]
par5=["list with some values"]
I need to examine the behavior of the quantity plotted for each possible combination of the values of the parameters. In the end, I get a data file with ~300 histograms each identified in their name with the corresponding values of the parameters and the sample name. It looks something like,
datasample1-par1=val1-par2=val2-par3=val3-par4=val4-par5=val5
datasample1-"permutation of the above values"
...
datasample9-par1=val1-par2=val2-par3=val3-par4=val4-par5=val5
datasample9-"permutation of the above values"
So I get 300 histograms for each of the 9 data files, but luckily all of these histograms are created in the same order. Hence I can pair all of them just using the built-in map function. I unpack the data files, put each into a list, and then use map to pair each histogram with its corresponding configuration in the other data samples (map(None, ...) is Python 2 behaviour; it pads shorter lists with None).
for lst in map(None, data1_histosli, data2_histosli, ...data9_histosli):
    do_something(lst)
This solves my problem. Thank you to all for your help!
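For readers on Python 3, where map(None, ...) no longer works, zip (or itertools.zip_longest when the lists may differ in length) gives the same pairing. A minimal sketch with made-up lists standing in for the histogram lists:

```python
from itertools import zip_longest

# Made-up stand-ins for the per-sample histogram lists, all in the same order.
data1 = ["s1-cfgA", "s1-cfgB", "s1-cfgC"]
data2 = ["s2-cfgA", "s2-cfgB", "s2-cfgC"]
data3 = ["s3-cfgA", "s3-cfgB"]  # one entry short, to show the padding

all_lists = [data1, data2, data3]

# zip_longest pads missing entries with None, like Python 2's map(None, ...).
groups = list(zip_longest(*all_lists))
for group in groups:
    print(group)
```

Plain zip(*all_lists) stops at the shortest list instead of padding, which is enough when every sample is known to contain the same number of histograms.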
