How can I create a function like this? - python

def f(s):
    if s['col1'] == 2:
        return s['new_column'] = s['col1']
    elif s['col2'] == 3:
        return s['new_column'] = s['col2']
    else:
        return s['new_column'] = s['col3']
This did not work. I know about np.select, but I have several nested ifs and I need to create a column with many conditions. How can I do it?
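A minimal sketch of two common fixes, assuming a DataFrame named df with columns col1, col2 and col3 (the sample data is made up). The function should only return the value; the assignment to the new column happens outside it, through DataFrame.apply with axis=1. The same if/elif chain can also be written with np.select, which checks its conditions in order and falls back to the default; nested ifs can usually be flattened into that list by combining conditions with & and |.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'col1': [2, 1, 5], 'col2': [0, 3, 4], 'col3': [7, 8, 9]})

def f(s):
    # Return only the value; the column assignment is done by the caller
    if s['col1'] == 2:
        return s['col1']
    elif s['col2'] == 3:
        return s['col2']
    else:
        return s['col3']

# Option 1: row-wise apply (flexible, but runs a Python-level loop)
df['new_column'] = df.apply(f, axis=1)

# Option 2: np.select evaluates the conditions in order, like if/elif,
# and uses the default where no condition is true (the "else" branch)
conditions = [df['col1'] == 2, df['col2'] == 3]
choices = [df['col1'], df['col2']]
df['new_column_vec'] = np.select(conditions, choices, default=df['col3'])

print(df['new_column'].tolist())      # [2, 3, 9]
print(df['new_column_vec'].tolist())  # [2, 3, 9]
```

Both columns come out the same here; np.select is usually much faster on large frames, while apply is easier when a branch needs arbitrary Python logic.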

Related

Python pandas column operations

I'm trying to do some column operations on a dataframe and I'm stuck at one point. I'm new to pandas and can't figure out how to do this.
So wherever there is a "Yes" value in "Prevous_Line_Has_Br", buffer should be added to the "OldTop" value, but whenever there is a "No" in between, it should stop incrementing, carry the previous row's value forward, and start incrementing again when there is a "Yes".
I have tried something like this:
temp_df["CheckBr"] = temp_df["Prevous_Line_Has_Br"].shift(1)
temp_df["CheckBr"] = temp_df["CheckBr"].fillna("dummy")
temp_df.insert(0, 'New_ID', range(0, 0 + len(temp_df)))
temp_df["NewTop"] = "NoIncr"
temp_df["MyTop"] = 0
temp_df.loc[(temp_df["Prevous_Line_Has_Br"] == "Yes") & (temp_df["CheckBr"] == "Yes"), "NewTop"] = "Incr"
temp_df.loc[(temp_df["Prevous_Line_Has_Br"] == "Yes") & (temp_df["CheckBr"] == "No"), "NewTop"] = "Incr"
temp_df.loc[(temp_df["Prevous_Line_Has_Br"] == "Yes") & (temp_df["CheckBr"] == "dummy"), "NewTop"] = "Incr"
temp_df.loc[(temp_df["NewTop"]=="Incr"),"MyTop" ] = new_top + (temp_df.New_ID * temp_df.buffer)
temp_df.loc[(temp_df["CheckBr"] == "Yes") & (temp_df["MyTop"] == 0), "MyTop"] = temp_df["MyTop"].shift(1)
This gives me the following output, and I want to achieve the same without a for loop:
Can someone please help me compute these values in the original dataframe using pandas?
This is the final result I want to achieve:
This would be fairly easy to do if you moved away from pandas and treated the columns as plain lists. If you still want to use the apply method, you can use a decorator to keep track of the previous row.
def apply_func_decorator(func):
    prev_row = {}  # state shared across calls: the last processed row
    def wrapper(curr_row, **kwargs):
        val = func(curr_row, prev_row)
        prev_row.update(curr_row)
        prev_row['NewTop'] = val  # remember the value just computed
        return val
    return wrapper

@apply_func_decorator
def add_buffer_and_top(curr_row, prev_row):
    if curr_row.Prevous_Line_Has_Br == 'Yes':
        if prev_row:
            return curr_row.buffer + prev_row['NewTop']
        return curr_row.buffer + curr_row.OldTop  # first row: start from OldTop
    # "No": carry the previous value forward (OldTop if this is the first row)
    return prev_row['NewTop'] if prev_row else curr_row.OldTop

temp_df['NewTop'] = 0
temp_df['NewTop'] = temp_df.apply(add_buffer_and_top, axis=1)
This is how I achieved the output I wanted:
import numpy as np
m = temp_df['Prevous_Line_Has_Br'].eq('Yes')
temp_df['New_ID'] = m.cumsum().where(m, np.nan)
temp_df["New_ID"] = temp_df["New_ID"].ffill()
temp_df["Top"] = temp_df['OldTop'] + (temp_df['New_ID'] * temp_df['buffer'])
Column New_ID is incremented only when there is a 'Yes' in column Prevous_Line_Has_Br.
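A self-contained sketch of the cumulative-count approach above, using a small made-up frame (the column names follow the question; the values are hypothetical). Note that m.cumsum() on its own already carries the previous count through "No" rows; the where/ffill step only changes "No" rows that appear before the first "Yes", which become NaN instead of 0.

```python
import pandas as pd

# Hypothetical sample data
temp_df = pd.DataFrame({
    'Prevous_Line_Has_Br': ['Yes', 'Yes', 'No', 'No', 'Yes'],
    'OldTop': [100, 100, 100, 100, 100],
    'buffer': [10, 10, 10, 10, 10],
})

# True on rows that should advance the counter
m = temp_df['Prevous_Line_Has_Br'].eq('Yes')
# Running count of "Yes" rows; "No" rows keep the previous count
temp_df['New_ID'] = m.cumsum()
temp_df['Top'] = temp_df['OldTop'] + temp_df['New_ID'] * temp_df['buffer']

print(temp_df['Top'].tolist())  # [110, 120, 120, 120, 130]
```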

Are there better ways to write if statements in python?

So I've been writing Python for a while now, and I've decided to make an app to help my sister with multiplication tables. I'm writing the code that randomly picks from my 10 lists of the different questions (I know there are better ways to write it, but it gave me abilities I wanted to use with SQL). The lists are organized by table (Tone, Ttwo, Tthree, etc.); Tone, for example, holds ['1*1','1*2',...]. As seen in the if statement, the function picks a question by selecting a list and a problem with randomly generated numbers.
def pick_question():
    Table = random.randint(0,9)
    Col = random.randint(0,9)
    if Table == 0:
        if Col == 0:
            return Tone[0]
        elif Col == 1:
            return Tone[1]
        elif Col == 2:
            return Tone[2]
        elif Col == 3:
            return Tone[3]
        elif Col == 4:
            return Tone[4]
        elif Col == 5:
            return Tone[5]
        elif Col == 6:
            return Tone[6]
        elif Col == 7:
            return Tone[7]
        elif Col == 8:
            return Tone[8]
        elif Col == 9:
            return Tone[9]
    elif Table == 1:
        if Col == 0:
            return Ttwo[0]
        elif Col == 1:
            return Ttwo[1]
        elif Col == 2:
            return Ttwo[2]
        elif Col == 3:
            return Ttwo[3]
        elif Col == 4:
            return Ttwo[4]
        elif Col == 5:
            return Ttwo[5]
        elif Col == 6:
            return Ttwo[6]
        elif Col == 7:
            return Ttwo[7]
        elif Col == 8:
            return Ttwo[8]
        elif Col == 9:
            return Ttwo[9]
Obviously it keeps going, but it was already quite long. I was wondering if there is any way to make this less repetitive and look better.
def pick_question():
    Table = random.randint(0,9)
    Col = random.randint(0,9)
    return [Tone,Ttwo][Table][Col]
I guess what you are trying to write is:
import random
Tone = [f"1*{i}" for i in range(1,10)]
Ttwo = [f"2*{i}" for i in range(1,10)]
Tthree = [f"3*{i}" for i in range(1,10)]
Tfour = [f"4*{i}" for i in range(1,10)]
Tfive = [f"5*{i}" for i in range(1,10)]
Tsix = [f"6*{i}" for i in range(1,10)]
Tseven = [f"7*{i}" for i in range(1,10)]
Teight = [f"8*{i}" for i in range(1,10)]
Tnine = [f"9*{i}" for i in range(1,10)]
Questions = [
    Tone,
    Ttwo,
    Tthree,
    Tfour,
    Tfive,
    Tsix,
    Tseven,
    Teight,
    Tnine,
]
def pick_question():
    Table = random.randint(0,8)
    Col = random.randint(0,8)
    return Questions[Table][Col]
print(pick_question())
But I guess what you are really trying to do is this:
import random
A = random.randint(1, 9)
B = random.randint(1, 9)
print(f"{A}*{B}=?")
C = input()
try:
    correct = A * B == int(C)
except ValueError:  # the input was not a whole number
    correct = False
if correct:
    print("You are RIGHT!")
else:
    print(f"You are WRONG, the right answer is: {A*B}")
Good luck with Python! It's an amazing language! :)
Just use a one-dimensional list:
def pick_question():
    Table = random.randint(0,9)
    Col = random.randint(0,9)
    if Table == 0:
        return Tone[Col]
    elif Table == 1:
        return Ttwo[Col]
This will do the trick.
Or even better, a two-dimensional list:
def pick_question():
    Table = random.randint(0,9)
    Col = random.randint(0,9)
    List = [Tone, Ttwo]
    return List[Table][Col]
I quite like this solution with dictionaries and the get method:
route = input()
branch = {'y': code1, 'n': code2}.get(route)
This shortens your code and will be easier to read.
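To make the dictionary idea concrete, a small sketch in which the dictionary values are functions and .get() supplies a fallback for unexpected input. The names handle_yes, handle_no, handle_other and dispatch are hypothetical stand-ins for code1 and code2.

```python
def handle_yes():
    return "continuing"

def handle_no():
    return "stopping"

def handle_other():
    return "please answer y or n"

def dispatch(route):
    # Look up the function for this answer, defaulting to handle_other,
    # then call it; the dictionary replaces an if/elif chain
    action = {'y': handle_yes, 'n': handle_no}.get(route, handle_other)
    return action()

print(dispatch('y'))  # continuing
print(dispatch('x'))  # please answer y or n
```

Storing the functions themselves (without parentheses) means only the selected branch actually runs.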
Rather than writing the inner if structures, why not just
return Tone[Col]
?
In fact, you can create a list with the Tone, Ttwo, etc. inside it and then write
return outer_list[Table][Col]
def pick_question(list_with_tone_ttwo):
    table = random.randint(0,9)
    col = random.randint(0,9)
    return list_with_tone_ttwo[table][col]
EDIT: added full function
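A compact sketch combining the suggestions above. It assumes the ten tables follow the pattern shown for Tone ('1*1', '1*2', ...) with ten questions each; the generated tables list is a hypothetical stand-in for Tone, Ttwo, and so on. random.choice picks from a sequence directly, so no index arithmetic (or range mismatch) is involved.

```python
import random

# Hypothetical stand-in for the question's lists: tables[0] plays the role
# of Tone, tables[1] of Ttwo, and so on, each with ten question strings
tables = [[f"{t}*{c}" for c in range(1, 11)] for t in range(1, 11)]

def pick_question(tables):
    # Pick a random table, then a random question from that table
    return random.choice(random.choice(tables))

question = pick_question(tables)
print(question)
```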

My function is returning an empty list and I don't understand why?

I am writing a file overlap function that takes two arguments: the longer file and the shorter file. I want to check whether the two files contain the same numbers and, if they do, append each shared number to a list.
These are the two files I am using for the program:
the first file, which I use for the comparison, and the second file.
def longestFile(firstFile, secondFile):
    if len(firstFile)>len(secondFile):
        return firstFile
    else:
        return secondFile
def shortestFile(firstFile, secondFile):
    if len (firstFile) < len(secondFile):
        return firstFile
    else:
        return secondFile
def middleNumber(theLongestFile):
    return theLongestFile[len(theLongestFile)//2]
def fileOverlap(lstFirstFile,lstSecondFile):
    lstMatchingNums = []
    lstLongerFile = longestFile(lstFirstFile,lstSecondFile)
    lstShortestFile = shortestFile(lstFirstFile,lstSecondFile)
    for eachLines in range(len(lstShortestFile)):
        lstLongerFile = longestFile(lstFirstFile,lstSecondFile)
        for eachLine in range(len(lstLongerFile)):
            if lstShortestFile[eachLines] == middleNumber(lstLongerFile):
                lstMatchingNums.append(lstShortestFile[eachLines])
                break
            elif lstShortestFile[eachLines] < middleNumber(lstLongerFile):
                lstLongerFile = lstLongerFile[0:len(lstLongerFile)//2+1]
                if len(lstLongerFile) <= 2 and lstLongerFile[0] == lstShortestFile[eachLines]:
                    lstMatchingNums.append(lstShortestFile[eachLines])
                    break
                elif middleNumber(lstLongerFile) != lstShortestFile[eachLines] and len(lstLongerFile) <=2:
                    break
            elif lstShortestFile[eachLines] > middleNumber(lstLongerFile):
                lstLongerFile = lstLongerFile[len(lstLongerFile)//2:]
                if len(lstLongerFile) <= 2 and lstLongerFile[0] == lstShortestFile[eachLines]:
                    lstMatchingNums.append(lstShortestFile[eachLines])
                    break
                elif middleNumber(lstLongerFile) != lstShortestFile[eachLines] and len(lstLongerFile) <= 2:
                    break
    return lstMatchingNums
lstHappyNums = open('happynumbs.txt','r')
lstReadingHappyLines = lstHappyNums.readlines()
lstHappyNums.close()
lstPrimeNumsFile = open('primenumbers.txt','r')
lstReadingPrimeLines = lstPrimeNumsFile.readlines()
lstPrimeNumsFile.close()
print(fileOverlap(lstReadingHappyLines,lstReadingPrimeLines))
If I run this program, it gives me an empty list and I am not sure why.
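One likely cause worth checking (not confirmed in the question): readlines() returns strings that keep their trailing newline, so the comparisons in fileOverlap compare text, not numbers. '7\n' < '13\n' is False because '7' sorts after '1', which sends a binary search the wrong way; a last line without "\n" would also never equal the same number with one. A minimal sketch, assuming each file holds one integer per line, that converts the lines to int and uses a set intersection instead of a hand-written binary search (read_numbers and file_overlap are hypothetical names):

```python
def read_numbers(path):
    # int() ignores surrounding whitespace, so the trailing "\n" is harmless
    with open(path) as fh:
        return [int(line) for line in fh if line.strip()]

def file_overlap(first, second):
    # A set intersection finds the shared numbers without needing sorted input
    return sorted(set(first) & set(second))

# Raw readlines() output compares as text, not as numbers
print('7\n' < '13\n')  # False

happy = [1, 7, 10, 13, 19]   # illustrative values
primes = [2, 3, 5, 7, 11, 13]
print(file_overlap(happy, primes))  # [7, 13]
```

With the question's files this would become file_overlap(read_numbers('happynumbs.txt'), read_numbers('primenumbers.txt')).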

How do you change a variable to a string in Python?

So I am trying to change a randomized variable to a string with a function. Any ideas why this isn't working?
import random
def letter(x):
    if x == 1:
        x = "A"
    elif x == 2:
        x = "C"
    elif x == 3:
        x = "G"
    elif x == 4:
        x = "T"
    else:
        print "Error"
randint18= random.randrange(1,5)
letter(randint18)
print randint18
You have to return the value from the function, and assign it to a variable.
def letter(x):
    ...
    return x
randint18 = random.randrange(1, 5)
result = letter(randint18)
print result
Mine isn't a proper answer (those have been provided already), but a suggestion for improving your code. I'd post it as a comment, but comment formatting isn't good enough for code.
Why not use a dictionary for the mapping instead of a sequence of ifs? You could still put it in a function if you like:
letter = {1:'A', 2:'C', 3:'G', 4:'T'}
randint18 = random.randrange(1,5)
mapping = letter.get(randint18, 'Error')
print mapping
Mind you, a list would be even more efficient if the mapping started from zero:
letter = ['A', 'C', 'G', 'T']
randint18 = random.randrange(0,4)
try: # in case your random index were allowed to go past 3
    mapping = letter[randint18]
except IndexError:
    mapping = 'Error'
print mapping
You cannot alter the variable in place; you must return it and capture the returned value.
import random
def letter(x):
    if x == 1:
        x = "A"
    elif x == 2:
        x = "C"
    elif x == 3:
        x = "G"
    elif x == 4:
        x = "T"
    else:
        print "Error"
    return x # return it here
randint18= random.randrange(1,5)
randint18 = letter(randint18) # capture the returned value here
print randint18
There is a simpler way to achieve what you want, using a dictionary to map the values.
import random
def letter(x):
    mapd = {1:'A', 2:'C', 3:'G', 4:'T'}
    return mapd.get(x, None)
randint18= random.randrange(1,5)
randint18 = letter(randint18)
print randint18
You forgot to include a return in your function:
def letter(x):
    if x == 1:
        x = "A"
    elif x == 2:
        x = "C"
    elif x == 3:
        x = "G"
    elif x == 4:
        x = "T"
    else:
        print "Error"
    return x
randint18 = random.randrange(1,5)
returned_result = letter(randint18)
print returned_result
Add a return statement to the function:
    return x
and capture the returned value when calling it:
value_you_want = letter(randint18)  # the output is saved in value_you_want
Please note that variables defined inside a function are local to it and cannot be accessed outside the function's scope. You were expecting to see the value of x outside the function, which is not possible. To check, run your function and then try to access x; you will get an error:
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
print x
NameError: name 'x' is not defined
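A small Python 3 sketch of the two points made in the answers: rebinding a parameter inside a function does not change the caller's variable, while returning the mapped value and assigning it does. The dictionary lookup follows the answers above; letter_no_return is a hypothetical name used only for contrast. random.choice('ACGT') is shown as a shortcut when the number itself is not needed.

```python
import random

MAPPING = {1: 'A', 2: 'C', 3: 'G', 4: 'T'}

def letter_no_return(x):
    x = MAPPING.get(x, 'Error')  # rebinds the local name only; nothing is returned

def letter(x):
    return MAPPING.get(x, 'Error')

n = random.randrange(1, 5)  # 1, 2, 3 or 4
letter_no_return(n)
print(n)                    # still the original int: the caller's variable is unchanged

result = letter(n)
print(result)               # one of 'A', 'C', 'G', 'T'

print(random.choice('ACGT'))  # picks a letter directly
```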

Python: Dictionary being returned by func as string? What the heck am I doing wrong?

I'm generating a dictionary in a function and then returning it. However, I can't access the returned value as a dictionary, even though it looks correctly formatted. It is treated as a string only, i.e. I can print it but can't print d.keys() or d.items(). What the heck am I doing wrong?
Data when printed as a str():
{1: '214902885,214902909', 2: '214902910,214902934', 3: '214902935,214902959', 4: '214902960,214902984', 5: '214902985,214903009', 6: '214903010,214903034', 7: '214903035,214903059', 8: '214903060,214903084', 9: '214903085,214903109', 10: '214903110,214903139'}
Error when I try to print d.items() or d.keys():
print bin_mapping.keys()
AttributeError: 'str' object has no attribute 'keys'
Once I have returned the dict from a function, do I have to redefine it as a dictionary? I'd really appreciate some help as I'm super frustrated :/
Thanks,
As suggested, here is the code. First, the function I'm calling to return the dictionary:
def models2bins_utr(id,type,start,end,strand):
    ''' chops up utr's into bins for mC analysis'''
    # first deal with 5' UTR
    feature_len = (int(end) - int(start))+1
    bin_len = int(feature_len) /10
    if int(feature_len) < 10:
        return 'null'
        #continue
    else:
        # now calculate the coordinates for each of the 10 bins
        bin_start = start
        d_utr_5 = {}
        d_utr_3 = {}
        for i in range(1,11):
            # set 1-9 first, then round up bin# 10 )
            if i != 10:
                bin_end = (int(bin_start) +int(bin_len)) -1
                if str(type) == 'utr_5':
                    d_utr_5[i] = str(bin_start)+','+str(bin_end)
                elif str(type) == 'utr_3':
                    d_utr_3[i] = str(bin_start)+','+str(bin_end)
                else:
                    pass
                #now set new bin_start
                bin_start = int(bin_end) + 1
            # now round up last bin
            else:
                bin_end = end
                if str(type) == 'utr_5':
                    d_utr_5[i] = str(bin_start)+','+str(bin_end)
                elif str(type) == 'utr_3':
                    d_utr_3[i] = str(bin_start)+','+str(bin_end)
                else:
                    pass
        if str(type) == 'utr_5':
            return d_utr_5
        elif str(type) == 'utr_3':
            return d_utr_3
Calling the function and trying to access the dict:
def main():
    # get a list of all the mrnas in the db
    mrna_list = get_mrna()
    for mrna_id in mrna_list:
        print '-----'
        print mrna_id
        mrna_features = features(mrna_id)
        # if feature utr, send to models2bins_utr and return dict
        for feature in mrna_features:
            id = feature[0]
            type = feature[1]
            start = feature[2]
            end = feature[3]
            assembly = feature[4]
            strand = feature[5]
            if str(type) == 'utr_5' or str(type) == 'utr_3':
                bin_mapping = models2bins_utr(id,type,start,end,strand)
                print bin_mapping
                print bin_mapping.keys()
You return a string early on:
bin_len = int(feature_len) /10
if int(feature_len) < 10:
    return 'null'
Perhaps you wanted to raise an exception instead here, or at the very least, return an empty dictionary or use None as a flag value.
If you use None, do test for it:
bin_mapping = models2bins_utr(id,type,start,end,strand)
if bin_mapping is not None:
    # you got a dictionary.
    print bin_mapping.keys()
I'm wondering what return 'null' is supposed to achieve. My guess is that once in a while, you call the function with the wrong parameters and get this string back.
I suggest raising an exception instead (raise Exception('Not enough arguments') or similar) or returning an empty dict.
You should also learn about repr(), because it gives you more information about an object, which makes debugging much easier.
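A minimal Python 3 sketch of the answers' suggestion: return None (not the string 'null') for features too short to bin, and test for it before calling .keys(). bins_or_none is a hypothetical, simplified version of models2bins_utr that takes integer coordinates and ignores the utr_5/utr_3 split; with the question's first feature it reproduces the bins shown in the question. The last two lines show why repr() helps: print hides that a value is a string, repr shows the quotes.

```python
def bins_or_none(start, end, n_bins=10):
    # Simplified, hypothetical version of models2bins_utr: returns None
    # (not the string 'null') when the feature is too short to bin
    feature_len = end - start + 1
    if feature_len < n_bins:
        return None
    bin_len = feature_len // n_bins
    bins = {}
    bin_start = start
    for i in range(1, n_bins + 1):
        # The last bin runs to `end`, absorbing any remainder
        bin_end = end if i == n_bins else bin_start + bin_len - 1
        bins[i] = f"{bin_start},{bin_end}"
        bin_start = bin_end + 1
    return bins

for start, end in [(214902885, 214903139), (100, 104)]:
    mapping = bins_or_none(start, end)
    if mapping is not None:
        print(list(mapping.keys()))
    else:
        print("feature too short, skipped")

print('null')        # null
print(repr('null'))  # 'null'  (the quotes reveal a str)
```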
