I am trying to apply a function to a column, but I am getting an error.
Name weight
Person1 30
Person2 70
My code is below
def classify(x):
    if 0 <= x < 20:
        y = "0 to 20%"
    if 20 < x < 40:
        y = "20 to 40%"
    if 40 < x < 60:
        y = "40 to 60%"
    if 60 < x < 80:
        y = "60 to 80%"
    if 80 < x <= 100:
        y = "80 to 100%"
    return y
df['Target'] = df['weight'].apply(lambda x: classify(x))
This throws an UnboundLocalError.
If I use print instead of return I am able to see the outputs
Expected output
Name weight Target
Person1 30 20 to 40
Person2 70 60 to 80
Why not use pd.cut?
df['Target']=pd.cut(df['weight'],[0,20,40,60,80,100])
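A minimal sketch of the pd.cut suggestion on the sample data from the question. The labels list and include_lowest=True are assumptions added here so that the output uses the asker's band names and a weight of exactly 0 still falls into the first band; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Person1", "Person2"], "weight": [30, 70]})

bins = [0, 20, 40, 60, 80, 100]
# Assumed labels, copied from the strings in the question's classify().
labels = ["0 to 20%", "20 to 40%", "40 to 60%", "60 to 80%", "80 to 100%"]

# Bins are right-closed by default: (0, 20], (20, 40], ...
# include_lowest=True also puts a weight of exactly 0 into the first band.
df["Target"] = pd.cut(df["weight"], bins=bins, labels=labels, include_lowest=True)
print(df)
```

Unlike the hand-written if chain, every boundary value (20, 40, 60, 80) falls into some band here. In classify() those values match no branch, so y is never assigned before the return.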
Related
I need to divide my passengers' age range into 5 parts and create a new column holding the values 0 to 4, one for each part (0 for the first range, 1 for the second range, and so on).
a = range(0,17)
b = range(17,34)
c = range(34, 51)
d = range(51, 68)
e = range(68,81)
a1 = titset.query('Age >= 0 & Age < 17')
a2 = titset.query('Age >= 17 & Age < 34')
a3 = titset.query('Age >= 34 & Age < 51')
a4 = titset.query('Age >= 51 & Age < 68')
a5 = titset.query('Age >= 68 & Age < 81')
titset['Age_bin'] = a1.apply(0 for a in range(a))
Here is what I tried, but it does not work. I have also attached a picture of the dataset.
DATASET
I expect to get a new column named 'Age_bin' with the value 0 for ages 0 to 16 inclusive, 1 for ages 17 to 33, and so on for the other 3 ranges.
Binning with pandas cut is appropriate here. Try:
titset['Age_bin'] = pd.cut(titset['Age'], bins=[0, 17, 34, 51, 68, 81], right=False, labels=False)
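A small check of the binning on boundary ages may help here. This is only a sketch: the sample ages and the name demo are made up, and right=False is an assumption chosen so that the groups match the ranges in the question (16 in group 0, 17 in group 1).

```python
import pandas as pd

# Made-up ages sitting on both sides of every bin edge.
demo = pd.DataFrame({"Age": [0, 16, 17, 33, 34, 50, 51, 67, 68, 80]})
edges = [0, 17, 34, 51, 68, 81]

# right=False makes each bin half-open on the right: [0, 17), [17, 34), ...
# labels=False returns the integer position of the bin instead of an interval.
demo["Age_bin"] = pd.cut(demo["Age"], bins=edges, right=False, labels=False)
print(demo)
```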
First of all, the variable a is already a range object, so calling range(a) is equivalent to range(range(0, 17)), which raises the error.
Secondly, even if you fixed that problem, you would run into another error, because .apply takes a callable (i.e., a function, whether defined with def or as a lambda).
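Both points can be reproduced in a few lines. The sample ages and the names error_message and under_17 are hypothetical, used only for illustration.

```python
import pandas as pd

a = range(0, 17)
try:
    range(a)  # same as range(range(0, 17))
except TypeError as exc:
    error_message = str(exc)
print(error_message)

ages = pd.Series([5, 20, 40])  # made-up sample ages
# .apply needs something it can call once per value, e.g. a lambda.
under_17 = ages.apply(lambda age: age < 17)
print(under_17.tolist())
```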
If your goal is to assign a new column that represents the age group that each row is in, you can just filter with your result and assign them:
titset = pd.DataFrame({'Age': range(1, 81)})
a = range(0,17)
b = range(17,34)
c = range(34, 51)
d = range(51, 68)
e = range(68,81)
a1 = titset.query('Age >= 0 & Age < 17')
a2 = titset.query('Age >= 17 & Age < 34')
a3 = titset.query('Age >= 34 & Age < 51')
a4 = titset.query('Age >= 51 & Age < 68')
a5 = titset.query('Age >= 68 & Age < 81')
titset.loc[a1.index, 'Age_bin'] = 0
titset.loc[a2.index, 'Age_bin'] = 1
titset.loc[a3.index, 'Age_bin'] = 2
titset.loc[a4.index, 'Age_bin'] = 3
titset.loc[a5.index, 'Age_bin'] = 4
Or better yet, use a for loop:
age_groups = [0, 17, 34, 51, 68, 81]
for i in range(len(age_groups) - 1):
    subset = titset.query(f'Age >= {age_groups[i]} & Age < {age_groups[i+1]}')
    titset.loc[subset.index, 'Age_bin'] = i
I have a dataframe like below.
import pandas as pd
import numpy as np
raw_data = {'student':['A','B','C','D','E'],
'score': [100, 96, 80, 105,156],
'height': [7, 4,9,5,3],
'trigger1' : [84,95,15,78,16],
'trigger2' : [99,110,30,93,31],
'trigger3' : [114,125,45,108,46]}
df2 = pd.DataFrame(raw_data, columns = ['student','score', 'height','trigger1','trigger2','trigger3'])
print(df2)
I need to derive a Flag column based on multiple conditions.
I need to compare the score and height columns with the trigger1 to trigger3 columns.
Flag column:
if score is greater than or equal to trigger1 and height is less than 8, then Red
if score is greater than or equal to trigger2 and height is less than 8, then Yellow
if score is greater than or equal to trigger3 and height is less than 8, then Orange
if height is greater than 8, then leave it blank
How do I write if-else conditions for a pandas DataFrame and derive a column from them?
Expected Output
student score height trigger1 trigger2 trigger3 Flag
0 A 100 7 84 99 114 Yellow
1 B 96 4 95 110 125 Red
2 C 80 9 15 30 45 NaN
3 D 105 5 78 93 108 Yellow
4 E 156 3 16 31 46 Orange
For the other column, Text1, from my original question, I tried the function below, but the integer columns are not converted to strings by astype(str) during concatenation. Is there another approach?
def text_df(df):
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger1'].astype(str) + " and less than height 5"
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger2'].astype(str) + " and less than height 5"
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger3'].astype(str) + " and less than height 5"
    elif (df['height'] > 8):
        return np.nan
You need chained comparisons with an upper and a lower bound:
def flag_df(df):
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan
df2['Flag'] = df2.apply(flag_df, axis = 1)
student score height trigger1 trigger2 trigger3 Flag
0 A 100 7 84 99 114 Yellow
1 B 96 4 95 110 125 Red
2 C 80 9 15 30 45 NaN
3 D 105 5 78 93 108 Yellow
4 E 156 3 16 31 46 Orange
Note: You can do this with a very nested np.where but I prefer to apply a function for multiple if-else
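For comparison, the nested np.where variant mentioned in the note could look roughly like this. It is a sketch, not the answer's own code: the helper name low is made up, the triggers are tested from highest to lowest so that no upper bounds are needed, and an empty string stands in for the blank case (an assumption that keeps every branch a string).

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "score": [100, 96, 80, 105, 156],
    "height": [7, 4, 9, 5, 3],
    "trigger1": [84, 95, 15, 78, 16],
    "trigger2": [99, 110, 30, 93, 31],
    "trigger3": [114, 125, 45, 108, 46],
})

low = df2["height"] < 8  # shared height condition

# Each np.where supplies the "else" branch of the one before it.
df2["Flag"] = np.where(
    low & (df2["score"] >= df2["trigger3"]), "Orange",
    np.where(
        low & (df2["score"] >= df2["trigger2"]), "Yellow",
        np.where(low & (df2["score"] >= df2["trigger1"]), "Red", ""),
    ),
)
print(df2[["student", "Flag"]])
```

Every extra condition adds one more nesting level, which is why a plain function is often easier to read once there are several branches.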
Edit: answering @Cecilia's questions
What if the returned object is not a string but a calculation? For example, for the first condition, we want to return df['height']*2.
Not sure what you tried, but you can return a derived value instead of a string:
def flag_df(df):
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
What if there are 'NaN' values in some columns and I want to use df['xxx'] is None as a condition? The code does not seem to work.
Again, not sure what code you tried, but pandas isnull would do the trick:
def flag_df(df):
    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
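The reason an is None test misses these values is that NaN is a float, not the None object. A short illustration, with variable names chosen only for this example:

```python
import numpy as np
import pandas as pd

value = np.nan

is_none = value is None            # identity check: NaN is not the None object
is_missing = pd.isnull(value)      # pandas' missing-value check
none_is_missing = pd.isnull(None)  # pd.isnull treats None as missing too

print(is_none, is_missing, none_is_missing)
```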
Here is a way to do this with numpy.select(), which keeps the code neat, scalable, and faster:
conditions = [
    (df2['trigger1'] <= df2['score']) & (df2['score'] < df2['trigger2']) & (df2['height'] < 8),
    (df2['trigger2'] <= df2['score']) & (df2['score'] < df2['trigger3']) & (df2['height'] < 8),
    (df2['trigger3'] <= df2['score']) & (df2['height'] < 8),
    (df2['height'] > 8)
]
choices = ['Red','Yellow','Orange', np.nan]
df2['Flag1'] = np.select(conditions, choices, default=np.nan)
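Depending on the NumPy version, mixing string choices with np.nan may either turn the missing value into the text 'nan' or raise a TypeError. One way around that, sketched here under the assumption that an empty-string placeholder is acceptable as an intermediate value, is to keep all choices as strings and convert the placeholder to NaN afterwards. The names short and flag are made up for this example.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "score": [100, 96, 80, 105, 156],
    "height": [7, 4, 9, 5, 3],
    "trigger1": [84, 95, 15, 78, 16],
    "trigger2": [99, 110, 30, 93, 31],
    "trigger3": [114, 125, 45, 108, 46],
})

short = df2["height"] < 8
conditions = [
    short & (df2["trigger1"] <= df2["score"]) & (df2["score"] < df2["trigger2"]),
    short & (df2["trigger2"] <= df2["score"]) & (df2["score"] < df2["trigger3"]),
    short & (df2["trigger3"] <= df2["score"]),
]
choices = ["Red", "Yellow", "Orange"]

# All choices and the default are strings, so they share one dtype.
flag = pd.Series(np.select(conditions, choices, default=""), index=df2.index)

# Rows that matched no condition keep the placeholder; turn it into NaN.
df2["Flag"] = flag.where(flag != "")
print(df2[["student", "Flag"]])
```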
You can also use apply with a custom function on axis 1, like this:
def color_selector(x):
    if (x['trigger1'] <= x['score'] < x['trigger2']) and (x['height'] < 8):
        return 'Red'
    elif (x['trigger2'] <= x['score'] < x['trigger3']) and (x['height'] < 8):
        return 'Yellow'
    elif (x['trigger3'] <= x['score']) and (x['height'] < 8):
        return 'Orange'
    elif (x['height'] > 8):
        return ''
df2 = df2.assign(flag=df2.apply(color_selector, axis=1))
You will get something like this:
I'm using numpy.random.rand(1) to generate a number between 0 and 1, and the values of a and b change depending on what the number is. How can I make the probabilities of x > .5 and x < .5 proportional to a and b? So if a is 75 and b is 15, then x > .5 should be 5 times as likely as x < .5. I'm unsure what code to use to assign probabilities. This is what I have:
a = 100
b = 100
while True:
    x = numpy.random.rand(1)
    print x
    if x < .5:
        a = a + 10
        b = b - 10
        print a
        print b
    if x > .5:
        b = b + 10
        a = a - 10
        print a
        print b
    if b1 == 0:
        print a
        break
    if b2 == 0:
        print b
        break
I'd make two calls to random: One to calculate a random number between 0 and 0.5 and a second to determine if the number should be above 0.5.
For example:
a = 100
b = 100
x = numpy.random.rand(1)/2.0   # x is in [0, 0.5)
proportion = a/(1.0*b+a)       # desired probability of x > .5
if numpy.random.rand(1) < proportion:
    x += 0.5
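The two-call idea can be checked empirically. The sketch below swaps in the standard-library random module instead of numpy and assumes the goal stated in the question: x > .5 should occur with probability a / (a + b). The function name draw, the seed, and the sample size are arbitrary choices for illustration.

```python
import random

def draw(a, b, rng):
    """Return x in [0, 1) that lands in the upper half with probability a / (a + b)."""
    x = rng.random() / 2.0           # first call: uniform in [0, 0.5)
    if rng.random() < a / (a + b):   # second call: decide whether to shift up
        x += 0.5                     # now uniform in [0.5, 1.0)
    return x

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [draw(75, 15, rng) for _ in range(100_000)]
share_high = sum(s >= 0.5 for s in samples) / len(samples)
print(share_high)  # should be close to 75 / 90
```

With a = 75 and b = 15 the upper half should receive about five sixths of the samples, which is the 5-to-1 ratio the question asks for.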
what a fitting name.
ratio = a/float(b)
x = numpy.random.uniform(0, 0.5 + ratio*0.5)
Now you have a number uniformly distributed between 0 and 0.5 + 0.5*ratio. With a uniform distribution, the ratio between the population of numbers greater than 0.5 and the population lower than 0.5 is the desired ratio.
Now we just need to map the values above 0.5 back into the range between 0.5 and 1.0.
if x >= 0.5:
    x = x - math.trunc(x)  # keep only the fractional part
    if x < 0.5:
        x += 0.5
In this case I would have numpy generate a number between 1 and 100 and then assign based on that, i.e. (pseudocode):
if rand(100) >= 50: a=0
else: a=1
Then all you have to do is change the 50 to whatever percentage you want. I can elaborate further if that is confusing.
def get_rand_value(a, b):
    # rand(100) is pseudocode for a random number between 1 and 100
    if rand(100) >= a:
        return 0
    else:
        return 1
a = 100
b = 100
while True:
    x = get_rand_value(a, b)
    print x
    if x < .5:
        a = a + 10
        b = b - 10
        print a
        print b
    if x > .5:
        b = b + 10
        a = a - 10
        print a
        print b
    if b1 == 0:
        print a
        break
    if b2 == 0:
        print b
        break
When I was testing a counter, I discovered that it only seems to display the last item to go through it. For example, if something was excellent, it showed up as counted so it would be "1". However regardless of other data, the rest would be 0.
def mealrating(score, review):
    for x in range(0,len(score)):
        mp = 0
        mg = 0
        me = 0
        if score[x] >= 1 and score[x] <= 3:
            review.append("poor")
            mp = mp + 1
        if score[x] >= 4 and score[x] <= 6:
            review.append("good")
            mg = mg + 1
        if score[x] >= 7 and score[x] <= 10:
            review.append("excellent")
            me = me + 1
    print("The customer rated tonight's meal as:")
    print('Poor:' + str(mp))
    print('Good:' + str(mg))
    print('Excellent:' + str(me))
    print("\n")
You are resetting mp, mg, and me in each iteration.
def mealrating(score, review):
    mp = 0
    mg = 0
    me = 0
    for x in range(0,len(score)):
        if score[x] >= 1 and score[x] <= 3:
            review.append("poor")
            mp = mp + 1
        if score[x] >= 4 and score[x] <= 6:
            review.append("good")
            mg = mg + 1
        if score[x] >= 7 and score[x] <= 10:
            review.append("excellent")
            me = me + 1
    print("The customer rated tonight's meal as:")
    print('Poor:' + str(mp))
    print('Good:' + str(mg))
    print('Excellent:' + str(me))
    print("\n")
You must initialize the counters outside the loop:
mp = 0
mg = 0
me = 0
for x in range(0, len(score)):
    # same as before
Otherwise they'll get reset at each iteration! To make your code more Pythonic, take the following tips into consideration:
A condition of the form x >= i and x <= j can be written more concisely as i <= x <= j
The idiomatic way to traverse a list is using iterators, without explicitly using indexes
The conditions are mutually exclusive, so you should use elif
Use += for incrementing a variable
This is what I mean:
mp = mg = me = 0
for s in score:
    if 1 <= s <= 3:
        review.append("poor")
        mp += 1
    elif 4 <= s <= 6:
        # and so on
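Putting the tips together, a complete version might look like the sketch below. The function name rate_meals, the dictionary of counts, and the choice to skip scores outside 1 to 10 are assumptions made for this example, not part of the original answers.

```python
def rate_meals(scores):
    """Return the review labels and the per-label counts for a list of scores."""
    reviews = []
    counts = {"poor": 0, "good": 0, "excellent": 0}  # initialised once, before the loop
    for s in scores:
        if 1 <= s <= 3:
            label = "poor"
        elif 4 <= s <= 6:
            label = "good"
        elif 7 <= s <= 10:
            label = "excellent"
        else:
            continue  # assumption: scores outside 1..10 are ignored
        reviews.append(label)
        counts[label] += 1
    return reviews, counts

reviews, counts = rate_meals([2, 5, 9, 10, 1])
print("The customer rated tonight's meal as:")
for label, n in counts.items():
    print(label.capitalize() + ":" + str(n))
```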
What is the value of y after the following statements?
x = 100
y = 0
while x > 50:
    y = y + 1
    x = x - 1
I'm having trouble with questions that involve 2 variables.
Step through the first few iterations of the loop, look for a pattern, and extrapolate.
x = 100          # x = 100
y = 0            # x = 100 y = 0
if x > 50:       # x = 100 y = 0
    y = y + 1    # x = 100 y = 1
    x = x - 1    # x = 99 y = 1
if x > 50:       # x = 99 y = 1
    y = y + 1    # x = 99 y = 2
    x = x - 1    # x = 98 y = 2
if x > 50:       # x = 98 y = 2
    y = y + 1    # x = 98 y = 3
    x = x - 1    # x = 97 y = 3
if x > 50:       # x = 97 y = 3
    y = y + 1    # x = 97 y = 4
    x = x - 1    # x = 96 y = 4
if x > 50:       # x = 96 y = 4
    y = y + 1    # x = 96 y = 5
    x = x - 1    # x = 95 y = 5
if x > 50:       # x = 95 y = 5
    y = y + 1    # x = 95 y = 6
    x = x - 1    # x = 94 y = 6
...
if x > 50:       # x = 52 y = 48
    y = y + 1    # x = 52 y = 49
    x = x - 1    # x = 51 y = 49
if x > 50:       # x = 51 y = 49
    y = y + 1    # x = 51 y = 50
    x = x - 1    # x = 50 y = 50
if x > 50:       # x = 50 y = 50
(false, end process)
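The pattern in the trace is that each pass adds 1 to y and removes 1 from x, so x + y stays at 100 the whole time. A short sketch that runs the loop and checks that observation (the name total is introduced only for this check):

```python
x = 100
y = 0
total = x + y  # stays constant: every pass moves 1 from x to y

while x > 50:
    y = y + 1
    x = x - 1
    assert x + y == total  # the invariant holds after each pass

print(x, y)  # the loop stops as soon as x reaches 50
```

Since the loop ends when x is 50 and x + y is always 100, y must be 50.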