List comprehension with an exit once value found [duplicate] - python

This question already has answers here:
How can I simplify repetitive if-elif statements in my grading system function?
I am learning list comprehensions and want to replace a lengthy if statement with an elegant list comprehension. The if statement below is what I want to convert, and the list comprehension after it is my attempt so far. The comprehension doesn't do what I want yet, but at least you can see where I am trying to go with it.
I would like the list comprehension to give back only one value, as the if statement does.
Thank you in advance.
weight_kg = 8
if weight_kg <= 0.25:
    price_weight = 2.18
elif weight_kg <= 0.5:
    price_weight = 2.32
elif weight_kg <= 1:
    price_weight = 2.49
elif weight_kg <= 1.5:
    price_weight = 2.65
elif weight_kg <= 2:
    price_weight = 2.90
elif weight_kg <= 3:
    price_weight = 4.14
elif weight_kg <= 4:
    price_weight = 4.53
elif weight_kg <= 5:
    price_weight = 4.62
elif weight_kg <= 6:
    price_weight = 5.28
elif weight_kg <= 7:
    price_weight = 5.28
elif weight_kg <= 8:
    price_weight = 5.42
elif weight_kg <= 9:
    price_weight = 5.42
elif weight_kg <= 10:
    price_weight = 5.42
elif weight_kg <= 11:
    price_weight = 5.43
else:
    price_weight = 5.63
print(price_weight)
shipping_price = [{"weight": 0.25, "price": 2.18}, {"weight": 0.5, "price": 2.32}, {"weight": 1, "price": 2.49}]
toy_weight = 0.6
price = [ship_price["weight"] for ship_price in shipping_price if ship_price["weight"] <= toy_weight]
print(price)

Since you only want the first value from the generator expression, you don't want a list at all. Just use next to pull the first value:
>>> shipping_price = [
... {"weight": 0.25, "price" : 2.18},
... {"weight": 0.5, "price" : 2.32},
... {"weight": 1, "price" : 2.49}
... ]
>>> toy_weight = 0.6
>>> next(sp["price"] for sp in shipping_price if sp["weight"] >= toy_weight)
2.49
I'd use tuples for this rather than dictionaries. Possibly NamedTuples if you have a lot of fields and want to give them names, but for two values I'd just use a plain old tuple like this:
>>> weights_and_prices = [(0.25, 2.18), (0.5, 2.32), (1, 2.49)]
>>> toy_weight = 0.6
>>> next(wp[1] for wp in weights_and_prices if wp[0] >= toy_weight)
2.49
Expand the weights_and_prices tuple list as needed (i.e. with all the remaining weight/price values from your original if/elif chain).
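One thing to watch for: next raises StopIteration when nothing matches, which would happen for any weight above the last limit in the list. Here is a minimal sketch of the full table, assuming the final else branch of the original chain (5.63) is the intended fallback. The names WEIGHTS_AND_PRICES, FALLBACK_PRICE and price_for are made up for illustration.

```python
# (upper weight limit in kg, price) pairs copied from the if/elif chain
WEIGHTS_AND_PRICES = [
    (0.25, 2.18), (0.5, 2.32), (1, 2.49), (1.5, 2.65), (2, 2.90),
    (3, 4.14), (4, 4.53), (5, 4.62), (6, 5.28), (7, 5.28),
    (8, 5.42), (9, 5.42), (10, 5.42), (11, 5.43),
]
FALLBACK_PRICE = 5.63  # value from the final else branch


def price_for(weight_kg):
    """Return the price of the first band whose limit covers weight_kg."""
    return next(
        (price for limit, price in WEIGHTS_AND_PRICES if weight_kg <= limit),
        FALLBACK_PRICE,  # default used when no band matches
    )


print(price_for(8))
print(price_for(0.6))
print(price_for(12))
```

The second argument to next is the default, so weights above 11 kg fall through to the fallback instead of raising.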

Related

Attribute change with variable number of time steps

I would like to simulate individual changes in growth and mortality for a variable number of days. My dataframe is formatted as follows...
import pandas as pd
data = {'unique_id': ['2', '4', '5', '13'],
'length': ['27.7', '30.2', '25.4', '29.1'],
'no_fish': ['3195', '1894', '8', '2774'],
'days_left': ['253', '253', '254', '256'],
'growth': ['0.3898', '0.3414', '0.4080', '0.3839']
}
df = pd.DataFrame(data)
print(df)
unique_id length no_fish days_left growth
0 2 27.7 3195 253 0.3898
1 4 30.2 1894 253 0.3414
2 5 25.4 8 254 0.4080
3 13 29.1 2774 256 0.3839
Ideally, I would like the initial length (i.e., length) to increase by the daily growth rate (i.e., growth) for each of the days remaining in the year (i.e., days_left).
df['final'] = df['length'] + (df['days_left'] * df['growth'])
However, I would also like to update the number of fish that each individual represents (i.e., no_fish) on a daily basis using a size-specific equation. I'm fairly new to python so I initially thought to use a for-loop (I'm not sure if there is another, more efficient way). My code is as follows:
# keep track of run time - START
start_time = time.perf_counter()
df['z'] = 0.0
for indx in range(len(df)):
    count = 1
    while count <= int(df.days_to_forecast[indx]):
        # (1) update individual length
        df.lgth[indx] = df.lgth[indx] + df.linearGR[indx]
        # (2) estimate daily size-specific mortality
        if df.lgth[indx] > 50.0:
            df.z[indx] = 0.01
        else:
            if df.lgth[indx] <= 50.0:
                df.z[indx] = 0.052857-((0.03/35)*df.lgth[indx])
            elif df.lgth[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.lgth[indx])
        df['no_fish'].round(decimals = 0)
        if df.no_fish[indx] < 1.0:
            df.no_fish[indx] = 0.0
        elif df.no_fish[indx] >= 1.0:
            df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))
        # (3) reduce no. of days left in forecast by 1
        count = count + 1
# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
The above code now works correctly, but it is still far too inefficient to run for 40,000 individuals, each for 200+ days.
I would really appreciate any advice on how to modify the above code to make it more Pythonic.
Thanks
Another option that was suggested to me is to use the pd.DataFrame.apply function. This dramatically reduced the overall run time and could be useful to someone else in the future.
### === RUN SIMULATION === ###
start_time = time.perf_counter() # keep track of run time -- START
#-------------------------------------------------------------------------#
def function_to_apply( df ):
    df['z_instantMort'] = ''
    for indx in range(int(df['days_left'])):
        # (1) update individual length
        df['length'] = df['length'] + df['growth']
        # (2) estimate daily size-specific mortality
        if df['length'] > 50.0:
            df['z_instantMort'] = 0.01
        else:
            if df['length'] <= 50.0:
                df['z_instantMort'] = 0.052857-((0.03/35)*df['length'])
            elif df['length'] < 15.0:
                df['z_instantMort'] = 0.728*np.exp(-0.1892*df['length'])
        whole_fish = round(df['no_fish'], 0)
        if whole_fish < 1.0:
            df['no_fish'] = 0.0
        elif whole_fish >= 1.0:
            df['no_fish'] = df['no_fish']*np.exp(-(df['z_instantMort']))
    return df
#-------------------------------------------------------------------------#
sim_results = df.apply(function_to_apply, axis=1)
total_elapsed_time = round(time.perf_counter() - start_time, 2) # END
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
print(sim_results)
### ====================== ###
output being...
Forecast iteration completed in 0.05 seconds
unique_id length no_fish days_left growth z_instantMort
0 2.0 126.3194 148.729190 253.0 0.3898 0.01
1 4.0 116.5742 93.018465 253.0 0.3414 0.01
2 5.0 129.0320 0.000000 254.0 0.4080 0.01
3 13.0 127.3784 132.864757 256.0 0.3839 0.01
As I said in my comment, a preferable alternative to for loops in this setting is using vector operations. For instance, running your code:
import pandas as pd
import time
import math
import numpy as np
data = {'unique_id': [2, 4, 5, 13],
        'length': [27.7, 30.2, 25.4, 29.1],
        'no_fish': [3195, 1894, 8, 2774],
        'days_left': [253, 253, 254, 256],
        'growth': [0.3898, 0.3414, 0.4080, 0.3839]
        }
df = pd.DataFrame(data)
print(df)
# keep track of run time - START
start_time = time.perf_counter()
df['z'] = 0.0
for indx in range(len(df)):
    count = 1
    while count <= int(df.days_left[indx]):
        # (1) update individual length
        df.length[indx] = df.length[indx] + df.growth[indx]
        # (2) estimate daily size-specific mortality
        if df.length[indx] > 50.0:
            df.z[indx] = 0.01
        else:
            if df.length[indx] <= 50.0:
                df.z[indx] = 0.052857-((0.03/35)*df.length[indx])
            elif df.length[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.length[indx])
        df['no_fish'].round(decimals = 0)
        if df.no_fish[indx] < 1.0:
            df.no_fish[indx] = 0.0
        elif df.no_fish[indx] >= 1.0:
            df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))
        # (3) reduce no. of days left in forecast by 1
        count = count + 1
# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
print(df)
with output:
unique_id length no_fish days_left growth
0 2 27.7 3195 253 0.3898
1 4 30.2 1894 253 0.3414
2 5 25.4 8 254 0.4080
3 13 29.1 2774 256 0.3839
Forecast iteration completed in 31.75 seconds
unique_id length no_fish days_left growth z
0 2 126.3194 148.729190 253 0.3898 0.01
1 4 116.5742 93.018465 253 0.3414 0.01
2 5 129.0320 0.000000 254 0.4080 0.01
3 13 127.3784 132.864757 256 0.3839 0.01
Now with vector operations, you could do something like:
# keep track of run time - START
start_time = time.perf_counter()
df['z'] = 0.0
for day in range(1, df.days_left.max() + 1):
    update = day <= df['days_left']
    # (1) update individual length
    df[update]['length'] = df[update]['length'] + df[update]['growth']
    # (2) estimate daily size-specific mortality
    df[update]['z'] = np.where( df[update]['length'] > 50.0, 0.01, 0.052857-( ( 0.03 / 35)*df[update]['length'] ) )
    df[update]['z'] = np.where( df[update]['length'] < 15.0, 0.728 * np.exp(-0.1892*df[update]['length'] ), df[update]['z'] )
    df[update]['no_fish'].round(decimals = 0)
    df[update]['no_fish'] = np.where(df[update]['no_fish'] < 1.0, 0.0, df[update]['no_fish'] * np.exp(-(df[update]['z'])))
# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
print(df)
with output
Forecast iteration completed in 1.32 seconds
unique_id length no_fish days_left growth z
0 2 126.3194 148.729190 253 0.3898 0.0
1 4 116.5742 93.018465 253 0.3414 0.0
2 5 129.0320 0.000000 254 0.4080 0.0
3 13 127.3784 132.864757 256 0.3839 0.0
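A caveat on the vectorised snippet: an assignment of the form df[update]['length'] = ... is chained indexing, so pandas may write to a temporary copy rather than to df itself. The z column still reading 0.0 in the output above is consistent with that, since the frame had already been advanced by the earlier loop. Below is a minimal sketch of the same day-by-day update using .loc, so the writes land in the frame. The function name simulate and the variable active are made up, and the float cast of no_fish is an assumption added so fractional results can be stored.

```python
import numpy as np
import pandas as pd


def simulate(df):
    """Advance every row one day at a time, touching only rows with days left."""
    out = df.copy()
    out["no_fish"] = out["no_fish"].astype(float)  # results are fractional
    out["z"] = 0.0
    for day in range(1, int(out["days_left"].max()) + 1):
        active = day <= out["days_left"]  # boolean mask of rows still running
        # (1) update individual length
        out.loc[active, "length"] += out.loc[active, "growth"]
        length = out.loc[active, "length"]
        # (2) estimate daily size-specific mortality
        z = np.where(length > 50.0, 0.01, 0.052857 - (0.03 / 35) * length)
        z = np.where(length < 15.0, 0.728 * np.exp(-0.1892 * length), z)
        out.loc[active, "z"] = z
        # (3) update the number of fish each individual represents
        fish = out.loc[active, "no_fish"]
        out.loc[active, "no_fish"] = np.where(fish < 1.0, 0.0, fish * np.exp(-z))
    return out


df = pd.DataFrame({
    "unique_id": [2, 4, 5, 13],
    "length": [27.7, 30.2, 25.4, 29.1],
    "no_fish": [3195, 1894, 8, 2774],
    "days_left": [253, 253, 254, 256],
    "growth": [0.3898, 0.3414, 0.4080, 0.3839],
})
result = simulate(df)
print(result)
```

The outer loop still runs once per day, but each iteration handles all individuals at once, so the cost grows with the number of days rather than with days times rows.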

Filter dataframe with multiple conditions including OR

I wrote a little script that loops through constraints to filter a dataframe. Example and follow up explaining the issue are below.
constraints = [['stand','==','L'],['zone','<','20']]
for x in constraints:
    vari = x[2]
    df = df.query("{0} {1} @vari".format(x[0],x[1]))
zone stand speed type
0 2 L 83.7 CH
1 7 L 95.9 SI
2 14 L 94.9 FS
3 11 L 93.3 FS
4 13 L 86.9 CH
5 7 L 96.4 SI
6 13 L 82.6 SL
I can't figure out a way to filter when there is an OR condition. For example, in the table above I'd like to return a dataframe using the constraints in the code example along with any rows that contain SI or CH in the type column. Does anyone have ideas on how to accomplish this? Any help would be greatly appreciated.
This seems to have gotten the job done but there is probably a much better way of going about it.
for x in constraints:
    vari = x[2]
    if isinstance(vari,list):
        frame = frame[frame[x[0]].isin(vari)]
    else:
        frame = frame.query("{0} {1} @vari".format(x[0],x[1]))
IIUC (see my question in the comment) you can do it like this:
I made a slightly different df to show you the result (I guess the table you show is already filtered):
df = pd.DataFrame(
{'zone': {0: 2, 1: 11, 2: 25, 3: 11, 4: 23, 5: 7, 6: 13},
'stand': {0: 'L', 1: 'L', 2: 'L', 3: 'C', 4: 'L', 5: 'K', 6: 'L'},
'speed': {0: 83.7, 1: 95.9, 2: 94.9, 3: 93.3, 4: 86.9, 5: 96.4, 6: 82.6},
'type': {0: 'CH', 1: 'SI', 2: 'FS', 3: 'FS', 4: 'CH', 5: 'SI', 6: 'SL'}})
print(df)
zone stand speed type
0 2 L 83.7 CH
1 11 L 95.9 SI
2 25 L 94.9 FS
3 11 C 93.3 FS
4 23 L 86.9 CH
5 7 K 96.4 SI
6 13 L 82.6 SL
res = df.loc[ ( (df['type']=='SI') | (df['type']=='CH') ) & ( (df['zone']<20) & (df['stand']=='L') ) ]
print(res)
zone stand speed type
0 2 L 83.7 CH
1 11 L 95.9 SI
Let me know if that is what you are searching for.
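If the constraints need to stay data-driven, as in the question's loop, one option is to build a single boolean mask and treat a list value as "any of these". This is only a sketch: the helper name apply_constraints, the OPS table and the "in" marker are made up for illustration, and the zone limit is written as the number 20 on the assumption that the column is numeric.

```python
import operator

import pandas as pd

# Comparison operators a constraint may name (an illustrative subset)
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def apply_constraints(frame, constraints):
    """Keep rows satisfying every constraint; a list value means 'any of'."""
    mask = pd.Series(True, index=frame.index)
    for column, op, value in constraints:
        if isinstance(value, list):
            mask &= frame[column].isin(value)  # OR across the listed values
        else:
            mask &= OPS[op](frame[column], value)
    return frame[mask]


df = pd.DataFrame({
    "zone": [2, 11, 25, 11, 23, 7, 13],
    "stand": ["L", "L", "L", "C", "L", "K", "L"],
    "speed": [83.7, 95.9, 94.9, 93.3, 86.9, 96.4, 82.6],
    "type": ["CH", "SI", "FS", "FS", "CH", "SI", "SL"],
})
constraints = [["stand", "==", "L"], ["zone", "<", 20], ["type", "in", ["SI", "CH"]]]
result = apply_constraints(df, constraints)
print(result)
```

Looking operators up in a dict avoids building query strings, and isin covers the OR case without a separate code path for each value.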

Can't figure out NameError

I am trying to build a cost model using mostly if statements but I keep getting the error:
NameError: name 'spread_1' is not defined.
I am a beginner with python so I don't know if I've done something incorrect. Please help.
#water depth of project site in metres
water_depth = 100
#platform wells require jackups
pl = 1
#1-2 complexity required LWIV
ss_simple = 1
#3-4 complexity requires rig
ss_complex = 0
#day rates
vjackup = 75000
jackup = 90000
vsemi = 170000
semi = 300000
lwiv = 200000
#determining vessel spread for platform wells
if pl >= 1:
    if water_depth == range(0, 50):
        spread_1 = vjackup * 24.1
    elif water_depth == range(51, 150):
        spread_1 = jackup * 24.1
elif pl == 0:
    spread_1 = 0
Comparing a number to a range object with == is always False, so neither inner branch runs and spread_1 is never assigned, hence the NameError. You should replace the == with in at the
if water_depth == range(0, 50):
and
elif water_depth == range(51, 150):
making your block of code:
if pl >= 1:
    if water_depth == range(0, 50):
        spread_1 = vjackup * 24.1
    elif water_depth == range(51, 150):
        spread_1 = jackup * 24.1
elif pl == 0:
    spread_1 = 0
into
if pl >= 1:
    if water_depth in range(0, 50):
        spread_1 = vjackup * 24.1
    elif water_depth in range(51, 150):
        spread_1 = jackup * 24.1
elif pl == 0:
    spread_1 = 0
But it would be more practical to use chained comparison operators in your case:
if pl >= 1:
    if 0 <= water_depth < 50:
        spread_1 = vjackup * 24.1
    elif 51 <= water_depth < 150:
        spread_1 = jackup * 24.1
elif pl == 0:
    spread_1 = 0
Do note that the range() function omits the end value, so range(0, 50)'s last value would be 49, not 50.
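Even with the comparison fixed, spread_1 stays undefined whenever no branch matches, for example at a depth of exactly 50 or at 150 and beyond. Here is a minimal sketch that makes that case explicit. The function name platform_spread and the constant names are made up, and putting 50 m in the shallow band is an assumption, since the question does not say where that boundary belongs.

```python
VJACKUP_DAY_RATE = 75000
JACKUP_DAY_RATE = 90000
SPREAD_FACTOR = 24.1  # multiplier taken as-is from the question


def platform_spread(pl, water_depth):
    """Return the vessel spread cost for platform wells."""
    if pl == 0:
        return 0
    if pl < 0:
        raise ValueError(f"invalid number of platform wells: {pl}")
    if 0 <= water_depth <= 50:  # assumption: 50 m counts as shallow
        return VJACKUP_DAY_RATE * SPREAD_FACTOR
    if 50 < water_depth < 150:
        return JACKUP_DAY_RATE * SPREAD_FACTOR
    # No band matched: fail loudly instead of leaving a variable unassigned
    raise ValueError(f"no vessel defined for water depth {water_depth}")


spread_1 = platform_spread(pl=1, water_depth=100)
print(spread_1)
```

Returning from a function means every path either yields a value or raises, so a later NameError cannot occur.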

Problem with getting right return from function

Why does my code return the wrong ticket prices? I am supposed to add a time factor as well, but I can't even get this to work. This is what I am supposed to do:
"""
Price of one bus ticket
time 6-17, price 2.7, age 16-64
time 18-22, price 3.5, age 16-64
time 23 and 0-5, price 4, age 16-64
for ages 0-2 ticket is free at all times
time 6-17, price 1.7, ages 3-15 and 65 -->
time 18-22, price 2.5, ages 3-15 and 65 -->
time 23 and 0-5, price 3.0, ages 3-15 and 65 -->
"""
def calculate_ticket_price(age):
    ticket_price = 0
    while True:
        if age >= 0 or age <= 2:
            ticket_price = 1.0
        if age <= 15 or age >= 3 or age >= 65:
            ticket_price = 1.5
        if age > 15 or age < 65:
            ticket_price = 2.7
        return float(ticket_price)
def main():
    age = 5
    price = calculate_ticket_price(age)
    print(price)
if __name__ == '__main__':
    main()
I think it returns the wrong price because you're using or where you need an and.
Your first if statement should be:
if ((age >= 0) and (age <= 2)):
Your second if statement should be:
if (((age <= 15) and (age >= 3)) or (age >= 65)):
Then your third one:
if ((age > 15) and (age < 65)):
def calculate_ticket_price(age, time):
    while True:
        if time >= 6 or time <= 17 and age > 15 or age < 65:
            ticket_price = 2.7
        elif time >= 18 or time <= 22 and age > 15 or age < 65:
            ticket_price = 3.5
        elif time >= 23 or time >= 0 or time <= 5 and age > 15 or age < 65:
            ticket_price = 4.0
        elif time >= 6 or time <= 17 and age <= 15 or age >= 3 or age >= 65:
            ticket_price = 1.7
        elif time >= 18 or time <= 22 and age <= 15 or age >= 3 or age >= 65:
            ticket_price = 2.5
        elif time >= 23 or time >= 0 or time <= 5 and age <= 15 or age >= 3 or age >= 65:
            ticket_price = 3.0
        else:
            ticket_price = 0.0
        return float(ticket_price)
def main():
    age = 5
    time = 12
    price = calculate_ticket_price(age, time)
    print(price)
if __name__ == '__main__':
    main()
Made these edits. Should there be and between every >=, <= etc..?
You're using or when I think you want to be using and. For example, this condition:
if age >= 0 or age <= 2:
is going to be true for any positive number, since the first part will always match.
You also want to be using elif so that only one of these blocks will happen. Your last condition:
if age > 15 or age < 65:
ticket_price = 2.7
is going to happen any time the age is under 65 or over 15 (which is going to be every number), so I'd expect that your function just always returns 2.7.
A simpler way to write this function that follows the simple age-only rules you're trying to implement would be:
def calculate_ticket_price(age: int) -> float:
    if age <= 2:
        return 0.0 # infant price
    elif age <= 15:
        return 1.5 # youth price
    elif age < 65:
        return 2.7 # adult price
    else:
        return 1.5 # senior price
In this very simple example, only the first condition that matches will return a value, so testing both sides of the range isn't necessary.
You can also check for an age to be within a particular range by writing an expression like 2 < age <= 15, or age > 2 and age <= 15, or even age in range(3, 16).
Note that putting everything inside a while loop serves no purpose at all -- avoid having lines of code that don't do anything useful, since they're just one more place for bugs to appear. :)
As far as having the function account for both age and time, I notice that the fare table amounts to giving youth/seniors the same $1 discount regardless of what time it is, so I might simplify it down like this rather than have a different condition for each age/time combination:
def calculate_ticket_price(time: int, age: int) -> float:
    # Infants ride free
    if age <= 2:
        return 0.0
    # Youth and seniors get a $1.00 discount
    discount = 1.0 if age <= 15 or age >= 65 else 0.0
    if 6 <= time <= 17:
        return 2.7 - discount
    if 18 <= time <= 22:
        return 3.5 - discount
    if 0 <= time <= 5 or time == 23:
        return 4.0 - discount
    raise ValueError(f"invalid time {time}!")
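One small wrinkle with subtracting the discount: in binary floating point, 2.7 - 1.0 evaluates to 1.7000000000000002 rather than exactly 1.7. If exact table values matter, a lookup avoids the arithmetic altogether. Here is a minimal sketch, with the fares copied from the table in the question. The names FARES and ticket_price are made up, and the argument order (age, hour) follows the asker's version rather than the function above.

```python
# Each entry: (first_hour, last_hour, adult_price, reduced_price),
# copied from the fare table in the question.
FARES = [
    (6, 17, 2.7, 1.7),
    (18, 22, 3.5, 2.5),
    (23, 23, 4.0, 3.0),
    (0, 5, 4.0, 3.0),
]


def ticket_price(age, hour):
    """Return the price of one ticket for the given age and hour (0-23)."""
    if age <= 2:
        return 0.0  # infants ride free at all times
    reduced = age <= 15 or age >= 65  # youth and seniors pay the reduced fare
    for first_hour, last_hour, adult_price, reduced_price in FARES:
        if first_hour <= hour <= last_hour:
            return reduced_price if reduced else adult_price
    raise ValueError(f"invalid hour {hour}")


print(ticket_price(5, 12))
print(ticket_price(30, 20))
```

Adding a new time band or changing a fare then only means editing the table, not the conditions.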

How to assign months to their numeric equivalents in Python / Pandas?

Currently, I'm using the following for loop based on an if condition for each month to assign months to their numeric equivalents. It seems to be quite efficient in terms of runtime, but is too manual and ugly for my preferences.
How could this be better executed? I imagine it's possible to improve on it by simplifying/condensing the multiple if conditions somehow, or by using some sort of translator that is made for date conversions. Which of these would be preferable?
#make numeric month
combined = combined.sort_values('month')
combined.index = range(len(combined))
combined['month_numeric'] = None
for i in combined['month'].unique():
    first = combined['month'].searchsorted(i, side='left')
    last = combined['month'].searchsorted(i, side='right')
    first_num = list(first)[0] #gives first instance
    last_num = list(last)[0] #gives last instance
    if i == 'January':
        combined['month_numeric'][first_num:last_num] = "01"
    elif i == 'February':
        combined['month_numeric'][first_num:last_num] = "02"
    elif i == 'March':
        combined['month_numeric'][first_num:last_num] = "03"
    elif i == 'April':
        combined['month_numeric'][first_num:last_num] = "04"
    elif i == 'May':
        combined['month_numeric'][first_num:last_num] = "05"
    elif i == 'June':
        combined['month_numeric'][first_num:last_num] = "06"
    elif i == 'July':
        combined['month_numeric'][first_num:last_num] = "07"
    elif i == 'August':
        combined['month_numeric'][first_num:last_num] = "08"
    elif i == 'September':
        combined['month_numeric'][first_num:last_num] = "09"
    elif i == 'October':
        combined['month_numeric'][first_num:last_num] = "10"
    elif i == 'November':
        combined['month_numeric'][first_num:last_num] = "11"
    elif i == 'December':
        combined['month_numeric'][first_num:last_num] = "12"
You can use to_datetime, then month, convert to string and use zfill:
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
Sample:
import pandas as pd
df = pd.DataFrame({ 'month': ['January','February', 'December']})
print (df)
month
0 January
1 February
2 December
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
0 01
1 02
2 12
Name: month, dtype: object
Another solution is map by dict d:
d = {'January':'01','February':'02','December':'12'}
print (df['month'].map(d))
0 01
1 02
2 12
Name: month, dtype: object
Timings:
df = pd.DataFrame({ 'month': ['January','February', 'December']})
print (df)
df = pd.concat([df]*1000).reset_index(drop=True)
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
print (df['month'].map({'January':'01','February':'02','December':'12'}))
In [200]: %timeit (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
100 loops, best of 3: 13.5 ms per loop
In [201]: %timeit (df['month'].map({'January':'01','February':'02','December':'12'}))
1000 loops, best of 3: 462 µs per loop
You can use a map:
month2int = {"January":1, "February":2, ...}
combined["month_numeric"] = combined["month"].map(month2int)
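Rather than typing all twelve entries by hand, the mapping can be generated from the standard library's calendar module. This is a minimal sketch, assuming the default (English) locale and that the zero-padded strings from the question are the desired output. The name month_to_num is made up.

```python
import calendar

# calendar.month_name[0] is an empty string, so the `if name` filter skips it;
# the remaining entries are numbered 1..12 in order.
month_to_num = {
    name: f"{number:02d}"
    for number, name in enumerate(calendar.month_name)
    if name
}

print(month_to_num["January"])
print(month_to_num["December"])
```

The resulting dict can be passed to map in the same way as the hand-written one, for example combined["month"].map(month_to_num).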
