convert nested conditions string to flat string - python

I have a string which contains conditions like this:
"((A < 5 & B < -500) & C < 0.05)"
I want to convert this to "(A < 5) & (B < -500) & (C < 0.05)". I need it in this format because I want to apply the condition to a dataframe.
If I use "((A < 5 & B < -500) & C < 0.05)", I get the following error:
TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

You should check operator precedence.
Your expression is wrong because & binds more tightly than <, so 5 & B is evaluated before A < 5 and B < -500:
"((A < 5 & B < -500) & C < 0.05)"
Your expression should be:
"((A < 5) & (B < -500)) & (C < 0.05)"
The expression above is equivalent to:
"(A < 5) & (B < -500) & (C < 0.05)"
But ((A < 5 & B < -500) & C < 0.05) is different from (A < 5) & (B < -500) & (C < 0.05).
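To check the equivalence on data, here is a minimal sketch. The DataFrame values are made up for illustration; only the column names A, B and C come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "A": [1, 7, 3],
    "B": [-600, -700, -100],
    "C": [0.01, 0.01, 0.20],
})

# Each comparison has its own parentheses, so & combines boolean Series.
flat = (df["A"] < 5) & (df["B"] < -500) & (df["C"] < 0.05)
nested = ((df["A"] < 5) & (df["B"] < -500)) & (df["C"] < 0.05)

# Both forms produce the same mask, so they select the same rows.
filtered = df[flat]
print(filtered)
```

Only the first row satisfies all three comparisons, so both masks keep just that row.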

Related

Apply a function to two columns of dataframe and create a new column

I want to apply this function on the following columns but I could not.
def damage(a,b):
    l=5
    if (a==b+0.3) or ((a>=1.5* b) and (a<=1.9 * b)):
        l=1
    elif (a>=2* b) and (a<=2.9 * b) :
        l=2
    elif (a>=4) or (a>= 3* b):
        l=3
    elif (a==b) or (a<=b+0.3) or (a<= 1.5 *b):
        l=0
    return l
df['1st_day_damage'] =df[df['Cr-1'],df['Cr']].apply(damage)
It's better to use vectorized code. apply is slow.
Coding in the blind since you didn't provide any example:
import numpy as np

a = df['Cr-1']
b = df['Cr']
# Conditions are checked in order; the first match wins, like if/elif
df['1st_day_damage'] = np.select(
    [
        (a == b + 0.3) | ((1.5 * b <= a) & (a <= 1.9 * b)),
        (2 * b <= a) & (a <= 2.9 * b),
        (a >= 4) | (a >= 3 * b),
        (a == b) | (a <= b + 0.3) | (a <= 1.5 * b)
    ],
    [
        1,
        2,
        3,
        0
    ],
    default=5
)
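As a rough check that the vectorized version matches the original function, the sketch below runs both on a small DataFrame. The sample values are hypothetical, chosen so that every branch is hit; the row-wise call with axis=1 is one way to make the original apply attempt work.

```python
import numpy as np
import pandas as pd

def damage(a, b):
    # Scalar logic from the question, with early returns.
    if (a == b + 0.3) or (1.5 * b <= a <= 1.9 * b):
        return 1
    elif 2 * b <= a <= 2.9 * b:
        return 2
    elif (a >= 4) or (a >= 3 * b):
        return 3
    elif (a == b) or (a <= b + 0.3) or (a <= 1.5 * b):
        return 0
    return 5

# Hypothetical sample values, one per branch (including the default).
df = pd.DataFrame({"Cr-1": [1.6, 2.5, 4.5, 1.0, 1.95], "Cr": [1.0] * 5})

a, b = df["Cr-1"], df["Cr"]
vectorized = np.select(
    [
        (a == b + 0.3) | ((1.5 * b <= a) & (a <= 1.9 * b)),
        (2 * b <= a) & (a <= 2.9 * b),
        (a >= 4) | (a >= 3 * b),
        (a == b) | (a <= b + 0.3) | (a <= 1.5 * b),
    ],
    [1, 2, 3, 0],
    default=5,
)

# Row-wise equivalent: pass both columns to the function with axis=1.
row_wise = df.apply(lambda row: damage(row["Cr-1"], row["Cr"]), axis=1)

print(vectorized.tolist())
print(row_wise.tolist())
```

Both approaches give the same labels; np.select avoids the Python-level loop that apply performs.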

pandas change values on multiple column based on condition

I have a data frame like
             x            y           w          h
0  1593.826218  1293.189452  353.268389  74.493565
1  1680.089430  1956.536916   87.632469  42.567752
2  1362.421731  1908.648195   52.031778  42.567752
3  1599.303248  1385.419580  351.899131  78.040878
4  1500.716721  1121.144789  397.084623  46.115064
5  1513.040037  1186.770072  514.840753  86.909160
6  1387.068363  1804.002472  212.234885  44.341408
7   787.333657   379.756446  416.254225  70.946253
I want to select rows where x and y fall within certain ranges, then add to or subtract from all four values (x, y, w, h) in those rows and write the results back into the same rows.
I am doing something like
df.loc[(df['x'] >= 1000) & (df['x'] < 1800) & (df['y'] >= 1150) & (df['y'] < 1290), ['x', 'y', 'w','h']] = df['x'] - 20, df['y'] - 165, df['w'] + 26, df['h'] - 29
and getting error:
"Must have equal len keys and value when setting with an ndarray"
when I tried this
df.loc[(df['x'] >= 1000) & (df['x'] < 1800) & (df['y'] >= 1150) & (df['y'] < 1290), 'x'] = df['x'] - 20
it works but I want to perform operation on all four columns in one go and update the values.
The desired result is that only row 5 is selected, and after the update it should look like this:
             x            y           w          h
5  1493.040037  1021.770072  540.840753  57.909160
Any help will be much appreciated.
Let us fix your code
m = (df['x'] >= 1000) & (df['x'] < 1800) \
& (df['y'] >= 1150) & (df['y'] < 1290)
df.loc[m] += [-20, -165, 26, -29]
             x            y           w          h
0  1593.826218  1293.189452  353.268389  74.493565
1  1680.089430  1956.536916   87.632469  42.567752
2  1362.421731  1908.648195   52.031778  42.567752
3  1599.303248  1385.419580  351.899131  78.040878
4  1500.716721  1121.144789  397.084623  46.115064
5  1493.040037  1021.770072  540.840753  57.909160  *** updated
6  1387.068363  1804.002472  212.234885  44.341408
7   787.333657   379.756446  416.254225  70.946253
With your approach, you can use pd.concat on the right-hand side:
df.loc[(df['x'] >= 1000) & (df['x'] < 1800) & (df['y'] >= 1150) & (df['y'] < 1290), ['x', 'y', 'w', 'h']] = pd.concat((df['x'] - 20, df['y'] - 165, df['w'] + 26, df['h'] - 29), axis=1)
             x            y           w          h
0  1593.826218  1293.189452  353.268389  74.493565
1  1680.089430  1956.536916   87.632469  42.567752
2  1362.421731  1908.648195   52.031778  42.567752
3  1599.303248  1385.419580  351.899131  78.040878
4  1500.716721  1121.144789  397.084623  46.115064
5  1493.040037  1021.770072  540.840753  57.909160
6  1387.068363  1804.002472  212.234885  44.341408
7   787.333657   379.756446  416.254225  70.946253
You have to assign a value whose shape matches the selection (pandas aligns a DataFrame on index and columns). The easiest way is to build it from the original df:
m = (df['x'] >= 1000) & (df['x'] < 1800) & (df['y'] >= 1150) & (df['y'] < 1290)
df.loc[m] = df.assign(x=df["x"]-20, y=df["y"]-165, w=df['w']+26, h=df['h']-29)
print (df[m])
             x            y           w         h
5  1493.040037  1021.770072  540.840753  57.90916
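The answers above share one pattern: build a boolean mask once, then update all four columns through it. Below is a minimal sketch of that pattern on three rows taken from the question. The offsets Series is a name introduced here for illustration; it relies on pandas aligning a Series with the DataFrame's columns during addition.

```python
import pandas as pd

# Three rows copied from the question's data (original index labels kept).
df = pd.DataFrame(
    {
        "x": [1500.716721, 1513.040037, 1387.068363],
        "y": [1121.144789, 1186.770072, 1804.002472],
        "w": [397.084623, 514.840753, 212.234885],
        "h": [46.115064, 86.909160, 44.341408],
    },
    index=[4, 5, 6],
)

# Boolean mask for the rows to update.
m = (df["x"] >= 1000) & (df["x"] < 1800) & (df["y"] >= 1150) & (df["y"] < 1290)

# Per-column offsets (illustrative helper); the index must match the column names.
offsets = pd.Series({"x": -20, "y": -165, "w": 26, "h": -29})
cols = list(offsets.index)

# DataFrame + Series aligns the Series index with the columns.
df.loc[m, cols] = df.loc[m, cols] + offsets

print(df)
```

Naming the columns in the offsets keeps the update correct even if the DataFrame's column order changes, which a bare list of numbers would not.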

How do I correct use of If Statement using Python

I have some values like these and I am using an if statement in Python:
a = 11
b = 36
c = 70
if (a > 5 and a < 15) and (b > 25 and b < 40) and (c < 100):
    #do something
and when the values are negative:
a = -11
b = -36
c = -70
if (a < -5 and a < -15) and (b < -25 and b < -40) and (c > -100):
    #do something
but the if statement is not doing anything, and there is no error.
Your if statement does nothing because its condition evaluates to False. The comparison operators (< and >) require a to be less than -5 (True when a = -11) and also less than -15 (False when a = -11), and b to be less than -25 (True when b = -36) and also less than -40 (False when b = -36).
If I evaluate your code step by step, it looks like this:
a = -11
b = -36
c = -70
if (a < -5 and a < -15) and (b < -25 and b < -40) and (c > -100):
    # The first parenthesized comparison: (a < -5 and a < -15) evaluates to (True and False)
    # The second parenthesized comparison: (b < -25 and b < -40) evaluates to (True and False)
    # The last parenthesized comparison: (c > -100) evaluates to (True)
    # if (True and False) and (True and False) and (True)
    # if False and False and True
    # if False
    #do something
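One possible fix, assuming the intent was to mirror the positive ranges around zero (a between -15 and -5, b between -40 and -25, c above -100), is to flip the second comparison in each pair. The sketch below uses Python's chained comparisons; the function names are made up for illustration.

```python
def in_positive_ranges(a, b, c):
    # Same check as the question's first if statement, using chained comparisons.
    return (5 < a < 15) and (25 < b < 40) and (c < 100)

def in_negative_ranges(a, b, c):
    # Assumed intent: the same ranges mirrored around zero.
    return (-15 < a < -5) and (-40 < b < -25) and (c > -100)

def original_negative_check(a, b, c):
    # The condition exactly as written in the question.
    return (a < -5 and a < -15) and (b < -25 and b < -40) and (c > -100)

print(in_positive_ranges(11, 36, 70))          # True
print(in_negative_ranges(-11, -36, -70))       # True
print(original_negative_check(-11, -36, -70))  # False
```

Writing each range as lower < value < upper makes it easier to see that both bounds point the right way.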

What will be the mean of a conditional output

Let's take a condition such as:
(df['a'] > 10) & (df['a'] < 20)
This condition gives a True/False output for each row.
What will the mean of this conditional output be?
i.e. np.mean((df['a'] > 10) & (df['a'] < 20)) = ?
The mean of the condition itself is not the mean of the values that are > 10 and < 20.
To get the mean of those values, you have to index with the condition in square brackets:
np.mean(df[(df['a'] > 10) & (df['a'] < 20)])
True and False behave like 1 and 0, so the mean is the fraction of rows that match both conditions:
df = pd.DataFrame({'a':[9,13,23,16,23]})
m = (df['a'] > 10) & (df['a'] < 20)
print (m)
0    False
1     True
2    False
3     True
4    False
Name: a, dtype: bool
There are 2 matching values out of 5, so the fraction is 2/5 = 0.4:
print (m.mean())
0.4
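The two quantities discussed in this thread can be compared side by side. This is a small sketch reusing the sample data above; the variable names fraction_matching and mean_of_matching are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [9, 13, 23, 16, 23]})
m = (df["a"] > 10) & (df["a"] < 20)

# Mean of the boolean mask: share of rows where the condition holds.
fraction_matching = m.mean()

# Mean of the values selected by the mask (here 13 and 16).
mean_of_matching = df.loc[m, "a"].mean()

print(fraction_matching)  # 0.4
print(mean_of_matching)   # 14.5
```

The first number describes how often the condition is true; the second describes the rows it selects.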

Error when masking 2d numpy array

I'm not sure what the correct terminology is here, but I'm trying to mask out some values in a numpy array using multiple conditions from several arrays. For example, I want to find and mask out the areas in X where the arrays t, l, lat2d, x, and m meet certain criteria. All the arrays have the same shape: (250,500). I tried this:
cs[t < 274.0 |
   l > 800.0 |
   lat2d > 60 |
   lat2d < -60 |
   (x > 0 & m > 0.8) |
   (x < -25 & m < 0.2)] = np.nan
This gave the error:
ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''.
I replaced the &,| with and/or and got the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I've also tried creating a mask, mask = t < 274.0 | l > 800.0 | lat2d > 60 | lat2d < -60 | (x > 0 & m > 0.8) | (x < -25 & m < 0.2), to use in a masked array, but got the same error.
Any idea how to do this in Python 3?
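This is just a matter of operator precedence. | and & bind more tightly than the comparison operators, so each comparison needs its own parentheses:
cs[(t < 274.0) |
   (l > 800.0) |
   (lat2d > 60) |
   (lat2d < -60) |
   ((x > 0) & (m > 0.8)) |
   ((x < -25) & (m < 0.2))] = np.nan
This should work.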
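Here is a minimal sketch of the parenthesized mask on tiny 2x2 arrays. The array values are invented for illustration; only the names and thresholds come from the question. It also shows np.ma.masked_where, in case a masked array is preferred over writing NaN.

```python
import numpy as np

# Hypothetical 2x2 inputs; the real arrays in the question are (250, 500).
t = np.array([[270.0, 280.0], [280.0, 280.0]])
l = np.array([[100.0, 900.0], [100.0, 100.0]])
lat2d = np.array([[0.0, 0.0], [70.0, 0.0]])
x = np.zeros((2, 2))
m = np.full((2, 2), 0.5)
cs = np.ones((2, 2))

# Every comparison is parenthesized before combining with | and &.
mask = ((t < 274.0) |
        (l > 800.0) |
        (lat2d > 60) |
        (lat2d < -60) |
        ((x > 0) & (m > 0.8)) |
        ((x < -25) & (m < 0.2)))

# Option 1: a masked array that hides the flagged cells (cs is copied).
masked = np.ma.masked_where(mask, cs)

# Option 2: overwrite the flagged cells with NaN in place.
cs[mask] = np.nan

print(mask)
print(cs)
```

Only the bottom-right cell fails every condition, so it is the one value left untouched.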
You could also put the condition in a Python function that returns a boolean mask and then use that mask on the array. The function has to use the element-wise operators | and &, because np.all would collapse each array to a single True/False:
def cond(t, l, lat2d, x, m):
    # Element-wise mask: True where the value in cs should be discarded
    return ((t < 274.0) | (l > 800.0) | (lat2d > 60) |
            (lat2d < -60) | ((x > 0) & (m > 0.8)) |
            ((x < -25) & (m < 0.2)))
Then apply the mask to the array:
cs[cond(t, l, lat2d, x, m)] = np.nan
