I have a problem converting my data into binary values. The rule itself is simple, basic math: for example, if a < 60 the result is "good", and if a >= 60 it is "very good". I want to apply the same kind of rule to the data below:
This is my data; I want to convert the new_cases column to a binary value: whenever the data is >= 1, the result should be 1. But when I use
Dt[Dt['new_cases'] >= 1 ] = 1
It doesn't work.
Is anyone able to reproduce this? Any ideas what could be causing this issue?
Thanks!
You have to specify the column where you want to change the values:
Dt.loc[Dt['new_cases'] >= 1, 'new_cases'] = 1
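For example, on a small made-up frame standing in for the asker's Dt:

```python
import pandas as pd

# Hypothetical data standing in for the asker's Dt
Dt = pd.DataFrame({"new_cases": [0, 3, 1, 0, 7]})

# .loc with a boolean mask plus an explicit column label only touches that column,
# instead of overwriting whole rows as Dt[mask] = 1 would
Dt.loc[Dt["new_cases"] >= 1, "new_cases"] = 1

print(Dt["new_cases"].tolist())  # [0, 1, 1, 0, 1]
```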
Use
Dt["new_cases"] = Dt["new_cases"].apply(lambda x: 1 if x >= 1 else 0)
OR (record the mask first, otherwise overwriting the column with 1 destroys the original values before the second line can test them):
mask = Dt["new_cases"] < 1
Dt["new_cases"] = 1
Dt.loc[mask, "new_cases"] = 0
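Equivalently, the comparison itself already yields a boolean Series, so a single cast gets you 0/1 in one vectorized step (a sketch with made-up data):

```python
import pandas as pd

Dt = pd.DataFrame({"new_cases": [0, 2, 5, 0]})

# The comparison produces a boolean Series; astype(int) maps True/False to 1/0
Dt["new_cases"] = (Dt["new_cases"] >= 1).astype(int)

print(Dt["new_cases"].tolist())  # [0, 1, 1, 0]
```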
Suppose I have a binary string z = abc, where a, b, c are each either 0 or 1, so z can be read as an integer from 0 to 7. Now I want to give a, b, c another 'layer' of value, where a = 1/2^1 = 1/2, b = 1/2^2 = 1/4, c = 1/2^3 = 1/8. My goal is to create a dictionary where the keys are the integers 0-7 and the values are the associated calculations based on the a, b, c values.
The only way I'm able to solve this question is to 'brute force' the results. For example, when the key is 5 (z = 101), the value would be 1/2+0+1/8 = 5/8, and perform all calculations manually, then append the item to the dictionary. Is there a tool / method in python that will allow me to create the calculation faster? I really have no idea how I can do that. Any suggestions / help is appreciated.
One naïve approach would be to iterate over the bit-string, and multiply each bit by the matching power of 0.5:
res = 0
for i, bit in enumerate(z, 1):
    res += int(bit) * 0.5**i
For z = "101" this will give res as 0.625, which is 5/8.
Could be compacted using sum:
res = sum(int(bit) * 0.5**i for i, bit in enumerate(z, 1))
If z is actually an integer, just change the zs above to format(z, 'b') to get its binary string representation.
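Putting the pieces together for the asker's stated goal, a sketch that builds the whole dictionary for keys 0-7 (frac_value is just an illustrative name):

```python
def frac_value(z: str) -> float:
    # Bit i (1-indexed from the left) contributes bit * (1/2)**i
    return sum(int(bit) * 0.5 ** i for i, bit in enumerate(z, 1))

# Keys 0-7, values computed from the 3-bit binary representation of each key
table = {k: frac_value(format(k, "03b")) for k in range(8)}

print(table[5])  # 0.625  (z = "101" -> 1/2 + 0 + 1/8)
```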
Just to elaborate on my comment a bit:
for key, value in {bin(key)[2:]: key/8 for key in range(8)}.items():
    print(f"{key:>3}: {value}")
Output:
  0: 0.0
  1: 0.125
 10: 0.25
 11: 0.375
100: 0.5
101: 0.625
110: 0.75
111: 0.875
Is this the output you're looking for?
Another way would be to benefit from vectorization:
import numpy as np

num = [1, 0, 1]
d = np.array(num)
# logspace(1, n, base=2) gives [2, 4, 8]; the reciprocals are [1/2, 1/4, 1/8]
r = 1 / np.logspace(1, len(num), num=len(num), base=2)
np.matmul(r, d)
Output:
0.625
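As a quick sanity check, the vectorized dot product agrees with the plain loop from the earlier answer:

```python
import numpy as np

num = [1, 0, 1]

# Vectorized: reciprocal powers of two times the bit vector
r = 1 / np.logspace(1, len(num), num=len(num), base=2)
vectorized = np.matmul(r, np.array(num))

# Loop version for comparison
looped = sum(bit * 0.5 ** i for i, bit in enumerate(num, 1))

print(vectorized, looped)  # 0.625 0.625
```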
I currently use something like the following bit of code to determine a comparison count:
list_of_numbers = [29800.0, 29795.0, 29795.0, 29740.0, 29755.0, 29745.0]
high = 29980.0
lookback = 10
counter = 1
for number in list_of_numbers:
    if (high >= number) and (counter < lookback):
        counter += 1
    else:
        break
The resulting counter will be 7. However, this is very taxing on large data arrays, so I looked for a solution and came up with np.argmax(), but there seems to be an issue. For example:
list_of_numbers = [29800.0, 29795.0, 29795.0, 29740.0, 29755.0, 29745.0]
np_list = np.array(list_of_numbers)
high = 29980.0
print(np.argmax(np_list > high) + 1)
this outputs 1, just like argmax is supposed to, but I want the output to be 7. Is there another method that will give me the same output as the if statement?
You can get a boolean array for where high >= number using NumPy:
list_of_numbers = [29800.0, 29795.0, 29795.0, 29740.0, 29755.0, 29745.0]
high = 29980.0
lookback = 10
boolean_arr = np.less_equal(np.array(list_of_numbers), high)
Then find the first False in that array, which corresponds to the break condition in your code. To account for the counter, apply np.cumsum to the boolean array and find the first index that reaches the specified lookback. The result is the smaller of break_arr and lookback_lim:
break_arr = np.where(boolean_arr == False)[0][0] + 1
lookback_lim = np.where(np.cumsum(boolean_arr) == lookback)[0][0] + 1
result = min(break_arr, lookback_lim)
If list_of_numbers contains no value bigger than your specified high limit (for break_arr), or the specified lookback exceeds every value in np.cumsum(boolean_arr) (for lookback_lim), the code above will fail with an error like the following from np.where:
IndexError: index 0 is out of bounds for axis 0 with size 0
Which can be handled by try-except or if statements e.g.:
try:
    break_arr = np.where(boolean_arr == False)[0][0] + 1
except IndexError:
    break_arr = len(boolean_arr) + 1
try:
    lookback_lim = np.where(np.cumsum(boolean_arr) == lookback)[0][0] + 1
except IndexError:
    lookback_lim = len(boolean_arr) + 1
You have your less-than sign backwards, no? The following should work like the for loop:
print(np.min([np.sum(np.array(list_of_numbers) < high) + 1, lookback]))
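On the sample data, this one-liner agrees with the original loop (a quick check, assuming the same list_of_numbers, high, and lookback):

```python
import numpy as np

list_of_numbers = [29800.0, 29795.0, 29795.0, 29740.0, 29755.0, 29745.0]
high = 29980.0
lookback = 10

# Original loop from the question
counter = 1
for number in list_of_numbers:
    if high >= number and counter < lookback:
        counter += 1
    else:
        break

# Vectorized equivalent: count values below high, add 1, cap at lookback
vectorized = np.min([np.sum(np.array(list_of_numbers) < high) + 1, lookback])

print(counter, vectorized)  # 7 7
```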
A look back can be accomplished using shift. A cumcount can be used to get a running total. A query can be used as a filter
Consider the following code.
import numpy as np
import pandas as pd

np.random.seed(0)
df_H = pd.DataFrame({'L0': np.random.randn(100),
                     'OneAndZero': np.random.randn(100),
                     'OneAndTwo': np.random.randn(100),
                     'GTwo': np.random.randn(100),
                     'Decide': np.random.randn(100)})
I would like to create a new column named Result, which depends on the value of the column Decide. If the value in Decide is less than 0, Result should take the corresponding row value from L0. If Decide is between 0 and 1, it should grab the value in OneAndZero; between 1 and 2, OneAndTwo; and if Decide is > 2, it should grab GTwo.
How would one do this with df.apply since I have only seen examples with fixed values and not values from other columns?
Just because it is Good Friday, we can try the following. Else it is a commonly asked question.
c1=df_H['Decide'].le(0)
c2=df_H['Decide'].between(0,1)
c3=df_H['Decide'].between(1,2)
c4=df_H['Decide'].gt(2)
cond=[c1,c2,c3,c4]
choices=[df_H['L0'],df_H['OneAndZero'],df_H['OneAndTwo'],df_H['GTwo']]
df_H['Result']=np.select(cond,choices)
df_H
If you really want to use apply
def choose_res(x):
    if x['Decide'] <= 0:
        return x['L0']
    if 0 < x['Decide'] <= 1:
        return x['OneAndZero']
    if 1 < x['Decide'] <= 2:
        return x['OneAndTwo']
    if x['Decide'] > 2:
        return x['GTwo']

df_H['Result'] = df_H.apply(axis=1, func=choose_res, result_type='expand')
df.iloc
df_H.reset_index(drop=True, inplace=True)
for i in range(len(df_H)):
    a = df_H['Decide'].iloc[i]
    if 0 <= a <= 1:
        b = df_H['OneAndZero'].iloc[i]
        df_H.loc[i, 'Result'] = b
    if 1.1 <= a <= 2:
        b = df_H['OneAndTwo'].iloc[i]
        df_H.loc[i, 'Result'] = b
maybe you can try this way.
df_apply
if you want to use apply,
create a function that contains the condition logic and returns the output,
then call it row-wise:
df_H['Result'] = df_H.apply(your_function, axis=1)
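A minimal sketch of that idea (the function body and the tiny frame are placeholders for your own logic; note that apply needs axis=1 so the function sees whole rows):

```python
import pandas as pd

# Tiny deterministic frame; column names mirror the question's
df_H = pd.DataFrame({"Decide": [-1.0, 0.5, 3.0], "L0": [10.0, 20.0, 30.0]})

def your_function(row):
    # Placeholder condition: pick L0 when Decide <= 0, otherwise Decide itself
    return row["L0"] if row["Decide"] <= 0 else row["Decide"]

# axis=1 passes each row (a Series) to the function
df_H["Result"] = df_H.apply(your_function, axis=1)

print(df_H["Result"].tolist())  # [10.0, 0.5, 3.0]
```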
I am trying to put this logic on pandas dataframe
IF base_total_price > 0
    IF base_total_discount = 0
        actual_price = base_total_price
    IF base_total_discount > 0
        actual_price = base_total_price + base_total_discount
IF base_total_price = 0
    IF base_total_discount > 0
        actual_price = base_total_discount
    IF base_total_discount = 0
        actual_price = 0
so I wrote these 2 apply functions
#for all entries where base_total_price > 0
df_slice_1['actual_price'] = df_slice_1['base_total_discount'].apply(lambda x: df_slice_1['base_total_price'] if x == 0 else df_slice_1['base_total_price']+df_slice_1['base_total_discount'])
#for all entries where base_total_price = 0
df_slice_1['actual_price'] = df_slice_1['base_total_discount'].apply(lambda x: x if x == 0 else df_slice_1['base_total_discount'])
When I run the code, I get this error:
ValueError: Wrong number of items passed 20, placement implies 1
I know that it is trying to put more values into one column, but I do not understand why this is happening or how I can solve it. All I need to do is add the new column `actual_price` to the dataframe, with its values calculated according to the logic above. Please suggest a better way of implementing the logic, or correct mine.
Sample data would have been useful. Please try np.select(conditions, choices):
conditions = [(df.base_total_price > 0) & (df.base_total_discount == 0),
              (df.base_total_price > 0) & (df.base_total_discount > 0),
              (df.base_total_price == 0) & (df.base_total_discount > 0),
              (df.base_total_price == 0) & (df.base_total_discount == 0)]
choices = [df.base_total_price, df.base_total_price.add(df.base_total_discount), df.base_total_discount, 0]
df['actual_price'] = np.select(conditions, choices)
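With some made-up sample rows covering all four branches, the np.select version reproduces the stated logic:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, one row per branch of the pricing logic
df = pd.DataFrame({
    "base_total_price":    [100.0, 100.0, 0.0, 0.0],
    "base_total_discount": [0.0,   20.0,  5.0, 0.0],
})

conditions = [
    (df.base_total_price > 0) & (df.base_total_discount == 0),
    (df.base_total_price > 0) & (df.base_total_discount > 0),
    (df.base_total_price == 0) & (df.base_total_discount > 0),
    (df.base_total_price == 0) & (df.base_total_discount == 0),
]
choices = [df.base_total_price,
           df.base_total_price + df.base_total_discount,
           df.base_total_discount,
           0]

# np.select picks, per row, the choice of the first condition that holds
df["actual_price"] = np.select(conditions, choices)

print(df["actual_price"].tolist())  # [100.0, 120.0, 5.0, 0.0]
```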
I solved this question simply by using iterrows. Thanks everyone who responded
The below code (calculation of a moving-average signal over N days) works well, but I want to replace the 50 with other numbers (e.g., 5, 10, 20, etc.). Not sure if I can turn the code below into some kind of for loop. Could anybody please help me?
df['ma50pfret']= df['ret']
df.loc[df.adjp >= df.ma50, 'adjp > ma50']= 1
df.loc[df.adjp < df.ma50, 'adjp > ma50']= 0
df.iloc[0, -1]= 1
df['adjp > ma50']= df['adjp > ma50'].astype(int)
df.loc[df['adjp > ma50'].shift(1)== 0, 'ma50pfret']= 1.000079 # 1.02**(1/250)
df['cum_ma50pfret']=df['ma50pfret'].cumprod()
df.head(10)
Do you just mean to replace the 50's with 5, 10, 20, etc? If so, that could be done by always using brackets to access columns, and using f-strings (or some other string formatting method) to replace 50 with the other numbers, like this:
for num in [5, 10, 20, 50]:
    df[f'ma{num}pfret'] = df['ret']
    df.loc[df.adjp >= df[f'ma{num}'], f'adjp > ma{num}'] = 1
    df.loc[df.adjp < df[f'ma{num}'], f'adjp > ma{num}'] = 0
    df.iloc[0, -1] = 1
    df[f'adjp > ma{num}'] = df[f'adjp > ma{num}'].astype(int)
    df.loc[df[f'adjp > ma{num}'].shift(1) == 0, f'ma{num}pfret'] = 1.000079  # 1.02**(1/250)
    df[f'cum_ma{num}pfret'] = df[f'ma{num}pfret'].cumprod()
df.head(10)