Increasing column value pandas - python

I have a dataframe of 143999 rows which contains position and time data.
I already made a column "dt" which calculates the time difference between rows.
Now I want to create a new column which gives the dt values a group number.
So it starts with group = 0, and whenever dt >= 60 the group number should increase by 1 (as in the desired output below, where dt = 60 starts a new group).
I tried the following:
def group(x):
    c = 0
    if densdata["dt"] < 60:
        densdata["group"] = c
    elif densdata["dt"] >= 60:
        c += 1
        densdata["group"] = c

densdata["group"] = densdata.apply(group, axis=1)
The error that I get is: The truth value of a Series is ambiguous.
Any ideas how to fix this problem?
This is what I want:
dt      group
0.01    0
2       0
0.05    0
300     1
2       1
60      2

You can take advantage of the fact that True evaluates to 1 and use .cumsum().
import numpy as np
import pandas as pd

densdata = pd.DataFrame({'dt': np.random.randint(low=50, high=70, size=20),
                         'group': np.zeros(20, dtype=np.int32)})
print(densdata.head())
dt group
0 52 0
1 59 0
2 69 0
3 55 0
4 63 0
densdata['group'] = (densdata.dt >= 60).cumsum()
print(densdata.head())
dt group
0 52 0
1 59 0
2 69 1
3 55 1
4 63 2
If you want to guarantee that the first value of group will be 0, even if the first value of dt is >= 60, subtract the first row's contribution (note that .replace(densdata.dt[0], np.nan) would blank out every row sharing the first value, not just row 0):
densdata['group'] = (densdata.dt >= 60).cumsum() - int(densdata.dt.iloc[0] >= 60)
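Applied to the sample data from the question, the cumsum approach reproduces the desired grouping exactly:

```python
import pandas as pd

# the dt values from the question's desired-output table
densdata = pd.DataFrame({'dt': [0.01, 2, 0.05, 300, 2, 60]})

# each row where dt >= 60 starts a new group; cumsum turns the
# boolean flags into a running group counter
densdata['group'] = (densdata['dt'] >= 60).cumsum()
print(densdata)
```

This prints groups 0, 0, 0, 1, 1, 2 for the six rows, matching the table in the question.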

Related

Eliminating Negative or Non_Negative values in pandas

I'm working on an automation task in Python where, in each row, the first negative value should be added to the first non-negative value from the left. The result should replace the positive value, and 0 should replace the negative value.
This process should continue until the entire row contains only negative or only positive values.
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC -2 23 2 3 2 2 -1
(>360Days)+(180-360Days)
-1 + 2
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC -2 23 2 3 2 1 0
(<30Days)+(180-360Days)
-2 + 1
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC 0 23 2 3 2 -1 0
(180-360Days)+(120-180Days)
-1 + 2
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC 0 23 2 3 2 0 0
Check this code:
import pandas as pd

# Enter the data (DataFrame.append was removed in pandas 2.0,
# so build the one-row frame directly)
new_row = {'CUSTOMER': 'ABC', '<30Days': -2, '31-60 Days': 23, '61-90Days': 2,
           '91-120Days': 3, '120-180Days': 2, '180-360Days': 2, '>360Days': -1}
df = pd.DataFrame([new_row])
# Keep the column order as per the requirement
df = df[['CUSTOMER', '<30Days', '31-60 Days', '61-90Days', '91-120Days',
         '120-180Days', '180-360Days', '>360Days']]
# Take the column names and reverse their order
ls = list(df.columns)
ls.reverse()
# Remove the non-integer column
ls.remove('CUSTOMER')
# Initialize state
flag1 = 1
flag = 0
new_ls = []
new_ls_index = []
for j in range(len(df)):
    while flag1 != 0:
        # Pair the first negative value with the next non-negative value
        for i in ls:
            if int(df[i][j]) < 0 and flag == 0:
                new_ls.append(int(df[i][j]))
                new_ls_index.append(i)
                flag = 1
            elif flag == 1 and int(df[i][j]) >= 0:
                new_ls.append(int(df[i][j]))
                new_ls_index.append(i)
                flag = 2
            elif flag == 2:
                df[new_ls_index[1]] = new_ls[0] + new_ls[1]
                df[new_ls_index[0]] = 0
                flag = 0
                new_ls = []
                new_ls_index = []
        # Check whether all values in the row are either positive or negative
        if new_ls == []:
            new_ls_neg = []
            new_ls_pos = []
            for i in ls:
                if int(df[i][j]) < 0:
                    new_ls_neg.append(int(df[i][j]))
                if int(df[i][j]) >= 0:
                    new_ls_pos.append(int(df[i][j]))
            if len(new_ls_neg) == len(ls) or len(new_ls_pos) == len(ls):
                flag1 = 0  # Set flag to stop the loop
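The all-positive/all-negative stopping test at the end of that loop can be collapsed into a short vectorized helper; a minimal sketch (the function name `all_same_sign` is mine, not from the code above):

```python
import numpy as np

def all_same_sign(values):
    """Return True when every value is negative, or every value is
    non-negative -- i.e. the stopping condition for the netting loop."""
    vals = np.asarray(list(values), dtype=float)
    return bool((vals < 0).all() or (vals >= 0).all())
```

For example, `all_same_sign([-2, 23, 2])` is False (the row still needs processing), while `all_same_sign([0, 23, 2, 3, 2, 0, 0])` is True.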

Compare current column value to different column value by row slices

Assuming a dataframe like this
In [5]: data = pd.DataFrame([[9,4],[5,4],[1,3],[26,7]])
In [6]: data
Out[6]:
0 1
0 9 4
1 5 4
2 1 3
3 26 7
I want to count how many times the values in a rolling window/slice of 2 on column 0 are greater or equal to the value in col 1 (4).
On the first number 4 at col 1, a slice of 2 on column 0 yields 5 and 1, so the output would be 2 since both numbers are greater than 4, then on the second 4 the next slice values on col 0 would be 1 and 26, so the output would be 1 because only 26 is greater than 4 but not 1. I can't use rolling window since iterating through rolling window values is not implemented.
I need something like a slice of the previous n rows and then I can iterate, compare and count how many times any of the values in that slice are above the current row.
I have done this using lists instead of doing it in the DataFrame. Check the code below (note the columns in the sample frame are the integers 0 and 1, so index with df[0], not df['0']):
list1, list2 = df[0].values.tolist(), df[1].values.tolist()
outList = []
for ix in range(len(list1)):
    if ix < len(list1) - 2:
        if list2[ix] < list1[ix + 1] and list2[ix] < list1[ix + 2]:
            outList.append(2)
        elif list2[ix] < list1[ix + 1] or list2[ix] < list1[ix + 2]:
            outList.append(1)
        else:
            outList.append(0)
    else:
        outList.append(0)
df['2_rows_forward_moving_tag'] = pd.Series(outList)
Output:
0 1 2_rows_forward_moving_tag
0 9 4 1
1 5 4 1
2 1 3 0
3 26 7 0
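The same forward-looking count can be computed without an explicit loop by shifting column 0 backwards; a sketch of that idea on the sample frame above (it reproduces the loop's output, including the strict > comparison the loop actually uses):

```python
import pandas as pd

df = pd.DataFrame([[9, 4], [5, 4], [1, 3], [26, 7]])

fwd1 = df[0].shift(-1)  # column-0 value one row ahead
fwd2 = df[0].shift(-2)  # column-0 value two rows ahead

# count how many of the two forward values exceed column 1
count = (fwd1 > df[1]).astype(int) + (fwd2 > df[1]).astype(int)
count[fwd2.isna()] = 0  # rows without a full forward window get 0

df['2_rows_forward_moving_tag'] = count
print(df)
```

This gives the same tags 1, 1, 0, 0 as the list-based version.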

Mapping values inside pandas column

I used the code below to map the value 2 inside the S column to 0, but it didn't work. Any suggestion on how to solve this?
N.B : I want to implement an external function inside the map.
df = pd.DataFrame({
'Age': [30,40,50,60,70,80],
'Sex': ['F','M','M','F','M','F'],
'S' : [1,1,2,2,1,2]
})
def app(value):
    for n in df['S']:
        if n == 1:
            return 1
        if n == 2:
            return 0

df["S"] = df.S.map(app)
Use eq to create a boolean series and convert that boolean series to int with astype:
df['S'] = df['S'].eq(1).astype(int)
OR
df['S'] = (df['S'] == 1).astype(int)
Output:
Age Sex S
0 30 F 1
1 40 M 1
2 50 M 0
3 60 F 0
4 70 M 1
5 80 F 0
Don't use apply, simply use loc to assign the values:
df.loc[df.S.eq(2), 'S'] = 0
Age Sex S
0 30 F 1
1 40 M 1
2 50 M 0
3 60 F 0
4 70 M 1
5 80 F 0
If you need a more performant option, use np.select. This is also more scalable, as you can always add more conditions:
df['S'] = np.select([df.S.eq(2)], [0], 1)
You're close, but you need a few corrections. Since the function receives one value at a time, remove the for loop and use value directly; either map or apply will then work element-wise on the Series. See this answer for how to properly use apply vs applymap vs map.
def app(value):
    if value == 1:
        return 1
    elif value == 2:
        return 0

df['S'] = df.S.apply(app)
Age Sex S
0 30 F 1
1 40 M 1
2 50 M 0
3 60 F 0
4 70 M 1
5 80 F 0
If you only wish to change values equal to 2, you can use pd.DataFrame.loc:
df.loc[df['S'] == 2, 'S'] = 0
pd.Series.apply is not recommended here, as it is just a thinly veiled, inefficient loop.
You could use .replace as follows:
df["S"] = df["S"].replace(2, 0)
This will replace all 2 values with 0 in one line.
Go with a vectorized numpy operation:
df['S'] = np.abs(df['S'] - 2)
and stand out from the competition in interviews and SO answers :)
>>> df = pd.DataFrame({'Age': [30,40,50,60,70,80],
...                    'Sex': ['F','M','M','F','M','F'],
...                    'S': [1,1,2,2,1,2]})
>>> def app(value):
...     return 1 if value == 1 else 0
... # or app = lambda value: 1 if value == 1 else 0
>>> df["S"] = df["S"].map(app)
>>> df
   Age  S Sex
0   30  1   F
1   40  1   M
2   50  0   M
3   60  0   F
4   70  1   M
5   80  0   F
You can do:
import numpy as np
df['S'] = np.where(df['S'] == 2, 0, df['S'])
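One more idiom worth adding, since Series.map accepts a dict as well as a function: map every value explicitly (values missing from the dict become NaN, so list every value you want to keep):

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 40, 50, 60, 70, 80],
                   'Sex': ['F', 'M', 'M', 'F', 'M', 'F'],
                   'S': [1, 1, 2, 2, 1, 2]})

# 1 stays 1, 2 becomes 0; any other value would become NaN
df['S'] = df['S'].map({1: 1, 2: 0})
```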

Apply a custom function iteratively in a subset of rows

I am trying to write a function that enables me to do some arithmetic iteratively on a subset of rows when a condition is met in another column. My DataFrame looks like this:
Value store flag
0 16051.249 0 0
36 16140.792 0.019822 0
0 16150.500 AAA 1
37 16155.223 1.24698 0
1 16199.700 BBB 1
38 16235.732 1.90162 0
41 16252.594 2.15627 0
2 16256.300 CCC 1
42 16260.678 2.15627 0
1048 17071.513 14.7752 0
3 17071.600 DDD 1
1049 17072.347 14.7752 0
1391 17134.538 16.7026 0
4 17134.600 EEE 1
1392 17134.635 16.7026 0
1675 17227.600 19.4348 0
5 17227.800 EFG 1
1676 17228.796 19.4348 0
1722 17262.189 20.5822 0
6 17264.300 XYZ 1
1723 17266.625 20.6702 0
2630 17442.770 32.7927 0
7 17442.800 ZZZ 1
2631 17442.951 32.7927 0
3068 17517.492 37.6485 0
8 17517.500 TTT 1
3069 17518.296 37.6485 0
3295 17565.776 38.2871 0
9 17565.800 SDF 1
3296 17565.888 38.2871 0
... ... ... ...
I'd like to apply the following function to all rows where the flag value equals 1:
def f(x):
    return df.iloc[0,1] + (df.iloc[2,1] - df.iloc[0,1]) * ((df.iloc[1,0] - df.iloc[0,0]) / (df.iloc[2,0] - df.iloc[0,0]))
and finally put the return value into a dictionary with its corresponding key value; for example {AAA: 123, BBB: 456, ...}.
This function requires the rows above and below the row where flag == "1".
I have tried to re-structure my df in a way that lets me use a rolling window with my function, i.e.:
idx = (df['flag'] == "1").fillna(False)
idx |= idx.shift(1) | idx.shift(2)
idx |= idx.shift(-1) | idx.shift(-2)
df=df[idx]
df.rolling(window=3, min_periods=1).apply(f)[::3].reset_index(drop=True)
but this doesn't work!
Since the function is location dependent I am not sure how to apply it to all triplet of rows where flag value is 1. Any suggestion is much appreciated!
IIUC, your calculation could be handled directly on the df columns level, no need to apply function on specific rows.
# convert to numeric so that the column can be used for arithmetic calculations
df['store2'] = pd.to_numeric(df.store, errors='coerce')
# calculate the f(x) based on 'Value' and 'store2' column
df['result'] = df.store2.shift(1) + (df.store2.shift(-1) - df.store2.shift(1))*(df.Value - df.Value.shift(1))/(df.Value.shift(-1) - df.Value.shift(1))
# export the resultset:
df.loc[df.flag==1,['store','result']].set_index('store')['result'].to_json()
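As a quick sanity check of the shift-based formula, here it is on a three-row slice around the 'AAA' flag (values taken from the sample data above; the interpolated result lands only on the flagged row, since the neighbouring rows have NaN in store2's shifts or a non-NaN store of their own):

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [16140.792, 16150.500, 16155.223],
    'store': ['0.019822', 'AAA', '1.24698'],
    'flag':  [0, 1, 0],
})

# non-numeric 'store' entries (the flag labels) become NaN
df['store2'] = pd.to_numeric(df.store, errors='coerce')

# linear interpolation between the neighbouring store2 values,
# weighted by the position of Value between its neighbours
df['result'] = df.store2.shift(1) + (df.store2.shift(-1) - df.store2.shift(1)) \
    * (df.Value - df.Value.shift(1)) / (df.Value.shift(-1) - df.Value.shift(1))

aaa = df.loc[df.flag == 1, 'result'].iloc[0]  # roughly 0.845
```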
just keep the state and use apply:
zero_vals = []

def func(row):
    if row.flag == 0:
        zero_vals.append(row)
    elif row.flag == 1:
        # do math here using previous rows of data and the current row
        zero_vals.clear()
    else:
        raise ValueError('unexpected flag value')
then it's just:
df.apply(func, axis=1)

How to modify a list recursively based on previous values assigned to it without using loops?

I have to write a code that needs to assign 0/1 to out based on matrix values.
But the 0/1 assignment is based on what it has assigned previously (the last 2 assignments only).
The code explains the logic I am trying to achieve.
out = np.ones((1,dat_mat.shape[0]))[0]
for i in range(dat_mat.shape[0]):
    if i == 0:
        out[i] = 1
    elif i == 1:
        if (dat_mat[i,0] >= dat_mat[i,1]) and (out[i-1] == 1):
            out[i] = 0
    else:
        if ((dat_mat[i,0] >= dat_mat[i,1]) and (out[i-1] == 1)) or ((dat_mat[i,0] >= dat_mat[i,2]) and (out[i-2] == 1)):
            out[i] = 0
Sample of dat_mat is below -
Dat_Mat :
col1 col2 col3
31 nan nan
30 30 nan
28 28 30
27 27 28
26 26 27
26 24 26
I need an output like this -
out
1
0
1
0
1
0
The order of the data needs to be maintained.
I want to do it without using loops, as I need to run this on over a million sets of 35-45 rows each.
Thanks in advance.
You probably want dat_mat[i][0], not dat_mat[i:0].
Also you can use out = np.ones(len(dat_mat)) instead of out = np.ones((1,dat_mat.shape[0]))[0]
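Because each out[i] depends on out[i-1] and out[i-2], the recurrence cannot be expressed as a single vectorized numpy operation; a runnable version of the loop on the sample data looks like this (compiling such a loop with numba's @njit is one common way to get the needed speed over a million sets, though that is an untested suggestion here):

```python
import numpy as np

# the sample Dat_Mat from the question (nan where no value exists)
dat_mat = np.array([
    [31, np.nan, np.nan],
    [30, 30,     np.nan],
    [28, 28,     30],
    [27, 27,     28],
    [26, 26,     27],
    [26, 24,     26],
])

out = np.ones(len(dat_mat))  # out[0] stays 1 by definition
for i in range(1, len(dat_mat)):
    # drop to 0 if col1 >= col2 and the previous assignment was 1
    drop = (dat_mat[i, 0] >= dat_mat[i, 1]) and out[i - 1] == 1
    if i >= 2:
        # ... or if col1 >= col3 and the assignment two rows back was 1
        drop = drop or ((dat_mat[i, 0] >= dat_mat[i, 2]) and out[i - 2] == 1)
    if drop:
        out[i] = 0
```

On this data the loop produces the alternating 1, 0, 1, 0, 1, 0 shown in the desired output.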
