Adding rows to dataframes in a list under a condition - Python

I am trying to add rows under a condition but am having difficulty achieving this.
Currently, I have pandas dataframes in a list (the toy data is at the bottom of this question).
The objective is to add rows that keep a fixed 'ID' and increase the month in steps of 3.
For example, for this[1] I want the result to look like the following:
ID | month | num
6 | 0 | 5
6 | 3 | NaN
6 | 6 | 4
6 | 9 | NaN
6 | 12 | 3
...
6 | 36 | 1
I am trying to write a function that takes a dataframe from the list, the maximum month of that dataframe, and the step to increment the month by (3). In pseudocode it would look like:
def add_rows(df, max_mon, res):
    if max_mon > res:
        # add rows with the fixed ID and NaN num,
        # skipping the months that already exist

final = []
for i in range(len(this)):
    final.append(add_rows(this[i], this[i]['month'].max(), 3))
I have tried to insert rows but did not manage to get it to work.
The toy data:
d = {'ID':[5,5,5,5,5], 'month':[0,6,12,24,36], 'num':[5,4,3,2,1]}
tempo = pd.DataFrame(data = d)
d2 = {'ID':[6,6,6,6,6], 'month':[0,6,12,18,36], 'num':[5,4,3,2,1]}
tempo2 = pd.DataFrame(data = d2)
this = []
this.append(tempo)
this.append(tempo2)
I would really appreciate if I could get help on building the function!

You can use:
for i, df in enumerate(this):
    this[i] = (df
       .set_index('month')
       .groupby('ID')
       .apply(lambda x: x.drop(columns='ID')
                         .reindex(range(x.index.min(), x.index.max()+3, 3))
             )
       .reset_index()[df.columns]
    )
Updated this:
[ ID month num
0 5 0 5.0
1 5 3 NaN
2 5 6 4.0
3 5 9 NaN
4 5 12 3.0
5 5 15 NaN
6 5 18 NaN
7 5 21 NaN
8 5 24 2.0
9 5 27 NaN
10 5 30 NaN
11 5 33 NaN
12 5 36 1.0,
ID month num
0 6 0 5.0
1 6 3 NaN
2 6 6 4.0
3 6 9 NaN
4 6 12 3.0
5 6 15 NaN
6 6 18 2.0
7 6 21 NaN
8 6 24 NaN
9 6 27 NaN
10 6 30 NaN
11 6 33 NaN
12 6 36 1.0]
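If you specifically want the add_rows function shape from the question, here is a minimal sketch reusing the same reindex idea (assuming one fixed ID per frame, as in the toy data; 'res' is the month step):
def add_rows(df, max_mon, res):
    # Nothing to fill in if the maximum month does not exceed the step.
    if max_mon <= res:
        return df
    out = df.set_index('month').reindex(range(df['month'].min(), max_mon + res, res))
    # reindex leaves NaN in 'ID' on the new rows; restore the fixed ID.
    out['ID'] = out['ID'].ffill().bfill().astype(int)
    return out.reset_index()[df.columns]

final = [add_rows(df, df['month'].max(), 3) for df in this]
This produces the same frames as the loop above.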

Related

python piecewise linear interpolation across dataframes in a list

I am trying to apply piecewise linear interpolation. I first tried pandas' built-in interpolate function, but it did not work.
Example data:
import pandas as pd
import numpy as np
d = {'ID':[5,5,5,5,5,5,5], 'month':[0,3,6,9,12,15,18], 'num':[7,np.nan,5,np.nan,np.nan,5,8]}
tempo = pd.DataFrame(data = d)
d2 = {'ID':[6,6,6,6,6,6,6], 'month':[0,3,6,9,12,15,18], 'num':[5,np.nan,2,np.nan,np.nan,np.nan,7]}
tempo2 = pd.DataFrame(data = d2)
this = []
this.append(tempo)
this.append(tempo2)
The actual data has over 1000 unique IDs, so I filtered each ID into a dataframe and put them into the list.
I am trying to go through all the dataframes in the list and apply piecewise linear interpolation to each. I tried changing month to an index and using .interpolate(method='index', inplace=True), but it did not work.
The expected output is
ID | month | num
5 | 0 | 7
5 | 3 | 6
5 | 6 | 5
5 | 9 | 5
5 | 12 | 5
5 | 15 | 5
5 | 18 | 8
This needs to be applied across all the dataframes in the list.
Assuming this is a follow-up to your previous question, change the code to:
for i, df in enumerate(this):
    this[i] = (df
       .set_index('month')
       # optional, because of the previous question
       .reindex(range(df['month'].min(), df['month'].max()+3, 3))
       .interpolate()
       .reset_index()[df.columns]
    )
NB. I simplified the code by removing the groupby; this works only when there is a single ID per DataFrame, as you mentioned in the other question.
Output:
[ ID month num
0 5 0 7.0
1 5 3 6.0
2 5 6 5.0
3 5 9 5.0
4 5 12 5.0
5 5 15 5.0
6 5 18 8.0,
ID month num
0 6 0 5.00
1 6 3 3.50
2 6 6 2.00
3 6 9 3.25
4 6 12 4.50
5 6 15 5.75
6 6 18 7.00]
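If a frame could instead hold several IDs, a sketch restoring the groupby from the previous answer (not tested against your full data) would be:
for i, df in enumerate(this):
    this[i] = (df
       .set_index('month')
       .groupby('ID')
       # reindex and interpolate within each ID separately
       .apply(lambda x: x.drop(columns='ID')
                         .reindex(range(x.index.min(), x.index.max()+3, 3))
                         .interpolate(method='index'))
       .reset_index()[df.columns]
    )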

I try to map values to a column in pandas but I get NaN values instead

This is my code:
mapping = {"ISTJ":1, "ISTP":2, "ISFJ":3, "ISFP":4, "INFP":6, "INTJ":7, "INTP":8, "ESTP":9, "ESTJ":10, "ESFP":11, "ESFJ":12, "ENFP":13, "ENFJ":14, "ENTP":15, "ENTJ":16, "NaN": 17}
q20 = castaway_details[["personality_type"]]  # double brackets keep a DataFrame
q20["personality_type"] = q20["personality_type"].map(mapping)
The data frame looks like this:
personality_type
0 INTP
1 INFP
2 INTJ
3 ISTJ
4 NAN
5 ESFP
I want the output to be like this:
personality_type
0 8
1 6
2 7
3 1
4 17
5 11
However, what I get from my code is all NaNs.
Try pandas.Series.str.strip before the pandas.Series.map:
q20["personality_type"]= q20["personality_type"].str.strip().map(mapping)
# Output :
print(q20)
personality_type
0 8
1 6
2 7
3 1
4 17
5 11
The key "NaN" in your mapping dictionary and the NaN values in your data frame do not match. I changed the key in the dictionary to "NAN" so it matches the filled value below:
df.apply(lambda x: x.fillna('NAN').map(mapping))
personality_type
0 8
1 6
2 7
3 1
4 17
5 11
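If the unmatched entries are real missing values (np.nan) rather than the string "NAN", another sketch is to map first and fill whatever is left, so no string key for missing data is needed at all (note this also sends genuinely unknown strings to 17):
# Map the known types, then send everything unmatched (including NaN) to 17.
q20["personality_type"] = q20["personality_type"].map(mapping).fillna(17).astype(int)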

Why can't I unpivot (melt) this pandas dataframe? (Python)

I have a pandas data frame that I pivoted exactly the way I want. Now I want to unpivot it to recover the position data (row and column) together with each value in the newly formed data frame. For example, I want the first row (of the unpivoted frame) to have 1 under "row", 1 under "a", and 13 as the value (example below). How can I unpivot to get the row and column values back? I tried pd.melt, but it seemed to make no difference. Thanks! Directly below is the code that builds the pivoted data frame.
import pandas as pd
row = [1, 2, 3, 4, 5]
df67 = {'row':row,}
df67 = pd.DataFrame(df67,columns=['row'])
df67['a'] = [1, 2, 3, 4, 5]
df67['b'] =[13, 18, 5, 10, 6]
#df67 (dataframe before pivot)
df68 = df67.pivot(index='row', columns = 'a')
#df68 (dataframe after pivot)
What I want the result to be for the first line:
row | a | value
1 | 1 | 13
Use DataFrame.stack with DataFrame.reset_index:
df = df68.stack().reset_index()
print (df)
row a b
0 1 1 13.0
1 2 2 18.0
2 3 3 5.0
3 4 4 10.0
4 5 5 6.0
EDIT:
To avoid dropping missing values, use the dropna=False parameter:
df = df68.stack(dropna=False).reset_index()
print (df)
row a b
0 1 1 13.0
1 1 2 NaN
2 1 3 NaN
3 1 4 NaN
4 1 5 NaN
5 2 1 NaN
6 2 2 18.0
7 2 3 NaN
8 2 4 NaN
9 2 5 NaN
10 3 1 NaN
11 3 2 NaN
12 3 3 5.0
13 3 4 NaN
14 3 5 NaN
15 4 1 NaN
16 4 2 NaN
17 4 3 NaN
18 4 4 10.0
19 4 5 NaN
20 5 1 NaN
21 5 2 NaN
22 5 3 NaN
23 5 4 NaN
24 5 5 6.0
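For reference, a melt-based equivalent is also possible (a sketch): pd.melt seemed to do nothing because df68 keeps 'row' in the index and has MultiIndex columns, so flatten both first:
# Select the single 'b' column level and move 'row' back into the columns.
tmp = df68['b'].reset_index()                      # columns: row, 1, 2, 3, 4, 5
out = tmp.melt(id_vars='row', var_name='a', value_name='b')
# melt orders by column first; sort to match the stack() row order.
out = out.sort_values(['row', 'a']).reset_index(drop=True)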

Pandas Rolling Groupby Shift back 1, Trying to lag rolling sum

I am trying to get a rolling sum of the past 3 rows for the same ID, but lagged by 1 row. My attempt looked like the code below, where i is the column. There has to be a way to do this, but this method doesn't seem to work.
for i in df.columns.values:
    df.groupby('Id', group_keys=False)[i].rolling(window=3, min_periods=2).mean().shift(1)
id dollars lag
1 6 nan
1 7 nan
1 6 6.5
3 7 nan
3 4 nan
3 4 5.5
3 3 5
5 6 nan
5 5 nan
5 6 5.5
5 12 5.67
5 7 8.3
I am trying to get a rolling sum of the past 3 rows for the same ID but lagging this by 1 row.
You can create the lagged rolling sum by chaining DataFrame.groupby(ID), .shift(1) for the lag 1, .rolling(3) for the window 3, and .sum() for the sum.
Example: Let's say your dataset is:
import pandas as pd
# Reproducible datasets are your friend!
d = pd.DataFrame({'grp': pd.Series(['A']*4 + ['B']*5 + ['C']*6),
                  'x': pd.Series(range(15))})
print(d)
grp x
A 0
A 1
A 2
A 3
B 4
B 5
B 6
B 7
B 8
C 9
C 10
C 11
C 12
C 13
C 14
I think what you're asking for is this:
d['y'] = d.groupby('grp')['x'].shift(1).rolling(3).sum()
print(d)
grp x y
A 0 NaN
A 1 NaN
A 2 NaN
A 3 3.0
B 4 NaN
B 5 NaN
B 6 NaN
B 7 15.0
B 8 18.0
C 9 NaN
C 10 NaN
C 11 NaN
C 12 30.0
C 13 33.0
C 14 36.0
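Carried back to the frame in the question, a sketch ('Id' and 'dollars' are the column names shown above; .mean() matches the attempt, swap in .sum() for a rolling sum). With min_periods below the window size, a plain shift followed by a global rolling can leak values across IDs, so transform keeps the whole computation per group:
# The other fix vs. the original loop: the result is assigned back to a column.
df['lag'] = (df.groupby('Id')['dollars']
               .transform(lambda s: s.shift(1).rolling(window=3, min_periods=2).mean()))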

Compute a ratio conditional on the value in the column of a panda dataframe

I have a dataframe of the following type
df = pd.DataFrame({'Days': [1,2,5,6,7,10,11,12],
                   'Value': [100.3,150.5,237.0,314.15,188.0,413.0,158.2,268.0]})
Days Value
0 1 100.3
1 2 150.5
2 5 237.0
3 6 314.15
4 7 188.0
5 10 413.0
6 11 158.2
7 12 268.0
and I would like to add a column '+5Ratio' whose value is the ratio between the Value at Days+5 and the Value at Days.
For example, in the first row I would have 3.13210368893 = 314.15/100.3, in the second 1.24916943522 = 188.0/150.5, and so on.
Days Value +5Ratio
0 1 100.3 3.13210368893
1 2 150.5 1.24916943522
2 5 237.0 ...
3 6 314.15
4 7 188.0
5 10 413.0
6 11 158.2
7 12 268.0
I'm struggling to find a way to do it, e.g. using a lambda function.
Could someone help me solve this problem?
Thanks in advance.
Edit
In the case I am interested in, the "Days" field can vary sparsely from 1 to 18180, for instance.
You can use merge; a benefit of doing it this way is that it handles missing values:
s=df.merge(df.assign(Days=df.Days-5),on='Days')
s.assign(Value=s.Value_y/s.Value_x).drop(['Value_x','Value_y'],axis=1)
Out[359]:
Days Value
0 1 3.132104
1 2 1.249169
2 5 1.742616
3 6 0.503581
4 7 1.425532
Consider left-merging on a helper dataframe, days_df, of consecutive daily points, then shifting by 5 rows for the ratio calculation. Finally, remove the blank day rows:
days_df = pd.DataFrame({'Days':range(min(df.Days), max(df.Days)+1)})
days_df = days_df.merge(df, on='Days', how='left')
print(days_df)
# Days Value
# 0 1 100.30
# 1 2 150.50
# 2 3 NaN
# 3 4 NaN
# 4 5 237.00
# 5 6 314.15
# 6 7 188.00
# 7 8 NaN
# 8 9 NaN
# 9 10 413.00
# 10 11 158.20
# 11 12 268.00
days_df['+5ratio'] = days_df.shift(-5)['Value'] / days_df['Value']
final_df = days_df[days_df['Value'].notnull()].reset_index(drop=True)
print(final_df)
# Days Value +5ratio
# 0 1 100.30 3.132104
# 1 2 150.50 1.249169
# 2 5 237.00 1.742616
# 3 6 314.15 0.503581
# 4 7 188.00 1.425532
# 5 10 413.00 NaN
# 6 11 158.20 NaN
# 7 12 268.00 NaN
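Given the edit about sparse Days values, another sketch is a direct lookup, which avoids building helper rows entirely:
# Build a Days -> Value lookup, then map Days+5 through it;
# days without a match 5 ahead simply yield NaN.
lookup = df.set_index('Days')['Value']
df['+5Ratio'] = df['Days'].add(5).map(lookup) / df['Value']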
