I have a pandas DataFrame that I pivoted exactly the way I want it. Now I want to unpivot it to get the position data (row and column) in a new DataFrame. For example, I want the first row of the new, unpivoted DataFrame to have 1 under "row", 1 under "a", and 13 as the value (example below). Can someone please figure out how I can unpivot to get the row and column values? I have tried using pd.melt, but it didn't seem to work (it made no difference). Please respond soon. Thanks! Directly below is the code to make the pivoted DataFrame.
import pandas as pd
row = [1, 2, 3, 4, 5]
df67 = {'row':row,}
df67 = pd.DataFrame(df67,columns=['row'])
df67['a'] = [1, 2, 3, 4, 5]
df67['b'] =[13, 18, 5, 10, 6]
#df67 (dataframe before pivot)
df68 = df67.pivot(index='row', columns = 'a')
#df68 (dataframe after pivot)
What I want the result to be for the first line:
row | a | value
1 | 1 | 13
Use DataFrame.stack with DataFrame.reset_index:
df = df68.stack().reset_index()
print (df)
row a b
0 1 1 13.0
1 2 2 18.0
2 3 3 5.0
3 4 4 10.0
4 5 5 6.0
EDIT:
To avoid removing the missing values, use the dropna=False parameter:
df = df68.stack(dropna=False).reset_index()
print (df)
row a b
0 1 1 13.0
1 1 2 NaN
2 1 3 NaN
3 1 4 NaN
4 1 5 NaN
5 2 1 NaN
6 2 2 18.0
7 2 3 NaN
8 2 4 NaN
9 2 5 NaN
10 3 1 NaN
11 3 2 NaN
12 3 3 5.0
13 3 4 NaN
14 3 5 NaN
15 4 1 NaN
16 4 2 NaN
17 4 3 NaN
18 4 4 10.0
19 4 5 NaN
20 5 1 NaN
21 5 2 NaN
22 5 3 NaN
23 5 4 NaN
24 5 5 6.0
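Since the question mentions that pd.melt made no difference, here is a minimal sketch of one way melt could be applied to the same pivoted frame. It assumes the data from the question. The names wide and long, and the output column name "value", are only illustrative and are not part of the answer above. The idea is to select the "b" block first and move "row" out of the index, because melt only sees regular columns.

```python
import pandas as pd

# Rebuild the pivoted frame from the question.
df67 = pd.DataFrame({"row": [1, 2, 3, 4, 5],
                     "a": [1, 2, 3, 4, 5],
                     "b": [13, 18, 5, 10, 6]})
df68 = df67.pivot(index="row", columns="a")

# Select the "b" block so the columns are just the "a" labels,
# then move "row" out of the index so melt can use it as an id column.
wide = df68["b"].reset_index()

long = (
    wide.melt(id_vars="row", var_name="a", value_name="value")
        .dropna(subset=["value"])   # drop the empty cells created by the pivot
        .reset_index(drop=True)
)
print(long)
```

Leaving out the dropna step should keep every row/column combination, similar to the dropna=False variant above.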
Related
I have a Dataframe as shown below
A B C D
0 1 2 3.3 4
1 NaT NaN NaN NaN
2 NaT NaN NaN NaN
3 5 6 7 8
4 NaT NaN NaN NaN
5 NaT NaN NaN NaN
6 9 1 2 3
7 NaT NaN NaN NaN
8 NaT NaN NaN NaN
I need to copy the first row's values (1,2,3,4) down through the row with index 2, then copy the row values (5,6,7,8) down through the row with index 5, then copy (9,1,2,3) down through the row with index 8, and so on. Is there any way to do this in Python or pandas? Quick help appreciated! Also, column D must not be filled.
Using ffill on column C gives 3.3456 as the value for the next row.
Expected Output:
A B C D
0 1 2 3.3 4
1 1 2 3.3 NaN
2 1 2 3.3 NaN
3 5 6 7 8
4 5 6 7 NaN
5 5 6 7 NaN
6 9 1 2 3
7 9 1 2 NaN
8 9 1 2 NaN
The question was changed, so to forward fill all columns except D, use Index.difference to build the list of column names and then ffill:
cols = df.columns.difference(['D'])
df[cols] = df[cols].ffill()
Or create a boolean mask for all column names except D:
mask = df.columns != 'D'
df.loc[:, mask] = df.loc[:, mask].ffill()
EDIT: I cannot replicate your problem:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[2114.201789, np.nan, np.nan, 1]})
print (df)
a
0 2114.201789
1 NaN
2 NaN
3 1.000000
print (df.ffill())
a
0 2114.201789
1 2114.201789
2 2114.201789
3 1.000000
I am loading a csv file into a data frame using pandas.
An example dataframe is this:
X Y
1 4
2 5
3 6
I wish to append these two columns into a new column:
X Y Z
1 4 1
2 5 2
3 6 3
4
5
6
How can this be done using Python?
Thank you!
Here's one way to do that:
res = pd.concat([df, df.melt()["value"]], axis=1)
print(res)
The output is:
X Y value
0 1.0 4.0 1
1 2.0 5.0 2
2 3.0 6.0 3
3 NaN NaN 4
4 NaN NaN 5
5 NaN NaN 6
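If the new column should be called Z, as in the question, a possible variation is to stack the two columns with pd.concat and rename the result. This is only a sketch using the example data. The names z and res are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3], "Y": [4, 5, 6]})

# Put Y underneath X and renumber the rows 0..5.
z = pd.concat([df["X"], df["Y"]], ignore_index=True).rename("Z")

# Align on the row index; X and Y are padded with NaN for the extra rows.
res = pd.concat([df, z], axis=1)
print(res)
```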
I am trying to get a rolling sum of the past 3 rows for the same ID but lagging this by 1 row. My attempt is the code below, where i is the column. There has to be a way to do this, but this method doesn't seem to work.
for i in df.columns.values:
    df.groupby('Id', group_keys=False)[i].rolling(window=3, min_periods=2).mean().shift(1)
id dollars lag
1 6 nan
1 7 nan
1 6 6.5
3 7 nan
3 4 nan
3 4 5.5
3 3 5
5 6 nan
5 5 nan
5 6 5.5
5 12 5.67
5 7 8.3
I am trying to get a rolling sum of the past 3 rows for the same ID but lagging this by 1 row.
You can create the lagged rolling sum by chaining DataFrame.groupby(ID), .shift(1) for the lag 1, .rolling(3) for the window 3, and .sum() for the sum.
Example: Let's say your dataset is:
import pandas as pd
# Reproducible datasets are your friend!
d = pd.DataFrame({'grp':pd.Series(['A']*4 + ['B']*5 + ['C']*6),
'x':pd.Series(range(15))})
print(d)
grp x
A 0
A 1
A 2
A 3
B 4
B 5
B 6
B 7
B 8
C 9
C 10
C 11
C 12
C 13
C 14
I think what you're asking for is this:
d['y'] = d.groupby('grp')['x'].shift(1).rolling(3).sum()
print(d)
grp x y
A 0 NaN
A 1 NaN
A 2 NaN
A 3 3.0
B 4 NaN
B 5 NaN
B 6 NaN
B 7 15.0
B 8 18.0
C 9 NaN
C 10 NaN
C 11 NaN
C 12 30.0
C 13 33.0
C 14 36.0
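One thing to be aware of: after .shift(1) the result is a plain Series, so the .rolling(3) that follows is no longer grouped. With the default min_periods this still works, because the NaN that shift puts at the start of each group blocks any window that crosses a group boundary. If you lower min_periods (the question uses min_periods=2 with a mean), values can leak from one group into the next. A sketch of one way to keep both steps inside each group, using the first rows of the question's data, is to do the shift and the rolling inside transform. The names flat and lag are only illustrative.

```python
import pandas as pd

# First two ids from the question's example.
df = pd.DataFrame({"id": [1, 1, 1, 3, 3, 3, 3],
                   "dollars": [6, 7, 6, 7, 4, 4, 3]})

# Chained version: rolling runs over the whole shifted Series,
# so with min_periods=2 the first row of id 3 picks up values from id 1.
flat = df.groupby("id")["dollars"].shift(1).rolling(3, min_periods=2).mean()

# Grouped version: shift and rolling are both applied per id.
df["lag"] = df.groupby("id")["dollars"].transform(
    lambda s: s.shift(1).rolling(window=3, min_periods=2).mean()
)
print(df)
```

Swapping .mean() for .sum() in the lambda should give the lagged rolling sum instead.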
I have searched the forums for a cleaner way to create a new column in a DataFrame that is the sum of each row with the previous row, the opposite of the .diff() function, which takes the difference.
This is how I'm currently solving the problem:
df = pd.DataFrame({'c':['dd','ee','ff', 'gg', 'hh'], 'd':[1,2,3,4,5]})
df['e']= df['d'].shift(-1)
df['f'] = df['d'] + df['e']
Your ideas are appreciated.
You can use rolling with a window size of 2 and sum:
df['f'] = df['d'].rolling(2).sum().shift(-1)
c d f
0 dd 1 3.0
1 ee 2 5.0
2 ff 3 7.0
3 gg 4 9.0
4 hh 5 NaN
df.cumsum()
Example:
data = {'a':[1,6,3,9,5], 'b':[13,1,2,5,23]}
df = pd.DataFrame(data)
df =
a b
0 1 13
1 6 1
2 3 2
3 9 5
4 5 23
df.diff()
a b
0 NaN NaN
1 5.0 -12.0
2 -3.0 1.0
3 6.0 3.0
4 -4.0 18.0
df.cumsum()
a b
0 1 13
1 7 14
2 10 16
3 19 21
4 24 44
If you cannot use rolling, because of a MultiIndex or for some other reason, you can try using .cumsum() and then .diff(2), which subtracts the .cumsum() result from two positions before.
data = {'a':[1,6,3,9,5,30, 101, 8]}
df = pd.DataFrame(data)
df['opp_diff'] = df['a'].cumsum().diff(2)
a opp_diff
0 1 NaN
1 6 NaN
2 3 9.0
3 9 12.0
4 5 14.0
5 30 35.0
6 101 131.0
7 8 109.0
Generally, .cumsum().diff(n+1) gives the sum of each value and the n values before it, which for n=1 is the opposite of .diff(). The issue is that the first n+1 results will be NaN.
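To see how the approaches in this thread relate, here is a short sketch on the same numbers. It compares a plain shift-and-add, rolling(2).sum(), and the cumsum/diff trick. The variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 6, 3, 9, 5, 30, 101, 8])

# Each value plus the one before it.
pairwise = s + s.shift(1)

# Same thing expressed as a rolling window of size 2.
via_rolling = s.rolling(2).sum()

# Running total minus the running total two rows earlier.
via_cumsum = s.cumsum().diff(2)

print(pd.DataFrame({"a": s, "pairwise": pairwise,
                    "rolling": via_rolling, "cumsum_diff": via_cumsum}))
```

The first two agree on every row. The cumsum version agrees from the third row on, but has one extra NaN at the start.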
Considering the following dataframe:
index group signal
1 1 1
2 1 NAN
3 1 NAN
4 1 -1
5 1 NAN
6 2 NAN
7 2 -1
8 2 NAN
9 3 NAN
10 3 NAN
11 3 NAN
12 4 1
13 4 NAN
14 4 NAN
I want to modify the signals by forward filling the NANs within each group, so that I get the following dataframe:
index group signal
1 1 1
2 1 1
3 1 1
4 1 -1
5 1 -1
6 2 NAN
7 2 -1
8 2 -1
9 3 NAN
10 3 NAN
11 3 NAN
12 4 1
13 4 1
14 4 1
The dataframe is big (around 800,000 rows with about 16,000 different groups). Currently I put it into a groupby object and try to modify each group there, which is very slow. Then I tried to convert it into a pivot_table and ffill() there, but the dataframe is simply too large and the program gives errors. Any suggestions? Thank you!
Can you try this:
data_group = data.groupby('group').apply(lambda v: v.ffill())
I think NAN is a string in your data. It's not an empty element; empty data will appear as NaN. If it is a string, replace NAN first, like this:
data_group = data.groupby('group').apply(lambda v: v.replace('NAN', float('nan')).ffill())
Or a better version, as Jeff suggested:
data['signal'] = data['signal'].replace('NAN', float('nan'))
data['signal'] = data.groupby('group')['signal'].ffill()
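Here is a small runnable sketch of the grouped forward fill on the first two groups from the question. It assumes the NAN entries really are strings, and it uses pd.to_numeric with errors="coerce" instead of replace to turn them into real NaN values. That is a swapped-in choice, not what was suggested above.

```python
import pandas as pd

# First two groups from the question, with "NAN" stored as a string.
data = pd.DataFrame({
    "group": [1, 1, 1, 1, 1, 2, 2, 2],
    "signal": [1, "NAN", "NAN", -1, "NAN", "NAN", -1, "NAN"],
})

# Turn anything non-numeric into a real NaN.
data["signal"] = pd.to_numeric(data["signal"], errors="coerce")

# Forward fill within each group only; a group's leading NaN stays NaN.
data["signal"] = data.groupby("group")["signal"].ffill()
print(data)
```

Assigning the result back to the signal column keeps the group column in the frame, and it avoids calling a Python function once per group.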