I have the following dataframe
0 0 0
1 0 0
1 1 0
1 1 1
1 1 1
0 0 0
0 1 0
0 1 0
0 0 0
how do you get a dataframe which looks like this
0 0 0
4 0 0
4 3 0
4 3 2
4 3 2
0 0 0
0 2 0
0 2 0
0 0 0
Thank you for your help.
You may need a for loop here, with transform: use cumsum to create the group key, then assign the counts back to the matching positions in your original df.
for x in df.columns:
    # key each non-zero run by the cumulative count of zeros in the column,
    # then overwrite the run's values with its length
    df.loc[df[x] != 0, x] = df[x].groupby(df[x].eq(0).cumsum()[df[x] != 0]).transform('count')
df
Out[229]:
1 2 3
0 0.0 0.0 0.0
1 4.0 0.0 0.0
2 4.0 3.0 0.0
3 4.0 3.0 2.0
4 4.0 3.0 2.0
5 0.0 0.0 0.0
6 0.0 2.0 0.0
7 0.0 2.0 0.0
8 0.0 0.0 0.0
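To see why the cumsum key works, here is a small sketch with a made-up single column: eq(0).cumsum() increments at every zero, so all values in one non-zero run share the key of the zero that starts it.

```python
import pandas as pd

# Made-up column to illustrate the grouping key used above.
col = pd.Series([0, 1, 1, 1, 1, 0, 0, 0, 0])

# eq(0).cumsum() increments at every zero, so each non-zero run
# shares one key value.
key = col.eq(0).cumsum()
print(key.tolist())  # [1, 1, 1, 1, 1, 2, 3, 4, 5]

# Counting within each key over the non-zero positions gives the run length.
counts = col[col != 0].groupby(key[col != 0]).transform('count')
print(counts.tolist())  # [4, 4, 4, 4]
```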
Or without a for loop:
# stack to long form and keep each column's values together
s = df.stack().sort_index(level=1)
# key the runs by the cumsum of zeros per column; each group contains the
# leading zero plus the run, so sub(1) leaves the run length
s2 = s.groupby([s.index.get_level_values(1), s.eq(0).cumsum()]).transform('count').sub(1).unstack()
# keep the original zeros and fill the non-zero positions from s2
df = df.mask(df != 0).combine_first(s2)
df
Out[255]:
1 2 3
0 0.0 0.0 0.0
1 4.0 0.0 0.0
2 4.0 3.0 0.0
3 4.0 3.0 2.0
4 4.0 3.0 2.0
5 0.0 0.0 0.0
6 0.0 2.0 0.0
7 0.0 2.0 0.0
8 0.0 0.0 0.0
I have the following two arrays: Data and Baseline
Data
Phase ENSO EJO
0 1 -1 2
1 1 0 2
2 1 1 2
3 2 -1 7
4 2 1 1
Baseline
Phase ENSO EJO
0 1 -1 0.0
1 1 0 0.0
2 1 1 0.0
3 2 -1 0.0
4 2 0 0.0
5 2 1 0.0
I want to alter the 'Data' data frame such that the "missing row" gets filled in by the Baseline data. Final result would look like this
Data
Phase ENSO EJO
0 1 -1 2
1 1 0 2
2 1 1 2
3 2 -1 7
4 2 0 0
5 2 1 1
You could reindex Data by Baseline indices and fill the missing values with Baseline values using fillna.
Data = Data.reindex(Baseline.index).fillna(Baseline)
Another option is to use combine_first:
Data = Data.combine_first(Baseline)
Output:
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
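A self-contained sketch of the combine_first alignment, with the frames rebuilt from the question: Data's values win wherever they exist, and the row missing from Data's index is pulled in from Baseline.

```python
import pandas as pd

# Frames rebuilt from the question for illustration.
Baseline = pd.DataFrame({'Phase': [1, 1, 1, 2, 2, 2],
                         'ENSO':  [-1, 0, 1, -1, 0, 1],
                         'EJO':   [0.0] * 6})
Data = pd.DataFrame({'Phase': [1, 1, 1, 2, 2],
                     'ENSO':  [-1, 0, 1, -1, 1],
                     'EJO':   [2, 2, 2, 7, 1]})

# combine_first aligns on the index: rows 0-4 come from Data,
# row 5 only exists in Baseline, so it is filled in from there.
out = Data.combine_first(Baseline)
print(out)
```

Note that the alignment is by row index, not by the Phase/ENSO values, which is why index 4 keeps Data's (2, 1, 1) row; the merge-based answer further down matches on the key columns instead.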
enke's method is great - here's a solution that preserves the original datatypes:
Data = pd.concat([Data, Baseline[~Baseline.index.isin(Data.index)]])
Output:
>>> Data
Phase ENSO EJO
0 1 -1 2.0
1 1 0 2.0
2 1 1 2.0
3 2 -1 7.0
4 2 1 1.0
5 2 1 0.0
Here's an ultra-short solution:
>>> tmp = baseline.copy()
>>> tmp.update(data)
>>> tmp
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
Or (Python 3.8+)
>>> (tmp := baseline.copy()).update(data)
>>> tmp
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
Simplest:
>>> baseline.update(data)
>>> baseline
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
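What update does, in a minimal made-up sketch: it modifies the caller in place, overwriting cells where the other frame has non-NaN values, aligning on index and columns; rows that exist only in the caller are kept untouched.

```python
import pandas as pd

# Minimal made-up frames: baseline has three rows, data only two.
baseline = pd.DataFrame({'EJO': [0.0, 0.0, 0.0]})
data = pd.DataFrame({'EJO': [7.0, 1.0]})   # only rows 0 and 1

tmp = baseline.copy()
tmp.update(data)                 # in place; rows 0-1 overwritten, row 2 kept
print(tmp['EJO'].tolist())       # [7.0, 1.0, 0.0]
```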
I made the assumption that you want to decide which rows need to be filled based on the values of the Phase and ENSO columns. Here is my solution based on that.
import pandas as pd
from io import StringIO
x_str = """Phase ENSO EJO
0 1 -1 2
1 1 0 2
2 1 1 2
3 2 -1 7
4 2 1 1"""
y_str = """Phase ENSO EJO
0 1 -1 0.0
1 1 0 0.0
2 1 1 0.0
3 2 -1 0.0
4 2 0 0.0
5 2 1 0.0"""
x = pd.read_csv(StringIO(x_str), sep=r"\s+")
y = pd.read_csv(StringIO(y_str), sep=r"\s+")
# Outer merge with indicator on
z = pd.merge(x, y, how='outer', on=['Phase', 'ENSO'], indicator=True)
rows_to_be_filled = z['_merge'].isin(['right_only'])
z.loc[rows_to_be_filled, 'EJO_x'] = z.loc[rows_to_be_filled, 'EJO_y']
# Cleanup
z = z.rename({'EJO_x': 'EJO'}, axis=1).drop(['EJO_y', '_merge'], axis=1)
z
# Phase ENSO EJO
# 0 1 -1 2.0
# 1 1 0 2.0
# 2 1 1 2.0
# 3 2 -1 7.0
# 4 2 1 1.0
# 5 2 0 0.0
I have the dataframe with a column.
A
0.0
0.0
0.0
12.0
0.0
0.0
34.0
0.0
0.0
0.0
0.0
11.0
I want the output like this, with a counter column. The counter should restart after each non-zero value: for the row after every non-zero value, the counter should be initialized again and then increment.
A Counter
0.0 1
0.0 2
0.0 3
12.0 4
0.0 1
0.0 2
34.0 3
0.0 1
0.0 2
0.0 3
0.0 4
11.0 5
Let us try cumsum to create the groupby key; the [::-1] reverses the order so that each group ends at a non-zero value.
df['Counter'] = df.A.groupby(df.A.ne(0)[::-1].cumsum()).cumcount()+1
Out[442]:
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 1
8 2
9 3
10 4
11 5
dtype: int64
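To see what the [::-1] buys, a sketch with the column from the question: a forward cumsum would start a new group at each non-zero value, while the reversed cumsum makes each group end at one, which is what the desired counter needs.

```python
import pandas as pd

A = pd.Series([0.0, 0.0, 0.0, 12.0, 0.0, 0.0, 34.0, 0.0, 0.0, 0.0, 0.0, 11.0])

# Reversed, the key increments at each non-zero value, so in the original
# order every group runs up to and including the next non-zero entry.
key = A.ne(0)[::-1].cumsum()
counter = A.groupby(key).cumcount() + 1
print(counter.tolist())  # [1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5]
```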
I have a dataframe that looks like the following:
df
0 1 2 3 4
0 0.0 NaN NaN NaN NaN
1 NaN 0.0 0.0 NaN 4.0
2 NaN 2.0 0.0 NaN 5.0
3 NaN NaN NaN 0.0 NaN
4 NaN 0.0 3.0 NaN 0.0
I would like to have a dataframe of all the couples of values different from NaN. The dataframe should be like the following
df
i j val
0 0 0 0.0
1 1 1 0.0
2 1 2 0.0
3 1 4 5.0
4 3 3 0.0
5 4 1 0.0
6 4 2 3.0
7 4 4 0.0
Use DataFrame.stack with DataFrame.rename_axis and DataFrame.reset_index:
df = df.stack().rename_axis(('i','j')).reset_index(name='val')
print (df)
i j val
0 0 0 0.0
1 1 1 0.0
2 1 2 0.0
3 1 4 4.0
4 2 1 2.0
5 2 2 0.0
6 2 4 5.0
7 3 3 0.0
8 4 1 0.0
9 4 2 3.0
10 4 4 0.0
Like this:
In [379]: df.stack().reset_index(name='val').rename(columns={'level_0':'i', 'level_1':'j'})
Out[379]:
i j val
0 0 0 0.0
1 1 1 0.0
2 1 2 0.0
3 1 4 4.0
4 2 1 2.0
5 2 2 0.0
6 2 4 5.0
7 3 3 0.0
8 4 1 0.0
9 4 2 3.0
10 4 4 0.0
I have a dataframe like below
age type days
1 a 1
2 b 3
2 b 4
3 a 5
4 b 2
6 c 1
7 f 0
7 d 4
10 e 2
14 a 1
First I would like to bin by age:
age
[0~4]
age type days
1 a 1
2 b 3
2 b 4
3 a 5
4 b 2
Then sum and count days, grouping by type:
sum count
a 6 2
b 9 3
c 0 0
d 0 0
e 0 0
f 0 0
Then I would like to apply this method to the other bins:
[5~9]
[10~14]
My desired result is below
[0~4] [5~9] [10~14]
sum count sum count sum count
a 6 2 0 0 1 1
b 9 3 0 0 0 0
c 0 0 1 1 0 0
d 0 0 4 1 0 0
e 0 0 0 0 2 1
f 0 0 0 1 0 0
How can this be done?
It is very complicated for me.
Consider pivot_table with pd.cut, if you do not care too much about column ordering: count and sum are not paired together under each bin. With some manipulation you can change that ordering.
df['bin'] = pd.cut(df.age, [0,4,9,14])
pvtdf = df.pivot_table(index='type', columns=['bin'], values='days',
aggfunc=('count', 'sum')).fillna(0)
# count sum
# bin (0, 4] (4, 9] (9, 14] (0, 4] (4, 9] (9, 14]
# type
# a 2.0 0.0 1.0 6.0 0.0 1.0
# b 3.0 0.0 0.0 9.0 0.0 0.0
# c 0.0 1.0 0.0 0.0 1.0 0.0
# d 0.0 1.0 0.0 0.0 4.0 0.0
# e 0.0 0.0 1.0 0.0 0.0 2.0
# f 0.0 1.0 0.0 0.0 0.0 0.0
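The column reordering hinted at above can be done by swapping the column levels and sorting. A sketch with the question's data (observed=False is an assumption to keep empty bins on newer pandas):

```python
import pandas as pd

# Data rebuilt from the question.
df = pd.DataFrame({'age':  [1, 2, 2, 3, 4, 6, 7, 7, 10, 14],
                   'type': list('abbabcfdea'),
                   'days': [1, 3, 4, 5, 2, 1, 0, 4, 2, 1]})
df['bin'] = pd.cut(df.age, [0, 4, 9, 14])

pvt = df.pivot_table(index='type', columns='bin', values='days',
                     aggfunc=['count', 'sum'], observed=False).fillna(0)

# Swap the (aggfunc, bin) column levels and sort by bin so that count
# and sum sit next to each other under each bin.
pvt = pvt.swaplevel(axis=1).sort_index(axis=1, level=0)
print(pvt)
```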
We'll use some stacking and groupby operations to get us to the desired output.
import io
import numpy as np
import pandas as pd

string_ = io.StringIO('''age type days
1 a 1
2 b 3
2 b 4
3 a 5
4 b 2
6 c 1
7 f 0
7 d 4
10 e 2
14 a 1''')
df = pd.read_csv(string_, sep=r'\s+')
df['age_bins'] = pd.cut(df['age'], [0,4,9,14])
# aggregate days twice (sum and count); aggregating the grouping column
# 'type' itself no longer works on recent pandas
df_stacked = (df.groupby(['age_bins', 'type'], observed=False)['days']
                .agg(['sum', 'count']).transpose().stack().fillna(0))
>>> df_stacked
age_bins (0, 4] (4, 9] (9, 14]
type
sum a 6.0 0.0 1.0
b 9.0 0.0 0.0
c 0.0 1.0 0.0
d 0.0 4.0 0.0
e 0.0 0.0 2.0
f 0.0 0.0 0.0
count a 2.0 0.0 1.0
b 3.0 0.0 0.0
c 0.0 1.0 0.0
d 0.0 1.0 0.0
e 0.0 0.0 1.0
f 0.0 1.0 0.0
This doesn't produce the exact output you listed, but it's similar, and I think it will be easier to index and retrieve data from. Alternatively, you could use the following to get something closer to the desired output.
>>> df_stacked.unstack(level=0)
age_bins (0, 4] (4, 9] (9, 14]
count sum count sum count sum
type
a 2.0 6.0 0.0 0.0 1.0 1.0
b 3.0 9.0 0.0 0.0 0.0 0.0
c 0.0 0.0 1.0 1.0 0.0 0.0
d 0.0 0.0 1.0 4.0 0.0 0.0
e 0.0 0.0 0.0 0.0 1.0 2.0
f 0.0 0.0 1.0 0.0 0.0 0.0
I have two dataframes:
dayData
power_comparison final_average_delta_power calculated_power
1 0.0 0.0 0
2 0.0 0.0 0
3 0.0 0.0 0
4 0.0 0.0 0
5 0.0 0.0 0
7 0.0 0.0 0
and
historicPower
power
0 0.0
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
I'm trying to reindex the historicPower dataframe to have the same shape as the dayData dataframe (so in this example it would look like this):
power
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
The dataframes in reality will be a lot larger, with different shapes.
I think you can use reindex if the index has no duplicates:
historicPower = historicPower.reindex(dayData.index)
print (historicPower)
power
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
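A self-contained sketch of the pattern with the question's data, plus the duplicate-index caveat mentioned above: reindex raises a ValueError when a requested label is duplicated in the source index (the duplicated frame here is made up).

```python
import pandas as pd

# Frames rebuilt from the question.
historicPower = pd.DataFrame({'power': [0.0, 0.0, 0.0, -1.0, 0.0, 1.0, 0.0]},
                             index=[0, 1, 2, 3, 4, 5, 7])
dayData_index = pd.Index([1, 2, 3, 4, 5, 7])

print(historicPower.reindex(dayData_index))

# With a duplicated label that is also being requested, reindex raises.
dup = historicPower.set_axis([1, 1, 2, 3, 4, 5, 7])
try:
    dup.reindex(dayData_index)
except ValueError as exc:
    print('reindex failed:', exc)
```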