I have a df with a MultiIndex [(latitude, longitude, time)], with the number of rows being 148 x 244 x 90 x 24. For each latitude and longitude, the time is hourly from 2014-01-01 00:00:00 to 2014-03-31 23:00:00.
FFDI isInRange
latitude longitude time
-39.20000 140.80000 2014-01-01 00:00:00 6.20000 True
2014-01-01 01:00:00 4.10000 True
2014-01-01 02:00:00 2.40000 True
2014-01-01 03:00:00 1.90000 True
2014-01-01 04:00:00 1.70000 True
2014-01-01 05:00:00 1.50000 True
2014-01-01 06:00:00 1.40000 True
2014-01-01 07:00:00 1.30000 True
2014-01-01 08:00:00 1.20000 True
2014-01-01 09:00:00 1.00000 True
2014-01-01 10:00:00 1.00000 True
2014-01-01 11:00:00 0.90000 True
2014-01-01 12:00:00 0.90000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
140.83786 2014-01-01 00:00:00 3.20000 True
2014-01-01 01:00:00 2.90000 True
2014-01-01 02:00:00 2.10000 True
2014-01-01 03:00:00 2.90000 True
2014-01-01 04:00:00 1.20000 True
2014-01-01 05:00:00 0.90000 True
2014-01-01 06:00:00 1.10000 True
2014-01-01 07:00:00 1.60000 True
2014-01-01 08:00:00 1.40000 True
2014-01-01 09:00:00 1.50000 True
2014-01-01 10:00:00 1.20000 True
2014-01-01 11:00:00 0.80000 True
2014-01-01 12:00:00 0.40000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
... ... ... ...
... ... ...
-33.90000 140.80000 2014-01-01 00:00:00 6.20000 True
2014-01-01 01:00:00 4.10000 True
2014-01-01 02:00:00 2.40000 True
2014-01-01 03:00:00 1.90000 True
2014-01-01 04:00:00 1.70000 True
2014-01-01 05:00:00 1.50000 True
2014-01-01 06:00:00 1.40000 True
2014-01-01 07:00:00 1.30000 True
2014-01-01 08:00:00 1.20000 True
2014-01-01 09:00:00 1.00000 True
2014-01-01 10:00:00 1.00000 True
2014-01-01 11:00:00 0.90000 True
2014-01-01 12:00:00 0.90000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
140.83786 2014-01-01 00:00:00 3.20000 True
2014-01-01 01:00:00 2.90000 True
2014-01-01 02:00:00 2.10000 True
2014-01-01 03:00:00 2.90000 True
2014-01-01 04:00:00 1.20000 True
2014-01-01 05:00:00 0.90000 True
2014-01-01 06:00:00 1.10000 True
2014-01-01 07:00:00 1.60000 True
2014-01-01 08:00:00 1.40000 True
2014-01-01 09:00:00 1.50000 True
2014-01-01 10:00:00 1.20000 True
2014-01-01 11:00:00 0.80000 True
2014-01-01 12:00:00 0.40000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
78001920 rows × 1 columns
What I want to achieve is to calculate a daily maximum FFDI value over every 24 hours for each latitude and longitude, subject to these conditions:
If isInRange = True for all 24 hours/rows in the group - use FFDI from 13:00:00 of the previous day to 12:00:00 of the current day
If isInRange = False for all 24 hours/rows in the group - use FFDI from 14:00:00 of the previous day to 13:00:00 of the current day
Then my code is:
df_daily_max = df.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI') if df['isInRange'] else isInRange.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI')
However, this line raised an error, because the if expression tries to evaluate a whole boolean Series as a single True/False value:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You can first filter all True rows and then all False rows, aggregate the max for each, join them with concat, sort the MultiIndex, and convert back to a DataFrame with Series.reset_index:
s1 = df[df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max()
s2 = df[~df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max()
df_daily_max = pd.concat([s1, s2]).sort_index().reset_index(name='Max FFDI')
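Note: pd.Grouper's base and loffset arguments were deprecated in pandas 1.1 and removed in 2.0. A minimal, untested sketch of the same grouping on modern pandas (offset replaces base; the label shift that loffset applied is done on the result's index afterwards):

g1 = pd.Grouper(freq='24H', offset='13H', label='right', level='time')
s1 = df[df['isInRange']].groupby(['latitude', 'longitude', g1])['FFDI'].max()
# replicate loffset='11H' by shifting the time level of the result
s1.index = s1.index.set_levels(s1.index.levels[-1] + pd.Timedelta('11H'), level='time')

The isInRange = False branch changes the same way, with offset='14H' and a 10-hour shift.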
How do I modify my code to have groupby return the previous day's min instead of the current day's min? Please see the desired output below, as this shows exactly what I am trying to achieve.
Data
import numpy as np
import pandas as pd

np.random.seed(5)
series = pd.Series(np.random.choice([1, 3, 5], 10), index=pd.date_range('2014-01-01', '2014-01-04', freq='8h'))
series
2014-01-01 00:00:00 5
2014-01-01 08:00:00 3
2014-01-01 16:00:00 5
2014-01-02 00:00:00 5
2014-01-02 08:00:00 1
2014-01-02 16:00:00 3
2014-01-03 00:00:00 1
2014-01-03 08:00:00 1
2014-01-03 16:00:00 5
2014-01-04 00:00:00 1
Output after groupby
series.groupby(series.index.date).transform(min)
2014-01-01 00:00:00 3
2014-01-01 08:00:00 3
2014-01-01 16:00:00 3
2014-01-02 00:00:00 1
2014-01-02 08:00:00 1
2014-01-02 16:00:00 1
2014-01-03 00:00:00 1
2014-01-03 08:00:00 1
2014-01-03 16:00:00 1
2014-01-04 00:00:00 1
Desired output (yesterday's min)
2014-01-01 00:00:00 NaN
2014-01-01 08:00:00 NaN
2014-01-01 16:00:00 NaN
2014-01-02 00:00:00 3
2014-01-02 08:00:00 3
2014-01-02 16:00:00 3
2014-01-03 00:00:00 1
2014-01-03 08:00:00 1
2014-01-03 16:00:00 1
2014-01-04 00:00:00 1
You can swap the index to just the date, calculate min per day, shift it and swap the original index back:
# Swap the index to just the date component
s = series.set_axis(series.index.date)
# Calculate the min per day, and shift it
t = s.groupby(level=0).min().shift()
# Final assembly
s[t.index] = t
s.index = series.index
Let us do a reindex:
series[:] = series.groupby(series.index.date).min().shift().reindex(series.index.date)
series
Out[370]:
2014-01-01 00:00:00 NaN
2014-01-01 08:00:00 NaN
2014-01-01 16:00:00 NaN
2014-01-02 00:00:00 3.0
2014-01-02 08:00:00 3.0
2014-01-02 16:00:00 3.0
2014-01-03 00:00:00 1.0
2014-01-03 08:00:00 1.0
2014-01-03 16:00:00 1.0
2014-01-04 00:00:00 1.0
Freq: 8H, dtype: float64
How do I modify my code to have the pandas rolling window reset each day? Please see the desired output below, as this shows exactly what I am trying to achieve.
I think I may need to use groupby to get the same result, but I am unsure how to progress.
Data
np.random.seed(5)
series = pd.Series(np.random.choice([1,3,5], 10), index = pd.date_range('2014-01-01', '2014-01-04', freq = '8h'))
series
2014-01-01 00:00:00 5
2014-01-01 08:00:00 3
2014-01-01 16:00:00 5
2014-01-02 00:00:00 5
2014-01-02 08:00:00 1
2014-01-02 16:00:00 3
2014-01-03 00:00:00 1
2014-01-03 08:00:00 1
2014-01-03 16:00:00 5
2014-01-04 00:00:00 1
Output after pandas rolling
series.rolling('D', min_periods=1).min()
2014-01-01 00:00:00 5.0
2014-01-01 08:00:00 3.0
2014-01-01 16:00:00 3.0
2014-01-02 00:00:00 3.0
2014-01-02 08:00:00 1.0
2014-01-02 16:00:00 1.0
2014-01-03 00:00:00 1.0
2014-01-03 08:00:00 1.0
2014-01-03 16:00:00 1.0
2014-01-04 00:00:00 1.0
Desired output (reset each day)
I can get the desired output like this but want to avoid looping:
series_list = []
for i in set(series.index.date):
    series_list.append(series.loc[str(i)].rolling('D', min_periods=1).min())
pd.concat(series_list).sort_index()
2014-01-01 00:00:00 5.0
2014-01-01 08:00:00 3.0
2014-01-01 16:00:00 3.0
2014-01-02 00:00:00 5.0
2014-01-02 08:00:00 1.0
2014-01-02 16:00:00 1.0
2014-01-03 00:00:00 1.0
2014-01-03 08:00:00 1.0
2014-01-03 16:00:00 1.0
2014-01-04 00:00:00 1.0
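A cumulative minimum within each calendar-day group restarts at every day boundary, which gives the same result without the loop: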
series.groupby(series.index.date).cummin()
Output:
2014-01-01 00:00:00 5
2014-01-01 08:00:00 3
2014-01-01 16:00:00 3
2014-01-02 00:00:00 5
2014-01-02 08:00:00 1
2014-01-02 16:00:00 1
2014-01-03 00:00:00 1
2014-01-03 08:00:00 1
2014-01-03 16:00:00 1
2014-01-04 00:00:00 1
Freq: 8H, dtype: int64
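One hedged footnote: the loop version returns floats while cummin keeps the original int dtype; if that matters, an explicit cast reproduces the rolling output exactly:

series.groupby(series.index.date).cummin().astype(float)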
Unable to extract individual values from a column (Week), but a single value works.
u = eurusd.loc[eurusd['Local time'] == pd.to_datetime("2014-01-08 03:00:00",format="%Y-%m-%d %H:%M:%S")].Close
print(u)
output:
70275 1.36075
Name: Close, dtype: float64
But when I try this:
u = eurusd.loc[eurusd['Local time'] == pd.to_datetime(eurusd['Week'],format="%Y-%m-%d %H:%M:%S")].Close
print(u)
output:
Series([], Name: Close, dtype: float64)
I also tried doing the same task with the apply method, but it seems to just compare the columns row by row, rather than checking each value against the whole column:
eurusd['ResultClose'] = eurusd.apply(lambda eurusd: eurusd if eurusd['Local time'] == "2014-01-08 03:00:00" else np.nan,axis=1)
To double-check the code:
eurusd.isnull().sum()
Output (shows that no values were inserted in the column):
Local time 0
Close 0
ResultClose 8760
dtype: int64
The tables below give a visual of what I am trying to achieve.
Initial table
Local time           Close    Week
2014-01-01 00:00:00  1.37410  2014-01-08 00:00:00
2014-01-01 01:00:00  1.37410  2014-01-08 01:00:00
2014-01-01 02:00:00  1.37410  2014-01-08 02:00:00
2014-01-08 03:00:00  1.36075  2014-03-08 02:00:00
Final table
Local time           Close    Week                 ResultClose
2014-01-01 00:00:00  1.37410  2014-01-08 00:00:00  1.36075
2014-01-01 01:00:00  1.37410  2014-01-08 01:00:00  .
2014-01-01 02:00:00  1.37410  2014-01-08 02:00:00  .
2014-01-08 03:00:00  1.36075  2014-03-08 02:00:00  .
First, convert 'Local time' and 'Week' to datetime dtype using the to_datetime() method:
eurusd['Local time'] = pd.to_datetime(eurusd['Local time'])
eurusd['Week'] = pd.to_datetime(eurusd['Week'])
Now use boolean masking with the between() method:
# keep the Close values whose 'Local time' falls between the first and last 'Week' timestamps
mask = eurusd['Local time'].between(eurusd.loc[0, 'Week'], eurusd.loc[len(eurusd) - 1, 'Week'])
value = eurusd.loc[mask, 'Close'].reset_index(drop=True)
Finally use the assign() method:
eurusd = eurusd.assign(ResultClose=value)
Now if you print eurusd you will get your desired output:
Local time Close Week ResultClose
0 2014-01-01 00:00:00 1.37410 2014-01-08 00:00:00 1.36075
1 2014-01-01 01:00:00 1.37410 2014-01-08 01:00:00 NaN
2 2014-01-01 02:00:00 1.37410 2014-01-08 02:00:00 NaN
3 2014-01-08 03:00:00 1.36075 2014-03-08 02:00:00 NaN
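A different reading of the original attempt (a hedged sketch, not part of the answer above) is a lookup: for each timestamp in Week, fetch the Close recorded at that Local time. A left merge sketches that:

# hypothetical alternative: treat 'Week' as a key into the 'Local time' -> 'Close' pairs
lookup = eurusd[['Local time', 'Close']].rename(columns={'Local time': 'Week', 'Close': 'ResultClose'})
eurusd = eurusd.merge(lookup, on='Week', how='left')

Rows whose Week timestamp never appears in Local time get NaN.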
I have two dataframes which are datetime-indexed. One is missing a few of these datetimes (df1), while the other is complete (regular timestamps without any gaps in the series) and is full of NaNs (df2).
I'm trying to match the values from df1 to the index of df2, filling with NaN's where such a datetimeindex doesn't exist in df1.
Example:
In [51]: df1
Out [51]: value
2015-01-01 14:00:00 20
2015-01-01 15:00:00 29
2015-01-01 16:00:00 41
2015-01-01 17:00:00 43
2015-01-01 18:00:00 26
2015-01-01 19:00:00 20
2015-01-01 20:00:00 31
2015-01-01 21:00:00 35
2015-01-01 22:00:00 39
2015-01-01 23:00:00 17
2015-03-01 00:00:00 6
2015-03-01 01:00:00 37
2015-03-01 02:00:00 56
2015-03-01 03:00:00 12
2015-03-01 04:00:00 41
2015-03-01 05:00:00 31
... ...
2018-12-25 23:00:00 41
<34843 rows × 1 columns>
In [52]: df2 = pd.DataFrame(data=None, index=pd.date_range(freq='60Min', start=df1.index.min(), end=df1.index.max()))
df2['value'] = np.nan
df2
Out [52]: value
2015-01-01 14:00:00 NaN
2015-01-01 15:00:00 NaN
2015-01-01 16:00:00 NaN
2015-01-01 17:00:00 NaN
2015-01-01 18:00:00 NaN
2015-01-01 19:00:00 NaN
2015-01-01 20:00:00 NaN
2015-01-01 21:00:00 NaN
2015-01-01 22:00:00 NaN
2015-01-01 23:00:00 NaN
2015-01-02 00:00:00 NaN
2015-01-02 01:00:00 NaN
2015-01-02 02:00:00 NaN
2015-01-02 03:00:00 NaN
2015-01-02 04:00:00 NaN
2015-01-02 05:00:00 NaN
... ...
2018-12-25 23:00:00 NaN
<34906 rows × 1 columns>
Using df2.combine_first(df1) returns the same data as df1.reindex(index=df2.index), which fills the gaps where there shouldn't be data with some value, instead of NaN.
In [53]: Result = df2.combine_first(df1)
Result
Out [53]: value
2015-01-01 14:00:00 20
2015-01-01 15:00:00 29
2015-01-01 16:00:00 41
2015-01-01 17:00:00 43
2015-01-01 18:00:00 26
2015-01-01 19:00:00 20
2015-01-01 20:00:00 31
2015-01-01 21:00:00 35
2015-01-01 22:00:00 39
2015-01-01 23:00:00 17
2015-01-02 00:00:00 35
2015-01-02 01:00:00 53
2015-01-02 02:00:00 28
2015-01-02 03:00:00 48
2015-01-02 04:00:00 42
2015-01-02 05:00:00 51
... ...
2018-12-25 23:00:00 41
<34906 rows × 1 columns>
This is what I was hoping to get:
Out [53]: value
2015-01-01 14:00:00 20
2015-01-01 15:00:00 29
2015-01-01 16:00:00 41
2015-01-01 17:00:00 43
2015-01-01 18:00:00 26
2015-01-01 19:00:00 20
2015-01-01 20:00:00 31
2015-01-01 21:00:00 35
2015-01-01 22:00:00 39
2015-01-01 23:00:00 17
2015-01-02 00:00:00 NaN
2015-01-02 01:00:00 NaN
2015-01-02 02:00:00 NaN
2015-01-02 03:00:00 NaN
2015-01-02 04:00:00 NaN
2015-01-02 05:00:00 NaN
... ...
2018-12-25 23:00:00 41
<34906 rows × 1 columns>
Could someone shed some light on why this is happening, and how to set how these values are filled?
IIUC you need to resample df1, because you have an irregular frequency and you need a regular frequency:
print df1.index.freq
None
print Result.index.freq
<60 * Minutes>
EDIT1
You can use the function asfreq instead of resample - see the asfreq docs and the resample vs asfreq comparison.
EDIT2
At first I thought that resample didn't work, because after resampling the Result looked the same as df1. But printing df1.info() and Result.info() gives different results: 34857 entries vs 34920 entries.
So I searched for rows with NaN values, and it returned 63 rows.
So I think resample works well.
import pandas as pd
df1 = pd.read_csv('test/GapInTimestamps.csv', sep=",", index_col=[0], parse_dates=[0])
print df1.head()
# value
#Date/Time
#2015-01-01 00:00:00 52
#2015-01-01 01:00:00 5
#2015-01-01 02:00:00 12
#2015-01-01 03:00:00 54
#2015-01-01 04:00:00 47
print df1.info()
#<class 'pandas.core.frame.DataFrame'>
#DatetimeIndex: 34857 entries, 2015-01-01 00:00:00 to 2018-12-25 23:00:00
#Data columns (total 1 columns):
#value 34857 non-null int64
#dtypes: int64(1)
#memory usage: 544.6 KB
#None
Result = df1.resample('60min')
print Result.head()
# value
#Date/Time
#2015-01-01 00:00:00 52
#2015-01-01 01:00:00 5
#2015-01-01 02:00:00 12
#2015-01-01 03:00:00 54
#2015-01-01 04:00:00 47
print Result.info()
#<class 'pandas.core.frame.DataFrame'>
#DatetimeIndex: 34920 entries, 2015-01-01 00:00:00 to 2018-12-25 23:00:00
#Freq: 60T
#Data columns (total 1 columns):
#value 34857 non-null float64
#dtypes: float64(1)
#memory usage: 545.6 KB
#None
#find values with NaN
resultnan = Result[Result.isnull().any(axis=1)]
#temporarily display 999 rows and 15 columns
with pd.option_context('display.max_rows', 999, 'display.max_columns', 15):
    print resultnan
# value
#Date/Time
#2015-01-13 19:00:00 NaN
#2015-01-13 20:00:00 NaN
#2015-01-13 21:00:00 NaN
#2015-01-13 22:00:00 NaN
#2015-01-13 23:00:00 NaN
#2015-01-14 00:00:00 NaN
#2015-01-14 01:00:00 NaN
#2015-01-14 02:00:00 NaN
#2015-01-14 03:00:00 NaN
#2015-01-14 04:00:00 NaN
#2015-01-14 05:00:00 NaN
#2015-01-14 06:00:00 NaN
#2015-01-14 07:00:00 NaN
#2015-01-14 08:00:00 NaN
#2015-01-14 09:00:00 NaN
#2015-02-01 00:00:00 NaN
#2015-02-01 01:00:00 NaN
#2015-02-01 02:00:00 NaN
#2015-02-01 03:00:00 NaN
#2015-02-01 04:00:00 NaN
#2015-02-01 05:00:00 NaN
#2015-02-01 06:00:00 NaN
#2015-02-01 07:00:00 NaN
#2015-02-01 08:00:00 NaN
#2015-02-01 09:00:00 NaN
#2015-02-01 10:00:00 NaN
#2015-02-01 11:00:00 NaN
#2015-02-01 12:00:00 NaN
#2015-02-01 13:00:00 NaN
#2015-02-01 14:00:00 NaN
#2015-02-01 15:00:00 NaN
#2015-02-01 16:00:00 NaN
#2015-02-01 17:00:00 NaN
#2015-02-01 18:00:00 NaN
#2015-02-01 19:00:00 NaN
#2015-02-01 20:00:00 NaN
#2015-02-01 21:00:00 NaN
#2015-02-01 22:00:00 NaN
#2015-02-01 23:00:00 NaN
#2015-11-01 00:00:00 NaN
#2015-11-01 01:00:00 NaN
#2015-11-01 02:00:00 NaN
#2015-11-01 03:00:00 NaN
#2015-11-01 04:00:00 NaN
#2015-11-01 05:00:00 NaN
#2015-11-01 06:00:00 NaN
#2015-11-01 07:00:00 NaN
#2015-11-01 08:00:00 NaN
#2015-11-01 09:00:00 NaN
#2015-11-01 10:00:00 NaN
#2015-11-01 11:00:00 NaN
#2015-11-01 12:00:00 NaN
#2015-11-01 13:00:00 NaN
#2015-11-01 14:00:00 NaN
#2015-11-01 15:00:00 NaN
#2015-11-01 16:00:00 NaN
#2015-11-01 17:00:00 NaN
#2015-11-01 18:00:00 NaN
#2015-11-01 19:00:00 NaN
#2015-11-01 20:00:00 NaN
#2015-11-01 21:00:00 NaN
#2015-11-01 22:00:00 NaN
#2015-11-01 23:00:00 NaN
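For reference, the session above is Python 2 with an old pandas where resample('60min') returned a DataFrame directly. A minimal sketch of the same check on current pandas (assuming the same CSV layout as above):

import pandas as pd

df1 = pd.read_csv('test/GapInTimestamps.csv', sep=',', index_col=[0], parse_dates=[0])
# asfreq() regularises the index, inserting NaN where timestamps are missing
result = df1.resample('60min').asfreq()
print(result[result['value'].isnull()])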
I am looking to apply some deviation at a monthly granularity to the structure of a dataframe and then recast it onto the initial dataframe. First I do a groupby and aggregate; this part works well. Then I reindex and get NaN. I want the reindexing to be done by matching the month of each groupby element with the initial dataframe.
I also want to be able to do this operation at different granularities (yearly -> month & year, ...).
Does someone have a solution to this problem?
>>> df['profile']
date
2015-01-01 00:00:00 3.000000
2015-01-01 01:00:00 3.000143
2015-01-01 02:00:00 3.000287
2015-01-01 03:00:00 3.000430
2015-01-01 04:00:00 3.000574
...
2015-12-31 20:00:00 2.999426
2015-12-31 21:00:00 2.999570
2015-12-31 22:00:00 2.999713
2015-12-31 23:00:00 2.999857
Freq: H, Name: profile, Length: 8760
### Deviation on monthly basis
>>> dev_monthly = np.random.uniform(0.5, 1.5, len(df['profile'].groupby(df.index.month).aggregate(np.sum)))
>>> df['profile_monthly'] = (df['profile'].groupby(df.index.month).aggregate(np.sum) * dev_monthly).reindex(df)
>>> df['profile_monthly']
date
2015-01-01 00:00:00 NaN
2015-01-01 01:00:00 NaN
2015-01-01 02:00:00 NaN
...
2015-12-31 22:00:00 NaN
2015-12-31 23:00:00 NaN
Freq: H, Name: profile_monthly, Length: 8760
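(The all-NaN column comes from the reindex: the aggregated series is indexed by month number 1-12, so it cannot align with df's hourly DatetimeIndex.)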
Check out the documentation for resampling.
You're looking for resample followed by fillna with method='bfill':
In [105]: df = DataFrame({'profile': normal(3, 0.1, size=10000)}, pd.date_range(start='2015-01-01', freq='H', periods=10000))
In [106]: df['profile_monthly'] = df.profile.resample('M', how='sum')
In [107]: df
Out[107]:
profile profile_monthly
2015-01-01 00:00:00 2.8328 NaN
2015-01-01 01:00:00 3.0607 NaN
2015-01-01 02:00:00 3.0138 NaN
2015-01-01 03:00:00 3.0402 NaN
2015-01-01 04:00:00 3.0335 NaN
2015-01-01 05:00:00 3.0087 NaN
2015-01-01 06:00:00 3.0557 NaN
2015-01-01 07:00:00 2.9280 NaN
2015-01-01 08:00:00 3.1359 NaN
2015-01-01 09:00:00 2.9681 NaN
2015-01-01 10:00:00 3.1240 NaN
2015-01-01 11:00:00 3.0635 NaN
2015-01-01 12:00:00 2.9206 NaN
2015-01-01 13:00:00 3.0714 NaN
2015-01-01 14:00:00 3.0688 NaN
2015-01-01 15:00:00 3.0703 NaN
2015-01-01 16:00:00 2.9102 NaN
2015-01-01 17:00:00 2.9368 NaN
2015-01-01 18:00:00 3.0864 NaN
2015-01-01 19:00:00 3.2124 NaN
2015-01-01 20:00:00 2.8988 NaN
2015-01-01 21:00:00 3.0659 NaN
2015-01-01 22:00:00 2.7973 NaN
2015-01-01 23:00:00 3.0824 NaN
2015-01-02 00:00:00 3.0199 NaN
... ...
[10000 rows x 2 columns]
In [108]: df.dropna()
Out[108]:
profile profile_monthly
2015-01-31 2.9769 2230.9931
2015-02-28 2.9930 2016.1045
2015-03-31 2.7817 2232.4096
2015-04-30 3.1695 2158.7834
2015-05-31 2.9040 2236.5962
2015-06-30 2.8697 2162.7784
2015-07-31 2.9278 2231.7232
2015-08-31 2.8289 2236.4603
2015-09-30 3.0368 2163.5916
2015-10-31 3.1517 2233.2285
2015-11-30 3.0450 2158.6998
2015-12-31 2.8261 2228.5550
2016-01-31 3.0264 2229.2221
[13 rows x 2 columns]
In [110]: df.fillna(method='bfill')
Out[110]:
profile profile_monthly
2015-01-01 00:00:00 2.8328 2230.9931
2015-01-01 01:00:00 3.0607 2230.9931
2015-01-01 02:00:00 3.0138 2230.9931
2015-01-01 03:00:00 3.0402 2230.9931
2015-01-01 04:00:00 3.0335 2230.9931
2015-01-01 05:00:00 3.0087 2230.9931
2015-01-01 06:00:00 3.0557 2230.9931
2015-01-01 07:00:00 2.9280 2230.9931
2015-01-01 08:00:00 3.1359 2230.9931
2015-01-01 09:00:00 2.9681 2230.9931
2015-01-01 10:00:00 3.1240 2230.9931
2015-01-01 11:00:00 3.0635 2230.9931
2015-01-01 12:00:00 2.9206 2230.9931
2015-01-01 13:00:00 3.0714 2230.9931
2015-01-01 14:00:00 3.0688 2230.9931
2015-01-01 15:00:00 3.0703 2230.9931
2015-01-01 16:00:00 2.9102 2230.9931
2015-01-01 17:00:00 2.9368 2230.9931
2015-01-01 18:00:00 3.0864 2230.9931
2015-01-01 19:00:00 3.2124 2230.9931
2015-01-01 20:00:00 2.8988 2230.9931
2015-01-01 21:00:00 3.0659 2230.9931
2015-01-01 22:00:00 2.7973 2230.9931
2015-01-01 23:00:00 3.0824 2230.9931
2015-01-02 00:00:00 3.0199 2230.9931
... ...
[10000 rows x 2 columns]
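(A hedged aside, not part of the original answer: on current pandas the how= keyword has been removed, so the equivalent of In [106] is the method form:

df['profile_monthly'] = df.profile.resample('M').sum()

The rest of the session is unchanged.)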
When I use your code, I don't get the same value for 2015-12-31 00:00:00 and 2015-12-31 01:00:00, as you can see below:
>>> df.fillna(method='bfill')[np.logical_and(df.index.month==12, df.index.day==31)]
profile profile_monthly
2015-12-31 00:00:00 2.926504 2232.288997
2015-12-31 01:00:00 3.008543 2234.470731
2015-12-31 02:00:00 2.930133 2234.470731
2015-12-31 03:00:00 3.078552 2234.470731
2015-12-31 04:00:00 3.141578 2234.470731
2015-12-31 05:00:00 3.061820 2234.470731
2015-12-31 06:00:00 2.981626 2234.470731
2015-12-31 07:00:00 3.010749 2234.470731
2015-12-31 08:00:00 2.878577 2234.470731
2015-12-31 09:00:00 2.915487 2234.470731
2015-12-31 10:00:00 3.072721 2234.470731
2015-12-31 11:00:00 3.087866 2234.470731
2015-12-31 12:00:00 3.089208 2234.470731
2015-12-31 13:00:00 2.957047 2234.470731
2015-12-31 14:00:00 3.002072 2234.470731
2015-12-31 15:00:00 3.106656 2234.470731
2015-12-31 16:00:00 3.100891 2234.470731
2015-12-31 17:00:00 3.077835 2234.470731
2015-12-31 18:00:00 3.032497 2234.470731
2015-12-31 19:00:00 2.959838 2234.470731
2015-12-31 20:00:00 2.878819 2234.470731
2015-12-31 21:00:00 3.041171 2234.470731
2015-12-31 22:00:00 3.061970 2234.470731
2015-12-31 23:00:00 3.019011 2234.470731
[24 rows x 2 columns]
So I finally found the following solution:
>>> AA = df.groupby([df.index.year, df.index.month]).aggregate(np.mean)
>>> AA['dev'] = np.random.normal(0, 1, len(AA))
>>> df['dev'] = AA.loc[list(zip(df.index.year, df.index.month)), 'dev'].values
Short and fast. The only remaining question is:
=> How to deal with other granularities (half-year, quarter, week, ...)?
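A hedged sketch of one way to generalise (my own illustration, not from the thread; recast_deviation is a hypothetical helper): group on a PeriodIndex, since to_period accepts monthly, quarterly, weekly, ... frequencies:

import numpy as np
import pandas as pd

df = pd.DataFrame({'profile': np.random.normal(3, 0.1, size=8760)},
                  index=pd.date_range('2015-01-01', freq='H', periods=8760))

def recast_deviation(frame, freq):
    # one random deviation per period at the chosen granularity
    # ('M' monthly, 'Q' quarterly, 'W' weekly, ...), broadcast back
    # to every row falling inside that period
    periods = frame.index.to_period(freq)
    dev = pd.Series(np.random.normal(0, 1, periods.nunique()), index=periods.unique())
    return periods.map(dev)

df['dev'] = recast_deviation(df, 'Q')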