Convert pandas dataframe of lists to dict of dataframes - python

I have a dataframe (with a DateTime index), in which some of the columns contain lists, each with 6 elements.
In: dframe.head()
Out:
A B \
timestamp
2017-05-01 00:32:25 30 [-3512, 375, -1025, -358, -1296, -4019]
2017-05-01 00:32:55 30 [-3519, 372, -1026, -361, -1302, -4020]
2017-05-01 00:33:25 30 [-3514, 371, -1026, -360, -1297, -4018]
2017-05-01 00:33:55 30 [-3517, 377, -1030, -363, -1293, -4027]
2017-05-01 00:34:25 30 [-3515, 372, -1033, -361, -1299, -4025]
C D
timestamp
2017-05-01 00:32:25 [1104, 1643, 625, 1374, 5414, 2066] 49.93
2017-05-01 00:32:55 [1106, 1643, 622, 1385, 5441, 2074] 49.94
2017-05-01 00:33:25 [1105, 1643, 623, 1373, 5445, 2074] 49.91
2017-05-01 00:33:55 [1105, 1646, 620, 1384, 5438, 2076] 49.91
2017-05-01 00:34:25 [1104, 1645, 613, 1374, 5431, 2082] 49.94
I have a dictionary dict_of_dfs which I want to populate with 6 dataframes,
dict_of_dfs = {1: df1, 2:df2, 3:df3, 4:df4, 5:df5, 6:df6}
where the ith dataframe contains the ith items from each list, so the first dataframe in the dict will be:
In:df1
Out:
A B C D
timestamp
2017-05-01 00:32:25 30 -3512 1104 49.93
2017-05-01 00:32:55 30 -3519 1106 49.94
2017-05-01 00:33:25 30 -3514 1105 49.91
2017-05-01 00:33:55 30 -3517 1105 49.91
2017-05-01 00:34:25 30 -3515 1104 49.94
and so-on.
The actual dataframe has more columns than this and thousands of rows.
What's the simplest, most pythonic way to make the conversion?

You can use a dict comprehension with assign, selecting values from the lists with .str[0], .str[1], etc.:
N = 6
dfs = {i:df.assign(B=df['B'].str[i-1], C=df['C'].str[i-1]) for i in range(1,N + 1)}
print(dfs[1])
timestamp A B C D
0 2017-05-01 00:32:25 30 -3512 1104 49.93
1 2017-05-01 00:32:55 30 -3519 1106 49.94
2 2017-05-01 00:33:25 30 -3514 1105 49.91
3 2017-05-01 00:33:55 30 -3517 1105 49.91
4 2017-05-01 00:34:25 30 -3515 1104 49.94
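If you'd rather not name each list column by hand, a small variation (a sketch; list_cols is a helper name introduced here) detects the list columns once and builds the same dict:
list_cols = [c for c in df.columns if isinstance(df[c].iat[0], list)]
dfs = {i: df.assign(**{c: df[c].str[i-1] for c in list_cols}) for i in range(1, N + 1)}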
Another solution:
dfs = {i:df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1,7)}
print(dfs[1])
timestamp A B C D
0 2017-05-01 00:32:25 30 -3512 1104 49.93
1 2017-05-01 00:32:55 30 -3519 1106 49.94
2 2017-05-01 00:33:25 30 -3514 1105 49.91
3 2017-05-01 00:33:55 30 -3517 1105 49.91
4 2017-05-01 00:34:25 30 -3515 1104 49.94
Timings:
df = pd.concat([df]*10000).reset_index(drop=True)
In [185]: %timeit {i:df.assign(B=df['B'].str[i-1], C=df['C'].str[i-1]) for i in range(1,N+1)}
1 loop, best of 3: 420 ms per loop
In [186]: %timeit {i:df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1,7)}
1 loop, best of 3: 447 ms per loop
In [187]: %timeit {(i+1):df.applymap(lambda x: x[i] if type(x) == list else x) for i in range(6)}
1 loop, best of 3: 881 ms per loop

Setup
import pandas as pd

df = pd.DataFrame({'A': {'2017-05-01 00:32:25': 30,
                         '2017-05-01 00:32:55': 30,
                         '2017-05-01 00:33:25': 30,
                         '2017-05-01 00:33:55': 30,
                         '2017-05-01 00:34:25': 30},
                   'B': {'2017-05-01 00:32:25': [-3512, 375, -1025, -358, -1296, -4019],
                         '2017-05-01 00:32:55': [-3519, 372, -1026, -361, -1302, -4020],
                         '2017-05-01 00:33:25': [-3514, 371, -1026, -360, -1297, -4018],
                         '2017-05-01 00:33:55': [-3517, 377, -1030, -363, -1293, -4027],
                         '2017-05-01 00:34:25': [-3515, 372, -1033, -361, -1299, -4025]},
                   'C': {'2017-05-01 00:32:25': [1104, 1643, 625, 1374, 5414, 2066],
                         '2017-05-01 00:32:55': [1106, 1643, 622, 1385, 5441, 2074],
                         '2017-05-01 00:33:25': [1105, 1643, 623, 1373, 5445, 2074],
                         '2017-05-01 00:33:55': [1105, 1646, 620, 1384, 5438, 2076],
                         '2017-05-01 00:34:25': [1104, 1645, 613, 1374, 5431, 2082]},
                   'D': {'2017-05-01 00:32:25': 49.93,
                         '2017-05-01 00:32:55': 49.94,
                         '2017-05-01 00:33:25': 49.91,
                         '2017-05-01 00:33:55': 49.91,
                         '2017-05-01 00:34:25': 49.94}})
Solution
Construct the dict of dataframes with a dict comprehension. Each sub-dataframe is generated with applymap, which replaces every cell that holds a list with its ith element:
dict_of_dfs = {(i+1):df.applymap(lambda x: x[i] if type(x) == list else x) for i in range(6)}
print(dict_of_dfs[1])
A B C D
2017-05-01 00:32:25 30 -3512 1104 49.93
2017-05-01 00:32:55 30 -3519 1106 49.94
2017-05-01 00:33:25 30 -3514 1105 49.91
2017-05-01 00:33:55 30 -3517 1105 49.91
2017-05-01 00:34:25 30 -3515 1104 49.94
print(dict_of_dfs[2])
A B C D
2017-05-01 00:32:25 30 375 1643 49.93
2017-05-01 00:32:55 30 372 1643 49.94
2017-05-01 00:33:25 30 371 1643 49.91
2017-05-01 00:33:55 30 377 1646 49.91
2017-05-01 00:34:25 30 372 1645 49.94
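A hedged note for newer pandas: applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so on recent versions the same idea reads:
dict_of_dfs = {(i+1): df.map(lambda x: x[i] if isinstance(x, list) else x) for i in range(6)}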

Related

How can I split my data into chunks of 7 days each

I want to group every 7 days together. The problem is that the first date falls on a Wednesday, and I want my weeks to start on Monday and end on Sunday without dropping any data. Even the last date in my data falls on a Monday. This is how my data looks now:
date bike_numbers
0 2017-06-28 632
1 2017-06-29 1019
2 2017-06-30 1038
3 2017-07-01 475
4 2017-07-02 523
... ... ...
550 2018-12-30 2653
551 2018-12-31 3044
I want just the bike_numbers values, grouped into arrays of 7. I want it to look like this:
[632, 1019, 1038, 475, 523, 600, 558][1103, 1277,1126, 956, 433, 1347, 1506]... and so on till the last date
Use:
s = df.groupby(df.index // 7)['bike_numbers'].agg(list)
print (s)
0 [632, 1019, 1038, 475, 523]
78 [2653, 3044]
Name: bike_numbers, dtype: object
print (s.tolist())
[[632, 1019, 1038, 475, 523], [2653, 3044]]
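If the index is not a clean 0..n-1 RangeIndex (for example after filtering rows), the index // 7 trick would misgroup; a position-based sketch avoids relying on the index:
import numpy as np

s = df.groupby(np.arange(len(df)) // 7)['bike_numbers'].agg(list)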

one to one column-value comparison between 2 dataframes - pandas

I have 2 dataframes:
print(d)
Year Salary Amount Amount1 Amount2
0 2019 1200 53 53 53
1 2020 3443 455 455 455
2 2021 6777 123 123 123
3 2019 5466 313 313 313
4 2020 4656 545 545 545
5 2021 4565 775 775 775
6 2019 4654 567 567 567
7 2020 7867 657 657 657
8 2021 6766 567 567 567
print(d1)
Year Salary Amount Amount1 Amount2
0 2019 1200 53 73 63
import pandas as pd
d = pd.DataFrame({
    'Year': [2019, 2020, 2021] * 3,
    'Salary': [1200, 3443, 6777, 5466, 4656, 4565, 4654, 7867, 6766],
    'Amount': [53, 455, 123, 313, 545, 775, 567, 657, 567],
    'Amount1': [53, 455, 123, 313, 545, 775, 567, 657, 567],
    'Amount2': [53, 455, 123, 313, 545, 775, 567, 657, 567]
})
d1 = pd.DataFrame({
    'Year': [2019],
    'Salary': [1200],
    'Amount': [53],
    'Amount1': [73],
    'Amount2': [63]
})
I want to compare the 'Salary' value of dataframe d1 (i.e. 1200) with all of the 'Salary' values in dataframe d and count how many satisfy a Boolean comparison (>= or <). This should be done for all the columns (Amount, Amount1, Amount2, etc.). If the value in any column of d1 is NaN/None, no comparison needs to be done for that column. The column names are always the same, so it is basically a one-to-one column comparison.
My approach and thoughts -
I can get the values of d1 in a list by doing -
l = []
for i in range(len(d1.columns.values)):
    if i == 0:
        continue
    else:
        num = d1.iloc[0, i]
        l.append(num)
print(l)
# list comprehension equivalent
lst = [d1.iloc[0, i] for i in range(len(d1.columns.values)) if i != 0]
[1200, 53, 73, 63]
and then use iterrows to iterate over all the columns and rows in dataframe d, OR iterate over d and perform a similar comparison by looping over d1. But these would be time-consuming for a high-dimensional dataframe (d in this case).
What would be the more efficient or pythonic way of doing it?
IIUC, you can do:
(d >= d1.values).sum()
Output:
Year 9
Salary 9
Amount 9
Amount1 8
Amount2 8
dtype: int64
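For the NaN/None case mentioned in the question, a sketch: a comparison against NaN is always False, so such a column would simply count 0; to drop those columns from the result instead, mask them afterwards:
counts = (d >= d1.values).sum()
counts = counts[d1.iloc[0].notna()]  # keep only columns d1 actually filled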

Making pairs of 2 rows and slicing from each row

I have a dataframe like:
x1 y1 x2 y2
0 149 2653 2152 2656
1 149 2465 2152 2468
2 149 1403 2152 1406
3 149 1215 2152 1218
4 170 2692 2170 2695
5 170 2475 2170 2478
6 170 1413 2170 1416
7 170 1285 2170 1288
I need to pair rows two at a time by dataframe index, i.e. [0,1], [2,3], [4,5], [6,7], etc.,
and extract x1, y1 from the first row of each pair and x2, y2 from the second row.
Sample Output:
[[149,2653,2152,2468],[149,1403,2152,1218],[170,2692,2170,2478],[170,1413,2170,1288]]
Please feel free to ask if it's not clear.
So far I have tried grouping by pairs and the shift operation,
but I didn't manage to build the paired records.
Python solution:
Select values of columns by positions to lists:
a = df[['x2', 'y2']].iloc[1::2].values.tolist()
b = df[['x1', 'y1']].iloc[0::2].values.tolist()
And then zip and join together in list comprehension:
L = [y + x for x, y in zip(a, b)]
print (L)
[[149, 2653, 2152, 2468], [149, 1403, 2152, 1218],
[170, 2692, 2170, 2478], [170, 1413, 2170, 1288]]
Thank you, @user2285236, for another solution:
L = np.concatenate([df.loc[::2, ['x1', 'y1']], df.loc[1::2, ['x2', 'y2']]], axis=1).tolist()
Pure pandas solution:
First, apply DataFrameGroupBy.shift within each group of 2 rows:
df[['x2', 'y2']] = df.groupby(np.arange(len(df)) // 2)[['x2', 'y2']].shift(-1)
print (df)
x1 y1 x2 y2
0 149 2653 2152.0 2468.0
1 149 2465 NaN NaN
2 149 1403 2152.0 1218.0
3 149 1215 NaN NaN
4 170 2692 2170.0 2478.0
5 170 2475 NaN NaN
6 170 1413 2170.0 1288.0
7 170 1285 NaN NaN
Then remove NaNs rows, convert to int and then to list:
print (df.dropna().astype(int).values.tolist())
[[149, 2653, 2152, 2468], [149, 1403, 2152, 1218],
[170, 2692, 2170, 2478], [170, 1413, 2170, 1288]]
Here's one solution via numpy.hstack. Note it is natural to feed numpy arrays directly to pd.DataFrame, since this is how Pandas stores data internally.
import numpy as np
arr = np.hstack((df[['x1', 'y1']].values[::2],
                 df[['x2', 'y2']].values[1::2]))
res = pd.DataFrame(arr)
print(res)
0 1 2 3
0 149 2653 2152 2468
1 149 1403 2152 1218
2 170 2692 2170 2478
3 170 1413 2170 1288
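A closely related reshape-based sketch (assuming an even number of rows): view the frame as blocks of 2 rows and 4 columns, then slice each block once:
import numpy as np

arr = df[['x1', 'y1', 'x2', 'y2']].to_numpy().reshape(-1, 2, 4)
L = np.concatenate([arr[:, 0, :2], arr[:, 1, 2:]], axis=1).tolist()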
Here's a solution using a custom iterator based on iterrows(), but it's a bit clunky:
import pandas as pd

df = pd.DataFrame(columns=['x1', 'y1', 'x2', 'y2'], data=
    [[149, 2653, 2152, 2656], [149, 2465, 2152, 2468], [149, 1403, 2152, 1406], [149, 1215, 2152, 1218],
     [170, 2692, 2170, 2695], [170, 2475, 2170, 2478], [170, 1413, 2170, 1416], [170, 1285, 2170, 1288]])

def iter_oddeven_pairs(df):
    row_it = df.iterrows()
    try:
        while True:
            _, row = next(row_it)
            yield row[0:2]
            _, row = next(row_it)
            yield row[2:4]
    except StopIteration:
        pass

print(pd.concat([pair for pair in iter_oddeven_pairs(df)]))

groupby multiple columns and change it to dataFrame/array

Hi I have a dataFrame like this:
Value day hour min
Time
2015-12-19 10:08:52 1805 2015-12-19 10 8
2015-12-19 10:09:52 1794 2015-12-19 10 9
2015-12-19 10:19:51 1796 2015-12-19 10 19
2015-12-19 10:20:51 1806 2015-12-19 10 20
2015-12-19 10:29:52 1802 2015-12-19 10 29
2015-12-19 10:30:52 1800 2015-12-19 10 30
2015-12-19 10:40:51 1804 2015-12-19 10 40
2015-12-19 10:41:51 1798 2015-12-19 10 41
2015-12-19 10:50:51 1790 2015-12-19 10 50
2015-12-19 10:51:52 1811 2015-12-19 10 51
2015-12-19 11:00:51 1803 2015-12-19 11 0
2015-12-19 11:01:52 1784 2015-12-19 11 1
... ... ... ... ...
2016-07-15 17:30:13 1811 2016-07-15 17 30
2016-07-15 17:31:13 1787 2016-07-15 17 31
2016-07-15 17:41:13 1800 2016-07-15 17 41
2016-07-15 17:42:13 1795 2016-07-15 17 42
I want to group it by day and hour, and finally make a multi-dimensional array from the "Value" column.
Based on the grouping by day and hour, I need to get something like this for each hour:
2015-12-19 10 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179... ]
2015-12-20 11 [1803, 1793, 1795, 1801, 1796, 1796, 1788, 180... ]
...
2016-07-15 17 [1794, 1792, 1788, 1799, 1811, 1803, 1808, 179... ]
In the end, I wish I can have a dataframe like this:
Time_index hour value1 value2 value3 ........value20
2015-12-19 10 1805, 1794, 1796, 1806 ... 1804, 1791, 1788, 1812
2015-12-20 11 1803, 1793, 1795, 1801 ... 1796, 1796, 1788, 1800
...
2016-07-15 17 1794, 1792, 1788, 1799 ... 1811, 1803, 1808, 1790
OR a array like this:
[[1805, 1794, 1796, 1806, 1802, 1800, 1804, 179... ],[1803, 1793, 1795, 1801, 1796, 1796, 1788, 180... ]....[1794, 1792, 1788, 1799, 1811, 1803, 1808, 179... ]]
I was able to get groupby with one column working:
grouped_0 = train_df.groupby(['day'])
grouped = grouped_0.aggregate(lambda x: list(x))
grouped['grouped'] = grouped['Value']
The 'grouped' column of the resulting dataframe looks like:
2015-12-19 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179...
2015-12-20 [1790, 1809, 1809, 1789, 1807, 1804, 1790, 179...
2015-12-21 [1794, 1792, 1788, 1799, 1811, 1803, 1808, 179...
2015-12-22 [1815, 1812, 1798, 1808, 1802, 1788, 1808, 179...
2015-12-23 [1803, 1800, 1799, 1803, 1802, 1804, 1788, 179...
2015-12-24 [1803, 1795, 1801, 1798, 1799, 1802, 1799, 179...
However, when I tried this:
grouped_0 = train_df.groupby(['day', 'hour'])
grouped = grouped_0.aggregate(lambda x: list(x))
grouped['grouped'] = grouped['Value']
it threw this error:
Traceback (most recent call last):
File "<input>", line 3, in <module>
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 4036, in aggregate
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 3476, in aggregate
return self._python_agg_general(arg, *args, **kwargs)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 848, in _python_agg_general
result, counts = self.grouper.agg_series(obj, f)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 2180, in agg_series
return self._aggregate_series_pure_python(obj, func)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 2215, in _aggregate_series_pure_python
raise ValueError('Function does not reduce')
ValueError: Function does not reduce
my pandas version:
pd.__version__
'0.20.3'
Yes, using agg for this isn't the best idea, because unless each group reduces to a single object, the result is not considered valid.
You can use groupby + apply for this.
g = df.groupby(['day', 'hour']).Value.apply(lambda x: x.values.tolist())
g
day hour
2015-12-19 10 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179...
11 [1803, 1784]
2016-07-15 17 [1811, 1787, 1800, 1795]
Name: Value, dtype: object
If you want each element in its own column, you'd do it like this:
v = pd.DataFrame(g.values.tolist(), index=g.index)\
      .rename(columns=lambda x: 'value{}'.format(x + 1)).reset_index()
v is your final result.
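An equivalent way to expand the lists into columns, as a sketch (groups of unequal length are padded with NaN):
v = (g.apply(pd.Series)
      .rename(columns=lambda x: 'value{}'.format(x + 1))
      .reset_index())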

Python Pandas DataFrame resample daily data to week by Mon-Sun weekly definition?

import pandas as pd
import numpy as np
dates = pd.date_range('20141229',periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
                   'Sum3': [365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
                   'Sum4': [5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]}, index=dates)
print(df)
The df produced looks like this:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-29 1667 91 365.95 5
2014-12-30 1229 75 398.97 5
2014-12-31 1360 75 285.12 1
2015-01-01 9232 254 992.17 5
2015-01-02 8866 239 1116.57 8
2015-01-03 4083 108 512.11 8
2015-01-04 3671 99 504.47 2
2015-01-05 10085 259 1190.96 10
2015-01-06 10005 395 1753.60 12
2015-01-07 8730 355 1646.25 16
2015-01-08 10056 332 1344.05 16
2015-01-09 10176 386 1582.67 6
2015-01-10 3792 96 560.95 6
2015-01-11 3518 111 736.44 3
Let's say I resample the Dataframe to try and sum the daily data into weekly rows:
df_resampled = df.resample('W', how='sum', label='left'); print(df_resampled)
This produces the following:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-28 30108 941 4175.36 34
2015-01-04 56362 1934 8814.92 69
Question 1: my definition of a week is Mon - Sun. Since my data starts on 2014-12-29 (a Monday), I want my Day label to also start on that day. How would I make the Day index label be the date of every Monday instead of every Sunday?
Desired Output:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-29 30108 941 4175.36 34
2015-01-05 56362 1934 8814.92 69
What have I tried regarding Question 1?
I changed 'W' to 'W-MON', but it produced 3 rows, counting 2014-12-29 in the 2014-12-22 row, which is not what I want:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-22 1667 91 365.95 5
2014-12-29 38526 1109 5000.37 39
2015-01-05 46277 1675 7623.96 59
Question 2: how would I format the Day index label to look like a range? Ex:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-29 - 2015-01-04 30108 941 4175.36 34
2015-01-05 - 2015-01-11 56362 1934 8814.92 69
In case anyone else was not aware, it turns out that the weekly Anchored Offsets are based on the end date. So, just resampling 'W' (which is the same as 'W-SUN') is by default a Monday to Sunday sample. The date listed is the end date. See this old bug report wherein neither the documentation nor the API got updated.
Given that you specified label='left' in the resample parameters, you must have realized that fact. It's also why using 'W-MON' does not have the desired effect. What is confusing is that the left bound is not actually in the interval.
So, to display the start date for the period instead of the end date, you may add a day to the index. That would mean you would do:
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
For completeness, here is your original data with another day (a Sunday) added at the beginning to show the grouping really is Monday to Sunday:
import pandas as pd
import numpy as np
dates = pd.date_range('20141228', periods=15, name='Day')
df = pd.DataFrame({'Sum1': [10000, 1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [10000, 91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
                   'Sum3': [10000, 365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
                   'Sum4': [10000, 5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]}, index=dates)
print(df)
df_resampled = df.resample('W', how='sum', label='left')
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
print(df_resampled)
This outputs:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-28 10000 10000 10000.00 10000
2014-12-29 1667 91 365.95 5
2014-12-30 1229 75 398.97 5
2014-12-31 1360 75 285.12 1
2015-01-01 9232 254 992.17 5
2015-01-02 8866 239 1116.57 8
2015-01-03 4083 108 512.11 8
2015-01-04 3671 99 504.47 2
2015-01-05 10085 259 1190.96 10
2015-01-06 10005 395 1753.60 12
2015-01-07 8730 355 1646.25 16
2015-01-08 10056 332 1344.05 16
2015-01-09 10176 386 1582.67 6
2015-01-10 3792 96 560.95 6
2015-01-11 3518 111 736.44 3
Sum1 Sum2 Sum3 Sum4
Day
2014-12-22 10000 10000 10000.00 10000
2014-12-29 30108 941 4175.36 34
2015-01-05 56362 1934 8814.92 69
I believe that is what you wanted for Question 1.
Update
There is now a loffset argument to resample() that allows you to shift the label offset. So, instead of modifying the index, you simply add the loffset argument like so:
df.resample('W', how='sum', label='left', loffset=pd.DateOffset(days=1))
Also of note: how='sum' is now deprecated in favor of calling .sum() on the Resampler object that .resample() returns. So, the fully updated call would be:
df_resampled = df.resample('W', label='left', loffset=pd.DateOffset(days=1)).sum()
Update 1.1.0
The handy loffset argument is deprecated as of version 1.1.0. The documentation indicates the shifting should be done after the resample. In this particular case, I believe that means this is the correct code (untested):
from pandas.tseries.frequencies import to_offset
df_resampled = df.resample('W', label='left').sum()
df_resampled.index = df_resampled.index + to_offset(pd.DateOffset(days=1))
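For Question 2, a sketch building on the resampled frame above: format each label as a 'start - end' range by pairing the week start with the start plus six days:
start = df_resampled.index
df_resampled.index = (start.strftime('%Y-%m-%d') + ' - '
                      + (start + pd.Timedelta(days=6)).strftime('%Y-%m-%d'))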
Great question.
df_resampled = df.resample('W-MON', label='left', closed='left').sum()
The closed parameter is what should address your question.
This might help.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1, 1000, (100, 4)), columns='Sum1 Sum2 Sum3 Sum4'.split(), index=pd.date_range('2014-12-29', periods=100, freq='D'))
def func(group):
    return pd.Series({'Sum1': group.Sum1.sum(), 'Sum2': group.Sum2.sum(),
                      'Sum3': group.Sum3.sum(), 'Sum4': group.Sum4.sum(),
                      'Day': group.index[1],
                      'Period': '{0} - {1}'.format(group.index[0].date(), group.index[-1].date())})
df.groupby(lambda idx: idx.week).apply(func)
Out[386]:
Day Period Sum1 Sum2 Sum3 Sum4
1 2014-12-30 2014-12-29 - 2015-01-04 3559 3692 3648 4086
2 2015-01-06 2015-01-05 - 2015-01-11 2990 3658 3348 3304
3 2015-01-13 2015-01-12 - 2015-01-18 3168 3720 3518 3273
4 2015-01-20 2015-01-19 - 2015-01-25 2275 4968 4095 2366
5 2015-01-27 2015-01-26 - 2015-02-01 4146 2167 3888 4576
.. ... ... ... ... ... ...
11 2015-03-10 2015-03-09 - 2015-03-15 4035 3518 2588 2714
12 2015-03-17 2015-03-16 - 2015-03-22 3399 3901 3430 2143
13 2015-03-24 2015-03-23 - 2015-03-29 3227 3308 3185 3814
14 2015-03-31 2015-03-30 - 2015-04-05 4278 3369 3623 4167
15 2015-04-07 2015-04-06 - 2015-04-07 1466 632 1136 1392
[15 rows x 6 columns]
