How can I split my data into chunks of 7 days each - python

I want to group every 7 days together. The problem is that the first date falls on a Wednesday, and I want my weeks to start on Monday and end on Sunday without dropping any data. Even the last date in my data falls on a Monday. This is how my data looks now:
date bike_numbers
0 2017-06-28 632
1 2017-06-29 1019
2 2017-06-30 1038
3 2017-07-01 475
4 2017-07-02 523
... ... ...
550 2018-12-30 2653
551 2018-12-31 3044
I want just the bike ride counts, with each row becoming an array of 7 values. I want it to look like this:
[632, 1019, 1038, 475, 523, 600, 558][1103, 1277,1126, 956, 433, 1347, 1506]... and so on till the last date

Use:
s = df.groupby(df.index // 7)['bike_numbers'].agg(list)
print (s)
0     [632, 1019, 1038, 475, 523]
                  ...
78                   [2653, 3044]
Name: bike_numbers, dtype: object
print (s.tolist())
[[632, 1019, 1038, 475, 523], [2653, 3044]]
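The grouping above is positional (chunks of 7 rows), so it only lines up with calendar weeks if the data starts on a Monday with no gaps. If you want true Monday-to-Sunday calendar weeks, a sketch using pd.Grouper (with a small illustrative frame standing in for the real data, since 2017-06-28 is a Wednesday):

```python
import pandas as pd

# Group by actual Mon-Sun weeks instead of positional chunks of 7 rows.
# 2017-06-26 and 2017-07-03 are the Mondays bounding the sample dates.
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-06-28', '2017-06-29', '2017-07-03']),
    'bike_numbers': [632, 1019, 1103],
})
weeks = (df.groupby(pd.Grouper(key='date', freq='W-MON',
                               label='left', closed='left'))
           ['bike_numbers'].agg(list))
print(weeks.tolist())  # first the partial Wed-Sun week, then the week starting Monday
```

The first list is shorter because the data begins mid-week; nothing is dropped.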

ValueError: Length of values (1) does not match length of index (12797) - Indexes are the same length

So this is driving me crazy now, because I really don't see the problem.
I have the following code:
dataframe.to_csv(f"user_data/candle_data.csv")
print (dataframe)
st12 = self.supertrend(dataframe, 3, 12)
st12['ST'].to_csv(f"user_data/st12.csv")
print (st12)
print(dataframe.index.difference(st12.index))
dataframe.loc[:, 'st_12'] = st12['ST'],
Checking the csv files, I can see that the first index is 0 and the last index is 12796; the last row is also on line number 12798. This is true for both files.
The output from the three print statements is as follows:
date open high low close volume
0 2020-12-29 21:45:00+00:00 723.33 726.14 723.26 725.05 3540.48612
1 2020-12-29 22:00:00+00:00 725.17 728.77 723.78 726.94 3983.90892
2 2020-12-29 22:15:00+00:00 726.94 727.30 724.72 724.75 3166.57435
3 2020-12-29 22:30:00+00:00 724.94 725.99 723.80 725.91 2848.08122
4 2020-12-29 22:45:00+00:00 725.99 730.30 725.95 729.64 6288.69499
... ... ... ... ... ... ...
12792 2021-05-12 03:45:00+00:00 4292.42 4351.85 4292.35 4332.81 24410.30155
12793 2021-05-12 04:00:00+00:00 4332.12 4347.60 4300.07 4343.05 16545.66776
12794 2021-05-12 04:15:00+00:00 4342.84 4348.00 4305.87 4313.82 10048.32828
12795 2021-05-12 04:30:00+00:00 4313.82 4320.68 4273.35 4287.49 13201.88547
12796 2021-05-12 04:45:00+00:00 4287.49 4306.79 4276.87 4300.80 9663.73327
[12797 rows x 6 columns]
ST STX
0 0.000000 nan
1 0.000000 nan
2 0.000000 nan
3 0.000000 nan
4 0.000000 nan
... ... ...
12792 4217.075684 up
12793 4217.075684 up
12794 4217.260609 up
12795 4217.260609 up
12796 4217.260609 up
[12797 rows x 2 columns]
RangeIndex(start=0, stop=0, step=1)
Full Error Traceback:
Traceback (most recent call last):
File "/freqtrade/freqtrade/main.py", line 37, in main
return_code = args['func'](args)
File "/freqtrade/freqtrade/commands/optimize_commands.py", line 53, in start_backtesting
backtesting.start()
File "/freqtrade/freqtrade/optimize/backtesting.py", line 479, in start
min_date, max_date = self.backtest_one_strategy(strat, data, timerange)
File "/freqtrade/freqtrade/optimize/backtesting.py", line 437, in backtest_one_strategy
preprocessed = self.strategy.ohlcvdata_to_dataframe(data)
File "/freqtrade/freqtrade/strategy/interface.py", line 670, in ohlcvdata_to_dataframe
return {pair: self.advise_indicators(pair_data.copy(), {'pair': pair})
File "/freqtrade/freqtrade/strategy/interface.py", line 670, in <dictcomp>
return {pair: self.advise_indicators(pair_data.copy(), {'pair': pair})
File "/freqtrade/freqtrade/strategy/interface.py", line 687, in advise_indicators
return self.populate_indicators(dataframe, metadata)
File "/freqtrade/user_data/strategies/TrippleSuperTrendStrategy.py", line 94, in populate_indicators
dataframe.loc[:, 'st_12'] = st12['ST'],
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/indexing.py", line 692, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/indexing.py", line 1597, in _setitem_with_indexer
self.obj[key] = value
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3163, in __setitem__
self._set_item(key, value)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3242, in _set_item
value = self._sanitize_column(key, value)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3899, in _sanitize_column
value = sanitize_index(value, self.index)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 751, in sanitize_index
raise ValueError(
ValueError: Length of values (1) does not match length of index (12797)
ERROR: 1
So if both data frames have exactly the same number of rows and the indexes are exactly the same, why am I getting this error?
There is a typo:
dataframe.loc[:, 'st_12'] = st12['ST']
The trailing comma is the problem: it wraps the Series in a tuple of length 1, so pandas receives a single value to assign across 12797 rows, which is exactly what the error message reports.
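A minimal reproduction of why the trailing comma fails, on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
s = pd.Series([10, 20, 30])

# `s,` is a 1-element tuple, so pandas sees one value instead of a column:
wrapped = s,
print(len(wrapped))      # 1 -> "Length of values (1)"

# Without the comma the assignment works as expected:
df.loc[:, 'b'] = s
print(df['b'].tolist())  # [10, 20, 30]
```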

Removing brackets and commas from a Dataframe column (28k rows) with no header Python 3.8

I have a CSV file with two columns (no header) like this,
Input.csv
0,"[3001, 12029, 14145, 14270, 14581, 16976, 25564]"
1,"[17, 34, 190, 875, 951, 1370, 2003, 2039, 2211, 2514, 3153, 3290, 3364, 3490, 4069, 5011, 5789]"
2,"[32, 808, 3354, 9835, 10082, 14276, 18084, 24576, 26177]"
3,"[3421, 3585, 5150, 5607, 9093, 12034, 15401, 16049, 24280]"
4,"[1116, 5203, 5252, 6347, 10838, 14995, 16304, 17462, 23757, 24023, 24122]"
5,"[872, 1971, 2040, 2518, 4081, 5786, 7029, 7224, 8596, 8775, 9798, 11385, 11780]"
6,[935]
...
28212,[28259]
I want to remove the brackets and commas inside each array in the column, as well as the comma that separates the two columns. I would like something like this:
output.csv
0 3001 12029 14145 14270 14581 16976 25564
1 17 34 190 875 951 1370 2003 2039 2211 2514 3153 3290 3364 3490 4069 5011 5789
2 32 808 3354 9835 10082 14276 18084 24576 26177
3 3421 3585 5150 5607 9093 12034 15401 16049 24280
4 1116 5203 5252 6347 10838 14995 16304 17462 23757 24023 24122
5 872 1971 2040 2518 4081 5786 7029 7224 8596 8775 9798 11385 11780
6 935
...
28212 28259
I have tried str.replace and str.strip, but it does not work. I have also tried this Removing brackets and comma's in list and this Removing brackets from a DataFrame column when exporting to CSV without success.
fout = open('output.csv', 'w')
for line in open('input.csv'):
    fout.write(line.replace('"', '').replace(',', ' ').replace('[', '').replace(']', ''))
fout.close()
Remember, however, that once you do this, you can't read it into pandas, because it won't have a constant number of columns.
A slightly cleaner answer in my opinion than all the calls to replace (in the above answer) might be to do this with a regex:
import re
cleaned = ' '.join(p for p in re.split(r'[\[\]",\s]+', line.strip()) if p)
which gets rid of all brackets, quotes, and commas in one pass.
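If you would rather stay in pandas, a sketch of an alternative: parse the quoted lists with a converter and write the rows out space-separated (the sample data below is an inline stand-in for input.csv):

```python
import io
from ast import literal_eval

import pandas as pd

# Inline stand-in for input.csv, matching the layout in the question.
raw = '0,"[3001, 12029, 14145]"\n1,"[17, 34]"\n6,[935]\n'

# literal_eval turns each quoted "[...]" string into a real Python list.
df = pd.read_csv(io.StringIO(raw), header=None, names=['idx', 'vals'],
                 converters={'vals': literal_eval})
lines = [f"{i} {' '.join(map(str, v))}" for i, v in zip(df['idx'], df['vals'])]
print('\n'.join(lines))
```

As noted above, the resulting file has a variable number of columns per row, so it is no longer round-trippable through read_csv.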

python error when finding count of cells where value was found

I have the below code on toy data, and it works the way I want. The last two columns show how many times the value in column Jan was found in column URL, and in how many distinct rows the value in column Jan was found in column URL.
sales = [{'account': '3', 'Jan': 'xxx', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
{'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': ''},
{'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bbbbb cc for why try' }]
df = pd.DataFrame(sales)
df
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))
Why does the same code fail in the last case? How could I change my code to avoid the error? In my last example there are special characters in the first column, and I suspect they are causing the problem. But when I look at the rows at index 3 and 4, they have special characters too and the code runs fine.
answer2=answer[['Value','non_repeat_pdf']].iloc[0:11]
print(answer2)
Value non_repeat_pdf
0 effect\nive Initials: __\nDL_ -1- Date: __\n8/14/2017\n...
1 closing ####
2 executing ####
3 order, ####
4 waives: ####
5 right ####
6 notice ####
7 intention ####
8 prohibit ####
9 further ####
10 participation ####
answer2['Value'].apply(lambda x: sum(answer2['non_repeat_pdf'].str.contains(x)))
Out[220]:
0 1
1 0
2 1
3 0
4 1
5 1
6 0
7 0
8 1
9 0
10 0
Name: Value, dtype: int64
answer2=answer[['Value','non_repeat_pdf']].iloc[10:11]
print(answer2)
Value non_repeat_pdf
10 participation ####
answer2['Value'].apply(lambda x: sum(answer2['non_repeat_pdf'].str.contains(x)))
Out[212]:
10 0
Name: Value, dtype: int64
answer2=answer[['Value','non_repeat_pdf']].iloc[11:12]
print(answer2)
Value non_repeat_pdf
11 1818(e); ####
answer2['Value'].apply(lambda x: sum(answer2['non_repeat_pdf'].str.contains(x)))
Traceback (most recent call last):
File "<ipython-input-215-2df7f4b2de41>", line 1, in <module>
answer2['Value'].apply(lambda x: sum(answer2['non_repeat_pdf'].str.contains(x)))
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py", line 2355, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/src\inference.pyx", line 1574, in pandas._libs.lib.map_infer
File "<ipython-input-215-2df7f4b2de41>", line 1, in <lambda>
answer2['Value'].apply(lambda x: sum(answer2['non_repeat_pdf'].str.contains(x)))
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py", line 1562, in contains
regex=regex)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py", line 254, in str_contains
stacklevel=3)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\warnings.py", line 99, in _showwarnmsg
msg.file, msg.line)
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1069, in _showwarning
file.write(formatWarning(message, category, filename, lineno, line))
File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\PyPDF2\utils.py", line 69, in formatWarning
file = filename.replace("/", "\\").rsplit("\\", 1)[1] # find the file name
IndexError: list index out of range
Update
I modified my code and removed all special characters from the Value column. I am still getting the error... what could be wrong?
Even with the error, the new column gets added to my answer2 dataframe:
answer2=answer[['Value','non_repeat_pdf']]
print(answer2)
Value non_repeat_pdf
0 law Initials: __\nDL_ -1- Date: __\n8/14/2017\n...
1 concerned
2 rights
3 c
4 violate
5 8
6 agreement
7 voting
8 previously
9 supervisory
10 its
11 exercise
12 occs
13 entities
14 those
15 approved
16 1818h2
17 9
18 are
19 manner
20 their
21 affairs
22 b
23 solicit
24 procure
25 transfer
26 attempt
27 extraneous
28 modification
29 vote
... ...
1552 closing
1553 heavily
1554 pm
1555 throughout
1556 half
1557 window
1558 sixtysecond
1559 activity
1560 sampling
1561 using
1562 hour
1563 violated
1564 euro
1565 rates
1566 derivatives
1567 portfolios
1568 valuation
1569 parties
1570 numerous
1571 they
1572 reference
1573 because
1574 us
1575 important
1576 moment
1577 snapshot
1578 cet
1579 215
1580 finance
1581 supervision
[1582 rows x 2 columns]
answer2['found_in_all_PDF'] = answer2['Value'].apply(lambda x: ''.join(answer2['non_repeat_pdf'].tolist()).count(x))
Traceback (most recent call last):
File "<ipython-input-298-4dc80361895c>", line 1, in <module>
answer2['found_in_all_PDF'] = answer2['Value'].apply(lambda x: ''.join(answer2['non_repeat_pdf'].tolist()).count(x))
File "C:\Users\\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 2331, in __setitem__
self._set_item(key, value)
File "C:\Users\\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 2404, in _set_item
self._check_setitem_copy()
File "C:\Users\\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 1873, in _check_setitem_copy
warnings.warn(t, SettingWithCopyWarning, stacklevel=stacklevel)
File "C:\Users\\AppData\Local\Continuum\anaconda3\lib\warnings.py", line 99, in _showwarnmsg
msg.file, msg.line)
File "C:\Users\\AppData\Local\Continuum\anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1069, in _showwarning
file.write(formatWarning(message, category, filename, lineno, line))
File "C:\Users\\AppData\Local\Continuum\anaconda3\lib\site-packages\PyPDF2\utils.py", line 69, in formatWarning
file = filename.replace("/", "\\").rsplit("\\", 1)[1] # find the file name
IndexError: list index out of range
Update 2
The following works:
answer2=answer[['Value','non_repeat_pdf']]
xyz= answer2['Value'].apply(lambda x: ''.join(answer2['non_repeat_pdf'].tolist()).count(x))
xyz=xyz.to_frame()
xyz.columns=['found_in_all_PDF']
pd.concat([answer2, xyz], axis=1)
Out[305]:
Value non_repeat_pdf \
0 law Initials: __\nDL_ -1- Date: __\n8/14/2017\n...
1 concerned
2 rights
3 c
4 violate
5 8
6 agreement
7 voting
8 previously
9 supervisory
10 its
11 exercise
12 occs
13 entities
14 those
15 approved
16 1818h2
17 9
18 are
19 manner
20 their
21 affairs
22 b
23 solicit
24 procure
25 transfer
26 attempt
27 extraneous
28 modification
29 vote
... ...
1552 closing
1553 heavily
1554 pm
1555 throughout
1556 half
1557 window
1558 sixtysecond
1559 activity
1560 sampling
1561 using
1562 hour
1563 violated
1564 euro
1565 rates
1566 derivatives
1567 portfolios
1568 valuation
1569 parties
1570 numerous
1571 they
1572 reference
1573 because
1574 us
1575 important
1576 moment
1577 snapshot
1578 cet
1579 215
1580 finance
1581 supervision
found_in_all_PDF
0 6
1 1
2 4
3 1036
4 9
5 93
6 4
7 2
8 1
9 2
10 6
11 1
12 0
13 1
14 3
15 1
16 0
17 25
18 20
19 3
20 14
21 4
22 358
23 2
24 1
25 2
26 6
27 1
28 1
29 3
...
1552 3
1553 2
1554 0
1555 5
1556 2
1557 3
1558 0
1559 2
1560 1
1561 5
1562 2
1563 7
1564 8
1565 3
1566 0
1567 1
1568 1
1569 4
1570 1
1571 9
1572 2
1573 2
1574 96
1575 1
1576 1
1577 1
1578 0
1579 0
1580 1
1581 0
[1582 rows x 3 columns]
Unfortunately I can't reproduce exactly the same error in my environment, but what I see is a warning about incorrect regex usage. Your string was interpreted as a regular expression with a capture group because of the parentheses in the string "1818(e);". Try str.contains with regex=False.
answer2 =pd.DataFrame({'Value': {11: '1818(e);'}, 'non_repeat_pdf': {11: '####'}})
answer2['Value'].apply(lambda x: sum(answer2['non_repeat_pdf'].str.contains(x,regex=False)))
Output:
11 0
Name: Value, dtype: int64
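If you do need regex matching elsewhere in the pipeline, another option (a sketch) is to escape the literal value with re.escape before handing it to str.contains:

```python
import re

import pandas as pd

# re.escape turns '1818(e);' into '1818\(e\);' so the parentheses are
# matched literally instead of being read as a capture group.
s = pd.Series(['#### 1818(e); ####', 'no match here'])
counts = s.str.contains(re.escape('1818(e);')).sum()
print(counts)  # 1
```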

Getting a list of the range of 2 pandas columns

I have the following DataFrame (reformatted a bit):
f_name l_name n f_bought l_bought
0 Abraham Livingston 24 1164 1187
1 John Brown 4 1188 1191
2 Samuel Barret 16 1192 1207
3 Nathan Blodget 4 1208 1212
4 Bobby Abraham 1 1212 1212
I want to create a column, bought, that is a list range(df[f_bought], df[l_bought]).
I've tried:
def getRange(l1, l2):
    r = list(range(l1, l2))

df.apply(lambda index: getRange(df['f_bond'], df['l_bond']), axis=1)
but it results in a TypeError:
"cannot convert the series to <type 'int'>", u'occurred at index 0'
I've tried a df.info(), and both columns are type int64.
I'm wondering if I should use something like df.loc[] or similar? Or something else entirely?
You should be able to do this using apply, which is for applying a function to every row or every column of a data frame.
def bought_range(row):
    return range(row.f_bought, row.l_bought)

df['bought_range'] = df.apply(bought_range, axis=1)
Which results in:
f_name l_name n f_bought l_bought \
0 Abraham Livingston 24 1164 1187
1 John Brown 4 1188 1191
2 Samuel Barret 16 1192 1207
3 Nathan Blodget 4 1208 1212
4 Bobby Abraham 1 1212 1212
bought_range
0 [1164, 1165, 1166, 1167, 1168, 1169, 1170, 117...
1 [1188, 1189, 1190]
2 [1192, 1193, 1194, 1195, 1196, 1197, 1198, 119...
3 [1208, 1209, 1210, 1211]
4 []
One word of warning is that Python's range doesn't include the upper limit:
In [1]: range(3, 6)
Out[1]: [3, 4, 5]
It's not hard to deal with (return range(row.f_bought, row.l_bought + 1)) but does need taking into account.
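An alternative sketch (reusing the column names from the question): build the inclusive ranges with a list comprehension over the two columns instead of DataFrame.apply, which also handles the single-day row:

```python
import pandas as pd

# Sample rows matching the question's f_bought/l_bought columns.
df = pd.DataFrame({'f_bought': [1164, 1188, 1212],
                   'l_bought': [1187, 1191, 1212]})

# l + 1 makes the upper bound inclusive, so f_bought == l_bought
# yields a one-element list rather than an empty one.
df['bought_range'] = [list(range(f, l + 1))
                      for f, l in zip(df['f_bought'], df['l_bought'])]
print(df['bought_range'].iloc[2])  # [1212]
```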

Python Pandas DataFrame resample daily data to week by Mon-Sun weekly definition?

import pandas as pd
import numpy as np
dates = pd.date_range('20141229',periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
'Sum2': [91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
'Sum3': [365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
'Sum4': [5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},index=dates); print(df)
The df produced looks like this:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-29 1667 91 365.95 5
2014-12-30 1229 75 398.97 5
2014-12-31 1360 75 285.12 1
2015-01-01 9232 254 992.17 5
2015-01-02 8866 239 1116.57 8
2015-01-03 4083 108 512.11 8
2015-01-04 3671 99 504.47 2
2015-01-05 10085 259 1190.96 10
2015-01-06 10005 395 1753.60 12
2015-01-07 8730 355 1646.25 16
2015-01-08 10056 332 1344.05 16
2015-01-09 10176 386 1582.67 6
2015-01-10 3792 96 560.95 6
2015-01-11 3518 111 736.44 3
Let's say I resample the Dataframe to try and sum the daily data into weekly rows:
df_resampled = df.resample('W', how='sum', label='left'); print(df_resampled)
This produces the following:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-28 30108 941 4175.36 34
2015-01-04 56362 1934 8814.92 69
Question 1: my definition of a week is Mon - Sun. Since my data starts on 2014-12-29 (a Monday), I want my Day label to also start on that day. How would I make the Day index label be the date of every Monday instead of every Sunday?
Desired Output:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-29 30108 941 4175.36 34
2015-01-05 56362 1934 8814.92 69
What have I tried regarding Question 1?
I changed 'W' to 'W-MON', but it produced 3 rows, counting 2014-12-29 in the 2014-12-22 row, which is not what I want:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-22 1667 91 365.95 5
2014-12-29 38526 1109 5000.37 39
2015-01-05 46277 1675 7623.96 59
Question 2: how would I format the Day index label to look like a range? Ex:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-29 - 2015-01-04 30108 941 4175.36 34
2015-01-05 - 2015-01-11 56362 1934 8814.92 69
In case anyone else was not aware, it turns out that the weekly Anchored Offsets are based on the end date. So, just resampling 'W' (which is the same as 'W-SUN') is by default a Monday to Sunday sample. The date listed is the end date. See this old bug report wherein neither the documentation nor the API got updated.
Given that you specified label='left' in the resample parameters, you must have realized that fact. It's also why using 'W-MON' does not have the desired effect. What is confusing is that the left bound is not actually in the interval.
So, to display the start date for the period instead of the end date, you may add a day to the index. That would mean you would do:
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
For completeness, here is your original data with another day (a Sunday) added on the beginning to show the grouping really is Monday to Sunday:
import pandas as pd
import numpy as np
dates = pd.date_range('20141228',periods=15, name='Day')
df = pd.DataFrame({'Sum1': [10000, 1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
'Sum2': [10000, 91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
'Sum3': [10000, 365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
'Sum4': [10000, 5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},index=dates);
print(df)
df_resampled = df.resample('W', how='sum', label='left')
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
print(df_resampled)
This outputs:
Sum1 Sum2 Sum3 Sum4
Day
2014-12-28 10000 10000 10000.00 10000
2014-12-29 1667 91 365.95 5
2014-12-30 1229 75 398.97 5
2014-12-31 1360 75 285.12 1
2015-01-01 9232 254 992.17 5
2015-01-02 8866 239 1116.57 8
2015-01-03 4083 108 512.11 8
2015-01-04 3671 99 504.47 2
2015-01-05 10085 259 1190.96 10
2015-01-06 10005 395 1753.60 12
2015-01-07 8730 355 1646.25 16
2015-01-08 10056 332 1344.05 16
2015-01-09 10176 386 1582.67 6
2015-01-10 3792 96 560.95 6
2015-01-11 3518 111 736.44 3
Sum1 Sum2 Sum3 Sum4
Day
2014-12-22 10000 10000 10000.00 10000
2014-12-29 30108 941 4175.36 34
2015-01-05 56362 1934 8814.92 69
I believe that is what you wanted for Question 1.
Update
There is now a loffset argument to resample() that allows you to shift the label offset. So, instead of modifying the index, you simply add the loffset argument like so:
df.resample('W', how='sum', label='left', loffset=pd.DateOffset(days=1))
Also of note: how='sum' is now deprecated in favor of using .sum() on the Resampler object that .resample() returns. So the fully updated call would be:
df_resampled = df.resample('W', label='left', loffset=pd.DateOffset(days=1)).sum()
Update 1.1.0
The handy loffset argument is deprecated as of version 1.1.0. The documentation indicates the shifting should be done after the resample. In this particular case, I believe that means this is the correct code (untested):
from pandas.tseries.frequencies import to_offset
df_resampled = df.resample('W', label='left').sum()
df_resampled.index = df_resampled.index + to_offset(pd.DateOffset(days=1))
Great question.
df_resampled = df.resample('W-MON', label='left', closed='left').sum()
The parameter closed could work for your question.
This might help.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1, 1000, (100, 4)), columns='Sum1 Sum2 Sum3 Sum4'.split(), index=pd.date_range('2014-12-29', periods=100, freq='D'))
def func(group):
    return pd.Series({'Sum1': group.Sum1.sum(), 'Sum2': group.Sum2.sum(),
                      'Sum3': group.Sum3.sum(), 'Sum4': group.Sum4.sum(),
                      'Day': group.index[1],
                      'Period': '{0} - {1}'.format(group.index[0].date(), group.index[-1].date())})

df.groupby(lambda idx: idx.week).apply(func)
Out[386]:
Day Period Sum1 Sum2 Sum3 Sum4
1 2014-12-30 2014-12-29 - 2015-01-04 3559 3692 3648 4086
2 2015-01-06 2015-01-05 - 2015-01-11 2990 3658 3348 3304
3 2015-01-13 2015-01-12 - 2015-01-18 3168 3720 3518 3273
4 2015-01-20 2015-01-19 - 2015-01-25 2275 4968 4095 2366
5 2015-01-27 2015-01-26 - 2015-02-01 4146 2167 3888 4576
.. ... ... ... ... ... ...
11 2015-03-10 2015-03-09 - 2015-03-15 4035 3518 2588 2714
12 2015-03-17 2015-03-16 - 2015-03-22 3399 3901 3430 2143
13 2015-03-24 2015-03-23 - 2015-03-29 3227 3308 3185 3814
14 2015-03-31 2015-03-30 - 2015-04-05 4278 3369 3623 4167
15 2015-04-07 2015-04-06 - 2015-04-07 1466 632 1136 1392
[15 rows x 6 columns]
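For Question 2 specifically, a sketch with the current pandas API: resample into Mon-Sun weeks with closed='left', then format each label as a "start - end" range by adding six days to the left edge.

```python
import pandas as pd

# Same date range as the question: 14 days starting Monday 2014-12-29.
dates = pd.date_range('20141229', periods=14, name='Day')
df = pd.DataFrame({'Sum1': range(14)}, index=dates)

# W-MON with label/closed 'left' gives bins labeled by their Monday start.
weekly = df.resample('W-MON', label='left', closed='left').sum()

# Rewrite each Monday label as "Monday - Sunday".
weekly.index = [f"{d.date()} - {(d + pd.Timedelta(days=6)).date()}"
                for d in weekly.index]
print(weekly.index.tolist())
# ['2014-12-29 - 2015-01-04', '2015-01-05 - 2015-01-11']
```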
