I have a dataframe like this:
datetime type d13C ... dayofyear week dmy
1 2018-01-05 15:22:30 air -8.88 ... 5 1 5-1-2018
2 2018-01-05 15:23:30 air -9.08 ... 5 1 5-1-2018
3 2018-01-05 15:24:30 air -10.08 ... 5 1 5-1-2018
4 2018-01-05 15:25:30 air -9.51 ... 5 1 5-1-2018
5 2018-01-05 15:26:30 air -9.61 ... 5 1 5-1-2018
... ... ... ... ... ... ...
341543 2018-12-17 12:42:30 air -9.99 ... 351 51 17-12-2018
341544 2018-12-17 12:43:30 air -9.53 ... 351 51 17-12-2018
341545 2018-12-17 12:44:30 air -9.54 ... 351 51 17-12-2018
341546 2018-12-17 12:45:30 air -9.93 ... 351 51 17-12-2018
341547 2018-12-17 12:46:30 air -9.66 ... 351 51 17-12-2018
Full data here: https://drive.google.com/file/d/1KmOwnpvrG2Edz1AlLyD0CKZlBpaFervM/view?usp=sharing
I'm plotting the d13C column on the Y-axis and the inverse of total_co2 on the X-axis, then fitting a regression line for each day in the data. I then filter and store the dates I want, depending on whether the r^2 value of the regression line is > 0.8, like this:
import pandas as pd
from numpy.polynomial.polynomial import polyfit
import numpy as np
from scipy import stats
df = pd.read_csv('dataset.txt',
                 usecols=['datetime', 'type', 'total_co2', 'd13C', 'day', 'month', 'year', 'dayofyear', 'week', 'hour'],
                 dtype={'total_co2': np.float64, 'd13C': np.float64, 'day': str, 'month': str, 'year': str,
                        'week': str, 'hour': str, 'dayofyear': str})

df['dmy'] = df['day'] + '-' + df['month'] + '-' + df['year']  # adding a full date column to make it easier to filter through the rows, i.e. each day
# window18 = df[(df['year'] == '2018')]  # selecting just the data from the year 2018

accepted_dates_list = []  # empty list to store the dates we're interested in
for d in df['dmy'].unique():  # pass through each day; .unique() ensures it doesn't go over the same day twice
    acceptable_date = {}  # dictionary to store a valid date
    period = df[df.dmy == d]  # defining each period from the dmy column
    p = (period['total_co2']) ** -1
    q = period['d13C']
    c, m = polyfit(p, q, 1)  # intercept and gradient of the regression line
    slope, intercept, r_value, p_value, std_err = stats.linregress(p, q)  # statistical properties of the regression line
    if r_value ** 2 >= 0.8:
        acceptable_date['period'] = d  # populating the dictionary with the accepted date and corresponding values
        acceptable_date['r-squared'] = r_value ** 2
        acceptable_date['intercept'] = intercept
        accepted_dates_list.append(acceptable_date)  # sending the valid entries to the list
    else:
        pass

accepted_dates18 = pd.DataFrame(accepted_dates_list)  # converting the list to a df
print(accepted_dates18)
But now I want to do the same thing, just over three-day periods, which I'm trying to select from the day-of-year column (unsure if this is the best way or not). For example, I would want to fit the regression line using all the rows with dayofyear=5, dayofyear=6 and dayofyear=7, then the next three days, and so on until the end of the data. There are some days missing, but essentially I just need to do this for every 3 days in the data.
The output dataframe I am trying to get would list the three-day intervals with r^2 > 0.8, so something like this that shows the valid date ranges:
Accepted dates
0 23-08-2018 - 25-08-2018
1 26-08-2018 - 28-08-2018
2 31-08-2018 - 02-09-2018
3 15-09-2018 - 17-09-2018
4 24-09-2018 - 26-09-2018
I'm not too sure what to do to iterate over every three days. Any help would go a long way, thanks!
Your code loops through a list of unique dates and filters the dataframe on each iteration.
Pandas implements this pattern with df.groupby(). It can be used to loop and get each group, or it can be combined with aggregations, function applications, and transformations. You can read more about it in the user guide. This function can return groups according to any of the columns (or set of columns) in df, levels of the index, or any other exogenous list-like with the same length as df (we are grouping rows, but note it can also group columns). It even has implementations of the most common statistical aggregations like mean, stdev, and corr, among many others.
Now to your problem. You not only want the correlation but the equation, so you do need to loop. And to get three-day groups you can use that dayofyear column with a twist.
Take this data
import io
import pandas as pd

fo = io.StringIO(
'''datetime,d13C
2018-01-05 15:22:30,-8.88
2018-01-05 15:23:30,-9.08
2018-01-06 15:24:30,-10.0
2018-01-06 15:25:30,-9.51
2018-01-07 15:26:30,-9.61
2018-01-07 15:27:30,-9.61
2018-01-08 15:28:30,-9.61
2018-01-08 15:29:30,-9.61
2018-01-09 15:26:30,-9.61
2018-01-09 15:27:30,-9.61
''')
df = pd.read_csv(fo)
df.datetime = pd.to_datetime(df.datetime)
fo.close()
With the code for grouping and looping
first_day = 5
days_to_group = 3
for doy, gdf in df.groupby((df.datetime.dt.dayofyear.sub(first_day) // days_to_group)
                           * days_to_group + first_day):
    print(gdf, '\n')
    print(doy, '\n')
Output
datetime d13C
0 2018-01-05 15:22:30 -8.88
1 2018-01-05 15:23:30 -9.08
2 2018-01-06 15:24:30 -10.00
3 2018-01-06 15:25:30 -9.51
4 2018-01-07 15:26:30 -9.61
5 2018-01-07 15:27:30 -9.61
5
datetime d13C
6 2018-01-08 15:28:30 -9.61
7 2018-01-08 15:29:30 -9.61
8 2018-01-09 15:26:30 -9.61
9 2018-01-09 15:27:30 -9.61
8
Now you can plug your code into this loop and get what you need.
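For example, here is a rough sketch of how the regression filter from the question could be dropped into that loop. This assumes the full dataset from the question (with total_co2 and d13C columns), not the toy frame above:
from scipy import stats

first_day = 5
days_to_group = 3
grouper = (df.datetime.dt.dayofyear.sub(first_day) // days_to_group) * days_to_group + first_day

accepted_periods = []
for doy, gdf in df.groupby(grouper):
    p = gdf['total_co2'] ** -1            # inverse total CO2, as in the question
    q = gdf['d13C']
    slope, intercept, r_value, p_value, std_err = stats.linregress(p, q)
    if r_value ** 2 >= 0.8:
        start = gdf.datetime.min().strftime('%d-%m-%Y')
        end = gdf.datetime.max().strftime('%d-%m-%Y')
        accepted_periods.append({'period': f'{start} - {end}',
                                 'r-squared': r_value ** 2,
                                 'intercept': intercept})

accepted_dates18 = pd.DataFrame(accepted_periods)
print(accepted_dates18)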
PS
You can also use df.datetime.dt.floor('3d') as the grouper but I am not aware of how to control the first_day, so use it with caution.
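If you are on pandas >= 1.1, pd.Grouper (and resample) also accepts an origin argument, which may be another way to pin the first day of the bins. A hedged sketch, not tested against the full dataset:
# possible alternative (pandas >= 1.1): anchor the 3-day bins on 2018-01-05 via origin
for period_start, gdf in df.groupby(pd.Grouper(key='datetime', freq='3D', origin='2018-01-05')):
    print(period_start, len(gdf))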
Here is one approach. As I understand it, the primary goal is to get from current observations (multiple per day) to a 3-day moving average. First, I created a smaller, simpler data set:
import pandas as pd
df = pd.DataFrame({'counter': [*range(100)],
                   'date': pd.date_range('2020-01-01', periods=100, freq='7H')})
df = df.set_index('date')
print(df.head())
counter
date
2020-01-01 00:00:00 0
2020-01-01 07:00:00 1
2020-01-01 14:00:00 2
2020-01-01 21:00:00 3
2020-01-02 04:00:00 4
Second, I re-sampled on a daily basis:
df2 = df['counter'].resample('1D').mean() # <-- called df2
print(df2.head())
date
2020-01-01 1.5
2020-01-02 5.0
2020-01-03 8.5
2020-01-04 12.0
2020-01-05 15.5
Freq: D, Name: counter, dtype: float64
Third, I computed mean value for a 3-day moving window:
print(df2.rolling(3).mean().head())
date
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 5.0
2020-01-04 8.5
2020-01-05 12.0
Freq: D, Name: counter, dtype: float64
Seems like resample().mean() and rolling().mean() would be useful in this case.
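If it helps, the two steps can also be chained into a single expression (same result as the rolling mean printed above):
three_day_avg = df['counter'].resample('1D').mean().rolling(3).mean()
print(three_day_avg.head())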
Related
I have to compute a daily sum on a dataframe, but only if at least 70% of that day's data is not NaN; otherwise the day must not be taken into account. Is there a way to create such a mask? My dataframe covers more than 17 years of hourly data.
My data is something like this:
clear skies all skies Lab
2015-02-26 13:00:00 597.5259 376.1830 307.62
2015-02-26 14:00:00 461.2014 244.0453 199.94
2015-02-26 15:00:00 283.9003 166.5772 107.84
2015-02-26 16:00:00 93.5099 50.7761 23.27
2015-02-26 17:00:00 1.1559 0.2784 0.91
... ... ...
2015-12-05 07:00:00 95.0285 29.1006 45.23
2015-12-05 08:00:00 241.8822 120.1049 113.41
2015-12-05 09:00:00 363.8040 196.0568 244.78
2015-12-05 10:00:00 438.2264 274.3733 461.28
2015-12-05 11:00:00 456.3396 330.6650 447.15
If I groupby and aggregate, there is no way to know whether a given day had missing data; those days will have lower sums, which in turn lowers my monthly means.
As said in the comments, use groupby to group the data by date and then write an appropriate selection. Here is an example that sums all days (assuming regular data, 24 points per day) that have fewer than 50% NaN entries:
import pandas as pd
import numpy as np
# create a date range
date_rng = pd.date_range(start='1/1/2018', end='1/1/2021', freq='H')
# create random data
df = pd.DataFrame({"data":np.random.randint(0,100,size=(len(date_rng)))}, index = date_rng)
# set some values to NaN
df.loc[df["data"] > 50, "data"] = np.nan
# looks like this
df.head(20)
# sum everything where less than 50% are nan
df.groupby(df.index.date).sum()[df.isna().groupby(df.index.date).sum() < 12]
Example output:
data
2018-01-01 NaN
2018-01-02 NaN
2018-01-03 487.0
2018-01-04 NaN
2018-01-05 421.0
... ...
2020-12-28 NaN
2020-12-29 NaN
2020-12-30 NaN
2020-12-31 392.0
2021-01-01 0.0
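To match the 70% criterion from the question without assuming a fixed number of readings per day, one possible variation (a sketch, using the same random df as above) is to compute the fraction of non-NaN entries per day and mask on that:
daily_sum = df.groupby(df.index.date)["data"].sum()
frac_valid = df["data"].notna().groupby(df.index.date).mean()  # share of non-NaN values per day
result = daily_sum[frac_valid >= 0.7]                          # keep only days with at least 70% valid data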
An alternative solution - you may find it useful & flexible:
# pip install convtools
from convtools import conversion as c
total_number = c.ReduceFuncs.Count()
total_not_none = c.ReduceFuncs.Count(where=c.item("amount").is_not(None))
total_sum = c.ReduceFuncs.Sum(c.item("amount"))
input_data = [] # e.g. iterable of dicts
converter = (
c.group_by(
c.item("key1"),
c.item("key2"),
)
.aggregate(
{
"key1": c.item("key1"),
"key2": c.item("key2"),
"sum_if_70": c.if_(
total_not_none / total_number < 0.7,
None,
total_sum,
),
}
)
.gen_converter(
debug=False
) # install black and set to True to see the generated ad-hoc code
)
result = converter(input_data)
I'm creating a pandas DataFrame with random dates and random integer values, and I want to resample it by month and compute the average of the integers. This can be done with the following code:
import numpy as np
import pandas as pd

def random_dates(start='2018-01-01', end='2019-01-01', n=300):
    start_u = start.value // 10**9
    end_u = end.value // 10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
start = pd.to_datetime('2018-01-01')
end = pd.to_datetime('2019-01-01')
dates = random_dates(start, end)
ints = np.random.randint(100, size=300)
df = pd.DataFrame({'Month': dates, 'Integers': ints})
print(df.resample('M', on='Month').mean())
The thing is that the resampled months always start from day one, and I want all months to start from day 15. I'm using pandas 1.1.4 and I've tried using origin='15/01/2018' or offset='15', and neither of them works with the 'M' resample rule (they do work when I use '30D', but that is of no use). I've also tried '2SM', but it also doesn't work.
So my question is: is there a way of changing the resample rule, or will I have to add an offset to my data?
Assume that the source DataFrame is:
Month Amount
0 2020-05-05 1
1 2020-05-14 1
2 2020-05-15 10
3 2020-05-20 10
4 2020-05-30 10
5 2020-06-15 20
6 2020-06-20 20
To compute your "shifted" resample, first shift Month column so that
the 15-th day of month becomes the 1-st:
df.Month = df.Month - pd.Timedelta('14D')
and then resample:
res = df.resample('M', on='Month').mean()
The result is:
Amount
Month
2020-04-30 1
2020-05-31 10
2020-06-30 20
If you want, change dates in the index to month periods:
res.index = res.index.to_period('M')
Then the result will be:
Amount
Month
2020-04 1
2020-05 10
2020-06 20
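Alternatively, instead of converting to periods, you could relabel each group by the day it actually starts (the 15th), undoing the 14-day shift. A small sketch, assuming the shifted resample above:
res = df.resample('M', on='Month').mean()
res.index = res.index.to_period('M').start_time + pd.Timedelta('14D')  # e.g. 2020-04-30 -> 2020-04-15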
Edit: Not a working solution for OP's request. See short discussion in the comments.
Interesting problem. I suggest resampling using 'SMS' - semi-month start frequency (1st and 15th). Instead of keeping just the mean values, keep the count and sum values and recalculate the weighted mean for each monthly period from its two sub-periods (for example: 15/1 to 15/2 is composed of 15/1-31/1 and 1/2-15/2).
The advantage here is that, unlike with an (improper use of an) offset, we are certain each period always runs from the 15th of one month to the 14th of the next.
df_sm = df.resample('SMS', on='Month').aggregate(['sum', 'count'])
df_sm
Integers
sum count
Month
2018-01-01 876 16
2018-01-15 864 16
2018-02-01 412 10
2018-02-15 626 12
...
2018-12-01 492 10
2018-12-15 638 16
Compute a rolling sum and a rolling count, then derive the mean from them:
df_sm['sum_rolling'] = df_sm['Integers']['sum'].rolling(2).sum()
df_sm['count_rolling'] = df_sm['Integers']['count'].rolling(2).sum()
df_sm['mean'] = df_sm['sum_rolling'] / df_sm['count_rolling']
df_sm
Integers sum_rolling count_rolling mean
sum count
Month
2018-01-01 876 16 NaN NaN NaN
2018-01-15 864 16 1740.0 32.0 54.375000
2018-02-01 412 10 1276.0 26.0 49.076923
2018-02-15 626 12 1038.0 22.0 47.181818
...
2018-12-01 492 10 1556.0 27.0 57.629630
2018-12-15 638 16 1130.0 26.0 43.461538
Now, just filter the odd indices of df_sm:
df_sm.iloc[1::2]['mean']
Month
2018-01-15 54.375000
2018-02-15 47.181818
2018-03-15 51.000000
2018-04-15 44.897436
2018-05-15 52.450000
2018-06-15 33.722222
2018-07-15 41.277778
2018-08-15 46.391304
2018-09-15 45.631579
2018-10-15 54.107143
2018-11-15 58.058824
2018-12-15 43.461538
Freq: 2SMS-15, Name: mean, dtype: float64
The code:
df_sm = df.resample('SMS', on='Month').aggregate(['sum', 'count'])
df_sm['sum_rolling'] = df_sm['Integers']['sum'].rolling(2).sum()
df_sm['count_rolling'] = df_sm['Integers']['count'].rolling(2).sum()
df_sm['mean'] = df_sm['sum_rolling'] / df_sm['count_rolling']
df_out = df_sm[1::2]['mean']
Edit: Changed a name of one of the columns to make it clearer
UsageDate CustID1 CustID2 .... CustIDn
0 2018-01-01 00:00:00 1.095
1 2018-01-01 01:00:00 1.129
2 2018-01-01 02:00:00 1.165
3 2018-01-01 04:00:00 1.697
.
.
m 2018-01-31 23:00:00 1.835 (m,n)
The dataframe (df) has m rows and n columns. The index is an hourly time series that runs from the first hour of the month to the last hour of the month.
The columns are the customers which are almost 100,000.
The values at each cell of Dataframe are energy consumption values.
For every customer, I need to calculate:
1) Mean of every hour usage - so basically average of 1st hour of every day in a month, 2nd hour of every day in a month etc.
2) Summation of usage of every customer
3) Top 3 usage hours - for a customer x, these could be "2018-01-01 01:00:00", "2018-01-11 05:00:00", "2018-01-21 17:00:00"
4) Bottom 3 usage hours - Similar explanation as above
5) Mean of usage for every customer in the month
My main trouble is how to aggregate the data both per customer and per hour of the day (or per day).
For summation of usage for every customer, I tried:
df_temp = pd.DataFrame(columns=["TotalUsage"])
for col in df.columns:
    df_temp[col, "TotalUsage"] = df[col].apply.sum()
However, this and the many versions of it that I tried did not help me solve the problem.
Please help me with an approach and how to think about such problems.
Also, since the dataframe is large, it would be helpful to talk about computational complexity and how we can decrease computation time.
This looks like a job for pandas.groupby.
(I didn't test the code because I didn't have a good sample dataset from which to work. If there are errors, let me know.)
For some of your requirements, you'll need to add a column with the hour:
df['hour']=df['UsageDate'].dt.hour
1) Mean by hour.
mean_by_hour=df.groupby('hour').mean()
2) Summation by user.
sum_by_users = df.sum()
3) Top 3 usage hours per customer. Bottom 3 usage hours - similar explanation as above. I don't quite understand your desired output; you might be asking too many different questions in one question. If you want the hour and not the value, I think you may have to iterate through the columns. Adding an example would help.
4) Same comment.
5) Mean by customer.
mean_by_cust = df.mean()
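For points 3) and 4), a rough sketch (not tested, and assuming 'UsageDate' is a column while the remaining columns are customer IDs) that collects the timestamps of the 3 highest and 3 lowest readings per customer:
usage = df.set_index('UsageDate').drop(columns='hour', errors='ignore')
top3_hours = {cust: usage[cust].nlargest(3).index.tolist() for cust in usage.columns}
bottom3_hours = {cust: usage[cust].nsmallest(3).index.tolist() for cust in usage.columns}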
I am not sure if this is all the information you are looking for but it will point you in the right direction:
import pandas as pd
import numpy as np
# sample data for 3 days
np.random.seed(1)
data = pd.DataFrame(pd.date_range('2018-01-01', periods= 72, freq='H'), columns=['UsageDate'])
data2 = pd.DataFrame(np.random.rand(72,5), columns=[f'ID_{i}' for i in range(5)])
df = data.join([data2])
print('Sample Data:')
print(df.head())
print()
# mean of every month and hour per year
# groupby year month hour then find the mean of every hour in a given year and month
mean_data = df.groupby([df['UsageDate'].dt.year, df['UsageDate'].dt.month, df['UsageDate'].dt.hour]).mean()
mean_data.index.names = ['UsageDate_year', 'UsageDate_month', 'UsageDate_hour']
print('Mean Data:')
print(mean_data.head())
print()
# use set_index with max and head
top_3_Usage_hours = df.set_index('UsageDate').max(1).sort_values(ascending=False).head(3)
print('Top 3:')
print(top_3_Usage_hours)
print()
# use set_index with min and tail
bottom_3_Usage_hours = df.set_index('UsageDate').min(1).sort_values(ascending=False).tail(3)
print('Bottom 3:')
print(bottom_3_Usage_hours)
out:
Sample Data:
UsageDate ID_0 ID_1 ID_2 ID_3 ID_4
0 2018-01-01 00:00:00 0.417022 0.720324 0.000114 0.302333 0.146756
1 2018-01-01 01:00:00 0.092339 0.186260 0.345561 0.396767 0.538817
2 2018-01-01 02:00:00 0.419195 0.685220 0.204452 0.878117 0.027388
3 2018-01-01 03:00:00 0.670468 0.417305 0.558690 0.140387 0.198101
4 2018-01-01 04:00:00 0.800745 0.968262 0.313424 0.692323 0.876389
Mean Data:
ID_0 ID_1 ID_2 \
UsageDate_year UsageDate_month UsageDate_hour
2018 1 0 0.250716 0.546475 0.202093
1 0.414400 0.264330 0.535928
2 0.335119 0.877191 0.380688
3 0.577429 0.599707 0.524876
4 0.702336 0.654344 0.376141
ID_3 ID_4
UsageDate_year UsageDate_month UsageDate_hour
2018 1 0 0.244185 0.598238
1 0.400003 0.578867
2 0.623516 0.477579
3 0.429835 0.510685
4 0.503908 0.595140
Top 3:
UsageDate
2018-01-01 21:00:00 0.997323
2018-01-03 23:00:00 0.990472
2018-01-01 08:00:00 0.988861
dtype: float64
Bottom 3:
UsageDate
2018-01-01 19:00:00 0.002870
2018-01-03 02:00:00 0.000402
2018-01-01 00:00:00 0.000114
dtype: float64
For the top and bottom 3, if you instead want the minimum sum across rows:
df.set_index('UsageDate').sum(1).sort_values(ascending=False).tail(3)
Consider my data in the following format:
20180101,10
20180102,20
20180103,15
....
The first column is the date and the second is how many products were sold. Instead of inserting all of this into a database and using a SELECT MAX ... SQL statement to find the maximum over a period, is there any shorthand or useful library that can serve this purpose? Thanks.
This might be a biased answer, but pandas is really good for handling data like this. While you can accomplish this kind of operation using tuples, lists, etc., pandas offers much more functionality. For example:
import pandas as pd
data = [[20180101,15], [20180102,10], [20180103,12],[20180104,10]]
df = pd.DataFrame(data=data, columns=['date', 'products'])
# if your data is in csv, excel, database... whatever... you can easily pull
# df = pd.read_csv('name') || pd.read_excel() || pd.read_sql()
df
Out[2]:
date products
0 20180101 15
1 20180102 10
2 20180103 12
3 20180104 10
# It helps to use datetime format to perform operations on the data
# Operations make reference to an "index" in the dataframe
df.index = pd.to_datetime(df['date'], format="%Y%m%d") #strftime format
df
Out[3]:
date products
date
2018-01-01 20180101 15
2018-01-02 20180102 10
2018-01-03 20180103 12
2018-01-04 20180104 10
# Now we can drop that date column...
df.drop(columns='date', inplace=True)
df
Out[4]:
products
date
2018-01-01 15
2018-01-02 10
2018-01-03 12
2018-01-04 10
# Yes, there are ways to do the above in shorthand... lots of info on pandas on SO
# I want you to see the individual steps we are taking, to keep it simple
# Now is when the fun begins
df.rolling(2).sum() # prints a rolling 2-day sum
Out[5]:
products
date
2018-01-01 NaN
2018-01-02 25.0
2018-01-03 22.0
2018-01-04 22.0
df.rolling(3).mean() # prints a rolling 3-day average
Out[6]:
products
date
2018-01-01 NaN
2018-01-02 NaN
2018-01-03 12.333333
2018-01-04 10.666667
df.resample('W').sum() # Resamples the data so you can look on a weekly basis
Out[7]:
products
date
2018-01-07 47
df.rolling(2).max() # max number of products over a rolling two-day period
Out[9]:
products
date
2018-01-01 NaN
2018-01-02 15.0
2018-01-03 12.0
2018-01-04 12.0
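And since the index is now a DatetimeIndex, the original question ("max during a period") reduces to slicing a date range and taking the max; a small sketch with the sample data above:
df.loc['2018-01-02':'2018-01-04', 'products'].max()  # 12 for this sample data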
Pandas is the lib you want.
Let me show you with an example:
import numpy as np
import pandas as pd
# let's build a dummy dataset
index = pd.date_range(start="1/1/2015", end="31/12/2018")
df = pd.DataFrame(np.random.randint(100, size=len(index)),
                  columns=["sales"], index=index)
>>> df.head()
sales
2015-01-01 32
2015-01-02 0
2015-01-03 12
2015-01-04 77
2015-01-05 86
Now let's say you want to aggregate sales on a monthly basis:
>>> df["sales"].groupby(pd.Grouper(freq="1M")).sum()
2015-01-31 1441
2015-02-28 1164
2015-03-31 1624
2015-04-30 1629
2015-05-31 1427
[...]
Or on a semester basis:
df["sales"].groupby(pd.Grouper(freq="6M", closed="left", label="right")).sum()
2015-06-30 8921
2015-12-31 9365
2016-06-30 9820
2016-12-31 8881
2017-06-30 8773
2017-12-31 8709
2018-06-30 9481
2018-12-31 9522
2019-06-30 51
For some reason, Grouper binning with a six-month frequency has an issue with the 31/12 sales and puts them into a new bin in 2019. I'm looking into it and will update if I find anything; if anyone else wants to comment, please do.
Or you want to know which one was the best semester:
>>> df["sales"].groupby(pd.Grouper(freq="6M")).sum().idxmax()
Timestamp('2016-06-30 00:00:00', freq='6M')
You should use pandas.
Assuming that your date column is called 'date' and that it is a datetime dtype:
import pandas as pd
df = pd.DataFrame(data)
df = df.set_index('date')
df.groupby(pd.Grouper(freq='1M')).max()
would give you each month's max. freq can be changed to whatever frequency you like.
I tried the suggestion from @Patrick Artner's comment:
a = (20180101,10)
b = (20180102,20)
c = (20180103,15)
d = (a,b,c)
maximum = max(d, key=lambda x: x[1])
minimum = min(d, key=lambda x: x[1])
print(maximum)
print(minimum)
Maybe this gives some inspiration.
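If the maximum is only wanted over a date range, a small extension of the snippet above (using the same tuples) could be:
start, end = 20180101, 20180102
in_period = [t for t in d if start <= t[0] <= end]
print(max(in_period, key=lambda x: x[1]))  # (20180102, 20)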
Please let me know if this is the desired result.
data = [{'date':1, 'products_sold': 2}, {'date':2, 'products_sold': 5},{'date':5, 'products_sold': 2}]
start_date = 1
end_date = 2
max_value_in_period = max(x['products_sold'] for x in data if x['date'] >= start_date and x['date'] <= end_date)
print(max_value_in_period)
I have imported a CSV file into Python. It has readings at 5-minute intervals over a period of a month, with about 250 readings per 5-minute timestamp. Below is a sample with one row per timestamp. Is there a way to split the CSV into different dataframes grouped by date, or even by 5-minute interval, for plotting purposes? As I mentioned, this dataset has 250 readings per 5-minute interval for a month, so I would like to do this without having to hard-code a dataframe for each day or each interval in the set.
df.head()
tmc_code measurement_tstamp ... miles road_order
0 112-05650 2018-05-01 00:00:00 ... 0.427814 768.0
1 112-05650 2018-05-01 00:05:00 ... 0.427814 768.0
2 112-05650 2018-05-01 00:10:00 ... 0.427814 768.0
3 112-05650 2018-05-01 00:15:00 ... 0.427814 768.0
4 112-05650 2018-05-01 00:20:00 ... 0.427814 768.0
What it sounds like to me is that you want a new DataFrame for each date. If that is what you desire, the following code will take your dataframe and make a list of dataframes, each of which will only contain data for one date.
df.measurement_tstamp = df.measurement_tstamp.str[:10]        # keep only the date part of the timestamp string
l = df.measurement_tstamp.unique()                             # the unique dates
data = [df.loc[df['measurement_tstamp']==i] for i in l]        # one dataframe per date
Edit
If you want to do it by 5-minute interval instead, it's even simpler (run it on the original, untruncated measurement_tstamp column):
data = [df.loc[df['measurement_tstamp']==i] for i in df.measurement_tstamp.unique()]
That should do it
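As an alternative sketch (assuming measurement_tstamp still holds the full timestamp strings and can be parsed with pd.to_datetime), grouping on the parsed timestamps avoids the string slicing and gives a dict keyed by date or by 5-minute interval:
import pandas as pd

df['measurement_tstamp'] = pd.to_datetime(df['measurement_tstamp'])
per_day = {day: g for day, g in df.groupby(df['measurement_tstamp'].dt.date)}   # one frame per date
per_interval = {ts: g for ts, g in df.groupby('measurement_tstamp')}            # one frame per 5-minute stamp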