How to apply avg function to DataFrame series monthly? - python

I have a DataFrame series with daily resolution. I want to transform it into a series of monthly averages. Of course I could apply a rolling mean and keep only every 30th value, but that would not be precise because months differ in length. I want a series that holds, on the first day of each month, the mean value of the previous month. For example, on February 1 I want the daily average for January. How can I do this in a pythonic way?

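# The how= keyword was removed from resample; call .mean() on the resampler instead.
data.resample('M').mean()  # from pandas 2.2 on, the month-end alias is spelled 'ME'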
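The resample call labels each mean with its own month, whereas the question asks for January's mean on February 1. A minimal sketch of one way to get that labelling, assuming a synthetic daily series (the data and the name prev_month_mean are illustrative, not from the original post):

```python
import pandas as pd

# Synthetic daily series (assumption): each value equals its month number.
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
data = pd.Series(idx.month.to_numpy(), index=idx, dtype="float64")

# Mean of each calendar month, labelled with that month's first day.
monthly = data.resample("MS").mean()

# Move every label forward by one month, so January's mean sits on February 1.
prev_month_mean = monthly.copy()
prev_month_mean.index = monthly.index + pd.DateOffset(months=1)

print(prev_month_mean)
```

The 'MS' (month start) alias is used here because it makes the one-month shift land exactly on the first day of the following month.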

Related

Calculating mean total seasonal precipitation using python

I am new to python and using it to analyse climate data in NetCDF. I am wanting to calculate the total precipitation for each season in each year and then average these seasonal totals across the time period (i.e. an average for DJF over all years in the file and an average for MAM etc.).
Here is what I thought to do:
import matplotlib.pyplot as plt
import xarray as xr
fn1 = 'cru_fixed.nc'
ds1 = xr.open_dataset(fn1)
ds1_season = ds1['pre'].groupby('time.season').mean('time')
#Then plot each season
ds1_season.plot(col='season')
plt.show()
The original file contains monthly totals of precipitation. This is calculating an average for each season and I need the sum of Dec, Jan and Feb and the sum of Mar, Apr, May etc. for each season in each year. How do I sum and then average over the years?
If I'm not mistaken, you first need to resample your data so that you have the sum of each season in a DataArray, and then average these sums over the years.
To resample:
sum_of_seasons = ds1['pre'].resample(time='Q').sum(dim="time")
resample upsamples or downsamples a time series; it uses the time offsets of pandas.
However, be careful to choose the right offset, because it defines which months are included in each season. Depending on your needs, you may want to use "Q", "QS" or an anchored offset like "QS-DEC".
To get the same split as "time.season" (DJF, MAM, JJA, SON), the offset is "QS-DEC", I believe.
Then to group over multiple years, same as you did above:
result = sum_of_seasons.groupby('time.season').mean('time')
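The same sum-then-average logic can be checked on a small example. The sketch below is a pandas analogue with synthetic monthly totals, not the xarray code from the answer; it additionally drops incomplete seasons at the ends of the record, which is an assumption about what is wanted:

```python
import pandas as pd

# Synthetic monthly totals (assumption): 1.0 per month for 2000-2002.
idx = pd.date_range("2000-01-01", "2002-12-01", freq="MS")
pre = pd.Series(1.0, index=idx)

# Seasonal totals per year; "QS-DEC" starts quarters in Dec, Mar, Jun and Sep.
seasons = pre.resample("QS-DEC")
season_sum = seasons.sum()
n_months = seasons.count()

# Drop incomplete seasons at the ends of the record (e.g. a DJF without December).
season_sum = season_sum[n_months == 3]

# Average each season's totals over the years.
names = {12: "DJF", 3: "MAM", 6: "JJA", 9: "SON"}
result = season_sum.groupby(season_sum.index.month.map(names)).mean()

print(result)
```

Without the completeness check, the first DJF would contain only January and February and the last only December, which would pull the DJF average down.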

How to use resample count to screen original data

Say I have a dataset at daily scale, but not all days have valid data. In other words, some days are missing in the data. I want to compute the summer season mean from the dataset, and want to remove the month which has less than 20 days of valid data.
How do I achieve this (in pythonic fashion)?
Say my dataframe (df) is like this:
DATE VAR
1900-01-01 123
1900-01-02 456
1900-01-10 789
...
I know how to compute the count:
df_count = df.resample('MS').count()
I also know how to compute the summer season mean:
df_summer = df.resample('Q-NOV').mean()
You can use df_count to filter out the months that have fewer than 20 days of valid data, and then compute the summer season mean as before.
df_count = df['VAR'].resample('MS').count()
relevant_month = df_count[df_count >= 20].index.to_period('M')
df_summer = df[df.index.to_period('M').isin(relevant_month)].resample('Q-NOV').mean()
This assumes the dates are stored in a DatetimeIndex. If they are in an ordinary column instead, build the monthly period from that column, e.g. df['DATE'].dt.to_period('M'), and set the column as the index before resampling.
I also don't know the format of your time column (date or datetime), so you may need to convert it with pd.to_datetime first. This is just the general idea.
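As a rough check of the screening idea above, here is a self-contained sketch on synthetic data in which July has only 15 valid days. The data are made up, and 'QS-DEC' is used in place of 'Q-NOV'; it defines the same seasons but labels them by their first day:

```python
import pandas as pd

# Synthetic daily data for one summer (assumption): June and August hold 2.0,
# July holds 100.0 but only its first 15 days are present.
idx = pd.date_range("2000-06-01", "2000-08-31", freq="D")
df = pd.DataFrame({"VAR": 2.0}, index=idx)
df.loc["2000-07", "VAR"] = 100.0
df = df[~((df.index.month == 7) & (df.index.day > 15))]

# Number of valid days per month.
df_count = df["VAR"].resample("MS").count()

# Keep only the days that belong to months with at least 20 valid days.
relevant_month = df_count[df_count >= 20].index.to_period("M")
screened = df[df.index.to_period("M").isin(relevant_month)]

# Seasonal mean over the remaining days (the Jun-Aug bin is labelled 2000-06-01).
df_summer = screened.resample("QS-DEC").mean()

print(df_summer)
```

Comparing months as periods matters here: the resampled counts are labelled with month-start dates, so testing the daily index directly against those labels would keep only the first day of each month.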

Is there an inbuilt method in python(pandas) which can simulate a single day from multiple days

I have time series data for solar radiation with 15-minute time steps covering one month (1st June to 30th June). My aim is to simulate one single day from all 30 days by averaging the values at each time instant. For example, initially I have 30 different values at 11am, 11.15am, 11.45am and so on. I want to average those 30 values so that I have a single value at each of 11am, 11.15am, 11.45am and so on.
You can extract the time of day into a separate column and group by it:
data['TimeOfDay'] = data['Date'].dt.time
data.groupby('TimeOfDay').mean(numeric_only=True)
Here Date is your date column in datetime format. Grouping on the minute alone would merge 11:00 with 12:00, so the key has to include the hour as well.
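A small sketch of averaging by time of day, assuming three synthetic days of 15-minute data and a hypothetical value column named Radiation (only Date comes from the answer):

```python
import datetime as dt

import pandas as pd

# Three synthetic days at 15-minute resolution (assumption); the value is
# hour + day-of-month, so the mean at a given hour h is h + 2.
stamps = pd.date_range("2021-06-01", periods=96 * 3, freq="15min")
data = pd.DataFrame({
    "Date": stamps,
    "Radiation": (stamps.hour + stamps.day).to_numpy(),
})

# One mean per time of day, i.e. 96 rows describing a single "typical" day.
typical_day = data.groupby(data["Date"].dt.time)["Radiation"].mean()

print(typical_day.loc[dt.time(11, 0)])
```

The result is indexed by datetime.time objects, one per 15-minute slot.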

groupby nightime with varying hours

I am trying to calculate night-time averages of a dataframe except that what I need is a mix between daily average and hour range average.
More specifically, I have a dataframe storing day and night hours and I want to use it as a boolean key to calculate night-time averages of another dataframe.
I cannot use daily averages because each night spreads over two calendar days, and I cannot use by hour range either because hours change by season.
Thanks for your help!
Dariush.
Based on comments received here is what I am looking for - see spreadsheet below. I need to calculate the average of 'Value' during nighttime using the Nighttime flag, and then repeat the average value for all time stamps until the following night, at which time the average is updated and repeated until the next nighttime flag.
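One possible reading of this requirement can be sketched as follows. It assumes a single frame with a boolean Nighttime column next to Value (the question keeps the flags in a separate dataframe), and that each night's mean is shown from the start of that night until the next night begins. The data and the names night_id and NightMean are illustrative:

```python
import pandas as pd

# Synthetic hourly data (assumption) with two nights.
df = pd.DataFrame(
    {
        "Value": [1.0, 2.0, 4.0, 10.0, 10.0, 6.0, 8.0, 0.0],
        "Nighttime": [False, True, True, False, False, True, True, False],
    },
    index=pd.date_range("2020-01-01 18:00", periods=8, freq="h"),
)

# A night starts where the flag switches from False to True.
night_start = df["Nighttime"] & ~df["Nighttime"].shift(fill_value=False)

# Running count of nights: every row carries the number of the latest night.
night_id = night_start.cumsum()

# Mask daytime values, then average within each block and repeat the result
# on every row of the block (rows before the first night stay NaN).
df["NightMean"] = df["Value"].where(df["Nighttime"]).groupby(night_id).transform("mean")

print(df)
```

Because the blocks are defined by the flag rather than by calendar date or fixed hours, a night that spans midnight or changes length with the season is still treated as one unit.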

Calculating a cumulative deviation from mean monthly value in pandas series

How would I use pandas to calculate a cumulative deviation from a mean monthly rainfall value?
I am given daily rainfall data (e.g. s, below) which I can convert to a pd.Series and resample into monthly periods (sum; e.g. sm, below). But I then want to calculate the difference between each monthly value and the mean for the month. I have added a synthetic example:
import numpy as np
import pandas as pd
rng = pd.period_range('2001-01-01', '2013-12-31', freq='D')
s = pd.Series(np.random.normal(2.5, 2, size=len(rng)), index=rng)
sm = s.resample('M').sum()
For example, for January 2010 I would like to calculate the difference between the value for that month and the average monthly rainfall for January (over a long period). Then I want a cumulative sum of that difference.
I have tried to use the groupby function:
sm.groupby(lambda x: x.month).mean()
But not successfully. I want each monthly value in 'sm' to have the average for all similar months to be subtracted, then a cumulative sum of that series created. This could be in one step I guess.
How could I achieve this efficiently?
Thanks
This is closely related to an example in the docs. This is untested code, but you want something like this:
monthly_rainfall = daily_rainfall.resample('M').sum()
To group all Januarys over all the years together (and so on for each month):
grouped = monthly_rainfall.groupby(lambda x: x.month)
Then
deviation = grouped.transform(lambda x: x - x.mean())
deviation.cumsum()
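Put together, the steps above might look like the following sketch. It uses a synthetic daily series with a DatetimeIndex and the 'MS' alias (both assumptions) and the built-in 'mean' transform instead of a lambda:

```python
import numpy as np
import pandas as pd

# Synthetic daily rainfall for 2001-2003 (assumption), seeded for repeatability.
rng = np.random.default_rng(0)
days = pd.date_range("2001-01-01", "2003-12-31", freq="D")
daily_rainfall = pd.Series(rng.normal(2.5, 2.0, size=len(days)), index=days)

# Monthly totals, labelled with the first day of each month.
monthly_rainfall = daily_rainfall.resample("MS").sum()

# Long-term mean of each calendar month, repeated on every matching row.
month_of_year = monthly_rainfall.index.month
climatology = monthly_rainfall.groupby(month_of_year).transform("mean")

# Deviation from the long-term monthly mean and its running total.
deviation = monthly_rainfall - climatology
cumulative_deviation = deviation.cumsum()

print(cumulative_deviation.tail())
```

Since the deviations within each calendar month sum to zero, the cumulative series returns to roughly zero at the end of the last complete year, which is a convenient sanity check.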
