How would I use pandas to calculate a cumulative deviation from a mean monthly rainfall value?
I have daily rainfall data (s below), which I convert to a pd.Series and resample into monthly totals (sm below). I then want to calculate the difference between each monthly value and the long-term mean for that calendar month. Here is a synthetic example:
import numpy as np
import pandas as pd

rng = pd.period_range('2001-01-01', '2013-12-31', freq='D')
s = pd.Series(np.random.normal(2.5, 2, size=len(rng)), index=rng)
sm = s.resample('M').sum()
For example, for January 2010 I would like to calculate the difference between the value for that month and the average monthly rainfall for January (over a long period). Then I want a cumulative sum of that difference.
I have tried to use the groupby function:
sm.groupby(lambda x: x.month).mean()
But not successfully. I want to subtract from each monthly value in sm the mean of all matching calendar months, and then take a cumulative sum of the resulting series. This could probably be done in one step.
How could I achieve this efficiently?
Thanks
This is closely related to an example in the docs. This is untested code, but you want something like this:
monthly_rainfall = daily_rainfall.resample('M').sum()
To group all Januarys over all the years together (and so on for each month):
grouped = monthly_rainfall.groupby(lambda x: x.month)
Then
deviation = grouped.transform(lambda x: x - x.mean())
deviation.cumsum()
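Putting the pieces together, here is a minimal end-to-end sketch using the synthetic series from the question (the names s and sm come from the question; everything else is illustrative):

import numpy as np
import pandas as pd

# synthetic daily rainfall, as in the question
rng = pd.period_range('2001-01-01', '2013-12-31', freq='D')
s = pd.Series(np.random.normal(2.5, 2, size=len(rng)), index=rng)

# monthly totals
sm = s.resample('M').sum()

# subtract each calendar month's long-term mean, then accumulate
grouped = sm.groupby(sm.index.month)
deviation = grouped.transform(lambda x: x - x.mean())
cumulative_deviation = deviation.cumsum()
print(cumulative_deviation.head())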
Related
I'm working with a dataframe that has daily measured data across 30 years for different variables. I am trying to group by day of the year and then take the mean across the 30 years. How do I go about this?
I tried to group by day after checking the type of the YYYYMMDD column (it's int64), but that just added new Day, Month and Year columns to the dataframe.
I'm a bit stuck on how to calculate the means from here; I would need to somehow group all Jan 1sts, Jan 2nds, etc. over the 30 years and then average each group.
You can groupby with month and day:
df.index = pd.to_datetime(df.index)
(df.groupby([df.index.month, df.index.day]).mean()
   .reset_index()
   .rename({'level_0': 'month', 'level_1': 'day'}, axis=1))
or, if you just want the rows numbered 0, 1, ... instead of indexed by (month, day), pass as_index=False:
df.groupby([df.index.month, df.index.day], as_index=False).mean()
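A small self-contained sketch with synthetic 30-year daily data, just to show the pattern (the column name value is made up):

import numpy as np
import pandas as pd

# synthetic daily data spanning 30 years, indexed by date
idx = pd.date_range('1990-01-01', '2019-12-31', freq='D')
df = pd.DataFrame({'value': np.random.rand(len(idx))}, index=idx)

# mean of each calendar (month, day) combination across all 30 years
daily_climatology = df.groupby([df.index.month, df.index.day]).mean()
daily_climatology.index.names = ['month', 'day']
print(daily_climatology.head())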
Assuming that I have a series made of daily values:
dates = pd.date_range('1/1/2004', periods=365, freq="D")
ts = pd.Series(np.random.randint(0,101, 365), index=dates)
I need to use .groupby or .reduce with a fixed schema of dates.
Using ts.resample('8D') isn't an option, because the split dates must not shift within a month, and the last chunk of each month has to flex to absorb the different month lengths (including February in a leap year).
A list of dates can be obtained through:
g = dates[dates.day.isin([1,8,16,24])]
How can I group or reduce my data to this specific schema so I can compute the sum, max and min in a more elegant and efficient way than:
for i in range(len(g) - 1):
    ts.loc[(g[i] < ts.index) & (ts.index < g[i + 1])]
Well, from a calendar point of view you can group them into calendar weeks, days of the week, months and so on.
If that is something you would be interested in, you can do it easily with pandas, for example:
df['week'] = df['date'].dt.isocalendar().week  # ISO week number for each date
df.groupby('week')['values'].sum()             # sum the values per week
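As a self-contained illustration using the series from the question (turning it into a frame with hypothetical date and values columns):

import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2004', periods=365, freq='D')
ts = pd.Series(np.random.randint(0, 101, 365), index=dates)

# put the series into a frame with explicit 'date' and 'values' columns
df = ts.rename('values').rename_axis('date').reset_index()

df['week'] = df['date'].dt.isocalendar().week            # ISO week number
weekly = df.groupby('week')['values'].agg(['sum', 'max', 'min'])
print(weekly.head())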
I am new to Python and am using it to analyse climate data in NetCDF. I want to calculate the total precipitation for each season in each year and then average these seasonal totals across the time period (i.e. an average for DJF over all years in the file, an average for MAM, etc.).
Here is what I thought to do:
import xarray as xr
import matplotlib.pyplot as plt

fn1 = 'cru_fixed.nc'
ds1 = xr.open_dataset(fn1)
ds1_season = ds1['pre'].groupby('time.season').mean('time')
#Then plot each season
ds1_season.plot(col='season')
plt.show()
The original file contains monthly precipitation totals. The code above averages the monthly values within each season, but I need the sum of Dec+Jan+Feb, the sum of Mar+Apr+May, etc. for each season of each year, and only then the average over the years. How do I sum first and then average over the years?
If I'm not mistaken, you need to first resample your data so the DataArray holds the sum for each season, and then average these sums over multiple years.
To resample:
sum_of_seasons = ds1['pre'].resample(time='Q').sum(dim="time")
resample is an operator for upsampling or downsampling time series; it uses pandas time offsets.
However be careful to choose the right offset, it will define the month included in each season. Depending on your needs, you may want to use "Q", "QS" or an anchored offset like "QS-DEC".
To have the same splitting as "time.season", the offset is "QS-DEC" I believe.
Then to group over multiple years, same as you did above:
result = sum_of_seasons.groupby('time.season').mean('time')
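A minimal sketch with synthetic monthly data, just to show the two steps end to end (the variable name pre mirrors the question; the values are random):

import numpy as np
import pandas as pd
import xarray as xr

# synthetic monthly precipitation totals
time = pd.date_range('2001-01-01', '2010-12-31', freq='MS')
pre = xr.DataArray(np.random.rand(len(time)),
                   coords={'time': time}, dims='time', name='pre')

# 1) seasonal totals per year (DJF/MAM/JJA/SON, anchored in December)
sum_of_seasons = pre.resample(time='QS-DEC').sum(dim='time')

# 2) average each season's totals over the years
result = sum_of_seasons.groupby('time.season').mean('time')
print(result)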
I have a list of account numbers and transaction dates. I would like to calculate, per account number, the variance of the intervals between transaction dates. So if one account has 10 transactions, I would like to know the variance of the gaps between those dates. For the amounts in the list I already calculated several statistics via groupby:
df.groupby('AcctNr').agg({'Amount': [np.count_nonzero, np.sum, np.min, np.max, np.std, np.mean], 'Date': [np.min, np.max]})
I succeeded in getting the min and max date per account number, but I can't work out how to calculate the variance of the intervals.
I think you are looking for numpy.var.
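To apply that to the intervals rather than the raw dates, here is one hedged sketch (the column names AcctNr and Date come from the question; interpreting "interval variance" as the variance of the day gaps between consecutive transactions is my assumption):

import pandas as pd

# toy data using the question's column names
df = pd.DataFrame({
    'AcctNr': [1, 1, 1, 2, 2, 2],
    'Date': pd.to_datetime(['2020-01-01', '2020-01-05', '2020-01-20',
                            '2020-02-01', '2020-02-03', '2020-02-05']),
})

# per account: sort by date, take the gaps in days, then their variance
interval_var = (
    df.sort_values('Date')
      .groupby('AcctNr')['Date']
      .apply(lambda s: s.diff().dt.days.var())
)
print(interval_var)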
I have a DataFrame series with daily resolution. I want to transform it into a series of monthly averages. Of course I could apply a rolling mean and select only every 30th value, but that would not be precise. I want a series that, on the first day of each month, contains the mean of the daily values from the previous month. For example, on February 1 I want the daily average for January. How can I do this in a pythonic way?
data.resample('M').mean()
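Note that 'M' labels each mean with the last day of that month rather than the first day of the next month as asked; a hedged way to relabel (data here is a made-up daily series):

import numpy as np
import pandas as pd

# hypothetical daily series
data = pd.Series(np.random.rand(365),
                 index=pd.date_range('2004-01-01', periods=365, freq='D'))

monthly_mean = data.resample('M').mean()        # labelled at month end
monthly_mean = monthly_mean.shift(1, freq='D')  # relabel to the 1st of the next month
print(monthly_mean.head())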