I am trying to calculate night-time averages of a dataframe except that what I need is a mix between daily average and hour range average.
More specifically, I have a dataframe storing day and night hours and I want to use it as a boolean key to calculate night-time averages of another dataframe.
I cannot use daily averages because each night spreads over two calendar days, and I cannot use by hour range either because hours change by season.
Thanks for your help!
Dariush.
Based on comments received here is what I am looking for - see spreadsheet below. I need to calculate the average of 'Value' during nighttime using the Nighttime flag, and then repeat the average value for all time stamps until the following night, at which time the average is updated and repeated until the next nighttime flag.
Related
I have a data frame with a date column and a sales volume column.
There are days that I need to set sales volumes to zero and then distribute those volumes equally to the volumes of the next five non-weekend days.
So if I have a volume of 100 on a Monday, the next five non-business days are their volumes + (100/5).
I've tried some work arounds using date_range, but haven't had any success. I'm very lost on how to do this since I'm not so good with datetime methods.
I am new to python and using it to analyse climate data in NetCDF. I am wanting to calculate the total precipitation for each season in each year and then average these seasonal totals across the time period (i.e. an average for DJF over all years in the file and an average for MAM etc.).
Here is what I thought to do:
fn1 = 'cru_fixed.nc'
ds1 = xr.open_dataset(fn1)
ds1_season = ds1['pre'].groupby('time.season').mean('time')
#Then plot each season
ds1_season.plot(col='season')
plt.show()
The original file contains monthly totals of precipitation. This is calculating an average for each season and I need the sum of Dec, Jan and Feb and the sum of Mar, Apr, May etc. for each season in each year. How do I sum and then average over the years?
If I'm not mistaking, you need to first resample you data to have the sum of each seasons on a DataArray, then to average theses sum on multiple years.
To resample:
sum_of_seasons = ds1['pre'].resample(time='Q').sum(dim="time")
resample is an operator to upsample or downsample time series, it uses time offsets of pandas.
However be careful to choose the right offset, it will define the month included in each season. Depending on your needs, you may want to use "Q", "QS" or an anchored offset like "QS-DEC".
To have the same splitting as "time.season", the offset is "QS-DEC" I believe.
Then to group over multiple years, same as you did above:
result = sum_of_seasons.groupby('time.season').mean('time')
A csv file with data on orders (for meals to be delivered) was provided, the documents comprises the folowing columns with information:
dateTime,restaurant,address,zippcodeFrom,zippcodeTo,dist,tm
With the format for dateTime like this: YYYY-MM-DD HH:MM:ss
I'd personally prefer to use MS Excel to apply the FFT (fast fourrier transform) to forecast based on time-series data. However, this is a python course, and the file is to large for MS Excel.
Getting the average number of orders per day of the week would be a start. But if I try the aggregation function, it sums all orders of all mondays alltogether.
How can i retrieve either the average number per weekday or the total number of mondays (and then divide the total number of orders on all mondays by the number of mondays? (Subsequently we have to do the same for the average total travel time (tm in the csv file) for delivery.
The challanage: multiple orders per day, result in multiple lines of data for each day.
(The next thing is to get some kind of forecast hourly...)
What would be the best way to solve this?
I have a DataFrame series with day resolution. I want to transform the series to a series of monthly averages. Ofcourse I can apply rolling mean and select only every 30th of means but it would not precise. I want to get series which contains mean value from the previous month on every first day of a month. For example, on February 1 I want to have daily average for the January. How can I do this in pythonic way?
data.resample('M', how='mean')
I have looked at the resample/Timegrouper functionality in Pandas. However, I'm trying to figure out how to use it for this specific case. I want to do a seasonal analysis across a financial asset - let's say S&P 500. I want to know how the asset performs between any two custom dates on average across many years.
Example: If I have a 10 year history of daily changes of S&P 500 and I pick the date range between March 13th and March 23rd, then I want to know the average change for each date in my range across the last 10 years - i.e. average change on 3/13 each year for the last 10 years, and then for 3/14, 3/15 and so on until 3/23. This means I need to groupby month and day and do an average of values across different years.
I can probably do this by creating 3 different columns for year, month, and day and then grouping by two of them, but I wonder if there are more elegant ways of doing this.
I figured it out. It turned out to be pretty simple and I was just being dumb.
x.groupby([x.index.month, x.index.day], as_index=True).mean()
where x is a pandas series in my case (but I suppose could also be a dataframe?). This will return a multi-index series which is ok in my case, but if it's not in your case then you can manipulate it to drop a level or turn the index into new columns