Python Pandas: Resample date range - python

I create a list of pandas datetimes with the following line:
range = pd.date_range(start = '5/3/2005', periods =5+1, freq='1D')
Is there a quick way to resample that list so that it spans the same range but with a frequency of 30 min? (so far I can only see applications of that logic to Series or DataFrames index, but no daterange)

One way is:
date_range = pd.date_range(start = '5/3/2005', periods =5+1, freq='1D')
new_date_range = date_range.to_series().asfreq('30 min').index
Also, range is a builtin function, so I would not call the variable "range".
I hope this helps.

You could also do
date_range = pd.date_range(start = '5/3/2005', periods =5+1, freq='1D')
new_date_range = pd.date_range(date_range.min(), date_range.max(), freq='30 min')
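As a quick sanity check, the two approaches above can be compared side by side. This is only a sketch: the variable names `daily`, `via_asfreq`, and `via_range` are made up for illustration, and the frequency is written as `'30min'` without the space.

```python
import pandas as pd

# Six daily timestamps starting on 3 May 2005 (same as the question).
daily = pd.date_range(start="5/3/2005", periods=6, freq="D")

# Approach 1: go through a Series and let asfreq rebuild the index.
via_asfreq = daily.to_series().asfreq("30min").index

# Approach 2: build a new range between the first and last timestamps.
via_range = pd.date_range(daily.min(), daily.max(), freq="30min")

print(len(via_range))                # 5 days * 48 half-hours + 1 endpoint
print(via_asfreq.equals(via_range))  # both approaches give the same index
```

Both results start and end on the same timestamps as the daily range, so the span is preserved and only the spacing changes.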

Related

Python: Time Series with Pandas

I want to use time series with Pandas. I read multiple time series one by one, from a csv file which has the date in the column named "Date" as (YYYY-MM-DD):
Date,Business,Education,Holiday
2005-01-01,6665,8511,86397
2005-02-01,8910,12043,92453
2005-03-01,8834,12720,78846
2005-04-01,8127,11667,52644
2005-05-01,7762,11092,33789
2005-06-01,7652,10898,34245
2005-07-01,7403,12787,42020
2005-08-01,7968,13235,36190
2005-09-01,8345,12141,36038
2005-10-01,8553,12067,41089
2005-11-01,8880,11603,59415
2005-12-01,8331,9175,70736
df = pd.read_csv(csv_file, index_col = 'Date',header=0)
Series_list = df.keys()
The time series can have different frequencies (day, week, month, quarter, year), and I want to index the time series according to a frequency I decide before I generate the ARIMA model. Could someone please explain how I can define the frequency of the series?
stepwise_fit = auto_arima(df[Series_name]....
pandas has a built-in function, pandas.infer_freq():
import pandas as pd
df = pd.DataFrame({'Date': ['2005-01-01', '2005-02-01', '2005-03-01', '2005-04-01'],
'Date1': ['2005-01-01', '2005-01-02', '2005-01-03', '2005-01-04'],
'Date2': ['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01'],
'Date3': ['2006-01-01', '2006-02-06', '2006-03-11', '2006-04-01']})
df['Date'] = pd.to_datetime(df['Date'])
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df['Date3'] = pd.to_datetime(df['Date3'])
pd.infer_freq(df.Date)
#'MS'
pd.infer_freq(df.Date1)
#'D'
pd.infer_freq(df.Date2)
#'AS-JAN' ('YS-JAN' in pandas 2.2 and later)
Alternatively you could also make use of the datetime functionality of the columns.
df.Date.dt.freq
#'MS'
Of course if your data doesn't actually have a real frequency, then you won't get anything.
pd.infer_freq(df.Date3)
#None
The frequency descriptions are documented under offset-aliases.
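Inferring a frequency only reports it. To actually attach a frequency to the index, one option is DataFrame.asfreq. The following is a minimal sketch, assuming the CSV is parsed with a DatetimeIndex (`parse_dates=True`, which the question's `read_csv` call does not use) and that the data really is regularly spaced. The inline `csv_text` is a shortened copy of the question's data, used here only to keep the example self-contained.

```python
import io
import pandas as pd

# Shortened copy of the question's CSV, inlined for illustration.
csv_text = """Date,Business,Education,Holiday
2005-01-01,6665,8511,86397
2005-02-01,8910,12043,92453
2005-03-01,8834,12720,78846
2005-04-01,8127,11667,52644
"""

# parse_dates=True turns the "Date" index into a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Either infer the frequency from the data or choose one explicitly.
inferred = pd.infer_freq(df.index)

# asfreq conforms the index to that frequency and records it on the index.
df = df.asfreq(inferred)
print(df.index.freqstr)
```

If the chosen frequency does not match the data, asfreq inserts rows of NaN for the timestamps that are missing, so it is worth checking the result before fitting a model.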

How to delete a date from pandas date_range

So I have a pandas date_range like so
dates = pd.date_range(start='2005-1-1', end='2014-12-31', freq='D')
I want to remove all the extra days resulting from leap years.
I do a for loop
for each in dates:
    if each.month == 2 and each.day == 29:
        print(each)  # I actually want to delete this item from dates
But my problem is that I don't know how to delete the item. The regular Python list methods and functions don't work.
I've looked everywhere on SO. I've looked at the documentation for pandas.date_range but found nothing.
Any help will be appreciated.
You probably want to use drop to remove the rows.
import pandas as pd
dates = pd.date_range(start='2005-1-1', end='2014-12-31', freq='D')
leap = []
for each in dates:
    if each.month == 2 and each.day == 29:
        leap.append(each)
dates = dates.drop(leap)
You could also store the days and months separately and combine them into a boolean mask.
dates = pd.date_range(start='2005-1-1', end='2014-12-31', freq='D')  # All dates in the range
days = dates.day  # Day of month for every date
months = dates.month  # Month for every date
is_leap_day = (days == 29) & (months == 2)  # True only for February 29
UnwantedDates = dates[is_leap_day]  # The dates to eliminate
dates = dates[~is_leap_day]  # Filter dates using the negated mask
The mask has to be negated as a whole: (days != 29) & (months != 2) would also drop every day in February and the 29th of every other month.
Just to check that the approach works, printing UnwantedDates shows the dates that were eliminated.
Output:
DatetimeIndex(['2008-02-29', '2012-02-29'], dtype='datetime64[ns]', freq=None)
If the dates are stored as strings in a DataFrame column (rather than in a DatetimeIndex), you can try:
df = df[~df['Date'].str.contains('02-29')]
In place of Date, put the name of the column where the dates are stored.
This avoids the for loop, so it is faster to run.
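For reference, the mask-based idea can be wrapped in a small helper and checked by counting. This is a sketch only: `drop_leap_days` is a hypothetical name, not a pandas function.

```python
import pandas as pd

def drop_leap_days(index):
    """Return a copy of a DatetimeIndex without any February 29 entries."""
    # Build one mask for "is February 29", then negate the whole mask.
    is_leap_day = (index.month == 2) & (index.day == 29)
    return index[~is_leap_day]

dates = pd.date_range(start="2005-01-01", end="2014-12-31", freq="D")
no_leap = drop_leap_days(dates)

# 2008 and 2012 are the only leap years in this range,
# so exactly two dates should disappear.
print(len(dates), len(no_leap))
```

Negating the combined condition matters here: filtering with two separate "not equal" tests would remove far more than the two leap days.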

Randomly select n dates from pandas date_range

Given a date, I'm using pandas date_range to generate additional 30 dates:
import pandas as pd
from datetime import timedelta
pd.date_range(startdate, startdate + timedelta(days=30))
Out of these 30 dates, how can I randomly select 10 dates in order starting from date in first position and ending with date in last position?
Use np.random.choice to choose a specified number of items from a given set of choices.
In order to guarantee that the first and last dates are preserved, I pull them out explicitly and choose 8 more dates at random.
I then pass them back to pd.to_datetime and sort_values to ensure they stay in order.
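import numpy as np
import pandas as pd
dates = pd.date_range('2011-04-01', periods=30, freq='D')
random_dates = pd.to_datetime(
    np.concatenate([
        np.random.choice(dates[1:-1], size=8, replace=False),  # 8 random inner dates
        dates[[0, -1]]  # always keep the first and last dates
    ])
).sort_values()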
random_dates
DatetimeIndex(['2011-04-01', '2011-04-02', '2011-04-03', '2011-04-13',
'2011-04-14', '2011-04-21', '2011-04-22', '2011-04-26',
'2011-04-27', '2011-04-30'],
dtype='datetime64[ns]', freq=None)
You can use numpy.random.choice with replace=False if it is not necessary to explicitly keep the first and last values (if it is, use the other answer):
a = pd.date_range('2011-04-01', periods=30, freq='D')
print (pd.to_datetime(np.sort(np.random.choice(a, size=10, replace=False))))
DatetimeIndex(['2011-04-01', '2011-04-03', '2011-04-05', '2011-04-09',
'2011-04-12', '2011-04-17', '2011-04-22', '2011-04-24',
'2011-04-29', '2011-04-30'],
dtype='datetime64[ns]', freq=None)
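A variation on the same idea is to sample integer positions instead of the dates themselves, which keeps the result a DatetimeIndex without converting back. This is a sketch under a few assumptions: it uses NumPy's Generator API (`np.random.default_rng`) with an arbitrary seed for reproducibility, and the names `inner` and `positions` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2011-04-01", periods=30, freq="D")

# Seeded generator so the selection is reproducible (seed chosen arbitrarily).
rng = np.random.default_rng(seed=0)

# Pick 8 distinct positions strictly between the first and last element.
inner = rng.choice(np.arange(1, len(dates) - 1), size=8, replace=False)

# Always include position 0 and the last position; sorting keeps date order.
positions = np.concatenate(([0], np.sort(inner), [len(dates) - 1]))
random_dates = dates[positions]
print(random_dates)
```

Because the positions are sorted and unique, the selected dates are already in chronological order and no extra sort_values call is needed.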

Filling in missing date/times in my pd.date_range

I have a column of data which looks like the following:
I am trying to set a range of the entire month:
rng = pd.date_range('2016-09-01 00:00:00', '2016-09-30 23:59:58', freq='S')
But my column of data (above) is missing a few hours, and I am unsure where (since my data is 2 million rows long).
I tried to use the reindex command, but it seems to have filled everything with zeroes instead.
The code that I was using is as follows:
df = pd.DataFrame(df_csv)
rng = pd.date_range('2016-09-01 00:00:00', '2016-09-30 23:59:58', freq='S')
df = df.reindex(rng,fill_value=0.0)
How do I properly fill in the missing date/times without filling everything with a 0?
I think you first need set_index on the date column; then it is possible to use reindex:
#cast column date if dtype is not datetime
df.date = pd.to_datetime(df.date)
df = df.set_index('date').reindex(rng,fill_value=0.0)
Without set_index you get all NaN values, because the default integer index is being reindexed by datetime values and none of them match (with fill_value=0.0, every NaN is then replaced by 0.0).
Also, if the date column is sorted, you can use a more general solution that takes the range from its first and last values:
start_date = df.date.iat[0]
end_date = df.date.iat[-1]
rng = pd.date_range(start_date, end_date, freq='S')
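To see where the gaps actually are, the full range can be compared against the existing index. The sketch below uses a tiny made-up frame (the column names `date` and `value` and the three sample rows are assumptions, since the question's data is not shown) and passes the step as a `pd.Timedelta` rather than a frequency alias.

```python
import pandas as pd

# Tiny illustrative frame with a two-second gap; not the question's real data.
df = pd.DataFrame({
    "date": ["2016-09-01 00:00:00", "2016-09-01 00:00:01", "2016-09-01 00:00:04"],
    "value": [1.0, 2.0, 3.0],
})
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# Every second between the first and last observation.
full = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(seconds=1))

# Timestamps present in the full range but absent from the data.
missing = full.difference(df.index)
print(missing)

# Reindexing without fill_value leaves NaN only at the missing timestamps.
filled = df.reindex(full)
print(filled)
```

Leaving out fill_value keeps the gaps visible as NaN, which can then be handled with methods such as ffill or interpolate if a filled series is needed.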

Pandas date_range starting from the end date to start date

I am trying to generate a range of semi-annual dates using Python. Pandas provides a function, pd.date_range, to help with this; however, I would like my date range to start from the end date and iterate backwards.
For instance given the input:
start = datetime.datetime(2016 ,2, 8)
end = datetime.datetime(2018 , 6, 1)
pd.date_range(start, end, freq='6m')
The result is:
DatetimeIndex(['2016-02-29', '2016-08-31', '2017-02-28', '2017-08-31',
'2018-02-28'])
How can I generate the following:
DatetimeIndex(['2016-02-08', '2016-06-01', '2016-12-01', '2017-06-01',
'2017-12-01', '2018-06-01'])
With the updated output (from the edit you made) you can do something like the following:
import datetime
import pandas as pd
from pandas.tseries.offsets import DateOffset
end = datetime.datetime(2018 , 6, 1)
start = datetime.datetime(2016 ,2, 8)
#Get the range of months to cover
months = (end.year - start.year)*12 + end.month - start.month
#The frequency of periods
period = 6 # in months
pd.DatetimeIndex([end - DateOffset(months=e) for e in range(0, months, period)][::-1]).insert(0, start)
This is a fairly concise solution, though I didn't compare runtimes so I'm not sure how fast it is.
Basically this is just creating the dates you need as a list, and then converting it to a datetime index.
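The same idea can be written as a reusable function that steps back from the end date until it passes the start date. This is only a sketch: `dates_back_from_end` and its `step_months` parameter are hypothetical names, and each step is computed from the end date directly so that month-end clipping does not accumulate.

```python
import datetime
import pandas as pd

def dates_back_from_end(start, end, step_months=6):
    """Step back from `end` in `step_months` increments, then prepend `start`."""
    start_ts, end_ts = pd.Timestamp(start), pd.Timestamp(end)
    out = []
    k = 0
    current = end_ts
    # Collect end, end - 6 months, end - 12 months, ... while still after start.
    while current > start_ts:
        out.append(current)
        k += 1
        current = end_ts - pd.DateOffset(months=step_months * k)
    # The start date itself becomes the first element of the result.
    out.append(start_ts)
    return pd.DatetimeIndex(out[::-1])

result = dates_back_from_end(datetime.datetime(2016, 2, 8),
                             datetime.datetime(2018, 6, 1))
print(result)
```

For the dates in the question this yields the start date followed by every first of June and December up to the end date.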
This can be done without pandas, using dateutil instead. However, it is more involved than it perhaps should be:
from datetime import date
import math
from dateutil.relativedelta import relativedelta
#set up key dates
start = date(2016 ,2, 8)
end = date(2018 , 6, 1)
#calculate date range and number of 6 month periods
daterange = end-start
periods = daterange.days *2//365
#calculate next date in sequence and check for year roll-over
next_date = date(start.year,math.ceil(start.month/6)*6,1)
if next_date < start: next_date = date(next_date.year+1,next_date.month,1)
#add the first two values to a list
arr = [start.isoformat(),next_date.isoformat()]
#calculate all subsequent dates using 'relativedelta'
for i in range(periods):
    next_date = next_date + relativedelta(months=+6)
    arr.append(next_date.isoformat())
#display results
print(arr)
