Creating a dataframe with a range of datetimes - python

I'm creating a dataframe that has a range of dates in datetime. This works but I know there must be a more elegant way to do this. Any thoughts?
date_range = pd.DataFrame(pd.date_range(date(2019,8,30), date.today(), freq='D'))
date_range.rename(columns = {0:'date'}, inplace=True)
date_range = pd.DataFrame(set(date_range['date'].dt.date))
date_range.rename(columns = {0:'date'}, inplace=True)

To avoid the rename steps, you can name the column directly when you build the DataFrame:
from datetime import date
import pandas as pd
date_range = pd.DataFrame({'date': pd.date_range(date(2019,8,30), date.today(), freq='D')})
date_range = pd.DataFrame({'date':set(date_range['date'].dt.date)})
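A possible shorter variant, as a sketch rather than the answerer's method: the `.date` attribute of a DatetimeIndex already gives plain `datetime.date` objects in order, so the second DataFrame and the `set` (which does not guarantee order) may not be needed. The fixed `end` value is an assumption made here so the result is reproducible.

```python
from datetime import date
import pandas as pd

start = date(2019, 8, 30)
end = date(2019, 9, 5)  # assumed fixed end date for the example; use date.today() in practice

# DatetimeIndex.date returns an array of datetime.date objects, in chronological order
date_range = pd.DataFrame({'date': pd.date_range(start, end, freq='D').date})
print(date_range)
```

The dates in a `pd.date_range` are already unique, so no de-duplication step is required.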

Related

Pandas mistake while sorting values

I'm trying to sort my dataframe based on the 'date' and 'hour' columns. It's sorting 01/11/2020 before dates like 24/10/2020.
df = pd.read_csv("some_folder")
df = df.sort_values(by = ['date','hour']).reset_index(drop=True)
In the picture you can see the sorting error.
The dates are being compared as text, so '01/11/2020' sorts before '24/10/2020'. Convert the date column to datetime with pd.to_datetime before sorting:
df = pd.read_csv("some_folder")
df['date'] = pd.to_datetime(df['date'], dayfirst=True) # <-- convert the column to `datetime`
df = df.sort_values(by = ['date','hour']).reset_index(drop=True)
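To see the difference on a small scale, here is a sketch with made-up rows (the three dates and hours are assumptions, not the asker's data). It sorts once while the column still holds text and once after conversion.

```python
import pandas as pd

# Hypothetical sample in day/month/year format
df = pd.DataFrame({'date': ['01/11/2020', '24/10/2020', '25/10/2020'],
                   'hour': [0, 1, 0]})

# Sorting the raw strings compares them character by character
as_text = df.sort_values(by=['date', 'hour'])['date'].tolist()

# After conversion the values are compared as real dates
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
as_dates = df.sort_values(by=['date', 'hour']).reset_index(drop=True)

print(as_text)
print(as_dates)
```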

Convert '9999-12-31 00:00:00' to 'dd/mm/yyyy' in Pandas

I have a dataframe containing the column 'Date' with value as '9999-12-31 00:00:00'. I need to convert it to 'dd/mm/yyyy'.
import pandas as pd
data = (['9999-12-31 00:00:00'])
df = pd.DataFrame(data, columns=['Date'])
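The year 9999 is outside the range that pandas' default nanosecond timestamps can represent, so use daily periods instead. Remove the time part with split, convert each value to a Period with a custom function, and change the format with strftime: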
df['Date'] = (df['Date'].str.split()
                        .str[0]
                        .apply(lambda x: pd.Period(x, freq='D'))
                        .dt.strftime('%d/%m/%Y'))
print(df)
         Date
0  31/12/9999
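If the result only needs to be a formatted string, another option is to skip pandas date types entirely and use the standard library, whose `datetime` supports years up to 9999. This is a swapped-in technique, not the Period approach above, and the helper name `to_ddmmyyyy` is made up for the sketch.

```python
from datetime import datetime
import pandas as pd

df = pd.DataFrame(['9999-12-31 00:00:00'], columns=['Date'])

def to_ddmmyyyy(text):
    # Hypothetical helper: parse the fixed input format, then re-format as dd/mm/yyyy
    return datetime.strptime(text, '%Y-%m-%d %H:%M:%S').strftime('%d/%m/%Y')

df['Date'] = df['Date'].map(to_ddmmyyyy)
print(df)
```

This assumes every value follows the same `YYYY-MM-DD HH:MM:SS` layout; `strptime` raises a `ValueError` otherwise.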

Set new column from datetime on dataframe pandas

I am trying to add new columns (day of year and hour).
My datetime consists of a date and an hour. I tried to split it up by using
data['dayofyear'] = data['Date'].dt.dayofyear
and
df['Various', 'Day'] = df.index.dayofyear
df['Various', 'Hour'] = df.index.hour
but it always returns an error. I'm not sure how I can split this up and put the parts into new columns.
I think the problem is that there is no DatetimeIndex, so use to_datetime first and then assign to the new column names:
df.index = pd.to_datetime(df.index)
df['Day'] = df.index.dayofyear
df['Hour'] = df.index.hour
Or use DataFrame.assign:
df.index = pd.to_datetime(df.index)
df = df.assign(Day = df.index.dayofyear, Hour = df.index.hour)
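A small end-to-end sketch of the same idea, using an invented three-row frame whose index holds datetime strings (the data and the `value` column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: the index holds datetimes stored as plain strings
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]},
                  index=['2019-01-01 00:00', '2019-01-01 13:00', '2019-02-01 05:00'])

# Convert the index so the datetime attributes become available
df.index = pd.to_datetime(df.index)
df['Day'] = df.index.dayofyear
df['Hour'] = df.index.hour
print(df)
```

If the datetimes live in a regular column rather than the index, the same attributes are reached through the `.dt` accessor after converting that column with `pd.to_datetime`.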

How do I convert an 8760x2 DataFrame to a 365x25 DataFrame with pandas?

I am trying to convert an 8760x2 pandas DataFrame which has the following data:
[Datetime]...[Value]
01-01-2019 00:00...1
01-01-2019 01:00...1
etc.
into a 365x25 DataFrame:
[Date]...[hour 0]...[hour 1]...until [hour 23]
01-01-2019...1...1...etc.
etc.
I already made this:
Date= pd.DataFrame(df.drop_duplicates(subset='Date'))
Date= pd.DataFrame(Date.Date)
Newdf = pd.DataFrame()
for i in arange(0, 24):
    if i == 0:
        Newdf.insert(0, "Date", Date['Date'])
    Newdf.insert(int(i), "hour "+str(i), NaN*len(Newdf))
I get NaNs instead of Numbers. The date should also be checked, because sometimes I have the problem of a leap-year and time change (summertime, wintertime)...
What is the best method to do this?
Thanks in advance!
IIUC, you can do:
df['Datetime'] = pd.to_datetime(df['Datetime'])
df['Date'] = df['Datetime'].dt.date
df['Hour'] = df['Datetime'].dt.hour
df.pivot(index='Date', columns='Hour', values='Value')
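On the time-change concern: `DataFrame.pivot` raises a `ValueError` when the same Date/Hour pair occurs twice, which can happen on the day clocks go back. One possible workaround, sketched here on invented data and assuming the dates are written day first, is `pivot_table`, which aggregates duplicates instead of failing.

```python
import pandas as pd

# Hypothetical sample; the real frame would have 8760 rows
df = pd.DataFrame({
    'Datetime': ['01-01-2019 00:00', '01-01-2019 01:00',
                 '02-01-2019 00:00', '02-01-2019 01:00'],
    'Value': [1, 2, 3, 4],
})

# Assumption: the strings are day-month-year
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%d-%m-%Y %H:%M')
df['Date'] = df['Datetime'].dt.date
df['Hour'] = df['Datetime'].dt.hour

# pivot_table averages repeated Date/Hour pairs; missing hours become NaN
wide = df.pivot_table(index='Date', columns='Hour', values='Value', aggfunc='mean')
print(wide)
```

Whether the mean is the right way to combine a repeated hour depends on the data; `'sum'` or `'first'` are other values `aggfunc` accepts.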

Python: Time Series with Pandas

I want to use time series with Pandas. I read multiple time series one by one, from a csv file which has the date in the column named "Date" as (YYYY-MM-DD):
Date,Business,Education,Holiday
2005-01-01,6665,8511,86397
2005-02-01,8910,12043,92453
2005-03-01,8834,12720,78846
2005-04-01,8127,11667,52644
2005-05-01,7762,11092,33789
2005-06-01,7652,10898,34245
2005-07-01,7403,12787,42020
2005-08-01,7968,13235,36190
2005-09-01,8345,12141,36038
2005-10-01,8553,12067,41089
2005-11-01,8880,11603,59415
2005-12-01,8331,9175,70736
df = pd.read_csv(csv_file, index_col = 'Date',header=0)
Series_list = df.keys()
The time series can have different frequencies (day, week, month, quarter, year), and I want to index the time series according to a frequency I decide on before I generate the ARIMA model. Could someone please explain how I can define the frequency of the series?
stepwise_fit = auto_arima(df[Series_name]....
pandas has a built-in function, pandas.infer_freq():
import pandas as pd
df = pd.DataFrame({'Date': ['2005-01-01', '2005-02-01', '2005-03-01', '2005-04-01'],
                   'Date1': ['2005-01-01', '2005-01-02', '2005-01-03', '2005-01-04'],
                   'Date2': ['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01'],
                   'Date3': ['2006-01-01', '2006-02-06', '2006-03-11', '2006-04-01']})
df['Date'] = pd.to_datetime(df['Date'])
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df['Date3'] = pd.to_datetime(df['Date3'])
pd.infer_freq(df.Date)
#'MS'
pd.infer_freq(df.Date1)
#'D'
pd.infer_freq(df.Date2)
#'AS-JAN' ('YS-JAN' in pandas 2.2 and later)
Alternatively, you can use the .dt accessor of a datetime column:
df.Date.dt.freq
#'MS'
Of course, if your data doesn't have a regular frequency, infer_freq returns None:
pd.infer_freq(df.Date3)
#None
The frequency descriptions are documented under offset-aliases.
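To attach a frequency to the series itself, one approach is to parse the dates while reading the CSV and then call `asfreq` with the inferred (or chosen) alias. The sketch below uses the first four rows of the question's data through an in-memory string; `csv_text` is a stand-in for the real file.

```python
import io
import pandas as pd

# Stand-in for the CSV file, using the first rows from the question
csv_text = """Date,Business,Education,Holiday
2005-01-01,6665,8511,86397
2005-02-01,8910,12043,92453
2005-03-01,8834,12720,78846
2005-04-01,8127,11667,52644
"""

# parse_dates=True turns the index into a DatetimeIndex
df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

freq = pd.infer_freq(df.index)   # month-start data gives 'MS'
df = df.asfreq(freq)             # the index now carries that frequency
print(df.index.freqstr)
```

`asfreq` also accepts an alias you choose yourself, such as `'D'`; dates missing from the data then appear as rows of NaN.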
