Pandas: changing dataframe date index format - python

I would like to change the Date index of the dataframe from the default style into '%m/%d/%Y' format.
In: df
Out:
Date Close
2006-01-24 48.812471
2006-01-25 47.448712
2006-01-26 53.341202
2006-01-27 58.728122
2006-01-30 59.481986
2006-01-31 55.691974
df.index
Out:
DatetimeIndex(['2006-01-04', '2006-01-05', '2006-01-06', '2006-01-09',
'2006-01-10', '2006-01-11', '2006-01-12', '2006-01-13',
'2006-01-17', '2006-01-18',
...
'2018-02-21', '2018-02-22', '2018-02-23', '2018-02-26',
'2018-02-27', '2018-02-28', '2018-03-01', '2018-03-02',
'2018-03-05', '2018-03-06'],
dtype='datetime64[ns]', name=u'Date', length=3063, freq=None)
Into:
In: df1
Out:
Date Close
01/24/2006 48.812471
01/25/2006 47.448712
01/26/2006 53.341202
01/27/2006 58.728122
01/30/2006 59.481986
01/31/2006 55.691974
I tried this method before...
df1.index = pd.to_datetime(df.index, format = '%m/%d/%Y')
df1.index = df.dt.strftime('%Y-%m-%d')
AttributeError: 'DataFrame' object has no attribute 'dt'

Use DatetimeIndex.strftime. The .dt accessor exists on a Series, not on a DataFrame or an index, so call strftime on the index directly:
df1.index = pd.to_datetime(df1.index, format = '%m/%d/%Y').strftime('%Y-%m-%d')
This is the same as:
df1.index = pd.to_datetime(df1.index, format = '%m/%d/%Y')
df1.index = df1.index.strftime('%Y-%m-%d')
EDIT: if you need to convert the DatetimeIndex to another string format:
print (df1.index)
DatetimeIndex(['2006-01-24', '2006-01-25', '2006-01-26', '2006-01-27',
'2006-01-30', '2006-01-31'],
dtype='datetime64[ns]', name='Date', freq=None)
df1.index = df1.index.strftime('%m/%d/%Y')
print (df1)
Close
01/24/2006 48.812471
01/25/2006 47.448712
01/26/2006 53.341202
01/27/2006 58.728122
01/30/2006 59.481986
01/31/2006 55.691974
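Here is a minimal self-contained sketch of the same idea. It assumes a small hypothetical frame built in code rather than the asker's data. Note that strftime returns a plain Index of strings, so date-aware attributes such as .month are no longer available on the result.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's df
idx = pd.DatetimeIndex(['2006-01-24', '2006-01-25', '2006-01-26'], name='Date')
df = pd.DataFrame({'Close': [48.812471, 47.448712, 53.341202]}, index=idx)

# Work on a copy so that df keeps its DatetimeIndex
df1 = df.copy()
# strftime formats every timestamp in the index as a string
df1.index = df1.index.strftime('%m/%d/%Y')
print(df1)
```

Keeping the original df unchanged is deliberate: once the index is converted to strings, sorting, resampling and date slicing no longer behave as dates.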

Related

How to split a date index into separate day, month, year columns in pandas

I have a dataset df1:
df1
I previously transposed the columns and the index:
df1 = df.T
The dataset df previously looked like this:
df
I have already used the .to_datetime function to convert my dates:
df1.index = pd.to_datetime(df1.index).strftime('%Y-%m')
How could I split my date index and add them to new 'year' and 'month' columns on the right of the table?
I tried:
df1['month'] = df.index.month
df1['year'] = df.index.year
However, it is returning me the following error:
AttributeError: 'Index' object has no attribute 'day'
This is actually a follow-up to another question raised before here.
I wasn't able to add a comment over there because I am a new account holder on Stack Overflow.
Thank you, everyone. I am a new learner, so please bear with me.
Try this:
df.index = pd.to_datetime(df.index)
df['day'] = df.index.day
df['month'] = df.index.month
df['year'] = df.index.year
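The error in the question most likely comes from strftime('%Y-%m'). That call turns the DatetimeIndex into a plain Index of strings, which has no .day, .month or .year. Below is a minimal sketch of the round trip. It assumes a small hypothetical df1 whose index already holds 'YYYY-MM' strings.

```python
import pandas as pd

# Hypothetical frame whose index was already turned into 'YYYY-MM' strings
df1 = pd.DataFrame({'value': [10, 20, 30]},
                   index=pd.Index(['2019-11', '2019-12', '2020-01']))

# False: a string Index has no .month attribute
print(hasattr(df1.index, 'month'))

# Parse the strings back into a DatetimeIndex (the day defaults to the 1st)
df1.index = pd.to_datetime(df1.index, format='%Y-%m')
df1['month'] = df1.index.month
df1['year'] = df1.index.year
print(df1)
```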
If your dates are in the index, your code should have worked. However, if the dates are in a date column, try:
df['day'] = df.date.dt.day
df['month'] = df.date.dt.month
df['year'] = df.date.dt.year
import pandas as pd
df['Dates'] = pd.to_datetime(df['Dates'])
df['Year'] = df.Dates.dt.year
df['Month'] = df.Dates.dt.month_name()
df['Day'] = df.Dates.dt.day
Try this:
df['time'] = pd.to_datetime(df['time'])
df['Which Day'] = df['time'].dt.day_name()
df['Year'] = df['time'].dt.year
df['Month'] = df['time'].dt.month_name()
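Here is a self-contained sketch of the column-based approach above, using a hypothetical two-row frame. The names from day_name() and month_name() are in English by default. Both methods also accept an optional locale argument.

```python
import pandas as pd

# Hypothetical column of date strings
df = pd.DataFrame({'time': ['2020-01-15', '2021-07-04']})

df['time'] = pd.to_datetime(df['time'])
df['Which Day'] = df['time'].dt.day_name()   # weekday name, e.g. 'Wednesday'
df['Day'] = df['time'].dt.day                # day of the month as an integer
df['Month'] = df['time'].dt.month_name()     # full month name
df['Year'] = df['time'].dt.year
print(df)
```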

How to split a CSV into 2 dataframes based on a condition

My idea is to separate the two kinds of date strings, then convert both dataframes into the same datetime format. I tried this code:
data['date'] = pd.to_datetime(data['date'])
data['date'] = data['date'].dt.strftime('%Y-%m-%d')
but there are some errors in the output. 13/02/2020 becomes 2020-02-13, which is what I want, but 12/02/2020 becomes 2020-12-02.
My dataframe has 2 date formats: YYYY-MM-DD and DD/MM/YYYY.
dataframe
The data type is object.
I need to split it into 2 dataframes: all the rows with dates in YYYY-MM-DD format go into df1,
and all the rows with dates in DD/MM/YYYY format go into df2.
Does anyone know how to code it?
If you don't need to convert to datetimes, use Series.str.contains with boolean indexing:
mask = df['date'].str.contains('-')
df1 = df[mask].copy()
df2 = df[~mask].copy()
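The sketch below runs the split above on a small hypothetical frame. It then parses each part with its own explicit format, so 12/02/2020 is read as 12 February rather than 2 December. The sample values and the id column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mixed-format column, stored as strings (object dtype)
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'date': ['2020-02-12', '13/02/2020', '2020-02-14', '12/02/2020']})

# True where the string uses '-' separators (YYYY-MM-DD)
mask = df['date'].str.contains('-')
df1 = df[mask].copy()   # YYYY-MM-DD rows
df2 = df[~mask].copy()  # DD/MM/YYYY rows

# An explicit format per part avoids day/month ambiguity
df1['date'] = pd.to_datetime(df1['date'], format='%Y-%m-%d')
df2['date'] = pd.to_datetime(df2['date'], format='%d/%m/%Y')
print(df1)
print(df2)
```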
If you need datetimes, use the parameter errors='coerce' in to_datetime so that values not matching the format become missing (NaT). Then remove the missing values:
df1 = (df.assign(date=pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce'))
         .dropna(subset=['date']))
df2 = (df.assign(date=pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce'))
         .dropna(subset=['date']))
EDIT: If you need the output column filled with the correct datetimes, you can replace the missing values with values from another Series using Series.fillna:
date1 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
df['date'] = date1.fillna(date2)
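A runnable sketch of the fillna approach on a hypothetical three-row column. Each parse succeeds only for rows in its own format and leaves NaT elsewhere. The second parse then fills the gaps left by the first.

```python
import pandas as pd

# Hypothetical mixed-format column
df = pd.DataFrame({'date': ['2020-02-12', '13/02/2020', '12/02/2020']})

# Rows that do not match the given format become NaT
date1 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')

# Fill the NaT gaps of the first parse with values from the second
df['date'] = date1.fillna(date2)
print(df)
```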
You can use the fact that the separator is different to tell the two formats apart.
If your dataframe is in this format:
df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'date': ["30/8/2020", "30/8/2021", "30/8/2022", "2019-10-24", "2019-10-25", "2020-10-24"]})
with either "-" or "/" separating the date parts, you can write a function that checks where "/" appears and apply it to the date column:
def find(string):
    # True for DD/MM/YYYY strings, where the first '/' is at position 2
    if string.find('/') == 2:
        return True
    else:
        return False
df[df['date'].apply(find)]

Convert '9999-12-31 00:00:00' to 'dd/mm/yyyy' in Pandas

I have a dataframe containing the column 'Date' with the value '9999-12-31 00:00:00'. I need to convert it to 'dd/mm/yyyy'.
import pandas as pd
data = (['9999-12-31 00:00:00'])
df = pd.DataFrame(data, columns=['Date'])
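The year 9999 is beyond the range of nanosecond-resolution Timestamps (pd.Timestamp.max falls in 2262), so use daily periods instead. Remove the time part with split, convert each value to a pd.Period with a custom function, and change the format with strftime: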
df['Date'] = (df['Date'].str.split()
.str[0]
.apply(lambda x: pd.Period(x, freq='D'))
.dt.strftime('%d/%m/%Y'))
print (df)
Date
0 31/12/9999
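If you only need the reformatted string, one alternative (not the answer's method) skips pandas timestamps entirely. It uses the standard library's datetime, which supports years up to datetime.MAXYEAR (9999). This is a sketch. The result is an object column of strings, not dates.

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'Date': ['9999-12-31 00:00:00']})

# Python's datetime covers years 1..9999, so year 9999 parses here even though
# it lies beyond pd.Timestamp.max at nanosecond resolution
df['Date'] = df['Date'].apply(
    lambda s: datetime.strptime(s, '%Y-%m-%d %H:%M:%S').strftime('%d/%m/%Y'))
print(df)
```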

I have a DataFrame with a date column ranging from 2005-01-01 to 2014-12-31. How do I sort the column?

input:
data["Date"] = ["2005-01-01", "2005-01-02", "2005-01-03", ..., "2014-12-30", "2014-12-31"]
How can I sort the column so that it lists the 1st date of every year, then the 2nd date of every year, and so on?
i.e.
output:
data["Date"] = ["2005-01-01","2006-01-01","2007-01-01", ... "2013-12-31","2014-12-31"]
NOTE: assuming the date column has no leap days
First:
import datetime
data['D'] = data['Date'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
data['Day'] = data['D'].apply(lambda x: x.day)
data['Month'] = data['D'].apply(lambda x: x.month)
data['Year'] = data['D'].apply(lambda x: x.year)
data.drop(columns='D', inplace=True)
Then, with the dataframe now having 4 columns, sort by month, then day, then year, so that the same calendar day of every year is grouped together:
data.sort_values(by=['Month','Day','Year'], inplace=True)
Finally, you can drop the new columns if you don't need them:
data.drop(columns = ['Day','Month','Year'], inplace=True)
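Below is a vectorized sketch of the same idea. It uses the .dt accessor instead of calling strptime row by row, and sorts on month, day, then year. The sample dates are hypothetical.

```python
import pandas as pd

# Hypothetical sample spanning two years
data = pd.DataFrame({'Date': ['2005-01-01', '2005-01-02', '2006-01-01',
                              '2006-01-02', '2005-12-31', '2006-12-31']})

d = pd.to_datetime(data['Date'])
# Sort by calendar day first (month, then day), then by year
data = (data.assign(Month=d.dt.month, Day=d.dt.day, Year=d.dt.year)
            .sort_values(['Month', 'Day', 'Year'])
            .drop(columns=['Month', 'Day', 'Year']))
print(data)
```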
Try using lambda expressions.
from datetime import datetime
data = {"Date": ["2005-01-02", "2005-01-01", "2014-12-30", "2014-12-31"]}
# the key is MMDDYYYY, so the list is ordered by month, then day, then year
data["Date"].sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d").strftime("%m%d%Y"))
>>> import datetime
>>> dates = [datetime.datetime.strptime(ts, "%Y-%m-%d") for ts in data["Date"]]
>>> dates.sort()
>>> sorteddates = [datetime.datetime.strftime(ts, "%Y-%m-%d") for ts in dates]
>>> sorteddates
['2010-01-12', '2010-01-14', '2010-02-07', '2010-02-11', '2010-11-16', '2010-11-22', '2010-11-23', '2010-11-26', '2010-12-02', '2010-12-13', '2011-02-04', '2011-06-02', '2011-08-05', '2011-11-30']
Why don't you create a new column in which you change the format of the date?
Like this:
def change_format(row):
    # 'YYYY-MM-DD' -> 'MM-DD-YYYY', so string sorting orders by month, day, then year
    date_parts = row.split('-')
    new_date = date_parts[1] + "-" + date_parts[2] + "-" + date_parts[0]
    return new_date
data["Date_new_format"] = data["Date"].apply(change_format)
Now you can sort your dataframe according to the column Date_new_format and you will get what you need.
Use:
data["temp"] = pd.to_datetime(data["Date"]).dt.strftime("%m-%d-%Y")
data = data.sort_values(by="temp").drop(columns=["temp"])

Change Pandas index from integer to datetime format

I have a huge DataFrame whose index holds dates as integers, for example 20171001. I want to change this form (for example, 20171001) to the datetime format '2017-10-01'.
For simplicity, I generate such a dataframe.
>>> df = pd.DataFrame(np.random.randn(3,2), columns=list('ab'), index=[20171001,20171002,20171003])
>>> df
a b
20171001 2.205108 0.926963
20171002 1.104884 -0.445450
20171003 0.621504 -0.584352
>>> df.index
Int64Index([20171001, 20171002, 20171003], dtype='int64')
If we apply to_datetime to df.index, we get a weird result:
>>> pd.to_datetime(df.index)
DatetimeIndex(['1970-01-01 00:00:00.020171001',
'1970-01-01 00:00:00.020171002',
'1970-01-01 00:00:00.020171003'],
dtype='datetime64[ns]', freq=None)
What I want is DatetimeIndex(['2017-10-01', '2017-10-02', '2017-10-03'], ...)
How can I manage this problem? Note that the file is given.
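Use the format '%Y%m%d' in pd.to_datetime. Without a format, integers are interpreted as nanoseconds since the Unix epoch (1970-01-01), which is why the dates above land in 1970: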
pd.to_datetime(df.index, format='%Y%m%d')
DatetimeIndex(['2017-10-01', '2017-10-02', '2017-10-03'], dtype='datetime64[ns]', freq=None)
To assign it: df.index = pd.to_datetime(df.index, format='%Y%m%d')
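This is a defensive sketch of the same fix. It converts the integers to strings before parsing, so the result does not depend on how a given pandas version treats numeric input combined with format. The random data mirrors the question's example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), columns=list('ab'),
                  index=[20171001, 20171002, 20171003])

# Convert to str, then parse with an explicit YYYYMMDD format
df.index = pd.to_datetime(df.index.astype(str), format='%Y%m%d')
print(df.index)
```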
pd.to_datetime is the pandas way of doing it, but here are two alternatives:
import datetime
df.index = [datetime.datetime.strptime(str(i), "%Y%m%d") for i in df.index]
or
import datetime
df.index = df.index.map(lambda x: datetime.datetime.strptime(str(x),"%Y%m%d"))
