Reformat a datetime column in pandas - python

I have a Pandas dataframe df containing datetimes and their respective values. Now I want to make some format changes to each datetime in the dataframe, but noticed that a normal for loop doesn't actually change anything in the dataframe.
This is what I tried, and also shows what I'm trying to do:
#original format of the datetimes: sunnuntai 1.1.2017 00:00
for i in df["Datetime"]:
    #removes the string containing the weekday from the beginning
    i = re.sub("^[^ ]* ","", i)
    #converts 1.1.2017 00:00 into 2017-01-01 00:00
    i = datetime.datetime.strptime(i, "%d.%m.%Y %H:%M").strftime("%Y-%m-%d %H:%M")
How should I go about doing these format changes permanently? Thank you.

Ditch the loop and aim to vectorize. I break the steps down:
use str.split to get rid of the leading text,
pd.to_datetime with dayfirst=True for the datetime conversion, and
dt.strftime to convert the result to your format.
df['Datetime'] = pd.to_datetime(
df['Datetime'].str.split(n=1).str[1], dayfirst=True
).dt.strftime("%Y-%m-%d %H:%M")
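Reassigning i inside the loop only rebinds the loop variable, so nothing is ever written back to df; assigning a whole new Series to the column is what makes the change stick. Below is a runnable sketch of the same vectorized steps on a small made-up frame (the sample rows and the Value column are hypothetical). It passes an explicit format= instead of dayfirst=True, a swapped-in choice that removes any guessing about day/month order.

```python
import pandas as pd

# Hypothetical sample data in the question's original layout (Finnish weekday prefix)
df = pd.DataFrame({
    "Datetime": ["sunnuntai 1.1.2017 00:00", "maanantai 2.1.2017 13:45"],
    "Value": [10, 20],
})

# Drop the weekday word: split once on whitespace and keep the remainder
stripped = df["Datetime"].str.split(n=1).str[1]

# Parse with the known day.month.year layout
parsed = pd.to_datetime(stripped, format="%d.%m.%Y %H:%M")

# Assign the whole reformatted Series back to the column so the change persists
df["Datetime"] = parsed.dt.strftime("%Y-%m-%d %H:%M")
print(df)
```

Note that the result is a column of strings; if you want to keep doing date arithmetic, assign parsed instead and format only when displaying.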

Related

Weird Date formats part of column in YY-mm-dd and the rest of columns YY-dd-mm

This is a strange one, but I have an original Excel file with dates like 10/11/2018, and the above problem happens when I convert the column to datetime using:
df.Date = pd.to_datetime(df['Date'])
So the date column shows 2018-01-11, but once the day and month are equal, for example 2018-11-11, the format swaps relative to the previous row, and the following rows become:
'2018-11-12'
'2018-11-13'
I've tried writing a for loop that changes each entry of the series, but I get an error that the series can't be changed. Then I tried the loop below, but I get this error:
for date_ in jda.Date:
    jda.Date[date_] = jda.Date[date_].strftime('%Y-%m-%d')
KeyError: Timestamp('2019-05-17 00:00:00')
Below is a picture of where the format changes.
Thank you for your help
Solution if the dates are saved as strings:
I think the problem is incorrectly parsed datetimes: by default 10/11/2018 is parsed as 11 October 2018 (month first), so if you need it parsed as 10 November 2018, add the dayfirst=True parameter to to_datetime:
df.Date = pd.to_datetime(df['Date'], dayfirst=True)
Or you can specify the format explicitly, e.g. %d/%m/%Y for DD/MM/YYYY:
df.Date = pd.to_datetime(df['Date'], format='%d/%m/%Y')
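The loop in the question raises KeyError because iterating over jda.Date yields the values (Timestamps), and jda.Date[date_] then looks each Timestamp up as an index label, which does not exist. A minimal sketch on made-up sample strings (the values below are hypothetical, not taken from the original file) that parses with a day-first layout and then formats the whole column without a loop:

```python
import pandas as pd

# Hypothetical DD/MM/YYYY strings like those in the Excel file
df = pd.DataFrame({"Date": ["10/11/2018", "11/11/2018", "12/11/2018", "13/11/2018"]})

# Both options read every value as day/month/year
by_dayfirst = pd.to_datetime(df["Date"], dayfirst=True)
by_format = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Keep real datetimes in the column
df["Date"] = by_format

# Format the whole column at once instead of element by element (returns strings)
formatted = df["Date"].dt.strftime("%Y-%m-%d")
print(formatted.tolist())
```

The explicit format is the stricter of the two: a value that does not match %d/%m/%Y raises an error instead of being silently reinterpreted.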

How do I format date using pandas?

My DataFrame df shows the 'Date' column as 1970-01-01 00:00:00.019990103 after it is converted to a date using pandas. How do I show the date as 01/03/1999?
Consider LoneWanderer's comment next time and show some of the code you have tried.
I would try this:
from datetime import datetime
now = datetime.now()
print(now.strftime('%d/%m/%Y'))
You can print now to see that it is in the same format as yours, and after that it is formatted to the required format.
I see that the actual date is in the last 8 characters of your source string.
To convert such strings to a Timestamp (ignoring the starting part), run:
df.Date = df.Date.apply(lambda src: pd.to_datetime(src[-8:], format='%Y%m%d'))
It is worth considering keeping this date as a Timestamp, as that
simplifies date/time operations, and applying your formatting only when printing.
But if you want to have this date as a string in "your" format, in the
"original" column, perform the second conversion (Timestamp to string):
df.Date = df.Date.dt.strftime('%m/%d/%Y')
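The value 1970-01-01 00:00:00.019990103 is what pandas produces when an integer such as 19990103 is passed to to_datetime, because integers are read as nanoseconds since the epoch. Assuming (this is not confirmed in the question) that the raw column holds YYYYMMDD integers, a sketch that parses them directly instead of slicing the mis-parsed result:

```python
import pandas as pd

# Hypothetical raw column: dates stored as YYYYMMDD integers
df = pd.DataFrame({"Date": [19990103, 20001231]})

# Passing the integers straight in treats them as nanoseconds since 1970-01-01
naive = pd.to_datetime(df["Date"])
print(naive.iloc[0])

# Convert to strings first and state the layout explicitly
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

# Keep the Timestamps in the frame; format only for display
print(df["Date"].dt.strftime("%m/%d/%Y").tolist())
```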

How do I extract a DateTimeIndex for use in a new column?

I've extracted the dates from the filenames of a set of Excel files into a list of DatetimeIndex objects. I now need to write the date extracted from each file to a new date column in the dataframe created from that Excel sheet. My code works in that it writes the new 'Date' column to each dataframe, but I'm unable to convert the objects out of their generator object DatetimeIndex format and into a %Y-%m-%d format.
Link to code creating the list of DateTimeIndexes from the filenames:
How do I turn datefinder output into a list?
Code to write each list entry to a new 'Date' column in each dataframe created from the spreadsheets:
for i in range(0, len(df)):
    df[i]['Date'] = (event_dates_dto[i] for frames in df)
The involved objects:
type(event_dates_dto)
<class 'list'>
type(event_dates_dto[0])
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
event_dates_dto
[DatetimeIndex(['2019-03-29'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-04-13'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-05-11'], dtype='datetime64[ns]', freq=None)]
The dates were extracted using datefinder: http://www.blog.pythonlibrary.org/2016/02/04/python-the-datefinder-package/
I've tried using methods here that seemed like they could make sense but none of them are the right ticket: Keep only date part when using pandas.to_datetime
Again, the simple for loop is working correctly, but I'm unsure how to coerce the generator object into the correct format so that it not only writes to the new 'Date' column but is also in a useful '%Y-%m-%d' format that makes sense within the dataframe. Any help is greatly appreciated.
Force evaluation of the generator with a one-line loop like dates = [_ for _ in matches].
Convert the index to a column using .index (or .reset_index() if you don't need to keep it).
Convert the column to datetime using pd.to_datetime().
Use the .dt.date accessor of the datetime column to convert to Y-m-d.
Here's a sample
import datefinder
import pandas as pd
data = '''Your appointment is on July 14th, 2016 15:24. Your bill is due 05/05/2016 16:00'''
matches = datefinder.find_dates(data)
# force evaluation with 1 line loop
dates = [_ for _ in matches] # 'dates = list(matches)' also works
df = pd.DataFrame({'dt_index':dates,'value':['appointment','bill']}).set_index('dt_index')
df['date'] = df.index
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.date
df
which gives
                           value        date
dt_index
2016-07-14 15:24:00  appointment  2016-07-14
2016-05-05 16:00:00         bill  2016-05-05
Edit: Edited to account for forced evaluation
A minor fix got it working; I was just trying to do too much at once and overthinking it.
#create empty list and append each date
event_dates_transfer = []
#use .strftime('%Y-%m-%d') method on event_dates_dto here if you wish to return a string instead of a datetimeindex
for i in range(0,len(event_dates_dto)):
    event_dates_transfer.append(event_dates_dto[i][0])
#Create a 'Date' column for each dataframe correlating to the filename it was created from and set it as the index
for i in range(0, len(df)):
    new_date = event_dates_transfer[i]
    df[i]['Date'] = new_date
    df[i].set_index('Date', inplace=True)
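The same fix can be written with zip instead of index bookkeeping. This is a sketch with hypothetical stand-ins: event_dates_dto mimics the list of single-element DatetimeIndex objects shown in the question, and frames stands in for the per-sheet dataframes (called df in the question). The .normalize() call is an added choice that drops any time of day, so the column stays a real datetime at midnight rather than a string.

```python
import pandas as pd

# Hypothetical inputs mirroring the question: one single-element DatetimeIndex per file
event_dates_dto = [
    pd.DatetimeIndex(["2019-03-29"]),
    pd.DatetimeIndex(["2019-04-13"]),
]
# Hypothetical stand-ins for the dataframes read from each Excel sheet
frames = [pd.DataFrame({"value": [1, 2]}), pd.DataFrame({"value": [3]})]

# idx[0] pulls the single Timestamp out of each DatetimeIndex
event_dates = [idx[0] for idx in event_dates_dto]

for frame, date in zip(frames, event_dates):
    # A scalar assignment broadcasts the same date to every row;
    # normalize() sets the time to midnight, keeping a datetime64 column
    frame["Date"] = date.normalize()
    frame.set_index("Date", inplace=True)

print(frames[0])
```

If a '%Y-%m-%d' string is needed for display, frame.index.strftime('%Y-%m-%d') produces it without changing the stored dates.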

Convert date string YYYY-MM-DD to YYYYMM in pandas

Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I obtain this error : 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the date to datetime and then use strftime. Note that you do lose the datetime functionality of the date, since the result is a string.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
You might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
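If you would rather not lose the datetime functionality mentioned above, one option not used in these answers is a monthly Period. A sketch on a hypothetical column with one missing value, parsing with an explicit format and producing both a YYYYMM string and a Period:

```python
import pandas as pd

# Hypothetical column with a missing value
df = pd.DataFrame({"date": ["1997-01-31", None, "1997-12-18"]})

# Parse with an explicit format; the missing entry becomes NaT
parsed = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Option 1: YYYYMM as a string (the missing entry stays missing)
df["yyyymm_str"] = parsed.dt.strftime("%Y%m")

# Option 2: a monthly Period, which still supports date-like operations
df["month"] = parsed.dt.to_period("M")
print(df)
```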

Extract Date from excel and append it in a list using python

I have a column in Excel which has dates in the format "17-12-2015 19:35". How can I extract the first 2 digits as integers and append them to a list? In this case I need to extract 17 and append it to a list. Can it be done using pandas also?
Code thus far:
import pandas as pd
Location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(Location)
time = df['Creation Date'].tolist()
print (time)
You could extract the day of each timestamp like this:
from datetime import datetime
import pandas as pd
location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(location)
timestamps = df['Creation Date'].tolist()
dates = [datetime.strptime(timestamp, '%d-%m-%Y %H:%M') for timestamp in timestamps]
days = [date.strftime('%d') for date in dates]
print(days)
The '%d-%m-%Y %H:%M' and '%d' bits are format specifiers that describe how your timestamp is formatted. See the strftime() and strptime() format codes in the Python datetime documentation for a complete list of directives.
datetime.strptime parses a string into a datetime object using such a specifier, so dates will hold a list of datetime instances instead of strings.
datetime.strftime does the opposite: it turns a datetime object into a string, again using a format specifier. %d simply instructs strftime to output only the day of the date, as a zero-padded string; use date.day if you need an integer.
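For the "using pandas" part, a vectorized sketch: it assumes the column is named 'Creation Date' as in the question, and the CSV text below is a made-up stand-in for paymenttransactions.csv. Series.dt.day returns the day of the month as integers, which matches what the question asks for.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = "Creation Date,Amount\n17-12-2015 19:35,10\n03-01-2016 08:05,25\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse the whole column at once with the known layout
created = pd.to_datetime(df["Creation Date"], format="%d-%m-%Y %H:%M")

# .dt.day gives the day of the month; tolist() returns a plain list of ints
days = created.dt.day.tolist()
print(days)
```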
