I have a dataframe with column date which looks like this:
Feb 24, 2020 # 12:47:31.616
I would like it to become this:
2020-02-24
I can achieve this using slicing since I am dealing only with one week's data hence all months will be Feb.
Is there a neat pandas way to change the datestamp to date format I desire?
Thank you for your suggestions.
Use to_datetime with format %b %d, %Y # %H:%M:%S.%f and then if necessary convert to dates by Series.dt.date or to datetimes by Series.dt.floor:
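import pandas as pd
# Option 1: plain dates (Python date objects, object dtype)
df = pd.DataFrame({'dates':['Feb 24, 2020 # 12:47:31.616','Feb 24, 2020 # 12:47:31.616']})
df['dates'] = pd.to_datetime(df['dates'], format='%b %d, %Y # %H:%M:%S.%f').dt.date
# Option 2 (instead of option 1, applied to the original strings): datetimes floored to midnight
df['dates'] = pd.to_datetime(df['dates'], format='%b %d, %Y # %H:%M:%S.%f').dt.floor('d')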
print (df)
dates
0 2020-02-24
1 2020-02-24
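The difference between the two options is the resulting dtype. A minimal sketch, assuming a single sample value copied from the question (the names raw, parsed, as_dates and as_midnight are only illustrative):

```python
import pandas as pd

raw = pd.Series(['Feb 24, 2020 # 12:47:31.616'])
parsed = pd.to_datetime(raw, format='%b %d, %Y # %H:%M:%S.%f')

# .dt.date returns Python datetime.date objects, so the column becomes object dtype
as_dates = parsed.dt.date

# .dt.floor('d') keeps a datetime64 column and only resets the time to midnight
as_midnight = parsed.dt.floor('d')

print(as_dates.dtype)
print(as_midnight.dtype)
```

If later steps need the .dt accessor or date arithmetic, the floored datetime64 column is usually the more convenient of the two.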
Using pd.to_datetime with Series.str.split:
df = pd.DataFrame({'date':['Feb 24, 2020 # 12:47:31.616']})
date
0 Feb 24, 2020 # 12:47:31.616
df['date'] = pd.to_datetime(df['date'].str.split(r'\s#\s').str[0], format='%b %d, %Y')
date
0 2020-02-24
Related
I have a dataset with a column "date" with values like "Jul 31, 2014", "Sep 23, 2018"...
I want to place the months in a separate column, convert them to integers using "df.to_datetime(df.MONTH, format='%b').dt.month" and then put them back in order to sort by the date index.
How can I choose only the first 3 letters from the cells?
You can try to_datetime with the date format %b %d, %Y:
df["date"] = pd.to_datetime(df["date"], format='%b %d, %Y')
df["month"] = df["date"].dt.month
Code:
print(df)
# date
# 0 Jul 31, 2014
# 1 Sep 23, 2018
df["date"] = pd.to_datetime(df["date"], format='%b %d, %Y')
df["month"] = df["date"].dt.month
print(df)
# date month
# 0 2014-07-31 7
# 1 2018-09-23 9
For more detail on how to build the date format, refer to the doc.
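To answer the literal question (taking only the first 3 letters of each cell), string slicing through the .str accessor is enough. A small sketch, assuming the two sample values from the question; the column names MONTH and month_num are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'date': ['Jul 31, 2014', 'Sep 23, 2018']})

# .str[:3] slices the first three characters of every cell
df['MONTH'] = df['date'].str[:3]

# '%b' parses the abbreviated month name; .dt.month returns its number
df['month_num'] = pd.to_datetime(df['MONTH'], format='%b').dt.month

print(df)
```

Parsing the whole column once, as shown in the answer, is normally simpler because sorting on a real datetime column already orders by year, month and day.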
I am working with the following piece of data which has a different format of dates and which creates confusion later in the process. The data is like:
S. No DateTime Area
1 03/05/2019 6:33 A
2 06/03/2019 07:23:45 AM B
The first row is in the format %m/%d/%Y h:mm and the second row is in the format %d/%m/%Y hh:mm:ss AM/PM. The first date value is ambiguous, though: is it 5th March or 3rd May? To bring everything into the same format, I want my code to detect the date format and convert it to the desired one.
I have tried doing this:
df['Detection Date'] = pd.to_datetime(df['Detection Date & Time'], errors = 'coerce').dt.datetime
col = df['Detection Date'].apply(str)
for i in df.index:
    if datetime.datetime.strptime(col, '%m/%d/%Y h:mm'):
        ColDate = datetime.datetime.strftime(col, '%d/%m/%Y hh:mm:ss AM/PM')
But I am getting an error saying:
TypeError: strptime() argument 1 must be str, not Series
How should this be done?
Thanks
If it is OK to install a dependency, you can use dateparser:
import pandas as pd
import dateparser
df = pd.DataFrame({'Detection Date & Time': ['03/05/2019 6:33', '06/03/2019 07:23:45 AM']})
df['Date & time'] = df['Detection Date & Time'].apply(dateparser.parse)
You can try both possible formats in to_datetime with errors='coerce': rows that do not match a format become missing values (NaT), so the two results can be combined with Series.fillna:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
# %I (12-hour clock) is needed for %p to take effect when parsing
date2 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%d/%m/%Y %I:%M:%S %p')
df['DateTime'] = date1.fillna(date2)
print (df)
S. No DateTime Area
0 1 2019-03-05 06:33:00 A
1 2 2019-03-06 07:23:45 B
Finally, if you want a specific output format, add Series.dt.strftime. The advantage of this solution is that both formats are verified:
df['DateTime'] = date1.fillna(date2).dt.strftime('%d/%m/%Y %I:%M:%S %p')
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
Details:
print (date1)
0 2019-03-05 06:33:00
1 NaT
Name: DateTime, dtype: datetime64[ns]
print (date2)
0 NaT
1 2019-03-06 07:23:45
Name: DateTime, dtype: datetime64[ns]
Another possible solution, which does not verify the second format: only values in the format %m/%d/%Y %H:%M are converted to %d/%m/%Y %I:%M:%S %p, and all other values are left unchanged:
parsed = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
# keep the original string wherever the first format did not match
df['DateTime'] = parsed.dt.strftime('%d/%m/%Y %I:%M:%S %p').where(parsed.notna(), df['DateTime'])
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
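The fillna idea extends to any number of candidate formats. The following is a rough sketch rather than a tested recipe: parse_with_formats is a hypothetical helper name, and the order of the format list decides how an ambiguous value such as 03/05/2019 is read.

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Try each format in order and keep the first successful parse per row."""
    result = pd.to_datetime(values, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # rows that are still NaT get a chance with the next format
        result = result.fillna(pd.to_datetime(values, format=fmt, errors='coerce'))
    return result

raw = pd.Series(['03/05/2019 6:33', '06/03/2019 07:23:45 AM'])
formats = ['%m/%d/%Y %H:%M', '%d/%m/%Y %I:%M:%S %p']
parsed = parse_with_formats(raw, formats)
print(parsed)
```

Rows that match none of the formats stay NaT, which makes them easy to find with parsed.isna().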
I've converted from string to datetimes in columns numerous times. However in each of those instances, the string format was consistent. Now I have a dataframe with mixed formats to change. Example below, but this is throughout 100,000s of rows.
index date
0 30 Jan 2018
1 January 30 2018
I could convert each type on an individual basis, but is there a way to convert that df['date'] to datetime with mixed formats easily?
Here is a module that can do this for you: dateparser.
from dateparser import parse
print(parse('2018-04-18 22:33:40'))
print(parse('Wed 11 Jul 2018 23:00:00 GMT'))
Output:
datetime.datetime(2018, 4, 18, 22, 33, 40)
datetime.datetime(2018, 7, 11, 23, 0, tzinfo=<StaticTzInfo 'GMT'>)
Here is a way to do it using datetime.strptime
from datetime import datetime
def IsNumber(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

def ConvertToDatetime(date):
    date = date.split(" ")  # split by space
    if IsNumber(date[0]):  # of the form: day month year
        if len(date[1]) == 3:  # abbreviated month: Jan, Feb, ...
            datetime_object = datetime.strptime(" ".join(date), '%d %b %Y')
        else:  # full month name: January, February, ...
            datetime_object = datetime.strptime(" ".join(date), '%d %B %Y')
    else:  # of the form: month day year
        if len(date[0]) == 3:  # abbreviated month: Jan, Feb, ...
            datetime_object = datetime.strptime(" ".join(date), '%b %d %Y')
        else:  # full month name: January, February, ...
            datetime_object = datetime.strptime(" ".join(date), '%B %d %Y')
    return datetime_object
You can add more cases based on the documentation and the formats in your data.
Examples for the two formats in your question:
ConvertToDatetime("30 Jan 2018")
2018-01-30 00:00:00
ConvertToDatetime("January 30 2018")
2018-01-30 00:00:00
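A variation on the same idea, sketched with only the standard library: instead of branching on the shape of the string, try a list of candidate formats in order. The names FORMATS and parse_any are illustrative, and the list covers only the formats discussed above.

```python
from datetime import datetime

# Candidate formats, tried in order (extend this list as needed)
FORMATS = ['%d %b %Y', '%d %B %Y', '%b %d %Y', '%B %d %Y']

def parse_any(text, formats=FORMATS):
    """Return the first successful strptime result, or raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f'no known format matches {text!r}')

print(parse_any('30 Jan 2018'))
print(parse_any('January 30 2018'))
```

On a DataFrame this could be applied with df['date'].apply(parse_any), although a row-by-row apply may be slow on hundreds of thousands of rows.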
I have an ascii file where the dates are formatted as follows:
Jan 20 2015 00:00:00.000
Jan 20 2015 00:10:00.000
Jan 20 2015 00:20:00.000
Jan 20 2015 00:30:00.000
Jan 20 2015 00:40:00.000
When loading the file into pandas, each column above gets its own column in a pandas dataframe. I've tried the variations of the following:
from pandas import read_csv
from datetime import datetime
df = read_csv('file.txt', header=None, delim_whitespace=True,
parse_dates={'datetime': [0, 1, 2, 3]},
date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H %M %S'))
I get a couple errors:
TypeError: <lambda>() takes 1 positional argument but 4 were given
ValueError: time data 'Jun 29 2017 00:35:00.000' does not match format '%b %d %Y %H %M %S'
I'm confused because:
I'm passing a dict to parse_dates to parse the different columns as a single date.
I'm using: %b - abbreviated month name, %d - day of the month, %Y year with century, %H 24-hour, %M - minute, and %S - second
Anyone see what I'm doing incorrectly?
Edit:
I've tried date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S') which returns ValueError: unconverted data remains: .000
Edit 2:
I tried what @MaxU suggested in his update, but it was problematic because my original data is formatted like the following:
Jan 1 2017 00:00:00.000 123 456 789 111 222 333
I'm only interested in the first 7 columns so I import my file with the following:
df = read_csv(fn, header=None, delim_whitespace=True, usecols=[0, 1, 2, 3, 4, 5, 6])
Then to create a column with datetime information from the first 4 columns I try:
df['datetime'] = to_datetime(df.ix[:, :3], format='%b %d %Y %H:%M:%S.%f')
However this doesn't work because to_datetime expects "integer, float, string, datetime, list, tuple, 1-d array, Series" as the first argument and df.ix[:, :3] returns a dataframe with the following format:
0 1 2 3
0 Jan 1 2017 00:00:00.000
How do I feed in every row of the first four columns to to_datetime such that I get one column of datetimes?
Edit 3:
I think I solved my second problem.
I just use the following command and do everything when I read my file in (I was basically just missing %f to parse past seconds):
df = read_csv(fileName, header=None, delim_whitespace=True,
parse_dates={'datetime': [0, 1, 2, 3]},
date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S.%f'),
usecols=[0, 1, 2, 3, 4, 5, 6])
The whole reason I wanted to parse manually instead of letting pandas handle it, as @MaxU suggested, was to see if manually feeding in instructions would be faster, and it is! From my tests the snippet above runs approximately 5-6 times faster than letting pandas infer the format for you.
Try this simpler approach:
import pandas as pd
df = pd.read_csv('file.txt', header=None)  # header=None keeps the first line as data
df.columns = ['date']
df should be a dataframe with a single column. After that try casting that column to datetime
df['date'] = pd.to_datetime(df['date'])
Pandas (tested with version 0.20.1) is smart enough to do it for you:
In [4]: pd.read_csv(fn, sep='\s+', parse_dates={'datetime': [0, 1, 2, 3]})
Out[4]:
datetime
0 2015-01-20 00:10:00
1 2015-01-20 00:20:00
2 2015-01-20 00:30:00
3 2015-01-20 00:40:00
UPDATE: if all entries have the same format you can try to do it this way:
df = pd.read_csv(fn, sep='~', names=['datetime'])
df['datetime'] = pd.to_datetime(df['datetime'], format='%b %d %Y %H:%M:%S.%f')
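For the case in Edit 2, where the date parts arrive as four separate columns next to other data, one option is to join those columns back into a single string Series before calling to_datetime. A sketch under stated assumptions: the first sample row is taken from the question, the second row is made up for illustration, and an in-memory buffer stands in for the real file.

```python
import io
import pandas as pd

# First row copied from the question; second row is hypothetical sample data
sample = io.StringIO(
    'Jan 1 2017 00:00:00.000 123 456 789 111 222 333\n'
    'Jan 1 2017 00:10:00.000 124 457 790 112 223 334\n'
)

# dtype=str keeps day and year as text so the columns can be concatenated
df = pd.read_csv(sample, sep=r'\s+', header=None, dtype=str,
                 usecols=list(range(7)))

# Rebuild one string per row from the first four columns
stamp = df[0] + ' ' + df[1] + ' ' + df[2] + ' ' + df[3]
df['datetime'] = pd.to_datetime(stamp, format='%b %d %Y %H:%M:%S.%f')

print(df['datetime'])
```

Because everything is read as text here, the remaining numeric columns would need an explicit conversion afterwards, for example with pd.to_numeric.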
I have a DataFrame column whose values are strings such as 'June 6, 2016, 6', and I want to convert it to datetime in 'YYYY-MM-DD HH:MM' format.
When I convert a single value on its own, I get the right format.
import datetime
stringDate = "June 6, 2016, 11"
dateObject = datetime.datetime.strptime(stringDate, "%B %d, %Y, %H")
print(dateObject)
**Output : 2016-06-06 11:00:00**
But when I tried different options to apply the same conversion to the DataFrame column, I do not get the time part in the result.
**Option1**
df['Date'] = df.Date.apply(lambda x: dt.datetime.strptime(x, "%B %d, %Y, %H").date())
**Option2**
df['Date'] = pd.to_datetime(df['Date'] = df.Date.apply(lambda x: dt.datetime.strptime(x, "%B %d, %Y, %H"))
Output: in both cases I got 2016-06-06
Any suggestions will be appreciated.
I think you need to add the format parameter to to_datetime:
print (pd.to_datetime('June 6, 2016, 11', format='%B %d, %Y, %H'))
2016-06-06 11:00:00
It works with DataFrame too:
df = pd.DataFrame({'Date':['June 6, 2016, 11', 'May 6, 2016, 11']})
print (df)
Date
0 June 6, 2016, 11
1 May 6, 2016, 11
print (pd.to_datetime(df['Date'], format='%B %d, %Y, %H'))
0 2016-06-06 11:00:00
1 2016-05-06 11:00:00
Name: Date, dtype: datetime64[ns]
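If the goal is literally a 'YYYY-MM-DD HH:MM' display, the parsed column can be formatted back to strings with Series.dt.strftime. A short sketch reusing the sample frame above; the column name Date_str is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['June 6, 2016, 11', 'May 6, 2016, 11']})
parsed = pd.to_datetime(df['Date'], format='%B %d, %Y, %H')

# Keep the full timestamp; calling .date() as in Option 1 is what drops the hour
df['Date'] = parsed

# Optional string column in 'YYYY-MM-DD HH:MM' form
df['Date_str'] = parsed.dt.strftime('%Y-%m-%d %H:%M')

print(df)
```

The datetime64 column is the one to keep for sorting and arithmetic; the string column is only useful for display or export.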