Changing date string to pandas datestamp - python

I have a dataframe with column date which looks like this:
Feb 24, 2020 # 12:47:31.616
I would like it to become this:
2020-02-24
I can achieve this with string slicing, since I am dealing with only one week's data and every month will be Feb.
Is there a neat pandas way to convert the timestamp to the date format I want?
Thank you for your suggestions.

Use to_datetime with format %b %d, %Y # %H:%M:%S.%f and then if necessary convert to dates by Series.dt.date or to datetimes by Series.dt.floor:
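import pandas as pd
# Option 1: dates (Python date objects, object dtype)
df = pd.DataFrame({'dates':['Feb 24, 2020 # 12:47:31.616','Feb 24, 2020 # 12:47:31.616']})
df['dates'] = pd.to_datetime(df['dates'], format='%b %d, %Y # %H:%M:%S.%f').dt.date
# Option 2: datetimes at midnight (use this line instead of the option 1 line, not after it)
df['dates'] = pd.to_datetime(df['dates'], format='%b %d, %Y # %H:%M:%S.%f').dt.floor('D')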
print (df)
dates
0 2020-02-24
1 2020-02-24
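The two variants above print the same values but produce different column types, which matters for later filtering or resampling. A minimal sketch, assuming the same sample layout and hypothetical column names (`raw`, `as_date`, `as_datetime`) that are not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'raw': ['Feb 24, 2020 # 12:47:31.616',
                           'Feb 25, 2020 # 08:05:00.000']})

# Parse once, then derive both variants from the parsed column.
parsed = pd.to_datetime(df['raw'], format='%b %d, %Y # %H:%M:%S.%f')

df['as_date'] = parsed.dt.date            # Python datetime.date objects (object dtype)
df['as_datetime'] = parsed.dt.floor('D')  # timestamps truncated to midnight (datetime64 dtype)

print(df.dtypes)
print(df[['as_date', 'as_datetime']])
```

Keeping the datetime64 variant is usually the safer default if the column will be used with the `.dt` accessor again later.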

Using pd.to_datetime with Series.str.split:
df = pd.DataFrame({'date':['Feb 24, 2020 # 12:47:31.616']})
date
0 Feb 24, 2020 # 12:47:31.616
df['date'] = pd.to_datetime(df['date'].str.split(r'\s#\s').str[0], format='%b %d, %Y')
date
0 2020-02-24

Related

Creating a column with certain position of value from a cell in pandas

I have a dataset with a column "date" with values like "Jul 31, 2014", "Sep 23, 2018"...
I want to put the months in a separate column, convert them to integers using "pd.to_datetime(df.MONTH, format='%b').dt.month", and then put them back in order to sort by the date index.
How can I choose only the first 3 letters from the cells?
You can try to_datetime with the date format %b %d, %Y:
df["date"] = pd.to_datetime(df["date"], format='%b %d, %Y')
df["month"] = df["date"].dt.month
Code:
print(df)
# date
# 0 Jul 31, 2014
# 1 Sep 23, 2018
df["date"] = pd.to_datetime(df["date"], format='%b %d, %Y')
df["month"] = df["date"].dt.month
print(df)
# date month
# 0 2014-07-31 7
# 1 2018-09-23 9
For more detail on how to build the date format string, refer to the documentation.
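For the literal question of taking the first three letters, string slicing with `.str[:3]` is one option. The sketch below is only an illustration under assumed column names (`month_abbr`, `parsed` are hypothetical), and `%b` assumes English month abbreviations:

```python
import pandas as pd

df = pd.DataFrame({'date': ['Sep 23, 2018', 'Jul 31, 2014']})

# Slice the first three characters of every string in the column.
df['month_abbr'] = df['date'].str[:3]

# Convert the abbreviation alone to a month number.
df['month'] = pd.to_datetime(df['month_abbr'], format='%b').dt.month

# Sorting by the fully parsed date also orders by year and day, not only month.
df['parsed'] = pd.to_datetime(df['date'], format='%b %d, %Y')
df = df.sort_values('parsed').reset_index(drop=True)
print(df)
```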

Identifying a dateformat and change it into another

I am working with the following piece of data, which contains dates in different formats and causes confusion later in the process. The data looks like this:
S. No DateTime Area
1 03/05/2019 6:33 A
2 06/03/2019 07:23:45 AM B
The first row is in the format %m/%d/%Y h:mm and the second row is in the format %d/%m/%Y hh:mm:ss AM/PM. The first date value is ambiguous, though: is it 5th March or 3rd May? To bring everything into the same format, I want my code to detect the date format and convert it to the desired format.
I have tried doing this:
df['Detection Date'] = pd.to_datetime(df['Detection Date & Time'], errors = 'coerce').dt.datetime
col = df['Detection Date'].apply(str)
for i in df.index:
    if datetime.datetime.strptime(col, '%m/%d/%Y h:mm'):
        ColDate = datetime.datetime.strftime(col, '%d/%m/%Y hh:mm:ss AM/PM')
But I am getting an error saying:
TypeError: strptime() argument 1 must be str, not Series
How should this be done?
Thanks
If it is OK to install a dependency, you can use dateparser:
import pandas as pd
import dateparser
df = pd.DataFrame({'Detection Date & Time': ['03/05/2019 6:33', '06/03/2019 07:23:45 AM']})
df['Date & time'] = df['Detection Date & Time'].apply(dateparser.parse)
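You can try both possible formats with to_datetime. With errors='coerce', values that do not match a format are returned as missing values (NaT), so the two results can be combined with Series.fillna. Note that %p only takes effect when the hour is parsed with %I, so the 12-hour format uses %I rather than %H:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
date2 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%d/%m/%Y %I:%M:%S %p')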
df['DateTime'] = date1.fillna(date2)
print (df)
S. No DateTime Area
0 1 2019-03-05 06:33:00 A
1 2 2019-03-06 07:23:45 B
Finally, if you want a specific output format, add Series.dt.strftime. The advantage of this solution is that both formats are verified:
df['DateTime'] = date1.fillna(date2).dt.strftime('%d/%m/%Y %I:%M:%S %p')
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
Details:
print (date1)
0 2019-03-05 06:33:00
1 NaT
Name: DateTime, dtype: datetime64[ns]
print (date2)
0 NaT
1 2019-03-06 07:23:45
Name: DateTime, dtype: datetime64[ns]
Another possible solution that does not verify the second format: only values in the format %m/%d/%Y %H:%M are converted to %d/%m/%Y %I:%M:%S %p, and all other values are left unchanged:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
df['DateTime'] = date1.dt.strftime('%d/%m/%Y %I:%M:%S %p').where(date1.notna(), df['DateTime'])
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
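The coerce-and-fillna idea above extends to any number of candidate formats. The helper below is a sketch only: `parse_with_formats` is a hypothetical name, the third sample value is invented to show the unmatched case, and earlier formats take priority when more than one matches.

```python
import pandas as pd

def parse_with_formats(series, formats):
    """Try each format in order (at least one required); earlier formats win."""
    attempts = [pd.to_datetime(series, format=fmt, errors='coerce')
                for fmt in formats]
    result = attempts[0]
    for attempt in attempts[1:]:
        # Only fill rows that every earlier format failed to parse.
        result = result.fillna(attempt)
    return result

raw = pd.Series(['03/05/2019 6:33', '06/03/2019 07:23:45 AM', 'not a date'])
parsed = parse_with_formats(raw, ['%m/%d/%Y %H:%M', '%d/%m/%Y %I:%M:%S %p'])
print(parsed)
```

Rows that match none of the formats stay NaT, which makes them easy to find afterwards with `parsed.isna()`.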

python: convert column from string to datetime with mixed formats

I've converted from string to datetimes in columns numerous times. However in each of those instances, the string format was consistent. Now I have a dataframe with mixed formats to change. Example below, but this is throughout 100,000s of rows.
index date
0 30 Jan 2018
1 January 30 2018
I could convert each type on an individual basis, but is there a way to convert that df['date'] to datetime with mixed formats easily?
Here is a module that can do this for you: dateparser
from dateparser import parse
print(parse('2018-04-18 22:33:40'))
print(parse('Wed 11 Jul 2018 23:00:00 GMT'))
Output:
datetime.datetime(2018, 4, 18, 22, 33, 40)
datetime.datetime(2018, 7, 11, 23, 0, tzinfo=<StaticTzInfo 'GMT'>)
Here is a way to do it using datetime.strptime
from datetime import datetime
def IsNumber(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
def ConvertToDatetime(date):
    date = date.split(" ")  # split by space
    if IsNumber(date[0]):  # of the form: day month year
        if len(date[1]) == 3:  # abbreviated month name (Jan, Feb, ...)
            datetime_object = datetime.strptime(" ".join(date), '%d %b %Y')
        else:  # full month name (January, February, ...)
            datetime_object = datetime.strptime(" ".join(date), '%d %B %Y')
    else:  # of the form: month day year
        if len(date[0]) == 3:  # abbreviated month name (Jan, Feb, ...)
            datetime_object = datetime.strptime(" ".join(date), '%b %d %Y')
        else:  # full month name (January, February, ...)
            datetime_object = datetime.strptime(" ".join(date), '%B %d %Y')
    return datetime_object
You can add more cases based on the documentation and the formats you encounter.
Examples for the two formats in your question:
ConvertToDatetime("30 Jan 2018")
2018-01-30 00:00:00
ConvertToDatetime("January 30 2018")
2018-01-30 00:00:00
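The same idea can be written more compactly by trying a list of candidate formats in order instead of branching on the shape of the string. This is a sketch, not the answerer's code: `convert_to_datetime` and `CANDIDATE_FORMATS` are hypothetical names, and the format list covers only the layouts handled above.

```python
from datetime import datetime

# Candidate formats, tried in order; extend this tuple for other layouts.
CANDIDATE_FORMATS = ('%d %b %Y', '%d %B %Y', '%b %d %Y', '%B %d %Y')

def convert_to_datetime(text, formats=CANDIDATE_FORMATS):
    """Return the first successful strptime parse, or raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f'no known format matches {text!r}')

print(convert_to_datetime('30 Jan 2018'))
print(convert_to_datetime('January 30 2018'))
```

On a DataFrame this could be applied row by row with `df['date'].apply(convert_to_datetime)`, although for 100,000s of rows a vectorized `pd.to_datetime` call per format is likely to be faster.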

Pandas: Parsing dates in different columns with read_csv

I have an ascii file where the dates are formatted as follows:
Jan 20 2015 00:00:00.000
Jan 20 2015 00:10:00.000
Jan 20 2015 00:20:00.000
Jan 20 2015 00:30:00.000
Jan 20 2015 00:40:00.000
When loading the file into pandas, each whitespace-separated field above gets its own column in a pandas dataframe. I've tried variations of the following:
from pandas import read_csv
from datetime import datetime
df = read_csv('file.txt', header=None, delim_whitespace=True,
              parse_dates={'datetime': [0, 1, 2, 3]},
              date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H %M %S'))
I get a couple of errors:
TypeError: <lambda>() takes 1 positional argument but 4 were given
ValueError: time data 'Jun 29 2017 00:35:00.000' does not match format '%b %d %Y %H %M %S'
I'm confused because:
I'm passing a dict to parse_dates to parse the different columns as a single date.
I'm using: %b - abbreviated month name, %d - day of the month, %Y year with century, %H 24-hour, %M - minute, and %S - second
Anyone see what I'm doing incorrectly?
Edit:
I've tried date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S') which returns ValueError: unconverted data remains: .000
Edit 2:
I tried what @MaxU suggested in his update, but it was problematic because my original data is formatted like the following:
Jan 1 2017 00:00:00.000 123 456 789 111 222 333
I'm only interested in the first 7 columns so I import my file with the following:
df = read_csv(fn, header=None, delim_whitespace=True, usecols=[0, 1, 2, 3, 4, 5, 6])
Then to create a column with datetime information from the first 4 columns I try:
df['datetime'] = to_datetime(df.ix[:, :3], format='%b %d %Y %H:%M:%S.%f')
However this doesn't work because to_datetime expects "integer, float, string, datetime, list, tuple, 1-d array, Series" as the first argument and df.ix[:, :3] returns a dataframe with the following format:
0 1 2 3
0 Jan 1 2017 00:00:00.000
How do I feed in every row of the first four columns to to_datetime such that I get one column of datetimes?
Edit 3:
I think I solved my second problem.
I just use the following command and do everything when I read my file in (I was basically just missing %f to parse the fractional seconds):
df = read_csv(fileName, header=None, delim_whitespace=True,
              parse_dates={'datetime': [0, 1, 2, 3]},
              date_parser=lambda x: datetime.strptime(x, '%b %d %Y %H:%M:%S.%f'),
              usecols=[0, 1, 2, 3, 4, 5, 6])
The whole reason I wanted to parse manually instead of letting pandas handle it, as @MaxU suggested, was to see if manually feeding in instructions would be faster - and it is! From my tests, the snippet above runs approximately 5-6 times faster than letting pandas infer the parsing for you.
Have a go at this simpler approach:
df = pd.read_csv('file.txt', header=None)  # header=None keeps the first row as data
df.columns = ['date']
df should be a dataframe with a single column. After that, try casting that column to datetime:
df['date'] = pd.to_datetime(df['date'])
Pandas (tested with version 0.20.1) is smart enough to do it for you:
In [4]: pd.read_csv(fn, sep=r'\s+', parse_dates={'datetime': [0, 1, 2, 3]})
Out[4]:
datetime
0 2015-01-20 00:10:00
1 2015-01-20 00:20:00
2 2015-01-20 00:30:00
3 2015-01-20 00:40:00
UPDATE: if all entries have the same format you can try to do it this way:
df = pd.read_csv(fn, sep='~', names=['datetime'])
df['datetime'] = pd.to_datetime(df['datetime'], format='%b %d %Y %H:%M:%S.%f')
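For the situation described in Edit 2 of the question, where the date parts are already split across four columns, one possible approach is to join those columns back into a single string Series before calling to_datetime. This is a sketch under assumptions: the in-memory `sample` (the second row is invented) stands in for the real file, and `combined` is a hypothetical name.

```python
import io
import pandas as pd

# Two sample rows in the layout from Edit 2 (date parts plus numeric columns).
sample = io.StringIO(
    'Jan 1 2017 00:00:00.000 123 456 789 111 222 333\n'
    'Jan 1 2017 00:10:00.000 124 457 790 112 223 334\n'
)

df = pd.read_csv(sample, header=None, sep=r'\s+', usecols=list(range(7)))

# Join the four date/time columns row by row into one string Series.
combined = df[[0, 1, 2, 3]].astype(str).agg(' '.join, axis=1)

# An explicit format avoids per-row format inference.
df['datetime'] = pd.to_datetime(combined, format='%b %d %Y %H:%M:%S.%f')
df = df.drop(columns=[0, 1, 2, 3])
print(df)
```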

Convert string column to DateTime format

I have a DataFrame column whose values are strings like 'June 6, 2016, 6', and I want to convert it to datetime in 'YYYY-MM-DD HH:MM' format.
When I tried converting a single value, I was able to convert it to the right format.
import datetime
stringDate = "June 6, 2016, 11"
dateObject = datetime.datetime.strptime(stringDate, "%B %d, %Y, %H")
print(dateObject)
Output: 2016-06-06 11:00:00
But when I tried different options to apply the same conversion to DataFrame columns, the time part is missing from the result.
Option 1
df['Date'] = df.Date.apply(lambda x: dt.datetime.strptime(x, "%B %d, %Y, %H").date())
Option 2
df['Date'] = pd.to_datetime(df.Date.apply(lambda x: dt.datetime.strptime(x, "%B %d, %Y, %H")))
Output: in both cases I got 2016-06-06
Any suggestions will be appreciated.
I think you need to add the format parameter to to_datetime:
print (pd.to_datetime('June 6, 2016, 11', format='%B %d, %Y, %H'))
2016-06-06 11:00:00
It works with DataFrame too:
df = pd.DataFrame({'Date':['June 6, 2016, 11', 'May 6, 2016, 11']})
print (df)
Date
0 June 6, 2016, 11
1 May 6, 2016, 11
print (pd.to_datetime(df['Date'], format='%B %d, %Y, %H'))
0 2016-06-06 11:00:00
1 2016-05-06 11:00:00
Name: Date, dtype: datetime64[ns]
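If the 'YYYY-MM-DD HH:MM' layout from the question is needed as text, for example for export, one option is to keep the parsed datetime column and derive a string column from it with Series.dt.strftime. A minimal sketch, where `Date_str` is a hypothetical column name and the second sample value is invented:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['June 6, 2016, 11', 'May 6, 2016, 6']})

# Keep a real datetime column for computation.
df['Date'] = pd.to_datetime(df['Date'], format='%B %d, %Y, %H')

# Derive a display string only where a fixed text layout is required.
df['Date_str'] = df['Date'].dt.strftime('%Y-%m-%d %H:%M')
print(df)
```

The string column is no longer a datetime, so sorting and date arithmetic are better done on `Date`.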
