Converting Pandas Series to a String to Convert DateTime - python

I have a DataFrame that includes the columns df['date'] and df['time'],
which I have combined into one column named df['datetime'].
The output looks like the following: 2017-04-12 17:30:18.733
My end goal is to convert it to a string like Wed, 12 Apr 2017 17:30:18 733
When I try different methods such as pd.to_datetime(), it tells me the input needs to be a string,
and I can't find a method to turn the whole column into strings.
I tried calling .astype(str) and .apply(str).
Any suggestions?

You are taking two strings (one in the date column and the other in the time column) and joining them with a space to create a new datetime string (e.g. "2017-04-12 17:30:18.733"). You then use strptime to parse this string into a datetime object. I used a form that works whether or not the fractional seconds are included. You then use strftime to format this datetime object as your desired string.
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'date': ['2017-04-12', '2017-04-13'],
                   'time': ['17:30:18.733', '07:30:18']})
def date_parser(date_string):
    try:
        # Timestamps with fractional seconds; trim microseconds to milliseconds.
        timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
        timestamp = timestamp.strftime('%a, %d %b %Y %H:%M:%S %f')[:-3]
    except ValueError:
        # Timestamps without fractional seconds.
        timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
        timestamp = timestamp.strftime('%a, %d %b %Y %H:%M:%S 000')
    return timestamp
df['datetime_str'] = (df['date'] + ' ' + df['time']).apply(date_parser)
>>> df
         date          time                   datetime_str
0  2017-04-12  17:30:18.733  Wed, 12 Apr 2017 17:30:18 733
1  2017-04-13      07:30:18  Thu, 13 Apr 2017 07:30:18 000
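As a variation on the answer above, the try/except can be avoided by formatting the milliseconds separately. This is only a sketch: it assumes Python 3.7+ (for datetime.fromisoformat) and fractional seconds written with exactly three or six digits, and the helper name to_display_string is made up for illustration.

```python
from datetime import datetime

def to_display_string(date_string):
    # fromisoformat accepts 'YYYY-MM-DD HH:MM:SS' with or without a fraction
    ts = datetime.fromisoformat(date_string)
    # Integer-divide microseconds to get milliseconds, zero-padded to 3 digits
    return f"{ts:%a, %d %b %Y %H:%M:%S} {ts.microsecond // 1000:03d}"

print(to_display_string('2017-04-12 17:30:18.733'))  # Wed, 12 Apr 2017 17:30:18 733
print(to_display_string('2017-04-13 07:30:18'))      # Thu, 13 Apr 2017 07:30:18 000
```

It can be applied to the combined column in the same way as date_parser.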

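Use something like this (axis=1 applies the function to each row, and the datetime column must already hold datetime values):
df.apply(lambda x: x.datetime.strftime('%D %d ...the format you want...'), axis=1)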
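If df['datetime'] already has a datetime64 dtype, the .dt accessor can format the whole column without apply. A minimal sketch, assuming sample values modeled on the question; %f is rendered as six digits, so the last three characters are sliced off to leave milliseconds.

```python
import pandas as pd

# Sample column built for illustration; the explicit format keeps parsing unambiguous
df = pd.DataFrame({'datetime': pd.to_datetime(
    ['2017-04-12 17:30:18.733', '2017-04-13 07:30:18.000'],
    format='%Y-%m-%d %H:%M:%S.%f')})

# Format every row at once, then trim microseconds (6 digits) to milliseconds (3)
df['datetime_str'] = (
    df['datetime'].dt.strftime('%a, %d %b %Y %H:%M:%S %f').str[:-3]
)
print(df['datetime_str'].tolist())
```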

Related

How to change date format in python with pandas

I'm working with big data in pandas and I have a problem with the format of the dates. This is the format of one column:
Wed Feb 24 12:06:14 +0000 2021
and I think it would be easier to change all the columns to a format like this:
'%d/%m/%Y, %H:%M:%S'
How can I do that?
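Does this work for you? The format passed to to_datetime has to describe the strings as they are now; strftime then produces the format you want:
pandas.to_datetime(s, format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%d/%m/%Y, %H:%M:%S')
Source: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior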
You can use the following function for your dataset.
import datetime as dt
def change_format(x):
    # Parse the original string, then format it the way you want.
    parsed = dt.datetime.strptime(x, "%a %b %d %H:%M:%S %z %Y")
    return parsed.strftime('%d/%m/%Y, %H:%M:%S')
Then apply it using df['date_column'] = df['date_column'].apply(change_format).
Here df is your dataset.

Unable to get time difference between two pandas dataframe columns

I have a pandas dataframe that contains a couple of columns. Two of which are start_time and end_time. In those columns the values look like - 2020-01-04 01:38:33 +0000 UTC
I am not able to create a datetime object from these strings because I am not able to get the format right -
df['start_time'] = pd.to_datetime(df['start_time'], format="yyyy-MM-dd HH:mm:ss +0000 UTC")
I also tried using yyyy-MM-dd HH:mm:ss %z UTC as a format
This gives the error -
ValueError: time data '2020-01-04 01:38:33 +0000 UTC' does not match format 'yyyy-MM-dd HH:mm:ss +0000 UTC' (match)
You just need to use the proper timestamp format that to_datetime will recognize
df['start_time'] = pd.to_datetime(df['start_time'], format="%Y-%m-%d %H:%M:%S +0000 UTC")
Some notes on this problem:
1. About your error
This gives the error -
You passed a datetime format that to_datetime does not recognize, which causes the error. For the correct directives, check https://strftime.org/. The correct format for this problem would be: "%Y-%m-%d %H:%M:%S %z UTC"
2. Pandas limitation with timezone
Parsing the UTC offset with %z doesn't work on a pd.Series (it only works on index values). So if you use this, it will not work:
df['start_time'] = pd.to_datetime(df.start_time, format="%Y-%m-%d %H:%M:%S %z UTC", utc=True)
The solution is to parse the strings with Python's built-in datetime library:
from datetime import datetime
f = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S %z UTC")
df['start_time'] = pd.to_datetime(df.start_time.apply(f), utc=True)
@fmarm's answer only helps you deal with the date and time data, not the UTC timezone.
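Once both values parse, the time difference the question asks for is a plain subtraction. A small standard-library sketch using the format string from the answers; the end value is invented for illustration, since the question only shows one timestamp.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S %z UTC"

start = datetime.strptime("2020-01-04 01:38:33 +0000 UTC", FMT)
end = datetime.strptime("2020-01-04 02:08:33 +0000 UTC", FMT)  # hypothetical end time

elapsed = end - start            # a datetime.timedelta
print(elapsed)                   # 0:30:00
print(elapsed.total_seconds())   # 1800.0
```

Subtracting two parsed pandas columns works the same way and yields a column of timedeltas.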

format 01-01-16 7:43 string to datetime

I have the following strings that I'd like to convert to datetime objects:
'01-01-16 7:43'
'01-01-16 3:24'
However, when I try to use strptime it always results in a does not match format error.
Pandas to_datetime function nicely handles the automatic conversion, but I'd like to solve it with the datetime library as well.
format_ = '%m-%d-%Y %H:%M'
my_date = datetime.strptime("01-01-16 4:51", format_)
ValueError: time data '01-01-16 4:51' does not match format '%m-%d-%Y %H:%M'
As I see it, your datetime string '01-01-16 7:43'
has a 2-digit year, not a 4-digit year.
To parse a 2-digit year, e.g. '16' rather than '2016', %y is required instead of %Y.
You can do it like this:
from datetime import datetime
datetime_str = '01-01-16 7:43'
datetime_object = datetime.strptime(datetime_str, '%m-%d-%y %H:%M')
print(type(datetime_object))
print(datetime_object)
This prints <class 'datetime.datetime'> followed by 2016-01-01 07:43:00
First of all, if you want to match 2016 you should write %Y while for 16 you should write %y.
That means you should write:
format_ = '%m-%d-%y %H:%M'
Check this link for all format codes.
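The difference between the two directives is easy to check directly. A short sketch using the string from the question:

```python
from datetime import datetime

text = '01-01-16 4:51'

# %y reads a two-digit year
parsed = datetime.strptime(text, '%m-%d-%y %H:%M')
print(parsed)  # 2016-01-01 04:51:00

# %Y expects a four-digit year, so the same string is rejected
try:
    datetime.strptime(text, '%m-%d-%Y %H:%M')
    rejected = False
except ValueError as err:
    rejected = True
    print('ValueError:', err)
```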

Convert weekday name string into datetime

I have the following date (as an object dtype): Tue 31 Jan in a pandas Series,
and I am trying to change it into: 31/01/2019
How can I achieve this? I understand more or less that pandas.to_datetime can convert easily when a string date is clearer (like 6/1/1930 22:00), but not in my case, where there is a weekday name.
Thank you for your help.
Concat the year and call pd.to_datetime with a custom format:
s = pd.Series(['Tue 31 Jan', 'Mon 20 Feb',])
pd.to_datetime(s + ' 2019', format='%a %d %b %Y')
0 2019-01-31
1 2019-02-20
dtype: datetime64[ns]
This is fine as long as all your dates follow this format. If that is not the case, this cannot be solved reliably.
More information on datetime formats at strftime.org.
Another option is using the 3rd party dateutil library:
import dateutil.parser
s.apply(dateutil.parser.parse)
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
This can be installed from PyPI (pip install python-dateutil).
Another, slower but more flexible option is using the 3rd party datefinder library to sniff dates from strings containing random text (if this is what you need):
import datefinder
s.apply(lambda x: next(datefinder.find_dates(x)))
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
You can install it from PyPI (pip install datefinder).
Convert to a datetime object
If you wanted to use the datetime module, you could get the year by doing the following:
import datetime as dt
d = dt.datetime.strptime('Tue 31 Jan', '%a %d %b').replace(year=dt.datetime.now().year)
This is taking the date in your format, but replacing the default year 1900 with the current year in a reliable way.
This is similar to the other answers, but uses the builtin replace method as opposed to concatenating a string.
Output
To get the desired output from your new datetime object, you could perform the following:
>>> d.strftime('%d/%m/%Y')
'31/01/2018'
Here are two alternative ways to achieve the same result.
Method 1: Using datetime module
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan', '%a %d %b')
print(datetime_object) # outputs 1900-01-31 00:00:00
If you had included the year, as in Tue 31 Jan 2018, then this code would work:
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan 2018', '%a %d %b %Y')
print(datetime_object) # outputs 2018-01-31 00:00:00
To print the resulting date in a format like 31/01/2019, you can use:
print(datetime_object.strftime("%d/%m/%Y")) # outputs 31/01/2018
Here are all the possible formatting options available with datetime object.
Method 2: Using dateutil.parser
This method automatically fills in the Year parameter with current year.
from dateutil import parser
string = "Tue 31 Jan"
date = parser.parse(string)
print(date) # outputs 2018-01-31 00:00:00
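None of the approaches above check the weekday name: strptime accepts 'Tue 31 Jan' for any year, whether or not 31 January falls on a Tuesday in that year. If the weekday is meant to identify the year, one option is to try candidate years until it matches. This is a sketch and not part of the original answers; infer_year and its candidate range are hypothetical, and the comparison assumes the default English (C) locale.

```python
from datetime import datetime

def infer_year(text, candidate_years):
    """Return the first date whose weekday matches the name in `text`.

    `text` looks like 'Tue 31 Jan'; returns None if no candidate year fits.
    """
    wanted_weekday = text.split()[0]
    for year in candidate_years:
        try:
            parsed = datetime.strptime(f"{text} {year}", "%a %d %b %Y")
        except ValueError:
            continue  # e.g. 29 Feb in a non-leap year
        # strptime does not validate the weekday, so compare it explicitly
        if parsed.strftime("%a") == wanted_weekday:
            return parsed
    return None

# Search backwards from 2019; 31 Jan fell on a Tuesday in 2017
match = infer_year('Tue 31 Jan', range(2019, 2009, -1))
print(match.strftime('%d/%m/%Y'))  # 31/01/2017
```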

pandas read_csv parse foreign dates

I am trying to use read_csv on a .csv file that contains a date column. The problem is that the date column is in a foreign language (Romanian), with entries like:
'26 septembrie 2017'
'13 iulie 2017'
etc. How can I parse this nicely into a pandas dataframe which has a US date format?
You can pass a converter for that column:
df = pd.read_csv(myfile, converters={'date_column': foreign_date_converter})
But first you have to define the converter to do what you want. This approach uses locale manipulation:
import datetime
import locale
def foreign_date_converter(text):
    # Temporarily sets the locale to "ro_RO" to parse the Romanian date properly
    # (non thread-safe code; the "ro_RO" locale must be installed on the system)
    loc = locale.getlocale(locale.LC_TIME)
    locale.setlocale(locale.LC_TIME, 'ro_RO')
    # %B matches the full month name, e.g. "septembrie"
    date = datetime.datetime.strptime(text, '%d %B %Y').date()
    locale.setlocale(locale.LC_TIME, loc) # restores locale
    return date
Use the dateparser module.
import dateparser
df = pd.read_csv('yourfile.csv', parse_dates=['date'], date_parser=dateparser.parse)
Enter your date column name in the parse_dates parameter. I'm just assuming it is date. Note that pandas 2.0 and later deprecate the date_parser argument.
You may have output like this:
date
0 2017-09-26
1 2017-07-13
If you want to change the format, use strftime:
df['date'] = df.date.dt.strftime(date_format='%d %B %Y')
output:
date
0 26 September 2017
1 13 July 2017
The easiest solution would be to simply call str.replace(old, new) 12 times.
It is not pretty, but you only have to build the function once:
def translator(date_string_with_exactly_one_date):
    date_str = date_string_with_exactly_one_date
    date_str = date_str.replace("iulie", "july")
    date_str = date_str.replace("septembrie", "september")
    #do this 10 more times with the right translation
    return date_str
Now you just have to call it for every entry. After that you can handle it like a US date string. This is not very efficient but it will get the job done and you do not have to search for special libraries.
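The same idea can be written as a lookup table instead of a chain of replacements. A sketch, assuming every entry has the form 'day month year' with the month spelled out in full; the RO_MONTHS table and the parse_romanian_date name are additions for illustration, not taken from the answers above.

```python
from datetime import date

# Romanian month names mapped to month numbers (added for this sketch)
RO_MONTHS = {
    "ianuarie": 1, "februarie": 2, "martie": 3, "aprilie": 4,
    "mai": 5, "iunie": 6, "iulie": 7, "august": 8,
    "septembrie": 9, "octombrie": 10, "noiembrie": 11, "decembrie": 12,
}

def parse_romanian_date(text):
    """Parse a string like '26 septembrie 2017' into a datetime.date."""
    day, month_name, year = text.split()
    return date(int(year), RO_MONTHS[month_name.lower()], int(day))

# US-style month/day/year output
print(parse_romanian_date("26 septembrie 2017").strftime("%m/%d/%Y"))  # 09/26/2017
print(parse_romanian_date("13 iulie 2017").strftime("%m/%d/%Y"))       # 07/13/2017
```

A function like this could also be passed to read_csv through the converters argument, and it needs neither a system locale nor a third-party package.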
