I have two columns, Date_x and Date_y, and I would like to compare them (i.e. Date_x + 1 hour < Date_y).
The strings look like this: "2020-01-29 11:31:32.754292 UTC"
I have tried converting it using datetime:
from datetime import datetime as dt
df["Date_x"] = [dt.strptime(x, '%Y-%m-%d %H:%M:%S.%f') for x in df['Date_x']]
However, it throws an error regarding the UTC part. I tried removing it, to no avail.
Last traceback:
time data '2020-01-29 18:30:28' does not match format '%Y-%m-%d %H:%M:%S.%f'
How would you go about converting the string to hh:mm:ss only?
You could use an if statement:
df["Date_x"] = [dt.strptime(x, '%Y-%m-%d %H:%M:%S.%f') if '.' in x else dt.strptime(x, '%Y-%m-%d %H:%M:%S') for x in df['Date_x']]
But why not just pd.to_datetime:
df["Date_x"] = pd.to_datetime(df["Date_x"], infer_datetime_format=True)
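For the comparison the question actually asks about (Date_x + 1 hour < Date_y), a minimal standard-library sketch might look like the following. It assumes every value ends in a literal " UTC" suffix, and the helper name parse_utc is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def parse_utc(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.ffffff] UTC' (hypothetical helper)."""
    # Assumption: the only zone marker that ever appears is a trailing " UTC".
    if text.endswith(" UTC"):
        text = text[:-len(" UTC")]
    # Pick the format depending on whether fractional seconds are present.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)

date_x = parse_utc("2020-01-29 11:31:32.754292 UTC")
date_y = parse_utc("2020-01-29 18:30:28 UTC")

# True when date_y is more than one hour after date_x
is_later = date_x + timedelta(hours=1) < date_y
print(is_later)
```

The same helper could be applied to each column in turn before comparing the two element by element.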
How can I convert the string 2021-09-30_1 to the datetime 2021/09/30 00:00? That is, we have to subtract one from the trailing number to get the hour.
I tried datetime.strptime(date, '%Y %d %Y %I')
datetime.strptime parses a timestamp from a string; the format you pass must match the layout of that string. datetime.strftime (note the f) generates a string from a datetime object.
You can use:
datetime.strptime(date, '%Y-%m-%d_%H').strftime('%Y/%m/%d %H:%M')
output: '2021/09/30 01:00'
In case the _x defines a delta:
from datetime import datetime, timedelta
d, h = date.split('_')
d = datetime.strptime(d, '%Y-%m-%d')
h = timedelta(hours=int(h))
(d-h).strftime('%Y/%m/%d %H:%M')
output: '2021/09/29 23:00'
Assuming the _1 is the hour and appears in all of your data (the hour part takes values in [1, 24]), your format was wrong.
To read the date from the string, you'll need to split it and parse each part correctly:
from datetime import datetime, timedelta
date = "2021-09-30_1"
date_part, hour_part = date.split("_")
date_object = datetime.strptime(date_part, '%Y-%m-%d') + timedelta(hours=int(hour_part) - 1)
Now you have the date object. And you can display it as:
print(date_object.strftime('%Y/%m/%d %H:%M'))
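The steps above can be wrapped into one function. This is only a sketch: the name convert_hour_ending is hypothetical, and the range check assumes the hour part really is limited to 1 through 24, as stated above.

```python
from datetime import datetime, timedelta

def convert_hour_ending(raw):
    """Hypothetical helper: '2021-09-30_1' -> '2021/09/30 00:00'."""
    date_part, hour_part = raw.split("_")
    hour = int(hour_part)
    # Assumption: valid hour labels run from 1 to 24.
    if not 1 <= hour <= 24:
        raise ValueError(f"hour out of range in {raw!r}")
    # Hour label N maps to the clock hour N - 1.
    start = datetime.strptime(date_part, "%Y-%m-%d") + timedelta(hours=hour - 1)
    return start.strftime("%Y/%m/%d %H:%M")

print(convert_hour_ending("2021-09-30_1"))   # 2021/09/30 00:00
print(convert_hour_ending("2021-09-30_24"))  # 2021/09/30 23:00
```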
from datetime import datetime
raw_date = "2021-09-30_1"
date = raw_date.split("_")[0]
parsed_date = datetime.strptime(date, '%Y-%m-%d')
formated_date = parsed_date.strftime('%Y/%m/%d %H:%M')
strptime is used for parsing strings and strftime for formatting.
Also for date representation you should provide format codes for hours and minutes as in:
formated_date = parsed_date.strftime('%Y/%m/%d %H:%M')
I want to extract the date (e.g. 2018-07-16) from strings (e.g. 2018-07-16 10:17:53.460035).
The strings have two formats: "2018-07-16 10:17:53.460035" and "2018-05-20 14:37:21".
When I use strptime(d, "%Y-%m-%d %H:%M:%S.%f") to convert the strings before extracting the date, it pops this error:
ValueError: time data '2018-05-20 14:37:21' does not match format '%Y-%m-%d %H:%M:%S.%f'
How can I convert both time formats to DateTime type and extract date from it?
Use to_datetime from pandas.
import pandas as pd
a = "2018-07-16 10:17:53.460035"
b = "2018-05-20 14:37:21"
print(pd.to_datetime(a).date())
print(pd.to_datetime(b).date())
You don't need the .%f at the end for the string without microseconds; that is what is causing the format error.
import datetime
t = "2018-05-20 14:37:21"
datetime.datetime.strptime(t, "%Y-%m-%d %H:%M:%S")
You need a second format for the other time string:
t = "2018-07-16 10:17:53.460035"
datetime.datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f")
Edit: Here is another example which accepts both:
import datetime
time_stamps = ["2018-05-20 14:37:21", "2018-07-16 10:17:53.460035"]
for stamp in time_stamps:
    fmt = "%Y-%m-%d %H:%M:%S"
    try:
        # try the format with microseconds first
        time = datetime.datetime.strptime(stamp, fmt + ".%f")
    except ValueError:
        # fall back to whole seconds
        time = datetime.datetime.strptime(stamp, fmt)
    print(time)
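If only the standard library is available, another option worth considering is datetime.fromisoformat (Python 3.7+), which accepts both of the layouts in the question. A small sketch, using the two sample strings from above; note that before Python 3.11 the fractional part has to be exactly 3 or 6 digits.

```python
from datetime import datetime

time_stamps = ["2018-05-20 14:37:21", "2018-07-16 10:17:53.460035"]

# fromisoformat handles the optional fractional seconds on its own,
# and .date() drops the time of day.
dates = [datetime.fromisoformat(stamp).date() for stamp in time_stamps]
print(dates)
```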
Right now I have:
timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
This works great unless I'm converting a string that doesn't have the microseconds. How can I specify that the microseconds are optional (and should be considered 0 if they aren't in the string)?
You could use a try/except block:
try:
    timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
except ValueError:
    timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
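If more than two layouts turn up, the same try/except idea can be generalized to a loop over candidate formats. The following is a sketch only; parse_flexible and the FORMATS tuple are illustrative names, not part of the datetime module.

```python
from datetime import datetime

# Candidate formats, most specific first (an illustrative list).
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def parse_flexible(date_string, formats=FORMATS):
    """Return the first successful strptime result (hypothetical helper)."""
    for fmt in formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no known format matches {date_string!r}")

print(parse_flexible("2018-05-20 14:37:21"))
print(parse_flexible("2018-07-16 10:17:53.460035"))
```

When the microseconds are missing, strptime leaves them at 0, which is the behaviour the question asks for.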
What about just appending it if it doesn't exist?
if '.' not in date_string:
    date_string = date_string + '.0'
timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
I'm late to the party but I found if you don't care about the optional bits this will lop off the .%f for you.
date_string.split('.')[0]
I prefer using regex matches instead of try and except. This allows for many fallbacks of acceptable formats.
import re
from datetime import datetime

def parse_timestamp(date_string):  # function name is illustrative
    # full timestamp with fractional seconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%S.%fZ")
    # timestamp missing fractional seconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%SZ")
    # timestamp missing fractional seconds & seconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%MZ")
    # unknown timestamp format
    return False
Don't forget to import "re" as well as "datetime" for this method.
datetime(*map(int, re.findall(r'\d+', date_string)))
can parse both '%Y-%m-%d %H:%M:%S.%f' and '%Y-%m-%d %H:%M:%S'. It is too permissive if your input is not filtered.
It is quick-and-dirty but sometimes strptime() is too slow. It can be used if you know that the input has the expected date format.
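One thing to watch with the one-liner is that the fractional digits are passed straight through as a microsecond count, so a string ending in ".5" yields 5 microseconds rather than half a second. A slightly stricter sketch is shown below; the name fast_parse and the exact pattern are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed layout: 'YYYY-MM-DD HH:MM:SS' with an optional '.f' to '.ffffff' tail.
_PATTERN = re.compile(
    r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,6}))?"
)

def fast_parse(date_string):
    """Regex-based parse that rejects unexpected input (hypothetical helper)."""
    match = _PATTERN.fullmatch(date_string)
    if match is None:
        raise ValueError(f"unexpected timestamp: {date_string!r}")
    *parts, fraction = match.groups()
    # Right-pad the fraction so '.5' becomes 500000 microseconds.
    micro = int(fraction.ljust(6, "0")) if fraction else 0
    return datetime(*map(int, parts), micro)

print(fast_parse("2018-07-16 10:17:53.5"))
print(fast_parse("2018-05-20 14:37:21"))
```

Using fullmatch keeps the speed advantage of skipping strptime while refusing input that does not have the expected shape.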
If you are using Pandas you can also filter the Series by format, parse each part separately, and concatenate the results. The index is automatically joined.
import pandas as pd
# Every other row has a different format
df = pd.DataFrame({"datetime_string": ["21-06-08 14:36:09", "21-06-08 14:36:09.50", "21-06-08 14:36:10", "21-06-08 14:36:10.50"]})
df["datetime"] = pd.concat([
    pd.to_datetime(df["datetime_string"].iloc[1::2], format="%y-%m-%d %H:%M:%S.%f"),
    pd.to_datetime(df["datetime_string"].iloc[::2], format="%y-%m-%d %H:%M:%S"),
])
   datetime_string        datetime
0  21-06-08 14:36:09      2021-06-08 14:36:09
1  21-06-08 14:36:09.50   2021-06-08 14:36:09.500000
2  21-06-08 14:36:10      2021-06-08 14:36:10
3  21-06-08 14:36:10.50   2021-06-08 14:36:10.500000
Using one regular expression and a list comprehension:
time_str = "12:34.567"
# time format is [HH:]MM:SS[.FFF]
sum([a*b for a,b in zip(map(lambda x: int(x) if x else 0, re.match(r"(?:(\d{2}):)?(\d{2}):(\d{2})(?:\.(\d{3}))?", time_str).groups()), [3600, 60, 1, 1/1000])])
# result = 754.567
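For readability, the same calculation can be spelled out as a small function. This is a sketch under the same [HH:]MM:SS[.FFF] assumption; to_seconds is a hypothetical name, and it uses fullmatch, so it is stricter than the one-liner about trailing characters.

```python
import re

# Same pattern as the one-liner: optional hours, optional milliseconds.
TIME_RE = re.compile(r"(?:(\d{2}):)?(\d{2}):(\d{2})(?:\.(\d{3}))?")

def to_seconds(time_str):
    """Convert '[HH:]MM:SS[.FFF]' to a number of seconds (hypothetical helper)."""
    match = TIME_RE.fullmatch(time_str)
    if match is None:
        raise ValueError(f"unexpected time string: {time_str!r}")
    # Missing optional groups come back as None and count as 0.
    hours, minutes, seconds, millis = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000

print(to_seconds("12:34.567"))
print(to_seconds("01:02:03"))
```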
For a similar problem in jq, I used the following:
|split("Z")[0]|split(".")[0]|strptime("%Y-%m-%dT%H:%M:%S")|mktime
This let me sort my list by time properly.
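For comparison, a rough Python counterpart of that jq pipeline could look like this. It is a sketch that assumes the input is a "Z"-suffixed UTC timestamp and that the resulting epoch seconds should be computed in UTC; to_epoch is a made-up name.

```python
import calendar
import time

def to_epoch(stamp):
    """Hypothetical helper mirroring the split/strptime/mktime steps."""
    # Drop the trailing 'Z' and any fractional seconds.
    trimmed = stamp.split("Z")[0].split(".")[0]
    parsed = time.strptime(trimmed, "%Y-%m-%dT%H:%M:%S")
    # timegm interprets the parsed fields as UTC.
    return calendar.timegm(parsed)

stamps = ["2021-06-08T14:36:10Z", "2021-06-08T14:36:09.500Z"]
print(sorted(stamps, key=to_epoch))
```

As in the jq version, the fractional part is discarded, so two timestamps within the same second compare as equal.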
I have a df column with values like 2018-07-25 19:23:17.000000
and I cannot find the correct way to convert this string into a datetime value.
I've been trying with the following code
dfa['time_event_utc'] = pd.to_datetime(df['time_event_utc'],format='%d%b%Y:%H:%M:%S +000000',utc=True)
Your format is '%Y-%m-%d %H:%M:%S.%f':
import datetime
mydt = '2018-07-25 19:23:17.000000'
datetime.datetime.strptime(mydt, '%Y-%m-%d %H:%M:%S.%f')
To convert a date string to a date, dropping the '00:00:00', I use:
import datetime
strDate = '2017-04-17 00:00:00'
datetime.datetime.strptime(strDate, '%Y/%m/%d %H:%M:%S').strftime('%Y-%m-%d')
This returns:
ValueError: time data '2017-04-17 00:00:00' does not match format '%Y/%m/%d %H:%M:%S'
Is %H:%M:%S not the correct format?
This is the correct way:
datetime.datetime.strptime(strDate, '%Y-%m-%d %H:%M:%S').strftime('%Y-%m-%d')
Notice the - instead of / in strptime. The date is converted to: 2017-04-17.
If you would like to have it displayed a different way, have a look here.
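As a rough illustration of other display options, the sketch below uses only standard strftime codes on the same strDate value. The variable names are illustrative, and which layout is wanted is an assumption.

```python
import datetime

strDate = '2017-04-17 00:00:00'
parsed = datetime.datetime.strptime(strDate, '%Y-%m-%d %H:%M:%S')

# A date object rather than a string, if the value is needed for further work
as_date = parsed.date()

# Day/month/year with slashes
as_slashes = parsed.strftime('%d/%m/%Y')

# Full month name (the exact text depends on the active locale)
as_words = parsed.strftime('%d %B %Y')

print(as_date, as_slashes, as_words)
```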