This might be simple, but I had no luck finding the right solution.
I have a 'date' column in a NumPy array with dates in the format 'Tue Feb 04 17:04:01 +0000 2020', which I would like to convert to '2020-02-04 17:04:01'.
Is there a built-in NumPy method that does that?
Some solutions suggest looping through the elements in the column, but I guess that's not the NumPythonic way.
You could try dateutil to parse the dates:
from dateutil import parser
date_str = 'Tue Feb 04 17:04:01 +0000 2020'
# %H:%M:%S is portable; the %T shorthand is not among Python's documented format codes
new_date = parser.parse(date_str).strftime('%Y-%m-%d %H:%M:%S')
With NumPy, you can then convert the resulting string to a datetime64:
import numpy as np
np.datetime64(new_date)
# Example
date_str = 'Tue Feb 04 17:04:01 +0000 2020'
date_str2 = 'Fri Feb 07 17:04:01 +0000 2020'
new_date = parser.parse(date_str).strftime('%Y-%m-%d %H:%M:%S')
new_date2 = parser.parse(date_str2).strftime('%Y-%m-%d %H:%M:%S')
# Build an array from the converted dates (np.arange would instead generate every second between them)
np.array([np.datetime64(new_date), np.datetime64(new_date2)])
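If you'd rather avoid the dateutil dependency, the standard library can do the same conversion once the layout is spelled out as a format string. This is a minimal sketch, assuming every value follows the 'Tue Feb 04 17:04:01 +0000 2020' layout and an English (C) locale for the weekday and month names; `to_iso_strings` is a hypothetical helper name. Note that it still loops over the elements in Python, since NumPy itself has no parser for this format.

```python
from datetime import datetime

# Layout of 'Tue Feb 04 17:04:01 +0000 2020' (assumes English/C locale names):
# %a weekday, %b month name, %d day, %H:%M:%S time, %z UTC offset, %Y year
SRC_FORMAT = '%a %b %d %H:%M:%S %z %Y'
DST_FORMAT = '%Y-%m-%d %H:%M:%S'

def to_iso_strings(dates):
    """Hypothetical helper: parse each string, then reformat it (the offset is not printed)."""
    return [datetime.strptime(d, SRC_FORMAT).strftime(DST_FORMAT) for d in dates]

converted = to_iso_strings(['Tue Feb 04 17:04:01 +0000 2020',
                            'Fri Feb 07 17:04:01 +0000 2020'])
print(converted)  # ['2020-02-04 17:04:01', '2020-02-07 17:04:01']
```

The resulting strings use a space between date and time, the same form the answer above passes to np.datetime64, so they can be fed to NumPy if you need datetime64 values.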
Related
I have a date column in a df with values like Fri Apr 01 16:41:32 +0000 2022. I want to convert it into the date format 01/04/2022 16:41:32, where 01 is the day and 04 is the month.
Any guidance please?
You can use pandas.to_datetime to get datetimes, then convert them to the desired format with Series.dt.strftime.
import pandas as pd
# example df
df = pd.DataFrame({'date': ['Fri Apr 01 16:41:32 +0000 2022' ,
'Sat Apr 02 16:41:32 +0000 2022']})
df['date'] = pd.to_datetime(df['date']).dt.strftime('%d/%m/%Y %H:%M:%S')
print(df)
date
0 01/04/2022 16:41:32
1 02/04/2022 16:41:32
You can use this to get a datetime object:
from dateutil import parser
date=parser.parse("Fri Apr 01 16:41:32 +0000 2022")
If you want a specific string format, you can then use strftime().
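For example, to produce the day-first string asked for in this question, here is a standard-library sketch. It swaps dateutil's parser for datetime.strptime with an explicit format, which assumes every value has exactly this layout and English (C) locale month and weekday names.

```python
from datetime import datetime

raw = 'Fri Apr 01 16:41:32 +0000 2022'

# Explicit format instead of letting a parser guess the layout (assumes English/C locale)
parsed = datetime.strptime(raw, '%a %b %d %H:%M:%S %z %Y')

# %d/%m puts the day first, so 01 is the day and 04 is the month
formatted = parsed.strftime('%d/%m/%Y %H:%M:%S')
print(formatted)  # 01/04/2022 16:41:32
```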
First, create a dictionary that maps month names to month numbers; for example, the key "apr" maps to the value 04.
Then write a function that uses a regex to extract the month name, year, time, and day, apply it to all rows with the apply method, and store the output in a new column as a tuple.
Now you can use the apply method again to build the custom column as
datetime.datetime(year=..., month=..., ...)
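The dictionary-and-regex steps described above could look roughly like this. It is a minimal sketch, assuming strings shaped exactly like 'Fri Apr 01 16:41:32 +0000 2022'; the names MONTHS, DATE_RE, parse_parts and build_datetime are hypothetical, and the +0000 offset is matched but ignored. It runs on a plain list; with pandas you would pass the same functions to Series.apply.

```python
import datetime
import re

# Month abbreviation -> month number (keys lower-cased, e.g. "apr" -> 4)
MONTHS = {name: number for number, name in enumerate(
    ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
     'jul', 'aug', 'sep', 'oct', 'nov', 'dec'], start=1)}

# Captures: month name, day, hh, mm, ss, year; the weekday and UTC offset are skipped
DATE_RE = re.compile(r'^\w{3} (\w{3}) (\d{2}) (\d{2}):(\d{2}):(\d{2}) [+-]\d{4} (\d{4})$')

def parse_parts(text):
    """Step 1: extract the fields as a tuple (year, month, day, hour, minute, second)."""
    month, day, hour, minute, second, year = DATE_RE.match(text).groups()
    return (int(year), MONTHS[month.lower()], int(day),
            int(hour), int(minute), int(second))

def build_datetime(parts):
    """Step 2: turn the tuple into a datetime.datetime."""
    year, month, day, hour, minute, second = parts
    return datetime.datetime(year=year, month=month, day=day,
                             hour=hour, minute=minute, second=second)

parts = [parse_parts(s) for s in ['Fri Apr 01 16:41:32 +0000 2022',
                                  'Sat Apr 02 16:41:32 +0000 2022']]
dates = [build_datetime(p) for p in parts]
print(dates[0].strftime('%d/%m/%Y %H:%M:%S'))  # 01/04/2022 16:41:32
```

A string that does not match the pattern makes DATE_RE.match return None, so parse_parts raises AttributeError; in practice the explicit-format strptime or pd.to_datetime approaches are shorter.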
Consider this simple example
import pandas as pd
mydf = pd.DataFrame({'timestamp': ['Tue, 27 Jul 2021 06:43:18 +0000',
'Sun, 20 Jun 2021 17:00:17 GMT',
'Wed, 28 Jul 2021 08:44:00 -0400']})
mydf
Out[50]:
timestamp
0 Tue, 27 Jul 2021 06:43:18 +0000
1 Sun, 20 Jun 2021 17:00:17 GMT
2 Wed, 28 Jul 2021 08:44:00 -0400
I am trying to convert all the timestamps to GMT and get rid of the timezone offset.
Unfortunately, the usual solution does not work
pd.to_datetime(mydf['timestamp']).dt.tz_localize(None)
AttributeError: Can only use .dt accessor with datetimelike values
What is the issue?
Thanks!
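The problem is that your strings mix different time zones (+0000, GMT and -0400), so pd.to_datetime cannot convert mydf['timestamp'] into a single datetime column; you end up with a plain object column instead, and accessing .dt on it raises the error.
You can pass utc=True to convert every value to UTC, which gives a time-zone-aware datetime column that .dt.tz_localize(None) can then make naive: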
pd.to_datetime(mydf['timestamp'], utc=True).dt.tz_localize(None)
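For comparison, here is a pure-Python sketch of the same normalization (convert to UTC, then drop the offset). It swaps in the standard library's RFC 2822 parser, email.utils.parsedate_to_datetime, which understands both numeric offsets and 'GMT'. It assumes every value carries an offset or a named zone: a value without one (or with '-0000') comes back naive, and astimezone would then treat it as local time. The helper name to_naive_utc is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

timestamps = ['Tue, 27 Jul 2021 06:43:18 +0000',
              'Sun, 20 Jun 2021 17:00:17 GMT',
              'Wed, 28 Jul 2021 08:44:00 -0400']

def to_naive_utc(text):
    """Hypothetical helper: parse an RFC 2822 date, convert it to UTC, then drop the tzinfo."""
    aware = parsedate_to_datetime(text)  # assumes the value has an offset or 'GMT'
    return aware.astimezone(timezone.utc).replace(tzinfo=None)

for ts in timestamps:
    print(to_naive_utc(ts))
# 2021-07-27 06:43:18
# 2021-06-20 17:00:17
# 2021-07-28 12:44:00
```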
Is there a way to convert this (string) date format:
Wed Feb 24 18:04:49 SGT 2021
to datetime64[ns]:
2021-02-24
I tried the code below using pandas, but it does not work:
data = {'UpdateTime': [
'Thu May 28 01:24:38 SGT 2020',
'Wed Feb 24 18:04:49 SGT 2021',
'Mon Mar 01 20:34:49 SGT 2021',
'Fri Sep 18 21:29:35 SGT 2020',
'Tue Feb 09 14:21:56 SGT 2021',
'Thu Jan 01 07:30:00 SGT 1970',
]}
df = pd.DataFrame(data)
df['UpdateTime']=pd.to_datetime(df['UpdateTime'].str.split(' ',1).str[0])
and got this error:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-04 00:00:00
I'm pretty sure this is a regex issue, and I'm not familiar with regex. Please help.
Thanks!
I think a regex is not necessary here; you can specify the format in to_datetime:
df['UpdateTime']=pd.to_datetime(df['UpdateTime'], format='%a %b %d %H:%M:%S SGT %Y')
print (df)
UpdateTime
0 2020-05-28 01:24:38
1 2021-02-24 18:04:49
2 2021-03-01 20:34:49
3 2020-09-18 21:29:35
4 2021-02-09 14:21:56
5 1970-01-01 07:30:00
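The original attempt fails because .str.split(' ', 1).str[0] keeps only the first token (the weekday, e.g. 'Wed'), which is not a full date. If you only need the date part (2021-02-24), here is a standard-library sketch of the same explicit format. It assumes every value contains the literal text SGT (so times stay in Singapore local time, not UTC) and English (C) locale names.

```python
from datetime import datetime

update_times = ['Thu May 28 01:24:38 SGT 2020',
                'Wed Feb 24 18:04:49 SGT 2021',
                'Thu Jan 01 07:30:00 SGT 1970']

# 'SGT' is matched as literal text, so no time zone conversion happens (assumes English/C locale)
FORMAT = '%a %b %d %H:%M:%S SGT %Y'

dates = [datetime.strptime(s, FORMAT).date() for s in update_times]
print([d.isoformat() for d in dates])  # ['2020-05-28', '2021-02-24', '1970-01-01']
```

In pandas, calling .dt.normalize() on the parsed column sets the time to midnight while keeping the datetime64[ns] dtype, whereas .dt.date gives Python date objects in an object column.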
I have a list of date strings formatted like
Fri Apr 23 12:38:07 +0000 2021
How can I change their format? I want to take only the hours. I checked other sources, but they all require parsing the date format first, which is what I'm struggling with right now.
As far as I know, you can write code like
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%A %b %d %H:%M:%S %z %Y')
to change its format. But I don't know what +0000 means.
If you only want to take the hours from the date strings, you can use .dt.strftime() after the pd.to_datetime() call, as follows:
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
Note that your format string for pd.to_datetime() is not correct: you need to replace %A (full weekday name) with %a (abbreviated weekday name).
+0000 is the time zone offset from UTC, which you can parse with %z in the format string.
Demo
ds = pd.DataFrame({'tanggal': ['Fri Apr 23 12:38:07 +0000 2021', 'Thu Apr 22 11:28:17 +0000 2021']})
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
print(ds)
tanggal waktu
0 Fri Apr 23 12:38:07 +0000 2021 12:38:07
1 Thu Apr 22 11:28:17 +0000 2021 11:28:17
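If "only the hours" means the hour number rather than an HH:MM:SS string, the parsed datetime exposes it directly. A standard-library sketch with the same corrected format string, assuming English (C) locale names; it also shows why %A fails on an abbreviation like 'Fri'.

```python
from datetime import datetime

tanggal = ['Fri Apr 23 12:38:07 +0000 2021', 'Thu Apr 22 11:28:17 +0000 2021']

# %a matches abbreviated weekday names such as 'Fri' (assumes English/C locale)
parsed = [datetime.strptime(s, '%a %b %d %H:%M:%S %z %Y') for s in tanggal]

hours = [p.hour for p in parsed]                  # hour as an integer
times = [p.strftime('%H:%M:%S') for p in parsed]  # same strings as the pandas demo
print(hours, times)  # [12, 11] ['12:38:07', '11:28:17']

# %A expects a full weekday name such as 'Friday', so 'Fri' does not match
percent_a_upper_failed = False
try:
    datetime.strptime(tanggal[0], '%A %b %d %H:%M:%S %z %Y')
except ValueError:
    percent_a_upper_failed = True
```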
I have datetime data in this format:
08:15:54:012 12 03 2016 +0000 GMT+00:00
I need to extract only the date, that is 12 03 2016, in Python.
I have tried
datetime_object=datetime.strptime('08:15:54:012 12 03 2016 +0000 GMT+00:00','%H:%M:%S:%f %d %m %Y')
but I get this error:
ValueError: unconverted data remains: +0000 GMT+00:00
If you don't mind using an external library, I find the dateparser module much more intuitive than Python's built-in datetime. It can parse pretty much anything if you just do
>>> import dateparser
>>> dateparser.parse('08:15:54:012 12 03 2016 +0000 GMT+00:00')
It claims it can handle time zone offsets, though I haven't tested it.
If you need this as a string, use slicing:
text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
print(text[13:23])
# 12 03 2016
But you can also convert it to a datetime:
from datetime import datetime
text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
datetime_object = datetime.strptime(text[13:23],'%d %m %Y')
print(datetime_object)
# 2016-03-12 00:00:00
BTW:
In your original version, you have to remove +0000 GMT+00:00 using the slice [:-16]:
datetime.strptime('08:15:54:012 12 03 2016 +0000 GMT+00:00'[:-16], '%H:%M:%S:%f %d %m %Y')
You can also use split() and join():
>>> x = '08:15:54:012 12 03 2016 +0000 GMT+00:00'.split()
>>> x
['08:15:54:012', '12', '03', '2016', '+0000', 'GMT+00:00']
>>> x[1:4]
['12', '03', '2016']
>>> ' '.join(x[1:4])
'12 03 2016'
You can do it like this:
from datetime import datetime
d = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
d = d[:23]  # Remove the time zone details
d = datetime.strptime(d, "%H:%M:%S:%f %d %m %Y")  # parse the string (day before month, as in the question)
d.strftime('%d %m %Y')  # format the string
You get:
'12 03 2016'
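If you would rather keep the +0000 offset than slice it away, you can split on whitespace instead of relying on fixed character positions and let %z parse the numeric offset, dropping only the redundant 'GMT+00:00' token. This is a sketch assuming the layout above, with the day before the month as in the question's own format string.

```python
from datetime import datetime

text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'

# Tokens: ['08:15:54:012', '12', '03', '2016', '+0000', 'GMT+00:00']
tokens = text.split()

# Keep the numeric '+0000' offset (parsed by %z); drop only the trailing 'GMT+00:00'
parsed = datetime.strptime(' '.join(tokens[:5]), '%H:%M:%S:%f %d %m %Y %z')

print(parsed)                              # 2016-03-12 08:15:54.012000+00:00
print(parsed.date())                       # 2016-03-12
print(parsed.date().strftime('%d %m %Y'))  # 12 03 2016
```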