Convert a datetime column to a timestamp (in a CSV file)
(yyyy-mm-dd h:mm:ss (current column format) --(convert)--> 143141231 (nanoseconds))
Problem
-> the object does not match format '%Y-%m-%d %H:%M:%S'
CSV file
e.g.
time
2019-01-01 0:00:00
2019-01-01 8:50:00
2019-01-01 8:50:00
2019-01-01 8:51:00
2019-01-01 8:51:00
2019-01-01 8:51:00
code
import time
import datetime
import pandas as pd
df = pd.read_csv("data/test.csv")
df.head()
mydate =str(df["time"])
timestamp = time.mktime(datetime.datetime.strptime(mydate, "%Y-%m-%d %H:%M:%S").timetuple())
df["time"] = [str(df[timestamp()][t]) for t in range(len(df))]
df.head()
ns_precision = df
ns_precision.to_csv("data/test_convert.csv", index=False)
expect
time
1131314141
1412313131
1312414123
1231225244
2421412141
result
ValueError: time data '0
2019-01-01 0:00:00\n1 ... 2019-01-30 20:15:00\n
Name: time, Length: 77876,
dtype: object' does not match format '%Y-%m-%d %H:%M:%S'
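The error message shows the cause: str(df["time"]) turns the whole Series (index, values, and the trailing Name/dtype lines) into one string, and strptime cannot match that against a single timestamp format. A minimal sketch of a column-wise alternative follows. It assumes the times are UTC, and it reads the sample rows from an in-memory string rather than data/test.csv so that it runs on its own.

```python
import io
import pandas as pd

# Stand-in for data/test.csv so the sketch is self-contained
csv_text = "time\n2019-01-01 0:00:00\n2019-01-01 8:50:00\n2019-01-01 8:51:00\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse the whole column at once instead of calling strptime on str(Series)
parsed = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S")

# Assumption: the naive times are UTC. Whole seconds since 1970-01-01:
epoch = pd.Timestamp("1970-01-01")
df["time"] = (parsed - epoch) // pd.Timedelta(seconds=1)

print(df)
```

Dividing by pd.Timedelta(nanoseconds=1) instead should give nanoseconds; the result can then be written out with to_csv as in the question.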
Related
I'm looking to convert a UNIX timestamp object to pandas date time. I'm importing the timestamps from a separate source, which displays a date time of 21-01-22 00:01 for the first timepoint and 21-01-22 00:15 for the second time point. Yet my conversion is 10 hours behind these two. Is this related to the +1000 at the end of each string?
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/'],
})
df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')
Out:
Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00
Other source:
Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00
You could use a regex to extract Unix time and UTC offset, then parse Unix time to datetime and add the UTC offset as a timedelta, e.g.
import pandas as pd
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
})
df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
# datetime from unix first, leaves NaT for invalid values
df['datetime'] = pd.to_datetime(df['unix'], unit='ms')
# where datetime is not NaT, add the offset:
df.loc[~df['datetime'].isnull(), 'datetime'] += (
pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
)
# or without the apply, but by using an underscored method:
# df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') +
# pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
df['datetime']
# 0 2022-01-21 00:01:00
# 1 2022-01-21 00:15:00
# 2 NaT
# Name: datetime, dtype: datetime64[ns]
Unfortunately, you'll have to use an underscored ("private") method, if you want to avoid the apply. This also only works if you have a constant offset, i.e. if it's the same offset throughout the whole series.
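If you would rather avoid both the apply and the private attribute, one possible alternative is to split the offset into sign, hours and minutes and build a timedelta from them. This is only a sketch: it assumes the offset always has the form ±HHMM, and the column names unix, sign, hh and mm are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["/Date(1642687260000+1000)/", "/Date(1642688100000+1000)/", None],
})

# Named groups become columns; rows that do not match are all-NaN
parts = df["Time"].str.extract(r"(?P<unix>\d+)(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})")
valid = parts.dropna()

# Offset in minutes, signed, computed with integers only
sign = valid["sign"].map({"+": 1, "-": -1})
minutes = sign * (valid["hh"].astype("int64") * 60 + valid["mm"].astype("int64"))

# Rows missing from `valid` are filled with NaT when assigned back by index
df["datetime"] = (
    pd.to_datetime(valid["unix"].astype("int64"), unit="ms")
    + pd.to_timedelta(minutes, unit="m")
)
print(df["datetime"])
```

Because the offset is computed per row, this variant does not rely on the offset being the same throughout the series.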
I am struggling with datetime format... This is my dataframe in pandas:
Datetime Date Field
2020-01-12 00:00:00 2020-12-01 6.543916
2020-01-12 00:10:00 2020-12-01 6.505547
2020-01-12 00:20:00 2020-12-01 7.047578
2020-01-12 00:30:00 2020-12-01 6.070998
2020-01-12 00:40:00 2020-12-01 6.452112
df.dtypes
Datetime object
Date datetime64[ns]
Field float64
I need to convert Datetime to datetime64 and swap months with days to get values in the format %Y-%m-%d %H:%M:%S, e.g. 2020-12-01 00:00:00.
import pandas as pd
from datetime import datetime
df["Datetime"] = pd.to_datetime(df["Datetime"])
df["Datetime"] = df["Datetime"].apply(lambda x: datetime.strftime(x, "%Y-%m-%d %H:%M:%S"))
Still I get the same dataframe as shown above...
Consider passing the "errors" parameter:
df["Datetime"] = pd.to_datetime(df["Datetime"], errors='coerce')
See if it helps you!
I think you'll get what you want with "%Y-%d-%m %H:%M:%S" instead of "%Y-%m-%d %H:%M:%S" on your last line.
EDIT: Or even better, simply replace the last 2 lines of your code with the following:
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y-%d-%m %H:%M:%S")
That way you won't get a "ParserError: month must be in 1..12" from pd.to_datetime when your Datetime column contains something like "2020-30-12 00:00:00".
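A short sketch of that suggestion, using two of the sample values from the question (the small DataFrame here is rebuilt by hand for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Datetime": ["2020-01-12 00:00:00", "2020-01-12 00:10:00"]})

# The strings are stored as year-day-month, so say so in the format
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y-%d-%m %H:%M:%S")

print(df["Datetime"])
```

Once the column is datetime64, it displays as year-month-day regardless of how the input strings were laid out, so no strftime step is needed.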
I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss using:
d7['date'] = pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f')
but I am getting error AttributeError: 'str' object has no attribute 'strftime'
I have tried implementing this Remove dtype datetime NaT but having a hard time combining it with my formatting above.
In my date column I'll get dates like 2028-01-31 00:00:00.000000 or NaT; I want to make the NaT values blank or None instead.
Sample dataframe:
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-04 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
NaT
Thank you.
"I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss", therefore 20180101 is an example of the initial date format.
Remove dtype datetime NaT expects a datetime format, but pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f') creates a string.
import pandas as pd
# sample data
df = pd.DataFrame({'date': ['20180101', '', '20190101']})
# df view
date
0 20180101
1
2 20190101
# convert to datetime format
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# df view
date
0 2018-01-01
1 NaT
2 2019-01-01
# apply method from link
# use None if you want NoneType. If you want a string, use 'None'
df.date = df.date.apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None)
# final output
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
To find problem rows, try the following code instead.
The function returns a str or None, and prints any 'date' value that causes an AttributeError.
df = pd.DataFrame({'date': ['20180101', '', '20190101', 'abcd', 3456]})
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# convert from datetime format to string format
def convert(x) -> (str, None):
    try:
        return x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None
    except AttributeError:
        # print before returning; a statement placed after return never runs
        print(x)
        return None
# apply the function
df.date = df.date.apply(lambda x: convert(x))
# output df
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
3 None
4 None
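If blanks are acceptable instead of None, a version without apply is also possible. This is a minimal sketch on the same sample data; it relies on .dt.strftime leaving missing values where the input is NaT, which fillna can then replace.

```python
import pandas as pd

df = pd.DataFrame({"date": ["20180101", "", "20190101"]})

# Unparseable entries (here the empty string) become NaT
parsed = pd.to_datetime(df["date"], format="%Y%m%d", errors="coerce")

# Format the valid rows, then replace the missing ones with blanks
df["date"] = parsed.dt.strftime("%Y-%m-%d %H:%M:%S").fillna("")

print(df)
```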
I have a dataframe as shown below
df1_new = pd.DataFrame({'person_id': [1, 1, 3, 3, 5, 5],'obs_date': ['7/23/2377 12:00:00 AM', 'NA-NA-NA NA:NA:NA', 'NA-NA-NA NA:NA:NA', '7/27/2277 12:00:00 AM', '7/13/2077 12:00:00 AM', 'NA-NA-NA NA:NA:NA']})
I would rather not use the pd.to_datetime approach because of the upper limit it imposes on the year; it raises an out-of-bounds (OOB) error here.
Below is what I tried, but it isn't efficient:
yr = df1_new['obs_date'][0][5:9]
m = df1_new['obs_date'][0][2:4]
d = df1_new['obs_date'][0][0]
t = df1_new['obs_date'][0][11:19]
output = yr + "-" + m + "-" + d + " " + t
Is there a more efficient and elegant way to get the expected output without using the pandas datetime functions?
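Apply try/except when converting the string to a datetime:
import datetime
import pandas as pd
def str2time(x):
    try:
        return datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p')
    except (ValueError, TypeError):
        # placeholder rows such as 'NA-NA-NA NA:NA:NA' do not match the format
        return pd.NaT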
df1_new['obs_date'] = df1_new['obs_date'].apply(str2time)
print(df1_new)
person_id obs_date
0 1 2377-07-23 12:00:00
1 1 NaT
2 3 NaT
3 3 2277-07-27 12:00:00
4 5 2077-07-13 12:00:00
5 5 NaT
from datetime import datetime
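def convert_func(x):
    try:
        return datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p").strftime("%Y-%m-%d %H:%M:%S")
    except ValueError:
        # placeholder rows such as 'NA-NA-NA NA:NA:NA' cannot be parsed
        return None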
df1_new['obs_date'] = df1_new['obs_date'].astype(str)
df1_new['obs_date'] = df1_new['obs_date'].apply(convert_func)
This should work
Convert the string date to datetime and then back to the format you want. Example below:
from datetime import datetime
d = "7/23/2377 12:00:00 AM"
datetime.strptime(d, "%m/%d/%Y %I:%M:%S %p").strftime("%Y-%m-%d %H:%M:%S %p")
#output
>>>'2377-07-23 00:00:00 AM'
I have two columns, one of type datetime64 and the other of type datetime.time. The
first column has the day and the second one the hour and minutes. I
am having trouble combining them:
Leistung_0011
ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00
Leistung_0011['Start_datetime'] = pd.to_datetime(Leistung_0011['ActStartDateExecution'].astype(str) + ' ' + Leistung_0011['ActStartTimeExecution'].astype(str))
ValueError: ('Unknown string format:', 'NaT 00:00:00')
You can convert to str and join with whitespace before passing to pd.to_datetime:
df['datetime'] = pd.to_datetime(df['day'].astype(str) + ' ' + df['time'].astype(str))
print(df, df.dtypes, sep='\n')
# day time datetime
# 0 2018-01-01 15:00:00 2018-01-01 15:00:00
# 1 2015-12-30 05:00:00 2015-12-30 05:00:00
# day datetime64[ns]
# time object
# datetime datetime64[ns]
# dtype: object
Setup
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'day': ['2018-01-01', '2015-12-30'],
'time': ['15:00', '05:00']})
df['day'] = pd.to_datetime(df['day'])
df['time'] = df['time'].apply(lambda x: datetime.strptime(x, '%H:%M').time())
print(df['day'].dtype, type(df['time'].iloc[0]), sep='\n')
# datetime64[ns]
# <class 'datetime.time'>
Complete example including seconds:
import pandas as pd
from datetime import datetime
from io import StringIO
x = StringIO(""" ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00""")
# sep=r'\s+' splits on whitespace; delim_whitespace is deprecated in recent pandas versions
df = pd.read_csv(x, sep=r'\s+')
df['ActStartDateExecution'] = pd.to_datetime(df['ActStartDateExecution'])
df['ActStartTimeExecution'] = df['ActStartTimeExecution'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
df['datetime'] = pd.to_datetime(df['ActStartDateExecution'].astype(str) + ' ' + df['ActStartTimeExecution'].astype(str))
print(df.dtypes)
ActStartDateExecution datetime64[ns]
ActStartTimeExecution object
datetime datetime64[ns]
dtype: object
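The 'NaT 00:00:00' in the error from the question suggests that some rows have a missing date. A sketch of one way to handle that is to turn the time column into a timedelta and add it to the date, so that a missing date simply stays NaT. The sample frame below is made up for illustration, with the short column names day and time from the answer above.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    "day": pd.to_datetime(["2016-02-17", None, "2016-06-10"]),
    "time": [dt.time(11, 0), dt.time(0, 0), dt.time(10, 30)],
})

# 'HH:MM:SS' strings parse as durations; NaT + timedelta stays NaT
df["datetime"] = df["day"] + pd.to_timedelta(df["time"].astype(str))

print(df)
```

Alternatively, passing errors='coerce' to pd.to_datetime in the string-joining approach should also turn the 'NaT 00:00:00' rows into NaT instead of raising.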