ValueError when converting String to datetime - python

I have a dataframe as follows, and I am trying to reduce the dataframe to only contain rows for which the Date is greater than a variable curve_enddate. The df['Date'] is in datetime and hence I'm trying to convert curve_enddate[i][0] which gives a string of the form 2015-06-24 to datetime but am getting the error ValueError: time data '2015-06-24' does not match format '%Y-%b-%d'.
Date Maturity Yield_pct Currency
0 2015-06-24 0.25 na CAD
1 2015-06-25 0.25 0.0948511020 CAD
The line where I get the Error:
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%b-%d')]
Thank You

You are using the wrong date format: %b matches abbreviated month names (Jan, Feb, etc.), while %m matches zero-padded month numbers.
Code -
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%m-%d')]

time.strptime returns a time.struct_time tuple, which cannot be compared to a pandas Timestamp, so you need to change that as well as the format string. Use '%Y-%m-%d', where %m is the month as a decimal number. You can use pd.to_datetime to create the object to compare (note that the format has to be passed as a keyword argument):
df = df[df['Date'] > pd.to_datetime(curve_enddate[i][0], format='%Y-%m-%d')]
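A minimal sketch of that comparison, built from the two sample rows in the question. The variable curve_enddate here is a plain string standing in for curve_enddate[i][0], which is an assumption about the asker's data:

```python
import pandas as pd

# Two sample rows from the question; 'Date' is already datetime64
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-06-24", "2015-06-25"]),
    "Maturity": [0.25, 0.25],
    "Currency": ["CAD", "CAD"],
})

# Hypothetical stand-in for curve_enddate[i][0]
curve_enddate = "2015-06-24"

# Parse the string into a Timestamp so it can be compared to the column
cutoff = pd.to_datetime(curve_enddate, format="%Y-%m-%d")
filtered = df[df["Date"] > cutoff]
print(filtered)
```

Only the 2015-06-25 row is kept, because the comparison is strictly greater than the cutoff.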

Related

Pandas date column: problem with date conversion

I have a column date in a Covid data set. The dates appear in this format 20211030 (year - month - day).
However, when converting that column, everything appears with 1970.
This is my code:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
The result is this:
0 MI PERU 1970-08-22 21:58:27
1 SAN JUAN DE LURIGANCHO 1970-08-22 19:27:09
2 YANAHUARA 1970-08-22 19:22:01
3 CUSCO 1970-08-22 22:08:41
4 PANGOA 1970-08-22 21:58:36
Thank you in advance for your help, big hug.
I get this error:
ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.
my complete code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
dataset_covid = "datasetcovid.csv"
df = pd.read_csv(dataset_covid, sep=";", usecols=["DISTRITO", "FECHA_RESULTADO"])
df['FECHA_RESULTADO'] = df['FECHA_RESULTADO'].astype('datetime64')
I also tried this other code:
df['FECHA_RESULTADO'] = df['FECHA_RESULTADO'].astype(str).astype('datetime64')
ParserError: year 20210307 is out of range: 20210307.0
In your case, you don't need pd.to_datetime IF column contains strings:
df = pd.DataFrame({'FECHA': ['20211030']})
print(df)
# Output:
FECHA
0 20211030
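Use astype (recent pandas versions require an explicit unit, as the error message quoted in the question says, so pass 'datetime64[ns]'):
df['FECHA'] = df['FECHA'].astype('datetime64[ns]')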
print(df)
# Output:
FECHA
0 2021-10-30
BUT if the dtype of your column FECHA is integer, you have to cast the column to string first:
df['FECHA'] = df['FECHA'].astype(str).astype('datetime64[ns]')
print(df)
# Output:
FECHA
0 2021-10-30
As noted in the comments, the result is caused by the parameters you are passing to the to_datetime function. To fix this you should:
drop the unit parameter, which makes pandas read the numbers as seconds since 1970-01-01 instead of as formatted dates
add a format parameter that corresponds to the date format you are using.
Hence, your code should go from:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
To this:
df["FECHA"] = pd.to_datetime(df["FECHA"], format='%Y%m%d')
To find the proper format, you can look up the matching directives in the strftime/strptime format code documentation. The to_datetime function has its own documentation page as well.
In our scenario, %Y corresponds to the year with century as a decimal number, %m to the zero-padded month, and %d to the zero-padded day of the month. This matches the 20211030 (year - month - day) given.
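To make the difference concrete, here is a small sketch that parses the same value both ways. It assumes the column holds integers such as 20211030; the cast to string before parsing is a defensive choice, not something the question requires:

```python
import pandas as pd

# Assumed sample: the date stored as an integer, as in the question
df = pd.DataFrame({"FECHA": [20211030]})

# unit='s' treats the number as seconds elapsed since 1970-01-01
wrong = pd.to_datetime(df["FECHA"], unit="s")

# format='%Y%m%d' reads the digits as year, month, day
right = pd.to_datetime(df["FECHA"].astype(str), format="%Y%m%d")

print(wrong.iloc[0])
print(right.iloc[0])
```

20211030 seconds is a little under 234 days, which is why every row in the question landed on 22 August 1970.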

How to convert python dataframe timestamp to datetime format

I have a dataframe with date information in one column.
The date visually appears in the dataframe in this format: 2019-11-24
but when you print the type it shows up as:
Timestamp('2019-11-24 00:00:00')
I'd like to convert each value in the dataframe to a format like this:
24-Nov
or
7-Nov
for single digit days.
I've tried using various datetime and strptime commands to convert but I am getting errors.
Here's one way to do it:
df = pd.DataFrame({'date': ["2014-10-23","2016-09-08"]})
df['date_new'] = pd.to_datetime(df['date'])
df['date_new'] = df['date_new'].dt.strftime("%d-%b")
date date_new
0 2014-10-23 23-Oct
1 2016-09-08 08-Sep
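Note that %d always zero-pads the day, so the output above gives 08-Sep rather than the 8-Sep style the question asks for. One portable way to drop the padding is to build the string from the day number and the month abbreviation separately. This is a sketch with made-up sample dates, and it assumes an English locale for the %b month names:

```python
import pandas as pd

# Sample dates chosen to match the examples in the question
df = pd.DataFrame({"date": pd.to_datetime(["2019-11-24", "2019-11-07"])})

# dt.day is an integer, so converting it to str gives no leading zero
day = df["date"].dt.day.astype(str)
month = df["date"].dt.strftime("%b")  # abbreviated month name, locale dependent
df["date_new"] = day + "-" + month
print(df)
```

The result is a column of strings, so it is meant for display only; keep the original datetime column for sorting or arithmetic.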

csv Pandas datetime convert time to seconds

I work with data from a datalogger, and its timestamp format is not recognized as a datetime in the pandas DataFrame.
I would like to convert this timestamp into a format pandas knows and then convert the datetime into seconds, starting at 0.
>>>df.time
0 05/20/2019 19:20:27:374
1 05/20/2019 19:20:28:674
2 05/20/2019 19:20:29:874
3 05/20/2019 19:20:30:274
Name: time, dtype: object
I tried to convert it from object to datetime64[ns], with either %m or %b for the month.
df_time = pd.to_datetime(df["time"], format = '%m/%d/%y %H:%M:%S:%MS')
df_time = pd.to_datetime(df["time"], format = '%b/%d/%y %H:%M:%S:%MS')
with error: redefinition of group name 'M' as group 7; was group 5 at position 155
I tried to reduce the data set and remove the milliseconds without success.
df['time'] = pd.to_datetime(df['time'],).str[:-3]
ValueError: ('Unknown string format:', '05/20/2019 19:20:26:383')
Or is it possible to just subtract the first time value from all the other values in the time column?
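Use '%m/%d/%Y %H:%M:%S:%f' as the format instead of '%m/%d/%y %H:%M:%S:%MS'. Two things change: %Y matches a four-digit year (%y only matches two digits), and %f matches the fractional seconds. There is no %MS directive; it is read as a second %M (minutes) followed by a literal S, which is why the error complains about group name 'M' being redefined.
Here is the format documentation for future reference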
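For the second half of the question, seconds starting at 0, one option is to subtract the first parsed timestamp from the whole column. This sketch uses the four sample values from the question; the column name seconds is an arbitrary choice:

```python
import pandas as pd

# Sample timestamps copied from the question
df = pd.DataFrame({"time": [
    "05/20/2019 19:20:27:374",
    "05/20/2019 19:20:28:674",
    "05/20/2019 19:20:29:874",
    "05/20/2019 19:20:30:274",
]})

# %f accepts one to six digits, so the 3-digit milliseconds parse fine
parsed = pd.to_datetime(df["time"], format="%m/%d/%Y %H:%M:%S:%f")

# Subtracting the first value gives Timedeltas; convert them to float seconds
df["seconds"] = (parsed - parsed.iloc[0]).dt.total_seconds()
print(df)
```

Subtracting two datetime columns yields Timedelta values, and dt.total_seconds() turns those into plain floats that are easy to plot.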
I am not exactly sure what you are looking for, but you can use the format above to parse your data. You can then remove parts of the result, such as the microseconds, by slicing the string:
from datetime import datetime
date = str(datetime.now())
print(date)
2019-07-28 14:04:28.986601
print(date[11:-7])
14:04:28
time = date[11:-7]
print(time)
14:04:28

Pandas to_datetime not formatting as expected

I have a data frame with a column 'Date' with data type datetime64. The values are in YYYY-MM-DD format.
How can I convert it to YYYY-MM format and use it as a datetime64 object itself.
I tried converting my datetime object to a string in YYYY-MM format and then back to datetime object in YYYY-MM format but it didn't work.
Original data = 1988-01-01.
Converting the datetime object to a string in YYYY-MM format
df['Date']=df['Date'].dt.strftime('%Y-%m')
This worked as expected, my column value became
1988-01
Converting the string back to datetime object in Y-m format
df['Date']=pd.to_datetime(df['Date'],format= '%Y-%m')
I was expecting the Date column in YYYY-MM format but it became YYYY-MM-DD format.
1988-01-01
Can you please let me know if I am missing something.
Thanks
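This is expected behaviour: a datetime always has year, month and day components, so the day cannot be dropped from a datetime64 column.
If you want to remove the days, convert to a monthly period with to_period. If the column is still datetime64:
df['Date'] = df['Date'].dt.to_period('M')
If it has already been converted to YYYY-MM strings:
df['Date'] = pd.to_datetime(df['Date'],format= '%Y-%m').dt.to_period('M')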
Sample:
df = pd.DataFrame({'Date':pd.to_datetime(['1988-01-01','1999-01-15'])})
print (df)
Date
0 1988-01-01
1 1999-01-15
df['Date'] = df['Date'].dt.to_period('M')
print (df)
Date
0 1988-01
1 1999-01
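A short sketch of what the resulting column looks like and how to get back to timestamps if later code needs them. The variable names monthly and back are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["1988-01-01", "1999-01-15"])})

# Monthly periods display as YYYY-MM and have their own dtype
monthly = df["Date"].dt.to_period("M")
print(monthly.dtype)

# to_timestamp converts back to datetime64, using the start of each period
back = monthly.dt.to_timestamp()
print(back)
```

The period column is not datetime64, but it still sorts and groups by month, which is usually the reason for wanting a YYYY-MM value.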

YYMM to date time python

I have a dataframe column in Python that is in the format YYMM. E.g. January 1996 is 9601.
I'm having a hard time converting it from 9601 to a useable date time format. I want the new format to be 01-01-1996. Does anyone have any suggestions? I tried pd.to_datetime function but it's not getting the results I'm looking for.
Use to_datetime with parameter format:
df = pd.DataFrame({'col':['9601', '9705']})
df['col'] = pd.to_datetime(df['col'], format='%y%m')
print (df)
col
0 1996-01-01
1 1997-05-01
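If the column holds integers rather than strings, leading zeros are lost (January 2000 becomes 1), so it helps to pad back to four digits before parsing. The sketch below assumes integer input and adds the 01-01-1996 display format from the question. The two-digit year directive %y maps 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068:

```python
import pandas as pd

# Assumed integer input; the value 1 stands for January 2000 (YYMM = 0001)
df = pd.DataFrame({"col": [9601, 9705, 1]})

# Restore the leading zeros, then parse as two-digit year plus month
padded = df["col"].astype(str).str.zfill(4)
df["date"] = pd.to_datetime(padded, format="%y%m")

# Optional string version in day-month-year order for display
df["date_str"] = df["date"].dt.strftime("%d-%m-%Y")
print(df)
```

Keep the datetime column for calculations and use the string column only where the specific display format matters.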
