parse_dates cannot convert string to datetime - python

I am trying to read a CSV from the link https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv
df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv', parse_dates=['time'])
However, the time column is still in string (object) format.
df.dtypes
[output]
ip object
time object
path object
status int64
size int64
dtype: object
Interestingly, when I read a similar csv from a different url, it works. So
df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/solution/traffic.csv', parse_dates=['time'])
indeed converts the time column to a datetime object. Why does parse_dates fail in the first link and how can I fix it?

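There is a typo in one of the datetimes. The year 1017 (presumably meant to be 2017) falls outside the range that datetime64[ns] can represent, so parse_dates leaves the column as object:
1017-06-19 14:46:24
One possible solution is to convert such values to NaT: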
df['time'] = pd.to_datetime(df['time'], errors='coerce')
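Before coercing, it may help to see which values are affected. The following is a minimal sketch, assuming a small hypothetical sample in place of the real traffic.csv column and an arbitrary plausibility cutoff of the year 1900. Depending on the pandas version, the out-of-range value either becomes NaT or is parsed as the year 1017, so the sketch checks for both.

```python
import pandas as pd

# Hypothetical sample standing in for the 'time' column of traffic.csv.
times = pd.Series(["2017-06-19 14:46:24", "1017-06-19 14:46:24"])

# errors='coerce' replaces values that cannot be converted with NaT.
parsed = pd.to_datetime(times, errors="coerce")

# Flag rows that failed to parse or whose year looks implausible.
# The 1900 cutoff is an assumption chosen for this illustration.
bad = parsed.isna() | (parsed.dt.year < 1900)
bad_values = times[bad].tolist()
print(bad_values)
```

Once the offending rows are known, they can be corrected in the source data instead of being silently dropped to NaT.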

Related

pandas convert object to time format

I have a dataframe time column with object datatype and would like to convert time format for graph.
import pandas as pd
df = pd.DataFrame({
"time":["12:30:31.320"]
})
df["time"]
df['time'] = pd.to_datetime(df['time'],format='%H:%M:%S.%f').dt.strftime('%H:%M:%S')
df['time'] # Output Name: time, dtype: object
To keep Python's time instance, you can use:
df['time'] = (pd.to_datetime(df['time'], format='%H:%M:%S.%f')
              .dt.floor('s')  # remove milliseconds (the uppercase 'S' alias is deprecated in recent pandas)
              .dt.time)       # keep time part
Output:
>>> df['time']
0 12:30:31
Name: time, dtype: object # the dtype is object but...
>>> df.loc[0, 'time']
datetime.time(12, 30, 31) # ...each element is a datetime.time object
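The difference between the two approaches above can be sketched side by side. This is only an illustration using the one-row frame from the question; the names parsed, as_text, and as_time are hypothetical.

```python
import datetime
import pandas as pd

df = pd.DataFrame({"time": ["12:30:31.320"]})

# Parse once; the date part defaults to 1900-01-01 because only a time is given.
parsed = pd.to_datetime(df["time"], format="%H:%M:%S.%f")

# Approach 1: format back to text, so each element is a str.
as_text = parsed.dt.strftime("%H:%M:%S")

# Approach 2: drop sub-second precision and keep datetime.time objects.
as_time = parsed.dt.floor("s").dt.time

print(type(as_text.iloc[0]), type(as_time.iloc[0]))
```

Both columns report object dtype, but only the second one holds values that support time comparisons and arithmetic-style attribute access such as .hour.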
You appear to be converting the 'time' column back to a string in the format '%H:%M:%S' after parsing it as a datetime.
The dt.strftime function does exactly that.
However, it returns Python strings, so the dtype of df['time'] is still reported as object.
If you want pandas' dedicated string dtype instead, you can use the astype method:
df['time'] = df['time'].astype('string')

can't get python to format date correctly

I'm reading in a csv file into a new dataframe "df" using
df = pd.read_csv(r'C:\projects\tstr_results.csv',index_col=None)
The file has a column 'date' that is in a format of 4-Nov-2021. df.dtypes shows 'date' to be an object.
I used the following command to convert the column into a datetime stamp:
df['date'] = pd.to_datetime(df['date'], format='%d-%b-%Y')
However, df['date'] shows the date to be 2021-11-04 and as a dtype of datetime64[ns].
Am I missing a parameter to get to the desired format of 04-Nov-2021?
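You can apply the format while reading the CSV with the keyword arguments parse_dates and date_parser. Note that date_parser is deprecated as of pandas 2.0, and because strftime returns strings, the resulting column holds formatted strings rather than datetime64 values: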
df = pd.read_csv(r'C:\projects\tstr_results.csv',index_col=None, parse_dates=['date'], date_parser=lambda d: pd.Timestamp(d).strftime("%d-%b-%Y"))
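An alternative is to keep the column as datetime64 for sorting and arithmetic, and produce the 04-Nov-2021 text only for display. The sketch below assumes an in-memory CSV in place of tstr_results.csv; the value column and the date_text column name are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for tstr_results.csv.
csv_text = "date,value\n4-Nov-2021,1\n15-Dec-2021,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse into a real datetime column using the known input format.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%Y")

# Create the display text separately; this column holds strings.
df["date_text"] = df["date"].dt.strftime("%d-%b-%Y")
print(df)
```

A datetime64 column has no display format of its own, so the desired layout always lives in a string produced at output time.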

How to remove the time from datetime of the pandas Dataframe. The type of the column is str and objects, but the value is datetime [duplicate]

I have a variable consisting of 300k records with dates, and the dates look like
2015-02-21 12:08:51
From that date I want to remove the time.
The type of the date variable is pandas.core.series.Series.
This is what I tried:
from datetime import datetime,date
date_str = textdata['vfreceiveddate']
format_string = "%Y-%m-%d"
then = datetime.strftime(date_str,format_string)
some Random ERROR
In the above code, textdata is my dataset name and vfreceiveddate is a variable consisting of dates.
How can I write the code to remove the time from the datetime?
Assuming all your datetime strings are in a similar format then just convert them to datetime using to_datetime and then call the dt.date attribute to get just the date portion:
In [37]:
df = pd.DataFrame({'date':['2015-02-21 12:08:51']})
df
Out[37]:
date
0 2015-02-21 12:08:51
In [39]:
df['date'] = pd.to_datetime(df['date']).dt.date
df
Out[39]:
date
0 2015-02-21
EDIT
If you want to keep the datetime dtype and only drop the time component, you can call dt.normalize, which sets every time to midnight:
In[10]:
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df
Out[10]:
date
0 2015-02-21
You can see that the dtype remains as datetime:
In[11]:
df.dtypes
Out[11]:
date datetime64[ns]
dtype: object
You're calling datetime.datetime.strftime, which requires as its first argument a datetime.datetime instance, because it's an unbound method; but you're passing it a string instead of a datetime instance, whence the obvious error.
You can work purely at a string level if that's the result you want; with the data you give as an example, date_str.split()[0] for example would be exactly the 2015-02-21 string you appear to require.
Or, you can use datetime, but then you need to parse the string first, not format it -- hence, strptime, not strftime:
dt = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
date = dt.date()
if it's a datetime.date object you want (but if all you want is the string form of the date, such an approach might be "overkill":-).
Simply calling
date.strftime("%d-%m-%Y") on a datetime object produces a string without the hour, minute, and second.
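Since the question deals with a whole Series and not a single string, the string-level approach mentioned above could be applied column-wise. This is a minimal sketch, assuming a two-row hypothetical frame with the question's column name and values that all follow the "date, space, time" layout.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataset.
textdata = pd.DataFrame(
    {"vfreceiveddate": ["2015-02-21 12:08:51", "2016-11-03 08:15:00"]}
)

# Split each string on whitespace and keep the first piece (the date part).
date_only = textdata["vfreceiveddate"].str.split().str[0]
print(date_only.tolist())
```

The result is still a column of strings; converting with pd.to_datetime first is the better choice when date arithmetic is needed afterwards.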

Converting dates to datetime64 results in day and month places getting swapped

I am pulling a time series from a csv file which has dates in "mm/dd/yyyy" format
df = pd.read_csv('lib_file.csv')
df['Date'] = df['Date'].apply(lambda x:datetime.strptime(x,'%m/%d/%Y').strftime('%d/%m/%Y'))
below is the output
I then convert the dtype of ['Date'] from object to datetime64:
df['Date'] = pd.to_datetime(df['Date'])
but that changes my dates as well: the day and month get swapped.
How do I fix it?
Try this:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
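This infers the format from the first non-NaN element, which is parsed correctly in your case, and applies it to the whole column instead of inferring a format for every row. Note that infer_datetime_format is deprecated as of pandas 2.0, where a strict version of this behaviour is the default.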
Just using the code below helped:
df = pd.read_csv('lib_file.csv')
df['Date'] = pd.to_datetime(df['Date'])
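A way to avoid the ambiguity altogether is to pass the known input format explicitly and skip the strftime round trip. The sketch below assumes two hypothetical mm/dd/yyyy values in place of the Date column from lib_file.csv.

```python
import pandas as pd

# Hypothetical values in the file's mm/dd/yyyy layout.
raw = pd.Series(["01/02/2021", "12/25/2021"])

# An explicit format removes any guessing about day and month order.
parsed = pd.to_datetime(raw, format="%m/%d/%Y")

# If dd/mm/yyyy is wanted for display, format only at output time.
shown = parsed.dt.strftime("%d/%m/%Y")
print(parsed.tolist(), shown.tolist())
```

If the strings really are stored as dd/mm/yyyy, passing format='%d/%m/%Y' or dayfirst=True to pd.to_datetime would be the corresponding choice.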

Python - pd.to_datetime does not convert object string to datetime, keeps being object

following problem:
I'm trying to convert a column of a DataFrame from a string to a datetime object with following code:
df = pd.read_csv('data.csv')
df['Time (CET)'] = pd.to_datetime(df['Time (CET)'])
Should be the standard pandas way to do so. But the dtype of the column doesn't change, keeps being an object while no error or exception is raised.
The entries look like 2018-12-31 17:47:14+01:00.
If I apply pd.to_datetime with utc=True, it works completely fine, dtype changes from object to datetime64[ns, UTC]. Unfortunately I don't want to convert the time to UTC, only converting the string to a datetime object without any time zone changes.
Thanks a lot!
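One possible cause, offered as an assumption because the data is not shown, is that the column mixes UTC offsets such as +01:00 in winter and +02:00 in summer. A single datetime64 column can carry only one time zone, so pandas may fall back to object dtype in that case. A minimal sketch of a workaround follows: parse with utc=True, then convert to a named zone. The sample values and the zone name Europe/Berlin are assumptions.

```python
import pandas as pd

# Hypothetical values with two different UTC offsets (winter and summer time).
s = pd.Series(["2018-12-31 17:47:14+01:00", "2018-07-01 12:00:00+02:00"])

# Parse everything into one consistent, timezone-aware column in UTC.
as_utc = pd.to_datetime(s, utc=True)

# Convert to a named zone so wall-clock times match the original strings.
# 'Europe/Berlin' is an assumed zone for CET/CEST data.
local = as_utc.dt.tz_convert("Europe/Berlin")
print(local)
```

The instants are unchanged by this conversion; only the zone used to display them differs from UTC.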
