pandas convert object to time format - python

I have a DataFrame column, time, with object dtype and would like to convert it to a time format for a graph.
import pandas as pd
df = pd.DataFrame({
"time":["12:30:31.320"]
})
df["time"]
df['time'] = pd.to_datetime(df['time'],format='%H:%M:%S.%f').dt.strftime('%H:%M:%S')
df['time'] # Output Name: time, dtype: object

To keep Python datetime.time objects instead of strings, you can use:
df['time'] = (pd.to_datetime(df['time'],format='%H:%M:%S.%f')
.dt.floor('s') # remove milliseconds ('S' is deprecated in recent pandas)
.dt.time) # keep time part
Output:
>>> df['time']
0 12:30:31
Name: time, dtype: object # the dtype is object but...
>>> df.loc[0, 'time']
datetime.time(12, 30, 31) # ...the values are datetime.time objects
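A minimal sketch putting both answers side by side on the sample value: strftime yields plain strings, while floor plus .dt.time yields datetime.time objects, even though both columns report object dtype. The column names as_str and as_time are only illustrative.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"time": ["12:30:31.320"]})

# parse once; the missing date part defaults to 1900-01-01
parsed = pd.to_datetime(df["time"], format="%H:%M:%S.%f")

# option 1: formatted strings such as "12:30:31"
df["as_str"] = parsed.dt.strftime("%H:%M:%S")

# option 2: datetime.time objects with the milliseconds dropped
df["as_time"] = parsed.dt.floor("s").dt.time

print(type(df.loc[0, "as_str"]), type(df.loc[0, "as_time"]))
```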

You appear to be converting the 'time' column to datetime and then back to a string in the format '%H:%M:%S'.
You can do this with the dt.strftime method.
However, after converting back to strings, df['time'] still has object dtype, because pandas (2.x and earlier) stores strings as object by default.
If you want the dedicated string dtype, use astype('string'); astype(str) would leave the column as object:
df['time'] = df['time'].astype('string')

Related

Dataframe - Converting entire column from str object to datetime object - TypeError: strptime() argument 1 must be str, not Series

I want to convert the values in an entire column from strings to datetime objects, but I can't do it with this code, which only works on single strings (i.e. if I add .iloc[] and specify the index):
price_df_higher_interval['DateTime'] = datetime.datetime.strptime(price_df_higher_interval['DateTime'],
'%Y-%m-%d %H:%M:%S')
Also, I would like to avoid looping through the DataFrame, but I don't know whether that will be necessary.
Thank you for your help :)
You could use the pd.to_datetime function.
import pandas as pd
df = pd.DataFrame({"str_date": ["2023-01-01 12:13:21", "2023-01-02 13:10:24"]})
df["date"] = pd.to_datetime(df["str_date"], format="%Y-%m-%d %H:%M:%S")
df.dtypes
str_date object
date datetime64[ns]
dtype: object
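The code in the question fails because datetime.strptime accepts one string, not a whole Series. A minimal sketch contrasting the vectorized pd.to_datetime call with an element-wise .apply(strptime) fallback; the name price_df and its values are made up for illustration, mirroring the question's DateTime column.

```python
from datetime import datetime

import pandas as pd

price_df = pd.DataFrame({"DateTime": ["2023-01-01 12:13:21", "2023-01-02 13:10:24"]})
fmt = "%Y-%m-%d %H:%M:%S"

# vectorized: parses the whole column in one call, no explicit loop
vectorized = pd.to_datetime(price_df["DateTime"], format=fmt)

# element-wise fallback: strptime is called once per value (slower on large frames)
elementwise = price_df["DateTime"].apply(lambda s: datetime.strptime(s, fmt))

price_df["DateTime"] = vectorized
print(price_df.dtypes)
```

The vectorized form is usually preferable; the apply version is shown only to make clear why the original one-string call could not work on a Series.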

parse_dates cannot convert string to datetime

I'm trying to read a CSV from this link: https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv
df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv', parse_dates=['time'])
But the time column is still in string format:
df.dtypes
[output]
ip object
time object
path object
status int64
size int64
dtype: object
Interestingly, when I read a similar CSV from a different URL, it works:
df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/solution/traffic.csv', parse_dates=['time'])
This does convert the time column to datetime. Why does parse_dates fail with the first link, and how can I fix it?
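There is a typo in one of the datetimes, so parse_dates leaves the column as strings:
1017-06-19 14:46:24
(The year 1017 is outside the range of pandas' default nanosecond-resolution timestamps, which start in 1677.) A possible solution is to convert such values to NaT: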
df['time'] = pd.to_datetime(df['time'], errors='coerce')
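Before discarding values, it can help to see which rows were coerced. A minimal sketch on a small made-up CSV (not the real traffic.csv) containing the same kind of bad year; with the nanosecond resolution pandas 2.x uses here, the 1017 row should be the one reported in bad_rows.

```python
import io

import pandas as pd

# made-up sample with the same kind of typo as the real file
csv_text = """ip,time
10.0.0.1,2017-06-19 14:46:24
10.0.0.2,1017-06-19 14:46:24
"""
df = pd.read_csv(io.StringIO(csv_text))

# values that cannot be converted become NaT instead of raising
parsed = pd.to_datetime(df["time"], errors="coerce")

# rows that had a value but failed to parse: candidates for manual fixing
bad_rows = df[parsed.isna() & df["time"].notna()]
print(bad_rows)

df["time"] = parsed
```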

How to convert a pandas datetime column from UTC to EST

There is another question that is eleven years old with a similar title.
I have a pandas dataframe with a column of datetime.time values.
val time
a 12:30:01.323
b 12:48:04.583
c 14:38:29.162
I want to convert the time column from UTC to EST.
I tried dataframe.tz_localize('utc').tz_convert('US/Eastern'), but it gave me the following error: RangeIndex object has no attribute tz_localize
DataFrame.tz_localize and DataFrame.tz_convert work on the index of the DataFrame, so you can do the following:
1. Convert "time" to Timestamp format.
2. Set the "time" column as the index and use the conversion functions.
3. Call reset_index().
4. Keep only the time part.
Try:
dataframe["time"] = pd.to_datetime(dataframe["time"],format="%H:%M:%S.%f")
output = (dataframe.set_index("time")
.tz_localize("utc")
.tz_convert("US/Eastern")
.reset_index()
)
output["time"] = output["time"].dt.time
>>> output
time val
0 07:30:01.323000 a
1 07:48:04.583000 b
2 09:38:29.162000 c
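pd.to_datetime accepts a utc argument (bool) which, when True, returns timezone-aware timestamps in UTC.
When given a Series, to_datetime returns a datetime Series whose .dt accessor has a tz_convert method (for list-like input it returns a DatetimeIndex, which has tz_convert directly). This method converts tz-aware timestamps from one time zone to another.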
So, this transformation could be concisely written as
df = pd.DataFrame(
[['a', '12:30:01.323'],
['b', '12:48:04.583'],
['c', '14:38:29.162']],
columns=['val', 'time']
)
df['time'] = pd.to_datetime(df.time, utc=True, format='%H:%M:%S.%f')
# convert string to timezone aware field ^^^
df['time'] = df.time.dt.tz_convert('EST').dt.time
# convert timezone, discarding the date part ^^^
This produces the following dataframe:
val time
0 a 07:30:01.323000
1 b 07:48:04.583000
2 c 09:38:29.162000
This could also be written as a one-liner:
pd.to_datetime(df.time, utc=True, format='%H:%M:%S.%f').dt.tz_convert('EST').dt.time
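Alternatively, with an explicit loop (this assumes the UTC values are in a column named time_UTC; it is slower than the vectorized versions):
list_temp = []
for row in df['time_UTC']:
    list_temp.append(pd.Timestamp(row, tz='UTC').tz_convert('US/Eastern'))
df['time_EST'] = list_temp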

How to remove the time from datetime in a pandas DataFrame. The type of the column is str and objects, but the value is datetime [duplicate]

I have a variable consisting of 300k records with dates, and the dates look like:
2015-02-21 12:08:51
From that date I want to remove the time.
The type of the date variable is pandas.core.series.Series.
This is what I tried:
from datetime import datetime,date
date_str = textdata['vfreceiveddate']
format_string = "%Y-%m-%d"
then = datetime.strftime(date_str,format_string)
This gives an error.
In the above code, textdata is my dataset name and vfreceiveddate is a variable consisting of dates.
How can I write the code to remove the time from the datetime?
Assuming all your datetime strings are in a similar format, just convert them to datetime using to_datetime and then use the dt.date attribute to get just the date portion:
In [37]:
df = pd.DataFrame({'date':['2015-02-21 12:08:51']})
df
Out[37]:
date
0 2015-02-21 12:08:51
In [39]:
df['date'] = pd.to_datetime(df['date']).dt.date
df
Out[39]:
date
0 2015-02-21
EDIT
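If you want to keep the datetime dtype and only drop the time component, you can call dt.normalize, which sets the time to midnight (pandas then omits the time when displaying the column):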
In[10]:
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df
Out[10]:
date
0 2015-02-21
You can see that the dtype remains as datetime:
In[11]:
df.dtypes
Out[11]:
date datetime64[ns]
dtype: object
You're calling datetime.datetime.strftime on the class itself (as an unbound method), so it requires a datetime.datetime instance as its first argument; but you're passing it something else (in your code, a Series of strings), hence the error.
You can work purely at the string level if that's the result you want; with the data you give as an example, date_str.split()[0] on a single string would give exactly the 2015-02-21 string you appear to require (for a whole Series, use date_str.str.split().str[0]).
Or, you can use datetime, but then you need to parse the string first, not format it; hence strptime, not strftime (this works on one string at a time):
dt = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
date = dt.date()
if it's a datetime.date object you want (but if all you want is the string form of the date, such an approach might be "overkill":-).
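Simply writing
date.strftime("%d-%m-%Y") will format only the day, month and year, dropping the hours, minutes and seconds (date must be a datetime or date object here, not a string).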
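Since the question's data is a whole Series, the per-string approaches above need their vectorized counterparts. A minimal sketch of three Series-level options; the second sample value is made up for illustration.

```python
import pandas as pd

dates = pd.Series(["2015-02-21 12:08:51", "2015-03-01 08:00:00"])
fmt = "%Y-%m-%d %H:%M:%S"

# 1) pure string operation: keep everything before the first space
as_text = dates.str.split().str[0]

# 2) parse once, then keep datetime.date objects (object dtype)
as_date = pd.to_datetime(dates, format=fmt).dt.date

# 3) parse once, then format back to day-month-year strings
as_dmy = pd.to_datetime(dates, format=fmt).dt.strftime("%d-%m-%Y")

print(as_text, as_date, as_dmy, sep="\n")
```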

Comparing Time in Python between String and Naive format

I need to compare the time between two dates in Python. One is given as a string and the other as a datetime.datetime. I have tried a few ideas, but the error is always: Cannot compare tz-naive and tz-aware datetime-like objects
Idea 1: Convert the string time into a pandas Timestamp, convert it back into a string, parse that with datetime.fromisoformat, and then compare the result to the datetime.datetime object.
from datetime import datetime, timedelta
time_to_compare = datetime.utcnow()-timedelta(minutes=60)
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = df['Date'].apply(lambda x: str(x))
df['Date'] = df['Date'].apply(lambda x: datetime.fromisoformat(x))
df= df.loc[df['Date']>=time_to_compare]
Idea 2: Change the datetime.datetime object to a Timestamp
time_to_compare = pd.to_datetime(datetime.utcnow()-timedelta(minutes=60))
df['Date']=pd.to_datetime(df['Date'])
df= df.loc[df['Date']>=time_to_compare]
Ideally, I want to filter the DataFrame, keeping only the rows where df['Date'] is later than time_to_compare.
Test data:
d = {'Date':['2020-03-12T13:59:15.739Z','2020-02-28T22:22:06.827Z']}
df = pd.DataFrame(data=d)
With pandas 1.0.1, you can add utc=True when creating time_to_compare to make it timezone-aware:
time_to_compare = pd.to_datetime(datetime.utcnow()-timedelta(minutes=60), utc=True)
I could not reproduce this, because on my pandas 0.23, df['Date'] = pd.to_datetime(df['Date']) gives a naive pd.Timestamp column, which can be compared to datetime.utcnow()-timedelta(minutes=60), which is naive by definition.
If your system builds df['Date'] as a timezone-aware column, you should just build a timezone-aware time_to_compare:
from datetime import datetime, timedelta, timezone
time_to_compare = datetime.now(timezone.utc)-timedelta(minutes=60)
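A minimal end-to-end sketch using the question's test data. The trailing "Z" makes pandas parse the strings as tz-aware UTC timestamps, so the reference time must be tz-aware too. To keep the result reproducible, the reference is a fixed, made-up UTC timestamp rather than the current time; in real code it would be pd.Timestamp.now(tz="UTC").

```python
from datetime import timedelta

import pandas as pd

d = {"Date": ["2020-03-12T13:59:15.739Z", "2020-02-28T22:22:06.827Z"]}
df = pd.DataFrame(data=d)

# "Z" suffix -> column of tz-aware UTC timestamps
df["Date"] = pd.to_datetime(df["Date"])

# fixed, made-up reference time (tz-aware) so the output does not depend on "now"
reference = pd.Timestamp("2020-03-12 14:30:00", tz="UTC")
time_to_compare = reference - timedelta(minutes=60)

# both sides are tz-aware, so the comparison is allowed
recent = df.loc[df["Date"] >= time_to_compare]
print(recent)
```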
