Efficient way to convert datetime object to string in Python [duplicate] - python

This question already has answers here:
datetime to string with series in pandas
(3 answers)
Closed 2 years ago.
I'm converting a datetime column (referred to as DATE) in my Pandas dataframe df to a string of the form 'Ymd' (e.g. '20191201' for December 1st 2019). My current way of doing that is:
import datetime as dt
df['DATE'] = df['DATE'].apply(lambda x: dt.datetime.strftime(x, '%Y%m%d'))
But this is surprisingly inefficient and slow when run on large dataframes with millions of rows. Is there a more efficient alternative I am not seeing? That would be extremely helpful. Thanks.

In pandas you do not need apply; the .dt accessor formats the whole column at once:
df['DATE'] = df['DATE'].dt.strftime('%Y%m%d')
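A minimal sketch of the accessor-based approach on a tiny frame. The sample dates are made up for illustration, and the column name DATE follows the question. Whether this is noticeably faster than apply depends on the pandas version and the data size, so it is worth timing on your own data.

```python
import pandas as pd

# Hypothetical sample data; the real frame in the question has millions of rows.
df = pd.DataFrame({"DATE": pd.to_datetime(["2019-12-01", "2020-01-15"])})

# .dt.strftime formats the whole column without a Python-level lambda.
formatted = df["DATE"].dt.strftime("%Y%m%d")
print(formatted.tolist())
```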

Related

pandas calculate time difference to now [duplicate]

This question already has answers here:
Compare timestamp with datetime
(2 answers)
Timestamp String in Zulu Format To Datetime
(1 answer)
Closed 9 months ago.
I have a pandas dataframe df with a time column containing datetime values. I now want to filter the dataframe to show rows with time values lying in the next 15 minutes.
So first I try to simply subtract the current time from the datetimes.
df.Time = pd.to_datetime(df.Time)
print(df.Time - pd.to_datetime("today"))
But got this error:
TypeError: Cannot subtract tz-naive and tz-aware datetime-like objects
I tried to remove the tz-awareness with .replace(tzinfo=None) but it did not work. In the end I am looking for a command like this (assuming the difference of two datetimes is in minutes):
df.loc[df.Time - pd.to_datetime("today") < 15]
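One possible way to get there, as a sketch only: make "now" tz-aware so both sides of the subtraction match, and compare the result against a Timedelta instead of a bare number. This assumes the Time column is tz-aware in UTC (as a Zulu-format timestamp would be); the sample rows below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one event 5 minutes ahead, one 2 hours ahead, both in UTC.
now = pd.Timestamp.now(tz="UTC")
df = pd.DataFrame({"Time": [now + pd.Timedelta(minutes=5), now + pd.Timedelta(hours=2)]})

# Assumption: the column is tz-aware, so "now" has to be tz-aware as well.
df["Time"] = pd.to_datetime(df["Time"], utc=True)
delta = df["Time"] - now

# Keep rows that lie between now and 15 minutes from now.
upcoming = df.loc[(delta >= pd.Timedelta(0)) & (delta < pd.Timedelta(minutes=15))]
print(upcoming)
```

If the column were tz-naive instead, the same comparison should work with a tz-naive pd.Timestamp.now().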

Remove "days 00:00:00" from dataframe [duplicate]

This question already has answers here:
Pandas Timedelta in Days
(5 answers)
Closed 3 years ago.
So, I have a pandas dataframe with a lot of variables including start/end date of loans.
I subtract these two in order to get their difference in days.
The result I get looks like this, e.g. 349 days 00:00:00.
How can I keep only for example the number 349 from this column?
Try this form (note the .dt accessor, which a Series needs in order to reach .days):
df['date'] = pd.to_timedelta(df['date'], errors='coerce').dt.days
Also, check the .normalize() function in pandas.
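A small sketch of the whole round trip, assuming hypothetical start and end columns like the loan dates described in the question:

```python
import pandas as pd

# Hypothetical loan data with start and end dates.
loans = pd.DataFrame({
    "start": pd.to_datetime(["2018-01-01", "2019-03-10"]),
    "end": pd.to_datetime(["2018-12-16", "2019-03-20"]),
})

# Subtracting two datetime columns yields a timedelta column ("349 days").
duration = loans["end"] - loans["start"]

# .dt.days keeps only the integer day count.
loans["days"] = duration.dt.days
print(loans["days"].tolist())
```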

numpy datetime64 from day, month, year series [duplicate]

This question already has answers here:
Vectorized year/month/day operations with NumPy datetime64
(3 answers)
Closed 3 years ago.
I have dates as three numpy arrays, containing all the days, months, and years separately.
From these date-components I would like to construct a numpy.datetime64 array:
date = np.datetime64(days, months, years)
Of course the above does not work. The numpy documentation is silent on how to parse dates from anything other than strings.
I am sure somebody has already solved this riddle before...
First convert to a datetime (this handles one date at a time, not whole arrays), like
from datetime import datetime
dt = datetime(year, month, day)
then
date = np.datetime64(dt)
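For whole arrays, one alternative is to build the dates with datetime64 unit arithmetic instead of going through datetime objects. This is a sketch under the assumption that the three inputs are integer arrays of equal length; the sample values are hypothetical.

```python
import numpy as np

# Hypothetical component arrays of equal length.
years = np.array([2019, 2020, 2021])
months = np.array([12, 2, 7])
days = np.array([1, 29, 15])

# datetime64 counts from 1970-01-01, so each component becomes an offset:
# years since 1970, then months and days counted from zero.
dates = (
    (years - 1970).astype("datetime64[Y]")
    + (months - 1).astype("timedelta64[M]")
    + (days - 1).astype("timedelta64[D]")
)
print(dates)
```

Note that this approach does not validate the inputs: an out-of-range day simply rolls over into the following month.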

Manipulating Series in Dataframe using Pandas [duplicate]

This question already has answers here:
Pandas filter dataframe rows with a specific year
(2 answers)
Closed 3 years ago.
I have a date column in a data frame that looks like this:
(Year-Month-Day)
2017-09-21
2018-11-25
I am trying to create a function that considers only the year. I have been trying the following:
df[df['DateColumn'].str[:3]=='2017']
But I am receiving this error:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
How can I only consider the first four characters of the date in a function? Thanks.
I think you are looking for:
df['year'] = [d.year for d in df['DateColumn']]
This works only if the elements of the column are pandas Timestamp objects. If not, convert the column first:
df['DateColumn'] = pd.to_datetime(df['DateColumn'])
df['year'] = [d.year for d in df['DateColumn']]
UPDATE: Use this instead:
df.loc[pd.to_datetime(df['DateColumn']).dt.year == 2017]
According to this:
https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#dt-accessor
If you have a Series with a datetime dtype, you should be able to use the dt accessor. Note that the accessor lives on the column (a Series), not on the DataFrame itself.
So you might be able to do something like this:
df[df['DateColumn'].dt.year == 2017]
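Try:
years = pd.to_datetime(df.col).apply(lambda x: x.year)
This converts col into datetime format, then extracts the year from each value, giving a Series of years. Assign it to a new name rather than to df, so the dataframe is not overwritten.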
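Putting the pieces together, a minimal sketch using the two sample dates from the question. It assumes the column starts out as strings, which pd.to_datetime converts before the year comparison:

```python
import pandas as pd

# The two sample dates from the question, stored as strings.
df = pd.DataFrame({"DateColumn": ["2017-09-21", "2018-11-25"]})

# Convert once, then filter on the year through the .dt accessor.
df["DateColumn"] = pd.to_datetime(df["DateColumn"])
rows_2017 = df[df["DateColumn"].dt.year == 2017]
print(rows_2017)
```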

day counter using pandas [duplicate]

This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Closed 4 years ago.
I have a time-series data set, and I want to group it by day and add a column that gives the number of each day, acting as a counter (as seen in the figure):
Nothing special in my code yet; it just reads the data and converts the date and time into a datetime index:
import pandas as pd
df = pd.read_csv('*file location and name*',sep=",")
df.head()
df['Date'] =pd.to_datetime(df['Date']+" "+df['Time'])
df.set_index('Date', inplace=True)
See if this answers your query:
df['dayOfMonth']= df.groupby('day').cumcount() + 1
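Since the figure is not available, the question can be read two ways, and this sketch shows both: cumcount numbers the rows inside each day, while ngroup numbers the days themselves. The sample readings and the column names rowInDay and dayCounter are hypothetical; the datetime index mirrors the one built in the question.

```python
import pandas as pd

# Hypothetical readings: two on the first day, one on the second.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2020-01-01 08:00", "2020-01-01 09:00", "2020-01-02 08:00"]),
)

# Group on the calendar day of the datetime index (time of day stripped).
by_day = df.groupby(df.index.normalize())
row_in_day = by_day.cumcount() + 1   # position of each row within its day
day_counter = by_day.ngroup() + 1    # running number of the day itself

df["rowInDay"] = row_in_day
df["dayCounter"] = day_counter
print(df)
```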
