I need to compare two dates in Python. One is given as a string and the other is a datetime.datetime object. I have tried a few ideas, but the error is always: Cannot compare tz-naive and tz-aware datetime-like objects
Idea 1: Convert the string time into a pandas Timestamp, convert it back to a string, parse that string with datetime.fromisoformat, and then compare the result to the datetime.datetime object.
import pandas as pd
from datetime import datetime, timedelta
time_to_compare = datetime.utcnow()-timedelta(minutes=60)
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = df['Date'].apply(lambda x: str(x))
df['Date'] = df['Date'].apply(lambda x: datetime.fromisoformat(x))
df= df.loc[df['Date']>=time_to_compare]
Idea 2: Change the datetime.datetime object to a Timestamp
time_to_compare = pd.to_datetime(datetime.utcnow()-timedelta(minutes=60))
df['Date']=pd.to_datetime(df['Date'])
df= df.loc[df['Date']>=time_to_compare]
Ideally I want to filter the dataframe and keep only the rows where df['Date'] is later than time_to_compare.
Test data:
d = {'Date':['2020-03-12T13:59:15.739Z','2020-02-28T22:22:06.827Z']}
df = pd.DataFrame(data=d)
With pandas 1.0.1, you can add utc=True when creating time_to_compare:
time_to_compare = pd.to_datetime(datetime.utcnow()-timedelta(minutes=60), utc=True)
to make it timezone-aware.
I could not reproduce this, because on my pandas 0.23, df['Date'] = pd.to_datetime(df['Date']) gives a naive pd.Timestamp column, which can be compared to datetime.utcnow()-timedelta(minutes=60), which is by definition naive.
If your system builds df['Date'] as a timezone-aware column, you should just build a timezone-aware time_to_compare with:
from datetime import datetime, timedelta, timezone
time_to_compare = datetime.now(timezone.utc)-timedelta(minutes=60)
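As a minimal sketch of the timezone-aware approach, using the test data from the question: both sides of the comparison are made UTC-aware before filtering. The fixed reference time `now` is an assumption made only so the example is reproducible; in real use it would be datetime.now(timezone.utc).

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Test data from the question: ISO 8601 strings with a trailing 'Z' (UTC).
df = pd.DataFrame({'Date': ['2020-03-12T13:59:15.739Z',
                            '2020-02-28T22:22:06.827Z']})

# utc=True makes the column timezone-aware (UTC) regardless of pandas version.
df['Date'] = pd.to_datetime(df['Date'], utc=True)

# Assumption: a fixed "now" for reproducibility; normally datetime.now(timezone.utc).
now = datetime(2020, 3, 12, 14, 30, tzinfo=timezone.utc)
time_to_compare = now - timedelta(minutes=60)

# Both sides are tz-aware, so the comparison no longer raises.
recent = df.loc[df['Date'] >= time_to_compare]
print(recent)
```

Keeping everything in UTC avoids having to strip timezone information from either side, which would silently discard the offset.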
Related
I have a dataframe column of times with object dtype and would like to convert the time format for plotting.
import pandas as pd
df = pd.DataFrame({
"time":["12:30:31.320"]
})
df["time"]
df['time'] = pd.to_datetime(df['time'],format='%H:%M:%S.%f').dt.strftime('%H:%M:%S')
df['time'] # Output Name: time, dtype: object
To keep Python's time instance, you can use:
df['time'] = (pd.to_datetime(df['time'], format='%H:%M:%S.%f')
              .dt.floor('s')  # remove milliseconds (the older alias 'S' is deprecated since pandas 2.2)
              .dt.time)       # keep time part
Output:
>>> df['time']
0 12:30:31
Name: time, dtype: object # the dtype is object but...
>>> df.loc[0, 'time']
datetime.time(12, 30, 31) # ...each element is a datetime.time object
You appear to be converting the 'time' column back to a string in the format '%H:%M:%S' after converting it to datetime.
The dt.strftime function does exactly that.
However, after converting back to string, df['time'] still has object dtype, because pandas has traditionally stored Python strings in object columns.
astype(str) does not change that. To get pandas' dedicated string dtype (available since pandas 1.0), use astype('string'):
df['time'] = df['time'].astype('string')
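To make the difference between the two answers concrete, here is a small sketch that builds both variants from the question's sample value. The variable names parsed, as_text and as_time are hypothetical and chosen only for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"time": ["12:30:31.320"]})

# Parse once; pandas fills in a default date of 1900-01-01 for time-only strings.
parsed = pd.to_datetime(df["time"], format="%H:%M:%S.%f")

# Variant 1: formatted strings (good for labels, not for time arithmetic).
as_text = parsed.dt.strftime("%H:%M:%S")

# Variant 2: datetime.time objects with the fractional seconds dropped.
as_time = parsed.dt.floor("s").dt.time

print(type(as_text.iloc[0]))  # a plain Python str
print(type(as_time.iloc[0]))  # a datetime.time
```

Both columns report an object dtype, so inspecting a single element is the reliable way to tell them apart.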
I am trying to convert datetimes of the form 24/12/2021 07:24:00 to mm-yyyy format (12-2021) while keeping a datetime dtype. I need mm-yyyy as a datetime so that the 'Month-Year' column sorts chronologically. I have tried:
import pandas as pd
from datetime import datetime
df = pd.read_excel('abc.xlsx')
df['Month-Year'] = df['Due Date'].map(lambda x: x.strftime('%m-%Y'))  # %Y gives the four-digit year (12-2021)
df.set_index(['ID', 'Month-Year'], inplace=True)
df.sort_index(inplace=True)
df
The 'Month-Year' column does not sort chronologically because it is of object dtype. How can I convert the 'Month-Year' column to a datetime dtype?
I have been able to obtain a solution to the problem.
df['month_year'] = pd.to_datetime(df['Due Date']).dt.to_period('M')
I got this from the link below
https://www.interviewqs.com/ddi-code-snippets/extract-month-year-pandas
df['Month-Year']=pd.to_datetime(df['Month-Year']).dt.normalize()
will convert the Month-Year to datetime64[ns].
Use it before sorting.
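The period-based approach can be sketched as follows. The sample rows and ID values are invented for illustration, and the explicit day-first format string is an assumption based on the 24/12/2021 07:24:00 example in the question.

```python
import pandas as pd

# Hypothetical sample data in the question's dd/mm/yyyy HH:MM:SS layout.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Due Date': ['24/12/2021 07:24:00',
                 '03/01/2022 10:00:00',
                 '15/02/2021 09:30:00'],
})

# An explicit format avoids any day/month ambiguity while parsing.
due = pd.to_datetime(df['Due Date'], format='%d/%m/%Y %H:%M:%S')

# Monthly periods sort chronologically, unlike 'mm-yyyy' strings.
df['Month-Year'] = due.dt.to_period('M')
df = df.sort_values('Month-Year')

# Format as mm-yyyy only for display, after sorting.
labels = df['Month-Year'].dt.strftime('%m-%Y').tolist()
print(labels)
```

Sorting the strings directly would put 01-2022 before 02-2021, which is why the formatting step comes last.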
I have a variable consisting of 300k records with dates that look like
2015-02-21 12:08:51
I want to remove the time from each date.
The type of the date variable is pandas.core.series.Series.
This is what I tried:
from datetime import datetime,date
date_str = textdata['vfreceiveddate']
format_string = "%Y-%m-%d"
then = datetime.strftime(date_str,format_string)
some Random ERROR
In the above code, textdata is my dataset name and vfreceiveddate is a variable consisting of dates.
How can I write the code to remove the time from the datetime?
Assuming all your datetime strings are in a similar format then just convert them to datetime using to_datetime and then call the dt.date attribute to get just the date portion:
In [37]:
df = pd.DataFrame({'date':['2015-02-21 12:08:51']})
df
Out[37]:
date
0 2015-02-21 12:08:51
In [39]:
df['date'] = pd.to_datetime(df['date']).dt.date
df
Out[39]:
date
0 2015-02-21
EDIT
If you just want to change the display and not the dtype then you can call dt.normalize:
In[10]:
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df
Out[10]:
date
0 2015-02-21
You can see that the dtype remains as datetime:
In[11]:
df.dtypes
Out[11]:
date datetime64[ns]
dtype: object
You're calling datetime.datetime.strftime, which requires as its first argument a datetime.datetime instance, because it's an unbound method; but you're passing it a string instead of a datetime instance, whence the obvious error.
You can work purely at a string level if that's the result you want; with the data you give as an example, date_str.split()[0] for example would be exactly the 2015-02-21 string you appear to require.
Or, you can use datetime, but then you need to parse the string first, not format it -- hence, strptime, not strftime:
dt = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
date = dt.date()
if it's a datetime.date object you want (but if all you want is the string form of the date, such an approach might be "overkill":-).
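Since the question's date_str is a pandas Series and not a single string, the same two ideas can be applied column-wide. This is a sketch under that assumption; the sample values and the names date_text and date_objs are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for textdata['vfreceiveddate'].
s = pd.Series(['2015-02-21 12:08:51', '2016-11-05 00:00:01'])

# String-level: split on whitespace and keep the first piece of each value.
date_text = s.str.split().str[0]

# Parse-then-extract: parse with the known format, then take the date part.
date_objs = pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S').dt.date

print(date_text.tolist())
print(date_objs.tolist())
```

The string version is cheaper but does no validation; the parsed version raises on malformed input unless errors='coerce' is passed to pd.to_datetime.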
Simply calling
date.strftime("%d-%m-%Y") on a datetime object returns a string without the hour, minute and second.
I am pulling a time series from a csv file which has dates in "mm/dd/yyyy" format
df = pd.read_csv('lib_file.csv')
df['Date'] = df['Date'].apply(lambda x:datetime.strptime(x,'%m/%d/%Y').strftime('%d/%m/%Y'))
below is the output
I then convert the dtype of df['Date'] from object to datetime64:
df['Date'] = pd.to_datetime(df['Date'])
but that changes my dates as well.
How do I fix it?
Try this:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
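This infers the format from the first non-NaN element, which is parsed correctly in your case, and applies it to the whole column instead of guessing the format separately for every row of the dataframe. (In pandas 2.0 and later, infer_datetime_format is deprecated because this consistent-format inference is the default behaviour.)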
Just using the code below helped:
df = pd.read_csv('lib_file.csv')
df['Date'] = pd.to_datetime(df['Date'])
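If the strings really are day-first, as they are after the strftime('%d/%m/%Y') step in the question, one option is to pass the format explicitly so that pandas never has to guess between day and month. A minimal sketch with made-up sample values:

```python
import pandas as pd

# Hypothetical day-first strings (dd/mm/yyyy), as produced by the question's strftime step.
df = pd.DataFrame({'Date': ['01/02/2020', '03/04/2020', '25/12/2020']})

# With an explicit format, '01/02/2020' is read as 1 February, not 2 January.
parsed = pd.to_datetime(df['Date'], format='%d/%m/%Y')
print(parsed)
```

Without the format argument, values such as 01/02/2020 are valid in both readings, which is how dates can change silently.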
Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I obtain this error : 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the column to datetime and then use strftime. Note that you lose the datetime functionality, because the result is a column of strings.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
Might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
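For comparison, here is a sketch of how the datetime route and the string route behave when a value is missing. The sample data, including the None entry, is invented for illustration, and regex=False is passed only to make the literal replacement explicit.

```python
import pandas as pd

# Hypothetical sample with one missing value.
df = pd.DataFrame({'date': ['1997-01-31', None, '1997-12-18']})

# Route 1: parse, then format; the missing value becomes NaT and then NaN.
via_dt = pd.to_datetime(df['date']).dt.strftime('%Y%m')

# Route 2: pure string operations; the .str accessor passes missing values through.
via_str = df['date'].str.replace('-', '', regex=False).str[0:6]

print(via_dt.tolist())
print(via_str.tolist())
```

The list-comprehension variant above would fail on the missing value, since None has no split method, which is the case the .str version handles.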