How to compare pandas date column with hardcoded date - python

This is my code:
print(df.loc[df.DATE == '2016-02-05'])
I am trying to compare this hardcoded date with the dates in a pandas column, but it returns an empty dataframe. What should I do?
Edit: Original dataframe:

Just convert your string to a datetime (I assume your dataframe also contains datetimes rather than strings) and do the comparison you wanted:
from datetime import datetime
if __name__ == "__main__":
    t = datetime.strptime('2016-02-05', '%Y-%m-%d')
    print(t)
Hope the answer will help, feel free to ask questions.
If your DATE df column is not datetimes, but just strings, convert them to datetimes the same way.

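You need to convert the string to a datetime object as well.
import pandas as pd
from datetime import datetime
# Sample frame rebuilt from the output shown below
df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5]})
df['DATE'] = pd.to_datetime(df[['year', 'month', 'day']])
print(df)
datetime_str = '2015/02/04'
print("({}){}".format(type(datetime_str), datetime_str))
# Parse the hardcoded string so both sides of the comparison are datetimes
datetime_object = datetime.strptime(datetime_str, '%Y/%m/%d')
print("({}){}".format(type(datetime_object), datetime_object))
value = df.loc[df.DATE == datetime_object]
print("value =", value)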
OUTPUT:
   year  month  day       DATE
0  2015      2    4 2015-02-04
1  2016      3    5 2016-03-05
(<class 'str'>)2015/02/04
(<class 'datetime.datetime'>)2015-02-04 00:00:00
value =    year  month  day       DATE
0  2015      2    4 2015-02-04
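One possible reason for an empty result, even after conversion, is a time-of-day component in the column: an equality test only matches rows that sit exactly at midnight. The sketch below uses made-up sample data (the values in df are assumptions, not the asker's frame) and shows both an exact comparison and a same-day comparison using dt.normalize().

```python
import pandas as pd

# Hypothetical sample data; the asker's original frame was not shown.
df = pd.DataFrame({
    "DATE": [
        "2016-02-05 00:00:00",
        "2016-03-05 00:00:00",
        "2016-02-05 13:30:00",
    ]
})

# Make sure the column holds datetimes rather than strings.
df["DATE"] = pd.to_datetime(df["DATE"])

# The hardcoded date, as a pandas Timestamp at midnight.
target = pd.Timestamp("2016-02-05")

# Exact match: only rows whose timestamp is exactly midnight on that day.
exact = df[df["DATE"] == target]

# Same-day match: drop the time of day before comparing.
same_day = df[df["DATE"].dt.normalize() == target]

print(exact)
print(same_day)
```

If the column never carries a time component, the two selections return the same rows.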

Related

Pandas DateTime for Month

I have a month column with values formatted as 2019M01.
To analyse seasonality, I need this converted to pandas datetime format.
How do I convert 2019M01 into a datetime so that I can use it for my seasonality plotting?
Thanks.
Use to_datetime with format parameter:
print (df)
date
0 2019M01
1 2019M03
2 2019M04
df['date'] = pd.to_datetime(df['date'], format='%YM%m')
print (df)
date
0 2019-01-01
1 2019-03-01
2 2019-04-01
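For seasonality work it is often convenient to keep the month number or a monthly period next to the parsed date. A minimal sketch, assuming the same three sample values as above; the column names month and period are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019M01", "2019M03", "2019M04"]})

# The literal "M" in the format string matches the "M" in the data.
df["date"] = pd.to_datetime(df["date"], format="%YM%m")

# Month number (1-12), handy for grouping by season.
df["month"] = df["date"].dt.month

# Monthly period, which prints as 2019-01 instead of 2019-01-01.
df["period"] = df["date"].dt.to_period("M")

print(df)
```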

Split Date Time string (not in usual format) and pull out month

I have a dataframe with a date-time string column that is not in the usual date-time format. I would like to split the date and the time into two separate columns, and eventually also extract the month.
This is what the date/time string looks like: 2019-03-20T16:55:52.981-06:00
>>> df.head()
Date Score
2019-03-20T16:55:52.981-06:00 10
2019-03-07T06:16:52.174-07:00 9
2019-06-17T04:32:09.749-06:00 1
I tried this but got a type error:
df['Month'] = pd.DatetimeIndex(df['Date']).month
This can be done just using pandas itself. You can first convert the Date column to datetime by passing utc = True:
df['Date'] = pd.to_datetime(df['Date'], utc = True)
And then just extract the month using dt.month:
df['Month'] = df['Date'].dt.month
Output:
Date Score Month
0 2019-03-20 22:55:52.981000+00:00 10 3
1 2019-03-07 13:16:52.174000+00:00 9 3
2 2019-06-17 10:32:09.749000+00:00 1 6
From the documentation of pd.to_datetime you can see a parameter:
utc : boolean, default None
Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
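The question also asked for the date and the time in separate columns. A minimal sketch of one way to do that, assuming the three sample rows above; DateOnly and TimeOnly are illustrative column names, and the split values are in UTC because of utc=True.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [
        "2019-03-20T16:55:52.981-06:00",
        "2019-03-07T06:16:52.174-07:00",
        "2019-06-17T04:32:09.749-06:00",
    ],
    "Score": [10, 9, 1],
})

# Mixed UTC offsets are brought onto one common timeline (UTC).
parsed = pd.to_datetime(df["Date"], utc=True)

# Split into calendar date, time of day, and month (all in UTC).
df["DateOnly"] = parsed.dt.date
df["TimeOnly"] = parsed.dt.time
df["Month"] = parsed.dt.month

print(df[["DateOnly", "TimeOnly", "Month", "Score"]])
```

If local wall-clock values matter more than UTC, the date and time would need to be taken before the offsets are unified.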

calculate date difference between today's date and pandas date series

I want to calculate the difference in days between a pandas date series:
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
and today's date.
I tried but could not come up with a working solution.
Please help me with the code. I am new to Python, and I keep running into syntax errors when applying any function.
You could do something like
# generate time data
data = pd.to_datetime(pd.Series(["2018-09-1", "2019-01-25", "2018-10-10"]))
pd.to_datetime("now") > data
returns (when run at the end of 2018):
0 True
1 False
2 True
You could then use that to select the data:
data[pd.to_datetime("now") > data]
Hope it helps.
Edit: I misread it but you can easily alter this example to calculate the difference:
data - pd.to_datetime("now")
returns:
0 -122 days +13:10:37.489823
1 24 days 13:10:37.489823
2 -83 days +13:10:37.489823
dtype: timedelta64[ns]
You can try the following:
>>> import pandas as pd
>>> df
col1
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
Make sure to convert the column to datetime with to_datetime:
>>> df['col1'] = pd.to_datetime(df['col1'])
Set the current datetime in order to compute the difference:
>>> curr_time = pd.to_datetime("now")
Now get the difference as follows:
>>> df['col1'] - curr_time
0 -2145 days +07:48:48.736939
1 -2163 days +07:48:48.736939
2 -2140 days +07:48:48.736939
3 -2139 days +07:48:48.736939
4 -2132 days +07:48:48.736939
5 -2119 days +07:48:48.736939
6 -2115 days +07:48:48.736939
7 -2112 days +07:48:48.736939
Name: col1, dtype: timedelta64[ns]
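The subtraction above yields timedeltas with a time-of-day remainder. If whole days are wanted, one option is to compare against midnight of today and read the .dt.days accessor. The sketch below is illustrative: days_since is a hypothetical helper, and its today parameter exists only so the result can be reproduced with a fixed reference date.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2013-02-16", "2013-01-29", "2013-02-21"]))

def days_since(series, today=None):
    """Return whole days elapsed between each date in series and today.

    today defaults to the current date at midnight; pass a fixed date
    to get reproducible results.
    """
    if today is None:
        today = pd.Timestamp.now().normalize()
    else:
        today = pd.Timestamp(today)
    # Timestamp - Series gives a timedelta Series; .dt.days is the integer part.
    return (today - series).dt.days

# Fixed reference date for a reproducible example.
print(days_since(dates, today="2013-03-01"))

# Against the real current date.
print(days_since(dates))
```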
With numpy you can solve it as described in the article "difference-two-dates-days-weeks-months-years-pandas-python-2". Bottom line:
df['diff_days'] = df['First dates column'] - df['Second Date column']
# for days use 'D', for weeks use 'W'; 'M' (months) and 'Y' (years) are not
# fixed-length units and may be rejected by recent pandas versions
df['diff_days'] = df['diff_days'] / np.timedelta64(1, 'D')
print(df)
If you want days as int rather than float, use:
df['diff_days'] = df['diff_days'] // np.timedelta64(1, 'D')
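The same idea can be written with pandas' own Timedelta instead of numpy. A small sketch under assumed data: the start and end columns and their values are hypothetical stand-ins for the two date columns mentioned above.

```python
import pandas as pd

# Hypothetical columns standing in for the two date columns.
df = pd.DataFrame({
    "start": pd.to_datetime(["2019-01-01 00:00", "2019-01-10 00:00"]),
    "end": pd.to_datetime(["2019-01-15 00:00", "2019-01-11 12:00"]),
})

diff = df["end"] - df["start"]

# Dividing by a fixed-length Timedelta gives a float count of that unit.
df["diff_days"] = diff / pd.Timedelta(days=1)
df["diff_weeks"] = diff / pd.Timedelta(weeks=1)

# .dt.days keeps only the whole-day part as an integer.
df["whole_days"] = diff.dt.days

print(df)
```

Days and weeks have a fixed length, which is why they divide cleanly; months and years do not.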
From the pandas docs under Converting To Timestamps you will find:
"Converting to Timestamps To convert a Series or list-like object of date-like objects e.g. strings, epochs, or a mixture, you can use the to_datetime function"
I haven't used pandas before, but this means to_datetime is a top-level pandas function (pd.to_datetime) that accepts the whole series, rather than a method on each element. The converted series is iterable, and each element is a Timestamp, which can be subtracted from a datetime object.
Based on that, the following function takes such a series and returns a list of timedeltas (objects representing the difference between two datetimes).
from datetime import datetime
import pandas as pd
def convert(pandas_series):
    # get the current date
    now = datetime.now()
    # Convert the whole series with pd.to_datetime, then subtract each element from now.
    return [now - ts for ts in pd.to_datetime(pandas_series)]
# assuming 'some_pandas_series' is a list-like pandas series object
list_of_timedeltas = convert(some_pandas_series)

Parse DateTime from Object with Pandas

I would like to parse the year column to datetime.
name id nametype recclass mass (g) fall year
0 Aachen 1 Valid L5 21.0 Fell 01/01/1880 12:00:00 AM
... reclat reclong GeoLocation
... 50.77500 6.08333 (50.775000, 6.083330)
I tried df['year'].apply(dateutil.parser.parse), which parses the value as 1880-01-01 00:00:00, but I cannot use that for selecting dates.
Does anyone have a tip for me?
I think you need to_datetime:
df = pd.DataFrame({'year':['01/01/1880 12:00:00 AM']})
df['year'] = pd.to_datetime(df['year'])
print (df)
year
0 1880-01-01
Since you have converted the year column to datetime, you can use:
df['year'] = df['year'].dt.date
# year
# 1880-01-01
However, for datetime casting, note that pandas has a built-in datetime parser that is easier to use than dateutil, IMO.
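Once the column has a datetime dtype, rows can be selected by date directly. A minimal sketch, assuming an explicit format that matches the sample value; the second row is hypothetical and added only so the filter has something to exclude. Note that nanosecond timestamps only cover roughly the years 1677 to 2262, so earlier years in a real dataset may need errors='coerce'.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Aachen", "Hypothetical"],  # second row is made up for illustration
    "year": ["01/01/1880 12:00:00 AM", "01/01/1951 12:00:00 AM"],
})

# Explicit format: month/day/year with a 12-hour clock and AM/PM marker.
df["year"] = pd.to_datetime(df["year"], format="%m/%d/%Y %I:%M:%S %p")

# Select rows on or after a given date.
recent = df[df["year"] >= "1900-01-01"]

# Pull out just the calendar year as an integer.
years = df["year"].dt.year

print(recent)
print(years)
```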

Datetime and Timestamp equality in Python and Pandas

I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.
import pandas as pd
import datetime
year_month = pd.DataFrame({'year':[2001,2002,2003], 'month':[1,2,3]})
year_month['date'] = [datetime.datetime.strptime(str(y) + str(m) + '1', '%Y%m%d') for y,m in zip(year_month['year'], year_month['month'])]
>>> year_month
month year date
0 1 2001 2001-01-01
1 2 2002 2002-02-01
2 3 2003 2003-03-01
I think the unique function is doing something to the timestamps that is changing them somehow:
first_date = year_month['date'].unique()[0]
>>> first_date == year_month['date'][0]
False
In fact:
>>> year_month['date'].unique()
array(['2000-12-31T16:00:00.000000000-0800',
'2002-01-31T16:00:00.000000000-0800',
'2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')
My suspicions are that there is some sort of timezone difference underneath the functions, but I can't figure it out.
EDIT
I just checked the python commands list(set()) as an alternative to the unique function, and that works. This must be a quirk of the unique() function.
You have to convert to datetime64 to compare:
In [12]:
first_date == year_month['date'][0].to_datetime64()
Out[12]:
True
This is because unique has converted the dtype to datetime64:
In [6]:
first_date = year_month['date'].unique()[0]
first_date
Out[6]:
numpy.datetime64('2001-01-01T00:00:00.000000000+0000')
I think this is because unique returns a NumPy array, and NumPy currently has no dtype that understands Timestamp. See: Converting between datetime, Timestamp and datetime64
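An alternative to converting the Series element is to wrap the value coming out of unique() in pd.Timestamp, so both sides of the comparison have the same type. This is a sketch rather than a definitive fix: depending on the pandas version, unique() may hand back NumPy datetime64 values or pandas Timestamps, and pd.Timestamp accepts either.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2001-01-01", "2002-02-01", "2003-03-01"]))

# Whatever type unique() returns for its elements, normalise it to a Timestamp.
first_ts = pd.Timestamp(dates.unique()[0])

# Timestamp-to-Timestamp comparison behaves as expected.
print(first_ts == dates[0])

# drop_duplicates keeps the result as a Series of Timestamps,
# which avoids the type question altogether.
unique_dates = dates.drop_duplicates()
print(unique_dates)
```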
