id  date
0   2021-18-01
1   2021-17-01
How can I keep the rows where the date column has a value not equal to today's date (17th Jan)?
df[df.date != datetime.datetime.today().date()]
Expected Output
id  date
0   2021-18-01
Try converting the column to datetime with an explicit format first (the input is year-day-month), then filter:

import datetime
import pandas as pd

df.date = pd.to_datetime(df.date, format="%Y-%d-%m")
df[df.date != str(datetime.datetime.today().date())]

Output (run on 2021-01-18):

        date
1 2021-01-17
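A slightly more explicit variant (a sketch that reproduces the question's data; it compares date objects rather than strings, assuming the same year-day-month input format):

import datetime
import pandas as pd

df = pd.DataFrame({'id': [0, 1], 'date': ['2021-18-01', '2021-17-01']}).set_index('id')
df['date'] = pd.to_datetime(df['date'], format='%Y-%d-%m')

# keep rows whose date part differs from today's date
print(df[df['date'].dt.date != datetime.datetime.today().date()])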
Setup:
Here we populate a DataFrame and convert the date column to the correct datatype:

import pandas as pd

df = pd.DataFrame([dict(id=0, date='2021-01-17'),
                   dict(id=1, date='2021-01-18'),
                   dict(id=2, date='2021-01-19')])
df = df.set_index('id')
df.date = pd.to_datetime(df.date)
The result would look like this:
          date
id
0   2021-01-17
1   2021-01-18
2   2021-01-19
Filtering:
df.loc[df.date.dt.date != pd.Timestamp.now().date()]
The result would look like this (in my timezone it is January 18th already):

          date
id
0   2021-01-17
2   2021-01-19
Explanation
We use the .loc indexer to filter the DataFrame with a boolean array.
To make the comparison valid, we take the date() part of the current timestamp on the right-hand side and use the .dt accessor to get the date property of the underlying datetime values on the left-hand side.
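To see the boolean array that .loc receives, you can inspect the comparison on its own (values shown assuming the code runs on 2021-01-18, as above):

mask = df.date.dt.date != pd.Timestamp.now().date()
print(mask)

id
0     True
1    False
2     True
Name: date, dtype: bool

df.loc[mask] then keeps only the rows marked True.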
Related
I have a Pandas dataframe with a "Datetime" column that contains a, well, a datetime :)
ColA  Datetime             ColB
----  -------------------  ------------
1     2021-01-01 05:02:22  SomeVal
2     2021-01-01 01:01:22  SomeOtherVal
I want to create a new Date column that has only two rules:
1. If the "time" element of datetime is between 00:00:00 and 02:00:00 then make Date the "date" element of Datetime - 1 (the previous day)
2. Otherwise make Date the "date" element of Datetime as is
To achieve this, I'm going to have to run a check on the Datetime column. How would that look? Also, bonus points if I don't need to iterate the dataframe in order to achieve this.
Convert the values to datetimes and, where the time is at or before 02:00:00, subtract one day using Series.mask:
import pandas as pd
from datetime import time

df['Datetime'] = pd.to_datetime(df['Datetime'])

# where the time of day is 02:00:00 or earlier, shift back to the previous day
df['Datetime'] = df['Datetime'].mask(df['Datetime'].dt.time <= time(2, 0, 0),
                                     df['Datetime'] - pd.Timedelta('1 day'))
print(df)

   ColA            Datetime          ColB
0     1 2021-01-01 05:02:22       SomeVal
1     2 2020-12-31 01:01:22  SomeOtherVal
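The question actually asks for a new Date column holding only the date part; a minimal sketch on top of the masked Datetime column above (the column name Date comes from the question's rules):

df['Date'] = df['Datetime'].dt.date

For the rows above this yields 2021-01-01 and 2020-12-31 as datetime.date objects.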
I would like to filter for customer_ids that first appear after a certain date, in this case 2019-01-10, and then create a new df with a list of the new customers.
df
         date  customer_id
0  2019-01-01       429492
1  2019-01-01       344343
2  2019-01-01       949222
3  2019-01-10       429492
4  2019-01-10       344343
5  2019-01-10       129292
Output df
customer_id
129292
This is what I have tried so far, but it also gives me customer_ids that were active before 10th January 2019:
s = df.loc[df["date"]>="2019-01-10", "customer_id"]
df_new = df[df["customer_id"].isin(s)]
df_new
You can use boolean indexing combined with Series.isin:

df["date"] = pd.to_datetime(df["date"])

# rows on or after the cutoff date
mask1 = df["date"] >= "2019-01-10"
# customers that already appear before the cutoff
mask2 = df["customer_id"].isin(df.loc[~mask1, "customer_id"])

# keep rows after the cutoff whose customer was never seen before it
df = df.loc[mask1 & ~mask2, ['customer_id']]
print(df)

   customer_id
5       129292
df['date'] = pd.to_datetime(df['date'])
cutoff = pd.to_datetime('2019-01-10')
mask = df['date'] >= cutoff

# customers seen before the cutoff vs. customers seen on or after it
customers_before = df.loc[~mask, 'customer_id'].unique().tolist()
customers_after = df.loc[mask, 'customer_id'].unique().tolist()

# new customers are those appearing after the cutoff but never before it
result = set(customers_after) - set(customers_before)
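To turn that set into the new df the question asks for (a small sketch; the column name customer_id follows the expected output above):

df_new = pd.DataFrame({'customer_id': sorted(result)})
print(df_new)

   customer_id
0       129292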
"then create a new df with a list of new customers" so in this case your output is null, because 2019-01-10 is last date, there is no new customers after this date
but if you want to get list of customers after certain date or equal than :
df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01',
             '2019-01-10', '2019-01-10', '2019-01-10'],
    'customer_id': [429492, 344343, 949222, 429492, 344343, 129292]
})

certain_date = pd.to_datetime('2019-01-10')
df.date = pd.to_datetime(df.date)
df = df[df.date >= certain_date]
print(df)

        date  customer_id
3 2019-01-10       429492
4 2019-01-10       344343
5 2019-01-10       129292
If your 'date' column holds datetime objects, you just have to do:

from datetime import datetime

df_new = df[df['date'] >= datetime(2019, 1, 10)]['customer_id']
If your 'date' column doesn't contain datetime objects, you should convert it first using the to_datetime method:
df['date'] = pd.to_datetime(df['date'])
And then apply the methodology described above.
I have a dataframe with a date-time string that is not in a traditional datetime format. I would like to separate the date from the time into two separate columns, and then eventually also separate out the month.
This is what the date/time string looks like: 2019-03-20T16:55:52.981-06:00
>>> df.head()
                            Date  Score
0  2019-03-20T16:55:52.981-06:00     10
1  2019-03-07T06:16:52.174-07:00      9
2  2019-06-17T04:32:09.749-06:00      1
I tried this but got a type error:
df['Month'] = pd.DatetimeIndex(df['Date']).month
This can be done just using pandas itself. You can first convert the Date column to datetime by passing utc = True:
df['Date'] = pd.to_datetime(df['Date'], utc=True)
And then just extract the month using dt.month:
df['Month'] = df['Date'].dt.month
Output:
                              Date  Score  Month
0 2019-03-20 22:55:52.981000+00:00     10      3
1 2019-03-07 13:16:52.174000+00:00      9      3
2 2019-06-17 10:32:09.749000+00:00      1      6
From the documentation of pd.to_datetime you can see a parameter:
utc : boolean, default None
Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
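The question also asks to split the date and the time into separate columns; a minimal sketch building on the same utc=True conversion (the column names DateOnly and TimeOnly are just illustrative choices):

import pandas as pd

df = pd.DataFrame({'Date': ['2019-03-20T16:55:52.981-06:00',
                            '2019-03-07T06:16:52.174-07:00',
                            '2019-06-17T04:32:09.749-06:00'],
                   'Score': [10, 9, 1]})

df['Date'] = pd.to_datetime(df['Date'], utc=True)

# separate date and time components, plus the month
df['DateOnly'] = df['Date'].dt.date
df['TimeOnly'] = df['Date'].dt.time
df['Month'] = df['Date'].dt.month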
What is an efficient way to convert the column values into dates in "DD-MM-YYYY" form when the values are given like "Feb-15", which needs to become "01-02-2015"? If it's "Dec-46" it must return "01-12-1946".
You can pass the format '%b-%y' to to_datetime:
In[42]:
df = pd.DataFrame({'date':["Feb-15","Dec-46"]})
df['new_date'] = pd.to_datetime(df['date'], format='%b-%y')
df
Out[42]:
     date   new_date
0  Feb-15 2015-02-01
1  Dec-46 2046-12-01
Note that the new dtype is datetime64, so you cannot control the display format; if you insist on DD-MM-YYYY you will have to convert to a string using dt.strftime:
In[43]:
df['str_date'] = df['new_date'].dt.strftime('%d-%m-%Y')
df
Out[43]:
     date   new_date    str_date
0  Feb-15 2015-02-01  01-02-2015
1  Dec-46 2046-12-01  01-12-2046
But then you have strings, which are not that useful if you need to perform arithmetic operations or filtering.
EDIT
Note also why "Dec-46" parses as 2046 rather than 1946: the %y directive follows the strptime convention of mapping two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999, so 46 lands in 2046.
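If the two-digit years should really fall in the 1900s, one possible workaround (a sketch; the cutoff year 2030 is an arbitrary assumption about which parsed years are "too far in the future") is to shift those dates back a century:

import pandas as pd

df = pd.DataFrame({'date': ['Feb-15', 'Dec-46']})
parsed = pd.to_datetime(df['date'], format='%b-%y')

# any parsed year beyond the assumed cutoff is pushed back into the 1900s
parsed = parsed.mask(parsed.dt.year > 2030, parsed - pd.DateOffset(years=100))
df['str_date'] = parsed.dt.strftime('%d-%m-%Y')
print(df)

     date    str_date
0  Feb-15  01-02-2015
1  Dec-46  01-12-1946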
I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.dtypes returns:

a        object
date     object
I want to convert the column to dates without the time component, but:

df['date'] = df['date'].astype('datetime64[s]')

returns:
a date
1 2014-06-28 22:00:00
df.dtypes returns:

a               object
date    datetime64[ns]
But the value is wrong.
I'd like to have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by converting your dates with pd.to_datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are ok with having strings again, you want:

df['date'] = [x.strftime("%Y-%m-%d") for x in df.date]
Or, to end up with datetime.date objects instead:

df['date'] = [x.date() for x in df.date]
df.date.values

array([datetime.date(2014, 6, 29)], dtype=object)
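A vectorized alternative (a sketch using the .dt accessor rather than a list comprehension; dt.normalize keeps the datetime64 dtype but resets every timestamp to midnight, matching the second acceptable output in the question):

import pandas as pd

df['date'] = pd.to_datetime(df['date'])

# keep datetime64 dtype, drop the time of day (all timestamps become midnight)
df['date'] = df['date'].dt.normalize()

# or, for plain datetime.date objects instead:
# df['date'] = df['date'].dt.date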
Here you go. Just use this pattern:

df['date'] = pd.to_datetime(df['date']).dt.date