I have a pandas dataframe with a date column the data type is datetime64[ns]. there are over 1000 observations in the dataframe. I want to transform the following column:
date
2013-05-01
2013-05-01
to
date
05/2013
05/2013
or
date
05-2013
05-2013
EDIT//
this is my sample code as of now
test = pd.DataFrame({'a':['07/2017','07/2017',pd.NaT]})
a
0 2017-07-13
1 2017-07-13
2 NaT
test['a'].apply(lambda x: x if pd.isnull(x) == True else x.strftime('%Y-%m'))
0 2017-07-01
1 2017-07-01
2 NaT
Name: a, dtype: datetime64[ns]
why did only the date change and not the format?
You can convert datetime64 into whatever string format you like using the strftime method. In your case you would apply it like this:
df.date = df.date[df.date.notnull()].map(lambda x: x.strftime('%m/%Y'))
df.date
Out[111]:
0 05/2013
1 05/2013
Related
I have two dataframes that look like:
df1:
Date Multiplier
0 1995-01-01 5.248256
1 1995-02-01 5.262376
2 1995-03-01 5.255998
3 1995-04-01 5.215762
4 1995-05-01 5.207806
df2:
PRICE Date
0 77500 1995-01-01
1 60000 1995-01-01
2 39250 1995-01-01
3 51250 1995-01-01
4 224950 1995-01-01
Both date columns have been made using the pd.to_datetime() method, and they both supposedly have <M8[ns] data types when using df1.Date.dtype and df2.Date.dtype. However when trying to merge the dataframes with pd.merge(df,hpi,how="left",on="Date") I get the error:
ValueError: You are trying to merge on object and datetime64[ns] columns. If you wish to proceed you should use pd.concat
Try to convert the Date column of df1 to a datetime64
Check dtypes first:
>>> df1.dtypes
Date object # <- Not a datetime
Multiplier float64
dtype: object
>>> df2.dtypes
PRICE int64
Date datetime64[ns] # <- Right dtype
dtype: object
Convert and merge:
df1['Date'] = pd.to_datetime(df1['Date'])
out = pd.merge(df1, df2,how='left',on='Date')
I've a dataframe of this format -
var1 date
A 2017/01/01
A 2017/01/02
...
I want the date to be converted into YYYY-MM format but the df['date'].dtype is object.
How can I remove the day part from date while keeping the data type as datetime?
Expected Output -
A - 2017/01
Thanks
You can't have custom representation for the datetime dtype. But you have the following options:
use strings - you might have any representation (as you wish), but all datetime methods and attributes get lost
use datetime, but set the day part to 1 (as #Kopytok) has already shown.
use period dtype, which still allows you to use some date arithmetic
Demo:
In [207]: df
Out[207]:
var1 date
0 A 2018-12-31
1 A 2017-09-07
2 B 2016-02-29
In [208]: df['new'] = df['date'].dt.to_period('M')
In [209]: df
Out[209]:
var1 date new
0 A 2018-12-31 2018-12
1 A 2017-09-07 2017-09
2 B 2016-02-29 2016-02
In [210]: df.dtypes
Out[210]:
var1 object
date datetime64[ns]
new object
dtype: object
In [211]: df['new'] + 8
Out[211]:
0 2019-08
1 2018-05
2 2016-10
Name: new, dtype: object
It is possible replace every date with the first day of month:
pd.to_datetime(d["date"], format="%Y/%m/%d").apply(lambda x: x.replace(day=1))
Result:
0 2017-01-01
1 2017-01-01
I have a dataframe (df) with a column 'Date of birth' column:
Date of birth
0 1957-04-30 00:00:00
1 1966-11-10 00:00:00
2 1966-11-10 00:00:00
3 1962-03-28 00:00:00
4 1958-10-28 00:00:00
5 1958-06-04 00:00:00
How can I reformat the column to a date only format? After I reformat I'm going to work out age from a specific date:
Date of birth
0 1957-04-30
1 1966-11-10
2 1966-11-10
3 1962-03-28
4 1958-10-28
5 1958-06-04
I have tried using
df["Date of birth"] = pd.to_datetime(df['Date of birth'], format='%d%b%Y')
df["Date of birth"] = df["Date of birth"].dt.strftime('%m/%d/%Y')
but with no joy.
After the column becomes a date, use date accessor to access it.
df["Date of birth"] = pd.to_datetime(df['Date of birth']).dt.date
I have a column I_DATE of type string(object) in a dataframe called train as show below.
I_DATE
28-03-2012 2:15:00 PM
28-03-2012 2:17:28 PM
28-03-2012 2:50:50 PM
How to convert I_DATE from string to datetime format & specify the format of input string.
Also, how to filter rows based on a range of dates in pandas?
Use to_datetime. There is no need for a format string since the parser is able to handle it:
In [51]:
pd.to_datetime(df['I_DATE'])
Out[51]:
0 2012-03-28 14:15:00
1 2012-03-28 14:17:28
2 2012-03-28 14:50:50
Name: I_DATE, dtype: datetime64[ns]
To access the date/day/time component use the dt accessor:
In [54]:
df['I_DATE'].dt.date
Out[54]:
0 2012-03-28
1 2012-03-28
2 2012-03-28
dtype: object
In [56]:
df['I_DATE'].dt.time
Out[56]:
0 14:15:00
1 14:17:28
2 14:50:50
dtype: object
You can use strings to filter as an example:
In [59]:
df = pd.DataFrame({'date':pd.date_range(start = dt.datetime(2015,1,1), end = dt.datetime.now())})
df[(df['date'] > '2015-02-04') & (df['date'] < '2015-02-10')]
Out[59]:
date
35 2015-02-05
36 2015-02-06
37 2015-02-07
38 2015-02-08
39 2015-02-09
Approach: 1
Given original string format: 2019/03/04 00:08:48
you can use
updated_df = df['timestamp'].astype('datetime64[ns]')
The result will be in this datetime format: 2019-03-04 00:08:48
Approach: 2
updated_df = df.astype({'timestamp':'datetime64[ns]'})
For a datetime in AM/PM format, the time format is '%I:%M:%S %p'. See all possible format combinations at https://strftime.org/. N.B. If you have time component as in the OP, the conversion will be done much, much faster if you pass the format= (see here for more info).
df['I_DATE'] = pd.to_datetime(df['I_DATE'], format='%d-%m-%Y %I:%M:%S %p')
To filter a datetime using a range, you can use query:
df = pd.DataFrame({'date': pd.date_range('2015-01-01', '2015-04-01')})
df.query("'2015-02-04' < date < '2015-02-10'")
or use between to create a mask and filter.
df[df['date'].between('2015-02-04', '2015-02-10')]
I have a dataframe in pandas called 'munged_data' with two columns 'entry_date' and 'dob' which i have converted to Timestamps using pd.to_timestamp.I am trying to figure out how to calculate ages of people based on the time difference between 'entry_date' and 'dob' and to do this i need to get the difference in days between the two columns ( so that i can then do somehting like round(days/365.25). I do not seem to be able to find a way to do this using a vectorized operation. When I do munged_data.entry_date-munged_data.dob i get the following :
internal_quote_id
2 15685977 days, 23:54:30.457856
3 11651985 days, 23:49:15.359744
4 9491988 days, 23:39:55.621376
7 11907004 days, 0:10:30.196224
9 15282164 days, 23:30:30.196224
15 15282227 days, 23:50:40.261632
However i do not seem to be able to extract the days as an integer so that i can continue with my calculation.
Any help appreciated.
Using the Pandas type Timedelta available since v0.15.0 you also can do:
In[1]: import pandas as pd
In[2]: df = pd.DataFrame([ pd.Timestamp('20150111'),
pd.Timestamp('20150301') ], columns=['date'])
In[3]: df['today'] = pd.Timestamp('20150315')
In[4]: df
Out[4]:
date today
0 2015-01-11 2015-03-15
1 2015-03-01 2015-03-15
In[5]: (df['today'] - df['date']).dt.days
Out[5]:
0 63
1 14
dtype: int64
You need 0.11 for this (0.11rc1 is out, final prob next week)
In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])
In [10]: df
Out[10]:
0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [11]: df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ],columns=['age'])
In [12]: df
Out[12]:
age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [13]: df['today'] = Timestamp('20130419')
In [14]: df['diff'] = df['today']-df['age']
In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)
In [17]: df
Out[17]:
age today diff years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00 12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00 8.887671
You need this odd apply at the end because not yet full support for timedelta64[ns] scalars (e.g. like how we use Timestamps now for datetime64[ns], coming in 0.12)
Not sure if you still need it, but in Pandas 0.14 i usually use .astype('timedelta64[X]') method
http://pandas.pydata.org/pandas-docs/stable/timeseries.html (frequency conversion)
df = pd.DataFrame([ pd.Timestamp('20010101'), pd.Timestamp('20040605') ])
df.ix[0]-df.ix[1]
Returns:
0 -1251 days
dtype: timedelta64[ns]
(df.ix[0]-df.ix[1]).astype('timedelta64[Y]')
Returns:
0 -4
dtype: float64
Hope that will help
Let's specify that you have a pandas series named time_difference which has type
numpy.timedelta64[ns]
One way of extracting just the day (or whatever desired attribute) is the following:
just_day = time_difference.apply(lambda x: pd.tslib.Timedelta(x).days)
This function is used because the numpy.timedelta64 object does not have a 'days' attribute.
To convert any type of data into days just use pd.Timedelta().days:
pd.Timedelta(1985, unit='Y').days
84494