How to make all non-date values null in Pandas

How to make all non-date values null in Pandas - python

I have an excel doc where the users put dates and strings in the same column. I want to make every string object null and leave all the dates. How do I do this in pandas? Thanks.

An easy way to convert dates in a DataFrame is with pandas.DataFrame.convert_objects, as mentioned by #Jeff, and it also handles numbers and timedeltas. Here is an example of using it:
# contents of Sheet1 of test.xlsx
x y date1 z date2 date3
1 fum 6/1/2016 7 9/1/2015 string3
2 fo 6/2/2016 alpha string0 10/1/2016
3 fi 6/3/2016 9 9/3/2015 10/2/2016
4 fee 6/4/2016 10 string1 string4
5 dumbledum 6/5/2016 beta string2 10/3/2015
6 dumbledee 6/6/2016 12 9/4/2015 string5
import pandas as pd
xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df1 = df.convert_objects(convert_dates='coerce')
# 'coerce' required for conversion to NaT on error
df1
Out[7]:
x y date1 z date2 date3
0 1 fum 2016-06-01 7 2015-09-01 NaT
1 2 fo 2016-06-02 alpha NaT 2016-10-01
2 3 fi 2016-06-03 9 2015-09-03 2016-10-02
3 4 fee 2016-06-04 10 NaT NaT
4 5 dumbledum 2016-06-05 beta NaT 2015-10-03
5 6 dumbledee 2016-06-06 12 2015-09-04 NaT
Individual columns in a DataFrame can be converted using pandas.to_datetime, as pointed out by #Jeff, and with pandas.Series.map, however neither are done in place. For example, with pandas.to_datetime:
import pandas as pd
xl2 = pd.ExcelFile('test.xlsx')
df2 = xl2.parse("Sheet1")
for col in ['date1', 'date2', 'date3']:
df2[col] = pd.to_datetime(df2[col],coerce=True, infer_datetime_format=True)
df2
Out[8]:
x y date1 z date2 date3
0 1 fum 2016-06-01 7 2015-09-01 NaT
1 2 fo 2016-06-02 alpha NaT 2016-10-01
2 3 fi 2016-06-03 9 2015-09-03 2016-10-02
3 4 fee 2016-06-04 10 NaT NaT
4 5 dumbledum 2016-06-05 beta NaT 2015-10-03
5 6 dumbledee 2016-06-06 12 2015-09-04 NaT
And using pandas.Series.map:
import pandas as pd
import datetime
xl3 = pd.ExcelFile('test.xlsx')
df3 = xl3.parse("Sheet1")
for col in ['date1', 'date2', 'date3']:
df3[col] = df3[col].map(lambda x: x if isinstance(x,(datetime.datetime)) else None)
df3
Out[9]:
x y date1 z date2 date3
0 1 fum 2016-06-01 7 2015-09-01 NaT
1 2 fo 2016-06-02 alpha NaT 2016-10-01
2 3 fi 2016-06-03 9 2015-09-03 2016-10-02
3 4 fee 2016-06-04 10 NaT NaT
4 5 dumbledum 2016-06-05 beta NaT 2015-10-03
5 6 dumbledee 2016-06-06 12 2015-09-04 NaT
An upfront way to convert dates in an excel doc is while parsing its sheets. This can be done using pandas.ExcelFile.parse's converters option with a function derived from pandas.to_datetime as the functions in the converters dict and enabling it with coerce=True to force errors to NaT. For example:
def converter(x):
return pd.to_datetime(x,coerce=True,infer_datetime_format=True)
# the following also works for this example
# return pd.to_datetime(x,format='%d/%m/%Y',coerce=True)
converters={'date1': converter,'date2': converter, 'date3': converter}
xl4 = pd.ExcelFile('test.xlsx')
df4 = xl4.parse("Sheet1",converters=converters)
df4
Out[10]:
x y date1 z date2 date3
0 1 fum 2016-06-01 7 2015-09-01 NaT
1 2 fo 2016-06-02 alpha NaT 2016-10-01
2 3 fi 2016-06-03 9 2015-09-03 2016-10-02
3 4 fee 2016-06-04 10 NaT NaT
4 5 dumbledum 2016-06-05 beta NaT 2015-10-03
5 6 dumbledee 2016-06-06 12 2015-09-04 NaT

Related

Pandas: Add and Preserve time component when input file has only date in dataframe

Scenario:
The input file which I read in Pandas has column with sparsely populated date in String/Object format.
I need to add time component, for ex.
2021-08-27 is my input in String format, and 2021-08-28 00:00:00 should by output in datetime64[ns] format
My Trials:
df = pd.read_parquet("sample.parquet")
df.head()
a
b
c
dttime_col
1
1
2
2021-07-12 00:00:00
0
1
0
NaN
1
2
0
NaN
2
1
1
2021-02-04 00:00:00
3
5
2
NaN
df["dttime_col"] = pd.to_datetime(df["dttime_col"])
df["dttime_col"]
Out[16]:
0 2021-07-12
1 NaT
2 NaT
3 2021-02-04
4 NaT
5 2021-05-22
6 NaT
7 2021-10-06
8 2021-01-31
9 NaT
Name: dttime_col, dtype: datetime64[ns]
But as you see here, there is not time component. I tried adding format as %Y-%m-%d %H:%M:%S but still the output is same. Further more, I tried adding Time component as default as a String type.
df["dttime_col"] = df["dttime_col"].dt.strftime("%Y-%m-%d 00:00:00").replace('NaT', np.nan)
Out[17]:
0 2021-07-12 00:00:00
1 NaN
2 NaN
3 2021-02-04 00:00:00
4 NaN
5 2021-05-22 00:00:00
6 NaN
7 2021-10-06 00:00:00
8 2021-01-31 00:00:00
9 NaN
Name: dttime_col, dtype: object
Now this gives me time next to date, but in String/Object format. The moment I convert it back to datetime format, all the HH:MM:SS are removed.
df["dttime_col"].apply(lambda x: datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S") if not isinstance(x, float) else np.nan)
Out[24]:
0 2021-07-12
1 NaT
2 NaT
3 2021-02-04
4 NaT
5 2021-05-22
6 NaT
7 2021-10-06
8 2021-01-31
9 NaT
Name: dttime_col, dtype: datetime64[ns]
It feels like going in circles all again.
Output I expect:
0 2021-07-12 00:00:00
1 NaN
2 NaN
3 2021-02-04 00:00:00
4 NaN
5 2021-05-22 00:00:00
6 NaN
7 2021-10-06 00:00:00
8 2021-01-31 00:00:00
9 NaN
Name: dttime_col, dtype: datetime64[ns]
EDIT 1:
Providing output as asked by #mozway
df["dttime_col"].dt.second
Out[27]:
0 0.0
1 NaN
2 NaN
3 0.0
4 NaN
5 0.0
6 NaN
7 0.0
8 0.0
9 NaN
Name: dttime_col, dtype: float64

Extract year from pandas datetime column as numeric value with NaN for empty cells instead of NaT

I want to extract the year from a datetime column into a new 'yyyy'-column AND I want the missing values (NaT) to be displayed as 'NaN', so the datetime-dtype of the new column should be changed I guess but there I'm stuck..
Initial df:
Date ID
0 2016-01-01 12
1 2015-01-01 96
2 NaT 20
3 2018-01-01 73
4 2017-01-01 84
5 NaT 26
6 2013-01-01 87
7 2016-01-01 64
8 2019-01-01 11
9 2014-01-01 34
Desired df:
Date ID yyyy
0 2016-01-01 12 2016
1 2015-01-01 96 2015
2 NaT 20 NaN
3 2018-01-01 73 2018
4 2017-01-01 84 2017
5 NaT 26 NaN
6 2013-01-01 87 2013
7 2016-01-01 64 2016
8 2019-01-01 11 2019
9 2014-01-01 34 2014
Code:
import pandas as pd import numpy as np  # example df
df = pd.DataFrame({"ID": [12,96,20,73,84,26,87,64,11,34], "Date": ['2016-01-01', '2015-01-01', np.nan, '2018-01-01', '2017-01-01', np.nan, '2013-01-01', '2016-01-01', '2019-01-01', '2014-01-01']})  df.ID = pd.to_numeric(df.ID)
 df.Date = pd.to_datetime(df.Date) print(df)
#extraction of year from date
df['yyyy'] = pd.to_datetime(df.Date).dt.strftime('%Y')  #Try to set NaT to NaN or datetime to numeric, PROBLEM: empty cells keep 'NaT'
df.loc[(df['yyyy'].isna()), 'yyyy'] = np.nan   #(try1)
df.yyyy = df.Date.astype(float)  #(try2)
df.yyyy = pd.to_numeric(df.Date)  #(try3)
print(df)

Use Series.dt.year with converting to integers with Int64:
df.Date = pd.to_datetime(df.Date)
df['yyyy'] = df.Date.dt.year.astype('Int64')
print (df)
ID Date yyyy
0 12 2016-01-01 2016
1 96 2015-01-01 2015
2 20 NaT <NA>
3 73 2018-01-01 2018
4 84 2017-01-01 2017
5 26 NaT <NA>
6 87 2013-01-01 2013
7 64 2016-01-01 2016
8 11 2019-01-01 2019
9 34 2014-01-01 2014
With no convert floats to integers:
df['yyyy'] = df.Date.dt.year
print (df)
ID Date yyyy
0 12 2016-01-01 2016.0
1 96 2015-01-01 2015.0
2 20 NaT NaN
3 73 2018-01-01 2018.0
4 84 2017-01-01 2017.0
5 26 NaT NaN
6 87 2013-01-01 2013.0
7 64 2016-01-01 2016.0
8 11 2019-01-01 2019.0
9 34 2014-01-01 2014.0
Your solution convert NaT to strings NaT, so is possible use replace.
Btw, in last versions of pandas replace is not necessary, it working correctly.
df['yyyy'] = pd.to_datetime(df.Date).dt.strftime('%Y').replace('NaT', np.nan)

Isn't it:
df['yyyy'] = df.Date.dt.year
Output:
Date ID yyyy
0 2016-01-01 12 2016.0
1 2015-01-01 96 2015.0
2 NaT 20 NaN
3 2018-01-01 73 2018.0
4 2017-01-01 84 2017.0
5 NaT 26 NaN
6 2013-01-01 87 2013.0
7 2016-01-01 64 2016.0
8 2019-01-01 11 2019.0
9 2014-01-01 34 2014.0
For pandas 0.24.2+, you can use Int64 data type for nullable integers:
df['yyyy'] = df.Date.dt.year.astype('Int64')
which gives:
Date ID yyyy
0 2016-01-01 12 2016
1 2015-01-01 96 2015
2 NaT 20 <NA>
3 2018-01-01 73 2018
4 2017-01-01 84 2017
5 NaT 26 <NA>
6 2013-01-01 87 2013
7 2016-01-01 64 2016
8 2019-01-01 11 2019
9 2014-01-01 34 2014

Slice Pandas DataFrame to show all records of a particular day

I want to return a dataframe that contains only the records of a particular day given a datetime value.
Code below is working:
def dataframeByDay(datetimeValue):
cYear = datetimeValue.year
cMonth = datetimeValue.month
cDay = datetimeValue.day
crit = (df.index.year == cYear) & (df.index.month == cMonth) & (df.index.day == cDay)
return df.loc[crit]
Is there a better (faster) way to accomplish this?

Since the index is a DatetimeIndex you can use strings to slice it.
Consider the dataframe df
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(10, size=(10, 3)),
pd.date_range('2016-03-31', periods=10, freq='12H'),
list('ABC'))
df
A B C
2016-03-31 00:00:00 0 2 7
2016-03-31 12:00:00 3 8 7
2016-04-01 00:00:00 0 6 8
2016-04-01 12:00:00 6 0 2
2016-04-02 00:00:00 0 4 9
2016-04-02 12:00:00 7 3 2
2016-04-03 00:00:00 4 3 3
2016-04-03 12:00:00 6 7 7
2016-04-04 00:00:00 4 5 3
2016-04-04 12:00:00 7 5 9
Not What You Want
You don't want to use the Timestamp
df.loc[pd.to_datetime('2016-04-01')]
A 0
B 6
C 8
Name: 2016-04-01 00:00:00, dtype: int64
Instead
You can use this technique:
df.loc['{:%Y-%m-%d}'.format(pd.to_datetime('2016-04-01'))]
A B C
2016-04-01 00:00:00 7 3 1
2016-04-01 12:00:00 0 6 6
Your Function
def dataframeByDay(datetimeValue):
return df.loc['{:%Y-%m-%d}'.format(datetimeValue)]
dataframeByDay(pd.to_datetime('2016-04-01'))
A B C
2016-04-01 00:00:00 7 3 1
2016-04-01 12:00:00 0 6 6
Here are some alternative approaches
def dataframeByDay2(datetimeValue):
dtype = 'datetime64[D]'
d = np.array('{:%Y-%m-%d}'.format(datetimeValue), dtype)
return df[df.index.values.astype(dtype) == d]
def dataframeByDay3(datetimeValue):
return df[df.index.floor('D') == datetimeValue.floor('D')]

Pandas: Subtracting two date columns and the result being an integer

I have two columns in a Pandas data frame that are dates.
I am looking to subtract one column from another and the result being the difference in numbers of days as an integer.
A peek at the data:
df_test.head(10)
Out[20]:
First_Date Second Date
0 2016-02-09 2015-11-19
1 2016-01-06 2015-11-30
2 NaT 2015-12-04
3 2016-01-06 2015-12-08
4 NaT 2015-12-09
5 2016-01-07 2015-12-11
6 NaT 2015-12-12
7 NaT 2015-12-14
8 2016-01-06 2015-12-14
9 NaT 2015-12-15
I have created a new column successfully with the difference:
df_test['Difference'] = df_test['First_Date'].sub(df_test['Second Date'], axis=0)
df_test.head()
Out[22]:
First_Date Second Date Difference
0 2016-02-09 2015-11-19 82 days
1 2016-01-06 2015-11-30 37 days
2 NaT 2015-12-04 NaT
3 2016-01-06 2015-12-08 29 days
4 NaT 2015-12-09 NaT
However I am unable to get a numeric version of the result:
df_test['Difference'] = df_test[['Difference']].apply(pd.to_numeric)
df_test.head()
Out[25]:
First_Date Second Date Difference
0 2016-02-09 2015-11-19 7.084800e+15
1 2016-01-06 2015-11-30 3.196800e+15
2 NaT 2015-12-04 NaN
3 2016-01-06 2015-12-08 2.505600e+15
4 NaT 2015-12-09 NaN

How about:
df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days
This will return difference as int if there are no missing values(NaT) and float if there is.
Pandas have a rich documentation on Time series / date functionality and Time deltas

You can divide column of dtype timedelta by np.timedelta64(1, 'D'), but output is not int, but float, because NaN values:
df_test['Difference'] = df_test['Difference'] / np.timedelta64(1, 'D')
print (df_test)
First_Date Second Date Difference
0 2016-02-09 2015-11-19 82.0
1 2016-01-06 2015-11-30 37.0
2 NaT 2015-12-04 NaN
3 2016-01-06 2015-12-08 29.0
4 NaT 2015-12-09 NaN
5 2016-01-07 2015-12-11 27.0
6 NaT 2015-12-12 NaN
7 NaT 2015-12-14 NaN
8 2016-01-06 2015-12-14 23.0
9 NaT 2015-12-15 NaN
Frequency conversion.

You can use datetime module to help here. Also, as a side note, a simple date subtraction should work as below:
import datetime as dt
import numpy as np
import pandas as pd
#Assume we have df_test:
In [222]: df_test
Out[222]:
first_date second_date
0 2016-01-31 2015-11-19
1 2016-02-29 2015-11-20
2 2016-03-31 2015-11-21
3 2016-04-30 2015-11-22
4 2016-05-31 2015-11-23
5 2016-06-30 2015-11-24
6 NaT 2015-11-25
7 NaT 2015-11-26
8 2016-01-31 2015-11-27
9 NaT 2015-11-28
10 NaT 2015-11-29
11 NaT 2015-11-30
12 2016-04-30 2015-12-01
13 NaT 2015-12-02
14 NaT 2015-12-03
15 2016-04-30 2015-12-04
16 NaT 2015-12-05
17 NaT 2015-12-06
In [223]: df_test['Difference'] = df_test['first_date'] - df_test['second_date']
In [224]: df_test
Out[224]:
first_date second_date Difference
0 2016-01-31 2015-11-19 73 days
1 2016-02-29 2015-11-20 101 days
2 2016-03-31 2015-11-21 131 days
3 2016-04-30 2015-11-22 160 days
4 2016-05-31 2015-11-23 190 days
5 2016-06-30 2015-11-24 219 days
6 NaT 2015-11-25 NaT
7 NaT 2015-11-26 NaT
8 2016-01-31 2015-11-27 65 days
9 NaT 2015-11-28 NaT
10 NaT 2015-11-29 NaT
11 NaT 2015-11-30 NaT
12 2016-04-30 2015-12-01 151 days
13 NaT 2015-12-02 NaT
14 NaT 2015-12-03 NaT
15 2016-04-30 2015-12-04 148 days
16 NaT 2015-12-05 NaT
17 NaT 2015-12-06 NaT
Now, change type to datetime.timedelta, and then use the .days method on valid timedelta objects.
In [226]: df_test['Diffference'] = df_test['Difference'].astype(dt.timedelta).map(lambda x: np.nan if pd.isnull(x) else x.days)
In [227]: df_test
Out[227]:
first_date second_date Difference Diffference
0 2016-01-31 2015-11-19 73 days 73
1 2016-02-29 2015-11-20 101 days 101
2 2016-03-31 2015-11-21 131 days 131
3 2016-04-30 2015-11-22 160 days 160
4 2016-05-31 2015-11-23 190 days 190
5 2016-06-30 2015-11-24 219 days 219
6 NaT 2015-11-25 NaT NaN
7 NaT 2015-11-26 NaT NaN
8 2016-01-31 2015-11-27 65 days 65
9 NaT 2015-11-28 NaT NaN
10 NaT 2015-11-29 NaT NaN
11 NaT 2015-11-30 NaT NaN
12 2016-04-30 2015-12-01 151 days 151
13 NaT 2015-12-02 NaT NaN
14 NaT 2015-12-03 NaT NaN
15 2016-04-30 2015-12-04 148 days 148
16 NaT 2015-12-05 NaT NaN
17 NaT 2015-12-06 NaT NaN
Hope that helps.

I feel that the overall answer does not handle if the dates 'wrap' around a year. This would be useful in understanding proximity to a date being accurate by day of year. In order to do these row operations, I did the following. (I had this used in a business setting in renewing customer subscriptions).
def get_date_difference(row, x, y):
try:
# Calcuating the smallest date difference between the start and the close date
# There's some tricky logic in here to calculate for determining date difference
# the other way around (Dec -> Jan is 1 month rather than 11)
sub_start_date = int(row[x].strftime('%j')) # day of year (1-366)
close_date = int(row[y].strftime('%j')) # day of year (1-366)
later_date_of_year = max(sub_start_date, close_date)
earlier_date_of_year = min(sub_start_date, close_date)
days_diff = later_date_of_year - earlier_date_of_year
# Calculates the difference going across the next year (December -> Jan)
days_diff_reversed = (365 - later_date_of_year) + earlier_date_of_year
return min(days_diff, days_diff_reversed)
except ValueError:
return None
Then the function could be:
dfAC_Renew['date_difference'] = dfAC_Renew.apply(get_date_difference, x = 'customer_since_date', y = 'renewal_date', axis = 1)

Create a vectorized method
def calc_xb_minus_xa(df):
time_dict = {
'<Minute>': 'm',
'<Hour>': 'h',
'<Day>': 'D',
'<Week>': 'W',
'<Month>': 'M',
'<Year>': 'Y'
}
time_delta = df.at[df.index[0], 'end_time'] - df.at[df.index[0], 'open_time']
offset_base_name = str(to_offset(time_delta).base)
time_term = time_dict.get(offset_base_name)
result = (df.end_time - df.open_time) / np.timedelta64(1, time_term)
return result
Then in your df do:
df['x'] = calc_xb_minus_xa(df)
This will work for minutes, hours, days, weeks, month and Year.
open_time and end_time need to change according your df

pandas error creating TimeDeltas from Datetime operation

I have looked at several other related questions here, here, and here, and none of them have come across quite the same problem as me.
I am using Pandas version 0.16.2. I have several columns in a Pandas dataframe, of dtype datetime64[ns]:
In [6]: date_list = ["SubmittedDate","PolicyStartDate", "PaidUpDate", "MaturityDate", "DraftDate", "CurrentValuationDate", "DOB", "InForceDate"]
In [11]: data[date_list].head()
Out[11]:
SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate \
0 NaT 2002-11-18 NaT 2041-03-04 NaT
1 NaT 2015-01-13 NaT NaT NaT
2 NaT 2014-10-15 NaT NaT NaT
3 NaT 2009-08-27 NaT NaT NaT
4 NaT 2007-04-19 NaT 2013-10-01 NaT
CurrentValuationDate DOB InForceDate
0 2015-04-30 1976-03-04 2002-11-18
1 NaT 1949-09-27 2015-01-13
2 NaT 1947-06-15 2014-10-15
3 2015-07-30 1960-06-07 2009-08-27
4 2010-04-21 1950-10-01 2007-04-19
These were originally in string format (e.g. '1976-03-04') which I converted to datetime objects using:
In [7]: for datecol in date_list:
...: data[datecol] = pd.to_datetime(data[datecol], coerce=True, errors = 'raise')
Here are the dtypes for each of these columns:
In [8]: for datecol in date_list:
print data[datecol].dtypes
returns:
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
So far, so good. But what I want to do is create a new column for each of these columns that gives the age in days (as an integer) from a certain date.
In [13]: current_date = pd.to_datetime("2015-07-31")
I first ran this:
In [14]: for i in date_list:
....: data[i+"InDays"] = data[i].apply(lambda x: current_date - x)
However, when I check the dtype of the returned columns:
In [15]: for datecol in date_list:
....: print data[datecol + "InDays"].dtypes
I get these:
object
timedelta64[ns]
object
timedelta64[ns]
object
timedelta64[ns]
timedelta64[ns]
timedelta64[ns]
I don't know why three of them are objects, when they should be timedeltas. What I want to do next is:
In [16]: for i in date_list:
....: data[i+"InDays"] = data[i+"InDays"].dt.days
This approach works fine for the timedelta columns. However, since three of the columns are not timedeltas, I get this error:
AttributeError: Can only use .dt accessor with datetimelike values
I suspect that there are some values in those three columns that are preventing Pandas from converting them to timedeltas. I can't figure out how to work out what those values might be.

The issue occurs because you have three columns with only NaT values, which is causing those columns to be treated as objects when you do apply your condition on it.
You should put some kind of condition in your apply part, to default to some timedelta in case of NaT. Example -
for i in date_list:
data[i+"InDays"] = data[i].apply(lambda x: current_date - x if x is not pd.NaT else pd.Timedelta(0))
Or if you cannot do the above, you should put a condition where you want to do - data[i+"InDays"] = data[i+"InDays"].dt.days , to take it only if the dtype of the series allows it.
Or a simpler way to change the apply part to directly get what you want would be -
for i in date_list:
data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else x)
This would output -
In [110]: data
Out[110]:
SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate \
0 NaT 2002-11-18 NaT 2041-03-04 NaT
1 NaT 2015-01-13 NaT NaT NaT
2 NaT 2014-10-15 NaT NaT NaT
3 NaT 2009-08-27 NaT NaT NaT
4 NaT 2007-04-19 NaT 2013-10-01 NaT
CurrentValuationDate DOB InForceDate SubmittedDateInDays \
0 2015-04-30 1976-03-04 2002-11-18 NaT
1 NaT 1949-09-27 2015-01-13 NaT
2 NaT 1947-06-15 2014-10-15 NaT
3 2015-07-30 1960-06-07 2009-08-27 NaT
4 2010-04-21 1950-10-01 2007-04-19 NaT
PolicyStartDateInDays PaidUpDateInDays MaturityDateInDays DraftDateInDays \
0 4638 NaT -9348 NaT
1 199 NaT NaN NaT
2 289 NaT NaN NaT
3 2164 NaT NaN NaT
4 3025 NaT 668 NaT
CurrentValuationDateInDays DOBInDays InForceDateInDays
0 92 14393 4638
1 NaN 24048 199
2 NaN 24883 289
3 1 20142 2164
4 1927 23679 3025
If you want your NaT to be changed to NaN you can use -
for i in date_list:
data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN)
Example/Demo -
In [114]: for i in date_list:
.....: data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN)
.....:
In [115]: data
Out[115]:
SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate \
0 NaT 2002-11-18 NaT 2041-03-04 NaT
1 NaT 2015-01-13 NaT NaT NaT
2 NaT 2014-10-15 NaT NaT NaT
3 NaT 2009-08-27 NaT NaT NaT
4 NaT 2007-04-19 NaT 2013-10-01 NaT
CurrentValuationDate DOB InForceDate SubmittedDateInDays \
0 2015-04-30 1976-03-04 2002-11-18 NaN
1 NaT 1949-09-27 2015-01-13 NaN
2 NaT 1947-06-15 2014-10-15 NaN
3 2015-07-30 1960-06-07 2009-08-27 NaN
4 2010-04-21 1950-10-01 2007-04-19 NaN
PolicyStartDateInDays PaidUpDateInDays MaturityDateInDays \
0 4638 NaN -9348
1 199 NaN NaN
2 289 NaN NaN
3 2164 NaN NaN
4 3025 NaN 668
DraftDateInDays CurrentValuationDateInDays DOBInDays InForceDateInDays
0 NaN 92 14393 4638
1 NaN NaN 24048 199
2 NaN NaN 24883 289
3 NaN 1 20142 2164
4 NaN 1927 23679 3025

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to make all non-date values null in Pandas - python

I have an excel doc where the users put dates and strings in the same column. I want to make every string object null and leave all the dates. How do I do this in pandas? Thanks.

Related

Pandas: Add and Preserve time component when input file has only date in dataframe

Extract year from pandas datetime column as numeric value with NaN for empty cells instead of NaT

Slice Pandas DataFrame to show all records of a particular day

Pandas: Subtracting two date columns and the result being an integer

pandas error creating TimeDeltas from Datetime operation

Categories

Resources