How to change year value in numpy datetime64? - python

I have a pandas DataFrame with dtype=numpy.datetime64
In the data I want to change
'2011-11-14T00:00:00.000000000'
to:
'2010-11-14T00:00:00.000000000'
or another year. The timedelta is not known, only the year number to assign.
This displays the year as an int:
Dates_profit.iloc[50][stock].astype('datetime64[Y]').astype(int)+1970
but I can't assign a value this way.
Does anyone know how to assign a year to a numpy.datetime64?

Since you're using a DataFrame, consider using pandas.Timestamp.replace:
In [1]: import pandas as pd
In [2]: dates = pd.DatetimeIndex([f'200{i}-0{i+1}-0{i+1}' for i in range(5)])
In [3]: df = pd.DataFrame({'Date': dates})
In [4]: df
Out[4]:
Date
0 2000-01-01
1 2001-02-02
2 2002-03-03
3 2003-04-04
4 2004-05-05
In [5]: df.loc[:, 'Date'] = df['Date'].apply(lambda x: x.replace(year=1999))
In [6]: df
Out[6]:
Date
0 1999-01-01
1 1999-02-02
2 1999-03-03
3 1999-04-04
4 1999-05-05
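If the column is large, or may contain February 29, a column-wise variant of the idea above might be worth a look. This is only a sketch: it rebuilds each date from its parts with pd.to_datetime, it assumes the time of day can be dropped, and the names dates, parts and replaced are made up for the illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2000-01-01', '2001-02-02', '2004-02-29']))

# Assemble new dates from year/month/day columns; the year is a constant.
parts = pd.DataFrame({
    'year': 1999,
    'month': dates.dt.month,
    'day': dates.dt.day,
})

# errors='coerce' turns impossible dates (Feb 29 in a non-leap year) into NaT
# instead of raising.
replaced = pd.to_datetime(parts, errors='coerce')
print(replaced)
```

Whether NaT is the right outcome for a leap day depends on your data; Timestamp.replace would raise a ValueError for the same input.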

numpy.datetime64 objects are hard to work with. To update a value, it is normally easier to convert the date to a standard Python datetime object, do the change and then convert it back to a numpy.datetime64 value again:
import numpy as np
from datetime import datetime
dt64 = np.datetime64('2011-11-14T00:00:00.000000000')
# convert to timestamp:
# (no trailing 'Z' on the epoch: newer NumPy versions do not accept timezone suffixes)
ts = (dt64 - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')
# standard utctime from timestamp
dt = datetime.utcfromtimestamp(ts)
# get the new updated year
dt = dt.replace(year=2010)
# convert back to numpy.datetime64:
dt64 = np.datetime64(dt)
There might be simpler ways, but this works, at least.
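One possibly simpler way that stays inside NumPy is to split each value into "months since the start of its year" and "time since the start of its month", then reattach both to the new year. A minimal sketch, assuming nanosecond precision is acceptable; set_year is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def set_year(values, year):
    """Hypothetical helper: keep month, day and time, swap the year."""
    values = np.asarray(values, dtype='datetime64[ns]')
    months = values.astype('datetime64[M]')   # truncated to the start of the month
    years = values.astype('datetime64[Y]')    # truncated to the start of the year
    month_offset = months - years.astype('datetime64[M]')  # whole months into the year
    within_month = values - months            # day of month plus time of day
    target = np.datetime64(f'{year:04d}', 'Y').astype('datetime64[M]') + month_offset
    return target.astype('datetime64[ns]') + within_month

print(set_year(np.datetime64('2011-11-14T00:00:00.000000000'), 2010))
```

With this arithmetic a February 29 moved into a non-leap year rolls over to March 1 rather than raising an error.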

This vectorised solution gives the same result as iterating with pandas and x.replace(year=n), but it is at least 10x faster on large arrays.
Keep in mind that the year written into the datetime64 values should be a leap year. With the Python datetime library, datetime(2012,2,29).replace(year=2011) raises a ValueError. Here, the function 'replace_year' will simply move 2012-02-29 to 2011-03-01.
I'm using numpy v 1.13.1.
import numpy as np
import pandas as pd
def replace_year(x, year):
    """ Year must be a leap year for this to work """
    # Add number of days x is from JAN-01 to year-01-01
    x_year = np.datetime64(str(year)+'-01-01') + (x - x.astype('M8[Y]'))
    # Due to leap years calculate offset of 1 day for those days in non-leap year
    yr_mn = x.astype('M8[Y]') + np.timedelta64(59,'D')
    # np.int was removed from recent NumPy releases; the builtin int behaves the same here
    leap_day_offset = (yr_mn.astype('M8[M]') - yr_mn.astype('M8[Y]') - 1).astype(int)
    # However, due to days in non-leap years prior March-01,
    # correct for previous step by removing an extra day
    non_leap_yr_beforeMarch1 = (x.astype('M8[D]') - x.astype('M8[Y]')).astype(int) < 59
    non_leap_yr_beforeMarch1 = np.logical_and(non_leap_yr_beforeMarch1, leap_day_offset).astype(int)
    day_offset = np.datetime64('1970') - (leap_day_offset - non_leap_yr_beforeMarch1).astype('M8[D]')
    # Finally, apply the day offset
    x_year = x_year - day_offset
    return x_year
x = np.arange('2012-01-01', '2014-01-01', dtype='datetime64[h]')
x_datetime = pd.to_datetime(x)
x_year = replace_year(x, 1992)
x_datetime = x_datetime.map(lambda x: x.replace(year=1992))
print(x)
print(x_year)
print(x_datetime)
print(np.all(x_datetime.values == x_year))
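The leap-day caveat mentioned above can be seen with the standard library alone. The sketch below wraps datetime.replace so that February 29 rolls to March 1 when the target year has no leap day; replace_year_or_roll is a hypothetical name chosen for this illustration.

```python
from datetime import datetime

def replace_year_or_roll(dt, year):
    """Hypothetical helper: like dt.replace(year=year), but Feb 29 rolls to Mar 1."""
    try:
        return dt.replace(year=year)
    except ValueError:
        # For a valid year, only Feb 29 can fail: the target year has no such day.
        return dt.replace(year=year, month=3, day=1)

moved = replace_year_or_roll(datetime(2012, 2, 29), 2011)  # non-leap target year
kept = replace_year_or_roll(datetime(2012, 2, 29), 1992)   # leap target year
print(moved, kept)
```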

Related

Converting days to years in Pandas DataFrame

I am trying to find the difference between two dates in a pandas DataFrame. This is my code:
raw['CALCULATED_AGE'] = ((raw.COMMENCEMENT_DATE - raw.DATE_OF_BIRTH))
This gives me the following output:
[image: Pandas output column]
I just want to convert the days to years. Is there an easy way to do this?
Thank you so much.
You can use "relativedelta" and match it to your case:
from dateutil.relativedelta import relativedelta
rdelta = relativedelta(raw.COMMENCEMENT_DATE,raw.DATE_OF_BIRTH).years
Full code example:
create the data:
import pandas as pd
from dateutil.relativedelta import relativedelta
raw = pd.DataFrame({'COMMENCEMENT_DATE': ['3/10/2000', '3/11/2000', '3/12/2000'],
'DATE_OF_BIRTH': ['3/10/1990', '3/11/1991', '3/12/1990']})
raw['COMMENCEMENT_DATE'] = pd.to_datetime(raw['COMMENCEMENT_DATE'])
raw['DATE_OF_BIRTH'] = pd.to_datetime(raw['DATE_OF_BIRTH'])
Calc:
raw['CALCULATED_AGE'] = raw.apply(lambda x: relativedelta(x.COMMENCEMENT_DATE, x.DATE_OF_BIRTH).years, axis=1)
Output:
COMMENCEMENT_DATE DATE_OF_BIRTH CALCULATED_AGE
0 2000-03-10 1990-03-10 10
1 2000-03-11 1991-03-11 9
2 2000-03-12 1990-03-12 10
EDIT
Another solution, which also works for months (it needs import numpy as np):
raw['CALCULATED_AGE'] = (raw.COMMENCEMENT_DATE - raw.DATE_OF_BIRTH)/np.timedelta64(1, 'Y')
raw['CALCULATED_AGE'] = raw['CALCULATED_AGE'].astype(int)
If you want the result in months, just change 'Y' to 'M'.
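The division by np.timedelta64(1, 'Y') may be rejected by newer pandas releases, because a year is not a fixed number of days. A rough equivalent that relies only on day counts could look like the sketch below. DAYS_PER_YEAR is an assumed constant (the average Gregorian year), so the result is an approximation and can be off by one right around a birthday.

```python
import pandas as pd

raw = pd.DataFrame({
    'COMMENCEMENT_DATE': pd.to_datetime(['2000-03-10', '2000-03-11', '2000-03-12']),
    'DATE_OF_BIRTH': pd.to_datetime(['1990-03-10', '1991-03-11', '1990-03-12']),
})

DAYS_PER_YEAR = 365.2425  # assumption: average length of a Gregorian year

elapsed = raw['COMMENCEMENT_DATE'] - raw['DATE_OF_BIRTH']
# .dt.days gives whole days; dividing and truncating gives approximate full years
raw['CALCULATED_AGE'] = (elapsed.dt.days / DAYS_PER_YEAR).astype(int)
print(raw)
```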

How to get the age of a person when having their dates of birth in a column of the type 'object'?

I have a dataframe in Python with several columns, one of which is date of birth of persons. The data type of the date of birth column is object. I would like to get the age of persons as an integer number.
For example: date of birth = 23.6.2005 gives (as today is 1.5.2021) age = 15 (years)
The ages are to be returned in a column of the dataframe.
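This may work for you:
import pandas as pd
today = pd.Timestamp.today().normalize()
# the column has dtype 'object', so parse it first (the day comes first in '23.6.2005')
dob = pd.to_datetime(df["DOB"], dayfirst=True)
df["age"] = ((today - dob).dt.days // 365.25).astype(int)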
You can use datetime.date.today() to get the current date, and subtract the column from that, and then divide by a timedelta of one year for a reasonably accurate measurement.
import datetime
import pandas as pd
p = pd.DataFrame({'birthdate': [datetime.date(1969,5,21), datetime.date(1996, 8, 15), datetime.date(1981, 4, 30)]})
# birthdate
# 0 1969-05-21
# 1 1996-08-15
# 2 1981-04-30
p['age'] = (datetime.date.today() - p['birthdate']) // datetime.timedelta(days=365.25)
# birthdate age
# 0 1969-05-21 51
# 1 1996-08-15 24
# 2 1981-04-30 40
You can get the age by subtracting your date-of-birth column (converted from 'object' to datetime format) from pd.Timestamp.now(). Then divide by np.timedelta64(1, 'Y'), which means a time difference of one year. (Use the NumPy function, since pandas has no corresponding time difference with units up to a year.)
df['age'] = (pd.Timestamp.now() - pd.to_datetime(df['date of birth'], dayfirst=True)) // np.timedelta64(1, 'Y')
Rounding down to an integer age is achieved automatically by the floor division //.
Demo
import numpy as np
import pandas as pd
df = pd.DataFrame({'date of birth': ['23.6.2005', '22.4.1995', '12.12.2002']})
df['age'] = (pd.Timestamp.now() - pd.to_datetime(df['date of birth'], dayfirst=True)) // np.timedelta64(1, 'Y')
print(df)
date of birth age
0 23.6.2005 15
1 22.4.1995 26
2 12.12.2002 18
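Dividing by 365.25 days or by a one-year timedelta is only approximate around birthdays. If an exact age matters, one option is to compare calendar fields directly. A minimal sketch, assuming the dates are strings in day.month.year form as in the question; age_in_years is a hypothetical helper, and today is fixed so the output is reproducible:

```python
import pandas as pd

def age_in_years(dob_strings, today):
    """Hypothetical helper: full years completed on the date `today`."""
    dob = pd.to_datetime(dob_strings, format='%d.%m.%Y')
    # True where this year's birthday has already happened (or is today)
    had_birthday = (today.month > dob.dt.month) | (
        (today.month == dob.dt.month) & (today.day >= dob.dt.day)
    )
    return today.year - dob.dt.year - (~had_birthday).astype(int)

df = pd.DataFrame({'date of birth': ['23.6.2005', '22.4.1995', '12.12.2002']})
df['age'] = age_in_years(df['date of birth'], pd.Timestamp('2021-05-01'))
print(df)
```

In real use you would pass pd.Timestamp.now() instead of the fixed date.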

Identify if a record is within 1 year in Pandas [duplicate]

This question already has answers here:
How to subtract a day from a date?
(7 answers)
Closed 2 years ago.
I am trying to identify records that expire within 1 year of today's date. This is the code I have and it doesn't work because I can't add or subtract integers from dates. Can someone assist? I know this is simple.
from datetime import date
today = date.today()
mask = (df['[PCW]Contract (Expiration Date)'] <= today + 365)
You need to use time deltas.
from datetime import timedelta
one_year = timedelta(days=365)
mask = (df['[PCW]Contract (Expiration Date)'] <= today + one_year)
Assuming you are using datetime objects in your dataframe.
UPDATE
import pandas as pd
import numpy as np
df = pd.DataFrame({'[PCW]Contract (Expiration Date)' :["2020-01-21T02:37:21", '2021-01-21T02:37:21', '2022-01-21T02:37:21']})
s = pd.to_datetime(df['[PCW]Contract (Expiration Date)'])
one_year = np.timedelta64(365,'D')
today = np.datetime64('today')
mask = s <= today + one_year
mask
Output
0 True
1 True
2 False
Name: [PCW]Contract (Expiration Date), dtype: bool
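Note that 365 days is not always one calendar year. If "within 1 year" should mean the same date next year, pd.DateOffset(years=1) may be closer to the intent. The sketch below also excludes records that have already expired, which is an assumption about what is wanted; today is fixed here only to make the result reproducible.

```python
import pandas as pd

expiry = pd.to_datetime(pd.Series([
    '2020-01-21T02:37:21', '2021-01-21T02:37:21', '2022-01-21T02:37:21',
]))

today = pd.Timestamp('2021-01-01')       # use pd.Timestamp.today() in practice
cutoff = today + pd.DateOffset(years=1)  # same calendar date one year later

# Assumption: only records that expire between today and the cutoff are wanted.
mask = (expiry >= today) & (expiry <= cutoff)
print(mask)
```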

Python datetime delta format

I am attempting to find records in my dataframe that are 30 days old or older. I pretty much have everything working but I need to correct the format of the Age column. Most everything in the program is stuff I found on stack overflow, but I can't figure out how to change the format of the delta that is returned.
import pandas as pd
import datetime as dt
file_name = '/Aging_SRs.xls'
sheet = 'All'
df = pd.read_excel(io=file_name, sheet_name=sheet)
df.rename(columns={'SR Create Date': 'Create_Date', 'SR Number': 'SR'}, inplace=True)
tday = dt.date.today()
tdelta = dt.timedelta(days=30)
aged = tday - tdelta
df = df.loc[df.Create_Date <= aged, :]
# Sets the SR as the index.
df = df.set_index('SR', drop = True)
# Created the Age column.
df.insert(2, 'Age', 0)
# Calculates the days between the Create Date and Today.
df['Age'] = df['Create_Date'].subtract(tday)
The calculation in the last line above gives me the result, but it looks like -197 days +09:39:12 and I need it to just be a positive number 197. I have also tried to search using the python, pandas, and datetime keywords.
df.rename(columns={'Create_Date': 'SR Create Date'}, inplace=True)
writer = pd.ExcelWriter('output_test.xlsx')
df.to_excel(writer)
writer.save()
I can't see your example data, but if I understand correctly and you're just trying to get the absolute value of the number of days of a timedelta, this should work:
df['Age'] = abs(df['Create_Date'].subtract(tday).dt.days)
Explanation:
Given a dataframe with a timedelta column:
>>> df
delta
0 26523 days 01:57:59
1 -1601 days +01:57:59
You can extract just the number of days as an int using dt.days:
>>> df['delta'].dt.days
0 26523
1 -1601
Name: delta, dtype: int64
Then, all you need to do is wrap that in a call to abs to get the absolute value of that int:
>>> abs(df.delta.dt.days)
0 26523
1 1601
Name: delta, dtype: int64
Here is what I worked out for basically the same issue.
# create timestamp for today, normalize to 00:00:00
today = pd.to_datetime('today', ).normalize()
# match timezone with datetimes in df so subtraction works
today = today.tz_localize(df['posted'].dt.tz)
# create 'age' column for days old
df['age'] = (today - df['posted']).dt.days
This is pretty much the same as the answer above, but without the call to abs().
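Whether the answer should be 196 or 197 depends on whether you count full 24-hour periods or calendar days. The following sketch shows both on made-up data (the created values and the fixed today are assumptions for the illustration):

```python
import pandas as pd

created = pd.Series(pd.to_datetime(['2021-01-01 14:20:48', '2021-07-01 08:00:00']))
today = pd.Timestamp('2021-07-17')  # fixed so the result is reproducible

# Full 24-hour periods elapsed since creation
elapsed_days = (today - created).dt.days

# Calendar days: drop the time of day before subtracting
calendar_days = (today - created.dt.normalize()).dt.days

print(elapsed_days.tolist(), calendar_days.tolist())
```

Subtracting in the order today - created keeps the result positive, so abs() is not needed.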

Converting string 'yyyy-mm-dd' into datetime [duplicate]

This question already has answers here:
Convert string "Jun 1 2005 1:33PM" into datetime
(26 answers)
Closed 7 years ago.
I have raw input from the user such as "2015-01-30". For the query I am using, the date has to be entered as a string in the form "yyyy-mm-dd".
I would like to increment the date by 1 month at the end of my loop, so that "2015-01-30" becomes "2015-02-27" (ideally the last business day of the next month). I was hoping someone could help me. I am using Python; the reason I want to convert to datetime is that I found a function to add 1 month.
Ideally my two questions to be answered are (in Python):
1) how to convert string "yyyy-mm-dd" into a python datetime and convert back into string after applying a timedelta function
2) AND/or how to add 1 month to string "yyyy-mm-dd"
Maybe these examples will help you get an idea:
from dateutil.relativedelta import relativedelta
import datetime
date1 = datetime.datetime.strptime("2015-01-30", "%Y-%m-%d").strftime("%d-%m-%Y")
print(date1)
today = datetime.date.today()
print(today)
addMonths = relativedelta(months=3)
future = today + addMonths
print(future)
If you import datetime, it gives you more options for managing date and time variables. The example code above shows how it works.
It is also very useful if, for example, you would like to add a number of days, months or years to a certain date.
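Applied to the string from the question, the round trip could look like this. It is a sketch that assumes the third-party dateutil package is installed; relativedelta clips the day to the end of the month when the target month is shorter.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

date_string = "2015-01-30"

# string -> date
start = datetime.strptime(date_string, "%Y-%m-%d").date()

# add one calendar month (Jan 30 has no counterpart in February, so it is clipped)
next_month = start + relativedelta(months=1)

# date -> string
result = next_month.strftime("%Y-%m-%d")
print(result)
```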
Edit:
To answer your question below this post, I would suggest you look at "calendar".
For example:
import calendar
january2012 = calendar.monthrange(2012,1)
print(january2012)
february2008 = calendar.monthrange(2008,2)
print(february2008)
This returns the weekday of the first day of the month (0 is Monday) and the number of days in the month.
With that you can calculate the last workday of the month.
Here is more information about it: Link
Also have a look here, it looks like something you could use: Link
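As a rough illustration of that calculation, the two values from calendar.monthrange are enough to find the last Monday-to-Friday day of a month. last_workday is a hypothetical helper name, and holidays are not considered.

```python
import calendar
from datetime import date

def last_workday(year, month):
    """Hypothetical helper: last Monday-to-Friday date of the given month."""
    first_weekday, n_days = calendar.monthrange(year, month)  # 0 = Monday
    last_weekday = (first_weekday + n_days - 1) % 7           # weekday of the last day
    step_back = max(0, last_weekday - 4)                      # Sat -> 1, Sun -> 2
    return date(year, month, n_days - step_back)

print(calendar.monthrange(2008, 2))
print(last_workday(2015, 2))
```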
converting string 'yyyy-mm-dd' into datetime/date python
from datetime import date
date_string = '2015-01-30'
now = date(*map(int, date_string.split('-')))
# or now = datetime.strptime(date_string, '%Y-%m-%d').date()
the last business day of the next month
from datetime import timedelta
DAY = timedelta(1)
last_bday = (now.replace(day=1) + 2*31*DAY).replace(day=1) - DAY
while last_bday.weekday() > 4: # Sat, Sun
    last_bday -= DAY
print(last_bday)
# -> 2015-02-27
It doesn't take into account holidays.
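Put together as a string-in, string-out function, the steps above might be sketched like this. The function name is made up for the example, and holidays are still ignored.

```python
from datetime import datetime, timedelta

def last_business_day_of_next_month(date_string):
    """Hypothetical helper: 'yyyy-mm-dd' in, 'yyyy-mm-dd' out."""
    d = datetime.strptime(date_string, '%Y-%m-%d').date()
    # The 1st of the current month plus 62 days always lands in the month after next.
    first_of_month_after_next = (d.replace(day=1) + timedelta(days=62)).replace(day=1)
    last_day = first_of_month_after_next - timedelta(days=1)
    while last_day.weekday() > 4:  # 5 = Saturday, 6 = Sunday
        last_day -= timedelta(days=1)
    return last_day.strftime('%Y-%m-%d')

print(last_business_day_of_next_month('2015-01-30'))
```

Because the arithmetic works on dates rather than month numbers, a December input rolls into January of the next year without special handling.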
You can use a one-liner, that takes the datetime, adds a month (using a defined function), and converts back to a string:
x = add_months(datetime.datetime(*[int(item) for item in x.split('-')]), 1).strftime("%Y-%m-%d")
>>> import datetime, calendar
>>> x = "2015-01-30"
>>> x = add_months(datetime.datetime(*[int(item) for item in x.split('-')]), 1).strftime("%Y-%m-%d")
>>> x
'2015-02-28'
>>>
add_months:
def add_months(sourcedate,months):
    month = sourcedate.month - 1 + months
    # integer division, so that year stays an int on Python 3
    year = sourcedate.year + month // 12
    month = month % 12 + 1
    day = min(sourcedate.day,calendar.monthrange(year,month)[1])
    return datetime.date(year,month,day)
To convert a string of that format into a Python date object:
In [1]: import datetime
In [2]: t = "2015-01-30"
In [3]: d = datetime.date(*(int(s) for s in t.split('-')))
In [4]: d
Out[4]: datetime.date(2015, 1, 30)
To move forward to the last day of next month:
In [4]: d
Out[4]: datetime.date(2015, 1, 30)
In [5]: new_month = (d.month + 1) if d.month != 12 else 1
In [6]: new_year = d.year if d.month != 12 else d.year + 1
In [7]: import calendar
In [8]: new_day = calendar.monthrange(new_year, new_month)[1]
In [9]: d = d.replace(year=new_year,month=new_month,day=new_day)
In [10]: d
Out[10]: datetime.date(2015, 2, 28)
And this datetime.date object can be easily converted to a 'YYYY-MM-DD' string:
In [11]: str(d)
Out[11]: '2015-02-28'
EDIT:
To get the last business day (i.e. Monday-Friday) of the month:
In [8]: new_day = calendar.monthrange(new_year, new_month)[1]
In [9]: d = d.replace(year=new_year,month=new_month,day=new_day)
In [10]: day_of_the_week = d.isoweekday()
In [11]: if day_of_the_week > 5:
   ....:     adj_new_day = new_day - (day_of_the_week - 5)
   ....:     d = d.replace(day=adj_new_day)
   ....:
In [11]: d
Out[11]: datetime.date(2015, 2, 27)
