Add timedelta to a date column above weeks - python

How would I add 1 year to a column?
I've tried using map and apply but I failed miserably.
I also wonder why pl.date() accepts integers while it advertises that it only accepts str or pli.Expr.
A small hack workaround is:
col = pl.col('date').dt
df = df.with_column(pl.when(pl.col(column).is_not_null())
.then(pl.date(col.year() + 1, col.month(), col.day()))
.otherwise(pl.date(col.year() + 1,col.month(), col.day()))
.alias("date"))
but this won't work for months or days. I can't just add a number or I'll get a:
> thread 'thread '<unnamed>' panicked at 'invalid or out-of-range date<unnamed>',
' panicked at '/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rsinvalid or out-of-range date:', 173:/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rs51
:note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Most likely because day and month cycle while year goes to infinity.
I could also do this:
df = df.with_column(
pl.when(col.month() == 1)
.then(pl.date(col.year(), 2, col.day()))
.when(col.month() == 2)
.then(pl.date(col.year(), 3, col.day()))
.when(col.month() == 3)
.then(pl.date(col.year(), 4, col.day()))
.when(col.month() == 4)
.then(pl.date(col.year(), 5, col.day()))
.when(col.month() == 5)
.then(pl.date(col.year(), 6, col.day()))
.when(col.month() == 6)
.then(pl.date(col.year(), 7, col.day()))
.when(col.month() == 7)
.then(pl.date(col.year(), 8, col.day()))
.when(col.month() == 8)
.then(pl.date(col.year(), 9, col.day()))
.when(col.month() == 9)
.then(pl.date(col.year(), 10, col.day()))
.when(col.month() == 10)
.then(pl.date(col.year(), 11, col.day()))
.when(col.month() == 11)
.then(pl.date(col.year(), 12, col.day()))
.otherwise(pl.date(col.year() + 1, 1, 1))
.alias("valid_from")
)

Polars supports addition and subtraction with Python's timedelta objects. For units larger than a week, however, things get more complicated, because months have different lengths and leap years must be taken into account.
For this, polars provides offset_by in the dt namespace.
from datetime import datetime
import polars as pl
(pl.DataFrame({
"dates": pl.date_range(datetime(2000, 1, 1), datetime(2026, 1, 1), "1y")
}).with_columns([
pl.col("dates").dt.offset_by("1y").alias("dates_and_1_yr")
]))
shape: (27, 2)
┌─────────────────────┬─────────────────────┐
│ dates ┆ dates_and_1_yr │
│ --- ┆ --- │
│ datetime[ns] ┆ datetime[ns] │
╞═════════════════════╪═════════════════════╡
│ 2000-01-01 00:00:00 ┆ 2001-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2001-01-01 00:00:00 ┆ 2002-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2002-01-01 00:00:00 ┆ 2003-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2003-01-01 00:00:00 ┆ 2004-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2023-01-01 00:00:00 ┆ 2024-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2024-01-01 00:00:00 ┆ 2025-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2025-01-01 00:00:00 ┆ 2026-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2026-01-01 00:00:00 ┆ 2027-01-01 00:00:00 │
└─────────────────────┴─────────────────────┘
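To illustrate why units above a week need calendar logic, here is a minimal plain-Python sketch of a month offset. It is not how polars implements offset_by; the helper name add_months and the choice to clamp the day to the end of a shorter month are assumptions made for this example.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d by a whole number of months, clamping the day to the month's end."""
    # Use a zero-based month count so divmod handles the year rollover,
    # including negative offsets.
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    # monthrange returns (weekday of the first day, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


print(add_months(date(2020, 1, 31), 1))   # 2020-02-29 (leap year)
print(add_months(date(2021, 1, 31), 1))   # 2021-02-28
print(add_months(date(2020, 2, 29), 12))  # 2021-02-28
print(add_months(date(2021, 12, 15), 1))  # 2022-01-15
```

A fixed timedelta cannot express this, because the number of days to add depends on the starting date.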

You can use polars.apply and dateutil.relativedelta, which works for years, months, days and much more, but can be slow for large amounts of data.
from datetime import date
from dateutil.relativedelta import relativedelta
import polars as pl
df = pl.DataFrame(pl.date_range(date(2019, 1, 1), date(2020, 10, 1), '3mo', name='date'))
df.with_column(pl.col('date').apply(lambda x: x + relativedelta(years=1)))
Update: Since the offset_by method is now also available for months, it should be used whenever possible (see accepted answer). I leave this answer here because the approach can be used for more complicated cases that are not supported by offset_by.

Related

Polars Adding Days to a date [duplicate]

This question already has an answer here:
How to add a duration to datetime in Python polars
(1 answer)
Closed 17 days ago.
I am using Polars in Python to try to add thirty days to a date.
I run the code and get no errors, but I also get no new dates.
Can anyone see my mistake?
import polars as pl
mydf = pl.DataFrame(
{"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"]})
mydf = mydf.with_column(
pl.col("start_date").str.strptime(pl.Date, "%Y-%m-%d"),
)
# Generate the days above and below
mydf = mydf.with_column(
pl.col('start_date') + pl.duration(days=30).alias('date_plus_delta')
)
mydf = mydf.with_column(
pl.col('start_date') + pl.duration(days=-30).alias('date_minus_delta')
)
print(mydf)
shape: (3, 1)
┌────────────┐
│ start_date │
│ --- │
│ date │
╞════════════╡
│ 2020-01-02 │
│ 2020-01-03 │
│ 2020-01-04 │
└────────────┘
Quick References
The Manual: https://pola-rs.github.io/polars-book/user-guide/howcani/data/timestamps.html
strftime formats: https://docs.rs/chrono/latest/chrono/format/strftime/index.html
SO Answer from a previous Post: How to add a duration to datetime in Python polars
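You're supposed to call .alias on the entire operation pl.col('start_date') + pl.duration(days=30). Instead, you're only aliasing pl.duration(days=30). The method call binds more tightly than +, so the sum keeps the name of its left-hand column, start_date. Each with_column call therefore overwrote start_date instead of adding a column: the first added 30 days and the second subtracted them again, which is why the output looks unchanged.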
So the correct way would be:
import polars as pl
mydf = pl.DataFrame({"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"]})
mydf = mydf.with_columns(pl.col("start_date").str.strptime(pl.Date, r"%Y-%m-%d"))
# Generate the days above and below
mydf = mydf.with_columns((pl.col('start_date') + pl.duration(days=30)).alias('date_plus_delta'))
mydf = mydf.with_columns((pl.col('start_date') - pl.duration(days=30)).alias('date_minus_delta'))
print(mydf)
Output
shape: (3, 3)
┌────────────┬─────────────────┬──────────────────┐
│ start_date ┆ date_plus_delta ┆ date_minus_delta │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ date │
╞════════════╪═════════════════╪══════════════════╡
│ 2020-01-02 ┆ 2020-02-01 ┆ 2019-12-03 │
│ 2020-01-03 ┆ 2020-02-02 ┆ 2019-12-04 │
│ 2020-01-04 ┆ 2020-02-03 ┆ 2019-12-05 │
└────────────┴─────────────────┴──────────────────┘

how to convert an empty pandas Dataframe into a polars Dataframe

I have defined a pandas DataFrame as follows:
df_tmp = pd.DataFrame({'EDT': pd.Series(dtype='datetime64[ns]'),
'FSPB': pd.Series(dtype='str'),
'FS_LA': pd.Series(dtype='str'),
'lA': pd.Series(dtype='int'),
'avg': pd.Series(dtype='float64'),
'nw': pd.Series(dtype='float64')})
Is there any way to convert the above into an empty polars DataFrame?
According to the polars docs, polars DataFrames can take a pandas DataFrame in their constructor, so:
import pandas as pd
import polars as pl
df_tmp = pd.DataFrame({'EDT': pd.Series(dtype='datetime64[ns]'),
'FSPB': pd.Series(dtype='str'),
'FS_LA': pd.Series(dtype='str'),
'lA': pd.Series(dtype='int'),
'avg': pd.Series(dtype='float64'),
'nw': pd.Series(dtype='float64')})
df = pl.DataFrame(df_tmp)
should work.
import polars as pl
import pandas as pd
pandas_df = pd.DataFrame({'EDT': pd.Series(dtype='datetime64[ns]'),
'FSPB': pd.Series(dtype='str'),
'FS_LA': pd.Series(dtype='str'),
'lA': pd.Series(dtype='int'),
'avg': pd.Series(dtype='float64'),
'nw': pd.Series(dtype='float64')})
pl.from_pandas(pandas_df)
shape: (0, 6)
┌──────────────┬──────┬───────┬─────┬─────┬─────┐
│ EDT ┆ FSPB ┆ FS_LA ┆ lA ┆ avg ┆ nw │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ns] ┆ str ┆ str ┆ i64 ┆ f64 ┆ f64 │
╞══════════════╪══════╪═══════╪═════╪═════╪═════╡
└──────────────┴──────┴───────┴─────┴─────┴─────┘

How do I calculate a 12-month return based on monthly observations within dataframe in Python?

How to calculate rolling cumulative product on Pandas DataFrame.
I have a time series of returns in a pandas DataFrame. How can I calculate a rolling annualized alpha for the relevant columns in the DataFrame? I would normally use Excel and do: =PRODUCT(1+[trailing 12 months])-1
My DataFrame looks like the below (a small portion):
Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 \
2009-08-31 00:00:00 --- --- 0.1489 0.072377
2009-09-30 00:00:00 --- --- 0.0662 0.069608
2009-10-31 00:00:00 --- --- -0.0288 -0.016967
2009-11-30 00:00:00 --- --- -0.0089 0.0009
2009-12-31 00:00:00 --- --- 0.044 0.044388
2010-01-31 00:00:00 --- --- -0.0301 -0.054953
2010-02-28 00:00:00 --- --- -0.0014 0.00821
2010-03-31 00:00:00 --- --- 0.0405 0.049959
2010-04-30 00:00:00 --- --- 0.0396 -0.007146
2010-05-31 00:00:00 --- --- -0.0736 -0.079834
2010-06-30 00:00:00 --- --- -0.0658 -0.028655
2010-07-31 00:00:00 --- --- 0.0535 0.038826
2010-08-31 00:00:00 --- --- -0.0031 -0.013885
2010-09-30 00:00:00 --- --- 0.0503 0.045781
2010-10-31 00:00:00 --- --- 0.0499 0.025335
2010-11-30 00:00:00 --- --- 0.012 -0.007495
I've tried the code below provided for a similar question, but it looks like it doesn't work anymore ...
import pandas as pd
import numpy as np
# your DataFrame; df = ...
pd.rolling_apply(df, 12, lambda x: np.prod(1 + x) - 1)
... and the pages I'm redirected to don't seem relevant.
Ideally, I'd like to reproduce the DataFrame with 12-month returns instead of monthly ones, so that I can look up the relevant 12-month return for any given month.
If I understand correctly, you could try something like the below:
import pandas as pd
import numpy as np
# define a dummy DataFrame of monthly growth factors (1 + monthly return)
df = pd.DataFrame(1 + np.random.rand(20), columns=['returns'])
# compute 12-month rolling returns: product of the last 12 factors, minus 1
df_roll = df.rolling(window=12).apply(np.prod) - 1
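If the DataFrame holds plain monthly returns, as in the question, rather than growth factors, the Excel formula =PRODUCT(1+[trailing 12 months])-1 could be sketched as below. The values are the "Unnamed: 3" column from the question; the variable names are illustrative, and raw=True is an optional choice that passes NumPy arrays to np.prod.

```python
import numpy as np
import pandas as pd

# Monthly simple returns copied from the "Unnamed: 3" column in the question.
returns = pd.DataFrame({
    "Unnamed: 3": [0.1489, 0.0662, -0.0288, -0.0089, 0.044, -0.0301,
                   -0.0014, 0.0405, 0.0396, -0.0736, -0.0658, 0.0535,
                   -0.0031, 0.0503, 0.0499, 0.012],
})

# Turn returns into growth factors, multiply each trailing 12-month window,
# then convert the compounded factor back into a return.
rolling_12m = (1 + returns).rolling(window=12).apply(np.prod, raw=True) - 1

print(rolling_12m)
```

The first 11 rows are NaN because a full 12-month window is not yet available there.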

Extract multi-year three month series (winter) from pandas dataframe

I've got a pandas DataFrame containing 70 years with hourly data, looking like this:
pressure
2015-06-01 18:00:00 945.6
2015-06-01 19:00:00 945.6
2015-06-01 20:00:00 945.4
2015-06-01 21:00:00 945.4
2015-06-01 22:00:00 945.3
I want to extract the winter months (D-J-F) from every year and generate a new DataFrame with a series of winters.
I found a lot of complicated approaches (e.g. extracting df.index.month as a new column and then addressing that column afterwards), but is there a straightforward way to get the winter months?
You can use map():
import datetime
import pandas as pd
df = pd.DataFrame({'date' : [datetime.date(2015, 11, 1), datetime.date(2015, 12, 1), datetime.date(2015, 1, 1), datetime.date(2015, 2, 1)],
'pressure': [1,2,3,4]})
winter_months = [12, 1, 2]
print(df)
# date pressure
# 0 2015-11-01 1
# 1 2015-12-01 2
# 2 2015-01-01 3
# 3 2015-02-01 4
df = df[df["date"].map(lambda t: t.month in winter_months)]
print(df)
# date pressure
# 1 2015-12-01 2
# 2 2015-01-01 3
# 3 2015-02-01 4
EDIT: I noticed that in your example the dates are the dataframe's index. This still works:
df = df[df.index.map(lambda t: t.month in winter_months)]
I just found that
df[(df.index.month==12) | (df.index.month==1) | (df.index.month==2)]
works fine.
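The same filter can be written more compactly with isin on the index's month attribute. The sketch below uses made-up daily data (the question has hourly data) and adds a hypothetical "season" column so that each December is grouped with the following January and February; that grouping convention is an assumption, not something the question specifies.

```python
import pandas as pd

# Made-up daily data covering two years, with a DatetimeIndex as in the question.
idx = pd.date_range("2015-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({"pressure": range(len(idx))}, index=idx)

# Keep only December, January and February.
winter = df[df.index.month.isin([12, 1, 2])]

# Label each row with the year in which its winter ends,
# so December 2015 belongs to the winter of 2016.
season = winter.index.year + (winter.index.month == 12).astype(int)
winter = winter.assign(season=season)

# One aggregate per winter, here the mean pressure.
print(winter.groupby("season")["pressure"].mean())
```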

Python checking daytime

Basically, I want my script to pause between 4 and 5 AM. The only way to do this I've come up with so far is this:
seconds_into_day = time.time() % (60*60*24)
if 60*60*4 < seconds_into_day < 60*60*5:
    sleep(time_left_till_5am)
Any "proper" way to do this? Aka some built-in function/lib for calculating time; rather than just using seconds all the time?
You want datetime
The datetime module supplies classes for manipulating dates and times in both simple and complex ways
If you use the hour attribute of datetime.now(), you'll get the current hour:
from datetime import datetime
from time import sleep
datetimenow = datetime.now()
if datetimenow.hour in range(4, 5):
    sleep(time_left_till_5am)
You can calculate time_left_till_5am as (59 - datetimenow.minute) * 60 + (60 - datetimenow.second).
Python has a built-in datetime library: http://docs.python.org/library/datetime.html
This should probably get you what you're after:
import datetime as dt
from time import sleep
now = dt.datetime.now()
if now.hour >= 4 and now.hour < 5:
    # whole minutes remaining in the hour, plus the seconds left in the current minute
    sleep((59 - now.minute)*60 + (60 - now.second))
OK, the above works, but here's the purer, less error-prone solution (and what I was originally thinking of but suddenly forgot how to do):
import datetime as dt
from time import sleep
now = dt.datetime.now()
pause = dt.datetime(now.year, now.month, now.day, 4)
start = dt.datetime(now.year, now.month, now.day, 5)
if now >= pause and now < start:
    sleep((start - now).seconds)
That's where my original "timedelta" comment came from -- what you get from subtracting two datetime objects is a timedelta object (which in this case we pull the 'seconds' attribute from).
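One caveat about the seconds attribute: it holds only the seconds component of the timedelta (0 to 86399) and ignores whole days, while total_seconds() returns the full duration. A small sketch, where the helper name seconds_until is made up for illustration:

```python
import datetime as dt


def seconds_until(hour: int, now: dt.datetime) -> float:
    """Seconds from now until the next occurrence of hour:00, today or tomorrow."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # The target time has already passed today, so aim for tomorrow.
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


delta = dt.timedelta(days=1, seconds=30)
print(delta.seconds)          # 30 (the day is not included)
print(delta.total_seconds())  # 86430.0

print(seconds_until(5, dt.datetime(2010, 7, 5, 4, 30)))  # 1800.0
```

For a pause shorter than a day both give the same number of whole seconds, but total_seconds() stays correct if the window ever spans more than 24 hours.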
The following code covers the more general case where a script needs to pause during any fixed window of less than 24 hours duration. Example: must sleep between 11:00 PM and 01:00 AM.
import datetime as dt
def sleep_duration(sleep_from, sleep_to, now=None):
    # sleep_* are datetime.time objects
    # now is a datetime.datetime object
    if now is None:
        now = dt.datetime.now()
    duration = 0
    lo = dt.datetime.combine(now, sleep_from)
    hi = dt.datetime.combine(now, sleep_to)
    if lo <= now < hi:
        duration = (hi - now).seconds
    elif hi < lo:
        if now >= lo:
            duration = (hi + dt.timedelta(hours=24) - now).seconds
        elif now < hi:
            duration = (hi - now).seconds
    return duration
tests = [
(4, 5, 3, 30),
(4, 5, 4, 0),
(4, 5, 4, 30),
(4, 5, 5, 0),
(4, 5, 5, 30),
(23, 1, 0, 0),
(23, 1, 0, 30),
(23, 1, 0, 59),
(23, 1, 1, 0),
(23, 1, 1, 30),
(23, 1, 22, 30),
(23, 1, 22, 59),
(23, 1, 23, 0),
(23, 1, 23, 1),
(23, 1, 23, 59),
]
for hfrom, hto, hnow, mnow in tests:
    sfrom = dt.time(hfrom)
    sto = dt.time(hto)
    dnow = dt.datetime(2010, 7, 5, hnow, mnow)
    print(sfrom, sto, dnow, sleep_duration(sfrom, sto, dnow))
and here's the output:
04:00:00 05:00:00 2010-07-05 03:30:00 0
04:00:00 05:00:00 2010-07-05 04:00:00 3600
04:00:00 05:00:00 2010-07-05 04:30:00 1800
04:00:00 05:00:00 2010-07-05 05:00:00 0
04:00:00 05:00:00 2010-07-05 05:30:00 0
23:00:00 01:00:00 2010-07-05 00:00:00 3600
23:00:00 01:00:00 2010-07-05 00:30:00 1800
23:00:00 01:00:00 2010-07-05 00:59:00 60
23:00:00 01:00:00 2010-07-05 01:00:00 0
23:00:00 01:00:00 2010-07-05 01:30:00 0
23:00:00 01:00:00 2010-07-05 22:30:00 0
23:00:00 01:00:00 2010-07-05 22:59:00 0
23:00:00 01:00:00 2010-07-05 23:00:00 7200
23:00:00 01:00:00 2010-07-05 23:01:00 7140
23:00:00 01:00:00 2010-07-05 23:59:00 3660
When dealing with dates and times in Python, I still prefer mxDateTime over Python's datetime module. Although the built-in module has improved greatly over the years, it's still rather awkward and lacking in comparison. If you're interested, look up mxDateTime: it's free to download and use, and it makes datetime math much easier.
import mx.DateTime as dt
from time import sleep
now = dt.now()
if 4 <= now.hour < 5:
    stop = dt.RelativeDateTime(hour=5, minute=0, second=0)
    secs_remaining = ((now + stop) - now).seconds
    sleep(secs_remaining)
