I am using Polars in Python to try to add thirty days to a date.
I run the code and get no errors, but I also get no new dates.
Can anyone see my mistake?
import polars as pl
mydf = pl.DataFrame(
{"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"]})
mydf = mydf.with_column(
pl.col("start_date").str.strptime(pl.Date, "%Y-%m-%d"),
)
# Generate the days above and below
mydf = mydf.with_column(
pl.col('start_date') + pl.duration(days=30).alias('date_plus_delta')
)
mydf = mydf.with_column(
pl.col('start_date') + pl.duration(days=-30).alias('date_minus_delta')
)
print(mydf)
shape: (3, 1)
┌────────────┐
│ start_date │
│ --- │
│ date │
╞════════════╡
│ 2020-01-02 │
│ 2020-01-03 │
│ 2020-01-04 │
└────────────┘
Quick References
The Manual: https://pola-rs.github.io/polars-book/user-guide/howcani/data/timestamps.html
strftime formats: https://docs.rs/chrono/latest/chrono/format/strftime/index.html
SO Answer from a previous Post: How to add a duration to datetime in Python polars
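You're supposed to call .alias on the entire operation pl.col('start_date') + pl.duration(days=30). Instead you're only aliasing pl.duration(days=30). A method call binds more tightly than +, so the sum keeps the name start_date and each call overwrites the existing column instead of adding a new one; adding 30 days and then subtracting 30 days leaves it looking unchanged.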
So the correct way would be:
import polars as pl
mydf = pl.DataFrame({"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"]})
mydf = mydf.with_columns(pl.col("start_date").str.strptime(pl.Date, r"%Y-%m-%d"))
# Generate the days above and below
mydf = mydf.with_columns((pl.col('start_date') + pl.duration(days=30)).alias('date_plus_delta'))
mydf = mydf.with_columns((pl.col('start_date') - pl.duration(days=30)).alias('date_minus_delta'))
print(mydf)
Output
shape: (3, 3)
┌────────────┬─────────────────┬──────────────────┐
│ start_date ┆ date_plus_delta ┆ date_minus_delta │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ date │
╞════════════╪═════════════════╪══════════════════╡
│ 2020-01-02 ┆ 2020-02-01 ┆ 2019-12-03 │
│ 2020-01-03 ┆ 2020-02-02 ┆ 2019-12-04 │
│ 2020-01-04 ┆ 2020-02-03 ┆ 2019-12-05 │
└────────────┴─────────────────┴──────────────────┘
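The root cause is operator precedence rather than anything specific to Polars: a method call such as .alias(...) binds more tightly than +, so without parentheses the new name is attached to the right-hand operand only. A minimal sketch with a toy Expr class (hypothetical, for illustration only, not the Polars API) makes the difference visible. It assumes that a sum keeps the name of its left operand.

```python
class Expr:
    """Toy stand-in for an expression object (hypothetical, not Polars)."""

    def __init__(self, text, name=None):
        self.text = text
        self.name = name

    def alias(self, name):
        # Return a copy of this expression carrying a new output name.
        return Expr(self.text, name)

    def __add__(self, other):
        # Assumption of this toy model: a sum keeps its left operand's name.
        return Expr(f"({self.text} + {other.text})", self.name)


start = Expr("start_date", name="start_date")
delta = Expr("30d")

# The method call binds tighter than "+": only `delta` is renamed.
wrong = start + delta.alias("date_plus_delta")
# Parentheses make alias() apply to the whole sum.
right = (start + delta).alias("date_plus_delta")

print(wrong.name)  # start_date
print(right.name)  # date_plus_delta
```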
I have defined a pandas DataFrame as follows:
df_tmp = pd.DataFrame({'EDT': pd.Series(dtype='datetime64[ns]'),
'FSPB': pd.Series(dtype='str'),
'FS_LA': pd.Series(dtype='str'),
'lA': pd.Series(dtype='int'),
'avg': pd.Series(dtype='float64'),
'nw': pd.Series(dtype='float64')})
Is there any way to convert the above into an empty polars DataFrame?
According to the polars docs, polars DataFrames can take a pandas DataFrame in their constructor, so:
import pandas as pd
import polars as pl
df_tmp = pd.DataFrame({'EDT': pd.Series(dtype='datetime64[ns]'),
'FSPB': pd.Series(dtype='str'),
'FS_LA': pd.Series(dtype='str'),
'lA': pd.Series(dtype='int'),
'avg': pd.Series(dtype='float64'),
'nw': pd.Series(dtype='float64')})
df = pl.DataFrame(df_tmp)
should work.
import polars as pl
import pandas as pd
pandas_df = pd.DataFrame({'EDT': pd.Series(dtype='datetime64[ns]'),
'FSPB': pd.Series(dtype='str'),
'FS_LA': pd.Series(dtype='str'),
'lA': pd.Series(dtype='int'),
'avg': pd.Series(dtype='float64'),
'nw': pd.Series(dtype='float64')})
pl.from_pandas(pandas_df)
shape: (0, 6)
┌──────────────┬──────┬───────┬─────┬─────┬─────┐
│ EDT ┆ FSPB ┆ FS_LA ┆ lA ┆ avg ┆ nw │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ns] ┆ str ┆ str ┆ i64 ┆ f64 ┆ f64 │
╞══════════════╪══════╪═══════╪═════╪═════╪═════╡
└──────────────┴──────┴───────┴─────┴─────┴─────┘
How to calculate rolling cumulative product on Pandas DataFrame.
I have a time series of returns in a pandas DataFrame. How can I calculate a rolling annualized alpha for the relevant columns in the DataFrame? I would normally use Excel and do: =PRODUCT(1+[trailing 12 months])-1
My DataFrame looks like the below (a small portion):
Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 \
2009-08-31 00:00:00 --- --- 0.1489 0.072377
2009-09-30 00:00:00 --- --- 0.0662 0.069608
2009-10-31 00:00:00 --- --- -0.0288 -0.016967
2009-11-30 00:00:00 --- --- -0.0089 0.0009
2009-12-31 00:00:00 --- --- 0.044 0.044388
2010-01-31 00:00:00 --- --- -0.0301 -0.054953
2010-02-28 00:00:00 --- --- -0.0014 0.00821
2010-03-31 00:00:00 --- --- 0.0405 0.049959
2010-04-30 00:00:00 --- --- 0.0396 -0.007146
2010-05-31 00:00:00 --- --- -0.0736 -0.079834
2010-06-30 00:00:00 --- --- -0.0658 -0.028655
2010-07-31 00:00:00 --- --- 0.0535 0.038826
2010-08-31 00:00:00 --- --- -0.0031 -0.013885
2010-09-30 00:00:00 --- --- 0.0503 0.045781
2010-10-31 00:00:00 --- --- 0.0499 0.025335
2010-11-30 00:00:00 --- --- 0.012 -0.007495
I've tried the code below provided for a similar question, but it looks like it doesn't work anymore ...
import pandas as pd
import numpy as np
# your DataFrame; df = ...
pd.rolling_apply(df, 12, lambda x: np.prod(1 + x) - 1)
... and the pages I'm redirected to don't seem relevant.
Ideally, I'd like to reproduce the DataFrame but with 12 month returns, not monthly so I can locate the relevant 12 month return depending on the month.
If I understand correctly, you could try something like the below:
import pandas as pd
import numpy as np
# dummy DataFrame of monthly growth factors (1 + return), not raw returns
df = pd.DataFrame(1 + np.random.rand(20), columns=['returns'])
# compound each trailing 12-month window, then subtract 1 to get the return
df_roll = df.rolling(window=12).apply(np.prod) - 1
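To make explicit what the rolling window computes, here is a plain-Python sketch of the Excel formula =PRODUCT(1+[trailing 12 months])-1. The function name rolling_compound_return is an assumption made for this illustration; the sample values are the "Unnamed: 3" column from the question.

```python
import math


def rolling_compound_return(returns, window=12):
    """Compounded return over each trailing `window` periods.

    Positions with fewer than `window` observations yield None.
    """
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
        else:
            trailing = returns[i + 1 - window : i + 1]
            out.append(math.prod(1 + r for r in trailing) - 1)
    return out


# Monthly returns from the "Unnamed: 3" column in the question.
monthly = [0.1489, 0.0662, -0.0288, -0.0089, 0.044, -0.0301,
           -0.0014, 0.0405, 0.0396, -0.0736, -0.0658, 0.0535,
           -0.0031, 0.0503, 0.0499, 0.012]

trailing_12m = rolling_compound_return(monthly)
print(trailing_12m[10])  # None: fewer than 12 observations so far
print(trailing_12m[11])  # first full 12-month compounded return
```

If the DataFrame holds raw returns rather than growth factors, the pandas equivalent would presumably be df.rolling(window=12).apply(lambda x: np.prod(1 + x) - 1), which is the lambda from the question applied through the newer rolling API.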
I've got a pandas DataFrame containing 70 years with hourly data, looking like this:
pressure
2015-06-01 18:00:00 945.6
2015-06-01 19:00:00 945.6
2015-06-01 20:00:00 945.4
2015-06-01 21:00:00 945.4
2015-06-01 22:00:00 945.3
I want to extract the winter months (D-J-F) from every year and generate a new DataFrame with a series of winters.
I found a lot of complicated approaches (e.g. extracting df.index.month as a new column and then addressing that column afterwards), but is there a straightforward way to get the winter months?
You can use map():
import datetime
import pandas as pd
df = pd.DataFrame({'date': [datetime.date(2015, 11, 1), datetime.date(2015, 12, 1),
                            datetime.date(2015, 1, 1), datetime.date(2015, 2, 1)],
                   'pressure': [1, 2, 3, 4]})
winter_months = [12, 1, 2]
print(df)
# date pressure
# 0 2015-11-01 1
# 1 2015-12-01 2
# 2 2015-01-01 3
# 3 2015-02-01 4
# keep only the rows whose month is December, January or February
df = df[df["date"].map(lambda t: t.month in winter_months)]
print(df)
# date pressure
# 1 2015-12-01 2
# 2 2015-01-01 3
# 3 2015-02-01 4
EDIT: I noticed that in your example the dates are the dataframe's index. This still works:
df = df[df.index.map(lambda t: t.month in winter_months)]
I just found that
df[(df.index.month==12) | (df.index.month==1) | (df.index.month==2)]
works fine.
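The filters above select the December, January and February rows but do not say which winter a row belongs to. If each winter should be treated as one season, December has to be grouped with the following January and February. A minimal sketch, where winter_label is a hypothetical helper name and labelling a winter by the year of its January is an assumed convention:

```python
import datetime


def winter_label(ts):
    """Return the winter a timestamp belongs to, or None outside D-J-F.

    Assumed convention: a winter is labelled by the year of its January,
    so December 2014 and February 2015 both belong to winter 2015.
    """
    if ts.month == 12:
        return ts.year + 1
    if ts.month in (1, 2):
        return ts.year
    return None


stamps = [
    datetime.datetime(2014, 12, 31, 23),
    datetime.datetime(2015, 1, 1, 0),
    datetime.datetime(2015, 6, 1, 18),
    datetime.datetime(2015, 12, 1, 0),
]
for ts in stamps:
    print(ts, winter_label(ts))
```

Because the helper only reads .year and .month, it should also accept pandas timestamps, so it could probably be applied with df.index.map(winter_label) and the rows then grouped by that label.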
Basically, I want my script to pause between 4 and 5 AM. The only way to do this I've come up with so far is this:
seconds_into_day = time.time() % (60*60*24)
if 60*60*4 < seconds_into_day < 60*60*5:
sleep(time_left_till_5am)
Any "proper" way to do this? Aka some built-in function/lib for calculating time; rather than just using seconds all the time?
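You want datetime:
The datetime module supplies classes for manipulating dates and times in both simple and complex ways.
If you read the hour attribute of datetime.now(), you'll get the current hour:
from datetime import datetime
from time import sleep
datetimenow = datetime.now()
if datetimenow.hour == 4:
    time_left_till_5am = (59 - datetimenow.minute) * 60 + (60 - datetimenow.second)
    sleep(time_left_till_5am)
You can calculate time_left_till_5am by multiplying 59 - datetimenow.minute by 60 and adding 60 - datetimenow.second. Using 60 - datetimenow.minute instead would overshoot by one minute, because the partial minute is already covered by the seconds term.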
Python has a built-in datetime library: http://docs.python.org/library/datetime.html
This should probably get you what you're after:
import datetime as dt
from time import sleep
now = dt.datetime.now()
if now.hour >= 4 and now.hour < 5:
    # seconds until 05:00; 59 rather than 60 because the seconds term covers the partial minute
    sleep((59 - now.minute) * 60 + (60 - now.second))
OK, the above works, but here's the purer, less error-prone solution (and what I was originally thinking of but suddenly forgot how to do):
import datetime as dt
from time import sleep
now = dt.datetime.now()
pause = dt.datetime(now.year, now.month, now.day, 4)
start = dt.datetime(now.year, now.month, now.day, 5)
if now >= pause and now < start:
    sleep((start - now).seconds)
That's where my original "timedelta" comment came from -- what you get from subtracting two datetime objects is a timedelta object (which in this case we pull the 'seconds' attribute from).
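One caveat with the seconds attribute: it holds only the seconds component of the timedelta (0 to 86399) and ignores whole days, whereas total_seconds() returns the full duration. For a window shorter than a day the two agree, as in the answer above. The sketch below uses fixed timestamps, chosen only for illustration, to show where they differ:

```python
import datetime as dt

now = dt.datetime(2010, 7, 5, 4, 30)
start = dt.datetime(2010, 7, 5, 5, 0)

gap = start - now               # a datetime.timedelta
print(gap.seconds)              # 1800
print(gap.total_seconds())      # 1800.0

# Across more than one day, .seconds drops the days component.
long_gap = dt.datetime(2010, 7, 6, 5, 0) - now
print(long_gap.days, long_gap.seconds)  # 1 1800
print(long_gap.total_seconds())         # 88200.0
```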
The following code covers the more general case where a script needs to pause during any fixed window of less than 24 hours duration. Example: must sleep between 11:00 PM and 01:00 AM.
import datetime as dt
def sleep_duration(sleep_from, sleep_to, now=None):
    # sleep_* are datetime.time objects
    # now is a datetime.datetime object
    if now is None:
        now = dt.datetime.now()
    duration = 0
    lo = dt.datetime.combine(now, sleep_from)
    hi = dt.datetime.combine(now, sleep_to)
    if lo <= now < hi:
        duration = (hi - now).seconds
    elif hi < lo:
        # the window wraps around midnight
        if now >= lo:
            duration = (hi + dt.timedelta(hours=24) - now).seconds
        elif now < hi:
            duration = (hi - now).seconds
    return duration
tests = [
(4, 5, 3, 30),
(4, 5, 4, 0),
(4, 5, 4, 30),
(4, 5, 5, 0),
(4, 5, 5, 30),
(23, 1, 0, 0),
(23, 1, 0, 30),
(23, 1, 0, 59),
(23, 1, 1, 0),
(23, 1, 1, 30),
(23, 1, 22, 30),
(23, 1, 22, 59),
(23, 1, 23, 0),
(23, 1, 23, 1),
(23, 1, 23, 59),
]
for hfrom, hto, hnow, mnow in tests:
    sfrom = dt.time(hfrom)
    sto = dt.time(hto)
    dnow = dt.datetime(2010, 7, 5, hnow, mnow)
    print(sfrom, sto, dnow, sleep_duration(sfrom, sto, dnow))
and here's the output:
04:00:00 05:00:00 2010-07-05 03:30:00 0
04:00:00 05:00:00 2010-07-05 04:00:00 3600
04:00:00 05:00:00 2010-07-05 04:30:00 1800
04:00:00 05:00:00 2010-07-05 05:00:00 0
04:00:00 05:00:00 2010-07-05 05:30:00 0
23:00:00 01:00:00 2010-07-05 00:00:00 3600
23:00:00 01:00:00 2010-07-05 00:30:00 1800
23:00:00 01:00:00 2010-07-05 00:59:00 60
23:00:00 01:00:00 2010-07-05 01:00:00 0
23:00:00 01:00:00 2010-07-05 01:30:00 0
23:00:00 01:00:00 2010-07-05 22:30:00 0
23:00:00 01:00:00 2010-07-05 22:59:00 0
23:00:00 01:00:00 2010-07-05 23:00:00 7200
23:00:00 01:00:00 2010-07-05 23:01:00 7140
23:00:00 01:00:00 2010-07-05 23:59:00 3660
When dealing with dates and times in Python, I still prefer mxDateTime over Python's datetime module. Although the built-in module has improved greatly over the years, it's still rather awkward and lacking in comparison. If you're interested, see mxDateTime: it's free to download and use, and it makes datetime math much easier.
import mx.DateTime as dt
from time import sleep
now = dt.now()
if 4 <= now.hour < 5:
    stop = dt.RelativeDateTime(hour=5, minute=0, second=0)
    secs_remaining = ((now + stop) - now).seconds
    sleep(secs_remaining)