Does anyone know how to parse a YYYY-week value into a date column in Polars?
I have tried the code below, but it throws an error. Thanks.
import polars as pl

pl.DataFrame(
    {"week": [201901, 201902, 201903, 201942, 201943, 201944]}
).with_columns(
    pl.col('week').cast(pl.Utf8).str.strptime(pl.Date, fmt='%Y%U').alias("date")
)
This seems like a bug, although one in the underlying Rust crate chrono rather than in Polars itself. Base Python's strptime behaves similarly: it ignores the %U (unless a weekday is also supplied) and just returns January 1st for every row. So, assuming you don't need an exact answer, you can do the string manipulation and math yourself, like this:
(
    pl.DataFrame({
        "week": [201901, 201902, 201903, 201942, 201943, 201944]
    })
    .with_columns(pl.col('week').cast(pl.Utf8))
    .with_columns([
        pl.col('week').str.slice(0, 4).cast(pl.Int32).alias('year'),
        pl.col('week').str.slice(4, 2).cast(pl.Int32).alias('week'),
    ])
    .select(
        (pl.date(pl.col('year'), 1, 1) + pl.duration(days=(pl.col('week') - 1) * 7)).alias('date')
    )
)
If you look at the definition of %U, it is supposed to be based on the nth Sunday of the year, whereas the math above just multiplies by 7, so the results will be off by a few days.
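If you do need the exact %U convention, here is a minimal sketch (mine, not part of the original answer) that anchors on the first Sunday of the year and then adds whole weeks. It assumes dt.weekday() numbers Monday=1 through Sunday=7; some Polars versions use 0-6 instead, in which case the modulo arithmetic needs adjusting:
# Hedged sketch: under %U, week W starts at (first Sunday of the year) + (W - 1) * 7 days.
# Assumes dt.weekday() returns Monday=1 .. Sunday=7 (version dependent, check yours).
(
    pl.DataFrame({"week": [201901, 201943]})
    .with_columns(pl.col("week").cast(pl.Utf8))
    .with_columns([
        pl.col("week").str.slice(0, 4).cast(pl.Int32).alias("year"),
        pl.col("week").str.slice(4, 2).cast(pl.Int32).alias("weeknum"),
    ])
    .with_columns(pl.date(pl.col("year"), 1, 1).alias("jan1"))
    .with_columns(
        (pl.col("jan1") + pl.duration(days=(7 - pl.col("jan1").dt.weekday()) % 7)).alias("first_sunday")
    )
    .select([
        pl.col("week"),
        (pl.col("first_sunday") + pl.duration(days=(pl.col("weeknum") - 1) * 7)).alias("date"),
    ])
)
# e.g. 201901 -> 2019-01-06 and 201943 -> 2019-10-27, matching the join-based result further down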
Another approach is to build a DataFrame of daily dates, format each date back into a year-week string with strftime, and then join the two frames. That might look like this:
from datetime import datetime

dfdates = (
    pl.DataFrame({'date': pl.date_range(datetime(2019, 1, 1), datetime(2019, 12, 31), '1d').cast(pl.Date())})
    .with_columns(pl.col('date').dt.strftime("%Y%U").alias('week'))
    .groupby('week').agg(pl.col('date').min())
)
Then join it with what you have:
pl.DataFrame({
"week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col('week').cast(pl.Utf8())).join(dfdates, on='week')
shape: (6, 2)
┌────────┬────────────┐
│ week ┆ date │
│ --- ┆ --- │
│ str ┆ date │
╞════════╪════════════╡
│ 201903 ┆ 2019-01-20 │
│ 201944 ┆ 2019-11-03 │
│ 201902 ┆ 2019-01-13 │
│ 201943 ┆ 2019-10-27 │
│ 201942 ┆ 2019-10-20 │
│ 201901 ┆ 2019-01-06 │
└────────┴────────────┘
That's really weird. It looks like only dates in 2019 are broken; take a look at my example below:
pl.DataFrame(
{
"week": [
202201,
202202,
202203,
202242,
202243,
202244,
202101,
202102,
202103,
202142,
202143,
202144,
201901,
201902,
201903,
201942,
201943,
201944,
201801,
201802,
201803,
201842,
201843,
201844,
]
}
).with_columns(pl.format("{}0", "week")).with_columns(
pl.col("week").str.strptime(pl.Date, fmt="%Y%W%w", strict=False).alias("teste")
)
shape: (24, 2)
┌─────────┬────────────┐
│ week ┆ teste │
│ --- ┆ --- │
│ str ┆ date │
╞═════════╪════════════╡
│ 2022010 ┆ 2022-01-09 │
│ 2022020 ┆ 2022-01-16 │
│ 2022030 ┆ 2022-01-23 │
│ 2022420 ┆ 2022-10-23 │
│ 2022430 ┆ 2022-10-30 │
│ 2022440 ┆ 2022-11-06 │
│ 2021010 ┆ 2021-01-10 │
│ 2021020 ┆ 2021-01-17 │
│ 2021030 ┆ 2021-01-24 │
│ 2021420 ┆ 2021-10-24 │
│ 2021430 ┆ 2021-10-31 │
│ 2021440 ┆ 2021-11-07 │
│ 2019010 ┆ null │
│ 2019020 ┆ null │
│ 2019030 ┆ null │
│ 2019420 ┆ null │
│ 2019430 ┆ null │
│ 2019440 ┆ null │
│ 2018010 ┆ 2018-01-07 │
│ 2018020 ┆ 2018-01-14 │
│ 2018030 ┆ 2018-01-21 │
│ 2018420 ┆ 2018-10-21 │
│ 2018430 ┆ 2018-10-28 │
│ 2018440 ┆ 2018-11-04 │
└─────────┴────────────┘
Bug aside, I always use the following expression to parse week counts into proper dates:
.with_columns(pl.format("{}0", "week")).with_columns(
    pl.col("week").str.strptime(pl.Date, fmt="%Y%W%w", strict=False)
)
Note that it is necessary to concatenate a weekday (the trailing "0", which is Sunday in %w terms) for this pattern to actually parse; I think this is also mentioned in the comments on the other post.
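As an aside (my assumption, not something from the answers above): if your week numbers are actually ISO weeks, you can append an ISO weekday instead and use the ISO specifiers %G (ISO year), %V (ISO week) and %u (ISO weekday), which chrono also understands:
# Hedged sketch assuming ISO week semantics; the trailing "1" means Monday of that ISO week.
(
    pl.DataFrame({"week": [201901, 201943]})
    .with_columns(pl.format("{}1", "week").alias("week"))
    .with_columns(pl.col("week").str.strptime(pl.Date, fmt="%G%V%u").alias("date"))
)
# ISO 2019-W01-1 is 2018-12-31 (ISO weeks can start in the previous calendar year)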
Related
I have a large Polars DataFrame that I'd like to split into n smaller DataFrames of roughly equal size, e.g. take one DataFrame and split it into 2, 3, or 5 DataFrames.
There are several observations per identifier. A simple example is below, where I split on a specific id, but I would like similar behavior that instead splits into, say, 2 approximately even DataFrames, since the full dataset has a large number of identifiers.
df = pl.DataFrame({
    'Identifier': [1234, 1234, 2345, 2345],
    'DateColumn': ['2022-02-13', '2022-02-14', '2022-02-13', '2022-02-14'],
})
df2 = df.with_columns([
    pl.col('DateColumn').str.strptime(pl.Date).cast(pl.Date)
])
print(df)
┌────────────┬────────────┐
│ Identifier ┆ DateColumn │
│ --- ┆ --- │
│ i64 ┆ str │
╞════════════╪════════════╡
│ 1234 ┆ 2022-02-13 │
│ 1234 ┆ 2022-02-14 │
│ 2345 ┆ 2022-02-13 │
│ 2345 ┆ 2022-02-14 │
└────────────┴────────────┘
df1 = df.filter(
pl.col('Identifier')==1234
)
df2 = df.filter(
pl.col('Identifier')==2345
)
print(df1)
shape: (2, 2)
┌────────────┬────────────┐
│ Identifier ┆ DateColumn │
│ --- ┆ --- │
│ i64 ┆ str │
╞════════════╪════════════╡
│ 1234 ┆ 2022-02-13 │
│ 1234 ┆ 2022-02-14 │
└────────────┴────────────┘
print(df2)
┌────────────┬────────────┐
│ Identifier ┆ DateColumn │
│ --- ┆ --- │
│ i64 ┆ str │
╞════════════╪════════════╡
│ 2345 ┆ 2022-02-13 │
│ 2345 ┆ 2022-02-14 │
└────────────┴────────────┘
If you want to divide your DataFrame by, let's say, your identifier, the best way to do so is to use the partition_by method.
df = pl.DataFrame({
"foo": ["A", "A", "B", "B", "C"],
"N": [1, 2, 2, 4, 2],
"bar": ["k", "l", "m", "m", "l"],
})
df.partition_by(groups="foo", maintain_order=True)
[shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ N ┆ bar │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ A ┆ 1 ┆ k │
│ A ┆ 2 ┆ l │
└─────┴─────┴─────┘,
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ N ┆ bar │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ B ┆ 2 ┆ m │
│ B ┆ 4 ┆ m │
└─────┴─────┴─────┘,
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ N ┆ bar │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ C ┆ 2 ┆ l │
└─────┴─────┴─────┘]
https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.partition_by.html
This automatically divides the DataFrame by values in a column.
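The question also asks about splitting into n approximately even DataFrames rather than one per value. A rough sketch of one way to do that (my own; _bucket is just a hypothetical helper column, and df is the Identifier/DateColumn frame from the question):
# Map each Identifier to one of n buckets, then partition on the helper column.
# This keeps all rows of an Identifier together and gives roughly even parts
# as long as the identifiers have similar row counts.
n = 2
parts = (
    df.with_columns((pl.col("Identifier").rank("dense") % n).alias("_bucket"))
    .partition_by("_bucket", maintain_order=True)
)
parts = [part.drop("_bucket") for part in parts]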
I have a collection of Polars expressions being used to generate features for an ML model. I'd like to add a Poisson CDF feature to this collection while maintaining lazy execution (with the benefits of speed, caching, etc.). So far I have not found an easy way of achieving this.
I've been able to get the result I'd like outside of the desired lazy expression framework with:
import polars as pl
from scipy.stats import poisson
df = pl.DataFrame({"count": [9,2,3,4,5], "expected_count": [7.7, 0.2, 0.7, 1.1, 7.5]})
result = poisson.cdf(df["count"].to_numpy(), df["expected_count"].to_numpy())
df = df.with_columns(pl.Series(result).alias("poission_cdf"))
However, in reality I'd like this to look like:
df = pl.DataFrame({"count": [9,2,3,4,5], "expected_count": [7.7, 0.2, 0.7, 1.1, 7.5]})
df = df.select(
[
... # bunch of other expressions here
poisson_cdf()
]
)
where poisson_cdf is some polars expression like:
def poisson_cdf():
    # this is just for illustration; clearly won't work
    return scipy.stats.poisson.cdf(pl.col("count"), pl.col("expected_count")).alias("poisson_cdf")
I also tried using a struct made up of "count" and "expected_count" with apply, as advised in the docs for applying custom functions. However, my dataset is several million rows in reality, leading to absurd execution times.
Any advice or guidance here would be appreciated. Ideally an expression like this already exists somewhere out there? Thanks in advance!
If scipy.stats.poisson.cdf were implemented as a proper NumPy universal function, it would be possible to use it directly on Polars expressions, but it is not. Fortunately, the Poisson CDF is almost the same as the regularized upper incomplete gamma function, for which SciPy supplies gammaincc, and that can be used in Polars expressions:
>>> import polars as pl
>>> from scipy.special import gammaincc
>>> df = pl.select(pl.arange(0, 10).alias('k'))
>>> df.with_columns(cdf=gammaincc(pl.col('k') + 1, 4.0))
shape: (10, 2)
┌─────┬──────────┐
│ k ┆ cdf │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪══════════╡
│ 0 ┆ 0.018316 │
│ 1 ┆ 0.091578 │
│ 2 ┆ 0.238103 │
│ 3 ┆ 0.43347 │
│ ... ┆ ... │
│ 6 ┆ 0.889326 │
│ 7 ┆ 0.948866 │
│ 8 ┆ 0.978637 │
│ 9 ┆ 0.991868 │
└─────┴──────────┘
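For reference, the identity being relied on here (my paraphrase) is that for a Poisson variable with mean $\mu$:
P(X \le k) = Q(k + 1, \mu) = \Gamma(k + 1, \mu) / \Gamma(k + 1),
where $Q$ is the regularized upper incomplete gamma function, i.e. exactly what gammaincc(k + 1, mu) computes.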
The result is the same as returned by poisson.cdf:
>>> _.with_columns(cdf2=pl.lit(poisson.cdf(df['k'], 4)))
shape: (10, 3)
┌─────┬──────────┬──────────┐
│ k ┆ cdf ┆ cdf2 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═════╪══════════╪══════════╡
│ 0 ┆ 0.018316 ┆ 0.018316 │
│ 1 ┆ 0.091578 ┆ 0.091578 │
│ 2 ┆ 0.238103 ┆ 0.238103 │
│ 3 ┆ 0.43347 ┆ 0.43347 │
│ ... ┆ ... ┆ ... │
│ 6 ┆ 0.889326 ┆ 0.889326 │
│ 7 ┆ 0.948866 ┆ 0.948866 │
│ 8 ┆ 0.978637 ┆ 0.978637 │
│ 9 ┆ 0.991868 ┆ 0.991868 │
└─────┴──────────┴──────────┘
It sounds like you want to use .map() instead of .apply(), which will pass whole columns (as Series) at once.
df.select([
pl.all(),
# ...
pl.struct(["count", "expected_count"])
.map(lambda x:
poisson.cdf(x.struct.field("count"), x.struct.field("expected_count")))
.flatten()
.alias("poisson_cdf")
])
shape: (5, 3)
┌───────┬────────────────┬─────────────┐
│ count | expected_count | poisson_cdf │
│ --- | --- | --- │
│ i64 | f64 | f64 │
╞═══════╪════════════════╪═════════════╡
│ 9 | 7.7 | 0.75308 │
│ 2 | 0.2 | 0.998852 │
│ 3 | 0.7 | 0.994247 │
│ 4 | 1.1 | 0.994565 │
│ 5 | 7.5 | 0.241436 │
└───────┴────────────────┴─────────────┘
You want to take advantage of the fact that SciPy has a set of functions which are NumPy ufuncs, as those still have fast columnar operation through the NumPy API.
Specifically, you want the pdtr function.
You then want to use reduce rather than map or apply, as those are for generic Python functions and aren't going to perform as well.
So if we have...
df = pl.DataFrame({"count": [9,2,3,4,5], "expected_count": [7.7, 0.2, 0.7, 1.1, 7.5]})
result = poisson.cdf(df["count"].to_numpy(), df["expected_count"].to_numpy())
df = df.with_columns(pl.Series(result).alias("poission_cdf"))
then we can add to it with
from scipy.special import pdtr

df = df.with_columns([
    pl.reduce(f=pdtr, exprs=[pl.col('count'), pl.col('expected_count')]).alias("poicdf")
])
df
shape: (5, 4)
┌───────┬────────────────┬──────────────┬──────────┐
│ count ┆ expected_count ┆ poission_cdf ┆ poicdf │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════╪════════════════╪══════════════╪══════════╡
│ 9 ┆ 7.7 ┆ 0.75308 ┆ 0.75308 │
│ 2 ┆ 0.2 ┆ 0.998852 ┆ 0.998852 │
│ 3 ┆ 0.7 ┆ 0.994247 ┆ 0.994247 │
│ 4 ┆ 1.1 ┆ 0.994565 ┆ 0.994565 │
│ 5 ┆ 7.5 ┆ 0.241436 ┆ 0.241436 │
└───────┴────────────────┴──────────────┴──────────┘
You can see it gives the same answer.
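Since the original goal was lazy execution, the same expression should slot straight into a lazy query; a minimal sketch (my assumption being that pl.reduce behaves the same inside a LazyFrame):
lazy_result = (
    pl.DataFrame({"count": [9, 2, 3, 4, 5],
                  "expected_count": [7.7, 0.2, 0.7, 1.1, 7.5]})
    .lazy()
    .with_columns([
        # same ufunc-based fold as above, just inside a lazy query plan
        pl.reduce(f=pdtr, exprs=[pl.col("count"), pl.col("expected_count")]).alias("poisson_cdf"),
    ])
    .collect()
)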
I have encountered the message shown in the output below when running some simple Polars code. Example code and its outputs are provided:
import datetime
import polars as pl
df = pl.DataFrame(
{
"a": [datetime.date(2022, 12, 1), datetime.date(2022, 1, 1)],
"b": [datetime.date(2021, 12, 1), datetime.date(2000, 1, 1)],
}
)
>>> df.with_columns([(pl.col("a").dt.year() + pl.col("b").dt.month()).alias("diff")])
[/home/runner/work/polars/polars/polars/polars-lazy/polars-plan/src/logical_plan/optimizer/simplify_expr.rs:456] eval_binary_same_type!(left_aexpr, +, right_aexpr) = None
[/home/runner/work/polars/polars/polars/polars-lazy/polars-plan/src/logical_plan/optimizer/simplify_expr.rs:456] eval_binary_same_type!(left_aexpr, +, right_aexpr) = None
shape: (2, 3)
┌────────────┬────────────┬──────┐
│ a ┆ b ┆ diff │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ i64 │
╞════════════╪════════════╪══════╡
│ 2022-12-01 ┆ 2021-12-01 ┆ 2034 │
│ 2022-01-01 ┆ 2000-01-01 ┆ 2023 │
└────────────┴────────────┴──────┘
>>> df.with_columns([(pl.col("a").dt.year().cast(pl.Int32) + pl.col("b").dt.month().cast(pl.Int32)).alias("diff")])
[/home/runner/work/polars/polars/polars/polars-lazy/polars-plan/src/logical_plan/optimizer/simplify_expr.rs:456] eval_binary_same_type!(left_aexpr, +, right_aexpr) = None
shape: (2, 3)
┌────────────┬────────────┬──────┐
│ a ┆ b ┆ diff │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ i32 │
╞════════════╪════════════╪══════╡
│ 2022-12-01 ┆ 2021-12-01 ┆ 2034 │
│ 2022-01-01 ┆ 2000-01-01 ┆ 2023 │
└────────────┴────────────┴──────┘
>>> df.with_columns([(pl.col("a").dt.year() - pl.col("b").dt.month()).alias("diff")])
shape: (2, 3)
┌────────────┬────────────┬──────┐
│ a ┆ b ┆ diff │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ i64 │
╞════════════╪════════════╪══════╡
│ 2022-12-01 ┆ 2021-12-01 ┆ 2010 │
│ 2022-01-01 ┆ 2000-01-01 ┆ 2021 │
└────────────┴────────────┴──────┘
I am curious about the meaning of this message.
The first expression gives me two such messages, and I suspect it is somehow related to type differences.
So, in the second expression, I cast both sides to the same type, but I still get one such message (one fewer than before, though).
However, in the third expression, I get no message, and the only difference from the first one is minus instead of plus, which makes it even more confusing.
It would be great if someone could help me understand this message and what its implications are.
Thanks for your help.
Looks like it was just a debug statement left in the code; it should be fixed in 0.16.1:
https://github.com/pola-rs/polars/issues/6540
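In other words, the message is harmless. A quick way to check whether you are on a version that still contains the stray debug print (a trivial sketch):
import polars as pl

print(pl.__version__)  # anything >= 0.16.1 should no longer emit the message, per the issue above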
I am trying to count the number of letters in a string column in Polars.
I could probably just use an apply method with len(Name).
However, I was wondering if there is a Polars-specific method?
import polars as pl
mydf = pl.DataFrame(
{"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
"Name": ["John", "Joe", "James"]})
print(mydf)
shape: (3, 2)
┌────────────┬───────┐
│ start_date ┆ Name  │
│ ---        ┆ ---   │
│ str        ┆ str   │
╞════════════╪═══════╡
│ 2020-01-02 ┆ John  │
│ 2020-01-03 ┆ Joe   │
│ 2020-01-04 ┆ James │
└────────────┴───────┘
In the end, John would have 4, Joe would be 3 and James would be 5.
I thought something like the below might work, based on the Pandas equivalent:
# Assume that it's a Pandas DataFrame
mydf['count'] = mydf['Name'].str.len()
# Polars equivalent - ERRORs
mydf = mydf.with_columns(
pl.col('Name').str.len().alias('count')
)
You can use:
.str.lengths(), which counts the number of bytes in the UTF-8 string (faster)
.str.n_chars(), which counts the number of characters
mydf.with_columns([
pl.col("Name").str.lengths().alias("len")
])
┌────────────┬───────┬─────┐
│ start_date ┆ Name ┆ len │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ u32 │
╞════════════╪═══════╪═════╡
│ 2020-01-02 ┆ John ┆ 4 │
│ 2020-01-03 ┆ Joe ┆ 3 │
│ 2020-01-04 ┆ James ┆ 5 │
└────────────┴───────┴─────┘
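To see the difference between the two, here is a small sketch with a non-ASCII name (my own example, not from the answer above):
df_utf8 = pl.DataFrame({"Name": ["José"]})
df_utf8.with_columns([
    pl.col("Name").str.lengths().alias("n_bytes"),   # 5, because "é" is two bytes in UTF-8
    pl.col("Name").str.n_chars().alias("n_chars"),   # 4
])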
I have a table like this in polars:
arrival_time  Train
08:40:10      112
19:31:26      134
An I have another table that defines the period of the day based on the hours:
Time      Period
08:00:00  Early
16:00:00  Afternoon
What I am trying to achieve is a Polars way to combine both tables and obtain the period of the day as a new column in the first table:
arrival_time  Train  Period
08:40:10      112    Early
19:31:26      134    Afternoon
For now, what I am doing works entirely with dictionaries: zipping the two columns of my comparison table and picking the key with the minimum distance between the two time columns:
min(dict(zip(df.Period, df.Time)).items(), key=lambda x: abs(pl.col('arrival_time') - x[1]))[0]
But I am sure there is a better way to do this in Polars.
Polars has join_asof which joins to the closest key forward or backward in time.
from datetime import time
df_a = pl.DataFrame({
"arrival_time": [time(8, 40, 10), time(19, 31, 26)],
"train": [112, 134]
})
df_b = pl.DataFrame({
"arrival_time": [time(8), time(16)],
"period": ["early", "afternoon"]
})
print(df_a.join_asof(df_b, on="arrival_time"))
shape: (2, 3)
┌──────────────┬───────┬───────────┐
│ arrival_time ┆ train ┆ period │
│ --- ┆ --- ┆ --- │
│ time ┆ i64 ┆ str │
╞══════════════╪═══════╪═══════════╡
│ 08:40:10 ┆ 112 ┆ early │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 19:31:26 ┆ 134 ┆ afternoon │
└──────────────┴───────┴───────────┘
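One thing worth flagging (my note, double-check against the join_asof docs): join_asof expects both frames to be sorted by the join key, so sort first if the data is not already in order:
df_a_sorted = df_a.sort("arrival_time")
df_b_sorted = df_b.sort("arrival_time")
print(df_a_sorted.join_asof(df_b_sorted, on="arrival_time"))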
Building on the answer from @ritchie46: if you need to find the closest value in either direction, you can apply join_asof in both the forward and backward directions, and then choose the result that is closest.
For example, if we start with this data:
df_a = pl.DataFrame(
{"arrival_time": [time(8, 40, 10), time(8, 59, 0), time(
19, 31, 26)], "train": [112, 113, 134]}
)
df_b = pl.DataFrame(
{
"arrival_time": [
time(8),
time(8, 30, 0),
time(9),
time(16),
time(16, 30, 0),
time(17),
],
"period": [
"early morning",
"mid-morning",
"late morning",
"early afternoon",
"mid afternoon",
"late afternoon",
],
}
)
>>> df_a
shape: (3, 2)
┌──────────────┬───────┐
│ arrival_time ┆ train │
│ --- ┆ --- │
│ time ┆ i64 │
╞══════════════╪═══════╡
│ 08:40:10 ┆ 112 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 08:59:00 ┆ 113 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 19:31:26 ┆ 134 │
└──────────────┴───────┘
>>> df_b
shape: (6, 2)
┌──────────────┬─────────────────┐
│ arrival_time ┆ period │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════════════════╡
│ 08:00:00 ┆ early morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 08:30:00 ┆ mid-morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 09:00:00 ┆ late morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 16:00:00 ┆ early afternoon │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 16:30:00 ┆ mid afternoon │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 17:00:00 ┆ late afternoon │
└──────────────┴─────────────────┘
We see that train 113 arrives closer to late morning than to mid-morning. If you need to capture that, you can join with both backward and forward, and then choose which is closest:
(
    df_a
    .join_asof(
        df_b.with_column(pl.col('arrival_time').alias('early_time')),
        on="arrival_time", strategy="backward", suffix="_early",
    )
    .join_asof(
        df_b.with_column(pl.col('arrival_time').alias('late_time')),
        on="arrival_time", strategy="forward", suffix="_late",
    )
    .with_column(
        pl.when(pl.col('early_time').is_null())
        .then(pl.col('period_late'))
        .when(pl.col('late_time').is_null())
        .then(pl.col('period'))
        .when(
            (pl.col('arrival_time').cast(pl.Datetime) - pl.col('early_time').cast(pl.Datetime))
            < (pl.col('late_time').cast(pl.Datetime) - pl.col('arrival_time').cast(pl.Datetime))
        )
        .then(pl.col('period'))
        .otherwise(pl.col('period_late'))
        .alias('closest')
    )
)
shape: (3, 7)
┌──────────────┬───────┬────────────────┬────────────┬──────────────┬───────────┬────────────────┐
│ arrival_time ┆ train ┆ period ┆ early_time ┆ period_late ┆ late_time ┆ closest │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ time ┆ i64 ┆ str ┆ time ┆ str ┆ time ┆ str │
╞══════════════╪═══════╪════════════════╪════════════╪══════════════╪═══════════╪════════════════╡
│ 08:40:10 ┆ 112 ┆ mid-morning ┆ 08:30:00 ┆ late morning ┆ 09:00:00 ┆ mid-morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 08:59:00 ┆ 113 ┆ mid-morning ┆ 08:30:00 ┆ late morning ┆ 09:00:00 ┆ late morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 19:31:26 ┆ 134 ┆ late afternoon ┆ 17:00:00 ┆ null ┆ null ┆ late afternoon │
└──────────────┴───────┴────────────────┴────────────┴──────────────┴───────────┴────────────────┘
Note that in the when/then/otherwise I've chosen to explicitly handle the case where a train arrives before the first period or after the last period.