pandas: join dataframes based on time interval - python

I have a data frame with a datetime column at 10-minute intervals and a numerical value:
df1 = pd.DataFrame({'time' : pd.date_range('1/1/2018', periods=20, freq='10min'), 'value' : np.random.randint(2, 20, size=20)})
And another with a schedule of events, with a start time and end time. There can be multiple events happening at the same time:
df2 = pd.DataFrame({'start_time' : ['2018-01-01 00:00:00', '2018-01-01 00:00:00','2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00' ], 'end_time' : ['2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00','2018-01-01 02:00:00', '2018-01-01 02:00:00', '2018-01-01 03:00:00'], 'event' : ['A', 'B', 'C', 'D', 'E', 'F'] })
df2[['start_time', 'end_time']] = df2.iloc[:,0:2].apply(pd.to_datetime)
I want to do a left join on df1, with all events that fall inside the start and end times. My output table should be:
time value event
0 2018-01-01 00:00:00 5 A
1 2018-01-01 00:00:00 5 B
2 2018-01-01 00:10:00 15 A
3 2018-01-01 00:10:00 15 B
4 2018-01-01 00:20:00 16 A
5 2018-01-01 00:20:00 16 B
.....
17 2018-01-01 02:50:00 7 F
I attempted these SO solutions, but they fail because of duplicate time intervals.

Setup (Only using a few entries from df1 for brevity):
df1 = pd.DataFrame({'time' : pd.date_range('1/1/2018', periods=20, freq='10min'), 'value' : np.random.randint(2, 20, size=20)})
df2 = pd.DataFrame({'start_time' : ['2018-01-01 00:00:00', '2018-01-01 00:00:00','2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00' ], 'end_time' : ['2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00','2018-01-01 02:00:00', '2018-01-01 02:00:00', '2018-01-01 03:00:00'], 'event' : ['A', 'B', 'C', 'D', 'E', 'F'] })
df1 = df1.sample(5)
df2[['start_time', 'end_time']] = df2.iloc[:,0:2].apply(pd.to_datetime)
You can use a couple of straightforward list comprehensions to achieve your result. This answer assumes that all date columns are, in fact, of type datetime in your DataFrame:
Step 1
Find all events that occur within a particular time range using a list comprehension and simple interval checking:
packed = list(zip(df2.start_time, df2.end_time, df2.event))
df1['event'] = [[ev for strt, end, ev in packed if strt <= el <= end] for el in df1.time]
time value event
2 2018-01-01 00:20:00 8 [A, B]
14 2018-01-01 02:20:00 14 [F]
8 2018-01-01 01:20:00 6 [C, D, E]
19 2018-01-01 03:10:00 16 []
4 2018-01-01 00:40:00 7 [A, B]
Step 2
Finally, explode each list from the last result to a new row using another list comprehension:
pd.DataFrame(
    [[t, val, e]
     for t, val, event in zip(df1.time, df1.value, df1.event)
     for e in event],
    columns=df1.columns
)
Output:
time value event
0 2018-01-01 00:20:00 8 A
1 2018-01-01 00:20:00 8 B
2 2018-01-01 02:20:00 14 F
3 2018-01-01 01:20:00 6 C
4 2018-01-01 01:20:00 6 D
5 2018-01-01 01:20:00 6 E
6 2018-01-01 00:40:00 7 A
7 2018-01-01 00:40:00 7 B
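On pandas 0.25 or later (an assumption about your version), Step 2 can likely be replaced with DataFrame.explode; rows whose list is empty become NaN and can be dropped:
# Explode each list in 'event' to one row per element, then drop non-matches.
out = df1.explode('event').dropna(subset=['event'])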

I'm not entirely sure of your question, but if you are trying to join on "events that fall inside the start and end times," then it sounds like you need something akin to a "between" operator from SQL. Your data doesn't make it particularly clear.
Pandas doesn't have this natively, but pandasql does. It allows you to run SQLite queries against your dataframes. I think something like this is what you need:
import pandasql as ps
sqlcode = '''
select *
from df1
left join df2
on df1.time >= df2.start_time
and df1.time <= df2.end_time
'''
newdf = ps.sqldf(sqlcode, locals())
Relevant Question:
Merge pandas dataframes where one value is between two others
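For reference, a plain-pandas sketch of that between-style join (assumes pandas 1.2+ for how='cross'):
# Pair every df1 row with every df2 row, then keep the pairs whose time
# falls inside the event's [start_time, end_time] window.
cross = df1.merge(df2, how='cross')
newdf = cross[cross['time'].between(cross['start_time'], cross['end_time'])]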

One option is with the conditional_join from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
out = df1.conditional_join(
    df2,
    ('time', 'start_time', '>='),
    ('time', 'end_time', '<=')
)
out.head()
time value start_time end_time event
0 2018-01-01 00:00:00 14 2018-01-01 2018-01-01 01:00:00 A
1 2018-01-01 00:00:00 14 2018-01-01 2018-01-01 01:00:00 B
2 2018-01-01 00:10:00 10 2018-01-01 2018-01-01 01:00:00 A
3 2018-01-01 00:10:00 10 2018-01-01 2018-01-01 01:00:00 B
4 2018-01-01 00:20:00 15 2018-01-01 2018-01-01 01:00:00 A
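To match the three-column output in the question, you can then drop the interval columns:
out = out.drop(columns=['start_time', 'end_time'])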

You can work on df2 to build a column containing every 10-minute timestamp (matching df1's frequency) for each event, and then use merge. It involves a lot of manipulation, so it's probably not the most efficient approach.
df2_manip = (df2.set_index('event').stack().reset_index().set_index(0)
             .groupby('event').resample('10T').ffill().reset_index(1))
and df2_manip looks like:
0 event level_1
event
A 2018-01-01 00:00:00 A start_time
A 2018-01-01 00:10:00 A start_time
A 2018-01-01 00:20:00 A start_time
A 2018-01-01 00:30:00 A start_time
A 2018-01-01 00:40:00 A start_time
A 2018-01-01 00:50:00 A start_time
A 2018-01-01 01:00:00 A end_time
B 2018-01-01 00:00:00 B start_time
B 2018-01-01 00:10:00 B start_time
B 2018-01-01 00:20:00 B start_time
B 2018-01-01 00:30:00 B start_time
...
Now you can merge:
df1 = df1.merge(df2_manip[[0, 'event']].rename(columns={0:'time'}))
and you get df1:
time value event
0 2018-01-01 00:00:00 9 A
1 2018-01-01 00:00:00 9 B
2 2018-01-01 00:10:00 16 A
3 2018-01-01 00:10:00 16 B
...
33 2018-01-01 02:00:00 6 D
34 2018-01-01 02:00:00 6 E
35 2018-01-01 02:00:00 6 F
36 2018-01-01 02:10:00 2 F
37 2018-01-01 02:20:00 18 F
38 2018-01-01 02:30:00 14 F
39 2018-01-01 02:40:00 5 F
40 2018-01-01 02:50:00 3 F
41 2018-01-01 03:00:00 9 F
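Another option is a NumPy broadcasting sketch of the same interval join (assuming df1 and df2 from the question's setup; it builds a len(df1) x len(df2) boolean matrix, so watch memory on large frames):
import numpy as np
t = df1['time'].to_numpy()
# mask[i, j] is True when df1 row i falls inside df2 event j's window.
mask = (t[:, None] >= df2['start_time'].to_numpy()) & (t[:, None] <= df2['end_time'].to_numpy())
i, j = np.nonzero(mask)
result = pd.concat(
    [df1.iloc[i].reset_index(drop=True),
     df2['event'].iloc[j].reset_index(drop=True)],
    axis=1,
)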

Related

How do I identify an entity (merchant) and perform operations on the entries for that entity in a pandas dataframe?

I am trying to perform operations on the time column for each unique merchant (calculate the time between transactions). How do I access the individual merchants in an iteration? Is there a way to do that in Python?
Thank you.
Assuming time is already a datetime64, use GroupBy.diff:
df['delta'] = df.groupby('merchant')['time'].diff()
print(df)
# Output
merchant time delta
0 A 2022-01-01 16:00:00 NaT
1 A 2022-01-01 16:30:00 0 days 00:30:00
2 A 2022-01-01 17:00:00 0 days 00:30:00
3 B 2022-01-01 10:00:00 NaT
4 B 2022-01-01 11:00:00 0 days 01:00:00
5 B 2022-01-01 12:00:00 0 days 01:00:00
If you want to compute the mean between transactions per merchant, use:
out = df.groupby('merchant', as_index=False)['time'].apply(lambda x: x.diff().mean())
print(out)
# Output
merchant time
0 A 0 days 00:30:00
1 B 0 days 01:00:00
Setup:
data = {'merchant': ['A', 'A', 'A', 'B', 'B', 'B'],
        'time': [pd.Timestamp('2022-01-01 16:00:00'),
                 pd.Timestamp('2022-01-01 16:30:00'),
                 pd.Timestamp('2022-01-01 17:00:00'),
                 pd.Timestamp('2022-01-01 10:00:00'),
                 pd.Timestamp('2022-01-01 11:00:00'),
                 pd.Timestamp('2022-01-01 12:00:00')]}
df = pd.DataFrame(data)
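As for accessing the individual merchants in an iteration: a groupby object is directly iterable, yielding (merchant, sub-DataFrame) pairs, so a plain loop works too:
for merchant, group in df.groupby('merchant'):
    # group holds only this merchant's rows
    print(merchant, group['time'].diff().mean())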

Pandas calculate result dataframe from a dataframe of multiple trades at same timestamp

I have a dataframe containing trades with duplicated timestamps and buy and sell orders divided over several rows. In my example, the total order amount is the sum over the same timestamp for that particular stock. I have created a simplified dataframe to show what the data looks like.
I would like to end up with a dataframe with results from the trades and a trading ID for each trade.
All trades are long positions, i.e., buy and then try to sell at a higher price.
The ID column for the desired output df2 is answered in this thread Create ID column in a pandas dataframe
import pandas as pd
from datetime import datetime
import numpy as np
string_date = ['2018-01-01 01:00:00',
               '2018-01-01 01:00:00',
               '2018-01-01 01:00:00',
               '2018-01-01 01:00:00',
               '2018-01-01 02:00:00',
               '2018-01-01 03:00:00',
               '2018-01-01 03:00:00',
               '2018-01-01 03:00:00',
               '2018-01-01 04:00:00',
               '2018-01-01 04:00:00',
               '2018-01-01 04:00:00',
               '2018-01-01 07:00:00',
               '2018-01-01 07:00:00',
               '2018-01-01 07:00:00',
               '2018-01-01 08:00:00',
               '2018-01-01 08:00:00',
               '2018-01-01 08:00:00',
               '2018-02-01 12:00:00']
data = {'stock': ['A','A','A','A','B','A','A','A','C','C','C','B','B','B','C','C','C','B'],
        'deal': ['buy','buy','buy','buy','buy','sell','sell','sell','buy','buy','buy','sell','sell','sell','sell','sell','sell','buy'],
        'amount': [1,2,3,4,10,8,1,1,3,2,5,2,2,6,3,3,4,5],
        'price': [10,10,10,10,2,20,20,20,3,3,3,1,1,1,2,2,2,11]}
df = pd.DataFrame(data, index=string_date)
df
Out[245]:
stock deal amount price
2018-01-01 01:00:00 A buy 1 10
2018-01-01 01:00:00 A buy 2 10
2018-01-01 01:00:00 A buy 3 10
2018-01-01 01:00:00 A buy 4 10
2018-01-01 02:00:00 B buy 10 2
2018-01-01 03:00:00 A sell 8 20
2018-01-01 03:00:00 A sell 1 20
2018-01-01 03:00:00 A sell 1 20
2018-01-01 04:00:00 C buy 3 3
2018-01-01 04:00:00 C buy 2 3
2018-01-01 04:00:00 C buy 5 3
2018-01-01 07:00:00 B sell 2 1
2018-01-01 07:00:00 B sell 2 1
2018-01-01 07:00:00 B sell 6 1
2018-01-01 08:00:00 C sell 3 2
2018-01-01 08:00:00 C sell 3 2
2018-01-01 08:00:00 C sell 4 2
2018-02-01 12:00:00 B buy 5 11
One desired output:
string_date2 = ['2018-01-01 01:00:00',
                '2018-01-01 02:00:00',
                '2018-01-01 03:00:00',
                '2018-01-01 04:00:00',
                '2018-01-01 07:00:00',
                '2018-01-01 08:00:00',
                '2018-01-02 12:00:00']
data2 = {'stock': ['A','B','A','C','B','C','B'],
         'deal': ['buy','buy','sell','buy','sell','sell','buy'],
         'amount': [10,10,10,10,10,10,5],
         'price': [10,2,20,3,1,2,11],
         'ID': ['1','2','1','3','2','3','4']}
df2 = pd.DataFrame(data2, index=string_date2)
df2
Out[226]:
stock deal amount price ID
2018-01-01 01:00:00 A buy 10 10 1
2018-01-01 02:00:00 B buy 10 2 2
2018-01-01 03:00:00 A sell 10 20 1
2018-01-01 04:00:00 C buy 10 3 3
2018-01-01 07:00:00 B sell 10 1 2
2018-01-01 08:00:00 C sell 10 2 3
2018-01-02 12:00:00 B buy 5 11 4
Any ideas?
This solution assumes a 'Long Only' portfolio where short sales are not allowed. Once a position is opened for a given stock, the transaction is assigned a new trade ID. Increasing the position in that stock results in the same trade ID, as well as any sell transactions reducing the size of the position (including the final sale where the position quantity is reduced to zero). A subsequent buy transaction in that same stock results in a new trade ID.
In order to maintain consistent trade identifiers with a growing log of transactions, I created a class TradeTracker to track and assign trade identifiers for each transaction.
import numpy as np
import pandas as pd
# Create sample dataframe.
dates = [
    '2018-01-01 01:00:00',
    '2018-01-01 01:01:00',
    '2018-01-01 01:02:00',
    '2018-01-01 01:03:00',
    '2018-01-01 02:00:00',
    '2018-01-01 03:00:00',
    '2018-01-01 03:01:00',
    '2018-01-01 03:03:00',
    '2018-01-01 04:00:00',
    '2018-01-01 04:01:00',
    '2018-01-01 04:02:00',
    '2018-01-01 07:00:00',
    '2018-01-01 07:01:00',
    '2018-01-01 07:02:00',
    '2018-01-01 08:00:00',
    '2018-01-01 08:01:00',
    '2018-01-01 08:02:00',
    '2018-02-01 12:00:00',
    '2018-03-01 12:00:00',
]
data = {
    'stock': ['A','A','A','A','B','A','A','A','C','C','C','B','B','B','C','C','C','B','A'],
    'deal': ['buy', 'buy', 'buy', 'buy', 'buy', 'sell', 'sell', 'sell', 'buy', 'buy', 'buy',
             'sell', 'sell', 'sell', 'sell', 'sell', 'sell', 'buy', 'buy'],
    'amount': [1, 2, 3, 4, 10, 8, 1, 1, 3, 2, 5, 2, 2, 6, 3, 3, 4, 5, 10],
    'price': [10, 10, 10, 10, 2, 20, 20, 20, 3, 3, 3, 1, 1, 1, 2, 2, 2, 11, 15]
}
df = pd.DataFrame(data, index=pd.to_datetime(dates))
>>> df
stock deal amount price
2018-01-01 01:00:00 A buy 1 10
2018-01-01 01:01:00 A buy 2 10
2018-01-01 01:02:00 A buy 3 10
2018-01-01 01:03:00 A buy 4 10
2018-01-01 02:00:00 B buy 10 2
2018-01-01 03:00:00 A sell 8 20
2018-01-01 03:01:00 A sell 1 20
2018-01-01 03:03:00 A sell 1 20
2018-01-01 04:00:00 C buy 3 3
2018-01-01 04:01:00 C buy 2 3
2018-01-01 04:02:00 C buy 5 3
2018-01-01 07:00:00 B sell 2 1
2018-01-01 07:01:00 B sell 2 1
2018-01-01 07:02:00 B sell 6 1
2018-01-01 08:00:00 C sell 3 2
2018-01-01 08:01:00 C sell 3 2
2018-01-01 08:02:00 C sell 4 2
2018-02-01 12:00:00 B buy 5 11
2018-03-01 12:00:00 A buy 10 15
# Add `position` column representing the cumulative buys and sells for a given stock.
df['position'] = (
    df
    .assign(temp_amount=np.where(df['deal'].eq('buy'), df['amount'], -df['amount']))
    .groupby(['stock'])['temp_amount']
    .cumsum()
)
# Create a class to track trade identifiers and instantiate it.
class TradeTracker():
    def __init__(self):
        self.trade_counter = 0
        self.trade_ids = {}

    def get_trade_id(self, stock, position):
        if position == 0:
            # Position fully closed: retire this stock's trade id.
            trade_id = self.trade_ids.pop(stock)
        elif stock not in self.trade_ids:
            # Opening a new position: assign the next trade id.
            self.trade_counter += 1
            self.trade_ids[stock] = trade_id = self.trade_counter
        else:
            # Adding to or reducing an open position: reuse its trade id.
            trade_id = self.trade_ids[stock]
        return trade_id
trade_tracker = TradeTracker()
# Add a `trade_id` column using our custom class in a list comprehension.
df['trade_id'] = [trade_tracker.get_trade_id(stock, position)
                  for stock, position in df[['stock', 'position']].to_numpy()]
>>> df
stock deal amount price position trade_id
2018-01-01 01:00:00 A buy 1 10 1 1
2018-01-01 01:01:00 A buy 2 10 3 1
2018-01-01 01:02:00 A buy 3 10 6 1
2018-01-01 01:03:00 A buy 4 10 10 1
2018-01-01 02:00:00 B buy 10 2 10 2
2018-01-01 03:00:00 A sell 8 20 2 1
2018-01-01 03:01:00 A sell 1 20 1 1
2018-01-01 03:03:00 A sell 1 20 0 1
2018-01-01 04:00:00 C buy 3 3 3 3
2018-01-01 04:01:00 C buy 2 3 5 3
2018-01-01 04:02:00 C buy 5 3 10 3
2018-01-01 07:00:00 B sell 2 1 8 2
2018-01-01 07:01:00 B sell 2 1 6 2
2018-01-01 07:02:00 B sell 6 1 0 2
2018-01-01 08:00:00 C sell 3 2 7 3
2018-01-01 08:01:00 C sell 3 2 4 3
2018-01-01 08:02:00 C sell 4 2 0 3
2018-02-01 12:00:00 B buy 5 11 5 4
2018-03-01 12:00:00 A buy 10 15 10 5
Changed your string_date to this:
In [2295]: string_date =['2018-01-01 01:00:00',
...: '2018-01-01 01:00:00',
...: '2018-01-01 01:00:00',
...: '2018-01-01 01:00:00',
...: '2018-01-01 02:00:00',
...: '2018-01-01 03:00:00',
...: '2018-01-01 03:00:00',
...: '2018-01-01 03:00:00',
...: '2018-01-01 04:00:00',
...: '2018-01-01 04:00:00',
...: '2018-01-01 04:00:00',
...: '2018-01-01 07:00:00',
...: '2018-01-01 07:00:00',
...: '2018-01-01 07:00:00',
...: '2018-01-01 08:00:00',
...: '2018-01-01 08:00:00',
...: '2018-01-01 08:00:00',
...: '2018-02-01 12:00:00',
...: ]
...:
So df now is:
In [2297]: df
Out[2297]:
stock deal amount price
2018-01-01 01:00:00 A buy 1 10
2018-01-01 01:00:00 A buy 2 10
2018-01-01 01:00:00 A buy 3 10
2018-01-01 01:00:00 A buy 4 10
2018-01-01 02:00:00 B buy 10 2
2018-01-01 03:00:00 A sell 8 20
2018-01-01 03:00:00 A sell 1 20
2018-01-01 03:00:00 A sell 1 20
2018-01-01 04:00:00 C buy 3 3
2018-01-01 04:00:00 C buy 2 3
2018-01-01 04:00:00 C buy 5 3
2018-01-01 07:00:00 B sell 2 1
2018-01-01 07:00:00 B sell 2 1
2018-01-01 07:00:00 B sell 6 1
2018-01-01 08:00:00 C sell 3 2
2018-01-01 08:00:00 C sell 3 2
2018-01-01 08:00:00 C sell 4 2
2018-02-01 12:00:00 B buy 5 11
You can use GroupBy.agg:
In [2302]: x = df.reset_index().groupby(['index', 'stock', 'deal'], as_index=False).agg({'amount': 'sum', 'price': 'max'}).set_index('index')
In [2303]: m = x['deal'] == 'buy'
In [2305]: x['ID'] = m.cumsum().where(m)
In [2307]: x['ID'] = x.groupby('stock')['ID'].ffill()
In [2308]: x
Out[2308]:
stock deal amount price ID
index
2018-01-01 01:00:00 A buy 10 10 1.0
2018-01-01 02:00:00 B buy 10 2 2.0
2018-01-01 03:00:00 A sell 10 20 1.0
2018-01-01 04:00:00 C buy 10 3 3.0
2018-01-01 07:00:00 B sell 10 1 2.0
2018-01-01 08:00:00 C sell 10 2 3.0
2018-02-01 12:00:00 B buy 5 11 4.0
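Note that the where/ffill steps leave ID as floats; assuming every sell follows an earlier buy (so no NaNs remain), you can cast back to integers:
x['ID'] = x['ID'].astype(int)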

Error rounding time to previous 15 min - Python

I've developed a crude method to round timestamps to the previous 15 mins. For instance, if the timestamp is 8:10:00, it gets rounded to 8:00:00.
However, when the minutes go past 15, it rounds to the previous hour. For instance, if the timestamp is 8:20:00, it gets rounded to 7:00:00 for some reason. I'll list the two examples below.
Correct Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
    'Time': ['8:00:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            - timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
08:00:00
Incorrect Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
    'Time': ['8:20:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            - timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
07:00:00
I don't understand what I'm doing wrong.
- timedelta(hours=t.minute//15)
If minute is 20, then minute // 15 equals 1, so you're subtracting one hour.
Try this instead:
return t.replace(second=0, microsecond=0, minute=(t.minute // 15 * 15), hour=t.hour)
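Putting it together, a quick sanity check of the corrected function (quarter_rounder is just an illustrative name):
from datetime import datetime
def quarter_rounder(t):
    # Floor minutes to the previous multiple of 15 instead of subtracting hours.
    return t.replace(second=0, microsecond=0, minute=(t.minute // 15 * 15))
print(quarter_rounder(datetime(2018, 1, 1, 8, 20)))  # 2018-01-01 08:00:00
print(quarter_rounder(datetime(2018, 1, 1, 8, 10)))  # 2018-01-01 08:00:00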
Use .dt.floor('15min') to round down to 15-minute intervals.
import pandas as pd
df = pd.DataFrame({'Time': pd.date_range('2018-01-01', freq='13.141min', periods=13)})
df['prev_15'] = df.Time.dt.floor('15min')
Output:
Time prev_15
0 2018-01-01 00:00:00.000 2018-01-01 00:00:00
1 2018-01-01 00:13:08.460 2018-01-01 00:00:00
2 2018-01-01 00:26:16.920 2018-01-01 00:15:00
3 2018-01-01 00:39:25.380 2018-01-01 00:30:00
4 2018-01-01 00:52:33.840 2018-01-01 00:45:00
5 2018-01-01 01:05:42.300 2018-01-01 01:00:00
6 2018-01-01 01:18:50.760 2018-01-01 01:15:00
7 2018-01-01 01:31:59.220 2018-01-01 01:30:00
8 2018-01-01 01:45:07.680 2018-01-01 01:45:00
9 2018-01-01 01:58:16.140 2018-01-01 01:45:00
10 2018-01-01 02:11:24.600 2018-01-01 02:00:00
11 2018-01-01 02:24:33.060 2018-01-01 02:15:00
12 2018-01-01 02:37:41.520 2018-01-01 02:30:00
There is also .dt.round() and .dt.ceil() if you need the nearest 15-minute mark or the following 15-minute interval, respectively.
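For example, on the same frame:
df['nearest_15'] = df.Time.dt.round('15min')  # nearest 15-minute mark
df['next_15'] = df.Time.dt.ceil('15min')      # following 15-minute mark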

Group pandas rows into pairs then find timedelta

I have a dataframe where I need to group the TX/RX column into pairs, and then put these into a new dataframe with a new index and the timedelta between them as values.
# `ids` and `vals` were undefined in the original; values assumed to match the shown output.
ids = range(1, 7)
vals = ['A', 'B'] * 3
df = pd.DataFrame()
df['time1'] = pd.date_range('2018-01-01', periods=6, freq='H')
df['time2'] = pd.date_range('2018-01-01', periods=6, freq='1H1min')
df['id'] = ids
df['val'] = vals
time1 time2 id val
0 2018-01-01 00:00:00 2018-01-01 00:00:00 1 A
1 2018-01-01 01:00:00 2018-01-01 01:01:00 2 B
2 2018-01-01 02:00:00 2018-01-01 02:02:00 3 A
3 2018-01-01 03:00:00 2018-01-01 03:03:00 4 B
4 2018-01-01 04:00:00 2018-01-01 04:04:00 5 A
5 2018-01-01 05:00:00 2018-01-01 05:05:00 6 B
needs to be...
index timedelta A B
0 1 1 2
1 1 3 4
2 1 5 6
I think that pivot_table or stack/unstack is probably the best way to go about this, but I'm not entirely sure how...
I believe you need:
df = pd.DataFrame()
df['time1'] = pd.date_range('2018-01-01', periods=6, freq='H')
df['time2'] = df['time1'] + pd.to_timedelta([60,60,120,120,180,180], 's')
df['id'] = range(1,7)
df['val'] = ['A','B'] * 3
df['t'] = df['time2'] - df['time1']
print (df)
time1 time2 id val t
0 2018-01-01 00:00:00 2018-01-01 00:01:00 1 A 00:01:00
1 2018-01-01 01:00:00 2018-01-01 01:01:00 2 B 00:01:00
2 2018-01-01 02:00:00 2018-01-01 02:02:00 3 A 00:02:00
3 2018-01-01 03:00:00 2018-01-01 03:02:00 4 B 00:02:00
4 2018-01-01 04:00:00 2018-01-01 04:03:00 5 A 00:03:00
5 2018-01-01 05:00:00 2018-01-01 05:03:00 6 B 00:03:00
#if necessary convert to seconds
#df['t'] = (df['time2'] - df['time1']).dt.total_seconds()
df = df.pivot(index='t', columns='val', values='id').reset_index().rename_axis(None, axis=1)
#if necessary aggregate values
#df = (df.pivot_table(index='t',columns='val',values='id', aggfunc='mean')
# .reset_index().rename_axis(None, axis=1))
print (df)
t A B
0 00:01:00 1 2
1 00:02:00 3 4
2 00:03:00 5 6

Pandas .resample() or .asfreq() fill forward times

I'm trying to resample a dataframe with a time series from 1-hour increments to 15-minute. Both .resample() and .asfreq() do almost exactly what I want, but I'm having a hard time filling the last three intervals.
I could add an extra hour at the end, resample, and then drop that last hour, but it feels hacky.
Current code:
df = pd.DataFrame({'date':pd.date_range('2018-01-01 00:00', '2018-01-01 01:00', freq = '1H'), 'num':5})
df = df.set_index('date').asfreq('15T', method = 'ffill', how = 'end').reset_index()
Current output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
Desired output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
5 2018-01-01 01:15:00 5
6 2018-01-01 01:30:00 5
7 2018-01-01 01:45:00 5
Thoughts?
Not sure about asfreq but reindex works wonderfully:
df.set_index('date').reindex(
    pd.date_range(
        df.date.min(),
        df.date.max() + pd.Timedelta('1H'),
        freq='15T', closed='left'
    ),
    method='ffill'
)
num
2018-01-01 00:00:00 5
2018-01-01 00:15:00 5
2018-01-01 00:30:00 5
2018-01-01 00:45:00 5
2018-01-01 01:00:00 5
2018-01-01 01:15:00 5
2018-01-01 01:30:00 5
2018-01-01 01:45:00 5