I am new to Quantitative Finance in Python so please bear with me. I have the following data set:
> head(df, 20)
# A tibble: 20 × 15
deal_id book counterparty commodity_name commodity_code executed_date first_delivery_date last_delivery_date last_trading_date volume buy_sell trading_unit tenor delivery_window strategy
<int> <chr> <chr> <chr> <chr> <dttm> <dttm> <dttm> <dttm> <int> <chr> <chr> <chr> <chr> <chr>
1 0 Book_7 Counterparty_3 api2coal ATW 2021-03-07 11:50:24 2022-01-01 00:00:00 2022-12-31 00:00:00 2021-12-31 00:00:00 23000 sell MT year Cal 22 NA
2 1 Book_7 Counterparty_3 oil B 2019-11-10 18:33:39 2022-01-01 00:00:00 2022-12-31 00:00:00 2021-11-30 00:00:00 16000 sell bbl year Cal 22 NA
3 2 Book_4 Counterparty_3 oil B 2021-02-25 11:44:20 2021-04-01 00:00:00 2021-04-30 00:00:00 2021-02-26 00:00:00 7000 buy bbl month Apr 21 NA
4 3 Book_3 Counterparty_3 gold GC 2022-05-27 19:28:48 2022-11-01 00:00:00 2022-11-30 00:00:00 2022-10-31 00:00:00 200 buy oz month Nov 22 NA
5 4 Book_2 Counterparty_3 czpower CZ 2022-09-26 13:14:31 2023-03-01 00:00:00 2023-03-31 00:00:00 2023-02-27 00:00:00 2 buy MW quarter Mar 23 NA
6 5 Book_1 Counterparty_3 depower DE 2022-08-29 10:28:34 2022-10-01 00:00:00 2022-10-31 00:00:00 2022-09-30 00:00:00 23 buy MW month Oct 22 NA
7 6 Book_3 Counterparty_1 api2coal ATW 2022-12-08 08:17:11 2023-01-01 00:00:00 2023-01-31 00:00:00 2022-12-30 00:00:00 29000 sell MT quarter Jan 23 NA
8 7 Book_3 Counterparty_2 depower DE 2020-10-16 17:36:13 2022-03-01 00:00:00 2022-03-31 00:00:00 2022-02-25 00:00:00 3 sell MW quarter Mar 22 NA
9 8 Book_7 Counterparty_1 api2coal ATW 2020-10-13 09:35:24 2021-02-01 00:00:00 2021-02-28 00:00:00 2021-01-29 00:00:00 1000 sell MT quarter Feb 21 NA
10 9 Book_2 Counterparty_1 api2coal ATW 2020-05-19 11:04:39 2022-01-01 00:00:00 2022-12-31 00:00:00 2021-12-31 00:00:00 19000 sell MT year Cal 22 NA
11 10 Book_6 Counterparty_1 oil B 2022-03-03 08:04:04 2022-08-01 00:00:00 2022-08-31 00:00:00 2022-06-30 00:00:00 26000 buy bbl month Aug 22 NA
12 11 Book_3 Counterparty_1 gold GC 2021-05-09 18:08:31 2022-05-01 00:00:00 2022-05-31 00:00:00 2022-04-29 00:00:00 1600 sell oz month May 22 NA
13 12 Book_5 Counterparty_2 oil B 2020-08-20 11:54:34 2021-04-01 00:00:00 2021-04-30 00:00:00 2021-02-26 00:00:00 6000 buy bbl month Apr 21 Strategy_3
14 13 Book_6 Counterparty_2 gold GC 2020-12-23 16:28:55 2021-12-01 00:00:00 2021-12-31 00:00:00 2021-11-30 00:00:00 1700 sell oz month Dec 21 NA
15 14 Book_2 Counterparty_1 depower DE 2021-08-11 12:54:23 2024-01-01 00:00:00 2024-12-31 00:00:00 2023-12-28 00:00:00 15 buy MW year Cal 24 NA
16 15 Book_5 Counterparty_1 czpower CZ 2022-02-15 07:45:24 2022-12-01 00:00:00 2022-12-31 00:00:00 2022-11-30 00:00:00 28 buy MW month Dec 22 Strategy_3
17 16 Book_7 Counterparty_2 oil B 2021-05-19 07:37:05 2022-02-01 00:00:00 2022-02-28 00:00:00 2021-12-31 00:00:00 11000 buy bbl quarter Feb 22 Strategy_3
18 17 Book_4 Counterparty_3 depower DE 2022-02-01 12:34:49 2022-06-01 00:00:00 2022-06-30 00:00:00 2022-05-31 00:00:00 14 sell MW month Jun 22 NA
19 18 Book_2 Counterparty_3 czpower CZ 2022-06-02 09:39:16 2023-02-01 00:00:00 2023-02-28 00:00:00 2023-01-30 00:00:00 21 buy MW quarter Feb 23 NA
20 19 Book_3 Counterparty_1 czpower CZ 2021-10-28 12:41:11 2022-09-01 00:00:00 2022-09-30 00:00:00 2022-08-31 00:00:00 3 sell MW month Sep 22 NA
And I am asked to extract some information from it while applying what is called Yearly and Quarterly Futures Cascading, which I do not know. The question is as follows:
Compute the position size (contracted volume) for a combination of books and commodities, for a selected time in history. The output format should be a data frame with future delivery periods as index (here comes yearly and quarterly cascading), commodities as column names and total volume as values. Provide negative values when the total volume for given period was sold and positive value when it was bought.
I read some material online about Cascading Futures here and here, but it only gave me a vague idea of what it is about and doesn't help solve the problem at hand, and coding examples in Python are nonexistent.
Can someone please give me a hint as to how to approach this problem? I am a beginner in the field of quantitative finance and any help would be much appreciated.
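To make the hint concrete, here is a minimal sketch of the signing-and-pivoting part plus a crude yearly-to-quarterly cascade. The mini DataFrame is invented for illustration and only mirrors a few of your columns; whether a cascaded yearly volume is carried as-is into each quarter (as below, which is how per-unit power volumes work) or split across quarters depends on the contract's trading unit, so treat that as an assumption to verify:

```python
import pandas as pd

# Invented mini position table mirroring a few columns of the question's df
df = pd.DataFrame({
    "commodity_name": ["oil", "oil", "gold"],
    "buy_sell": ["sell", "buy", "buy"],
    "volume": [16000, 7000, 200],
    "tenor": ["year", "month", "month"],
    "delivery_window": ["Cal 22", "Apr 21", "Nov 22"],
})

# Sign the volume: bought positions positive, sold negative
df["signed_volume"] = df["volume"].where(df["buy_sell"] == "buy", -df["volume"])

# Crude yearly -> quarterly cascade: replace each Cal-year row with four
# quarterly rows carrying the same (assumed per-unit) volume
def cascade(row):
    if row["tenor"] != "year":
        return [row]
    year = row["delivery_window"].split()[-1]   # "Cal 22" -> "22"
    out = []
    for q in ("Q1", "Q2", "Q3", "Q4"):
        r = row.copy()
        r["delivery_window"] = f"{q} {year}"
        out.append(r)
    return out

cascaded = pd.DataFrame([r for _, row in df.iterrows() for r in cascade(row)])

# Delivery periods as index, commodities as columns, signed volume as values
pos = cascaded.pivot_table(index="delivery_window",
                           columns="commodity_name",
                           values="signed_volume",
                           aggfunc="sum",
                           fill_value=0)
```

Quarterly-to-monthly cascading would follow the same pattern, and filtering on executed_date before pivoting gives you the "selected time in history" part.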
This is my first question on Stack Overflow, and I hope I describe my problem in enough detail.
I'm starting to learn data analysis with Pandas and I've created a time series with daily data for gas prices of a certain station. I've already grouped the hourly data into daily data.
I've been successful with a simple scatter plot over the year with Plotly, but in the next step I would like to analyze which weekday is the cheapest or most expensive in every week, count the day names, and then see whether there is a pattern over the whole year.
count mean std min 25% 50% 75% max \
2022-01-01 35.0 1.685000 0.029124 1.649 1.659 1.689 1.6990 1.749
2022-01-02 27.0 1.673444 0.024547 1.649 1.649 1.669 1.6890 1.729
2022-01-03 28.0 1.664000 0.040597 1.599 1.639 1.654 1.6890 1.789
2022-01-04 31.0 1.635129 0.045069 1.599 1.599 1.619 1.6490 1.779
2022-01-05 33.0 1.658697 0.048637 1.599 1.619 1.649 1.6990 1.769
2022-01-06 35.0 1.658429 0.050756 1.599 1.619 1.639 1.6940 1.779
2022-01-07 30.0 1.637333 0.039136 1.599 1.609 1.629 1.6565 1.759
2022-01-08 41.0 1.655829 0.041740 1.619 1.619 1.639 1.6790 1.769
2022-01-09 35.0 1.647857 0.031602 1.619 1.619 1.639 1.6590 1.769
2022-01-10 31.0 1.634806 0.041374 1.599 1.609 1.619 1.6490 1.769
...
week weekday
2022-01-01 52 Saturday
2022-01-02 52 Sunday
2022-01-03 1 Monday
2022-01-04 1 Tuesday
2022-01-05 1 Wednesday
2022-01-06 1 Thursday
2022-01-07 1 Friday
2022-01-08 1 Saturday
2022-01-09 1 Sunday
2022-01-10 2 Monday
...
I tried with grouping and resampling but unfortunately I didn't get the result I was hoping for.
Can someone suggest a way how to deal with this problem? Thanks!
Here's a way to do what I believe your question asks:
import pandas as pd
import numpy as np
df = pd.DataFrame({
        'count': [35,27,28,31,33,35,30,41,35,31]*40,
        'mean': [1.685,1.673444,1.664,1.635129,1.658697,1.658429,1.637333,1.655829,1.647857,1.634806]*40,
    },
    index=pd.Series(pd.to_datetime(pd.date_range("2022-01-01", periods=400, freq="D"))))
print( '','input df:',df,sep='\n' )
df_date = df.reset_index()['index']
df['weekday'] = list(df_date.dt.day_name())
df['year'] = df_date.dt.year.to_numpy()
df['week'] = df_date.dt.isocalendar().week.to_numpy()
df['year_week_started'] = df.year - np.where((df.week>=52)&(df.week.shift(-7)==1),1,0)
print( '','input df with intermediate columns:',df,sep='\n' )
cols = ['year_week_started', 'week']
dfCheap = df.loc[df.groupby(cols)['mean'].idxmin(),:].set_index(cols)
dfCheap = ( dfCheap.groupby(['year_week_started', 'weekday'])['mean'].count()
.rename('freq').to_frame().set_index('freq', append=True)
.reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfCheap:',dfCheap,sep='\n' )
dfExpensive = df.loc[df.groupby(cols)['mean'].idxmax(),:].set_index(cols)
dfExpensive = ( dfExpensive.groupby(['year_week_started', 'weekday'])['mean'].count()
.rename('freq').to_frame().set_index('freq', append=True)
.reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfExpensive:',dfExpensive,sep='\n' )
Sample input:
input df:
count mean
2022-01-01 35 1.685000
2022-01-02 27 1.673444
2022-01-03 28 1.664000
2022-01-04 31 1.635129
2022-01-05 33 1.658697
... ... ...
2023-01-31 35 1.658429
2023-02-01 30 1.637333
2023-02-02 41 1.655829
2023-02-03 35 1.647857
2023-02-04 31 1.634806
[400 rows x 2 columns]
input df with intermediate columns:
count mean weekday year week year_week_started
2022-01-01 35 1.685000 Saturday 2022 52 2021
2022-01-02 27 1.673444 Sunday 2022 52 2021
2022-01-03 28 1.664000 Monday 2022 1 2022
2022-01-04 31 1.635129 Tuesday 2022 1 2022
2022-01-05 33 1.658697 Wednesday 2022 1 2022
... ... ... ... ... ... ...
2023-01-31 35 1.658429 Tuesday 2023 5 2023
2023-02-01 30 1.637333 Wednesday 2023 5 2023
2023-02-02 41 1.655829 Thursday 2023 5 2023
2023-02-03 35 1.647857 Friday 2023 5 2023
2023-02-04 31 1.634806 Saturday 2023 5 2023
[400 rows x 6 columns]
Sample output:
dfCheap:
weekday
year_week_started freq
2021 1 Monday
2022 11 Tuesday
10 Thursday
10 Wednesday
6 Sunday
5 Friday
5 Monday
5 Saturday
2023 2 Thursday
1 Saturday
1 Sunday
1 Wednesday
dfExpensive:
weekday
year_week_started freq
2021 1 Saturday
2022 16 Monday
10 Tuesday
6 Sunday
5 Friday
5 Saturday
5 Thursday
5 Wednesday
2023 2 Monday
1 Friday
1 Thursday
1 Tuesday
My dataframe has multiple years and months in "yyyy-mm-dd" format.
I would like to dynamically drop all rows for the current year-month from the df.
You could use a simple strftime comparison, keeping only the rows whose %Y%m string is not equal to the year-month of the current date.
df1 = df.loc[
df['Date'].dt.strftime('%Y%m') != pd.Timestamp('today').strftime('%Y%m')]
Example
d = pd.date_range('01 oct 2021', '01 dec 2021',freq='d')
df = pd.DataFrame(d,columns=['Date'])
print(df)
Date
0 2021-10-01
1 2021-10-02
2 2021-10-03
3 2021-10-04
4 2021-10-05
.. ...
57 2021-11-27
58 2021-11-28
59 2021-11-29
60 2021-11-30
61 2021-12-01
print(df1)
Date
0 2021-10-01
1 2021-10-02
2 2021-10-03
3 2021-10-04
4 2021-10-05
5 2021-10-06
6 2021-10-07
7 2021-10-08
8 2021-10-09
9 2021-10-10
10 2021-10-11
11 2021-10-12
12 2021-10-13
13 2021-10-14
14 2021-10-15
15 2021-10-16
16 2021-10-17
17 2021-10-18
18 2021-10-19
19 2021-10-20
20 2021-10-21
21 2021-10-22
22 2021-10-23
23 2021-10-24
24 2021-10-25
25 2021-10-26
26 2021-10-27
27 2021-10-28
28 2021-10-29
29 2021-10-30
30 2021-10-31
61 2021-12-01
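An equivalent filter that avoids the string round-trip is to compare monthly Periods instead; a sketch on the same kind of Date column:

```python
import pandas as pd

d = pd.date_range("01 oct 2021", "01 dec 2021", freq="d")
df = pd.DataFrame(d, columns=["Date"])

# Keep only rows whose calendar month differs from the current month
this_month = pd.Timestamp("today").to_period("M")
df1 = df[df["Date"].dt.to_period("M") != this_month]
```

Comparing Period objects keeps the intent ("same calendar month?") explicit rather than encoding it in a format string.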
How can I create intervals of dates and sum the values for the dates that fall in each interval? The intervals must be grouped by Country.
I have a pandas dataframe like this:
Country  Population  Start                End
UK       1000        2021-05-15 00:00:00  2021-07-21 23:59:00
UK       800         2021-05-30 22:00:00  2021-06-02 19:00:00
Spain    1050        2021-05-15 00:00:00  2021-06-21 10:00:00
France   700         2021-01-15 10:00:00  2022-01-15 10:00:00
France   750         2021-06-15 10:00:00  2021-06-17 19:00:00
And I need to create different intervals for each country and sum the population in each interval, like this:
Country  Population  Start                End
UK       1000        2021-05-15 00:00:00  2021-05-30 22:00:00
UK       1800        2021-05-30 22:00:00  2021-06-02 19:00:00
UK       1000        2021-06-02 19:00:00  2021-07-21 23:59:00
Spain    1050        2021-05-15 00:00:00  2021-06-21 10:00:00
France   700         2021-01-15 10:00:00  2021-06-15 10:00:00
France   1450        2021-06-15 10:00:00  2021-06-17 19:00:00
France   700         2021-06-17 19:00:00  2022-01-15 10:00:00
Any idea??
Thanks!
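One way to approach this is a sweep over the interval boundaries per country: collect every Start/End timestamp, cut the timeline at those points, and sum the populations of the rows covering each resulting sub-interval. A minimal sketch on an invented two-country subset of your table:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["UK", "UK", "France", "France"],
    "Population": [1000, 800, 700, 750],
    "Start": pd.to_datetime(["2021-05-15 00:00", "2021-05-30 22:00",
                             "2021-01-15 10:00", "2021-06-15 10:00"]),
    "End": pd.to_datetime(["2021-07-21 23:59", "2021-06-02 19:00",
                           "2022-01-15 10:00", "2021-06-17 19:00"]),
})

rows = []
for country, g in df.groupby("Country"):
    # All interval boundaries for this country, in chronological order
    bounds = sorted(pd.unique(pd.concat([g["Start"], g["End"]])))
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Sum populations of the rows that fully cover [lo, hi]
        total = g.loc[(g["Start"] <= lo) & (g["End"] >= hi), "Population"].sum()
        rows.append({"Country": country, "Population": total,
                     "Start": lo, "End": hi})

out = pd.DataFrame(rows)
```

This reproduces the UK rows 1000 / 1800 / 1000 and the France rows 700 / 1450 / 700 from the expected output; note it is O(boundaries x rows) per country, which is fine for small tables.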
bond_df['Maturity']
0 2022-07-15 00:00:00
1 2024-07-18 00:00:00
2 2027-07-16 00:00:00
3 2020-07-28 00:00:00
4 2019-10-09 00:00:00
5 2022-04-08 00:00:00
6 2020-12-15 00:00:00
7 2022-12-15 00:00:00
8 2026-04-08 00:00:00
9 2023-04-11 00:00:00
10 2024-12-15 00:00:00
11 2019
12 2020-10-25 00:00:00
13 2024-04-22 00:00:00
14 2047-12-15 00:00:00
15 2020-07-08 00:00:00
17 2043-04-11 00:00:00
18 2021
19 2022
20 2023
21 2025
22 2026
23 2027
24 2029
25 2021-04-15 00:00:00
26 2044-04-22 00:00:00
27 2043-10-02 00:00:00
28 2039-01-19 00:00:00
29 2040-07-09 00:00:00
30 2029-09-21 00:00:00
31 2040-10-25 00:00:00
32 2019
33 2035-09-04 00:00:00
34 2035-09-28 00:00:00
35 2041-04-15 00:00:00
36 2040-04-02 00:00:00
37 2034-03-27 00:00:00
38 2030
39 2027-04-05 00:00:00
40 2038-04-15 00:00:00
41 2037-08-17 00:00:00
42 2023-10-16 00:00:00
43 -
45 2019-10-09 00:00:00
46 -
47 2021-06-23 00:00:00
48 2021-06-23 00:00:00
49 2023-06-26 00:00:00
50 2025-06-26 00:00:00
51 2028-06-26 00:00:00
52 2038-06-28 00:00:00
53 2020-06-23 00:00:00
54 2020-06-23 00:00:00
55 2048-06-29 00:00:00
56 -
57 -
58 2029-07-08 00:00:00
59 2026-07-08 00:00:00
60 2024-07-08 00:00:00
61 2020-07-31 00:00:00
Name: Maturity, dtype: object
This is a column of data that I imported from Excel of maturity dates for various Walmart bonds. All I am concerned with is the year portion of these dates. How can I format the entire column to just return the year values?
I tried dt.strftime, but it didn't work.
Thanks in advance
I wrote this little script for you, which should output the years to a years.txt file, assuming your data is in data.txt exactly as you posted it above.
The script also lets you toggle whether to include the dash entries and the bare-year entries.
Contents of the data.txt I tested with:
0 2022-07-15 00:00:00
1 2024-07-18 00:00:00
2 2027-07-16 00:00:00
3 2020-07-28 00:00:00
4 2019-10-09 00:00:00
5 2022-04-08 00:00:00
6 2020-12-15 00:00:00
7 2022-12-15 00:00:00
8 2026-04-08 00:00:00
9 2023-04-11 00:00:00
10 2024-12-15 00:00:00
11 2019
12 2020-10-25 00:00:00
13 2024-04-22 00:00:00
14 2047-12-15 00:00:00
15 2020-07-08 00:00:00
17 2043-04-11 00:00:00
18 2021
19 2022
20 2023
21 2025
22 2026
23 2027
24 2029
25 2021-04-15 00:00:00
26 2044-04-22 00:00:00
27 2043-10-02 00:00:00
28 2039-01-19 00:00:00
29 2040-07-09 00:00:00
30 2029-09-21 00:00:00
31 2040-10-25 00:00:00
32 2019
33 2035-09-04 00:00:00
34 2035-09-28 00:00:00
35 2041-04-15 00:00:00
36 2040-04-02 00:00:00
37 2034-03-27 00:00:00
38 2030
39 2027-04-05 00:00:00
40 2038-04-15 00:00:00
41 2037-08-17 00:00:00
42 2023-10-16 00:00:00
43 -
45 2019-10-09 00:00:00
46 -
47 2021-06-23 00:00:00
48 2021-06-23 00:00:00
49 2023-06-26 00:00:00
50 2025-06-26 00:00:00
51 2028-06-26 00:00:00
52 2038-06-28 00:00:00
53 2020-06-23 00:00:00
54 2020-06-23 00:00:00
55 2048-06-29 00:00:00
56 -
57 -
58 2029-07-08 00:00:00
59 2026-07-08 00:00:00
60 2024-07-08 00:00:00
61 2020-07-31 00:00:00
and the script I wrote:
#!/usr/bin/python3

all_years = []
include_dash = False
include_years_on_right = True

with open("data.txt", "r") as f:
    text = f.read()

lines = text.split("\n")
for line in lines:
    line = line.strip()
    if line == "":
        continue
    if "00" in line:
        # Full-date lines like "0  2022-07-15 00:00:00": take the year part
        all_years.append(line.split("-")[0].split()[-1])
    else:
        if include_years_on_right == False:
            continue
        year = line.split(" ")[-1]
        if year == "-":
            if include_dash == True:
                all_years.append(year)
            else:
                continue
        else:
            all_years.append(year)

with open("years.txt", "w") as f:
    for year in all_years:
        f.write(year + "\n")
and the output to the years.txt:
2022
2024
2027
2020
2019
2022
2020
2022
2026
2023
2024
2019
2020
2024
2047
2020
2043
2021
2022
2023
2025
2026
2027
2029
2021
2044
2043
2039
2040
2029
2040
2019
2035
2035
2041
2040
2034
2030
2027
2038
2037
2023
2019
2021
2021
2023
2025
2028
2038
2020
2020
2048
2029
2026
2024
2020
Contact me if you have any issues, and I hope I can help you!
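If the column is already in a pandas DataFrame, a shorter route than going through text files is to pull the four-digit year straight out of each string. This is a sketch assuming a hypothetical Series s shaped like the posted column (full dates, bare years, and "-" entries mixed together):

```python
import pandas as pd

# Mixed entries like the question's Maturity column
s = pd.Series(["2022-07-15 00:00:00", "2019", "-", "2024-07-18 00:00:00"])

# Grab the first 4-digit run; entries with no match ("-") become NaN
years = pd.to_numeric(s.str.extract(r"(\d{4})", expand=False), errors="coerce")
```

Extracting the digits directly side-steps the mixed date formats that make pd.to_datetime awkward on a column like this.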
I have a dataframe that can be simplified as:
date id
0 02/04/2015 02:34 1
1 06/04/2015 12:34 2
2 09/04/2015 23:03 3
3 12/04/2015 01:00 4
4 15/04/2015 07:12 5
5 21/04/2015 12:59 6
6 29/04/2015 17:33 7
7 04/05/2015 10:44 8
8 06/05/2015 11:12 9
9 10/05/2015 08:52 10
10 12/05/2015 14:19 11
11 19/05/2015 19:22 12
12 27/05/2015 22:31 13
13 01/06/2015 11:09 14
14 04/06/2015 12:57 15
15 10/06/2015 04:00 16
16 15/06/2015 03:23 17
17 19/06/2015 05:37 18
18 23/06/2015 13:41 19
19 27/06/2015 15:43 20
It can be created using:
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"]})
The data has the following types:
tempDF.dtypes
date object
id int64
dtype: object
I have set the 'date' variable to be Pandas datefime64 format (if that's the right way to describe it) using:
import numpy as np
import pandas as pd
tempDF['date'] = pd.to_datetime(tempDF['date'])
So now, the dtypes look like:
tempDF.dtypes
date datetime64[ns]
id int64
dtype: object
I want to change the hours of the original date data. I can use .normalize() to convert to midnight via the .dt accessor:
tempDF['date'] = tempDF['date'].dt.normalize()
And, I can get access to individual datetime components (e.g. year) using:
tempDF['date'].dt.year
This produces:
0 2015
1 2015
2 2015
3 2015
4 2015
5 2015
6 2015
7 2015
8 2015
9 2015
10 2015
11 2015
12 2015
13 2015
14 2015
15 2015
16 2015
17 2015
18 2015
19 2015
Name: date, dtype: int64
The question is, how can I change specific date and time components? For example, how could I change the midday (12:00) for all the dates? I've found that datetime.datetime has a .replace() function. However, having converted dates to Pandas format, it would make sense to keep in that format. Is there a way to do that without changing the format again?
EDIT :
A vectorized way to do this would be to normalize the series and then add 12 hours to it using a timedelta (this needs import datetime). Example -
tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)
Demo -
In [59]: tempDF
Out[59]:
date id
0 2015-02-04 12:00:00 1
1 2015-06-04 12:00:00 2
2 2015-09-04 12:00:00 3
3 2015-12-04 12:00:00 4
4 2015-04-15 12:00:00 5
5 2015-04-21 12:00:00 6
6 2015-04-29 12:00:00 7
7 2015-04-05 12:00:00 8
8 2015-06-05 12:00:00 9
9 2015-10-05 12:00:00 10
10 2015-12-05 12:00:00 11
11 2015-05-19 12:00:00 12
12 2015-05-27 12:00:00 13
13 2015-01-06 12:00:00 14
14 2015-04-06 12:00:00 15
15 2015-10-06 12:00:00 16
16 2015-06-15 12:00:00 17
17 2015-06-19 12:00:00 18
18 2015-06-23 12:00:00 19
19 2015-06-27 12:00:00 20
In [60]: tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)
Out[60]:
0 2015-02-04 12:00:00
1 2015-06-04 12:00:00
2 2015-09-04 12:00:00
3 2015-12-04 12:00:00
4 2015-04-15 12:00:00
5 2015-04-21 12:00:00
6 2015-04-29 12:00:00
7 2015-04-05 12:00:00
8 2015-06-05 12:00:00
9 2015-10-05 12:00:00
10 2015-12-05 12:00:00
11 2015-05-19 12:00:00
12 2015-05-27 12:00:00
13 2015-01-06 12:00:00
14 2015-04-06 12:00:00
15 2015-10-06 12:00:00
16 2015-06-15 12:00:00
17 2015-06-19 12:00:00
18 2015-06-23 12:00:00
19 2015-06-27 12:00:00
dtype: datetime64[ns]
Timing information for both methods at bottom
One method would be to use Series.apply along with the .replace() method OP mentions in his post. Example -
tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))
Demo -
In [12]: tempDF
Out[12]:
date id
0 2015-02-04 02:34:00 1
1 2015-06-04 12:34:00 2
2 2015-09-04 23:03:00 3
3 2015-12-04 01:00:00 4
4 2015-04-15 07:12:00 5
5 2015-04-21 12:59:00 6
6 2015-04-29 17:33:00 7
7 2015-04-05 10:44:00 8
8 2015-06-05 11:12:00 9
9 2015-10-05 08:52:00 10
10 2015-12-05 14:19:00 11
11 2015-05-19 19:22:00 12
12 2015-05-27 22:31:00 13
13 2015-01-06 11:09:00 14
14 2015-04-06 12:57:00 15
15 2015-10-06 04:00:00 16
16 2015-06-15 03:23:00 17
17 2015-06-19 05:37:00 18
18 2015-06-23 13:41:00 19
19 2015-06-27 15:43:00 20
In [13]: tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))
In [14]: tempDF
Out[14]:
date id
0 2015-02-04 12:00:00 1
1 2015-06-04 12:00:00 2
2 2015-09-04 12:00:00 3
3 2015-12-04 12:00:00 4
4 2015-04-15 12:00:00 5
5 2015-04-21 12:00:00 6
6 2015-04-29 12:00:00 7
7 2015-04-05 12:00:00 8
8 2015-06-05 12:00:00 9
9 2015-10-05 12:00:00 10
10 2015-12-05 12:00:00 11
11 2015-05-19 12:00:00 12
12 2015-05-27 12:00:00 13
13 2015-01-06 12:00:00 14
14 2015-04-06 12:00:00 15
15 2015-10-06 12:00:00 16
16 2015-06-15 12:00:00 17
17 2015-06-19 12:00:00 18
18 2015-06-23 12:00:00 19
19 2015-06-27 12:00:00 20
Timing information
In [52]: df = pd.DataFrame([[datetime.datetime.now()] for _ in range(100000)],columns=['date'])
In [54]: %%timeit
....: df['date'].dt.normalize() + datetime.timedelta(hours=12)
....:
The slowest run took 12.53 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 32.3 ms per loop
In [57]: %%timeit
....: df['date'].apply(lambda x:x.replace(hour=12,minute=0))
....:
1 loops, best of 3: 1.09 s per loop
Here's the solution I used to replace the time component of the datetime values in a Pandas DataFrame. Not sure how efficient this solution is, but it fit my needs.
import pandas as pd
# Create a list of EOCY dates for a specified period
sDate = pd.Timestamp('2022-01-31 23:59:00')
eDate = pd.Timestamp('2060-01-31 23:59:00')
dtList = pd.date_range(sDate, eDate, freq='Y').to_pydatetime()
# Create a DataFrame with a single column called 'Date' and fill the rows with the list of EOCY dates.
df = pd.DataFrame({'Date': dtList})
# Loop through the DataFrame rows using the replace function to replace the hours and minutes of each date value.
for i in range(df.shape[0]):
    df.iloc[i, 0] = df.iloc[i, 0].replace(hour=0, minute=0)
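For what it's worth, since zeroing the hours and minutes (with seconds already at zero, as in the EOCY dates above) is exactly what normalization does, the loop can be collapsed into one vectorized call; a sketch on a small stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2022-12-31 23:59:00",
                                           "2023-12-31 23:59:00"])})

# Equivalent to calling replace(hour=0, minute=0) on every row, but vectorized
df["Date"] = df["Date"].dt.normalize()
```

This avoids the per-row iloc assignments, which are the slow part of the loop-based version.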