Get the monthly observation data from daily dataframe in pandas - python

I want to get monthly observation data from the daily data in pandas. That is, I want the data for every 5th day of the month (2011-01-05, 2011-02-05, 2011-03-05, ..., 2011-12-05) or the closest following trading day (e.g. if 2011-03-05 does not exist, it should pick 2011-03-06). How can I do that?
The dataframe looks something like:
Date        Close
2011-01-01  100.99
2011-01-02  100.65
...
2011-12-31   76.08

The answer below will solve your problem, but there is a caveat: there must be at least one day of data for each month.
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
df['day'] = df.Date.dt.day
df['month'] = df.Date.dt.month
df['year'] = df.Date.dt.year

def get_nearest_time_data(df, target_day):
    newdf = pd.DataFrame()
    for month in range(1, 13):
        day = target_day  # reset to the requested day for every month
        daydf = df[(df.day == day) & (df.month == month)]
        # if that day has no row, walk forward to the next available day
        while daydf.shape[0] == 0:
            day += 1
            daydf = df[(df.day == day) & (df.month == month)]
        newdf = pd.concat([newdf, daydf], ignore_index=True)
    return newdf

get_nearest_time_data(df, 5)
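An alternative, if the frame is indexed by Date and sorted, is to build the list of target dates and reindex with a backfill, which snaps every missing 5th to the next later date present in the data. A minimal sketch, assuming the data covers 2011 and the Date column is already parsed:
import pandas as pd

df = df.set_index('Date').sort_index()

# one target date per month: the 5th of each month in 2011
targets = pd.date_range('2011-01-01', '2011-12-01', freq='MS') + pd.Timedelta(days=4)

# reindex uses an exact match when the 5th exists, otherwise bfill picks the next available day
monthly = df.reindex(targets, method='bfill')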

Related

Get the first and the last day of a month from the df

This is what my dataframe looks like:
datetime     open     high     low      close
2006-01-02   4566.95  4601.35  4542.00  4556.25
2006-01-03   4531.45  4605.45  4531.45  4600.25
2006-01-04   4619.55  4707.60  4616.05  4694.14
...
I need to calculate the monthly returns in %.
Formula: (Month Closing Price - Month Open Price) / Month Open Price
I can't seem to get the open price and closing price of each month, because in my df most months don't have a row for the 1st of the month, so I'm having trouble calculating it.
Any help would be very much appreciated!
You need to use groupby and agg to get the first and last value of each column in each month:
import pandas as pd
df = pd.read_csv("dt.txt")
df["datetime"] = pd.to_datetime(df["datetime"])
df.set_index("datetime", inplace=True)
resultDf = df.groupby([df.index.year, df.index.month]).agg(["first", "last"])
resultDf["new_column"] = (resultDf[("close", "last")] - resultDf[("open", "first")])/resultDf[("open", "first")]
resultDf.index.rename(["year", "month"], inplace=True)
resultDf.reset_index(inplace=True)
resultDf
The code above results in a dataframe with MultiIndex columns. So, if you want to get, for example, the rows for the year 2010, you can do something like:
resultDf[resultDf["year"] == 2010]
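If the MultiIndex columns get in the way later, a small optional step (not part of the answer above) is to flatten them into plain strings:
# flatten ('close', 'last') into 'close_last'; ('year', '') simply becomes 'year'
resultDf.columns = ["_".join(filter(None, col)) if isinstance(col, tuple) else col
                    for col in resultDf.columns]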
You can create a custom grouper as follows:
import pandas as pd
import numpy as np
from io import StringIO
csvfile = StringIO(
"""datetime\topen\thigh\tlow\tclose
2006-01-02\t4566.95\t4601.35\t4542.00\t4556.25
2006-01-03\t4531.45\t4605.45\t4531.45\t4600.25
2006-01-04\t4619.55\t4707.60\t4616.05\t4694.14""")
df = pd.read_csv(csvfile, sep = '\t', engine='python')
df.datetime = pd.to_datetime(df.datetime, format = "%Y-%m-%d")
dg = df.groupby(pd.Grouper(key='datetime', axis=0, freq='M'))
Then each group of dg covers one month, and since we converted datetime to a pandas datetime we can use classic arithmetic on it:
def monthly_return(datetime, close_value, open_value):
    # positions of the earliest and latest date within the month
    index_start = np.argmin(datetime)
    index_end = np.argmax(datetime)
    return (close_value.iloc[index_end] - open_value.iloc[index_start]) / open_value.iloc[index_start]

dg.apply(lambda x: monthly_return(x.datetime, x.close, x.open))
Out[97]:
datetime
2006-01-31 0.02785
Freq: M, dtype: float64
Of course a purely functional approach is possible instead of defining the monthly_return function.
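A minimal sketch of what that could look like, assuming the rows within each month are already in date order (as they are after the read_csv above):
# first open and last close of each month, then the monthly return
agg = dg.agg(month_open=('open', 'first'), month_close=('close', 'last'))
(agg['month_close'] - agg['month_open']) / agg['month_open']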

Duplicate value at last date of month to all previous dates in that month

This is how the data looks
PERMNO is the ticker for different stocks. I want to extract the RET value for each stock at the last date in each month and duplicate it to the other dates in that month. Say RET = 0.01 for PERMNO 10006 on 30.06.1928; then all RET values from 01.06.1928-30.06.1928 should show 0.01 as well. This is to be done for all dates and all stocks. I have tried with groupby, loops and date ranges, but run into problems.
Any help is much appreciated!
First convert to datetimes and sort, then use GroupBy.last within GroupBy.transform:
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['PERMNO', 'Date'])
# broadcast the last RET of each (stock, year, month) group back to every row of that group
df['new'] = df.groupby(['PERMNO', df['Date'].dt.year, df['Date'].dt.month])['RET'].transform('last')
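A tiny self-contained example with made-up numbers, just to show what the transform does:
import pandas as pd

df = pd.DataFrame({
    'PERMNO': [10006, 10006, 10006],
    'Date': ['1928-06-01', '1928-06-15', '1928-06-30'],
    'RET': [0.03, -0.02, 0.01],
})
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['PERMNO', 'Date'])
df['new'] = df.groupby(['PERMNO', df['Date'].dt.year, df['Date'].dt.month])['RET'].transform('last')
# every June 1928 row of PERMNO 10006 now shows 0.01 in 'new'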

Pandas - Select month and year

Trying to subset a dataframe; ultimately I want to export a certain month and year (say November 2020) to a CSV. But I'm stuck at the selection part; the date column is in DD/MM/YYYY format. My attempt:
csv = r"C:\Documents\Transactions.csv"
current_month = 11
current_year = 2020
data =pd.read_csv(csv, sep=',', index_col = None)
df = data[pd.to_datetime(data['Date'],dayfirst=True).dt.month == current_month &(pd.to_datetime(data['Date']).dt.year==current_year)]
print(df)
The result has the rows with the correct year, but includes all months, whereas I want it restricted to the current_month variable. Any help appreciated.
Given that you have a Date column, I would suggest converting it once up front, since you currently call pd.to_datetime twice inside the filter. Then extract the month and year from the converted column:
data['Date'] = pd.to_datetime(data['Date'], dayfirst=True)
df = data[(data['Date'].apply(lambda x: x.month) == current_month) &
          (data['Date'].apply(lambda y: y.year) == current_year)]
Convert the Date column to datetime first, then do the selection as usual.
import pandas as pd
df = pd.read_csv('data-date.txt')
current_month = 11
current_year = 2020
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df[(df['Date'].dt.month == current_month) & (df['Date'].dt.year == current_year)]
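Since the end goal is a CSV export, you could finish with something along these lines (the filename is just a placeholder):
# keep only November 2020 and write it out
nov_2020 = df[(df['Date'].dt.month == current_month) & (df['Date'].dt.year == current_year)]
nov_2020.to_csv('transactions_2020_11.csv', index=False)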

Pandas equivalent to SQL for month datetime

I have a pandas dataframe that I need to filter just like a SQL query for a specific month. Every time I run the code I want it to grab data from the previous month, no matter what the current day of the month is.
My SQL code is here but I need pandas equivalent.
WHERE DATEPART(m, logged) = DATEPART(m, DATEADD(m, -1, getdate()))
df = pd.DataFrame({'month': ['1-05-01 00:00:00','1-06-01 00:00:00','1-06-01 00:00:00','1-05-01 00:00:00']})
df['month'] = pd.to_datetime(df['month'])
In this example, I only want the data from June.
Would definitely appreciate the help! Thanks.
Modifying based on the question edit:
df = pd.DataFrame({'month': ['1-05-01 00:00:00','1-06-01 00:00:00','1-06-01 00:00:00','1-05-01 00:00:00']})
df['month'] = pd.to_datetime(df['month'])
## To get it to the right format
import datetime as dt
df['month'] = df['month'].apply(lambda x: dt.datetime.strftime(x, '%Y-%d-%m'))
df['month'] = pd.to_datetime(df['month'])
## Extract the month from this date
df['month_ex'] = df.month.dt.month
## Get current month to get the latest month from the dataframe, which is the previous month of the current month
from datetime import datetime
currentMonth = datetime.now().month
newDf = df[df.month_ex == currentMonth - 1]
Output:
month month_ex
1 2001-06-01 6
2 2001-06-01 6
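Note that currentMonth - 1 breaks in January (it becomes 0) and also ignores the year. A sketch of a more robust filter, assuming the month column already holds proper datetimes, compares year-month periods directly:
import pandas as pd

# previous calendar month as a year-month Period (handles the January -> December wrap)
prev_month = pd.Timestamp.now().to_period('M') - 1

# keep only rows whose year-month equals the previous month
newDf = df[df['month'].dt.to_period('M') == prev_month]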

Calculating cumulative returns mid-year to mid-year in Pandas

I have a Pandas dataframe of size (80219 * 5) with the same structure as the image I have uploaded. The data can range from 2002-2016 for each company, but if values are missing the data either starts at a later date or ends at an earlier date, as you can see in the image.
What I would like to do is calculate yearly compounded returns measured from June to June for each company. If there is no data for a company for the full 12-month period from June to June, the result should be NaN. Below is my current code, but I don't know how to calculate the returns from June to June.
After having loaded and cleaned the file I do:
df[['Returns']] = df[['Returns']].apply(pd.to_numeric)
df['Names Date'] = pd.to_datetime(df['Names Date'])
df['Returns'] = df['Returns']+ 1
df = df[['Company Name','Returns','Names Date']]
df['year']=df['Names Date'].dt.year
df['cum_return'] = df.groupby(['Company Name','year']).cumprod()
df = df.groupby(['Company Name','year']).nth(11)
print(tabulate(df, headers='firstrow', tablefmt='psql'))
which calculates the annual return from the 1st of January to the 31st of December.
I finally found a way to do it. The easiest way I could find is to calculate a rolling 12-month compounded return for each month and then slice the dataframe to give me the 12-month returns for the months I want:
def myfunc(arr):
    return np.cumprod(arr)[-1]

cum_ret = pd.Series()
grouped = df.groupby('Company Name')
for name, group in grouped:
    cum_ret = cum_ret.append(pd.rolling_apply(group['Returns'], 12, myfunc))
df['Cum returns'] = cum_ret
df = df.loc[df['Names Date'].dt.month == 6]
df['Names Date'] = df['Names Date'].dt.year
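Note that pd.rolling_apply and Series.append have since been removed from pandas. A sketch of the same idea with the current API, assuming df is sorted by company and date:
import numpy as np

df = df.sort_values(['Company Name', 'Names Date'])
# 12-month rolling product of the (1 + return) values, computed per company
df['Cum returns'] = (
    df.groupby('Company Name')['Returns']
      .transform(lambda s: s.rolling(12).apply(np.prod, raw=True))
)
# keep the June observations, i.e. the 12-month return ending in June
df = df.loc[df['Names Date'].dt.month == 6]
df['Names Date'] = df['Names Date'].dt.year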
