I am working on a panel dataset that includes daily stock returns of 450 firms for 5 years and daily ESG scores (momentum based) for the same 5 years. I want to regress stock returns on the daily ESG scores with firm and year fixed effects. I used linearmodels.panel in Python and set the index to ("Stock ticker", "Date") before running the regression with entity and time effects. In the regression output, the number of entities is 450, which is correct, but the number of time periods is 1800. How is Python capturing the time effects? Is it based on the year or on something else? What I want is year fixed effects, where all firms share the same indicator variable for a given year. Can someone please help me do this the right way?
The image shows the format of the data, where the panel is based on daily returns.
Sounds like your model is capturing daily fixed effects instead of yearly fixed effects. This is happening because you set Date as an index, so you're telling Python that you want one fixed effect per date.
You have to create a new column that only contains the year. That is, convert the date column to datetime format (see pandas.to_datetime) and then:
# Extract year from Date
df['Year'] = pd.DatetimeIndex(df['Date']).year
# Set indices
df = df.set_index(['Ticker','Year'])
Then run your model.
I recommend using linearmodels.PanelOLS, which is built specifically for fitting fixed effects models.
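To sanity-check what "year fixed effects" means here, a minimal library-free sketch may help. It assumes a synthetic panel (the firm names, coefficient values and noise level are all invented for illustration) and estimates the ESG slope by ordinary least squares with one dummy per firm and one per calendar year, so every firm shares the same indicator within a year. It is not the linearmodels implementation, just the same idea written out with pandas and NumPy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily panel: 20 hypothetical firms, business days over 3 years.
dates = pd.bdate_range("2019-01-01", "2021-12-31")
tickers = [f"F{i:02d}" for i in range(20)]
df = pd.DataFrame(
    [(t, d) for t in tickers for d in dates], columns=["Ticker", "Date"]
)
df["Year"] = df["Date"].dt.year  # one label per calendar year, shared by all firms

# Invented data-generating process: ESG slope plus firm and year effects plus noise.
true_beta = 0.5
firm_fe = dict(zip(tickers, rng.normal(size=len(tickers))))
year_fe = {2019: 0.3, 2020: -0.2, 2021: 0.1}
df["ESG"] = rng.normal(size=len(df))
df["Return"] = (
    true_beta * df["ESG"]
    + df["Ticker"].map(firm_fe)
    + df["Year"].map(year_fe)
    + rng.normal(scale=0.1, size=len(df))
)

# Least-squares dummy variables: ESG, a dummy per firm, and a dummy per year.
# One year dummy is dropped to avoid perfect collinearity with the firm dummies.
firm_d = pd.get_dummies(df["Ticker"], dtype=float)
year_d = pd.get_dummies(df["Year"], prefix="Y", drop_first=True, dtype=float)
X = np.column_stack([df["ESG"].to_numpy(), firm_d.to_numpy(), year_d.to_numpy()])
coef, *_ = np.linalg.lstsq(X, df["Return"].to_numpy(), rcond=None)

beta_hat = coef[0]                     # estimated ESG slope
n_year_effects = df["Year"].nunique()  # 3 year groups, not one per trading day
print(beta_hat, n_year_effects)
```

With real data the Year column is the only new ingredient. As far as I know, PanelOLS also has an other_effects argument that can take such a column while the index stays (Ticker, Date), but check the linearmodels documentation before relying on that.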
For future reference, post your code and a replicable example so we can help you out more easily.
Related
Apologies in advance for the long post (please advise if these long posts are poor form). :(
I am attempting to code a simple trading strategy to learn how to calculate expected returns and to learn financial trading methods.
I have loaded S&P 500 data from Yahoo Finance using yfinance, and I want the user to be able to choose how far back the data goes.
Here already begins my problem. My dataframe is loaded such that the "close_price" list has the dates as an index column (can be seen also in the attached image). Not my biggest concern as I'm able to call all the dates and close_prices for the stock I've selected.
From here, I'm trying to calculate the expected returns based on two strategies:
1. Buy $x on the first date. Buy $x every month thereafter. Calculate the portfolio value (or returns on each investment/total returns) on a specified date.
2. Buy $x on the first date. Buy $x again if the price drops by 10%. Sell 0.5*$x if the price increases by 10%. Buy $x if 30 days have passed and no buy/sell order has been made.
Picture of my data table
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from pandas_datareader import data as web
# Load stock data
stock_ticker = '^GSPC'
df = yf.download(stock_ticker)
# Allows user to input number of days to trackback prior to today (excluding weekends) for analysis
timescale = int(input("Enter No. of days prior to today (excluding weekends):"))
# List arrays for close price and dates
close_price = df["Adj Close"][-timescale:]
dates = df.index.tolist()[-timescale:]
# Returns
daily_returns = close_price.pct_change(fill_method='pad')
monthly_returns = close_price.resample('M').ffill().pct_change(fill_method='pad')
Things I've tried:
-- Writing a for loop that multiplies my monthly stock return (monthly return values are of the order 0.01 and stock prices of the order $4000) by my investment per month. So for a $1000 investment with a return of 0.04 one month later, the gain is $40 and the value of the portfolio is $1040.
-- Writing a while loop that is True while the stock price stays above 90% of its initial value. When it is no longer True, put $1000 into the stock. If the stock goes up 10% (that is, the price is more than 1.1x the price at the last buy/sell order), then sell 50%.
I've tried many ways to express this logic in code, but to no avail. Would love your help, guys!
Thanks!
There are some modules such as Zipline that can help simulate these trading strategies I think. It might be less time consuming to use that and it incorporates things like slippage and trading fees as well.
However if you want to build your own code from scratch I’d suggest breaking it down into a few smaller steps.
First, write a section of code that finds the buy/sell trade signals based on your previously stated criteria.
Then write another section of code that takes the trading signals and finds trades with entry/exit dates.
When you have a list of trades with entry/exit dates, you can write a section of code that turns the list of trades into a portfolio database. The database shows the value of your portfolio over time and how your trades affect that value.
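The three steps above can be sketched for the simpler of the two strategies (buy a fixed amount on the first trading day of every month). This is only an illustration under assumptions: the price series is a made-up straight line rather than real S&P 500 data, the names invest_per_buy, signals, trades and portfolio are my own, and fees and slippage are ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical price series standing in for the S&P 500 adjusted close.
dates = pd.bdate_range("2022-01-03", "2022-06-30")
prices = pd.Series(np.linspace(100.0, 120.0, len(dates)), index=dates, name="close")

invest_per_buy = 1000.0  # the "$x" from the question

# Step 1: signals. Mark the first trading day of every month as a buy (1).
month_key = prices.index.to_period("M")
is_first_day = ~month_key.duplicated()
signals = pd.Series(is_first_day.astype(int), index=prices.index)

# Step 2: trades. Each buy signal becomes a purchase of x / price units.
trades = pd.DataFrame({"price": prices[signals == 1]})
trades["units"] = invest_per_buy / trades["price"]

# Step 3: portfolio. Cumulative units held, valued at each day's price.
units_held = trades["units"].reindex(prices.index, fill_value=0.0).cumsum()
portfolio = pd.DataFrame({
    "invested": (signals * invest_per_buy).cumsum(),
    "value": units_held * prices,
})
total_return = portfolio["value"].iloc[-1] / portfolio["invested"].iloc[-1] - 1
print(portfolio.tail(1), total_return)
```

The second strategy would only change step 1: the signal logic would compare each price with the price at the last order, while steps 2 and 3 could stay largely the same.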
I am currently using Prophet to forecast usage over a one-year period. This is my first time using this algorithm and I have some questions.
I am using the code attached below. I am wondering whether anyone has included holidays as a parameter before, and how to do so while including holidays from other calendars (lunar, Islamic, etc.). Also, since February has one more day in a leap year, it would be great to know whether the algorithm takes this into consideration.
from prophet import Prophet  # the package was named fbprophet before version 1.0

m = Prophet(
    growth='logistic',  # logistic growth needs a 'cap' column in the training dataframe
    seasonality_mode='multiplicative',
    seasonality_prior_scale=1.5,
    mcmc_samples=5,
    n_changepoints=25,
    changepoint_range=0.8,
    yearly_seasonality='auto',
    weekly_seasonality='auto',
    daily_seasonality='auto',
    holidays=None,
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    interval_width=0.8,
    stan_backend=None,
)
The holidays parameter takes a dataframe. The minimal set of columns required in that dataframe is ds (the date of the holiday) and holiday (the holiday name).
The important thing to note here is that you provide both historical and future holidays in this dataframe.
Apart from the 2 columns mentioned above, the following columns are optional:
lower_window, upper_window (int) - to extend holiday effect around the date of holiday.
prior_scale(float) - to set a different prior scale for each holiday.
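As a rough illustration of the dataframe described above, here is a minimal sketch for one lunar-calendar holiday. The dates are Gregorian dates of Lunar New Year for 2022 to 2025; the window values and the label "lunar_new_year" are assumptions chosen for the example, and the Prophet call itself is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Lunar New Year falls on a different Gregorian date each year, so the dates
# are listed explicitly and cover both the history and the forecast horizon.
holidays = pd.DataFrame({
    "holiday": "lunar_new_year",  # assumed label, any string works
    "ds": pd.to_datetime(["2022-02-01", "2023-01-22", "2024-02-10", "2025-01-29"]),
    "lower_window": -1,  # assumed: effect starts one day before
    "upper_window": 2,   # assumed: effect lasts two days after
})
print(holidays)

# The dataframe would then be passed to the model, for example:
# m = Prophet(holidays=holidays)
```

Holidays from other calendars (Islamic, regional, company-specific) can be added the same way by concatenating more rows with their own holiday label.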
Also to answer your second question i.e.
Also since February may have 1 more day in a leap year, would be great
as well to know if the algorithm take this into consideration?
It depends on the data you model. If the historical data you provide already includes leap years, Prophet will take that into consideration.
Which modelling strategy (time frame, features, technique) would you recommend to forecast 3-month sales for total customer base?
At my company, we often analyse the effect of e.g. marketing campaigns that run at the same time for the total customer base. In order to get an idea of the true incremental impact of the campaign, we - among other things - want to use predicted sales as a counterfactual for the campaign, i.e. what sales were expected to be assuming no marketing campaign.
Time frame used to train the model - I'm currently considering 2 options (static time frame and rolling window). Let me know what you think.
1. Static: use the same period last year as the dependent variable to build a specific model for this particular 3-month time frame. Data from the 12 months before are used to generate features.
2. Rolling window: use a rolling window logic of 3 months, dynamically creating dependent time frames and features. I'm not yet sure what the benefit of that would be. It uses more recent data for the model creation but feels less specific because it uses any 3-month period in a year as the dependent variable.
I'm not sure what the theory says for this particular example. Experiences, thoughts?
Features - currently building features per customer from one year pre-period data, e.g. Sales in individual months, 90,180,365 days prior, max/min/avg per customer, # of months with sales, tenure, etc. Takes quite a lot of time - any libraries/packages you would recommend for this?
Modelling technique - currently considering GLM, XGBoost, S/ARIMA, and LSTM networks. Any experience here?
To clarify, even though I'm considering e.g. ARIMA, I do not need to predict any seasonal patterns of the 3 month window. As a first stab, a single number, total predicted sales of customer base for these 3 months, would be sufficient.
Any experience or comment would be highly appreciated.
Thanks,
F
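On the feature-building part of the question: a few of the per-customer features listed above (sales over the prior 90/180/365 days, max/min/avg, number of months with sales) can be computed with plain pandas groupby calls. The sketch below assumes a transaction log with the hypothetical columns customer_id, date and sales, and a made-up cutoff date marking the start of the 3-month target window; it is one possible layout, not a recommendation of a specific library.

```python
import pandas as pd

# Hypothetical transaction log: one row per customer purchase.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime([
        "2023-01-15", "2023-06-10", "2023-12-20",
        "2023-03-05", "2023-11-30", "2022-12-01",
    ]),
    "sales": [100.0, 50.0, 25.0, 80.0, 40.0, 60.0],
})
cutoff = pd.Timestamp("2024-01-01")  # assumed start of the 3-month target window

# Keep only the 365 days before the cutoff (the pre-period).
pre = tx[(tx["date"] < cutoff) & (tx["date"] >= cutoff - pd.Timedelta(days=365))]

# Sum / max / min / mean of sales per customer over the pre-period.
features = (
    pre.groupby("customer_id")["sales"]
    .agg(["sum", "max", "min", "mean"])
    .add_prefix("sales_365d_")
)

# Number of distinct calendar months with at least one sale.
features["months_with_sales"] = (
    pre.assign(month=pre["date"].dt.to_period("M"))
    .groupby("customer_id")["month"]
    .nunique()
)

# Sales in the most recent 90 and 180 days before the cutoff.
for days in (90, 180):
    recent = pre[pre["date"] >= cutoff - pd.Timedelta(days=days)]
    features[f"sales_{days}d"] = recent.groupby("customer_id")["sales"].sum()

features = features.fillna(0.0)
print(features)
```

Customers with no purchases in the pre-period (customer 3 here) do not appear in the result, so they would need to be added back with zeros if the model should score them too.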
I am a beginner in Python programming and machine learning.
I have a dataset with sales per product at a monthly level.
The dataset has data from 2015 up till 2019.
With the help of Python I would like to make a prediction model that predicts the sales of the next month.
I followed this tutorial:
Sales prediction
This gave me a prediction for the last 6 months, lined up against the actual sales. I managed to get a pretty accurate prediction, but my problem is that I need the predictions per product, and if possible I would like to include the influence of weather as well. For example, if the weather data says it is rainy, the model has to take this into account.
Does anyone know a way of doing this?
Every tip on which model to use or article to read is much appreciated!!
The most basic way of doing that would be to run an ARIMA model with external regressors (the weather measured in terms of temperature, humidity or any other feature that is expected to influence the monthly sales).
What is important is that, before fitting the model, the sales data should be transformed into log monthly changes with something like np.log(df.column).diff().
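Since the question asks for predictions per product, the log-change transformation has to be applied within each product rather than across the whole column. A minimal sketch, assuming a long-format table with the hypothetical columns month, product_id and sales (the numbers are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per product per month.
df = pd.DataFrame({
    "month": pd.to_datetime(["2019-01", "2019-02", "2019-03", "2019-04"] * 2),
    "product_id": ["A"] * 4 + ["B"] * 4,
    "sales": [100.0, 110.0, 121.0, 133.1, 50.0, 45.0, 40.5, 36.45],
})

# Sort so that diff() compares consecutive months of the same product.
df = df.sort_values(["product_id", "month"])
df["log_sales"] = np.log(df["sales"])

# Log monthly change, computed separately for each product.
# The first month of every product has no predecessor and stays NaN.
df["log_change"] = df.groupby("product_id")["log_sales"].diff()
print(df)
```

The log_change column, with the first row per product dropped, is what would be fed to the model together with the weather regressors; in statsmodels, for instance, SARIMAX accepts such regressors through its exog argument.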
I've recently written a Python micro-package, salesplansuccess, which deals with prediction of the current (or next) year's annual sales for individual products from historic monthly sales data. But a major assumption of that model is quarterly seasonality (more specifically, a repeating drift from the 2nd to the 3rd month in each quarter), which is more characteristic of wholesalers (who strive to achieve their formal quarterly sales goals by pushing sales at the end of each quarter).
The package is installed as usual with pip install salesplansuccess.
You can modify its source code to better fit your needs. It uses both ARIMA (maximum likelihood estimates) and linear regression (least squares estimates) techniques under the hood.
The minimalistic use case is below:
import pandas as pd
from salesplansuccess.api import SalesPlanSuccess
myHistoricalData = pd.read_excel('myfile.xlsx')
myAnnualPlan = 1000
sps = SalesPlanSuccess(data=myHistoricalData, plan=myAnnualPlan)
sps.fit()
sps.simulate()
sps.plot()
For a more detailed illustration of its use, you may want to refer to the Jupyter Notebook file in its GitHub repository.
I have daily stats churned out from a system that outputs total sales and units sold per region group. For my analysis, I want to break the entries down into regions instead of region groups. I'm looking for a way to split each row into one row per region with the respective measures.
I have historical percentages on the market share per region which I'll use to come up with the estimated sales and units sold.
I can do this manually in Excel, but given that I'll be doing this on a weekly basis, I'm looking for a way to automate it via Python.
My data: https://imgur.com/a/pBr3y4D
Goal: https://imgur.com/a/Uc56PVR
Well, first of all, when you're doing data science work, try to find the approach that fits your own case best. There's nothing wrong with using Excel functionality, scripting, etc. to solve your issue.
However, if you really want to use pandas, what I would do in your case is build the per-region rows, combine them (DataFrame.append was removed in pandas 2.0, so use pd.concat), and then group by region to get sales and units, or write a function with a for loop.
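Another way to get one row per region, sketched here under assumptions, is to merge the daily table with the market-share table and scale the measures by each region's share. The column names (region_group, region, share, est_sales, est_units) and all numbers are hypothetical, since the actual layout is only visible in the linked images.

```python
import pandas as pd

# Hypothetical daily stats per region group.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01"]),
    "region_group": ["North", "South"],
    "sales": [1000.0, 500.0],
    "units": [100, 40],
})

# Hypothetical historical market share of each region within its group.
shares = pd.DataFrame({
    "region_group": ["North", "North", "South", "South"],
    "region": ["N1", "N2", "S1", "S2"],
    "share": [0.6, 0.4, 0.25, 0.75],
})

# The merge repeats every region-group row once per region in that group.
by_region = daily.merge(shares, on="region_group", how="left")

# Scale the group totals by each region's share to get the estimates.
by_region["est_sales"] = by_region["sales"] * by_region["share"]
by_region["est_units"] = by_region["units"] * by_region["share"]
by_region = by_region[["date", "region_group", "region", "est_sales", "est_units"]]
print(by_region)
```

As long as the shares within each region group sum to 1, the estimated values add back up to the original group totals, which makes for an easy weekly check.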