I am calculating ema with python on binance (BTC Futures) monthly open price data(20/12~21/01).
ema2 gives 25872.82333 on the second month like below.
df = pd.Series([19722.09, 28948.19])
ema2 = df.ewm(span=2,adjust=False).mean()
ema2
0 19722.090000
1 25872.823333
But in binance, ema(2) gives difference value(25108.05) like in the picture.
https://www.binance.com/en/futures/BTCUSDT_perpetual
Any help would be appreciated.
I had a the same problem, that the calculated EMA (df.ewm...) from pandas wasn't the same as the one from binance. You have to use a longer series. First i used 25 candlestick data, then changed to 500. When you query binance, query a lot of date, because the mathematical calculation of the EMA is from the beginning of the series.
best regards
Related
My first post, so I hope I do this correctly!
This is admittedly an OOP modification of something on DataCamp. I have two objects which contain Pandas dataframes. The first (StockData) has stock data for every trading day of 2016 for both Amazon and Facebook. The second (BenchmarkData) has the S&P 500 closing values for every trading day of 2016. For both, I want to calculate the percent change (StockReturns and BenchmarkReturns, respectively) and then plot them. I want both of the StockReturns on the same plot, but the BenchmarkReturns (which is a Series and not a dataframe for reasons irrelevant to this part of the code) on a separate plot. For my function, I've added a flag as input to tell the program whether the object contains a stock dataframe or a benchmark dataframe and I call the function twice during runtime, once for stock and the other for benchmark. However, no matter what I do, Pandas plots all 3 on the same plot. How do I separate the benchmark data and get it on its own plot?
def _CalculatePercentChange(self, IsStockData):
if(IsStockData):
self.__StockReturns = self.__StockData.GetData().pct_change()
self.__StockReturns.plot(title = 'Daily Percent Change')
else:
self.__BenchmarkReturnsDataFrame = self.__BenchmarkData.GetData().pct_change()
self.__BenchmarkReturns = self.__BenchmarkReturnsDataFrame['S&P 500'].squeeze()
self.__BenchmarkReturns.plot(title = 'Daily Percent Change')
Thanks guys.
I downloaded some stock data from CRSP and need the variance of the stock returns of the last 36 months of that company.
So, basically the variance based on two conditions:
Same PERMCO (company number)
Monthly stock returns of the last 3 years.
However, I excluded penny stocks from my sample (stocks with prices < $2). Hence, sometimes months are missing and e.g. april and junes monthly returns are directly on top of each other.
If I am not mistaken, a rolling function (grouped by Permco) would just take the 36 monthly returns above. But when months are missing, the rolling function would actually take more than 3 years data (since the last 36 monthly returns would exceed that timeframe).
Usually I work with Ms Excel. However, in this case the amount of data is too big and it takes years to let Excel calculate stuff. Thats why I want to tackle that problem with Python.
The sample is organized as follows:
PERMNO date SHRCD PERMCO PRC RET
When I have figured out how to make a proper table in here I will show you a sample of my data.
What I have tried so far:
data["RET"]=data["RET"].replace(["C","B"], np.nan)
data["date"] = pd.to_datetime(date["date"])
data=data.sort_values[("PERMCO" , "date"]).reset_index()
L3Yvariance=data.groupby("PERMCO")["RET"].rolling(36).var().reset_index()
Sometimes there are C and B instead of actual returns, thats why the first line
You can replace the missing values by the mean value. It won't affect the variance as the variance is calculated after subtracting the mean, so in this case, for times you won't have the value, the contribution to variance will be 0.
I currently try to put the following time series into one plot, with adjusted scalings to examine whether they correlate or not.
raw = pd.read_csv('C:/Users/Jens/Documents/Studium/Master/Big Data Analytics/Lecture 5/tr_eikon_eod_data.csv', index_col=0, parse_dates=True)
data = raw[['.SPX', '.VIX']].dropna()
data.tail()
data.plot(subplots=True, figsize=(10,6));
Has anyone an idea how to do that?
Also, would it possible to do the same with data from two different data sets? I like to compare a house price index with a stock index. I have the daily closing prices for the stock index and just a value for each quarter for the house prices (10y). I.e. closing price against house price index.
I don't really know where to start.
Thank you!:)
Here's how Yahoo Finance apparently calculates Adjusted Close stock prices:
https://help.yahoo.com/kb/adjusted-close-sln28256.html
From this, I understand that a constant factor is applied to the unadjusted price and that said factor changes with each dividend or split event, which should happen not too often. And that I should be able to infer that factor by dividing the unadjusted by the adjusted price.
However, if I verify this with AAPL data (using Python), I get confusing results:
import yfinance
df = yfinance.download("AAPL", start="2010-01-01", end="2019-12-31")
df["Factor"] = df["Close"] / df["Adj Close"]
print(df["Factor"].nunique(), df["Factor"].count())
Which produces: 2442 2516
So the factor is different in by far most of the cases. But AAPL usually has 4 dividend events per year and had a stock split during that period, so I would expect roughly 40 different factors rather than 2442.
Is the formula Yahoo Finance provides under the link above overly simplified or am I missing something here?
The problem is that Yahoo Finance doesn't provide BOTH raw and adjusted prices for you to work with. If you check the footnote of a sample historical price page (e.g., MSFT), you will see a text that says "Close price adjusted for splits; Adjusted close price adjusted for both dividends and splits."
In order to derive clean adjusted ratios, both raw (unadjusted) and adjusted prices are needed. Then you can apply an adjustment method such as CRSP to derive the correct values. In summary, you didn't do anything wrong! It's the intrinsic limitation of Yahoo's output.
References:
[1] https://medium.com/#patrick.collins_58673/stock-api-landscape-5c6e054ee631
[2] http://www.crsp.org/products/documentation/crsp-calculations
I'm not sure this is a complete answer, but it's too long for a comment:
First, there is definitely an issue with rounding. If you modify your third line to
df["Factor"] = df["Close"] / df["Adj Close"].round(12)
you get 2441 unique factors. If, however, you use, for example round(6), you only get 46 unique factors.
Second, according to the adjustment rules in the Yahoo help page in your question, each dividend results in an adjustment for the 5 trading dates immediately prior to the ex-divined date. During the 10 year period in your question, there were no stock splits and approximately 40 quarterly dividends. These should have resulted in 200 dates with adjusted closing prices. All the other 2300 or so dates should have no closing adjustments, ie., a Factor of 1. Yet when you run:
df[df.Factor == 1].shape
you get only 37 dates (regardless of the rounding used) with no adjustments.
Obviously, that doesn't make sense and - unless I'm missing something basic - there is some error either in the implementation of the adjustment methodology or in the Yahoo code.
I am quite new in python and I need your help guys!
my data structure
this data is intraday data 5 min from 2001/01/02 till 31/12/2019. As you can see from the data 0 indicated the date, and 2 indicate the prices of the stock.
Each day, such as 2001/01/02 has 79 observation.
First of all, I need to create a daily return as a new column. Normaly I was dealing with daily data and for the daily log return was as follow
def lr(x):
return np.log(x[1:]) - np.log(x[:-1])
How can I create new column for the daily return from the 5 min data.
If you load your data into a pandas.DataFrame, you can use df.groupby() and then apply your lr-function with minimal changes:
df = pd.read_excel('path/to/your/file.xlsx', header=None,
names=['Index', 'Date', 'Some_var', 'Stock_price'])
The key thing to do, though, will be to decide how you want to generate your daily values from your 5 minute data. I'm no stock expert, but I'd guess you want to use the last value for each day to represent the stock value. If that's the case, you can use
daily_values = df.groupby('Date')['Stock_price'].agg('last')
and then apply your lr function to get the returns
lr(daily_values)