Gathering rate of return from yfinance API - python

Picture of data
Looking at my data, I am trying to create a new column in a separate dataset that gives each ticker and its rate of return, where the return is calculated from the open price of the first observation for that ticker and the close price of the last observation for the same ticker.

Can you try using this function and tell me if it achieves your purpose?
def calculate_return_rate(df):
    # Sort by date so that "first" and "last" are well defined per ticker
    df_ordered = df.sort_values(by=["date"])
    # Close of the last observation and open of the first observation for each ticker
    df_last_close = df_ordered.groupby(["Ticker"]).agg("last")["close"]
    df_first_open = df_ordered.groupby(["Ticker"]).agg("first")["open"]
    return (df_last_close - df_first_open) / df_first_open
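For example, here is a minimal usage sketch; the sample values are made up, and the column names (date, Ticker, open, close) are the ones assumed by the function above:
import pandas as pd

# Hypothetical sample data with the columns assumed by calculate_return_rate
sample = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-04", "2021-01-05"]),
    "Ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "open": [133.52, 128.89, 222.53, 217.26],
    "close": [129.41, 131.01, 217.69, 217.90],
})

# One return per ticker: (last close - first open) / first open
print(calculate_return_rate(sample))  # AAPL ≈ -0.0188, MSFT ≈ -0.0208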

Related

Obtaining Data from a dataframe at desired timestep

I have a long time series with a 15-minute time step. I want to obtain a time series at a 3-hour time step from the existing series. I have tried different methods, including the resample method, but resample does not work for me. I decided to run a loop to obtain these values, using the following piece of code, but I am not sure why it is not working as I expect. I cannot use resample.mean() since I don't want to miss any actual peak values, e.g. that of a flood wave. I want to keep the original data as it is.
station_number = []
timestamp = []
water_level = []
discharge = []
for i in df3.index:
    station_number.append(df3['Station ID'][i])
    timestamp.append(df3['Timestamp'][i])
    water_level.append(df3['Water Level (m)'][i])
    discharge.append(df3['Discharge (m^3/s)'][i])
    i = i + 12
    pass
df5 = pd.DataFrame(station_number, columns=['Station ID'])
df5['Timestamp'] = timestamp
df5['Water Level (m)'] = water_level
df5['Discharge (m^3/s)'] = discharge
df5
Running this code returns the same dataframe. My logic is that the value of i updates by 12 steps and picks up the corresponding values from the dataset. Please guide me if I am doing something wrong.
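For what it's worth, a minimal sketch of that selection without a loop, assuming df3 has a default integer index and is sorted by Timestamp (reassigning i inside a for loop does not skip iterations, which is why the loop returns every row): take every 12th row by position, since 12 x 15 min = 3 hours.
# Keep the original values (no averaging), just every 12th row
df5 = df3.iloc[::12].reset_index(drop=True)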

Exponential moving average (EMA) differs from Binance

I am calculating the EMA with Python on Binance (BTC Futures) monthly open price data (20/12~21/01).
ema2 gives 25872.823333 for the second month, as shown below.
import pandas as pd

df = pd.Series([19722.09, 28948.19])
ema2 = df.ewm(span=2, adjust=False).mean()
ema2
0 19722.090000
1 25872.823333
But on Binance, EMA(2) gives a different value (25108.05), as in the picture.
https://www.binance.com/en/futures/BTCUSDT_perpetual
Any help would be appreciated.
I had the same problem: the EMA calculated with pandas (df.ewm...) wasn't the same as the one from Binance. You have to use a longer series. First I used 25 candles of data, then changed to 500. When you query Binance, query a lot of data, because the mathematical calculation of the EMA runs from the beginning of the series.
Best regards
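A minimal sketch of that suggestion, assuming prices is a pandas Series of candle closes already fetched from Binance, ordered oldest to newest (the function name and the lookback lengths are just for illustration):
import pandas as pd

def last_ema(prices: pd.Series, span: int, lookback: int) -> float:
    # EMA of the most recent candle, seeded from only the last `lookback` candles
    return prices.tail(lookback).ewm(span=span, adjust=False).mean().iloc[-1]

# Compare the final EMA when seeded from 25 candles vs. 500 candles;
# the longer history should sit closer to the value the exchange displays.
# print(last_ema(prices, span=2, lookback=25), last_ema(prices, span=2, lookback=500))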

Loop for interaction between two series

I have two series, one containing the stock volume for many stocks across many exchanges (a lot of the stocks trade on all the exchanges). The other series holds the standard deviation of each stock (each company, irrespective of the exchanges it is traded on). Now, I have been trying to create a loop to divide the volume of the respective stock (in the first series) by the combined standard deviation in the second series. I made the following loop:
# for standard deviation of volume of each stock across all exchanges (it is working properly)
stdev_volume = Main_df_retvol.groupby(['pair_name'], sort=False)['volume'].std()

# loop to divide the volume by the standard deviation of volume of the respective stock (loop not working)
df_vol_std = []
for i in range(len(stdev_volume)):
    if stdev_volume[i]['pair_name'] == Main_df_retvol['pair_name']:
        df_vol_std = Main_df_retvol['vol'].divide(other=stdev_volume['Volume'])
print(df_vol_std)
Any help would be really appreciated.
Let's break it down...
Getting an index to select each row in stdev:
for i in range(len(stdev_volume)):
Comparing a scalar value from a row/column in stdev to a full column from main (which will raise an exception):
if stdev_volume[i]['pair_name'] == Main_df_retvol['pair_name']:
And taking a variable you had initialized as a list and overwriting it with a full column-by-column division (regardless of the intended row, and it will only keep the result of the last time around the loop anyway):
df_vol_std = Main_df_retvol['vol'].divide(other=stdev_volume['Volume'])
So, instead of that loop I would suggest something like:
main.join(stdev, on='pair_name')
Or go even further: when you build stdev, add it as a column on main:
main = main.assign(stdev=main.groupby('pair_name').volume.transform('std'))
main = main.assign(volbystd=main.volume.div(main.stdev))
If you provide a sample of your data, we can test whether this works.
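In the meantime, here is a tiny self-contained example of the transform approach; the pair_name and volume column names match the ones above, and the numbers are invented purely for illustration:
import pandas as pd

main = pd.DataFrame({
    'pair_name': ['AAA', 'AAA', 'BBB', 'BBB', 'BBB'],
    'volume': [100.0, 140.0, 50.0, 70.0, 90.0],
})

# Per-stock standard deviation broadcast back onto every row, then the ratio
main = main.assign(stdev=main.groupby('pair_name').volume.transform('std'))
main = main.assign(volbystd=main.volume.div(main.stdev))
print(main)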

Python Correlation Analysis of 2 data sets

I am currently trying to put the following time series into one plot, with adjusted scalings, to examine whether they correlate or not.
import pandas as pd

raw = pd.read_csv('C:/Users/Jens/Documents/Studium/Master/Big Data Analytics/Lecture 5/tr_eikon_eod_data.csv',
                  index_col=0, parse_dates=True)
data = raw[['.SPX', '.VIX']].dropna()
data.tail()
data.plot(subplots=True, figsize=(10,6));
Does anyone have an idea how to do that?
Also, would it be possible to do the same with data from two different data sets? I would like to compare a house price index with a stock index. I have the daily closing prices for the stock index and just one value per quarter for the house prices (10 years), i.e. closing price against house price index.
I don't really know where to start.
Thank you! :)
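One way to get both series into a single figure with separate scales is pandas' secondary_y option; a minimal sketch, assuming the same data DataFrame as above:
# Plot .SPX on the left axis and .VIX on a secondary right axis so their very
# different scales do not flatten one of the series.
ax = data.plot(secondary_y='.VIX', figsize=(10, 6))
ax.set_ylabel('.SPX')
ax.right_ax.set_ylabel('.VIX')
For the second question, one common approach is to bring both series to the same frequency first, e.g. downsample the daily closes to quarterly values (data['.SPX'].resample('Q').last()) before comparing them with the quarterly house price index.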

Intraday daily return

I am quite new to Python and I need your help, guys!
My data structure
This is 5-minute intraday data from 2001/01/02 till 31/12/2019. As you can see from the data, column 0 indicates the date and column 2 indicates the stock price.
Each day, such as 2001/01/02, has 79 observations.
First of all, I need to create a daily return as a new column. Normally I was dealing with daily data, and the daily log return was as follows:
import numpy as np

def lr(x):
    return np.log(x[1:]) - np.log(x[:-1])
How can I create a new column for the daily return from the 5-minute data?
If you load your data into a pandas.DataFrame, you can use df.groupby() and then apply your lr function with minimal changes:
df = pd.read_excel('path/to/your/file.xlsx', header=None,
                   names=['Index', 'Date', 'Some_var', 'Stock_price'])
The key thing to decide, though, will be how you want to generate your daily values from your 5-minute data. I'm no stock expert, but I'd guess you want to use the last value of each day to represent the stock value. If that's the case, you can use
daily_values = df.groupby('Date')['Stock_price'].agg('last')
and then apply your lr function to get the returns:
lr(daily_values)
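If the goal is literally a new column on the 5-minute DataFrame, one option, building on the df, Date and Stock_price names used above, is to compute each day's log return and broadcast it back onto that day's rows:
import numpy as np

# Log return of each day's last price relative to the previous day's last price
daily_values = df.groupby('Date')['Stock_price'].agg('last')
daily_returns = np.log(daily_values).diff()

# Map each day's return onto every 5-minute row of that day
df['Daily_return'] = df['Date'].map(daily_returns)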
