I'm following a tutorial on using yfinance in Jupyter Notebook to get prices for SPY (S&P 500) in a dataframe. The code looks simple, but I can't seem to get the desired results.
import pandas as pd
import yfinance as yf
df_tickers = pd.DataFrame()
spyticker = yf.Ticker("SPY")
print(spyticker)
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
The error states: "SPY: No data found for this date range, symbol may be delisted." But when I print spyticker, I get the correct yfinance object:
yfinance.Ticker object <SPY>
I am not sure what your problem is but if I use the following:
spyticker = yf.Ticker("SPY")
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
I get the following:
             Open   High    Low  Close    Volume  Dividends  Stock Splits
Date
1998-12-01  76.02  77.27  75.43  77.00   8950600        0.0             0
1998-12-02  76.74  77.19  75.94  76.78   7495500        0.0             0
1998-12-03  76.76  77.45  75.35  75.51  12145300        0.0             0
1998-12-04  76.35  77.58  76.27  77.49  10339500        0.0             0
1998-12-07  77.29  78.21  77.25  77.86   4290000        0.0             0
My only explanation is that the call to spyticker.history already returns a DataFrame, so there is no need to define df_tickers beforehand.
I am using yahoo finance in python and when I run the following code:
print(apple.history('max'))
It gives me this output:
Open High ... Dividends Stock Splits
Date ...
1980-12-12 0.100323 0.100759 ... 0.0 0.0
1980-12-15 0.095525 0.095525 ... 0.0 0.0
How do I get the output to show the Low price, Close price, and Volume for each date as many sites show it does? It only shows me the 3 dots in between High and Dividends.
ticker.history() returns a pandas DataFrame. You can access any column by its name, e.g. 'Low'. A primer on indexing and selecting data can be found in the docs.
By default, the number of rows and columns shown when a DataFrame is printed is limited, which is where the dots come from. However, you can disable this limit.
import yfinance as yf
import pandas as pd
apple = yf.Ticker('AAPL')
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(apple.history('max')[['Low', 'Close', 'Volume']])
    # or: display(...) if working with jupyter
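The same two ideas (selecting columns by name and lifting the display limits only temporarily) can be tried without a network call. This is a minimal sketch: the `prices` frame and its values are made up to stand in for what ticker.history() returns.

```python
import pandas as pd

# Made-up stand-in for the frame returned by ticker.history()
prices = pd.DataFrame(
    {
        "Open": [0.1003, 0.0955],
        "High": [0.1008, 0.0955],
        "Low": [0.1003, 0.0951],
        "Close": [0.1003, 0.0951],
        "Volume": [469033600, 175884800],
    },
    index=pd.to_datetime(["1980-12-12", "1980-12-15"]),
)

low = prices["Low"]                          # one name -> Series
subset = prices[["Low", "Close", "Volume"]]  # list of names -> DataFrame

before = pd.get_option("display.max_columns")
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    # Inside the block the limits are off, so nothing is elided
    inside = pd.get_option("display.max_columns")
    print(subset)
# On leaving the block the previous setting is restored
after = pd.get_option("display.max_columns")
```

Using option_context rather than pd.set_option keeps the change local, so other cells in the notebook still print truncated frames.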
The context of my problem: I am trying to convert the results of a time series forecast, plotted with matplotlib.pyplot, back into a dataframe so that I can use the cufflinks library to get a more interactive chart, where I can hover over data points for a more detailed look at the forecast.
After training and running the simulations, the code goes:
date_ori = pd.to_datetime(df.iloc[:, 0]).tolist()
for i in range(test_size):
    date_ori.append(date_ori[-1] + timedelta(days = 1))
date_ori = pd.Series(date_ori).dt.strftime(date_format = '%Y-%m-%d').tolist()
date_ori[-5:]
accepted_results = []
for r in results:
    if (np.array(r[-test_size:]) < np.min(df['Close'])).sum() == 0 and \
       (np.array(r[-test_size:]) > np.max(df['Close']) * 2).sum() == 0:
        accepted_results.append(r)
len(accepted_results)
accuracies = [calculate_accuracy(df['Close'].values, r[:-test_size]) for r in accepted_results]
plt.figure(figsize = (15, 5))
for no, r in enumerate(accepted_results):
    plt.plot(r, label = 'forecast %d'%(no + 1))
plt.plot(df['Close'], label = 'true trend', c = 'black')
plt.legend()
plt.title('average accuracy: %.4f'%(np.mean(accuracies)))
x_range_future = np.arange(len(results[0]))
plt.xticks(x_range_future[::30], date_ori[::30])
plt.show()
I have started to dissect the last plotting section so that I can convert the data into a dataframe and plot it with cufflinks. The format cufflinks expects looks like this:
import cufflinks as cf
# data from FXCM Forex Capital Markets Ltd.
raw = pd.read_csv('http://hilpisch.com/fxcm_eur_usd_eod_data.csv',
                  index_col=0, parse_dates=True)
quotes = raw[['AskOpen', 'AskHigh', 'AskLow', 'AskClose']]
quotes = quotes.iloc[-60:]
quotes.tail()
                     AskOpen  AskHigh   AskLow  AskClose
2017-12-25 22:00:00  1.18667  1.18791  1.18467   1.18587
2017-12-26 22:00:00  1.18587  1.19104  1.18552   1.18885
2017-12-27 22:00:00  1.18885  1.19592  1.18885   1.19426
2017-12-28 22:00:00  1.19426  1.20256  1.19369   1.20092
2017-12-31 22:00:00  1.20092  1.20144  1.19994   1.20147
qf = cf.QuantFig(
    quotes,
    title='EUR/USD Exchange Rate',
    legend='top',
    name='EUR/USD'
)
qf.iplot()
So far I have tried to break the matplotlib plot down into a dataframe like so. These are the forecasted results:
df = accepted_results
rd = pd.DataFrame(df)
rd.T
0 1 2 3 4 5 6 7
0 768.699985 768.699985 768.699985 768.699985 768.699985 768.699985 768.699985 768.699985
1 775.319656 775.891012 772.283885 737.763376 773.811344 785.021571 770.438252 770.464180
2 772.387081 787.562968 764.858772 737.837558 775.712162 770.660990 768.103724 770.786379
3 786.316425 779.248516 765.839603 760.195678 783.410054 789.610540 765.924561 773.466415
4 796.039144 803.113903 790.219174 770.508252 795.110376 793.371152 774.331197 786.772606
... ... ... ... ... ... ... ... ...
277 1042.788063 977.462670 1057.189696 1262.203613 1057.900621 1042.329811 1053.378352 1171.416597
278 1026.857102 975.473725 1061.585063 1307.540754 1061.490772 1049.696547 1054.122795 1117.779434
279 1029.388746 977.097765 1069.265953 1192.250498 1064.540056 1049.169295 1045.126807 1242.474584
280 1030.373147 983.650686 1070.628785 1103.139889 1053.571269 1030.669091 1047.641127 1168.965372
281 1023.118504 984.660763 1071.661590 1068.445156 1080.461617 1035.736879 1035.599867 1231.714340
then converting the x axis from
plt.xticks(x_range_future[::30], date_ori[::30])
to
df1 = pd.DataFrame((x_range_future[::30], date_ori[::30]))
df1.T
     0           1
0    0  2016-11-02
1   30  2016-12-15
2   60  2017-01-31
3   90  2017-03-15
4  120  2017-04-27
5  150  2017-06-09
6  180  2017-07-24
7  210  2017-09-05
8  240  2017-10-17
9  270  2017-11-20
Lastly, I have the Close column, and this is what I've been able to come up with for it so far:
len(df['Close'].values)
252
When I use
df['Close'].values
I get an array. I'm having trouble putting this all together. The cufflinks iplot charts are just far better, and it would be amazing if I could gain the intuition to do this. I apologize in advance if I didn't try hard enough; I'm doing my best, but I can't find the answer no matter how many times I search Google, so I thought I would ask here.
This is what I did. I printed the individual pieces, such as print(date_ori), and checked their sizes with print(len(date_ori)); date_ori held all of the dates for the forecast. I then added it to a dataframe with df['date'] = pd.DataFrame(date_ori). The results had to be transposed with df.T first, so that they sat in one long column rather than one long row. So first
df = pd.DataFrame(results)
df = df.T
then
df['date'] = pd.DataFrame(date_ori)
I had trouble renaming column 0, which contained all of the predicted results, so I just saved the file with
df.to_csv('yo')
Then I renamed the column 0 to results in the file, added .csv to the end of the file name, and pulled the data back into memory.
Then I formatted the date:
format = '%Y-%m-%d'
df['Datetime'] = pd.to_datetime(df['date'], format=format)
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
and dropped the unneeded columns. I guess I could now add the close data that I started with and plot them together, but I got the results into the dataframe, so now I can use these awesome charts! Can't believe I figured it out within 18 hours, I was so lost lol.
Also, I cut the experiment down to just one simulation, so there was only 1 row of results to deal with while I figured it out.
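For anyone repeating this, the CSV round trip is probably avoidable, since the column can be renamed and the dates attached in memory. A minimal sketch, assuming a single simulation; the `results` and `date_ori` values below are made up and much shorter than the real ones, and `forecast` is just an illustrative name:

```python
import pandas as pd

# Made-up stand-ins for the question's `results` (one simulation) and `date_ori`
results = [[768.70, 775.32, 772.39, 786.32]]
date_ori = ["2016-11-02", "2016-11-03", "2016-11-04", "2016-11-07"]

# Transpose so the simulation runs down one long column
forecast = pd.DataFrame(results).T

# Rename column 0 directly instead of editing a saved file
forecast.columns = ["results"]

# Parse the date strings and use them as a DatetimeIndex
forecast.index = pd.to_datetime(date_ori, format="%Y-%m-%d")
forecast.index.name = "Datetime"

print(forecast)
```

A frame shaped like this (DatetimeIndex plus named numeric columns) matches the layout of the quotes example above, so it should be usable with cufflinks in the same way.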
I have this code:
data = pd.read_csv("out.csv")
df=data[['created_at','ticker','close']]
print(df)
print(df.corr())
out.csv looks like this:
created_at,ticker,adj_close,close,high,low,open,volume
2020-06-02 09:30:00-04:00,A,90.33000183105469,90.33000183105469,90.41000366210938,89.94999694824219,90.0,45326.0
2020-06-02 09:31:00-04:00,A,90.2300033569336,90.2300033569336,90.2300033569336,90.22000122070312,90.22000122070312,709.0
2020-06-08 15:56:00-04:00,ZYXI,22.899900436401367,22.899900436401367,22.959999084472656,22.829999923706055,22.959999084472656,5304.0
2020-06-08 15:57:00-04:00,ZYXI,22.920000076293945,22.920000076293945,22.950000762939453,22.889999389648438,22.899999618530273,5317.0
2020-06-08 15:58:00-04:00,ZYXI,22.860000610351562,22.860000610351562,22.93000030517578,22.860000610351562,22.90999984741211,10357.0
I want to see a correlation matrix between tickers using the close price over time, which is why I have included the created_at column. However, when I do print(df.corr()) I only see the result below, and I'm not sure why:
close
close 1.0
Found the answer https://www.interviewqs.com/blog/py_stock_correlation
data = pd.read_csv("out.csv")
dfdata=data[['created_at','ticker','close']]
# print(df)
# one row per timestamp, one column per ticker (keyword arguments are required in pandas 2.0+)
df_pivot = dfdata.pivot(index='created_at', columns='ticker', values='close')
print("loaded df")
# print(df_pivot.head())
corr_df = df_pivot.corr(method='pearson')
# drop the 'ticker' label from the index of the correlation matrix
corr_df.index.name = None
print(corr_df.head(10))
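The reason the pivot matters is that corr() compares columns, and the long layout has only one numeric column (close), which gives the 1x1 result. A small sketch with made-up prices for two hypothetical tickers A and B:

```python
import pandas as pd

# Made-up long-format data: one row per (timestamp, ticker)
long_df = pd.DataFrame(
    {
        "created_at": ["09:30", "09:31", "09:32"] * 2,
        "ticker": ["A"] * 3 + ["B"] * 3,
        "close": [1.0, 2.0, 3.0, 6.0, 4.0, 2.0],
    }
)

# Wide layout: one row per timestamp, one column per ticker
wide = long_df.pivot(index="created_at", columns="ticker", values="close")

# Now there is one numeric column per ticker to correlate
corr = wide.corr(method="pearson")
print(corr)
```

Here A rises while B falls in a straight line, so the off-diagonal entries come out as -1.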
I am a super noob in pandas and I am following a tutorial that is obviously outdated.
I have this simple script that, when I run it, gives this error:
ValueError: Array conditional must be same shape as self
# loading the class data from the package pandas_datareader
import pandas as pd
from pandas_datareader import data
import matplotlib.pyplot as plt
# Adj Close:
# The closing price of the stock that adjusts the price of the stock for corporate actions.
# This price takes into account the stock splits and dividends.
# The adjusted close is the price we will use for this example.
# Indeed, since it takes into account splits and dividends, we will not need to adjust the price manually.
# First day
start_date = '2014-01-01'
# Last day
end_date = '2018-01-01'
# Call the function DataReader from the class data
goog_data = data.DataReader('GOOG', 'yahoo', start_date, end_date)
goog_data_signal = pd.DataFrame(index=goog_data.index)
goog_data_signal['price'] = goog_data['Adj Close']
goog_data_signal['daily_difference'] = goog_data_signal['price'].diff()
goog_data_signal['signal'] = 0.0
# this line produces the error
goog_data_signal['signal'] = pd.DataFrame.where(goog_data_signal['daily_difference'] > 0, 1.0, 0.0)
goog_data_signal['positions'] = goog_data_signal['signal'].diff()
print(goog_data_signal.head())
I am trying to understand the theory, the libraries and the methodology through practicing so bear with me if it is too obvious... :]
The where method is meant to be called on a DataFrame or Series object, but here it is called on the DataFrame class itself, with the condition passed as the first argument. You only need to check the condition for a single Series, so I found 2 ways to solve this problem:
The current where method doesn't take a value for the rows where the condition is true (1.0 in your case); those rows keep their original values. It does take a value for the rows where the condition is false (called the other parameter in the docs). So you can set the 1.0's manually later as follows:
goog_data_signal['signal'] = goog_data_signal['daily_difference'].where(goog_data_signal['daily_difference'] > 0, other=0.0)
# the true rows keep their daily_difference values and you can set them to 1.0 as needed.
Or you can check the condition directly as follows:
goog_data_signal['signal'] = (goog_data_signal['daily_difference'] > 0).astype(int)
The second method produces the output for me:
                 price  daily_difference  signal  positions
Date
2014-01-02  554.481689               NaN       0        NaN
2014-01-03  550.436829         -4.044861       0        0.0
2014-01-06  556.573853          6.137024       1        1.0
2014-01-07  567.303589         10.729736       1        0.0
2014-01-08  568.484192          1.180603       1        0.0
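The three-argument form in the tutorial line looks like NumPy's where, which takes the condition, a value for true and a value for false. That is my guess, not something the tutorial confirms. A minimal sketch on made-up differences, next to a pandas-only equivalent:

```python
import numpy as np
import pandas as pd

# Made-up daily differences; the first one is NaN, as after .diff()
daily_difference = pd.Series([np.nan, -4.04, 6.14, 10.73, 1.18])

# NumPy: np.where(condition, value_if_true, value_if_false) -> ndarray
signal_np = np.where(daily_difference > 0, 1.0, 0.0)

# pandas only: start from a Series of ones, keep them where the
# condition holds, and replace everything else with 0.0
ones = pd.Series(1.0, index=daily_difference.index)
signal_pd = ones.where(daily_difference > 0, other=0.0)

print(signal_np)
print(signal_pd.tolist())
```

NaN > 0 evaluates to False, so the first row gets 0.0 in both versions.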
I run a query to a webserver with certain added criteria.
I specify a date range which alters the date in the url.
I then pull the data line for specified symbols and I get a list of short volume etc. for the specified stock and time frame.
However, I want to be able to get the output as a dataframe.
Right now the dataframe only holds the data from the last URL that was read, not the combined output.
I tried to use list_.append, which I could not get to work.
import pandas as pd
from datetime import datetime
import urllib.error
symbols = ['AABA']
start_date = datetime(2019, 5, 10 )
end_date = datetime(2019, 5, 15 )
datelist = pd.date_range(start_date, periods=(end_date-start_date).days+1).tolist()
for date in datelist:
    url = f"http://regsho.finra.org/FNYXshvol{date.strftime('%Y%m%d')}.txt"
    try:
        df = pd.read_csv(url,delimiter='|')
        if any(df['Symbol'].isin(symbols)):
            stocks = df[df['Symbol'].isin(symbols)].to_string(index=False, header=False)
            print(stocks)
        else:
            print(f'No stock found for {date.date()}' )
    except urllib.error.HTTPError:
        continue
The result is now:
20190510 AABA 2300.0 0.0 14617.0 N
20190513 AABA 2816.0 0.0 39128.0 N
20190514 AABA 1761.0 0.0 26191.0 N
20190515 AABA 24092.0 0.0 62745.0 N
I want the result to be in a dataframe so that I can directly export the result to csv
Why do you convert the dataframe to a string when you want the output to be a dataframe? (For example, you could write it straight to a file: df[df['Symbol'].isin(symbols)].to_csv('AABA.csv', index=False, header=False)) Anyway, to convert the string back to a dataframe you can use pandas.read_fwf:
from io import StringIO
df=pd.read_fwf(StringIO(stocks), header=None)
OUTPUT:
          0     1        2    3        4  5
0  20190510  AABA   2300.0  0.0  14617.0  N
1  20190513  AABA   2816.0  0.0  39128.0  N
2  20190514  AABA   1761.0  0.0  26191.0  N
3  20190515  AABA  24092.0  0.0  62745.0  N
stocks is a dataframe before you convert it to a string. Just keep it as a dataframe, store it in a list and just concat that list to obtain a full dataframe:
dflist = []
for date in datelist:
    url = f"http://regsho.finra.org/FNYXshvol{date.strftime('%Y%m%d')}.txt"
    try:
        df = pd.read_csv(url,delimiter='|')
        if any(df['Symbol'].isin(symbols)):
            stocks = df[df['Symbol'].isin(symbols)]
            print(stocks.to_string(index=False, header=False))
            dflist.append(stocks)
        else:
            print(f'No stock found for {date.date()}' )
    except urllib.error.HTTPError:
        continue
df = pd.concat(dflist)
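The collect-then-concat pattern can be checked offline before pointing it at the server. In this sketch the `daily_files` dict and the three column names are made up for illustration; only the pipe delimiter and the Symbol column come from the code above.

```python
from io import StringIO

import pandas as pd

# Made-up pipe-delimited text standing in for two downloaded daily files
daily_files = {
    "20190510": "Date|Symbol|ShortVolume\n20190510|AABA|2300\n20190510|XYZ|10\n",
    "20190513": "Date|Symbol|ShortVolume\n20190513|AABA|2816\n",
}
symbols = ["AABA"]

frames = []
for text in daily_files.values():
    day = pd.read_csv(StringIO(text), delimiter="|")
    match = day[day["Symbol"].isin(symbols)]
    if not match.empty:
        frames.append(match)  # keep the filtered rows as a DataFrame

# One DataFrame for the whole date range, with a fresh 0..n-1 index
combined = pd.concat(frames, ignore_index=True)

# to_csv returns the CSV text when no path is given;
# pass a file name instead to write it to disk
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Note that pd.concat raises a ValueError on an empty list, so it may be worth checking that at least one day matched before concatenating.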