I am trying to get the current price and market cap of all the tickers in the S&P 500. The way I am currently doing it is very slow, so I was wondering whether there is anything I could do to improve it, or any other methods.
Here is my current method, simply to print the name, market cap and current price:
import yfinance as yf
# I am using a csv file with a list of all the tickers, which I use to create a pandas dataframe and form a space-separated string of all of the tickers called all_symbols
# I have simplified the pandas dataframe to a list for the purpose of this question
ticker_list = ["A", "AL", "AAP", "AAPL", ... "ZBRA", "ZION", "ZTS"]
all_symbols = " ".join(ticker_list)
tickers = yf.Tickers(all_symbols)
for ticker in ticker_list:
    price = tickers.tickers[ticker].info["currentPrice"]
    market_cap = tickers.tickers[ticker].info["marketCap"]
    print(ticker, market_cap, price)
This method is very slow and the information is received one ticker at a time. Is there any way to make it faster and/or get the ticker info as a batch?
I have also tried using the yf.download method to download information on multiple tickers at once. This was faster, but I could not get the information I wanted from it. Is it possible to get the market cap and current price using yf.download?
There have been similar questions, but they all seem to use the same general idea as mine, which takes a long time when the number of tickers is high. I have yet to find a solution faster than my current one, so any suggestions are appreciated, even ones not using yfinance, as long as they return real-time data without a large delay.
There is another library you can try called yahooquery. In my trial, the time dropped from 34 seconds to 0.4 seconds.
from yahooquery import Ticker
ticker_list = ["A", "AL", "AAP", "AAPL", "ZBRA", "ZION", "ZTS"]
all_symbols = " ".join(ticker_list)
myInfo = Ticker(all_symbols)
myDict = myInfo.price
for ticker in ticker_list:
    longName = myDict[ticker]['longName']
    market_cap = myDict[ticker]['marketCap']
    price = myDict[ticker]['regularMarketPrice']
    print(ticker, longName, market_cap, price)
There is a lot of other information in the myDict dictionary; it is worth exploring.
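If some symbols in the list are delisted or misspelled, indexing myDict[ticker]['longName'] directly can fail. Below is a minimal sketch that collects the fields defensively. It assumes (not verified here) that yahooquery's `.price` returns a dict keyed by symbol, with a dict for each symbol that was found and something else, such as an error string, for symbols it could not find. `extract_quotes` is a hypothetical helper name, and the sample data is made up purely for illustration.

```python
def extract_quotes(price_data, fields=("longName", "marketCap", "regularMarketPrice")):
    """Turn a {symbol: data} mapping into rows, skipping symbols without usable data.

    Assumption: for symbols that were found, `data` is a dict; anything else
    (for example an error message string) is treated as "not found".
    """
    rows, skipped = [], []
    for symbol, data in price_data.items():
        if not isinstance(data, dict):
            skipped.append(symbol)
            continue
        # .get() returns None for a missing field instead of raising KeyError
        rows.append({"symbol": symbol, **{field: data.get(field) for field in fields}})
    return rows, skipped


# Made-up sample shaped like the assumed .price output
sample = {
    "AAA": {"longName": "Example Corp", "marketCap": 1000, "regularMarketPrice": 10.0},
    "ZZZZ": "no data (placeholder string)",
}
rows, skipped = extract_quotes(sample)
print(rows)
print(skipped)
```

With the real library, the call would be extract_quotes(Ticker(all_symbols).price), using all_symbols from the answer above.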
You may find that getting the values for each ticker in a separate thread gives you better overall performance. Here's an example:
import yfinance as yf
from concurrent.futures import ThreadPoolExecutor
def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    print(f"{ticker} {info['currentPrice']} {info['marketCap']}")
ticker_list = ['AAPL', 'ORCL', 'PREM.L', 'UKOG.L', 'KOD.L', 'TOM.L', 'VELA.L', 'MSFT', 'AMZN', 'GOOG']
with ThreadPoolExecutor() as executor:
    executor.map(get_stats, ticker_list)
Output:
VELA.L 0.035 6004320
UKOG.L 0.1139 18496450
PREM.L 0.461 89516976
ORCL 76.755 204970377216
MSFT 294.8669 2210578825216
TOM.L 0.604 10558403
KOD.L 0.3 47496900
AMZN 3152.02 1603886514176
AAPL 171.425 2797553057792
GOOG 2698.05 1784584732672
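The output above is not in the order of ticker_list because each thread prints as soon as it finishes. Also, since the results of executor.map are never iterated, an exception inside get_stats (for example a missing 'currentPrice' key) is silently lost. A minimal sketch that returns the results instead, so executor.map hands them back in input order, and that records a failure for one symbol rather than dropping it. `fetch_all` is a hypothetical helper; the info lookup is passed in as a function (for example `lambda s: yf.Ticker(s).info`), which also lets it be exercised offline with the fake lookup below. `max_workers=8` is an arbitrary choice; too many concurrent requests may get throttled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(symbols, get_info, fields=("currentPrice", "marketCap"), max_workers=8):
    """Look up `fields` for every symbol concurrently; results keep the input order."""

    def fetch_one(symbol):
        try:
            info = get_info(symbol)
        except Exception as exc:  # keep going if one symbol fails
            return {"symbol": symbol, "error": str(exc)}
        # .get() gives None for fields this symbol does not have
        return {"symbol": symbol, **{f: info.get(f) for f in fields}}

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the order of `symbols`, whatever order the threads finish in
        return list(executor.map(fetch_one, symbols))


# Fake lookup for illustration: later symbols answer faster, and "BAD" fails.
def fake_info(symbol):
    if symbol == "BAD":
        raise ValueError("no data")
    time.sleep({"A": 0.03, "B": 0.02, "C": 0.01}[symbol])
    return {"currentPrice": 1.0, "marketCap": 100}


results = fetch_all(["A", "B", "C", "BAD"], fake_info)
print(results)
```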
The code I'm running gives results that are space-delimited. This creates a problem with my sector column when the result is Communication Services: it creates one column for Communication and another column for Services, whereas I need one column saying Communication Services. I have tried to concatenate the two columns into one, but I'm getting attribute and str errors and don't know how to achieve this. Can anyone show how this can be done? Thanks
Code
import yfinance as yf
import pandas as pd
from concurrent.futures import ThreadPoolExecutor
list_of_futures = []
def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    s = f"{ticker} {info['currentPrice']} {info['marketCap']} {info['sector']}"
    list_of_futures.append(s)
ticker_list = ['AAPL', 'ORCL', 'GTBIF', 'META']
with ThreadPoolExecutor() as executor:
    executor.map(get_stats, ticker_list)
(
    pd.DataFrame(list_of_futures)
    [0].str.split(expand=True)
    .rename(columns={0: "Ticker", 1: "Price", 2: "Market Cap", 3: "Sector", 4: "Sector1"})
    .to_excel("yahoo_futures.xlsx", index=False)
)
Let's reformulate the get_stats function to return a dictionary instead of a string. This avoids the unnecessary step of splitting strings to create a dataframe:
import yfinance as yf
import pandas as pd
from concurrent.futures import ThreadPoolExecutor
def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    cols = ['currentPrice', 'marketCap', 'sector']
    return {'ticker': ticker, **{c: info[c] for c in cols}}
tickers = ['AAPL', 'ORCL', 'GTBIF', 'META']
with ThreadPoolExecutor() as executor:
    result_iter = executor.map(get_stats, tickers)
df = pd.DataFrame(result_iter)
Result
ticker currentPrice marketCap sector
0 AAPL 148.11 2356148699136 Technology
1 ORCL 82.72 223027183616 Technology
2 GTBIF 13.25 3190864896 Healthcare
3 META 111.41 295409188864 Communication Services
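If you would rather keep the string-based version from the question, limiting the number of splits also fixes the original problem: with `n=3`, `str.split` splits on at most the first three runs of whitespace, so everything after the market cap stays together as one sector value. This assumes the ticker, price and market cap never contain spaces themselves. The sample strings are taken from the result table above.

```python
import pandas as pd

# Strings in the format built by get_stats in the question: "ticker price market_cap sector"
rows = [
    "AAPL 148.11 2356148699136 Technology",
    "META 111.41 295409188864 Communication Services",
]

df = (
    pd.Series(rows)
    # n=3 stops after the third split, so a multi-word sector stays in one column
    .str.split(n=3, expand=True)
    .rename(columns={0: "Ticker", 1: "Price", 2: "Market Cap", 3: "Sector"})
)
print(df)
```

The Price and Market Cap columns are still strings at this point; pd.to_numeric can convert them if needed.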
I'm trying to build a cryptocurrency price tracker in Python (see code below). I'm working with Python 3.10.1 in Visual Studio Code.
import pandas_datareader.data as web
import datetime as dt
currency = 'EUR'
metric = 'Close'
crypto = ['BTC','ETH']
colnames = []
first = True
start = dt.datetime(2020,1,1)
end = dt.datetime.now()
for ticker in crypto:
    data = web.DataReader(f'{crypto}-{currency}', 'yahoo', start, end)
    if first:
        combined = data[[metric]].copy()
        colnames.append(ticker)
        combined.columns = colnames
        first = False
    else:
        combined = combined.join(data[metric])
        colnames.append(ticker)
        combined.columns = colnames
When I execute this code, I get the following error notification:
RemoteDataError: No data fetched for symbol ['BTC', 'ETH']-EUR using YahooDailyReader
When I change the variable crypto to only pull the prices for BTC the code works, but the output looks like this:
Date        B            T            C
2020-01-01  6417.781738  6417.781738  6417.781738
2020-01-02  6252.938477  6252.938477  6252.938477
2020-01-03  6581.735840  6581.735840  6581.735840
In the scenario of only pulling BTC, the variable colnames looks like this: colnames = ['B', 'T', 'C']. I suspect there's something wrong with that variable and that it's the reason my code fails when I try to pull the data for multiple cryptocurrencies, but I can't quite figure it out and solve my problem.
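Two things appear to combine here. The f-string uses `crypto` (the whole list) instead of the loop variable `ticker`, which produces the symbol "['BTC', 'ETH']-EUR" seen in the error. And when `crypto` is presumably set to the plain string 'BTC' rather than a list, the loop iterates over its characters, so colnames becomes ['B', 'T', 'C'] while every request is for 'BTC-EUR', which would explain the three identical columns. A minimal sketch of the corrected logic follows; `crypto_symbols` and `combine_metric` are hypothetical helper names, and the download call is only shown in a comment so the helpers can be checked offline.

```python
import pandas as pd


def crypto_symbols(cryptos, currency):
    """Map each crypto code to its Yahoo-style pair symbol, e.g. 'BTC' -> 'BTC-EUR'."""
    if isinstance(cryptos, str):
        # A bare string would be iterated character by character ('B', 'T', 'C')
        cryptos = [cryptos]
    # Build each symbol from the loop variable, not from the whole list
    return {code: f"{code}-{currency}" for code in cryptos}


def combine_metric(frames, metric="Close"):
    """Put one `metric` column per crypto side by side, aligned on the date index."""
    return pd.concat({code: df[metric] for code, df in frames.items()}, axis=1)


# With pandas_datareader this could be used as (not run here):
# frames = {code: web.DataReader(symbol, 'yahoo', start, end)
#           for code, symbol in crypto_symbols(crypto, currency).items()}
# combined = combine_metric(frames, metric)
```

Using pd.concat on a dict of columns also removes the need for the `first` flag and the manually maintained colnames list, because the dict keys become the column names.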
I'm trying to get data for multiple stocks from Yahoo Finance and write it to Excel.
The problem is that at the moment I have to hardcode the stocks in question. Currently I would like to download the information for all 25 stocks in the C25 index (^OMXC25), or potentially other indexes. So I would like to know how I can access the components list, retrieve it, and then download each of the stocks. The code I currently use to get each one is as follows:
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
download_source = (r'C:\Users\SKlin\Downloads\OMXC25.xlsx')
start = dt.datetime(2010,1,1)
end = dt.datetime.today()
writer = pd.ExcelWriter(download_source, engine='xlsxwriter')
# GN Store Nord
dfGN = pdr.get_data_yahoo('GN.CO',start,end)
dfGN.to_excel(writer, sheet_name='GN.CO')
# Vestas Wind Systems
dfVestas = pdr.get_data_yahoo('VWS.CO',start,end)
dfVestas.to_excel(writer, sheet_name='VWS.CO')
writer.close()  # writer.save() was deprecated in pandas 1.5 and removed in 2.0
This saves the data just fine, and with 25 stocks it's doable, but it seems tedious to do for an index with 500 stocks. Please help.
Use Beautiful Soup to scrape the list of ticker names from Wikipedia:
https://en.wikipedia.org/wiki/OMX_Copenhagen_25
Then just iterate through them.
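The scraping step can be sketched without a third-party dependency by swapping in Python's built-in html.parser for Beautiful Soup; the idea (walk the table rows and pick one column) is the same. The column header to look for ('Ticker' in the usage comment) and the need to append '.CO' (as in the question's 'GN.CO') are assumptions; check the actual table on the Wikipedia page first. `TableParser` and `column_values` are hypothetical names, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def column_values(html, column, suffix=""):
    """Return the cells under `column` in the first table whose header row contains it."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    for table in parser.tables:
        if table and column in table[0]:
            i = table[0].index(column)
            return [row[i] + suffix for row in table[1:] if len(row) > i]
    return []


# Fetching the real page (not run here); Wikipedia may refuse requests without a User-Agent:
# from urllib.request import Request, urlopen
# req = Request("https://en.wikipedia.org/wiki/OMX_Copenhagen_25", headers={"User-Agent": "my-script"})
# html = urlopen(req).read().decode("utf-8")
# stocks = column_values(html, "Ticker", suffix=".CO")
```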
If you can get a list of all symbols:
stocks = ['GN.CO', 'VWS.CO']
for stock in stocks:
    df = pdr.get_data_yahoo(stock, start, end)
    df.to_excel(writer, sheet_name=stock)
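With hundreds of symbols, two practical issues come up: a single failing symbol stops the whole loop, and hundreds of sheets make the workbook unwieldy. A minimal sketch that records failures and keeps going, and stacks everything into one DataFrame keyed by symbol. The download function is passed in (for example `lambda s: pdr.get_data_yahoo(s, start, end)`) so the logic can be checked offline; `download_all` is a hypothetical name.

```python
import pandas as pd


def download_all(symbols, fetch):
    """Fetch each symbol and stack the results into one DataFrame with a 'Symbol' index level.

    Returns (combined, failed), where failed lists (symbol, error message) pairs.
    """
    frames, failed = {}, []
    for symbol in symbols:
        try:
            frames[symbol] = fetch(symbol)
        except Exception as exc:  # e.g. no data returned for a delisted symbol
            failed.append((symbol, str(exc)))
    if not frames:
        return pd.DataFrame(), failed
    combined = pd.concat(frames)  # the dict keys become the outer index level
    combined.index = combined.index.set_names("Symbol", level=0)
    return combined, failed


# Usage with the question's setup (not run here):
# combined, failed = download_all(stocks, lambda s: pdr.get_data_yahoo(s, start, end))
# combined.to_excel(download_source)  # one sheet, one row per (Symbol, Date)
```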
I can get stock info for 4 tickers from Alpha Vantage, but the remaining DataFrames do not get the stock info I ask for. As a result, my concatenated df gets interpreted as NoneType (because the first 4 dfs are formatted differently from the last 2). That is not my main problem; the fact that only 4 of my requests succeed is. If I can fix that, the resulting concatenated df will be intact.
My code
import pandas as pd
import time
def alvan_csv(stocklist):
    api_key = 'demo'  # For use with Alpha Vantage stock-info retrieval.
    frames = []
    for ticker in stocklist:
        # append &outputsize=full to the URL for the full price history
        url = ('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED'
               f'&datatype=csv&symbol={ticker}&apikey={api_key}')
        df = pd.read_csv(url)
        df['ticker'] = ticker
        frames.append(df)
        time.sleep(20.3)  # pause between requests to respect the rate limit
    # concatenate all the dfs
    df = pd.concat(frames)
    print('\ndata before col reshaping\n%s' % df)
    df = df.rename(columns={'adjusted_close': 'adj_close', 'timestamp': 'date'})
    df = df[['date', 'ticker', 'adj_close', 'volume', 'dividend_amount', 'split_coefficient', 'open', 'high', 'low']]
    # sort_values(..., inplace=True) returns None; assign the sorted copy instead
    df = df.sort_values(['ticker', 'date'])
    print('\ndata after col reshaping\n%s' % df)
    return df
if __name__ == '__main__':
    stocklist = ['vws.co','nflx','mmm','abt','msft','aapl']
    df = alvan_csv(stocklist)
if __name__ == '__main__':
stocklist = ['vws.co','nflx','mmm','abt','msft','aapl']
df = alvan_csv(stocklist)
NB: To use the Alpha Vantage API you need a free API key, which you can obtain here: https://www.alphavantage.co/support/#api-key
Replace the demo API key with your own API key to make this code work.
Any ideas as to get this to work?
Apparently Alpha Vantage has a fairly low fair-usage allowance, measured in number of queries per minute. So in effect only the first 4 stocks are allowed at full speed; the rest need a pause before downloading so as not to violate the fair-usage policy.
I have now introduced a pause between my stock queries. At the moment I get approximately 55% of my stocks if I pause for 10 seconds between calls, and 100% if I pause for 15 seconds.
I will test exactly how low the pause can be set while still letting 100% of the stocks through.
I must say, compared to the super-high-speed train we had at finance.yahoo.com, this feels like a steam train. Really, really slow downloads: getting my 500 tickers takes 2½ hours. But beggars can't be choosers; this is a free service and I will manage with it.
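Rather than a fixed pause before every request, a rolling-window throttle only waits once the per-minute budget is used up, so the first few requests go through immediately. A minimal sketch: the limit in the usage comment (5 calls per 60 seconds) is an assumption to be adjusted to whatever Alpha Vantage currently allows for your key, possibly with a small safety margin. `Throttle` is a hypothetical name; the clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import collections
import time


class Throttle:
    """Block so that at most `max_calls` calls start within any `period`-second window."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = collections.deque()  # start times of recent calls

    def wait(self):
        while True:
            now = self.clock()
            # Forget calls that have dropped out of the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest call in the window expires, then re-check
            self.sleep(self.period - (now - self.calls[0]))


# Usage inside alvan_csv, replacing time.sleep(20.3) (the limit is an assumption):
# throttle = Throttle(max_calls=5, period=60.0)
# for ticker in stocklist:
#     throttle.wait()
#     df = pd.read_csv(url)  # same request as before
```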
I have this code to download data from Yahoo:
import datetime as dt
import pandas as pd
import pandas_datareader as pdr
# gets data from yahoo finance (newmerge is defined elsewhere in the script)
stocks = list(newmerge.index)
start = dt.datetime(2012,1,1)
end = dt.datetime.today()
yahoodata = pdr.get_data_yahoo(stocks,start,end)
cleanData = yahoodata.loc['Adj Close']
dataFrame = pd.DataFrame(cleanData, columns=stocks)
It works fine, but I noticed a problem recently: it doesn't download data for the stocks "BRK.B" and "BR.B".
I have a list of all the stocks called "stocks", and here's what I've tried, but it still doesn't show data for stocks with a dot in them:
def stocksdot(stocks):
    stocks_dash = str(stocks).replace('.','-')
    stockslist = stocks_dash.split(',')
    return stockslist
stocksdot(stocks)
My expected output would be to download all stocks, even those with a dot in them. Any ideas how to work around this?
The problem is that Yahoo Finance doesn't use the "." notation to track shares of different classes. So "BRK.B" and "BR.B" are actually "BRKB" and "BRB".
Using the yahoo_finance Python package, I made a little script to test whether Yahoo Finance could find information about a stock with the ticker "BRK.B" or "BR.B".
from yahoo_finance import Share
stock = Share('BRK.B')
print(stock.get_price())
The result is:
>>>> None
In stock tickers, a dot is used as shorthand for a type or class of a specific stock.
To work around this, it looks like you can remove the ".". For example, when I use "BRKB" instead of "BRK.B" I get the result:
>>>> 173.05
Which is the current price of Berkshire Hathaway class B stock.
To replace the "." programmatically, use Python's str.replace() method:
# Replace every "." with "" and store the results back in the list
stocks = [stock.replace(".", "") for stock in stocks]
Comment: "BRK.B" and "BR.B" are now "BRK-B" and "BR-B" on Yahoo Finance.
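A sketch of the dash conversion for a whole list. It also shows why stocksdot in the question had no effect: str(stocks) turns the list into its printed form (brackets, quotes and all) before splitting, and the function's return value was never assigned. One caveat: a blanket replace would also break symbols that use a dot for an exchange suffix, such as 'GN.CO' or 'VELA.L' elsewhere in this thread, so the sketch leaves a configurable set of suffixes alone. The suffix list and the name `to_yahoo_symbol` are illustrative assumptions, and Yahoo's convention may change again.

```python
def to_yahoo_symbol(symbol, keep_suffixes=(".L", ".CO")):
    """Convert a share-class dot to a dash ('BRK.B' -> 'BRK-B'), keeping exchange suffixes."""
    for suffix in keep_suffixes:
        if symbol.endswith(suffix):
            base = symbol[: -len(suffix)]
            return base.replace(".", "-") + suffix
    return symbol.replace(".", "-")


stocks = ["AAPL", "BRK.B", "BR.B", "GN.CO", "VELA.L"]
# Assign the result: building a new list does not change `stocks` in place
yahoo_symbols = [to_yahoo_symbol(s) for s in stocks]
print(yahoo_symbols)
```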