The code I'm running gives results that are space-delimited. This creates a problem with my sector column when the sector is Communication Services: the split creates one column for Communication and another for Services, where I need a single column saying Communication Services. I have tried to concatenate the two columns into one, but I'm getting attribute and str errors and don't know how to achieve this. Can anyone show how this can be done? Thanks
Code
import yfinance as yf
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

list_of_futures = []

def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    s = f"{ticker} {info['currentPrice']} {info['marketCap']} {info['sector']}"
    list_of_futures.append(s)

ticker_list = ['AAPL', 'ORCL', 'GTBIF', 'META']
with ThreadPoolExecutor() as executor:
    executor.map(get_stats, ticker_list)

(
    pd.DataFrame(list_of_futures)
    [0].str.split(expand=True)
    .rename(columns={0: "Ticker", 1: "Price", 2: "Market Cap", 3: "Sector", 4: "Sector1"})
    .to_excel("yahoo_futures.xlsx", index=False)
)
Current Results
Desired Results
Let us reformulate the get_stats function to return a dictionary instead of a string. This way you avoid the unnecessary step of splitting strings to create the dataframe.
def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    cols = ['currentPrice', 'marketCap', 'sector']
    return {'ticker': ticker, **{c: info[c] for c in cols}}

tickers = ['AAPL', 'ORCL', 'GTBIF', 'META']
with ThreadPoolExecutor() as executor:
    result_iter = executor.map(get_stats, tickers)

df = pd.DataFrame(result_iter)
Result
  ticker  currentPrice      marketCap                  sector
0   AAPL        148.11  2356148699136              Technology
1   ORCL         82.72   223027183616              Technology
2  GTBIF         13.25     3190864896              Healthcare
3   META        111.41   295409188864  Communication Services
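If you would rather keep the original string-based approach, one option is to cap the number of splits so that everything after the third space stays in one piece. This is only a minimal sketch: the hard-coded rows (copied from the result above) stand in for the live yfinance data, and it assumes the sector is always the last field.

```python
import pandas as pd

# Hard-coded stand-ins for the strings built in get_stats (no network call here).
rows = [
    "AAPL 148.11 2356148699136 Technology",
    "META 111.41 295409188864 Communication Services",
]

df = (
    pd.DataFrame(rows)[0]
    # n=3 allows at most three splits, so the remainder of the line
    # (the full sector name) ends up in the fourth column.
    .str.split(n=3, expand=True)
    .rename(columns={0: "Ticker", 1: "Price", 2: "Market Cap", 3: "Sector"})
)
print(df)
```

This works only because the sector comes last in the string; returning a dictionary avoids the problem regardless of field order, and it also keeps the numeric columns as numbers rather than strings.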
Related
I am trying to scrape the yearly total revenues from Yahoo Finance using pandas and yahoo_fin with the following code:
from yahoo_fin import stock_info as si
import yfinance as yf
import pandas as pd

tickers = ('AAPL', 'MSFT', 'IBM')
income_statements_yearly = []  # All numbers in thousands

for ticker in tickers:
    income_statement = si.get_income_statement(ticker, yearly=True)
    years = income_statement.columns
    income_statement.insert(loc=0, column='Ticker', value=ticker)
    for i in range(4):
        #print(years[i].year)
        income_statement.rename(columns={years[i]: years[i].year}, inplace=True)
    income_statements_yearly.append(income_statement)

income_statements_yearly = pd.concat(income_statements_yearly)
income_statements_yearly
The result I get looks like:
On that basis I would like to create another dataframe, revenues, that keeps only the totalRevenue row instead of all rows. At the same time I would like to rename the columns 2021, 2020, 2019, 2018 to revenues_2021, revenues_2020, revenues_2019, revenues_2018.
The result should look like:
df = pd.DataFrame({'Ticker': ['AAPL', 'MSFT', 'IBM'],
                   'revenues_2021': [365817000000, 168088000000, 57351000000],
                   'revenues_2020': [274515000000, 143015000000, 55179000000],
                   'revenues_2019': [260174000000, 125843000000, 57714000000],
                   'revenues_2018': [265595000000, 110360000000, 79591000000]})
How can I solve this in an easy and fast way?
Thanks in advance for your help.
CODE
revenues = income_statements_yearly.loc["totalRevenue"].reset_index(drop=True)
revenues.columns = ["Ticker"] + ["revenues_" + str(col) for col in revenues.columns if col != "Ticker"]
OUTPUT
  Ticker  revenues_2021  revenues_2020  revenues_2019  revenues_2018
0   AAPL   365817000000   274515000000   260174000000   265595000000
1   MSFT   168088000000   143015000000   125843000000   110360000000
2    IBM    57351000000    55179000000    57714000000    79591000000
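To try the row selection and renaming without calling Yahoo, here is a rough, self-contained sketch. The helper fake_statement is hypothetical and only imitates the shape of what si.get_income_statement returns in the question (years as columns, statement items as the index); the revenue figures are the ones from the desired result, and the placeholder row of zeros is made up. It uses add_prefix as an alternative to building the column list by hand.

```python
import pandas as pd

YEARS = [2021, 2020, 2019, 2018]

def fake_statement(ticker, revenues):
    """Hypothetical stand-in for si.get_income_statement(ticker, yearly=True)."""
    df = pd.DataFrame(
        [revenues, [0, 0, 0, 0]],                  # second row is a made-up placeholder
        index=["totalRevenue", "placeholderRow"],
        columns=YEARS,
    )
    df.insert(loc=0, column="Ticker", value=ticker)
    return df

combined = pd.concat([
    fake_statement("AAPL", [365817000000, 274515000000, 260174000000, 265595000000]),
    fake_statement("MSFT", [168088000000, 143015000000, 125843000000, 110360000000]),
    fake_statement("IBM", [57351000000, 55179000000, 57714000000, 79591000000]),
])

revenues = (
    combined.loc["totalRevenue"]   # one totalRevenue row per ticker
    .set_index("Ticker")           # move Ticker out of the way of the prefix
    .add_prefix("revenues_")       # 2021 -> revenues_2021, and so on
    .reset_index()                 # Ticker becomes a normal column again
)
print(revenues)
```

Moving Ticker into the index before add_prefix is just one way to keep it from being renamed; the list comprehension in the answer above achieves the same thing.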
I am trying to get the current price and market cap of all of the tickers in the S&P500 and the way I am currently doing it is very slow, so I was wondering if there was anything I could do to improve it, or any other methods.
Here is my current method, simply to print the name, market cap and current price:
import yfinance as yf

# I am using a csv file with a list of all the tickers, which I use to create a pandas dataframe
# and form a space-separated string of all of the tickers called all_symbols.
# I have simplified the pandas dataframe to a list for the purpose of this question.
ticker_list = ["A", "AL", "AAP", "AAPL", ... "ZBRA", "ZION", "ZTS"]
all_symbols = " ".join(ticker_list)
tickers = yf.Tickers(all_symbols)

for ticker in ticker_list:
    price = tickers.tickers[ticker].info["currentPrice"]
    market_cap = tickers.tickers[ticker].info["marketCap"]
    print(ticker, market_cap, price)
This method is currently very slow and the information is received one ticker at a time, so is there any way to make it faster and/or get the ticker info as a batch?
I have also tried using the yf.download method to download information on multiple tickers at once. This was faster, but I could not get the information I wanted from it. Is it possible to get the market cap and current price using the yf.download method?
Although there have been similar questions to this, they all seem to use the same general idea as mine, which takes a long time when the number of tickers is high. I have yet to find any solution that is faster than my current one, so any suggestions are appreciated, even solutions not using yfinance, as long as they get real-time data without a massive delay.
There is another library you can try called yahooquery. In my trial the time reduced from 34 seconds to 0.4 seconds.
from yahooquery import Ticker

ticker_list = ["A", "AL", "AAP", "AAPL", "ZBRA", "ZION", "ZTS"]
all_symbols = " ".join(ticker_list)
myInfo = Ticker(all_symbols)
myDict = myInfo.price

for ticker in ticker_list:
    ticker = str(ticker)
    longName = myDict[ticker]['longName']
    market_cap = myDict[ticker]['marketCap']
    price = myDict[ticker]['regularMarketPrice']
    print(ticker, longName, market_cap, price)
There is a lot of other information in the myDict dictionary; check it out.
You may find that getting the values for a single ticker in a discrete thread will give you better overall performance. Here's an example:
import yfinance as yf
from concurrent.futures import ThreadPoolExecutor

def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    print(f"{ticker} {info['currentPrice']} {info['marketCap']}")

ticker_list = ['AAPL', 'ORCL', 'PREM.L', 'UKOG.L', 'KOD.L', 'TOM.L', 'VELA.L', 'MSFT', 'AMZN', 'GOOG']
with ThreadPoolExecutor() as executor:
    executor.map(get_stats, ticker_list)
Output:
VELA.L 0.035 6004320
UKOG.L 0.1139 18496450
PREM.L 0.461 89516976
ORCL 76.755 204970377216
MSFT 294.8669 2210578825216
TOM.L 0.604 10558403
KOD.L 0.3 47496900
AMZN 3152.02 1603886514176
AAPL 171.425 2797553057792
GOOG 2698.05 1784584732672
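The lines above come out in completion order because each thread prints on its own. If the order matters, one option is to return the values and collect them from executor.map, which yields results in input order. The sketch below shows this with a fake lookup: fake_get_stats and the delays are hypothetical stand-ins for the yfinance call, not real timings.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical delays standing in for network latency (not real measurements).
FAKE_DELAYS = {"AAPL": 0.03, "ORCL": 0.01, "MSFT": 0.02}

def fake_get_stats(ticker):
    """Stand-in for the yfinance lookup: sleeps, then returns a dict."""
    time.sleep(FAKE_DELAYS[ticker])
    return {"ticker": ticker, "delay": FAKE_DELAYS[ticker]}

ticker_list = ["AAPL", "ORCL", "MSFT"]
with ThreadPoolExecutor() as executor:
    # map() yields results in the order of ticker_list,
    # regardless of which thread finishes first.
    results = list(executor.map(fake_get_stats, ticker_list))

for row in results:
    print(row["ticker"], row["delay"])
```

Consuming the iterator also surfaces any exception raised inside a worker; when the result of executor.map is never read, such errors pass silently.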
I am trying to use Yfinance to download into a single pandas dataframe some info such as industry, beta and market cap (columns) for a number of S&P stocks (rows). In the simplified example below it's the industry and beta of 3 stocks.
How can I automate the code so that I don't have to use info.get() each time ? I plan on downloading about 10 different parameters besides the industry and beta...
What is the best way to turn the current output (a list) into the pandas dataframe I outlined above? Thanks!
import yfinance as yf

stocks = ['JNJ', 'MSFT', 'GS']
df = []
for stock in stocks:
    info = yf.Ticker(stock).info
    industry = info.get('industry')
    beta = info.get('beta')
    df.extend((stock, industry, beta))
print(df)
===== OUTPUT ====
['JNJ', 'Drug Manufacturers—General', 0.711267, 'MSFT', 'Software—Infrastructure', 0.812567, 'GS', 'Capital Markets', 1.484832]
The return value of '.info' is a dict, so you can extract the values you need once you have it, then collect them into a data frame.
import yfinance as yf
import pandas as pd

stocks = ['JNJ', 'MSFT', 'GS']
df = pd.DataFrame()
for stock in stocks:
    info = yf.Ticker(stock).info
    industry = info['industry']
    beta = info['beta']
    marketcap = info['marketCap']
    # Note: DataFrame.append was removed in pandas 2.0; this line needs pandas < 2.0.
    df = df.append({'Stock': stock, 'Industry': industry, 'Beta': beta, 'marketcap': marketcap}, ignore_index=True)
print(df)
       Beta                    Industry Stock     marketcap
0  0.711267  Drug Manufacturers—General   JNJ  4.103370e+11
1  0.812567     Software—Infrastructure  MSFT  1.746778e+12
2  1.484832             Capital Markets    GS  1.132026e+11
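To control the column order, create the empty DataFrame with the columns in the order you want:
df = pd.DataFrame(columns=['Stock','Industry','Beta','marketcap'])
Or reorder the columns at the end by selecting them in the desired order (assigning to df.columns would only relabel the columns, not reorder them):
df = df[['Stock','Industry','Beta','marketcap']]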
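For the first part of the question (not repeating info.get() for every field), one possible approach is to keep the wanted keys in a list and build one dict per stock, then create the DataFrame once from the list of dicts. The sketch below uses a hard-coded sample_info dict as a stand-in for yf.Ticker(stock).info so that it runs offline; the "DEMO" ticker and its contents are hypothetical and only show what happens when a key is missing.

```python
import pandas as pd

# Stand-in for yf.Ticker(stock).info; "DEMO" is a hypothetical ticker
# that lacks some keys, to show how .get() handles missing data.
sample_info = {
    "GS": {"industry": "Capital Markets", "beta": 1.484832, "marketCap": 1.132026e+11},
    "DEMO": {"industry": "Capital Markets"},
}

fields = ["industry", "beta", "marketCap"]   # extend this list with more keys

rows = []
for stock, info in sample_info.items():
    # One dict per stock; .get() returns None instead of raising on a missing key.
    rows.append({"Stock": stock, **{f: info.get(f) for f in fields}})

# Passing columns= fixes the column order explicitly.
df = pd.DataFrame(rows, columns=["Stock", *fields])
print(df)
```

Building the frame once from a list avoids growing a DataFrame row by row, and it does not depend on DataFrame.append.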
I have created a DataFrame from a dictionary like this:
import pandas as pd
data = {'Name': 'Ford Motor', 'AssetType': 'Common Stock', 'Exchange': 'NYSE'}
records = []
statement = {}
for key, value in data.items():
    statement = {}
    statement[key] = value
    records.append(statement)
df = pd.DataFrame(records)
If I do it this way, the output looks like this:
         Name     AssetType Exchange
0  Ford Motor           NaN      NaN
1         NaN  Common Stock      NaN
2         NaN           NaN     NYSE
I want all the values on the first row, so that the result looks like this:
         Name     AssetType Exchange
0  Ford Motor  Common Stock     NYSE
Just put data inside a list [] when creating dataframe:
import pandas as pd
data = {'Name': 'Ford Motor', 'AssetType': 'Common Stock', 'Exchange': 'NYSE'}
df = pd.DataFrame([data])
print(df)
Prints:
         Name     AssetType Exchange
0  Ford Motor  Common Stock     NYSE
There are a lot of ways you might want to turn data (dict, list, nested list, etc) into a dataframe. Pandas also includes many creation methods, some of which will overlap, making it hard to remember how to create dfs from data. Here are a few ways you could do this for your data:
df = pd.DataFrame([data])
df = pd.Series(data).to_frame().T
pd.DataFrame.from_dict(data, orient="index").T
pd.DataFrame.from_records(data, index=[0])
imo, from_dict is the least intuitive (I never get the arguments right on the first try). I find focusing on one construction method to be more memorable than using a different one each time; I use pd.DataFrame(...) and from_records(...) the most.
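To make the row-versus-column behaviour concrete, here is a small sketch comparing the question's loop with some of the constructions above. It relies only on the data dict from the question; the variable names are illustrative.

```python
import pandas as pd

data = {'Name': 'Ford Motor', 'AssetType': 'Common Stock', 'Exchange': 'NYSE'}

# A list of dicts: each dict becomes one row, and its keys become columns.
one_row = pd.DataFrame([data])

# The loop from the question builds three single-key dicts, hence three rows
# with NaN wherever a dict has no value for a column.
three_rows = pd.DataFrame([{key: value} for key, value in data.items()])

# Two of the alternative constructions listed above.
via_series = pd.Series(data).to_frame().T
via_from_dict = pd.DataFrame.from_dict(data, orient="index").T

print(one_row.shape, three_rows.shape)
```

The rule of thumb is that pandas treats every element of the outer list as a row, so a single dict wrapped in a list gives a single row.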
I am trying to download key financial ratios from Yahoo Finance via the FundamentalAnalysis library. It's pretty easy for a single ticker. I have a df with tickers and names:
  Ticker                      Company
0      A    Agilent Technologies Inc.
1     AA            ALCOA CORPORATION
2    AAC             AAC Holdings Inc
3    AAL  AMERICAN AIRLINES GROUP INC
4   AAME      Atlantic American Corp.
I then tried to use a for-loop to download the ratios for every ticker with fa.ratios().
for i in range (3):
    i = 0
    i = i + 1
    Ratios = fa.ratios(tickers["Ticker"][i])
So basically it should download all ratios for the first ticker, then the second, and so on. I also tried converting the df into a list, but that didn't work either. If I put them in a list manually like:
Symbol = ["TSLA" , "AAPL" , "MSFT"]
it works somehow. But as I want to work with Data from 1000+ Tickers I don't want to type all of them manually into a list.
Maybe this question has already been answered elsewhere, in that case sorry, but I've not been able to find a thread that helps me. Any ideas?
You can get symbols using
symbols = df['Ticker'].to_list()
and then you could use for-loop without range()
ratios = dict()
for s in symbols:
    ratios[s] = fa.ratios(s)
print(ratios)
Because some symbols may not return ratios, you should wrap the call in try/except.
Minimal working example. I use io.StringIO only to simulate a file.
import FundamentalAnalysis as fa
import pandas as pd
import io

# Columns are separated by two or more spaces so that company names
# containing single spaces stay in one column.
text = '''Ticker  Company
A  Agilent Technologies Inc.
AA  ALCOA CORPORATION
AAC  AAC Holdings Inc
AAL  AMERICAN AIRLINES GROUP INC
AAME  Atlantic American Corp.'''

# Raw string for the regex; a regex separator needs the python parsing engine.
df = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')
symbols = df['Ticker'].to_list()
#symbols = ["TSLA" , "AAPL" , "MSFT"]
print(symbols)

ratios = dict()
for s in symbols:
    try:
        ratios[s] = fa.ratios(s)
    except Exception as ex:
        print(s, ex)

for s, ratio in ratios.items():
    print(s, ratio)
EDIT: It seems fa.ratios() returns a DataFrame, so if you keep the results in a list you can concatenate them all into one DataFrame:
ratios = list()  # list instead of dictionary
for s in symbols:
    try:
        ratios.append(fa.ratios(s))  # append to list
    except Exception as ex:
        print(s, ex)

df = pd.concat(ratios, axis=1)  # convert list of DataFrames to one DataFrame
print(df.columns)
print(df)
Doc: pandas.concat()
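With axis=1 the columns from different tickers end up side by side, and it can be hard to tell which ticker each column belongs to. One option is the keys argument of pandas.concat, which adds the symbol as an outer column level. The sketch below uses small made-up frames in place of the fa.ratios() results; the ratio names, the column label and the numbers are all hypothetical.

```python
import pandas as pd

symbols = ["TSLA", "AAPL"]

# Made-up ratio tables standing in for fa.ratios(symbol) results.
frames = [
    pd.DataFrame({"2020": [1.0, 2.0]}, index=["ratioA", "ratioB"]),
    pd.DataFrame({"2020": [3.0, 4.0]}, index=["ratioA", "ratioB"]),
]

# keys= labels each frame's columns with its symbol (outer column level).
combined = pd.concat(frames, axis=1, keys=symbols)

print(combined.columns.tolist())
print(combined["AAPL"])   # select all columns for one ticker
```

When some symbols fail inside the try/except, the list passed as keys has to contain only the symbols whose frames were actually appended, otherwise the labels and frames will not line up.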