I am trying to scrape data from Yahoo Finance, but I can only get data from certain tables on the statistics page at https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL. I can get data from the top table and the left tables, but I can't figure out why the following program won't scrape the right-hand tables, which hold values like Beta (5Y Monthly), 52 Week Change, Last Split Factor and Last Split Date:
stockStatDict = {}
stockSymbol = 'AAPL'
URL = 'https://finance.yahoo.com/quote/'+ stockSymbol + '/key-statistics?p=' + stockSymbol
page = requests.get(URL, headers=headers, timeout=5)
soup = BeautifulSoup(page.content, 'html.parser')
# Find all tables on the page
stock_data = soup.find_all('table')
# stock_data will contain multiple tables, next we examine each table one by one
for table in stock_data:
    # Scrape all table rows into variable trs
    trs = table.find_all('tr')
    for tr in trs:
        print('tr: ', tr)
        print()
        # Scrape all table data tags into variable tds
        tds = tr.find_all('td')
        print('tds: ', tds)
        print()
        print()
        if len(tds) > 0:
            # Index 0 of tds will contain the measurement
            # Index 1 of tds will contain the value
            # Insert measurement and value into stockDict
            stockStatDict[tds[0].get_text()] = [tds[1].get_text()]
stock_stat_df = pd.DataFrame(data=stockStatDict)
print(stock_stat_df.head())
print(stock_stat_df.info())
Any idea why this code isn't retrieving those fields and values?
To get the correct response from the Yahoo server, set the User-Agent HTTP header:
import requests
from bs4 import BeautifulSoup
url = "https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
for t in soup.select("table"):
    for tr in t.select("tr:has(td)"):
        # drop footnote markers so they don't end up in the label text
        for sup in tr.select("sup"):
            sup.extract()
        tds = [td.get_text(strip=True) for td in tr.select("td")]
        if len(tds) == 2:
            print("{:<50} {}".format(*tds))
Prints:
Market Cap (intraday) 2.34T
Enterprise Value 2.36T
Trailing P/E 31.46
Forward P/E 26.16
PEG Ratio (5 yr expected) 1.51
Price/Sales(ttm) 7.18
Price/Book(mrq) 33.76
Enterprise Value/Revenue 7.24
Enterprise Value/EBITDA 23.60
Beta (5Y Monthly) 1.21
52-Week Change 50.22%
S&P500 52-Week Change 38.38%
52 Week High 145.09
52 Week Low 89.14
50-Day Moving Average 129.28
200-Day Moving Average 129.32
Avg Vol (3 month) 82.16M
Avg Vol (10 day) 64.25M
Shares Outstanding 16.69B
Implied Shares Outstanding N/A
Float 16.67B
% Held by Insiders 0.07%
% Held by Institutions 58.54%
Shares Short (Jun 14, 2021) 108.94M
Short Ratio (Jun 14, 2021) 1.52
Short % of Float (Jun 14, 2021) 0.65%
Short % of Shares Outstanding (Jun 14, 2021) 0.65%
Shares Short (prior month May 13, 2021) 94.75M
Forward Annual Dividend Rate 0.88
Forward Annual Dividend Yield 0.64%
Trailing Annual Dividend Rate 0.82
Trailing Annual Dividend Yield 0.60%
5 Year Average Dividend Yield 1.32
Payout Ratio 18.34%
Dividend Date May 12, 2021
Ex-Dividend Date May 06, 2021
Last Split Factor 4:1
Last Split Date Aug 30, 2020
Fiscal Year Ends Sep 25, 2020
Most Recent Quarter(mrq) Mar 26, 2021
Profit Margin 23.45%
Operating Margin(ttm) 27.32%
Return on Assets(ttm) 16.90%
Return on Equity(ttm) 103.40%
Revenue(ttm) 325.41B
Revenue Per Share(ttm) 19.14
Quarterly Revenue Growth(yoy) 53.60%
Gross Profit(ttm) 104.96B
EBITDA 99.82B
Net Income Avi to Common(ttm) 76.31B
Diluted EPS(ttm) 4.45
Quarterly Earnings Growth(yoy) 110.10%
Total Cash(mrq) 69.83B
Total Cash Per Share(mrq) 4.18
Total Debt(mrq) 134.74B
Total Debt/Equity(mrq) 194.78
Current Ratio(mrq) 1.14
Book Value Per Share(mrq) 4.15
Operating Cash Flow(ttm) 99.59B
Levered Free Cash Flow(ttm) 80.12B
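The User-Agent idea is not tied to requests. As a rough sketch (the helper name build_request is made up for illustration, and nothing is actually sent here), the same header can be attached to a standard-library request object, so that one function builds the request for any symbol:

```python
from urllib.request import Request

# Same browser-like User-Agent string as in the answer above.
USER_AGENT = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) "
              "Gecko/20100101 Firefox/89.0")


def build_request(symbol):
    """Hypothetical helper: build (but do not send) a request for one symbol."""
    url = f"https://finance.yahoo.com/quote/{symbol}/key-statistics?p={symbol}"
    return Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("AAPL")
print(req.full_url)
# urllib stores header names capitalized as "User-agent"
print(req.get_header("User-agent"))
```

Whether Yahoo answers a urllib request the same way it answers requests is not something this sketch checks; it only shows where the header goes.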
Here is part of the data of scaffold_table
import pandas as pd
scaffold_table = pd.DataFrame({
'Position':[2000]*5,
'Company':['Amazon', 'Amazon', 'Alphabet', 'Amazon', 'Alphabet'],
'Date':['2020-05-26','2020-05-27','2020-05-27','2020-05-28','2020-05-28'],
'Ticker':['AMZN','AMZN','GOOG','AMZN','GOOG'],
'Open':[2458.,2404.9899,1417.25,2384.330078,1396.859985],
'Volume':[3568200,5056900,1685800,3190200,1692200],
'Daily Return':[-0.006164,-0.004736,0.000579,-0.003854,-0.000783],
'Daily PnL':[-12.327054,-9.472236,1.157283,-7.708126,-1.565741],
'Cumulative PnL/Ticker':[-12.327054,-21.799290,1.157283,-29.507417,-0.408459]})
I would like to create a summary table that returns the overall yield per ticker. The overall yield should be calculated as the total PnL per ticker divided by the last date's position per ticker
# Create a summary table of your average daily PnL, total PnL, and overall yield per ticker
summary_table = pd.DataFrame(scaffold_table.groupby(['Date','Ticker'])['Daily PnL'].mean())
position_ticker = pd.DataFrame(scaffold_table.groupby(['Date','Ticker'])['Position'].sum())
# the total PnL is the sum of PnL per Ticker after two years period
totals = summary_table.droplevel('Date').groupby('Ticker').sum().rename(columns={'Daily PnL':'total PnL'})
summary_table = summary_table.join(totals, on='Ticker')
summary_table = summary_table.join(position_ticker, on = ['Date','Ticker'], how='inner')
summary_table['Yield'] = summary_table.loc['2022-04-29']['total PnL']/summary_table.loc['2022-04-29']['Position']
summary_table
But the yield shows NaN. Could anyone take a look at my code?
I used ['2022-04-29'] because it is the last date, but I expect there is a way to get the last date without typing it in explicitly.
I solved the problem with the following code
# we want the overall yield per ticker, so total PnL/Position on the last date
summary_table['Yield'] = summary_table['total PnL']/summary_table.loc['2022-04-29']['Position']
This does not select a date for total PnL, since that column is the sum per ticker regardless of date.
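On the side question of not hard-coding the last date: one possible sketch, assuming the dates are ISO-formatted strings (so sorting them as text is chronological), is to sort by Date, group by Ticker, and take the last Position in each group. The variable names below are only for illustration, and the frame is the sample data from the question trimmed to the columns that matter here.

```python
import pandas as pd

# Sample rows from the question, reduced to the columns used below.
scaffold_table = pd.DataFrame({
    'Position': [2000] * 5,
    'Date': ['2020-05-26', '2020-05-27', '2020-05-27', '2020-05-28', '2020-05-28'],
    'Ticker': ['AMZN', 'AMZN', 'GOOG', 'AMZN', 'GOOG'],
    'Daily PnL': [-12.327054, -9.472236, 1.157283, -7.708126, -1.565741],
})

# Sort first so that "last" really means "most recent date" per ticker.
by_ticker = scaffold_table.sort_values('Date').groupby('Ticker')

total_pnl = by_ticker['Daily PnL'].sum()       # indexed by Ticker
last_position = by_ticker['Position'].last()   # indexed by Ticker

# Both Series share the Ticker index, so the division lines up per ticker.
overall_yield = total_pnl / last_position
print(overall_yield)
```

Because both operands are indexed by Ticker only, there is no Date level left to misalign on.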
I note the comment in your code saying: "Create a summary table of your average daily PnL, total PnL, and overall yield per ticker".
If we start from this, here are a few observations:
the average daily PnL per ticker should just be the mean of Daily PnL for each ticker
the total PnL per ticker is already listed in the Cumulative PnL/Ticker column, so if we use groupby on Ticker and get the value in the Cumulative PnL/Ticker column for the most recent date (namely, for the last() row in the groupby assuming the df is sorted by date), we don't have to calculate it
for the overall yield per ticker (which you have specified "should be calculated as the total PnL per ticker divided by the last date's position per ticker") we can get the relevant Position (namely, for the most recent date per ticker) analogously to how we got the relevant Cumulative PnL/Ticker and use these two values to calculate Yield.
Here is sample code to do this:
import pandas as pd
scaffold_table = pd.DataFrame({
'Position':[2000]*5,
'Company':['Amazon', 'Amazon', 'Alphabet', 'Amazon', 'Alphabet'],
'Date':['2020-05-26','2020-05-27','2020-05-27','2020-05-28','2020-05-28'],
'Ticker':['AMZN','AMZN','GOOG','AMZN','GOOG'],
'Open':[2458.,2404.9899,1417.25,2384.330078,1396.859985],
'Volume':[3568200,5056900,1685800,3190200,1692200],
'Daily Return':[-0.006164,-0.004736,0.000579,-0.003854,-0.000783],
'Daily PnL':[-12.327054,-9.472236,1.157283,-7.708126,-1.565741],
'Cumulative PnL/Ticker':[-12.327054,-21.799290,1.157283,-29.507417,-0.408459]})
print(scaffold_table)
# Create a summary table of your average daily PnL, total PnL, and overall yield per ticker
gb = scaffold_table.groupby(['Ticker'])
summary_table = gb.last()[['Position', 'Cumulative PnL/Ticker']].rename(columns={'Cumulative PnL/Ticker':'Total PnL'})
summary_table['Yield'] = summary_table['Total PnL'] / summary_table['Position']
summary_table['Average Daily PnL'] = gb['Daily PnL'].mean()
summary_table = summary_table[['Average Daily PnL', 'Total PnL', 'Yield']]
print('\nsummary_table:'); print(summary_table)
exit()
Input:
Position Company Date Ticker Open Volume Daily Return Daily PnL Cumulative PnL/Ticker
0 2000 Amazon 2020-05-26 AMZN 2458.000000 3568200 -0.006164 -12.327054 -12.327054
1 2000 Amazon 2020-05-27 AMZN 2404.989900 5056900 -0.004736 -9.472236 -21.799290
2 2000 Alphabet 2020-05-27 GOOG 1417.250000 1685800 0.000579 1.157283 1.157283
3 2000 Amazon 2020-05-28 AMZN 2384.330078 3190200 -0.003854 -7.708126 -29.507417
4 2000 Alphabet 2020-05-28 GOOG 1396.859985 1692200 -0.000783 -1.565741 -0.408459
Output:
Average Daily PnL Total PnL Yield
Ticker
AMZN -9.835805 -29.507417 -0.014754
GOOG -0.204229 -0.408459 -0.000204
Good day, I am a student taking Python classes. We are now learning about Beautiful Soup, and I am having trouble extracting data from 2 tables, as you will see in the code below:
import pandas as pd
import requests
list_of_urls = ['https://tradingeconomics.com/albania/gdp-growth-annual',
                'https://tradingeconomics.com/south-africa/gdp-growth-annual']
final_df = pd.DataFrame()
for i in list_of_urls:
    table = pd.read_html(i, match='Related')
    for row in table:
        if row.loc['Related'] == 'GDP Annual Growth Rate':
            final_df.append(row)
        else:
            pass
You need neither requests nor bs4; pd.read_html does the job.
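import pandas as pd

list_of_urls = ['https://tradingeconomics.com/albania/gdp-growth-annual',
                'https://tradingeconomics.com/south-africa/gdp-growth-annual']
data = {}
for i in list_of_urls:
    # the country name is the first path segment of the URL
    country = i.split('/')[3]
    # read_html returns a list of matching tables; take the first one
    df = pd.read_html(i, match='Related')[0]
    # keep only the row for the indicator we want
    data[country] = df.loc[df['Related'] == 'GDP Annual Growth Rate']
# the dict keys become the outer level of the resulting index
df = pd.concat(data)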
Output:
>>> df
Related Last Previous Unit Reference
albania 1 GDP Annual Growth Rate 6.99 18.38 percent Sep 2021
south-africa 1 GDP Annual Growth Rate 1.70 2.90 percent Dec 2021
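The country label in that output comes from i.split('/')[3], which relies on the URL always having the scheme and host in front. A slightly more explicit sketch, using urllib.parse from the standard library (the function name country_from_url is made up here), takes the first segment of the path instead:

```python
from urllib.parse import urlsplit


def country_from_url(url):
    """Return the first path segment, e.g. 'albania' for the URLs above."""
    path = urlsplit(url).path          # '/albania/gdp-growth-annual'
    return path.strip('/').split('/')[0]


for u in ['https://tradingeconomics.com/albania/gdp-growth-annual',
          'https://tradingeconomics.com/south-africa/gdp-growth-annual']:
    print(country_from_url(u))
```

Both versions give the same labels for these two URLs; the difference only matters if a URL comes without the scheme or with extra slashes.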
import urllib.request
import re
import csv
import pandas as pd
from bs4 import BeautifulSoup
columns = []
data = []
f = open('companylist.csv')
csv_f = csv.reader(f)
for row in csv_f:
    stocklist = row
    print(stocklist)
    for s in stocklist:
        print('http://finance.yahoo.com/q?s='+s)
        optionsUrl = urllib.request.urlopen('http://finance.yahoo.com/q?s='+s).read()
        soup = BeautifulSoup(optionsUrl, "html.parser")
        stocksymbol = ['Symbol:', s]
        optionsTable = [stocksymbol]+[
            [x.text for x in y.parent.contents]
            for y in soup.findAll('td', attrs={'class': 'yfnc_tabledata1','rtq_table': ''})
        ]
        if not columns:
            columns = [o[0] for o in optionsTable] #list(my_df.loc[0])
        data.append(o[1] for o in optionsTable)
# create DataFrame from data
df = pd.DataFrame(data, columns=columns)
df.to_csv('test.csv', index=False)
The script works fine when I have about 200 to 300 stocks, but my company list has around 6000 symbols.
Is there a way I can download chunks of data, say 200 stocks at a time, pause for a while, and then resume the download?
The export is one stock at a time; how do I write 200 at a time, and append the next batch to the initial batch (for the CSV)?
As @Merlin has recommended, take a closer look at the pandas_datareader module; you can do a LOT using this tool. Here is a small example:
import csv
import pandas_datareader.data as data
from pandas_datareader.yahoo.quotes import _yahoo_codes
stocklist = ['aapl','goog','fb','amzn','COP']
#http://www.jarloo.com/yahoo_finance/
#https://greenido.wordpress.com/2009/12/22/yahoo-finance-hidden-api/
_yahoo_codes.update({'Market Cap': 'j1'})
_yahoo_codes.update({'Div Yield': 'y'})
_yahoo_codes.update({'Bid': 'b'})
_yahoo_codes.update({'Ask': 'a'})
_yahoo_codes.update({'Prev Close': 'p'})
_yahoo_codes.update({'Open': 'o'})
_yahoo_codes.update({'1 yr Target Price': 't8'})
_yahoo_codes.update({'Earnings/Share': 'e'})
_yahoo_codes.update({"Day’s Range": 'm'})
_yahoo_codes.update({'52-week Range': 'w'})
_yahoo_codes.update({'Volume': 'v'})
_yahoo_codes.update({'Avg Daily Volume': 'a2'})
_yahoo_codes.update({'EPS Est Current Year': 'e7'})
_yahoo_codes.update({'EPS Est Next Quarter': 'e9'})
data.get_quote_yahoo(stocklist).to_csv('test.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
Output (I've intentionally transposed the result set, because there are too many columns to show them here):
In [2]: data.get_quote_yahoo(stocklist).transpose()
Out[2]:
aapl goog fb amzn COP
1 yr Target Price 124.93 924.83 142.87 800.92 51.23
52-week Range 89.47 - 132.97 515.18 - 789.87 72.000 - 121.080 422.6400 - 731.5000 31.0500 - 64.1300
Ask 97.61 718.75 114.58 716.73 44.04
Avg Daily Volume 3.81601e+07 1.75567e+06 2.56467e+07 3.94018e+06 8.94779e+06
Bid 97.6 718.57 114.57 716.65 44.03
Day’s Range 97.10 - 99.12 716.51 - 725.44 113.310 - 115.480 711.1600 - 721.9900 43.8000 - 44.9600
Div Yield 2.31 N/A N/A N/A 4.45
EPS Est Current Year 8.28 33.6 3.55 5.39 -2.26
EPS Est Next Quarter 1.66 8.38 0.87 0.96 -0.48
Earnings/Share 8.98 24.58 1.635 2.426 -4.979
Market Cap 534.65B 493.46B 327.71B 338.17B 54.53B
Open 98.6 716.51 115 713.37 43.96
PE 10.87 29.25 70.074 295.437 N/A
Prev Close 98.83 719.41 116.62 717.91 44.51
Volume 3.07086e+07 868366 2.70182e+07 2.42218e+06 5.20412e+06
change_pct -1.23% -0.09% -1.757% -0.1644% -1.0782%
last 97.61 718.75 114.571 716.73 44.0301
short_ratio 1.18 1.41 0.81 1.29 1.88
time 3:15pm 3:15pm 3:15pm 3:15pm 3:15pm
If you need more fields (codes for Yahoo Finance API) you may want to check the following links:
http://www.jarloo.com/yahoo_finance/
https://greenido.wordpress.com/2009/12/22/yahoo-finance-hidden-api/
Use pandas_datareader for this.
In [1]: import pandas_datareader.data as web
In [2]: import datetime
In [3]: start = datetime.datetime(2010, 1, 1)
In [4]: end = datetime.datetime(2013, 1, 27)
In [5]: f = web.DataReader("F", 'yahoo', start, end)
In [6]: f.loc['2010-01-04']
Out[6]:
Open 10.170000
High 10.280000
Low 10.050000
Close 10.280000
Volume 60855800.000000
Adj Close 9.151094
Name: 2010-01-04 00:00:00, dtype: float64
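To pause after every 200 downloads (this also works when you use pandas_datareader), you could do:
import time
for i, s in enumerate(stocklist):
    # sleep before each new batch of 200, but not before the very first download
    if i > 0 and i % 200 == 0:
        time.sleep(5) # in seconds
    # download the data for s here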
To save all data into a single file (IIUC):
stocks = pd.DataFrame() # to collect all results
In every iteration:
stocks = pd.concat([stocks, pd.DataFrame(data, columns=columns)])
Finally:
stocks.to_csv(path, index=False)
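Putting the pause and the single-file idea together, here is a rough sketch that uses only the standard library. The names batches, fetch_quote and download_all are made up for illustration, and fetch_quote is a placeholder that does no real download; swap in whatever scraping or pandas_datareader call you actually use. Each batch is written to the CSV as soon as it is done, so an interrupted run still leaves the earlier batches on disk.

```python
import csv
import time


def batches(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_quote(symbol):
    # Hypothetical placeholder: replace with the real download for one symbol.
    return {'Symbol': symbol, 'Price': ''}


def download_all(symbols, path, size=200, pause=5.0, fetch=fetch_quote):
    """Fetch symbols in batches, pausing between batches, into one CSV file."""
    fieldnames = ['Symbol', 'Price']
    for n, batch in enumerate(batches(symbols, size)):
        if n:                      # no pause before the first batch
            time.sleep(pause)
        rows = [fetch(s) for s in batch]
        # First batch creates the file and header; later batches append.
        with open(path, 'w' if n == 0 else 'a', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            if n == 0:
                writer.writeheader()
            writer.writerows(rows)
```

Usage would look like download_all(stocklist, 'test.csv'), with size and pause tuned to whatever the server tolerates.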
I am trying to scrape the weather forecast from "https://weather.gc.ca/city/pages/ab-52_metric_e.html". With the code below I am able to get the table containing the data but I'm stuck. During the day the second row contains Today's forecast and the third row contains tonight's forecast. At the end of the day the second row becomes Tonight's forecast and Today's forecast is dropped. What I want to do is parse through the table to get the forecast for Today, Tonight, and each continuing day even if Today's forecast is missing; something like this:
Today: A mix of sun and cloud. 60 percent chance of showers this afternoon with risk of a thunderstorm. Widespread smoke. High 26. UV index 6 or high.
Tonight: Partly cloudy. Becoming clear this evening. Increasing cloudiness before morning. Widespread smoke. Low 13.
Friday: Mainly cloudy. Widespread smoke. Wind becoming southwest 30 km/h gusting to 50 in the afternoon. High 24.
#using Beautiful Soup 3, Python 2.6
from BeautifulSoup import BeautifulSoup
import urllib
pageFile = urllib.urlopen("https://weather.gc.ca/city/pages/ab-52_metric_e.html")
pageHtml = pageFile.read()
pageFile.close()
soup = BeautifulSoup("".join(pageHtml))
data = soup.find("div", {"id": "mainContent"})
forecast = data.find('table',{'class':"table mrgn-bttm-md mrgn-tp-md textforecast hidden-xs"})
You could do something like iterate over each line in the table and get the value of the rows. An example would be:
forecast = data.find('table',{'class':"table mrgn-bttm-md mrgn-tp-md textforecast hidden-xs"}).find_all("tr")
for tr in forecast[1:]:
    print " ".join(tr.text.split())
With this approach you get the contents of each row (excluding the first one, which is a header).
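To get the "Today: ...", "Tonight: ..." layout whether or not the Today row is present, one option is to read the period name from each row instead of relying on row positions. The sketch below is in Python 3 and uses the standard library's html.parser rather than BeautifulSoup so that it runs on its own. The SAMPLE markup is an assumption made up for illustration (a header row with th cells, then one row per period with two td cells); the real page may be structured differently.

```python
from html.parser import HTMLParser


class ForecastRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished (period, text) pairs
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            # collapse runs of whitespace inside the cell
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if len(self._row) == 2:   # header row has <th> cells only, so it is skipped
                self.rows.append(tuple(self._row))
            self._row = None


# Hypothetical evening snapshot: the Today row has already been dropped.
SAMPLE = """
<table class="textforecast">
<tr><th>Date</th><th>Detailed Forecast</th></tr>
<tr><td>Tonight</td><td>Partly cloudy.  Low 13.</td></tr>
<tr><td>Friday</td><td>Mainly cloudy. High 24.</td></tr>
</table>
"""

parser = ForecastRows()
parser.feed(SAMPLE)
forecast = dict(parser.rows)
for period, text in parser.rows:
    print(f"{period}: {text}")
```

Since each row carries its own label, looking up forecast.get('Today') simply returns None in the evening instead of shifting every other row by one.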
I am trying to scrape http://www.dailyfinance.com/quote/NYSE/international-business-machines/IBM/financial-ratios, but the traditional URL string-building technique doesn't work because of the "full-company-name-is-inserted-in-the-path" string, and the exact "full-company-name" isn't known in advance. Only the company symbol, "IBM", is known.
Essentially, the way I scrape is by looping through an array of company symbols and building the URL string before sending it to urllib2.urlopen(url). But in this case, that can't be done.
For example, CSCO string is
http://www.dailyfinance.com/quote/NASDAQ/cisco-systems-inc/CSCO/financial-ratios
and the URL string for AAPL is:
http://www.dailyfinance.com/quote/NASDAQ/apple/AAPL/financial-ratios
So in order to get the url, I had to search the symbol in the input box on the main page:
http://www.dailyfinance.com/
When I type "CSCO" into the search input on http://www.dailyfinance.com/quote/NASDAQ/apple/AAPL/financial-ratios and watch the network tab of the Firefox web developer tools, I notice that the GET request is sent to
http://j.foolcdn.com/tmf/predictivesearch?callback=_predictiveSearch_csco&term=csco&domain=dailyfinance.com
and that the referer actually gives the path that I want to capture
Host: j.foolcdn.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.dailyfinance.com/quote/NASDAQ/cisco-systems-inc/CSCO/financial-ratios?source=itxwebtxt0000007
Connection: keep-alive
Sorry for the long explanation. So the question is how do I extract the url in the Referer? If that is not possible, how should I approach this problem? Is there another way?
I really appreciate your help.
I like this question. And because of that, I'll give a very thorough answer. For this, I'll use my favorite Requests library along with BeautifulSoup4. Porting over to Mechanize if you really want to use that is up to you. Requests will save you tons of headaches though.
First off, you're probably looking for a POST request. However, POST requests are often not needed if a search function brings you right away to the page you're looking for. So let's inspect it, shall we?
When I land on the base URL, http://www.dailyfinance.com/, I can do a simple check via Firebug or Chrome's inspect tool that when I put in CSCO or AAPL on the search bar and enable the "jump", there's a 301 Moved Permanently status code. What does this mean?
In simple terms, I was transferred somewhere. The URL for this GET request is the following:
http://www.dailyfinance.com/quote/jump?exchange-input=&ticker-input=CSCO
Now, we test if it works with AAPL by using a simple URL manipulation.
import requests as rq
apl_tick = "AAPL"
url = "http://www.dailyfinance.com/quote/jump?exchange-input=&ticker-input="
r = rq.get(url + apl_tick)
print r.url
The above gives the following result:
http://www.dailyfinance.com/quote/nasdaq/apple/aapl
[Finished in 2.3s]
See how the URL of the response changed? Let's take the URL manipulation one step further by looking for the /financial-ratios page by appending the below to the above code:
new_url = r.url + "/financial-ratios"
p = rq.get(new_url)
print p.url
When run, this gives the following result:
http://www.dailyfinance.com/quote/nasdaq/apple/aapl
http://www.dailyfinance.com/quote/nasdaq/apple/aapl/financial-ratios
[Finished in 6.0s]
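The two URL manipulations above can also be wrapped in small helpers, which makes it easier to loop over many symbols. This is only a sketch in Python 3 using the standard library; the function names jump_url and ratios_url are made up for illustration, and no request is sent here. You would still pass jump_url(ticker) to requests and feed the redirected r.url into ratios_url.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

JUMP_URL = "http://www.dailyfinance.com/quote/jump"


def jump_url(ticker):
    """Build the search 'jump' URL that redirects to the company page."""
    query = urlencode({"exchange-input": "", "ticker-input": ticker})
    return f"{JUMP_URL}?{query}"


def ratios_url(resolved_url):
    """Append /financial-ratios to the URL the redirect ended up at."""
    parts = urlsplit(resolved_url)
    path = parts.path.rstrip("/") + "/financial-ratios"
    # drop any query string or fragment left over from the redirect
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(jump_url("CSCO"))
print(ratios_url("http://www.dailyfinance.com/quote/nasdaq/apple/aapl"))
```

Dropping the query string in ratios_url is a design choice for this sketch; it avoids ending up with something like ?source=.../financial-ratios if the redirect target carries tracking parameters.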
Now we're on the right track. I will now try to parse the data using BeautifulSoup. My complete code is as follows:
from bs4 import BeautifulSoup as bsoup
import requests as rq
apl_tick = "AAPL"
url = "http://www.dailyfinance.com/quote/jump?exchange-input=&ticker-input="
r = rq.get(url + apl_tick)
new_url = r.url + "/financial-ratios"
p = rq.get(new_url)
soup = bsoup(p.content)
div = soup.find("div", id="clear").table
rows = table.find_all("tr")
for row in rows:
    print row
I then try running this code, only to encounter an error with the following traceback:
File "C:\Users\nanashi\Desktop\test.py", line 13, in <module>
div = soup.find("div", id="clear").table
AttributeError: 'NoneType' object has no attribute 'table'
Of note is the line 'NoneType' object.... This means our target div does not exist! Egads, but why am I seeing the following?!
There can only be one explanation: the table is loaded dynamically! Rats. Let's see if we can find another source for the table. I study the page and see that there are scrollbars at the bottom. This might mean that the table was loaded inside a frame or was loaded straight from another source entirely and placed into a div in the page.
I refresh the page and watch the GET requests again. Bingo, I found something that seems a bit promising:
A third-party source URL, and look, it's easily manipulable using the ticker symbol! Let's try loading it into a new tab. Here's what we get:
WOW! We now have the very exact source of our data. The last hurdle though is will it work when we try to pull the CSCO data using this string (remember we went CSCO -> AAPL and now back to CSCO again, so you're not confused). Let's clean up the string and ditch the role of www.dailyfinance.com here completely. Our new url is as follows:
http://www.motleyfool.idmanagedsolutions.com/stocks/financial_ratios.idms?SYMBOL_US=AAPL
Let's try using that in our final scraper!
from bs4 import BeautifulSoup as bsoup
import requests as rq
csco_tick = "CSCO"
url = "http://www.motleyfool.idmanagedsolutions.com/stocks/financial_ratios.idms?SYMBOL_US="
new_url = url + csco_tick
r = rq.get(new_url)
soup = bsoup(r.content)
table = soup.find("div", id="clear").table
rows = table.find_all("tr")
for row in rows:
    print row.get_text()
And our raw results for CSCO's financial ratios data are as follows:
Company
Industry
Valuation Ratios
P/E Ratio (TTM)
15.40
14.80
P/E High - Last 5 Yrs
24.00
28.90
P/E Low - Last 5 Yrs
8.40
12.10
Beta
1.37
1.50
Price to Sales (TTM)
2.51
2.59
Price to Book (MRQ)
2.14
2.17
Price to Tangible Book (MRQ)
4.25
3.83
Price to Cash Flow (TTM)
11.40
11.60
Price to Free Cash Flow (TTM)
28.20
60.20
Dividends
Dividend Yield (%)
3.30
2.50
Dividend Yield - 5 Yr Avg (%)
N.A.
1.20
Dividend 5 Yr Growth Rate (%)
N.A.
144.07
Payout Ratio (TTM)
45.00
32.00
Sales (MRQ) vs Qtr 1 Yr Ago (%)
-7.80
-3.70
Sales (TTM) vs TTM 1 Yr Ago (%)
5.50
5.60
Growth Rates (%)
Sales - 5 Yr Growth Rate (%)
5.51
5.12
EPS (MRQ) vs Qtr 1 Yr Ago (%)
-54.50
-51.90
EPS (TTM) vs TTM 1 Yr Ago (%)
-54.50
-51.90
EPS - 5 Yr Growth Rate (%)
8.91
9.04
Capital Spending - 5 Yr Growth Rate (%)
20.30
20.94
Financial Strength
Quick Ratio (MRQ)
2.40
2.70
Current Ratio (MRQ)
2.60
2.90
LT Debt to Equity (MRQ)
0.22
0.20
Total Debt to Equity (MRQ)
0.31
0.25
Interest Coverage (TTM)
18.90
19.10
Profitability Ratios (%)
Gross Margin (TTM)
63.20
62.50
Gross Margin - 5 Yr Avg
66.30
64.00
EBITD Margin (TTM)
26.20
25.00
EBITD - 5 Yr Avg
28.82
0.00
Pre-Tax Margin (TTM)
21.10
20.00
Pre-Tax Margin - 5 Yr Avg
21.60
18.80
Management Effectiveness (%)
Net Profit Margin (TTM)
17.10
17.65
Net Profit Margin - 5 Yr Avg
17.90
15.40
Return on Assets (TTM)
8.30
8.90
Return on Assets - 5 Yr Avg
8.90
8.00
Return on Investment (TTM)
11.90
12.30
Return on Investment - 5 Yr Avg
12.50
10.90
Efficiency
Revenue/Employee (TTM)
637,890.00
556,027.00
Net Income/Employee (TTM)
108,902.00
98,118.00
Receivable Turnover (TTM)
5.70
5.80
Inventory Turnover (TTM)
11.30
9.70
Asset Turnover (TTM)
0.50
0.50
[Finished in 2.0s]
Cleaning up the data is up to you.
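If you want a starting point for that cleanup, here is one possible sketch in Python 3. It assumes the raw text arrives exactly as printed above: one item per line, where every metric label is followed by two value lines (company, then industry) and section headings such as "Valuation Ratios" stand alone. The function names are made up for illustration.

```python
def to_number(text):
    """Convert '637,890.00' to a float; return None for 'N.A.'."""
    if text == "N.A.":
        return None
    return float(text.replace(",", ""))


def is_value(text):
    """True if the line looks like a value (a number or 'N.A.')."""
    if text == "N.A.":
        return True
    try:
        float(text.replace(",", ""))
    except ValueError:
        return False
    return True


def parse_ratios(lines):
    """Turn raw lines into (metric, company_value, industry_value) tuples."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    records = []
    i = 0
    while i < len(lines):
        is_metric = (i + 2 < len(lines)
                     and not is_value(lines[i])
                     and is_value(lines[i + 1])
                     and is_value(lines[i + 2]))
        if is_metric:
            records.append((lines[i], to_number(lines[i + 1]), to_number(lines[i + 2])))
            i += 3
        else:
            i += 1   # column title or section heading: skip it
    return records


# A few lines copied from the CSCO output above.
raw = ["Company", "Industry", "Valuation Ratios",
       "P/E Ratio (TTM)", "15.40", "14.80",
       "Beta", "1.37", "1.50",
       "Dividends",
       "Dividend Yield - 5 Yr Avg (%)", "N.A.", "1.20",
       "Revenue/Employee (TTM)", "637,890.00", "556,027.00"]

for record in parse_ratios(raw):
    print(record)
```

From there the tuples drop straight into a CSV writer or a DataFrame, whichever you prefer.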
One good lesson to learn from this scrape is not all data are contained in one page alone. It's pretty nice to see it coming from another static site. If it was produced via JavaScript or AJAX calls or the like, we would likely have some difficulties with our approach.
Hopefully you learned something from this. Let us know if this helps and good luck.
Doesn't answer your specific question, but solves your problem.
http://www.dailyfinance.com/quotes/{Company Symbol}/{Stock Exchange}
Examples:
http://www.dailyfinance.com/quotes/AAPL/NAS
http://www.dailyfinance.com/quotes/IBM/NYSE
http://www.dailyfinance.com/quotes/CSCO/NAS
To get to the financial ratios page you could then employ something like this:
import urllib2
def financial_ratio_url(symbol, stock_exchange):
    starturl = 'http://www.dailyfinance.com/quotes/'
    starturl += '/'.join([symbol, stock_exchange])
    req = urllib2.Request(starturl)
    # the short URL redirects; geturl() gives the final address
    res = urllib2.urlopen(req)
    return '/'.join([res.geturl(),'financial-ratios'])
Example:
financial_ratio_url('AAPL', 'NAS')
'http://www.dailyfinance.com/quote/nasdaq/apple/aapl/financial-ratios'