I am building an MS Excel document to get stock and option prices from Yahoo Finance, Ariva, etc. I am using xlwings and BeautifulSoup to get the data.
Everything works fine: I get stock prices from Yahoo, and I also get stock/German option prices from Ariva directly into Excel. Unfortunately, the option prices (unlike the stock prices) are more difficult to get.
I am using this code (e.g. ticker is 'NVDA', date is 44211 (the Excel serial date for 15/1/2021) and option_name is 'NVDA210115C00210000'):
import xlwings as xw
import bs4 as bs
import requests

@xw.func
def get_stock(ticker, date, option_name):
    url_base = 'http://finance.yahoo.com/quote/'
    new_date = str(86400 * (date - 25569))
    src_base = requests.get(url_base + ticker + '/options?date=' + new_date).text
    soup = bs.BeautifulSoup(src_base, 'lxml')
This results in loading https://finance.yahoo.com/quote/NVDA/options?date=1610668800 (that works fine).
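For reference, the conversion works because Excel stores dates as days since 1899-12-30, and 25569 is the serial number for 1970-01-01 (the Unix epoch). A small worked example:

# Worked example of the Excel-serial -> Unix-timestamp conversion used above.
date = 44211                       # Excel serial date for 2021-01-15
new_date = 86400 * (date - 25569)  # days since the epoch times seconds per day
print(new_date)                    # 1610668800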
How do I get the option price for this option: NVDA210115C00210000? I tried:
    price = soup.find_all('div', attrs={'id': 'Col1-1-OptionContracts-Proxy'})[0].find(attrs={'class': 'data-col2'}).get_text()
    return [price]
But it only returns the price of the first option on this page.
See the picture (Yahoo Finance code and option price): I want the 324,37.
Somehow, I have to find the place of 'NVDA210115C00210000' and THEN get the text of data-col2. I just started using Python two days ago and I am not a programmer, but I think it shouldn't be that difficult.
How can I use the 'find' to find that place and THEN get the price?
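For reference, here is a minimal sketch of exactly that approach, continuing from the soup object built above. It assumes the option chain is plain HTML, the contract symbol appears as text somewhere in the table, and the last price sits in a td with class 'data-col2' (all assumptions about Yahoo's markup at the time):

# Hedged sketch: locate the contract symbol first, then read its row's price cell.
cell = soup.find(string='NVDA210115C00210000')  # the text node holding the symbol
if cell is not None:
    row = cell.find_parent('tr')                # the table row for that contract
    price = row.find('td', {'class': 'data-col2'}).get_text()
    print(price)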
You have a couple of errors:
soup = bs.BeautifulSoup(src_base,'lxml')
should be:
soup = bs.BeautifulSoup(src_base.content,'lxml')
It's the .content you are missing (note that the .text on the requests.get call is dropped too, as in the full code below).
What I did instead was find the table row: table_data = soup.find('tr', {'data-reactid':'200'}), then find the data in that row: option_price = table_data.find('td', {'data-reactid':'206'}).text
def get_stock(ticker, date, option_name):
    url_base = 'http://finance.yahoo.com/quote/'
    new_date = str(86400 * (date - 25569))
    src_base = requests.get(url_base + ticker + '/options?date=' + new_date)
    soup = bs.BeautifulSoup(src_base.content, 'lxml')
    table_data = soup.find('tr', {'data-reactid': '200'})
    option_price = table_data.find('td', {'data-reactid': '206'}).text
    print(option_price)

get_stock('NVDA', 44211, 'NVDA210115C00210000')
>>> 407.35
HTML Code
<tr class="data-row7 Bgc($hoverBgColor):h BdT Bdc($seperatorColor) H(33px) in-the-money Bgc($hoverBgColor)" data-reactid="200">
<td class="data-col3 Ta(end) Pstart(7px)" data-reactid="206">407.35</td>
Related
A complete beginner here... I am trying to scrape the constituents table from this Wikipedia page; however, the table scraped was the annual returns (1st table) instead of the constituents table (2nd table) that I need. Could someone help me see if there is any way that I can target the specific table I want using BeautifulSoup4?
import bs4 as bs
import pickle
import requests

def save_klci_tickers():
    resp = requests.get('https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI')
    soup = bs.BeautifulSoup(resp.text)
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
    with open("klcitickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    print(tickers)
    return tickers

save_klci_tickers()
Try the pandas library to get the tabular data from that page into a csv file in the blink of an eye:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI'
df = pd.read_html(url, attrs={"class": "wikitable"})[1] #change the index to get the table you need from that page
new = pd.DataFrame(df, columns=["Constituent Name", "Stock Code", "Sector"])
new.to_csv("wiki_data.csv", index=False)
print(new)
If it is still BeautifulSoup you wanna stick with, the following should serve the purpose:
import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI")
soup = BeautifulSoup(res.text, "lxml")
for items in soup.select("table.wikitable")[1].select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)
If you wanna use .find_all() instead of .select(), try the following:
for items in soup.find_all("table", class_="wikitable")[1].find_all("tr"):
    data = [item.get_text(strip=True) for item in items.find_all(["th", "td"])]
    print(data)
I am new to web scraping and I'm trying to scrape the "statistics" page of Yahoo Finance for AAPL. Here's the link: https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL
Here is the code I have so far...
from bs4 import BeautifulSoup
from requests import get

url = 'https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL'
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')

stock_data = soup.find_all("table")
for stock in stock_data:
    print(stock.text)
When I run that, it returns all of the table data on the page. However, I only want specific data from each table (e.g. "Market Cap", "Revenue", "Beta").
I tried messing around with the code by doing print(stock[1].text) to see if I could limit the output to just the second value in each table, but that returned an error message. Am I on the right track using BeautifulSoup, or do I need a completely different library? What would I have to do to return only particular data and not all of the table data on the page?
Examining the HTML code gives you the best idea of how BeautifulSoup will handle what it sees.
The web page seems to contain several tables, which in turn contain the information you are after. The tables follow a certain logic.
First scrape all the tables on the web page, then find all the table rows (<tr>) and the table data (<td>) that those rows contain.
Below is one way of achieving this. I even threw in a function to print only a specific measurement.
from bs4 import BeautifulSoup
from requests import get

url = 'https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL'
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')
stock_data = soup.find_all("table")

# stock_data will contain multiple tables; next we examine each table one by one
for table in stock_data:
    # Scrape all table rows into variable trs
    trs = table.find_all('tr')
    for tr in trs:
        # Scrape all table data tags into variable tds
        tds = tr.find_all('td')
        # Index 0 of tds will contain the measurement
        print("Measure: {}".format(tds[0].get_text()))
        # Index 1 of tds will contain the value
        print("Value: {}".format(tds[1].get_text()))
        print("")
def get_measurement(table_array, measurement):
    for table in table_array:
        trs = table.find_all('tr')
        for tr in trs:
            tds = tr.find_all('td')
            if measurement.lower() in tds[0].get_text().lower():
                return tds[1].get_text()

# print only one measurement, e.g. operating cash flow
print(get_measurement(stock_data, "operating cash flow"))
Although this isn't Yahoo Finance, you can do something very similar like this...
import requests
from bs4 import BeautifulSoup
import pandas

base_url = 'https://finviz.com/screener.ashx?v=152&o=price&t=MSFT,AAPL,SBUX,S,GOOG&o=price&c=0,1,2,3,4,5,6,7,8,9,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs={'id': 'screener-content'})

light_rows = main_div.find_all('tr', class_="table-light-row-cp")
dark_rows = main_div.find_all('tr', class_="table-dark-row-cp")

data = []
for rows_set in (light_rows, dark_rows):
    for row in rows_set:
        row_data = []
        for cell in row.find_all('td'):
            val = cell.a.get_text()
            row_data.append(val)
        data.append(row_data)

# sort rows to maintain original order
data.sort(key=lambda x: int(x[0]))

pandas.DataFrame(data).to_csv("C:\\your_path\\AAA.csv", header=False)
This is a nice substitute in case Yahoo decides to deprecate more of the functionality of their API. I know they cut out a lot of things (mostly historical quotes) a couple of years ago. It was sad to see that go.
I tried using data-reactid markers to search Yahoo Finance for a number, but I get SyntaxError: keyword can't be an expression. My code:
Walmart stock
source = requests.get('https://finance.yahoo.com/quote/WMT?p=WMT&.tsrc=fin-srch').text
soup = BeautifulSoup(source, 'lxml')
price = soup.find('span', data-reactid_='35')
print("Walmart stock: " + price.text)
You're just doing it slightly wrong. In my view, it is more flexible to use a dict than something like class_=:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://finance.yahoo.com/quote/WMT?p=WMT&.tsrc=fin-srch').text
soup = BeautifulSoup(source, 'lxml')
price = soup.find_all('span', {"data-reactid":True})
print(price)
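If you need the one specific element the question was after, rather than every reactid-tagged span, the same dict syntax takes a concrete value. Continuing from the soup object above (note that the reactid number '35' is an assumption about Yahoo's markup at the time and changes often):

# Target a single span by exact attribute value instead of collecting them all.
price = soup.find('span', {'data-reactid': '35'})
if price is not None:
    print("Walmart stock: " + price.text)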
Try it this way.
import quandl
quandl.ApiConfig.api_key = 'your_api_key_here'  # replace with your own Quandl key

# get the table for daily stock prices,
# filter the table for selected tickers/columns within a time range,
# and set paginate to True because Quandl limits the tables API to 10,000 rows per call
data = quandl.get_table('WIKI/PRICES', ticker=['WMT'],
                        qopts={'columns': ['ticker', 'date', 'adj_close']},
                        date={'gte': '2015-12-31', 'lte': '2016-12-31'},
                        paginate=True)
print(data)
This is probably worth a look too.
https://www.quandl.com/api/v3/datasets/EOD/WMT.csv?api_key=your_api_key_goes_here
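If you go that CSV route, a minimal sketch for pulling it straight into pandas (assuming you substitute a valid API key for the placeholder):

import pandas as pd

# Hypothetical placeholder key -- replace with your own Quandl API key.
url = "https://www.quandl.com/api/v3/datasets/EOD/WMT.csv?api_key=your_api_key_goes_here"
df = pd.read_csv(url)  # pandas can read a CSV directly from a URL
print(df.head())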
I am trying to download the data on this website
https://coinmunity.co/
...in order to manipulate it later in Python or Pandas.
I have tried to read it directly into Pandas via Requests, but it did not work, using this code:
import requests
import pandas as pd
from bs4 import BeautifulSoup

res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
dfm = pd.read_html(str(table), header=0)
dfm = dfm[0].dropna(axis=0, thresh=4)
dfm.head()
In most of the things I tried, I could only get to the info in the headers, which seems to be the only table the code can see on this page.
Seeing that this did not work, I tried to do the same scraping with Requests and BeautifulSoup, but it did not work either. This is my code:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
#table = soup.find_all('table')[0]
#table = soup.find_all('div', {'class':'inner-container'})
#table = soup.find_all('tbody', {'class':'_ngcontent-c0'})
#table = soup.find_all('table')[0].findAll('tr')
#table = soup.find_all('table')[0].find('tbody')#.find_all('tbody _ngcontent-c3=""')
table = soup.find_all('p', {'class':'stats change positiveSubscribers'})
You can see, in the commented lines, all the things I have tried, but nothing worked.
Is there any way to easily download that table for use in Pandas/Python, in the tidiest, easiest, and quickest way possible?
Thank you
Since the content is loaded dynamically after the initial request is made, you won't be able to scrape this data with requests. Here's what I would do instead:
from selenium import webdriver
import pandas as pd
import time
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.implicitly_wait(10)
driver.get("https://coinmunity.co/")
html = driver.page_source.encode('utf-8')
soup = BeautifulSoup(html, 'lxml')

results = []
for row in soup.find_all('tr')[2:]:
    data = row.find_all('td')
    name = data[1].find('a').text
    value = data[2].find('p').text
    # get the rest of the data you need about each coin here,
    # then add it to the dictionary that you append to results
    results.append({'name': name, 'value': value})

df = pd.DataFrame(results)
df.head()
   name   value
0  NULS  14,005
1   VEN  84,486
2   EDO  20,052
3  CLUB   1,996
4   HSR   8,433
You will need to make sure that geckodriver is installed and that it is in your PATH. I just scraped the name of each coin and the value but getting the rest of the information should be easy.
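As a side note, if you would rather not manage PATH manually, a hedged variant using the same Selenium 3-style API the answer relies on (executable_path is deprecated in Selenium 4) is to point at geckodriver explicitly and close the browser when done:

from selenium import webdriver

# Hypothetical path -- adjust to wherever geckodriver actually lives on your machine.
driver = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
try:
    driver.get("https://coinmunity.co/")
    html = driver.page_source
finally:
    driver.quit()  # always release the browser session, even if scraping fails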
I am trying to scrape certain financial data from Yahoo Finance. Specifically in this case, a single revenue number (type: double)
Here is my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
searchurl = "http://finance.yahoo.com/q/ks?s=AAPL"
f = urlopen(searchurl)
html = f.read()
soup = BeautifulSoup(html, "html.parser")
revenue = soup.find("div", {"class": "yfnc_tabledata1", "id":"yui_3_9_1_8_1456172462911_38"})
print (revenue)
A view-source inspection in Chrome shows where the number lives (screenshot omitted).
I am trying to scrape the "234.99B" number, strip the "B", and convert it to a decimal. There is something wrong with my 'soup.find' line; where am I going wrong?
Locate the td element with Revenue (ttm): text and get the next td sibling:
revenue = soup.find("td", text="Revenue (ttm):").find_next_sibling("td").text
print(revenue)
Prints 234.99B.
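To finish what the question asked for — stripping the "B" and converting to a number — here is a small sketch, assuming Yahoo keeps its K/M/B/T suffix convention:

# Convert a Yahoo-style abbreviated figure like "234.99B" to a plain float.
multipliers = {'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}
revenue = "234.99B"
value = float(revenue[:-1]) * multipliers[revenue[-1]]
print(value)  # 234990000000.0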