How to access a specific entry from a URL's JSON using requests - Python

I have seen a lot of posts on how to use requests.get(link).json(). I followed along and I am able to load the link, but when I try to access a specific item with an entry such as optionchain['ask'], I get an error message.
I use data from this Yahoo Finance link: https://query2.finance.yahoo.com/v7/finance/options/amd
and would like to store the different strike prices, asks and bids as separate variables. Could anyone please help me with that? Thank you in advance.

The JSON at the link you posted is nested several levels deep. To get the ask price of the underlying stock, you have to access
data['optionChain']['result'][0]['quote']['ask'], where data is the result of requests.get(link).json():
import requests
data = requests.get(r"https://query2.finance.yahoo.com/v7/finance/options/amd").json()
ask = data['optionChain']['result'][0]['quote']['ask']
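The question also asks for the strike, bid and ask of each option contract, not just the stock quote. A minimal sketch, assuming the layout this endpoint returned at the time, where per-contract data sits under data['optionChain']['result'][0]['options'][0]['calls'] (and 'puts'). That layout is an assumption and the endpoint may have changed since, so print data.keys() at each level to confirm before relying on it. The helper name option_quotes is hypothetical.

```python
def option_quotes(data, side="calls"):
    """Return (strike, bid, ask) tuples for one side of the first expiration.

    Assumes the (unverified) v7 options JSON layout:
    data['optionChain']['result'][0]['options'][0]['calls' | 'puts'].
    """
    chain = data["optionChain"]["result"][0]["options"][0]
    # .get() returns None for contracts that have no bid/ask listed
    return [(c.get("strike"), c.get("bid"), c.get("ask")) for c in chain[side]]


if __name__ == "__main__":
    import requests

    data = requests.get("https://query2.finance.yahoo.com/v7/finance/options/amd").json()
    quotes = option_quotes(data)
    # Split the tuples into one list per field
    strikes = [q[0] for q in quotes]
    bids = [q[1] for q in quotes]
    asks = [q[2] for q in quotes]
    print(strikes[:5], bids[:5], asks[:5])
```

Keeping the parsing in a function that takes the already-decoded dict makes it easy to test against a saved response without hitting the network.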

Related

Python - How to scrape a table from a website with a dropdown of available rows

I am trying to scrape the earnings calendar table from zacks.com; the URL is below.
https://www.zacks.com/stock/research/aapl/earnings-calendar
I want to scrape all the data from the table, but it has a dropdown to show 10, 25, 50 or 100 rows per page. Ideally I want all 100 rows, but when I select 100 from the dropdown, the URL doesn't change. My code is below.
Note that the website blocks requests based on the user agent, so I had to use ChromeDriver to imitate a human visitor. pd.read_html returns a list of all the tables, and d[4] is the earnings calendar with only 10 rows (which I want to change to 100).
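from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium 4 at the local ChromeDriver binary
driver = webdriver.Chrome(service=Service('../files/chromedriver96'))
symbol = 'AAPL'
url = 'https://www.zacks.com/stock/research/{}/earnings-calendar'.format(symbol)
driver.get(url)
# read_html returns a list of every <table> in the rendered page
d = pd.read_html(StringIO(driver.page_source))
d[4]  # the earnings calendar (10 rows by default)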
Could anyone guide me on this?
Thanks!
UPDATE: it looks like my last post was downvoted because it lacked clear articulation and evidence of past research. I am still new to posting questions on this site. I actually found several pages with the same issue, including this page, but their solutions didn't work for me, which is why I posted this as a new question.
UPDATE 12/05:
Thanks a lot for the advice. As suggested in the comments below, I finally got it working. Below is the code I used:
import time
from selenium.webdriver.common.by import By

dropdown = driver.find_element(By.CSS_SELECTOR, '#earnings_announcements_earnings_table_length')
time.sleep(1)  # give the page a moment to finish rendering
hundreds = dropdown.find_element(By.XPATH, ".//option[. = '100']")
hundreds.click()
Having taken a look, this is not going to be easy to scrape. Since the table is generated by JavaScript, I would say you have two options.
Option one:
Use Selenium to render the page so the JavaScript runs. This way you can simply use the id/class of the dropdown to interact with it.
You can then scrape the data by looking at the values in the table.
Option two:
This is the more challenging one. Use your browser's developer tools (Network tab) to look through the requests the page makes and find the ones whose responses contain the data you see on the page. Once you have identified them, you can request the data directly.
You may find that to get the data you need to read a key from the response to the original page request and then send that key as part of a second request. This approach lets you scrape the data without running a Selenium instance, so it is more efficient.
My personal suggestion is to go with option one, as computer resources are cheap and developer time is expensive.
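For option one, once Selenium has rendered the page and the dropdown is set to 100, driver.page_source holds the full table. A minimal sketch that pulls the rows out with only the standard library's html.parser (pd.read_html also works, but needs lxml or BeautifulSoup installed). The table id earnings_announcements_earnings_table is an assumption inferred from the dropdown's id ending in _length; check the real id in the browser's inspector. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every row in the <table> whose id matches table_id.

    Assumes the target table contains no nested tables.
    """

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.row = None   # cells of the row being read
        self.cell = None  # text fragments of the cell being read
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.row = []
        elif self.in_table and tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th") and self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "table":
            self.in_table = False


def table_rows(html, table_id):
    """Return the table's rows as lists of cell strings."""
    parser = TableRowParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

Usage after clicking the 100-row option: rows = table_rows(driver.page_source, 'earnings_announcements_earnings_table'), where rows[0] is the header row.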

How to read Values from a web page using python?

I am very new to Python, but I have started an internship that requires some Python work. I have been asked to create an application that retrieves data from a web page (at an IP address), compares those values to the correct values, and then prints whether it passed or not. Please check this diagram I have made so that you can understand it better.
So far I have only written some Python code to check whether the website/IP address is up, but I have no idea how to go further. Could you please help me with the next steps, maybe with some examples?
Here is a picture of the website. The values circled in red need to be compared with the auxiliary values. I hope this picture helps.
Alternatively, I could use http://192.168.100.2/globals.xml on this page to compare the values. Any help is much appreciated.
import requests
import urllib.request
import eventlet
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

# Patch the standard library once so eventlet's Timeout can interrupt blocking I/O
eventlet.monkey_patch()

with eventlet.Timeout(10):
    print(urllib.request.urlopen("http://192.168.100.5").getcode())
    print("Website is UP")

with eventlet.Timeout(10):
    print(urllib.request.urlopen("http://10.10.10.2").getcode())
    print("Website is UP")
You are off to a great start! Your next steps should be identifying unique traits about the elements that you want to scrape. Specifically, look for things like class or id names that are unique to only the data that you want to scrape.
You can also use tools like Selector Gadget (https://selectorgadget.com/) that can help automate the process. Unfortunately, since you are accessing local IP addresses, nobody here will be able to help you find these.
After you find the proper selectors, you can use BeautifulSoup to extract the data. I'd recommend looking at the find() and find_all() methods that BeautifulSoup provides.
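Since the question mentions that the same values are available at http://192.168.100.2/globals.xml, parsing that XML is usually simpler than scraping the HTML page. A minimal standard-library sketch: fetch the XML, look up each value by an ElementTree path, and report pass/fail. The structure of globals.xml isn't shown in the question, so the element paths (Auxiliary/Voltage, Auxiliary/Current) and expected values below are hypothetical placeholders.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_xml(url, timeout=10):
    """Download the XML document at url and return its root element."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return ET.fromstring(resp.read())


def compare_values(root, expected):
    """Compare element text in root against expected values.

    expected maps an ElementTree path (relative to root) to the value it
    should contain. Returns {path: (actual_text, passed)}.
    """
    results = {}
    for path, want in expected.items():
        actual = root.findtext(path)  # None if the element is missing
        passed = actual is not None and actual.strip() == str(want)
        results[path] = (actual, passed)
    return results


if __name__ == "__main__":
    # Hypothetical element paths and values; replace with those in your globals.xml.
    expected = {"Auxiliary/Voltage": "24", "Auxiliary/Current": "1.5"}
    root = fetch_xml("http://192.168.100.2/globals.xml")
    for path, (actual, passed) in compare_values(root, expected).items():
        print(f"{path}: {actual!r} -> {'PASS' if passed else 'FAIL'}")
```

Note that urlopen's own timeout argument covers the "is the site up" check without needing eventlet.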

How do you get the username of a roblox account using Python Requests?

Hope you are all doing well. This question is a bit more random than others I have asked. I am making a bot that extracts the usernames of the first 600,000,000 accounts on Roblox and loads them into a list.
Here is my problem: I am using requests to fetch the account page, but I can't figure out how to extract the username from it. I have tried using the headers and Inspect Element, but neither works. If anyone has suggestions on how to complete this, please help. Also, I am extraordinarily bad at network programming, so I may have made a noob mistake somewhere. The code is below.
import requests
users = []
for i in range(1, 600000001):
    r = requests.get("https://web.roblox.com/users/{i}/profile".format(i=i))
    print(r.status_code)
    if r.status_code == 404:
        users.append('Deleted')
        continue
    print(r.headers.get('username'))
Before working on the scraping, a few notes on the code:
First of all, on the 4th line, the named placeholder {i} with .format(i=i) works, but the positional form is simpler: just insert {}, so you can write:
r = requests.get("https://web.roblox.com/users/{}/profile".format(i))
You can keep the continue: it skips the username lookup for deleted (404) accounts.
But before anything else, make sure the link works: copy it, paste it into your browser, and replace {i} with a number.
If it works, you can go on with the code; if not, you have to find another link to reach the page you want.
Next, to get at the page's contents, use r.text (a string) or r.content (raw bytes).
Before going further with the code, print(r.text).
You will see a lot of output, but don't be afraid of it:
look for the value you are interested in and see what it is called. If the response is JSON, r.json() turns it into a dict and you can read that value with
`<name_of_variable> = r.json()['<name_of_the_value>']`
An HTML page such as the profile page is not a dict, so there you need a parser such as BeautifulSoup instead.
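Parsing the HTML profile page is fragile; a JSON endpoint is easier to read with r.json(). A minimal sketch, assuming Roblox's public users API at https://users.roblox.com/v1/users/{id} returns JSON with a "name" field and a 404 for missing accounts. That endpoint is an assumption on my part, so test it in a browser first as described above. Looping over 600,000,000 IDs will very likely run into rate limiting, so the example only fetches a small range. The helper name username_from_response is hypothetical.

```python
USERS_API = "https://users.roblox.com/v1/users/{}"  # assumed endpoint


def username_from_response(resp):
    """Return the username from a users-API response, or 'Deleted' for a 404.

    resp is a requests.Response (or anything with status_code,
    raise_for_status() and json()).
    """
    if resp.status_code == 404:
        return "Deleted"
    resp.raise_for_status()  # surface other errors such as rate limiting
    return resp.json()["name"]


if __name__ == "__main__":
    # Imported here so the helper above can be reused without requests installed
    import requests

    users = []
    with requests.Session() as session:  # reuse one connection for all requests
        for user_id in range(1, 11):  # small range while testing
            r = session.get(USERS_API.format(user_id), timeout=10)
            users.append(username_from_response(r))
    print(users)
```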

BeautifulSoup and Amazon.co.uk

I am trying to parse Amazon to compile a list of prices as part of a bigger statistics project. However, I am stumped. Could anyone review my code and tell me where I went wrong?
#!/usr/bin/python
# -*- coding: utf-8 -*-
import mechanize
from bs4 import BeautifulSoup
URL_00 = "http://www.amazon.co.uk/Call-Duty-Black-Ops-PS3/dp/B007WPF7FE/ref=sr_1_2?ie=UTF8&qid=1352117194&sr=8-2"
bro = mechanize.Browser()
resp = bro.open(URL_00)
html = resp.get_data()
soup_00 = BeautifulSoup(html, 'html.parser')
price = soup_00.find('b', {'class': 'priceLarge'})
print(price)  # this should return at the very least the text enclosed in a tag
According to the screenshot, what I wrote above should work, shouldn't it?
All I get in the printout is "[]". If I change the second-to-last line to this:
price = soup_00.find('b', {'class':'priceLarge'}).contents[0].string
or
price = soup_00.find('b', {'class':'priceLarge'}).text
I get a NoneType error.
I am quite confused as to why this is happening. Chrome reports the page encoding as UTF-8, which is what my script declares on line #2.
I have changed it to ISO (as per the page's inner HTML), but this makes zero difference, so I am sure encoding is not the issue here.
Also, I don't know if this is relevant at all, but my system locale on Linux being UTF-8 should not cause a problem, should it?
There's no need to do this, as Amazon provides an API:
https://affiliate-program.amazon.co.uk/gp/advertising/api/detail/main.html
The Product Advertising API helps you advertise Amazon products using product search and look up capability, product information and features such as Customer Reviews, Similar Products, Wish Lists and New and Used listings.
More detail here: Amazon API library for Python?
I'm using the API and it's much easier and more reliable than scraping the data from the web page, even with BeautifulSoup. You also get access to a list of prices for new, second-hand, etc., not just the "headline" price.
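Separately from the API recommendation, the NoneType error in the question comes from find() returning None when no matching tag exists in the HTML that was actually downloaded, which can differ from what Chrome renders (for example, if the site serves different markup to non-browser clients). A minimal sketch that checks for that before reading the text. The b tag with class priceLarge is taken from the question and may no longer exist on Amazon's pages; the function name extract_price is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the text of <b class="priceLarge">, or None if it isn't in html."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("b", class_="priceLarge")
    if tag is None:
        # The element isn't in the downloaded HTML; inspect html to see what was served
        return None
    return tag.get_text(strip=True)
```

If this returns None, saving the downloaded html to a file and searching it for the price is a quick way to see what the script actually received.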

A simple spider question

I am a newbie trying to achieve this simple task with Scrapy, with no luck so far. I would appreciate advice on how to do this with Scrapy or any other Python tool. Thank you.
I want to
start from a page that lists bios of attorneys whose last name starts with A: initial_url = www.example.com/Attorneys/List.aspx?LastName=A
From LastName=A to extract links to actual bios: /BioLinks/
visit each of the /BioLinks/ to extract the school info for each attorney.
I am able to extract the /BioLinks/ and School information but I am unable to go from the initial url to the bio pages.
If you think this is the wrong way to go about this, then, how would you achieve this goal?
Many thanks.
Not sure I fully understand what you're asking, but maybe you need to get the absolute URL to each bio and retrieve the source code for that page:
import urllib.request

# bio_url must be an absolute URL (see urllib.parse.urljoin)
bio_page = urllib.request.urlopen(bio_url).read()
Then use a regular expression or an HTML parser to extract the attorney's law school.
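The missing step in the question is going from the listing page to each bio page. A minimal standard-library sketch: collect the hrefs containing /BioLinks/ from the listing page, turn them into absolute URLs with urljoin, and fetch each one. The /BioLinks/ marker and the example.com start URL come from the question and are placeholders; decoding as UTF-8 is an assumption about the site.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags whose href contains a marker substring."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and self.marker in href:
                self.hrefs.append(href)


def bio_links(listing_html, base_url, marker="/BioLinks/"):
    """Return absolute URLs for every bio link on the listing page."""
    parser = LinkCollector(marker)
    parser.feed(listing_html)
    parser.close()
    # urljoin turns relative hrefs like /BioLinks/smith into absolute URLs
    return [urljoin(base_url, href) for href in parser.hrefs]


def fetch(url):
    """Download url and return its body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    start = "http://www.example.com/Attorneys/List.aspx?LastName=A"
    for url in bio_links(fetch(start), start):
        bio_html = fetch(url)
        # Parse bio_html here to pull out the school information
        print(url, len(bio_html))
```

In Scrapy, the equivalent is to yield response.follow(href, callback=self.parse_bio) for each bio link from the listing page's parse method (parse_bio being a hypothetical callback that extracts the school); response.follow resolves relative URLs for you.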
