How do I print specific values from a json request? - python

I am trying to request data from Yahoo Finance and then print specific pieces of the data.
My code so far is:
import requests
ticker = input("Enter Stock Ticker: ")
url = "https://query1.finance.yahoo.com/v8/finance/chart/{}?region=GB&lang=en-GB&includePrePost=false&interval=2m&range=1d&corsDomain=uk.finance.yahoo.com&.tsrc=finance".format(ticker)
r = requests.get(url)
data = r.json()
What I am unsure of is how to extract specific pieces of the 'data' variable. For example, I want to display the value paired with 'regularMarketPrice', which can be found in the response.
How can I do this?
Apologies if this isn't worded correctly.
Thanks

If you print data, you will see that it is a dictionary.
If you dig deep enough into the dictionary, you will see that regularMarketPrice can be retrieved as follows (for the first result):
print(data['chart']['result'][0]['meta']['regularMarketPrice'])
If there are multiple results, then you can use the following:
for result in data['chart']['result']:
    print(result['meta']['regularMarketPrice'])
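If you are not sure where a key lives in a nested response like this one, it helps to pretty-print the whole structure first. Here is a minimal sketch (assuming only the data variable from above); the .get() chain is just a defensive-lookup idiom, not something the Yahoo API requires:

import json

# Pretty-print the nested dictionary so the path to any key is visible.
print(json.dumps(data, indent=2))

# A defensive lookup that returns None instead of raising KeyError
# if any level of the structure is missing.
meta = (data.get('chart', {}).get('result') or [{}])[0].get('meta', {})
print(meta.get('regularMarketPrice'))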

Related

How can I iterate through a Python list with Selenium, performing a search for each item, scraping the data, and then searching for the next item?

I have two lists of baseball players that I would like to scrape data for from the website FanGraphs. I am trying to figure out how to have Selenium search for the first player in the list (which redirects to that player's profile), scrape the data I am interested in, and then search for the next player, until the loop has completed for both lists. I have written other scrapers with Selenium, but I haven't come across this situation where I need to perform a search, collect the data, then perform the next search, etc ...
Here is a smaller version of one of the lists:
batters = ['Freddie Freeman','Bryce Harper','Jesse Winker']
driver.get('https://www.fangraphs.com/')
search_box = driver.find_element_by_xpath('/html/body/form/div[3]/div[1]/div[2]/header/div[3]/nav/div[1]/div[2]/div/div/input')
search_box.click()
for batter in batters:
    search_box.send_keys(batter)
    search_box.send_keys(Keys.RETURN)
This will search all the names at once, obviously, so I'm trying to figure out how to code the logic of searching one name at a time, not performing the next search until I have collected the data for the previous one - any help is appreciated, cheers
With Selenium, you would just have to iterate through the names, "type" each one into the search bar, click/go to the link, scrape the stats, then repeat. You have it set up to do that; you just need to add the scraping part. So something like:
batters = ['Freddie Freeman','Bryce Harper','Jesse Winker']
driver.get('https://www.fangraphs.com/')
search_box = driver.find_element_by_xpath('/html/body/form/div[3]/div[1]/div[2]/header/div[3]/nav/div[1]/div[2]/div/div/input')
search_box.click()
for batter in batters:
    search_box.send_keys(batter)
    search_box.send_keys(Keys.RETURN)
    ## CODE THAT SCRAPES THE DATA ##
    ## CODE THAT STORES IT SOMEWHERE TO APPEND AFTER EACH ITERATION ##
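For completeness, here is a hedged sketch of what that loop might look like end to end. Reloading the home page each iteration is my assumption, not part of the original answer: after pressing RETURN the browser navigates away, so the old search_box element reference goes stale and has to be found again.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

batters = ['Freddie Freeman', 'Bryce Harper', 'Jesse Winker']
driver = webdriver.Chrome()
scraped = []
for batter in batters:
    # Re-open the home page and re-find the search box each time, because
    # the previous element reference is stale after navigating away.
    driver.get('https://www.fangraphs.com/')
    search_box = driver.find_element_by_xpath('/html/body/form/div[3]/div[1]/div[2]/header/div[3]/nav/div[1]/div[2]/div/div/input')
    search_box.send_keys(batter)
    search_box.send_keys(Keys.RETURN)
    # ... scrape the player page here and append the results to scraped ...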
However, they have an api which is a far better solution than Selenium. Why?
APIs are consistent. Parsing HTML with Selenium and/or BeautifulSoup relies on the HTML structure. If they ever change the layout of the website, your scraper may crash because tags that used to be there are gone, or new tags and attributes have been added. But the underlying data rendered into the HTML comes from the API in a nice JSON format, and that will rarely change unless they do a complete overhaul of the data structure.
It's far more efficient and quicker. There's no need to have Selenium open a browser, search, load/render the content, then scrape, then repeat. You get the response in one request.
You'll get far more data than you intended, which (imo) is a good thing. I'd rather have more data and "trim" off what I don't need. A lot of the time you'll see very interesting and useful data that you otherwise wouldn't have known was there.
So I'm not sure what you are after specifically, but this will get you going. You'll have to sift through statsData to figure out what you want; if you tell me what you are after, I can help get that into a nice table for you. Or, if you want to figure it out yourself, look up pandas and its .json_normalize() function. Parsing nested JSON can be tricky (but it's also fun ;-) )
Code:
import requests

# Get team IDs
def get_teamIds():
    team_id_dict = {}
    url = 'https://cdn.fangraphs.com/api/menu/menu-standings'
    jsonData = requests.get(url).json()
    for team in jsonData:
        team_id_dict[team['shortName']] = str(team['teamid'])
    return team_id_dict

# Get player IDs
def get_playerIds(team_id_dict):
    player_id_dict = {}
    for team, teamId in team_id_dict.items():
        url = 'https://cdn.fangraphs.com/api/depth-charts/roster?teamid={teamId}'.format(teamId=teamId)
        jsonData = requests.get(url).json()
        print(team)
        for player in jsonData:
            if 'oPlayerId' in player.keys():
                player_id_dict[player['player']] = [str(player['oPlayerId']), player['position']]
            else:
                player_id_dict[player['player']] = ['N/A', player['position']]
    return player_id_dict

team_id_dict = get_teamIds()
player_id_dict = get_playerIds(team_id_dict)

batters = ['Freddie Freeman','Bryce Harper','Jesse Winker']
for player in batters:
    playerId = player_id_dict[player][0]
    pos = player_id_dict[player][1]
    url = 'https://cdn.fangraphs.com/api/players/stats?playerid={playerId}&position={pos}'.format(playerId=playerId, pos=pos)
    statsData = requests.get(url).json()
Output: Here's just a look at what you get
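Since the answer points at pandas and .json_normalize(), here is a minimal sketch of that last step. The shape of statsData is an assumption on my part; inspect the JSON first to see which keys you actually want.

import pandas as pd

# Flatten the (assumed) nested response; nested keys become
# underscore-separated column names, one row per record.
df = pd.json_normalize(statsData, sep='_')
print(df.head())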

No JSON object could be decoded (Requests + Pandas)

I'm learning to work with the requests library and pandas but have been struggling to get past the starting point, even with a good number of examples online.
I am trying to extract NBA shot data from the URL below using a GET request, and then turn it into a DataFrame:
def extractData():
    Harden_data_url = "https://stats.nba.com/events/?flag=3&CFID=33&CFPARAMS=2017-18&PlayerID=201935&ContextMeasure=FGA&Season=2017-18&section=player&sct=hex"
    response = requests.get(Harden_data_url)
    data = response.json()
    shots = data['resultSets'][0]['rowSet']
    headers = data['resultSets'][0]['headers']
    df = pandas.DataFrame.from_records(shots, columns=headers)
However, I get this error starting on line 2, response = requests.get(Harden_data_url):
ValueError: No JSON object could be decoded
I imagine I am missing something basic, any debugging help is appreciated!
The problem is that you are using the wrong URL for fetching the data.
The URL you used was for the HTML page, which is in charge of the layout of the site. The data comes from a different URL, which serves it in JSON format.
The correct URL for the data you are looking for is this:
https://stats.nba.com/stats/shotchartdetail?CFID=33&CFPARAMS=2017-18&ContextMeasure=FGA&DateFrom=&DateTo=&EndPeriod=10&EndRange=28800&GameID=&GameSegment=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID=201935&PlayerPosition=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&StartPeriod=1&StartRange=0&TeamID=0&VsConference=&VsDivision=
If you open it in the browser, you will see only the raw JSON data, which is exactly what your code will receive, making it work properly.
This blog post explains the method for finding the data URL, and although the API has changed a little since the post was written, the method still works:
http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
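Putting the pieces together, here is a minimal sketch of the fixed function with the data URL substituted in. The browser-like User-Agent header is my own precaution, not part of the original answer; stats.nba.com has been known to hang or reject requests without one.

import requests
import pandas

def extractData():
    # The corrected endpoint, which returns JSON directly.
    url = ("https://stats.nba.com/stats/shotchartdetail?CFID=33&CFPARAMS=2017-18"
           "&ContextMeasure=FGA&PlayerID=201935&Season=2017-18"
           "&SeasonType=Regular+Season&LeagueID=00&TeamID=0&Outcome=&Location="
           "&Month=0&SeasonSegment=&DateFrom=&DateTo=&OpponentTeamID=0"
           "&VsConference=&VsDivision=&PlayerPosition=&RookieYear=&GameSegment="
           "&Period=0&LastNGames=0&GameID=&PORound=0&StartPeriod=1&EndPeriod=10"
           "&StartRange=0&EndRange=28800&RangeType=0&GroupQuantity=5&OnOff=")
    headers = {"User-Agent": "Mozilla/5.0"}  # precautionary, see note above
    response = requests.get(url, headers=headers)
    data = response.json()
    shots = data["resultSets"][0]["rowSet"]
    columns = data["resultSets"][0]["headers"]
    return pandas.DataFrame.from_records(shots, columns=columns)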

Access the next page of list results in Reddit API

I'm trying to play around with the Reddit API, and I understand most of it, but I can't seem to figure out how to access the next page of results (since each page is 25 entries).
Here is the code I'm using:
import requests
import json
r = requests.get(r'https://www.reddit.com/r/Petscop/top.json?sort=top&show=all&t=all')
listing = r.json()
after = listing['data']['after']
data = listing['data']['children']
for entry in data:
    post = entry['data']
    print post['score']
query = 'https://www.reddit.com/r/Petscop/top.json?after='+after
r = requests.get(query)
listing = r.json()
data = listing['data']['children']
for entry in data:
    post = entry['data']
    print post['score']
So I extract the after ID as after and pass it into the next request. However, after the first 25 entries (the first page), the code returns just an empty list ([]). I tried changing the second query to:
r = requests.get(r'https://www.reddit.com/r/Petscop/top.json?after='+after)
And the result is the same. I also tried replacing "after" with "before", but the result was again the same.
Is there a better way to get the next page of results?
Also, what the heck is the r in the get argument? I copied it from an example, but I have no idea what it actually means. I ask because I don't know whether it is necessary to access the next page, and if it is, I don't know how to modify the query dynamically by adding after to it.
Try:
query = 'https://www.reddit.com/r/Petscop/top.json?sort=top&show=all&t=all&after='+after
or better:
query = 'https://www.reddit.com/r/Petscop/top.json?sort=top&show=all&t=all&after={}'.format(after)
As for the r in front of the strings: it marks a raw string literal, in which backslashes are not treated as escape characters. These URLs contain no backslashes, so you can omit it.
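A minimal pagination sketch, assuming the usual Reddit listing behaviour: keep passing the last after cursor back in until the API stops returning one. The User-Agent header is my addition; Reddit throttles clients that use the default one.

import requests

url = 'https://www.reddit.com/r/Petscop/top.json'
params = {'sort': 'top', 'show': 'all', 't': 'all'}
headers = {'User-Agent': 'my-script/0.1'}  # assumed name, use your own
after = None
while True:
    if after:
        params['after'] = after
    listing = requests.get(url, params=params, headers=headers).json()
    for entry in listing['data']['children']:
        print(entry['data']['score'])
    after = listing['data']['after']
    if not after:  # no cursor means there are no more pages
        break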

HTML data grabbing in python?

I'm fairly new to programming and I am trying to take data from a webpage and use it in my Python code. Basically, I'm trying to get the price of an item in a game by having Python grab the data whenever I run my code, if that makes sense. Here's what I'm struggling with in particular:
The HTML page I'm using is for runescape, namely
http://services.runescape.com/m=itemdb_oldschool/api/catalogue/detail.json?item=4151
This page provides me with a bunch of dictionaries from which I am trying to extract the price of the item in question. All I really want to do is get all of this data into Python so I can then manipulate it. My current code is:
import urllib2
response =urllib2.urlopen('http://services.runescape.com/m=itemdb_oldschool/api/catalogue/detail.json?item=4151')
print response
And it outputs:
<addinfourl at 49631760 whose fp = <socket._fileobject object at 0x02F4B2F0>>
whereas I just want it to display exactly what is on the URL in question.
Any ideas? I'm sorry if my formatting is terrible. And if it sounds like I have no idea what I'm talking about, it's because I don't.
If the webpage returns JSON-encoded data, then do something like this:
import urllib2
import json
response = urllib2.urlopen("http://services.runescape.com/m=itemdb_oldschool/api/catalogue/detail.json?item=4151")
data = json.load(response)
print(data)
Extract the relevant keys in the data variable to get the values you want.
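To pull out just the price, the nesting below is an assumption based on this API's usual response shape; print the whole data dictionary first to confirm it:

import urllib2
import json

response = urllib2.urlopen("http://services.runescape.com/m=itemdb_oldschool/api/catalogue/detail.json?item=4151")
data = json.load(response)
# Assumed path to the price; note the value may come back as a number
# or as an abbreviated string such as "59.2k".
print(data["item"]["current"]["price"])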

BeautifulSoup to access available bikes in DC bikeshare

I'm new to programming and Python, and I am trying to access the number of available bikes at a given station in the DC bikeshare program. I believe that the best way to do that is with BeautifulSoup. The good news is that the data is available in what appears to be a clean format here: https://www.capitalbikeshare.com/data/stations/bikeStations.xml
Here's an example of a station:
<station>
<id>1</id>
<name>15th & S Eads St</name>
<terminalName>31000</terminalName>
<lastCommWithServer>1460217337648</lastCommWithServer>
<lat>38.858662</lat>
<long>-77.053199</long>
<installed>true</installed>
<locked>false</locked>
<installDate>0</installDate>
<removalDate/>
<temporary>false</temporary>
<public>true</public>
<nbBikes>7</nbBikes>
<nbEmptyDocks>8</nbEmptyDocks>
<latestUpdateTime>1460192501598</latestUpdateTime>
</station>
I'm looking for the <nbBikes> value. I had what I thought would be the start of a Python script that would show me the value for the first 5 stations (I'll tackle picking the station I want once I get this under control), but it doesn't return any values. Here's the script:
# bikeShareParse.py - parses the capital bikeshare info page
import bs4, requests
url = "https://www.capitalbikeshare.com/data/stations/bikeStations.xml"
res = requests.get(url)
res.raise_for_status()
#create the soup element from the file
soup = bs4.BeautifulSoup("res.text", "lxml")
# defines the part of the page we are looking for
nbikes = soup.select('#text')
#limits number of results for testing
numOpen = 5
for i in range(numOpen):
    print nbikes
I believe that my problem (besides not understanding how to format code correctly in a Stack Overflow question) is that the value for nbikes = soup.select('#text') is incorrect. However, I can't seem to substitute anything for '#text' that gets any values, let alone the ones I want.
Am I approaching this the right way? If so, what am I missing?
thanks
This script builds a list of (station_ID, bikes_remaining) tuples. It is modified from the beginning of this: http://www.plotsofdots.com/archives/68
# from http://www.plotsofdots.com/archives/68
import xml.etree.ElementTree as ET
import urllib2

# we fetch the data with urllib2 and parse it as XML
site = 'https://www.capitalbikeshare.com/data/stations/bikeStations.xml'
htm = urllib2.urlopen(site)
doc = ET.parse(htm)

# we get the root tag
root = doc.getroot()
root.tag

# we define empty lists for the station IDs and available-bike counts
sID = []
embikes = []

# we now use a for loop to extract the information we are interested in
for country in root.findall('station'):
    sID.append(country.find('id').text)
    embikes.append(int(country.find('nbBikes').text))

# this just tests that the process above works, can be commented out
#print embikes
#print sID

# use zip to create tuples and then parse them into a dataframe
prov = zip(sID, embikes)
print prov[0]
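And since the question asked about BeautifulSoup specifically, here is a minimal sketch of the same idea with bs4, assuming the feed keeps the structure shown above. Two fixes compared with the question's script: res.text is passed as a variable rather than the string "res.text", and the stations are selected by tag name instead of '#text'.

import bs4
import requests

url = "https://www.capitalbikeshare.com/data/stations/bikeStations.xml"
res = requests.get(url)
res.raise_for_status()
# Parse as XML (needs lxml installed) so tag names keep their case.
soup = bs4.BeautifulSoup(res.text, "xml")
for station in soup.find_all("station")[:5]:
    print("{}: {}".format(station.find("name").text, station.find("nbBikes").text))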