So I'm trying to scrape a table from this API:
https://api.pbpstats.com/get-wowy-combination-stats/nba?TeamId=1610612743&Season=2018-19&SeasonType=Playoffs&PlayerIds=203999,1627750,200794
But I'm having trouble getting the headers as a nice list like ['Players On', 'Players Off', 'Minutes', 'NetRtg', 'OffRtg', 'DefRtg'] for my eventual dataframe because the headers are their own class and not part of the other class results.
My current code looks like:
import requests
url = 'https://api.pbpstats.com/get-wowy-combination-stats/nba?TeamId=1610612743&Season=2018-19&SeasonType=Playoffs&PlayerIds=203999,1627750,200794'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
# grab table
table = response.json()['results'][0]
#grab headers
headers = response.json()['headers']
And when I print(headers) I get [{'field': 'On', 'label': 'Players On'}, {'field': 'Off', 'label': 'Players Off'}, {'field': 'Minutes', 'label': 'Minutes', 'type': 'number'}, {'field': 'NetRtg', 'label': 'NetRtg', 'type': 'decimal'}, {'field': 'OffRtg', 'label': 'OffRtg', 'type': 'decimal'}, {'field': 'DefRtg', 'label': 'DefRtg', 'type': 'decimal'}].
Is there a good way to get these into a list like ['Players On', 'Players Off', 'Minutes', 'NetRtg', 'OffRtg', 'DefRtg'] so I can then create a dataframe?
Thank you!
Just extract all the values for each key from the headers list and build a dictionary from them:
import requests
url = 'https://api.pbpstats.com/get-wowy-combination-stats/nba?TeamId=1610612743&Season=2018-19&SeasonType=Playoffs&PlayerIds=203999,1627750,200794'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
#grab table
table = response.json()['results'][0]
#grab headers
headers = response.json()['headers']
#Extracting all values with every key into a dictionary
results = {}
for header in headers:
    for k, v in header.items():
        results.setdefault(k, [])
        results[k].append(v)
#Remove duplicate elements from the list of values
results = {k:list(set(v)) for k,v in results.items()}
print(results)
The output will look like
{
'field': ['Minutes', 'Off', 'On', 'DefRtg', 'NetRtg', 'OffRtg'],
'label': ['Minutes', 'DefRtg', 'Players On', 'NetRtg', 'OffRtg', 'Players Off'],
'type': ['decimal', 'number']
}
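Note that set() does not preserve order, so the lists above can come back in a different order from run to run. If the order of the headers matters (it usually does for dataframe columns), a variant that removes duplicates while keeping first-seen order might look like the sketch below. The headers literal is copied from the printed output in the question so the snippet runs without a network call.

```python
# Sample copied from the question's print(headers) output.
headers = [
    {'field': 'On', 'label': 'Players On'},
    {'field': 'Off', 'label': 'Players Off'},
    {'field': 'Minutes', 'label': 'Minutes', 'type': 'number'},
    {'field': 'NetRtg', 'label': 'NetRtg', 'type': 'decimal'},
    {'field': 'OffRtg', 'label': 'OffRtg', 'type': 'decimal'},
    {'field': 'DefRtg', 'label': 'DefRtg', 'type': 'decimal'},
]

# Collect every value under its key, in the order the headers arrive.
results = {}
for header in headers:
    for key, value in header.items():
        results.setdefault(key, []).append(value)

# dict.fromkeys drops duplicates but keeps first-seen order
# (dicts preserve insertion order in Python 3.7+).
results = {key: list(dict.fromkeys(values)) for key, values in results.items()}

print(results['label'])
print(results['type'])
```

With this variant results['label'] is already in column order and can be used directly.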
A list comprehension over the headers should do the trick:
import requests
url = 'https://api.pbpstats.com/get-wowy-combination-stats/nba?TeamId=1610612743&Season=2018-19&SeasonType=Playoffs&PlayerIds=203999,1627750,200794'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
# grab table
table = response.json()['results'][0]
#grab headers
headers = response.json()['headers']
headers = [each['label'] for each in headers ]
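For the dataframe step, it may also help to keep a mapping from each 'field' to its 'label', so that rows keyed by field can be renamed. The sketch below assumes the result rows are dicts keyed by the 'field' names; that shape is a guess, since the question does not show the rows, and the row values are made-up placeholders.

```python
# Sample copied from the question's print(headers) output.
headers = [
    {'field': 'On', 'label': 'Players On'},
    {'field': 'Off', 'label': 'Players Off'},
    {'field': 'Minutes', 'label': 'Minutes', 'type': 'number'},
    {'field': 'NetRtg', 'label': 'NetRtg', 'type': 'decimal'},
    {'field': 'OffRtg', 'label': 'OffRtg', 'type': 'decimal'},
    {'field': 'DefRtg', 'label': 'DefRtg', 'type': 'decimal'},
]

labels = [h['label'] for h in headers]
field_to_label = {h['field']: h['label'] for h in headers}

# Hypothetical row with placeholder values; real rows would come from
# response.json()['results'] and are ASSUMED to be keyed by 'field'.
row = {'On': 'player A', 'Off': 'player B', 'Minutes': 0,
       'NetRtg': 0.0, 'OffRtg': 0.0, 'DefRtg': 0.0}

# Rename the keys of the row from field names to display labels.
renamed = {field_to_label[key]: value for key, value in row.items()}
print(list(renamed))
```

If the rows really are dicts like this, pandas.DataFrame(rows).rename(columns=field_to_label) should apply the same renaming to a whole table.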
I want to fetch all the list items under the ul tag with id="demoFour" from https://www.parliament.lk/en/members-of-parliament/directory-of-members/?cletter=A.
Below is the code:
print(soup.find('ul',id='demoFour'))
But the output which is being displayed is
<ul id="demoFour"></ul>
The content is served dynamically from the data of an additional XHR request, so you have to call that endpoint instead. You can find it on the XHR tab of your browser's DevTools.
Example
Instead of appending only the obvious fields to a list of dicts, you could also request each detail page while iterating (see the commented-out line).
from bs4 import BeautifulSoup
import requests, string
data = []
for letter in list(string.ascii_uppercase):
    result = requests.post(f'https://www.parliament.lk/members-of-parliament/directory-of-members/index2.php?option=com_members&task=all&tmpl=component&letter={letter}&wordfilter=&search_district=')
    for e in result.json():
        #result = requests.get(f"https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/{e['mem_intranet_id']}")
        data.append({
            'url':f"https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/{e['mem_intranet_id']}",
            'id':e['mem_intranet_id'],
            'name':e['member_sname_eng']
        })
data
Output
[{'url': 'https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/3266',
'id': '3266',
'name': 'A. Aravindh Kumar'},
{'url': 'https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/50',
'id': '50',
'name': 'Abdul Haleem'},
{'url': 'https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/3325',
'id': '3325',
'name': 'Ajith Rajapakse'},
{'url': 'https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/3296',
'id': '3296',
'name': 'Akila Ellawala'},
{'url': 'https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/3355',
'id': '3355',
'name': 'Ali Sabri Raheem'},...]
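To check the record-building step without hitting the site, the conversion can be pulled into a small function and run on a sample payload. This is only a sketch: to_records is a hypothetical helper name, and the sample is reconstructed from the output above, so the real JSON likely carries more keys than the two used here.

```python
BASE = "https://www.parliament.lk/en/members-of-parliament/directory-of-members/viewMember/"

def to_records(members):
    """Turn one letter's JSON list into url/id/name dicts (hypothetical helper)."""
    return [
        {
            'url': f"{BASE}{m['mem_intranet_id']}",
            'id': m['mem_intranet_id'],
            'name': m['member_sname_eng'],
        }
        for m in members
    ]

# Sample reconstructed from the output above; only these two keys are assumed.
sample = [
    {'mem_intranet_id': '3266', 'member_sname_eng': 'A. Aravindh Kumar'},
    {'mem_intranet_id': '50', 'member_sname_eng': 'Abdul Haleem'},
]

records = to_records(sample)
print(records[1]['url'])
```

In the loop above, data.extend(to_records(result.json())) would then replace the inner for-loop.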
I can't get prices for multiple symbols; the request fails with the error {'code': -1101, 'msg': "Duplicate values for parameter 'symbols'."}. I'm doing what the documentation on GitHub indicates.
This is my code:
import requests
symbols = ["KEYUSDT","BNBUSDT","ADAUSDT"]
url = 'https://api.binance.com/api/v3/ticker/price'
params = {'symbols': symbols}
ticker = requests.get(url, params=params).json()
print(ticker)
What am I doing wrong?
You have to specify the list as a string:
import requests
symbols = '["KEYUSDT","BNBUSDT","ADAUSDT"]'
url = 'https://api.binance.com/api/v3/ticker/price'
params = {'symbols': symbols}
ticker = requests.get(url, params=params).json()
print(ticker)
Result:
[{'symbol': 'BNBUSDT', 'price': '317.50000000'}, {'symbol': 'ADAUSDT', 'price': '0.56690000'}, {'symbol': 'KEYUSDT', 'price': '0.00504000'}]
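The error makes sense once you look at the query string: when a list is passed in params, requests repeats the parameter once per element, which is roughly what urlencode(..., doseq=True) does. If you would rather keep symbols as a Python list, a sketch of building the string form with json.dumps follows; the compact separators are used only to reproduce the exact string shown above, and whether the API tolerates spaces after the commas is not something this snippet checks.

```python
import json
from urllib.parse import urlencode, parse_qs

symbols = ["KEYUSDT", "BNBUSDT", "ADAUSDT"]

# A list value becomes one 'symbols=' pair per element (the duplicate-values case).
repeated = urlencode({'symbols': symbols}, doseq=True)
print(repeated)  # symbols=KEYUSDT&symbols=BNBUSDT&symbols=ADAUSDT

# Serialize the list to a single JSON string without spaces.
symbols_param = json.dumps(symbols, separators=(',', ':'))
print(symbols_param)  # ["KEYUSDT","BNBUSDT","ADAUSDT"]

# The string form is sent as ONE parameter.
single = urlencode({'symbols': symbols_param})
print(parse_qs(single))
```

params = {'symbols': symbols_param} can then be passed to requests.get exactly as in the answer above.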
I am trying to use the PokeAPI to extract all pokemon names for a personal project to help build API comfort. I have been having issues with the Params specifically. Can someone please provide support or resources to simplify data grabbing with JSON. Here is the code I have written so far, which returns the entire data set.
import json
from unicodedata import name
import requests
from pprint import PrettyPrinter
pp = PrettyPrinter()
url = ("https://pokeapi.co/api/v2/ability/1/")
params = {
    name : "garbodor"
}
def main():
    r= requests.get(url)
    status = r.status_code
    if status != 200:
        quit()
    else:
        get_pokedex(status)
def get_pokedex(x):
    print("status code: ", + x) # redundant check for status code before the program begins.
    response = requests.get(url, params = params).json()
    pp.pprint(response)
main()
Website link: https://pokeapi.co/docs/v2#pokemon-section working specifically with the pokemon group.
I have no idea what values you want but response is a dictionary with lists and you can use keys and indexes (with for-loops) to select elements from response - ie. response["names"][0]["name"]
Minimal working example
Name or ID has to be added at the end of URL.
import requests
import pprint as pp
name_or_id = "stench" # name
#name_or_id = 1 # id
url = "https://pokeapi.co/api/v2/ability/{}/".format(name_or_id)
response = requests.get(url)
if response.status_code != 200:
    print(response.text)
else:
    data = response.json()
    #pp.pprint(data)
    print('\n--- data.keys() ---\n')
    print(data.keys())
    print('\n--- data["name"] ---\n')
    print(data['name'])
    print('\n--- data["names"] ---\n')
    pp.pprint(data["names"])
    print('\n--- data["names"][0]["name"] ---\n')
    print(data['names'][0]['name'])
    print('\n--- language : name ---\n')
    names = []
    for item in data["names"]:
        print(item['language']['name'],":", item["name"])
        names.append( item["name"] )
    print('\n--- after for-loop ---\n')
    print(names)
Result:
--- data.keys() ---
dict_keys(['effect_changes', 'effect_entries', 'flavor_text_entries', 'generation', 'id', 'is_main_series', 'name', 'names', 'pokemon'])
--- data["name"] ---
stench
--- data["names"] ---
[{'language': {'name': 'ja-Hrkt',
'url': 'https://pokeapi.co/api/v2/language/1/'},
'name': 'あくしゅう'},
{'language': {'name': 'ko', 'url': 'https://pokeapi.co/api/v2/language/3/'},
'name': '악취'},
{'language': {'name': 'zh-Hant',
'url': 'https://pokeapi.co/api/v2/language/4/'},
'name': '惡臭'},
{'language': {'name': 'fr', 'url': 'https://pokeapi.co/api/v2/language/5/'},
'name': 'Puanteur'},
{'language': {'name': 'de', 'url': 'https://pokeapi.co/api/v2/language/6/'},
'name': 'Duftnote'},
{'language': {'name': 'es', 'url': 'https://pokeapi.co/api/v2/language/7/'},
'name': 'Hedor'},
{'language': {'name': 'it', 'url': 'https://pokeapi.co/api/v2/language/8/'},
'name': 'Tanfo'},
{'language': {'name': 'en', 'url': 'https://pokeapi.co/api/v2/language/9/'},
'name': 'Stench'},
{'language': {'name': 'ja', 'url': 'https://pokeapi.co/api/v2/language/11/'},
'name': 'あくしゅう'},
{'language': {'name': 'zh-Hans',
'url': 'https://pokeapi.co/api/v2/language/12/'},
'name': '恶臭'}]
--- data["names"][0]["name"] ---
あくしゅう
--- language : name ---
ja-Hrkt : あくしゅう
ko : 악취
zh-Hant : 惡臭
fr : Puanteur
de : Duftnote
es : Hedor
it : Tanfo
en : Stench
ja : あくしゅう
zh-Hans : 恶臭
--- after for-loop ---
['あくしゅう', '악취', '惡臭', 'Puanteur', 'Duftnote', 'Hedor', 'Tanfo', 'Stench', 'あくしゅう', '恶臭']
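If only one language is wanted, the same structure can be filtered instead of collected. A minimal sketch, using a trimmed copy of the "names" list printed above; name_in is a hypothetical helper, not part of any library.

```python
# Trimmed copy of the data["names"] structure shown in the result above.
data = {
    'name': 'stench',
    'names': [
        {'language': {'name': 'fr', 'url': 'https://pokeapi.co/api/v2/language/5/'},
         'name': 'Puanteur'},
        {'language': {'name': 'de', 'url': 'https://pokeapi.co/api/v2/language/6/'},
         'name': 'Duftnote'},
        {'language': {'name': 'en', 'url': 'https://pokeapi.co/api/v2/language/9/'},
         'name': 'Stench'},
    ],
}

def name_in(data, lang):
    """Return the localized name for a language code, or None if it is absent."""
    for item in data['names']:
        if item['language']['name'] == lang:
            return item['name']
    return None

print(name_in(data, 'en'))  # Stench
print(name_in(data, 'xx'))  # None
```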
EDIT:
Another example with other URL and with parameters limit and offset.
I use for-loop to run with different offset (0, 100, 200, etc.)
import requests
import pprint as pp
url = "https://pokeapi.co/api/v2/pokemon/"
params = {'limit': 100}
for offset in range(0, 1000, 100):
    params['offset'] = offset # add new value to dict with `limit`
    response = requests.get(url, params=params)
    if response.status_code != 200:
        print(response.text)
    else:
        data = response.json()
        #pp.pprint(data)
        for item in data['results']:
            print(item['name'])
Result (first 100 items):
bulbasaur
ivysaur
venusaur
charmander
charmeleon
charizard
squirtle
wartortle
blastoise
caterpie
metapod
butterfree
weedle
kakuna
beedrill
pidgey
pidgeotto
pidgeot
rattata
raticate
spearow
fearow
ekans
arbok
pikachu
raichu
sandshrew
sandslash
nidoran-f
nidorina
nidoqueen
nidoran-m
nidorino
nidoking
clefairy
clefable
vulpix
ninetales
jigglypuff
wigglytuff
zubat
golbat
oddish
gloom
vileplume
paras
parasect
venonat
venomoth
diglett
dugtrio
meowth
persian
psyduck
golduck
mankey
primeape
growlithe
arcanine
poliwag
poliwhirl
poliwrath
abra
kadabra
alakazam
machop
machoke
machamp
bellsprout
weepinbell
victreebel
tentacool
tentacruel
geodude
graveler
golem
ponyta
rapidash
slowpoke
slowbro
magnemite
magneton
farfetchd
doduo
dodrio
seel
dewgong
grimer
muk
shellder
cloyster
gastly
haunter
gengar
onix
drowzee
hypno
krabby
kingler
voltorb
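The loop above hard-codes the offsets. As far as I know, PokeAPI list responses also include a 'next' URL that is null on the last page; that field is not shown in the output above, so treat it as an assumption. Under that assumption, a sketch that follows 'next' until it runs out could look like this. The fetch argument is a hypothetical callable, so the logic can be tried on fake pages without a network call.

```python
def iter_names(fetch, first_url):
    """Yield every item name, following the ASSUMED 'next' links page by page.

    `fetch` is any callable that takes a URL and returns the decoded JSON dict.
    """
    url = first_url
    while url:
        page = fetch(url)
        for item in page['results']:
            yield item['name']
        url = page.get('next')  # None (or missing) ends the loop

# Fake pages standing in for real responses, keyed by made-up URLs.
fake_pages = {
    'page1': {'results': [{'name': 'bulbasaur'}, {'name': 'ivysaur'}], 'next': 'page2'},
    'page2': {'results': [{'name': 'venusaur'}], 'next': None},
}

names = list(iter_names(fake_pages.__getitem__, 'page1'))
print(names)  # ['bulbasaur', 'ivysaur', 'venusaur']
```

Against the real API, fetch could be something like lambda u: requests.get(u).json(), starting from "https://pokeapi.co/api/v2/pokemon/?limit=100".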
How can I output the value of 'Number#en' from my response? I am struggling to understand the nested structure. Thanks
Response from my api
{
'count': 1, 'total': 1,
'data': [
{'id': '6a3d7026-43f3-67zt-9211-99dfc6fee82e',
'name': 'test',
'properties': {'Description#en': 'test', 'Number#en': '20934120'}}],
What I have tried in order to print the value:
response = requests.get(url, headers=headers, data=payload)
data_text = json.loads(response.text)
print(data_text[data]['properties.Number#en'])
data_text['data'] is a list of dictionaries, so to access Number#en you have to index into the list first and then use each key separately:
data_text['data'][0]['properties']['Number#en']
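Since 'data' is a list, it can hold more than one item, and a loop (or comprehension) avoids hard-coding index 0. A small sketch using the response from the question; the response there is cut off after the 'data' list, so the literal below closes the brackets only to make the snippet runnable.

```python
# Response copied from the question; closing brace added so it parses.
data_text = {
    'count': 1, 'total': 1,
    'data': [
        {'id': '6a3d7026-43f3-67zt-9211-99dfc6fee82e',
         'name': 'test',
         'properties': {'Description#en': 'test', 'Number#en': '20934120'}}],
}

# 'Number#en' is a single key; the '.' and '#' have no special meaning in a dict.
first = data_text['data'][0]['properties']['Number#en']

# One value per item; .get returns None if an item lacks the key.
numbers = [item['properties'].get('Number#en') for item in data_text['data']]

print(first)    # 20934120
print(numbers)  # ['20934120']
```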
Here is the page I wanna scrape: https://www.racing.com/form/2018-11-06/flemington/race/7/results
The race results are not in the page source.
I tried in the Chrome DevTools, but didn't find the response data that contains the results.
Here is some code in the source code:
ng-controller="formTabResultsController"
ng-init="meet=5149117;race=7;init();" ajax-loader="result"
I think the results are returned and saved in a "result" structure, because there are many references like "result.PrizeMoney" and "result.Record".
So how can I get the data of the result with Python? Thanks.
This site uses a GraphQL API at https://graphql.rmdprod.racing.com. An API key needs to be sent in the headers; it can be retrieved from https://www.racing.com/layouts/app.aspx.
An example with curl, sed & jq :
api_key=$(curl -s "https://www.racing.com/layouts/app.aspx" | \
sed -nE 's/.*headerAPIKey:\s*"(.*)"/\1/p')
curl -s "https://www.racing.com/layouts/app.aspx"
query='query GetMeeting($meetCode: ID!) {
getMeeting(id: $meetCode) {
id
trackName
date
railPosition
races {
id
raceNumber
status
tempo
formRaceEntries {
id
raceEntryNumber
horseName
silkUrl
jockeyName
trainerName
scratched
speedValue
barrierNumber
horse {
name
fullName
colour
}
}
}
}
}'
variables='{ "meetCode": 5149117 }'
curl -G 'https://graphql.rmdprod.racing.com' \
--data-urlencode "query=$query" \
--data-urlencode "variables=$variables" \
-H "X-Api-Key: $api_key" | jq '.'
Using python with python-requests :
import requests
import re
import json
r = requests.get("https://www.racing.com/layouts/app.aspx")
api_key = re.search(r'.*headerAPIKey:\s*"(.*)"', r.text).group(1)
query= """query GetMeeting($meetCode: ID!) {
getMeeting(id: $meetCode) {
id
trackName
date
railPosition
races {
id
raceNumber
status
tempo
formRaceEntries {
id
raceEntryNumber
horseName
silkUrl
jockeyName
trainerName
scratched
speedValue
barrierNumber
horse {
name
fullName
colour
}
}
}
}
}"""
payload = {
"variables": json.dumps({
"meetCode": 5149117
}),
"query": query
}
r = requests.get(
'https://graphql.rmdprod.racing.com',
params = payload,
headers = {
"X-Api-Key": api_key
})
print(r.json())
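One caveat about the key extraction: the pattern captures with a greedy (.*), which runs to the last double quote on the line. If the key ever shares a line with other quoted values, a character class that stops at the first closing quote is safer. The sketch below shows the difference on a made-up line; sample_page is hypothetical and the real layout of app.aspx may differ.

```python
import re

# Hypothetical one-line snippet; the real page layout is not known here.
sample_page = 'var config = { headerAPIKey: "example-key-123", other: "x" };'

# Greedy capture: runs up to the LAST double quote on the line.
greedy = re.search(r'headerAPIKey:\s*"(.*)"', sample_page).group(1)

# Character-class capture: stops at the first closing quote.
strict = re.search(r'headerAPIKey:\s*"([^"]+)"', sample_page).group(1)

print(greedy)  # example-key-123", other: "x
print(strict)  # example-key-123
```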
Chrome DevTools shows a call to their API:
import re
import requests
import json
resp = requests.get('https://api.racing.com/v1/en-au/race/results/5149117/7/?callback=angular.callbacks._b')
# Returned JSONP so we remove the function call: keep only what is between ()
m = re.search(r'\((.*)\)', resp.text, flags=re.S)
data = json.loads(m.group(1))
print(data.keys())
# dict_keys(['race', 'resultCollection', 'exoticCollection'])
print(data['resultCollection'][0])
# {'position': {'at400m': 12, 'at800m': 20, 'finish': 1, 'positionAbbreviation': '1st', 'positionDescription': '', 'positionType': 'Finished'}, 'scratched': False, 'winningTime': 20117, 'margin': None, 'raceEntryNumber': 23, 'number': 23, 'barrierNumber': 19, 'isDeadHeat': False, 'weight': '51kg', 'rating': {'handicapRating': 109, 'ratingProgression': 0}, 'prizeMoney': 4000000.0, 'horse': {'fullName': 'Cross Counter (GB)', 'code': 5256710, 'urlSegment': 'cross-counter-gb', 'silkUrl': '//s3-ap-southeast-2.amazonaws.com/racevic.silks/bb/12621.png', 'age': 5, 'sex': 'Gelding', 'colour': 'Bay', 'sire': 'Teofilo (IRE)', 'dam': 'Waitress (USA)', 'totalPrizeMoney': '$4,576,227', 'averagePrize': '$508,470'}, 'trainer': {'fullName': None, 'shortName': 'C.Appleby', 'code': 20658431, 'urlSegment': 'charlie-appleby-gb'}, 'jockey': {'fullName': 'K.McEvoy', 'shortName': 'K.McEvoy', 'code': 25602, 'urlSegment': 'kerrin-mcevoy', 'allowedClaim': 0.0, 'apprentice': False}, 'gear': {'hasChanges': True, 'gearCollection': [{'changeDate': '2018-11-02T00:00:00', 'currentChange': True, 'description': 'Bandages (Front): On', 'name': 'Bandages (Front)', 'status': 'On', 'comments': None}, {'changeDate': '2018-08-01T00:00:00', 'currentChange': False, 'description': 'Ear Muffs (Pre-Race Only)', 'name': 'Ear Muffs (Pre-Race Only)', 'status': 'On', 'comments': None}, {'changeDate': '2018-08-01T00:00:00', 'currentChange': False, 'description': 'Lugging Bit', 'name': 'Lugging Bit', 'status': 'On', 'comments': 'Rubber ring bit'}, {'changeDate': '2018-08-01T00:00:00', 'currentChange': False, 'description': 'Cross-over Nose Band', 'name': 'Cross-over Nose Band', 'status': 'On', 'comments': None}], 'currentGearCollection': None}, 'odds': {'priceStart': '$9.00', 'parimutuel': {'returnWin': '12', 'returnPlace': '4.40', 'isFavouriteWin': False}, 'fluctuations': {'priceOpening': '$10.00', 'priceFluc': '$10.00'}}, 'comment': 'Bit Slow Out Settled Down near tail lucky to avoid injured horse was checked 
# though 12l bolting Turn Straightened Up Off Mid-Field 7-8l gets Clear 400 and charged home to score. big win # very good from back', 'extendedApiUrl': '/v1/en-au/form/horsestat/5149117/7/5256710', 'extendedApiUrlMobile': '/v1/en-au/form/horsestatmobile/5149117/7/5256710', 'last5': ['-', '4', '3', '-', '4']}
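The JSONP unwrapping step can be tried on its own, without calling the API. A minimal sketch with the same regex as above; strip_jsonp is a hypothetical helper and the sample string is made up to mimic the callback wrapper.

```python
import json
import re

def strip_jsonp(text):
    """Return the JSON payload found between the outermost parentheses."""
    # Greedy match with re.S: from the first '(' to the last ')', across newlines.
    match = re.search(r'\((.*)\)', text, flags=re.S)
    if match is None:
        raise ValueError('response does not look like JSONP')
    return json.loads(match.group(1))

# Made-up response body in the callback(...) shape.
sample = 'angular.callbacks._b({"race": {"number": 7}, "resultCollection": []});'

data = strip_jsonp(sample)
print(list(data))              # ['race', 'resultCollection']
print(data['race']['number'])  # 7
```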
Another way to do it is to use these parameters (discoverable through the Developer tab in your browser), without using regex:
import requests
import json
url = 'https://graphql.rmdprod.racing.com/?query=query%20GetMeeting($meetCode:%20ID!)%20%7BgetMeeting(id:%20$meetCode)%7Bid,trackName,date,railPosition,races%7Bid,raceNumber,status,tempo,formRaceEntries%7Bid,raceEntryNumber,horseName,silkUrl,jockeyName,trainerName,scratched,speedValue,barrierNumber%7D%7D%7D%7D&variables=%7B%20%22meetCode%22:%205149117%20%7D'
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0",
"Accept": "*/*",
"Accept-Language": "en-US,en;q=0.5",
"Content-Type": "application/json",
"X-Api-Key": "da2-akkuiub3brhahc7nab2msruddq"
}
resp = requests.get(url,headers=headers)
data = json.loads(resp.text)  # or, equivalently: data = resp.json()
data
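The percent-encoded URL above is hard to read and easy to break when editing the query. A sketch of generating the same kind of URL from a readable query and a variables dict with the standard library follows; the query here is shortened to a few of the fields used above, and the result is only round-tripped locally, not sent.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Shortened version of the query above, kept readable.
query = (
    "query GetMeeting($meetCode: ID!) "
    "{ getMeeting(id: $meetCode) { id trackName date railPosition } }"
)
variables = {"meetCode": 5149117}

# urlencode takes care of escaping braces, quotes, spaces and '$'.
url = "https://graphql.rmdprod.racing.com/?" + urlencode(
    {"query": query, "variables": json.dumps(variables)}
)

# Decode it again to confirm nothing was lost in the encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0] == query)
print(json.loads(decoded["variables"][0]))
```

Passing the same dict as params= to requests.get, as the earlier answer does, lets requests do this encoding for you.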