Trying to get specific values from json - python

Can I get specific values from the json?
I'm trying to get the id, but I don't know how exactly to do it.
This is the json:
{
"searchType": "Movie",
"expression": "Lord Of The Rings 2022.json",
"results": [{
"id": "tt18368278",
"resultType": "Title",
"image": "https://m.media-amazon.com/images/M/MV5BZTMwZjUzNGMtMWI3My00ZGJmLWFmYWEtYjk2YWYxYzI2NWRjXkEyXkFqcGdeQXVyODY0NzcxNw##._V1_Ratio1.7600_AL_.jpg",
"title": "The Lord of the Rings Superfans Review the Rings of Power",
"description": "(2022 Video)"
}],
"errorMessage": ""
}
I don't want everything in results, only specific values from it, for example the id.
This is my code:
import requests
import json
movie = input("Movies:")
base_url = ("https://imdb-api.com/en/API/SearchMovie/myapi/"+movie)
r = (requests.get(base_url))
b = r.json()
print(b['results'])

Assuming your JSON is valid, and to handle more than one result, you could do:
[...]
r = (requests.get(base_url))
b = r.json()
for result in b['results']:
    print(result['id'])
To get just one item (the first item in the array), you can do:
print(b['results'][0]['id'])
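If the search finds nothing, results can be empty and b['results'][0] raises an IndexError. A minimal sketch that collects every id and copes with a missing or empty list, assuming the response has the shape shown in the question (the helper name get_ids is illustrative):

```python
def get_ids(response_json):
    """Return the "id" of every entry in "results" (an empty list if there are none)."""
    return [item["id"] for item in (response_json.get("results") or []) if "id" in item]


# Shape copied from the JSON in the question
sample = {
    "searchType": "Movie",
    "expression": "Lord Of The Rings 2022.json",
    "results": [
        {
            "id": "tt18368278",
            "resultType": "Title",
            "title": "The Lord of the Rings Superfans Review the Rings of Power",
            "description": "(2022 Video)",
        }
    ],
    "errorMessage": "",
}

ids = get_ids(sample)
print(ids)  # ['tt18368278']
first_id = ids[0] if ids else None  # None instead of an IndexError when nothing matched
```

With the code from the question, get_ids(b) returns the list of ids for the movie you searched.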
Requests documentation: https://requests.readthedocs.io/en/latest/

Related

How to write a Python script to automate API calls and retrieve a specific part of the result

I have a csv file of schools that contains one school per row for a total of 32091 schools. The name of the school is indicated in the 6th column, and the city code is indicated in the 7th column.
I would like to retrieve the latitude and longitude of the schools by using the geocoding API of the IGN (Institut Géographique National de France) whose documentation in French is here: https://geoservices.ign.fr/documentation/services/api-et-services-ogc/geocodage-beta-20/documentation-technique-de-lapi-de
This API allows me to indicate a string of characters as search terms, and to restrict the search with a filter on the city code. I have tested several queries and the results seem to be satisfactory. For example, for the school "ecole primaire privee st joseph de bonabry" located in Fougères (city code 35115), the following query:
https://wxs.ign.fr/essentiels/geoportail/geocodage/rest/0.1/search?q=ecole%20primaire%20privee%20st%20joseph%20de%20bonabry&index=poi&limit=1&returntruegeometry=false&postcode=35300
returns the following json:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"postcode": [
"35300"
],
"citycode": [
"35115",
"35"
],
"city": [
"Fougères"
],
"toponym": "École Primaire Saint-Joseph de Bonabry",
"category": [
"area of activity or interest",
"primary education"
],
"extrafields": {
"cleabs": "SURFACTI0000000215529805",
"names": [
"saint joseph de bonabry elementary school"
]
},
"_score": 0.703030303030303,
"_type": "poi"
},
"geometry": {
"type": "Point",
"coordinates": [
-1.19610139955834,
48.3550652629677
]
}
}
]
}
So the coordinates to extract are located here: {"features":[{ "geometry":{"coordinates":[lon, lat]}}]}
I would like to use a Python script to automate the task. As I understand it, the steps could be as follows:
Open the CSV
Read the value contained in the sixth column
Perform an http get request for each row, changing the URL based on the value in the sixth column
Extract longitude and latitude from the results
Update the longitude and latitude columns (already existing) with the previously extracted values.
Pandas lets me read the CSV, while Requests lets me send the query. Being a beginner in programming, I don't really know how to write the script. I guess it could start this way:
import pandas as pd
import requests
df = pd.read_csv("myfile.csv")
...but I'm stuck on what to do next. I guess a loop would let me repeat the request, but how do I change the search terms in the URL? In general, any help with the whole script will be greatly appreciated!
This is how I would do it.
Replace "name" and "post" with the actual column names from your CSV
import pandas as pd
import requests
# read the data CSV
# you have to replace "name" and "post" with the actual column names
df = pd.read_csv("data.csv", usecols=["name", "post"])
# define the request URL
url = "https://wxs.ign.fr/essentiels/geoportail/geocodage/rest/0.1/search"
# API call for each row
for i in range(len(df["name"])):
    # prepare the name for the URL
    genName = df["name"][i].replace(" ", "%20")
    print(genName)
    # prepare the request
    request = url + "?q=" + genName + "&index=poi&limit=1&returntruegeometry=false&postcode=" + str(df["post"][i])
    print(request)
    # do the request
    r = requests.get(request)
    # response
    result = r.text
    print(result)
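The loop above only prints the raw responses; it does not yet extract the coordinates or update the CSV, which are the last two steps listed in the question. A minimal sketch of those steps, under assumptions: the column names "name" and "post" come from the answer above, while the output columns "longitude"/"latitude" and the helper names are hypothetical. It passes the query through params= so requests handles the URL-encoding (spaces, accents) instead of the manual replace(" ", "%20"), and it reads features[0]["geometry"]["coordinates"], which the question shows to be [lon, lat]:

```python
import requests

SEARCH_URL = "https://wxs.ign.fr/essentiels/geoportail/geocodage/rest/0.1/search"


def parse_coordinates(payload):
    """Return (lon, lat) from the first feature, or (None, None) if nothing matched."""
    features = payload.get("features") or []
    if not features:
        return None, None
    lon, lat = features[0]["geometry"]["coordinates"]  # GeoJSON order: [lon, lat]
    return lon, lat


def geocode(name, postcode, session=requests):
    """Query the IGN search endpoint for one school, with the parameters from the question."""
    params = {
        "q": name,  # requests URL-encodes spaces and accents
        "index": "poi",
        "limit": 1,
        "returntruegeometry": "false",
        "postcode": str(postcode),
    }
    r = session.get(SEARCH_URL, params=params, timeout=10)
    r.raise_for_status()
    return parse_coordinates(r.json())


def add_coordinates(df, name_col="name", post_col="post", geocoder=geocode):
    """Fill the (hypothetical) "longitude"/"latitude" columns, one request per row."""
    coords = [geocoder(n, p) for n, p in zip(df[name_col], df[post_col])]
    df["longitude"] = [lon for lon, _ in coords]
    df["latitude"] = [lat for _, lat in coords]
    return df
```

For example: df = pd.read_csv("myfile.csv", dtype={"post": str}), then add_coordinates(df) and df.to_csv("myfile.csv", index=False). Reading the postcode as text keeps leading zeros. With 32091 rows this makes 32091 requests, so you may want to share one s = requests.Session() via geocoder=lambda n, p: geocode(n, p, session=s) and pause between calls if the service rate-limits.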

How to request all sizes in stock - Python

I'm trying to get all the sizes in stock from Zalando. I can't quite figure out how to do it, since the page in the video I'm watching, which shows how to request sizes, looks different from mine.
The video I watched was this one: Video (at 5:30)
Does anyone know how to request the sizes and print the ones that are in stock?
The site I'm trying to get the sizes from: here
My code looks like this:
import requests
from bs4 import BeautifulSoup as bs
session = requests.session()
def get_sizes_in_stock():
    global session
    endpoint = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
    response = session.get(endpoint)
    soup = bs(response.text, "html.parser")
I have tried viewing the page source and looking for the sizes, but I could not find them there.
I hope someone can help me with what to do.
The sizes are in the page.
I found them in the HTML, inside a JavaScript tag, in this format:
{
"sku": "NI112O0BT-A110090000",
"size": "42.5",
"deliveryOptions": [
{
"deliveryTenderType": "FASTER"
}
],
"offer": {
"price": {
"promotional": null,
"original": {
"amount": 114500
},
"previous": null,
"displayMode": null
},
"merchant": {
"id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
},
"selectionContext": null,
"isMeaningfulOffer": true,
"displayFlags": [],
"stock": {
"quantity": "MANY"
}
},
"allOffers": [
{
"price": {
"promotional": null,
"original": {
"amount": 114500
},
"previous": null,
"displayMode": null
},
"merchant": {
"id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
},
"selectionContext": null,
"isMeaningfulOffer": true,
"displayFlags": [],
"stock": {
"quantity": "MANY"
},
"deliveryOptions": [
{
"deliveryWindow": "2022-05-23 - 2022-05-25"
}
],
"fulfillment": {
"kind": "ZALANDO"
}
}
]
}
If you parse the HTML with bs4, you should be able to find that script tag and extract the JSON from it.
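A minimal sketch of that idea, assuming the data sits in a <script type="application/json"> tag whose text contains "size" (the selector matches the one used further down in this thread; the marker string and the function name find_json_script are illustrative). It uses the built-in html.parser, so lxml is not required:

```python
import json

from bs4 import BeautifulSoup


def find_json_script(html, marker='"size"'):
    """Parse the first <script type="application/json"> whose text contains marker."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.select('script[type="application/json"]'):
        text = tag.string or ""  # the script's raw contents
        if marker in text:
            return json.loads(text)
    return None  # no matching script tag on this page
```

In the question's get_sizes_in_stock(), find_json_script(response.text) would return the parsed object (or None), which you can then navigate like any nested dict.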
The sizes for the default shoe color are in the HTML, and alongside them are the URLs for the other colors. You can extract those into a dictionary and loop over it, making a request per color and pulling out each color's sizes and availability, which I think is what you are actually asking for. (Note: I have kept the code quite generic to avoid hardcoding keys that change across requests.)
import requests, re, json
def get_color_results(link):
    headers = {"User-Agent": "Mozilla/5.0"}
    r = requests.get(link, headers=headers).text
    data = json.loads(re.search(r'(\{"enrichedEntity".*size.*)<\/script', r).group(1))
    results = []
    color = ""
    for i in data["graphqlCache"]:
        if "ern:product" in i:
            if "product" in data["graphqlCache"][i]["data"]:
                if "name" in data["graphqlCache"][i]["data"]["product"]:
                    results.append(data["graphqlCache"][i]["data"]["product"])
                if (
                    color == ""
                    and "color" in data["graphqlCache"][i]["data"]["product"]
                ):
                    color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
    return (color, results)
link = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
final = {}
color, results = get_color_results(link)
colors = {
    j["node"]["color"]["name"]: j["node"]["uri"]
    for j in [
        a
        for b in [
            i["family"]["products"]["edges"]
            for i in results
            if "family" in i
            if "products" in i["family"]
        ]
        for a in b
    ]
}
final[color] = {
    j["size"]: j["offer"]["stock"]["quantity"]
    for j in [i for i in results if "simples" in i][0]["simples"]
}
for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }
print(final)
Explanatory notes from chat:
Use the Chrome browser to navigate to the link
Press Ctrl + U to view the page source
Press Ctrl + F to search for 38.5 in the HTML
The first match is the long string you already know about. Because it is so long, it is hard to navigate in the page source and to tell which tag it belongs to. There are a number of ways to identify the right script tag, but for now an easy one is:
import requests
from bs4 import BeautifulSoup as bs
link = 'https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html'
headers = {'User-Agent':'Mozilla/5.0'}
r = requests.get(link, headers = headers)
soup = bs(r.text, 'lxml')
# print the first JSON script tag that mentions size 38.5
for i in soup.select('script[type="application/json"]'):
    if '38.5' in i.text:
        print(i)
        break
A slower method would be:
import re
soup.find("script", string=re.compile(r'38\.5'))
Whilst I used bs4 to get the right script tag's contents, that was only so I knew where the string holding the JavaScript object starts and ends. I then rewrote the extraction to use re rather than bs4: re runs on the entire response text from the request, with a regex pattern that pulls out that same string, which is then deserialized into a JSON object with json.
I put the entire page source into a regex tool and wrote a regex that returns the same string identified above. See that regex here
On the right-hand side, click match 1, group 1, to see the regex return the same string you saw with BeautifulSoup: two different ways of getting the same string containing the sizes.
That is the string whose structure I needed to examine as JSON. See it in a JSON viewer here
You will notice the JSON is deeply nested and some dictionary keys are likely dynamic. That meant I needed code that could traverse the JSON and use certain more stable keys to pull out the available colours and, for the default shoe colour, the sizes and availability
There is an expand all button in that JSON viewer. You can then search with Ctrl + F for 38.5 again
10a) I noticed that size and availability were for the default shoe colour
10b) I also noticed that if I searched the JSON for one of the other colours from the dropdown, I could find URIs for each shoe colour listed
I used Wolf as my search term (as I suspected there would be fewer matches for that term within the JSON)
You can see one of the alternate colours and its URI listed above
I visited that URI and found the availability and shoe sizes for that colour in the same place as for the default white shoes
I realised I could make an initial request to get the default colour and its sizes with availability and, from that same request, extract the other colours and their URIs
I could then make requests to those other URIs and re-use my existing code to extract the sizes/availability for the new colours
This is why I created my get_color_results() function. This was the re-usable code to extract the sizes and availability from each page
results holds all the entries within the JSON that match certain keys I look for to navigate to the sizes and availabilities; color holds the current colour
This code traverses the JSON to get to the right place to extract data I want to use later
results = []
color = ""
for i in data["graphqlCache"]:
    if "ern:product" in i:
        if "product" in data["graphqlCache"][i]["data"]:
            if "name" in data["graphqlCache"][i]["data"]["product"]:
                results.append(data["graphqlCache"][i]["data"]["product"])
            if (
                color == ""
                and "color" in data["graphqlCache"][i]["data"]["product"]
            ):
                color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
The following pulls out the sizes and availability from results:
{
j["size"]: j["offer"]["stock"]["quantity"]
for j in [i for i in results if "simples" in i][0]["simples"]
}
For the first request only, the following gets the other shoe colours and their URIs into a dictionary to loop over later:
colors = {
j["node"]["color"]["name"]: j["node"]["uri"]
for j in [
a
for b in [
i["family"]["products"]["edges"]
for i in results
if "family" in i
if "products" in i["family"]
]
for a in b
]
}
This bit gets all the other colours and their availability:
for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }
Throughout, I update the dictionary final with the found colour and associated size and availabilities
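The traversal in get_color_results() repeats data["graphqlCache"][i]["data"]["product"] several times. A minimal sketch of the same traversal that binds each product once and uses .get(), so a cache entry without a "data" key is skipped rather than raising a KeyError. It assumes the graphqlCache layout described in these notes; the name products_from_cache is illustrative:

```python
def products_from_cache(data):
    """Return (first colour name found, list of product dicts) from data["graphqlCache"]."""
    results = []
    color = ""
    for key, entry in data.get("graphqlCache", {}).items():
        if "ern:product" not in key:
            continue  # only product entries carry sizes and colours
        product = (entry.get("data") or {}).get("product")
        if not product:
            continue
        if "name" in product:
            results.append(product)
        if color == "" and "color" in product:
            color = product["color"]["name"]
    return color, results
```

color, results = products_from_cache(data) could replace the loop inside get_color_results(), and the size/availability comprehension works unchanged on results.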
Always check whether a hidden API is available; it will save you a lot of time.
In this case I found this API:
https://www.zalando.dk/api/graphql
You POST a payload to it and get a JSON answer back.
import http.client
# I extracted the payload from the Network tab of my browser's debugging tools
payload = """[{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0S7-H11"}},{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0RY-A11"}}]"""
conn = http.client.HTTPSConnection("www.zalando.dk")
headers = {
'content-type': "application/json"
}
conn.request("POST", "/api/graphql", payload, headers)
res = conn.getresponse()
res = res.read() # json output
For each product, res contains a JSON leaf listing the available sizes:
"simples": [
{
"size": "38.5",
"sku": "NI112O0P5-A110060000"
},
{
"size": "44.5",
"sku": "NI112O0P5-A110105000"
},
{
...
It's now easy to extract the information.
There is also a field that indicates whether the product has a promotion, which is handy if you want to track discounts.
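Rather than hand-writing the payload string, you could build it from a list of product IDs and send it with json= in requests, which serialises the list and sets the Content-Type: application/json header. A minimal sketch, assuming the endpoint keeps accepting the persisted-query id captured above (Zalando can change it, or start requiring extra headers, at any time); the helper names are illustrative, and sizes_in() walks the response looking for every "simples" list because the full response layout isn't shown here:

```python
import requests

GRAPHQL_URL = "https://www.zalando.dk/api/graphql"
# Persisted-query hash copied from the payload above; Zalando may change it at any time.
QUERY_ID = "0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc"


def build_payload(product_ids, query_id=QUERY_ID):
    """One entry per product, in the same shape as the hand-written payload string."""
    return [{"id": query_id, "variables": {"id": f"ern:product::{pid}"}} for pid in product_ids]


def fetch_products(product_ids):
    """POST the payload; json= serialises it and sets Content-Type: application/json."""
    r = requests.post(GRAPHQL_URL, json=build_payload(product_ids), timeout=10)
    r.raise_for_status()
    return r.json()


def sizes_in(node):
    """Yield every "size" found in any "simples" list, however deeply it is nested."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "simples" and isinstance(value, list):
                for simple in value:
                    if isinstance(simple, dict) and "size" in simple:
                        yield simple["size"]
            else:
                yield from sizes_in(value)
    elif isinstance(node, list):
        for item in node:
            yield from sizes_in(item)
```

For example, list(sizes_in(fetch_products(["NI112O0S7-H11"]))) lists the sizes of one product. Passing several IDs in one call mixes their sizes together, so call it per product (or walk each response element separately) if you need them apart.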

Python: issue getting the correct info from the JSON

I have an issue when I try to get the correct information.
For example, I get a very big JSON output from a request I make with POST (I can't use GET).
"offers": [
{
"rank": 1,
"provider": {
"id": 6653,
"isLocalProvider": false,
"logoUrl": "https://img.vxcdn.com/i/partner-energy/c_6653.png?v=878adaf9ed",
"userRatings": {
"additonalCustomerRatings": {
"price": {
"percent": 73.80
},
"service": {
"percent": 67.50
},
"switching": {
"percent": 76.37
},
"caption": {
"text": "Zusätzliche Kundenbewertungen"
}
},
I can't show it all because it's very big.
As you can see, this is "rank" 1. The response contains 20 ranks, each with information such as content and totalCost, and I need to pick them all, e.g. the content and totalCost for rank 6, for rank 8, and so on.
So first of all, in Python, I use this code to get the JSON data:
import requests
import json
url = "https://www.verivox.de/api/energy/offers/electricity/postCode/10555/custom?"
payload="{\"profile\":\"H0\",\"prepayment\":true,\"signupOnly\":true,\"includePackageTariffs\":true,\"includeTariffsWithDeposit\":true,\"includeNonCompliantTariffs\":true,\"bonusIncluded\":\"non-compliant\",\"maxResultsPerPage\":20,\"onlyProductsWithGoodCustomerRating\":false,\"benchmarkTariffId\":741122,\"benchmarkPermanentTariffId\":38,\"paolaLocationId\":\"71085\",\"includeEcoTariffs\":{\"includesNonEcoTariffs\":true},\"maxContractDuration\":240,\"maxContractProlongation\":240,\"usage\":{\"annualTotal\":3500,\"offPeakUsage\":0},\"priceGuarantee\":{\"minDurationInMonths\":0},\"maxTariffsPerProvider\":999,\"cancellationPeriod\":null,\"previewDisplayTime\":null,\"onlyRegionalTariffs\":false,\"sorting\":{\"criterion\":\"TotalCosts\",\"direction\":\"Ascending\"},\"includeSpecialBonusesInCalculation\":\"None\",\"totalCostViewMode\":1,\"ecoProductType\":0}"
headers = {
'Content-Type': 'application/json',
'Cookie': '__cfduid=d97a159bb287de284487ebdfa0fd097b41606303469; ASP.NET_SessionId=jfg3y20s31hclqywloocjamz; 0e3a873fd211409ead79e21fffd2d021=product=Power&ReturnToCalcLink=/power/&CustomErrorsEnabled=False&IsSignupWhiteLabelled=False; __RequestVerificationToken=vrxksNqu8CiEk9yV-_QHiinfCqmzyATcGg18dAqYXqR0L8HZNlvoHZSZienIAVQ60cB40aqfQOXFL9bsvJu7cFOcS2s1'
}
response = requests.request("POST", url, headers=headers, data=payload)
jsondata = response.json()
# print(response.text)
That part works fine, but when I try to pick out the data I need, as described above, I get:
for Rankdata in str(jsondata['rank']):
KeyError: 'rank'
This is the code that produces the error:
dataRank = []
for Rankdata in str(jsondata['rank']):
    dataRank.append({
        'tariff':Rankdata['content'],
        'cost': Rankdata['totalCost'],
        'sumOfOneTimeBonuses': Rankdata['content'],
        'savings': Rankdata['content']
    })
Then I tried another way, getting just one value or a few, but that doesn't work either.
data = response.json()
#print(data)
test = float((data['rank']['totalCost']['content']))
I know my code is not perfect, but this is the first time I've dealt with JSON this big and this complex. I would be very grateful if you could show me, using my case as an example, how I can pick out the data for rank 1 to rank 20 and print it.
Thank you for your help.
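The KeyError happens because rank is not a top-level key of the response. If you look closely at the highest level of the JSON, you can see that the value for the key offers is a list of dicts, and each of those dicts has its own rank. (Wrapping the value in str() would also make the loop iterate over single characters.) You can therefore loop through the list like this: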
for offer in jsondata['offers']:
    print(offer.get('rank'))
    print(offer.get('provider').get('id'))
And the same goes for other keys in the offers.
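A minimal sketch that builds one row per offer, using the keys visible in the question (offers, rank, provider.id). Where content and totalCost sit inside each offer isn't shown, so reading them as top-level keys of the offer is an assumption; .get() returns None instead of raising a KeyError if that guess is wrong. The name summarize_offers is illustrative:

```python
def summarize_offers(jsondata):
    """Return one dict per offer with its rank, provider id and the fields the question asks for."""
    rows = []
    for offer in jsondata.get("offers", []):
        rows.append({
            "rank": offer.get("rank"),
            "provider_id": (offer.get("provider") or {}).get("id"),
            # Assumed locations: check with print(offer.keys()) and adjust if needed
            "tariff": offer.get("content"),
            "cost": offer.get("totalCost"),
        })
    return rows
```

For example, rows = summarize_offers(response.json()) gives all 20 ranks, and [r for r in rows if r["rank"] in (6, 8)] picks out specific ones.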

For-looping over several thousand API URLs and adding the results to a list

Problem: The output of this code seems to repeat a lot of the same entries, making it much longer than it should be.
The goal is to complete the queries and then print the final list with all cities within each region.
[
{
"name": "Herat",
"id": "AF~HER~Herat"
}
]
[
{
"name": "Herat",
"id": "AF~HER~Herat"
},
{
"name": "Kabul",
"id": "AF~KAB~Kabul"
}
]
[
{
"name": "Herat",
"id": "AF~HER~Herat"
},
{
"name": "Kabul",
"id": "AF~KAB~Kabul"
},
{
"name": "Kandahar",
"id": "AF~KAN~Kandahar"
}
]
My goal is to get a list of city IDs. First, I make a GET request and parse the JSON response to collect the country IDs into a list.
Second: I have a for loop that makes another GET request for the region IDs. For this I need to add each country ID to the API URL, which I do with .format on the URL. I iterate through all the countries and their respective region IDs, parse them and store them in a list.
Third: I have another for loop that makes a GET request for each region ID in the list above and collects the city IDs that I actually need.
Code:
from requests.auth import HTTPBasicAuth
import requests
import json
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
def countries():
    data = requests.get("https://localhost/api/netim/v1/countries/", verify=False, auth=HTTPBasicAuth("admin", "admin"))
    rep = data.json()
    a = []
    for elem in rep['items']:
        a.extend([elem.get("id","")])
    print(a)
    return a
def regions():
    ids = []
    for c in countries():
        url = requests.get("https://localhost/api/netim/v1/countries/{}/regions".format(c), verify=False, auth=HTTPBasicAuth("admin", "admin"))
        response = url.json()
        for cid in response['items']:
            ids.extend([cid.get("id","")])
        data = []
        for r in ids:
            url = requests.get("https://localhost/api/netim/v1/regions/{}/cities".format(r), verify=False, auth=HTTPBasicAuth("admin", "admin"))
            response = url.json()
            data.extend([{"name":r.get("name",""),"id":r.get("id", "")} for r in response['items']])
            print(json.dumps(data, indent=4))
    return data
regions()
print(regions())
You will see that the output contains several copies of the same entry.
I'm not a programmer, and I'm not sure where I'm going wrong.
It looks as though the output you're concerned with might be due to the fact that you're printing data as you iterate through it in the regions() method.
Try removing the line:
print(json.dumps(data, indent=4))
Also, and more importantly, you're resetting data to an empty list every time you iterate over an item in countries(). You should probably declare that variable before the outer loop.
You're already printing the final result when you call the function. So printing as you iterate only really makes sense if you're debugging & needing to review the data as you go through it.
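Putting both suggestions together: collect the region IDs for every country first, then request the cities of each region exactly once, and print only the final result. (Note also that the question's script calls regions() twice, once on its own and once inside print(), which runs all the requests twice.) A minimal sketch with the endpoints and credentials copied from the question; get_json is passed in as a parameter so the logic can be tried without the server, and the de-duplication by city id is an extra safeguard rather than something the API is known to need. The function names are illustrative:

```python
import requests
from requests.auth import HTTPBasicAuth

BASE = "https://localhost/api/netim/v1"


def make_get_json(user="admin", password="admin"):
    """Return a function that GETs BASE + path and decodes the JSON body."""
    session = requests.Session()
    session.auth = HTTPBasicAuth(user, password)
    session.verify = False  # self-signed certificate, as in the question

    def get_json(path):
        r = session.get(BASE + path)
        r.raise_for_status()
        return r.json()

    return get_json


def collect_cities(get_json):
    """Countries -> regions -> cities, with each region's cities requested once."""
    country_ids = [c.get("id", "") for c in get_json("/countries/")["items"]]

    region_ids = []
    for country in country_ids:
        region_ids.extend(r.get("id", "") for r in get_json(f"/countries/{country}/regions")["items"])

    cities = {}  # keyed by city id, so each city is stored once
    for region in region_ids:
        for city in get_json(f"/regions/{region}/cities")["items"]:
            cities[city.get("id", "")] = {"name": city.get("name", ""), "id": city.get("id", "")}
    return list(cities.values())
```

Call it once, e.g. print(json.dumps(collect_cities(make_get_json()), indent=4)) after import json, keeping the urllib3.disable_warnings(...) line from the question if you want to silence the certificate warnings.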

Accessing nested objects with python

I have a response that I receive from Foursquare in the form of JSON. I have tried to access certain parts of the object but have had no success. How would I access, say, the address of the object? Here is the code I have tried.
url = 'https://api.foursquare.com/v2/venues/explore'
params = dict(client_id=foursquare_client_id,
client_secret=foursquare_client_secret,
v='20170801', ll=''+lat+','+long+'',
query=mealType, limit=100)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
msg = '{} {}'.format("Restaurant Address: ",
data['response']['groups'][0]['items'][0]['venue']['location']['address'])
print(msg)
Here is an example of json response:
"items": [
{
"reasons": {
"count": 0,
"items": [
{
"summary": "This spot is popular",
"type": "general",
"reasonName": "globalInteractionReason"
}
]
},
"venue": {
"id": "412d2800f964a520df0c1fe3",
"name": "Central Park",
"contact": {
"phone": "2123106600",
"formattedPhone": "(212) 310-6600",
"twitter": "centralparknyc",
"instagram": "centralparknyc",
"facebook": "37965424481",
"facebookUsername": "centralparknyc",
"facebookName": "Central Park"
},
"location": {
"address": "59th St to 110th St",
"crossStreet": "5th Ave to Central Park West",
"lat": 40.78408342593807,
"lng": -73.96485328674316,
"labeledLatLngs": [
{
"label": "display",
"lat": 40.78408342593807,
"lng": -73.96485328674316
}
],
The full response can be found here.
Like so
addrs=data['items'][2]['location']['address']
Your code (at least as far as loading and accessing the object goes) looks correct to me. I loaded the JSON from a file (since I don't have your Foursquare id) and it worked fine. You are correctly using object/dictionary keys and array positions to navigate to what you want. However, you misspelled "address" in the line where you drill down to the data; adding the missing 'a' made it work. I'm also correcting the typo in the URL you posted.
I answered this assuming that the example JSON you linked to is what is stored in data. If that isn't the case, a relatively easy way to see exactly what Python has stored in data is to import pprint and use it like so: pprint.pprint(data).
You could also start an interactive python shell by running the program with the -i switch and examine the variable yourself.
data["items"][2]["location"]["address"]
This will access the address for you.
You can reach any level of nesting by using an integer index for an array and a string key for a dict.
In your case, items is an array:
#items[int index]
items[0]
items[0] is a dictionary, so we access it with string keys:
items[0]['location']
That is again an object, so we use a string key again:
items[0]['location']['address']
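The rule above (integer index for arrays, string key for objects) can be wrapped in a small helper that follows a path of keys and indices and returns a default instead of raising when a step is missing, which also makes it easier to see where a path goes wrong. A minimal sketch; the name dig is illustrative, and the sample mirrors the items[...]['venue']['location'] layout shown in the question:

```python
def dig(obj, *path, default=None):
    """Follow dict keys / list indices in path; return default if any step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


items = [
    {
        "venue": {
            "name": "Central Park",
            "location": {"address": "59th St to 110th St", "lat": 40.78408342593807},
        }
    }
]

print(dig(items, 0, "venue", "location", "address"))  # 59th St to 110th St
print(dig(items, 0, "location", "address"))  # None: in this sample "location" is under "venue"
```

With the full response from the question, dig(data, 'response', 'groups', 0, 'items', 0, 'venue', 'location', 'address') follows the same path as the original code.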
