I'm trying to get the JSON data from every page of an API and put it all into one big JSON output.
(Docs for the API I'm using: https://docs.scoresaber.com/#/Leaderboards/get_api_leaderboards)
When doing the following API call:
https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true
I get the metadata object, which has total and itemsPerPage.
Example:
"metadata": {
"total": 193,
"page": 1,
"itemsPerPage": 14
}
So ceil(193 / 14) means I get 14 pages.
This means I can iterate through all the pages by making a request for each page with this API call: https://scoresaber.com/api/leaderboards?qualified=true&page=2
and so on, until I get to &page=14.
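That page arithmetic can be checked with a quick sketch (using the metadata values from the example above):

```python
import math

total = 193          # metadata["total"] from the first response
items_per_page = 14  # metadata["itemsPerPage"]

# 193 / 14 is about 13.8, so rounding up gives the page count
pages = math.ceil(total / items_per_page)
print(pages)  # 14
```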
Each page returns this JSON (trimmed example):
{
"leaderboards": [
{
"id": 466447,
"songHash": "E527C82AF2DEC46A23F12D742035D76CCA875904",
"songName": "Parasite",
"songSubName": "(feat. Hatsune Miku)",
"songAuthorName": "DECO*27",
"levelAuthorName": "Alice",
"difficulty": {
"leaderboardId": 466447,
"difficulty": 1,
"gameMode": "SoloStandard",
"difficultyRaw": "_Easy_SoloStandard"
},
"maxScore": 0,
"createdDate": "2022-06-01T17:16:52.000Z",
"rankedDate": null,
"qualifiedDate": "2022-06-14T05:53:21.000Z",
"lovedDate": null,
"ranked": false,
"qualified": true,
"loved": false,
"maxPP": -1,
"stars": 0,
"plays": 70,
"dailyPlays": 0,
"positiveModifiers": false,
"playerScore": null,
"coverImage": "https://cdn.scoresaber.com/covers/E527C82AF2DEC46A23F12D742035D76CCA875904.png",
"difficulties": null
},
],
"metadata": {
"total": 193,
"page": 2,
"itemsPerPage": 14
}
}
So what I want is to loop through all the pages and collect every item in leaderboards into one JSON.
This is what I've tried:
import requests
import math
import json

response = requests.get("https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true")
api = json.loads(response.text)
pages = math.ceil(api['metadata']['total'] / api['metadata']['itemsPerPage'])
api = {}
for page in range(1, pages+1):
    api.update(json.loads(requests.get(f"https://scoresaber.com/api/leaderboards?qualified=true&page={page}").text))
api = json.dumps(api, indent=4)
But that seems to only keep the last page and just overwrite the dictionary each time (I'm also not sure if I need to declare api as a dict).
So I'm just not sure what is going wrong: whether I'm declaring things wrongly, requesting the API wrongly, or putting things into the dict wrongly, etc.
If I understand you correctly, you want to collect all the data into one big list:
import json
import math
import requests

url1 = (
    "https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true"
)
url2 = "https://scoresaber.com/api/leaderboards?qualified=true&page={}"

api = requests.get(url1).json()
pages = math.ceil(api["metadata"]["total"] / api["metadata"]["itemsPerPage"])

all_data = []
for page in range(1, pages + 1):
    data = requests.get(url2.format(page)).json()
    all_data.extend(data["leaderboards"])

print(json.dumps(all_data, indent=4))
This will print all 193 items from all pages:
[
{
"id": 484864,
"songHash": "80559A7A4AC0F62F27DAF1C59DF67F305250ADFF",
"songName": "Phony",
"songSubName": "feat. KAFU (Hoshimachi Suisei Cover)",
"songAuthorName": "Tsumiki",
"levelAuthorName": "Joshabi & Shad",
...
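If the goal is one big JSON file rather than printed output, the combined list can be written with json.dump. A minimal sketch, where the one-item list and the filename leaderboards.json are placeholders of mine standing in for the all_data built above:

```python
import json

# placeholder standing in for the all_data list collected from the pages
all_data = [{"id": 466447, "songName": "Parasite"}]

# write the whole list as a single JSON array (filename is an assumption)
with open("leaderboards.json", "w") as f:
    json.dump(all_data, f, indent=4)
```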
I have a script that takes an ID from a JSON file and adds it to a URL for an API request. The aim is to have a loop run through the 1000-ish IDs and produce one JSON file with all the information contained within.
The current code makes the first request and creates and populates the JSON file, but when run in a loop it throws a KeyError.
import json
import requests

fname = "NewdataTest.json"

def write_json(data, fname):
    fname = "NewdataTest.json"
    with open(fname, "w") as f:
        json.dump(data, f, indent = 4)
    with open (fname) as json_file:
        data = json.load(json_file)
        temp = data[0]
        #print(newData)
        y = newData
        data.append(y)

# Read test.json to get tmdb IDs
tmdb_ids = []
with open('test.json', 'r') as json_fp:
    imdb_info = json.load(json_fp)
tmdb_ids = [movie_info['tmdb_id'] for movies_chunk in imdb_info for movie_index, movie_info in movies_chunk.items()]

# Add IDs to API call URL
for tmdb_id in tmdb_ids:
    print("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
    # Send API Call
    response_API = requests.get("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
    # Check API Call Status
    print(response_API.status_code)
    write_json((response_API.json()), "NewdataTest.json")
The error is in the line temp = data[0]. I have tried printing the keys for data: nothing. At this point I have no idea where I am with this, as I have hacked it about so much it barely resembles a cohesive piece of code. My aim was to make one simple function to get the data from the JSON, one to produce the API call URLs, and one to write the results to the new JSON.
Example of API response JSON:
{
"adult": false,
"backdrop_path": "/e1cC9muSRtAHVtF5GJtKAfATYIT.jpg",
"belongs_to_collection": null,
"budget": 0,
"genres": [
{
"id": 10749,
"name": "Romance"
},
{
"id": 35,
"name": "Comedy"
}
],
"homepage": "",
"id": 1063242,
"imdb_id": "tt24640474",
"original_language": "fr",
"original_title": "Disconnect: The Wedding Planner",
"overview": "After falling victim to a scam, a desperate man races the clock as he attempts to plan a luxurious destination wedding for an important investor.",
"popularity": 34.201,
"poster_path": "/tGmCxGkVMOqig2TrbXAsE9dOVvX.jpg",
"production_companies": [],
"production_countries": [
{
"iso_3166_1": "KE",
"name": "Kenya"
},
{
"iso_3166_1": "NG",
"name": "Nigeria"
}
],
"release_date": "2023-01-13",
"revenue": 0,
"runtime": 107,
"spoken_languages": [
{
"english_name": "English",
"iso_639_1": "en",
"name": "English"
},
{
"english_name": "Afrikaans",
"iso_639_1": "af",
"name": "Afrikaans"
}
],
"status": "Released",
"tagline": "",
"title": "Disconnect: The Wedding Planner",
"video": false,
"vote_average": 5.8,
"vote_count": 3
}
You can store all the results from the API calls in a list and then save this list in JSON format to a file. For example:
#...
all_data = []
for tmdb_id in tmdb_ids:
    print("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
    # Send API Call
    response_API = requests.get("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
    # Check API Call Status
    print(response_API.status_code)
    if response_API.status_code == 200:
        # store the Json data in a list:
        all_data.append(response_API.json())

# write the list to file
with open('output.json', 'w') as f_out:
    json.dump(all_data, f_out, indent=4)
This will produce output.json with all the responses in JSON format.
I'm trying to request all the sizes in stock from Zalando. I can't quite figure out how to do it, since the video I'm watching that shows how to request sizes looks different than mine.
The video I watched was this. Video - 5.30
Does anyone know how to request the sizes in stock and print the sizes that are in stock?
The site I'm trying to request sizes from: here
My code looks like this:
import requests
from bs4 import BeautifulSoup as bs

session = requests.session()

def get_sizes_in_stock():
    global session
    endpoint = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
    response = session.get(endpoint)
    soup = bs(response.text, "html.parser")
I have tried going to View page source and looking for the sizes, but I could not see them there.
I hope someone out there can help me with what to do.
The sizes are in the page.
I found them in the html, in a javascript tag, in this format:
{
    "sku": "NI112O0BT-A110090000",
    "size": "42.5",
    "deliveryOptions": [
        {
            "deliveryTenderType": "FASTER"
        }
    ],
    "offer": {
        "price": {
            "promotional": null,
            "original": {
                "amount": 114500
            },
            "previous": null,
            "displayMode": null
        },
        "merchant": {
            "id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
        },
        "selectionContext": null,
        "isMeaningfulOffer": true,
        "displayFlags": [],
        "stock": {
            "quantity": "MANY"
        }
    },
    "allOffers": [
        {
            "price": {
                "promotional": null,
                "original": {
                    "amount": 114500
                },
                "previous": null,
                "displayMode": null
            },
            "merchant": {
                "id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
            },
            "selectionContext": null,
            "isMeaningfulOffer": true,
            "displayFlags": [],
            "stock": {
                "quantity": "MANY"
            },
            "deliveryOptions": [
                {
                    "deliveryWindow": "2022-05-23 - 2022-05-25"
                }
            ],
            "fulfillment": {
                "kind": "ZALANDO"
            }
        }
    ]
}
If you parse the html with bs4 you should be able to find the script tag and extract the JSON.
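As a minimal sketch of that idea, using only re and json on a made-up HTML snippet (the real page's markup and JSON keys may differ, so treat the tag attributes and the "simples" key here as assumptions):

```python
import json
import re

# made-up stand-in for the page source; the real script tag is far larger
html = '<script type="application/json">{"simples": [{"size": "42.5"}, {"size": "44"}]}</script>'

# grab the JSON between the script tags and deserialize it
match = re.search(r'<script type="application/json">(.*?)</script>', html)
data = json.loads(match.group(1))

sizes = [s["size"] for s in data["simples"]]
print(sizes)  # ['42.5', '44']
```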
The sizes for the default colour of shoe are shown in the html. Alongside this are the urls for the other colours. You can extract these into a dictionary and loop over them, making requests and pulling the different colours and their availability, which I think is what you are actually asking for, as follows (note: I have kept it quite generic to avoid hardcoding keys which change across requests):
import requests, re, json

def get_color_results(link):
    headers = {"User-Agent": "Mozilla/5.0"}
    r = requests.get(link, headers=headers).text
    data = json.loads(re.search(r'(\{"enrichedEntity".*size.*)<\/script', r).group(1))
    results = []
    color = ""
    for i in data["graphqlCache"]:
        if "ern:product" in i:
            if "product" in data["graphqlCache"][i]["data"]:
                if "name" in data["graphqlCache"][i]["data"]["product"]:
                    results.append(data["graphqlCache"][i]["data"]["product"])
                    if (
                        color == ""
                        and "color" in data["graphqlCache"][i]["data"]["product"]
                    ):
                        color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
    return (color, results)

link = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
final = {}
color, results = get_color_results(link)

colors = {
    j["node"]["color"]["name"]: j["node"]["uri"]
    for j in [
        a
        for b in [
            i["family"]["products"]["edges"]
            for i in results
            if "family" in i
            if "products" in i["family"]
        ]
        for a in b
    ]
}

final[color] = {
    j["size"]: j["offer"]["stock"]["quantity"]
    for j in [i for i in results if "simples" in i][0]["simples"]
}

for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }

print(final)
Explanatory notes from chat:
1. Use chrome browser to navigate to the link
2. Press Ctrl + U to view the page source
3. Press Ctrl + F to search for 38.5 in the html
4. The first match is the long string you already know about. The string is long and difficult to navigate in the page source and identify which tag it is part of. There are a number of ways I could identify the right script from these, but for now an easy way would be:
import requests
from bs4 import BeautifulSoup as bs

link = 'https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html'
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(link, headers=headers)
soup = bs(r.text, 'lxml')

for i in soup.select('script[type="application/json"]'):
    if '38.5' in i.text:
        print(i)
        break
A slower method would be:
soup.find("script", text=re.compile(r'.*38.5.*'))
5. Whilst I used bs4 to get the right script tag contents, this was so I knew the start and end of the string denoting the JavaScript object I wanted; I then used re to extract it and json to deserialize it into a JSON object. In a re-write I used re rather than bs4, i.e. ran re on the entire response text from the request and passed a regex pattern which would pull out the same string
6. I put the entire page source in a regex tool and wrote a regex to return that same string as identified above. See that regex here
7. Click on the right hand side, match 1 group 1, to see highlighted the same string being returned from the regex as you saw with BeautifulSoup. Two different ways of getting the same string containing the sizes
8. That is the string whose structure I needed to examine as JSON. See it in a json viewer here
9. You will notice the JSON is very nested, with some keys to dictionaries that are likely dynamic, meaning I needed to write code which could traverse the JSON and use certain more stable keys to pull out the colours available, and, for the default shoe colour, the sizes and availability
10. There is an expand all button in that JSON viewer. You can then search with Ctrl + F for 38.5 again
10a) I noticed that size and availability were for the default shoe colour
10b) I also noticed that within the JSON, if I searched for one of the other colours from the dropdown, I could find URIs for each colour of shoe listed
11. I used Wolf as my search term (as I suspected fewer matches for that term within the JSON)
12. You can see one of the alternate colours and its URI listed above
13. I visited that URI and found the availability and shoe sizes for that colour in the same place as I did for the default white shoes
14. I realised I could make an initial request and get the default colour and sizes with availability, and, from that same request, extract the other colours and their URIs
15. I could then make requests to those other URIs and re-use my existing code to extract the sizes/availability for the new colours
16. This is why I created my get_color_results() function. This was the re-usable code to extract the sizes and availability from each page
17. results holds all the matches within the JSON to certain keys I am looking for, to navigate to the right place to get the sizes and availabilities, as well as the current colour
18. The following code traverses the JSON to get to the right place to extract the data I want to use later:
results = []
color = ""
for i in data["graphqlCache"]:
    if "ern:product" in i:
        if "product" in data["graphqlCache"][i]["data"]:
            if "name" in data["graphqlCache"][i]["data"]["product"]:
                results.append(data["graphqlCache"][i]["data"]["product"])
                if (
                    color == ""
                    and "color" in data["graphqlCache"][i]["data"]["product"]
                ):
                    color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
The following pulls out the sizes and availability from results:
{
    j["size"]: j["offer"]["stock"]["quantity"]
    for j in [i for i in results if "simples" in i][0]["simples"]
}
For the first request only, the following gets the other shoes colours and their URIs into a dictionary to later loop:
colors = {
    j["node"]["color"]["name"]: j["node"]["uri"]
    for j in [
        a
        for b in [
            i["family"]["products"]["edges"]
            for i in results
            if "family" in i
            if "products" in i["family"]
        ]
        for a in b
    ]
}
This bit gets all the other colours and their availability:
for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }
Throughout, I update the dictionary final with the found colour and its associated sizes and availabilities.
Always check if a hidden API is available; it will save you a looooot of time.
In this case I found this API:
https://www.zalando.dk/api/graphql
You can pass a payload and you get a JSON answer:
import http.client

# I extracted the payload from the network tab of my browser debugging tools
payload = """[{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0S7-H11"}},{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0RY-A11"}}]"""

conn = http.client.HTTPSConnection("www.zalando.dk")
headers = {
    'content-type': "application/json"
}
conn.request("POST", "/api/graphql", payload, headers)
res = conn.getresponse()
res = res.read()  # json output
res contains, for each product, a JSON leaf with the available sizes:
"simples": [
{
"size": "38.5",
"sku": "NI112O0P5-A110060000"
},
{
"size": "44.5",
"sku": "NI112O0P5-A110105000"
},
{
...
It's now easy to extract the information.
There is also a field that indicates whether the product has a promotion, which is handy if you want to track a discount.
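Pulling the sizes out of that leaf could look like the sketch below. The raw string is a trimmed stand-in for the decoded response body; in the real answer "simples" sits deeper inside each product, so the path here is an assumption:

```python
import json

# trimmed stand-in for the decoded API response body
raw = '''{"simples": [
    {"size": "38.5", "sku": "NI112O0P5-A110060000"},
    {"size": "44.5", "sku": "NI112O0P5-A110105000"}
]}'''

data = json.loads(raw)

# each entry in "simples" is one in-stock size
sizes = [s["size"] for s in data["simples"]]
print(sizes)  # ['38.5', '44.5']
```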
I have an issue when I try to get the correct information out.
For example, I have a very big JSON output after a request I made via POST (I can't use GET).
"offers": [
{
"rank": 1,
"provider": {
"id": 6653,
"isLocalProvider": false,
"logoUrl": "https://img.vxcdn.com/i/partner-energy/c_6653.png?v=878adaf9ed",
"userRatings": {
"additonalCustomerRatings": {
"price": {
"percent": 73.80
},
"service": {
"percent": 67.50
},
"switching": {
"percent": 76.37
},
"caption": {
"text": "Zusätzliche Kundenbewertungen"
}
},
I can't show it all because it's very big.
As you can see there is "rank": 1; in this request there are 20 ranks, each with information like content and totalCost, and I need to pick them all, e.g. rank 6's content and totalCost, rank 8's content and totalCost.
So first of all, this is the Python code I use to get the JSON data:
import requests
import json

url = "https://www.verivox.de/api/energy/offers/electricity/postCode/10555/custom?"
payload = "{\"profile\":\"H0\",\"prepayment\":true,\"signupOnly\":true,\"includePackageTariffs\":true,\"includeTariffsWithDeposit\":true,\"includeNonCompliantTariffs\":true,\"bonusIncluded\":\"non-compliant\",\"maxResultsPerPage\":20,\"onlyProductsWithGoodCustomerRating\":false,\"benchmarkTariffId\":741122,\"benchmarkPermanentTariffId\":38,\"paolaLocationId\":\"71085\",\"includeEcoTariffs\":{\"includesNonEcoTariffs\":true},\"maxContractDuration\":240,\"maxContractProlongation\":240,\"usage\":{\"annualTotal\":3500,\"offPeakUsage\":0},\"priceGuarantee\":{\"minDurationInMonths\":0},\"maxTariffsPerProvider\":999,\"cancellationPeriod\":null,\"previewDisplayTime\":null,\"onlyRegionalTariffs\":false,\"sorting\":{\"criterion\":\"TotalCosts\",\"direction\":\"Ascending\"},\"includeSpecialBonusesInCalculation\":\"None\",\"totalCostViewMode\":1,\"ecoProductType\":0}"
headers = {
    'Content-Type': 'application/json',
    'Cookie': '__cfduid=d97a159bb287de284487ebdfa0fd097b41606303469; ASP.NET_SessionId=jfg3y20s31hclqywloocjamz; 0e3a873fd211409ead79e21fffd2d021=product=Power&ReturnToCalcLink=/power/&CustomErrorsEnabled=False&IsSignupWhiteLabelled=False; __RequestVerificationToken=vrxksNqu8CiEk9yV-_QHiinfCqmzyATcGg18dAqYXqR0L8HZNlvoHZSZienIAVQ60cB40aqfQOXFL9bsvJu7cFOcS2s1'
}

response = requests.request("POST", url, headers=headers, data=payload)
jsondata = response.json()
# print(response.text)
That works fine, but when I try to pick out some data I need, as I said before, I get:
for Rankdata in str(jsondata['rank']):
KeyError: 'rank'
This is my code for this error:
dataRank = []
for Rankdata in str(jsondata['rank']):
    dataRank.append({
        'tariff': Rankdata['content'],
        'cost': Rankdata['totalCost'],
        'sumOfOneTimeBonuses': Rankdata['content'],
        'savings': Rankdata['content']
    })
Then I tried another way, just getting one or a few values, but that did not work either:
data = response.json()
#print(data)
test = float((data['rank']['totalCost']['content']))
I know my code is not perfect, but this is my first time dealing with JSON that is so big and so difficult. I would be very grateful if you could show me, for my case, an example of how I can pick the rank 1 - rank 20 data and print it.
Thank you for your help.
If you look closely at the highest level of the JSON, you can see that the value for the key offers is a list of dicts. You can therefore loop through it like this:
for offer in jsondata['offers']:
    print(offer.get('rank'))
    print(offer.get('provider').get('id'))
And the same goes for the other keys in the offers.
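Building on that, collecting the fields the question mentions into a list could look like the sketch below. The sample dict stands in for jsondata, and the top-level content and totalCost keys are assumptions based on the question's description, since the full response isn't shown:

```python
# sample standing in for jsondata; the real response has 20 offers
jsondata = {
    "offers": [
        {"rank": 1, "content": "Tariff A", "totalCost": 900.50},
        {"rank": 2, "content": "Tariff B", "totalCost": 950.00},
    ]
}

# one dict per offer, using .get() so missing keys yield None instead of a KeyError
data_rank = []
for offer in jsondata["offers"]:
    data_rank.append({
        "rank": offer.get("rank"),
        "tariff": offer.get("content"),
        "cost": offer.get("totalCost"),
    })

print(data_rank)
```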
I would like to pass two parameters to my URL (status code & parent id). The JSON response of the URL request is:
{
"page": 1,
"per_page": 10,
"total": 35,
"total_pages": 4,
"data": [
{
"id": 11,
"timestamp": 1565193225660,
"status": "RUNNING",
"operatingParams": {
"rotorSpeed": 2363,
"slack": 63.07,
"rootThreshold": 0
},
"asset": {
"id": 4,
"alias": "Secondary Rotor"
},
"parent": {
"id": 2,
"alias": "Main Rotor Shaft"
}
}]
}
I would like to know how to pass the two parameters in the URL. Passing ?status=RUNNING gives the response of all the devices which have RUNNING as their status (that's pretty straightforward).
For now I have tried this:
import requests
resp = requests.get('https://jsonmock.hackerrank.com/api/iot_devices/search?status=RUNNING')
q = resp.json()
print(q)
How should I pass in parentid=2, so that it returns a response with devices which have their parent id = 2? Thank you.
It's plainly documented under "Passing Parameters in URLs" in the Requests docs.
resp = requests.get(
    'https://jsonmock.hackerrank.com/api/iot_devices/search',
    params={
        'status': 'RUNNING',
        'parentid': 2,
    },
)
To add a second get parameter, use the & separator :
import requests
resp = requests.get('https://jsonmock.hackerrank.com/api/iot_devices/search?status=RUNNING&parentid=2')
q = resp.json()
print(q)
If you want to send data via a GET request, the process is straightforward; note how the different values are separated with '&':
url?name1=value1&name2=value2
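In Python, that query string can be built with urllib.parse.urlencode from the standard library, which also percent-encodes any special characters in the values:

```python
from urllib.parse import urlencode

# build the name1=value1&name2=value2 part of the URL
params = {"name1": "value1", "name2": "value2"}
query = urlencode(params)
print("url?" + query)  # url?name1=value1&name2=value2
```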
If you are using Flask for the backend, then you can access these parameters like this:
para1=request.args.get("name1")
para2=request.args.get("name2")
On the front end you can use AJAX to send the request:
var xhttp = new XMLHttpRequest();
var url = "url?name1=value1&name2=value2";
xhttp.open("GET", url, true);
xhttp.onreadystatechange = function() {
    if (this.readyState == 4 && this.status == 200) {
        console.log(this.responseText);
    }
};
xhttp.send();
I am getting the following response from an API, and I want to extract the phone number from this object in Python. How can I do that?
{
"ParsedResults": [
{
"TextOverlay": {
"Lines": [
{
"Words": [
{
"WordText": "+971555389583", //this field
"Left": 0,
"Top": 5,
"Height": 12,
"Width": 129
}
],
"MaxHeight": 12,
"MinTop": 5
}
],
"HasOverlay": true,
"Message": "Total lines: 1"
},
"TextOrientation": "0",
"FileParseExitCode": 1,
"ParsedText": "+971555389583 \r\n",
"ErrorMessage": "",
"ErrorDetails": ""
}
],
"OCRExitCode": 1,
"IsErroredOnProcessing": false,
"ProcessingTimeInMilliseconds": "308",
"SearchablePDFURL": "Searchable PDF not generated as it was not requested."**strong text**}
Store the API response to a variable. Let's call it response.
Now convert the JSON string to a Python dictionary using the json module.
import json
response_dict = json.loads(response)
Now traverse through the response_dict to get the required text.
phone_number = response_dict["ParsedResults"][0]["TextOverlay"]["Lines"][0]["Words"][0]["WordText"]
Wherever the dictionary value is an array, [0] is used to access the first element of the array. If you want to access all elements of the array, you will have to loop through the array.
You have to parse the resulting string into a dictionary using the json library; afterwards you can traverse the result by looping over the JSON structure like this:
import json

raw_output = '{"ParsedResults": [ { "Tex...'  # your api response
json_output = json.loads(raw_output)

# iterate over all lists
phone_numbers = []
for parsed_result in json_output["ParsedResults"]:
    for line in parsed_result["TextOverlay"]["Lines"]:
        # now add all phone numbers in "Words"
        phone_numbers.extend([word["WordText"] for word in line["Words"]])

print(phone_numbers)
You may want to check that all the keys exist within that process, depending on the API you use, like this:
# ...
for line in parsed_result["TextOverlay"]["Lines"]:
    if "Words" in line:  # make sure key exists
        phone_numbers.extend([word["WordText"] for word in line["Words"]])
# ...