How to request all sizes in stock - Python

I'm trying to request all the sizes in stock from Zalando. I can't quite figure out how to do it, since the video I'm watching
shows how to request sizes on a page that looks different from mine.
The video I watched was this: Video - 5.30
Does anyone know how to request the sizes in stock and print the ones that are in stock?
The site I'm trying to request sizes from: here
My code looks like this:
import requests
from bs4 import BeautifulSoup as bs

session = requests.session()


def get_sizes_in_stock():
    global session
    endpoint = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
    response = session.get(endpoint)
    soup = bs(response.text, "html.parser")
I have tried going to View page source and looking for the sizes, but I could not see them in the page source.
I hope someone out there can help me figure out what to do.

The sizes are in the page. I found them in the html, in a JavaScript tag, in this format:
{
    "sku": "NI112O0BT-A110090000",
    "size": "42.5",
    "deliveryOptions": [
        {
            "deliveryTenderType": "FASTER"
        }
    ],
    "offer": {
        "price": {
            "promotional": null,
            "original": {
                "amount": 114500
            },
            "previous": null,
            "displayMode": null
        },
        "merchant": {
            "id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
        },
        "selectionContext": null,
        "isMeaningfulOffer": true,
        "displayFlags": [],
        "stock": {
            "quantity": "MANY"
        },
        "allOffers": [
            {
                "price": {
                    "promotional": null,
                    "original": {
                        "amount": 114500
                    },
                    "previous": null,
                    "displayMode": null
                },
                "merchant": {
                    "id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
                },
                "selectionContext": null,
                "isMeaningfulOffer": true,
                "displayFlags": [],
                "stock": {
                    "quantity": "MANY"
                },
                "deliveryOptions": [
                    {
                        "deliveryWindow": "2022-05-23 - 2022-05-25"
                    }
                ],
                "fulfillment": {
                    "kind": "ZALANDO"
                }
            }
        ]
    }
}
If you parse the html with bs4 you should be able to find the script tag and extract the JSON.
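For example, a minimal sketch (assuming the data sits in a script tag of type application/json, as it did at the time of writing; the probe strings are just heuristics):

import json
import requests
from bs4 import BeautifulSoup as bs

link = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
headers = {"User-Agent": "Mozilla/5.0"}

soup = bs(requests.get(link, headers=headers).text, "html.parser")

# Probe each JSON <script> tag for keys we expect to sit near the sizes
for tag in soup.select('script[type="application/json"]'):
    if '"sku"' in tag.text and '"size"' in tag.text:
        try:
            data = json.loads(tag.text)
        except ValueError:
            continue  # not pure JSON; fall back to re extraction as shown below
        break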

The sizes for the default colour of shoe are shown in the html. Alongside these are the urls for the other colours. You can extract those into a dictionary and loop over it, making requests and pulling the different colours and their availability, which I think is what you are actually after, as follows (note: I have kept things quite generic to avoid hardcoding keys which change across requests):
import requests, re, json


def get_color_results(link):
    headers = {"User-Agent": "Mozilla/5.0"}
    r = requests.get(link, headers=headers).text
    data = json.loads(re.search(r'(\{"enrichedEntity".*size.*)<\/script', r).group(1))
    results = []
    color = ""
    for i in data["graphqlCache"]:
        if "ern:product" in i:
            if "product" in data["graphqlCache"][i]["data"]:
                if "name" in data["graphqlCache"][i]["data"]["product"]:
                    results.append(data["graphqlCache"][i]["data"]["product"])
                if (
                    color == ""
                    and "color" in data["graphqlCache"][i]["data"]["product"]
                ):
                    color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
    return (color, results)


link = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
final = {}
color, results = get_color_results(link)

colors = {
    j["node"]["color"]["name"]: j["node"]["uri"]
    for j in [
        a
        for b in [
            i["family"]["products"]["edges"]
            for i in results
            if "family" in i
            if "products" in i["family"]
        ]
        for a in b
    ]
}

final[color] = {
    j["size"]: j["offer"]["stock"]["quantity"]
    for j in [i for i in results if "simples" in i][0]["simples"]
}

for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }

print(final)
Explanatory notes from chat:
1. Use the Chrome browser to navigate to the link
2. Press Ctrl + U to view the page source
3. Press Ctrl + F and search for 38.5 in the html
4. The first match is the long string you already know about. The string is long and difficult to navigate in the page source, and it is hard to identify which tag it is part of. There are a number of ways I could identify the right script, but for now an easy way would be:
import requests
from bs4 import BeautifulSoup as bs

link = 'https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html'
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(link, headers=headers)
soup = bs(r.text, 'lxml')

for i in soup.select('script[type="application/json"]'):
    if '38.5' in i.text:
        print(i)
        break
A slower method would be:
soup.find("script", text=re.compile(r'.*38.5.*'))
5. Whilst I used bs4 to get the right script tag contents, this was so I knew the start and end of the string denoting the JavaScript object I wanted, so that I could use re to extract it and then deserialize it into a JSON object with json. I then did a re-write to use re rather than bs4, i.e. applied re to the entire response text from the request, with a regex pattern that pulls out the same string
6. I put the entire page source in a regex tool and wrote a regex to return that same string as identified above. See that regex here
7. Click on the right hand side, match 1 group 1, to see highlighted the same string being returned by the regex as you saw with BeautifulSoup. Two different ways of getting the same string containing the sizes
8. That is the string whose structure I needed to examine as JSON. See it in a json viewer here
9. You will notice the JSON is very nested, with some keys to dictionaries that are likely dynamic, meaning I needed to write code which could traverse the JSON and use certain more stable keys to pull out the colours available and, for the default shoe colour, the sizes and availability
10. There is an expand all button in that JSON viewer. You can then search with Ctrl + F for 38.5 again
10a) I noticed that the size and availability were for the default shoe colour
10b) I also noticed that, within the JSON, if I searched by one of the other colours from the dropdown, I could find URIs for each colour of shoe listed
11. I used Wolf as my search term (as I suspected fewer matches for that term within the JSON)
12. You can see one of the alternate colours and its URI listed above
13. I visited that URI and found the availability and shoe sizes for that colour in the same place as I did for the default white shoes
14. I realised I could make an initial request and get the default colour and sizes with availability, and from that same request extract the other colours and their URIs
15. I could then make requests to those other URIs and re-use my existing code to extract the sizes/availability for the new colours
16. This is why I created my get_color_results() function. It is the re-usable code that extracts the sizes and availability from each page
17. results holds all the matches within the JSON for certain keys I look for in order to navigate to the right place to get the sizes and availabilities, as well as the current colour
18. This code traverses the JSON to get to the right place to extract the data I want to use later:
results = []
color = ""

for i in data["graphqlCache"]:
    if "ern:product" in i:
        if "product" in data["graphqlCache"][i]["data"]:
            if "name" in data["graphqlCache"][i]["data"]["product"]:
                results.append(data["graphqlCache"][i]["data"]["product"])
            if (
                color == ""
                and "color" in data["graphqlCache"][i]["data"]["product"]
            ):
                color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
The following pulls out the sizes and availability from results:
{
    j["size"]: j["offer"]["stock"]["quantity"]
    for j in [i for i in results if "simples" in i][0]["simples"]
}
For the first request only, the following gets the other shoes colours and their URIs into a dictionary to later loop:
colors = {
    j["node"]["color"]["name"]: j["node"]["uri"]
    for j in [
        a
        for b in [
            i["family"]["products"]["edges"]
            for i in results
            if "family" in i
            if "products" in i["family"]
        ]
        for a in b
    ]
}
This bit gets all the other colours and their availability:
for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }
Throughout, I update the dictionary final with each found colour and its associated sizes and availabilities.
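For illustration, final ends up with roughly this shape (the colour names and quantity labels here are hypothetical, not actual scraped values):

# Hypothetical shape of the final dictionary (not actual scraped values):
final = {
    "white": {"38.5": "MANY", "42.5": "MANY"},
    "wolf grey": {"40": "MANY", "44": "ONE"},
}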

Always check whether a hidden API is available; it will save you a lot of time.
In this case I found this API:
https://www.zalando.dk/api/graphql
You can pass it a payload and you get a JSON answer back:
import http.client

# I extracted the payload from the network tab of my browser's debugging tools
payload = """[{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0S7-H11"}},{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0RY-A11"}}]"""

conn = http.client.HTTPSConnection("www.zalando.dk")
headers = {
    'content-type': "application/json"
}
conn.request("POST", "/api/graphql", payload, headers)
res = conn.getresponse()
res = res.read()  # json output
res contains, for each product, a json leaf listing the available sizes:
"simples": [
{
"size": "38.5",
"sku": "NI112O0P5-A110060000"
},
{
"size": "44.5",
"sku": "NI112O0P5-A110105000"
},
{
...
It's now easy to extract the information. There is also a field that indicates whether the product has a promotion, which is handy if you want to track a discount.
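For instance, a minimal sketch that parses the answer and collects the sizes; the exact nesting around each simples leaf is an assumption based on the snippet above, so adjust the keys to whatever the real response looks like:

import json

products = json.loads(res)  # res is the raw bytes read above

for entry in products:
    # Assumed nesting: each entry wraps a product under "data" -> "product";
    # adjust these keys to the real response shape.
    product = entry.get("data", {}).get("product", {})
    for simple in product.get("simples", []):
        print(simple["size"], simple["sku"])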

Related

Trying to get specific values from json

Can I get specific values from the JSON?
I'm trying to get the id, but I don't know exactly how to do it.
This is the JSON:
{
    "searchType": "Movie",
    "expression": "Lord Of The Rings 2022.json",
    "results": [{
        "id": "tt18368278",
        "resultType": "Title",
        "image": "https://m.media-amazon.com/images/M/MV5BZTMwZjUzNGMtMWI3My00ZGJmLWFmYWEtYjk2YWYxYzI2NWRjXkEyXkFqcGdeQXVyODY0NzcxNw##._V1_Ratio1.7600_AL_.jpg",
        "title": "The Lord of the Rings Superfans Review the Rings of Power",
        "description": "(2022 Video)"
    }],
    "errorMessage": ""
}
I just want to get the values of results, but I want to get specific values from them, for example the id.
This is my code:
import requests
import json

movie = input("Movies:")
base_url = "https://imdb-api.com/en/API/SearchMovie/myapi/" + movie
r = requests.get(base_url)
b = r.json()
print(b['results'])
Assuming your JSON is valid, and to accommodate more than one result, you could do:
[...]
r = requests.get(base_url)
b = r.json()

for result in b['results']:
    print(result['id'])
To get just one item (first item from array), you can do:
print(b['results'][0]['id'])
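If the search can come back empty, it may also be worth guarding that lookup, for example:

results = b.get('results') or []
if results:
    print(results[0]['id'])
else:
    print("No results found for:", movie)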
Requests documentation: https://requests.readthedocs.io/en/latest/

Get json data from multiple api pages into one main json output

I'm trying to get the JSON data from every page on an API and put it into one big JSON output.
(Docs for the API I'm using: https://docs.scoresaber.com/#/Leaderboards/get_api_leaderboards)
When doing the following API call:
https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true
I get the object metadata, which has total and itemsPerPage.
Example:
"metadata": {
"total": 193,
"page": 1,
"itemsPerPage": 14
}
So 193/14, rounded up, means I get 14 pages.
This means I can iterate through all pages by doing a request for each page with this API call: https://scoresaber.com/api/leaderboards?qualified=true&page=2
and so on, until I get to &page=14.
Each page will result in this json (trimmed example):
{
    "leaderboards": [
        {
            "id": 466447,
            "songHash": "E527C82AF2DEC46A23F12D742035D76CCA875904",
            "songName": "Parasite",
            "songSubName": "(feat. Hatsune Miku)",
            "songAuthorName": "DECO*27",
            "levelAuthorName": "Alice",
            "difficulty": {
                "leaderboardId": 466447,
                "difficulty": 1,
                "gameMode": "SoloStandard",
                "difficultyRaw": "_Easy_SoloStandard"
            },
            "maxScore": 0,
            "createdDate": "2022-06-01T17:16:52.000Z",
            "rankedDate": null,
            "qualifiedDate": "2022-06-14T05:53:21.000Z",
            "lovedDate": null,
            "ranked": false,
            "qualified": true,
            "loved": false,
            "maxPP": -1,
            "stars": 0,
            "plays": 70,
            "dailyPlays": 0,
            "positiveModifiers": false,
            "playerScore": null,
            "coverImage": "https://cdn.scoresaber.com/covers/E527C82AF2DEC46A23F12D742035D76CCA875904.png",
            "difficulties": null
        }
    ],
    "metadata": {
        "total": 193,
        "page": 2,
        "itemsPerPage": 14
    }
}
So what I want is to loop through all the pages and collect every item in leaderboards into one json.
This is what I've tried:
import requests
import math
import json

response = requests.get("https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true")
api = json.loads(response.text)
pages = math.ceil(api['metadata']['total'] / api['metadata']['itemsPerPage'])

api = {}
for page in range(1, pages + 1):
    api.update(json.loads(requests.get(f"https://scoresaber.com/api/leaderboards?qualified=true&page={page}").text))

api = json.dumps(api, indent=4)
But that seems to only get the last page and just overwrite the dictionary (I'm also not sure if I need to declare api as a dict).
So I'm just not sure what is going wrong: whether I'm declaring stuff wrongly, requesting the API wrongly, putting stuff into the dict wrongly, etc.
If I understand you correctly, you want to receive all the data in one big list:
import json
import math
import requests

url1 = (
    "https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true"
)
url2 = "https://scoresaber.com/api/leaderboards?qualified=true&page={}"

api = requests.get(url1).json()
pages = math.ceil(api["metadata"]["total"] / api["metadata"]["itemsPerPage"])

all_data = []
for page in range(1, pages + 1):
    data = requests.get(url2.format(page)).json()
    all_data.extend(data["leaderboards"])

print(json.dumps(all_data, indent=4))
This will print all 193 items from all pages:
[
    {
        "id": 484864,
        "songHash": "80559A7A4AC0F62F27DAF1C59DF67F305250ADFF",
        "songName": "Phony",
        "songSubName": "feat. KAFU (Hoshimachi Suisei Cover)",
        "songAuthorName": "Tsumiki",
        "levelAuthorName": "Joshabi & Shad",
        ...
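As for why the original attempt only kept the last page: dict.update replaces the values of keys that already exist, and every page response has the same two top-level keys. A tiny demonstration:

api = {}
api.update({"leaderboards": ["items from page 1"], "metadata": {"page": 1}})
api.update({"leaderboards": ["items from page 2"], "metadata": {"page": 2}})

# Both keys already existed, so the second update overwrote their values:
print(api)  # {'leaderboards': ['items from page 2'], 'metadata': {'page': 2}}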

Python: issue with getting correct info from the json

I have an issue when I try to get the correct information.
For example, I have a very big json output after a request I made with POST (I can't use GET).
"offers": [
{
"rank": 1,
"provider": {
"id": 6653,
"isLocalProvider": false,
"logoUrl": "https://img.vxcdn.com/i/partner-energy/c_6653.png?v=878adaf9ed",
"userRatings": {
"additonalCustomerRatings": {
"price": {
"percent": 73.80
},
"service": {
"percent": 67.50
},
"switching": {
"percent": 76.37
},
"caption": {
"text": "Zusätzliche Kundenbewertungen"
}
},
I can't show it all because it's very big.
As you can see there is "rank": 1; in this response there are 20 ranks, each with information like content and totalCost, and I need to pick them all, e.g. rank 6's content and totalCost, rank 8's content and totalCost.
So first of all, this is the code I use in Python for getting the json data:
import requests
import json

url = "https://www.verivox.de/api/energy/offers/electricity/postCode/10555/custom?"

payload = "{\"profile\":\"H0\",\"prepayment\":true,\"signupOnly\":true,\"includePackageTariffs\":true,\"includeTariffsWithDeposit\":true,\"includeNonCompliantTariffs\":true,\"bonusIncluded\":\"non-compliant\",\"maxResultsPerPage\":20,\"onlyProductsWithGoodCustomerRating\":false,\"benchmarkTariffId\":741122,\"benchmarkPermanentTariffId\":38,\"paolaLocationId\":\"71085\",\"includeEcoTariffs\":{\"includesNonEcoTariffs\":true},\"maxContractDuration\":240,\"maxContractProlongation\":240,\"usage\":{\"annualTotal\":3500,\"offPeakUsage\":0},\"priceGuarantee\":{\"minDurationInMonths\":0},\"maxTariffsPerProvider\":999,\"cancellationPeriod\":null,\"previewDisplayTime\":null,\"onlyRegionalTariffs\":false,\"sorting\":{\"criterion\":\"TotalCosts\",\"direction\":\"Ascending\"},\"includeSpecialBonusesInCalculation\":\"None\",\"totalCostViewMode\":1,\"ecoProductType\":0}"

headers = {
    'Content-Type': 'application/json',
    'Cookie': '__cfduid=d97a159bb287de284487ebdfa0fd097b41606303469; ASP.NET_SessionId=jfg3y20s31hclqywloocjamz; 0e3a873fd211409ead79e21fffd2d021=product=Power&ReturnToCalcLink=/power/&CustomErrorsEnabled=False&IsSignupWhiteLabelled=False; __RequestVerificationToken=vrxksNqu8CiEk9yV-_QHiinfCqmzyATcGg18dAqYXqR0L8HZNlvoHZSZienIAVQ60cB40aqfQOXFL9bsvJu7cFOcS2s1'
}

response = requests.request("POST", url, headers=headers, data=payload)
jsondata = response.json()
# print(response.text)
That works fine, but when I try to pick some of the data I need, like I said before, this line:
for Rankdata in str(jsondata['rank']):
raises:
KeyError: 'rank'
This is my code around that error:
dataRank = []
for Rankdata in str(jsondata['rank']):
    dataRank.append({
        'tariff': Rankdata['content'],
        'cost': Rankdata['totalCost'],
        'sumOfOneTimeBonuses': Rankdata['content'],
        'savings': Rankdata['content']
    })
Then I tried another way, just getting one or a few values, but that is not working either:
data = response.json()
#print(data)
test = float((data['rank']['totalCost']['content']))
I know my code is not perfect, but this is the first time I am dealing with JSON that is this big and this difficult. I would be very grateful if you could show me, for my case, an example of how to pick the rank 1 - rank 20 data and print it.
Thank you for your help.
If you look closely at the highest level in the json, you can see that the value for key offers is a list of dicts. You can therefore loop through it like this:
for offer in jsondata['offers']:
    print(offer.get('rank'))
    print(offer.get('provider').get('id'))
And the same goes for other keys in the offers.
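Building on that, a hedged sketch of collecting all 20 ranks into a list (whether content and totalCost live at the top level of each offer is an assumption based on the question; adjust the paths to the real structure):

dataRank = []
for offer in jsondata['offers']:
    dataRank.append({
        'rank': offer.get('rank'),
        'provider': offer.get('provider', {}).get('id'),
        'tariff': offer.get('content'),    # assumed key, taken from the question
        'cost': offer.get('totalCost'),    # assumed key, taken from the question
    })

print(dataRank)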

Looping several thousand API urls and adding the results to a list

Problem: the output of this code seems to repeat a lot of the same entries in the final list, making it exponentially longer.
The goal is to complete the query and then print the final list of every city within each region:
[
    {
        "name": "Herat",
        "id": "AF~HER~Herat"
    }
]
[
    {
        "name": "Herat",
        "id": "AF~HER~Herat"
    },
    {
        "name": "Kabul",
        "id": "AF~KAB~Kabul"
    }
]
[
    {
        "name": "Herat",
        "id": "AF~HER~Herat"
    },
    {
        "name": "Kabul",
        "id": "AF~KAB~Kabul"
    },
    {
        "name": "Kandahar",
        "id": "AF~KAN~Kandahar"
    }
]
My goal is to get a list of city IDs. First I do a GET request and parse the JSON response to get the country IDs into a list.
Second: I have a for loop which makes another GET request for the region IDs, but I now need to add the country ID to the API url. I do that by using .format on the GET request, iterate through all the countries and their respective region IDs, parse them, and store them in a list.
Third: I have another for loop which makes another GET request for the city IDs, loops through all cities using the region ID list above, and collects the city IDs that I actually need.
Code :
from requests.auth import HTTPBasicAuth
import requests
import json
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
def countries():
data = requests.get("https://localhost/api/netim/v1/countries/", verify=False, auth=HTTPBasicAuth("admin", "admin"))
rep = data.json()
a = []
for elem in rep['items']:
a.extend([elem.get("id","")])
print(a)
return a
def regions():
ids = []
for c in countries():
url = requests.get("https://localhost/api/netim/v1/countries/{}/regions".format(c), verify=False, auth=HTTPBasicAuth("admin", "admin"))
response = url.json()
for cid in response['items']:
ids.extend([cid.get("id","")])
data = []
for r in ids:
url = requests.get("https://localhost/api/netim/v1/regions/{}/cities".format(r), verify=False, auth=HTTPBasicAuth("admin", "admin"))
response = url.json()
data.extend([{"name":r.get("name",""),"id":r.get("id", "")} for r in response['items']])
print(json.dumps(data, indent=4))
return data
regions()
print(regions())
You will see the output contains several copies of the same entry.
I'm not a programmer, so I'm not sure where I'm getting this wrong.
It looks as though the output you're concerned with might be due to the fact that you're printing data as you iterate through it in the regions() method.
Try removing the line:
print(json.dumps(data, indent=4))
Also, and more importantly, you're setting data to an empty list every time you iterate over an item from countries(). You should probably declare that variable before the initial loop.
You're already printing the final result when you call the function, so printing as you iterate only really makes sense if you're debugging and need to review the data as you go.
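Putting both suggestions together, a minimal restructuring sketch (same endpoints and credentials as your code; data is declared once and printed once at the end):

def regions():
    ids = []
    for c in countries():
        url = requests.get("https://localhost/api/netim/v1/countries/{}/regions".format(c),
                           verify=False, auth=HTTPBasicAuth("admin", "admin"))
        for cid in url.json()['items']:
            ids.extend([cid.get("id", "")])
    data = []  # declared once, before the loop over region ids
    for r in ids:
        url = requests.get("https://localhost/api/netim/v1/regions/{}/cities".format(r),
                           verify=False, auth=HTTPBasicAuth("admin", "admin"))
        data.extend([{"name": city.get("name", ""), "id": city.get("id", "")}
                     for city in url.json()['items']])
    return data


print(json.dumps(regions(), indent=4))  # print the final result once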

Issue Parsing a site with lxml and xpath in python

I think I am messing up my xpath. What I am trying to do is get the information from each row of the table on this page.
This is what I have so far, but it's not outputting what I'm looking for:
import requests
from lxml import etree

r = requests.get('http://mtgoclanteam.com/Cards?edition=DTK')
doc = etree.HTML(r.text)

# get list of cards
cards = [card for card in doc.xpath('id("cardtable")/x:tbody/x:tr[1]/x:td[3]')]
for card in cards:
    print(card)
The primary problem here is that the actual document served from the server contains an empty table:
<table id="cardtable" class="cardlist"/>
The data is filled in after the page loads by the embedded javascript that follows the empty table element:
<script>
$('#cardtable').dataTable({
    "aLengthMenu": [[25, 100, -1], [25, 100, "All"]],
    "bDeferRender": true,
    "aaSorting": [],
    "bPaginate": false,
    "aaData": [
        ...DATA IS HERE...
    ],
    "aoColumns": [
        { "sTitle": "Card name", "sWidth": "260" },
        { "sTitle": "Rarity", "sWidth": "40" },
        { "sTitle": "Buy", "sWidth": "80" },
        { "sTitle": "Sell", "sWidth": "80" },
        { "sTitle": "Bots with stock" }]
})
</script>
The data itself is contained in the aaData element of the dictionary that is passed to the dataTable() method. Extracting this in Python is going to be tricky (this isn't just a JSON document). Possibly a suitable regular expression applied to the script text would get you what you want (or just iterate over the lines of the script and take the one after the aaData key).
For example:
For example:
import pprint
import json
import requests
from lxml import etree
r = requests.get('http://mtgoclanteam.com/Cards?edition=DTK')
doc = etree.HTML(r.text)
script = doc.xpath('id("templatemo_content")/script')[0].text
found = False
result = None
for line in script.splitlines():
if found:
if '[' in line:
result=line
break
if 'aaData' in line:
found = True
if result:
result =json.loads('[' + result + ']')
pprint.pprint(result)
This is ugly and fragile (it would break if the format of the script changed), but it works for the current input.
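An alternative would be a regular expression over the whole script text. This sketch assumes the aaData array is followed directly by the aoColumns key, as in the snippet above, and that the array itself parses as valid JSON:

import re

m = re.search(r'"aaData":\s*(\[.*?\])\s*,\s*"aoColumns"', script, re.DOTALL)
if m:
    result = json.loads(m.group(1))
    pprint.pprint(result)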
