How to print JSON info with Python?

I have a JSON document (url = http://open.data.amsterdam.nl/ivv/parkeren/locaties.json) and I want to print every 'title', 'adres' and 'postcode' in it. How can I do that?
I want to print it like this:
title.
adres.
postcode.
title.
adres.
postcode.
so each value on its own line, one below the other.
I hope you can help me with this.
import urllib, json
url = "http://open.data.amsterdam.nl/ivv/parkeren/locaties.json"
import requests
search = requests.get(url).json()
print(search['title'])
print(search['adres'])
print(search['postcode'])

Using print(json.dumps(search, indent=4)) you can see that the structure is:
{
"parkeerlocaties": [
{
"parkeerlocatie": {
"title": "Fietsenstalling Tolhuisplein",
"Locatie": "{\"type\":\"Point\",\"coordinates\":[4.9032801,52.3824545]}",
...
}
},
{
"parkeerlocatie": {
"title": "Fietsenstalling Paradiso",
"Locatie": "{\"type\":\"Point\",\"coordinates\":[4.8833735,52.3621851]}",
...
}
},
So to access the inner properties, you need to follow the JSON path:
import requests
url = 'http://open.data.amsterdam.nl/ivv/parkeren/locaties.json'
search = requests.get(url).json()
# Each list item wraps the actual record in a 'parkeerlocatie' key
for parkeerlocatie in search["parkeerlocaties"]:
    content = parkeerlocatie['parkeerlocatie']
    print(content['title'])
    print(content['adres'])
    print(content['postcode'])
    print()
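If some entries lack one of the keys, indexing with [] raises KeyError. Below is a minimal sketch of the same loop using dict.get, run on a small inline sample shaped like the structure above. The helper name format_locations is made up for illustration, and the adres and postcode values are hypothetical placeholders, not data from the feed.

```python
# Inline sample shaped like the feed; the 'adres' and 'postcode' values are
# hypothetical placeholders, not real data from the feed.
sample = {
    "parkeerlocaties": [
        {"parkeerlocatie": {"title": "Fietsenstalling Tolhuisplein",
                            "adres": "Example street 1",
                            "postcode": "0000 AA"}},
        {"parkeerlocatie": {"title": "Fietsenstalling Paradiso"}},
    ]
}


def format_locations(data, fields=("title", "adres", "postcode")):
    """Return one text block per location, one field per line."""
    blocks = []
    for entry in data.get("parkeerlocaties", []):
        content = entry.get("parkeerlocatie", {})
        # .get() returns the fallback instead of raising KeyError
        lines = [str(content.get(field, "(missing)")) for field in fields]
        blocks.append("\n".join(lines))
    return blocks


blocks = format_locations(sample)
print("\n\n".join(blocks))
```

With the real feed you would pass the result of requests.get(url).json() instead of sample.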

Related

Get specific information out of application/ld+json

I am looking to get just the "ratingValue" and "reviewCount" from the following application/ld+json but cannot figure out how to do this after looking through numerous How-to's, so I've essentially given up. In advance thank you for your help.
Sample of the application/ld+json
{
"#context": "http://schema.org",
"#graph": [
{
"#type": "Product",
"name": "MERV 8 Replacement for Trion Air Bear 20x20x5 (19.63x20.13x4.88) ",
"description": "MERV 8 Replacement for Trion Air Bear 20x20x5 (19.63x20.13x4.88) - FilterBuy.com",
"productID": 30100,
"sku": "ABR20x20x5M8",
"mpn": "ABR20x20x5M8",
"url": "https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20/merv-8/",
"itemCondition": "new",
"brand": "FilterBuy",
"image": "https://filterbuy.com/media/pla_images/20x25x5AB/20x25x5AB-m8-(x1).jpg",
"aggregateRating": {
"#type": "AggregateRating",
"ratingValue": 4.79926,
"reviewCount": 538}
My Code:
from bs4 import BeautifulSoup
import bs4
import requests
import json
import re
import numpy as np
import csv
urls = ['https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20',
        'https://filterbuy.com/brand/trion-air-bear-air-filters/16x25x5-air-bear-1400/?selected_merv=11']
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    mervs = BeautifulSoup(response.text, 'lxml').find_all('strong')
    product = BeautifulSoup(response.text, 'lxml').find("h1", class_="text-center")
    jsonString = soup.find_all('script', type='application/ld+json')[1].text
    json_schema = soup.find_all('script', attrs={'type': 'application/ld+json'})[1]
    json_file = json.loads(json_schema.get_text())
    for i, cart in enumerate(BeautifulSoup(response.text, 'lxml').find_all('form', class_='cart')):
        for tax in cart.attrs:
            if 'data-price' in tax:
                print(product.text, mervs[i].get_text(), [tax], cart[tax], json_file)
In Python, pretty much anything can be nested inside other things. This is an example of lists and dictionaries nested inside a dictionary. You can get the value by working out what you need to do at each level.
Start by assigning the above dictionary to a variable, like the_dict. You want to access the "#graph" key, then access the first item in the list it returns, then access "aggregateRating". From there, you can get both the values you want. Your code may look something like this:
the_dict = ...
d = the_dict['#graph'][0]['aggregateRating']
rating_value, review_count = d['ratingValue'], d['reviewCount']
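As a runnable illustration of that level-by-level lookup, here is a sketch that parses a trimmed copy of the sample from the question with json.loads. The keys are kept exactly as they appear in the sample above; the variable names raw and rating are only for illustration.

```python
import json

# Trimmed copy of the sample from the question (keys kept as shown there).
raw = """
{
  "#context": "http://schema.org",
  "#graph": [
    {
      "#type": "Product",
      "sku": "ABR20x20x5M8",
      "aggregateRating": {
        "#type": "AggregateRating",
        "ratingValue": 4.79926,
        "reviewCount": 538
      }
    }
  ]
}
"""

the_dict = json.loads(raw)

# Level 1: the list under "#graph"; level 2: its first item;
# level 3: the nested "aggregateRating" dictionary.
rating = the_dict["#graph"][0]["aggregateRating"]
rating_value, review_count = rating["ratingValue"], rating["reviewCount"]
print(rating_value, review_count)
```

In the question's code, json_file already holds the parsed dictionary, so it could take the place of the_dict here.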

Different content when accessing a website with requests

I am trying to get the corresponding handle IDs in ARIN automatically from a company's name, like "Google".
https://search.arin.net/rdap/?query=google*
My naive approach is to use requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup
html = 'https://search.arin.net/rdap/?query='
comp = 'google*'
r = requests.get(html + comp)
soup = BeautifulSoup(r.text, 'html.parser')
#example search
search = soup.body.find_all(text = "Handle$")
However, I do not get the same output when I am using requests as when I simply use Google Chrome. The html code that is returned by requests is different and I cannot access the corresponding handles.
Does anyone know how to change the code?
The data you see on the page is loaded from an external API URL, so it is not in the HTML that requests receives. You can use the requests module to call that API directly:
import json
import requests
api_url = "https://rdap.arin.net/registry/entities"
params = {"fn": "google*"}
data = requests.get(api_url, params=params).json()
# pretty print the data:
print(json.dumps(data, indent=4))
Prints:
...
{
"handle": "GF-231",
"vcardArray": [
"vcard",
[
[
"version",
{},
"text",
"4.0"
],
[
"fn",
{},
"text",
"GOOGLE FIBER INC"
],
[
"adr",
{
"label": "3425 MALONE DR\nCHAMBLEE\nGA\n30341\nUnited States"
},
"text",
[
"",
"",
"",
"",
"",
"",
""
]
],
[
"kind",
{},
"text",
"org"
]
]
],
...
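To pull the handle and the organisation name out of output like this, one option is to walk the vcardArray of each entity. The sketch below assumes entity is a single object shaped like the excerpt above; the helper name vcard_field is made up for illustration, and the sample is a trimmed copy of the printed excerpt rather than a live response.

```python
def vcard_field(entity, name):
    """Return the value of the first vCard property called `name`, or None."""
    vcard = entity.get("vcardArray", [])
    # vcardArray looks like ["vcard", [[name, params, type, value], ...]]
    properties = vcard[1] if len(vcard) > 1 else []
    for prop in properties:
        if len(prop) >= 4 and prop[0] == name:
            return prop[3]
    return None


# Trimmed copy of one entity from the output above.
entity = {
    "handle": "GF-231",
    "vcardArray": [
        "vcard",
        [
            ["version", {}, "text", "4.0"],
            ["fn", {}, "text", "GOOGLE FIBER INC"],
            ["kind", {}, "text", "org"],
        ],
    ],
}

handle = entity["handle"]
full_name = vcard_field(entity, "fn")
print(handle, full_name)
```

Where the list of entities sits inside data is not visible in the excerpt, so check the pretty-printed output for the enclosing key before looping over it.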

How to get text within <script> tag

I am scraping the LaneBryant website.
Part of the source code is
<script type="application/ld+json">
{
"#context": "http://schema.org/",
"#type": "Product",
"name": "Flip Sequin Teach & Inspire Graphic Tee",
"image": [
"http://lanebryant.scene7.com/is/image/lanebryantProdATG/356861_0000015477",
"http://lanebryant.scene7.com/is/image/lanebryantProdATG/356861_0000015477_Back"
],
"description": "Get inspired with [...]",
"brand": "Lane Bryant",
"sku": "356861",
"offers": {
"#type": "Offer",
"url": "https://www.lanebryant.com/flip-sequin-teach-inspire-graphic-tee/prd-356861",
"priceCurrency": "USD",
"price":"44.95",
"availability": "http://schema.org/InStock",
"itemCondition": "https://schema.org/NewCondition"
}
}
}
}
</script>
In order to get price in USD, I have written this script:
def getPrice(self,start):
    fprice=[]
    discount = ""
    price1 = start.find('script', {'type': 'application/ld+json'})
    data = ""
    #print("price 1 is + "+ str(price1)+"data is "+str(data))
    price1 = str(price1).split(",")
    #price1=str(price1).split(":")
    print("final price +"+ str(price1[11]))
where start is :
d = webdriver.Chrome('/Users/fatima.arshad/Downloads/chromedriver')
d.get(url)
start = BeautifulSoup(d.page_source, 'html.parser')
It doesn't print the price even though I am getting correct text. How do I get just the price?
In this instance you can just regex for the price
import requests, re
r = requests.get('https://www.lanebryant.com/flip-sequin-teach-inspire-graphic-tee/prd-356861#color/0000015477', headers = {'User-Agent':'Mozilla/5.0'})
p = re.compile(r'"price":"(.*?)"')
print(p.findall(r.text)[0])
Otherwise, target the appropriate script tag by id and then parse its .text with the json library:
import requests, json
from bs4 import BeautifulSoup
r = requests.get('https://www.lanebryant.com/flip-sequin-teach-inspire-graphic-tee/prd-356861#color/0000015477', headers = {'User-Agent':'Mozilla/5.0'})
start = BeautifulSoup(r.text, 'html.parser')
data = json.loads(start.select_one('#pdpInitialData').text)
price = data['pdpDetail']['product'][0]['price_range']['sale_price']
print(price)
price1 = start.find('script', {'type': 'application/ld+json'})
This is actually the <script> tag, so a better name would be
script_tag = start.find('script', {'type': 'application/ld+json'})
You can access the text inside the script tag using .text. That will give you the JSON in this case.
json_string = script_tag.text
Instead of splitting by commas, use a JSON parser to avoid misinterpretations:
import json
clothing=json.loads(json_string)
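From there the price is an ordinary nested lookup. Here is a sketch on a trimmed copy of the JSON from the question, with the surplus closing braces dropped so that it parses and the keys kept as they appear there:

```python
import json

json_string = """
{
  "#context": "http://schema.org/",
  "#type": "Product",
  "name": "Flip Sequin Teach & Inspire Graphic Tee",
  "sku": "356861",
  "offers": {
    "#type": "Offer",
    "priceCurrency": "USD",
    "price": "44.95"
  }
}
"""

clothing = json.loads(json_string)
offer = clothing["offers"]
# The price is stored as a string in this markup; convert it if you need a number.
price = float(offer["price"])
print(price, offer["priceCurrency"])
```

Unlike splitting on commas and picking index 11, this keeps working if the site reorders or adds fields.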

parsing and getting list from response of get request

I'm trying to parse a website with the requests module:
import requests
some_data = {'a':'',
'b':''}
with requests.Session() as s:
    result = s.post('http://website.com',data=some_data)
    print(result.text)
The page is responding as below:
{
"arrangetype":"U",
"list": [
{
"product_no":43,
"display_order":4,
"is_selling":"T",
"product_empty":"F",
"fix_position":null,
"is_auto_sort":false
},
{
"product_no":44,
"display_order":6,
"is_selling":"T",
"product_empty":"F",
"fix_position":null,
"is_auto_sort":false
}
],
"length":2
}
I found that instead of parsing full HTML, it would be better to deal with the response as all the data I want is in that response.
What I want to get is a list of the values of product_no, so the expected result is:
[43,44]
How do I do this?
Convert your JSON response to a dictionary with json.loads(), and collect your results in a list comprehension.
Demo:
from json import loads
data = """{
"arrangetype":"U",
"list": [
{
"product_no":43,
"display_order":4,
"is_selling":"T",
"product_empty":"F",
"fix_position":null,
"is_auto_sort":false
},
{
"product_no":44,
"display_order":6,
"is_selling":"T",
"product_empty":"F",
"fix_position":null,
"is_auto_sort":false
}
],
"length":2
}"""
json_dict = loads(data)
print([x['product_no'] for x in json_dict['list']])
# [43, 44]
Full Code:
import requests
from json import loads
some_data = {'a':'',
'b':''}
with requests.Session() as s:
    result = s.post('http://website.com',data=some_data)
    json_dict = loads(result.text)
    print([x["product_no"] for x in json_dict["list"]])

Read a JSON from a URL using Python 3

I need help reading JSON from a URL, which returns the JSON below:
{
"totalItems":2,
"#href":"/classes/dsxplan:Program",
"#id":"dsxplan:Program",
"#mask":"dsplan:MVMask.WorkPackage.Complex",
"#type":"Collection",
"#code":200,
"#context":{
"dsxplan":"xplan",
"dsplan":"plan",
"dspol":"pol",
"image":{
"#id":"dspol:image",
"#type":"#id"
},
"dskern":"kern"
},
"member":[
{
"dsplan:actualType":{
"#href":"/resources/dsxplan:Program",
"#id":"dsxplan:Program",
"#mask":"dskern:Mask.Default",
"image":"iconProgram.png"
},
"dskern:owner":{
"#href":"/resources/dskern:Person.Creator",
"#id":"dskern:Person.Creator",
"#mask":"dskern:MVMask.Person.Complex",
"dsplan:actualType":{
"#href":"/resources/foaf:Person",
"#id":"foaf:Person",
"#mask":"dskern:Mask.Default"
}
},
"dspol:modificationDate":"2017-09-08T17:54:36.786Z",
"#href":"/resources/dsxplan:DSLCProgram.R-399",
"#id":"dsxplan:DSLCProgram.R-399",
"#mask":"dsplan:MVMask.WorkPackage.Complex",
"#etag":"7412df19-1dde-4245-b40b-5dd86dbbe3f1"
},
{
"dsplan:actualType":{
"#href":"/resources/dsxplan:Program",
"#id":"dsxplan:Program",
"#mask":"dskern:Mask.Default",
"image":"iconProgram.png"
},
"dskern:owner":{
"#href":"/resources/dskern:Person.Creator",
"#id":"dskern:Person.Creator",
"#mask":"dskern:MVMask.Person.Complex",
"dsplan:actualType":{
"#href":"/resources/foaf:Person",
"#id":"foaf:Person",
"#mask":"dskern:Mask.Default"
}
},
"dspol:modificationDate":"2017-09-08T17:54:36.786Z",
"#href":"/resources/dsxplan:xComModel2017program.R-394",
"#id":"dsxplan:xComModel2017program.R-394",
"#mask":"dsplan:MVMask.WorkPackage.Complex",
"#etag":"7412df19-1dde-4245-b40b-5dd86dbbe3f1"
}
]
}
I just need to read this json from a link provided. I tried the below code:
import urllib.request
request= urllib.request.Request("https://dummy_link")
response = urllib.request.urlopen(request)
input = (response.read().decode('utf-8'))
json.loads(input)
This code throws me this error:
"JSONDecodeError: Expecting value: line 9 column 1 (char 12)"
Could you please help me get this right? I really appreciate the help.
You could use the Requests library, which is simpler than urllib.
For instance:
import requests
r = requests.get('https://dummy_link')
obj = r.json()
EDIT
If you want to use urllib, you can do as below:
import urllib.request
import json
with urllib.request.urlopen("https://dummy_link") as f:
    content = f.read()
obj = json.loads(content)
There is no need to decode the binary content to a string first: since Python 3.6, json.loads accepts bytes.
There is a urllib HOWTO in the official documentation.
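An "Expecting value" error often means the body is not JSON at all, for example an HTML login or error page, though that is only a guess about this particular link. The following sketch shows a helper (parse_json_or_explain is a hypothetical name) that reports the start of the body when parsing fails, so you can see what the server actually sent:

```python
import json


def parse_json_or_explain(raw: bytes):
    """Parse raw response bytes as JSON; on failure show what was received."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Show only the first bytes so a large HTML page does not flood the log
        preview = raw[:80].decode("utf-8", errors="replace")
        raise ValueError(
            f"Body is not valid JSON ({exc.msg} at char {exc.pos}); "
            f"it starts with: {preview!r}"
        ) from exc


print(parse_json_or_explain(b'{"totalItems": 2}'))

try:
    parse_json_or_explain(b"<html><body>Login required</body></html>")
except ValueError as err:
    message = str(err)
    print(message)
```

With urllib you would pass it the bytes returned by f.read().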
