I used Python 2 to make a request to the RNAcentral database, and I read the response as JSON with response.json().
This let me work with the data as a dictionary, so I used the corresponding syntax to extract the cross references, which contain links to other databases. But when I make a request for each of those links and call the same method, I can't read the result as JSON; I can only obtain the response content as HTML.
So I need to know how to make a request to each cross-reference link and read it as JSON in Python.
Here is the code:
import requests

direcc = 'http://rnacentral.org/api/v1/rna/' + code + '/?flat=true.json'
resp = requests.get(direcc)
datos = resp.json()
d = {}
links = []
for diccionario in datos['xrefs']['results']:
    if diccionario['taxid'] == 9606:
        base_datos = diccionario['database']
        for llave, valor in diccionario['accession'].iteritems():
            d[base_datos] = {'url': diccionario['accession']['url'],
                             'expert_db_url': diccionario['accession']['expert_db_url'],
                             'source_url': diccionario['accession']['source_url']}
for key, value in d.iteritems():
    links.append(d[key]['expert_db_url'])
for item in links:
    response = requests.get(item)
    r = response.json()
And this is the error I get: ValueError: No JSON object could be decoded.
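For reference, the Content-Type header of each response can be checked before calling .json(), to confirm whether a given link actually returns JSON or an HTML page (a minimal sketch using the requests library):

for item in links:
    response = requests.get(item)
    content_type = response.headers.get('Content-Type', '')
    print item, content_type          # e.g. 'text/html' instead of 'application/json'
    if 'json' in content_type:
        r = response.json()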
Thank you.
I want to dynamically query Google Maps through the Google Directions API. As an example, this request calculates the route from Chicago, IL to Los Angeles, CA via two waypoints in Joplin, MO and Oklahoma City, OK:
http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false
It returns a result in the JSON format.
How can I do this in Python? I want to send such a request, receive the result and parse it.
I recommend using the awesome requests library:
import requests

url = 'http://maps.googleapis.com/maps/api/directions/json'
params = dict(
    origin='Chicago,IL',
    destination='Los Angeles,CA',        # requests URL-encodes the parameter values for you
    waypoints='Joplin,MO|Oklahoma City,OK',
    sensor='false'
)
resp = requests.get(url=url, params=params)
data = resp.json()  # Check the JSON Response Content documentation below
JSON Response Content: https://requests.readthedocs.io/en/master/user/quickstart/#json-response-content
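As a quick follow-up, you can sanity-check both the HTTP layer and the API's own result field before digging into the data (a minimal sketch; 'status' is the Directions API's own status field, also used in a later answer here):

resp.raise_for_status()          # raise if the HTTP request itself failed
if data['status'] == 'OK':       # the Directions response carries its own status field
    print(len(data['routes']))   # number of routes returned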
The requests Python module takes care of both retrieving JSON data and decoding it, thanks to its built-in JSON decoder. Here is an example taken from the module's documentation:
>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
So there is no need to use a separate module for decoding the JSON.
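If the body turns out not to be valid JSON, r.json() raises an exception (a ValueError in older versions of requests), so a defensive sketch looks like:

import requests

r = requests.get('https://github.com/timeline.json')
try:
    data = r.json()
except ValueError:
    # the body was not valid JSON (e.g. an HTML error page)
    data = None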
requests has a built-in .json() method:
import requests
requests.get(url).json()
import urllib
import json
url = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
result = json.load(urllib.urlopen(url))
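Note that this uses the Python 2 urllib interface; a rough Python 3 equivalent (urlencode is used here so the waypoint separator and spaces get escaped properly) would be:

import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    'origin': 'Chicago,IL',
    'destination': 'Los Angeles,CA',
    'waypoints': 'Joplin,MO|Oklahoma City,OK',
    'sensor': 'false',
})
url = 'http://maps.googleapis.com/maps/api/directions/json?' + params
with urllib.request.urlopen(url) as resp:
    result = json.loads(resp.read().decode('utf-8'))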
Use the requests library, pretty-print the results so you can better locate the keys/values you want to extract, and then use nested for loops to parse the data. In the example I extract the step-by-step driving directions.
import json, requests, pprint

url = 'http://maps.googleapis.com/maps/api/directions/json?'
params = dict(
    origin='Chicago,IL',
    destination='Los Angeles,CA',
    waypoints='Joplin,MO|Oklahoma City,OK',
    sensor='false'
)
data = requests.get(url=url, params=params)
binary = data.content
output = json.loads(binary)

# test to see if the request was valid
# print(output['status'])

# output all of the results
# pprint.pprint(output)

# step-by-step directions
for route in output['routes']:
    for leg in route['legs']:
        for step in leg['steps']:
            print(step['html_instructions'])
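Incidentally, json.loads(data.content) and the response object's own decoder do the same job here, so the parsing line could also be written as:

output = data.json()   # equivalent to json.loads(data.content)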
Just import requests and use its json() method:
source = requests.get("url").json()
print(source)
Or you can use urllib and json:
import json, urllib.request

data = urllib.request.urlopen("url").read()
output = json.loads(data)
print(output)
Try this:
import requests
import json

# Google Maps API.
link = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'

# Request data from the link as 'str'
data = requests.get(link).text

# Convert 'str' to JSON
data = json.loads(data)

# Now you can access the JSON
for i in data['routes'][0]['legs'][0]['steps']:
    latitude = i['start_location']['lat']
    longitude = i['start_location']['lng']
    print('{}, {}'.format(latitude, longitude))
Also, for pretty JSON output on the console, you can use dumps with the indent parameter (make sure to import json):
json.dumps(response.json(), indent=2)
I am using Python 3.7 and I am trying to handle some JSON data that I receive back from a website. A sample of the JSON response is below, but it can vary in length. In essence, it returns details about 'officers', and in the example below there is data for two officers. This uses the OpenCorporates API.
{"api_version":"0.4","results":{"page":1,"per_page":30,"total_pages":1,"total_count":2,"officers":[{"officer":{"id":212927580,"uid":null,"name":"NEIL KIDMAN","jurisdiction_code":"gb","position":"director","retrieved_at":"2015-12-04T00:00:00+00:00","opencorporates_url":"https://opencorporates.com/officers/212927580","start_date":"2015-01-28","end_date":null,"occupation":"SERVICE MANAGER","current_status":null,"inactive":false,"company":{"name":"GRSS LIMITED","jurisdiction_code":"gb","company_number":"09411531","opencorporates_url":"https://opencorporates.com/companies/gb/09411531"}}},{"officer":{"id":190031476,"uid":null,"name":"NEIL KIDMAN","jurisdiction_code":"gb","position":"director","retrieved_at":"2015-12-04T00:00:00+00:00","opencorporates_url":"https://opencorporates.com/officers/190031476","start_date":"2002-05-17","end_date":null,"occupation":"COMPANY DIRECTOR","current_status":null,"inactive":false,"company":{"name":"GILBERT ROAD SERVICE STATION LIMITED","jurisdiction_code":"gb","company_number":"04441363","opencorporates_url":"https://opencorporates.com/companies/gb/04441363"}}}]}}
My code so far is:
response = requests.get(url)
response.raise_for_status()
jsonResponse = response.json()
officerDetails = jsonResponse['results']['officers']
This works well, but my ultimate goal is to create variables and write them to a .csv. So I'd like to write something like:
name = jsonResponse['results']['officers']['name']
position = jsonResponse['results']['officers']['position']
companyName = jsonResponse['results']['officers']['company']['name']
Any suggestions on how I could do this? As said, I'd like to loop through each 'officer' in the JSON response, capture these values and write them to a .csv (I will tackle the .csv part once I have them assigned to variables).
officers = jsonResponse['results']['officers']
res = []
for officer in officers:
    data = {}
    data['name'] = officer['officer']['name']
    data['position'] = officer['officer']['position']
    data['company_name'] = officer['officer']['company']['name']
    res.append(data)
You can then write res, which is a list of dictionaries, to a CSV file.
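A minimal sketch of that CSV step, using the standard csv module (officers.csv is just an assumed output filename):

import csv

# Write the collected dictionaries to a CSV file; the field names
# match the keys built in the loop above.
with open('officers.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'position', 'company_name'])
    writer.writeheader()
    writer.writerows(res)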
I'm trying to auto update a moderator list using this API:
https://tmi.twitch.tv/group/user/ice3lade/chatters
I am accessing and storing it with
from urllib.request import urlopen
response = urlopen('https://tmi.twitch.tv/group/user/ice3lade/chatters')
chatlist = response.read()
But attempting to simply use it as a dictionary, e.g.
print(chatlist("chatters"))
returns an error:
TypeError: 'bytes' object is not callable
I'm a total Python noob, so any help is appreciated. How do I either access this as a dictionary directly from the API, or store the data I get from reading the API as a proper dictionary?
I came up with a fairly reasonable solution: chatlist gives the full dictionary, chatters gives all the keys and values within the 'chatters' dictionary, and moderators gives the list of moderators.
from urllib.request import urlopen
from json import loads
response = urlopen('https://tmi.twitch.tv/group/user/xflixx_teampokerstars/chatters')
readable = response.read().decode('utf-8')
chatlist = loads(readable)
chatters = chatlist['chatters']
moderators = chatters['moderators']
I didn't know the json module was required to decode the API response.
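As an aside, the requests library does the byte decoding and JSON parsing in one step, so an equivalent sketch would be:

import requests

chatlist = requests.get('https://tmi.twitch.tv/group/user/ice3lade/chatters').json()
moderators = chatlist['chatters']['moderators']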
I am a beginner in Python, trying to pull some data from reddit.com.
More precisely, I am trying to send a request to http://www.reddit.com/r/nba/.json to get the JSON content of the page and then parse it for entries about a specific team or player.
To automate the data gathering, I am requesting the page like this:
import urllib2
FH = urllib2.urlopen("http://www.reddit.com/r/nba/.json")
rnba = FH.readlines()
rnba = str(rnba[0])
FH.close()
I am also pulling the content like this on a copy of the script, just to be sure:
import requests
FH = requests.get("http://www.reddit.com/r/nba/.json", timeout=10)
rnba_json = FH.json()
FH.close()
However, I am not getting the full data that is presented when I manually go to http://www.reddit.com/r/nba/.json with either method. In particular, when I call
print len(rnba_json['data']['children']) # prints 20-something child stories
but when I do the same thing by loading a copy-pasted JSON string like this:
import json
import urllib2
fh = r"""{"kind": "Listing", "data": {"modhash": ..."""# long JSON string
r_nba = json.loads(fh) #loads the json string from the site into json object
print len(r_nba['data']['children']) #prints upwards of 100 stories
I get more story links. I know about the timeout parameter but providing it did not resolve anything.
What am I doing wrong or what can I do to get all the content presented when I pull the page in the browser?
To get the maximum allowed, use the API with the limit parameter, like: http://www.reddit.com/r/nba/.json?limit=100
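For example, with requests (the User-Agent value here is an assumption on my part; reddit tends to throttle clients that use a default User-Agent, which can also shrink or block results):

import requests

resp = requests.get(
    'http://www.reddit.com/r/nba/.json',
    params={'limit': 100},                       # ask for the maximum number of stories
    headers={'User-Agent': 'nba-scraper/0.1'},   # assumed descriptive User-Agent
    timeout=10,
)
rnba_json = resp.json()
print(len(rnba_json['data']['children']))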
Given the JSON {"foo": "bazz", "1": 2},
I want to convert it to POST data:
"foo"="bazz";"1"=2;
(the format the data would have if it were posted from an HTML form)
Is there an existing encoder for JSON -> POST data? If not, will the following script do the job?
json_body = {"foo": "bazz", "1": 2}
data = ''
for key, value in json_body.items():
    data += '"{key}"={value};'.format(key=key, value=value)
print(data)
>> "foo"="bazz";"1"=2;
Thanks.
Use urllib.parse.urlencode:
from urllib.parse import urlencode
data = urlencode(json_body)
This produces application/x-www-form-urlencoded data, which is the default MIME type used by browsers when POSTing HTML forms.
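For the example dict above, this gives the standard form-encoded output; and if you end up POSTing it with requests, passing the dict as data does the same encoding for you:

from urllib.parse import urlencode

json_body = {"foo": "bazz", "1": 2}
print(urlencode(json_body))      # foo=bazz&1=2

# With requests, a dict passed as `data` is form-encoded automatically:
# requests.post(url, data=json_body)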