How do I get my JSON decoder to work properly? - python

I was working on API testing, and I tried everything, but I could not print the JSON response as a string. I wondered whether it was the website I was sending the API requests to, as I kept getting a 406 error. I even tried code from online examples that show how to do this, but it still would not print, and it gave the error listed below. Here is the code I used and the traceback PyCharm's console gave me.
import json
import requests
res = requests.get("http://dummy.restapiexample.com/api/v1/employees")
data = json.loads(res.text)
data = json.dumps(data)
print(data)
print(type(data))
Traceback (most recent call last):
File "C:/Users/johnc/PycharmProjects/API_testing/api_testing.py", line 8, in <module>
data = json.loads(res.text)
File "D:\Program Files (x86)\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "D:\Program Files (x86)\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\Program Files (x86)\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

REST APIs vary widely in the types of requests they will accept. A 406 ("Not Acceptable") means the server could not produce a response in a format the request declared acceptable. You should include a User-Agent, because APIs are frequently tweaked to deal with the foibles of various HTTP clients, and explicitly list the output format you want. Adding acceptable encodings lets the API compress data. A charset is a good idea. You could even add a language request, but most APIs don't care.
import json
import requests
headers = {
    "Accept": "application/json",
    "User-Agent": "Mozilla/5.0",
    "Accept-Charset": "utf-8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-US",  # or your favorite language
}
res = requests.get("http://dummy.restapiexample.com/api/v1/employees", headers=headers)
data = json.loads(res.text)
data = json.dumps(data)
print(data)
print(type(data))
The thing about REST APIs is that they may ignore some or all of these headers and return what they please. It's a good idea to form the request properly anyway.

The default Python User-Agent was probably being blocked by the hosting company.
You can set any string, or search for a real device's User-Agent string.
res = requests.get("http://dummy.restapiexample.com/api/v1/employees", headers={"User-Agent": "XY"})

It's you, your connection, or a proxy. Things work just fine for me.
>>> import requests
>>> res = requests.get("http://dummy.restapiexample.com/api/v1/employees")
>>> res.raise_for_status() # would raise if status != 200
>>> print(res.json()) # `res.json()` is the canonical way to extract JSON from Requests
{'status': 'success', 'data': [{'id': '1', 'employee_name': 'Tiger Nixon', 'employee_salary': '320800', ...

Related

JSONDecoder Error - While accessing Google Translate API with Python

I am learning to use HTTP requests in Python, using this HTTP request provided by a TopCoder training challenge (learning purposes only! no compensation of any sort) in which you have to access the Google Translate API:
curl --location --request POST 'https://translate.google.com/translate_a/single?client=at&dt=t&dt=ld&dt=qca&dt=rm&dt=bd&dj=1&hl=%25s&ie=UTF-8&oe=UTF-8&inputm=2&otf=2&iid=1dd3b944-fa62-4b55-b330-74909a99969e&' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'User-Agent: AndroidTranslate/5.3.0.RC02.130475354-53000263 5.1 phone TRANSLATE_OPM5_TEST_1' \
--data-urlencode 'sl=de' \
--data-urlencode 'tl=en' \
--data-urlencode 'q=Hallo'
and I'm wondering how to make the equivalent request in my Python application. Any help is appreciated.
So far I have:
installed and imported requests
understood that I need to store my POST request in a variable and parse it with JSON.
The issue is that I get a JSONDecodeError:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\myname\Documents\Code\GoogleTranslateApiPy\env\lib\site-packages\requests\models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "c:\users\myname\appdata\local\programs\python\python38-32\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "c:\users\myname\appdata\local\programs\python\python38-32\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\users\myname\appdata\local\programs\python\python38-32\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
with this Python request (I tried to translate the curl request as best as I could):
import requests
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'AndroidTranslate/5.3.0.RC02.130475354-53000263 5.1 phone TRANSLATE_OPM5_TEST_1',
}
params = (
    ('client', 'at'),
    ('dt', ['t', 'ld', 'qca', 'rm', 'bd']),
    ('dj', '1'),
    ('hl', '%s'),
    ('ie', 'UTF-8'),
    ('oe', 'UTF-8'),
    ('inputm', '2'),
    ('otf', '2'),
    ('iid', '1dd3b944-fa62-4b55-b330-74909a99969e'),
    ('', ''),
)
response = requests.get('https://translate.google.com/translate_a/single', headers=headers, params=params)
I feel that there's something fundamental I'm missing here. The request in the current documentation by Google for Translate differs from this provided request, but I'd like to know how I could get this way to work, in case I'm ever provided with a curl command like this in the future.
Two things:
You must do a POST request (currently you are doing a GET request).
You are not including the body of the request. For example, the curl call includes URL-encoded data like this: --data-urlencode 'q=Hallo'. You must include these parameters in your POST request too; the provided link shows you how. These are key-value pairs that go inside a dictionary, for example {'q': 'Hallo', ...}.
PS: I'm 90% sure you should also convert the query params you currently have inside tuples into a dictionary. So you'd have a POST with headers, params, and data.
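Putting those two fixes together, a minimal sketch of the equivalent requests call might look like the following. The parameter names all come from the curl command above (the trailing empty parameter from the curl URL is dropped); building the request with requests.Request().prepare() lets you inspect the method, URL, and body without actually sending anything to Google.

```python
import requests

headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'AndroidTranslate/5.3.0.RC02.130475354-53000263 5.1 phone TRANSLATE_OPM5_TEST_1',
}
params = {
    'client': 'at',
    'dt': ['t', 'ld', 'qca', 'rm', 'bd'],  # a list produces repeated dt= query params
    'dj': '1',
    'hl': '%s',
    'ie': 'UTF-8',
    'oe': 'UTF-8',
    'inputm': '2',
    'otf': '2',
    'iid': '1dd3b944-fa62-4b55-b330-74909a99969e',
}
# The --data-urlencode flags become the POST body.
data = {'sl': 'de', 'tl': 'en', 'q': 'Hallo'}

# Build (but don't send) the request so it can be inspected offline.
req = requests.Request('POST', 'https://translate.google.com/translate_a/single',
                       headers=headers, params=params, data=data).prepare()
# Actually sending it would then be: requests.Session().send(req)
```

Since requests URL-encodes the data dict for you, the prepared body comes out as `sl=de&tl=en&q=Hallo`, matching what curl would have sent.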

Polling an API endpoint - how do I retry when no JSON is returned?

I'm polling an API endpoint using a while loop that checks for whether or not a .get() method on the JSON returns None:
while requests.get(render_execution_url, headers=headers).json().get('finalized_at') is None:
    status = requests.get(render_execution_url, headers=headers).json().get('status')
    status_detail = requests.get(render_execution_url, headers=headers).json().get('status_detail')
    logger.info("status for {} is {}. detailed status is {}".format(render_execution_url, status, status_detail))
The idea here is that we keep polling the endpoint until the "finalized_at" value is populated.
Unfortunately, we periodically get failures when the JSON doesn't exist at all:
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
I've tried using the retry decorator on the method (see below for the decorator syntax), but it doesn't seem to execute retries when I hit this failure.
@retry(stop_max_attempt_number=7, wait_fixed=10000)
Is there a graceful, Pythonic way to deal with the case when the JSON doesn't exist (i.e., to try again in some amount of time)?
Your code is too dense to easily separate out the different conditions you need to handle, so your error report doesn't make it clear exactly what "when the JSON doesn't exist at all" means - is the server returning 404 (Page Not Found), is the response body empty, or something else?
Here's a rewrite that doesn't hit the URL once per field access. It may not suit your needs perfectly, but it should give you a start.
while True:
    resp = requests.get(render_execution_url, headers=headers)
    # I assume the response status is always 200 or 204 -
    # it's really easy to detect a 404 here if that happens.
    if not resp.content:  # a requests Response has no .data attribute; .content is the raw body
        time.sleep(WAIT_TIME)
        continue
    rj = resp.json()
    if rj.get('finalized_at') is not None:
        break
    status = rj.get('status')
    status_detail = rj.get('status_detail')
    logger.info("status for {} is {}. detailed status is {}"
                .format(render_execution_url, status, status_detail))
    time.sleep(WAIT_TIME)
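Another graceful option is to catch the decode failure directly rather than pre-checking the body. In Python 3, a bad body raises json.JSONDecodeError, which is a subclass of ValueError, so catching ValueError also covers the Python 2 error in your traceback. A hypothetical safe_json helper (the name is made up) sketches the idea:

```python
import json

def safe_json(text):
    """Return the decoded JSON body, or None if the body is empty or not JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

# In the polling loop, the empty-body and bad-JSON cases then collapse
# into one check:
#     rj = safe_json(resp.text)
#     if rj is None:
#         time.sleep(WAIT_TIME)
#         continue
```

This keeps the retry decision in your loop rather than hidden inside a decorator, which makes the "try again in some amount of time" behavior explicit.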

Why can response.content be read twice and can't be decoded to json

I came across some strange behavior today.
I sent a message through Google Cloud Messaging via the Python requests lib.
Then I tried to decode the response to JSON like this:
response = requests.post(Message_Broker.host, data=json.dumps(payload), headers=headers)
response_results = json.loads(response.content)["results"]
This crashed with a decode error:
response_results = json.loads(response.content)["results"]
File "/usr/local/lib/python2.7/dist-packages/simplejson/__init__.py", line 505, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python2.7/dist-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/local/lib/python2.7/dist-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This happened on my production system, so I added some debug logging to find out what the actual content of the response was:
logger.info("GCM-Response: " + str(response))
logger.info("GCM-Response: " + response.content)
logger.info("GCM-Response: " + str(response.headers))
Now the actual weird behavior occurred: it got logged correctly and no longer threw the decoding error.
Can someone explain this behavior to me?
I have also checked what response.content actually is:
@property
def content(self):
    """Content of the response, in bytes."""
    if self._content is False:
        # Read the contents.
        try:
            if self._content_consumed:
                raise RuntimeError(
                    'The content for this response was already consumed')
            if self.status_code == 0:
                self._content = None
            else:
                self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
        except AttributeError:
            self._content = None
    self._content_consumed = True
    # don't need to release the connection; that's been handled by urllib3
    # since we exhausted the data.
    return self._content
This is part of requests' models module - a computed attribute exposed via the @property decorator.
To my understanding, the first time the content is read for the logging, the _content_consumed flag is set to True. Therefore the second time, when I read it for the JSON decoding, it should actually raise the RuntimeError.
Is there an explanation that I just haven't found while browsing the requests docs?
Therefore the second time, when I read it for the json decoding it should actually raise the Runtime Error.
No, it won't raise a RuntimeError. When you access response.content the first time, it caches the actual data in self._content. On the second (third, fourth, etc.) access, if self._content is False: is falsy, so you get the content cached in self._content.
The if self._content_consumed: check is most likely an internal assertion to catch attempts to read data from the socket multiple times (which is obviously an error).
It can't be decoded to JSON because you received something other than JSON in the response body, or an empty body. Maybe it's a 500 or 429 response. It's impossible to say without seeing the actual response.
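The caching behavior can be mimicked with a toy class. To be clear, this is an illustration of the pattern, not requests' actual code: the first access drains a one-shot "socket" into _content and flips _content_consumed; later accesses short-circuit on the `is False` sentinel and never reach the RuntimeError.

```python
class FakeResponse:
    """Toy model of requests' Response.content caching (illustration only)."""

    def __init__(self, raw):
        self._raw = iter([raw])       # a one-shot "socket": can be drained once
        self._content = False         # sentinel meaning "not read yet"
        self._content_consumed = False

    @property
    def content(self):
        if self._content is False:    # only the very first access enters here
            if self._content_consumed:
                raise RuntimeError('already consumed')
            self._content = b''.join(self._raw)   # drain the "socket" and cache
        self._content_consumed = True
        return self._content

resp = FakeResponse(b'{"ok": true}')
first = resp.content    # reads the "socket" and caches the bytes
second = resp.content   # served from the cache; no RuntimeError
```

The RuntimeError branch is only reachable if something marked the stream consumed without populating the cache, e.g. by iterating the raw stream directly, which is exactly the internal misuse the check guards against.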

Error when trying to use urllib3 & json to get Rotten Tomatoes data (Python)

As an introduction to APIs, I'm trying to figure out how to access data in python using the Rotten Tomatoes API. This is also my first time dealing with json.
I'm using Python 3.4 and have confirmed that json and urllib3 have been installed.
Here is my code:
import urllib3
import json
url = 'http://api.rottentomatoes.com/api/public/v1.0/lists/movies/box_office.json?limit=16&country=us&apikey=API-KEY';
http = urllib3.PoolManager()
request = http.request('GET', url)
print (json.load(request));
request.release_conn()
Here is the error I get:
Traceback (most recent call last):
File "C:\Users\admnasst1\Documents\Personal\Python\RotTomTest.py", line 16, in <module>
print (str(json.load(request)));
File "C:\Python34\lib\json\__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\Python34\lib\json\__init__.py", line 312, in loads
s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'
Since I'm trying out so many new things (APIs, urllib3, json), I'm not exactly sure what's going on. I've tried a few other versions of the above code and I keep getting the same error, so I think I must be missing something basic... Can any of you spot it?
You'll need to decode the network data to a string:
json.loads(request.data.decode('utf8'))
Ideally, you'd detect what codec was used from the Content-Type header; you'd parse that and look for a charset= parameter. The default encoding for JSON data is UTF-8.
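A sketch of that charset detection, using manual header parsing (production code might prefer a full MIME parser; the function name is made up):

```python
def charset_from_content_type(content_type, default='utf-8'):
    """Pull the charset= parameter out of a Content-Type header value.

    Falls back to UTF-8, the default encoding for JSON.
    """
    # Parameters follow the media type, separated by semicolons, e.g.
    # "application/json; charset=ISO-8859-1"
    for param in content_type.split(';')[1:]:
        key, _, value = param.strip().partition('=')
        if key.strip().lower() == 'charset' and value:
            return value.strip().strip('"')
    return default

# e.g. decoding a urllib3 response body:
# charset = charset_from_content_type(request.headers.get('Content-Type', ''))
# data = json.loads(request.data.decode(charset))
```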
If you are going to use a 3rd party library, use requests instead. It wraps urllib3 in a friendly API and can handle JSON for you:
import requests
url = 'http://api.rottentomatoes.com/api/public/v1.0/lists/movies/box_office.json'
params = {'limit': 16, 'country': 'us', 'apikey': 'API-KEY'}
response = requests.get(url, params=params)
print(response.json())

Create and parse multipart HTTP requests in Python

I'm trying to write some Python code which can create multipart MIME HTTP requests in the client, and then appropriately interpret them on the server. I have, I think, partially succeeded on the client end with this:
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
import httplib

h1 = httplib.HTTPConnection('localhost:8080')
msg = MIMEMultipart()
fp = open('myfile.zip', 'rb')
base = MIMEBase("application", "octet-stream")
base.set_payload(fp.read())
msg.attach(base)
h1.request("POST", "http://localhost:8080/server", msg.as_string())
The only problem with this is that the email library also includes the Content-Type and MIME-Version headers, and I'm not sure how they're going to be related to the HTTP headers included by httplib:
Content-Type: multipart/mixed; boundary="===============2050792481=="
MIME-Version: 1.0
--===============2050792481==
Content-Type: application/octet-stream
MIME-Version: 1.0
This may be the reason that when this request is received by my web.py application, I just get an error message. The web.py POST handler:
class MultipartServer:
    def POST(self, collection):
        print web.input()
Throws this error:
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 242, in process
return self.handle()
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 233, in handle
return self._delegate(fn, self.fvars, args)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 415, in _delegate
return handle_class(cls)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 390, in handle_class
return tocall(*args)
File "/home/richard/Development/server/webservice.py", line 31, in POST
print web.input()
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/webapi.py", line 279, in input
return storify(out, *requireds, **defaults)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 150, in storify
value = getvalue(value)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 139, in getvalue
return unicodify(x)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 130, in unicodify
if _unicode and isinstance(s, str): return safeunicode(s)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 326, in safeunicode
return obj.decode(encoding)
File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 137-138: invalid data
My line of code is represented by the error line about half way down:
File "/home/richard/Development/server/webservice.py", line 31, in POST
print web.input()
It's coming along, but I'm not sure where to go from here. Is this a problem with my client code, or a limitation of web.py (perhaps it just can't support multipart requests)? Any hints or suggestions of alternative code libraries would be gratefully received.
EDIT
The error above was caused by the data not being automatically base64-encoded. Adding
encoders.encode_base64(base)
gets rid of this error, and now the problem is clear: the HTTP request isn't being interpreted correctly on the server, presumably because the email library is including what should be the HTTP headers in the body instead:
<Storage {'Content-Type: multipart/mixed': u'',
' boundary': u'"===============1342637378=="\n'
'MIME-Version: 1.0\n\n--===============1342637378==\n'
'Content-Type: application/octet-stream\n'
'MIME-Version: 1.0\n'
'Content-Transfer-Encoding: base64\n'
'\n0fINCs PBk1jAAAAAAAAA.... etc
So something is not right there.
Thanks
Richard
I used this package by Will Holcomb http://pypi.python.org/pypi/MultipartPostHandler/0.1.0 to make multi-part requests with urllib2, it may help you out.
After a bit of exploration, the answer to this question has become clear. The short answer is that although the Content-Disposition header is optional in a MIME-encoded message, web.py requires it on each MIME part in order to correctly parse the HTTP request.
Contrary to other comments on this question, the difference between HTTP and email is irrelevant, as they are simply transport mechanisms for the MIME message and nothing more. multipart/related (not multipart/form-data) messages are common in content-exchanging web services, which is the use case here. The code snippets provided are accurate, though, and led me to a slightly briefer solution to the problem.
import httplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart

# open an HTTP connection
h1 = httplib.HTTPConnection('localhost:8080')
# create a mime multipart message of type multipart/related
msg = MIMEMultipart("related")
# create a mime-part containing a zip file, with a Content-Disposition header
# on the section
fp = open('file.zip', 'rb')
base = MIMEBase("application", "zip")
base['Content-Disposition'] = 'file; name="package"; filename="file.zip"'
base.set_payload(fp.read())
encoders.encode_base64(base)
msg.attach(base)
# Here's a rubbish bit: chomp through the header rows until hitting a newline
# on its own, reading each line on the way as an HTTP header, and read the
# rest of the message into a new variable
header_mode = True
headers = {}
body = []
for line in msg.as_string().splitlines(True):
    if line == "\n" and header_mode == True:
        header_mode = False
    if header_mode:
        (key, value) = line.split(":", 1)
        headers[key.strip()] = value.strip()
    else:
        body.append(line)
body = "".join(body)
# do the request, with the separated headers and body
h1.request("POST", "http://localhost:8080/server", body, headers)
This is picked up perfectly well by web.py, so it's clear that email.mime.multipart is suitable for creating MIME messages to be transported over HTTP, with the exception of its header handling.
My other overall concern is scalability. Neither this solution nor the others proposed here scales well, as they read the contents of the file into a variable before bundling it up in the MIME message. A better solution would be one which could serialize on demand as the content is piped out over the HTTP connection. It's not urgent for me to fix this, but I'll come back here with a solution if I get to it.
There are a number of things wrong with your request. As TokenMacGuy suggests, multipart/mixed is unused in HTTP; use multipart/form-data instead. In addition, parts should have a Content-Disposition header. A Python fragment to do that can be found in the Code Recipes.
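For comparison with the hand-rolled approach above, the requests library builds a correct multipart/form-data body (boundary, per-part Content-Disposition headers, and all) from a `files=` mapping. A sketch, using an in-memory stand-in file so nothing touches disk; the field name, filename, and URL are made up for illustration:

```python
import io
import requests

payload = io.BytesIO(b'PK\x03\x04 fake zip bytes')  # stand-in for file.zip

# Build (but don't send) the request; prepare() renders the multipart body.
req = requests.Request(
    'POST', 'http://localhost:8080/server',
    files={'package': ('file.zip', payload, 'application/zip')},
).prepare()

# requests sets a multipart/form-data Content-Type with a boundary, and
# gives each part its own Content-Disposition header - exactly the piece
# web.py was missing from the hand-built message.
```

This sidesteps the header-splitting loop entirely, and because the multipart encoding happens inside the library, the boundary in the Content-Type header is guaranteed to match the one used in the body.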
