Decoding JSON that contains Base64 - python

I'm sending a request for a set of images to one of my API's. The API returns these images in a JSON format. This format contains data about the resource together with a single property that represents the image in Base64.
An example of the JSON being returned.
{
"id": 548613,
"filename": "00548613.png",
"pictureTaken": "2020-03-30T11:38:21.003",
"isVisible": true,
"lotcode": 23,
"company": "05",
"concern": "46",
"base64": "..."
}
The correct content of the Base64
The incorrectly parsed Base64
This is done with the Python3 requests library. When i receive a successful response from the API i attempt to decode the body to JSON using:
url = self.__url__(f"/rest/all/V1/products/{sku}/images")
headers = self.__headers__()
r = requests.get(url=url, headers=headers)
if r.status_code == 200:
return r.json()
elif r.status_code == 404:
return None
else:
raise IOError(
f"Error retrieving product '{sku}', got {r.status_code}: '{r.text}'")
Calling .json() results in the Base64 content being messed up, some parts are not there, and some are replaced with other characters. I tried manually decoding the content using r.content.decode() with the utf-8 and ascii options to see if this was the problem after seeing this post. Sadly this didn't work.
I know the response from the server is correct, it works with Postman, and calling print(r.content) results in a JSON document containing the valid Base64.
How would i go about de-serializing the response from the API to get the valid Base64?

import base64
import re
...
b64text = re.search(b"\"base64\": \"(?P<base>.*)\"", r.content, flags=re.MULTILINE).group("base")
decode = base64.b64decode(b64text).decode(utf-8)
Since you're saying "calling print(r.content) results in the valid Base64", it's just a matter of decoding the base64.

Related

Python http.client giving empty string when I use response.read().decode()

I am using http.client to hit a POST api and get the json response. My code is working correctly when I print response.read(). However, for some reason, this response is limited to only 20 results and the total count of results is over 20,000. I want to get the complete response in a variable using response.read().decode(), I am hoping that the variable will contain the complete json string. The issue is that I am getting an empty string when I used decode(). How do I get this done? How do I get the complete results?
import http.client
host = 'jooble.org'
key = 'API_KEY'
connection = http.client.HTTPConnection(host)
#request headers
headers = {
"Content-type": "application/json"}
#json query
body = '{ "keywords": "sales", "location": "MA"}'
connection.request('POST','/api/' + key, body, headers)
response = connection.getresponse()
print(response.status, response.reason)
print(response.read())
print(response.read().decode())
Don't call response.read() twice. response is a stream, so each call to read() continues from where the previous one ended. Since the first call is reading the entire response, the second one doesn't read anything.
If you want to print the encoded and decoded response, assign response.read() to a variable, then decode that.
data = response.read()
print(data)
print(data.decode())
But this can be done more simply using the requests module.
import requests
host = 'jooble.org'
key = 'API_KEY'
body = { "keywords": "sales", "location": "MA"}
response = requests.post(f'https://{host}/api/{key}', json=body)
print(response.content)
Note that in this version body is a dictionary, not a string. The json parameter automatically converts it to JSON.

Python - Outputting to .JSON with results from Microsoft's Computer Vision API

Trying to output my response from Microsoft's Computer Vision API to a .json file, it works with all of the other APIs I've been using so far. With the code below, directly from Microsoft's documentation, I get an error:
Error: the JSON object must be str, not 'bytes'
Removing the parsed = json.loads(data) and using print(json.dumps(data, sort_keys=True, indent=2)) prints out the information for the image that I want, but also says Error and is prefixed with
b
denoting it's in bytes and ending with
is not JSON serializable
I'm just trying to find out how I can get the response into a .json file like i'm able to do with other APIs and am at a loss for how I can possible convert this in a way that will work.
import http.client, urllib.request, urllib.parse, urllib.error, base64, json
API_KEY = '{API_KEY}'
uri_base = 'westus.api.cognitive.microsoft.com'
headers = {
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': API_KEY,
}
params = urllib.parse.urlencode(
{
'visualFeatures': 'Categories, Description, Color',
'language': 'en',
}
)
body = "{'url': 'http://i.imgur.com/WgPtc53.jpg'}"
try:
conn = http.client.HTTPSConnection(uri_base)
conn.request('POST', '/vision/v1.0/analyze?%s' % params, body, headers)
response = conn.getresponse()
data = response.read()
# 'data' contains the JSON data. The following formats the JSON data for display.
parsed = json.loads(data)
print ("Response:")
print (json.dumps(parsed, sort_keys=True, indent=2))
conn.close()
except Exception as e:
print('Error:')
print(e)
Shortly after posting the question, I realized I had missed looking for something: just converting bytes to a string.
found this Convert bytes to a Python string
and was able to modify my code to:
parsed = json.loads(data.decode('utf-8'))
And it seems to have resolved my issue. Now error-free and able to export to .json file like I needed.

ValueError: Data must not be a string. in Python [duplicate]

I am trying to do the following with requests:
data = {'hello', 'goodbye'}
json_data = json.dumps(data)
headers = {
'Access-Key': self.api_key,
'Access-Signature': signature,
'Access-Nonce': nonce,
'Content-Type': 'application/json',
'Accept': 'text/plain'
}
r = requests.post(url, headers=headers, data=json_data,
files={'file': open('/Users/david/Desktop/a.png', 'rb')})
However, I get the following error:
ValueError: Data must not be a string.
Note that if I remove the files parameter, it works as needed. Why won't requests allow me to send a json-encoded string for data if files is included?
Note that if I change data to be just the normal python dictionary (and not a json-encoded string), the above works. So it seems that the issue is that if files is not json-encoded, then data cannot be json-encoded. However, I need to have my data encoded to match a hash signature that's being created by the API.
When you specify your body to a JSON string, you can no longer attach a file since file uploading requires the MIME type multipart/form-data.
You have two options:
Encapsulate your JSON string as part as the form data (something like json => json.dumps(data))
Encode your file in Base64 and transmit it in the JSON request body. This looks like a lot of work though.
1.Just remove the line
json_data = json.dumps(data)
and change in request as data=data.
2.Remove 'Content-Type': 'application/json' inside headers.
This worked for me.
Alternative solution to this problem is to post data as file.
You can post strings as files. Read more here:
http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file
Here is explained how to post multiple files:
http://docs.python-requests.org/en/latest/user/advanced/#post-multiple-multipart-encoded-files
removing the following helped me in my case:
'Content-Type': 'application/json'
then the data should be passed as dictionary
If your files are small, you could simply convert the binary (image or anything) to base64 string and send that as JSON to the API. That is much simpler and more straight forward than the suggested solutions. The currently accepted answer claims that is a lot of work, but it's really simple.
Client:
with open('/Users/houmie/Downloads/log.zip','rb') as f:
bytes = f.read()
tb = b64encode(bytes)
tb_str = tb.decode('utf-8')
body = {'logfile': tb_str}
r = requests.post('https://prod/feedback', data=json.dumps(body), headers=headers)
API:
def create(event, context):
data = json.loads(event["body"])
if "logfile" in data:
tb_back = data["logfile"].encode('utf-8')
zip_data = base64.b64decode(tb_back)

Python Requests, getting back: Unexpected character encountered while parsing value: L. Path

I am attempting to get an auth token from The Trade Desk's (sandbox) api but I get back a 400 response stating:
"Error reading Content-Type 'application/json' as JSON: Unexpected
character encountered while parsing value: L. Path '', line 0,
position 0."
Whole response.json():
{u'ErrorDetails': [{u'Reasons': [u"Error reading Content-Type 'application/json' as JSON: Unexpected character encountered while parsing value: L. Path '', line 0, position 0."], u'Property': u'TokenRequest'}], u'Message': u'The request failed validation. Please check your request and try again.'}
My script (runnable):
import requests
def get_token():
print "Getting token"
url = "https://apisb.thetradedesk.com/v3/authentication"
headers = {'content-type': 'application/json'}
data = {
"Login":"logintest",
"Password":"password",
"TokenExpirationInMinutes":60
}
response = requests.post(url, headers=headers, data=data)
print response.status_code
print response.json()
return
get_token()
Sandbox docs here
I believe this means my headers var is not being serialized correctly by requests, which seems impossible, or not being deserialized correctly by The Trade Desk. I've gotten into the requests lib but I can't seem to crack it and am looking for other input.
You need to do
import json
and convert your dict into json:
response = requests.post(url, headers=headers, data=json.dumps(data))
Another way would be to explicitely use json as parameter:
response = requests.post(url, headers=headers, json=data)
Background: In the prepare_body method of requests a dictionary is explicitely converted to json and a content-header is also automatically set:
if not data and json is not None:
content_type = 'application/json'
body = complexjson.dumps(json)
If you pass data=data then your data will be only form-encoded (see http://docs.python-requests.org/en/latest/user/quickstart/#more-complicated-post-requests). You will need to explicitely convert it to json, if you want json to be the content-type of your http body.
Your follow-up question was about why headers don't have to be converted to json. Headers can be simply passed as dictionary into the request. There's no need to convert it to json. The reason is implementation specific.

ValueError: Data must not be a string

I am trying to do the following with requests:
data = {'hello', 'goodbye'}
json_data = json.dumps(data)
headers = {
'Access-Key': self.api_key,
'Access-Signature': signature,
'Access-Nonce': nonce,
'Content-Type': 'application/json',
'Accept': 'text/plain'
}
r = requests.post(url, headers=headers, data=json_data,
files={'file': open('/Users/david/Desktop/a.png', 'rb')})
However, I get the following error:
ValueError: Data must not be a string.
Note that if I remove the files parameter, it works as needed. Why won't requests allow me to send a json-encoded string for data if files is included?
Note that if I change data to be just the normal python dictionary (and not a json-encoded string), the above works. So it seems that the issue is that if files is not json-encoded, then data cannot be json-encoded. However, I need to have my data encoded to match a hash signature that's being created by the API.
When you specify your body to a JSON string, you can no longer attach a file since file uploading requires the MIME type multipart/form-data.
You have two options:
Encapsulate your JSON string as part as the form data (something like json => json.dumps(data))
Encode your file in Base64 and transmit it in the JSON request body. This looks like a lot of work though.
1.Just remove the line
json_data = json.dumps(data)
and change in request as data=data.
2.Remove 'Content-Type': 'application/json' inside headers.
This worked for me.
Alternative solution to this problem is to post data as file.
You can post strings as files. Read more here:
http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file
Here is explained how to post multiple files:
http://docs.python-requests.org/en/latest/user/advanced/#post-multiple-multipart-encoded-files
removing the following helped me in my case:
'Content-Type': 'application/json'
then the data should be passed as dictionary
If your files are small, you could simply convert the binary (image or anything) to base64 string and send that as JSON to the API. That is much simpler and more straight forward than the suggested solutions. The currently accepted answer claims that is a lot of work, but it's really simple.
Client:
with open('/Users/houmie/Downloads/log.zip','rb') as f:
bytes = f.read()
tb = b64encode(bytes)
tb_str = tb.decode('utf-8')
body = {'logfile': tb_str}
r = requests.post('https://prod/feedback', data=json.dumps(body), headers=headers)
API:
def create(event, context):
data = json.loads(event["body"])
if "logfile" in data:
tb_back = data["logfile"].encode('utf-8')
zip_data = base64.b64decode(tb_back)

Categories