Django StreamingHttpResponse format error - python

I have a simple Django view for downloading a file from Amazon S3.
Testing by saving the file locally works fine:
def some_view(request):
    res = s3.get_object(...)
    try:
        s3_file_content = res['Body'].read()
        with open("/Users/yanik/ololo.jpg", 'wb') as f:
            f.write(s3_file_content)
        # file saved and I can view it
    except:
        pass
When I switch to StreamingHttpResponse I get an incorrect file format (it can't be opened) and even a wrong size (if the original is a 317 KB image, the output would be around 620 KB):
def some_view(request):
    res = s3.get_object(...)
    response = StreamingHttpResponse(res['Body'].read(), content_type=res['ContentType'])
    response['Content-Disposition'] = 'attachment;filename=' + 'ololo.jpg'
    response['ContentLength'] = res['ContentLength']
    return response
I've tried many different settings, but so far nothing has worked for me. The output file is broken.
UPDATE
I managed to get more debugging information. If I change the file writing mode in the first sample from 'wb' to 'w', I get the same output as with StreamingHttpResponse (the first view generates the same broken file).
So it looks like I must tell the HTTP response that my output is in binary format.
UPDATE (get to the core of problem)
Now I understand the problem, but I still don't have a solution.
res['Body'].read() returns a bytes object, and StreamingHttpResponse iterates through those bytes, which yields integer byte codes. So my incoming bytes '...\x05cgr\xb8=:\xd0\xc3\x97U\xf4\xf3\xdc\xf0*\xd4#\xff\xd9' are forcibly converted into a list like [ ... , 195, 151, 85, 244, 243, 220, 240, 42, 212, 64, 255, 217] and then downloaded as concatenated strings. Screenshot: http://take.ms/JQztk
As you can see, the list elements appear at the end.
StreamingHttpResponse.make_bytes
"""Turn a value into a bytestring encoded in the output charset."""

I'm still not sure exactly what is going on, but FileWrapper from https://stackoverflow.com/a/8601118/2576817 works fine with the boto3 StreamingBody response type.
I'd welcome it if someone has the courage to explain this fuzzy behavior.
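For completeness, the working FileWrapper version looks roughly like this. This is only a sketch of what I described above; it assumes an existing boto3 client named s3, and the ellipsis in get_object() stands for whatever bucket/key arguments you use:

from wsgiref.util import FileWrapper
from django.http import StreamingHttpResponse

def some_view(request):
    res = s3.get_object(...)  # assumed boto3 client, arguments elided as above
    # FileWrapper repeatedly calls res['Body'].read(blksize) and yields bytes chunks,
    # so the response body stays binary instead of being iterated byte by byte
    response = StreamingHttpResponse(FileWrapper(res['Body'], 8192),
                                     content_type=res['ContentType'])
    response['Content-Length'] = res['ContentLength']
    response['Content-Disposition'] = 'attachment; filename="ololo.jpg"'
    return response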

https://docs.djangoproject.com/en/1.9/ref/request-response/#streaminghttpresponse-objects
StreamingHttpResponse needs an iterator. I think if your file is binary (an image), StreamingHttpResponse is not the best solution, or you should stream the file in chunks (see the chunked sketch after the example below).
A bytes object is iterable, but iterating it yields individual byte values; you probably want to iterate over lines rather than bytes/characters.
I'm not sure whether your file is line-based text data, but if it is, you could create a generator to iterate over the file-like object:
def line_generator(file_like_obj):
    for line in file_like_obj:
        yield line
and feed that generator to the StreamingHttpResponse:
def some_view(request):
    res = s3.get_object(...)
    response = StreamingHttpResponse(line_generator(res['Body']), ...)
    return response
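For a binary file like an image, a chunk-based generator is the analogous approach. This is a rough sketch under the same assumptions as above (a boto3 Body object whose read(size) returns bytes):

def chunk_generator(file_like_obj, chunk_size=8192):
    # read fixed-size binary chunks until the stream is exhausted
    while True:
        chunk = file_like_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk

def some_view(request):
    res = s3.get_object(...)
    response = StreamingHttpResponse(chunk_generator(res['Body']),
                                     content_type=res['ContentType'])
    response['Content-Disposition'] = 'attachment; filename="ololo.jpg"'
    return response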

Related

Upload file to Databricks DBFS with Python API

I'm following the Databricks example for uploading a file to DBFS (in my case .csv):
import json
import requests
import base64

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)

def dbfs_rpc(action, body):
    """ A helper function to make the DBFS API request, request/response is encoded/decoded as JSON """
    response = requests.post(
        BASE_URL + action,
        headers={'Authorization': 'Bearer %s' % TOKEN},
        json=body
    )
    return response.json()

# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']

with open('/a/local/file') as f:
    while True:
        # A block can be at most 1MB
        block = f.read(1 << 20)
        if not block:
            break
        data = base64.standard_b64encode(block)
        dbfs_rpc("add-block", {"handle": handle, "data": data})

# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})
When using the tutorial as is, I get an error:
Traceback (most recent call last):
File "db_api.py", line 65, in <module>
data = base64.standard_b64encode(block)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 95, in standard_b64encode
return b64encode(s)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
I tried doing with open('./sample.csv', 'rb') as f: before passing the blocks to base64.standard_b64encode, but then I get another error:
TypeError: Object of type 'bytes' is not JSON serializable
This happens when the encoded block data is being sent into the API call.
I tried skipping encoding entirely and just passing the blocks into the post call. In this case the file gets created in the DBFS but has 0 bytes size.
At this point I'm trying to make sense of it all. It doesn't want a string but it doesn't want bytes either. What am I doing wrong? Appreciate any help.
In Python we have strings and bytes, which are two different entities. Note that there is no implicit conversion between them, so you need to know when to use which and how to convert when necessary. This answer provides a nice explanation.
With the code snippet I see two issues:
This one you already found: open by default reads the file as text, so your block is a string, while standard_b64encode expects bytes and returns bytes. To read bytes from the file, it needs to be opened in binary mode:
with open('/a/local/file', 'rb') as f:
Only strings can be encoded as JSON. dbfs_rpc sends the body via requests' json= argument, which serializes it to JSON, and bytes are not JSON serializable. Since your data is bytes, you need to convert it to a string explicitly, and that's done using decode:
dbfs_rpc("add-block", {"handle": handle, "data": data.decode('utf8')})

How can a GRIB file be opened with pygrib without first downloading the file?

The documentation for pygrib shows a function called fromstring which creates a gribmessage instance from a python bytes object representing a binary grib message. I might be misunderstanding the purpose of this function, but it leads me to believe I can use it in place of downloading a GRIB file and using the open function on it. Unfortunately, my attempts to open a multi-message GRIB file from NLDAS2 have failed. Does anyone else know how to use pygrib on GRIB data without first saving the file? My code below shows how I would like it to work. Instead, it gives the error TypeError: expected bytes, int found on the line for grib in gribs:
from urllib import request
import pygrib

url = "<remote address of desired file>"
username = "<username>"
password = "<password>"

redirectHandler = request.HTTPRedirectHandler()
cookieProcessor = request.HTTPCookieProcessor()
passwordManager = request.HTTPPasswordMgrWithDefaultRealm()
passwordManager.add_password(None, "https://urls.earthdata.nasa.gov", username, password)
authHandler = request.HTTPBasicAuthHandler(passwordManager)
opener = request.build_opener(redirectHandler, cookieProcessor, authHandler)
request.install_opener(opener)

with request.urlopen(url) as response:
    data = response.read()

gribs = pygrib.fromstring(data)
for grib in gribs:
    print(grib)
Edit to add the entire error output:
Traceback (most recent call last):
File ".\example.py", line 19, in <module>
for grb in grbs:
File "pygrib.pyx", line 1194, in pygrib.gribmessage.__getitem__
TypeError: expected bytes, int found
Edit: This interface does not support multi-message GRIB files, but the authors are open to a pull request if anyone wants to write up the code. Unfortunately, my research focus has shifted and I don't have time to contribute myself.
As stated by jasonharper, you can use pygrib.fromstring(). I just tried it myself and it works.
Here is the link to the documentation.
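A minimal sketch of that usage, assuming data already holds the bytes of a single GRIB message (as the edit above notes, multi-message files are not supported this way):

import pygrib

# data: bytes of one GRIB message, e.g. read from a download or a local file
grb = pygrib.fromstring(data)
print(grb)         # a single gribmessage instance, not an iterable of messages
print(grb.values)  # the decoded data array for that message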
Starting with pygrib v2.1.4, the changelog says that pygrib.open() now accepts an io.BufferedReader object as an input argument.
See the pygrib changelog here.
That would theoretically allow you to read a GRIB2 file from memory without writing it to disk.
I think the usage is supposed to be the following :
binary_io = io.BytesIO(bytes_data)
buffer_io = io.BufferedReader(binary_io)
grib_file = pygrib.open(buffer_io)
But I was not able to make it work on my side!

How to fix Python at Object of type datetime is not JSON serializable error

I'm doing data mining with Twitter.
I get the create_at value from Twitter and save it to an Excel file, then send the Excel file to a Google Sheet, but the send fails.
It gives this error:
response = service.spreadsheets().values().append(
  File "C:\Users\What Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\googleapiclient\discovery.py", line 830, in method
    headers, params, query, body = model.request(
  File "C:\Users\What Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\googleapiclient\model.py", line 161, in request
    body_value = self.serialize(body_value)
  File "C:\Users\What Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\googleapiclient\model.py", line 274, in serialize
    return json.dumps(body_value)
  File "C:\Users\What Name\AppData\Local\Programs\Python\Python38-32\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "C:\Users\What Name\AppData\Local\Programs\Python\Python38-32\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\What Name\AppData\Local\Programs\Python\Python38-32\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\What Name\AppData\Local\Programs\Python\Python38-32\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable
Or maybe the problem is in this code?
xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(r"F:\work\feen\WU\twitter.xlsx")
ws = wb.WorkSheets('Sheet1')
rngData = ws.Range('A1').CurrentRegion()

gsheet_id = 'sheet_id'
CLIENT_SECRET_FILE = 'credentials2.json'
API_SERVICE_NAME = 'sheets'
API_VERSION = 'v4'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
service = Create_Service(CLIENT_SECRET_FILE, API_SERVICE_NAME, API_VERSION, SCOPES)

response = service.spreadsheets().values().append(
    spreadsheetId=gsheet_id,
    valueInputOption='RAW',
    range='data1!A1',
    body=dict(
        majorDimension='ROWS',
        values=rngData
    )
).execute()

wb.Close(r"F:\work\feen\WU\twitter.xlsx")
Answer:
In order to fix the Object of type datetime is not JSON serializable error, you need to convert all instances of datetime objects in your data to strings.
There are other errors in your code, however, meaning this alone will not make your program run.
Converting the datetime objects to string objects:
In Python, you can serialize your data to a JSON string using json.dumps(), passing default=str so that any type json doesn't handle natively (such as datetime) is converted with str().
You can do this by adding this line before the service.spreadsheets().values().append() call:
# rngData at this point has already been assigned
rngData = json.dumps(rngData, indent=4, sort_keys=True, default=str)
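For example, a standalone illustration of what default=str does with a datetime value:

import json
from datetime import datetime

# datetime is not JSON serializable on its own; default=str converts it with str()
print(json.dumps({'created_at': datetime(2021, 5, 1, 12, 30)}, default=str))
# -> {"created_at": "2021-05-01 12:30:00"}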
NB: This on its own will not fix your code!
Other Matters:
When making calls to the Google Sheets API, it is very important that you make the requests in the way that the servers are expecting to receive those requests. That is to say, it is important to follow the documentation for making requests.
I am on a Linux machine, so I cannot test the output format of win32.Dispatch().Workbooks.Open().Worksheets().Range().CurrentRegion(), but if the Microsoft documentation on the Worksheet.Range property of Excel is anything to go by, I can safely assume that its output isn't in the format required by the spreadsheets.values.append method:
array (ListValue format):
The data that was read or to be written. This is an array of arrays, the outer array representing all the data and each inner array representing a major dimension. Each item in the inner array corresponds with one cell.
For output, empty trailing rows and columns will not be included.
For input, supported value types are: bool, string, and double. Null values will be skipped. To set a cell to an empty value, set the string value to an empty string.
I'm not 100% sure the output is the same, but to try to emulate what you're doing I used the Python package xlrd to get the values from the Excel file you provided, like so:
workbook = xlrd.open_workbook("twitter.xlsx")
sheet = workbook.sheet_by_index(0)
data = [sheet.row_values(rowx) for rowx in range(sheet.nrows)]
And, as in the screenshot you provided in a comment, I had the same response. Scrolling up, the error was due to a bad request:
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://sheets.googleapis.com/v4/spreadsheets/XXXXX/values/Sheet1%21A1:append?alt=json&valueInputOption=RAW returned "Invalid value at 'data.values' (type.googleapis.com/google.protobuf.ListValue)..."
specifically, Invalid value at 'data.values'. You will need to adhere to the Google Sheets API request specification for this method.
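As a rough sketch of a request that matches that specification (hypothetical: it assumes data is the list of rows read with xlrd above, and converts any datetime cells to strings first):

from datetime import datetime, date

# build an array of arrays, as the Sheets API expects, stringifying date/datetime cells
values = [
    [str(cell) if isinstance(cell, (datetime, date)) else cell for cell in row]
    for row in data
]

response = service.spreadsheets().values().append(
    spreadsheetId=gsheet_id,
    range='data1!A1',
    valueInputOption='RAW',
    body={'majorDimension': 'ROWS', 'values': values}
).execute()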
References:
Method: spreadsheets.values.append | Sheets API - Response body
Worksheet.Range property (Excel) | Microsoft Docs
REST Resource: spreadsheets.values | Sheets API - Resource: ValueRange
xlrd documentation - xlrd 1.1.0 documentation

normally my api returns json, but sometimes returns a full response object?

All,
I have a script in place that fetches JSON from a web server. It's as simple as the following:
url = "foo.com/json"
response = requests.get(url).content
data = json.loads(response)
But I noticed that sometimes, instead of returning the JSON object, it returns what looks like a raw response dump. See here: https://pastebin.com/fUy5YMuY
What confuses me is how to continue on.
Right now I've taken the above Python and wrapped it:
try:
    url = "foo.com/json"
    response = requests.get(url).content
    data = json.loads(response)
except Exception as ex:
    with open("test.txt", "w") as t:
        t.write(response)
    print("Error", sys.exc_info())
Is there a way to catch this? Right now I get a ValueError... and then reparse it? I was thinking of doing something like:
except Exception as ex:
    response = reparse(response)
but I'm still confused as to why it sometimes returns the JSON and other times the header info plus content.
def reparse(response):
    """
    Catch the ValueError and attempt to reparse the response for the JSON content
    """
Can I feed something like the pastebin dump into some sort of requests.Response class or similar?
Edit: Here is the full stack trace I am getting.
File "scrape_people_by_fcc_docket.py", line 82, in main
json_data = get_page(limit, page*limit)
File "scrape_people_by_fcc_docket.py", line 13, in get_page
data = json.loads(response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 16 column 367717 (char 3 - 368222)
None
In the above code, the response variable is defined by:
response = requests.get(url).content
which is odd, because most of the time response contains a JSON object that is completely parsable.
Ideally, I'm trying to find a way to detect when the content isn't JSON, somehow extract the actual content anyway, and then continue on.
Instead of using .text or .content, you can use the response method .json(), which so far seems to resolve my issues. I am doing continual testing and watching for errors and will update this as needed, but it seems that .json() returns the data I need without headers, and it already calls json.loads (or similar) internally to parse the content.
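A sketch of that approach with the error handling kept (same placeholder URL as in the question; raise_for_status() just surfaces HTTP errors early):

import requests

url = "foo.com/json"  # placeholder, as above
resp = requests.get(url)
resp.raise_for_status()      # fail fast on HTTP-level errors
try:
    data = resp.json()       # parses the body as JSON for you
except ValueError:
    # the body was not valid JSON; keep it around for inspection
    with open("test.txt", "w") as t:
        t.write(resp.text)
    raise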

python - parse string (which is an array) returned from a web service as array/list

I am using
conn = httplib.HTTPConnection(self._myurl)
conn.request("GET", "/")
data = conn.getresponse().read()
Now this URL returns Python-style arrays similar to the below:
[1,"apple",23,"good"]
[2,"grape",4,"bad"]
Now I am getting this result from the service as a string in data. How do I get it parsed/decoded as an array/list straight away, without having to dissect it myself and build the array?
If the server is returning JSON (which it looks like it might be) it is a simple matter of:
import json
# ... snip ...
rehydrated_data = json.loads(data)
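If each line of the response is itself a JSON array (as in your sample), you can also parse the string line by line; a small sketch:

import json

# data is the raw string shown above: one JSON array per line
rows = [json.loads(line) for line in data.splitlines() if line.strip()]
# rows -> [[1, "apple", 23, "good"], [2, "grape", 4, "bad"]]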
[Updated]
The service was actually designed for streaming rather than returning data in bulk, which is why I was getting an array of objects instead of an array of arrays.
I finally handled it on the JavaScript side using the following logic:
1. data = data.replace(']', '],')
2. data = '[' + data.rstrip(',') + ']'
3. data = json.loads(data)
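Expressed in Python, those three steps amount to something like this (a sketch using the sample data from the question):

import json

data = '[1,"apple",23,"good"]\n[2,"grape",4,"bad"]'
data = data.replace(']', '],')         # step 1: add a comma after each row
data = '[' + data.rstrip(',') + ']'    # step 2: wrap everything in an outer array
data = json.loads(data)                # step 3: parse as a list of lists
# -> [[1, 'apple', 23, 'good'], [2, 'grape', 4, 'bad']]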
Reply to the above answer:
Actually it is not returning valid JSON; proper JSON would look like
{"key":[[...],[...]]}
I have tried the code and it fails with the following error:
...
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 38 column 119 (char 25 - 4048)
