Question
What is a good way to handle a file that has been uploaded through a WSGI POST request?
More info
So far, I'm able to read the raw POST data from environ['wsgi.input']. At this point the issue I am having is that the information associated with the file and the file itself are jammed together into one string:
'------WebKitFormBoundarymzmB1wyHKjyqZrDm
Content-Disposition: form-data; name="file"; filename="new file.wav"
Content-Type: audio/wav
THIS IS THE CONTENT
THIS IS THE CONTENT
THIS IS THE CONTENT
THIS IS THE CONTENT
THIS IS THE CONTENT
------WebKitFormBoundarymzmB1wyHKjyqZrDm--
'
Is there a library in Python I should be using to handle this information more cleanly? Ultimately, I'd like to take the file contents and then turn around and upload them to Amazon S3.
You can use cgi.FieldStorage. (Note that the cgi module was deprecated in Python 3.11 and removed in 3.13; on current Python, multipart parsing is handled by third-party libraries such as multipart or Werkzeug.)
import cgi
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
f = form['file'].file
# You can use `f` as a file object: f.read(...)
Usually you want much more abstraction than raw WSGI. Consider frameworks that run on WSGI.
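Since the goal is S3, here is a minimal sketch of the upload step using boto3, reusing the file object f from above. The bucket and key names are hypothetical, and AWS credentials are assumed to be configured:
import boto3

s3 = boto3.client('s3')
# upload_fileobj streams the file-like object to S3 without
# reading the whole upload into memory first.
s3.upload_fileobj(f, 'my-bucket', 'uploads/new-file.wav')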
Related
I have two separate Django projects and want to host them on two different servers. Is there a way for them to communicate with each other? In one project I need to be able to upload a file and send it to the second project for processing.
It's called service discovery. Read more about it here: https://microservices.io/patterns/server-side-discovery.html
Basically, you can just use the Python requests library:
import requests
files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
The response from the other server can be, for example, JSON with URLs to the files, or something similar.
Check python requests file upload for details.
But be careful: you should send some hashes/keys or similar so other people can't imitate the behavior and send you false data. That of course depends on what you need it for.
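As a sketch of the hashes/keys idea, the sender can attach an HMAC of the payload and the receiver can verify it. The shared secret here is a placeholder you would configure on both servers:
import hashlib
import hmac

SHARED_KEY = b'replace-with-a-long-random-secret'  # hypothetical, same on both servers

def sign(payload: bytes) -> str:
    # HMAC-SHA256 over the raw payload; send this alongside the upload.
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(payload), signature)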
I am trying to parse the POST request sent by the App Engine blobstore handler in development to get the Google Cloud Storage file path ('/gs/...') using Flask. Webapp2 has a method to get this if you inherit from blobstore_handlers.BlobstoreUploadHandler: self.get_file_infos(). This method is not available in Flask.
This is a sample of the raw request data in Flask using request.get_data():
--===============0287937837666164318==
Content-Type: message/external-body; blob-key="encoded_gs_file:ZnBscy1kZXYvZmFrZS1nVTFHNFdrc3hobUFoaEtWVEVmNHZnPT0="; access-type="X-AppEngine-BlobKey"
Content-Disposition: form-data; name="file"; filename="Human Code Reviews One.pdf"
Content-Type: application/pdf
Content-Length: 951486
Content-MD5: NzNhOTI0YjdjNTFiMjEyYmY0NDUzZGFmYzBlOTExNTY=
X-AppEngine-Cloud-Storage-Object: /gs/appname/fake-gU1G4WksxhmAhhKVTEf4vg==
content-disposition: form-data; name="file"; filename="Human Code Reviews One.pdf"
X-AppEngine-Upload-Creation: 2018-01-22 12:26:08.095166
--===============0287937837666164318==--
I have tried both msg = email.parser.Parser().parsestr(raw_data) and msg = email.message_from_string(raw_data), but msg.items() returns an empty list.
If I do rd = raw_data.split('\r\n') and parse a line containing a proper header I get what I want for that line: [('X-AppEngine-Cloud-Storage-Object', '/gs/appname/fake-gU1G4WksxhmAhhKVTEf4vg==')].
The issue is how to do this for the entire string and skip the blank and boundary lines.
For now, I am using the following code but I can't help but think there's a way to do this without reinventing the wheel:
for line in raw_data.split('\r\n'):
    if line.startswith(blobstore.CLOUD_STORAGE_OBJECT_HEADER):
        # split on the first colon only, in case the value contains one
        gcs_path = line.split(':', 1)[1].strip()
Thank you.
Edit:
This question is not a duplicate of the one here (How to get http headers in flask?) because I have a raw string (called a field header, see the boundary delimiters not present in HTTP headers) I would like to parse into a dictionary.
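For what it's worth, one likely reason msg.items() returned an empty list is that the raw body has no top-level headers, only the multipart parts. A sketch that synthesizes a Content-Type header and lets the email parser do the rest (the boundary is copied from the sample above; in practice it would come from request.content_type, and raw_data is assumed to be decoded text):
import email
import email.policy

boundary = '===============0287937837666164318=='
wrapped = ('Content-Type: multipart/form-data; boundary="%s"\r\n\r\n' % boundary) + raw_data
msg = email.message_from_string(wrapped, policy=email.policy.default)

for part in msg.walk():
    # Each part carries its own headers, including the X-AppEngine ones.
    gcs_path = part.get('X-AppEngine-Cloud-Storage-Object')
    if gcs_path:
        break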
I am attempting to create a simple Flask endpoint for uploading files via POST or PUT. I want the filename in the URL, and then to (after the request headers) just stream the raw file data in the request.
I also need to be able to upload files slightly larger than 2GB, and I need to do this without storing the entire file in memory. At first, this seemed simple enough:
import os
from shutil import copyfileobj

from flask import jsonify, request
from werkzeug.utils import secure_filename

@application.route("/upload/<filename>", methods=['POST', 'PUT'])
def upload(filename):
    # Authorization and sanity checks skipped.
    filename = secure_filename(filename)
    fileFullPath = os.path.join(application.config['UPLOAD_FOLDER'], filename)
    with open(fileFullPath, 'wb') as f:
        copyfileobj(request.stream, f)
    return jsonify({'filename': filename})
With a multipart/form-data upload, I can simply call .save() on the file.
However, any file I upload ends up with a different checksum (comparing sha256sum on the server against the source). When uploading a standard text file, newlines seem to be getting stripped. Binary files get mangled in other strange ways.
I am sending Content-Type: application/octet-stream when uploading to try to make Flask treat all uploads as binary. Is request.stream (a proxy to wsgi.input) opened as non-binary? I can't seem to figure that out from the Flask code. How can I stream the request data, in raw binary format, to a file on disk?
I'm open to hacks; this is for a test project (so I'm also not interested in hearing how sending this as formdata would be better, or how this isn't a good way to upload files, etc.)
I am testing this via:
curl -H 'Content-Type: application/octet-stream' -H 'Authorization: ...' -X PUT --data @/path/to/test/file.name https://test.example.com/upload/file.name
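(As an aside on the checksum mismatch: when curl's --data reads from a file with @, it strips carriage returns and newlines from the posted body, which matches the symptoms above. --data-binary posts the file exactly as-is:)
curl -H 'Content-Type: application/octet-stream' -H 'Authorization: ...' -X PUT --data-binary @/path/to/test/file.name https://test.example.com/upload/file.name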
My question is similar in nature to this very helpful answer, but I want to stream in and out simultaneously.
I have a django app that stores references to files on an external http fileserver. Right now when a user requests a zip collection it does the following (pseudo):
generate session_id
for url in url_list:
    download file to session_id/filename.ext
for file in session_id folder:
    zip.write
close
http response
Obviously this is less than ideal: it requires cleanup, it is slow, and the user sees no download progress for a long time.
The bit I'm unable to re-code is the IO buffer/"file-like object". ZipFile expects a file on write, but I want to provide a stream. In brief, how can I pipe requests through zipfile to an HttpResponse?
You can use the writestr method with a streaming file download; I recommend the requests library.
ZipFile.writestr documentation:
ZipFile.writestr(zinfo_or_arcname, bytes[, compress_type])
Edit: Sample
import requests
import zipfile

zipinfo = ...  # a zipfile.ZipInfo (or just an arcname string) for the member
zf = ...       # an open zipfile.ZipFile

for url in url_list:
    r = requests.get(url, stream=True)
    # writestr emits one complete archive member per call, so gather the
    # streamed chunks first; iter_lines would strip newlines and mangle binaries.
    body = b''.join(chunk for chunk in r.iter_content(chunk_size=8192) if chunk)
    zf.writestr(zipinfo, body)
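The question also asks how to get the archive into an HttpResponse. A minimal sketch using an in-memory buffer and Django's FileResponse (url_list and the archive member names are assumptions, and this buffers the whole zip rather than truly streaming it):
import io
import zipfile

import requests
from django.http import FileResponse

def download_zip(request):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        for url in url_list:  # hypothetical: your stored file URLs
            r = requests.get(url, stream=True)
            body = b''.join(r.iter_content(chunk_size=8192))
            # name each member after the last path segment of its URL
            zf.writestr(url.rsplit('/', 1)[-1], body)
    buf.seek(0)
    return FileResponse(buf, as_attachment=True, filename='collection.zip')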
I want to upload a file to a URL. The file I want to upload is not on my computer, but I have the URL of the file. I want to upload it using the requests library. So, I want to do something like this:
url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
The only difference is that the file report.xls comes from a URL and is not on my computer.
The only way to do this is to download the body of the URL so you can upload it.
The problem is that a form that takes a file is expecting the body of the file in the HTTP POST. Someone could write a form that takes a URL instead, and does the fetching on its own… but that would be a different form and request than the one that takes a file (or, maybe, the same form, with an optional file and an optional URL).
You don't have to download it and save it to a file, of course. You can just download it into memory:
urlsrc = 'http://example.com/source'
rsrc = requests.get(urlsrc)
urldst = 'http://example.com/dest'
rdst = requests.post(urldst, files={'file': rsrc.content})
Of course in some cases, you might also want to forward along the filename, or some other headers, like the Content-Type. Or, for huge files, you might want to stream from one server to the other without downloading and then uploading the whole file at once. You'll have to do any such things manually, but almost everything is easy with requests, and explained well in the docs.*
* Well, that last example isn't quite easy… you have to get the raw socket-wrappers off the requests and read and write, and make sure you don't deadlock, and so on…
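For instance, forwarding the filename and Content-Type can use requests' (filename, content, content_type) tuple form for files; the filename guess here is an assumption:
# crude filename guess from the source URL (hypothetical fallback name)
name = urlsrc.rsplit('/', 1)[-1] or 'upload.bin'
rdst = requests.post(urldst, files={
    'file': (name, rsrc.content, rsrc.headers.get('Content-Type')),
})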
There is an example in the documentation that may suit you. A file-like object can be used as a stream input for a POST request. Combine this with a stream response for your GET (passing stream=True), or one of the other options documented here.
This allows you to do a POST from another GET without buffering the entire payload locally. In the worst case, you may have to write a file-like class as "glue code", allowing you to pass your glue object to the POST that in turn reads from the GET response.
(This is similar to a documented technique using the Node.js request module.)
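A minimal sketch of that glue, with placeholder URLs: with stream=True, the GET response's raw attribute is a file-like object that requests will stream as the POST body.
import requests

r_src = requests.get('http://example.com/source', stream=True)
# Passing a file-like object as `data` makes requests stream the upload
# instead of buffering the whole payload in memory.
r_dst = requests.post(
    'http://example.com/dest',
    data=r_src.raw,
    headers={'Content-Type': r_src.headers.get('Content-Type', 'application/octet-stream')},
)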
import requests

img_url = "http://...."   # source image URL
res_src = requests.get(img_url)

url = "http://...."       # destination upload endpoint
payload = {}
files = [
    ('files', ('image_name.jpg', res_src.content, 'image/jpeg')),
]
headers = {"token": "******-*****-****-***-******"}

response = requests.request("POST", url, headers=headers, data=payload, files=files)
print(response.text)
The above code works for me.