Handling POSTed multipart/form-data file - python

I'm wondering what the best way is to handle POSTed raw data on the server side.
I'm using the Falcon framework and I'm able to receive a user-submitted file:
-----------------------------1209846671886287098156775745
Content-Disposition: form-data; name="qquuid"
d3ad452e-a287-4cb7-ac1f-f0a5cdb54386
-----------------------------1209846671886287098156775745
Content-Disposition: form-data; name="qqfilename"
Screenshot.png
-----------------------------1209846671886287098156775745
Content-Disposition: form-data; name="qqtotalfilesize"
1951677
-----------------------------1209846671886287098156775745
Content-Disposition: form-data; name="qqfile"; filename="Screenshot.png"
Content-Type: image/png
�PNG
.................lots of bytes............
Using Python, and hopefully some other library, I would like to turn it into some sort of file object from which I can extract the metadata (filename, UUID, etc.) as well as the file itself.
Which library should I use?

Here is a middleware project that looks promising; I'm currently trying to use it in a Falcon service myself:
falcon-multipart
I have also had pretty good luck using cgi.FieldStorage(), as described in the following post:
cgi article
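# Note: the cgi module was deprecated in Python 3.11 and removed in 3.13.
import cgi
def on_post(req, resp):
    env = req.env
    # cgi.FieldStorage expects QUERY_STRING to be present in the environ.
    env.setdefault('QUERY_STRING', '')
    form = cgi.FieldStorage(fp=req.stream, environ=env)
    # File-like object holding the uploaded content of the 'fileinputname' field.
    uploaded_file = form['fileinputname'].file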
If you are willing to have one non-Falcon hook, here is an example with Bottle:
example

Just a very late followup to this old discussion.
As of Falcon 3.0, the framework supports multipart/form-data natively for both WSGI and ASGI applications.

Related

Handling and uploading binary file data to S3 using AWS API gateway and Lambda in Python

I am trying to process an HTTP request whose body is multipart/form-data.
The incoming request body looks like this:
START RequestId: 77e9936c-6bf5-48e2-91bc-ab6c9a2d15da Version: $LATEST ------WebKitFormBoundarytY6U5v3pmlyEDnhY Content-Disposition: form-data; name="contactType" {"value":"Technical","label":"Question for Technical Team"} ------WebKitFormBoundarytY6U5v3pmlyEDnhY Content-Disposition: form-data; name="message" "Test 07/06/2022 9:59" ------WebKitFormBoundarytY6U5v3pmlyEDnhY Content-Disposition: form-data; name="upload"; filename="W15V011.pdf" Content-Type: application/pdf %PDF-1.3 %����--- long filemeta data
After reading the form data (with the help of cgi), I am able to extract all the fields and their data separately. The problem is getting the file data properly. When I print it, I get the following:
b'\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x00\xef\xbf\xbd\x00\x03...conti
I then try to upload it to an S3 bucket using s3.put_object:
s3.put_object(Bucket=bucket, Key=filename, Body=upload)
The upload succeeds, but the file I download afterwards is corrupted. I have tried many approaches and have been unable to fix it. Please help me fix this.
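One observation that may help narrow this down: the repeated \xef\xbf\xbd in the printed bytes is the UTF-8 encoding of U+FFFD, the Unicode replacement character. That pattern usually means the binary body was decoded as text somewhere before the handler saw it, and the original bytes cannot be recovered from it. The sketch below assumes a Lambda proxy-style event with "body" and "isBase64Encoded" keys, and assumes API Gateway is configured to pass multipart/form-data through as binary so that the body arrives base64-encoded. The helper names are made up for illustration.

```python
import base64

REPLACEMENT = b"\xef\xbf\xbd"  # UTF-8 encoding of U+FFFD


def looks_text_mangled(data: bytes) -> bool:
    """Heuristic: True if the bytes contain the U+FFFD replacement marker."""
    return REPLACEMENT in data


def body_bytes(event: dict) -> bytes:
    """Return the raw request body from a proxy-style event (assumed shape)."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # Binary payloads survive intact only on this path.
        return base64.b64decode(body)
    # Text path: fine for plain form fields, lossy for binary uploads.
    return body.encode("utf-8")
```

If looks_text_mangled() is true for the parsed file content, the damage happened before parsing, so the fix would be on the gateway or event-handling side rather than in the put_object call.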

Extract Field Headers from Blob Handler in Flask on App Engine

I am trying to parse the POST request sent by the App Engine blobstore handler in development to get the Google Cloud Storage file path ('/gs/...') using Flask. Webapp2 has a method to get this if you inherit from blobstore_handlers.BlobstoreUploadHandler :- self.get_file_infos(). This method is not available in Flask.
This is a sample of the raw request data in Flask using request.get_data():
--===============0287937837666164318==
Content-Type: message/external-body; blob-key="encoded_gs_file:ZnBscy1kZXYvZmFrZS1nVTFHNFdrc3hobUFoaEtWVEVmNHZnPT0="; access-type="X-AppEngine-BlobKey"
Content-Disposition: form-data; name="file"; filename="Human Code Reviews One.pdf"
Content-Type: application/pdf
Content-Length: 951486
Content-MD5: NzNhOTI0YjdjNTFiMjEyYmY0NDUzZGFmYzBlOTExNTY=
X-AppEngine-Cloud-Storage-Object: /gs/appname/fake-gU1G4WksxhmAhhKVTEf4vg==
content-disposition: form-data; name="file"; filename="Human Code Reviews One.pdf"
X-AppEngine-Upload-Creation: 2018-01-22 12:26:08.095166
--===============0287937837666164318==--
I have tried both msg = email.parser.Parser().parsestr(raw_data) and msg = email.message_from_string(raw_data), but msg.items() returns an empty list.
If I do rd = raw_data.split('\r\n') and parse a line containing a proper header I get what I want for that line: [('X-AppEngine-Cloud-Storage-Object', '/gs/appname/fake-gU1G4WksxhmAhhKVTEf4vg==')].
The issue is how to do this for the entire string and skip the blank and boundary lines.
For now, I am using the following code but I can't help but think there's a way to do this without reinventing the wheel:
for line in raw_data.split('\r\n'):
    if line.startswith(blobstore.CLOUD_STORAGE_OBJECT_HEADER):
        # Split only on the first colon so a value containing ':' stays intact.
        gcs_path = line.split(':', 1)[1].strip()
Thank you.
Edit:
This question is not a duplicate of the one here (How to get http headers in flask?) because I have a raw string (called a field header, see the boundary delimiters not present in HTTP headers) I would like to parse into a dictionary.
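A possible explanation for the empty msg.items(): the raw body starts with a boundary line, not with headers, so the email parser treats everything as body text. A minimal sketch of a workaround is to prepend the request's own Content-Type header (which carries the boundary) and then walk the parsed parts. The function name find_part_header is hypothetical, and the sketch assumes the raw body is a str, as in the question.

```python
import email


def find_part_header(raw_body, content_type, header_name):
    """Return the first value of header_name found in any part, else None.

    raw_body is the multipart body as a str; content_type is the value of the
    request's Content-Type header, including its boundary parameter.
    """
    # Give the parser the top-level header it needs to locate the boundary.
    prefix = "Content-Type: %s\r\nMIME-Version: 1.0\r\n\r\n" % content_type
    msg = email.message_from_string(prefix + raw_body)
    # walk() visits the root, every part, and any nested message inside a
    # message/* part, so the header is found at either level.
    for part in msg.walk():
        value = part.get(header_name)
        if value is not None:
            return value
    return None
```

In the Flask handler this might be called as find_part_header(raw_data, request.headers['Content-Type'], blobstore.CLOUD_STORAGE_OBJECT_HEADER), assuming raw_data has already been decoded to a str.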

Handle file from WSGI request

Question
What is a good way to handle a file that has been uploaded through a WSGI POST request?
More info
So far, I'm able to read the raw POST data from environ['wsgi.input']. At this point the issue I am having is that the information associated with the file and the file itself are jammed together into one string:
'------WebKitFormBoundarymzmB1wyHKjyqZrDm
Content-Disposition: form-data; name="file"; filename="new file.wav"
Content-Type: audio/wav
THIS IS THE CONTENT
THIS IS THE CONTENT
THIS IS THE CONTENT
THIS IS THE CONTENT
THIS IS THE CONTENT
------WebKitFormBoundarymzmB1wyHKjyqZrDm--
'
Is there a library in python I should be using to handle information more cleanly? Ultimately, I'd like to take the file contents and then turn around and upload to Amazon S3.
You can use cgi.FieldStorage.
import cgi
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
f = form['file'].file
# You can use `f` as a file object: f.read(...)
Usually you want much more abstraction than raw WSGI. Consider frameworks that run on WSGI.
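Since the cgi module was deprecated in Python 3.11 and removed in 3.13, here is a minimal sketch of the same idea using only the standard email package. The helper name parse_multipart is made up for illustration, and it reads the whole request body into memory, so it is only a starting point for small uploads.

```python
from email.parser import BytesParser
from email.policy import default


def parse_multipart(environ):
    """Return {field name: (filename or None, content bytes)}.

    Hypothetical helper for a multipart/form-data POST in a WSGI app.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    # WSGI keeps the Content-Type (and its boundary) in the environ, not in
    # the body, so put it back in front for the email parser.
    prefix = b"Content-Type: " + environ["CONTENT_TYPE"].encode("latin-1") + b"\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(prefix + body)
    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        # decode=True returns the part content as bytes.
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields
```

With the request from the question, parse_multipart(environ)['file'] would give the filename and the file bytes, which could then be handed to an S3 client.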

POST request with Multipart/form-data. Content-type not correct

We're trying to write a script with python (using python-requests a.t.m.) to do a POST request to a site where the content has to be MultipartFormData.
When we do this POST request manually (by filling in the form on the site and post), using wireshark, this came up (short version):
Content-Type: multipart/form-data;
Content-Disposition: form-data; name="name"
Data (8 Bytes)
John Doe
When we try to use the python-requests library for achieving the same result, this is sent:
Content-Type: application/x-pandoplugin
Content-Disposition: form-data; name="name"; filename="name"\r\n
Media type: application/x-pandoplugin (12 Bytes)
//and then in this piece is what we posted://
John Doe
The weird thing is that the 'general type' of the packet indeed is multipart/form-data, but the individual item sent (key = 'name', value= 'John Doe') has type application/x-pandoplugin (a random application on my pc I guess).
This is the code used:
response = s.post('http://url.com', files={'name': 'John Doe'})
Is there a way to specify the content-type of the individual items instead of using the headers argument (which only changes the type of the 'whole' packet)?
We think the server doesn't respond correctly due to the fact that it can't understand the content-type we send it.
Little update:
I think the different parts of the multipart content are now identical to the ones sent if I do the POST in the browser, so that's good. Still the server doesn't actually do the changes I send it with the script. The only thing that still is different is the order of the different parts.
For example this is what my browser sends:
Boundary: \r\n------WebKitFormBoundary3eXDYO1lG8Pgxjwj\r\n
Encapsulated multipart part: (text/plain)
Content-Disposition: form-data; name="file"; filename="ex.txt"\r\n
Content-Type: text/plain\r\n\r\n
Line-based text data: text/plain
lore ipsum blabbla
Boundary: \r\n------WebKitFormBoundary3eXDYO1lG8Pgxjwj\r\n
Encapsulated multipart part:
Content-Disposition: form-data; name="seq"\r\n\r\n
Data (2 bytes)
Boundary: \r\n------WebKitFormBoundary3eXDYO1lG8Pgxjwj\r\n
Encapsulated multipart part:
Content-Disposition: form-data; name="name"\r\n\r\n
Data (2 bytes)
And this is what the script (using python-requests) sends:
Boundary: \r\n------WebKitFormBoundary3eXDYO1lG8Pgxjwj\r\n
Encapsulated multipart part:
Content-Disposition: form-data; name="name"\r\n\r\n
Data (2 bytes)
Boundary: \r\n------WebKitFormBoundary3eXDYO1lG8Pgxjwj\r\n
Encapsulated multipart part: (text/plain)
Content-Disposition: form-data; name="file"; filename="ex.txt"\r\n
Content-Type: text/plain\r\n\r\n
Line-based text data: text/plain
lore ipsum blabbla
Boundary: \r\n------WebKitFormBoundary3eXDYO1lG8Pgxjwj\r\n
Encapsulated multipart part:
Content-Disposition: form-data; name="seq"\r\n\r\n
Data (2 bytes)
Could it be possible that the server counts on the order of the parts? According to Multipart upload form: Is order guaranteed?, it apparently is? And if so, is it possible to explicitly force an order using the requests library?
And to make things worse in that case: There is a mixture of a file and just text values.
So forcing an order seems rather difficult. This is the current way I do it:
s.post('http://www.url.com', files=files,data = form_values)
EDIT2:
I did a modification in the requests plugin to make sure the order of the parts is the same as in the original request. This doesn't fix the problem so I guess there is no straightforward solution for my problem. I'll send a mail to the devs of the site and hope they can help me!
Your code looks correct.
requests.post('http://url.com', files={'name': 'John Doe'})
... and it should send a 'multipart/form-data' POST.
Indeed, I get something like this posted:
Accept-Encoding: gzip, deflate, compress
Connection: close
Accept: */*
Content-Length: 188
Content-Type: multipart/form-data; boundary=032a1ab685934650abbe059cb45d6ff3
User-Agent: python-requests/1.2.3 CPython/2.7.4 Linux/3.8.0-27-generic
--032a1ab685934650abbe059cb45d6ff3
Content-Disposition: form-data; name="name"; filename="name"
Content-Type: application/octet-stream
John Doe
--032a1ab685934650abbe059cb45d6ff3--
I have no idea why you'd get that weird Content-Type header:
Content-Type: application/x-pandoplugin
I would begin by removing Pando Web Plugin from your machine completely, and then try your python-requests code again. (or try from a different machine)
As of today you can do:
response = s.post('http://url.com', files={'name': (filename, contents, content_type)})
Python uses a system-wide configuration file to "guess" the MIME type of a file. If those plugins register your file extension with their custom MIME type, you'll end up sending that instead.
The safest approach is to do your own MIME type guessing that suits the particular server you're sending to, and only fall back on Python's native MIME type guessing for extensions you didn't think of.
How exactly you specify the content type manually with python-requests I don't know, but I expect it should be possible.
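The "own guessing first, native guessing as fallback" idea above can be sketched as follows. The PREFERRED_TYPES table and the function name are assumptions for illustration; fill the table with whatever the target server expects.

```python
import mimetypes
import os

# Hypothetical table of extensions the target server cares about.
PREFERRED_TYPES = {
    ".txt": "text/plain",
    ".zip": "application/zip",
    ".png": "image/png",
}


def guess_content_type(filename, default="application/octet-stream"):
    """Prefer our own mapping, then fall back to the mimetypes module."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in PREFERRED_TYPES:
        return PREFERRED_TYPES[ext]
    # mimetypes consults system-wide tables, so its answer can vary by machine.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default
```

The result can then be passed as the third element of the files tuple, e.g. files={'file': (filename, contents, guess_content_type(filename))}.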

Python 2.6 - Upload zip file - Poster 0.4

I came here via this question:
Send file using POST from a Python script
And by and large it's what I need, plus some additional.
Besides the zip file, some additional information is needed, and the POST data looks something like this:
POSTDATA =-----------------------------293432744627532
Content-Disposition: form-data; name="categoryID"
1
-----------------------------293432744627532
Content-Disposition: form-data; name="cID"
-3
-----------------------------293432744627532
Content-Disposition: form-data; name="FileType"
zip
-----------------------------293432744627532
Content-Disposition: form-data; name="name"
Kylie Minogue
-----------------------------293432744627532
Content-Disposition: form-data; name="file1"; filename="At the Beach x8-8283.zip"
Content-Type: application/x-zip-compressed
PK........................
Is this somehow possible with the poster 0.4 module? (And before you ask: yes, I'm fairly new to Python...)
Kind regards,
Brian K. Andersen
Poster has basic and advanced multipart support.
You may try something like this (modified from poster documentation):
# test_client.py (Python 2, as in the question)
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2
# Register the streaming http handlers with urllib2
register_openers()
# headers contains the necessary Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
datagen, headers = multipart_encode({
    'categoryID': 1,
    'cID': -3,
    'FileType': 'zip',
    'name': 'Kylie Minogue',
    # Open in binary mode so the zip bytes are sent unchanged
    'file1': open('At the Beach x8-8283.zip', 'rb')
})
# Create the Request object
request = urllib2.Request("http://localhost:5000/upload_data", datagen, headers)
# Actually do the request, and get the response
print urllib2.urlopen(request).read()
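poster and urllib2 are Python 2 only. For anyone on Python 3, here is a rough standard-library sketch of the same kind of request body, built by hand so that the parts are emitted in exactly the order given. The function name encode_multipart is made up, and the sketch does not escape quotes or newlines in field names or filenames, so it assumes trusted input.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body.

    fields: list of (name, value) pairs, emitted first, in order.
    files: list of (name, filename, content_type, data_bytes), emitted next.
    Returns (body_bytes, headers_dict).
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, value in fields:
        chunks.append(
            ('--%s\r\nContent-Disposition: form-data; name="%s"\r\n\r\n%s\r\n'
             % (boundary, name, value)).encode("utf-8"))
    for name, filename, content_type, data in files:
        chunks.append(
            ('--%s\r\nContent-Disposition: form-data; name="%s"; filename="%s"\r\n'
             'Content-Type: %s\r\n\r\n'
             % (boundary, name, filename, content_type)).encode("utf-8"))
        chunks.append(data)
        chunks.append(b"\r\n")
    # Closing delimiter
    chunks.append(("--%s--\r\n" % boundary).encode("utf-8"))
    body = b"".join(chunks)
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Content-Length": str(len(body)),
    }
    return body, headers
```

The returned pair can be passed to urllib.request.Request(url, data=body, headers=headers). Unlike poster's streaming encoder, this holds the whole file in memory.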
