How do I handle file upload via PUT request in Django?

How do I handle file upload via PUT request in Django? - python

I'm implementing a REST-style interface and would like to be able to create (via upload) files via a HTTP PUT request. I would like to create either a TemporaryUploadedFile or a InMemoryUploadedFile which I can then pass to my existing FileField and .save() on the object that is part of the model, thereby storing the file.
I'm not quite sure about how to handle the file upload part. Specifically, this being a put request, I do not have access to request.FILES since it does not exist in a PUT request.
So, some questions:
Can I leverage existing functionality in the HttpRequest class, specifically the part that handles file uploads? I know a direct PUT is not a multipart MIME request, so I don't think so, but it is worth asking.
How can I deduce the mime type of what is being sent? If I've got it right, a PUT body is simply the file without prelude. Do I therefore require that the user specify the mime type in their headers?
How do I extend this to large amounts of data? I don't want to read it all into memory since that is highly inefficient. Ideally I'd do what TemporaryUploadFile and related code does - write it part at a time?
I've taken a look at this code sample which tricks Django into handling PUT as a POST request. If I've got it right though, it'll only handle form encoded data. This is REST, so the best solution would be to not assume form encoded data will exist. However, I'm happy to hear appropriate advice on using mime (not multipart) somehow (but the upload should only contain a single file).
Django 1.3 is acceptable. So I can either do something with request.raw_post_data or request.read() (or alternatively some other better method of access). Any ideas?

Django 1.3 is acceptable. So I can
either do something with
request.raw_post_data or
request.read() (or alternatively some
other better method of access). Any
ideas?
You don't want to be touching request.raw_post_data - that implies reading the entire request body into memory, which if you're talking about file uploads might be a very large amount, so request.read() is the way to go. You can do this with Django <= 1.2 as well, but it means digging around in HttpRequest to figure out the the right way to use the private interfaces, and it's a real drag to then ensure your code will also be compatible with Django >= 1.3.
I'd suggest that what you want to do is to replicate the existing file upload behaviour parts of the MultiPartParser class:
Retrieve the upload handers from request.upload_handlers (Which by default will be MemoryFileUploadHandler & TemporaryFileUploadHandler)
Determine the request's content length (Search of Content-Length in HttpRequest or MultiPartParser to see the right way to do this.)
Determine the uploaded file's filename, either by letting the client specify this using the last path part of the url, or by letting the client specify it in the "filename=" part of the Content-Disposition header.
For each handler, call handler.new_file with the relevant args (mocking up a field name)
Read the request body in chunks using request.read() and calling handler.receive_data_chunk() for each chunk.
For each handler call handler.file_complete(), and if it returns a value, that's the uploaded file.
How can I deduce the mime type of what
is being sent? If I've got it right, a
PUT body is simply the file without
prelude. Do I therefore require that
the user specify the mime type in
their headers?
Either let the client specify it in the Content-Type header, or use python's mimetype module to guess the media type.
I'd be interested to find out how you get on with this - it's something I've been meaning to look into myself, be great if you could comment to let me know how it goes!
Edit by Ninefingers as requested, this is what I did and is based entirely on the above and the django source.
upload_handlers = request.upload_handlers
content_type = str(request.META.get('CONTENT_TYPE', ""))
content_length = int(request.META.get('CONTENT_LENGTH', 0))
if content_type == "":
return HttpResponse(status=400)
if content_length == 0:
# both returned 0
return HttpResponse(status=400)
content_type = content_type.split(";")[0].strip()
try:
charset = content_type.split(";")[1].strip()
except IndexError:
charset = ""
# we can get the file name via the path, we don't actually
file_name = path.split("/")[-1:][0]
field_name = file_name
Since I'm defining the API here, cross browser support isn't a concern. As far as my protocol is concerned, not supplying the correct information is a broken request. I'm in two minds as to whether I want say image/jpeg; charset=binary or if I'm going to allow non-existent charsets. In any case, I'm putting setting Content-Type validly as a client-side responsibility.
Similarly, for my protocol, the file name is passed in. I'm not sure what the field_name parameter is for and the source didn't give many clues.
What happens below is actually much simpler than it looks. You ask each handler if it will handle the raw input. As the author of the above states, you've got MemoryFileUploadHandler & TemporaryFileUploadHandler by default. Well, it turns out MemoryFileUploadHandler will when asked to create a new_file decide whether it will or not handle the file (based on various settings). If it decides it's going to, it throws an exception, otherwise it won't create the file and lets another handler take over.
I'm not sure what the purpose of counters was, but I've kept it from the source. The rest should be straightforward.
counters = [0]*len(upload_handlers)
for handler in upload_handlers:
result = handler.handle_raw_input("",request.META,content_length,"","")
for handler in upload_handlers:
try:
handler.new_file(field_name, file_name,
content_type, content_length, charset)
except StopFutureHandlers:
break
for i, handler in enumerate(upload_handlers):
while True:
chunk = request.read(handler.chunk_size)
if chunk:
handler.receive_data_chunk(chunk, counters[i])
counters[i] += len(chunk)
else:
# no chunk
break
for i, handler in enumerate(upload_handlers):
file_obj = handler.file_complete(counters[i])
if not file_obj:
# some indication this didn't work?
return HttpResponse(status=500)
else:
# handle file obj!

Newer Django versions allow for handling this a lot easier thanks to https://gist.github.com/g00fy-/1161423
I modified the given solution like this:
if request.content_type.startswith('multipart'):
put, files = request.parse_file_upload(request.META, request)
request.FILES.update(files)
request.PUT = put.dict()
else:
request.PUT = QueryDict(request.body).dict()
to be able to access files and other data like in POST. You can remove the calls to .dict() if you want your data to be read-only.

I hit this problem while working with Django 2.2, and was looking for something that just worked for uploading a file via PUT request.
from django.http import QueryDict
from django.http.multipartparser import MultiValueDict
from django.core.files.uploadhandler import (
SkipFile,
StopFutureHandlers,
StopUpload,
)
class PutUploadMiddleware(object):
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
method = request.META.get("REQUEST_METHOD", "").upper()
if method == "PUT":
self.handle_PUT(request)
return self.get_response(request)
def handle_PUT(self, request):
content_type = str(request.META.get("CONTENT_TYPE", ""))
content_length = int(request.META.get("CONTENT_LENGTH", 0))
file_name = request.path.split("/")[-1:][0]
field_name = file_name
content_type_extra = None
if content_type == "":
return HttpResponse(status=400)
if content_length == 0:
# both returned 0
return HttpResponse(status=400)
content_type = content_type.split(";")[0].strip()
try:
charset = content_type.split(";")[1].strip()
except IndexError:
charset = ""
upload_handlers = request.upload_handlers
for handler in upload_handlers:
result = handler.handle_raw_input(
request.body,
request.META,
content_length,
boundary=None,
encoding=None,
)
counters = [0] * len(upload_handlers)
for handler in upload_handlers:
try:
handler.new_file(
field_name,
file_name,
content_type,
content_length,
charset,
content_type_extra,
)
except StopFutureHandlers:
break
for chunk in request:
for i, handler in enumerate(upload_handlers):
chunk_length = len(chunk)
chunk = handler.receive_data_chunk(chunk, counters[i])
counters[i] += chunk_length
if chunk is None:
# Don't continue if the chunk received by
# the handler is None.
break
for i, handler in enumerate(upload_handlers):
file_obj = handler.file_complete(counters[i])
if file_obj:
# If it returns a file object, then set the files dict.
request.FILES.appendlist(file_name, file_obj)
break
any(handler.upload_complete() for handler in upload_handlers)

Related

How to convert suds object to xml string

This is a duplicate to this question:
How to convert suds object to xml
But the question has not been answered: "totxt" is not an attribute on the Client class.
Unfortunately I lack of reputation to add comments. So I ask again:
Is there a way to convert a suds object to its xml?
I ask this because I already have a system that consumes wsdl files and sends data to a webservice. But now the customers want to alternatively store the XML as files (to import them later manually). So all I need are 2 methods for writing data: One writes to a webservice (implemented and tested), the other (not implemented yet) writes to files.
If only I could make something like this:
xml_as_string = My_suds_object.to_xml()
The following code is just an example and does not run. And it's not elegant. Doesn't matter. I hope you get the idea what I want to achieve:
I have the function "write_customer_obj_webservice" that works. Now I want to write the function "write_customer_obj_xml_file".
import suds
def get_customer_obj():
wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
service_url = r'http://someiphere/Customer'
c = suds.client.Client(wsdl_url, location=service_url)
customer = c.factory.create("ns0:CustomerType")
return customer
def write_customer_obj_webservice(customer):
wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
service_url = r'http://someiphere/Customer'
c = suds.client.Client(wsdl_url, location=service_url)
response = c.service.save(someparameters, None, None, customer)
return response
def write_customer_obj_xml_file(customer):
output_filename = r'C\temp\testxml'
# The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
xml = customer.to_xml()
fo = open(output_filename, 'a')
try:
fo.write(xml)
except:
raise
else:
response = 'All ok'
finally:
fo.close()
return response
# Get the customer object always from the wsdl.
customer = get_customer_obj()
# Since customer is an object, setting it's attributes is very easy. There are very complex objects in this system.
customer.name = "Doe J."
customer.age = 42
# Write the new customer to a webservice or store it in a file for later proccessing
if later_processing:
response = write_customer_obj_xml_file(customer)
else:
response = write_customer_obj_webservice(customer)

I found a way that works for me. The trick is to create the Client with the option "nosend=True".
In the documentation it says:
nosend - Create the soap envelope but don't send. When specified, method invocation returns a RequestContext instead of sending it.
The RequestContext object has the attribute envelope. This is the XML as string.
Some pseudo code to illustrate:
c = suds.client.Client(url, nosend=True)
customer = c.factory.create("ns0:CustomerType")
customer.name = "Doe J."
customer.age = 42
response = c.service.save(someparameters, None, None, customer)
print response.envelope # This prints the XML string that would have been sent.

You have some issues in write_customer_obj_xml_file function:
Fix bad path:
output_filename = r'C:\temp\test.xml'
The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
What's the type of customer? type(customer)?
xml = customer.to_xml() # to be continued...
Why mode='a'? ('a' => append, 'w' => create + write)
Use a with statement (file context manager).
with open(output_filename, 'w') as fo:
fo.write(xml)
Don't need to return a response string: use an exception manager. The exception to catch can be EnvironmentError.
Analyse
The following call:
customer = c.factory.create("ns0:CustomerType")
Construct a CustomerType on the fly, and return a CustomerType instance customer.
I think you can introspect your customer object, try the following:
vars(customer) # display the object attributes
help(customer) # display an extensive help about your instance
Another way is to try the WSDL URLs by hands, and see the XML results.
You may obtain the full description of your CustomerType object.
And then?
Then, with the attributes list, you can create your own XML. Use an XML template and fill it with the object attributes.
You may also found the magic function (to_xml) which do the job for you. But, not sure the XML format matches your need.

client = Client(url)
client.factory.create('somename')
# The last XML request by client
client.last_sent()
# The last XML response from Web Service
client.last_received()

open('file_path').read() - save contents back to file

I need to upload a file to the server using urllib2. Since I cannot use any external libraries (like requests and others) because I am using OpenOffice python, I needed a simple way to post file data.
So I came with:
post_url = "http://localhost:8000/admin/oo_file_uploader?user_id=%s&file_id=%s" % (user_id, file_id)
file_path = doc.Location.replace('file://', '')
data = urllib.urlencode({"file": open(file_path).read()})
urllib2.urlopen(post_url, data)
which posts something to the server.
I wonder if it is possible to save posted contents back to the file using python/django?

This expands somewhat on #zero323's answer. You will want to make sure that you implement some sort of security to prevent random files being uploaded by unauthorized users, which is what the #file_upload_security decorator is implied to handle.
#file_upload_security
def oo_file_uploader(user_id=None, file_id=None):
if request.method == 'POST':
# Exception handling skipped if get() fails.
user = User.objects.get(id=user_id)
save_to_file = MyFiles.objects.get(id=file_id)
# You will probably want to ensure 'file' is in post data.
file_contents = save_to_file.parse_post_to_content(request.POST['file'])
with open(save_to_file.path_to_file, 'w') as fw:
fw.write(file_contents)

File upload with Django via PUT

I am trying to implement a function in Django to upload an image from a client (an iPhone app) to an Amazon S3 server. The iPhone app sends a HttpRequest (method PUT) with the content of the image in the HTTPBody. For instance, the client PUTs the image to the following URL: http://127.0.0.1:8000/uploadimage/sampleImage.png/
My function in Django looks like this to handle such a PUT request and save the file to S3:
def store_in_s3(filename, content):
conn = S3Connection(settings.ACCESS_KEY, settings.PASS_KEY) # gets access key and pass key from settings.py
bucket = conn.create_bucket("somepicturebucket")
k = Key(bucket)
k.key = filename
mime = mimetypes.guess_type(filename)[0]
k.set_metadata("Content-Type", mime)
k.set_contents_from_string(content)
k.set_acl("public-read")
def upload_raw_data(request, name):
if request.method == 'PUT':
store_in_s3(name,request.raw_post_data)
return HttpResponse('Upload of raw data to S3 successful')
else:
return HttpResponse('Upload not successful')
My problem is how to tell my function the name of the image. In my urls.py I have the following but it won't work:
url(r'^uploadrawdata/(\d+)/', upload_raw_data ),
Now as far as I'm aware, d+ stands for digits, so it's obviously of no use here when I pass the name of a file. However, I was wondering if this is the correct way in the first place. I read this post here and it suggests the following line of code which I don't understand at all:
file_name = path.split("/")[-1:][0]
Also, I have no clue what the rest of the code is all about. I'm a bit new to all of this, so any suggestions of how to simply upload an image would be very welcome. Thanks!

This question is not really about uploading, and the linked answer is irrelevant. If you want to accept a string rather than digits in the URL, in order to pass a filename, you can just use w instead of d in the regex.
Edit to clarify Sorry, didn't realise you were trying to pass a whole file+extension. You probably want this:
r'^uploadrawdata/(.+)/$'
so that it matches any character. You should probably read an introduction to regular expressions, though.

Rendering requested type in Tornado

In Tornado, how do you tell apart the different request types? Also, what is the proper way to split out the requests? In the end if I go to /item/1.xml, I want xml, /item/1.html to be a proper html view etc.
Something like:
def getXML():
return self.render('somexmlresult.xml')
def getHTML():
return self.rendeR('htmlresult.html')
or
def get():
if request == 'xml':
return self.render('somexmlresult.xml')
elif request == 'html':
return self.render('htmlresult.html')
~ edit ~ I was shooting for something along the lines of rails' implementation seen here

I would prefer a self describing url like a RESTful application. An url part need not be required to represent the format of the resource. http://www.enterprise.com/customer/abc/order/123 must represent the resource irrespective of whether it is xml/html/json. A way to send the requested format is to send it as one of the request parameters.
http://www.enterprise.com/customer/abc/order/123?mimetype=application/xml
http://www.enterprise.com/customer/abc/order/123?mimetype=application/json
http://www.enterprise.com/customer/abc/order/123?mimetype=text/html
Use the request parameter to serialize to the appropriate format.

mimetype is the correct way to do this, however I can see where an end user would want a more simplistic way of accessing the data in the format they wish.
In order to maintain compatibility with standards compliant libraries etc you should ultimately determine the response type based on the mimetype requested and respond with the appropriate mimetype in the headers.
A way to achieve this while not breaking anything would be to add a parser that checks the URI that was requested for a suffix that matches a tuple of defined suffixes that the route can respond to, if it does and the mimetype is not already specified change the mimetype passed in to be the correct type for the suffix.
Make sure that the final decision is based on the supplied mimetype not the suffix.
This way others can interact with your RESTful service in the way their used to and you can still maintain ease of use for humans etc.
~ edit ~
Heres an example regexp that checks to see if it ends in .js | .html | .xml | .json. This is assuming your given the full URI.
(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*\.(?:js|html|xml|json))(?:\?([^#]*))?(?:#(.*))?
Here's an example that's easier to interpret but less robust
^https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?]+)+\.(?:js|html|xml|json)$
These regex's are taken from rfc2396

First, set up the handlers to count on a restful style URI. We use 2 chunks of regex looking for an ID and a potential request format (ie html, xml, json etc)
class TaskServer(tornado.web.Application):
def __init__(self, newHandlers = [], debug = None):
request_format = "(\.[a-zA-Z]+$)?"
baseHandlers = [
(r"/jobs" + request_format, JobsHandler),
(r"/jobs/", JobsHandler),
(r"/jobs/new" + request_format, NewJobsHandler),
(r"/jobs/([0-9]+)/edit" + request_format, EditJobsHandler)
]
for handler in newHandlers:
baseHandlers.append(handler)
tornado.web.Application.__init__(self, baseHandlers, debug = debug)
Now, in the handler define a reusable function parseRestArgs (I put mine in a BaseHandler but pasted it here for ease of understanding/to save space) that splits out ID's and request formats. Since you should be expecting id's in a particular order, I stick them in a list.
The get function can be abstracted more but it shows the basic idea of splitting out your logic into different request formats...
class JobsHandler(BaseHandler):
def parseRestArgs(self, args):
idList = []
extension = None
if len(args) and not args[0] is None:
for arg in range(len(args)):
match = re.match("[0-9]+", args[arg])
if match:
slave_id = int(match.groups()[0])
match = re.match("(\.[a-zA-Z]+$)", args[-1])
if match:
extension = match.groups()[0][1:]
return idList, extension
def get(self, *args):
### Read
job_id, extension = self.parseRestArgs(args)
if len(job_id):
if extension == None or "html":
#self.render(html) # Show with some ID voodoo
pass
elif extension == 'json':
#self.render(json) # Show with some ID voodoo
pass
else:
raise tornado.web.HTTPError(404) #We don't do that sort of thing here...
else:
if extension == None or "html":
pass
# self.render(html) # Index- No ID given, show an index
elif extension == "json":
pass
# self.render(json) # Index- No ID given, show an index
else:
raise tornado.web.HTTPError(404) #We don't do that sort of thing here...

Python test for a url and image type

In the following code how to test for if the type is url or if the type is an image
for dictionaries in d_dict:
type = dictionaries.get('type')
if (type starts with http or https):
logging.debug("type is url")
else if type ends with .jpg or .png or .gif
logging.debug("type is image")
else:
logging.debug("invalid type")

You cannot tell what type a resource is purely from its URL. It is perfectly valid to have an GIF file at a URL without a .gif file extension, or with a misleading file extension like .txt. In fact it is quite likely, now that URL-rewriting is popular, that you'll get image URLs with no file extension at all.
It is the Content-Type HTTP response header that governs what type a resource on the web is, so the only way you can find out for sure is to fetch the resource and see what response you get. You can do this by looking at the headers returned by urllib.urlopen(url).headers, but that actually fetches the file itself. For efficiency you may prefer to make HEAD request that doesn't transfer the whole file:
import urllib2
class HeadRequest(urllib2.Request):
def get_method(self):
return 'HEAD'
response= urllib2.urlopen(HeadRequest(url))
maintype= response.headers['Content-Type'].split(';')[0].lower()
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
logging.debug('invalid type')
If you must try to sniff type based on the file extension in a URL path part (eg because you don't have a net connection), you should parse the URL with urlparse first to remove any ?query or #fragment part, so that http://www.example.com/image.png?blah=blah&foo=.txt doesn't confuse it. Also you should consider using mimetypes to map the filename to a Content-Type, so you can take advantage of its knowledge of file extensions:
import urlparse, mimetypes
maintype= mimetypes.guess_type(urlparse.urlparse(url).path)[0]
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
logging.debug('invalid type')
(eg. so that alternative extensions are also allowed. You should at the very least allow .jpeg for image/jpeg files, as well as the mutant three-letter Windows variant .jpg.)

Use regular expressions.
import re
r_url = re.compile(r"^https?:")
r_image = re.compile(r".*\.(jpg|png|gif)$")
for dictionaries in d_dict:
type = dictionaries.get('type')
if r_url.match(type):
logging.debug("type is url")
else if r_image.match(type)
logging.debug("type is image")
else:
logging.debug("invalid type")
Two remarks: type is a builtin, and images could be loaded from an URL too.

I wrote based on previous comments a python script, which first checks per HEAD request for the content_type and if this fails for the mimetype.
Hope this helps.
import mimetypes
import urllib2
class HeadRequest(urllib2.Request):
def get_method(self):
return 'HEAD'
def get_contenttype(image_url):
try:
response= urllib2.urlopen(HeadRequest(image_url))
maintype= response.headers['Content-Type'].split(';')[0].lower()
return maintype
except urllib2.HTTPError as e:
print(e)
return None
def get_mimetype(image_url):
(mimetype, encoding) = mimetypes.guess_type(image_url)
return mimetype
def get_extension_from_type(type_string):
if type(type_string) == str or type(type_string) == unicode:
temp = type_string.split('/')
if len(temp) >= 2:
return temp[1]
elif len(temp) >= 1:
return temp[0]
else:
return None
def get_type(image_url):
valid_types = ('image/png', 'image/jpeg', 'image/gif', 'image/jpg')
content_type = get_contenttype(image_url)
if content_type in valid_types:
return get_extension_from_type(content_type)
mimetypes = get_mimetype(image_url)
if mimetypes in valid_types:
return get_extension_from_type(mimetypes)
return None

If you are going to guess the type of a resource from its URL, then I suggest you use the mimetypes library. Realize, however, that you can really only make an educated guess this way.
As bobince suggests, you could also make a HEAD request and use the Content-Type header. This, however, assumes that the server is configured (or, in the case of a web application, programmed) to return the correct Content-Type. It might not be.
So, the only way to really tell is to download the file and use something like libmagic (although it is conceivable even that could fail). If you decide this degree of accuracy is necessary, you might be interested in this python binding for libmagic.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.