Django - Uploaded file type validation - python

I need to validate the file type of the uploaded file and should allow only PDF, plain text and MS Word files. Here is my model and the form with the validation function. But I'm able to upload files even without an extension.
from django import forms
from django.db import models


class Section(models.Model):
    content = models.FileField(upload_to="documents")


class SectionForm(forms.ModelForm):
    # Allowed extensions: PDF, plain text and MS Word
    FILE_EXT_WHITELIST = ['pdf', 'txt', 'doc', 'docx']

    class Meta:
        model = Section
        fields = ['content']

    def clean_content(self):
        content = self.cleaned_data['content']
        if content:
            # Reject file names that have no extension at all
            if '.' not in content.name:
                raise forms.ValidationError("File type is not supported.")
            ext = content.name.rsplit('.', 1)[-1].lower()
            if ext not in self.FILE_EXT_WHITELIST:
                raise forms.ValidationError(
                    "Only .pdf, .txt, .doc and .docx files are allowed.")
        return content
Here is the view:
from django.core.files.base import ContentFile
from . import models  # the app's models module

def section_update(request, object_id):
    section = models.Section.objects.get(pk=object_id)
    if 'content' in request.FILES:
        if request.FILES['content'].name.split('.')[-1] == "pdf":
            content_file = ContentFile(request.FILES['content'].read())
            content_type = "pdf"
            section.content.save("test" + '.' + content_type, content_file)
            section.save()
In my view, I'm just saving the file from request.FILES. I thought save() would call clean_content and perform the content-type validation. I guess clean_content is not being called at all.

Your approach will not work: as an attacker, I could simply forge the HTTP header to send you anything with the MIME type text/plain.
The correct solution is to use a tool like file(1) on Unix to examine the content of the file to determine what it is. Note that there is no good way to know whether something is really plain text. If the file is saved in 16 bit Unicode, the "plain text" can even contain 0 bytes.
See this question for options on how to do this: How to find the mime type of a file in python?

You can use python-magic
import magic
magic.from_file('/my/file.jpg', mime=True)
# image/jpeg
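python-magic wraps libmagic, the library behind file(1). If libmagic is not available, a much cruder option is to compare the file's leading bytes against the known signatures of the allowed formats. The sketch below is stdlib-only and a simplification, not a replacement for libmagic: sniff_document_type is a hypothetical helper, the OLE2 signature also matches .xls/.ppt files, the ZIP signature matches any ZIP archive (not only .docx), and the plain-text test is only a heuristic.

```python
import codecs

# Leading bytes ("magic numbers") of the formats the question wants to allow.
PDF_MAGIC = b"%PDF-"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc, but also .xls/.ppt
ZIP_MAGIC = b"PK\x03\x04"                         # .docx is a ZIP container


def sniff_document_type(head):
    """Guess a document type from the first bytes of a file.

    Returns 'pdf', 'ole2', 'zip', 'text', or None if nothing matches.
    """
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        # Any ZIP archive matches; a stricter .docx check would open the
        # archive (zipfile module) and look for word/document.xml.
        return "zip"
    # Crude plain-text heuristic: no NUL bytes and valid UTF-8.
    # UTF-16 "plain text" contains NUL bytes and is therefore rejected here.
    if b"\x00" in head:
        return None
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of head
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return None
    return "text"
```

In a Django clean_content() you could read the first couple of kilobytes with content.read(2048), call content.seek(0) so the file can still be saved afterwards, and raise ValidationError when the result is not one of the accepted types.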

This is an old question, but for later readers: the main question, as mentioned in the comments, is why the field validation does not happen. As described in the Django documentation, field validation is executed when you call is_valid(). So you must use something like the following in the view to trigger field validation:
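section = models.Section.objects.get(pk=object_id)
if request.method == 'POST':
    # Bind the submitted data; instance=section makes form.save() update this object
    form = SectionForm(request.POST, request.FILES, instance=section)
    # is_valid() must be *called*; calling it is what runs clean_content()
    if form.is_valid():
        form.save()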
Form validation happens when the data is cleaned. If you want to customize this process, there are various places to make changes, each one serving a different purpose. Three types of cleaning methods are run during form processing. These are normally executed when you call the is_valid() method on a form.

Related

django-post_office - Using file-based instead of database-based email templates

I'm using django-post_office to send emails to users. Post office uses templates stored in the database in the EmailTemplate model. I'd prefer to use file-based templates so I can keep them under version control.
An Email is created and the template is rendered in mail.py:
def create(sender, recipients=None, cc=None, bcc=None, subject='', message='',
           html_message='', context=None, scheduled_time=None, headers=None,
           template=None, priority=None, render_on_delivery=False,
           commit=True):
    ...
    if template:
        subject = template.subject
        message = template.content
        html_message = template.html_content

    if context:
        _context = Context(context)
        subject = Template(subject).render(_context)
        message = Template(message).render(_context)
        html_message = Template(html_message).render(_context)
Can anyone suggest a nice way to override this behavior? I'd like to be able to pass a string with the template location and render it with some context variables, but any input would be appreciated.
One option is to render the email template yourself, along with any extra context variables, before calling mail.send().
from django.template import loader

template = loader.get_template(template_name)
# Templates returned by get_template() take a plain dict as context, not a Context
html_message = template.render(extra_context)
Then instead of passing a template:
mail.send(template=template)
pass the rendered HTML via the html_message argument:
mail.send(html_message=html_message)
I don't think it is that simple: looking at the code that creates and sends an email, mail.send(...) internally calls mail.create(...).
Now here is that code:
def send(recipients=None, sender=None, template=None, context=None, subject='',
         message='', html_message='', scheduled_time=None, headers=None,
         priority=None, attachments=None, render_on_delivery=False,
         log_level=None, commit=True, cc=None, bcc=None):
    #-------------
    #code sections
    #-------------
    if template:
        if subject:
            raise ValueError('You can\'t specify both "template" and "subject" arguments')
        if message:
            raise ValueError('You can\'t specify both "template" and "message" arguments')
        if html_message:
            raise ValueError('You can\'t specify both "template" and "html_message" arguments')

        # template can be an EmailTemplate instance or name
        if isinstance(template, EmailTemplate):
            template = template
        else:
            template = get_email_template(template)

    email = create(sender, recipients, cc, bcc, subject, message, html_message,
                   context, scheduled_time, headers, template, priority,
                   render_on_delivery, commit=commit)
The interesting piece of code is:
# template can be an EmailTemplate instance or name
if isinstance(template, EmailTemplate):
    template = template
else:
    template = get_email_template(template)
get_email_template fetches the EmailTemplate by its name, storing it in the cache the first time it is read.
So even if you pass a template name instead of an EmailTemplate instance (the model object), there are just two cases:
Either the template is read from the cache (which is filled from the database on first access, since templates rarely change),
or it is read directly from the database.
Either way the template comes from the database; there is no interface for loading it from a file.
I would suggest something like this:
Create a JSON file that stores your template in a format suitable for creating an EmailTemplate object. These are the fields of the EmailTemplate model class:
name = models.CharField(max_length=255, help_text=("e.g: 'welcome_email'"))
description = models.TextField(blank=True, help_text='Description of this template.')
subject = models.CharField(max_length=255, blank=True, validators=[validate_template_syntax])
content = models.TextField(blank=True, validators=[validate_template_syntax])
html_content = models.TextField(blank=True, validators=[validate_template_syntax])
created = models.DateTimeField(auto_now_add=True)
last_updated = models.DateTimeField(auto_now=True)
Read that JSON file in Python and convert it into a Python object with json.load (or json.loads on the string you read).
Create an EmailTemplate object from the values you need and pass it as the template argument of mail.send(...).
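The JSON-loading step might look like the sketch below. load_email_template_fields and the file layout are hypothetical; the field names are the ones from the EmailTemplate model quoted above (created and last_updated are filled in automatically). As a variant of the approach above, the commented usage syncs the file into the database with Django's standard QuerySet.update_or_create() instead of building an unsaved instance, so mail.send() receives an ordinary saved EmailTemplate row; that choice is an assumption, not something post_office prescribes.

```python
import json

# EmailTemplate fields read from the JSON file (created/last_updated are automatic)
TEMPLATE_FIELDS = ("name", "description", "subject", "content", "html_content")


def load_email_template_fields(path):
    """Read a version-controlled JSON template file.

    Returns (name, defaults), suitable for
    EmailTemplate.objects.update_or_create(name=name, defaults=defaults).
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    unknown = set(data) - set(TEMPLATE_FIELDS)
    if unknown:
        raise ValueError("unexpected keys: %s" % ", ".join(sorted(unknown)))
    if "name" not in data:
        raise ValueError("template file must define 'name'")
    # Missing optional fields default to "" (the model fields are blank=True)
    defaults = {k: data.get(k, "") for k in TEMPLATE_FIELDS if k != "name"}
    return data["name"], defaults


# Hypothetical usage inside a Django project with post_office installed:
# from post_office import mail
# from post_office.models import EmailTemplate
# name, defaults = load_email_template_fields("emails/welcome_email.json")
# template, _ = EmailTemplate.objects.update_or_create(name=name, defaults=defaults)
# mail.send(["user@example.com"], "from@example.com", template=template,
#           context={"name": "Alice"})
```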
Update:
If you use html_message, you still have to maintain a separate subject parameter, which you can avoid by using a template. With a JSON file under version control, the name, subject, content and html_content are all stored in one file, from which you can recreate the template object without having to keep separate values in separate files.

Django - Problems with PDF upload

I have another problem with Django. I want to upload a PDF through a form in my template, but when I click upload, this happens:
Cannot assign "<InMemoryUploadedFile: thebook.pdf (application/pdf)>": "Product.book" must be a "File" instance.
This is the line in my model
book = FilerFileField(null=True, blank=True)
This is the line in my form
book = forms.FileField(label=u"Book Upload")
Django's forms.FileField expects an UploadedFile, whereas FilerFileField is actually a subclass of django.db.models.ForeignKey. Therefore you should use a ModelChoiceField in your form:
import filer.models

book = forms.ModelChoiceField(queryset=filer.models.File.objects.all())
See also django-filer's usage notes and django's docs on the ModelChoiceField:
http://django-filer.readthedocs.org/en/latest/usage.html
https://docs.djangoproject.com/en/dev/ref/forms/fields/#modelchoicefield

flask-wtf editing a model using wtform Form constructor: pre-filling the form

I am reading the Flask Web Development book and came across this:
def edit_profile():
    form = EditProfileForm()
    if form.validate_on_submit():
        current_user.name = form.name.data
        current_user.location = form.location.data
        current_user.about_me = form.about_me.data
        # current_user is a proxy; add the underlying user object to the session
        db.session.add(current_user._get_current_object())
        flash('Your profile has been updated.')
        return redirect(url_for('.user', username=current_user.username))
    form.name.data = current_user.name
    form.location.data = current_user.location
    form.about_me.data = current_user.about_me
    return render_template('edit_profile.html', form=form)
Basically, when the form is not posted or doesn't validate, this copies over the data from the current user. Reading up on WTForms, I found this about the __init__ method of a form:
obj – If formdata is empty or not provided, this object is checked for attributes
matching form field names, which will be used for field values.
So I guess that means that we could write this (the sample below is my own):
def edit_post(post_id):
    post = Post.query.get_or_404(post_id)
    if current_user != post.author:
        abort(403)
    # Below is the line I am concerned about
    form = PostForm(formdata=request.form, obj=post)
    if form.validate_on_submit():
        form.populate_obj(post)
        db.session.commit()
        return redirect(url_for('user', username=current_user.username))
    return render_template('post_form.html', form=form)
I figure that this should fill the form instance from the database model on GET, and from the POST data after a post. Testing this, it seems to work.
Now my question: is this way of writing an edit view correct? Or should I copy everything over field by field, like in the book?
Loading in a POST MultiDict is certainly the accepted way to map key/value pairs to your WTForms instance. Even more so, if you are using the Flask-WTF extension, this is automatically done for you, it is one of the perks that this extension brings you.
If you look at the Flask-WTF source, you'll see that its form class inherits from WTForms' SecureForm and by default tries to load the Werkzeug POST MultiDict (passed as formdata) when one is present. So creating the form in your view like this:
form = PostForm(obj=post)
should be sufficient (when using Flask-WTF) to also fill the fields with the POSTed data.
The way it is done in your book example is certainly not wrong, but it creates a lot of unnecessary code and is redundant and error-prone: it is easy to forget to copy over, in the view, a field that is declared in the WTForms form.
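To make the precedence rule from the quoted WTForms docs concrete, here is a toy model in plain Python. It is not WTForms' or Flask-WTF's actual code: resolve_field_values and the standalone populate_obj are hypothetical simplifications (in WTForms, populate_obj is a form method). A submitted value wins; otherwise the field falls back to the matching attribute of obj.

```python
def resolve_field_values(field_names, formdata=None, obj=None):
    """Toy model: submitted data wins, otherwise fall back to obj's attribute."""
    values = {}
    for name in field_names:
        if formdata and name in formdata:
            values[name] = formdata[name]          # POSTed value
        elif obj is not None and hasattr(obj, name):
            values[name] = getattr(obj, name)      # value from the database model
        else:
            values[name] = None                    # nothing to pre-fill
    return values


def populate_obj(obj, values):
    """Toy version of form.populate_obj(): copy every field value onto obj."""
    for name, value in values.items():
        setattr(obj, name, value)
```

On GET there is no form data, so every field comes from post; after a POST the submitted values are used, and populate_obj() writes them back onto the model in one call instead of field by field.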

File upload not working after changing models format

I have a model class similar to the following:
class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%M/%D')
Everything is working fine and files are uploaded successfully based on directory structure.
Now I don't want the files uploaded into this directory structure; I simply want all of them in one folder, so I changed the model:
class Document(models.Model):
    docfile = models.FileField(upload_to='documents')
Now it is not uploading the files and throws an error. Maybe I need to run some command, but I don't know which one.
Please suggest something.
Edit1:
OK, I found that the actual problem lies somewhere else.
I have a view like this (please ignore the bad spacing; it is fine in the actual code):
def lists(request):
    # Problematic code start
    path = settings.MEDIA_URL + 'upload/location.txt'
    f = open(path, 'w')
    myfile = File(f)
    myfile.write('Hello World')
    myfile.closed
    f.closed
    # Problematic code end

    # Handle file upload
    if request.method == 'POST':
        form = DocumentForm(request.POST, request.FILES)
        if form.is_valid():
            filename = Document(docfile=request.FILES['docfile'])
            filename.save()

            # Redirect to the document list after POST
            return HttpResponseRedirect(reverse('sdm:lists'))
            #return render_to_response(reverse('sdm:lists'))
    else:
        form = DocumentForm()  # An empty, unbound form

    # Load documents for the list page
    documents = Document.objects.all()

    # Render list page with the documents and the form
    return render_to_response(
        'sdm/lists.html',
        {'documents': documents, 'form': form},
        context_instance=RequestContext(request)
    )
When I remove the problematic code, everything works fine. (Ignore the purpose of this odd code; what I'm actually after is something bigger.)
MEDIA_URL=/media/
Here is the error:
IOError at /sdm/lists
[Errno 2] No such file or directory: '/media/upload/location.txt'
although the file exists and all permissions are www-data:www-data with 755.
"problematic" code indeed - whoever wrote this should find another job. This code is wrong in more than one way (using MEDIA_URL instead of MEDIA_ROOT - which is the cause of the IOError you get - and also badly misusing Python's dead simple file objects) and totally useless, and looks like a leftover of someone programming by accident. To make a long story short : just remove it and you'll be fine.

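If you do need to write such a file, a minimal sketch of the corrected version follows. write_location_file is a hypothetical helper; in Django you would pass settings.MEDIA_ROOT (a filesystem path) as media_root, not settings.MEDIA_URL (a URL prefix such as /media/, which is why open() fails). A with block closes the file, whereas the original only read the .closed attribute.

```python
import os


def write_location_file(media_root, text="Hello World"):
    """Write text to <media_root>/upload/location.txt and return the path.

    media_root stands in for settings.MEDIA_ROOT, a directory on disk.
    """
    upload_dir = os.path.join(media_root, "upload")
    os.makedirs(upload_dir, exist_ok=True)   # create upload/ if it is missing
    path = os.path.join(upload_dir, "location.txt")
    with open(path, "w") as f:               # closed automatically on exit
        f.write(text)
    return path
```
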
GAE Blobstore: upload blob along with other text fields

I have a form that includes, among its text fields, an element to upload a picture.
I want to store the blob in the blobstore and reference it in my model (ndb.Model) using ndb.BlobKeyProperty().
The method shown in this link uses an upload handler (UploadHandler) which is called from the link created in this way:
upload_url = blobstore.create_upload_url('/upload')
upload_url is the form action in the page created to upload the blob. However, my form includes other fields that are not processed in the UploadHandler post method.
The temporary solution I found was to create a handler for my form that inherits from my BaseHandler AND from BlobstoreUploadHandler:
class EditProfile(blobstore_handlers.BlobstoreUploadHandler, BaseHandler):
    def get(self):
        params['upload_url'] = blobstore.create_upload_url('/upload_blob1')
        ... fields ...

    def post(self):
        upload_blob = self.get_uploads()
        blob_key = upload_blob[0].key()
        value_field1 = self.request.POST.get('field1')
        value_field2 = self.request.POST.get('field2')
        value_field3 = self.request.POST.get('field3')
        ...
This method works, except that I have to define a new handler in main.py for each page that has a blob to be uploaded:
app = webapp2.WSGIApplication([
    ('/upload_blob1', handlers.EditProfile),
    ('/upload_blob2', handlers.EditBlob2Handler),
    ('/serve/([^/]+)?', handlers.ServeHandler),
], debug=os.environ['SERVER_SOFTWARE'].startswith('Dev'), config=webapp2_config)
Question: how can I use one single upload handler (for instance: UploadHandler) that is called from different pages to perform the blob upload? I know this might be very simple for an experienced GAE programmer, but I haven't found a solution.
Short answer: yes, you can.
The handler is just the code that parses your upload form and then performs actions based on that information.
Technically, you can use one handler even for different upload forms, but it really depends on how different the forms are and whether or not you wish to split up the code.
For example, if your form1 uploads "First name", "Last name" and "Favorite color" and your form2 uploads "First name", "Last name" and "Favorite number", your /upload handler can simply read the parameters and process them accordingly:
if self.request.POST.get('Favorite_Number') is not None:
    ...  # handle the "favorite number" form
elif self.request.POST.get('Favorite_Color') is not None:
    ...  # handle the "favorite color" form
It's just a matter of design. As for whether you CAN use one handler: yes. However, it's recommended to use separate handlers if the forms are functionally different.
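One way to keep a single handler manageable is a small dispatch table that maps each distinguishing field to a processing function. The sketch below is plain Python and framework-independent; dispatch_upload, the processor functions and the field names are hypothetical, and the processors only return dicts where real code would, for example, store an ndb entity with a BlobKeyProperty.

```python
def process_number_form(params, blob_key):
    # Placeholder for form2's logic
    return {"kind": "number", "value": params.get("Favorite_Number"), "blob_key": blob_key}


def process_color_form(params, blob_key):
    # Placeholder for form1's logic
    return {"kind": "color", "value": params.get("Favorite_Color"), "blob_key": blob_key}


# Field that identifies each form -> function that processes it
FORM_PROCESSORS = [
    ("Favorite_Number", process_number_form),
    ("Favorite_Color", process_color_form),
]


def dispatch_upload(params, blob_key):
    """Pick a processor based on which distinguishing field was submitted."""
    for field, processor in FORM_PROCESSORS:
        if params.get(field) is not None:
            return processor(params, blob_key)
    raise ValueError("Unrecognised upload form")
```

Assuming a single BlobstoreUploadHandler subclass registered at /upload, its post() could call dispatch_upload(self.request.POST, self.get_uploads()[0].key()), and every page would build its form action with blobstore.create_upload_url('/upload'). A hidden form field naming the form is an alternative to detecting it by field presence.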
