Django / Python: Convert Markdown to Secure HTML

I'm accepting Markdown and need to convert it to HTML to render securely in Django. Right now I'm accepting the form.cleaned_data and converting it to HTML with:
import markdown
html_body = markdown.markdown(body_markdown, safe_mode=True)
html_body = html_body.replace('[HTML_REMOVED]', '')
return html_body
In the template, I'm rendering it as:
{{ object.content|safe|capfirst }}
However if you post:
0;url=javascript:alert('hi');" http-equiv="refresh
The JS will render so XSS is possible.

Django's built-in safe template filter marks the variable as OK to output unescaped, i.e. you are asserting that its contents are safe:
safe: Marks a string as not requiring further HTML escaping prior to output.
By default, Django escapes your template variables:
By default in Django, every template automatically escapes the output of every variable tag. Specifically, these five characters are escaped ...
Escaping won't strip the JavaScript away for you (it only renders it unusable); to remove it you need to do that manually with a template tag:
Strip javascript code before rendering in django templates
On the other hand, safe_mode in markdown replaces any raw HTML in the text with [HTML_REMOVED], as you've seen.
So removing the safe filter should be enough to make the output safe.
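To see what escaping does to the payload from the question, here is a minimal sketch using only the standard library. html.escape is a stand-in for Django's autoescaping (an assumption; Django's exact encoding of the apostrophe can differ between versions), and the names payload and escaped are just for illustration.

```python
import html

# Payload from the question: it tries to break out of an HTML attribute.
payload = '0;url=javascript:alert(\'hi\');" http-equiv="refresh'

# html.escape stands in for Django's autoescaping here (an assumption).
# With quote=True it escapes the five characters &, <, >, " and '.
escaped = html.escape(payload, quote=True)
print(escaped)
```

With the quotes encoded, the payload can no longer close the attribute it was injected into. Keep in mind that escaping applies to the whole string, so any HTML that Markdown generated would be escaped as well.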

Related

How to include context variable in a wagtail cms field?

I am looking for a way to render a variable that will be available in the context of the page where the CMS page will be rendered.
Ex:
I have in the context the logged in user and I also have the last transaction he made on the website.
I would like the text in the rich text field in Wagtail to be like this so that the marketing team can tweak the copy.
Hello ||firstname|| thanks for your purchase. ||productname|| will be
shipped to you soon. The expected delivery date is
||expected_delivery_date||
To make this less confusing, I replaced the double braces with double pipes to show that the templating system does not need to be Django templates for these. Simple templating may be enough, perhaps using https://docs.python.org/3.4/library/string.html#template-strings
I think I can achieve this by doing:
A stream field that would have blocks of rich text field and a custom block with the possible context variable they can use
A custom render function that would regex and replace the merge tags in the rich text block with the context values
Create a new filter for simple templating. ex: {{ page.body|richtext|simpletemplate }}
Is there any more obvious way or out of the box way to do templating from within a rich text field?
It would be clunky with a separate streamfield block for each inserted context variable. You'd have to override the default rendering which wraps elements in div tags. However I like that it is more foolproof for the editors.
I've done something like the custom rendering before, but with simple TextFields for formatting special offer code messages. Wagtail editors were given the following help_text to illustrate:
valid_placeholders = ['offer_code', 'month_price']
template_text = models.TextField(
    _('text'),
    help_text="Valid placeholder values are: {all_valid}. Write as {{{example}}}".format(
        all_valid=", ".join(valid_placeholders),
        example=valid_placeholders[0],
    )
)
This rendered as Valid placeholder values are: offer_code, month_price. Write as {offer_code}.
Then in the view:
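from string import Formatter
# parse() yields (literal_text, field_name, format_spec, conversion); field_name is None for plain text
template_keys = [i[1] for i in Formatter().parse(template_text) if i[1] is not None]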
…and continued rendering from there. Remember to validate the field appropriately using the above Formatter().parse() function too.
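The validation step mentioned above could look roughly like this. It is only a sketch: find_invalid_placeholders and VALID_PLACEHOLDERS are hypothetical names, and how you hook it into the field's validation is up to you.

```python
from string import Formatter

# Hypothetical allowlist, mirroring valid_placeholders from the model above.
VALID_PLACEHOLDERS = ["offer_code", "month_price"]


def find_invalid_placeholders(template_text, valid=VALID_PLACEHOLDERS):
    """Return the placeholder names in template_text that are not allowed."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for chunks that are plain text only.
    used = [
        field
        for _, field, _, _ in Formatter().parse(template_text)
        if field is not None
    ]
    return [name for name in used if name not in valid]


print(find_invalid_placeholders("Use {offer_code} for {month_price}"))
print(find_invalid_placeholders("Hi {user.password}"))
```

An empty list means every placeholder the editor typed is on the allowlist. Anything else, including attribute lookups such as user.password, is reported so the form can reject it.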
I used Django's template formatting rather than Python's string.format() because it fails silently, but you could go with string.format() if the input is cleaned adequately.
The custom template filter would feel easiest to me, so I'd start with that approach and switch to a custom render function if I ran into hurdles.
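For the ||name|| style of merge tag from the question, the core of such a filter might be sketched like this, using only the standard library. render_merge_tags and MERGE_TAG are hypothetical names. Leaving unknown tags untouched and HTML-escaping the values are design choices I'm assuming here, not anything Wagtail does for you.

```python
import html
import re

# Matches ||name|| where name is made of letters, digits, or underscores.
MERGE_TAG = re.compile(r"\|\|(\w+)\|\|")


def render_merge_tags(text, context):
    """Replace ||name|| tags in text with HTML-escaped values from context."""

    def substitute(match):
        name = match.group(1)
        if name not in context:
            # Unknown tag: leave it as written so editors can spot the typo.
            return match.group(0)
        return html.escape(str(context[name]))

    return MERGE_TAG.sub(substitute, text)


print(render_merge_tags(
    "Hello ||firstname||, ||productname|| will be shipped soon.",
    {"firstname": "Ann", "productname": "Kettle"},
))
```

Wrapping this function in a registered template filter would give you the {{ page.body|richtext|simpletemplate }} usage from the question.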
I found an easier way to do this. I wanted my editors to be able to create pages with dynamic customization to the individual user. With this, my editors are actually able to put template variables into any type of content block as {{ var }} which works just like the Django templating language. For my use case, I am allowing my editors to create email content in the CMS, then pulling that to send the emails:
This is the function to call:
def re_render_html_template(email_body, context):
    """
    Take already-rendered HTML and re-render it as a template.

    This is necessary because variables added via the CMS are not caught by the
    first rendering: the first rendering renders the containing block, so the
    variables come through as plain text in the content, e.g., {{ var }}.

    Example:
        input: <p>Hey {{ user_account.first_name }}, welcome!</p>
        output: <p>Hey Brett, welcome!</p>

    :param email_body: html string
    :type email_body: str
    :param context: context dictionary
    :type context: dict
    :return: html string
    :rtype: str
    """
    from django.template import Context
    from django.template import Template

    template = Template(email_body)
    context = Context(context)
    email_body = template.render(context)
    return email_body
Then I call it like so:
email_body = render_to_string(template, context)
# need to re-render to substitute tags added via CMS
email_body = re_render_html_template(email_body, context)

Html string from a variable not rendered using mako in python

When I try to render a string variable in mako template like:
${ variable_name }
Because the variable contains HTML content, it is not rendered properly. Instead of displaying the HTML, the output just shows the source code, like:
<div>...<p>..</p>...</div>
But HTML written directly in the Mako template renders correctly. I mean:
var = <p>Not Rendering HTML</p>
Line 1: <p>Testing line</p>
Line 2: ${var}
Line1 renders as: Testing line
But line 2 renders as: <p>Not Rendering HTML</p>
What should I do...?
Try outputting your variable with the n filter, like so:
${var | n}
This should disable all default filtering. You can read more about filtering here: http://docs.makotemplates.org/en/latest/filtering.html
You may also want to look at this question and its answers: Mark string as safe in Mako

HTMLParser unescape does not pass < or > unescaped in Pyramid app

The title says most of it. Python3.3 using a Pyramid app framework (-s starter)
Adding this code to the views.py controller:
from html.parser import HTMLParser
h = HTMLParser()
string = '<p>Hi there!</p>';
return dict( string=h.unescape(string) )
where return dict(..) is handed off to a template with a simple ${string} marker in it, the result in the web browser is always to show the angle brackets instead of rendering them as tags. I.e, the web page shows: <p>Hi there!</p>
I need to be able to pass user content (html with markup) through to the template for it to render inline. What more do I need to do?
The templating engine is escaping the variable, because just about every templating engine does that.
You need to structure your templates to not escape variables. How you do that differs between Mako, Chameleon, etc.
IIRC, the starter scaffold uses Chameleon's .pt templates.
If that's what you're using, this other Stack Overflow question answers yours fully: Python Pyramid & Chameleon templating language escapes html
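To illustrate why unescaping in the view makes no difference, here is a small standard-library sketch. html.escape stands in for the template engine's automatic escaping (an assumption, not Chameleon's actual code), and html.unescape (Python 3.4+) plays the role of HTMLParser().unescape. The variable names are illustrative.

```python
import html

# What the view starts with: markup stored as entities.
stored = "&lt;p&gt;Hi there!&lt;/p&gt;"

# Step 1: the view unescapes it into real markup.
unescaped = html.unescape(stored)

# Step 2: the template engine escapes the value again on output
# (html.escape is only a stand-in for that behaviour).
rendered = html.escape(unescaped)

print(unescaped)
print(rendered)
```

The second step undoes the first, so the browser still receives entities and shows the angle brackets as text. The fix has to happen in the template, by telling the engine not to escape that value.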

rendering of textfield and charfield chomps out extra whitespace (Django/Python)

I've noticed that my template is rendering my model.CharField and model.TextField without any excess whitespace.
For example, if I enter data such as...
This is a test
to see what happens.
The rendered object field will appear as...
This is a test to see what happens.
Is this an intentional feature of Django or have I missed some filter or parameter somewhere?
I've checked the field itself with some debug code (print object.field) and it does contains the extra whitespace, so the problem is in the rendering side.
How can I allow the user to enter paragraphs of data in TextFields? How can I preserve the whitespace that the user may have entered?
As you can see, even on Stack Overflow your spaces do not display. This is from the source of your question:
This is a test
to see what happens.
Will save in the database as:
This is a test\n\n\nto see what happens.
You have two problems when rendering as HTML:
Extra spaces between words are stripped on display by the browser, unless it is between <pre></pre> tags
Linebreaks will be rendered as plain text linebreaks, which do not display in the browser unless between <pre></pre> tags.
For spaces, you can use a custom template filter to replace them with their HTML entity equivalent: &nbsp;.
To convert database linebreaks into HTML linebreaks, use the built-in linebreaksbr filter. For example, if {{ foo }} is: test\nbar, then {{ foo|linebreaksbr }} will render: test<br />bar
Create a "templatetags" folder in some of your apps with an __init__.py file in it.
Save the snippet for example in someapp/templatetags/replace_tag.py
Load the template filter in the template as such {% load replace_tag %}
Combine replace and linebreaksbr as such: {{ foo|linebreaksbr|replace:" ","&nbsp;" }}
You can also make your own template filter that will process the text into the HTML you need. In any case, refer to the custom template filter documentation for complete information.
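As a rough idea of what such a filter's body could do, here is a plain-Python sketch. preserve_whitespace is a hypothetical name, and registering it as a Django filter and marking the result safe is left out. Unlike the replace snippet above, it only converts doubled spaces, so single spaces can still wrap normally; that is my own design choice.

```python
import html


def preserve_whitespace(text):
    """Escape text, then keep extra spaces and line breaks visible in HTML."""
    # Escape first, so user input cannot inject markup.
    escaped = html.escape(text)
    # Each pair of spaces becomes a normal space plus a non-breaking space,
    # which stops the browser from collapsing the run.
    escaped = escaped.replace("  ", " &nbsp;")
    # Newlines become explicit HTML line breaks.
    return escaped.replace("\n", "<br />")


print(preserve_whitespace("This  is a test\n\nto see <b>what</b> happens."))
```

Escaping before inserting the tags matters: doing it the other way round would escape the br tags too.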

Markdown in Django XSS safe

I am using Markdown in an app to display a user biography. I want the user to be able to slightly format the biography, so I'm letting them use the TinyMCE editor.
Then, displaying it in the Django Template like this
{% load markup %}
<div id="biography">
{{ biography|markdown }}
</div>
The problem is, if there is a tag in the biography, it is not being escaped the way Django escapes everything else. This is the source output from a biography test:
<p><strong>asdfsdafsadf</strong></p>
<p><strong>sd<em>fdfdsfsd</em></strong><em>sdfsdfsdfdsf</em>sdfsdfsdf</p>
<p><strong>sdafasdfasdf</strong></p>
<script>document.location='http://test.com'</script>
How do I set Markdown to escape these malicious scripts?
According to django.contrib.markup.templatetags.markup.markdown's docstrings:
To enable safe mode, which strips raw HTML and only returns HTML
generated by actual Markdown syntax, pass "safe" as the first
extension in the list.
This should work:
{{ biography|markdown:"safe" }}
Markdown in safe mode would remove all html tags, which means your users cannot input HTML segments in the biography. In some cases, this is not preferable. I would recommend you use force_escape before markdown, so anything fed into markdown is safe.
For example, if your biography is <html>I'm really a HTML fan!</html>, using
{{ biography|markdown:"safe"}}
would produce [HTML_REMOVED]. Instead, if you use
{{ biography|force_escape|markdown }}
The output would be something like
<p>&lt;html&gt;I'm really a HTML fan!&lt;/html&gt;</p>
