Flask URL parameters with spaces generate URLs with spaces

I'm trying to pass a user-supplied string as a Flask URL parameter.
url_for(func_name, param="string with spaces") or similar generates a URL with spaces.
If the user enters a string with spaces, the generated URL has spaces, and it seems to work.
Also, if I enter a URL with %20, it seems to redirect to a URL with spaces.
I thought URLs with spaces were a bad idea.
How do I get it to work right (url_for and redirection)?
Or should I just accept it?
P.S.
Is passing a user-supplied string as a parameter safe? If not, how should I sanitize the user input string?

No, Flask generates properly URL-encoded URLs; demoing with an existing application:
>>> with app.test_request_context('/'):
...     print(url_for('core.city', city='new york'))
...
/new%20york
Your browser on the other hand may elect to show such URLs decoded for ease of reading.
url_for() quotes input to be URL-safe; encoded URLs cannot contain values that could be interpreted as HTML, so you are safe there as far as user-supplied values are concerned.
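As a rough illustration of the percent-encoding described above, here is a minimal sketch using only the standard library's urllib.parse.quote. It is not Flask's own implementation; the two input strings are made-up examples of user-supplied values.

```python
from urllib.parse import quote

# Hypothetical user-supplied values (assumptions for illustration only).
city = "new york"
attack = "<script>alert(1)</script>"

# safe="" means every reserved character is percent-encoded.
encoded_city = quote(city, safe="")
encoded_attack = quote(attack, safe="")

print("/" + encoded_city)  # /new%20york
print(encoded_attack)      # the angle brackets are encoded, so no raw HTML survives
```

The encoded value contains no literal < or >, which is why a percent-encoded URL segment cannot be read as markup.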

Related

How can I escape certain characters while using python's urllib.urlencode()?

I have a dictionary that I want to urlencode as query parameters.
The server that I am hitting expects the query to look like this: http://www.example.com?A=B,C
But when I try to use urllib.urlencode to build the URL, I find that the comma gets turned into %2C:
>>> import urllib
>>> urllib.urlencode({"A":"B,C"})
'A=B%2CC'
Is there any way I can escape the comma so that urlencode treats it like a normal character?
If not, how can I work around this problem?

You can keep certain characters unescaped by listing them explicitly in the safe argument:
urllib.quote(str, safe='~()*!.\'')
More: https://docs.python.org/3.0/library/urllib.parse.html#urllib.parse.quote
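For Python 3, where quote lives in urllib.parse, the effect of the safe argument on the comma from the question can be sketched like this (the variable names are only for illustration):

```python
from urllib.parse import quote

value = "B,C"

# By default only "/" is treated as safe, so the comma is percent-encoded.
default_quoted = quote(value)

# Listing "," in safe leaves it untouched.
comma_kept = quote(value, safe=",")

print(default_quoted)  # B%2CC
print(comma_kept)      # B,C
```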

You can do this by adding the query params as a string before hitting the endpoint.
I used requests to make the request.
For example:
GET Request
import requests
url = "https://www.example.com/?"
query = "A=B,C"
url_final = url + query
response = requests.get(url_final)
print(response.url)
# https://www.example.com/?A=B,C
The comma (along with some other characters) is defined in RFC 3986 as a reserved character. This means the comma may have a defined meaning in various parts of a URL, and if it is not being used in that role it needs to be percent-encoded.
However, the query component doesn't give the comma any special syntax, so in query parameters we probably shouldn't be encoding it. It's not entirely Requests' fault, though: the parameters are encoded using urllib.urlencode(), which is what percent-encodes the comma.
This isn't easy to fix, because some web services use , and some use %2C, and neither is wrong. You might just have to handle this encoding yourself.
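One way to handle the encoding yourself, sketched here for Python 3, is to build the query string with urllib.parse.urlencode and its safe parameter, then append it to the base URL. The second parameter ("q") is a made-up addition to show that other values are still encoded.

```python
from urllib.parse import urlencode

base_url = "https://www.example.com/"

# "q" is a hypothetical extra parameter, not part of the original question.
params = {"A": "B,C", "q": "new york"}

# safe="," keeps commas literal; everything else is encoded as usual.
query = urlencode(params, safe=",")
url_final = base_url + "?" + query

print(url_final)  # https://www.example.com/?A=B,C&q=new+york
```

The space becomes + because urlencode quotes with quote_plus by default.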

How to use extended ascii with bs4 url

I've been reluctant to post a question about this, but after 3 days of Googling I can't get this to work. Long story short, I'm making a raid gear tracker for WoW.
I'm using BS4 to handle the web scraping, and I'm able to pull the page and scrape the info I need from it. The problem I'm having is when there is an extended ASCII character in the player's name, e.g. thermíte (the í is Alt+161).
http://us.battle.net/wow/en/character/garrosh/thermíte/advanced
I'm trying to figure out how to re-encode the url so it is more like this:
http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced
I'm using tkinter for the GUI. The user selects their realm from a dropdown and then types the character name into an entry field.
namefield = Entry(window, textvariable=toonname)
I have a scraping function that performs the initial scrape of the main profile page. This is where I assign the value of namefield to a global variable. (I also tried passing it directly to the scraper with this:)
namefield = Entry(window, textvariable=toonname, command=firstscrape)
I thought I was close, because when it passed "thermíte", the scrape function would print out "therm\xC3\xADte", so all I needed to do was replace the '\x' with '%' and I'd be golden. But it wouldn't work. I could use mastername.find('\x') and it would find instances of it in the string, but mastername.replace('\x','%') wouldn't actually replace anything.
I tried various combinations of r'\x', '\%', r'\x', etc. No dice.
Lastly, when I try things like encoding into Latin-1 and then decoding back into UTF-8, I get errors about how it can't handle the extended ASCII character.
urlpart1 = "http://us.battle.net/wow/en/character/garrosh/"
urlpart2 = mastername
urlpart3 = "/advanced"
url = urlpart1 + urlpart2 + urlpart3
That's what I've been using to try to rebuild the final URL (at the moment I'm leaving the realm constant until I can get the name problem fixed).
TL;DR:
I'm trying to take a URL with extended ASCII like:
http://us.battle.net/wow/en/character/garrosh/thermíte/advanced
and have it become a URL that a browser can easily process, like:
http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced
with all of the normal extended ASCII characters.
I hope this made sense.
Here is a pastebin for the full script at the moment. There are some things in it that aren't utilized until later on. pastebin link

There shouldn't be non-ASCII characters in the resulting URL. Make sure mastername is a Unicode string (isinstance(mastername, str) on Python 3):
#!/usr/bin/env python3
from urllib.parse import quote
mastername = "thermíte"
assert isinstance(mastername, str)
url = "http://us.battle.net/wow/en/character/garrosh/{mastername}/advanced"\
.format(mastername=quote(mastername, safe=''))
# -> http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced

You can try something like this:
>>> import urllib
>>> url = 'http://us.battle.net/wow/en/character/garrosh/thermíte/advanced'
>>> prefix = 'http://'
>>> prefix + '/'.join([urllib.quote(x) for x in url[len(prefix):].split('/')])
'http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced'
urllib.quote() percent-encodes the unsafe characters of a string. You don't want all the characters to be affected, just everything between the '/' characters, excluding the initial 'http://'. So slicing off the prefix and calling split take those out of the equation, and then you concatenate them back in with the + operator and join. (Note that url.strip('http://') would be wrong here: strip removes any of those characters from both ends of the string, not the literal prefix.)
EDIT: This one is on me for not reading the docs... Much cleaner:
>>> urllib.quote(url, safe=':/')
'http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced'
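A Python 3 variant of the same idea might split the URL first and quote only the path, which avoids having to list ':' as safe. This is a sketch, not the answerer's code; encode_path is a hypothetical helper name, and it assumes the incoming URL is not already percent-encoded (an encoded URL would be encoded twice).

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path(url):
    """Percent-encode only the path component of url (hypothetical helper)."""
    parts = urlsplit(url)
    # quote() keeps "/" by default, so the path separators survive.
    encoded_path = quote(parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )

encoded_url = encode_path(
    "http://us.battle.net/wow/en/character/garrosh/thermíte/advanced"
)
print(encoded_url)
# http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced
```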

How do you build a URL that contains a username and password from user input in Python?

I see you can parse usernames and passwords with:
from urlparse import urlparse
r = urlparse('http://myuser:mypass@example.com')
print r.username
# => 'myuser'
How can I go the other way? I can't use urlunparse because I can't do this:
r.username = request.args['username']
# => AttributeError: can't set attribute
I'm interested because the username contains characters that need escaped (namely: #, /).
Edit:
This string is coming from user input, so I won't know ahead of time what it is. It's a security risk to maintain your own list of escape characters, so string concatenation and custom character escaping with replace won't help here.

It turns out the username part of basic auth URLs uses the same escape characters that query parameters do.
So the answer is to use urllib.quote(username, safe='') and then concatenate. (You'll need the safe parameter, or / won't be escaped.)
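In Python 3 the quote-then-concatenate approach could look like the sketch below. build_auth_url and the sample credentials are hypothetical; the password is quoted the same way on the assumption that it can also contain reserved characters.

```python
from urllib.parse import quote, unquote, urlsplit

def build_auth_url(username, password, host, scheme="http"):
    """Build scheme://user:pass@host with both credentials percent-encoded."""
    # safe="" so that "/" (safe by default) is escaped as well.
    userinfo = "{}:{}".format(quote(username, safe=""), quote(password, safe=""))
    return "{}://{}@{}".format(scheme, userinfo, host)

# Made-up credentials containing characters that must be escaped.
url = build_auth_url("my#user/name", "p@ss:word", "example.com")
print(url)  # http://my%23user%2Fname:p%40ss%3Aword@example.com

# Round trip: the parser returns the encoded username, so decode it.
parsed = urlsplit(url)
print(unquote(parsed.username))  # my#user/name
```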

Urls.py unable to pass #(pound) character to a view in Django,

I pass data through a Django URL, but the # character does not get through urls.py. I am using this pattern:
url(r'^pass/(?P<sentence>[\w|\W]*)/$',pass)
I also tried this pattern:
url(r'^pass/(?P<sentence>[a-zA-Z0-9-/:-?##{-~!^_\'\[\]*]*)/$',pass)
Thanks in advance.

The "#" character marks inline anchors (links within the same page) in a URL, so the browser will never send it to Django.
For example, if the URL is /something/pass/#test/something-else/ the browser will send only /something/pass/ to the server. You can try /something/pass/%23test/something-else/ instead (23 is the hexadecimal ASCII code for #). It's not pretty; if it's going to be ugly either way, just pass it as a GET variable instead.
There is nothing you can do on the Django side. You had better avoid characters with special meanings in the URL path when designing your routes. Of course it is a matter of taste, but I really think that strings passed in the URL path should be "slugified" in order to remove any funny characters.
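The split between path and fragment described above can be checked with the standard library. This sketch only shows how the URL is parsed; it does not involve Django or a browser.

```python
from urllib.parse import quote, urlsplit

parts = urlsplit("/something/pass/#test/something-else/")
print(parts.path)      # /something/pass/
print(parts.fragment)  # test/something-else/

# Percent-encoding the "#" keeps it inside the path instead.
encoded_segment = quote("#test", safe="")
encoded_parts = urlsplit("/something/pass/" + encoded_segment + "/something-else/")
print(encoded_parts.path)      # /something/pass/%23test/something-else/
print(encoded_parts.fragment)  # (empty)
```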

Browsers won't send the URL fragment part (the part that starts with "#") to servers. Why not convert your data to base64 first, then pass the data via the URL?
RFC 1808 (Relative Uniform Resource Locators): Note that the fragment identifier (and the "#" that precedes it) is
not considered part of the URL. However, since it is commonly used
within the same string context as a URL, a parser must be able to
recognize the fragment when it is present and set it aside as part of
the parsing process.
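The base64 suggestion could be sketched as follows, using the URL-safe alphabet so the token itself contains no "/" or "+". The sample sentence is made up, and the view receiving the token would need to decode it the same way.

```python
import base64

# Hypothetical data containing characters that are awkward in a URL path.
sentence = "a #sentence with / and ? in it"

# Encode to a URL-safe token before placing it in the URL.
token = base64.urlsafe_b64encode(sentence.encode("utf-8")).decode("ascii")

# Decode on the receiving side to get the original text back.
restored = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")

print(token)
print(restored)
```

Note that the token may end in "=" padding, so the URL pattern has to allow that character.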

Flask route with URI encoded component

It seems Flask doesn't support routes with a URI encoded component. I'm curious if I'm doing something wrong, or if there is a special flag I need to include.
My route looks something like this:
@app.route('/foo/<encoded>/bar/')
def foo(encoded):
    # ...
    pass
The URL that this should match can look like these:
http://foobar.com/foo/xxx/bar/ # matched correctly, no URI component
http://foobar.com/foo/x%2Fx%2Fx%2F/bar/ # not matched correctly, URI component
The former URL works; the latter spits out a lovely 404.
Thanks!

Add path to your url rule:
@app.route('/foo/<path:encoded>/bar/')
Update per comment: The route API docs are here: http://flask.pocoo.org/docs/api/#flask.Flask.route. The underlying classes that implement the path style route converter are here: http://werkzeug.pocoo.org/docs/routing/#custom-converters (this is one of the really nice parts of pocoostan.) As far as the trailing slashes, there are special rules that amount to:
If a rule ends with a slash and is requested without a slash by the
user, the user is automatically redirected to the same page with a
trailing slash attached.
If a rule does not end with a trailing slash and the user requests the
page with a trailing slash, a 404 not found is raised.
Also keep in mind that if you are on Apache and are expecting a slash-trailed URL, i.e. a bookmarklet that submits to http://ex.com/foo/<path:encoded>/bar, and encoded gets something with double slashes, Apache will convert multiple slashes to a single one.
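To see why the plain <encoded> rule returns a 404, it helps to look at the path after percent-decoding. The sketch below assumes the server decodes the path before routing, as WSGI servers commonly do; it uses only the standard library and does not run Flask itself.

```python
from urllib.parse import unquote

raw_path = "/foo/x%2Fx%2Fx%2F/bar/"

# After decoding, %2F becomes a real slash, so the value spans several segments.
decoded_path = unquote(raw_path)
print(decoded_path)  # /foo/x/x/x//bar/

# A converter that refuses "/" cannot match this; a path-style converter can.
segments = decoded_path.strip("/").split("/")
print(segments)  # ['foo', 'x', 'x', 'x', '', 'bar']
```

The empty segment in the result is the double slash mentioned above, which Apache may collapse.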
