Python - Shortest way to format a string to a URL

I'm involved in a web project. I have to choose the best ways to represent the code, so that other people can read it without problems/headaches/whatever.
The "problem" I'm tackling now is how to show a nicely formatted URL (it will be derived from a "title" string).
So, let's suppose we have a title, fetched from the form:
title = request.form['title'] # 'Hello World, Hello Cat! Hello?'
then we need a function to format it for inclusion in the url (it needs to become 'hello_world_hello_cat_hello'), so for the moment I'm using this one which I think sucks for readability:
title.replace(' ', '_').replace('!', '').replace('?', '').replace(',', '').lower()
What would be a good way to compact it? Is there already a function for doing what I'm doing?
I'd also like to know which characters/symbols I should strip from the url.

You can use urlencode(), which is the standard way to URL-encode strings in Python.
If instead you want a custom encoding like your expected output, and all you want is to keep the words in the final string, you can use the re.findall function to grab them and then join them with an underscore:
>>> import re
>>> s = 'Hello World, Hello Cat! Hello?'
>>> '_'.join(re.findall(r'\w+', s)).lower()
'hello_world_hello_cat_hello'
What this does is:
g = re.findall(r'\w+',s) # ['Hello', 'World', 'Hello', 'Cat', 'Hello']
s1 = '_'.join(g) # 'Hello_World_Hello_Cat_Hello'
s1.lower() # 'hello_world_hello_cat_hello'
This technique also works well with numbers in the string:
>>> s = 'Hello World, Hello Cat! H123ello? 123'
>>> '_'.join(re.findall(r'\w+', s)).lower()
'hello_world_hello_cat_h123ello_123'
Another way, which I thought should be faster, is to replace the non-alphanumeric characters directly. This can be done with re.sub by matching each run of non-alphanumerics together and replacing it with _ like this:
>>> re.sub(r'\W+', '_', s).lower()
'hello_world_hello_cat_h123ello_123'
Well... not really. Speed tests:
$ python -m timeit -s "import re" -s "s='Hello World, Hello Cat! Hello?'" "'_'.join(re.findall(r'\w+',s)).lower()"
100000 loops, best of 3: 5.08 usec per loop
$ python -m timeit -s "import re" -s "s='Hello World, Hello Cat! Hello?'" "re.sub(r'\W+','_',s).lower()"
100000 loops, best of 3: 6.55 usec per loop
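One caveat when comparing the two approaches: they are not always equivalent. With the original title, which ends in punctuation, re.sub turns the trailing "?" into an underscore, whereas re.findall simply drops it. Below is a minimal sketch of a helper that trims those edge underscores. The name title_to_slug is hypothetical, not a library function, and the code assumes Python 3.

```python
import re

def title_to_slug(title):
    """Hypothetical helper: collapse non-word runs to '_', trim the ends, lowercase."""
    # \W+ matches each run of characters that are not letters, digits or '_'
    slug = re.sub(r'\W+', '_', title)
    # a title that starts or ends with punctuation would otherwise keep a stray '_'
    return slug.strip('_').lower()

s = 'Hello World, Hello Cat! Hello?'
print('_'.join(re.findall(r'\w+', s)).lower())  # hello_world_hello_cat_hello
print(re.sub(r'\W+', '_', s).lower())           # hello_world_hello_cat_hello_
print(title_to_slug(s))                         # hello_world_hello_cat_hello
```

Note that strip('_') would also remove underscores that were part of the title itself at either end, which may or may not be what you want.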

You could use urlencode() from the urllib module in Python 2 or the urllib.parse module in Python 3.
This will work assuming you're trying to use the text in the query string of your URL.
from urllib.parse import urlencode # Python 3; in Python 2 use: from urllib import urlencode
title = {'title': 'Hello World, Hello Cat! Hello?'} # or get it programmatically as you did
encoded = urlencode(title)
print(encoded) # title=Hello+World%2C+Hello+Cat%21+Hello%3F
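If the title is meant for the path part of the URL rather than the query string, percent-encoding a single string may be a closer fit. This is a small sketch assuming Python 3's urllib.parse; the variable names are only illustrative.

```python
from urllib.parse import quote, quote_plus

title = 'Hello World, Hello Cat! Hello?'

# Path segment: spaces become %20; safe='' means '/' is percent-encoded too
segment = quote(title, safe='')
print(segment)        # Hello%20World%2C%20Hello%20Cat%21%20Hello%3F

# Query-string style: spaces become '+', other reserved characters are percent-encoded
query_value = quote_plus(title)
print(query_value)    # Hello+World%2C+Hello+Cat%21+Hello%3F
```

Either way the result is a faithful encoding of the title, not a "pretty" slug, so it answers a slightly different need than the underscore-joined form.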

So I've been playing with the solutions from all your answers, and here's what I've come up with.
Note: these "benchmarks" are not to be taken too seriously, as I didn't go through all the possible approaches, but they are a good way to get a quick, broad view.
re.findall()
def findall():
    string = 'Hello World, Hello Cat! Hello?'
    return '_'.join(re.findall(r'\w+', string)).lower()
real=0.019s, user=0.012s, sys=0.004s, rough=0.016s
re.sub()
def sub():
    string = 'Hello World, Hello Cat! Hello?'
    return re.sub(r'\W+', '_', string).lower()
real=0.020s, user=0.016s, sys=0.004s, rough=0.020s
slugify()
def slug():
    string = 'Hello World, Hello Cat! Hello?'
    return slugify(string)
real=0.031s, user=0.024s, sys=0.004s, rough=0.028s
urllib.urlencode()
def urlenc():
    string = {'title': 'Hello World, Hello Cat! Hello?'}
    return urllib.urlencode(string)
real=0.036s, user=0.024s, sys=0.008s, rough=0.032s
As you can see, the fastest is re.findall() and the slowest is urllib.urlencode(); in the middle there's slugify(), which is also the shortest/cleanest of them all (although not the fastest).
What I've chosen for now is Slugify, the lucky cat in between the bulldogs.
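The real/user/sys figures above look like whole-process timings, which would include interpreter startup and imports. As a rough alternative, here is a sketch that times only the function calls with the standard timeit module. It covers just the two regex variants, since the slugify import used above is not shown; the names by_findall, by_sub and best_time are made up for this example, and the numbers will differ from machine to machine.

```python
import re
import timeit

TITLE = 'Hello World, Hello Cat! Hello?'

def by_findall():
    return '_'.join(re.findall(r'\w+', TITLE)).lower()

def by_sub():
    return re.sub(r'\W+', '_', TITLE).lower()

def best_time(func, number=10000, repeat=3):
    """Best total time in seconds for `number` calls, taken over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat))

timings = {f.__name__: best_time(f) for f in (by_findall, by_sub)}
# print the fastest candidate first
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print('%-12s %.4f s per 10000 calls' % (name, seconds))
```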

import re
re.sub(r'!|\?|,', '', text)
This will remove ! ? and , from the string.

I mean you could split it up into multiple statements:
slug = title.replace(' ', '_')
slug = slug.replace('!', '')
slug = slug.replace('?', '')
slug = slug.replace(',', '')
slug = slug.lower()
This will make for better readability.

Sure, you can do this:
import string
uppers = string.ascii_uppercase # ABC...Z
lowers = string.ascii_lowercase # abc...z
removals = ''.join([ch for ch in string.punctuation if ch != '_'])
transtable = str.maketrans(uppers+" ",lowers+"_",removals)
title = "Hello World, Hello Cat! Hello?"
title.translate(transtable) # 'hello_world_hello_cat_hello'
You could also do a list comp and ''.join it.
whitelist = string.ascii_uppercase + string.ascii_lowercase + " "
newtitle = ''.join('_' if ch == ' ' else ch.lower() for ch in title if ch in whitelist)

Related

Specific pattern strings from List in python 3

Requirement: using a regex, I want to fetch only specific strings, i.e. the text enclosed between "-" or "*" symbols, from the input list. Below is the code snippet:
ZTon = ['one-- and preferably only one --obvious', " Hello World", 'Now is better than never.', 'Although never is often better than *right* now.']
ZTon = [ line.strip() for line in ZTon]
print (ZTon)
r = re.compile(".^--")
portion = list(filter(r.match, ZTon)) # Read Note
print (portion)
Expected response:
['and preferably only one','right']
Using regex
import re
ZTon = ['one-- and preferably only one --obvious', " Hello World", 'Now is better than never.', 'Although never is often better than *right* now.']
pattern=r'(--|\*)(.*)\1'
l=[]
for line in ZTon:
    s = re.search(pattern, line)
    if s:
        l.append(s.group(2).strip())
print (l)
# ['and preferably only one', 'right']
import re
ZTon = ['one-- and preferably only one --obvious', " Hello World", 'Now is better than never.', 'Although never is often better than *right* now.']
def gen(lst):
    for s in lst:
        s = ''.join(i.strip() for g in re.findall(r'(?:-([^-]+)-)|(?:\*([^*]+)\*)', s) for i in g)
        if s:
            yield s
print(list(gen(ZTon)))
Prints:
['and preferably only one', 'right']
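Both answers return at most one string per input line. If a line could contain several marked spans, a non-greedy pattern with re.findall is one way to collect them all. This is only a sketch: the names MARKED and marked_spans are hypothetical, and it assumes the delimiters are exactly "--" and "*".

```python
import re

# Either --text-- or *text*; the non-greedy (.+?) stops at the nearest closing delimiter
MARKED = re.compile(r'--(.+?)--|\*(.+?)\*')

def marked_spans(lines):
    found = []
    for line in lines:
        # findall yields one (dashed, starred) tuple per match; exactly one of them is non-empty
        for dashed, starred in MARKED.findall(line):
            found.append((dashed or starred).strip())
    return found

ZTon = ['one-- and preferably only one --obvious', " Hello World",
        'Now is better than never.', 'Although never is often better than *right* now.']
print(marked_spans(ZTon))               # ['and preferably only one', 'right']
print(marked_spans(['*a* and *b*']))    # ['a', 'b']
```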

Templates with argument in string formatting

I'm looking for a package or any other approach (other than manual replacement) for the templates within string formatting.
I want to achieve something like this (this is just an example so you could get the idea, not the actual working code):
text = "I {what:like,love} {item:pizza,space,science}".format(what=2,item=3)
print(text)
So the output would be:
I love science
How can I achieve this? I have been searching but cannot find anything appropriate. Probably used wrong naming terms.
If there isn't any ready-to-use package around, I would love to read some tips on a starting point for coding this myself.
I think using a list is sufficient, since Python lists keep their elements in order:
what = ["like","love"]
items = ["pizza","space","science"]
text = "I {} {}".format(what[1],items[2])
print(text)
output:
I love science
Maybe use a list or a tuple for what and item, as both data types preserve insertion order.
what = ['like', 'love']
item = ['pizza', 'space', 'science']
text = "I {what} {item}".format(what=what[1], item=item[2])
print(text) # I love science
Or even this is possible:
text = "I {what[1]} {item[2]}".format(what=what, item=item)
print(text) # I love science
Hope this helps!
Why not use a dictionary?
options = {'what': ('like', 'love'), 'item': ('pizza', 'space', 'science')}
print("I " + options['what'][1] + ' ' + options['item'][2])
This returns: "I love science"
Or if you wanted a method to rid yourself of having to reformat to accommodate/remove spaces, then incorporate this into your dictionary structure, like so:
options = {'what': (' like', ' love'), 'item': (' pizza', ' space', ' science'), 'fullstop': '.'}
print("I" + options['what'][0] + options['item'][0] + options['fullstop'])
And this returns: "I like pizza."
Since no one has provided an answer that addresses my question directly, I decided to work on this myself.
I had to use double brackets, because single ones are reserved for the string formatting.
I ended up with the following class:
import re
class ArgTempl:
    def __init__(self, _str):
        self._str = _str
    def format(self, **args):
        result = self._str  # work on a copy so the template can be formatted again
        for k in re.finditer(r"{{(\w+):([\w,]+?)}}", self._str,
                             flags=re.DOTALL | re.MULTILINE | re.IGNORECASE):
            key, replacements = k.groups()
            if key not in args:
                continue
            # pick the comma-separated option at the index passed for this key
            result = result.replace(k.group(0), replacements.split(',')[args[key]])
        return result
This is primitive code written in five minutes, hence the lack of checks and so on. It works as expected and can easily be improved.
Tested on Python 2.7 & 3.6~
Usage:
test = "I {{what:like,love}} {{item:pizza,space,science}}"
print(ArgTempl(test).format(what=1, item=2))
> I love science
Thanks for all of the replies.
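A variation on the same idea, sketched here for comparison: re.sub accepts a function as the replacement, so each placeholder can be resolved in a single pass without calling str.replace in a loop. The names PLACEHOLDER and render are hypothetical, and the {{key:a,b,c}} syntax is the one from the class above.

```python
import re

# {{key:opt0,opt1,...}} with the braces escaped so they are matched literally
PLACEHOLDER = re.compile(r'\{\{(\w+):([\w,]+)\}\}')

def render(template, **choices):
    """Hypothetical helper: replace each placeholder with the option at the given index."""
    def pick(match):
        key, options = match.group(1), match.group(2).split(',')
        if key not in choices:
            return match.group(0)       # leave unknown placeholders untouched
        return options[choices[key]]    # raises IndexError if the index is out of range
    return PLACEHOLDER.sub(pick, template)

print(render("I {{what:like,love}} {{item:pizza,space,science}}", what=1, item=2))
# I love science
```

Because the template string is never modified, the same template can be rendered repeatedly with different indices.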

Using python regex to exclude '.' at the end but not inside a string

I am trying to use python regex to spot #mentions such as #user and #user.name
So far I have:
htmlcontent = re.sub(r'((\#)([\w\.-]+))', r"<a href='/users/\3'>\1</a>", htmlcontent)
When this code spots a #mention ending in a . it does not exclude it:
e.g. Hi #user.name. How are you?
Output so far:
<a href='/users/user.name.'>#user.name.</a>
Desired output:
<a href='/users/user.name'>#user.name</a> <-- without . after name
Try this:
re.sub(r'((\#)([\w.-]+[\w]+))', r"<a href='/users/\3'>\1</a>", htmlcontent)
This tells the regex engine that '.' and '-' can appear in the middle of the name, but the match must end on a word character.
Running it on your example:
In [3]: htmlcontent = 'Hi #user.name. How are you?'
In [4]: re.sub(r'((\#)([\w.-]+[\w]+))', r"<a href='/users/\3'>\1</a>", htmlcontent)
Out[4]: "Hi <a href='/users/user.name'>#user.name</a>. How are you?"
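One detail of the pattern above is that [\w.-]+[\w]+ needs at least two characters, so a one-letter name such as #a would not match. A possible variation, offered only as a sketch, spells the structure out as "word characters, then optional groups of a separator plus word characters". The names MENTION and link_mentions are made up here, and the /users/ URL layout is taken from the question.

```python
import re

# one or more word characters, then any number of ('.' or '-') + word characters
MENTION = re.compile(r'#(\w+(?:[.-]\w+)*)')

def link_mentions(text):
    # \1 is the user name without the leading '#'
    return MENTION.sub(r"<a href='/users/\1'>#\1</a>", text)

print(link_mentions('Hi #user.name. How are you?'))
# Hi <a href='/users/user.name'>#user.name</a>. How are you?
```

Since a separator is only consumed when word characters follow it, a trailing '.' or '-' is always left outside the link.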
You could use a positive lookahead for the . at the end of the match, like
([\w\.-]+)(?=\.\s)?
Example
string = "Hi #user.name. How are you?"
print(re.sub(r'#([\w\.-]+)(?=\.\s)?', r"<a href='/users/\1'>\1</a>", string))
#Output
#Hi <a href='/users/user.name.'>user.name.</a> How are you?
string = "Hi #user.name How are you?"
print(re.sub(r'#([\w\.-]+)(?=\.\s)?', r"<a href='/users/\1'>\1</a>", string))
#Output
#Hi <a href='/users/user.name'>user.name</a> How are you?

Can you have variables within triple quotes? If so, how?

This is probably a very simple question for some, but it has me stumped. Can you use variables within python's triple-quotes?
In the following example, how do you use variables in the text:
wash_clothes = 'tuesdays'
clean_dishes = 'never'
mystring =""" I like to wash clothes on %wash_clothes
I like to clean dishes %clean_dishes
"""
print(mystring)
I would like it to result in:
I like to wash clothes on tuesdays
I like to clean dishes never
If not, what is the best way to handle large chunks of text where you need a couple of variables, and there is a ton of text and special characters?
The preferred way of doing this is using str.format() rather than the method using %:
This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.
Example:
wash_clothes = 'tuesdays'
clean_dishes = 'never'
mystring =""" I like to wash clothes on {0}
I like to clean dishes {1}
"""
print(mystring.format(wash_clothes, clean_dishes))
Yes! Starting from Python 3.6 you can use f-strings for this. They're interpolated in place, so mystring has the desired value right after the mystring = ... line:
wash_clothes = 'tuesdays'
clean_dishes = 'never'
mystring = f"""I like to wash clothes on {wash_clothes}
I like to clean dishes {clean_dishes}
"""
print(mystring)
Should you need to add a literal { or } in the string, you would just double it:
if use_squiggly:
    kind = 'squiggly'
else:
    kind = 'curly'
print(f"""The {kind} brackets are:
- '{{', or the left {kind} bracket
- '}}', or the right {kind} bracket
""")
would print, depending on the value of use_squiggly, either
The squiggly brackets are:
- '{', or the left squiggly bracket
- '}', or the right squiggly bracket
or
The curly brackets are:
- '{', or the left curly bracket
- '}', or the right curly bracket
One of the ways in Python 2 :
>>> mystring =""" I like to wash clothes on %s
... I like to clean dishes %s
... """
>>> wash_clothes = 'tuesdays'
>>> clean_dishes = 'never'
>>>
>>> print mystring % (wash_clothes, clean_dishes)
I like to wash clothes on tuesdays
I like to clean dishes never
Also look at string formatting
http://docs.python.org/library/string.html#string-formatting
Yes. I believe this will work.
do_stuff = "Tuesday"
mystring = """I like to do stuff on %(tue)s""" % {'tue': do_stuff}
EDIT: forgot an 's' in the format specifier.
I think the simplest way is str.format() as others have said.
However, I thought I'd mention that Python has a string.Template class starting in Python2.4.
Here's an example from the docs.
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
One of the reasons I like this is the use of a mapping instead of positional arguments.
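Since the question mentions large chunks of text with special characters, here is a small sketch of string.Template applied to a triple-quoted string. Only "$" is special to Template, so "%" and "{ }" in the text need no escaping. The extra text in the template is invented for illustration, and the code assumes Python 3.

```python
from string import Template

wash_clothes = 'tuesdays'
clean_dishes = 'never'

# '%' and '{}' are plain text to Template; '$$' produces a literal '$'
text = Template("""I like to wash clothes on $wash_clothes (100% cotton only)
I like to clean dishes $clean_dishes {seriously}
It costs $$5
""")

result = text.substitute(wash_clothes=wash_clothes, clean_dishes=clean_dishes)
print(result)
```

substitute() raises KeyError when a placeholder has no value; safe_substitute() leaves such placeholders in the text instead.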
Also note that you don't need the intermediate variable:
name = "Alain"
print """
Hello %s
""" % (name)
You can pass multiple arguments in a simple way:
wash_clothes = 'tuesdays'
clean_dishes = 'never'
a=""" I like to wash clothes on %s I like to clean dishes %s"""%(wash_clothes,clean_dishes)
print(a)

Manipulating strings in python - concentrating on part of a user's input

resp = raw_input("What is your favorite fruit?\n")
if "I like" in resp:
    print "%s" - "I like" + " is a delicious fruit." % resp
else:
    print "Bubbles and beans."
OK I know this code doesn't work, and I know why. You can't subtract strings from each other like numbers.
But is there a way to break apart a string and only use part of the response?
And by "is there a way" I really mean "how," because anything is possible in programming. :D
I'm trying to write my first chatterbot from scratch.
One option would be to simply replace the part that you want to remove with an empty string:
resp = raw_input("What is your favorite fruit?\n")
if "I like" in resp:
    print "%s is a delicious fruit." % (resp.replace("I like ", ""))
else:
    print "Bubbles and beans."
If you want to look into more advanced pattern matching to grab out more specific parts of strings via flexible patterns, you might want to look into regular expressions.
# python
>>> "I like turtles".replace("I like ", "")
'turtles'
Here's a way to do it with regular expressions:
In [1]: import re
In [2]: pat = r'(I like )*(\w+)( is a delicious fruit)*'
In [3]: print re.match(pat, 'I like apples').groups()[1]
apples
In [4]: print re.match(pat, 'apple is a delicious fruit').groups()[1]
apple
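For the chatterbot case specifically, a named group can make the extraction a little more explicit. The sketch below assumes Python 3 and that replies look like "I like <something>", optionally ending in "." or "!"; the names LIKE and extract_fruit are hypothetical.

```python
import re

# "I like", then the thing liked, then optional trailing '.' or '!'; case-insensitive
LIKE = re.compile(r'^\s*I like\s+(?P<fruit>.+?)\s*[.!]?\s*$', re.IGNORECASE)

def extract_fruit(resp):
    """Return the part after 'I like', or None if the reply has another shape."""
    m = LIKE.match(resp)
    return m.group('fruit') if m else None

print(extract_fruit("I like apples"))          # apples
print(extract_fruit("i LIKE green apples."))   # green apples
print(extract_fruit("Bananas"))                # None
```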
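Can you just remove "I like"?
resp.replace("I like ", "", 1)
Note that resp.strip("I like") would not do this: strip() treats its argument as a set of characters to remove from both ends, not as a prefix, so it can also eat letters of the fruit name. Be careful of case sensitivity, though.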
