I am trying to format a string in python that takes arguments as items from a list of names.
The catch is that I want all the list items printed in the same string, one after another, each wrapped in backslash-escaped double quotes.
The code is:
list_names=['Alex', 'John', 'Joseph J']
string_to_print = 'Hi my name is (\\"%s\\")' % (list_names)
The output should look like this:
'Hi my name is (\"Alex\",\"John\",\"Joseph J\")'
But instead, I keep getting this:
'Hi my name is (\"['Alex', 'John', 'Joseph J']\")'
I've even tried using .format() and json.dumps() but still the same result.
Is there any way to print the desired output or can I only print each list item at a time?
Without changing much of your code, you could simply format the repr of the list after converting it to a tuple.
# proper way - this is what you actually want
list_names = ['Alex', 'John', 'Joseph J']
string_to_print = 'Hi my name is %s' % (repr(tuple(list_names)))
print(string_to_print)
# Hi my name is ('Alex', 'John', 'Joseph J')
If you want to get your exact output, just do some string replacing:
# improper way
list_names = ['Alex', 'John', 'Joseph J']
string_to_print = 'Hi my name is %s' % (repr(tuple(list_names)).replace("\'", '\\"'))
print(string_to_print)
# Hi my name is (\"Alex\", \"John\", \"Joseph J\")
If you're trying to pass string_to_print somewhere else, try the proper way first; it might actually work for you.
Looking closely, the "improper way" above contains a small bug. Try adding "Alex's house" to list_names, and the output looks like this:
Hi my name is (\"Alex\", \"John\", \"Joseph J\", "Alex\"s house")
To fix that bug, you need a more careful replacement, using re.sub():
from re import sub
list_names = ['Alex', 'John', 'Joseph J', "Alex's house"]
string_to_print = 'Hi my name is %s' % (sub(r'([\'\"])(.*?)(?!\\\1)(\1)', r'\"\2\"', repr(tuple(list_names))))
print(string_to_print)
But if names like this won't come up in your usage, I would suggest sticking with the "improper way", as it's a lot simpler.
There is no function for formatting lists as human-friendly strings. You have to format lists yourself:
names = ",".join(r'\"{}\"'.format(name) for name in list_names)
print(names)
#\"Alex\",\"John\",\"Joseph J\"
print('Hi my name is ({})'.format(names))
#Hi my name is (\"Alex\",\"John\",\"Joseph J\")
This is one way using format and join:
list_names = ['Alex', 'John', 'Joseph J']
string_to_print = 'Hi my name is (\\"{}\\")'.format('\\",\\"'.join(list_names))
print(string_to_print)
# Hi my name is (\"Alex\",\"John\",\"Joseph J\")
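For reference, here is a minimal sketch of why the original attempt printed the whole list, and how the join-based answers avoid it. The helper name quote_names is hypothetical and only used for illustration; it is not part of any answer above.

```python
list_names = ['Alex', 'John', 'Joseph J']

# %s applied to a list formats str(list), brackets and all.
broken = 'Hi my name is (%s)' % list_names


def quote_names(names):
    """Wrap each name in backslash-escaped double quotes and join with commas."""
    return ','.join('\\"{}\\"'.format(name) for name in names)


fixed = 'Hi my name is ({})'.format(quote_names(list_names))
print(broken)
print(fixed)
```

Because each name is formatted individually before joining, an apostrophe inside a name (such as "Alex's house") passes through unchanged.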
I'm looking for a package or any other approach (other than manual replacement) for the templates within string formatting.
I want to achieve something like this (this is just an example so you could get the idea, not the actual working code):
text = "I {what:like,love} {item:pizza,space,science}".format(what=2,item=3)
print(text)
So the output would be:
I love science
How can I achieve this? I have been searching but cannot find anything appropriate. Perhaps I used the wrong search terms.
If there isn't a ready-to-use package around, I would love some tips on where to start coding this myself.
I think using a list is sufficient, since Python lists preserve the order of their items:
what = ["like","love"]
items = ["pizza","space","science"]
text = "I {} {}".format(what[1],items[2])
print(text)
output:
I love science
Maybe use a list or a tuple for what and item, as both data types preserve insertion order.
what = ['like', 'love']
item = ['pizza', 'space', 'science']
text = "I {what} {item}".format(what=what[1],item=item[2])
print(text)  # I love science
Or even this is possible:
text = "I {what[1]} {item[2]}".format(what=what, item=item)
print(text)  # I love science
Hope this helps!
Why not use a dictionary?
options = {'what': ('like', 'love'), 'item': ('pizza', 'space', 'science')}
print("I " + options['what'][1] + ' ' + options['item'][2])
This returns: "I love science"
Or if you wanted a method to rid yourself of having to reformat to accommodate/remove spaces, then incorporate this into your dictionary structure, like so:
options = {'what': (' like', ' love'), 'item': (' pizza', ' space', ' science'), 'fullstop': '.'}
print("I" + options['what'][0] + options['item'][0] + options['fullstop'])
And this returns: "I like pizza."
Since no one has provided an answer that addresses my question directly, I decided to work on this myself.
I had to use double brackets, because single ones are reserved for the string formatting.
I ended up with the following class:
import re

class ArgTempl:
    def __init__(self, _str):
        self._str = _str

    def format(self, **args):
        # Work on a copy so the same template can be formatted more than once.
        result = self._str
        for k in re.finditer(r"{{(\w+):([\w,]+?)}}", self._str,
                             flags=re.DOTALL | re.MULTILINE | re.IGNORECASE):
            key, replacements = k.groups()
            if key not in args:
                continue
            result = result.replace(k.group(0), replacements.split(',')[args[key]])
        return result
This is primitive code written in five minutes, so it lacks checks and so on. It works as expected and can be improved easily.
Tested on Python 2.7 & 3.6~
Usage:
test = "I {{what:like,love}} {{item:pizza,space,science}}"
print(ArgTempl(test).format(what=1, item=2))
> I love science
Thanks for all of the replies.
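The same idea can also be sketched with re.sub and a callback, which avoids calling str.replace once per match. This is only an illustrative variant under the same {{key:a,b,c}} syntax; the function name render_choices is hypothetical.

```python
import re

# Matches {{key:opt1,opt2,...}}; the double braces follow the answer above.
_PATTERN = re.compile(r"\{\{(\w+):([\w,]+?)\}\}")


def render_choices(template, **choices):
    """Replace each {{key:a,b,c}} with the option at index choices[key]."""
    def pick(match):
        key, options = match.group(1), match.group(2).split(',')
        if key not in choices:
            return match.group(0)  # leave unknown placeholders untouched
        return options[choices[key]]
    return _PATTERN.sub(pick, template)


print(render_choices("I {{what:like,love}} {{item:pizza,space,science}}", what=1, item=2))
```

An out-of-range index still raises IndexError here, so a real implementation would probably want to validate the indices first.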
I'm using python 2.7 for this here. I've got a bit of code to extract certain mp3 tags, like this here
mp3info = EasyID3(fileName)
print mp3info
print mp3info['genre']
print mp3info.get('genre', default=None)
print str(mp3info['genre'])
print repr(mp3info['genre'])
genre = unicode(mp3info['genre'])
print genre
I have to use the name ['genre'] instead of [2] as the order can vary between tracks. It produces output like this
{'artist': [u'Really Cool Band'], 'title': [u'Really Cool Song'], 'genre': [u'Rock'], 'date': [u'2005']}
[u'Rock']
[u'Rock']
[u'Rock']
[u'Rock']
[u'Rock']
At first I was like, "Why thank you, I do rock" but then I got on with trying to debug the code. As you can see, I've tried a few different approaches, but none of them work. All I want is for it to output
Rock
I reckon I could possibly use split, but that could get very messy very quickly as there's a distinct possibility that artist or title could contain '
Any suggestions?
It's not a string that you can use split on; it's a list, and that list usually (always?) contains one item. So you can get that first item:
genre = mp3info['genre'][0]
[u'Rock']
is a list of length 1; its single element is a Unicode string.
Try
print mp3info['genre'][0]
to print only the first element of the list.
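If a tag might be missing or empty, indexing with [0] raises an error. Below is a small sketch of a defensive lookup. It uses a plain dict as a stand-in for the tag object, on the assumption that the real object behaves like a dict of lists as the printed output in the question suggests; first_value is a hypothetical helper name. The single-argument print(...) form runs on both Python 2.7 and Python 3.

```python
# Stand-in for the tag mapping printed in the question; the real tag object
# is assumed to behave like a dict of lists here.
tags = {'artist': ['Really Cool Band'], 'title': ['Really Cool Song'], 'genre': ['Rock']}


def first_value(mapping, key, default=None):
    """Return the first item stored under key, or default if absent or empty."""
    values = mapping.get(key) or []
    return values[0] if values else default


print(first_value(tags, 'genre'))
print(first_value(tags, 'album', 'unknown'))
```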
I am implementing a simple DSL. I have the following input string:
txt = 'Hi, my name is <<name>>. I was born in <<city>>.'
And I have the following data:
{
    'name': 'John',
    'city': 'Paris',
    'more': 'xxx',
    'data': 'yyy',
    ...
}
I need to implement the following function:
def tokenize(txt):
    ...
    return fmt, vars
Where I get:
fmt = 'Hi, my name is {name}. I was born in {city}.'
vars = ['name', 'city']
That is, fmt can be passed to the str.format() function, and vars is a list of the detected tokens (so that I can perform lookup in the data, which can be more complex than what I described, since it can be split in several namespaces)
After this, processing the format would be simple:
def expand(fmt, vars, data):
    params = get_params(vars, data)
    return fmt.format(**params)
Where get_params is performing simple lookup of the data, and returning something like:
params = {
    'name': 'John',
    'city': 'Paris',
}
My question is:
How can I implement tokenize? How can I detect the tokens, knowing that the delimiters are << and >>? Should I go for regexes, or is there an easier path?
This is something similar to what pystache, or even .format itself, are doing, but I would like a light-weight implementation. Robustness is not very critical at this stage.
Yes, this is a perfect target for a regex. Find the opening and closing delimiters, replace them with braces, and extract the symbol names into a list. Do you have a solid description of legal symbols? You'll want a search such as
/\<\<([a-zA-Z]+[a-zA-Z0-9_]*)\>\>/
for classical variable names (note that this excludes leading underscores). Are you familiar enough with regexes to take it from here?
import re
def tokenize(text):
    found_variables = []
    def replace_and_capture(match):
        found_variables.append(match.group(1))
        return "{{{}}}".format(match.group(1))
    return re.sub(r'<<([^>]+)>>', replace_and_capture, text), found_variables
fmt, vars = tokenize('Hi, my name is <<name>>. I was born in <<city>>.')
print(fmt)
print(vars)
# Output:
# Hi, my name is {name}. I was born in {city}.
# ['name', 'city']
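To show how the pieces might fit together, here is a sketch that pairs tokenize with an expand step. The lookup is a plain dictionary access standing in for the question's get_params, whose real behaviour (several namespaces) is not specified, so treat that part as an assumption.

```python
import re


def tokenize(text):
    """Convert <<name>> tokens to {name} and collect the names in order."""
    found = []

    def replace_and_capture(match):
        found.append(match.group(1))
        return "{" + match.group(1) + "}"

    return re.sub(r'<<([^>]+)>>', replace_and_capture, text), found


def expand(text, data):
    """Tokenize text, look up only the detected names, and format."""
    fmt, names = tokenize(text)
    # Hypothetical stand-in for get_params: raises KeyError if a name is missing.
    params = {name: data[name] for name in names}
    return fmt.format(**params)


data = {'name': 'John', 'city': 'Paris', 'more': 'xxx'}
print(expand('Hi, my name is <<name>>. I was born in <<city>>.', data))
```

Note that literal { or } characters in the input would confuse str.format, so they would need to be doubled before formatting if the input can contain them.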
this is my code so far:
import re
template="Hello,my name is [name],today is [date] and the weather is [weather]"
placeholder=re.compile('(\[([a-z]+)\])')
find_tags=placeholder.findall(cam.template_id.text)
fields={field_name:'Michael',field_date:'21/06/2015',field_weather:'sunny'}
for key,placeholder in find_tags:
    assemble_msg=template.replace(placeholder,?????)
print assemble_msg
I want to replace every tag with the associated dictionary field and the final message to be like this:
My name is Michael,today is 21/06/2015 and the weather is sunny.
I want to do this automatically rather than manually. I am sure the solution is simple, but I couldn't find one so far. Any help?
No need for a manual solution using regular expressions. This is (in a slightly different format) already supported by str.format:
>>> template = "Hello, my name is {name}, today is {date} and the weather is {weather}"
>>> fields = {'name': 'Michael', 'date': '21/06/2015', 'weather': 'sunny'}
>>> template.format(**fields)
Hello, my name is Michael, today is 21/06/2015 and the weather is sunny
If you can not alter your template string accordingly, you can easily replace the [] with {} in a preprocessing step. But note that this will raise a KeyError in case one of the placeholders is not present in the fields dict.
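The preprocessing step and the KeyError caveat can be sketched as follows. This assumes the template contains no literal braces and that placeholder names are lowercase letters only, as in the question; KeepMissing and fill are hypothetical names. str.format_map is used instead of str.format so that a dict subclass with __missing__ can leave unresolved placeholders in place.

```python
import re


class KeepMissing(dict):
    """dict that returns the bracketed placeholder itself for unknown keys."""
    def __missing__(self, key):
        return '[' + key + ']'


def fill(template, fields):
    # Rewrite [name] as {name}; assumes the template has no literal braces.
    braced = re.sub(r'\[([a-z]+)\]', r'{\1}', template)
    return braced.format_map(KeepMissing(fields))


template = "Hello, my name is [name], today is [date] and the weather is [weather]"
print(fill(template, {'name': 'Michael', 'date': '21/06/2015'}))
```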
In case you want to keep your manual approach, you could try something like this:
import re
template = "Hello, my name is [name], today is [date] and the weather is [weather]"
fields = {'field_name': 'Michael', 'field_date': '21/06/2015', 'field_weather': 'sunny'}
for placeholder, key in re.findall(r'(\[([a-z]+)\])', template):
    template = template.replace(placeholder, fields.get('field_' + key, placeholder))
Or a bit simpler, without using regular expressions:
for key in fields:
    placeholder = "[%s]" % key[6:]  # strip the 'field_' prefix
    template = template.replace(placeholder, fields[key])
Afterwards, template is the new string with replacements. If you need to keep the template, just create a copy of that string and do the replacement in that copy. In this version, if a placeholder can not be resolved, it stays in the string. (Note that I swapped the meaning of key and placeholder in the loop, because IMHO it makes more sense that way.)
You can use dictionaries to put data straight into strings, like so...
fields={'field_name':'Michael','field_date':'21/06/2015','field_weather':'sunny'}
string="Hello,my name is %(field_name)s,today is %(field_date)s and the weather is %(field_weather)s" % fields
This might be an easier alternative for you?
I have a list of names which I'm using to pull out of a target list of strings. For example:
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
output = ['Chris Smith', 'Kim', 'CHRIS']
So the rules so far are:
Case insensitive
Cannot match partial words (i.e. Christmas/hijacked shouldn't match Chris/Jack)
Other words in string are okay as long as name is found in the string per the above criteria.
To accomplish this, another SO user suggested this code in this thread:
[targ for targ in target_list if any(re.search(r'\b{}\b'.format(name), targ, re.I) for name in first_names)]
This works very accurately so far, but very slowly given the names list is ~5,000 long and the target list ranges from 20-100 lines long with some strings up to 30 characters long.
Any suggestions on how to improve performance here?
SOLUTION: Both of the regex-based solutions suffered from OverflowErrors, so unfortunately I could not test them. The solution that worked (from @mgilson's answer) was:
new_names = set(name.lower() for name in names)
[ t for t in target if any(map(new_names.__contains__,t.lower().split())) ]
This provided a tremendous increase in performance from 15 seconds to under 1 second.
Seems like you could combine them all into 1 super regex:
import re
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
regex_string = '|'.join(r"(?:\b"+re.escape(x)+r"\b)" for x in names)
print(regex_string)
regex = re.compile(regex_string,re.I)
print([t for t in target if regex.search(t)])
A non-regex solution which will only work if the names are a single word (no whitespace):
new_names = set(name.lower() for name in names)
[ t for t in target if any(map(new_names.__contains__,t.lower().split())) ]
The any expression could also be written as:
any(x in new_names for x in t.lower().split())
or
any(x.lower() in new_names for x in t.split())
or, another variant which relies on set.intersection (suggested by @DSM below):
[ t for t in target if new_names.intersection(t.lower().split()) ]
You can profile to see which performs best if performance is really critical, otherwise choose the one that you find to be easiest to read/understand.
*If you're using python2.x, you'll probably want to use itertools.imap instead of map if you go that route in the above to get it to evaluate lazily -- It also makes me wonder if python provides a lazy str.split which would have performance on par with the non-lazy version ...
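One limitation of splitting on whitespace is that punctuation stays attached to words, so "Kim," would not match "kim". The sketch below is a variation on the set-based approach that extracts words with re.findall instead. The sample target list is modified to include punctuation, and mentions_name is a hypothetical helper name.

```python
import re

names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim, hello', 'Christmas is here', 'CHRIS']

lowered = {name.lower() for name in names}


def mentions_name(text):
    """True if any whole word of text, ignoring case, is a known name."""
    words = re.findall(r'\w+', text.lower())
    return not lowered.isdisjoint(words)


matches = [t for t in target if mentions_name(t)]
print(matches)
```

As with the original version, this only works when every name is a single word.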
This one is the simplest one I can think of:
[item for item in target if re.search(r'\b(%s)\b' % '|'.join(names), item)]
All together:
import re
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
results = [item for item in target if re.search(r'\b(%s)\b' % '|'.join(names), item)]
print(results)
>>>
['Chris Smith', 'Kim']
And to make it more efficient, you can compile the regex first:
regex = re.compile( r'\b(%s)\b' % '|'.join(names) )
[item for item in target if regex.search(item)]
Edit:
After considering the question and looking at some comments, I have revised the 'solution' to the following:
import re
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
regex = re.compile( r'\b((%s))\b' % ')|('.join([re.escape(name) for name in names]), re.I )
results = [item for item in target if regex.search(item)]
results:
>>>
['Chris Smith', 'Kim', 'CHRIS']
You're currently doing one loop inside another, iterating over two lists. That's always going to give you quadratic performance.
One local optimisation is to compile each name regex (which will make applying each regex faster). However, the big win is going to be to combine all of your regexes into one regex which you apply to each item in your input. See @mgilson's answer for how to do that. After that, your code performance should scale linearly as O(M+N), rather than O(M*N).
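As a rough illustration of the two shapes being compared, the sketch below runs the nested per-name search and the single combined regex on the sample data and checks that they agree. It makes no timing claims; how the combined pattern scales in practice depends on the regex engine, so profiling on real data is still advisable.

```python
import re

names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim', 'Christmas is here', 'CHRIS']

# Nested approach: one regex search per (target, name) pair.
nested = [t for t in target
          if any(re.search(r'\b{}\b'.format(re.escape(n)), t, re.I) for n in names)]

# Combined approach: a single compiled alternation, one search per target.
combined_re = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, names))), re.I)
combined = [t for t in target if combined_re.search(t)]

print(nested == combined, combined)
```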