Substitutions with elements from a list with re.sub? - python

What is the best way to perform substitutions with re.sub given a list? For example:
import re
some_text = 'xxxxxxx#yyyyyyyyy#zzzzzzzzz#'
substitutions = ['ONE', 'TWO', 'THREE']
x = re.sub('#', lambda i: i[0] substitutions.pop(0), some_text) # this doesn't actually work
The desired output would be:
some_text = 'xxxxxxxONEyyyyyyyyyTWOzzzzzzzzzTHREE'

You just have a syntax error in your lambda:
>>> substitutions = ['ONE', 'TWO', 'THREE']
>>> re.sub('#', lambda _: substitutions.pop(0), some_text)
'xxxxxxxONEyyyyyyyyyTWOzzzzzzzzzTHREE'
If you don't want to modify the list, you can wrap it in an iterator:
>>> substitutions = ['ONE', 'TWO', 'THREE']
>>> subs = iter(substitutions)
>>> re.sub('#', lambda _: next(subs), some_text)
'xxxxxxxONEyyyyyyyyyTWOzzzzzzzzzTHREE'
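Both versions above raise an exception (IndexError from pop, StopIteration from next) if the text contains more matches than there are substitutions. A minimal sketch of one way to handle that, assuming you would rather leave surplus matches untouched; the helper name sub_from_list is made up for illustration:

```python
import re

def sub_from_list(pattern, replacements, text):
    """Replace successive matches of pattern with successive items of replacements."""
    subs = iter(replacements)
    # next() with a default returns the matched text itself once the
    # replacements run out, so surplus matches stay unchanged.
    return re.sub(pattern, lambda m: next(subs, m.group(0)), text)

result = sub_from_list('#', ['ONE', 'TWO'], 'a#b#c#')
print(result)  # aONEbTWOc#
```

Because the iterator is created inside the function, the original list is never modified.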

One way (there's probably a better one, I don't really know Python) is to compile the regular expression, then use that sub instead:
import re
some_text = 'xxxxxxx#yyyyyyyyy#zzzzzzzzz#'
substitutions = ['ONE', 'TWO', 'THREE']
pattern = re.compile('#')
x = pattern.sub(lambda i: substitutions.pop(0), some_text)

The code is almost correct; it only needs a syntax fix in the lambda:
import re
some_text = 'xxxxxxx#yyyyyyyyy#zzzzzzzzz#'
substitutions = ['ONE', 'TWO', 'THREE']
x = re.sub('#', lambda i: substitutions.pop(0), some_text) # the error was in the lambda function

Related

List all elements, but only one of duplicated elements?

Say I have a list of strings such as
words = ['one', 'two', 'one', 'three', 'three']
I want to create a new list in alphabetical order like
newList = ['one', 'three', 'two']
Anyone have any solutions? I have seen suggestions that output duplicates, but I cannot figure out how to achieve this particular goal (or maybe I just can't figure out how to google well.)
Throw the contents into a set to remove duplicates and sort:
newList = sorted(set(words))
Or, equivalently, build the set with unpacking:
newList=sorted({*words})
If the original order of the elements in words matters to you, you can try this:
from collections import OrderedDict
words = ['one', 'two', 'one', 'three', 'three']
w1 = OrderedDict()
for i in words:
    if i in w1:
        w1[i] += 1
    else:
        w1[i] = 1
print(w1.keys())
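On Python 3.7 and later, plain dicts preserve insertion order, so the same first-seen ordering can be had without counting. A short sketch, assuming you only need the unique words and not how often each occurs:

```python
words = ['one', 'two', 'one', 'three', 'three']

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_in_order = list(dict.fromkeys(words))
print(unique_in_order)  # ['one', 'two', 'three']
```

Note that this gives first-seen order, not the alphabetical order the question asks for; use sorted(set(words)) for that.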

Converting String list to pure list in Python

I have a string type list from bash which looks like this:
inp = "["one","two","three","four","five"]"
The input is coming from bash script.
In my python script I would like to convert this to normal python list in this format:
["one","two","three","four","five"]
where all elements would be strings, but the whole thing is represented as a list.
I tried: list(inp)
it does not work. Any suggestions?
Try this code,
import ast
inp = '["one","two","three","four","five"]'
ast.literal_eval(inp) # returns ['one', 'two', 'three', 'four', 'five']
Have a look at ast.literal_eval:
>>> import ast
>>> inp = '["one","two","three","four","five"]'
>>> converted_inp = ast.literal_eval(inp)
>>> type(converted_inp)
<class 'list'>
>>> print(converted_inp)
['one', 'two', 'three', 'four', 'five']
Notice that your original input is not a valid Python string literal, since the string ends right after "[".
>>> inp = "["one","two","three","four","five"]"
SyntaxError: invalid syntax
The solution using re.sub() and str.split() functions:
import re
inp = '["one","two","three","four","five"]'
l = re.sub(r'["\]\[]', '', inp).split(',')
print(l)
The output:
['one', 'two', 'three', 'four', 'five']
You can use replace and split as follows:
>>> inp
"['one','two','three','four','five']"
>>> inp.replace('[','').replace(']','').replace('\'','').split(',')
['one', 'two', 'three', 'four', 'five']
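If the bash script emits the list with double-quoted elements, as in the question, the string is also valid JSON, so the standard json module is another option. A minimal sketch under that assumption (it would not parse the single-quoted variant used in the replace/split answer):

```python
import json

inp = '["one","two","three","four","five"]'

# json.loads parses the JSON array into a regular Python list of str.
items = json.loads(inp)
print(items)  # ['one', 'two', 'three', 'four', 'five']
```

Unlike the replace/split approaches, this also copes with elements that themselves contain commas or brackets.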

Expand Variables in a List Python

I need to know how to expand a variable inside a list element:
>>> one_more = "four"
>>> var_names = ["one", "two", "three_<expand variable one_more>"]
I should get something like
['one', 'two', 'three_four']
Very basic:
In [1]: a="four"
In [2]: b="five"
In [3]: ['one', 'two', 'three_%s' % a]
Out[3]: ['one', 'two', 'three_four']
You could also join a list of variables:
In [5]: ['one', 'two', 'three_%s' % '_'.join((a,b))]
Out[5]: ['one', 'two', 'three_four_five']
Here is the same solution with str.format:
In [6]: ['one', 'two', 'three_{}'.format('_'.join((a,b)))]
Out[6]: ['one', 'two', 'three_four_five']
I presume you want to do a string replacement within a list. Here is an example that fits your requirement:
one_more = "four"
var_names = ["one", "two", "three_<var>"]
print([x.replace("<var>", one_more) for x in var_names])
# prints ['one', 'two', 'three_four']
If you want to replace more than one pattern in one shot, you can do this:
a = "AA"
b = "BB"
var_names = ["one", "two", "three_$a", "four_$b"]
def batch_replace(text, lookup):
    # Apply each pattern -> replacement pair in turn.
    for pattern, replacement in lookup.items():
        text = text.replace(pattern, replacement)
    return text
print([batch_replace(x, {"$a": a, "$b": b}) for x in var_names])
# prints ['one', 'two', 'three_AA', 'four_BB']
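Since the placeholders above already use the $name form, the standard library's string.Template could do the same job. A sketch, assuming the placeholder names are valid Python identifiers; the values dict is introduced here purely for illustration:

```python
from string import Template

values = {"a": "AA", "b": "BB"}
var_names = ["one", "two", "three_$a", "four_$b"]

# safe_substitute leaves unknown $placeholders in place instead of raising KeyError.
expanded = [Template(name).safe_substitute(values) for name in var_names]
print(expanded)  # ['one', 'two', 'three_AA', 'four_BB']
```

Use substitute instead of safe_substitute if a missing value should be treated as an error.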

'LIKE' function for Lists

Is there an equivalent of the 'LIKE' operator (as in MySQL) for lists? For example:
This is my list:
abc = ['one', 'two', 'three', 'twenty one']
If I give the word "on", it should print the matching words from the list (in this case 'one' and 'twenty one'), and if I give "fo", it should print False.
You can use list comprehension:
[m for m in abc if 'on' in m]
This roughly translates to "for each element in abc, append element to a list if the element contains the substring 'on'"
>>> abc = ['one', 'two', 'three', 'twenty one']
>>> print([word for word in abc if 'on' in word])
['one', 'twenty one']
Would these list comprehensions suffice?
>>> abc = ['one', 'two', 'three', 'twenty one']
>>> [i for i in abc if 'on' in i]
['one', 'twenty one']
>>> [i for i in abc if 'fo' in i]
[]
You could wrap this in a function:
>>> def like(l, word):
...     words = [i for i in l if word in i]
...     if words:
...         print('\n'.join(words))
...     else:
...         print(False)
...
>>> like(abc, 'on')
one
twenty one
>>> like(abc, 'fo')
False
>>>
for x in abc:
    if "on" in x:
        print(x)
Or, as a function,
def like(needle, items):
    # Collect every item that contains needle as a substring.
    res = []
    for x in items:
        if needle in x:
            res.append(x)
    return res
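The question asks for False when nothing matches, which the list-returning versions do not give directly. A possible sketch that combines the ideas above; the ignore_case parameter is an addition of mine, loosely mimicking how LIKE is often case-insensitive in MySQL:

```python
def like(items, needle, ignore_case=False):
    """Return the items containing needle, or False if there are none."""
    if ignore_case:
        needle = needle.lower()
        matches = [s for s in items if needle in s.lower()]
    else:
        matches = [s for s in items if needle in s]
    # An empty list is falsy, so "or False" turns "no matches" into False.
    return matches or False

abc = ['one', 'two', 'three', 'twenty one']
print(like(abc, 'on'))  # ['one', 'twenty one']
print(like(abc, 'fo'))  # False
```

Returning the result rather than printing it keeps the function usable in other code; the caller decides what to print.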

Effective way to iteratively append to a string in Python?

I'm writing a Python function to split text into words, ignoring specified punctuation. Here is some working code. I'm not convinced that constructing strings out of lists (buf = [] in the code) is efficient though. Does anyone have a suggestion for a better way to do this?
def getwords(text, splitchars=' \t|!?.;:"'):
    """
    Generator to get words in text by splitting text along specified splitchars
    and stripping out the splitchars::
        >>> list(getwords('this is some text.'))
        ['this', 'is', 'some', 'text']
        >>> list(getwords('and/or'))
        ['and', 'or']
        >>> list(getwords('one||two'))
        ['one', 'two']
        >>> list(getwords(u'hola unicode!'))
        [u'hola', u'unicode']
    """
    splitchars = set(splitchars)
    buf = []
    for char in text:
        if char not in splitchars:
            buf.append(char)
        else:
            if buf:
                yield ''.join(buf)
                buf = []
    # All done. Yield last word.
    if buf:
        yield ''.join(buf)
http://www.skymind.com/~ocrow/python_string/ talks about several ways of concatenating strings in Python and assesses their performance as well.
You don't want to use re.split?
import re
re.split("[,; ]+", "coucou1 , coucou2;coucou3")
You can use re.split:
re.split(r'[\s|!?.;:"]', text)
However, if the text is very large, the resulting list may consume too much memory. In that case, consider re.finditer:
import re
def getwords(text, splitchars=' \t|!?.;:"'):
    # Match runs of characters that are NOT split characters.
    words_iter = re.finditer("[^%s]+" % re.escape(splitchars), text)
    for word in words_iter:
        yield word.group()
# a quick test
s = "a:b cc? def...a||"
words = [x for x in getwords(s)]
assert ["a", "b", "cc", "def", "a"] == words, words
You can split the input using re.split():
>>> splitchars=' \t|!?.;:"'
>>> re.split("[%s]" % splitchars, "one\ttwo|three?four")
['one', 'two', 'three', 'four']
>>>
EDIT: If your splitchars may contain special chars like ] or ^, you can use re.escape()
>>> re.escape(splitchars)
'\\ \\\t\\|\\!\\?\\.\\;\\:\\"'
>>> re.split("[%s]" % re.escape(splitchars), "one\ttwo|three?four")
['one', 'two', 'three', 'four']
>>>
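One thing to watch with a single-character class in re.split: consecutive or trailing delimiters produce empty strings in the result, which the original getwords never yields. A sketch of one way to match that behaviour, assuming the same default splitchars; the name split_words is made up here:

```python
import re

SPLITCHARS = ' \t|!?.;:"'

# Compile once; "+" collapses runs of delimiters into a single split point.
splitter = re.compile('[%s]+' % re.escape(SPLITCHARS))

def split_words(text):
    """Split text on runs of SPLITCHARS, dropping empty leading/trailing pieces."""
    return [word for word in splitter.split(text) if word]

print(split_words('one||two.'))           # ['one', 'two']
print(split_words('this is some text.'))  # ['this', 'is', 'some', 'text']
```

The filter is still needed because a delimiter at the very start or end of the text leaves an empty string even with "+".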
