Multi-condition regex in Python

I need to search a string for a list of several different matches, let's say I have this list:
['this', 'is', 'a', 'regex', 'test']
I want to see if any of those items is within a string, either using regex or any other method in Python.
I tried first just doing string in list, but that proved to be insufficient, so I tried concatenating the conditions in a regex like:
(this|is)(a|regex)(test)
But that tries to match the items one after another, as if they were concatenated.

You can use the built-in function any():
In [1]: strs="I am a string"
In [2]: lis=['this', 'is', 'a', 'regex', 'test']
In [3]: any(x in strs for x in lis)
Out[3]: True
This will return True for something like "thisisafoobar" as well.
But if you want to match whole words only, use re.search() with \b word boundaries, or str.split():
In [4]: import re
In [5]: any(re.search(r"\b{0}\b".format(x),strs) for x in lis)
Out[5]: True
In [6]: strs="foo bar"
In [7]: any(re.search(r"\b{0}\b".format(x),strs) for x in lis)
Out[7]: False
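The question's own attempt fails because (this|is)(a|regex)(test) requires the groups to match one after another. A single alternation group avoids that. The sketch below (the helper name contains_any_word is hypothetical) joins the words with |, escapes each one with re.escape() in case it contains regex metacharacters, and wraps the group in \b word boundaries so only whole words match:

```python
import re

def contains_any_word(text, words):
    # One non-capturing alternation group, e.g. \b(?:this|is|a|regex|test)\b
    # re.escape() keeps characters such as '.' or '+' inside a word from acting as regex syntax.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.search(text) is not None

words = ['this', 'is', 'a', 'regex', 'test']
print(contains_any_word("I am a string", words))   # True
print(contains_any_word("foo bar", words))         # False
print(contains_any_word("thisisafoobar", words))   # False
```

The string is searched with one regex instead of one re.search() call per word.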
Using str.split():
In [12]: strs="I am a string"
In [13]: spl=strs.split() #use set(strs.split()) if the list returned is huge
In [14]: any(x in spl for x in lis)
Out[14]: True
In [15]: strs="Iamastring"
In [16]: spl=strs.split()
In [17]: any(x in spl for x in lis)
Out[17]: False
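If all you need is a yes/no answer from the split-based approach, the set methods do the membership test directly. The helper names below are hypothetical; isdisjoint() stops as soon as it finds a shared element, and the intersection also tells you which words matched. Like str.split() above, this only matches whitespace-separated tokens, so a word with punctuation attached (e.g. "test,") will not count:

```python
words = ['this', 'is', 'a', 'regex', 'test']

def has_any_word(text, words):
    # str.split() with no arguments splits on runs of whitespace;
    # isdisjoint() is False as soon as the two collections share an element.
    return not set(words).isdisjoint(text.split())

def matching_words(text, words):
    # The set intersection reports which of the words were found.
    return set(words) & set(text.split())

print(has_any_word("I am a string", words))   # True
print(has_any_word("Iamastring", words))      # False
print(matching_words("this is a test string", words))
```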

>>> l = ['this', 'is', 'a', 'regex', 'test']
>>> s = 'this is a test string'
>>> def check(elements, string):
...     for element in elements:
...         if element in string:
...             return True
...     return False
...
>>> check(l, s)
True
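In a quick benchmark, this function appeared to be faster than any(). Note that both versions return as soon as the first element ('this') is found in s, so the numbers below mostly reflect call and generator setup overhead, and single time.time() measurements this small are close to the timer's resolution: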
import time
def main():
    # Making a huge list
    l = ['this', 'is', 'a', 'regex', 'test'] * 10000
    s = 'this is a test string'
    def check(elements, string):
        for element in elements:
            if element in string:
                return True
        return False
    def test_a(elements, string):
        """Testing check()"""
        start = time.time()
        check(elements, string)
        end = time.time()
        return end - start
    def test_b(elements, string):
        """Testing any()"""
        start = time.time()
        any(element in string for element in elements)
        end = time.time()
        return end - start
    # print() with a single argument works in both Python 2 and Python 3
    print('Using check(): %s' % test_a(l, s))
    print('Using any(): %s' % test_b(l, s))
if __name__ == '__main__':
    main()
Results:
pearl:~ pato$ python test.py
Using check(): 3.09944152832e-06
Using any(): 5.96046447754e-06
pearl:~ pato$ python test.py
Using check(): 1.90734863281e-06
Using any(): 7.15255737305e-06
pearl:~ pato$ python test.py
Using check(): 2.86102294922e-06
Using any(): 6.91413879395e-06
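But if you combine any() with map(), as in any(map(lambda element: element in string, elements)), any() becomes much slower. These runs were on Python 2, where map() builds the complete list of 50,000 results before any() sees the first one, so short-circuiting no longer helps (in Python 3, map() is lazy). These are the results: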
pearl:~ pato$ python test.py
Using check(): 3.09944152832e-06
Using any(): 0.00903916358948
pearl:~ pato$ python test.py
Using check(): 2.86102294922e-06
Using any(): 0.00799989700317
pearl:~ pato$ python test.py
Using check(): 3.09944152832e-06
Using any(): 0.00829982757568
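For more reliable numbers than a single time.time() difference, the standard library's timeit module runs each call many times. The sketch below defines the two variants as top-level functions (check_any is just the any() version given a name) and adds a hypothetical no-match list to time the worst case; the repetition counts are arbitrary, and no results are claimed here since they depend on the machine and Python version:

```python
import timeit

def check(elements, string):
    # Explicit loop: returns at the first element that occurs in the string.
    for element in elements:
        if element in string:
            return True
    return False

def check_any(elements, string):
    # Same logic with any() and a generator expression, which also short-circuits.
    return any(element in string for element in elements)

l = ['this', 'is', 'a', 'regex', 'test'] * 10000
s = 'this is a test string'
no_match = ['xyz'] * 50000  # worst case: every element must be checked

for name, func in [('check()', check), ('any()', check_any)]:
    # The best case ('this' matches immediately) is repeated many times to rise above timer resolution.
    best = timeit.timeit(lambda: func(l, s), number=100000)
    worst = timeit.timeit(lambda: func(no_match, s), number=10)
    print('%-8s best case: %.4fs  worst case: %.4fs' % (name, best, worst))
```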

You could do:
if any(test in your_string for test in tests):
...

Related

Searching for obstacles in list of lists?

I'm writing a function that, given a list of lists, determines whether a specific first character is followed only by a specific second character (once or repeated) or by some other string. If another string is present, the function should return False; if not, it should return True.
Basically, if any character other than . comes after B, in its current list or any following list, it should return False; if only . follows, it should return True.
For example, if the first character was B and the second character was . and the list of lists was [['.','.','B','.'],['.','.','.','.']] then it should return True but if the list of lists was [['a','c','B','r'],['.','s','g','h']] it should return False since a series of random strings follows B.
Any tips or help would be appreciated. This is the code I have so far:
def free_of_obstacles(lst):
    A = 'B'
    B = '.'
    for i, v in enumerate(lst):
        if A in v:
            continue
        if B in v:
            continue
        return True
    else:
        return False
You could join the characters of each inner list, join those strings into a single string, and then apply a regex that checks whether B is followed only by dots up to the end:
>>> lst=[['.','.','B','.'],['.','.','.','.']]
>>> import re
>>> bool(re.search(r'B(\.+)$', ''.join(''.join(i) for i in lst)))
True
>>> lst=[['a','c','B','r'],['.','s','g','h']]
>>> bool(re.search(r'B(\.+)$', ''.join(''.join(i) for i in lst)))
False
>>>
EDIT 1 ----> Above solution as a function returning True or False:
>>> import re
>>> def free_of_obstacles(lst):
... return bool(re.search(r'B(\.+)$', ''.join(''.join(i) for i in lst)))
...
>>> lst=[['a','c','B','r'],['.','s','g','h']]
>>> free_of_obstacles(lst)
False
>>> lst=[['.','.','B','.'],['.','.','.','.']]
>>> free_of_obstacles(lst)
True
Without using any imported modules:
Sample run 1
>>> lst=[['.','.','B','.'],['.','.','.','.']]
>>> newlst=[j for i in lst for j in i]
>>> newlst=newlst[newlst.index('B')+1:]
>>> newlst
['.', '.', '.', '.', '.']
>>> list(map(lambda x:x=='.', newlst))
[True, True, True, True, True]
>>> all(list(map(lambda x:x=='.', newlst)))
True
Sample run 2
>>> lst=[['a','c','B','r'],['.','s','g','h']]
>>> newlst=[j for i in lst for j in i]
>>> newlst=newlst[newlst.index('B')+1:]
>>> newlst
['r', '.', 's', 'g', 'h']
>>> list(map(lambda x:x=='.', newlst))
[False, True, False, False, False]
>>> all(list(map(lambda x:x=='.', newlst)))
False
EDIT 2 ----> Above solution as a function returning True or False:
>>> def free_of_obstacles(lst):
... newlst=[j for i in lst for j in i]
... newlst=newlst[newlst.index('B')+1:]
... return all(list(map(lambda x:x=='.', newlst)))
...
>>> lst=[['.','.','B','.'],['.','.','.','.']]
>>> free_of_obstacles(lst)
True
>>> lst=[['a','c','B','r'],['.','s','g','h']]
>>> free_of_obstacles(lst)
False
>>>
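Both versions above have edge cases: newlst.index('B') raises ValueError when there is no B, and the regex requires at least one dot after B. The sketch below is an alternative (not one of the answers above) that walks the cells lazily with itertools.chain.from_iterable, so no flattened copy is built. The start/clear parameters and the choice to return False when B is missing are assumptions:

```python
from itertools import chain

def free_of_obstacles(grid, start='B', clear='.'):
    # Iterate over the cells row by row without building a flattened list.
    cells = chain.from_iterable(grid)
    # Advance the shared iterator just past the first occurrence of `start`.
    for cell in cells:
        if cell == start:
            break
    else:
        # `start` never appeared; returning False here is an assumption.
        return False
    # Every remaining cell must be `clear`; all() stops at the first obstacle.
    return all(cell == clear for cell in cells)

print(free_of_obstacles([['.', '.', 'B', '.'], ['.', '.', '.', '.']]))  # True
print(free_of_obstacles([['a', 'c', 'B', 'r'], ['.', 's', 'g', 'h']]))  # False
```

Unlike the regex version, this returns True when B is the very last cell, because all() of an empty sequence is True; adjust that if the case should count as False.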

use .format() in a string in two steps

I have a string in which I want to replace some variables, but in different steps, something like:
my_string = 'text_with_{var_1}_to_variables_{var_2}'
my_string.format(var_1='10')
### make process 1
my_string.format(var_2='22')
But when I try to replace the first variable, I get an error:
KeyError: 'var_2'
How can I accomplish this?
Edit:
I want to create a new list:
name = 'Luis'
ids = ['12344','553454','dadada']
def create_list(name,ids):
    my_string = 'text_with_{var_1}_to_variables_{var_2}'.replace('{var_1}',name)
    return [my_string.replace('{var_2}',_id) for _id in ids ]
This is the desired output:
['text_with_Luis_to_variables_12344',
'text_with_Luis_to_variables_553454',
'text_with_Luis_to_variables_dadada']
But using .format instead of .replace.
In short, str.format() cannot fill in only some of the placeholders: every {field} in the string must be given a value, or it raises KeyError. I'm not sure why you want to fill in only part of the string, but here are a few workarounds:
Approach 1: Escape the placeholder that you want to fill in the second step by doubling its braces, i.e. write {{var_2}} instead of {var_2}. The first .format() call turns {{var_2}} into {var_2}, which the second call then fills in.
>>> my_string = 'text_with_{var_1}_to_variables_{{var_2}}'
>>> my_string = my_string.format(var_1='VAR_1')
>>> my_string
'text_with_VAR_1_to_variables_{var_2}'
>>> my_string = my_string.format(var_2='VAR_2')
>>> my_string
'text_with_VAR_1_to_variables_VAR_2'
Approach 2: Fill one placeholder with .format() and the other with %-style formatting, which .format() leaves untouched.
>>> my_string = 'text_with_{var_1}_to_variables_%(var_2)s'
# Replace first variable
>>> my_string = my_string.format(var_1='VAR_1')
>>> my_string
'text_with_VAR_1_to_variables_%(var_2)s'
# Replace second variable
>>> my_string = my_string % {'var_2': 'VAR_2'}
>>> my_string
'text_with_VAR_1_to_variables_VAR_2'
Approach 3: Collect the values in a dict as they become available, then unpack it into a single .format() call once all of them are known.
>>> my_string = 'text_with_{var_1}_to_variables_{var_2}'
>>> my_args = {}
# Assign value of `var_1`
>>> my_args['var_1'] = 'VAR_1'
# Assign value of `var_2`
>>> my_args['var_2'] = 'VAR_2'
>>> my_string.format(**my_args)
'text_with_VAR_1_to_variables_VAR_2'
Use the one which satisfies your requirement. :)
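Another workaround, assuming Python 3.2+ for str.format_map(): pass a dict subclass whose __missing__ method puts unknown placeholders back into the string, so each step fills only the fields it knows. The class name PartialDict is hypothetical. This only round-trips plain {name} fields; a field with a format spec or conversion (such as {var_2:>5} or {var_2!r}) would not survive intact:

```python
class PartialDict(dict):
    # format_map() calls this for keys that are absent: re-emit the placeholder unchanged.
    def __missing__(self, key):
        return '{' + key + '}'

template = 'text_with_{var_1}_to_variables_{var_2}'

# Step 1: fill var_1 only; {var_2} is left in place instead of raising KeyError.
step1 = template.format_map(PartialDict(var_1='Luis'))
print(step1)  # text_with_Luis_to_variables_{var_2}

def create_list(name, ids):
    # Fill var_1 once, then fill var_2 for every id with an ordinary format() call.
    partial = template.format_map(PartialDict(var_1=name))
    return [partial.format(var_2=_id) for _id in ids]

print(create_list('Luis', ['12344', '553454', 'dadada']))
```

If name itself may contain braces, the second format() call would misinterpret them; filling all values in one format() call, as in Approach 3, avoids that.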
Do you have to use format? If not, you could just use str.replace(), like:
my_string = 'text_with_#var_1#_to_variables_#var2#'
my_string = my_string.replace("#var_1#", '10')
###
my_string = my_string.replace("#var2#", '22')
The following also works with positional fields: {{}} survives the first .format() call as {}, which the second call fills in.
s = 'a {} {{}}'.format('b')
print(s) # prints a b {}
print(s.format('c')) # prints a b c

Does a True value exist in a list of dictionaries?

I create a list of dictionaries like this:
list = []
for i in xrange(4):
    list.append({})
    list[i]['a'] = False
Now after a while, I want to (using a single line of code) see if any of the 'a' values are True.
I have tried:
anyTrue = True in list # always returns false
anyTrue = True in list[:]['a']
Is there such a convenient way of doing this?
Thanks!
Using any with generator expression:
>>> lst = []
>>>
>>> for i in xrange(4):
...     lst.append({})
...     lst[i]['a'] = False
...
>>> any(d['a'] for d in lst)
False
>>> lst[1]['a'] = True
>>> any(d['a'] for d in lst)
True
BTW, don't use list as a variable name: it shadows the built-in list type.
You can use any and a generator expression:
if any(x['a'] for x in list):
    pass  # Do stuff
See a demonstration below:
>>> lst = []
>>> for i in xrange(4):
...     lst.append({})
...     lst[i]['a'] = False
...
>>> any(x['a'] for x in lst)
False
>>> lst[2]['a'] = True # Set an 'a' value to True
>>> any(x['a'] for x in lst)
True
>>>
Also, you should refrain from naming a variable list, because doing so shadows the built-in list type.
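Two small variations, for cases the question does not cover: if some dictionaries might lack the 'a' key, d.get('a', False) avoids a KeyError; and any(d['a'] ...) tests truthiness, so if values other than booleans can appear and you want literally True, compare with is True. The sample data below is hypothetical:

```python
# Hypothetical data: the third dict has no 'a' key, the fourth holds a truthy non-bool.
dicts = [{'a': False}, {'a': False}, {}, {'a': 1}]

# dict.get() returns the default instead of raising KeyError for the missing key.
print(any(d.get('a', False) for d in dicts))   # True (1 is truthy)

# Count only values that are the bool True itself.
print(any(d.get('a') is True for d in dicts))  # False
```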

Checking two strings in Python?

Let there be two strings:
s='chayote'
d='aceihkjouty'
All the characters in string s are present in d. Is there a built-in Python function to check this?
Thanks in advance.
Using sets:
>>> set("chayote").issubset("aceihkjouty")
True
Or, equivalently:
>>> set("chayote") <= set("aceihkjouty")
True
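If you also want to know which characters are missing, the set difference gives them directly (an empty set means s is fully covered). The second pair of strings is a made-up example:

```python
s = 'chayote'
d = 'aceihkjouty'

# Characters of s that never occur in d; empty when every character is covered.
missing = set(s) - set(d)
print(missing)  # set()

# sorted() gives a stable order for display, since sets are unordered.
print(sorted(set('python') - set('typo')))  # ['h', 'n']
```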
I believe you are looking for all and a generator expression:
>>> s='chayote'
>>> d='aceihkjouty'
>>> all(x in d for x in s)
True
>>>
The code will return True if all characters in string s can be found in string d.
Also, if string s contains duplicate characters, it is more efficient to convert it to a set first, so that each distinct character is checked only once:
>>> s='chayote'
>>> d='aceihkjouty'
>>> all(x in d for x in set(s))
True
>>>
Try this (it prints each character of s that also occurs in d):
for i in s:
    if i in d:
        print(i)

How to define a function with variable arguments in Python - there is a 'but'

I want to define a function that takes a variable number of strings, replaces / with - in each one, and then returns them. (Here is my logic problem: return what?)
def replace_all_slash(**many):
    for i in many:
        i = i.replace('/','-')
    return many
Is it correct? How can I get the strings back as separate strings?
Example call:
allwords = replace_all_slash(word1,word2,word3)
But I need allwords to be separate strings, as they were before calling the function. How can I do this?
I hope this is clear.
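You want *args (one star), not **args (two stars): *args collects extra positional arguments into a tuple, whereas ** collects keyword arguments into a dict, so calling your version as replace_all_slash(word1, word2, word3) raises a TypeError: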
>>> def replace_all_slash(*words):
...     return [word.replace("/", "-") for word in words]
...
>>> word1 = "foo/"
>>> word2 = "bar"
>>> word3 = "ba/zz"
>>> replace_all_slash(word1, word2, word3)
['foo-', 'bar', 'ba-zz']
Then, to re-assign them into the same variables, use the assignment unpacking syntax:
>>> word1
'foo/'
>>> word2
'bar'
>>> word3
'ba/zz'
>>> word1, word2, word3 = replace_all_slash(word1, word2, word3)
>>> word1
'foo-'
>>> word2
'bar'
>>> word3
'ba-zz'
Solution one: create a new list and append to it:
def replace_all_slash(*many):
    result = []
    for i in many:
        result.append(i.replace('/','-'))
    return result
Solution two using a list comprehension:
def replace_all_slash(*many):
    return [i.replace('/','-') for i in many]
You should rewrite your function:
def replace_all_slash(*args):
    return [s.replace('/','-') for s in args]
and you can call it this way:
w1,w2,w3 = replace_all_slash("AA/","BB/", "CC/")
Unpacking the result in the calling code requires one variable per string:
word1,word2,word3 = replace_all_slash(word1,word2,word3)
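If the strings are already in a list, the * operator in the call spreads them into separate arguments, matching the *args parameter. The sketch below uses the list-comprehension version from above; the list name paths is hypothetical. Unpacking into names requires exactly as many names as there are results, otherwise Python raises ValueError:

```python
def replace_all_slash(*words):
    # *words gathers any number of positional arguments into a tuple.
    return [word.replace('/', '-') for word in words]

paths = ['foo/', 'bar', 'ba/zz']

# The * in the call passes each list element as its own positional argument.
print(replace_all_slash(*paths))  # ['foo-', 'bar', 'ba-zz']

first, second, third = replace_all_slash(*paths)
print(first, second, third)  # foo- bar ba-zz
```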
