If any strings in a list match regex [duplicate] - python

I need to check if any of the strings in a list match a regex. If any do, I want to continue. The way I've always done it in the past is using a list comprehension with something like:
r = re.compile('.*search.*')
if [line for line in output if r.match(line)]:
    do_stuff()
Which I now realize is pretty inefficient. If the very first item in the list matches, we can skip all the rest of the comparisons and move on. I could improve this with:
r = re.compile('.*search.*')
for line in output:
    if r.match(line):
        do_stuff()
        break
But I'm wondering if there's a more pythonic way to do this.

You can use the builtin any():
r = re.compile('.*search.*')
if any(r.match(line) for line in output):
    do_stuff()
Passing in the lazy generator to any() will allow it to exit on the first match without having to check any farther into the iterable.
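A minimal sketch of that short-circuit behaviour, assuming a hypothetical sample list `output` and a hypothetical helper `recording_match` that exists only to record which lines were actually examined:

```python
import re

r = re.compile('.*search.*')
output = ['hello', 'searching', 'world', 'research']  # hypothetical sample data

examined = []  # records every line that any() actually looked at

def recording_match(line):
    """Hypothetical wrapper around r.match that logs the line it was given."""
    examined.append(line)
    return r.match(line)

# The generator is consumed lazily, so any() stops at the first truthy result.
found = any(recording_match(line) for line in output)
print(found, examined)
```

Only the lines up to and including the first match end up in `examined`. For single-line strings, `re.compile('search')` with `r.search(line)` should select the same lines without the leading and trailing `.*`.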

Starting with Python 3.8 and its introduction of assignment expressions (PEP 572, the := operator), we can also capture a witness of an any expression when a match is found and use it directly:
# pattern = re.compile('.*search.*')
# items = ['hello', 'searched', 'world', 'still', 'searching']
if any((match := pattern.match(x)) for x in items):
    print(match.group(0))
# 'searched'
For each item, this:
Applies the regex match (pattern.match(x))
Assigns the result to a match variable (either None or a re.Match object)
Applies the truth value of match as part of the any expression (None -> False, Match -> True)
If match is None, then the any search loop continues
If match is a re.Match object, then the any expression stops and evaluates to True, and the match variable can be used within the condition's body
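The steps above can be wrapped in a small function. This is a sketch only: `first_match` is a hypothetical name, and pre-binding `match` to None is an added precaution so the name exists even when the list is empty and the generator body never runs.

```python
import re

pattern = re.compile('.*search.*')

def first_match(items):
    """Return the text of the first matching item, or None if nothing matches."""
    match = None  # pre-bind: with an empty list the walrus never executes
    # The := target is bound in the enclosing function scope, not inside the generator.
    if any((match := pattern.match(x)) for x in items):
        return match.group(0)
    return None

print(first_match(['hello', 'searched', 'world', 'still', 'searching']))
print(first_match([]))
```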

Given that I am not allowed to comment yet, I wanted to provide a small correction to MrAlexBailey's answer, and also answer nat5142's question. Correct form would be:
r = re.compile('.*search.*')
if any(r.match(line) for line in output):
    do_stuff()
If you desire to find the matched string, you would do:
lines_to_log = [line for line in output if r.match(line)]
In addition, if you want to find all lines that match any compiled regular expression in a list of compiled regular expressions r=[r1,r2,...,rn], you can use:
lines_to_log = [line for line in output if any(reg_ex.match(line) for reg_ex in r)]
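As a runnable illustration of the multi-pattern version, here is a short sketch. The two patterns and the sample lines are made up for the example.

```python
import re

# Hypothetical patterns and log lines, chosen only for illustration.
patterns = [re.compile(r'.*error.*'), re.compile(r'.*warn.*')]
output = ['all good', 'disk error', 'warning: low memory', 'done']

# Keep a line if at least one compiled pattern matches it.
lines_to_log = [line for line in output
                if any(p.match(line) for p in patterns)]
print(lines_to_log)
```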

In reply to a question asked by @nat5142 under the answer given by @MrAlexBailey:
"Any way to access the matched string using this method? I'd like to print it for logging purposes", assuming "this" refers to:
if any(r.match(line) for line in output):
    do_stuff()
You can loop over a list comprehension of the matching lines:
# r = re.compile('.*search.*')
for match in [line for line in output if r.match(line)]:
    do_stuff(match) # <- using the matched line here
Another approach is applying a function to each matching line with map (in Python 3, map is lazy, so wrap it in list() to force the calls):
# r = re.compile('.*search.*')
# log = lambda x: print(x)
list(map(log, [line for line in output if r.match(line)]))
Although this does not involve the "any" function and might not even be close to what you desire...
I thought this answer was not very relevant so here's my second attempt...
I suppose you could do this:
# def log_match(match):
#     if match: print(match)
#     return match
if any(log_match(r.match(line)) for line in output):
    do_stuff()
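If only the first matching line is needed for logging, one possible alternative (not part of the answers above) is the builtin next() with a default. The sample `output` list is hypothetical.

```python
import re

r = re.compile('.*search.*')
output = ['hello', 'searched', 'world', 'searching']  # hypothetical sample data

# next() pulls the first line that matches and stops there; None is the fallback.
first = next((line for line in output if r.match(line)), None)
if first is not None:
    print(first)  # log the matched line, then carry on with do_stuff()
```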

Related

Find the word from the list given and replace the words so found

My question is pretty simple, but I haven't been able to find a proper solution.
Given below is my program:
given_list = ["Terms","I","want","to","remove","from","input_string"]
input_string = input("Enter String:")
if any(x in input_string for x in given_list):
    #Find the detected word
    #Not in bool format
    a = input_string.replace(detected_word,"")
    print("Some Task",a)
Here, given_list contains the terms I want to exclude from the input_string.
Now, the problem I am facing is that any() produces a bool result, and I need the word detected by any() so I can replace it with a blank and perform some task.
Edit: any() function is not required at all, look for useful solutions below.
Iterate over given_list and replace them:
for i in given_list:
    input_string = input_string.replace(i, "")
print("Some Task", input_string)
No need to detect at all:
for w in given_list:
    input_string = input_string.replace(w, "")
str.replace will not do anything if the word is not there and the substring test needed for the detection has to scan the string anyway.
The problem with finding each word and replacing it is that Python will have to iterate over the whole string repeatedly. Another problem is that you will find substrings where you don't want to. For example, "to" is in the exclude list, so you'd end up changing "tomato" to "ma".
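The substring pitfall is easy to reproduce. In this sketch the exclude list is shortened and the input sentence is made up:

```python
given_list = ["to", "from"]               # hypothetical, shortened exclude list
input_string = "tomato from the garden"   # hypothetical input

for w in given_list:
    # str.replace works on raw substrings, so "to" is also removed inside "tomato".
    input_string = input_string.replace(w, "")

print(repr(input_string))  # prints 'ma  the garden'
```

Note the doubled space as well: removing a word this way leaves its surrounding whitespace behind.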
It seems to me that you want to replace whole words. Parsing is a whole new subject, but let's simplify. I'm just going to assume everything is lowercase with no punctuation, although that can be improved later. Let's use input_string.split() to iterate over whole words.
We want to replace some words with nothing, so let's just iterate over the input_string, and filter out the words we don't want, using the builtin function of the same name.
exclude_list = ["terms","i","want","to","remove","from","input_string"]
input_string = "one terms two i three want to remove"
keepers = filter(lambda w: w not in exclude_list, input_string.lower().split())
output_string = ' '.join(keepers)
print(output_string)
one two three
Note that we create an iterator that allows us to go through the whole input string just once. And instead of replacing words, we just basically skip the ones we don't want by having the iterator not return them.
Since filter requires a function for the boolean check on whether to include or exclude each word, we had to define one. I used "lambda" syntax to do that. You could just replace it with
def keep(word):
    return word not in exclude_list
keepers = filter(keep, input_string.split())
To answer your question about any, use an assignment expression (Python 3.8+).
if any((word := x) in input_string for x in given_list):
    print(word)  # the first matching term is captured in the variable word
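Put together with the question's own code, the assignment-expression approach might look like the sketch below. The fixed `input_string` stands in for the input() call, and pre-binding `word` is an added precaution, not something the answer requires.

```python
given_list = ["Terms", "I", "want", "to", "remove", "from", "input_string"]
input_string = "Please remove this"  # hypothetical input instead of input()

word = None  # pre-bind so the name exists even if given_list were empty
a = input_string
if any((word := x) in input_string for x in given_list):
    # word now holds the first term from given_list found in input_string
    a = input_string.replace(word, "")
    print("Some Task", a)
```

This captures only the first detected term; removing every listed term still needs a loop over given_list.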

How to make python check EACH value

I am working on this function and I want to return a list of the elements of L that end with the specified token, in the order they appear in the original list.
def has_last_token(s,word):
    """ (list of str, str) -> list of str
    Return a list of the elements of L that end with the specified token in the order they appear in the original list.
    >>> has_last_token(['one,fat,black,cat', 'one,tiny,red,fish', 'two,thin,blue,fish'], 'fish')
    ['one,tiny,red,fish', 'two,thin,blue,fish']
    """
    for ch in s:
        ch = ch.replace(',' , ' ')
        if word in ch:
            return ch
So I know that when I run the code and test out the example I provided, it checks through
'one,fat,black,cat'
and sees that the word is not in it and then continues to check the next value which is
'one,tiny,red,fish'
Here it recognizes the word fish and outputs it. But the code doesn't check the last input, which is also valid. How can I make it check all values rather than just checking until it sees one valid output?
expected output
>>> has_last_token(['one,fat,black,cat', 'one,tiny,red,fish', 'two,thin,blue,fish'], 'fish')
>>> ['one,tiny,red,fish', 'two,thin,blue,fish']
I'll try to answer your question while altering your code and logic as little as I can, in case you understand the answer better this way.
If you return ch, you'll immediately terminate the function.
One way to accomplish what you want is to simply declare a list before your loop and then append the items you want to that list accordingly. The return value would be that list, like this:
def has_last_token(s, word):
    result = []
    for ch in s:
        if ch.endswith(word): # this will check only the string's tail
            result.append(ch)
    return result
PS: That ch.replace() is unnecessary according to the function's docstring
You are returning the first match and this exits the function. You want to either yield from the loop (creating a generator) or build a list and return that. I would just use endswith in a list comprehension. I'd also rename things to make it clear what's what.
def has_last_token(words_list, token):
    return [words for words in words_list if words.endswith(token)]
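The other option mentioned above, yielding from the loop, could look like this sketch. The name `iter_last_token` is hypothetical, chosen to signal that it returns a generator rather than a list.

```python
def iter_last_token(words_list, token):
    """Yield each string in words_list that ends with token, in original order."""
    for words in words_list:
        if words.endswith(token):
            yield words  # hand back one result and keep looping, unlike return

data = ['one,fat,black,cat', 'one,tiny,red,fish', 'two,thin,blue,fish']
print(list(iter_last_token(data, 'fish')))
```

Calling list() on the generator gives the same result as the list comprehension; the generator form mainly helps when the caller wants to process results one at a time.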
Another way is to use rsplit to split the last token from the rest of the string. If you pass the second argument as 1 (could use named argument maxsplit in py3 but py2 doesn't like it) it stops after one split, which is all we need here.
You can then use filter rather than an explicit loop to check each string has word as its final token and return a list of only those strings which do have word as their final token.
def has_last_token(L, word):
    return filter(lambda s: s.rsplit(',', 1)[-1] == word, L)
result = has_last_token(['one,fat,black,cat',
                         'one,tiny,red,fish',
                         'two,thin,blue,fish',
                         'two,thin,bluefish',
                         'nocommas'], 'fish')
for res in result:
    print(res)
Output:
one,tiny,red,fish
two,thin,blue,fish
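The output above leaves out 'two,thin,bluefish', which is where the rsplit approach differs from endswith. A short side-by-side sketch, using a trimmed copy of the same test data:

```python
data = ['one,tiny,red,fish', 'two,thin,bluefish', 'nocommas']

# endswith only looks at trailing characters, so 'bluefish' counts as a hit.
by_endswith = [s for s in data if s.endswith('fish')]

# rsplit(',', 1) isolates the final comma-separated token before comparing.
by_token = [s for s in data if s.rsplit(',', 1)[-1] == 'fish']

print(by_endswith)
print(by_token)
```

Which behaviour is right depends on whether "token" means a whole comma-separated field or just a suffix.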

Test multiple substrings against a string

If I have a list of strings:
matches = [ 'string1', 'anotherstring', 'astringystring' ]
And I have another string that I want to test:
teststring = 'thestring1'
And I want to test each string, and if any match, do something. I have:
match = 0
for matchstring in matches:
    if matchstring in teststring:
        match = 1
if not match:
    continue
This is in a loop, so we just go around again if we don't get a match (I can reverse this logic of course and do something if it matches), but the code looks clumsy and not pythonic, if easy to follow.
I am thinking there is a better way to do this, but I don't grok python as well as I would like. Is there a better approach?
Note the "duplicate" is the opposite question (though the answer approach is the same).
You could use any here
Code:
if any(matchstring in teststring for matchstring in matches):
    print("Matched")
Notes:
any exits as soon as it sees a match.
The loop part, for matchstring in matches, iterates over each string in matches.
The test, matchstring in teststring, checks whether the current string is a substring of teststring.
any stops as soon as the expression yields True (a match).
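To show how this slots into the surrounding loop from the question, here is a sketch. The outer list `teststrings` is an assumption, since the question does not show what the loop iterates over.

```python
matches = ['string1', 'anotherstring', 'astringystring']
# Hypothetical outer loop data; the question only shows a single teststring.
teststrings = ['thestring1', 'nothing here', 'xx anotherstring xx']

matched = []
for teststring in teststrings:
    if not any(m in teststring for m in matches):
        continue  # no substring found, go around again
    matched.append(teststring)  # stand-in for "do something"

print(matched)
```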
If you want to know what the first match was you can use next:
match = next((match for match in matches if match in teststring), None)
You have to pass None as the second parameter if you don't want it to raise an exception when nothing matches. It will use the value as the default, so match will be None if nothing is found.
How about you try this:
len([ x for x in b if ((a in x) or (x in a)) ]) > 0
I've updated the answer to check the substring both ways. You can pick and choose or modify as you see fit but I think the basics should be pretty clear.

Get the actual ending when testing with .endswith(tuple)

I found a nice question where one can search for multiple endings of a string using: endswith(tuple)
Check if string ends with one of the strings from a list
My question is: how can I return which value from the tuple is actually found to be the match? And if I have multiple matches, how can I choose the best match?
for example:
str= "ERTYHGFYUUHGFREDFYAAAAAAAAAA"
endings = ('AAAAA', 'AAAAAA', 'AAAAAAA', 'AAAAAAAA', 'AAAAAAAAA')
str.endswith(endings) ## this will return true for all of values inside the tuple, but how can I get which one matches the best
In this case, multiple matches can be found from the tuple, how can I deal with this and return only the best (biggest) match, which in this case should be: 'AAAAAAAAA' which I want to remove at the end (which can be done with a regular expression or so).
I mean one could do this in a for loop, but maybe there is an easier pythonic way?
>>> s = "ERTYHGFYUUHGFREDFYAAAAAAAAAA"
>>> endings = ['AAAAA', 'AAAAAA', 'AAAAAAA', 'AAAAAAAA', 'AAAAAAAAA']
>>> max([i for i in endings if s.endswith(i)],key=len)
'AAAAAAAAA'
import re
str= "ERTYHGFYUUHGFREDFYAAAAAAAAAA"
endings = ['AAAAA', 'AAAAAA', 'AAAAAAA', 'AAAAAAAA', 'AAAAAAAAA']
print(max([i for i in endings if re.findall(i+r"$",str)],key=len))
How about:
len(str) - len(str.rstrip('A'))
str.endswith(tuple) is (currently) implemented as a simple loop over the tuple, repeatedly re-running the match; any similarities between the endings are not taken into account.
In the example case, a regular expression should compile into an automaton that essentially runs in linear time:
regexp = '(' + '|'.join(
    re.escape(ending) for ending in sorted(endings, key=len, reverse=True)
) + ')$'
Edit 1: As pointed out correctly by Martijn Pieters, Python's re does not return the longest overall match, but for alternates only matches the first matching subexpression:
https://docs.python.org/2/library/re.html#module-re:
When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match.
(emphasis mine)
Hence, unfortunately the need for sorting by length.
Note that this makes Python's re different from POSIX regular expressions, which match the longest overall match.
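A sketch of how the length-sorted alternation might be used to find and strip the ending, following the approach in this answer. The names `best` and `stripped` are illustrative, and `s` is used instead of `str` to avoid shadowing the builtin.

```python
import re

s = "ERTYHGFYUUHGFREDFYAAAAAAAAAA"
endings = ['AAAAA', 'AAAAAA', 'AAAAAAA', 'AAAAAAAA', 'AAAAAAAAA']

# Longest alternatives first, as the answer suggests, then anchor at the end.
longest_first = sorted(endings, key=len, reverse=True)
regexp = '(' + '|'.join(re.escape(e) for e in longest_first) + ')$'

m = re.search(regexp, s)
best = m.group(1) if m else None               # the ending that matched
stripped = s[:-len(best)] if best else s       # s with that ending removed
print(best, stripped)
```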

Python parser for Calculator [duplicate]

I am trying to write a parser which takes expressions as input from a file.
Expressions can be A=B=10 or B=(C-A)-4 etc.
What I have tried so far is below. I am reading a file IP.txt
import re
opert = '+-/*()_='
fileName = "input.txt"
f = open(fileName,'r')
variableDict = {}
lines = f.readlines()
for i in lines:
    for x in re.finditer(r'[A-Z_]\w*', i):
        print(x.group())  # prints each variable name
    for z in re.finditer(r'[0-9]\d*', i):
        print(z.group())  # prints each number
    for c in i:
        if c in opert:
            print(c)  # prints each operator
    # '_' has special meaning. '_' can only be used before numbers, like _1 or _12 etc
    # And I have parsed this also using
    print(re.findall(r'_\d+', i))  # prints the _digits combinations (\d+ so that single-digit forms like _1 match)
Now the problem is that I am stuck on how to proceed with expression evaluation.
First, some rules I must mention about the inputs above:
No line should be longer than 50 characters.
The leftmost operator will always be the '=' assignment operator.
'=' is always preceded by a variable [A-Z]; operators are {'+','-','/','*','_'}, digits {0-9}.
How should I first extract the first variable and push it onto a Python list, then the '=' operator, then either '(' or 'A-Z', push it onto the stack, and so on?
Could someone help me with this problem? I am overwhelmed by it.
If anyone is not able to understand the description, please go to this link
So, you asked about the stack problem, which of course you need for evaluation. I would do something like this:
import re #1
stack = [] #2 FIX: NOT NECESSARY (since fourth line returns a list anyway)
inputstr = "A=B=C+26-(23*_2 )-D" #3
stack = re.findall(r'(?:[A-Z])|(?:[0-9]+)|(?:[/*+_=\(\)-])', inputstr) #4
while len(stack): #5
    print(stack.pop()) #6
First three lines are some init stuff only. After that, I would make a stack with regex in the fourth line. (?:[A-Z]) matches a variable, (?:[0-9]+) matches a number (which may have more than one digit) and (?:[/*+_=\(\)-]) matches all the operators. Parentheses are escaped, and - is at the end, so you don't have to escape it.
Fifth and sixth line prints the stack.
I used (?: ...) because I don't want the groups to capture. With capturing groups, re.findall returns a tuple of groups for each match rather than the matched text. Just try to run it without ?: and you will see the effect.
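The effect is easy to see side by side. This sketch runs the same pattern with and without ?: on the answer's input string:

```python
import re

inputstr = "A=B=C+26-(23*_2 )-D"

# Non-capturing groups: findall returns the matched text of each token.
non_capturing = re.findall(r'(?:[A-Z])|(?:[0-9]+)|(?:[/*+_=\(\)-])', inputstr)

# Capturing groups: findall returns one tuple per match, with an empty
# string for every group that did not take part in that match.
capturing = re.findall(r'([A-Z])|([0-9]+)|([/*+_=\(\)-])', inputstr)

print(non_capturing[:4])  # ['A', '=', 'B', '=']
print(capturing[:4])      # [('A', '', ''), ('', '', '='), ('B', '', ''), ('', '', '=')]
```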
