Let's say I have compiled five regular expression patterns and then created five Boolean variables:
a = re.search(first, mystr)
b = re.search(second, mystr)
c = re.search(third, mystr)
d = re.search(fourth, mystr)
e = re.search(fifth, mystr)
I want to use the Powerset of (a, b, c, d, e) in a function so it finds more specific matches first then falls through. As you can see, the Powerset (well, its list representation) should be sorted by # of elements descending.
Desired behavior:
if a and b and c and d and e:
    return 'abcde'
if a and b and c and d:
    return 'abcd'
[... and all the other 4-matches ]
[now the three-matches]
[now the two-matches]
[now the single matches]
return 'No Match' # did not match anything
Is there a way to utilize the Powerset programmatically and ideally, tersely, to get this function's behavior?
You could use the powerset() generator function recipe in the itertools documentation like this:
from itertools import chain, combinations
from pprint import pprint
import re
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
mystr = "abcdefghijklmnopqrstuvwxyz"
first = "a"
second = "B" # won't match, should be omitted from result
third = "c"
fourth = "d"
fifth = "e"
a = 'a' if re.search(first, mystr) else ''
b = 'b' if re.search(second, mystr) else ''
c = 'c' if re.search(third, mystr) else ''
d = 'd' if re.search(fourth, mystr) else ''
e = 'e' if re.search(fifth, mystr) else ''
# compare by value: "is not ''" tests identity and triggers a SyntaxWarning on Python 3.8+
elements = (elem for elem in [a, b, c, d, e] if elem != '')
spec_ps = [''.join(item for item in group)
           for group in sorted(powerset(elements), key=len, reverse=True)
           if any(item for item in group)]
pprint(spec_ps)
Output:
['acde',
'acd',
'ace',
'ade',
'cde',
'ac',
'ad',
'ae',
'cd',
'ce',
'de',
'a',
'c',
'd',
'e']
First, those aren't booleans; they're either match objects or None. Second, going through the power set would be a terribly inefficient way to go about this. Just stick each letter in the string if the corresponding regex matched:
return ''.join(letter for letter, match in zip('abcde', [a, b, c, d, e]) if match)
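For reference, the one-liner above can be wrapped into a complete function. This is only a sketch: the name match_signature, the labels parameter and the sample patterns are assumptions made up for illustration, not something from the question.

```python
import re

def match_signature(patterns, text, labels="abcde"):
    """Return the labels of all patterns found in text, or 'No Match'."""
    # One label per pattern; keep a label only if its pattern is found.
    hits = "".join(
        label
        for label, pattern in zip(labels, patterns)
        if re.search(pattern, text)
    )
    return hits or "No Match"

# Hypothetical sample data, mirroring the example in the other answer.
patterns = [re.compile(p) for p in ("a", "B", "c", "d", "e")]
print(match_signature(patterns, "abcdefghijklmnopqrstuvwxyz"))  # acde
print(match_signature(patterns, "xyz"))                         # No Match
```

Because every matching pattern contributes its letter, the result is already the most specific combination, so no fall-through over the power set is needed.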
I'm a beginner who'd like to return a string split into pairs of characters. If the input to the function has an odd length, then the last pair is to include an _.
Example: solution("asdfadb") should return ['as', 'df', 'ad', 'b_']
My code however, returns: ['a', 's']['d', 'f']['a', 'd']['b', '_']
I've tried multiple ways and cannot get it to return the correctly formatted result:
def solution(s):
    if len(s)%2 != 0:
        s = "".join((s, "_"))
    s = list(s)
    s = [ s[i:i+2] for i in range(0 , len(s) , 2) ]
    s = ''.join(str(pair) for pair in s )
    print(s)
solution("asdfadb")
['a', 's']['d', 'f']['a', 'd']['b', '_']
You had a small confusion in the last list comprehension, try this (see my comment):
def solution(s):
    if len(s)%2 != 0:
        s = "".join((s, "_"))
    s = list(s)
    s = [ s[i:i+2] for i in range(0 , len(s) , 2) ]
    s = [''.join(pair) for pair in s] # For each sublist (aka pair) - do join.
    print(s)
Output:
['as', 'df', 'ad', 'b_']
Just a bit more compact than @idanz's answer, but the principle is the same:
def solution(s: str):
    s = s + "_" if len(s) % 2 !=0 else s
    pairs = [s[i:i+2] for i in range(0, len(s), 2)]
    print(pairs)
solution("asdfadb")
Output:
['as', 'df', 'ad', 'b_']
Here is a solution using a list comprehension, string slicing, and zip_longest:
from itertools import zip_longest
def solution(string):
    return ["".join(pair) for pair in zip_longest(string[0::2], string[1::2], fillvalue="_")]
print(solution("asdfadb"))
Output:
['as', 'df', 'ad', 'b_']
I want to know how to split a text into the different letters it contains without saving the same letter twice in Python. So the output for a text like "hello" would be {'h','e','l','o'}, counting the letter l only once.
As the comments say, put your word in a set to remove duplicates:
>>> set("hello")
set(['h', 'e', 'l', 'o'])
Iterate through it (sets don't have order, so don't count on that):
>>> h = set("hello")
>>> for c in h:
...     print(c)
...
h
e
l
o
Test if a character is in it:
>>> 'e' in h
True
>>> 'x' in h
False
There are a few ways to do this...
word = set('hello')
Or the following...
letters = []
for letter in "hello":
    if letter not in letters:
        letters.append(letter)
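If you want the loop's first-seen order without the repeated list scans, one more option is dict.fromkeys. This is just a sketch; unique_letters is a made-up name, and it relies on dicts keeping insertion order (Python 3.7+).

```python
def unique_letters(text):
    """Return the distinct characters of text in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(text))

print(unique_letters("hello"))  # ['h', 'e', 'l', 'o']
```

Unlike set("hello"), the result has a predictable order.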
How can I compare two lists of the same length and return the matching elements as they are, with each non-matching pair joined into a single element separated by a space?
l1=["a","b","c"]
l2=["a","d","c"]
result=[]
for i in l1:
    for j in l2:
        if i == j:
            match = i
            result.append(match)
        else:
            non_match = i + " "+ j
            result.append(non_match)
print(result)
Actual Output:
['a', 'a d', 'a c', 'b a', 'b d', 'b c', 'c a', 'c d', 'c']
Expected Output:
["a","b d","c"]
As long as the order of the items within each output element doesn't matter, you could do this (List1 and List2 stand for the two input lists, l1 and l2 in the question):
Output = list(map(lambda x: " ".join(set(x)), zip(List1, List2)))
>>> Output
['a', 'd b', 'c']
The logic can be broken down as follows:
1: zip the two lists together:
# display the zipped lists:
>>> list(zip(List1, List2))
[('a', 'a'), ('b', 'd'), ('c', 'c')]
2: Turn each tuple in the resulting list into a set (to get unique values):
# display the result of calling set on the zipped lists
>>> list(map(set, zip(List1, List2)))
[{'a'}, {'d', 'b'}, {'c'}]
3: concatenate the members of each set with join
Output = list(map(lambda x: " ".join(set(x)), zip(List1, List2)))
Other answers suggest other methods, but this one fixes your code directly. The problem is that the nested loops compare every element of l1 with every element of l2, so the second list is traversed once for each element of the first. The zip function pairs the elements up by position instead:
l1=["a","b","c"]
l2=["a","d","c"]
result=[]
for i,j in zip(l1, l2):
    if i == j:
        match = i
        result.append(match)
    else:
        non_match = i + " "+ j
        result.append(non_match)
print(result)
Loop through both lists, appending a space & the List2 element if the corresponding elements are not equal to each other.
[List1[i] + (f" {List2[i]}" if List1[i] != List2[i] else '') for i in range(len(List1))]
I'll add @grind's answer as well for completeness. I think we both like it a bit better. As mentioned, it doesn't need indexes, and the formatting also handles the concatenation of left and right, which I consider an improvement too.
[left if left == right else f'{left} {right}' for left, right in zip(List1, List2)]
The first one will throw an IndexError if the lengths of the two lists are different. The second will result in a new list which has a length equal to the shorter of the two input lists.
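If the lists may differ in length and you want neither the IndexError nor the silent truncation, one option is itertools.zip_longest. A minimal sketch, assuming the elements are strings and never None; merge_pairs is a made-up name:

```python
from itertools import zip_longest

def merge_pairs(left_items, right_items):
    """Combine two lists position by position, keeping unmatched tails."""
    merged = []
    # fillvalue=None marks positions where one list has already run out.
    for left, right in zip_longest(left_items, right_items, fillvalue=None):
        if left == right:
            merged.append(left)
        else:
            # Join whichever values are present with a single space.
            merged.append(" ".join(v for v in (left, right) if v is not None))
    return merged

print(merge_pairs(["a", "b", "c"], ["a", "d", "c"]))  # ['a', 'b d', 'c']
print(merge_pairs(["a", "b"], ["a", "d", "e"]))       # ['a', 'b d', 'e']
```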
Try:
[' '.join((a, b)) if a != b else a
for a, b in zip(l1, l2)]
zip(l1, l2) lets you iterate over l1 and l2 simultaneously, and ' '.join((a, b)) if a != b else a is a conditional expression that expresses what you want. The conditional expression evaluates to a value for each pair, and the list comprehension collects those values into the result.
I'm trying to catch a letter that appears twice in a row in a string using RegEx (or maybe there are better ways?). For example, my string is:
ugknbfddgicrmopn
The output would be:
dd
However, I've tried something like:
re.findall('[a-z]{2}', 'ugknbfddgicrmopn')
but in this case, it returns:
['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn'] # the expected output is `['dd']`
I also have a way to get the expected output:
>>> l = []
>>> tmp = None
>>> for i in 'ugknbfddgicrmopn':
...     if tmp != i:
...         tmp = i
...         continue
...     l.append(i*2)
...
...
>>> l
['dd']
>>>
But that's too complex...
If it's 'abbbcppq', then only catch:
abbbcppq
^^ ^^
So the output is:
['bb', 'pp']
Then, if it's 'abbbbcppq', catch bb twice:
abbbbcppq
^^^^ ^^
So the output is:
['bb', 'bb', 'pp']
You need to use a regex with a capturing group and a backreference, and define the regex as a raw string.
>>> re.search(r'([a-z])\1', 'ugknbfddgicrmopn').group()
'dd'
>>> [i+i for i in re.findall(r'([a-z])\1', 'abbbbcppq')]
['bb', 'bb', 'pp']
or
>>> [i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]
['bb', 'bb', 'pp']
Note that re.findall here returns a list of tuples, with the characters matched by the first group as the first element and those matched by the second group as the second element. In our case the characters in the first group are enough, hence i[0].
As a Pythonic alternative, you can use the zip function within a list comprehension:
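To see why i[0] is needed, it may help to print what findall returns for the two patterns above. This is only an illustration; the variable names are made up.

```python
import re

text = "abbbbcppq"

# One capturing group: findall returns that group's text for each match.
single_group = re.findall(r"([a-z])\1", text)
print(single_group)  # ['b', 'b', 'p']

# Two capturing groups: findall returns one tuple per match,
# ordered as (outer group, inner group).
two_groups = re.findall(r"(([a-z])\2)", text)
print(two_groups)    # [('bb', 'b'), ('bb', 'b'), ('pp', 'p')]
```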
>>> s = 'abbbcppq'
>>>
>>> [i+j for i,j in zip(s,s[1:]) if i==j]
['bb', 'bb', 'pp']
If you are dealing with a large string, you can use iter() to convert the string to an iterator and itertools.tee() to create two independent iterators. Calling next() on the second iterator consumes its first item, after which you can zip the two iterators (in Python 2.X use itertools.izip(), which returns an iterator).
>>> from itertools import tee
>>> first = iter(s)
>>> second, first = tee(first)
>>> next(second)
'a'
>>> [i+j for i,j in zip(first,second) if i==j]
['bb', 'bb', 'pp']
Benchmark with RegEx recipe:
# ZIP
~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
1000000 loops, best of 3: 1.56 usec per loop
# REGEX
~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
100000 loops, best of 3: 3.21 usec per loop
After your last edit, as mentioned in the comments: if you want to match only one pair of b in strings like "abbbcppq", you can use finditer(), which returns an iterator of match objects, and extract each result with the group() method:
>>> import re
>>>
>>> s = "abbbcppq"
>>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
['bb', 'pp']
Note that re.I is the IGNORECASE flag which makes the RegEx match the uppercase letters too.
Using back reference, it is very easy:
import re
p = re.compile(r'([a-z])\1{1,}')  # the ur'' prefix is a syntax error in Python 3
re.findall(p, "ugknbfddgicrmopn")
#output: ['d']
re.findall(p,"abbbcppq")
#output: ['b', 'p']
For more details, you can refer to a similar question in perl: Regular expression to match any character being repeated more than 10 times
It is pretty easy without regular expressions:
In [4]: [k for k, v in collections.Counter("abracadabra").items() if v==2]
Out[4]: ['b', 'r']
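Note that Counter counts every occurrence in the string, whether or not the letters are adjacent ("abracadabra" has no doubled letter at all). If adjacency matters, as in the question, itertools.groupby is one regex-free option. A rough sketch; adjacent_pairs is a made-up name:

```python
from itertools import groupby

def adjacent_pairs(text):
    """Return one doubled letter for every non-overlapping adjacent pair."""
    pairs = []
    # groupby yields each run of consecutive identical characters.
    for char, run in groupby(text):
        run_length = sum(1 for _ in run)
        # A run of n characters contains n // 2 non-overlapping pairs.
        pairs.extend([char * 2] * (run_length // 2))
    return pairs

print(adjacent_pairs("ugknbfddgicrmopn"))  # ['dd']
print(adjacent_pairs("abbbcppq"))          # ['bb', 'pp']
print(adjacent_pairs("abbbbcppq"))         # ['bb', 'bb', 'pp']
```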
Maybe you can use a generator to achieve this:
def adj(s):
    last_c = None
    for c in s:
        if c == last_c:
            yield c * 2
        last_c = c
s = 'ugknbfddgicrmopn'
v = [x for x in adj(s)]
print(v)
# output: ['dd']
"or maybe there's some better ways"
Since regex is often misunderstood by the next developer to encounter your code (may even be you),
And since simpler != shorter,
How about the following pseudo-code:
function findMultipleLetters(inputString) {
    foreach (letter in inputString) {
        dictionaryOfLettersOccurrance[letter]++;
        if (dictionaryOfLettersOccurrance[letter] == 2) {
            multipleLetters.add(letter);
        }
    }
    return multipleLetters;
}
multipleLetters = findMultipleLetters("ugknbfddgicrmopn");
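In Python the pseudo-code above might look roughly like this. It is a sketch, with snake_case names chosen here rather than taken from the pseudo-code. Keep in mind that, like the pseudo-code, it reports any letter that occurs twice anywhere in the string, not only adjacent doubles.

```python
from collections import defaultdict

def find_multiple_letters(input_string):
    """Return letters that occur at least twice, in the order they repeat."""
    occurrences = defaultdict(int)
    multiple_letters = []
    for letter in input_string:
        occurrences[letter] += 1
        # Record a letter exactly once, when its second occurrence is seen.
        if occurrences[letter] == 2:
            multiple_letters.append(letter)
    return multiple_letters

print(find_multiple_letters("ugknbfddgicrmopn"))  # ['d', 'g', 'n']
```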
A1 = "abcdededdssffffccfxx"
print(A1[1])
for i in range(len(A1)-1):
    if A1[i+1] == A1[i]:
        if not A1[i+1] == A1[i-1]:
            print(A1[i] * 2)
>>> l = ['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']
>>> import re
>>> newList = [item for item in l if re.search(r"([a-z]{1})\1", item)]
>>> newList
['dd']
I have the following requirement.
I have a list which say has 3 elements [X,Y,2]
What I would like to do is to generate strings with a separator (say "-") between (or not) each element. The order of the elements in the array should be preserved.
So the output would be:
'XY2'
'X-Y-2'
'X-Y2'
'XY-2'
Is there an elegant way to do this in Python?
>>> import itertools
>>> for c in itertools.product(' -', repeat=2): print ('X%sY%s2' % c).replace(' ', '')
XY2
XY-2
X-Y2
X-Y-2
Or, with the elements coming from a python list:
import itertools
a = ['X', 'Y', 2]
for c in itertools.product(' -', repeat=2):
    print(('%s%s%s%s%s' % (a[0],c[0],a[1],c[1],a[2])).replace(' ', ''))
Or, in a slightly different style:
import itertools
a = ['X', 'Y', '2']
for c in itertools.product(' -', repeat=2):
    print(('%s'.join(a) % c).replace(' ', ''))
To capture the output to a list:
import itertools
a = ['X', 'Y', '2']
output = []
for c in itertools.product(' -', repeat=len(a)-1):
    output.append( ('%s'.join(a) % c).replace(' ', '') )
print('output=', output)
A little more generalized but works for any number of separators and hopefully is easy to understand at each step:
import itertools
a = ['X', 'Y', '2']
all_separators = ['', '-', '+']
results = []
# this product puts all separators in all positions for len-1 (spaces between each element)
for this_separators in itertools.product(all_separators, repeat=len(a)-1):
    this_result = []
    # zip_longest is the Python 3 name (izip_longest in Python 2)
    for pair in itertools.zip_longest(a, this_separators, fillvalue=''):
        for element in pair:
            this_result.append(element)
    # if you want it, here it is as a comprehension
    # this_result = [element for pair
    #                in itertools.zip_longest(a, this_separators, fillvalue='')
    #                for element in pair]
    this_result_string = ''.join(this_result) # check out join docs if it's new to you
    results.append(this_result_string)
print(results)
>>> ['XY2', 'XY-2', 'XY+2', 'X-Y2', 'X-Y-2', 'X-Y+2', 'X+Y2', 'X+Y-2', 'X+Y+2']
These are the results for your case with just '' and '-' as separators:
>>> ['XY2', 'XY-2', 'X-Y2', 'X-Y-2']
If you want everything in one comprehension:
# Python 3 spelling; on Python 2 use itertools.izip_longest instead
results = [''.join(element for pair
                   in itertools.zip_longest(a, this_separators, fillvalue='')
                   for element in pair)
           for this_separators in itertools.product(all_separators, repeat=len(a)-1)]
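The same idea can be packaged as a reusable function. This is a sketch for Python 3, assuming a non-empty list; join_variants and its separators parameter are names made up here.

```python
from itertools import product, zip_longest

def join_variants(items, separators=("", "-")):
    """Return every string formed by placing one separator in each gap."""
    parts = [str(item) for item in items]  # allow non-string items such as 2
    variants = []
    # One separator choice per gap between neighbouring items.
    for chosen in product(separators, repeat=len(parts) - 1):
        # Interleave items and separators; the last item pairs with ''.
        pieces = (piece
                  for pair in zip_longest(parts, chosen, fillvalue="")
                  for piece in pair)
        variants.append("".join(pieces))
    return variants

print(join_variants(["X", "Y", 2]))  # ['XY2', 'XY-2', 'X-Y2', 'X-Y-2']
```

With k separators and n items this produces k ** (n - 1) strings, so the output grows quickly for longer lists.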
I don't know if there is a function in itertools to do that, but I always think it's fun and a good exercise to do this kind of thing. So here is a solution with a recursive generator:
def generate(liste):
    if len(liste) == 1:
        yield [liste]
    else:
        for i in generate(liste[1:]):
            yield [[liste[0]]] + i
            yield [[liste[0]] + i[0]] + i[1:]
if __name__ == "__main__":
    for i in generate(["X", "Y", "2"]):
        print("test : " + str(i))
        # join the letters inside each group, then join the groups with "-"
        print("-".join("".join(group) for group in i))
Something like this?
from itertools import permutations
i = ["X","Y","2"]
for result in permutations(i, 3):
    print("-".join(result))
Result:
X-Y-2
X-2-Y
Y-X-2
Y-2-X
2-X-Y
2-Y-X