Regular expression to find largest repeating pattern? [duplicate] - python

This question already has answers here:
Longest consecutive substring of certain character type in python
(2 answers)
Closed 2 years ago.
How can I use regular expressions to find the largest repeating pattern?
For example, in the string "CATchickenchickenCATCATCATCATchickenchickenCATCATchicken"
I need a way to get this string: "CATCATCATCAT", since it is the longest run of consecutive repeats of my substring "CAT".
How can I do this?
Thanks :)

import re
string = "CATchickenchickenCATCATCATCATchickenchickenCATCATchicken"
pattern = "((CAT)+)"
print(max(re.findall(pattern, string), key=lambda tpl: len(tpl[0]))[0])
Output:
CATCATCATCAT
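A slightly more general sketch of the same idea, assuming the substring might contain regex metacharacters. `longest_run` is a hypothetical helper name, not part of `re`. It escapes the substring, wraps it in a non-capturing group so each match is a plain string rather than a tuple, and returns the longest run (or an empty string when the substring never occurs):

```python
import re

def longest_run(text, chunk):
    """Return the longest run of consecutive copies of `chunk` in `text`."""
    # (?:...)+ matches one or more back-to-back copies; re.escape guards
    # against regex metacharacters in `chunk`.
    pattern = re.compile("(?:" + re.escape(chunk) + ")+")
    runs = (m.group(0) for m in pattern.finditer(text))
    # default="" covers the case where `chunk` never appears.
    return max(runs, key=len, default="")

string = "CATchickenchickenCATCATCATCATchickenchickenCATCATchicken"
print(longest_run(string, "CAT"))  # CATCATCATCAT
```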

Related

Using regex to match two specific characters [duplicate]

This question already has answers here:
Regular Expressions: Is there an AND operator?
(14 answers)
Closed 2 years ago.
I have a list of strings like:
1,-102a
1,123-f
1943dsa
-da238,
-,dwjqi92
How can I write a regular expression in Python that matches as long as the string contains both the characters , AND - regardless of the order or the pattern in which they appear?
I would use the following regex alternation:
,.*-|-.*,
Sample script:
import re
inp = ['1,-102a', '1,123-f', '1943dsa', '-da238,', '-,dwjqi92']
output = [x for x in inp if re.search(r',.*-|-.*,', x)]
print(output)
This prints:
['1,-102a', '1,123-f', '-da238,', '-,dwjqi92']
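Since the linked duplicate asks about an AND operator, here is an alternative sketch that uses one lookahead per required character. The names `both` and `matches` are illustrative only. Each `(?=.*X)` asserts that X occurs somewhere ahead without consuming input, so order does not matter, and more required characters can be added as further lookaheads:

```python
import re

inp = ['1,-102a', '1,123-f', '1943dsa', '-da238,', '-,dwjqi92']

# One lookahead per required character; re.match anchors at the start,
# so both assertions are checked against the whole string.
both = re.compile(r'(?=.*,)(?=.*-)')
matches = [x for x in inp if both.match(x)]
print(matches)  # ['1,-102a', '1,123-f', '-da238,', '-,dwjqi92']
```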

Finding strings with gaps that match a string in a list of strings [duplicate]

This question already has answers here:
Python wildcard search in string
(6 answers)
Closed 5 years ago.
I have a list of strings that looks like this: ['ban*', 'c*rr*r', 'pl*s', 'pist*l']. I want to check whether those strings have matching equivalents in another list of strings, which is the following:
['banner', 'bannana', 'ban', 'carrer', 'clorror', 'planes', 'plots']
Comparing the first string from the list, I get 'banner' and 'bannana', which means there is a word matching that string ('ban*'). So the '*' means that there can be one or more letters in that position of the word.
Try this fnmatch approach (note that in fnmatch, '*' matches zero or more characters, so 'ban' also matches 'ban*'):
import fnmatch
lst = ['banner', 'bannana', 'ban', 'carrer', 'clorror', 'planes', 'plots']
f1 = fnmatch.filter(lst, 'ban*')
print(f1)
Output
['banner', 'bannana', 'ban']
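To check every pattern from the question, not only 'ban*', the same idea can be put in a loop. This is a sketch: `find_matches` and its `at_least_one` flag are hypothetical names introduced for illustration. Because the question wants '*' to stand for one or more letters, the flag rewrites each '*' as '?*' ('?' matches exactly one character in fnmatch). `fnmatch.fnmatchcase` is used so the comparison stays case-sensitive on every platform:

```python
import fnmatch

patterns = ['ban*', 'c*rr*r', 'pl*s', 'pist*l']
words = ['banner', 'bannana', 'ban', 'carrer', 'clorror', 'planes', 'plots']

def find_matches(patterns, words, at_least_one=False):
    """Map each wildcard pattern to the list of words it matches."""
    result = {}
    for pat in patterns:
        # '*' alone matches zero or more characters; '?*' requires at least one.
        effective = pat.replace('*', '?*') if at_least_one else pat
        result[pat] = [w for w in words if fnmatch.fnmatchcase(w, effective)]
    return result

print(find_matches(patterns, words))
print(find_matches(patterns, words, at_least_one=True))
```

With `at_least_one=True`, 'ban' no longer counts as a match for 'ban*'; a pattern with an empty list (here 'pist*l') has no equivalent in the word list.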

How can I remove the duplicated consecutive chars and reserve the first one using regex? [duplicate]

This question already has answers here:
Remove duplicate chars using regex?
(3 answers)
Closed 6 years ago.
I found a code snippet on the web that uses a regex in Python to remove duplicated consecutive characters while keeping the first one:
import re
re.sub(r'(?s)(.)(?=.*\1)','','aabbcc') #'abc'
But it has a defect: if the string is 'aabbccaabb', it also drops the first 'aa' and 'bb' and returns 'cab'.
re.sub(r'(?s)(.)(?=.*\1)','','aabbccaabb') #'cab'
Is there a way to solve it by regex?
Without regex, check if previous character is the same as current, using a list comprehension with a condition and join the results:
s='aabbccaabb'
print("".join([c for i,c in enumerate(s) if i==0 or s[i-1]!=c]))
Just remove the .* in the positive lookahead, so that a character is dropped only when the very next character is the same:
import re
print(re.sub(r'(?s)(.)(?=\1)','','aabbcc'))
print(re.sub(r'(?s)(.)(?=\1)','','aabbccaabb'))
Output:
abc
abcab
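An equivalent formulation, shown here as a sketch, matches each whole run and replaces it with its first character instead of deleting characters one at a time. `squeeze` is a hypothetical helper name:

```python
import re

def squeeze(s):
    """Collapse each run of identical consecutive characters to one."""
    # (.)\1+ matches a character followed by one or more copies of itself;
    # the replacement keeps only the captured first character.
    return re.sub(r'(.)\1+', r'\1', s, flags=re.DOTALL)

print(squeeze('aabbcc'))      # abc
print(squeeze('aabbccaabb'))  # abcab
```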

Splitting two concatenated terms in python [duplicate]

This question already has answers here:
Split a string at uppercase letters
(22 answers)
Closed 6 years ago.
In general I have a string, say
temp = "ProgramField"
Now I want to split strings like these into two terms (I can identify the two strings based on the uppercase characters)
term1 = "Program"
term2 = "Field"
How to achieve this in python?
I tried regular expression and splitting terms but nothing gave me the result that I expected
Python code -
re.split("[A-Z][a-z]*","ProgramField")
Any suggestions?
You have to include a capturing group so that the matched terms are kept in the result:
re.split('([A-Z][a-z]*)', 'ProgramField')
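For reference, a short sketch of what the capturing-group split returns, next to an alternative using `re.findall` that yields only the terms. The variable names are taken from the question; `parts` is illustrative:

```python
import re

temp = "ProgramField"

# Splitting with a capturing group keeps the matches, but also yields
# empty strings before, between, and after them.
parts = re.split('([A-Z][a-z]*)', temp)
print(parts)  # ['', 'Program', '', 'Field', '']

# findall returns just the matched terms.
term1, term2 = re.findall('[A-Z][a-z]*', temp)
print(term1, term2)  # Program Field
```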

Count how many time a string appears in a longer string [duplicate]

This question already has answers here:
String count with overlapping occurrences [closed]
(25 answers)
Closed 7 years ago.
So I have a little problem,
I want to count how many times the string "aa" appears in my longer string "aaatattgg", which looks like a DNA sequence.
Here, for example, I expect 2 (overlaps are allowed).
There is the .count method, but it does not count overlapping occurrences.
PS: excuse my English, I'm French.
Use the re module. Put your regex inside a positive lookahead in order to match overlapping occurrences.
>>> import re
>>> s = "aaatattgg"
>>> re.findall(r'(?=(aa))', s)
['aa', 'aa']
>>> len(re.findall(r'(?=(aa))', s))
2
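The same lookahead trick can be wrapped in a small function for arbitrary substrings. This is a sketch: `count_overlapping` is a hypothetical name, and `re.escape` is added on the assumption that the substring should be treated literally:

```python
import re

def count_overlapping(needle, haystack):
    """Count occurrences of `needle` in `haystack`, overlaps included."""
    # The lookahead succeeds at every position where `needle` starts
    # without consuming characters, so overlapping hits are all found.
    return len(re.findall('(?=' + re.escape(needle) + ')', haystack))

print(count_overlapping("aa", "aaatattgg"))  # 2
print("aaatattgg".count("aa"))               # 1, since str.count skips overlaps
```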
