Folks,
I have to match the following pattern:
The first letter must be N.
The second can be any letter except P.
The third has to be S or T.
The fourth can be any letter except P, again.
The string contains only capital letters: no numbers, whitespace, etc.
Using Python, this is what I have so far:
import re
strRegex = r"N[^P][ST][^P]"
objRegex = re.compile(strRegex)
print(objRegex.findall('NNSTL'))
This prints: ['NNST']
What I expect is: NNST and NSTL
Thanks
re.findall only returns non-overlapping matches.
Try this:
>>> import re
>>> strRegex = r"N[^P][ST][^P]"
>>> regex = re.compile(strRegex)
>>> def newfind(regex, s, pos=0):
...     result = regex.search(s, pos)
...     if result is None: return []
...     else: return [result.group()] + newfind(regex, s, result.start() + 1)
...
>>> newfind(regex, 'NNSTL')
['NNST', 'NSTL']
Reference: https://mail.python.org/pipermail/tutor/2005-September/041126.html
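An alternative that avoids recursion is to wrap the pattern in a zero-width lookahead containing a capturing group, so the scan moves forward one character at a time and still reports what would have matched. A minimal sketch (the helper name find_overlapping is made up for illustration, not part of the re module):

```python
import re

def find_overlapping(pattern, text):
    """Return all matches of `pattern` in `text`, including overlapping ones."""
    # The lookahead (?=...) consumes no characters, so the search resumes
    # at the next position; the inner group captures the would-be match.
    lookahead = re.compile(r"(?=(%s))" % pattern)
    return [m.group(1) for m in lookahead.finditer(text)]

print(find_overlapping(r"N[^P][ST][^P]", "NNSTL"))  # ['NNST', 'NSTL']
```

Because the whole pattern sits inside group 1, this should also work when the pattern has groups of its own.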
I am trying to recognize a pattern in a list of words. I need to find a number of 1 to 6 digits, with or without other characters around it.
My input is this: [1]: https://i.stack.imgur.com/RNOdL.png
With the OCR I obtained:
Kundennummer:
21924
The pattern r"(\D|\A)+\d{5}(\D|\Z)+" works, but when I change it to r"(\D|\A)+\d{1,6}(\D|\Z)+" it doesn't.
I tried re.match, re.findall and re.search, and none of them works.
The repr() of the words:
'Kundennummer:'
'21924'
Assuming you only need the first match:
import re
ocr_result = """
Kundennummer:
21924
"""
for result in re.findall(r'\d+', ocr_result):
    if 1 <= len(result) <= 6:
        break
else:
    result = None
print(result)
Result:
21924
ocr_result1 = """
Kundennummer:
21924
"""
ocr_result2 = """
Kundennummer:3000
"""
for e in [ocr_result1, ocr_result2]:
    print(re.findall(r'\w*\d{1,6}\w*', e))
Output:
['21924']
['3000']
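Note that a pattern like \w*\d{1,6}\w* also matches digit runs longer than six, because the surrounding \w* can absorb the extra digits. If longer runs should be rejected rather than matched, one option is to guard \d{1,6} with lookarounds. A sketch, assuming the OCR output arrives as a single string (the name find_short_numbers is hypothetical):

```python
import re

# (?<!\d) and (?!\d) require that the run is not part of a longer digit run.
SHORT_NUMBER = re.compile(r"(?<!\d)\d{1,6}(?!\d)")

def find_short_numbers(text):
    """Return every run of 1 to 6 digits that is not embedded in a longer run."""
    return SHORT_NUMBER.findall(text)

print(find_short_numbers("Kundennummer:\n21924\n"))  # ['21924']
print(find_short_numbers("Kundennummer:3000"))       # ['3000']
print(find_short_numbers("ID 1234567"))              # []
```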
I want to detect if there are three of the same letter next to each other in a string.
For example:
string1 = 'this is oooonly excaple' # ooo
string2 = 'nooo way that he did this' # ooo
string3 = 'I kneeeeeew it!' # eee
Is there any pythonic way to do this?
I guess that a solution like this is not the best one:
for letters in ['aaa', 'bbb', 'ccc', 'ddd', ..., 'zzz']:
    if letters in string:
        print(True)
You don't have to use regex, but the solution is a little long for something as simple as that:
def repeated(string, amount):
    current = None
    count = 0
    for letter in string:
        if letter == current:
            count += 1
            if count == amount:
                return True
        else:
            count = 1
            current = letter
    return False
print(repeated("helllo", 3) == True)
print(repeated("hello", 3) == False)
You can use groupby to group similar letters and then check the length of each group:
from itertools import groupby
string = "this is ooonly an examplle nooo wway that he did this I kneeeeeew it!"
for letter, group in groupby(string):
    if len(list(group)) >= 3:
        print(letter)
Will output:
o
o
e
If you don't care for the letters themselves and just want to know if there was a repetition, take advantage of short-circuiting with the built-in any function:
print(any(len(list(group)) >= 3 for letter, group in groupby(string)))
One of the best ways to tackle these simple pattern problems is with regex
import re
test_cases = [
'abc',
'a bbb a', # expected match for 'bbb'
'bb a b',
'aaa c bbb', # expected match for 'aaa' and 'bbb'
]
for string in test_cases:
    # Use re.findall because we want every match, not just the first one.
    # To stop at the first match, use re.search instead.
    match = re.findall(r'(?P<repeated_characters>(.)\2{2})', string)
    if match:
        print([groups[0] for groups in match])
Result:
['bbb']
['aaa', 'bbb']
Use a regular expression:
import re
pattern = r"(\w)\1{2}"
string = "this is ooonly an example"
print(re.search(pattern, string) is not None)
Output:
True
How about using regex? - ([a-z])\1{2}
>>> import re
>>> re.search(r'([a-z])\1{2}', 'I kneeew it!', flags=re.I)
<re.Match object; span=(4, 7), match='eee'>
re.search returns None if it doesn't find a match; otherwise it returns a match object. You can get the full match from the match object by indexing it with [0].
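The same idea extends to any run length by building the repetition count into the pattern. A minimal sketch (first_repeat and its n parameter are names chosen here for illustration):

```python
import re

def first_repeat(text, n=3):
    """Return the first run of n identical letters in text, or None."""
    # ([a-z]) captures one letter; \1{n-1} requires n-1 more copies of it.
    pattern = r"([a-z])\1{%d}" % (n - 1)
    match = re.search(pattern, text, flags=re.I)
    return match[0] if match else None

print(first_repeat("I kneeew it!"))   # eee
print(first_repeat("I kneew it!"))    # None
print(first_repeat("nooo way", n=2))  # oo
```

With re.I the backreference is compared case-insensitively, so a run like "Ooo" also counts; drop the flag if case should matter.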
string1 = 'nooo way that he did this'
for i in range(0, len(string1) - 2):
    sub_st = string1[i:i+3]
    if sub_st[0]*3 == sub_st:
        print('true')
The print statement is from your example.
sub_st[0]*3 repeats the first character of sub_st three times and combines the copies into a single string. If the original sub_st equals that string, sub_st consists of the same letter three times.
If you don't need a general answer for n repetitions, you can just iterate through the string and print True if the previous and next characters are both equal to the current character, skipping the first and last characters as the current one.
text = "I kneeeeeew it!"
for i in range(1, len(text) - 1):
    if text[i-1] == text[i] and text[i+1] == text[i]:
        print(True)
        break
We can define the following predicate with itertools.groupby and any:
from itertools import groupby
def has_repeated_letter(string):
    return any(len(list(group)) >= 3 for _, group in groupby(string))
and then use it:
>>> has_repeated_letter('this is oooonly example')
True
>>> has_repeated_letter('nooo way that he did this')
True
>>> has_repeated_letter('I kneeeeeew it!')
True
>>> has_repeated_letter('I kneew it!')
False
Suppose I have the string Hello, world and I'd like to replace every instance of o with its occurrence number, i.e. get the string Hell1, w2rld.
I have found how to reference a numbered group, \g<1>, but it does require a group number.
Is there a way to do what I want in Python?
Update: Sorry for not mentioning that I was indeed looking for a regexp solution, not just a string one. I have marked the solution I liked best, but thanks for all the contributions, they were cool!
For a regular expression solution:
import re
class Replacer:
    def __init__(self):
        self.counter = 0
    def __call__(self, mo):
        self.counter += 1
        return str(self.counter)
s = 'Hello, World!'
print(re.sub('o', Replacer(), s))
Split the string on the letter "o" and reassemble it by adding the index of the part in front of each part (except the first one):
string = "Hello World"
result = "".join(f"{i or ''}"+part for i,part in enumerate(string.split("o")))
output:
print(result)
# Hell1 W2rld
Using itertools.count
Ex:
import re
from itertools import count
c = count(1)
s = "Hello, world"
print(re.sub(r"o", lambda x: "{}".format(next(c)), s))
#or
print(re.sub(r"o", lambda x: f"{next(c)}", s))
# --> Hell1, w2rld
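One thing to watch with a module-level counter such as c is that it keeps counting across calls, so a second re.sub would continue from 3. A sketch that creates a fresh counter per call (number_occurrences is a made-up helper name):

```python
import re
from itertools import count

def number_occurrences(text, target):
    """Replace each occurrence of `target` with its 1-based occurrence number."""
    counter = count(1)  # fresh counter for every call
    # re.escape makes `target` a literal string rather than a pattern.
    return re.sub(re.escape(target), lambda _match: str(next(counter)), text)

print(number_occurrences("Hello, world", "o"))  # Hell1, w2rld
print(number_occurrences("Hello, world", "l"))  # He12o, wor3d
```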
You don't need regex for that, a simple loop will suffice:
sample = "Hello, world"
pattern = 'o'
output = ''
count = 1
for char in sample:
    if char == pattern:
        output += str(count)
        count += 1
    else:
        output += char
print(output)
# Hell1, w2rld
I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between every pair of brackets, brackets included, and store the result in a new string like this: new_string = AX&E
I tried this:
p = re.compile(r"\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives the output: AX&EUr)
Is there any way to correct this without iterating over each character of the string?
Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile(r"""\([^\)]*\)""").sub('', s)
'AX&E'
With only one level of brackets, the original pattern works as it is:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile(r"\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) are something that regular expressions cannot handle in a general way. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and three levels will be quite long. And you don't want to think about four levels. ;-)
You can use PyParsing to parse the string:
from pyparsing import nestedExpr
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
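If a plain loop turns out to be acceptable after all, nesting of any depth can be handled in a single pass by tracking how many brackets are currently open. This is a sketch of that swapped-in technique, not one of the regex answers above; strip_brackets is a hypothetical name:

```python
def strip_brackets(text):
    """Drop everything inside (possibly nested) parentheses in one pass."""
    depth = 0   # number of currently open brackets
    kept = []
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            # A stray ')' is dropped, and depth never goes negative.
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(char)
    return "".join(kept)

print(strip_brackets("AX(p>q)&E((-p)Ur)"))  # AX&E
```

Unlike the repeated-subn approach, this reads the string only once, at the cost of the explicit loop the question hoped to avoid.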
this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# an 'r' prefix makes a raw string: backslashes are kept literally
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5])  # print a single character
print(user[-3])  # print a character counted from the end
print(user[2:6])  # slice the string
print(user[:6])
print(user[2:])
print(len(user))  # length of the string
print(user.upper())  # convert to uppercase
print(user.lstrip())  # strip leading whitespace
print(user.rstrip())  # strip trailing whitespace
print(max(user))  # character with the highest code point
print(min(user))  # character with the lowest code point
print(user.join(["1", "2", "3", "4"]))  # join() needs strings, not ints
input()
I have a string that contains digits and other characters.
I need to find every complete number in that string that contains the digits 467033.
e.g. 1.467033777777777
Thanks
Try this:
import re
RE_NUM = re.compile(r'(\d*\.\d+)', re.M)
text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
for num in RE_NUM.findall(text):
    if '467033' in num:
        print(num)
Prints:
1.4670337777777773
Generalized / optimized in response to comment:
def find(text, numbers):
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % n for n in numbers)
    re_num = re.compile(pattern, re.M)
    return [m.group() for m in re_num.finditer(text)]
print(find(text, ['467033', '13']))
Prints:
['135.13508', '1.4670337777777773']
If you're just searching for a substring within another string, you can use in:
>>> sub_num = "467033"
>>> my_num = "1.467033777777777"
>>> sub_num in my_num
True
However, I suspect there's more to your problem than just searching strings, and that doing it this way might not be optimal. Can you be more specific about what you're trying to do?
import re
a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
r = re.compile('[0-9.]*467033[0-9.]*')
print(r.findall(a))
# ['1.467033777777777', '576575567467033546.90']
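The character class [0-9.]* also accepts runs with several dots, such as 1.2.467033. If that matters, one option is to extract well-formed decimals first and filter them afterwards. A sketch under that assumption (NUMBER and numbers_containing are names invented here):

```python
import re

# An integer part with an optional fraction, or a bare fraction like .5
NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")

def numbers_containing(text, digits):
    """Return the decimal numbers in text whose written form contains `digits`."""
    return [num for num in NUMBER.findall(text) if digits in num]

text = "e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks"
print(numbers_containing(text, "467033"))
# ['1.467033777777777', '576575567467033546.90']
```

As with the answers above, digits that straddle the decimal point (for example 46.7033) are not treated as containing 467033.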