Python doesn't recognize regex pattern - Number between 1 and 6 digits

I am trying to recognize a pattern in a list of words: a number of 1 to 6 digits, with or without other characters around it.
My input is this image: [1]: https://i.stack.imgur.com/RNOdL.png
With OCR I obtained:
Kundennummer:
21924
The pattern r"(\D|\A)+\d{5}(\D|\Z)+" works, but when I change it to r"(\D|\A)+\d{1,6}(\D|\Z)+" it doesn't.
I tried re.match, re.findall and re.search, and none of them works.
The repr() of each line:
'Kundennummer:'
'21924'

Assuming you only need the first match:
import re
ocr_result = """
Kundennummer:
21924
"""
for result in re.findall(r'\d+', ocr_result):
    if 1 <= len(result) <= 6:
        break
else:
    # The loop finished without a break: no run of 1 to 6 digits was found
    result = None
print(result)
Result:
21924

import re
ocr_result1 = """
Kundennummer:
21924
"""
ocr_result2 = """
Kundennummer:3000
"""
for e in [ocr_result1, ocr_result2]:
    print(re.findall(r'\w*\d{1,6}\w*', e))
['21924']
['3000']
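If the number must not be part of a longer run of digits, one option is to bound \d{1,6} with digit lookarounds. This is a minimal sketch rather than the method used in the answers above; the helper name find_short_numbers and the third sample string are assumptions made for illustration.

```python
import re

# 1 to 6 digits that are neither preceded nor followed by another digit
SHORT_NUMBER = re.compile(r'(?<!\d)\d{1,6}(?!\d)')

def find_short_numbers(text):
    """Return every run of 1 to 6 digits that is not part of a longer digit run."""
    return SHORT_NUMBER.findall(text)

print(find_short_numbers("Kundennummer:\n21924\n"))  # ['21924']
print(find_short_numbers("Kundennummer:3000"))       # ['3000']
print(find_short_numbers("Kundennummer:1234567"))    # []
```

Unlike r'\d+' followed by a length check, the lookarounds reject a 7-digit run inside the pattern itself, so no post-filtering is needed.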

Related

Python return regexp-formatted string

There is an input string like "2r-rj1225-f11e-12-x-w".
The task is to return it in the following format:
all groups except the first and last must be exactly 5 characters
the first and last groups must be between 1 and 5 characters
if the first group in the input is shorter than 5 characters, it must be preserved
For the input above, the result is "2r-rj122-5f11e-12xw".
import re
string = "2r-rj1225-f11e-12-x-w"
baseLength = 5
def formatKey(string: str, baseLength: int) -> str:
    p = re.compile(r"{1,baseLength}[a-zA-Z0-9]{baseLength}[a-zA-z0-9]+")
    formatted = '-'.join(p.match(string))
    return formatted
print(f'The reformatted string is {formatKey(string, baseLength)}')
That does not work, naturally. I would also like to avoid '-'.join and simply return something like regexp(re.compile('[a-z]FORMATREGEXP'), string), where FORMATREGEXP is the regexp that does the job.
Clarification: The actual solution is to use the re.sub(pattern, repl, string) function: "The sub() function searches for the pattern in the string and replaces the matched strings with the replacement". That is exactly what I was asking for, in one line.
I don't really see this as a regex problem. It's just reorganizing the characters after the first hyphen.
x = "2r-rj1225-f11e-12-x-w"
def reencode(x):
    parts = x.split('-')
    p1 = ''.join(parts[1:])
    s = parts[0]
    while len(p1) >= 5:
        s += '-' + p1[:5]
        p1 = p1[5:]
    if p1:
        s += '-' + p1
    return s
print(reencode(x))
Output:
2r-rj122-5f11e-12xw
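For comparison, the same regrouping can be written around re.sub, which is what the question's clarification asks for. This is only a sketch under the same reading of the task as reencode above: the first group is kept, and everything after it is re-chunked. The function name reencode_with_sub and the size parameter are assumptions for illustration.

```python
import re

def reencode_with_sub(key, size=5):
    """Keep the first group, then regroup the remaining characters in chunks of `size`."""
    head, _, rest = key.partition('-')
    rest = rest.replace('-', '')
    if not rest:
        return head
    # Insert a hyphen after every full chunk that is still followed by a character
    grouped = re.sub(r'(.{%d})(?=.)' % size, r'\1-', rest)
    return head + '-' + grouped

print(reencode_with_sub("2r-rj1225-f11e-12-x-w"))  # 2r-rj122-5f11e-12xw
```

The lookahead (?=.) keeps re.sub from appending a trailing hyphen when the remaining characters divide evenly into chunks.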

Python - Function to find last number

I'm trying to create a function that returns the very last digit in a Python string and, if there are no digits in the string, simply returns -5.
This is what I've got so far, but it returns 0 if the string contains no digits or if the final character in the string is not a digit.
For example, LastNum("1*2*3*") should return 3 and LastNum("****") should return -5. Help is greatly appreciated.
def LastNum(st):
    Result = 0
    for i in (st):
        if i.isdigit():
            Result = Result + int(max(st[-1::]))
    return Result
It would be a good idea to start searching from the end of the string:
def lastNum(st):
    # st[::-1] is the reverse of st
    for s in st[::-1]:
        if s.isdigit():
            return int(s)
    return -5
I don't understand the intended logic behind your code, but this simpler one should work:
def LastNum(st):
    Result = -5
    for i in st:
        if i.isdigit():
            # Each later digit overwrites the earlier one, so the last digit wins
            Result = int(i)
    return Result
You can use a regex as follows, too.
import re
def LastNum(st):
    result = -5
    # r'\d' matches single digits, so the last element is the last digit
    regex_res = re.findall(r'\d', st)
    if regex_res:
        result = int(regex_res[-1])
    return result
print(LastNum("1*2*3*"))
Output:
3
A nice one-liner (probably not the most readable way to do it, though):
import re
def LastNum(st, default=-5):
    return int((re.findall(r'\d', st) or [default])[-1])
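The reverse-scan idea from the first answer can also be written without a regex by combining reversed() with next() and a default. This is a sketch, not one of the posted answers; the name last_digit is an assumption, and it assumes the input contains only ASCII digits.

```python
def last_digit(st, default=-5):
    """Return the last digit in st as an int, or default if st has no digits."""
    # reversed() walks the string from the end, so the first hit is the last digit.
    # Assumption: digits are ASCII; str.isdigit() also accepts characters such as
    # superscripts that int() would reject.
    return next((int(ch) for ch in reversed(st) if ch.isdigit()), default)

print(last_digit("1*2*3*"))  # 3
print(last_digit("****"))    # -5
```

Because the generator is lazy, the scan stops at the first digit found from the end.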

Match RegEx pattern within pattern

Folks,
I have to match the following pattern:
First letter must be N
Second any letter except P
Third must be S or T
and the Fourth any letter except P again.
The string is only capital letters, no number, white spaces, etc.
So using python this is what I got so far:
import re
strRegex = r"N[^P][ST][^P]"
objRegex = re.compile(strRegex)
print(objRegex.findall('NNSTL'))
This will print: ['NNST']
What I expect is: NNST and NSTL
Thanks
re.findall only returns non-overlapping matches.
Try this:
>>> import re
>>> strRegex = r"N[^P][ST][^P]"
>>> regex = re.compile(strRegex)
>>> def newfind(regex, text, pos=0):
...     result = regex.search(text, pos)
...     if result is None: return []
...     else: return [result.group()] + newfind(regex, text, result.start() + 1)
...
>>> newfind(regex, 'NNSTL')
['NNST', 'NSTL']
Reference: https://mail.python.org/pipermail/tutor/2005-September/041126.html
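Another common way to get overlapping matches is to wrap the pattern in a capturing group inside a zero-width lookahead, so that each match consumes no characters. This is a sketch; the helper name find_overlapping is an assumption, and it assumes the pattern passed in has no capturing groups of its own (otherwise re.findall returns tuples).

```python
import re

def find_overlapping(pattern, text):
    """Return all matches of pattern in text, including overlapping ones."""
    # The lookahead matches at every position where the pattern could start,
    # and the inner group captures what the pattern would have matched there.
    return re.findall(r'(?=(%s))' % pattern, text)

print(find_overlapping(r'N[^P][ST][^P]', 'NNSTL'))  # ['NNST', 'NSTL']
```

This avoids the recursion in newfind and lets the regex engine step through the start positions itself.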

Python how to eliminate 3 or more consecutive letters

I am trying to shorten words in which a letter is repeated more than 3 times in a row, for example realllllly to really.
pattern = re.compile(r"(.)\1\1{2,}", re.DOTALL)
return pattern.sub(r"\1\1\1", text)
I can't get it to work. Can anyone help?
Your solution actually appears to be working correctly:
>>> import re
>>> a = 'foooooooo baaaar'
>>> reg = re.compile( r"(.)\1\1{2,}")
>>> reg.sub(r'\1\1', a)
'foo baar'
But based on your comment, you want to replace xyyyx with xyyx, while your regexp requires at least 4 repetitions, so only xyyyyx gets replaced. Simply change this line:
>>> reg = re.compile( r"(.)\1{2,}")
>>> reg.sub(r'\1\1', 'fooo baaaar actuallly')
'foo baar actually'
I'd suggest not using regular expressions when they're not necessary. This task can be accomplished easily without them, in a more readable fashion, with linear time and constant space complexity (not sure about the regex).
def filter_repetitions(text, max_repetitions=0):
    last_character = None
    repetition_count = 0
    for character in text:
        if character == last_character:
            repetition_count += 1
        else:
            last_character = character
            repetition_count = 0
        # Yield the character only while the run is within the allowed length
        if repetition_count <= max_repetitions:
            yield character
print(''.join(filter_repetitions("fooo baaaar actuallly", 1)))
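The same run-limiting idea can be sketched with itertools.groupby, which splits the text into runs of identical characters. This is an alternative under stated assumptions, not the answerer's code; the name limit_runs and the max_run parameter (the maximum run length kept, not the number of extra repetitions) are assumptions for illustration.

```python
from itertools import groupby, islice

def limit_runs(text, max_run=2):
    """Truncate every run of identical characters to at most max_run characters."""
    # groupby yields one group per run; islice keeps only the first max_run items
    return ''.join(''.join(islice(group, max_run)) for _, group in groupby(text))

print(limit_runs("fooo baaaar actuallly"))  # foo baar actually
print(limit_runs("realllllly"))             # really
```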

Find inside a string in Python

There is a string that contains numbers and other characters.
I need to find every whole number in that string that contains the digits 467033,
e.g. 1.467033777777777
Thanks
Try this:
import re
RE_NUM = re.compile(r'(\d*\.\d+)', re.M)
text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
for num in RE_NUM.findall(text):
    if '467033' in num:
        print(num)
Prints:
1.4670337777777773
Generalized / optimized in response to a comment (this reuses re and text from the snippet above):
def find(text, numbers):
    # One alternative per number: the number itself, surrounded by any digits or dots
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % n for n in numbers)
    re_num = re.compile(pattern, re.M)
    return [m.group() for m in re_num.finditer(text)]
print(find(text, ['467033', '13']))
Prints:
['135.13508', '1.4670337777777773']
If you're just searching for a substring within another string, you can use in:
>>> sub_num = "467033"
>>> my_num = "1.467033777777777"
>>> sub_num in my_num
True
However, I suspect there's more to your problem than just searching strings, and that doing it this way might not be optimal. Can you be more specific about what you're trying to do?
import re
a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
r = re.compile('[0-9.]*467033[0-9.]*')
r.findall(a)
['1.467033777777777', '576575567467033546.90']
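The last snippet can be wrapped in a small function that takes the digits to look for as a parameter. This is a sketch; the name numbers_containing is an assumption, and re.escape is added so that a needle containing a dot is matched literally instead of as a wildcard.

```python
import re

def numbers_containing(text, needle):
    """Return every run of digits and dots in text that contains needle."""
    # re.escape makes sure characters such as '.' in needle are matched literally
    pattern = re.compile(r'[0-9.]*' + re.escape(needle) + r'[0-9.]*')
    return pattern.findall(text)

sample = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
print(numbers_containing(sample, '467033'))
# ['1.467033777777777', '576575567467033546.90']
```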
