I have a string that contains digits and other characters.
I need to find every complete number in that string that contains the digits 467033,
e.g. 1.467033777777777
Thanks
Try this:
import re
RE_NUM = re.compile(r'(\d*\.\d+)', re.M)
text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
for num in RE_NUM.findall(text):
    if '467033' in num:
        print(num)
Prints:
1.4670337777777773
Generalized / optimized in response to a comment:
def find(text, numbers):
    # One alternative per search term: the term with any digits or dots around it.
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % n for n in numbers)
    re_num = re.compile(pattern, re.M)
    return [m.group() for m in re_num.finditer(text)]
print(find(text, ['467033', '13']))
Prints:
['135.13508', '1.4670337777777773']
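Note that \d*\.\d+ only matches numbers with a decimal point, and [\d.]* also accepts strings such as 1.2.3. If integers should be found too, one option is to extract every number first and filter afterwards. This is only a sketch: the names NUMBER and numbers_containing are made up for illustration, and it assumes a number is a run of digits with an optional fractional part.

```python
import re

# Assumption: a "number" is digits with an optional fractional part, e.g. 83 or 1.467.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def numbers_containing(text, digits):
    """Return every whole number in text whose characters include `digits`."""
    return [n for n in NUMBER.findall(text) if digits in n]

sample = "eghwodugo83o135.13508yegn1.4670337777777773u87208t id9467033"
print(numbers_containing(sample, "467033"))
# ['1.4670337777777773', '9467033']
```

Like the versions above, this does not find a digit sequence that spans the decimal point.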
If you're just searching for a substring within another string, you can use in:
>>> sub_num = "467033"
>>> my_num = "1.467033777777777"
>>> sub_num in my_num
True
However, I suspect there's more to your problem than just searching strings, and that doing it this way might not be optimal. Can you be more specific about what you're trying to do?
import re
a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
r = re.compile('[0-9.]*467033[0-9.]*')
print(r.findall(a))
Prints:
['1.467033777777777', '576575567467033546.90']
I have the following string:
"TAUXXTAUXXTAUXX"
I want to make a list that contains the following:
lst = ["TAUXX", "TAUXX", "TAUXX"]
How can I do this, and is there a string library in Python for it?
Thanks in advance.
P.S.: I want it in Python.
Find the string in its double:
s = 'TAUXXTAUXXTAUXX'
i = (s * 2).find(s, 1)
lst = len(s) // i * [s[:i]]
print(lst)
Output:
['TAUXX', 'TAUXX', 'TAUXX']
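Why the doubling trick works: s can reappear inside s + s at an offset between 1 and len(s) - 1 only if s is built from a repeated block, and the first such offset is the length of the shortest block. For a string with no repetition, the first hit is at len(s) itself. Here is the same idea wrapped in a function as a sketch; the name split_repeats and the empty-string handling are my own additions.

```python
def split_repeats(s):
    """Split s into copies of its shortest repeating block."""
    if not s:
        # Assumption: an empty string yields an empty list.
        return []
    # First offset > 0 at which s occurs inside s + s.
    # This equals len(s) when s has no shorter repeating block.
    period = (s * 2).find(s, 1)
    return [s[:period]] * (len(s) // period)

print(split_repeats("TAUXXTAUXXTAUXX"))  # ['TAUXX', 'TAUXX', 'TAUXX']
print(split_repeats("abc"))              # ['abc']
```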
There are many ways to do this.
I recommend the built-in re module:
import re
test_str = "TAUXXTAUXXTAUXX"
def splitstring(string):
    # Lazily grow group 1 until repeating it covers the whole string.
    match = re.match(r'(.*?)(?:\1)*$', string)
    word = match.group(1)
    return [word] * (len(string) // len(word))
print(splitstring(test_str))
Output:
['TAUXX', 'TAUXX', 'TAUXX']
I am trying to recognize a pattern in a list of words: I need to find a number of 1 to 6 digits, with or without other characters around it.
My input is this image: https://i.stack.imgur.com/RNOdL.png
With OCR I obtained:
Kundennummer:
21924
The pattern r"(\D|\A)+\d{5}(\D|\Z)+" works, but when I change it to r"(\D|\A)+\d{1,6}(\D|\Z)+" it doesn't.
I tried re.match, re.findall and re.search, and none of them works.
The repr() of the OCR output:
'Kundennummer:'
'21924'
Assuming you only need the first match:
import re
ocr_result = """
Kundennummer:
21924
"""
for result in re.findall(r'\d+', ocr_result):
    if 1 <= len(result) <= 6:
        break
else:
    # No run of 1 to 6 digits was found.
    result = None
print(result)
Result:
21924
import re
ocr_result1 = """
Kundennummer:
21924
"""
ocr_result2 = """
Kundennummer:3000
"""
for e in [ocr_result1, ocr_result2]:
    print(re.findall(r'\w*\d{1,6}\w*', e))
['21924']
['3000']
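One caveat with the answers above: \d+ needs a separate length check, and \w*\d{1,6}\w* does not actually cap the length, because \w also matches digits. If a run of more than 6 digits should be rejected entirely, lookarounds are one way to express that. A sketch, assuming a match must not be part of a longer run of digits (the name NUMBER and the third sample string are made up for illustration):

```python
import re

# Runs of 1 to 6 digits that are not part of a longer run of digits.
NUMBER = re.compile(r"(?<!\d)\d{1,6}(?!\d)")

samples = [
    "Kundennummer:\n21924\n",
    "Kundennummer:3000",
    "ID 1234567 ref A42b",  # hypothetical input: 7 digits are rejected, 42 is kept
]
for text in samples:
    print(NUMBER.findall(text))
# ['21924']
# ['3000']
# ['42']
```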
I have a list of strings similar to the one below:
l = ['ad2g3f234','4jafg32','fg23g523']
For each string in l, I want to delete every digit (except for 2 and 3 if they appear as 23). So in this case, I want the following outcome:
n = ['adgf23','jafg','fg23g23']
How do I go about this? I tried re.findall like:
w = [re.findall(r'[a-zA-Z]+',t) for t in l]
but it doesn't give my desired outcome.
You can capture 23 in a group, and remove all other digits. In the replacement, use the group which holds 23 if it is there, else replace with an empty string.
import re
l = ['ad2g3f234', '4jafg32', 'fg23g523']
result = [
    re.sub(
        r"(23)|(?:(?!23)\d)+",
        # Keep the match only when group 1 captured "23"; drop other digits.
        lambda m: m.group(1) if m.group(1) else "",
        s,
    )
    for s in l
]
print(result)
Output
['adgf23', 'jafg', 'fg23g23']
One way would be to replace twice: protect 23 with a placeholder, strip the digits, then restore it (this assumes * never occurs in the input):
[re.sub(r"\d", "", i.replace("23", "*")).replace("*", "23") for i in l]
Output:
['adgf23', 'jafg', 'fg23g23']
Use a placeholder with re.sub:
l = ['ad2g3f234','4jafg32','fg23g523']
w = [re.sub('#', '23', re.sub(r'\d', '', re.sub('23', '#', t))) for t in l]
Output:
['adgf23', 'jafg', 'fg23g23']
EDIT
As Chris's answer shows, the approach is the same, although plain str.replace is a better choice than re.sub for the fixed strings.
Using re.sub with a function
import re
def replace(m):
    if m.group() == '23':
        return m.group()
    else:
        return ''
l = ['ad2g3f234','4jafg32','fg23g523']
w = [re.sub(r'23|\d', replace, x) for x in l]
#w: ['adgf23', 'jafg', 'fg23g23']
Explanation
re.sub(r'23|\d', replace, x)
- checks first for 23, next for a digit
- replace function leaves alone match with 23
- changes match with digit to null string.
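The order of the alternatives matters here, because the regex engine tries them left to right at each position. A small sketch that shows both orders side by side (keep_23 is just an illustrative name for the same replacement function):

```python
import re

def keep_23(match):
    # Keep the two-character match "23"; drop any single digit.
    return match.group() if match.group() == "23" else ""

strings = ["ad2g3f234", "4jafg32", "fg23g523"]

# "23" is tried first at each position, so it wins over the single digit.
print([re.sub(r"23|\d", keep_23, s) for s in strings])
# ['adgf23', 'jafg', 'fg23g23']

# With the order reversed, \d always matches first and "23" is never seen.
print([re.sub(r"\d|23", keep_23, s) for s in strings])
# ['adgf', 'jafg', 'fgg']
```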
I am trying to shorten words that contain a letter repeated more than 3 times in a row, for example realllllly to really.
pattern = re.compile(r"(.)\1\1{2,}", re.DOTALL)
return pattern.sub(r"\1\1\1", text)
I can't get it to work. Can anyone help?
Your solution actually appears to be working correctly:
>>> import re
>>> a = 'foooooooo baaaar'
>>> reg = re.compile( r"(.)\1\1{2,}")
>>> reg.sub(r'\1\1', a)
'foo baar'
But based on your comment, you want to replace xyyyx with xyyx, whereas your regexp requires at least 4 repeated characters, so only runs such as xyyyyx get replaced. Simply change this line:
>>> reg = re.compile( r"(.)\1{2,}")
>>> reg.sub(r'\1\1', 'fooo baaaar actuallly')
'foo baar actually'
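If the allowed run length should be adjustable, the quantifier can be built from a parameter. This is a sketch only: limit_repeats and max_run are names I made up, and it assumes max_run is at least 1.

```python
import re

def limit_repeats(text, max_run=2):
    """Collapse any run of one character longer than max_run down to max_run."""
    # (.)\1{max_run,} matches a character followed by at least max_run copies,
    # i.e. a run of max_run + 1 or more identical characters.
    pattern = re.compile(r"(.)\1{%d,}" % max_run, re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * max_run, text)

print(limit_repeats("fooo baaaar actuallly"))  # foo baar actually
print(limit_repeats("realllllly"))             # really
print(limit_repeats("realllllly", max_run=3))  # reallly
```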
I'd suggest not to use regular expressions when they're not necessary. This task can be accomplished easily without, in a more readable fashion, with linear time and constant space complexity (not sure about the regex).
def filter_repetitions(text, max_repetitions=0):
    # Yield each character unless it repeats the previous one too many times.
    last_character = None
    repetition_count = 0
    for character in text:
        if character == last_character:
            repetition_count += 1
        else:
            last_character = character
            repetition_count = 0
        if repetition_count <= max_repetitions:
            yield character
print(''.join(filter_repetitions("fooo baaaar actuallly", 1)))
I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried this:
p = re.compile("\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives the output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?
Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    # subn returns the new string and the number of replacements made.
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'
Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and one for three levels is quite long. And you don't want to think about four levels. ;-)
You can use PyParsing to parse the string:
from pyparsing import nestedExpr
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?
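If you would rather avoid an extra dependency, a hand-written scan with a depth counter also handles arbitrary nesting. It does iterate over the characters, which the question wanted to avoid, but it is a single pass. A minimal sketch; the function name strip_parenthesized and the handling of an unmatched ")" are my own assumptions:

```python
def strip_parenthesized(s):
    """Return s without any text inside (possibly nested) parentheses."""
    depth = 0
    kept = []
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Assumption: an unmatched ")" is silently dropped.
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Only characters outside all parentheses are kept.
            kept.append(ch)
    return "".join(kept)

print(strip_parenthesized("AX(p>q)&E((-p)Ur)"))  # AX&E
```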
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    # Remove the innermost pairs; stop once nothing was replaced.
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
print(first + last)  # concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
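print(max(user))  # character with the highest code point
print(min(user))  # character with the lowest code point
print(user.join(["1", "2", "3", "4"]))  # join needs strings; user is the separator
input()  # wait for Enter before the script exits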