This question already has answers here:
Regex matching 5-digit substrings not enclosed with digits
(2 answers)
Closed 2 years ago.
I need to extract from text any run of exactly 3 or 4 consecutive digits, not longer. Here's an example:
text = 'abc 123\n ab3245ss a234234234234\n 12'
I'm trying this:
re.findall(r'\d{3,4}', text)
What I'm expecting:
['123', '3245']
What I'm getting:
['123', '3245', '2342', '3423', '4234']
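When you use a positive lookahead (?=\D) or lookbehind (?<=\D), there has to be an actual non-digit character at that position, so a number at the very start or end of the string is not matched.
If you also want to match, for example, a string consisting only of 123, assert that what is on the left and on the right is not a digit, using a negative lookbehind and a negative lookahead: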
(?<!\d)\d{3,4}(?!\d)
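A minimal runnable sketch of the negative-lookaround pattern in Python, run on the question's sample text and on a bare "123" (that second input is just an illustration):

```python
import re

# A run of 3 or 4 digits with no digit immediately before or after it.
# Lookarounds are zero-width, so they also succeed at the very start
# or end of the string, where there is no character to inspect.
PATTERN = r'(?<!\d)\d{3,4}(?!\d)'

text = 'abc 123\n ab3245ss a234234234234\n 12'
print(re.findall(PATTERN, text))   # ['123', '3245']
print(re.findall(PATTERN, '123'))  # ['123']
```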
import re
text = 'abc 123\n ab3245ss a234234234234\n 12'
re.findall(r'(?<=\D)(\d{3,4})(?=\D)', text)  # ['123', '3245']
As a number of answers suggest, you need to explicitly check that the characters before and after the match are not digits.
But there might also be no character at all before or after the number, so let's handle that as well:
re.findall(r"(?:^|[^\d])(\d{3,4})(?:$|[^\d])", text)
# (?:^|[^\d]) handles the leading character
# (?:$|[^\d]) handles the trailing character
Please try the regex below. It matches numbers that are surrounded by non-digits.
(?<=\D)(\d{3,4})(?=\D)
Edit: Use re.findall(r"(?:\D|^)(\d{3,4})(?:\D|$)", text) to also match numbers that occur at the very start or end of the string.
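The approaches in these answers differ at the edges of the string. The sketch below compares them on a made-up input, "123 456", where two numbers are separated by a single space:

```python
import re

text = "123 456"  # illustrative input, not from the question

# Positive lookarounds need a real non-digit on each side, so numbers
# touching the start or end of the string are missed.
lookaround_pos = re.findall(r'(?<=\D)(\d{3,4})(?=\D)', text)

# (?:\D|^) / (?:\D|$) handle the string edges, but they consume the
# separator: the space after "123" is used up by the first match,
# so "456" has nothing left in front of it.
consuming = re.findall(r'(?:\D|^)(\d{3,4})(?:\D|$)', text)

# Negative lookarounds are zero-width and succeed at the edges.
lookaround_neg = re.findall(r'(?<!\d)\d{3,4}(?!\d)', text)

print(lookaround_pos)  # []
print(consuming)       # ['123']
print(lookaround_neg)  # ['123', '456']
```

The consuming version loses the second number because re.findall never reuses characters that an earlier match already consumed; zero-width lookarounds avoid that problem.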
This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 2 years ago.
I'm trying to restrict phone-number input to:
1-16 digits OR
A single "+" followed by 1-16 digits.
This is my code:
import re

txt = "+++854"
x = re.search(r"([\+]{0,1}[0-9]{3,16})|([0-9]{3,16})", txt)
# Also tried: r"^\+[0-9]{1,16}|[0-9]{1,16}"  (starts with +, followed by digits)
if x:
    print("YES! We have a match!")
else:
    print("No match")
# That's a match
Yet it yields a match. I also tried "^+{0,1}[0-9]{1,16}|[0-9]{1,16}", but although it works on https://regex101.com/r/aP0qH2/4, it doesn't work in my code the way I think it should.
re.search searches for "the first location where the regular expression pattern produces a match" and returns the resulting match object. In the string "+++854", the substring "+854" matches.
To match the whole string, use re.fullmatch (Python 3.4+); re.match only anchors the match at the start of the string. The documentation has a section about the difference between re.match and re.search.
pattern = r"\+?[0-9]{1,16}"  # optional "+", then 1 to 16 digits
x = re.fullmatch(pattern, txt)
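A sketch of the whole-string check, assuming Python 3.4+ for re.fullmatch; the helper name is_valid_phone and the test inputs are hypothetical:

```python
import re

# Optional single "+" followed by 1 to 16 digits (the question's rules).
PHONE = re.compile(r"\+?[0-9]{1,16}")

def is_valid_phone(txt: str) -> bool:  # hypothetical helper name
    # fullmatch succeeds only if the pattern covers the entire string.
    return PHONE.fullmatch(txt) is not None

for candidate in ["+++854", "+854", "854", "+", "1" * 17]:
    print(repr(candidate), is_valid_phone(candidate))

# re.match only anchors at the start, so trailing junk still "matches":
print(PHONE.match("854abc") is not None)  # True
print(PHONE.fullmatch("854abc"))          # None
```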
This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
Can someone please help me with this? I'm trying to extract the word in a sentence that consists of a number followed by one of the units G, GM, KG, L, ML, PCS.
I am able to match the string, but I'm not sure how to extract the complete word.
For example, if my input is "This packet contains 250G Dates", the output should be 250G.
Another example: for "You paid for 2KG Apples", the output should be 2KG.
With my regular expression I only get the matched unit, not the complete word :(
import re

val = 'FUJI ALUMN FOIL CAKE, 240G, CHCLTE'
key_vals = ['G', 'GM', 'KG', 'L', 'ML', 'PCS']
re.findall(r"\d+\.?\d*(\s|G|KG|GM|L|ML|PCS)\s?", val)  # returns ['G'], not '240G'
This regex will not get you what you want:
re.findall(r"\d+\.?\d*(\s|G|KG|GM|L|ML|PCS)\s?", val)
Let's break it down:
\d+: one or more digits
\.?: a dot (optional, as indicated by the question mark)
\d*: zero or more digits
(\s|G|KG|GM|L|ML|PCS): a capturing group of alternatives. Whitespace is one of the alternatives, but it should be outside the group: what you probably want is to allow optional whitespace between the number and the unit, i.e. 240G or 240 G
\s?: optional whitespace
A better expression for your purpose could be:
re.findall(r"\d+\s*(?:G|KG|GM|L|ML|PCS)", val)
That means: one or more digits, followed by optional whitespace and then either of these units: G|KG|GM|L|ML|PCS.
Note the ?:, which makes the group non-capturing. Without it, re.findall would return only the captured unit (G) instead of the whole match (240G).
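To see what the ?: changes, here is re.findall on the question's val with a capturing and a non-capturing group:

```python
import re

val = 'FUJI ALUMN FOIL CAKE, 240G, CHCLTE'

# With one capturing group, findall returns only what that group captured.
capturing = re.findall(r"\d+\s*(G|KG|GM|L|ML|PCS)", val)

# With no capturing groups, findall returns the whole match.
non_capturing = re.findall(r"\d+\s*(?:G|KG|GM|L|ML|PCS)", val)

print(capturing)      # ['G']
print(non_capturing)  # ['240G']
```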
Try using this regex:
\d+\s*(G|KG|GM|L|ML|PCS)\s?
It matches every string that starts with at least one digit followed by one of the units. There may also be whitespace between the digits and the unit, and after the unit.
Adjust it as you like :)
Use non-capturing parentheses (?:...) instead of ordinary ones. When the pattern has no capturing groups, findall returns the strings that match the whole pattern.
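One further pitfall, not raised in the answers: alternatives are tried left to right, so with G listed before GM, an input such as the hypothetical "500GM" yields 500G. The sketch below lists the longer units first and adds a trailing word boundary; the units list mirrors key_vals, and the extra sample sentences are illustrative:

```python
import re

units = ['G', 'GM', 'KG', 'L', 'ML', 'PCS']  # same units as key_vals

# Naive order: "G" is tried first, so "500GM" stops after the G.
naive = re.findall(r"\d+\s*(?:G|KG|GM|L|ML|PCS)", "Pack of 500GM sugar")

# Try longer units first so "GM" wins over "G"; \b rejects a unit that
# is only the start of a longer word. (?:\.\d+)? allows decimals like 1.5.
unit_alt = "|".join(re.escape(u) for u in sorted(units, key=len, reverse=True))
pattern = re.compile(rf"\d+(?:\.\d+)?\s*(?:{unit_alt})\b")

samples = [
    "This packet contains 250G Dates",
    "You paid for 2KG Apples",
    "Pack of 500GM sugar",
    "1.5L water",
]
results = [pattern.findall(s) for s in samples]
print(naive)    # ['500G']
print(results)  # [['250G'], ['2KG'], ['500GM'], ['1.5L']]
```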
This question already has answers here:
Regex: match word that ends with "Id"
(6 answers)
Closed 3 years ago.
I want to get the words that end with 'd', 'g' or 'y' and contain at least five characters.
text = 'Did David go swimming yesterday'
The result should be 'David', 'swimming' and 'yesterday'.
How to use regex to do it?
Thank you.
Try this regex: [^\s]{4,}[dgy]\b
[^\s] to match any non-whitespace character.
{4,} to match at least 4 of the characters described above.
[dgy] to match the last character of the word.
\b to match a word boundary.
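A quick check of the answer's pattern in Python. The second sentence and the \b\w{4,}[dgy]\b variant are illustrative additions, showing that [^\s] also lets punctuation into the match:

```python
import re

sentence = 'Did David go swimming yesterday'

# The answer's pattern: 4+ non-whitespace chars, then d/g/y at a word boundary.
answer = re.findall(r'[^\s]{4,}[dgy]\b', sentence)
print(answer)  # ['David', 'swimming', 'yesterday']

# [^\s] also accepts punctuation, so a leading bracket leaks in.
# \b\w{4,} restricts the match to word characters starting at a word boundary.
other = 'He (played) well'
loose = re.findall(r'[^\s]{4,}[dgy]\b', other)
strict = re.findall(r'\b\w{4,}[dgy]\b', other)
print(loose)   # ['(played']
print(strict)  # ['played']
```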
I would recommend checking out https://regexr.com/ to help you solve a regex problem.
This question already has answers here:
Regex whitespace word boundary
(3 answers)
Closed 4 years ago.
I tried to replace a specific number such as 22 in my string with a string such as "Hi there", but it also replaces the 22 in floats such as 22.14 (giving "Hi there.14").
import re
my_string = "22 and 22.14"
re.sub(r'\b22\b', "Hi there", my_string)
You can use this regex, which avoids matching decimal values: a positive lookahead ensures that 22 matches only if it is followed by a space or the end of the input.
\b22(?= |$)
If you also want 22 to match at the end of a line that ends with a period (e.g. "... 22."), you can use a negative lookahead instead:
\b22(?!\.\d)
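A comparison of the patterns on a made-up string that contains 22.14, 220 and a sentence-final "22.". Note that \b22(?!\.\d) on its own also matches the start of 220; keeping the question's trailing \b and adding the negative lookahead, as in the last pattern (a combination suggested here, not taken from the answer), handles every case in this sample:

```python
import re

# Illustrative input (not from the question) covering the edge cases.
s = "22 and 22.14, but not 220. Ends with 22."

original = re.sub(r'\b22\b', "Hi there", s)            # also hits 22.14
space_or_end = re.sub(r'\b22(?= |$)', "Hi there", s)    # misses the final "22."
neg_lookahead = re.sub(r'\b22(?!\.\d)', "Hi there", s)  # also hits 220
combined = re.sub(r'\b22\b(?!\.\d)', "Hi there", s)

print(original)       # Hi there and Hi there.14, but not 220. Ends with Hi there.
print(space_or_end)   # Hi there and 22.14, but not 220. Ends with 22.
print(neg_lookahead)  # Hi there and 22.14, but not Hi there0. Ends with Hi there.
print(combined)       # Hi there and 22.14, but not 220. Ends with Hi there.
```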
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Suppose I have a string:
s = 'F3·Compute·Introduction to Methematical Thinking.pdf'
I try to replace F3·Compute· with '' using a regex:
In [23]: re.sub(r'F3?Compute?', '',s)
Out[23]: 'F3·Compute·Introduction to Methematical Thinking.pdf'
It didn't work as I intended.
But when I tried
In [21]: re.sub(r'F3·Compute·', '', 'F3·Compute·Introduction to Methematical Thinking.pdf')
Out[21]: 'Introduction to Methematical Thinking.pdf'
What's the problem with my regex pattern?
The question mark ? does not stand in for a single character in regular expressions. It means 0 or 1 of the preceding item, so in your pattern it made the 3 and the final e optional. Instead, . is what you're looking for: it is a wildcard that matches any single character except a newline (and has nothing to do with your middle-dot character; that is just a coincidence).
re.sub(r'F3.Compute.', '',s)
Use a dot to match any single character. This answer is written for Python 2, where the byte string has to be decoded to unicode first:
#coding: utf-8
import re
s = 'F3·Compute·Introduction to Methematical Thinking.pdf'
output = re.sub(r'F3.Compute.', '', unicode(s,"utf-8"), flags=re.U)
print output
Your original pattern, 'F3?Compute?', did not have the desired effect. It says to match F, optionally followed by the digit 3. You also made the final e of Compute optional. In any case, you were not accounting for the separator characters.
Note also that, in Python 2, we must match on the unicode version of the string rather than on the UTF-8 byte string. The middle dot · is encoded as two bytes, so without decoding, a single dot matches only half of the separator you are trying to target. In Python 3, str is already Unicode, so re.sub(r'F3.Compute.', '', s) works directly.
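For readers on Python 3, a sketch of the same fix with no decoding step, plus a variant that matches the prefix literally (the re.escape variant is an addition, not from the answers):

```python
import re

s = 'F3·Compute·Introduction to Methematical Thinking.pdf'

# "?" makes the preceding "3" and "e" optional; it never consumes "·",
# so this pattern finds nothing in s.
failed = re.search(r'F3?Compute?', s)
print(failed)  # None

# Python 3 str is Unicode, so "." matches the whole "·" character.
dot_version = re.sub(r'F3.Compute.', '', s)

# Or match the separator literally; re.escape guards against
# metacharacters if the prefix ever comes from elsewhere.
literal_version = re.sub(re.escape('F3·Compute·'), '', s)

print(dot_version)      # Introduction to Methematical Thinking.pdf
print(literal_version)  # Introduction to Methematical Thinking.pdf
```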