This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 2 years ago.
I'm trying to limit phone number input to:
1-16 digits OR
A single "+" followed by 1-16 digits.
This is my code
import re
txt = "+++854"
x = re.search(str("([\+]{0,1}[0-9]{3,16})|([0-9]{3,16})"), txt)
###^\+[0-9]{1,16}|[0-9]{1,16}", txt) #startswith +, followed by numbers.
if x:
    print("YES! We have a match!")
else:
    print("No match")
# That's a match
Yet it yields a match. I also tried "^+{0,1}[0-9]{1,16}|[0-9]{1,16}", but although it works in "https://regex101.com/r/aP0qH2/4", it doesn't behave in my code the way I think it should.
re.search searches for "the first location where the regular expression pattern produces a match" and returns the resulting match object. In the string "+++854", the substring "+854" matches.
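To match the whole string, use re.fullmatch. re.match is not enough on its own, because it anchors the pattern only at the start of the string and ignores anything left over after the match. The documentation has a section about the difference between re.match and re.search. The digit count also has to be {1,16} to allow 1-16 digits:
pattern = r"\+?[0-9]{1,16}"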
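As a quick check of the difference between searching and matching the whole string, here is a minimal sketch. The helper name is_valid_phone is made up for illustration, and the pattern assumes the 1-16 digit rule stated in the question.

```python
import re

# Optional single "+", then 1 to 16 digits (the rule stated in the question).
PHONE = re.compile(r"\+?[0-9]{1,16}")

def is_valid_phone(text):
    """Hypothetical helper: True only if the whole string fits the pattern."""
    return PHONE.fullmatch(text) is not None

txt = "+++854"
found = PHONE.search(txt)          # scans for a match anywhere in the string
print(found.group())               # +854
print(is_valid_phone(txt))         # False
print(is_valid_phone("+854"))      # True
print(is_valid_phone("854abc"))    # False, although re.match alone would accept it
```

If re.fullmatch is not an option, anchoring the pattern with ^ and $ (or \Z) and calling re.search gives a similar whole-string check.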
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 1 year ago.
What does this pattern (?<=\w)\W+(?=\w) mean in a Python regular expression?
#l is a list
print(re.sub("(?<=\w)\W+(?=\w)", " ", l))
Here's a breakdown of the elements:
\w means a word character: a letter, a digit, or an underscore
\W+ is the opposite of \w; with the + it means one or more non-word characters
(?<=...) is called a "lookbehind assertion"
(?=...) is a "lookahead assertion"
So this re.sub statement means "if there are one or more non-word characters with a word character immediately before and after, replace the non-word character(s) with a single space".
And by the way, the third argument to re.sub must be a string (or bytes-like object); it can't be a list.
Just put it into a site like regex101.com and hover the cursor over the parts.
https://regex101.com/r/JtrWIw/1
It matches runs of non-word characters that sit between word characters. In the string below, for example, that is everything between the final 'd' of the first 'word' and the initial 'w' of the second 'word'.
word^&*((*&^%$%^&*& ^%$£%^&**&^%$£!"£$%^&*()word
Example:
import re
#if it is a list...
l = ['John Smith', 'This%^&*(string', 'Never!£$Mind^&*I$?/Solved{}][]It']
#l is a list
print(re.sub(r"(?<=\w)\W+(?=\w)", " ", l[2]))
Never Mind I Solved It
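Since re.sub takes a single string, one way to handle the whole list is a list comprehension. This is only a sketch using the sample list from the answer above; the extra string with leading and trailing punctuation is made up to show that runs without a word character on both sides are left alone.

```python
import re

# Non-word run with a word character directly before and after it.
between_words = re.compile(r"(?<=\w)\W+(?=\w)")

l = ['John Smith', 'This%^&*(string', 'Never!£$Mind^&*I$?/Solved{}][]It']

# Apply the substitution to each element separately.
cleaned = [between_words.sub(" ", item) for item in l]
print(cleaned)

# Punctuation at the very start or end has no word character on one side,
# so the lookarounds fail there and it is kept.
edges = between_words.sub(" ", "!!hello, world??")
print(edges)   # !!hello world??
```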
This question already has answers here:
Regex matching 5-digit substrings not enclosed with digits
(2 answers)
Closed 2 years ago.
I need to extract from text any run of exactly 3 or 4 consecutive digits, not longer. Here's an example:
text = 'abc 123\n ab3245ss a234234234234\n 12'
I'm trying this:
re.findall(r'\d{3,4}', text)
What I'm expecting:
['123', '3245']
What I'm getting:
['123', '3245', '2342', '3423', '4234']
When using a positive lookahead (?=\D) or lookbehind (?<=\D), there has to be a character present.
If you also want to match a string that consists only of 123, for example, you can instead assert that what is on the left and on the right is not a digit, using a negative lookbehind and a negative lookahead.
(?<!\d)\d{3,4}(?!\d)
text = 'abc 123\n ab3245ss a234234234234\n 12'
re.findall(r'(?<=\D)(\d{3,4})(?=\D)',text)
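To see where the positive and negative lookarounds differ, here is a small comparison. Both patterns appear in the answers; the second input string is made up so that the numbers sit at the very start and end of the string.

```python
import re

positive = re.compile(r"(?<=\D)\d{3,4}(?=\D)")   # needs a non-digit on both sides
negative = re.compile(r"(?<!\d)\d{3,4}(?!\d)")   # only forbids a digit on either side

text = 'abc 123\n ab3245ss a234234234234\n 12'
edge_case = '123 abc 4567'   # made-up input: numbers at the very start and end

print(positive.findall(text))       # ['123', '3245']
print(negative.findall(text))       # ['123', '3245']
print(positive.findall(edge_case))  # []
print(negative.findall(edge_case))  # ['123', '4567']
```

On the question's text both give the expected result, because every number there has a character on each side; only the negative version also covers numbers at the string boundaries.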
As a number of answers are suggesting, you need to explicitly check that the characters before and after your string are not numbers.
But additionally, there might not be any character before or after the number. Let's handle that as well.
re.findall(r"(?:^|[^\d])(\d{3,4})(?:$|[^\d])", text)
#            ↑↑↑↑↑↑↑↑↑↑↑         ↑↑↑↑↑↑↑↑↑↑↑
#            handles the         handles the
#            leading character   trailing character
Please try the regex below. It matches numbers that are surrounded by non-digits.
(?<=\D)(\d{3,4})(?=\D)
Edit: Use re.findall(r"(?:\D|^)(\d{3,4})(?:\D|$)",text) to also match numbers that occur at the start or end of the string.
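One caveat with the group-based variants: the surrounding non-digit is consumed as part of the match, so two numbers separated by a single character cannot both be found. The sketch below compares the pattern from the edit above with the zero-width lookaround version; the sample string is made up for illustration.

```python
import re

consuming = re.compile(r"(?:\D|^)(\d{3,4})(?:\D|$)")   # pattern from the edit above
lookaround = re.compile(r"(?<!\d)\d{3,4}(?!\d)")        # zero-width checks instead

sample = '123 456 7890'   # made-up input with single-space separators

# The space after 123 is consumed, so nothing is left to precede 456.
print(consuming.findall(sample))    # ['123', '7890']
print(lookaround.findall(sample))   # ['123', '456', '7890']
```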
This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
Can someone please help me with this? I'm trying to extract the word from a given sentence that contains one of G, GM, KG, L, ML, PCS along with a number.
I am able to match the string, but I'm not sure how to extract the complete word.
For example, my input is "This packet contains 250G Dates" and the output should be 250G.
Another example is "You paid for 2KG Apples", where the output should be 2KG.
With my regular expression I'm getting only the matched unit, not the complete word :(
import re
val = 'FUJI ALUMN FOIL CAKE, 240G, CHCLTE'
key_vals = ['G','GM','KG','L','ML','PCS']
re.findall("\d+\.?\d*(\s|G|KG|GM|L|ML|PCS)\s?", val)
This regex will not get you what you want:
re.findall("\d+\.?\d*(\s|G|KG|GM|L|ML|PCS)\s?", val)
Let's break it down:
\d+: one or more digits
\.?: a dot (optional, as indicated by the question mark)
\d*: zero or more digits
(\s|G|KG|GM|L|ML|PCS): a group of alternatives, but whitespace is listed as one of them. It should be outside the group: what you probably want is to allow optional whitespace between the number and the unit, i.e. 240G or 240 G
\s?: optional whitespace
A better expression for your purpose could be:
re.findall(r"\d+\s*(?:G|KG|GM|L|ML|PCS)", val)
That means: one or more digits, followed by optional whitespace and then either of these units: G|KG|GM|L|ML|PCS.
Note the presence of ?: to indicate a non-capturing group. Without it, findall would return only the captured unit, G.
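Two details may be worth tightening, depending on the data. With G listed before GM, an input such as 500GM is cut off at 500G, and the decimal part from the original pattern is dropped. The following is one possible sketch, not the answer's own code: it builds the alternation from key_vals with the longest units first, keeps an optional decimal part, and adds a word boundary. The extra test sentence is made up.

```python
import re

key_vals = ['G', 'GM', 'KG', 'L', 'ML', 'PCS']

# Longest units first, so "GM" is tried before "G".
units = "|".join(re.escape(u) for u in sorted(key_vals, key=len, reverse=True))

# Digits, optional decimal part, optional whitespace, a unit, then a word boundary.
quantity = re.compile(rf"\d+(?:\.\d+)?\s*(?:{units})\b")

val = 'FUJI ALUMN FOIL CAKE, 240G, CHCLTE'
print(quantity.findall(val))                             # ['240G']
print(quantity.findall('1.5 L milk and 500GM sugar'))    # ['1.5 L', '500GM']
```

The trailing \b also stops the pattern from matching the start of a longer word, so a made-up input like 250Grams yields no match.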
Try using this Regex:
\d+\s*(G|KG|GM|L|ML|PCS)\s?
It matches every string that starts with at least one digit and is then followed by one of the units. There can also be whitespace between the digits and the unit, and after the unit.
Adjust this like you want to :)
Use non-capturing parentheses (?:...) instead of the normal ones. When the pattern has no capturing groups, findall returns the string(s) that match the whole pattern.
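The effect of capturing groups on re.findall can be checked directly. This short sketch reuses val from the question and only varies the kind of parentheses.

```python
import re

val = 'FUJI ALUMN FOIL CAKE, 240G, CHCLTE'

# One capturing group: findall returns only what the group captured.
capturing = re.findall(r"\d+\s*(G|KG|GM|L|ML|PCS)", val)
# No capturing group: findall returns the whole match.
non_capturing = re.findall(r"\d+\s*(?:G|KG|GM|L|ML|PCS)", val)
# Two capturing groups: findall returns a tuple per match.
two_groups = re.findall(r"(\d+)\s*(G|KG|GM|L|ML|PCS)", val)

print(capturing)       # ['G']
print(non_capturing)   # ['240G']
print(two_groups)      # [('240', 'G')]
```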
This question already has answers here:
Python non-greedy regexes
(7 answers)
Closed 3 years ago.
I am trying to find all substrings that follow a specific pattern in a Python string.
"\[\[Cats:.*\]\]"
However, if several occurrences of the pattern appear on one line, it treats them as a single match instead of matching each one separately.
strng = '[[Cats: Text1]] said I am in the [[Cats: Text2]]fhg is abnorn'
x = re.findall("\[\[Cats:.*\]\]", strng)
The output gotten is:
['[[Cats: Text1]] said I am in the [[Cats: Text2]]']
instead of
['[[Cats: Text1]]', '[[Cats: Text2]]']
which is a list with one string per occurrence.
What regex do I use?
"\[\[Cats:.*?\]\]"
Your current regex is greedy: it gobbles up EVERYTHING from the first opening bracket to the last closing bracket. Making it non-greedy should return all of your results.
The problem is that you are doing a greedy search; add a ? after .* to get a non-greedy match.
Code follows:
import re
strng = '[[Cats: Text1]] said I am in the [[Cats: Text2]]fhg is abnorn'
regex_template = re.compile(r'\[\[Cats:.*?\]\]')
matches = re.findall(regex_template, strng)
print(matches)
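Avoid .* here: it is greedy, so it runs on to the last ]] on the line instead of stopping at the first one. . means any character (except a newline) and * means zero or more occurrences, so not even one is required. A negated character class such as [^\]]+ cannot run past a closing bracket: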
import re
strng = '''[[Cats: lol, this is 100 % cringe]]
said I am in the [[Cats: lol, this is 100 % cringe]]
fhg is abnorn'''
x = re.findall(r"\[\[Cats: [^\]]+\]\]", strng)
print(x)
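For comparison, here are the three variants from this thread run side by side on the string from the question. It is only a sketch, and the negated class uses * rather than + so that an empty body would also match.

```python
import re

strng = '[[Cats: Text1]] said I am in the [[Cats: Text2]]fhg is abnorn'

greedy = re.findall(r"\[\[Cats:.*\]\]", strng)        # runs to the last ]]
lazy = re.findall(r"\[\[Cats:.*?\]\]", strng)         # stops at the first ]]
negated = re.findall(r"\[\[Cats:[^\]]*\]\]", strng)   # cannot cross a ]

print(greedy)    # ['[[Cats: Text1]] said I am in the [[Cats: Text2]]']
print(lazy)      # ['[[Cats: Text1]]', '[[Cats: Text2]]']
print(negated)   # ['[[Cats: Text1]]', '[[Cats: Text2]]']
```

The lazy and negated versions agree here; they would differ if the text between the brackets could itself contain a single ].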
This question already has answers here:
Python regular expression pattern * is not working as expected
(2 answers)
Closed 4 years ago.
Why doesn't this regex code return anything?
#Searches for a pattern in string
pattern = r'[a-zA-Z ]*'
string = '111-456-7890 This is my number... Gimme a ring.'
match = re.search(pattern, string)
match
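The star is a false friend: * also allows zero repetitions, so re.search succeeds right at position 0 with an empty match, because the string starts with a digit. Use + instead: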
[a-zA-Z]+
Try your statements with: https://regexr.com/
re.search returns only the first match; to get the same behaviour on regexr.com, deactivate the global flag.
Your match is "only" an object of type re.Match; you have to use it in an if statement in the form
if match:
    # Some code, if pattern was found
or use some of its methods to obtain some details, e.g.
match.start()
returns the starting position (index) of the match in your string, i.e. 12 for your (slightly modified) pattern r'[a-zA-Z ]+' and string.
Example of the full code:
import re
pattern = r'[a-zA-Z ]+'
string = '111-456-7890 This is my number... Gimme a ring.'
match = re.search(pattern, string)
if match:
    print("Start position:", match.start())
    print("Matching part:" , match.group())
else:
    print("No match.")
The output:
Start position: 12
Matching part: This is my number
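To make the difference between * and + visible, the sketch below prints the match with repr so that the empty string shows up. It reuses the string from the question; the final findall line is an addition for collecting every word rather than only the first match.

```python
import re

string = '111-456-7890 This is my number... Gimme a ring.'

star = re.search(r'[a-zA-Z ]*', string)   # zero repetitions are allowed
plus = re.search(r'[a-zA-Z ]+', string)   # at least one character is required

print(repr(star.group()), star.span())    # '' (0, 0)
print(repr(plus.group()), plus.span())    # ' This is my number' (12, 30)

# All words, not just the first match:
words = re.findall(r'[a-zA-Z]+', string)
print(words)
```

So the original code does return a match object; it is just an empty match at the very start of the string.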