regex positioning confusion [duplicate] - python

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have a question about the positioning of the negated character class (the "do not include" operator) in Python regex, shown below
[^]
If I have the following expression
print(re.findall(r'^[^_][-\w\d]+[^:/)]$',x))
Does it matter where I place [^:/)], or will it only exclude :, /, and ) at the end of the string, since I placed it at the end?

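With the $ at the end of your regular expression, you've anchored the [^:/)] character class so that it can only match the last character of the string. Any match must therefore end with a character other than :, /, or ). Placement matters: the class tests exactly one character, at the position where it appears in the pattern.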
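A minimal sketch of that behaviour, using the pattern from the question. The sample strings are made up for illustration and are not from the original post.

```python
import re

# Pattern from the question: first char is not "_", then one or more
# word/hyphen chars, then exactly one final char that is not ":", "/" or ")".
pattern = re.compile(r'^[^_][-\w\d]+[^:/)]$')

# Hypothetical sample inputs (assumptions, chosen to exercise each part).
samples = ["abc-def", "abc-def!", "abc-def:", "_abc-def", "ab:cd"]

results = {s: pattern.findall(s) for s in samples}
for s, found in results.items():
    print(f"{s!r:12} -> {found}")
```

Note that "ab:cd" is rejected by the middle class [-\w\d]+, which does not allow a colon, rather than by the trailing [^:/)]. The trailing class only ever inspects the final character.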

Related

Python Regex to match a colon either side (left and right) of a word [duplicate]

This question already has answers here:
Regex to find and replace emoji names within colons
(4 answers)
Closed 3 months ago.
At a complete loss here. I'm trying to match a colon on either side of any given word in a passage of text.
For example:
:wave: Hello guys! :partyface: another huge win for us all to celebrate!
An appropriate regex would match:
:wave:
:partyface:
Really appreciate your help!
\w*:\b
To match the whole token, including the colons:
:[^:]*:
To match only the content between the colons:
(?<=:)[^:]*(?=:)
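A quick sketch comparing these patterns on the sample sentence from the question. The third pattern, :(\w+):, is not from the answer; it is an assumed alternative that restricts the name to word characters and captures it in a group.

```python
import re

text = ":wave: Hello guys! :partyface: another huge win for us all to celebrate!"

# Whole token, colons included (pattern from the answer).
with_colons = re.findall(r':[^:]*:', text)

# Lookaround version from the answer: anything that sits between two colons.
between = re.findall(r'(?<=:)[^:]*(?=:)', text)

# Assumed alternative: only word characters between the colons, captured.
names = re.findall(r':(\w+):', text)

print(with_colons)
print(between)
print(names)
```

On this input the lookaround version also returns the text lying between the two emoji, because that text is itself surrounded by colons. If only the emoji names are wanted, the capturing variant may be the safer choice.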

Sub all the specified regex interval except some characters [duplicate]

This question already has answers here:
Exclude characters from a character class
(5 answers)
Closed 2 years ago.
I want to replace every character in the specified ranges with *, except for the characters \u0650, \u0660, and \u064F.
Note: I don't want to split the ranges by hand, because I have a lot of characters to preserve.
data = re.sub(r'[\u0600-\u061E\u0620-\u065F\u0670-\u06ef]', "*", data)
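You can put the characters to be excluded in a negative lookahead placed before the main character class. The lookahead rejects the position if the next character is one of the excluded ones; otherwise the character class matches as usual.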
For example:
(?![\u0650\u0660\u064F])[\u0600-\u061E\u0620-\u065F\u0670-\u06ef]
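A minimal sketch of the lookahead approach applied with re.sub. The input string is a made-up example (an assumption, not from the question), chosen so that it contains one excluded character.

```python
import re

# Negative lookahead first, then the original ranges from the question.
pattern = re.compile(
    r'(?![\u0650\u0660\u064F])[\u0600-\u061E\u0620-\u065F\u0670-\u06ef]'
)

# Hypothetical input: U+0628, U+0650 (excluded), U+0633, U+0652, U+0645.
data = "\u0628\u0650\u0633\u0652\u0645"

result = pattern.sub("*", data)
print(result)
```

Only U+0650 survives here; the other four characters fall inside the ranges and are replaced.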

simplify re.compile for matching any long character [duplicate]

This question already has answers here:
Difference between * and + regex
(7 answers)
Using explicitly numbered repetition instead of question mark, star and plus
(4 answers)
Closed 3 years ago.
I have a string like:
[10/Jul/2019:00:45:18 +0900] "POST /auth/identity/success HTTP/1.1"
I want to extract everything inside the double quotes.
For the date I used
re.compile('\[(\d+\/\w\w\w\/\d\d\d\d:\d\d:\d\d:\d\d\s\+\d\d\d\d)\]')
For the string inside the double quotes, instead of matching characters one by one, I am hoping there is a way to match, say, 10 characters with one regex token.
I've done re.compile('(\"[A-Z]\s[\w\/]\s[\w\/\"])')
What I am trying to do:
\" matches "
[A-Z] should match the 4 characters of POST (but it actually matches only one character)
\s matches the whitespace
[\w\/] matches everything in /auth/../success
[\w\/\"] matches HTTP/1.1"
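One possible approach, sketched under the assumption that quantifiers are what is wanted here: {n} repeats a token exactly n times, and + repeats it one or more times. The names date_re and request_re are illustrative and not from the question.

```python
import re

line = '[10/Jul/2019:00:45:18 +0900] "POST /auth/identity/success HTTP/1.1"'

# Same date pattern as in the question, with \w{3}, \d{4}, \d{2}
# replacing the repeated \w\w\w and \d\d... runs.
date_re = re.compile(r'\[(\d+/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s\+\d{4})\]')

# [A-Z]+ matches the whole method, \S+ the path, [^"]+ the protocol.
request_re = re.compile(r'"([A-Z]+)\s(\S+)\s([^"]+)"')

date = date_re.search(line).group(1)
method, path, protocol = request_re.search(line).groups()

print(date)
print(method, path, protocol)
```

A character class such as [A-Z] on its own always matches exactly one character, which is why the attempt in the question stops after the first letter.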

Why does regex with “|” (or/alternation) match differently when order is switched? [duplicate]

This question already has answers here:
Why doesn't regular expression alternation (A|B) match as per doc?
(3 answers)
Closed 3 years ago.
I want to clear up a doubt about regular expressions in Python.
import re
stri="Item3. Super Market ListsItem4"
#1st print
print(re.sub(r'(Item[0-9]|Item[0-9]\.)', "", stri,))
#2nd print
print(re.sub(r'(Item[0-9]\.|Item[0-9])', "", stri,))
From stri, I need to remove "Item3." and "Item4".
output -
'. Super Market Lists'
' Super Market Lists'
My question is this: I used the OR (|) operator in both patterns.
In the 1st print statement, it did not remove the dot (.) from the string. In the 2nd print statement, I swapped the two alternatives around the OR operator, and this time it did remove the dot. Why does this happen?
Thank you
It happens because the regex engine tries the alternatives from left to right and stops at the first one that matches.
In the 1st pattern, Item[0-9] already matches "Item3" without the dot, so the engine removes that part and never tries the right-hand alternative.
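The effect can be reproduced with the strings from the question. The last pattern, with an optional dot, is an assumed alternative that avoids the ordering issue altogether; it is not from the original answer.

```python
import re

stri = "Item3. Super Market ListsItem4"

# Shorter alternative first: "Item3" matches, the dot is left behind.
short_first = re.sub(r'Item[0-9]|Item[0-9]\.', "", stri)

# Longer alternative first: "Item3." is tried (and matches) before "Item3".
long_first = re.sub(r'Item[0-9]\.|Item[0-9]', "", stri)

# Assumed alternative: make the dot optional instead of using "|".
optional_dot = re.sub(r'Item[0-9]\.?', "", stri)

print(repr(short_first))
print(repr(long_first))
print(repr(optional_dot))
```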

Regex but just in substring [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 3 years ago.
I can't find a regex that looks for a pattern only within a specific range of the string.
I want to find $ $, but only if the two dollar signs are at positions 5 and 7 of the string; it doesn't matter which character is between them.
Example
xxxx$x$xxxxx would match
xx$x$xxxxxxx would not
import re
should = "xxxx$x$xxxxx would match"
shouldnt = "xx$x$xxxxxxx would not"
pattern = r'^.{4}\$.\$.+'
re.match(pattern, should)
re.match(pattern, shouldnt)
The first call returns a match object; the second returns None.
https://regex101.com/r/RLHrZb/1
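A small sketch of how the pattern behaves at the edges. The trailing .+ requires at least one character after the second $, so a string that ends right after it is rejected. The no_tail variant and the matches helper are assumptions added for illustration.

```python
import re

# Pattern from the answer: 4 arbitrary chars, "$", any char, "$", then 1+ chars.
original = re.compile(r'^.{4}\$.\$.+')

# Assumed variant without the trailing ".+"; re.match already anchors at the
# start of the string, so the leading "^" is dropped as well.
no_tail = re.compile(r'.{4}\$.\$')

def matches(regex, text):
    """Return True if the regex matches at the start of text."""
    return regex.match(text) is not None

print(matches(original, "xxxx$x$xxxxx would match"))
print(matches(original, "xx$x$xxxxxxx would not"))
print(matches(original, "xxxx$x$"))
print(matches(no_tail, "xxxx$x$"))
```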
