Substitute all characters in the specified regex ranges except some characters [duplicate] - python

This question already has answers here:
Exclude characters from a character class
(5 answers)
Closed 2 years ago.
For example, I want to replace every character in the specified ranges with * (except the characters \u0650, \u0660 and \u064F).
Note: I don't want to split the ranges, because I have many characters to preserve.
data = re.sub(r'[\u0600-\u061E\u0620-\u065F\u0670-\u06ef]', "*", data)

You can put the characters to be excluded in a negative lookahead before the main character class.
For example:
(?![\u0650\u0660\u064F])[\u0600-\u061E\u0620-\u065F\u0670-\u06ef]
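A minimal sketch of the lookahead approach used with re.sub. The sample string is made up for illustration. Note that \u0660 (ARABIC-INDIC DIGIT ZERO) lies between \u065F and \u0670, so it is outside the question's ranges and would be left alone even without the lookahead.

```python
import re

# Ranges from the question; the negative lookahead rejects the preserved
# code points before the character class gets a chance to consume them.
pattern = re.compile(
    r'(?![\u0650\u0660\u064F])[\u0600-\u061E\u0620-\u065F\u0670-\u06ef]'
)

# Hypothetical sample: alef, kasra, beh, damma, Arabic-Indic zero, alef wasla.
data = '\u0627\u0650\u0628\u064F\u0660\u0671'
result = pattern.sub('*', data)
print(ascii(result))  # '*\u0650*\u064f\u0660*'
```

Only the lookahead list changes when more characters need preserving; the ranges themselves stay intact.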

Related

Simplify re.compile to match many characters at once [duplicate]

This question already has answers here:
Difference between * and + regex
(7 answers)
Using explicitly numbered repetition instead of question mark, star and plus
(4 answers)
Closed 3 years ago.
I have a string like:
[10/Jul/2019:00:45:18 +0900] "POST /auth/identity/success HTTP/1.1"
I want to extract everything inside the double quotes.
For the date I used:
re.compile(r'\[(\d+\/\w\w\w\/\d\d\d\d:\d\d:\d\d:\d\d\s\+\d\d\d\d)\]')
For the string inside the quotes, instead of matching characters one by one, I am hoping there is a way to match, say, 10 characters with one regex token.
I've tried re.compile(r'(\"[A-Z]\s[\w\/]\s[\w\/\"])')
What I am trying to do:
\" matches "
[A-Z] to match 4 characters (but it actually matches only one)
\s for whitespace
[\w\/] for matching everything in /auth/../success
[\w\/\"] for HTTP/1.1"
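A minimal sketch of the repetition quantifiers the linked duplicates cover, applied to the sample line: {n} repeats the previous token exactly n times, + means one or more, and * means zero or more. The variable names are illustrative only.

```python
import re

line = '[10/Jul/2019:00:45:18 +0900] "POST /auth/identity/success HTTP/1.1"'

# \w{3} and \d{4} replace \w\w\w and \d\d\d\d; (?::\d{2}){3} matches ':00:45:18'.
date_re = re.compile(r'\[(\d+/\w{3}/\d{4}(?::\d{2}){3}\s\+\d{4})\]')

# [^"]* matches any run of characters other than a double quote,
# so the whole request is captured by one group.
quoted_re = re.compile(r'"([^"]*)"')

# Splitting the request into method, path and protocol with + quantifiers.
parts_re = re.compile(r'"([A-Z]+)\s(\S+)\s([^"]+)"')

date = date_re.search(line).group(1)
request = quoted_re.search(line).group(1)
method, path, protocol = parts_re.search(line).groups()

print(date)     # 10/Jul/2019:00:45:18 +0900
print(request)  # POST /auth/identity/success HTTP/1.1
print(method, path, protocol)
```

The negated class [^"]* is usually the simplest way to say "everything up to the closing quote" without having to count characters.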

regex positioning confusion [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have a question about the positioning of the "do not include" operator (a negated character class) in a Python regex:
[^]
If I have the following expression:
print(re.findall(r'^[^_][-\w\d]+[^:/)]$',x))
does it matter where I place [^:/)], or will it only exclude :, / and ) at the end of the string since I placed it at the end?
With the $ at the end of your regular expression, you've anchored the [^:/)] character class so it only matches at the end of the string. Any match must end with a character that is not :, / or ).
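A small sketch, using made-up sample strings, that shows each character class constrains only the position it occupies. A ':' at the very start is accepted because only [^_] is checked there, while a ':' or ')' at the end is rejected by [^:/)].

```python
import re

# Pattern from the question: one char that is not '_', then one or more
# '-' or word characters, then one final char that is not ':', '/' or ')'.
pattern = re.compile(r'^[^_][-\w\d]+[^:/)]$')

samples = ['abc', 'abc:', 'abc)', ':abc', '_abc']
results = {s: pattern.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), found)
# 'abc' ['abc']
# 'abc:' []
# 'abc)' []
# ':abc' [':abc']
# '_abc' []
```

Note that \d is redundant inside [-\w\d], since \w already includes digits.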

python match exclamation mark in multiline [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 5 years ago.
I have a file with a repeated pattern in its output:
!-----------------------------------------------------------------
line 1
line 2
line 3
.....
-------------------------------------------------------------------!
I am trying to match and extract all occurrences of these blocks, but the code below returns the whole file:
match = re.search(r'\!-.*-\!', data, re.DOTALL)
print(match.group())
Regexes in Python are greedy by default, meaning * will consume as many characters as possible. You can turn off greediness by using *?:
match = re.search(r'\!-.*?-\!', data, re.DOTALL)
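re.search returns only the first match, so extracting every block also needs re.findall. A minimal sketch, with a short hypothetical string standing in for the file contents:

```python
import re

# Hypothetical sample standing in for the file contents.
data = (
    "!-----\n"
    "line 1\n"
    "line 2\n"
    "-----!\n"
    "noise between blocks\n"
    "!-----\n"
    "line 3\n"
    "-----!\n"
)

# Non-greedy .*? stops at the first '-!' after each '!-';
# re.DOTALL lets '.' match newlines; findall returns every block.
blocks = re.findall(r'!-.*?-!', data, re.DOTALL)
for block in blocks:
    print(block)
    print()
```

The '!' does not need escaping in a Python regex, though the escaped form in the answer above also works.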

Regular Expression that matches a number with commas for every three digits [duplicate]

This question already has answers here:
Regex for number with decimals and thousand separator
(6 answers)
Matching numbers with regular expressions — only digits and commas
(10 answers)
Closed 6 years ago.
It needs to match these:
'42'
'1,234'
'6,368,745'
but not the following:
'12,34,567' (which has only two digits between the commas)
'1234' (which lacks commas)
I have been using sites such as http://www.regexpal.com/ to test expressions.
I tried
^\d{1,3}(,\d{3})*$
(\d{1,3},)*(\d{1,3})$
([0-9]{1,3},)*([0-9]{1,3})$
[0-9]{1,3}((,[0-9]){1,3})*
but none of them work.
Could someone explain what's wrong with my attempts and give a model answer?
^([0-9]{1,3})(,[0-9]{3})*$
Should do what you are after.
I usually use http://pythex.org/ to test Python regex strings.
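A quick check of the anchored pattern against the examples from the question. On these ASCII samples, the question's first attempt, ^\d{1,3}(,\d{3})*$, gives the same result; in Python 3, \d also matches non-ASCII decimal digits, which is one reason to prefer [0-9] when only ASCII digits are wanted.

```python
import re

# Pattern from the answer: 1-3 leading digits, then zero or more groups of
# a comma followed by exactly three digits, anchored at both ends.
pattern = re.compile(r'^([0-9]{1,3})(,[0-9]{3})*$')
# The question's first attempt, for comparison.
first_attempt = re.compile(r'^\d{1,3}(,\d{3})*$')

samples = ['42', '1,234', '6,368,745', '12,34,567', '1234']
matched = [s for s in samples if pattern.match(s)]
matched_first = [s for s in samples if first_attempt.match(s)]
print(matched)        # ['42', '1,234', '6,368,745']
print(matched_first)  # ['42', '1,234', '6,368,745']
```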
I think the following pattern will fit your need.
It allows the accepted numbers to be preceded by a comma not preceded by a digit, and to be followed by a comma not followed by a digit.
import re

pati = (r'(?<!\d,)(?<!\d)'
        r'('
        r'\d{1,3}' r'(?:,\d\d\d)*'
        r')'
        r'(?!,\d)(?!\d)'
        )
rgx = re.compile(pati)
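A usage sketch of the lookaround pattern on a made-up sentence, using findall to pull well-formed numbers out of running text. The lookbehinds and lookaheads stop a match from starting or ending inside a longer run of digits or comma groups, so the malformed numbers are skipped entirely.

```python
import re

# Same pattern as above, written with raw strings so '\d' is not
# treated as a string escape.
pati = (r'(?<!\d,)(?<!\d)'
        r'('
        r'\d{1,3}' r'(?:,\d\d\d)*'
        r')'
        r'(?!,\d)(?!\d)')
rgx = re.compile(pati)

# Hypothetical sample text mixing valid and invalid numbers.
text = 'Totals: 42, 1,234 and 6,368,745; rejected 12,34,567 and 1234.'
found = rgx.findall(text)
print(found)  # ['42', '1,234', '6,368,745']
```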

Python Checking Characters [duplicate]

This question already has answers here:
How do I verify that a string only contains letters, numbers, underscores and dashes? [duplicate]
(11 answers)
Closed 8 years ago.
I am trying to parse text and determine whether it contains only letters and numbers, not special keyboard symbols like ! and #. I tried using .isalpha(), but it says ! and # are valid. Is there a way I can have something return False if it encounters one of these symbols?
Use regex matching:
import re
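print(re.match(r'^\w+$', your_string) is not None)
This matches the whole string only if it consists entirely of word characters (letters, digits and underscore). Otherwise re.match returns None, so calling .group(0) on the result would raise AttributeError:
>>> re.match(r'^\w+$', '1kjh2431k2j43').group(0)
'1kjh2431k2j43'
>>> print(re.match(r'^\w+$', 'hjs7*Y##kha9Y*##'))
None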
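Because \w also accepts '_', a variant that allows only ASCII letters and digits can spell the class out explicitly. A minimal sketch using re.fullmatch, which anchors at both ends so ^ and $ are not needed; is_ascii_alnum is a hypothetical helper name.

```python
import re

def is_ascii_alnum(s):
    # [A-Za-z0-9] excludes '_', which \w would accept.
    # fullmatch succeeds only if the pattern covers the whole string.
    return re.fullmatch(r'[A-Za-z0-9]+', s) is not None

print(is_ascii_alnum('1kjh2431k2j43'))     # True
print(is_ascii_alnum('hjs7*Y##kha9Y*##'))  # False
print(is_ascii_alnum('abc_123'))           # False
```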
You can also check whether the ordinal of each character falls in the ASCII letter ranges (65-90 for A-Z and 97-122 for a-z), and use .isdigit() to check whether a character is a digit.
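A minimal sketch of the ordinal-based check; is_letters_and_digits is a hypothetical helper name. It uses the ordinal range 48-57 for digits rather than .isdigit(), since .isdigit() also accepts some non-ASCII digits such as '²'.

```python
def is_letters_and_digits(s):
    # True only for a non-empty string made of ASCII letters and digits.
    if not s:
        return False
    for ch in s:
        code = ord(ch)
        is_upper = 65 <= code <= 90    # 'A'..'Z'
        is_lower = 97 <= code <= 122   # 'a'..'z'
        is_digit = 48 <= code <= 57    # '0'..'9'
        if not (is_upper or is_lower or is_digit):
            return False
    return True

print(is_letters_and_digits('1kjh2431k2j43'))     # True
print(is_letters_and_digits('hjs7*Y##kha9Y*##'))  # False
print(is_letters_and_digits('abc_123'))           # False ('_' is 95)
```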
