Regex check if backslash before every symbols using python - python

I ran into a problem while trying to check whether an input regex is correct.
I'd like to check that there is exactly one backslash before every symbol, but I don't know how to implement this in Python.
For example:
number: 123456789. (return False)
phone\:111111 (return True)
I tried to use (?!) and (?=) in Python, but it didn't work.
Update:
I'd like to match the following string:
\~, \!, \#, \$, \%, \^, \&, \*, \(, \), \{, \}, \[, \], \:, \;, \", \', \>, \<, \?
Thank you very much.

import re
if re.search(r"\\\W", r"phone\:111111") is not None:
    print("OK")
Does it work?

Reading between the lines a bit, it sounds like you are trying to pass a string to a regex and you want to make sure it has no special characters in it that are unescaped.
Python's re module has an inbuilt re.escape() function for this.
Example:
>>> import re
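>>> print(re.escape("phone:111111"))
phone\:111111
Note that from Python 3.7 onward re.escape() escapes only characters that are special in a regex, so ":" is left alone there and this prints phone:111111.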

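To illustrate the re.escape() suggestion above, here is a minimal sketch of matching user-supplied text literally. The sample strings and variable names are made up for illustration; only re.escape(), re.compile() and re.search() come from the standard library.

```python
import re

# Hypothetical user-supplied text that should be matched literally.
user_text = "price (USD): $5.00"
haystack = "Total price (USD): $5.00 today"

# re.escape() backslash-escapes the characters that are special in a regex,
# so the resulting pattern matches the text verbatim.
escaped_pattern = re.compile(re.escape(user_text))
found = escaped_pattern.search(haystack)
print(found is not None)

# Without escaping, "(", ")", "$" and "." keep their regex meaning,
# so the same text no longer matches itself.
unescaped = re.search(user_text, haystack)
print(unescaped is not None)
```

This escapes a string for you rather than checking whether it is already escaped, so it only fits if you control how the pattern is built.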
Check that the entire string is composed of single characters or pairs of backslash+symbol:
import re
def has_backslash_before_every_symbol(s):
    # A triple-quoted raw string lets the pattern contain both " and '.
    return re.match(r"""^(\\[~!#$%^&*(){}\[\]:;"'><?]|[^~!#$%^&*(){}\[\]:;"'><?])*$""", s) is not None
Python regex reference: https://docs.python.org/3/library/re.html
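As a rough check of the idea above, here is a sketch that builds the same kind of pattern from the symbol list in the question and runs it on the two sample inputs. The names SYMBOLS, PATTERN and every_symbol_escaped are hypothetical, and re.fullmatch() is used in place of the ^...$ anchors.

```python
import re

# Symbols from the question's list; each one must be preceded by a backslash.
SYMBOLS = "~!#$%^&*(){}[]:;\"'><?"

# Escape the symbols so they are safe inside a character class.
_cls = re.escape(SYMBOLS)

# Either a backslash followed by a symbol, or any character that is not a symbol.
PATTERN = re.compile(r"(?:\\[" + _cls + r"]|[^" + _cls + r"])*")


def every_symbol_escaped(s):
    """Return True if every listed symbol in s is preceded by a backslash."""
    return PATTERN.fullmatch(s) is not None


print(every_symbol_escaped(r"phone\:111111"))
print(every_symbol_escaped("number: 123456789."))
```

The dot in the second input is not in the symbol list, so it is the unescaped colon that makes the check fail.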

Related

remove special characters in a string using python

I have a string = "msdjdgf(^&%*(Aroha Technologies&^$^&*^CHJdjg" with special characters.
I am trying to remove all special characters from the string and then display the words 'Aroha Technologies'.
I was able to do it by hard-coding with the lstrip() function, but can anyone show me how to extract the string 'Aroha Technologies' in a single line using regular expressions?
Suggested edit:
By using the lstrip() and rstrip() functions I was able to remove the unwanted characters from the string.
str = "msdjdgf(^&%*(Aroha Technologies&^$^&*^CHJdjg"
str=str.lstrip('msdjdgf(^&%*(')
str=str.rstrip('&^$^&*^CHJdjg')
Here is a slightly dirtier approach:
import re # A module in python for String matching/operations
a = "msdjdgf(^&%*(Aroha Technologies&^$^&*^CHJdjg"
stuff = re.findall(r'\W(\w+\s\w+)\W', a)
print(stuff[0]) # Aroha Technologies
hope this helps ;)
You don't provide a lot of information, so this may or may not be close to what you want:
import re
origstr = "msdjdgf(^&%(Aroha Technologies&^$^&^CHJdjg"
match = re.search("[A-Z][a-z]*(?: [A-Z][a-z]*)*", origstr)
if match:
    newstr = match.group()
(This looks for a series of capitalized words with spaces between them.)

how to replace multiple consecutive repeating characters into 1 character in python?

I have a string in Python and I want to collapse each run of consecutive repeated characters into a single character.
For example:
st = "UUUURRGGGEENNTTT"
print(st.replace(r'(\w){2,}',r'\1'))
But this command doesn't seem to work. Can anybody help me find what's wrong with it?
There is another way to solve this, but I want to understand why the command above fails and whether there is any way to correct it:
print(re.sub(r"([A-Z])\1+", r"\1", st))  # prints URGENT
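You need to use a regex. For example:
import re
re.sub(r'[^\w\s]|(.)(?=\1)', '', 'UUURRRUU')
The result is URU: a character is removed whenever the next character is identical, so each run collapses to one character.
Here is a breakdown of a related regex, (.)(?=.*\1):
(.) captures any single character except a newline
(?=...) is a lookahead: it checks that what follows matches, without consuming it
.* matches zero or more characters (any character except a newline)
\1 matches whatever group 1 captured, here U or R
Every match is then replaced with ''. Because of the .*, this variant removes a character whenever the same character appears anywhere later in the string, not only directly after it.
You can also read up on this:
lookahead
I also recommend this tool, which I used to work out my regex.
It describes every token and you can learn a lot from it:
regexer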
Your code does not work because str.replace does not support regular expressions; it can only replace one literal substring with another. You need to use the re module if you want to replace by matching a regex pattern.
Secondly, your regex pattern is also incorrect: (\w){2,} matches any run of two or more word characters (they do not have to be the same character), so it will not work. You will need to do something like this:
import re
st = "UUUURRGGGEENNTTT"
print(re.sub(r'(\w)\1+', r'\1', st))
# URGENT
Now this will only match the same character 2 or more times.
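To make the difference between the two patterns concrete, here is a small sketch that runs both through re.sub on the string from the question. The variable names loose and strict are only for illustration.

```python
import re

st = "UUUURRGGGEENNTTT"

# (\w){2,} repeats the group itself, so any run of two or more word
# characters matches; the group keeps only the last character it captured.
loose = re.sub(r"(\w){2,}", r"\1", st)
print(loose)

# (\w)\1+ requires the captured character to be followed by copies of
# itself, so each run of identical characters collapses to one.
strict = re.sub(r"(\w)\1+", r"\1", st)
print(strict)
```

With the first pattern the whole string is a single match, which is why only its final character survives.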
An alternative, “unique” solution to this is that you can use the unique_justseen recipe that itertools provides:
from itertools import groupby
from operator import itemgetter
st = "UUUURRGGGEENNTTT"
new ="".join(map(next, map(itemgetter(1), groupby(st))))
print(new)
# URGENT
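The groupby approach can be written a little more directly, since groupby already yields each run's key. This is a sketch of the same idea, not a different algorithm; the name collapsed is arbitrary.

```python
from itertools import groupby

st = "UUUURRGGGEENNTTT"

# groupby() yields (key, group) pairs for each run of equal items;
# joining the keys keeps one character per run.
collapsed = "".join(key for key, _run in groupby(st))
print(collapsed)

# Unlike (\w)\1+, this also collapses runs of non-word characters.
spaced = "".join(key for key, _run in groupby("aa   bb!!"))
print(spaced)
```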
string.replace(s, old, new[, maxreplace]) only does substring replacement:
>>> '(\w){2,}'.replace(r'(\w){2,}',r'\1')
'\\1'
That's why it fails: str.replace cannot work with regular expressions, so there is no way to fix the first command as written.

How to find a specific character in a string and put it at the end of the string

I have this string:
'Is?"they'
I want to find the question mark (?) in the string, and put it at the end of the string. The output should look like this:
'Is"they?'
I am using the following regular expression in python 2.7. I don't know why my regex is not working.
import re
regs = re.sub('(\w*)(\?)(\w*)', '\\1\\3\\2', 'Is?"they')
print regs
Is?"they # this is the output of my regex.
Your regex doesn't do what you want because " is not in the \w character class, so the third group matches nothing. You would need to change it to something like:
regs = re.sub('(\w*)(\?)(["\w]*)', '\\1\\3\\2', 'Is?"they')
As shown here, " is not captured by \w. Hence, it would probably be best to just use a .:
>>> import re
>>> re.sub(r"(.*)(\?)(.*)", r'\1\3\2', 'Is?"they')
'Is"they?'
>>>
. captures anything/everything in Regex (except newlines).
Also, you'll notice that I used a raw-string for the second argument of re.sub. Doing so is cleaner than having all those backslashes.
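If a regex is not strictly required, the same move can be sketched with str.partition. The helper name move_first_to_end is hypothetical and not part of any answer above; it moves only the first occurrence of the character.

```python
def move_first_to_end(text, char="?"):
    """Move the first occurrence of char to the end of text."""
    # partition() splits around the first occurrence; if char is absent,
    # sep and tail are empty and the text comes back unchanged.
    head, sep, tail = text.partition(char)
    return head + tail + sep


moved = move_first_to_end('Is?"they')
print(moved)

unchanged = move_first_to_end("no mark here")
print(unchanged)
```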

Python compile all non-words except dot[.]

I am trying to split a line on all non-word characters except . (dot).
I believe this can be done with [\W ^[.]] in Java, but how do I do it in Python?
>>> import re
>>> the_string="http://hello-world.com"
>>> re.findall(r'[\w.]+',the_string)
['http', 'hello', 'world.com']
A very good reference for Python's regular expression module is available here. The following should do the trick for you.
import re
re.split(r'[^\w.]+', text_string)
Or,
import re
re.findall(r'[\w.]+', text_string)
Your Java syntax is off, to begin with. This is what you were trying for:
[\W&&[^.]]
That matches a character from the intersection of the sets described by "any non-word character" and "any character except ." But that's overkill when you can just use:
[^\w.]
...or, "any character that's not a word character or .". It's the same in Python (and in most other flavors, too), though you probably want to match one or more of the characters:
re.split(r'[^\w.]+', the_string)
But it's probably simpler to use @gnibbler's approach of matching the parts that you want to keep, not the ones you want to throw away:
re.findall(r'[\w.]+', the_string)
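Here is a short sketch comparing the two calls on the sample string from the question. The second input, with leading separators, is an assumed example added to show where they differ.

```python
import re

the_string = "http://hello-world.com"

# Split on runs of characters that are neither word characters nor dots.
by_split = re.split(r"[^\w.]+", the_string)

# Or collect the runs of word characters and dots directly.
by_findall = re.findall(r"[\w.]+", the_string)

print(by_split)
print(by_findall)

# The two differ when the string starts (or ends) with a separator:
# split() then yields an empty string, findall() does not.
edge_split = re.split(r"[^\w.]+", "--a.b")
edge_findall = re.findall(r"[\w.]+", "--a.b")
print(edge_split, edge_findall)
```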
I'm assuming that you want to split a string on all non-word patterns except a dot.
Edit: Python doesn't support the Java-style regex syntax that you are using. I'd suggest first replacing all dots with a long string, then splitting the string, then putting the dots back in.
import re
long_str = "ABCDEFGH"
text = the_string.replace('.', long_str)
result = re.split(r'\W', text)
Then, as you use result, replace each long_str sequence with a dot again.
This is a very bad solution, but it works.
Python has a convenience function for that
>>> s = "ab.cd.ef.gh"
>>> s.split(".")
['ab', 'cd', 'ef', 'gh']

Using Regex Plus Function in Python to Encode and Substitute

I'm trying to substitute something in a string in python and am having some trouble. Here's what I'd like to do.
For a given comment in my posting:
"here are some great sites that i will do cool things with! https://stackoverflow.com/it's a pig & http://google.com"
I'd like to use python to make the strings like this:
"here are some great sites that i will do cool things with! http%3A//stackoverflow.com & http%3A//google.com
Here's what I have so far...
import re
import urllib
def getExpandedURL(url):
    encoded_url = urllib.quote(url)
    return ""+encoded_url+""
text = '<text from above>'
url_pattern = re.compile('(http.+?[^ ]+)', re.I | re.S | re.M)
url_iterator = url_pattern.finditer(text)
for matched_url in url_iterator:
    getExpandedURL(matched_url.groups(1)[0])
But this is where i'm stuck. I've previously seen things on here like this: Regular Expressions but for Writing in the Match but surely there's got to be a better way than iterating through each match and doing a position replace on them. The difficulty here is that it's not a straight replace, but I need to do something specific with each match before replacing it.
I think you want url_pattern.sub(getExpandedURL, text).
re.sub(pattern, repl, string, count=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a callable, it's passed the match object and must return a replacement string to be used.
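A minimal sketch of that suggestion on Python 3, where urllib.quote lives at urllib.parse.quote. The URL pattern here is a simplified assumption (http or https followed by non-space characters), and encode_url and the sample text are made up for illustration.

```python
import re
from urllib.parse import quote  # Python 3 location of Python 2's urllib.quote

# Assumed, simplified URL pattern: scheme plus any run of non-space characters.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def encode_url(match):
    # sub() passes a match object to the callable, not a string.
    return quote(match.group(0))


text = "see http://google.com and https://stackoverflow.com now"
result = URL_PATTERN.sub(encode_url, text)
print(result)
```

Because sub() calls the function once per match and stitches the results back in, there is no need to track match positions by hand.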
