Delete specific duplicated punctuation from string

Delete specific duplicated punctuation from string - python

I have this string s = "(0|\\+33)[1-9]( *[0-9]{2}){4}". And I want to delete just the duplicated just one ' \ ', like I want the rsult to look like (0|\+33)[1-9]( *[0-9]{2}){4}.
When I used this code, all the duplicated characters are removed:
result = "".join(dict.fromkeys(s)).
But in my case I want just to remove the duplicated ' \ '. Any help is highly appreciated

A solution using the re module:
import re
s = r"(0|\\+33)[1-9]( *[0-9]{2}){4}"
s = re.sub(r"\\(?=\\)", "", s)
print(s)
I look for all backslashes, that are followed by another backslash and replace it with an empty sign.
Output: (0|\+33)[1-9]( *[0-9]{2}){4}

The function you need is replace
s = "(0|\\+33)[1-9]( *[0-9]{2}){4}"
result = s.replace("\\","")
EDIT
I see now that you want to remove just one \ and not both.
In order to do this you have to modify the call to replace this way
result = s.replace("\","",1) # last argument is the number of occurrances to replace
or
result = s.replace("\\","\")
EDIT of the EDIT
Backslashes are special in Python.
I'm using Python 3.10.5. If I do
x = "ab\c"
y = "ab\\c"
print(len(x)==len(y))
I get a True.
That's because backslashes are used to escape special characters, and that makes the backslash a special character :)
I suggest you to try a little bit with replace until you get what you need.

Related

replacing special characters in string Python

I'm trying to replace special characters in a data frame with unaccented or different ones.
I can replace one with
df['col_name'] = df.col_name.str.replace('?','j')
this turned the '?' to 'j' - but - I can't seem to figure out how to change more than one..
I have a list of special characters that I want to change. I've tried using a dictionary but it doesn't seem to work
the_reps = {'?','j'}
df1 = df.replace(the_reps, regex = True)
this gave me the error nothing to replace at position 0
EDIT:
this is what worked - although it is probably not that pretty:
df[col]=df.col.str.replace('old char','new char')
df[col]=df.col.str.replace('old char','new char')
df[col]=df.col.str.replace('old char','new char')
df[col]=df.col.str.replace('old char','new char')...
for each one ..

import re
s=re.sub("[_list of special characters_]","",_your string goes here_)
print(s)
An example for this..
str="Hello$#& Python3$"
import re
s=re.sub("[$#&]","",str)
print (s)
#Output:Hello Python3
Explanation goes here..
s=re.sub("[$#&]","",s)
Pattern to be replaced → “[$#&]”
[] used to indicate a set of characters
[$#&] → will match either $ or # or &
The replacement string is given as an empty string
If these characters are found in the string, they’ll be replaced with an empty string

you can use Series.replace with a dictionary
#d = { 'actual character ':'replacement ',...}
df.columns = df.columns.to_series().replace(d, regex=True)

Try This:
import re
my_str = "hello Fayzan-Bhatti Ho~!w"
my_new_string = re.sub('[^a-zA-Z0-9 \n\.]', '', my_str)
print my_new_string
Output: hello FayzanBhatti How

How to remove everything before certain character in Python

I'm new to python and struggle with a certain task:
I have a String that could have anything in it, but it always "ends" the same.
It can be just a Filename, a complete path, or just a random string, ending with a Version Number.
Example:
C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1
string-anotherstring-15.1R7-S8.1
string-anotherstring.andanother-15.1R7-S8.1
What always is the same (looking from the end) is that if you reach the second dot and go 2 characters in front of it, you always match the part that I'm interested in.
Cutting everything after a certain string was "easy," and I solved it myself - that's why the string ends with the version now :)
Is there a way to tell python, "look for the second dot from behind the string and go 2 in front of it and delete everything in front of that so that I get the Version as a string?
Happy for any pointers in the right direction.
Thanks

If you want the version number, can you use the hyphen (-) to split the string? Or do you need to depend on the dots only?
Please see below use of rsplit and join which can help you.
>>> a = 'string-anotherstring.andanother-15.1R7-S8.1'
>>> a.rsplit('-')
['string', 'anotherstring.andanother', '15.1R7', 'S8.1']
>>> a.rsplit('-')[-2:] #Get everything from second last to the end
['15.1R7', 'S8.1']
>>> '-'.join(a.rsplit('-')[-2:]) #Get everything from second last to the end, and join them with a hyphen
'15.1R7-S8.1'
>>>
For using dots, use the same way
>>> a
'string-anotherstring.andanother-15.1R7-S8.1'
>>> data = a.rsplit('.')
>>> [data[-3][-2:]]
['15']
>>> [data[-3][-2:]] + data[-2:]
['15', '1R7-S8', '1']
>>> '.'.join([data[-3][-2:]] + data[-2:])
'15.1R7-S8.1'
>>>

You can build a regex from the end mark of a line using the anchor $.
Using your own description, use the regex:
(\d\d\.[^.]*)\.[^.]*$
Demo
If you want the last characters from the end included, just move the capturing parenthesis:
(\d\d\.[^.]*\.[^.]*)$
Demo
Explanation:
(\d\d\.[^.]*\.[^.]*)$
^ ^ #digits
^ # a literal '.'
^ # anything OTHER THAN a '.'
^ # literal '.'
^ # anything OTHER THAN a '.'
^ # end of line

Assuming I understand this correctly, there are two ways to do this that come to mind:
Including both, since I might not understand this correctly, and for completeness reasons. I think the split/parts solution is cleaner, particularly when the 'certain character' is a dot.
>>> msg = r'C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1'
>>> re.search(r'.*(..\..*)', msg).group(1)
'S8.1'
>>> parts = msg.split('.')
>>> ".".join((parts[-2][-2:], parts[-1]))
'S8.1'

For your example, you can split the string by the separator '-', and then join the last two indices. Like so:
txt = "string-anotherstring-15.1R7-S8.1"
x = txt.split("-")
y = "".join(x[-2:])
print(y) # outputs 15.1R7S8.1

How to remove text before a particular character or string in multi-line text?

I want to remove all the text before and including */ in a string.
For example, consider:
string = ''' something
other things
etc. */ extra text.
'''
Here I want extra text. as the output.
I tried:
string = re.sub("^(.*)(?=*/)", "", string)
I also tried:
string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)
But when I print string, it did not perform the operation I wanted and the whole string is printing.

I suppose you're fine without regular expressions:
string[string.index("*/ ")+3:]
And if you want to strip that newline:
string[string.index("*/ ")+3:].rstrip()

The problem with your first regex is that . does not match newlines as you noticed. With your second one, you were closer but forgot the * that time. This would work:
string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)
You can also just get the part of the string that comes after your "*/":
string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)

Update: After doing some research, I found that the pattern (\n|.) to match everything including newlines is inefficient. I've updated the answer to use [\s\S] instead as shown on the answer I linked.
The problem is that . in python regex matches everything except newlines. For a regex solution, you can do the following:
import re
strng = ''' something
other things
etc. */ extra text.
'''
print(re.sub("[\s\S]+\*/", "", strng))
# extra text.
Add in a .strip() if you want to remove that remaining leading whitespace.

to keep text until that symbol you can do:
split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)
which gives you:
something
other things
etc.

string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)
gives output as
' extra text. \n'

python regular expression to match strings

I want to parse a string, such as:
package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'
I want to get:
string1: jp.tjkapp.droidllwp`
string2: 1.1
Because there are multiple uses-permission, I want to get permission as a list, contains:
WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED and ACCESS_NETWORK_STATE.
Could you help me write the python regular expression to get the strings I want?
Thanks.

Assuming the code block you provided is one long string, here stored in a variable called input_string:
name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)
Explanation:
name
(?<=name\=\'): check ahead of the main string in order to return only strings that are preceded by name='. The \ in front of = and ' serve to escape them so that the regex knows we're talking about the = string and not a regex command. name=' is not also returned when we get the result, we just know that the results we get are all preceded by it.
[\w\.]+?: This is the main string we're searching for. \w means any alphanumeric character and underscore. \. is an escaped period, so the regex knows we mean . and not the regex command represented by an unescaped period. Putting these in [] means we're okay with anything we've stuck in brackets, so we're saying that we'll accept any alphanumeric character, _, or .. + afterwords means at least one of the previous thing, meaning at least one (but possibly more) of [\w\.]. Finally, the ? means don't be greedy--we're telling the regex to get the smallest possible group that meets these specifications, since + could go on for an unlimited number of repeats of anything matched by [\w\.].
(?=\'): check behind the main string in order to return only strings that are followed by '. The \ is also an escape, since otherwise regex or Python's string execution might misinterpret '. This final ' is not returned with our results, we just know that in the original string, it followed any result we do end up getting.

You can do this without regex by reading the file content line by line.
>>> def split_string(s):
... if s.startswith('package'):
... return [i.split('=')[1] for i in s.split() if "=" in i]
... elif s.startswith('uses-permission'):
... return s.split('.')[-1]
...
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
["'jp.tjkapp.droid1lwp'", "'2'", "'1.1'"]
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
"WRITE_APN_SETTINGS'"
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
"RECEIVE_BOOT_COMPLETED'"
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
"ACCESS_NETWORK_STATE'"
>>>

Here is one example code
#!/usr/bin/env python
inputFile = open("test.txt", "r").readlines()
for line in inputFile:
if line.startswith("package"):
words = line.split()
string1 = words[1].split("=")[1].replace("'","")
string2 = words[3].split("=")[1].replace("'","")
test.txt file contains input data you mentioned earlier..

regex in python 2.4

I have a string in python as below:
"\\B1\\B1xxA1xxMdl1zzInoAEROzzMofIN"
I want to get the string as
"B1xxA1xxMdl1zzInoAEROzzMofIN"
I think this can be done using regex but could not achieve it yet. Please give me an idea.

st = "\B1\B1xxA1xxMdl1zzInoAEROzzMofIN"
s = re.sub(r"\\","",st)
idx = s.rindex("B1")
print s[idx:]
output = 'B1xxA1xxMdl1zzInoAEROzzMofIN'
OR
st = "\B1\B1xxA1xxMdl1zzInoAEROzzMofIN"
idx = st.rindex("\\")
print st[idx+1:]
output = 'B1xxA1xxMdl1zzInoAEROzzMofIN'

Here is a try:
import re
s = "\\B1\\B1xxA1xxMdl1zzInoAEROzzMofIN"
s = re.sub(r"\\[^\\]+\\","", s)
print s
Tested on http://py-ide-online.appspot.com (couldn't find a way to share though)
[EDIT] For some explanation, have a look at the Python regex documentation page and the first comment of this SO question:
How to remove symbols from a string with Python?
because using brackets [] can be tricky (IMHO)
In this case, [^\\] means anything but two backslashes \\.
So [^\\]+ means one or more character that matches anything but two backslashes \\.

If the desired section of the string is always on the RHS of a \ char then you could use:
string = "\\B1\\B1xxA1xxMdl1zzInoAEROzzMofIN"
string.rpartition("\\")[2]
output = 'B1xxA1xxMdl1zzInoAEROzzMofIN'

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Delete specific duplicated punctuation from string - python

A solution using the re module: import re s = r"(0|\\+33)[1-9]( [0-9]{2}){4}" s = re.sub(r"\\(?=\\)", "", s) print(s) I look for all backslashes, that are followed by another backslash and replace it with an empty sign. Output: (0|\+33)[1-9]( [0-9]{2}){4}

Related

replacing special characters in string Python

How to remove everything before certain character in Python

How to remove text before a particular character or string in multi-line text?

python regular expression to match strings

regex in python 2.4

Categories

Resources

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Delete specific duplicated punctuation from string - python

A solution using the re module: import re s = r"(0|\\+33)[1-9]( *[0-9]{2}){4}" s = re.sub(r"\\(?=\\)", "", s) print(s) I look for all backslashes, that are followed by another backslash and replace it with an empty sign. Output: (0|\+33)[1-9]( *[0-9]{2}){4}​

Related

replacing special characters in string Python

How to remove everything before certain character in Python

How to remove text before a particular character or string in multi-line text?

python regular expression to match strings

regex in python 2.4

Categories

Resources

A solution using the re module: import re s = r"(0|\\+33)[1-9]( [0-9]{2}){4}" s = re.sub(r"\\(?=\\)", "", s) print(s) I look for all backslashes, that are followed by another backslash and replace it with an empty sign. Output: (0|\+33)[1-9]( [0-9]{2}){4}