Remove certain characters from a string in Python [duplicate]

This question already has answers here:
Remove specific characters from a string in Python
(26 answers)
Closed 2 years ago.
Is there a function in Python that does something like this?
input:
text = "s.om/e br%0oken tex!t".remove(".","/","%","0","!")
print(text)
output:
some broken text
The only thing I know of that can kind of do this is .replace("x", ""), and that takes far too long when there are lots of different characters to get rid of. Thanks in advance.

Use the regex module re to replace them all at once. Square brackets define a character class, which matches any single character listed inside:
import re
text = re.sub(r"[./%0!]", "", "s.om/e br%0oken tex!t")

The re module provides regular expressions. Its sub function substitutes every match of a pattern in a string, so you can try this:
from re import sub
text = sub("[./%0!]","","The string")
print(text)
Regex details: [./%0!] is a character class containing . / % 0 and !. Wherever one of these characters is found in the string, it is replaced with an empty string, and the result is then printed.
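If the characters to remove are only known at run time, the character class can be built from them. A minimal sketch, assuming a hypothetical helper name remove_chars (not from the answers above); re.escape keeps characters such as ] or ^ from being read as class syntax:

```python
import re

def remove_chars(text, chars):
    """Return text with every character listed in chars removed."""
    if not chars:
        return text  # an empty class "[]" would be an invalid pattern
    # Escape each character so it is taken literally inside the class.
    pattern = "[" + re.escape(chars) + "]"
    return re.sub(pattern, "", text)

print(remove_chars("s.om/e br%0oken tex!t", "./%0!"))  # some broken text
```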

You might use str.maketrans combined with .translate; example:
t = str.maketrans("","","./%0!")
text = "s.om/e br%0oken tex!t"
cleantext = text.translate(t)
print(cleantext) # some broken text
maketrans accepts up to 3 arguments: the n-th character of the first is replaced with the n-th character of the second, and every character present in the third is deleted. In this case we only want to delete, so the 1st and 2nd arguments are empty strings.
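To see all three arguments at work, here is a small sketch with a made-up mapping (the characters chosen are only for illustration): a becomes 1, b becomes 2, and ! and ? are deleted.

```python
# Hypothetical mapping: a -> 1, b -> 2, and delete "!" and "?"
table = str.maketrans("ab", "12", "!?")
result = "abc! cab?".translate(table)
print(result)  # 12c c12
```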
Alternatively, you might use a generator expression with join, as follows:
text = "s.om/e br%0oken tex!t"
cleantext = ''.join(i for i in text if i not in "./%0!")
print(cleantext) # some broken text

Related

How do I get part of a string with a regex in Python [duplicate]

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 2 years ago.
I am new to regexes in Python.
I have a string containing a substring that I would like to extract.
I have the following pattern:
r = re.compile("(flag{.+[^}]})")
and the string is
Something has gone horribly wrong\n\nflag{Hi!}
I would like to get hold of just flag{Hi!}
I have tried it with:
a = re.search(r,string)
a = re.split(r,string)
Neither approach works; if I print a I get None.
How can I get hold of the desired flag?
Thanks in advance.
import re
string = "Something has gone horribly wrong\n\nflag{Hi!}"  # avoid shadowing the built-in str
r = re.compile("(flag{.+[^}]})")
a = re.search(r, string)
print(a.group())
This worked.
Firstly, as mentioned in the comments, your output is not None. You do get a match, and it is the one you were looking for: a Match object that spans positions 35 to 44 and matches flag{Hi!}. You can use group() to get the match as a string:
>>> a = re.search(r, string)
>>> print(a.group())
flag{Hi!}
You can also shorten your regex a little. There is no need for .+ because it becomes redundant once you repeat [^}], which matches any character that isn't a closing curly bracket (}):
"(flag{[^}]+})"
The + matches one or more characters. Replace it with *, which matches zero or more, if you also want to match things like flag{} where there are no characters inside the curly brackets.
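The difference between + and * is easiest to see on an empty flag. A short sketch (the braces are escaped here for explicitness, which is an editorial choice; the unescaped form above also works in Python):

```python
import re

samples = ["flag{Hi!}", "flag{}"]
plus = re.compile(r"flag\{[^}]+\}")  # at least one character inside the braces
star = re.compile(r"flag\{[^}]*\}")  # empty braces are allowed

for s in samples:
    # Prints the sample, then whether each pattern matched it.
    print(s, bool(plus.search(s)), bool(star.search(s)))
```

Only the * version matches flag{}; both match flag{Hi!}.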
You can also pass the pattern straight to re.search without compiling it first.
import re
line = 'Something has gone horribly wrong\n\nflag{Hi!}'
r = re.search("(flag{[^}]*})", line)
print(r.group())
Output:
flag{Hi!}

How would I remove the Arabic prefix "ال" from an arabic string?

I have tried things like this, but there is no change between the input and output:
def remove_al(text):
    if text.startswith('ال'):
        text.replace('ال','')
    return text
text.replace returns the updated string but doesn't change the original, so you should change the code to
text = text.replace(...)
Note that in Python strings are "immutable"; there's no way to change even a single character of a string; you can only create a new string with the value you want.
If you want to remove only the prefix ال and not every ال combination in the string, I'd suggest using:
def remove_prefix_al(text):
    if text.startswith('ال'):
        return text[2:]
    return text
If you simply use text.replace('ال',''), this will replace all ال combinations:
Example
text = 'الاستقلال'
text.replace('ال','')
Output:
'استقل'
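str.lstrip also works for this particular example, but use it with care: its argument is treated as a set of characters rather than a prefix, so it strips every leading ا or ل and can remove too much (for instance, 'اللغة' becomes 'غة'). On Python 3.9+, str.removeprefix('ال') removes exactly the prefix.
Example text (alrashid) in Arabic: 'الرَشِيد'
text = 'الرَشِيد'
clean_text = text.lstrip('ال')
print(clean_text)
Note that even though Arabic reads from right to left, lstrip strips the start of the string (which is visually on the right).
Also, as user 6502 noted, the function in the question returns its input unchanged because Python strings are immutable: text.replace builds a new string, which the code discards.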
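The difference between stripping a character set and removing a prefix can be seen in a small sketch. remove_prefix here is a hypothetical helper that uses slicing, so it runs on any Python 3 version:

```python
def remove_prefix(text, prefix="ال"):
    """Return text without one leading occurrence of prefix, if present."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

word = "اللغة"
print(word.lstrip("ال"))    # strips every leading ا or ل
print(remove_prefix(word))  # strips only the first "ال"
```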
"ال" as prefix is quite complex in Arabic that you will need Regex to accurately separate it from its stem and other prefixes. The following code will help you isolate "ال" from most words:
import re
text = 'والشعر كالليل أسود'
words = text.split()
for word in words:
    alx = re.search(r'''^
        ([وف])?
        ([بك])?
        (لل)?
        (ال)?
        (.*)$''', word, re.X)
    groups = [alx.group(1), alx.group(2), alx.group(3), alx.group(4), alx.group(5)]
    groups = [x for x in groups if x]
    print(word, groups)
Running that (in Jupyter) you will get:

How to replace multiple different words with a single character/word in Python? [duplicate]

This question already has answers here:
Better way to remove multiple words from a string?
(5 answers)
Closed 3 years ago.
Note: without chaining the replace method, looping over the characters in a for loop, or using a list comprehension.
input_string = "the was is characters needs to replaced by empty spaces"
input_string.replace("the","").replace("was","").replace("is","").strip()
output: 'characters needs to replaced by empty spaces'
Is there any direct way to do this?
You can use re.sub from Python's regex module to replace several different words with a single replacement string:
input_string = "the was is characters needs to replaced by empty spaces"
import re
re.sub("the|was|is","",input_string).strip()
'characters needs to replaced by empty spaces'
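One caveat with a plain alternation like the|was|is is that it also matches inside longer words, so "this" would be cut down to "th". A hedged sketch that matches whole words only; remove_words is a hypothetical helper name, and the sample sentence is made up for illustration:

```python
import re

def remove_words(text, words):
    """Remove whole-word occurrences of words and tidy the spacing."""
    # \b marks a word boundary, so "is" no longer matches inside "this".
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    stripped = re.sub(pattern, "", text)
    # Collapse the runs of spaces left behind by the removed words.
    return " ".join(stripped.split())

print(remove_words("this is the theory", ["the", "was", "is"]))  # this theory
```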
This should help:
input_string = "the was is characters needs to replaced by empty spaces"
words_to_replace=['the', 'was','is']
print(input_string)
for words in words_to_replace:
    input_string = input_string.replace(words, "")
print(input_string.strip())

Delete \n at end of a string (Python) [duplicate]

This question already has answers here:
How to remove \n from a list element?
(15 answers)
Closed 7 years ago.
How can I delete a \n line break at the end of a string?
I'm trying to read two strings from a .txt file and want to format them with the os.path.join() method after I have "cleaned" the strings.
Here is my attempt with dummy data:
content = ['Source=C:\\Users\\app\n', 'Target=C:\\Apache24\\htdocs']
for string in content:
    print(string)
    if string.endswith('\\\n'):
        string = string[0:-2]
print(content)
You cannot update a string in place the way you are trying to, because Python strings are immutable. Assigning to the loop variable only binds that name to a new string; the list still refers to the original objects. Instead, build a new list to hold the updated strings, and use rstrip to strip the trailing newline. Have a look at the code below:
content = ['Source=C:\\Users\\app\n', 'Target=C:\\Apache24\\htdocs']
updated = []
for string in content:
    print(string)
    updated.append(string.rstrip())
print(updated)
You can use the rstrip function. With no argument it trims all trailing whitespace, including \n, like below:
>>> a = "aaa\n"
>>> print(a)
aaa
>>> a.rstrip()
'aaa'
To remove only \n use this:
string = string.rstrip('\n')
When you do string[0:-2] you are actually removing 2 characters from the end, while \n is one character.
Try this (in Python 3, map returns an iterator, so wrap it in list to get a list back):
content = list(map(lambda x: x.strip(), content))
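Putting the pieces together for the data in the question, a minimal sketch might strip only the newline and then split each line at the equals sign. The settings dictionary and the use of partition are assumptions added for illustration, not part of the original answers:

```python
content = ['Source=C:\\Users\\app\n', 'Target=C:\\Apache24\\htdocs']

# rstrip('\n') removes only trailing newlines; other whitespace is kept.
cleaned = [line.rstrip('\n') for line in content]

# partition('=') splits at the first '=' into (key, '=', value).
settings = {}
for line in cleaned:
    key, _, value = line.partition('=')
    settings[key] = value

print(settings['Source'])  # C:\Users\app
print(settings['Target'])  # C:\Apache24\htdocs
```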

Retrieve part of string, variable length

I'm trying to learn how to use regular expressions with Python. I want to retrieve an ID number (in parentheses) from the end of a string that looks like this:
"This is a string of variable length (561401)"
The ID number (561401 in this example) can be of variable length, as can the text.
"This is another string of variable length (99521199)"
My code fails:
import re
import selenium
# [Code omitted here, I use selenium to navigate a web page]
result = driver.find_element_by_class_name("class_name")
print(result.text) # [This correctly prints the whole string "This is a text of variable length (561401)"]
id = re.findall("??????", result.text) # [Not sure what to do here]
print(id)
This should work for your example:
(?<=\()[0-9]*
(?<=...) is a lookbehind: it requires the given text to appear immediately before the match but doesn't consume it. In this case, I used \(. ( is a special character, so it has to be escaped with \. [0-9] matches any single digit. The * means zero or more of the directly preceding item, so [0-9]* matches as many digits as there are after the parenthesis (use + instead if at least one digit is required).
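A short sketch of how the lookbehind behaves with * and with +. The second sample string is made up to show the edge case where the parentheses are empty:

```python
import re

star = r"(?<=\()[0-9]*"  # zero or more digits after "("
plus = r"(?<=\()[0-9]+"  # one or more digits after "("

print(re.findall(star, "variable length (561401)"))  # ['561401']
print(re.findall(plus, "variable length (561401)"))  # ['561401']
print(re.findall(star, "no digits ()"))              # ['']
print(re.findall(plus, "no digits ()"))              # []
```

With *, an empty pair of parentheses still produces an (empty) match, which is why + is usually the safer choice here.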
Solved this thanks to Kaz's link, very useful:
http://regex101.com/
id = re.findall(r"(\d+)", result.text)
print(id[0])
You can use this simple solution:
>>> origin_string = "This is a string of variable length (561401)"
>>> str1 = origin_string.replace("(", " ")
>>> str1
'This is a string of variable length  561401)'
>>> str2 = str1.replace(")", " ")
>>> str2
'This is a string of variable length  561401 '
>>> [int(s) for s in str2.split() if s.isdigit()]
[561401]
First, I replace the parentheses with spaces, and then I search the new string for integers.
There is no real need for regular expressions here. If the ID is always at the end and always in parentheses, you can split the string, take the last element and remove the parentheses by slicing ([1:-1]). Regexes are also relatively slow compared with simple string methods.
line = "This is another string of variable length (99521199)"
print(line.split()[-1][1:-1])
If you did want to use regular expressions I would do this:
import re
line = "This is another string of variable length (99521199)"
id_match = re.match(r'.*\((\d+)\)', line)
if id_match:
    print(id_match.group(1))
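If the ID must be the last thing in the string, the pattern can be anchored to the end. A minimal sketch, assuming a hypothetical helper extract_id and made-up sample strings beyond the first one:

```python
import re

# "$" anchors the match to the end of the string; \s* tolerates trailing spaces.
ID_AT_END = re.compile(r"\((\d+)\)\s*$")

def extract_id(text):
    """Return the trailing parenthesised number as an int, or None."""
    match = ID_AT_END.search(text)
    return int(match.group(1)) if match else None

print(extract_id("This is a string of variable length (561401)"))  # 561401
print(extract_id("Text with (123) inside and then (99521199)"))    # 99521199
print(extract_id("No ID here"))                                    # None
```

Returning None rather than raising keeps the caller in charge of what a missing ID means.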
