How to remove backslashes from strings in Python

As output from Pytesseract, I get a string variable that contains backslashes. I would like to remove all of the backslashes.
'13, 0\\'70'
Unfortunately, the replace function does not work, because the string doesn't seem to be an actual string when the variable value is copied. Does anybody know how I can remove all the backslashes?

I replaced your outermost quotation marks with double quotes, and then properly applied `replace`:
>>> brut_mass = "13, 0\\'70"
>>> brut_mass.replace('\\', '')
"13, 0'70"
Does that solve your problem?
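A minimal sketch of why the copied value looked strange, assuming the Pytesseract result really contains a single backslash (the doubled backslash in '13, 0\\'70' is how the interactive prompt displays the repr):

```python
# Assumed stand-in for the Pytesseract result: the prompt showed '13, 0\\'70',
# which is the repr of a string containing ONE backslash before the apostrophe.
brut_mass = "13, 0\\'70"

print(repr(brut_mass))        # repr doubles the backslash for display
print(brut_mass.count("\\"))  # 1: only one backslash character is really there

cleaned = brut_mass.replace("\\", "")
print(cleaned)                # 13, 0'70
```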

Fixed it with the code below.
brut_mass = repr(brut_mass).replace("\\", '')
or, alternatively, to avoid the double quotes:
brut_mass = brut_mass.replace("\\", '')

Related

Delete specific duplicated punctuation from string

I have this string s = "(0|\\+33)[1-9]( *[0-9]{2}){4}", and I want to delete just one of the duplicated backslashes ('\'), so that the result looks like (0|\+33)[1-9]( *[0-9]{2}){4}.
When I used this code, all the duplicated characters were removed:
result = "".join(dict.fromkeys(s))
But in my case I only want to remove the duplicated '\'. Any help is highly appreciated.
A solution using the re module:
import re
s = r"(0|\\+33)[1-9]( *[0-9]{2}){4}"
s = re.sub(r"\\(?=\\)", "", s)
print(s)
This finds every backslash that is followed by another backslash and replaces it with an empty string.
Output: (0|\+33)[1-9]( *[0-9]{2}){4}
The function you need is replace:
s = "(0|\\+33)[1-9]( *[0-9]{2}){4}"
result = s.replace("\\","")
EDIT
I see now that you want to remove just one \ and not both.
To do this, modify the call to replace this way:
result = s.replace("\\", "", 1)  # the last argument is the maximum number of occurrences to replace
or
result = s.replace("\\\\", "\\")
EDIT of the EDIT
Backslashes are special in Python.
I'm using Python 3.10.5. If I do
x = "ab\c"
y = "ab\\c"
print(len(x)==len(y))
I get a True.
That's because backslashes are used to escape special characters, and that makes the backslash a special character :)
I suggest experimenting with replace a little until you get what you need.
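A small sketch of the raw-versus-normal literal difference discussed above, and of two ways to collapse a doubled backslash into one (the variable names are illustrative):

```python
import re

# A raw string keeps both backslashes; a normal literal turns "\\" into one.
raw = r"(0|\\+33)[1-9]( *[0-9]{2}){4}"
cooked = "(0|\\+33)[1-9]( *[0-9]{2}){4}"
print(raw.count("\\"), cooked.count("\\"))  # 2 1

# Collapse each doubled backslash into a single one, two ways.
via_replace = raw.replace("\\\\", "\\")      # replace the pair with one backslash
via_regex = re.sub(r"\\(?=\\)", "", raw)     # drop a backslash followed by another
print(via_replace)                           # (0|\+33)[1-9]( *[0-9]{2}){4}
print(via_replace == via_regex == cooked)    # True
```

The two approaches agree for doubled backslashes; for longer runs, the regex collapses the whole run to a single backslash, while replace() merges them pair by pair and leaves roughly half.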

Remove string containing apostrophes python

How can I remove a string that contains apostrophes? For example, I want to remove 'Cannot get 'data' from cell' from my text.
I would use str.replace('Cannot get 'data' from cell',''), but the apostrophes "split" the string, so this doesn't work.
You can escape single quotes using the backslash like this:
str.replace('Cannot get \'data\' from cell', '')
If you want to remove both the initial quotes and the ones in the middle, you should escape the first and last too like this:
str.replace('\'Cannot get \'data\' from cell\'', '')
Just use double quotes to delimit the string you want to remove, or use backslashes, though that is less readable:
string.replace("'Cannot get 'data' from cell'",'')
string.replace('\'Cannot get \'data\' from cell\'', '')
EDIT: If you don't have quotes before Cannot and after cell, you just need to remove the first and last single quotes from the string to be replaced:
string.replace("Cannot get 'data' from cell",'')
string.replace('Cannot get \'data\' from cell', '')
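A sketch using a made-up sentence as the text to clean; it checks that the double-quoted and backslash-escaped literals denote the same string before removing it:

```python
# Hypothetical text to clean up.
text = "Error: Cannot get 'data' from cell. Please retry."

# Both literals denote the same string; double quotes avoid the escaping.
target_a = "Cannot get 'data' from cell"
target_b = 'Cannot get \'data\' from cell'
assert target_a == target_b

cleaned = text.replace(target_a, "")
print(cleaned)  # Error: . Please retry.
```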
I don't know whether the other answers here answered your question, but the question confuses me. In what way do you want to "remove" it? If you really want to remove it:
foo = 'Hello, World! This string has an \' in it!'
if "'" in foo: # if '\'' in foo is also possible
del foo
If you mean to replace the apostrophes with something try:
foo = 'Hello, World! This string has an \' in it!'
foo = foo.replace('\'', parameter2)  # parameter2 is the value you want to replace the apostrophe with
Please be more specific with your questions in the future!

Python replace backslash (\) with forward slash (/)

I am trying to replace \ with /, but I'm having no success.
Here is a snapshot of the scenario I am trying to achieve:
string = "//SQL-SERVER/Lacie/City of X/Linservo\171002"
print string.replace("\\","/")
Output:
//SQL-SERVER/Lacie/City of X/Linservoy002
Desired output:
//SQL-SERVER/Lacie/City of X/Linservo/171002
You need to escape "\" with an extra "\".
>>> string = "//SQL-SERVER/Lacie/City of X/Linservo\\171002"
>>> string
'//SQL-SERVER/Lacie/City of X/Linservo\\171002'
>>> print string.replace("\\","/")
//SQL-SERVER/Lacie/City of X/Linservo/171002
string = r"//SQL-SERVER/Lacie/City of X/Linservo\171002"
print string.replace("\\","/")
output
//SQL-SERVER/Lacie/City of X/Linservo/171002
You have errors both in the replace function and in the string definition.
In your string definition, \171 gives the character with octal value 171, which is y.
In your replace function, the backslash escapes the quote.
You should escape backslashes:
string = "//SQL-SERVER/Lacie/City of X/Linservo\\171002"
string.replace("\\","/")
You can simply use .replace in Python, or, if you prefer, a regex:
import re
string = r"//SQL-SERVER/Lacie/City of X/Linservo\171002"
pattern=r'[\\]'
replaced_string=re.sub(pattern,"/",string)
print(replaced_string)
In your original question the string contains "X/Linservo\171002", and \171 is an octal escape sequence, so Python turns it into "y". You can try this in the Python interpreter:
In[2]: print("\171")
y
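A short sketch contrasting the two literals from the question; chr(0o171) is the octal escape the answers describe:

```python
# Without a raw string, "\171" is an octal escape for chr(0o171) == "y".
broken = "//SQL-SERVER/Lacie/City of X/Linservo\171002"
print(broken)  # //SQL-SERVER/Lacie/City of X/Linservoy002 (the backslash is already gone)

# A raw string keeps the backslash, so replace() has something to act on.
path = r"//SQL-SERVER/Lacie/City of X/Linservo\171002"
fixed = path.replace("\\", "/")
print(fixed)   # //SQL-SERVER/Lacie/City of X/Linservo/171002
```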

Removing the single quotes after using re.sub() in python

After replacing all word characters in a string with the character '^' using re.sub(r"\w", "^", stringorphrase), I'm left with:
>>> '^^^ ^^ ^^^^'
Is there any way to remove the single quotes so it looks cleaner?
>>> ^^^ ^^ ^^^^
Are you sure it's not just how the string is displayed in the interactive prompt (and there aren't actually apostrophes in your string)?
If the ' is actually part of the string and is the first/last character, then either:
string = string.strip("'")
or:
string = string[1:-1] # lop ending characters off
Use print. The quotes aren't actually part of the string; the interactive prompt adds them when it echoes the string's repr.
To remove all occurrences of single quotes:
mystr = some_string_with_single_quotes
answer = mystr.replace("'", '')
To remove single quotes ONLY at the ends of the string:
mystr = some_string_with_single_quotes
answer = mystr.strip("'")
Hope this helps
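A sketch with a hypothetical input chosen to reproduce the '^^^ ^^ ^^^^' pattern. It uses a raw string for the pattern, since newer Python versions warn about the invalid escape in "\w", and it also shows how strip("'") differs from slicing when quotes really are in the data:

```python
import re

stringorphrase = "One to four"  # hypothetical input
masked = re.sub(r"\w", "^", stringorphrase)

print(repr(masked))  # '^^^ ^^ ^^^^'  <- what the interactive prompt echoes
print(masked)        # ^^^ ^^ ^^^^    <- the quotes were never in the string

# If quotes really are in the data, strip() and slicing behave differently:
quoted = "''^^^''"
print(quoted.strip("'"))  # ^^^    (removes every leading/trailing ')
print(quoted[1:-1])       # '^^^'  (removes exactly one character at each end)
```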

Python remove everything after a space with hex \x00

I have a variable string with unknown length that has the important string at the left side and the unimportant things on the right side separated by a single space. How do I remove the unimportant information to the right?
I have tried rstrip and split, with no success.
Edit: I'll place the actual value that needs to be fixed.
"NPC_tester_contact() ) ntact() "
The very first space (the one left to the closed parenthesis) should have everything after including itself be marked as unimportant.
Edit: The output should be "NPC_tester_contact()"!
Look carefully at the string I placed above. There is a lot of whitespace after it as well. I assume that is what is causing the hiccup.
I have tried most of the solutions here and they either don't do anything or just produce whitespace.
repr(s) gives me:
'NPC_me_lvup_event_contact()\x00t()\x00act()\x00act()\x00ntact()\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
It should be "NPC_me_lvup_event_contact()".
Thanks!
Perhaps this is a better question. Is there a way to remove all characters after the first \x00 hex that shows up in the string?
For some reason, it works sometimes but not always. The above example was done with the method that Levon posted.
Solution: Problem solved. This is a null byte rather than a space. The solution would have been any of the answers below, using "\x00" as the separator instead of " ".
Thank you everyone!
UPDATE based on new string data:
Assuming s contains your string:
s.split('\x00')[0]
yields
'NPC_me_lvup_event_contact()'
split() gives you a list of strings separated by the separator you pass to it. If no separator is given, it splits on runs of whitespace; in this case we pass the hex character you are interested in.
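A sketch on a shortened copy of the string from the question (fewer trailing NUL bytes). Besides split(), it shows partition() and find(), standard str methods that also cope with the case where no NUL byte is present:

```python
# Shortened version of the data from the question: the useful text is
# followed by NUL bytes ("\x00") and leftover garbage.
s = "NPC_me_lvup_event_contact()\x00t()\x00act()\x00act()\x00ntact()" + "\x00" * 20

# Option 1: split at the first NUL only and keep the left-hand piece.
head1 = s.split("\x00", 1)[0]

# Option 2: partition() always returns a 3-tuple, even if no NUL exists.
head2, _sep, _rest = s.partition("\x00")

# Option 3: find() returns -1 when the NUL is absent, so guard for it.
idx = s.find("\x00")
head3 = s if idx == -1 else s[:idx]

print(head1)                     # NPC_me_lvup_event_contact()
print(head1 == head2 == head3)   # True
```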
Use split(' ')[0]:
>>> a = 'aaa bbb'
>>> a.split(' ')[0]
'aaa'
>>> mystring = 'important useless'
>>> mystring[:mystring.find(' ')]
'important'
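One caveat with the find() approach: find() returns -1 when the separator is missing, and slicing with [:-1] then silently drops the last character (str.index() raises ValueError instead). A sketch with a hypothetical helper, before_first_space, that guards against this:

```python
def before_first_space(text):
    """Return everything before the first space, or the whole string if there is none."""
    idx = text.find(" ")
    # find() returns -1 when no space exists; text[:-1] would then drop the
    # last character, so return the whole string in that case.
    return text if idx == -1 else text[:idx]

mystring = "important useless"
print(mystring[:mystring.find(" ")])    # important
print("nospace"[:"nospace".find(" ")])  # nospac  (the pitfall)
print(before_first_space("nospace"))    # nospace
print(before_first_space(mystring))     # important
```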
split() w/o delimiter splits by any whitespace:
>>> "asdasd xyz".split()[0]
'asdasd'
str = "important unimportant"
important = str.split(' ')[0]
Try this:
lhs, rhs = s.split()  # lhs is what you want
This only works if the string splits into exactly two whitespace-separated parts.
Otherwise, you can get lhs like this (but you lose rhs):
lhs = s.split()[0]
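A sketch of why the two-name unpacking fails on the string from the question, and how the maxsplit argument of split() avoids it:

```python
s = "NPC_tester_contact() ) ntact() "

# split() yields three parts here, so unpacking into two names fails.
try:
    lhs, rhs = s.split()
except ValueError as exc:
    print("unpacking failed:", exc)

# maxsplit=1 yields at most two parts, so unpacking is safe as long as
# some non-whitespace text follows the first whitespace run.
lhs, rhs = s.split(maxsplit=1)
print(lhs)        # NPC_tester_contact()
print(repr(rhs))  # ') ntact() '
```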
Use the split() function, and get the first item that it returns:
raw_string = 'NPC_tester_contact() ) ntact() '
important = raw_string.split(' ')[0]
Will return:
NPC_tester_contact()
Try this; it assumes your string is stored in str:
print str[0:str.index(" ")]
Comment if it doesn't work and I'll sort it out.
Here is my code:
str = "NPC_tester_contact() ) ntact() "
print str[0:str.index(" ")]
Output:
NPC_tester_contact()
Link:
http://ideone.com/i9haI
And if you want the output to be surrounded by double quotes:
print '"' + str[0:str.index(" ")] + '"'
You could also use a regex-based solution, something like:
import re
input_string = 'NPC_me_lvup_event_contact()\x00t()\x00act()\x00act()\x00ntact()\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
string_pat = re.compile(r'[a-zA-Z0-9\(\)_]+')
try:
    first_part = string_pat.findall(input_string)[0]
except IndexError:
    # There is nothing of interest for you in this string
    first_part = ''
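A variant of the regex answer, using re.search() with the same character class instead of findall()[0] plus try/except; the helper name first_token is hypothetical:

```python
import re

# Same character class as the answer above: the first run of letters,
# digits, parentheses and underscores is taken as the "important" part.
string_pat = re.compile(r'[a-zA-Z0-9\(\)_]+')

def first_token(text):
    match = string_pat.search(text)  # stops at the first run, unlike findall()
    return match.group(0) if match else ""

print(first_token("NPC_me_lvup_event_contact()\x00t()\x00act()\x00\x00\x00"))
print(first_token("NPC_tester_contact() ) ntact() "))
print(repr(first_token("\x00\x00")))  # '' when nothing matches
```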
