Remove Characters from string with replace not working - python

I have a number of strings from which I am aiming to remove characters using replace. However, this doesn't seem to work. To give a simplified example, this code:
row = "b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'"
row = row.replace("b'", "").replace("'", "").replace('b"', '').replace('"', '')
print(row.encode('ascii', errors='ignore'))
still outputs this b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38' whereas I would like it to output James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38. How can I do this?
Edit: Updated the code with a better example.

You seem to be mistaking single quotes for double quotes. Simply replace 'b:
>>> row = "xyz'b"
>>> row.replace("'b", "")
'xyz'
As an alternative to str.replace, you can simply slice the string to remove the unwanted leading and trailing characters:
>>> row[2:-1]
'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'

In your first .replace, change b' to 'b. Hence your code should be:
>>> row = "xyz'b"
>>> row = row.replace("'b", "").replace("'", "").replace('b"', '').replace('"', '')
# ^ changed here
>>> print(row.encode('ascii', errors='ignore'))
xyz
I am assuming the rest of the conditions you have are part of other tasks/matches that you didn't mention here.
If all you want is to take the string before the first ', then you may just do:
row.split("'")[0]

You haven't listed this to remove 'b:
.replace("'b", '')

import ast
row = "b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'"
b_string = ast.literal_eval(row)
print(b_string)
u_string = b_string.decode('utf-8')
print(u_string)
out:
b_string:b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'
u_string: James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38
The real question is how to convert a string to a Python object.
You have a string which contains the text of a binary string (bytes) literal. To convert it to Python's binary string object you could use eval(), but ast.literal_eval() is a safer way to do it.
Now that you have a binary string, you can convert it to a Unicode string, which does not start with "b", by using decode().
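To see both steps side by side, here is a minimal sketch, assuming Python 3 and a shortened sample row (the values are made up for illustration). It also suggests that the replace calls in the question do remove the characters: the b'...' in the printed output comes from print showing the bytes object that encode() returns.

```python
import ast

# Shortened, hypothetical sample in the same shape as the question's row.
row = "b'James Bray,1985,6020'"

# Route 1: parse the text as a bytes literal, then decode it to str.
as_bytes = ast.literal_eval(row)
as_text = as_bytes.decode("utf-8")
print(as_text)

# Route 2: the replace chain from the question, without the final encode().
cleaned = row.replace("b'", "").replace("'", "")
print(cleaned)

# Encoding turns the str back into bytes, and print shows bytes with a b'' prefix.
print(cleaned.encode("ascii", errors="ignore"))
```

If the goal is only to print the text, dropping the final encode() from the question's code should be enough.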

Related

Python string.rstrip() doesn't strip specified characters

string = "hi())("
string = string.rstrip("abcdefghijklmnoprstuwxyz")
print(string)
I want to remove every letter from given string using rstrip method, however it does not change the string in the slightest.
Output:
'hi())('
What I want:
'())('
I know that I can use regex, but I really don't understand why it doesn't work.
Note: It is part of the Valid Parentheses challenge on Codewars.
You have to use lstrip instead of rstrip:
>>> string = "hi())("
>>> string = string.lstrip("abcdefghijklmnoprstuwxyz")
>>> string
'())('
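The reason is that rstrip only removes characters from the right-hand end and stops at the first character that is not in the given set. A small sketch of the difference, using string.ascii_lowercase as a stand-in for the hand-typed alphabet (that substitution is my assumption, not part of the original answer):

```python
import string

s = "hi())("

# rstrip starts at the right end; the last character "(" is not a letter,
# so it stops immediately and removes nothing.
from_right = s.rstrip(string.ascii_lowercase)

# lstrip starts at the left end and removes "h" and "i" before hitting "(".
from_left = s.lstrip(string.ascii_lowercase)

# To drop letters wherever they occur, filter the characters instead.
anywhere = "".join(ch for ch in s if not ch.isalpha())

print(repr(from_right))
print(repr(from_left))
print(repr(anywhere))
```

For this input the last two give the same result, but only the filter would also handle letters sitting between the parentheses.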

replacing special characters in string Python

I'm trying to replace special characters in a data frame with unaccented or different ones.
I can replace one with
df['col_name'] = df.col_name.str.replace('?','j')
This turned the '?' into 'j', but I can't figure out how to change more than one.
I have a list of special characters that I want to change. I've tried using a dictionary, but it doesn't seem to work:
the_reps = {'?','j'}
df1 = df.replace(the_reps, regex = True)
this gave me the error nothing to replace at position 0
EDIT:
this is what worked - although it is probably not that pretty:
df[col]=df.col.str.replace('old char','new char')
df[col]=df.col.str.replace('old char','new char')
df[col]=df.col.str.replace('old char','new char')
df[col]=df.col.str.replace('old char','new char')...
for each one ..
import re
s=re.sub("[_list of special characters_]","",_your string goes here_)
print(s)
An example:
import re
text = "Hello$#& Python3$"
s = re.sub("[$#&]", "", text)
print(s)
# Output: Hello Python3
Explanation:
s = re.sub("[$#&]", "", text)
The pattern to be replaced is "[$#&]".
[] indicates a set of characters.
[$#&] matches either $ or # or &.
The replacement string is given as an empty string.
If these characters are found in the string, they are replaced with an empty string, which removes them.
You can use Series.replace with a dictionary:
#d = { 'actual character ':'replacement ',...}
df.columns = df.columns.to_series().replace(d, regex=True)
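The same dictionary idea can be tried on plain Python strings with str.maketrans and str.translate, which swap several characters in one pass. This is only a sketch outside pandas, and the mapping and sample values below are hypothetical:

```python
# Hypothetical mapping: each key is a single character to replace.
replacements = {"?": "j", "&": "and", "#": ""}

# Build a translation table once, then reuse it for every string.
table = str.maketrans(replacements)

values = ["ma?or", "R&D", "#1 tag"]
cleaned = [value.translate(table) for value in values]

print(cleaned)
```

Note that this takes a real dict (key: value pairs). The question's {'?','j'} with a comma is a set, which may be why the dictionary attempt failed.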
Try this:
import re
my_str = "hello Fayzan-Bhatti Ho~!w"
# Remove everything except letters, digits, spaces, newlines and dots
my_new_string = re.sub(r'[^a-zA-Z0-9 \n.]', '', my_str)
print(my_new_string)
Output: hello FayzanBhatti How

python convert "unicode" as list

I have a question about handling a return type in Python.
I have a database function that returns this as its value:
(1,13616,,"My string, that can have comma",170.90)
I put this into a variable and tested the type:
print(type(var))
I got the result:
<type 'unicode'>
I want to convert this to a list, with the values separated by commas.
Ex.:
var[0] = 1
var[1] = 13616
var[2] = None
var[3] = "My string, that can have comma"
var[4] = 170.90
Is it possible?
Using standard library csv readers:
>>> import csv
>>> s = u'(1,13616,,"My string, that can have comma",170.90)'
>>> [var] = csv.reader([s[1:-1]])
>>> var[3]
'My string, that can have comma'
Some caveats:
var[2] will be an empty string, not None, but you can post-process that.
numbers will be strings and also need post-processing, since csv does not tell the difference between 0 and '0'.
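The post-processing mentioned in these caveats could look roughly like this. The convert helper is hypothetical, and it assumes that every field which parses as a number should become one (so a quoted "123" would also turn into an int):

```python
import csv

s = '(1,13616,,"My string, that can have comma",170.90)'

def convert(field):
    """Hypothetical helper: '' -> None, numeric text -> int or float, else unchanged."""
    if field == "":
        return None
    for cast in (int, float):
        try:
            return cast(field)
        except ValueError:
            pass
    return field

# Strip the surrounding parentheses, then let csv handle the quoted comma.
[raw] = csv.reader([s[1:-1]])
var = [convert(field) for field in raw]

print(var)
```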
You can try to do the following:
b = []
for i in a:
    if i is not None:
        b.append(i)
    else:
        b.append(None)
print(type(b))
The issue is not with the comma.
this works fine:
a = (1,13616,"My string, that can have comma",170.90)
and this also works:
a = (1,13616,None,"My string, that can have comma",170.90)
but when you leave two commas ",," it doesn't work.
Unicode strings are (basically) just strings in Python 2 (in Python 3, remove the word "basically" from that sentence). They're written as literals by prefixing a u before the string (compare raw strings r"something", or Python 3.6+ f-strings f"{some_var}thing").
Just strip off your parens and split by comma. You'll have to do some post-parsing if you want 170.90 instead of u'170.90' or None instead of u'', but I'll leave that for you to decide. Note that a plain split also breaks the quoted field at its inner comma, as the output shows.
>>> var.strip(u'()').split(u',')
[u'1', u'13616', u'', u'"My string', u' that can have comma"', u'170.90']

Python remove everything after a space with hex \x00

I have a variable string with unknown length that has the important string at the left side and the unimportant things on the right side separated by a single space. How do I remove the unimportant information to the right?
I have tried rstrip, and split with no success.
Edit: I'll place the actual value that needs to be fixed.
"NPC_tester_contact() ) ntact() "
Everything from the very first space onward (the one just to the left of the lone closing parenthesis), including the space itself, should be treated as unimportant.
Edit: The output should be "NPC_tester_contact()"!
Look carefully at the string that I placed above. There is a lot of whitespace after it as well. I assume that is what is causing the hiccup.
I have tried most of the solutions here and they either don't do anything or just produce whitespace.
repr(s) gives me.
'NPC_me_lvup_event_contact()\x00t()\x00act()\x00act()\x00ntact()\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
It should be "NPC_me_lvup_event_contact()".
Thanks!
Perhaps this is a better question. Is there a way to remove all characters after the first \x00 hex that shows up in the string?
For some reason, it works sometimes and doesn't always work. The above example was done with the method that Levon posted.
Solution: Problem solved. The separator is a null byte rather than a space. The solution would have been any of the answers below, using "\x00" as the delimiter instead of " ".
Thank you everyone!
UPDATE based on new string data:
Assuming s contains your string:
s.split('\x00')[0]
yields
'NPC_me_lvup_event_contact()'
split() gives you a list of the substrings separated by the delimiter you pass to it. If none is specified, it splits on runs of whitespace; in this case we pass the null character \x00 you are interested in.
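As a small sketch of the same idea, str.partition stops at the first null byte and does not build the full list of pieces. The sample string below is a shortened version of the one in the question:

```python
# Shortened sample: useful text, then leftovers separated by null bytes.
s = "NPC_me_lvup_event_contact()\x00t()\x00act()\x00\x00\x00"

# partition returns (before, separator, after) for the first occurrence only.
head, sep, tail = s.partition("\x00")
print(repr(head))

# If there is no null byte at all, the whole string comes back as the head.
no_null_head = "plain_name()".partition("\x00")[0]
print(repr(no_null_head))
```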
Use split(' ')[0]:
>>> a = 'aaa bbb'
>>> a.split(' ')[0]
'aaa'
>>> mystring = 'important useless'
>>> mystring[:mystring.find(' ')]
'important'
split() without a delimiter splits on any whitespace:
>>> "asdasd xyz".split()[0]
'asdasd'
text = "important unimportant"  # avoid naming a variable str, which shadows the built-in
important = text.split(' ')[0]
try this:
lhs,rhs=s.split() #lhs is what you want.
This only works if there is really only one space.
Otherwise, you can get lhs by (but you lose rhs):
lhs=s.split()[0]
Use the split() function, and get the first item that it returns:
raw_string = 'NPC_tester_contact() ) ntact() '
important = raw_string.split(' ')[0]
Will return:
NPC_tester_contact()
Try this, assuming your string is stored in s:
print(s[0:s.index(" ")])
Comment if it doesn't work and I'll fix it.
Here is my code:
s = "NPC_tester_contact() ) ntact() "
print(s[0:s.index(" ")])
Output:
NPC_tester_contact()
Link:
http://ideone.com/i9haI
And if you want the output surrounded by double quotes:
print('"' + s[0:s.index(" ")] + '"')
you could use a regex type solution also. Something like:
import re
input_string = 'NPC_me_lvup_event_contact()\x00t()\x00act()\x00act()\x00ntact()\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
string_pat = re.compile(r'[a-zA-Z0-9\(\)_]+')
try:
    first_part = string_pat.findall(input_string)[0]
except IndexError:
    # There is nothing of interest for you in this string
    first_part = ''

Using strip() to clean up a string

I am new to Python and I have a string that looks like this:
Temp = "', '/1412311.2121\n"
My desired output is just the digits and the decimal point, so I'm looking for
1412311.2121
as the output. I'm trying to get rid of the ', '/ and \n in the string. I have tried Temp.strip("\n") and Temp.rstrip("\n") to remove the \n, but it still seems to remain in my string. Does anyone have any ideas? Thanks for your help.
Strings are immutable. string.strip() doesn't change string, it's a function that returns a value. You need to do:
Temp = Temp.strip()
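Note also that calling strip() without any parameters causes it to remove all leading and trailing whitespace characters, including \n.
As stalk said, you can achieve your desired result by calling strip("', /\n") on Temp. The space has to be part of the character set; otherwise stripping stops at the space after the comma.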
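A quick sketch of both points, using the string from the question. It assumes the unwanted characters only ever appear at the ends of the string, since strip never touches the middle:

```python
Temp = "', '/1412311.2121\n"

# strip() returns a new string; Temp keeps its value until you reassign it.
without_newline = Temp.strip("\n")

# The argument is a set of characters, not a substring. The space is included
# so that stripping does not stop at the space after the comma.
number_text = Temp.strip("', /\n")

print(repr(Temp))
print(repr(without_newline))
print(repr(number_text))
print(float(number_text))
```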
If the data are as you show, a number wrapped on the left and right by non-numeric data, you can use a very simple regular expression:
import re
g = re.search('[0-9.]+', s) # capture the inner number only
print(g.group(0))
I would use a regular expression to do this:
In [8]: s = "', '/1412311.2121\n"
In [9]: re.findall(r'([+-]?\d+(?:\.\d+)?(?:[eE][+-]\d+)?)', s)
Out[9]: ['1412311.2121']
This returns a list of all floating-point numbers found in the string.
