Python remove everything after a space with hex \x00

I have a string of unknown length with the important part on the left and the unimportant part on the right, separated by a single space. How do I remove the unimportant part on the right?
I have tried rstrip and split with no success.
Edit: Here is the actual value that needs to be fixed.
"NPC_tester_contact() ) ntact() "
Everything from the very first space onward (the one to the left of the lone closing parenthesis), including the space itself, should be treated as unimportant.
Edit: The output should be "NPC_tester_contact()"!
Look carefully at the string I placed above. There is a lot of whitespace after it as well. I assume that is what is causing the hiccup.
I have tried most of the solutions here and they either don't do anything or just produce whitespace.
repr(s) gives me:
'NPC_me_lvup_event_contact()\x00t()\x00act()\x00act()\x00ntact()\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
It should be "NPC_me_lvup_event_contact()".
Thanks!
Perhaps this is a better question: is there a way to remove all characters after the first \x00 that shows up in the string?
For some reason, it works sometimes but not always. The above example was produced with the method that Levon posted.
Solution: Problem solved. The separator is a null byte rather than a space. Any of the solutions below would have worked using "\x00" as the delimiter instead of " ".
Thank you everyone!

UPDATE based on new string data:
Assuming s contains your string:
s.split('\x00')[0]
yields
'NPC_me_lvup_event_contact()'
split() returns a list of the substrings separated by the delimiter you pass in. If no delimiter is given, the string is split on runs of whitespace; in this case we pass the null character ('\x00') you are interested in.
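As a small sketch of the same idea, assuming Python 3 and a shortened stand-in for the string in the question (the helper name before_first_nul is made up for illustration), passing maxsplit=1 stops the split after the first null character:

```python
def before_first_nul(s):
    """Return the part of s before the first NUL character (hypothetical helper)."""
    # maxsplit=1 means split() stops after the first '\x00' it finds
    return s.split('\x00', 1)[0]


# Shortened stand-in for the data shown in the question
raw = 'NPC_me_lvup_event_contact()\x00t()\x00act()\x00\x00\x00'
print(before_first_nul(raw))  # NPC_me_lvup_event_contact()
```

A string that contains no null character comes back unchanged, so the helper should be safe to apply to every value.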

Use split(' ')[0]:
>>> a = 'aaa bbb'
>>> a.split(' ')[0]
'aaa'

>>> mystring = 'important useless'
>>> mystring[:mystring.find(' ')]
'important'
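One caveat worth checking, sketched here with made-up sample strings: str.find returns -1 when there is no space, so the slice silently drops the last character. str.partition avoids that, because it returns the whole string as its first element when the separator is missing.

```python
with_space = 'important useless'
no_space = 'important'

# find() returns -1 when the separator is absent, so the slice becomes [:-1]
sliced = no_space[:no_space.find(' ')]

# partition() returns (head, separator, tail); head is the whole string
# when the separator does not occur
head_with = with_space.partition(' ')[0]
head_without = no_space.partition(' ')[0]

print(repr(sliced))        # 'importan'
print(repr(head_with))     # 'important'
print(repr(head_without))  # 'important'
```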

split() w/o delimiter splits by any whitespace:
>>> "asdasd xyz".split()[0]
'asdasd'

text = "important unimportant"  # named text to avoid shadowing the built-in str
important = text.split(' ')[0]

Try this:
lhs, rhs = s.split()  # lhs is what you want
This only works if there is exactly one space.
Otherwise, you can get lhs like this (but you lose rhs):
lhs = s.split()[0]

Use the split() function, and get the first item that it returns:
raw_string = 'NPC_tester_contact() ) ntact() '
important = raw_string.split(' ')[0]
Will return:
NPC_tester_contact()

Try this, assuming your string is stored in s:
print(s[0:s.index(" ")])
Comment if it doesn't work and I'll take another look.
Here is my code:
s = "NPC_tester_contact() ) ntact() "
print(s[0:s.index(" ")])
Output:
NPC_tester_contact()
Link:
http://ideone.com/i9haI
If you want the output surrounded by double quotes:
print('"' + s[0:s.index(" ")] + '"')

You could also use a regex-based solution. Something like:
import re
input_string = 'NPC_me_lvup_event_contact()\x00t()\x00act()\x00act()\x00ntact()\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
string_pat = re.compile(r'[a-zA-Z0-9\(\)_]+')
try:
    first_part = string_pat.findall(input_string)[0]
except IndexError:
    # There is nothing of interest for you in this string
    first_part = ''

Related

How to remove text before a particular character or string in multi-line text?

I want to remove all the text before and including */ in a string.
For example, consider:
string = ''' something
other things
etc. */ extra text.
'''
Here I want extra text. as the output.
I tried:
string = re.sub("^(.*)(?=*/)", "", string)
I also tried:
string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)
But when I print string, the substitution has not happened and the whole string is printed.

I suppose you're fine without regular expressions:
string[string.index("*/ ")+3:]
And if you want to strip that newline:
string[string.index("*/ ")+3:].rstrip()

The problem with your first regex is that . does not match newlines, as you noticed. With your second one you were closer, but you forgot the * after the dot. This would work:
string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)
You can also just get the part of the string that comes after your "*/":
string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)
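To see the effect of re.DOTALL in isolation, here is a minimal sketch using the sample text from the question (the variable names are chosen for illustration):

```python
import re

text = " something\nother things\netc. */ extra text.\n"

# Without DOTALL, '.' stops at the first newline, so '^.*\*/' cannot reach '*/'
unchanged = re.sub(r"^.*\*/", "", text)

# With DOTALL, '.' also matches newlines and the whole prefix is removed
trimmed = re.sub(r"^.*\*/", "", text, flags=re.DOTALL)

print(unchanged == text)  # True
print(repr(trimmed))      # ' extra text.\n'
```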

Update: After doing some research, I found that the pattern (\n|.) to match everything including newlines is inefficient. I've updated the answer to use [\s\S] instead as shown on the answer I linked.
The problem is that . in a Python regex matches everything except newlines (unless re.DOTALL is set). For a regex solution, you can do the following:
import re
strng = ''' something
other things
etc. */ extra text.
'''
print(re.sub(r"[\s\S]+\*/", "", strng))
# extra text.
Add in a .strip() if you want to remove that remaining leading whitespace.

To keep the text up to that symbol, you can do:
split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)
which gives you:
something
other things
etc.

string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)
gives output as
' extra text. \n'

Convert string with escapes to one without

I'm working on an exercism.io exercise in Python where one of the tests requires that I convert an SGF value with escape characters into one without. I don't know why they leave newline characters intact, however.
input_val = "\\]b\nc\nd\t\te \n\\]"
output_val = "]b\nc\nd e \n]"
I tried some codecs and ast functions to no avail. Any suggestions? Thanks in advance.

The purpose of your exercise is unclear, but the solution is trivial:
input_val.replace("\\", "").replace("\t", " ")

You can use this code:
def no_escapes(text):  # get text argument
    # split on backslashes, then on tabs, and join the pieces back together without either
    text = text.split('\\')
    text = [t.split('\t') for t in text]
    text = [i for t in text for i in t]
    return ''.join(text)
It first splits "\\]b\nc\nd\t\te \n\\]" on backslashes into ["", "]b\nc\nd\t\te \n", "]"]. It then splits each piece on tabs, giving [[""], ["]b\nc\nd", "", "e \n"], ["]"]]. Next, it flattens that into ["", "]b\nc\nd", "", "e \n", "]"] and joins the pieces with nothing between them, so you end up with "]b\nc\nde \n]". Note that the tabs are dropped rather than replaced with spaces.
Example:
>>> print(no_escapes('\\yeet\nlol\\'))
yeet
lol
And if you want it raw:
>>> string = no_escapes('\\yeet\nlol\\')
>>> print(f'{string!r}')
'yeet\nlol'

After looking at the SGF text value rules, which say 'all whitespaces except line breaks become spaces,' I came up with this solution. Oddly, the rules don't say '\\' characters should be erased. Not sure if there's a cleaner way to do this?
import re
s = '\\]b\nc\nd\t\te \n\\]'
r = re.sub(r'[^\S\n]', ' ', s).replace('\\', '')
print(r)
# repr(r): ']b\nc\nd  e \n]'
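The two steps can be wrapped up as a sketch. The function name sgf_unescape is made up, and this covers only the two rules discussed here (non-newline whitespace becomes a space, backslashes are dropped), not the full SGF text rules:

```python
import re


def sgf_unescape(value):
    """Hypothetical helper covering only the two rules discussed above."""
    # turn every whitespace character except '\n' into a single space
    spaced = re.sub(r'[^\S\n]', ' ', value)
    # drop the backslashes ('\\' in source code is one backslash character)
    return spaced.replace('\\', '')


input_val = "\\]b\nc\nd\t\te \n\\]"
print(repr(sgf_unescape(input_val)))  # ']b\nc\nd  e \n]'
```

Each tab becomes its own space, so the two consecutive tabs in the input yield two spaces in the result.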

Remove Characters from string with replace not working

I have a number of strings from which I am trying to remove characters using replace. However, this doesn't seem to work. To give a simplified example, this code:
row = "b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'"
row = row.replace("b'", "").replace("'", "").replace('b"', '').replace('"', '')
print(row.encode('ascii', errors='ignore'))
still outputs b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38' whereas I would like it to output James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38. How can I do this?
Edit: Updated the code with a better example.

You seem to be mistaking single quotes for double quotes. Simply replace 'b:
>>> row = "xyz'b"
>>> row.replace("'b", "")
'xyz'
As an alternative to str.replace, you can simply slice the string to remove the unwanted leading and trailing characters:
>>> row[2:-1]
'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'

In your first .replace, change b' to 'b. Hence your code should be:
>>> row = "xyz'b"
>>> row = row.replace("'b", "").replace("'", "").replace('b"', '').replace('"', '')
#                      ^^ changed here
>>> print(row.encode('ascii', errors='ignore'))
xyz
I am assuming the rest of the conditions are part of other tasks or matches that you didn't mention here.
If all you want is the part of the string before the first ', you can just do:
row.split("'")[0]

You haven't included a replace that removes 'b:
.replace("'b", '')

import ast
row = "b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'"
b_string = ast.literal_eval(row)
print(b_string)
u_string = b_string.decode('utf-8')
print(u_string)
Output:
b_string: b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'
u_string: James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38
The real question is how to convert a string into a Python object.
You have a string that contains the repr of a byte string. To convert it into a Python bytes object you could use eval(), but ast.literal_eval() is a safer way to do it.
Once you have the bytes object, you can convert it to a unicode string, which does not start with "b", by using decode().
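A possible explanation for what the question observed, assuming Python 3 and a shortened stand-in for the row: str.encode() returns a bytes object, and printing bytes shows its b'...' repr, so the prefix reappears even after replace has removed it. A minimal sketch:

```python
row = "b'James Bray,1985,6020'"  # shortened stand-in for the row in the question

cleaned = row.replace("b'", "").replace("'", "")
encoded = cleaned.encode('ascii', errors='ignore')  # this is bytes, not str

print(cleaned)                  # James Bray,1985,6020
print(repr(encoded))            # b'James Bray,1985,6020'
print(encoded.decode('ascii'))  # James Bray,1985,6020
```

Under that reading, printing the cleaned str directly, or decoding the bytes before printing, gives the output without the prefix.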

Delete all characters after a backslash in python?

I have a for loop that changes the string current_part.
current_part should have the format 1234, but sometimes it has the format 1234/gg.
Other formats exist, but in all of them anything after the backslash needs to be deleted.
I found a similar example and tried it, as shown below, but it didn't work. How can I fix this? Thanks
current_part = re.sub(r"\B\\\w+", "", str(current_part))

No need for regexes here. Why not simply use current_part = current_part.split('/')[0]?

Find the position of '/' and replace your string with all the characters preceding it:
st = "12345/gg"
n = st.find('/')
st = st[:n]
print(st)

You can split your string using str.split(), for example:
new_string = current_part.split("/")[0]
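A short sketch of how this might look inside the loop, assuming the parts are plain strings (the sample values and the name parts are invented for illustration). It uses str.partition, which leaves a value untouched when it contains no '/':

```python
parts = ["1234", "1234/gg", "5678/x/y"]  # hypothetical sample values

cleaned = []
for current_part in parts:
    # keep only the text before the first '/', or everything if there is none
    current_part = str(current_part).partition('/')[0]
    cleaned.append(current_part)

print(cleaned)  # ['1234', '1234', '5678']
```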

Using strip() to clean up a string

I am new to Python and I have a string that looks like this:
Temp = "', '/1412311.2121\n"
My desired output is just the number, including the decimal point, so I'm looking for
1412311.2121
as the output. I'm trying to get rid of the ', '/ and \n in the string. I have tried Temp.strip("\n") and Temp.rstrip("\n") to remove the \n, but it still seems to remain in my string. Does anyone have any ideas? Thanks for your help.

Strings are immutable. string.strip() doesn't change string; it's a method that returns a new string. You need to do:
Temp = Temp.strip()
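Note also that calling strip() without any arguments removes all leading and trailing whitespace characters, including \n.
As stalk said, you can achieve your desired result by calling strip("', /\n") on Temp. Note the space in the character set: without it, stripping stops at the space after the comma.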
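Both points can be checked with a minimal sketch, using the string from the question (the lowercase variable names are chosen for illustration):

```python
temp = "', '/1412311.2121\n"

temp.strip("\n")   # the return value is discarded, so temp is unchanged
print(repr(temp))  # "', '/1412311.2121\n"

# strip() treats its argument as a set of characters to remove from both ends
cleaned = temp.strip("', /\n")
print(cleaned)         # 1412311.2121
print(float(cleaned))  # 1412311.2121
```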

If the data are as you show, a number wrapped on the left and right in non-numeric data, you can use a very simple regular expression:
import re
g = re.search('[0-9.]+', s)  # capture the inner number only
print(g.group(0))

I would use a regular expression to do this:
In [8]: s = "', '/1412311.2121\n"
In [9]: re.findall(r'([+-]?\d+(?:\.\d+)?(?:[eE][+-]\d+)?)', s)
Out[9]: ['1412311.2121']
This returns a list of all floating-point numbers found in the string.
