python convert "unicode" as list - python

I have a question about handling a return type in Python.
I have a database function that returns this as value:
(1,13616,,"My string, that can have comma",170.90)
I put this into a variable and did test the type:
print(type(var))
I got the result:
<type 'unicode'>
I want to convert this to a list and get the values separated by commas.
Ex.:
var[0] = 1
var[1] = 13616
var[2] = None
var[3] = "My string, that can have comma"
var[4] = 170.90
Is it possible?

Using standard library csv readers:
>>> import csv
>>> s = u'(1,13616,,"My string, that can have comma",170.90)'
>>> [var] = csv.reader([s[1:-1]])
>>> var[3]
'My string, that can have comma'
Some caveats:
var[2] will be an empty string, not None, but you can post-process that.
numbers will be strings and also need post-processing, since csv does not tell the difference between 0 and '0'.
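The post-processing mentioned in these caveats could look like the sketch below. The helper name convert_field and the conversion order (int, then float, otherwise keep the string) are assumptions made for illustration; they are not part of the csv module. Note that a quoted field that happens to look like a number would be converted too.

```python
import csv

def convert_field(field):
    """Turn one csv field into None, int, or float; otherwise keep the string."""
    if field == "":
        return None
    for cast in (int, float):
        try:
            return cast(field)
        except ValueError:
            pass  # not this numeric type, try the next one
    return field

s = u'(1,13616,,"My string, that can have comma",170.90)'
[row] = csv.reader([s[1:-1]])          # drop the parentheses, parse one csv line
var = [convert_field(f) for f in row]
print(var)
```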

You can try the following:
b = []
for i in a:
    if i is not None:
        b.append(i)
    else:
        b.append(None)
print(type(b))

The issue is not with the comma.
This works fine:
a = (1,13616,"My string, that can have comma",170.90)
and this also works:
a = (1,13616,None,"My string, that can have comma",170.90)
but when you leave two commas ",," it doesn't work.

Unicode strings are (basically) just strings in Python 2 (in Python 3, remove the word "basically" from that sentence). They're written as literals by prefixing a u before the string (compare raw strings r"something", or Py3.6+ f-strings f"{some_var}thing").
Just strip off your parens and split by comma. You'll have to do some post-parsing if you want 170.90 instead of u'170.90' or None instead of u'', but I'll leave that for you to decide. Note that a plain split also breaks the quoted string at its internal comma, as the output shows.
>>> var.strip(u'()').split(u',')
[u'1', u'13616', u'', u'"My string', u' that can have comma"', u'170.90']

Related

Python string.rstrip() doesn't strip specified characters

string = "hi())("
string = string.rstrip("abcdefghijklmnoprstuwxyz")
print(string)
I want to remove every letter from the given string using the rstrip method; however, it does not change the string in the slightest.
Output:
'hi())('
What I want:
'())('
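I know that I can use regex, but I really don't understand why it doesn't work.
Note: this is part of the Valid Parentheses challenge on Codewars.
You have to use lstrip instead of rstrip. rstrip only removes characters from the right end, and it stops at the first character that is not in the given set; here the string ends with "(", so nothing is removed: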
>>> string = "hi())("
>>> string = string.lstrip("abcdefghijklmnoprstuwxyz")
>>> string
'())('
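To see the difference, it may help to compare the strip methods side by side. The snippet is only an illustration; letters is a name introduced here, and it uses the full lowercase alphabet from the standard string module rather than the hand-typed one in the question.

```python
import string

letters = string.ascii_lowercase
text = "hi())("

right = text.rstrip(letters)        # last char is "(", not a letter: nothing removed
left = text.lstrip(letters)         # leading "h" and "i" are removed
both = "hi())(ab".strip(letters)    # strip() trims letters from both ends

print(repr(right))
print(repr(left))
print(repr(both))
```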

python dict append to list error(value with \)

I have a problem when appending a dict to a list:
data = []
path = "abc\cde"
data.append({"image": path})
print(data)
When I append the path to the image, the output of data is [{'image': 'abc\\cde'}].
It contains two \ instead of one.
When typing text that contains backslashes, use raw strings to avoid having some sequences be interpreted as special characters, e.g. "\n" in a Python string is a single character that represents a new line.
>>> data = []
>>> data.append({"image": r'abc\cde'})
>>> data
[{'image': 'abc\\cde'}]
>>>
>>> data.append({"image": r'abc\nasdf'})
>>> data
[{'image': 'abc\\cde'}, {'image': 'abc\\nasdf'}]
When you see two backslashes, it is because that is how Python repr-esents a string containing backslashes safely; it is not the actual content.
>>> r'abc\cde'
'abc\\cde'
>>> r'abc\nasdf'
'abc\\nasdf'
In this way a text with special chars can be visualized in a compact way. If you want to see what the actual content of those strings looks like, print them:
>>> print(r'abc\cde')
abc\cde
>>> print(r'abc\nasdf')
abc\nasdf
>>> print('abc\cde')
abc\cde
>>> print('abc\nasdf')
abc
asdf
Raw strings only matter for strings you type as literals in your source code; the r prefix is a way to tell Python how to interpret certain characters. If the string comes from e.g. a file or a stream, the "meaning" of its characters is already defined.
Regarding your question on how to concatenate a raw string (again, a raw string is a normal string) with a variable, there's no difference.
>>> with_slash = r'abc\cde'
>>> wout_slash = 'asdf'
>>> with_slash + wout_slash
'abc\\cdeasdf'
>>> print(with_slash + wout_slash)
abc\cdeasdf
\ is an escape character. It allows you to use special symbols, for example a new line \n or tab \t. If you want a string to contain a literal \, make sure that you put another \ before it.
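In your case, Python keeps the backslash as if you had written "abc\\cde", even though you did not escape \, because \c is not a recognized escape sequence (recent Python versions warn about such invalid escapes). If you had abc\nde, the result would be abc<line_break>de.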
>>> a = "abc\\cde"
>>> a
'abc\\cde'
>>> list(a)
['a', 'b', 'c', '\\', 'c', 'd', 'e']
As you see, even though it looks like a double backslash, it is just one \ character.
More info: https://www.w3schools.com/python/gloss_python_escape_characters.asp
The additional backslash is Python escaping the single backslash. The actual value of your path string is unchanged, as you can see when the value of data[0]['image'] is printed.
data = []
path = 'abc\cde'
data.append({"image": path})
# output: abc\cde
print(data[0]['image'])
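The doubled backslash only shows up because printing a list or dict displays each item with repr(). A small check along these lines, using a raw string for the path (the variable names simply mirror the question), makes that visible:

```python
path = r"abc\cde"
data = [{"image": path}]

print(len(path))          # 7: the backslash is a single character
print(str(data))          # [{'image': 'abc\\cde'}] because items are shown via repr()
print(data[0]["image"])   # abc\cde, the actual content
```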

Python prevent decoding HEX to ASCII while removing backslashes from my Var

I want to strip some unwanted symbols from my variable. In this case the symbols are backslashes. I am using a HEX number, and as an example I will show some short, simple code below. But I don't want Python to convert my HEX to ASCII; how would I prevent this from happening? I have some long shellcodes for asm to work with later, and removing \ by hand is a long process. I know there are different ways like using echo -e "x\x\x\x" > output etc., but my whole script will be written in Python.
Thanks
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> b = a.strip("\\")
>>> print b
1�Phtv
>>> a = "\x31\x32\x33\x34\x35\x36"
>>> b = a.strip("\\")
>>> print b
123456
At the end I would like it to print my var:
>>> print b
x31x32x33x34x35x36
There are no backslashes in your variable:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(a)
1ÀPhtv
Take newline for example: writing "\n" in Python will give you a string with one character (a newline) and no backslashes. See the string literals docs for the full syntax of these.
Now, if you really want to write a string with such backslashes, you can do it with the r prefix:
>>> a = r"\x31\xC0\x50\x68\x74\x76"
>>> print(a)
\x31\xC0\x50\x68\x74\x76
>>> print(a.replace('\\', ''))
x31xC0x50x68x74x76
But if you want to convert a regular string to hex-coded symbols, you can do it character by character, converting it to number ("\x31" == "1" --> 49), then to hex ("0x31"), and finally stripping the first character:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(''.join([hex(ord(x))[1:] for x in a]))
x31xc0x50x68x74x76
There are two problems in your code.
First the simple one:
strip() only removes matching characters from the beginning and end of the string, not from the middle. So you should use replace("\\", ""). This will replace every backslash with "", which is the same as removing it.
The second problem is Python's behavior with backslashes:
To get your example working you need to put an 'r' in front of your string to indicate that it is a raw string: a = r"\x31\xC0\x50\x68\x74\x76". In raw strings, a backslash doesn't escape a character but just stays a backslash.
>>> r"\x31\xC0\x50\x68\x74\x76"
'\\x31\\xC0\\x50\\x68\\x74\\x76'
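If the shellcode is already held as bytes rather than typed as a literal, a format-based variant of the character-by-character approach keeps every byte at two hex digits (the hex(ord(x))[1:] form would render a byte such as 0x05 as x5). This is a sketch for Python 3; to_x_notation and shellcode are names made up for the example.

```python
def to_x_notation(data):
    """Render each byte as 'x' followed by two uppercase hex digits."""
    # Iterating over a bytes object in Python 3 yields integers.
    return "".join("x{:02X}".format(byte) for byte in data)

shellcode = b"\x31\xC0\x50\x68\x74\x76"
print(to_x_notation(shellcode))      # x31xC0x50x68x74x76
print(to_x_notation(b"\x05\x0a"))    # x05x0A
```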

Remove Characters from string with replace not working

I have a number of strings from which I am aiming to remove characters using replace. However, this doesn't seem to work. To give a simplified example, this code:
row = "b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'"
row = row.replace("b'", "").replace("'", "").replace('b"', '').replace('"', '')
print(row.encode('ascii', errors='ignore'))
still outputs this b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38' whereas I would like it to output James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38. How can I do this?
Edit: Updated the code with a better example.
You seem to be mistaking single quotes for double quotes. Simply replace 'b:
>>> row = "xyz'b"
>>> row.replace("'b", "")
'xyz'
As an alternative to str.replace, you can simply slice the string to remove the unwanted leading and trailing characters:
>>> row[2:-1]
'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'
In your first .replace, change b' to 'b. Hence your code should be:
>>> row = "xyz'b"
>>> row = row.replace("'b", "").replace("'", "").replace('b"', '').replace('"', '')
# ^ changed here
>>> print(row.encode('ascii', errors='ignore'))
xyz
I am assuming the rest of the conditions you have are part of other tasks/matches that you didn't mention here.
If all you want is to take the string before the first ', then you may just do:
row.split("'")[0]
You haven't included a replacement that removes 'b:
.replace("'b", '')
import ast
row = "b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'"
b_string = ast.literal_eval(row)
print(b_string)
u_string = b_string.decode('utf-8')
print(u_string)
out:
b_string:b'James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38'
u_string: James Bray,/citations?user=8IqSrdIAAAAJ&hl=en&oe=ASCII,1985,6020,188.12,42,1.31,76,2.38
The real question is how to convert a string to a Python object.
You have a string which contains a bytes literal. To convert it to a Python bytes object you could use eval(), but ast.literal_eval() is a safer way to do it.
Now that you have a bytes object, you can convert it to a unicode string, which does not start with "b", by using decode().

Python: How to remove [' and ']?

I want to remove [' from start and '] characters from the end of a string.
This is my text:
"['45453656565']"
I need to have this text:
"45453656565"
I've tried to use str.replace
text = text.replace("['","");
but it does not work.
You need to strip your text by passing the unwanted characters to the str.strip() method:
>>> s = "['45453656565']"
>>>
>>> s.strip("[']")
'45453656565'
Or if you want to convert it to an integer, you can simply pass the stripped result to the int function:
>>> try:
...     val = int(s.strip("[']"))
... except ValueError:
...     print("Invalid string")
...
>>> val
45453656565
Using re.sub:
>>> my_str = "['45453656565']"
>>> import re
>>> re.sub(r"['\]\[]", "", my_str)
'45453656565'
You could loop over the characters, keeping only those that are digits:
>>> number_array = "['34325235235']"
>>> int(''.join(c for c in number_array if c.isdigit()))
34325235235
This solution works for both "['34325235235']" and '["34325235235"]', and for any other combination of digits and other characters.
You can also import a package and use a regular expression to get it:
>>> import re
>>> theString = "['34325235235']"
>>> int(re.sub(r'\D', '', theString)) # Optionally parse to int
Instead of hacking your data by stripping brackets, you should edit the script that created it to print out just the numbers. E.g., instead of lazily doing
output.write(str(mylist))
you can write
for elt in mylist:
    output.write(elt + "\n")
Then when you read your data back in, it'll contain the numbers (as strings) without any quotes, commas or brackets.
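If changing the producing script is not an option and the text really is the printed form of a Python list, another possibility is to parse it back with ast.literal_eval from the standard library instead of stripping characters. A minimal sketch, assuming the input is always a valid list literal like the one in the question:

```python
import ast

text = "['45453656565']"
items = ast.literal_eval(text)   # safely evaluates the literal into a real list
number = items[0]                # the first (and only) element, still a string

print(items)
print(number)
print(int(number))
```

ast.literal_eval raises ValueError or SyntaxError on input that is not a plain literal, so malformed data fails loudly rather than being silently mangled.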
