I am observing a different output in the unpack function of python when I accept the string input from the console and when I read the string input from a variable.
I read the string input from the variable, input:
>>> import struct
>>> input="\x0d\x00\x00\x00"
>>> print struct.unpack("I",input)[0]
13
I read the string input from the console:
>>> import sys
>>> import struct
>>> print struct.unpack("I",sys.stdin.read(4))[0]
\x0d\x00\x00\x00
1680898140
The input string is the same but the output is different. Does it interpret the input read from the console in a different way? How can I get the same input by reading the data from console?
"\x0d\x00\x00\x00" (from the first code) is different from r"\x0d\x00\x00\x00" (== "\\x0x\\x00\x00\x00") from the second code.
>>> struct.unpack("I", '\x0d\x00\x00\x00')[0]
13
>>> struct.unpack("I", r'\x0d\x00\x00\x00'[:4])[0]
1680898140
Try following:
>>> struct.unpack("I", sys.stdin.readline().decode('string-escape')[:4])[0]
\x0d\x00\x00\x00
13
seems like you are unpacking the wrong data...
>>> struct.unpack('I','\\x0d')[0]
1680898140
your call to sys.stdin.read(4) reads only 4 characters: \, x, 0 and d.
>>> import sys
>>> import struct
>>> value = raw_input().decode('string-escape')
\x0d\x00\x00\x00
>>> print struct.unpack("I", value)[0]
13
Related
I have a string which includes encoded bytes inside it:
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into
str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
Here str2 is a bytes object which I can decode easily using
str2.decode('utf-8')
to get the final result:
'Output file 문항분석.xlsx Created'
You could use ast.literal_eval:
>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>
>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
Based on the SyntaxError mentioned in your comments, you may be having a testing issue when attempting to print due to the fact that stdout is set to ascii in your console (and you may also find that your console does not support some of the characters you may be trying to print). You can try something like the following to set sys.stdout to utf-8 and see what your console will print (just using string slice and encode below to get bytes rather than the ast.literal_eval approach that has already been suggested):
import codecs
import sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
b = s[2:-1].encode().decode('utf-8')
A simple way is to assume that all the characters of the initial strings are in the [0,256) range and map to the same Unicode value, which means that it is a Latin1 encoded string.
The conversion is then trivial:
str1[2:-1].encode('Latin1').decode('utf8')
Finally I have found an answer where i use a function to cast a string to bytes without encoding.Given string
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
now i take only actual encoded text inside of it
str1[2:-1]
and pass this to the function which convert the string to bytes without encoding its values
import struct
def rawbytes(s):
"""Convert a string to raw bytes without encoding"""
outlist = []
for cp in s:
num = ord(cp)
if num < 255:
outlist.append(struct.pack('B', num))
elif num < 65535:
outlist.append(struct.pack('>H', num))
else:
b = (num & 0xFF0000) >> 16
H = num & 0xFFFF
outlist.append(struct.pack('>bH', b, H))
return b''.join(outlist)
So, calling the function would convert it to bytes which then is decoded
rawbytes(str1[2:-1]).decode('utf-8')
will give the correct output
'Output file 문항분석.xlsx Created'
I'm using selenium to insert text input with german umlauts in a web formular. The declared coding for the python script is utf-8. The page uses utf-8 encoding. When i definine a string like that everything works fine:
q = u"Hällö" #type(q) returns unicode
...
textbox.send_keys(q)
But when i try to read from a config file using ConfigParser (or another kind of file) i get malformed output in the webformular (Hällö). This is the code i use for that:
the_encoding = chardet.detect(q)['encoding'] #prints utf-8
q = parser.get('info', 'query') # type(q) returns str
q = q.decode('unicode-escape') # type(q) returns unicode
textbox.send_keys(q)
Whats the difference between the both q's given to the send_keys function?
This is probably bad encoding. Try printing q before the last statement, and see if it's equal. This line q = parser.get('info', 'query') # type(q) returns str should return the string 'H\xc3\xa4ll\xc3\xb6'. If it's different, then you are using the wrong coding.
>>> q = u"Hällö" # unicode obj
>>> q
u'H\xe4ll\xf6'
>>> print q
Hällö
>>> q.encode('utf-8')
'H\xc3\xa4ll\xc3\xb6'
>>> a = q.encode('utf-8') # str obj
>>> a
'H\xc3\xa4ll\xc3\xb6' # <-- this should be the value of the str
>>> a.decode('utf-8') # <-- unicode obj
u'H\xe4ll\xf6'
>>> print a.decode('utf-8')
Hällö
>>>
from ConfigParser import SafeConfigParser
import codecs
parser = SafeConfigParser()
with codecs.open('cfg.ini', 'r', encoding='utf-8-sig') as f:
parser.readfp(f)
greet = parser.get('main', 'greet')
print 'greet:', greet.encode('utf-8-sig')
greet: Hällö
cfg.ini file
[main]
greet=Hällö
Is there a build-in function in python 3 to let me get b from a?
a = '\\xe9\\x82\\xa3'
b = b'\xe9\x82\xa3'
You can use unicode-escape encoding:
>>> a = '\\xe9\\x82\\xa3'
>>> a.encode().decode('unicode-escape').encode('latin1')
b'\xe9\x82\xa3'
>>> import codecs
>>> codecs.decode(a, 'unicode-escape').encode('latin1')
b'\xe9\x82\xa3'
how can i print the output of os.urandom(n) in terminal?
I try to generate a SECRET_KEY with fabfile and will output the 24 bytes.
Example how i implement both variants in the python shell:
>>> import os
>>> out = os.urandom(24)
>>> out
'oS\xf8\xf4\xe2\xc8\xda\xe3\x7f\xc75*\x83\xb1\x06\x8c\x85\xa4\xa7piE\xd6I'
>>> print out
oS�������5*������piE�I
If what you want is hex-encoded string, use binascii.a2b_hex (or hexlify):
>>> out = 'oS\xf8\xf4\xe2\xc8\xda\xe3\x7f\xc75*\x83\xb1\x06\x8c\x85\xa4\xa7piE\xd6I'
>>> import binascii
>>> print binascii.hexlify(out)
6f53f8f4e2c8dae37fc7352a83b1068c85a4a7706945d649
To use just built-ins, you can get the integer value with ord and then convert that back to a hex number:
list_of_hex = [str(hex(ord(z)))[2:] for z in out]
print " ".join(list_of_hex)
If you just want the hex list, then the str() and [2:] are unnecessary
The output of this and the hexify() version are both type str and should work fine for the web app.
I have done the following.
from struct import pack, unpack
t = 1234
tt = str(pack("<I", t))
printing tt gives \xf3\xe0\x01\x00. How do I get original value of t back from tt?
I tried using unpacking the repr(tt) but that does not work out. How do I go about doing this?
>>> t=1234
>>> tt=pack('<I', t)
>>> tt
'\xd2\x04\x00\x00'
>>> unpack('<I', tt)
(1234,)
>>> ttt, = unpack('<I', tt)
>>> ttt
1234
you are using the wrong package for serialization. the struct package is only useful for python code which interacts with C code.
for serialization into a string, you should use the pickle module.
import pickle
t = 1234
tt = pickle.dumps(t)
t = pickle.loads(tt)
unpack('<I', tt) will give you (1234,).
repr doesn't work since it adds quotes to the string:
>>> repr('foo')
'"foo"'