I thought this would be simple, but spent quite some time trying to figure it out.
I want to convert an integer into a byte string, and display in hex format. But I seem to get the ascii representation? Specifically, for int value of 122.
from struct import *
pack("B",122) #this returns b'z', what I need is b'\x7A'
pack("B",255) #this returns b'\xff', which is fine.
I know in python 2.x you can use something like chr() but not in python 3, which is what I have. Ideally the solution would work in both.
You can use codecs or string encoding:
import codecs
codecs.encode(pack("B",122),"hex")
or, in Python 2 only (bytes objects have no encode() method in Python 3):
a = pack("B",122)
a.encode("hex")
I think you are getting the results you desire, and that whatever you are using to look at your results is causing the confusion. Try running this code:
from struct import *
x = pack("B",122)
assert 123 == x[0] + 1
You will discover that it works as expected and the assertion does not fail.
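To make the point above concrete, here is a minimal sketch showing that b'z' and b'\x7a' are the same byte string and that only the printed form differs. Using binascii.hexlify and the hand-built escape string are just illustrative ways to get a hex view; the variable names are made up for this example.

```python
import binascii
from struct import pack

packed = pack("B", 122)

# The repr shows printable ASCII bytes as characters, but the value is identical.
same_bytes = (packed == b"\x7a")

# Hex view of the bytes; hexlify exists in both Python 2 and Python 3.
hex_view = binascii.hexlify(packed).decode("ascii")

# Build a \xNN-style display string by hand for every byte.
escaped = "".join("\\x%02x" % b for b in bytearray(packed))

print(same_bytes, hex_view, escaped)
```

Iterating over bytearray(packed) yields integers on both Python 2 and Python 3, which is why it is used here instead of iterating over the bytes object directly.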
I know this should be easy, but I just can't get the syntax right for python.
My int is not converted correctly. This is the output of my 2 print statements. My output should be 9718 instead of 959918392.
bytearray(b'9718')
959918392
This is my conversion. I don't understand what I am doing wrong.
print(size)
print(int.from_bytes(size, byteorder='big'))
What you tried assumes the number is directly encoded as bytes. You actually want to parse it from ascii, which you can do like this:
int(b'9718'.decode('ascii'))
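As a small sketch of the difference (Python 3, with illustrative variable names), the same four bytes can be read either as a binary big-endian integer or as ASCII digits:

```python
raw = bytearray(b"9718")

# Interprets the four bytes 0x39 0x37 0x31 0x38 as one big-endian integer.
as_binary = int.from_bytes(raw, byteorder="big")

# Interprets the bytes as ASCII text and parses the decimal digits.
as_text = int(raw.decode("ascii"))

print(as_binary)  # 959918392
print(as_text)    # 9718
```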
I need to unpack information in python from a C Structure,
doing it by the following code:
struct.unpack_from('>I', file.read(4))[0]
and afterwards, writing changed values back:
new_value = struct.pack('>I', 008200)
file.write(new_value)
a few examples:
008200 raises a SyntaxError: invalid token.
000010 is written into: 8
000017 is written into: 15
000017 returns a syntaxerror.
I have no idea what kind of conversion that is.
Any kind of help would be great.
This is invalid Python code and is not related to the struct module. In Python 2, integer literals starting with a zero are octal (base 8), so Python tries to read 008200 as octal, but '8' isn't a valid octal digit. (Python 3 rejects leading zeros on nonzero decimal literals altogether and writes octal as 0o17.) Assuming you wanted decimal, use 8200. If you wanted hex, use 0x8200.
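A short sketch of the same idea, assuming the values might arrive as zero-padded text rather than as literals in the source code (the variable names are illustrative only):

```python
import struct

# Parsing text with leading zeros as decimal: int() ignores the zeros.
decimal_value = int("008200")

# The same kind of digits read as octal, which is what Python 2 did
# with a literal such as 000017.
octal_value = int("000017", 8)

# Explicit octal literal syntax (valid in Python 2.6+ and Python 3).
explicit_octal = 0o17

# Pack the decimal value as a big-endian unsigned 32-bit integer.
packed = struct.pack(">I", decimal_value)
print(decimal_value, octal_value, explicit_octal, packed)
```

This is consistent with the question's observation that 000017 was written as 15.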
I have hex code point values for a long string. For a short one, following is fine.
msg = unichr(0x062A) + unichr(0x0627) + unichr(0x0628)
print msg
However, since unichr's alternate api unicode() does exist, i thought there must be a way to pass an entire code point string to it. So far i wasn't able to do it.
Now i have to type in a string of 150 hex values (code points) like the 3 above to generate a complete string. I was hoping to get something like
msg = unicode('0x062A, 0x0627....')
I have to use 'msg' later. Printing it was a mere example. Any ideas?
Perhaps something like this:
", ".join(unichr(u) for u in (0x062A, 0x0627, 0x0628))
Result:
u'\u062a, \u0627, \u0628'
Edit: This uses str.join.
Hard to tell what you're asking for exactly. Are you looking for u'\u062A\u0627\u0628'? The \u escape lets you enter individual characters by code point in a unicode string.
Following your example:
>>> c = u'\u062A\u0627\u0628'
>>> print c
تاب
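For a long list of code points, a sketch along these lines may save typing. It assumes Python 3, where chr() takes the place of Python 2's unichr(); the comma-separated input format is an assumption based on what the question hoped to pass in.

```python
# Code points taken from the question; a longer list would work the same way.
code_points = [0x062A, 0x0627, 0x0628]

# chr() is the Python 3 counterpart of Python 2's unichr().
msg = "".join(chr(cp) for cp in code_points)

# If the values arrive as one comma-separated string of hex numbers:
text = "0x062A, 0x0627, 0x0628"
msg_from_text = "".join(chr(int(part, 16)) for part in text.split(","))

print(msg == msg_from_text)
```

Joining on an empty string gives the characters back to back, unlike ", ".join, which inserts separators into the result.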
This is the bit of Ruby I want to implement in Python:
Base64.urlsafe_encode64([Digest::MD5.hexdigest(url).to_i(16)].pack("N")).sub(/==\n?$/, '')
You see, this helps turn a URL like this:
http://stackoverflow.com/questions/ask
Into a small code like this:
sUEBtw
The big integer that gets generated in the process is this:
307275247029202263937236026733300351415
I've been able to pack this into binary form using this Python code:
url = 'http://stackoverflow.com/questions/ask'
n = int(hashlib.md5(url).hexdigest(), 16)
s = struct.Struct('d')
values = [n]
packed_data = s.pack(*values)
short_code = base64.urlsafe_b64encode(packed_data)[:-1]
print short_code
The short code I get is this:
zgMM62Hl7Ec
As you can see, it's larger than the one I get with Ruby because the packing is using a different format.
Your help will be appreciated.
This does the trick:
import hashlib
import base64
url = 'http://stackoverflow.com/questions/ask'
print base64.urlsafe_b64encode(hashlib.md5(url).digest()[-4:])[:-2]
Output
sUEBtw
.digest() gives the packed bytes of the full 16-byte digest so no need for struct.pack, but it seems Ruby's .pack('N') only converts the last four bytes of the digest.
Ruby pack('N') converts to a network-order (big-endian) 32bit unsigned. python struct('d') converts to an IEEE double precision float. I think you want struct('>I') for the equivalent big endian 32 bit unsigned in python.
So it is clear now that Ruby's pack('N') takes only the lower 4 bytes so following DSM's suggestion I got this code to work:
import hashlib
import base64
import struct
url = 'http://stackoverflow.com/questions/ask'
n = int(hashlib.md5(url).hexdigest(), 16)
s = struct.Struct('>I')
values = [n % (2**32)]
packed_data = s.pack(*values)
print base64.urlsafe_b64encode(packed_data)[:-2]
Nonetheless, as explained in Mark Tolonen's answer, the digest() method of hashlib's hash object already returns the hash as packed bytes, so taking the last four bytes with [-4:] and encoding them with base64's urlsafe_b64encode is good enough.
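The snippets in this thread are Python 2. A hedged Python 3 sketch of the same two routes might look like this; it assumes the URL should be encoded as UTF-8 before hashing, and it strips the padding with rstrip rather than a fixed slice. The thread reports sUEBtw as the code for this URL.

```python
import base64
import hashlib
import struct

url = "http://stackoverflow.com/questions/ask"
digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 raw bytes

# Route 1: slice the last four bytes of the digest directly.
tail = digest[-4:]

# Route 2: reduce the 128-bit integer modulo 2**32 and pack it big-endian.
n = int.from_bytes(digest, byteorder="big")
packed = struct.pack(">I", n % (2 ** 32))

# Strip the '=' padding instead of slicing a fixed number of characters.
short_code = base64.urlsafe_b64encode(tail).rstrip(b"=").decode("ascii")
print(short_code)
```

Both routes produce the same four bytes, because the low 32 bits of a big-endian integer are its last four bytes.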
I have a hex-string made from a unicode string with that function:
def toHex(s):
    res = ""
    for c in s:
        res += "%02X" % ord(c) #at least 2 hex digits, can be more
    return res
hex_str = toHex(u"...")
This returns a string like this one:
"80547CFB4EBA5DF15B585728"
That's a sequence of 6 chinese symbols.
But
u"Knödel"
converts to
"4B6EF664656C"
What I need now is a function to convert this back to the original unicode. The chinese symbols seem to have a 2-byte representation while the second example has 1-byte representations for all characters. So I can't just use unichr() for each 1- or 2-byte block.
I've already tried
binascii.unhexlify(hex_str)
but this seems to convert byte-by-byte and returns a string, not unicode. I've also tried
binascii.unhexlify(hex_str).decode(...)
with different formats. Never got the original unicode string.
Thank you a lot in advance!
This seems to work just fine:
binascii.unhexlify(binascii.hexlify(u"Knödel".encode('utf-8'))).decode('utf-8')
This returns the original object. You can do the same for the Chinese text if it's encoded properly; however, formatting ord(x) with a variable number of hex digits already destroys the text you started from. You'll need to encode it first and only then treat it like a string of bytes.
Can't be done. Using %02X loses too much information. You should be using something like UTF-8 first and converting that, instead of inventing a broken encoding.
>>> u"Knödel".encode('utf-8').encode('hex')
'4b6ec3b664656c'
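A round-trip sketch of that approach, written so that it should run on both Python 2 and Python 3 (the variable names are illustrative):

```python
import binascii

original = u"Kn\u00f6del"

# Encode to UTF-8 first, then hex-encode the resulting bytes.
hex_str = binascii.hexlify(original.encode("utf-8")).decode("ascii")

# Reverse the two steps in the opposite order.
restored = binascii.unhexlify(hex_str).decode("utf-8")

print(hex_str)
print(restored == original)
```

Because UTF-8 is self-delimiting, the decoder can tell where each character ends, which the variable-width "%02X" output cannot guarantee.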
When I was working with Unicode in a VB app a while ago the first 1 or 2 digits would be removed if they were a "0". Meaning "&H00A2" would automatically be converted to "&HA2", I just created a small function to check the length of the string and if it was less than 4 chars add the missing 0's. I'm not sure if this is what's happening to you, but I thought I would give bit of information as something to be aware of.