Unable to understand the string representation

Unable to understand the string representation - python

Trying to understand existing code, i saw these 2 lines
key = ']\n\x80\xd8\x13J\xe1\x96w\x82Kg\x1e\x83\x8a\xf4'
(the above value in hexadecimal = 5d0a80d8 134ae196 77824b67 1e838af4
data = "p\xde\xdf-\xc4,\\\xbd:\x96\xf8\xa0\xb1\x14\x18\xb3`\x8dW3`J,\xd3j\xab\xc7\x0c\xe3\x19;\xb5\x15;\xe2\xd3\xc0m\xfd\xb2\xd1n\x9c5qX\xbejA\xd6\xb8a\xe4\x91\xdb?\xbf\xebQ\x8e\xfc\xf0H\xd7\xd5\x89Ss\x0f\xf3\x0c\x9e\xc4p\xff\xcdf=\xc3B\x01\xc3j\xdd\xc0\x11\x1c5\xb3\x8a\xfe\xe7\xcf\xdbX.71\xf8\xb4\xba\xa8\xd1\xa8\x9c\x06\xe8\x11\x99\xa9qb'\xbe4N\xfc\xb46\xdd\xd0\xf0\x96\xc0d\xc3\xb5\xe2\xc3\x99\x99?\xc7s\x94\xf9\xe0\x97 \xa8\x11\x85\x0e\xf2;.\xe0]\x9eas`\x9d\x86\xe1\xc0\xc1\x8e\xa5\x1a\x01*\x00\xbbA;\x9c\xb8\x18\x8ap<\xd6\xba\xe3\x1c\xc6{4\xb1\xb0\x00\x19\xe6\xa2\xb2\xa6\x90\xf0&q\xfe|\x9e\xf8\xde\xc0\tNS7cG\x8dX\xd2\xc5\xf5\xb8'\xa0\x14\x8cYH\xa9i1\xac\xf8OFZd\xe6,\xe7#\x07\xe9\x91\xe3~\xa8#\xfa\x0f\xb2\x19#\xb7\x99\x05\xb73\xb61\xe6\xc7\xd6\x86\n81\xac5\x1a\x9cs\x0cR\xffr\xd9\xd3\x08\xee\xdb\xab!\xfd\xe1C\xa0\xea\x17\xe2>\xdc\x1ft\xcb\xb3c\x8a 3\xaa\xa1Td\xea\xa738]\xbb\xebo\xd75\t\xb8W\xe6\xa4\x19\xdc\xa1\xd8\x90z\xf9w\xfb\xacM\xfa5\xec"
similarly the start of data is -
7827fab2 2c000000 70dedf2d c42c5cbd 3a96f8a0 b11418b3 608d5733 604a2cd3
what is it representing , from the logic both of these should be sort byte string,
what I want to do is give value of key
key = 0a8b6bd8 d9b08b08 d64e32d1 817777fb
The value are hexa decimal (128 bit )

You can use the binascii module to move back and forth between the formats.
>>> from binascii import hexlify, unhexlify
>>>
>>> key = ']\n\x80\xd8\x13J\xe1\x96w\x82Kg\x1e\x83\x8a\xf4'
>>> hexlify(key)
'5d0a80d8134ae19677824b671e838af4'
>>> unhexlify(hexlify(key))
']\n\x80\xd8\x13J\xe1\x96w\x82Kg\x1e\x83\x8a\xf4'

Assuming you are asking "how do I convert this string of bytes '0a8b6bd8 d9b08b08 d64e32d1 817777fb' into a python string", here's the answer:
'0a8b6bd8 d9b08b08 d64e32d1 817777fb'.replace(" ", "").decode("hex")
Which will give you:
'\n\x8bk\xd8\xd9\xb0\x8b\x08\xd6N2\xd1\x81ww\xfb'

Related

Decode emoji into two (or more) code points, using standard libraries

I'd like to be able to decode an emoji into its corresponding code points as seen here. I'm limited to using standard libraries in 2.7.
For example:
🇲🇩 -> U+1F1F2 U+1F1E9
I've managed to get the first code point using this code, but I can't figure out how to pull the second. Some emoji have even more code points.
to_decode = u'🇲🇩'
code = ord(to_decode[0])
if 0xd800 <= code <= 0xdbff:
code = (code - 0xd800) * 1024 + (ord(to_decode[1]) - 0xdc00) + + 0x010000
print(hex(code))

A combination of encode and struct.unpack can give you what you need.
>>> import struct
>>> b = to_decode.encode('utf_32_le')
>>> count = len(b) // 4
>>> count
2
>>> cp = struct.unpack('<%dI' % count, b)
>>> [hex(x) for x in cp]
['0x1f1f2', '0x1f1e9']

This is sort of an hack, but you can use the repr of the unicode string:
>>> repr(to_decode)
"u'\\U0001f1f2\\U0001f1e9'"
so:
>>> hex(int(repr(to_decode)[4:12], 16))
'0x1f1f2'
and
>>> hex(int(repr(to_decode)[14:22], 16))
'0x1f1e9'
You must extend this method to support emojis with more than two code points. You may consider using a combination of the above with .split("\\U").

For this problem, you actually need list() which will break a Unicode character into its constituent code points
to_decode = u'🇲🇩'
list(to_decode)
['🇲', '🇩']
As an example of what you can do with this, I created a unicode visualization of the Bengali Alphabet
https://www.kaggle.com/jamesmcguigan/unicode-visualization-of-the-bengali-alphabet

Converting OID of public key,etc in HEX data to the dot format

Hello guys,
I have CV1 RSA certificates that are slightly modified.So, I dont want to use asn1wrap to parse a 'der' file as it makes it too complex sometimes,instead as tags are already fixed for a CV1 certificate i can parse the HEX data of this 'der' file by converting the binary data to hex and extracting the required range of data.
However for representation i want the OID to be in the dot format
eg : ABSOLUTE OID 1.3.6.33.4.11.5318.2888.18.10377.5
i can extract the hex string for this from the whole hex data as :
'060D2B0621040BA946964812D10905'
any python3 function that can directly do this conversion. Or can anyone help out with the logic to convert the same.

Found an answer for anyone who's interested. Without using the pyasn1 or asn1crypto i did not find any package to convert the hexadecimal value to OID notation.
So i browsed around and made a mix of code from other languages and created one in python.
def notation_OID(oidhex_string):
''' Input is a hex string and as one byte is 2 charecters i take an
empty list and insert 2 characters per element of the list.
So for a string 'DEADBEEF' it would be ['DE','AD','BE,'EF']. '''
hex_list = []
for char in range(0,len(oidhex_string),2):
hex_list.append(oidhex_string[char]+oidhex_string[char+1])
''' I have deleted the first two element of the list as my hex string
includes the standard OID tag '06' and the OID length '0D'.
These values are not required for the calculation as i've used
absolute OID and not using any ASN.1 modules. Can be removed if you
have only the data part of the OID in hex string. '''
del hex_list[0]
del hex_list[0]
# An empty string to append the value of the OID in standard notation after
# processing each element of the list.
OID_str = ''
# Convert the list with hex data in str format to int format for
# calculations.
for element in range(len(hex_list)):
hex_list[element] = int(hex_list[element],16)
# Convert the OID to its standard notation. Sourced from code in other
# languages and adapted for python.
# The first two digits of the OID are calculated differently from the rest.
x = int(hex_list[0] / 40)
y = int(hex_list[0] % 40)
if x > 2:
y += (x-2)*40
x = 2;
OID_str += str(x)+'.'+str(y)
val = 0
for byte in range(1,len(hex_list)):
val = ((val<<7) | ((hex_list[byte] & 0x7F)))
if (hex_list[byte] & 0x80) != 0x80:
OID_str += "."+str(val)
val = 0
# print the OID in dot notation.
print (OID_str)
notation_OID('060D2B0621040BA946964812D10905')
Hope this helps... cHEErs !

How to byteswap 32bit integers inside a string in python?

I have a large string more than 256 bits and and I need to byte swap it by 32 bits. But the string is in a hexadecimal base. When I looked at numpy and array modules I couldnt find the right syntax as to how to do the coversion. Could someone please help me?
An example:(thought the data is much longer.I can use pack but then I would have to convert the little endian to decimal and then to big endian first which seems like a waste):
Input:12345678abcdeafa
Output:78563412faeacdab

Convert the string to bytes, unpack big-endian 32-bit and pack little-endian 32-bit (or vice versa) and convert back to a string:
#!python3
import binascii
import struct
Input = b'12345678abcdeafa'
Output = b'78563412faeacdab'
def convert(s):
s = binascii.unhexlify(s)
a,b = struct.unpack('>LL',s)
s = struct.pack('<LL',a,b)
return binascii.hexlify(s)
print(convert(Input),Output)
Output:
b'78563412faeacdab' b'78563412faeacdab'
Generalized for any string with length multiple of 4:
import binascii
import struct
Input = b'12345678abcdeafa'
Output = b'78563412faeacdab'
def convert(s):
if len(s) % 4 != 0:
raise ValueError('string length not multiple of 4')
s = binascii.unhexlify(s)
f = '{}L'.format(len(s)//4)
dw = struct.unpack('>'+f,s)
s = struct.pack('<'+f,*dw)
return binascii.hexlify(s)
print(convert(Input),Output)

If they really are strings, just do string operations on them?
>>> input = "12345678abcdeafa"
>>> input[7::-1]+input[:7:-1]
'87654321afaedcba'

My take:
slice the string in N digit chunks
reverse each chunk
concatenate everything
Example:
>>> source = '12345678abcdeafa87654321afaedcba'
>>> # small helper to slice the input in 8 digit chunks
>>> chunks = lambda iterable, sz: [iterable[i:i+sz]
for i in range(0, len(iterable), sz)]
>>> swap = lambda source, sz: ''.join([chunk[::-1]
for chunk in chunks(source, sz)])
Output asked in the original question:
>>> swap(source, 8)
'87654321afaedcba12345678abcdeafa'
It is easy to adapt in order to match the required output after icktoofay edit:
>>> swap(swap(source, 8), 2)
'78563412faeacdab21436587badcaeaf'
A proper implementation probably should check if len(source) % 8 == 0.

How to encode a long in Base64 in Python?

In Java, I can encode a BigInteger as:
java.math.BigInteger bi = new java.math.BigInteger("65537L");
String encoded = Base64.encodeBytes(bi.toByteArray(), Base64.ENCODE|Base64.DONT_GUNZIP);
// result: 65537L encodes as "AQAB" in Base64
byte[] decoded = Base64.decode(encoded, Base64.DECODE|Base64.DONT_GUNZIP);
java.math.BigInteger back = new java.math.BigInteger(decoded);
In C#:
System.Numerics.BigInteger bi = new System.Numerics.BigInteger("65537L");
string encoded = Convert.ToBase64(bi);
byte[] decoded = Convert.FromBase64String(encoded);
System.Numerics.BigInteger back = new System.Numerics.BigInteger(decoded);
How can I encode long integers in Python as Base64-encoded strings? What I've tried so far produces results different from implementations in other languages (so far I've tried in Java and C#), particularly it produces longer-length Base64-encoded strings.
import struct
encoded = struct.pack('I', (1<<16)+1).encode('base64')[:-1]
# produces a longer string, 'AQABAA==' instead of the expected 'AQAB'
When using this Python code to produce a Base64-encoded string, the resulting decoded integer in Java (for example) produces instead 16777472 in place of the expected 65537. Firstly, what am I missing?
Secondly, I have to figure out by hand what is the length format to use in struct.pack; and if I'm trying to encode a long number (greater than (1<<64)-1) the 'Q' format specification is too short to hold the representation. Does that mean that I have to do the representation by hand, or is there an undocumented format specifier for the struct.pack function? (I'm not compelled to use struct, but at first glance it seemed to do what I needed.)

Check out this page on converting integer to base64.
import base64
import struct
def encode(n):
data = struct.pack('<Q', n).rstrip('\x00')
if len(data)==0:
data = '\x00'
s = base64.urlsafe_b64encode(data).rstrip('=')
return s
def decode(s):
data = base64.urlsafe_b64decode(s + '==')
n = struct.unpack('<Q', data + '\x00'* (8-len(data)) )
return n[0]

The struct module:
… performs conversions between Python values and C structs represented as Python strings.
Because C doesn't have infinite-length integers, there's no functionality for packing them.
But it's very easy to write yourself. For example:
def pack_bigint(i):
b = bytearray()
while i:
b.append(i & 0xFF)
i >>= 8
return b
Or:
def pack_bigint(i):
bl = (i.bit_length() + 7) // 8
fmt = '<{}B'.format(bl)
# ...
And so on.
And of course you'll want an unpack function, like jbatista's from the comments:
def unpack_bigint(b):
b = bytearray(b) # in case you're passing in a bytes/str
return sum((1 << (bi*8)) * bb for (bi, bb) in enumerate(b))

This is a bit late, but I figured I'd throw my hat in the ring:
def inttob64(n):
"""
Given an integer returns the base64 encoded version of it (no trailing ==)
"""
parts = []
while n:
parts.insert(0,n & limit)
n >>= 32
data = struct.pack('>' + 'L'*len(parts),*parts)
s = base64.urlsafe_b64encode(data).rstrip('=')
return s
def b64toint(s):
"""
Given a string with a base64 encoded value, return the integer representation
of it
"""
data = base64.urlsafe_b64decode(s + '==')
n = 0
while data:
n <<= 32
(toor,) = struct.unpack('>L',data[:4])
n |= toor & 0xffffffff
data = data[4:]
return n
These functions turn an arbitrary-sized long number to/from a big-endian base64 representation.

Here is something that may help. Instead of using struct.pack() I am building a string of bytes to encode and then calling the BASE64 encode on that. I didn't write the decode, but clearly the decode can recover an identical string of bytes and a loop could recover the original value. I don't know if you need fixed-size integers (like always 128-bit) and I don't know if you need Big Endian so I left the decoder for you.
Also, encode64() and decode64() are from #msc's answer, but modified to work.
import base64
import struct
def encode64(n):
data = struct.pack('<Q', n).rstrip('\x00')
if len(data)==0:
data = '\x00'
s = base64.urlsafe_b64encode(data).rstrip('=')
return s
def decode64(s):
data = base64.urlsafe_b64decode(s + '==')
n = struct.unpack('<Q', data + '\x00'* (8-len(data)) )
return n[0]
def encode(n, big_endian=False):
lst = []
while True:
n, lsb = divmod(n, 0x100)
lst.append(chr(lsb))
if not n:
break
if big_endian:
# I have not tested Big Endian mode, and it may need to have
# some initial zero bytes prepended; like, if the integer is
# supposed to be a 128-bit integer, and you encode a 1, you
# would need this to have 15 leading zero bytes.
initial_zero_bytes = '\x00' * 2
data = initial_zero_bytes + ''.join(reversed(lst))
else:
data = ''.join(lst)
s = base64.urlsafe_b64encode(data).rstrip('=')
return s
print encode(1234567890098765432112345678900987654321)

reorder byte order in hex string (python)

I want to build a small formatter in python giving me back the numeric
values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonable fast to
format more than 100 lines/sec (each line about ~100 chars).
The code below should give an example where I'm currently blocked.
'data_string_in_orig' shows the given input format. It has to be
byte swapped for each word. The swap from 'data_string_in_orig' to
'data_string_in_swapped' is needed. In the end I need the structure
access as shown. The expected result is within the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print data_string_in_orig
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print 'Unpacked Values:', unpacked_data
## Unpacked Values: (46638, 943.29999999943209)
exit(0)

array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as being 2-bytes long. H, which signifies 2-byte unsigned short, works just as well.

This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print unpack('<Id', swapped)
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
edit Just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'

The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2 ):
return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz,-gsz,-gsz)]) for m in [d[i:i+wsz] for i in range(0,len(d),wsz)]])
The input params are:
d : the input hex string
wsz: the word-size in nibbles (e.g for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles which stay together (e.g for reordering bytes gsz=2, for reordering 16-bit words gsz = 4)

import binascii, tkinter, array
from tkinter import *
infile_read = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
infile_read = infile_.read()
x = (infile_read)
y = array.array('l', x)
y.byteswap()
swapped = (binascii.hexlify(y))
This is a 32 bit unsigned short swap i achieved with code very much the same as "unutbu's" answer just a little bit easier to understand. And technically binascii is not needed for the swap. Only array.byteswap is needed.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Unable to understand the string representation - python

Assuming you are asking "how do I convert this string of bytes '0a8b6bd8 d9b08b08 d64e32d1 817777fb' into a python string", here's the answer: '0a8b6bd8 d9b08b08 d64e32d1 817777fb'.replace(" ", "").decode("hex") Which will give you: '\n\x8bk\xd8\xd9\xb0\x8b\x08\xd6N2\xd1\x81ww\xfb'

Related

Decode emoji into two (or more) code points, using standard libraries

Converting OID of public key,etc in HEX data to the dot format

How to byteswap 32bit integers inside a string in python?

How to encode a long in Base64 in Python?

reorder byte order in hex string (python)

Categories

Resources