Define a BitStruct inside a BitStruct in python using construct package

I am trying to read the following format (Intel -> Little-Endian):
X: 0 -> 31, size 32 bits
Offset: 32 -> 43, size 12 bits
Index: 44 -> 47, size 4 bits
Time: 48 -> 55, size 8 bits
Radius: 56 -> 63, size 8 bits
For this parser I defined:
from construct import Bitwise, BitStruct, BitsInteger, Bytewise
from construct import Int32sl, Int8ul
BitStruct(
    "X" / Bytewise(Int32sl),
    "Offset" / BitsInteger(12),
    "Index" / BitsInteger(4),
    "Time" / Bytewise(Int8ul),
    "Radius" / Bytewise(Int8ul),
)
from the following bytes:
bytearray(b'\xca\x11\x01\x00\x00\x07\xffu')
What I get is:
Container:
X = 70090
Offset= 0
Index = 7
Time = 255
Radius = 117
What I should have gotten is:
Container:
X = 70090
Offset = 1792
Index = 0
Time = 255
Radius= 117
As you can see, the values of Offset and Index that I get do not match the expected values; the rest is correct.
From what I saw, I need to swap the two bytes that contain the Offset and Index values.
How could I define a struct inside a struct and swap the two bytes as well?

BitsInteger treats its input as big-endian by default.
From the documentation on BitsInteger:
Note that little-endianness is only defined for multiples of 8 bits.
To get little-endian behaviour, you must set the swapped parameter (False by default) to True:
swapped – bool, whether to swap byte order (little endian), default is False (big endian)
As such:
BitStruct(
    "X" / Bytewise(Int32sl),
    "Offset" / BitsInteger(12, swapped=True),
    "Index" / BitsInteger(4, swapped=True),
    "Time" / Bytewise(Int8ul),
    "Radius" / Bytewise(Int8ul),
)
BUT your Offset and Index fields are not multiples of 8 bits, so you should just reverse the initial byte array and be done with it.

By reversing the bytes in the bytearray and the order of the fields in the BitStruct, I am able to get the correct values.
from construct import Bitwise, BitStruct, BitsInteger, Bytewise
from construct import Int32sb, Int8ul
data = bytearray(b'\xca\x11\x01\x00\x00\x07\xffu')
data_reverse = data[::-1]
format = BitStruct(
    "Radius" / Bytewise(Int8ul),
    "Time" / Bytewise(Int8ul),
    "Index" / BitsInteger(4),
    "Offset" / BitsInteger(12),
    "X" / Bytewise(Int32sb),
)
print(format.parse(data_reverse))
This returns:
Container:
Radius = 117
Time = 255
Index = 0
Offset = 1792
X = 70090
If someone has a better solution, I would be more than happy to hear it.
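One way to cross-check the expected values without construct is to read all eight bytes as a single little-endian integer and mask out each field by its bit position from the layout in the question. This is only a sketch using the standard library; `parse_record` is a hypothetical helper name, not part of construct.

```python
# Sample bytes from the question.
data = bytes(b'\xca\x11\x01\x00\x00\x07\xffu')

def parse_record(raw):
    """Hypothetical helper: decode the 64-bit little-endian record by masking."""
    value = int.from_bytes(raw, "little")  # bit 0 is the LSB of the first byte
    x = value & 0xFFFFFFFF                 # bits 0..31
    if x >= 0x80000000:                    # interpret X as a signed 32-bit integer
        x -= 0x100000000
    return {
        "X": x,
        "Offset": (value >> 32) & 0xFFF,   # bits 32..43
        "Index": (value >> 44) & 0xF,      # bits 44..47
        "Time": (value >> 48) & 0xFF,      # bits 48..55
        "Radius": (value >> 56) & 0xFF,    # bits 56..63
    }

record = parse_record(data)
print(record)
```

Because the whole record is treated as one number, no byte reversal or field reordering is needed.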

Related

Read binary file: Matlab differs from R and python

I am trying to recreate a Matlab script that reads a binary file in either R or python. Here is a link to a the test.bin data: https://github.com/AndrewHFarkas/BinaryReadTest
Matlab
FilePath = 'test.bin'
fid=fopen(FilePath,'r','b');
VersionStr=fread(fid,7,'char')
Version=fread(fid,1,'int16')
SizeFormat='float32'
DataFormat='float32'
EegMegStatus=fread(fid,1,SizeFormat)
NChanExtra=fread(fid,1,SizeFormat)
TrigPoint=fread(fid,1,'float32')
DataTypeVal=fread(fid,1,'float32')
TmpSize=fread(fid,2,SizeFormat)
AvgMat=fread(fid,1,DataFormat)
Matlab output:
VersionStr =
86
101
114
115
105
111
110
Version =
8
SizeFormat =
'float32'
DataFormat =
'float32'
EegMegStatus =
1
NChanExtra =
0
TrigPoint =
1
DataTypeVal =
0
TmpSize =
65
1076
AvgMat =
-12.9650
This is my closest attempt with Python (I found some of this code in a different Stack Overflow answer):
import numpy as np
import array
def fread(fid, nelements, dtype):
    if dtype is np.str:
        dt = np.uint8  # WARNING: assuming 8-bit ASCII for np.str!
    else:
        dt = dtype
    data_array = np.fromfile(fid, dt, nelements)
    data_array.shape = (nelements, 1)
    return data_array
fid = open('test.bin', 'rb');
print(fread(fid, 7, np.str)) # so far so good!
[[ 86]
[101]
[114]
[115]
[105]
[111]
[110]]
#both of these options return 2048
print(fread(fid, 1, np.int16))
np.fromfile(fid, np.int16, 1)
And no matter what else I've tried I can't get any of the same numbers past that point. I have tried using little and big endian settings, but maybe not correctly.
If it helps, here is my closest attempt in R:
newdata = file("test.bin", "rb")
version_char = readBin(newdata, "character", n=1)
version_char
[1] "Version" # this makes sense because the first 7 bytes to spell Version
version_num = readBin(newdata, "int", size = 1 , n = 1, endian = "little")
version_num
[1] 8 #correct number
And nothing after that matches. This is where I get really confused, because I was only able to get 8 with a (byte) size = 1 for the version_num, but an int16 should be two bytes as far as I understand. I have tried the code below to read in a float, as suggested in another post:
readBin(newdata, "double", size = 4 , n = 1, endian = "little")
Thank you all for your time

struct is, in general, what you would use to unpack binary data:
>>> import struct
>>> keys = "VStr vNum EegMeg nChan trigPoint dType tmpSize[0] tmpSize[1] avgMat".split()
>>> dict(zip(keys,struct.unpack_from(">7sh7f",open("test.bin","rb").read())))
{'nChan': 0.0, 'EegMeg': 1.0, 'dType': 0.0, 'trigPoint': 1.0, 'VStr': 'Version', 'avgMat': -12.964995384216309, 'vNum': 8, 'tmpSize[0]': 65.0, 'tmpSize[1]': 1076.0}
The format string says: big-endian (>), a string of 7 characters (7s), followed by a short int (h), followed by 7 float32 values (7f).
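The 2048 from the question is what you get when the big-endian int16 value 8 is read as little-endian. Here is a small sketch that shows this with synthetic bytes; `header` is a stand-in built with struct.pack from the values in the Matlab output, not the real test.bin.

```python
import struct

# Synthetic stand-in for the start of test.bin (the real file is not read here).
header = struct.pack(">7sh7f", b"Version", 8, 1.0, 0.0, 1.0, 0.0, 65.0, 1076.0, -12.965)

fmt = ">7sh7f"  # big-endian: 7-char string, int16, seven float32 values
fields = struct.unpack_from(fmt, header)
version_be = fields[1]

# Reading the same two bytes as little-endian gives the value seen in the question.
(version_le,) = struct.unpack_from("<h", header, 7)

print(fields[0], version_be, version_le, struct.calcsize(fmt))
```

The leading `>` also turns off alignment padding, which is why the int16 can sit at the odd offset 7.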

How to generate a time-ordered uid in Python?

Is this possible? I've heard Cassandra has something similar : https://datastax.github.io/python-driver/api/cassandra/util.html
I have been using an ISO timestamp concatenated with a uuid4, but that ended up way too large (58 characters) and is probably overkill.
Keeping a sequential number doesn't work in my context (DynamoDB NoSQL).
Worth noting that for my application it doesn't matter if items created in a batch or within the same second are in a random order, as long as the uids don't collide.
I have no specific restriction on maximum length. Ideally I would like to see the collision chance for different lengths, but it needs to be smaller than 58 (my original attempt).
This is to be used with DynamoDB (a NoSQL database) as a sort key.

Why uuid.uuid1 is not sequential
uuid.uuid1(node=None, clock_seq=None) is effectively:
60 bits of timestamp (representing number of 100-ns intervals after 1582-10-15 00:00:00)
14 bits of "clock sequence"
48 bits of "Node info" (generated from network card's mac-address or from hostname or from RNG).
If you don't provide any arguments, then a system function is called to generate the uuid. In that case:
It's unclear whether the "clock sequence" is sequential or random.
It's unclear whether it's safe to use in multiple processes (can clock_seq be repeated in different processes or not?). Since Python 3.7 this information is available through the is_safe attribute of the returned UUID.
If you provide clock_seq or node, then the pure Python implementation is used. In this case, even with a "fixed value" for clock_seq:
The timestamp part is guaranteed to be sequential for all the calls in the current process, even in threaded execution.
The clock_seq part is randomly generated. But that is not critical anymore because the timestamp is sequential and unique.
It's NOT safe for multiple processes (processes that call uuid1 with the same clock_seq and node might return conflicting values if called during the "same 100-ns time interval").
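The three parts listed above can be inspected on any uuid1 value. This is a small sketch; the node and clock_seq values are arbitrary examples chosen for illustration.

```python
import uuid

# Passing node and clock_seq explicitly; both values are arbitrary examples.
u = uuid.uuid1(node=0x123456789ABC, clock_seq=0x1234)

# 0x01b21dd213814000 is the number of 100-ns intervals between
# 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
unix_seconds = (u.time - 0x01B21DD213814000) / 1e7

print(u.time.bit_length() <= 60)  # the timestamp fits in 60 bits
print(u.clock_seq, hex(u.node))   # the 14-bit and 48-bit parts passed in
print(unix_seconds)               # roughly time.time()
```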
Solution that reuses uuid.uuid1
It's easy to see that you can make uuid1 sequential by providing the clock_seq or node arguments (which selects the Python implementation).
import time
from random import getrandbits
from uuid import uuid1, getnode
_my_clock_seq = getrandbits(14)
_my_node = getnode()
def sequential_uuid(node=None):
    return uuid1(node=node, clock_seq=_my_clock_seq)
    # .hex attribute of this value is 32-characters long string
def alt_sequential_uuid(clock_seq=None):
    return uuid1(node=_my_node, clock_seq=clock_seq)
if __name__ == '__main__':
    from itertools import count
    old_n = uuid1()  # "Native"
    old_s = sequential_uuid()  # Sequential
    native_conflict_index = None
    t_0 = time.time()
    for x in count():
        new_n = uuid1()
        new_s = sequential_uuid()
        if old_n > new_n and not native_conflict_index:
            native_conflict_index = x
        if old_s >= new_s:
            print("OOops: non-sequential results for `sequential_uuid()`")
            break
        if (x >= 10*0x3fff and time.time() - t_0 > 30) or (native_conflict_index and x > 2*native_conflict_index):
            print('No issues for `sequential_uuid()`')
            break
        old_n = new_n
        old_s = new_s
    print(f'Conflicts for `uuid.uuid1()`: {bool(native_conflict_index)}')
Multiple processes issues
BUT if you are running some parallel processes on the same machine, then:
node, which defaults to uuid.getnode(), will be the same for all the processes;
clock_seq has a small chance of being the same for some processes (a chance of 1/16384).
That might lead to conflicts! This is a general concern when using uuid.uuid1 in parallel processes on the same machine, unless you have access to SafeUUID from Python 3.7.
If you also make sure to set node to a unique value for each parallel process that runs this code, then conflicts should not happen.
Even if you are using SafeUUID and set a unique node, it's still possible to get non-sequential (but unique) ids if they are generated in different processes.
If some lock-related overhead is acceptable, then you can store clock_seq in some external atomic storage (for example in a "locked" file) and increment it with each call: this allows the same value of node to be used by all parallel processes and also makes the ids sequential. When all parallel processes are subprocesses created using multiprocessing, clock_seq can be "shared" using multiprocessing.Value.
As a result you always have to remember:
If you are running multiple processes on the same machine, then you must:
Ensure uniqueness of node. The problem with this solution: you can't be sure to have sequential ids from different processes generated during the same 100-ns interval. But this is a very "light" operation executed once on process startup, achieved by "adding" something to the default node, e.g. int(time.time()*1e9) - 0x118494406d1cc000, or by adding some counter from a machine-level atomic db.
Ensure a "machine-level atomic clock_seq" and the same node for all processes on one machine. That way you'll have some overhead for "locking" clock_seq, but ids are guaranteed to be sequential even if generated in different processes during the same 100-ns interval (unless you are calling uuid from several threads in the same process).
For processes on different machines:
either you have to use some "global counter service";
or it's not possible to have sequential ids generated on different machines during the same 100-ns interval.
Reducing size of the id
The general approach to generating UUIDs is quite simple, so it's easy to implement something similar from scratch and, for example, use fewer bits for the node_info part:
import time
from random import getrandbits
_my_clock_seq = getrandbits(14)
_last_timestamp_part = 0
_used_clock_seq = 0
timestamp_multiplier = 1e7  # I'd recommend to use this value
# Next values are enough up to year 2116:
if timestamp_multiplier == 1e9:
    time_bits = 62  # Up to year 2116, also reduces chances for non-sequential id-s generated in different processes
elif timestamp_multiplier == 1e8:
    time_bits = 60  # up to year 2335
elif timestamp_multiplier == 1e7:
    time_bits = 56  # Up to year 2198.
else:
    raise ValueError('Please calculate and set time_bits')
time_mask = 2**time_bits - 1
seq_bits = 16
seq_mask = 2**seq_bits - 1
node_bits = 12
node_mask = 2**node_bits - 1
max_hex_len = len(hex(2**(node_bits+seq_bits+time_bits) - 1)) - 2  # 21
_default_node_number = getrandbits(node_bits)  # or `uuid.getnode() & node_mask`
def sequential_uuid(node_number=None):
    """Return 21-characters long hex string that is sequential and unique for each call in current process.
    Results from different processes may "overlap" but are guaranteed to
    be unique if `node_number` is different in each process.
    """
    global _my_clock_seq
    global _last_timestamp_part
    global _used_clock_seq
    if node_number is None:
        node_number = _default_node_number
    if not 0 <= node_number <= node_mask:
        raise ValueError("Node number out of range")
    timestamp_part = int(time.time() * timestamp_multiplier) & time_mask
    _my_clock_seq = (_my_clock_seq + 1) & seq_mask
    if _last_timestamp_part >= timestamp_part:
        timestamp_part = _last_timestamp_part
        if _used_clock_seq == _my_clock_seq:
            timestamp_part = (timestamp_part + 1) & time_mask
    else:
        _used_clock_seq = _my_clock_seq
    _last_timestamp_part = timestamp_part
    return hex(
        (timestamp_part << (node_bits+seq_bits))
        |
        (_my_clock_seq << (node_bits))
        |
        node_number
    )[2:]
Notes:
Maybe it's better to simply store the integer value (not a hex string) in the database.
If you are storing it as text/char, then it's better to convert the integer to a base64 string instead of a hex string. That way it will be shorter (21-char hex string → 16-char b64-encoded string):
from base64 import b64encode
total_bits = time_bits+seq_bits+node_bits
total_bytes = total_bits // 8 + 1 * bool(total_bits % 8)
def int_to_b64(int_value):
    return b64encode(int_value.to_bytes(total_bytes, 'big'))
Collision chances
Single process: collisions not possible
Multiple processes with manually set unique clock_seq or unique node in each process: collisions not possible
Multiple processes with randomly set node (48-bits, "fixed" in time):
Chance to have the node collision in several processes:
in 2 processes out of 10000: ~0.000018%
in 2 processes out of 100000: 0.0018%
Chance to have single collision of the id per second in 2 processes with the "colliding" node:
for "timestamp" interval of 100-ns (default for uuid.uuid1 , and in my code when timestamp_multiplier == 1e7): proportional to 3.72e-19 * avg_call_frequency²
for "timestamp" interval of 10-ns (timestamp_multiplier == 1e8): proportional to 3.72e-21 * avg_call_frequency²

In the article you've linked to, cassandra.util.uuid_from_time(time_arg, node=None, clock_seq=None) seems to be exactly what you're looking for.
# imports needed by the function below
import calendar
import random
import uuid
def uuid_from_time(time_arg, node=None, clock_seq=None):
    """
    Converts a datetime or timestamp to a type 1 :class:`uuid.UUID`.
    :param time_arg:
      The time to use for the timestamp portion of the UUID.
      This can either be a :class:`datetime` object or a timestamp
      in seconds (as returned from :meth:`time.time()`).
    :type datetime: :class:`datetime` or timestamp
    :param node:
      None integer for the UUID (up to 48 bits). If not specified, this
      field is randomized.
    :type node: long
    :param clock_seq:
      Clock sequence field for the UUID (up to 14 bits). If not specified,
      a random sequence is generated.
    :type clock_seq: int
    :rtype: :class:`uuid.UUID`
    """
    if hasattr(time_arg, 'utctimetuple'):
        seconds = int(calendar.timegm(time_arg.utctimetuple()))
        microseconds = (seconds * 1e6) + time_arg.time().microsecond
    else:
        microseconds = int(time_arg * 1e6)
    # 0x01b21dd213814000 is the number of 100-ns intervals between the
    # UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
    intervals = int(microseconds * 10) + 0x01b21dd213814000
    time_low = intervals & 0xffffffff
    time_mid = (intervals >> 32) & 0xffff
    time_hi_version = (intervals >> 48) & 0x0fff
    if clock_seq is None:
        clock_seq = random.getrandbits(14)
    else:
        if clock_seq > 0x3fff:
            raise ValueError('clock_seq is out of range (need a 14-bit value)')
    clock_seq_low = clock_seq & 0xff
    clock_seq_hi_variant = 0x80 | ((clock_seq >> 8) & 0x3f)
    if node is None:
        node = random.getrandbits(48)
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node), version=1)
There's nothing Cassandra specific to a Type 1 UUID...

You should be able to encode a timestamp precise to the second for a time range of 135 years in 32 bits. That will only take 8 characters to represent in hex. Added to the hex representation of the uuid (32 hex characters) that will amount to only 40 hex characters.
Encoding the time stamp that way requires that you pick a base year (e.g. 2000) and compute the number of days up to the current date (time stamp). Multiply this number of days by 86400, then add the seconds since midnight. This will give you values that are less than 2^32 until you reach year 2135.
Note that you have to keep leading zeroes in the hex encoded form of the timestamp prefix in order for alphanumeric sorting to preserve the chronology.
With a few bits more in the time stamp, you could increase the time range and/or the precision. With 8 more bits (two hex characters), you could go up to 270 years with a precision to the hundredth of a second.
Note that you don't have to model the fraction of seconds in a base 10 range. You will get optimal bit usage by breaking it down in 128ths instead of 100ths for the same number of characters. With the doubling of the year range, this still fits within 8 bits (2 hex characters)
The collision probability within the time precision (i.e. per second, or per 100th or 128th of a second) is driven by the range of the uuid, so it will be about 1 in 2^122 for the chosen precision (a uuid4 carries 122 random bits out of its 128). Increasing the precision of the time stamp has the most impact on reducing the collision chances. It is also the factor that has the lowest impact on the total size of the key.
More efficient character encoding: 27 to 29 character keys
You could significantly reduce the size of the key by encoding it in base 64 instead of base 16, which would give you 27 to 29 characters (depending on your choice of precision).
Note that, for the timestamp part, you need to use an encoding function that takes an integer as input and that preserves the collating sequence of digit characters.
For example:
import uuid
def encode64(number, size):
    chars = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    result = list()
    for _ in range(size):
        result.append(chars[number%64])
        number //= 64
    return "".join(reversed(result))
a = encode64(1234567890,6) # '-7ZU9G'
b = encode64(9876543210,6) # '7Ag-Pe'
print(a < b) # True
u = encode64(int(uuid.uuid4()),22) # '1QA2LtMg30ztnugxaokVMk'
key = a+u # '-7ZU9G1QA2LtMg30ztnugxaokVMk' (28 characters)
You can save some more characters by combining the time stamp and uuid into a single number before encoding instead of concatenating the two encoded values.
The encode64() function needs one character every 6 bits.
So, for 135 years with precision to the second: (32+128)/6 = 26.7 --> 27 characters
instead of (32/6 = 5.3 --> 6) + (128/6 = 21.3 --> 22) ==> 28 characters
uid = uuid.uuid4()
timeStamp = daysSince2000 * 86400 + int(secondsSinceMidnight)
key = encode64( timeStamp<<128 | int(uid) ,27)
With a 270-year span and 128th-of-a-second precision: (40+128)/6 = 28 characters
uid = uuid.uuid4()
timeStamp = daysSince2000 * 86400 + int(secondsSinceMidnight)
precision = 128
timeStamp = timeStamp * precision + int(fractionOfSecond * precision)
key = encode64( timeStamp<<128 | int(uid) ,28)
With 29 characters you can raise precision to 1024th of a second and year range to 2160 years.
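The snippets above leave daysSince2000 and secondsSinceMidnight undefined. Here is a runnable sketch of the 27-character variant, assuming a base date of 2000-01-01 UTC. `make_key`, `BASE` and `CHARS` are hypothetical names introduced for illustration, and the seconds are computed directly from a datetime difference rather than from separate day and second counts.

```python
import uuid
from datetime import datetime, timezone

# Alphabet in ascending ASCII order, so string order matches numeric order.
CHARS = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode64(number, size):
    """Fixed-width base-64 encoding, most significant digit first."""
    result = []
    for _ in range(size):
        result.append(CHARS[number % 64])
        number //= 64
    return "".join(reversed(result))

BASE = datetime(2000, 1, 1, tzinfo=timezone.utc)  # assumed base date

def make_key(now=None, uid=None):
    """Hypothetical helper: 32-bit seconds-since-2000 prefix plus a 128-bit uuid."""
    now = now or datetime.now(timezone.utc)
    uid = uid or uuid.uuid4()
    time_stamp = int((now - BASE).total_seconds())
    return encode64(time_stamp << 128 | int(uid), 27)

early = make_key(datetime(2024, 1, 1, tzinfo=timezone.utc))
late = make_key(datetime(2024, 1, 2, tzinfo=timezone.utc))
print(early, late, early < late)
```

Keys created in the same second share a prefix and are ordered only by the random uuid part, which matches the requirement stated in the question.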
UUID masking: 17 to 19 characters keys
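To be even more efficient, you could strip out the first 64 bits of the uuid (which, for a time-based uuid such as uuid1, already hold a time stamp) and combine the remaining 64 bits with your own time stamp. This would give you keys with a length of 17 to 19 characters (depending on your choice of precision). Note that with the random uuid4 used above, the 64 bits that remain contain 62 random bits, so the collision odds within one time-stamp tick change accordingly.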
mask = (1<<64)-1
key = encode64( timeStamp<<64 | (int(uid) & mask) ,19)
Integer/Numeric keys ?
As a final note, if your database supports very large integers or numeric fields (140 bits or more) as keys, you don't have to convert the combined number to a string. Just use it directly as the key. The numerical sequence of timeStamp<<128 | int(uid) will respect the chronology.

The uuid6 module (pip install uuid6) solves the problem. It aims to implement the corresponding draft for a new UUID variant standard.
Example code:
import time
import uuid6
for i in range(0, 30):
    u = uuid6.uuid7()
    print(u)
    time.sleep(0.1)
The package suggests using uuid6.uuid7(), quoting the draft:
Implementations SHOULD utilize UUID version 7 over UUID version 1 and
6 if possible.
UUID version 7 features a time-ordered value field derived from the
widely implemented and well known Unix Epoch timestamp source, the
number of milliseconds seconds since midnight 1 Jan 1970 UTC, leap
seconds excluded. As well as improved entropy characteristics over
versions 1 or 6.
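To make the version 7 layout concrete, here is a hand-rolled sketch using only the standard library. It is not the uuid6 package's implementation; `uuid7_sketch` is a hypothetical helper that simply places a 48-bit Unix millisecond timestamp in the top bits and fills the rest with random bits, apart from the version and variant fields.

```python
import os
import time
import uuid

def uuid7_sketch(unix_ms=None):
    """Hypothetical helper, not the uuid6 package: build a version-7-style UUID."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0xFFF                 # 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # 48-bit millisecond timestamp
    value |= 0x7 << 76                            # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # variant bits
    value |= rand_b
    return uuid.UUID(int=value)

a = uuid7_sketch(1_700_000_000_000)
b = uuid7_sketch(1_700_000_000_001)
print(a, b, a < b)
```

Ids created within the same millisecond are ordered only by their random bits in this sketch, which is acceptable for the use case in the question.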

Using strings and byte-like objects compatibly in code to run in both Python 2 & 3

I'm trying to modify the code shown far below, which works in Python 2.7.x, so it will also work unchanged in Python 3.x. However I'm encountering the following problem I can't solve in the first function, bin_to_float() as shown by the output below:
float_to_bin(0.000000): '0'
Traceback (most recent call last):
File "binary-to-a-float-number.py", line 36, in <module>
float = bin_to_float(binary)
File "binary-to-a-float-number.py", line 9, in bin_to_float
return struct.unpack('>d', bf)[0]
TypeError: a bytes-like object is required, not 'str'
I tried to fix that by adding a bf = bytes(bf) right before the call to struct.unpack(), but doing so produced its own TypeError:
TypeError: string argument without an encoding
So my questions are: is it possible to fix this issue and achieve my goal? And if so, how? Preferably in a way that would work in both versions of Python.
Here's the code that works in Python 2:
import struct
def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack('>d', bf)[0]
def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    bytes = []
    for _ in range(nbytes):
        bytes.append(chr(n & 0xff))
        n >>= 8
    if minlen > 0 and len(bytes) < minlen:  # zero pad?
        bytes.extend((minlen-len(bytes)) * '0')
    return ''.join(reversed(bytes))  # high bytes at beginning
# tests
def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack('>d', f)
    ba = bytearray(ba)
    s = ''.join('{:08b}'.format(b) for b in ba)
    s = s.lstrip('0')  # strip leading zeros
    return s if s else '0'  # but leave at least one
for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print('float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print('bin_to_float(%r): %f' % (binary, float))
    print('')

To write portable code that works with bytes in both Python 2 and 3, using libraries that literally use different data types between the two, you need to declare every string explicitly with the appropriate literal prefix (or add from __future__ import unicode_literals to the top of every module doing this). This step ensures that the data types are correct internally in your code.
Secondly, decide to support Python 3 going forward, with fallbacks specific to Python 2. This means overriding str with unicode, and working out which methods/functions do not return the same types in both Python versions so that they can be modified or replaced to return the correct type (that is, the Python 3 one). Do note that bytes is a built-in name too, so don't use it as a variable name.
Putting this together, your code will look something like this:
import struct
import sys
if sys.version_info < (3, 0):
    str = unicode
    chr = unichr
def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack(b'>d', bf)[0]
def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    ba = bytearray(b'')
    for _ in range(nbytes):
        ba.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(ba) < minlen:  # zero pad?
        ba.extend((minlen-len(ba)) * b'0')
    return u''.join(str(chr(b)) for b in reversed(ba)).encode('latin1')  # high bytes at beginning
# tests
def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack(b'>d', f)
    ba = bytearray(ba)
    s = u''.join(u'{:08b}'.format(b) for b in ba)
    s = s.lstrip(u'0')  # strip leading zeros
    return (s if s else u'0').encode('latin1')  # but leave at least one
for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print(u'float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print(u'bin_to_float(%r): %f' % (binary, float))
    print(u'')
I used the latin1 codec simply because that's how the byte mappings were originally defined, and it seems to work:
$ python2 foo.py
float_to_bin(0.000000): '0'
bin_to_float('0'): 0.000000
float_to_bin(1.000000): '11111111110000000000000000000000000000000000000000000000000000'
bin_to_float('11111111110000000000000000000000000000000000000000000000000000'): 1.000000
float_to_bin(-14.000000): '1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float('1100000000101100000000000000000000000000000000000000000000000000'): -14.000000
float_to_bin(12.546000): '100000000101001000101111000110101001111110111110011101101100100'
bin_to_float('100000000101001000101111000110101001111110111110011101101100100'): 12.546000
float_to_bin(3.141593): '100000000001001001000011111101110000010110000101011110101111111'
bin_to_float('100000000001001001000011111101110000010110000101011110101111111'): 3.141593
Again, but this time under Python 3.5:
$ python3 foo.py
float_to_bin(0.000000): b'0'
bin_to_float(b'0'): 0.000000
float_to_bin(1.000000): b'11111111110000000000000000000000000000000000000000000000000000'
bin_to_float(b'11111111110000000000000000000000000000000000000000000000000000'): 1.000000
float_to_bin(-14.000000): b'1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float(b'1100000000101100000000000000000000000000000000000000000000000000'): -14.000000
float_to_bin(12.546000): b'100000000101001000101111000110101001111110111110011101101100100'
bin_to_float(b'100000000101001000101111000110101001111110111110011101101100100'): 12.546000
float_to_bin(3.141593): b'100000000001001001000011111101110000010110000101011110101111111'
bin_to_float(b'100000000001001001000011111101110000010110000101011110101111111'): 3.141593
It's a lot more work, but in Python3 you can more clearly see that the types are done as proper bytes. I also changed your bytes = [] to a bytearray to more clearly express what you were trying to do.

I had a different approach from @metatoaster's answer. I just modified int_to_bytes to use and return a bytearray:
def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    b = bytearray()
    for _ in range(nbytes):
        b.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(b) < minlen:  # zero pad?
        b.extend([0] * (minlen-len(b)))
    return bytearray(reversed(b))  # high bytes at beginning
This seems to work without any other modifications under both Python 2.7.11 and Python 3.5.1.
Note that I zero-padded with 0 instead of '0'. I didn't do much testing, but surely that's what you meant?

In Python 3, integers have a to_bytes() method that can perform the conversion in a single call. However, since you asked for a solution that works on Python 2 and 3 unmodified, here's an alternative approach.
If you take a detour via hexadecimal representation, the function int_to_bytes() becomes very simple:
import codecs
def int_to_bytes(n, minlen=0):
    hex_str = format(n, "0{}x".format(2 * minlen))
    return codecs.decode(hex_str, "hex")
You might need some special case handling to deal with the case when the hex string gets an odd number of characters.
Note that I'm not sure this works with all versions of Python 3. I remember that pseudo-encodings weren't supported in some 3.x version, but I don't remember the details. I tested the code with Python 3.5.
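For comparison, here is a sketch of the Python 3-only route mentioned at the top of this answer, using int.to_bytes() and int.from_bytes(). It does not run on Python 2, so it does not meet the "both versions" requirement; it only shows how little code is needed when that constraint is dropped.

```python
import struct

def bin_to_float(b):
    """Convert a binary string such as '1111111111000...' to a float (Python 3 only)."""
    # to_bytes(8, "big") pads with real zero bytes on the left.
    return struct.unpack(">d", int(b, 2).to_bytes(8, "big"))[0]

def float_to_bin(f):
    """Convert a float to its IEEE 754 binary64 bit string, without leading zeros."""
    return format(int.from_bytes(struct.pack(">d", f), "big"), "b")

for value in (0.0, 1.0, -14.0, 12.546, 3.141593):
    binary = float_to_bin(value)
    print("float_to_bin(%f): %r" % (value, binary))
    print("bin_to_float(%r): %f" % (binary, bin_to_float(binary)))
```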

python: struct pack size longer than expected -- why does this happen?

So I want to pack a list of tuples and then unpack it later.
from struct import *
from itertools import chain
a = [(1, 67), (213, 455), (9009, 8887)]
# converts 3x2 list to 6x1 list
b = list(chain(*a))
size=6
qq = pack('h'+'L'*size,size,*b)
# peak to get the list length
mysize = unpack('h',qq[:2])
mysize = mysize[0]
unpack('L',qq[2:6])
unpack('h'+'L'*mysize,qq)
unpack('L'*mysize, qq[2:]) # does not work
unpack('L'*mysize, qq[2:2+mysize*4]) # works
Using Python 2.7, the second to last line does not work. I tested len(qq), which is 28, when I was expecting 26.

According to the docs:
C types are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).
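By default struct uses native mode (@), in which each item is aligned to its natural boundary. Here L is 4 bytes wide, so two pad bytes are inserted after the 2-byte h to make the first L start at offset 4, giving 4 + 6*4 = 28 bytes instead of 26. You can use one of the prefixes =, <, > or ! to switch to standard sizes with no alignment padding. For instance, adding "=" works: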
from struct import *
from itertools import chain
a = [(1, 67), (213, 455), (9009, 8887)]
# converts 3x2 list to 6x1 list
b = list(chain(*a))
size=6
qq = pack('=h'+'L'*size,size,*b)
# peek to get the list length
mysize = unpack('=h',qq[:2])
mysize = mysize[0]
unpack('=L',qq[2:6])
unpack('=h'+'L'*mysize,qq)
unpack('=' + 'L'*mysize, qq[2:]) # works now that there is no padding
unpack('=' + 'L'*mysize, qq[2:2+mysize*4]) # works
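struct.calcsize() makes the effect of the prefix visible without packing anything. The following sketch compares the native and standard sizes of the same format and then reads the list back with unpack_from, which takes an offset and so avoids slicing by hand.

```python
import struct

native_size = struct.calcsize("h6L")     # platform dependent: alignment and sizeof(long)
standard_size = struct.calcsize("=h6L")  # 2 + 6 * 4, no pad bytes

packed = struct.pack("=h6L", 6, 1, 67, 213, 455, 9009, 8887)

# Read the count first, then that many unsigned longs starting at offset 2.
(count,) = struct.unpack_from("=h", packed)
values = struct.unpack_from("=%dL" % count, packed, 2)

print(native_size, standard_size, values)
```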

Converting an RGB color tuple to a hexadecimal string

I need to convert (0, 128, 64) to something like this "#008040". I'm not sure what to call the latter, making searching difficult.

Use the format operator %:
>>> '#%02x%02x%02x' % (0, 128, 64)
'#008040'
Note that it won't check bounds...
>>> '#%02x%02x%02x' % (0, -1, 9999)
'#00-1270f'

def clamp(x):
    return max(0, min(x, 255))
"#{0:02x}{1:02x}{2:02x}".format(clamp(r), clamp(g), clamp(b))
This uses the preferred method of string formatting, as described in PEP 3101. It also uses min() and max() to ensure that 0 <= {r,g,b} <= 255.
Update: added the clamp function as suggested below.
Update From the title of the question and the context given, it should be obvious that this expects 3 ints in [0,255] and will always return a color when passed 3 such ints. However, from the comments, this may not be obvious to everyone, so let it be explicitly stated:
Provided three int values, this will return a valid hex triplet representing a color. If those values are between [0,255], then it will treat those as RGB values and return the color corresponding to those values.

I have created a full Python program for this. The following functions convert RGB to hex and vice versa:
def rgb2hex(r,g,b):
    return "#{:02x}{:02x}{:02x}".format(r,g,b)
def hex2rgb(hexcode):
    # Python 2 only: str.decode('hex') does not exist in Python 3
    return tuple(map(ord,hexcode[1:].decode('hex')))
You can see the full code and tutorial at the following link: RGB to Hex and Hex to RGB conversion using Python
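The hex2rgb above relies on Python 2's str.decode('hex'). A Python 3 sketch of the same round trip could use bytes.fromhex instead. The range check in rgb2hex is an addition for illustration and is not part of the original functions.

```python
def rgb2hex(r, g, b):
    """Format three ints in 0..255 as '#rrggbb'; raise on out-of-range input."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("channel out of range: %r" % (channel,))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

def hex2rgb(hexcode):
    """Parse '#rrggbb' into an (r, g, b) tuple using bytes.fromhex (Python 3)."""
    return tuple(bytes.fromhex(hexcode.lstrip("#")))

print(rgb2hex(0, 128, 64))
print(hex2rgb("#008040"))
```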

This is an old question, but for information, I developed a package with some utilities related to colors and colormaps. It contains the rgb2hex function you were looking for to convert a triplet into a hex value (which can also be found in many other packages, e.g. matplotlib). It's on PyPI:
pip install colormap
and then
>>> from colormap import rgb2hex
>>> rgb2hex(0, 128, 64)
'#008040'
Validity of the inputs is checked (values must be between 0 and 255).

I'm truly surprised no one suggested this approach:
For Python 2 and 3:
'#' + ''.join('{:02X}'.format(i) for i in colortuple)
Python 3.6+:
'#' + ''.join(f'{i:02X}' for i in colortuple)
As a function:
def hextriplet(colortuple):
return '#' + ''.join(f'{i:02X}' for i in colortuple)
color = (0, 128, 64)
print(hextriplet(color))
#008040

triplet = (0, 128, 64)
print '#'+''.join(map(chr, triplet)).encode('hex')
or
from struct import pack
print '#'+pack("BBB",*triplet).encode('hex')
python3 is slightly different
from base64 import b16encode
print(b'#'+b16encode(bytes(triplet)))

You can use lambdas and f-strings (available in Python 3.6+):
rgb2hex = lambda r,g,b: f"#{r:02x}{g:02x}{b:02x}"
hex2rgb = lambda hx: (int(hx[1:3],16),int(hx[3:5],16),int(hx[5:7],16))  # skip the leading '#'
Usage:
rgb2hex(r,g,b) #output = #hexcolor
hex2rgb("#hex") #output = (r,g,b) hexcolor must be in #hex format

In Python 3.6, you can use f-strings to make this cleaner:
rgb = (0,128, 64)
f'#{rgb[0]:02x}{rgb[1]:02x}{rgb[2]:02x}'
Of course you can put that into a function, and as a bonus, values get rounded and converted to int:
def rgb2hex(r,g,b):
    return f'#{int(round(r)):02x}{int(round(g)):02x}{int(round(b)):02x}'
rgb2hex(*rgb)

Here is a more complete function for handling situations in which you may have RGB values in the range [0,1] or the range [0,255].
def RGBtoHex(vals, rgbtype=1):
    """Converts RGB values in a variety of formats to Hex values.
    @param vals An RGB/RGBA tuple
    @param rgbtype Valid values are:
        1 - Inputs are in the range 0 to 1
        256 - Inputs are in the range 0 to 255
    @return A hex string in the form '#RRGGBB' or '#RRGGBBAA'
    """
    if len(vals)!=3 and len(vals)!=4:
        raise Exception("RGB or RGBA inputs to RGBtoHex must have three or four elements!")
    if rgbtype!=1 and rgbtype!=256:
        raise Exception("rgbtype must be 1 or 256!")
    #Convert from 0-1 RGB/RGBA to 0-255 RGB/RGBA
    if rgbtype==1:
        vals = [255*x for x in vals]
    #Ensure values are rounded integers, convert to hex, and concatenate
    return '#' + ''.join(['{:02X}'.format(int(round(x))) for x in vals])
print(RGBtoHex((0.1,0.3, 1)))
print(RGBtoHex((0.8,0.5, 0)))
print(RGBtoHex(( 3, 20,147), rgbtype=256))
print(RGBtoHex(( 3, 20,147,43), rgbtype=256))

Note that this only works with python3.6 and above.
def rgb2hex(color):
    """Converts a list or tuple of color values to a hex color string
    Args:
        color (list|tuple): the list or tuple of integers (e.g. (127, 127, 127))
    Returns:
        str: the hex string (e.g. '#7F7F7F')
    """
    return f"#{''.join(f'{hex(c)[2:].upper():0>2}' for c in color)}"
The above is the equivalent of:
def rgb2hex(color):
    string = '#'
    for value in color:
        hex_string = hex(value) # e.g. 0x7f
        reduced_hex_string = hex_string[2:] # e.g. 7f
        capitalized_hex_string = reduced_hex_string.upper() # e.g. 7F
        padded_hex_string = capitalized_hex_string.rjust(2, '0') # e.g. 7F (5 becomes 05)
        string += padded_hex_string # e.g. #7F7F7F
    return string

You can also use bitwise operators, which is pretty efficient, even though I doubt you'd be worried about efficiency with something like this. It's also relatively clean. Note that it doesn't clamp or check bounds, and that hex() drops leading zeros, so a red value below 16 yields fewer than six digits. This has been supported since at least Python 2.7.17.
hex(r << 16 | g << 8 | b)
And to change it so it starts with a # you can do:
"#" + hex(243 << 16 | 103 << 8 | 67)[2:]

def RGB(red,green,blue): return '#%02x%02x%02x' % (red,green,blue)
background = RGB(0, 128, 64)
I know one-liners in Python aren't necessarily looked upon kindly. But there are times where I can't resist taking advantage of what the Python parser does allow. It's the same answer as Dietrich Epp's solution (the best), but wrapped up in a single line function. So, thank you Dietrich!
I'm using it now with tkinter :-)

There is a package called webcolors. https://github.com/ubernostrum/webcolors
It has a function, webcolors.rgb_to_hex:
>>> import webcolors
>>> webcolors.rgb_to_hex((12,232,23))
'#0ce817'

''.join('%02x' % i for i in rgb)
can be used to convert a sequence of integers (here called rgb) to a hex string.

If typing the formatting string three times seems a bit verbose...
The combination of bit shifts and an f-string will do the job nicely:
# Example setup.
>>> r, g, b = 0, 0, 195
# Create the hex string.
>>> f'#{r << 16 | g << 8 | b:06x}'
'#0000c3'
This also illustrates a method by which 'leading' zero bits are not dropped, if either the red or green channels are zero.
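The same shifts can be run in reverse to unpack a hex string into its channels. This is a minimal sketch, assuming a six-digit '#rrggbb' input; the function name hex_to_rgb is hypothetical.

```python
def hex_to_rgb(hexcode):
    # Parse the six hex digits as one 24-bit integer.
    value = int(hexcode.lstrip('#'), 16)
    # Shift each channel down to the low byte and mask off the rest.
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(hex_to_rgb('#0000c3'))  # prints (0, 0, 195)
```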

My course task required doing this without using for loops and other stuff, here is my bizarre solution lol.
color1 = int(input())
color2 = int(input())
color3 = int(input())
color1 = hex(color1).upper()
color2 = hex(color2).upper()
color3 = hex(color3).upper()
print('#'+ color1[2:].zfill(2)+color2[2:].zfill(2)+color3[2:].zfill(2))
