Python bit list to byte list

I have a long 1-dimensional list of integer 1's and 0's, representing 8-bit binary bytes. What is a neat way to create a new list from that containing the integer byte values?
Being familiar with C, but new to Python, I've coded it in the way I'd do it with C: an elaborate structure that loops though each bit. However, I'm aware that the whole point of Python over C is that such things can usually be done compactly and elegantly, and that I should learn how to do that. Maybe using list comprehension?
This works, but suggestions for a more "Pythonic" way would be appreciated:
#!/usr/bin/env python2
bits = [1,0,0,1,0,1,0,1,0,1,1,0,1,0,1,1,1,1,1,0,0,1,1,1]
bytes = []
byt = ""
for bit in bits:
    byt += str(bit)
    if len(byt) == 8:
        bytes += [int(byt, 2)]
        byt = ""
print bytes
$ bits-to-bytes.py
[149, 107, 231]

You can slice the list into chunks of 8 elements and map the subelements to str:
[int("".join(map(str, bits[i:i+8])), 2) for i in range(0, len(bits), 8)]
You could split it into two parts, mapping and joining once:
mapped = "".join(map(str, bits))
[int(mapped[i:i+8], 2) for i in range(0, len(mapped), 8)]
Or using iter and borrowing from the grouper recipe in itertools:
it = iter(map(str, bits))
[int("".join(sli), 2) for sli in zip(*iter([it] * 8))]
iter(map(str, bits)) maps the content of bits to str and creates an iterator; zip(*iter([it] * 8)) groups the elements into groups of 8 subelements.
Each zip consumes eight subelements from our iterator, so we always get sequential groups. It is the same logic as the slicing in the first code; we just avoid the need to slice.
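To see the grouping trick in isolation, here is a quick interactive demo (separate from the answer's code):
>>> it = iter("abcdefgh")
>>> list(zip(*[it] * 4))
[('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h')]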
As Sven commented, for lists whose length is not divisible by 8 you will lose data using zip, just as in your original code; you can adapt the grouper recipe linked above to handle those cases:
from itertools import zip_longest  # izip_longest in python2
bits = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]
it = iter(map(str, bits))
print([int("".join(sli), 2) for sli in zip_longest(*iter([it] * 8), fillvalue="")])
[149, 107, 231, 2] # using just zip would be [149, 107, 231]
The fillvalue="" pads the odd-length group with empty strings, so we can still call int("".join(sli), 2) and get correct output: above, we are left with 1,0 after taking 3 * 8 chunks, which becomes the trailing 2.
In your own code bytes += [int(byt, 2)] could simply become bytes.append(int(byt, 2))

Padraic's solution is good; here's another way to do it:
from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # Taken from itertools recipes
    # https://docs.python.org/2/library/itertools.html#recipes
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

bits = [1, 0, 0, 1, 0, 1, 0, 1,
        0, 1, 1, 0, 1, 0, 1, 1,
        1, 1, 1, 0, 0, 1, 1, 1]
byte_strings = (''.join(bit_group) for bit_group in grouper(map(str, bits), 8))
bytes = [int(byte_string, 2) for byte_string in byte_strings]
print bytes # [149, 107, 231]

Since you start from a numeric list, you might want to avoid string manipulation. Here are a couple of methods:
dividing the original list into 8-bit chunks and computing the decimal value of each byte (assuming the number of bits is a multiple of 8); thanks to Padraic Cunningham for the nice way of dividing a sequence into groups of 8 subelements;
bits = [1,0,0,1,0,1,0,1,0,1,1,0,1,0,1,1,1,1,1,0,0,1,1,1]
[sum(b*2**x for b,x in zip(byte[::-1],range(8))) for byte in zip(*([iter(bits)]*8))]
using bitwise operators (probably more efficient); if the number of bits is not a multiple of 8, the code works as if the bit sequence were padded with 0s on the left (padding on the left often makes more sense than padding on the right, because it preserves the numerical value of the original binary digit sequence)
bits = [1,0,0,1,0,1,0,1,0,1,1,0,1,0,1,1,1,1,1,0,0,1,1,1]
n = sum(b*2**x for b,x in zip(bits[::-1],range(len(bits)))) # value of the binary number represented by 'bits'
# n = int(''.join(map(str,bits)),2) # another way of finding n by means of string manipulation
[(n>>(8*p))&255 for p in range(len(bits)//8-(len(bits)%8==0),-1,-1)]
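As a further variation on the bitwise idea, here is a minimal sketch that builds each byte with shifts instead of constructing one big integer first (it assumes len(bits) is a multiple of 8):
bits = [1,0,0,1,0,1,0,1,0,1,1,0,1,0,1,1,1,1,1,0,0,1,1,1]
out = []
byte = 0
for i, bit in enumerate(bits):
    byte = (byte << 1) | bit  # shift the accumulated value left and add the new bit
    if i % 8 == 7:            # a full byte has been collected
        out.append(byte)
        byte = 0
print(out)  # [149, 107, 231]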


multiple dimensional permutations [duplicate]

This question already has answers here: How do I generate all permutations of a list?
Given a list of non-zero integers like [2, 3, 4, 2], generate a list of all the permutations possible where each element above reflects its maximum variance (I am sure there is a better way to express this, but I don't have the math background). Each element in the above array can be considered a dimension: the 2 above would allow for values 0 and 1, the 3 would allow for values 0, 1 and 2, etc.
The result would be a list of zero-based tuples:
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 0, 2, 0)...
and so on till (1, 2, 3, 1)]
The length of the array could vary, from 1 element to x.
You can use itertools.product; try this:
from itertools import product
limits = [2, 3, 4, 2]
result = list(product(*[range(x) for x in limits]))
print(result)
What you're basically doing is representing integers in a mixed radix: in your example, some digits are base 2, some base 3, and some base 4. So you can take an algorithm that converts base 10 to any other base and make the base depend on the current digit. Here's what I threw together; I'm not sure if it's completely clear how it works.
n = [2, 3, 4, 2]
max_val = 1
for i in n:
    max_val *= i
ans = []  # will hold the generated lists
for i in range(max_val):
    current_value = i
    current_perm = []
    for j in n[::-1]:  # for you, the 'least significant bit' is on the right
        current_perm.append(current_value % j)
        current_value //= j  # integer division in python 3
    ans.append(current_perm[::-1])  # flip it back around!
print(ans)
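As a quick sanity check, this produces the same sequence as the itertools.product answer above (assuming the n and ans from the snippet just shown):
from itertools import product
assert [tuple(p) for p in ans] == list(product(*[range(x) for x in n]))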
So you basically just want to count, but you have a different limit for each position?
limits = [2, 3, 4, 2]
counter = [0] * len(limits)

def check_limits():
    # propagate carries from right to left
    for i in range(len(limits) - 1, 0, -1):
        if counter[i] >= limits[i]:
            counter[i] = 0
            counter[i - 1] += 1
    return not counter[0] >= limits[0]

while True:
    print(counter)  # print before incrementing so [0, 0, 0, 0] is included
    counter[-1] += 1
    if not check_limits():
        break
Not a list of tuples, but you get the idea...

How to decode any string fields in a list that are a byte b'string'

Migrating code to Python 3.6: unpacking and assigning to a list worked in Python 2.6, as the whole list was a string; in 3.6, string values come back as bytes. Any value that was an integer is represented correctly in the list, but any string fields are still represented as bytes, e.g. b'B'.
Source data is a binary file containing various messages, with various lengths, these messages are successfully being unpacked and stored in a list
Raw byte values data of a sample message
b'\x07\x88g\xe0b\xe5]\xc5\x00\x01j\xdd\x00\x01\xff\xdcB\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x03\xe8\x00\x00\x02'
Unpacked data - using '>I Q I c I Q i H B' on the raw byte values above
[126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]
End state: implement a generic solution that will detect any bytes value in a list (it can be at any index, depending on the message) and convert it to a normal string value,
or do not store string values as bytes during the unpack.
Current : [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]
End state: [126380000, 7126205086073711325, 131036, B, 1, 10, 1000, 0, 2]
Noting b'B' is to be simply represented as B
I have searched Google and Stack Overflow for an answer, but only find generic decode examples.
Thanks in advance
AFAIK, there is no format character for struct.unpack that outputs a str; string-like fields always come back as bytes.
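A quick illustration of that behaviour, as a minimal one-field example:
import struct
print(struct.unpack('>c', b'B'))  # (b'B',) -- bytes, not str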
You can use map to decode each bytes-type list item to a string.
org = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]
res = list(map(lambda i: i.decode("utf-8") if isinstance(i, bytes) else i, org))
EDIT
As suggested, it can be simpler to use a list comprehension instead of map.
res = [i.decode("utf-8") if isinstance(i, bytes) else i for i in org]
I recommend going through the discussions in List comprehension vs map to see when to use one over the other (ex. performance with long/large lists, readability, with/without lambdas, etc.).
map
mysetup = "fields = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]"
mycode = 'fields = list(map(lambda i: i.decode("utf-8") if isinstance(i, bytes) else i, fields))'
print(timeit.timeit(setup=mysetup, stmt=mycode, number=100000))
Time: 0.24705234917252444
List comprehension
mysetup = "fields = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]"
mycode = 'fields = [i.decode("utf-8") if isinstance(i, bytes) else i for i in fields]'
print(timeit.timeit(setup=mysetup, stmt=mycode, number=100000))
Time: 0.1520654000212543
List comprehension is faster.

Comparison between numpy.all and bitwise '&' for binary sequence

I need to perform a bitwise '&' between two large binary sequences of the same length and find the indexes where the 1's appear.
I used numpy to do it and here is my code:
>>> c = numpy.array([[0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1],[0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1]]) #initialize 2d array
>>> c = c.all(axis=0)
>>> d = numpy.where(c)[0] #returns indices
I checked the timings for it.
>>> print("Time taken to perform 'numpy.all' : ",timeit.timeit(lambda :c.all(axis=0),number=10000))
>>> Time taken to perform 'numpy.all' : 0.01454929300234653
This operation was slower than what I expected.
Then, to compare, I performed a basic bitwise '&' operation:
>>> print("Time taken to perform bitwise & :",timeit.timeit('a = 0b0000000001111111111100000001111111111; b = 0b0000000001111111111100000001111111111; c = a&b',number=10000))
>>> Time taken to perform bitwise & : 0.0004252859980624635
This is much quicker than numpy.
I'm using numpy because it lets me find the indexes where the 1's appear, but the numpy.all operation is much slower.
My original data will be an array/list just like in the first case. Will there be any repercussion if I convert this list into a binary number and then perform the computation as in the second case?
I don't think you can beat the speed of a&b (the actual computation is just a bunch of elementary cpu ops; I'm pretty sure the result of your timeit is >99% overhead). For example:
>>> from timeit import timeit
>>> import numpy as np
>>> import random
>>>
>>> k = 2**17-2
>>> a = random.randint(0, 2**k-1) + 2**k
>>> b = random.randint(0, 2**k-1) + 2**k
>>> timeit('a & b', globals=globals())
2.520026927930303
That's >100k bits and takes just ~2.5 µs per call (timeit's default number is 1,000,000, so 2.52 s in total).
In any case the cost of & will be dwarfed by the cost of generating the list or array of indices.
numpy comes with significant overhead itself, so for a simple operation like yours one needs to check whether it is worth it.
So let's try a pure python solution first:
>>> c = a & b
>>> timeit("[x for x, y in enumerate(bin(c), -2) if y=='1']", globals=globals(), number=1000)
7.905808186973445
That's ~8 ms and as anticipated several orders of magnitude more than the & operation.
How about numpy?
Let's translate the list comprehension first:
>>> timeit("np.where(np.fromstring(bin(c), np.uint8)[2:] - ord('0'))[0]", globals=globals(), number=1000)
1.0363857130287215
So in this case we get a ~8-fold speedup. This shrinks to ~4-fold if we require the result to be a list:
>>> timeit("np.where(np.fromstring(bin(c), np.uint8)[2:] - ord('0'))[0].tolist()", globals=globals(), number=1000)
1.9008758360287175
We can also let numpy do the binary conversion, which gives another small speedup:
>>> timeit("np.where(np.unpackbits(np.frombuffer(c.to_bytes(k//8+1, 'big'), np.uint8))[1:])[0]", globals=globals(), number=1000)
0.869781385990791
In summary:
numpy is not always faster, better leave the & to pure Python
locating nonzero bits seems fast enough in numpy to offset the cost of conversion between list and array
Please note that all this comes with the caveat that my pure Python code is not necessarily optimal. For example using a lookup table we can get a bit faster:
>>> lookup = [(np.where(i)[0]-1).tolist() for i in np.ndindex(*8*[2])]
>>> timeit("[(x<<3) + z for x, y in enumerate(c.to_bytes(k//8+1, 'big')) for z in lookup[y]]", globals=globals(), number=1000)
4.687953414046206
>>> c = numpy.random.randint(2, size=(2, 40)) #initialize
>>> c
array([[1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1]])
Accessing this gives you two slow-downs:
You have to access the two rows, whereas your bit-wise test has the constants readily available in registers.
You are performing a series of 40 and operations, which may include casting from a full integer to a Boolean.
You severely handicapped the all test; the result is not a surprise (any more).
The factor you observe is a direct consequence of the fact that c = numpy.array([[0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1],[0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1]]) is an array of ints, and an int is coded on 32 bits.
Therefore, when you call c.all() you are doing an operation on 37*32 = 1184 bits.
However, a = 0b0000000001111111111100000001111111111 is composed of 37 bits, so when you do a&b the operation is on 37 bits.
Therefore you are doing something 32 times more costly with the numpy array.
Let's test that
import timeit
import numpy as np

print("Time taken to perform bitwise & :", timeit.timeit('a = 0b0000000001111111111100000001111111111; b = 0b0000000001111111111100000001111111111; c = a & b', number=320000))

a = 0b0000000001111111111100000001111111111
b = 0b0000000001111111111100000001111111111
c = np.array([a, b])
print("Time taken to perform 'numpy.all' : ", timeit.timeit(lambda: c.all(axis=0), number=10000))
The & operation I run 320000 times and the all() operation 10000 times.
Time taken to perform bitwise & : 0.01527938833025152
Time taken to perform 'numpy.all' : 0.01583387375572265
It's the same thing!
Now back to your initial problem you want to know the indices where bits are 1 in a large binary number.
Maybe you could try the bitarray module:
import bitarray

a = bitarray.bitarray('0000000001111111111100000001111111111')
b = bitarray.bitarray('0000000001111111111100000001111111111')
data = []
for i, c in enumerate(a & b):
    if c:
        data.append(i)
print(data)
outputs
[9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
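For comparison, if you keep the original 2-D numpy array, the indices can also be pulled out directly; a small sketch using flatnonzero (equivalent to where(...)[0]):
import numpy as np

c = np.array([[0, 0, 1, 1, 0, 1],
              [0, 1, 1, 1, 0, 1]])
print(np.flatnonzero(c.all(axis=0)))  # [2 3 5]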

Number of permutations of a set with an element limited to some positions

Say I have a combination of digits [1, 3, 5, 0, 9]. How can I calculate the number of permutations of the combination where 0 is not at the first position? Also, there may be more than one 0 in the combination.
The literal translation of your problem into python code would be:
>>> from itertools import permutations
>>> len([x for x in permutations((1, 3, 5, 0, 9)) if x[0]!=0])
96
But note that this actually calculates all the permutations, which would take a long time when the sequence gets long enough.
If all you are interested in is the number of possible permutations fitting your restrictions, you'd be better off calculating that number via combinatorial considerations, as fredtantini mentioned.
Let's say that you are talking about a list (sets are not ordered and cannot contain an item more than once).
Calculating the number of permutations is a mathematical problem that can be dealt with without Python: the number of permutations of a set of length 5 is 5!. As you don't want the permutations that start with 0, the total number is 5!-4!=96.
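That generalises to the multiple-zeros case mentioned in the question: with k zeros among n items (positions treated as distinguishable), (n-k) out of every n orderings put a non-zero first, giving (n-k)*(n-1)!. A small sketch, where count_no_leading_zero is a hypothetical helper:
from math import factorial

def count_no_leading_zero(items):
    # permutations (items treated as distinguishable) whose first element is not 0
    n = len(items)
    k = items.count(0)
    return (n - k) * factorial(n - 1)

print(count_no_leading_zero([1, 3, 5, 0, 9]))  # 96, i.e. 5! - 4!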
Python has the module itertools with the permutation function. You can use list comprehension to filter the results and calculate the length:
>>>[l for l in permutations(list({1, 3, 5, 0, 9})) if l[0]!=0]
[(9, 0, 3, 5, 1), (9, 0, 3, 1, 5), ..., (1, 5, 3, 9, 0)]
>>>len([l for l in permutations(list({1, 3, 5, 0, 9})) if l[0]!=0])
96
If I understand your problem correctly, the following logic should work:
a = [1, 3, 5, 0, 9]
import itertools
perm = list(itertools.permutations(a))
perm_new = []
for i in range(len(perm)):
    if perm[i][0] != 0:
        perm_new.append(perm[i])

Assigning multiple array indices at once in Python/Numpy

I'm looking to quickly (hopefully without a for loop) generate a Numpy array of the form:
array([a,a,a,a,0,0,0,0,0,b,b,b,0,0,0, c,c,0,0....])
Where a, b, c and other values are repeated at different points for different ranges. I'm really thinking of something like this:
import numpy as np
a = np.zeros(100)
a[0:3,9:11,15:16] = np.array([a,b,c])
Which obviously doesn't work. Any suggestions?
Edit (jterrace answered the original question):
The data is coming in the form of an N*M Numpy array. Each row is mostly zeros, occasionally interspersed by sequences of non-zero numbers. I want to replace all elements of each such sequence with the last value of the sequence. I'll take any fast method to do this! Using where and diff a few times, we can get the start and stop indices of each run.
raw_data = array([.....][....])
starts = array([0,0,0,1,1,1,1...][3, 9, 32, 7, 22, 45, 57,....])
stops = array([0,0,0,1,1,1,1...][5, 12, 50, 10, 30, 51, 65,....])
last_values = raw_data[stops]
length_to_repeat = stops[1]-starts[1]
Note that starts[0] and stops[0] carry the same information (which row the run occurs on). At this point, since the only route I know of is what jterrace suggests, we'll need to go through some contortions to get similar start/stop positions for the zeros, then interleave the zero start/stops with the value start/stops, and interleave the number 0 with the last_values array. Then we loop over each row, doing something like:
for i in range(N):
    values_in_this_row = where(starts[0]==i)[0]
    output[i] = numpy.repeat(last_values[values_in_this_row], length_to_repeat[values_in_this_row])
Does that make sense, or should I explain some more?
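For what it's worth, here is one possible sketch of that run-filling step on a single row (fill_runs is a hypothetical helper, and it still loops over runs rather than elements):
import numpy as np

def fill_runs(row):
    # replace every run of non-zeros with the last value of that run
    mask = (row != 0).astype(np.int8)
    edges = np.diff(np.concatenate(([0], mask, [0])))
    starts = np.flatnonzero(edges == 1)       # first index of each run
    stops = np.flatnonzero(edges == -1) - 1   # last index of each run (inclusive)
    out = row.copy()
    for s, e in zip(starts, stops):
        out[s:e + 1] = row[e]
    return out

row = np.array([0, 5, 7, 9, 0, 0, 2, 4, 0])
print(fill_runs(row))  # [0 9 9 9 0 0 4 4 0]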
If you have the values and repeat counts fully specified, you can do it this way:
>>> import numpy
>>> values = numpy.array([1,0,2,0,3,0])
>>> counts = numpy.array([4,5,3,3,2,2])
>>> numpy.repeat(values, counts)
array([1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 0, 0])
You can use numpy.r_ (with a, b, c defined, e.g. a, b, c = 1, 2, 3):
>>> np.r_[[a]*4, [b]*3, [c]*2]
array([1, 1, 1, 1, 2, 2, 2, 3, 3])
