I'm learning unit testing and TDD in Python and have got the basics mastered with pytest.
I've started a learning project of writing a sudoku solver, and one thing I'm realising is that I'm not yet sure how I want to store the data for the sudoku board. Many talks warn about testing behaviour rather than implementation. Would the board data structure count as implementation?
It's very hard to write a test that is independent of the board data structure.
For example, you could store the value of each cell as an integer (e.g. 1) or a string (e.g. '1'), and the board as a whole could be a dict (using cell labels like 'A1' or 'C5' as keys), a list (flat or 2D), or an 81-character string.
So I might start with
hardboard = [
    [0, 0, 0, 0, 0, 0, 4, 0, 0],
    [0, 0, 1, 0, 7, 0, 0, 9, 0],
    [5, 0, 0, 0, 3, 0, 0, 0, 6],
    [0, 8, 0, 2, 0, 0, 0, 0, 0],
    [7, 0, 0, 0, 0, 0, 9, 2, 0],
    [1, 2, 0, 6, 0, 5, 0, 0, 0],
    [0, 5, 0, 0, 0, 0, 0, 4, 0],
    [0, 7, 3, 9, 0, 8, 0, 0, 0],
    [6, 0, 0, 4, 5, 0, 2, 0, 0]
]
but then decide to use a dict
hardboard = {'A1': 0, 'A2': 0, 'A3': 0, ... 'I9': 0}
So how should I start? The only solution I can think of is to commit to one format, at least in terms of how I represent the board in tests, even if the actual code uses a different data structure later and has to translate. But in my TDD noobiness I'm unsure whether this is a good idea.
I'm not a python developer, but still:
Don't think about the board in terms of implementation details (like, this is a dictionary, array, slice, whatever).
Instead, imagine that the board is an abstraction with some behaviors, which are, in plain words, the things you can do with a board:
place a digit at an index (i, j), where the digit should be within the 1 to 9 inclusive bounds
remove a digit by index (i, j)
So it's like you have a "contract": you create a board and can execute some operations on it. Of course it has some internal implementation, but it's hidden from everyone who uses it. It's your responsibility to provide a working implementation, and this is what you want to test: the implementation is considered working if it "reacts" to all the operations correctly (behavior = operation = function at the level of the actual programming implementation).
In this case your tests might (schematically) look like this:
Board Tests:
Test 1: Check that it's impossible to put at a negative i index:
// setup
board = ...
// when:
board.put(-1, 3, 8) // i-index is wrong
// then:
expect exception to be thrown
Test 2: Check that it's impossible to put at a negative j index:
// setup
board = ...
// when:
board.put(0, -1, 5) // j-index is wrong
// then:
expect exception to be thrown
Test 3: Check that it's impossible to put a negative number into a cell:
// setup
board = ...
// when:
board.put(0, 0, -7)
// then:
expect exception to be thrown
Test 4: Check that it's impossible to put a number into a cell that already contains another number:
// setup
board = ...
// when:
board.put(0, 0, 6)
board.put(0, 0, 5)
// then:
expect exception to be thrown
... and so on and so forth (also cover getting a cell with tests, along with any other additional behavior you create for the board).
Note that all these tests indeed check the behavior of your abstraction rather than its internal implementation.
At the level of implementation you can create a class for the board, internal data fields will contain the state, whereas the behavior will be represented by the methods exposed to the users of the class.
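Since the question mentions pytest, here is a minimal sketch of what the first few of those tests could look like; the Board class, its put(i, j, digit) method, the module name, and the choice of ValueError are all assumptions for illustration, not a prescribed design:

import pytest

# Hypothetical Board class with a put(i, j, digit) method; the module name,
# the method signature, and raising ValueError are assumptions.
from sudoku import Board

def test_put_rejects_negative_i_index():
    board = Board()
    with pytest.raises(ValueError):
        board.put(-1, 3, 8)

def test_put_rejects_invalid_digit():
    board = Board()
    with pytest.raises(ValueError):
        board.put(0, 0, -7)

def test_put_rejects_occupied_cell():
    board = Board()
    board.put(0, 0, 6)
    with pytest.raises(ValueError):
        board.put(0, 0, 5)

If you later swap the board's internal dict for a list (or an 81-character string), none of these tests need to change; that is the payoff of testing the contract rather than the data structure.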
I'm experimenting with a genetic search algorithm: after building the initial population at random and selecting the two fittest entries, I need to 'mate' them (with some random mutation) to create 64 'children'. The crossover part, explained here:
https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3
seems easy to follow, but I can't figure out how to implement it in Python. How can I implement this crossover of two lists of integers?
def crossover(a, b, index):
    # Each child gets one parent's head (before `index`) and the other's tail.
    return b[:index] + a[index:], a[:index] + b[index:]
Should be quite a bit faster than James' solution, since this one lets Python do all the work!
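For example (input values chosen just for illustration):

a = [0] * 10
b = [1] * 10
print(crossover(a, b, 4))
# ([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1, 1, 1])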
Here is a function called crossover that takes two parents and a crossover point. The parents should be lists of integers of the same length. The crossover point is the point before which genes get exchanged, as defined in the article that you linked to.
It returns the two offspring of the parents.
def crossover(a, b, crossover_point):
    a1 = a[:]  # copy so the parents are not modified
    b1 = b[:]
    for i in range(crossover_point):
        a1[i], b1[i] = b1[i], a1[i]  # swap genes before the crossover point
    return [a1, b1]
And here is some code that demonstrates its usage. It creates a population consisting of two lists of length 10, one with only zeros, and the other with only ones. It crosses them over at point 4, and adds the children to the population.
def test_crossover():
    a = [0] * 10
    b = [1] * 10
    population = [a, b]
    population += crossover(a, b, 4)
    return population

print(test_crossover())
The output of the above is:
[
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
]
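The question also mentions random mutation; that isn't part of crossover itself, but a minimal sketch could look like this (the mutation rate and the 0/1 gene range are arbitrary choices for illustration, not from the linked article):

import random

def mutate(genes, rate=0.01, low=0, high=1):
    # With probability `rate`, replace each gene with a random value in [low, high].
    return [random.randint(low, high) if random.random() < rate else g
            for g in genes]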
I need to perform a bitwise '&' between two large binary sequences of the same length, and then find the indexes where the 1's appear.
I used numpy to do it and here is my code:
>>> c = numpy.array([[0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1],[0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1]]) #initialize 2d array
>>> c = c.all(axis=0)
>>> d = numpy.where(c)[0] #returns indices
I checked the timings for it.
>>> print("Time taken to perform 'numpy.all' : ",timeit.timeit(lambda :c.all(axis=0),number=10000))
Time taken to perform 'numpy.all' : 0.01454929300234653
This operation was slower than what I expected.
Then, to compare, I performed a basic bitwise '&' operation:
>>> print("Time taken to perform bitwise & :",timeit.timeit('a = 0b0000000001111111111100000001111111111; b = 0b0000000001111111111100000001111111111; c = a&b',number=10000))
Time taken to perform bitwise & : 0.0004252859980624635
This is much quicker than numpy.
I'm using numpy because it lets me find the indexes of the 1's, but the numpy.all operation is much slower.
My original data will be an array like in the first case. Will there be any repercussions if I convert this list into a binary number and then perform the computation as in the second case?
I don't think you can beat the speed of a&b (the actual computation is just a bunch of elementary CPU ops; I'm pretty sure the result of your timeit is >99% overhead). For example:
>>> from timeit import timeit
>>> import numpy as np
>>> import random
>>>
>>> k = 2**17-2
>>> a = random.randint(0, 2**k-1) + 2**k
>>> b = random.randint(0, 2**k-1) + 2**k
>>> timeit('a & b', globals=globals())
2.520026927930303
That's >100k bits, and each & takes just ~2.5 µs (timeit's default runs the statement a million times).
In any case the cost of & will be dwarfed by the cost of generating the list or array of indices.
numpy comes with significant overhead itself, so for a simple operation like yours one needs to check whether it is worth it.
So let's try a pure python solution first:
>>> c = a & b
>>> timeit("[x for x, y in enumerate(bin(c), -2) if y=='1']", globals=globals(), number=1000)
7.905808186973445
That's ~8 ms, and as anticipated several orders of magnitude more than the & operation itself. (The start value of -2 in enumerate offsets the '0b' prefix of bin(c), so the first bit gets index 0.)
How about numpy?
Let's move the list comprehension to numpy first:
>>> timeit("np.where(np.fromstring(bin(c), np.uint8)[2:] - ord('0'))[0]", globals=globals(), number=1000)
1.0363857130287215
So in this case we get a ~8-fold speedup. This shrinks to ~4-fold if we require the result to be a list:
>>> timeit("np.where(np.fromstring(bin(c), np.uint8)[2:] - ord('0'))[0].tolist()", globals=globals(), number=1000)
1.9008758360287175
We can also let numpy do the binary conversion, which gives another small speedup:
>>> timeit("np.where(np.unpackbits(np.frombuffer(c.to_bytes(k//8+1, 'big'), np.uint8))[1:])[0]", globals=globals(), number=1000)
0.869781385990791
In summary:
numpy is not always faster; better to leave the & to pure Python
locating the nonzero bits seems fast enough in numpy to offset the cost of converting between list and array
Please note that all this comes with the caveat that my pure Python code is not necessarily optimal. For example using a lookup table we can get a bit faster:
>>> lookup = [(np.where(i)[0]-1).tolist() for i in np.ndindex(*8*[2])]
>>> timeit("[(x<<3) + z for x, y in enumerate(c.to_bytes(k//8+1, 'big')) for z in lookup[y]]", globals=globals(), number=1000)
4.687953414046206
>>> c = numpy.random.randint(2, size=(2, 40)) #initialize
>>> c
array([[1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1]])
Accessing this gives you two slow-downs:
You have to access the two rows, whereas your bit-wise test has the constants readily available in registers.
You are performing a series of 40 'and' operations, which may include casting from a full integer to a Boolean.
You severely handicapped the all test; the result is not a surprise (any more).
The factor you observe is a direct consequence of the fact that your c (the 2x37 array from your question) is an array of ints, and an int is coded on 32 bits.
Therefore, when you call c.all(), you are doing an operation on 37*32 = 1184 bits.
However, a = 0b0000000001111111111100000001111111111 is composed of 37 bits, so when you do a&b the operation is on 37 bits.
Therefore you are doing something 32 times more costly with the numpy array.
Let's test that
import timeit
import numpy as np

print("Time taken to perform bitwise & :", timeit.timeit('a = 0b0000000001111111111100000001111111111; b = 0b0000000001111111111100000001111111111; c = a&b', number=320000))

a = 0b0000000001111111111100000001111111111
b = 0b0000000001111111111100000001111111111
c = np.array([a, b])
print("Time taken to perform 'numpy.all' : ", timeit.timeit(lambda: c.all(axis=0), number=10000))
I run the & operation 320,000 times and the all() operation 10,000 times.
Time taken to perform bitwise & : 0.01527938833025152
Time taken to perform 'numpy.all' : 0.01583387375572265
It's the same thing!
Now, back to your initial problem: you want to know the indices where the bits are 1 in a large binary number.
Maybe you could try the bitarray module:
import bitarray

a = bitarray.bitarray('0000000001111111111100000001111111111')
b = bitarray.bitarray('0000000001111111111100000001111111111')
data = []
for i, bit in enumerate(a & b):  # walk the ANDed bits with their indices
    if bit:
        data.append(i)
print(data)
outputs
[9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
I'm writing a program in Linux which reads and distinguishes inputs from two USB devices (two barcode readers) which simulate a keyboard.
I can already read inputs from USB, but this happens before the OS translates the keycode into a character.
For example, when I read 'a' I got 24, 'b' 25, etc.
Edit: actually, when I read 'a' I get 4, 'b' 5, etc., as in the outputs below.
Is there any way to convert that code into a char without manual mapping?
Some output examples:
KEYPRESS = a output = array('B', [0, 0, 4, 0, 0, 0, 0, 0])
KEYPRESS = SHIFT + a output = array('B', [2, 0, 4, 0, 0, 0, 0, 0])
KEYPRESS = 1 output = array('B', [0, 0, 30, 0, 0, 0, 0, 0])
KEYPRESS = ENTER output = array('B', [0, 0, 81, 0, 0, 0, 0, 0])
Thanks!
Use the chr function. Python uses a different character mapping (ASCII) from whatever you're receiving though, so you will have to add 73 to your key values to fix the offset.
>>> chr(24 + 73)
'a'
>>> chr(25 + 73)
'b'
I can already read inputs from USB, but this happens before the OS translates the keycode into a character.
The problem seems to me to be in your interface or the driver program.
In ASCII, 'a' is supposed to have the ordinal value 97, whose binary representation is 0b1100001, whereas what you are receiving is 24, whose binary representation is 0b11000. Similarly, for 'b' you were supposed to receive 0b1100010, but instead received 25, which is 0b11001. Check your hardware to determine whether the 1st and the 3rd bits are being dropped from the input.
What you are receiving is a USB scan code. I do not think there is a third-party Python library to do the conversion for you. I would suggest you refer to a USB scan code table and, from it, create a dictionary mapping USB scan codes to the corresponding ASCII characters.
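A minimal sketch of such a dictionary, using the standard USB HID usage IDs (4 → 'a', 5 → 'b', ..., 30 → '1'); the decode helper and its handling of the 8-byte report layout are my own illustrative assumptions based on the outputs shown in the question:

# Partial USB HID usage-ID table; extend it from the full HID usage tables.
HID_TO_CHAR = {4: 'a', 5: 'b', 6: 'c', 7: 'd', 30: '1', 31: '2', 40: '\n'}
HID_SHIFTED = {4: 'A', 5: 'B', 6: 'C', 7: 'D', 30: '!', 31: '@'}

def decode(report):
    # Decode one 8-byte keyboard report such as array('B', [2, 0, 4, 0, 0, 0, 0, 0]):
    # byte 0 holds the modifier bits, byte 2 the first key's usage ID.
    modifiers, keycode = report[0], report[2]
    shift = bool(modifiers & 0x22)  # 0x02 = left shift, 0x20 = right shift
    table = HID_SHIFTED if shift else HID_TO_CHAR
    return table.get(keycode, '')

With the question's examples, decode([0, 0, 4, 0, 0, 0, 0, 0]) gives 'a' and decode([2, 0, 4, 0, 0, 0, 0, 0]) gives 'A'.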
I'm looking to quickly (hopefully without a for loop) generate a Numpy array of the form:
array([a,a,a,a,0,0,0,0,0,b,b,b,0,0,0, c,c,0,0....])
Where a, b, c and other values are repeated at different points for different ranges. I'm really thinking of something like this:
import numpy as np
a = np.zeros(100)
a[0:3,9:11,15:16] = np.array([a,b,c])
Which obviously doesn't work. Any suggestions?
Edit (jterrace answered the original question):
The data is coming in the form of an N*M Numpy array. Each row is mostly zeros, occasionally interspersed by sequences of non-zero numbers. I want to replace all elements of each such sequence with the last value of the sequence. I'll take any fast method to do this! Using where and diff a few times, we can get the start and stop indices of each run.
raw_data = array([.....][....])
starts = array([0,0,0,1,1,1,1...][3, 9, 32, 7, 22, 45, 57,....])
stops = array([0,0,0,1,1,1,1...][5, 12, 50, 10, 30, 51, 65,....])
last_values = raw_data[stops]
length_to_repeat = stops[1]-starts[1]
Note that starts[0] and stops[0] carry the same information (which row the run occurs on). At this point, since the only route I know of is what jterrace suggests, we'll need to go through some contortions to get similar start/stop positions for the zeros, then interleave the zero start/stops with the value start/stops, and interleave the number 0 with the last_values array. Then we loop over each row, doing something like:
for i in range(N):
    values_in_this_row = where(starts[0] == i)[0]
    output[i] = numpy.repeat(last_values[values_in_this_row],
                             length_to_repeat[values_in_this_row])
Does that make sense, or should I explain some more?
If you have the values and repeat counts fully specified, you can do it this way:
>>> import numpy
>>> values = numpy.array([1,0,2,0,3,0])
>>> counts = numpy.array([4,5,3,3,2,2])
>>> numpy.repeat(values, counts)
array([1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 0, 0])
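If you only have the start/stop indices from the question's edit, the run-filling can also be done per row; here is a minimal sketch for a single 1-D row (my own helper, assuming stops are exclusive end indices), which replaces each nonzero run with its last value:

import numpy as np

def fill_runs_with_last(row):
    # Mark where the row is nonzero, then find run boundaries with diff:
    # +1 where a run starts, -1 one past where it ends.
    mask = (row != 0).astype(np.int8)
    d = np.diff(np.concatenate(([0], mask, [0])))
    starts = np.where(d == 1)[0]
    stops = np.where(d == -1)[0]  # exclusive
    out = row.copy()
    for s, e in zip(starts, stops):
        out[s:e] = row[e - 1]  # overwrite the run with its last value
    return out

For example, fill_runs_with_last(np.array([0, 1, 2, 0, 3, 4, 0])) gives array([0, 2, 2, 0, 4, 4, 0]).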
You can also use numpy.r_ (here with a, b, c = 1, 2, 3):
>>> np.r_[[a]*4,[b]*3,[c]*2]
array([1, 1, 1, 1, 2, 2, 2, 3, 3])