Generating list of lists with custom value limitations with Hypothesis - python

The Story:
Currently, I have a function-under-test that expects a list of lists of integers with the following rules:
number of sublists (let's call it N) can be from 1 to 50
number of values inside sublists is the same for all sublists (rectangular form) and should be >= 0 and <= 5
values inside sublists cannot be more than or equal to the total number of sublists. In other words, each value inside a sublist is an integer >= 0 and < N
Sample valid inputs:
[[0]]
[[2, 1], [2, 0], [3, 1], [1, 0]]
[[1], [0]]
Sample invalid inputs:
[[2]] # 2 is more than N=1 (total number of sublists)
[[0, 1], [2, 0]] # 2 is equal to N=2 (total number of sublists)
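To make the three rules concrete, here is a minimal sketch of them as a plain predicate. The name `is_valid` is hypothetical (it does not appear in the question); it only restates the rules and is not part of any Hypothesis API.

```python
def is_valid(list_of_lists):
    """Return True if the input satisfies the three rules stated above."""
    n = len(list_of_lists)
    # Rule 1: between 1 and 50 sublists.
    if not 1 <= n <= 50:
        return False
    # Rule 2: all sublists share one length, and that length is 0..5.
    lengths = {len(sub) for sub in list_of_lists}
    if len(lengths) != 1 or not 0 <= next(iter(lengths)) <= 5:
        return False
    # Rule 3: every value is an integer index in [0, n).
    return all(0 <= v < n for sub in list_of_lists for v in sub)


print(is_valid([[1], [0]]))         # True
print(is_valid([[0, 1], [2, 0]]))   # False
```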
I'm trying to approach it with property-based testing and generate different valid inputs with the hypothesis library. I've been trying to wrap my head around lists() and integers(), but cannot make it work:
condition #1 is easy to approach with lists() and its min_size and max_size arguments
condition #2 is covered under "Chaining strategies together" in the Hypothesis docs
condition #3 is what I'm struggling with, because if we use the rectangle_lists from the above example, we don't have a reference to the length of the "parent" list inside integers()
The Question:
How can I limit the integer values inside sublists to be less than the total number of sublists?
Some of my attempts:
from hypothesis import given
from hypothesis.strategies import lists, integers
@given(lists(lists(integers(min_value=0, max_value=5), min_size=1, max_size=5), min_size=1, max_size=50))
def test(l):
    ...  # test body omitted
This one was very far from meeting the requirements - list is not strictly of a rectangular form and generated integer values can go over the generated size of the list.
from hypothesis import given
from hypothesis.strategies import lists, integers
@given(integers(min_value=0, max_value=5).flatmap(lambda n: lists(lists(integers(min_value=1, max_value=5), min_size=n, max_size=n), min_size=1, max_size=50)))
def test(l):
    ...  # test body omitted
Here, requirements #1 and #2 are met, but the integer values can be larger than the size of the list, so requirement #3 is not met.

There's a good general technique that is often useful when trying to solve tricky constraints like this: try to build something that looks a bit like what you want but doesn't satisfy all the constraints and then compose it with a function that modifies it (e.g. by throwing away the bad bits or patching up bits that don't quite work) to make it satisfy the constraints.
For your case, you could do something like the following:
from hypothesis.strategies import builds, lists, integers
def prune_list(ls):
    n = len(ls)
    return [
        [i for i in sublist if i < n][:5]
        for sublist in ls
    ]
# Note: newer Hypothesis releases no longer accept average_size;
# drop that argument if your version rejects it.
limited_list_strategy = builds(
    prune_list,
    lists(lists(integers(0, 49), average_size=5), max_size=50, min_size=1)
)
In this we:
Generate a list that looks roughly right (it's a list of list of integers and the integers are in the same range as all possible indices that could be valid).
Prune out any invalid indices from the sublists
Truncate any sublists that still have more than 5 elements in them
The result should satisfy all three conditions you needed.
The average_size parameter isn't strictly necessary but in experimenting with this I found it was a bit too prone to producing empty sublists otherwise.
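As a quick illustration of the pruning step on its own, here is a small sketch that runs `prune_list` on a hand-written input, with no Hypothesis involved. The input `raw` is made up for the example.

```python
def prune_list(ls):
    n = len(ls)
    # Keep only values that are valid indices (< n), then cap at 5 elements.
    return [[i for i in sublist if i < n][:5] for sublist in ls]


raw = [[0, 7, 1], [3, 1, 0, 0, 1, 1, 0]]  # two sublists, so n == 2
pruned = prune_list(raw)
print(pruned)  # [[0, 1], [1, 0, 0, 1, 1]]
```

Every surviving value is below the number of sublists and no sublist has more than 5 elements, but the sublists can still differ in length.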
ETA: Apologies. I've just realised that I misread one of your conditions - this doesn't actually do quite what you want because it doesn't ensure each list is the same length. Here's a way to modify this to fix that (it gets a bit more complicated, so I've switched to using composite instead of builds):
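from itertools import cycle
from hypothesis.strategies import composite, lists, integers, permutations
@composite
def limited_lists(draw):
    # Note: newer Hypothesis releases no longer accept average_size;
    # drop that argument if your version rejects it.
    ls = draw(
        lists(lists(integers(0, 49), average_size=5), max_size=50, min_size=1)
    )
    filler = draw(permutations(range(50)))
    sublist_length = draw(integers(0, 5))
    n = len(ls)
    pruned = [
        [i for i in sublist if i < n][:sublist_length]
        for sublist in ls
    ]
    # Only filler values below n are valid. Cycle through them so that a
    # sublist can always be filled up to sublist_length, even when n is small.
    valid_filler = [i for i in filler if i < n]
    for sublist in pruned:
        for i in cycle(valid_filler):
            if len(sublist) == sublist_length:
                break
            sublist.append(i)
    return pruned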
The idea is that we generate a "filler" list that provides the defaults for what a sublist looks like (so they will tend to shrink in the direction of being more similar to each other) and then draw the length to prune the sublists to, which gives that consistency.
This has got pretty complicated I admit. You might want to use RecursivelyIronic's flatmap based version. The main reason I prefer this over that is that it will tend to shrink better, so you'll get nicer examples out of it.
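The prune-and-fill step can be followed by hand in a deterministic sketch that takes the three drawn values as plain arguments. The function name `prune_and_fill` and the sample inputs are hypothetical, and cycling through the valid filler values is an assumption made here so that sublists reach the requested length even when there are few valid values.

```python
from itertools import cycle


def prune_and_fill(ls, filler, sublist_length):
    n = len(ls)
    # Drop invalid indices, then truncate to the target length.
    pruned = [[i for i in sub if i < n][:sublist_length] for sub in ls]
    # Pad short sublists from the filler values that are valid indices.
    valid_filler = [i for i in filler if i < n]
    for sub in pruned:
        for i in cycle(valid_filler):
            if len(sub) == sublist_length:
                break
            sub.append(i)
    return pruned


raw = [[0, 7, 1], [5], [2, 2, 2, 2]]   # three sublists, so n == 3
filler = [4, 2, 0, 1, 3]               # stands in for a drawn permutation
result = prune_and_fill(raw, filler, 3)
print(result)  # [[0, 1, 2], [2, 0, 1], [2, 2, 2]]
```

Sublists that needed padding all pick up values in the same filler order, which is why they tend to look alike.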

You can also do this with flatmap, though it's a bit of a contortion.
from hypothesis import strategies as st
from hypothesis import given, settings
number_of_lists = st.integers(min_value=1, max_value=50)
list_lengths = st.integers(min_value=0, max_value=5)
def build_strategy(number_and_length):
    number, length = number_and_length
    list_elements = st.integers(min_value=0, max_value=number - 1)
    return st.lists(
        st.lists(list_elements, min_size=length, max_size=length),
        min_size=number, max_size=number)
mystrategy = st.tuples(number_of_lists, list_lengths).flatmap(build_strategy)
@settings(max_examples=5000)
@given(mystrategy)
def test_constraints(list_of_lists):
    N = len(list_of_lists)
    # Condition 1
    assert 1 <= N <= 50
    # Condition 2
    [length] = set(map(len, list_of_lists))
    assert 0 <= length <= 5
    # Condition 3
    assert all((0 <= element < N) for lst in list_of_lists for element in lst)
As David mentioned, this does tend to produce a lot of empty lists, so some average size tuning would be required.
>>> mystrategy.example()
[[24, 6, 4, 19], [26, 9, 15, 15], [1, 2, 25, 4], [12, 8, 18, 19], [12, 15, 2, 31], [3, 8, 17, 2], [5, 1, 1, 5], [7, 1, 16, 8], [9, 9, 6, 4], [22, 24, 28, 16], [18, 11, 20, 21], [16, 23, 30, 5], [13, 1, 16, 16], [24, 23, 16, 32], [13, 30, 10, 1], [7, 5, 14, 31], [31, 15, 23, 18], [3, 0, 13, 9], [32, 26, 22, 23], [4, 11, 20, 10], [6, 15, 32, 22], [32, 19, 1, 31], [20, 28, 4, 21], [18, 29, 0, 8], [6, 9, 24, 3], [20, 17, 31, 8], [6, 12, 8, 22], [32, 22, 9, 4], [16, 27, 29, 9], [21, 15, 30, 5], [19, 10, 20, 21], [31, 13, 0, 21], [16, 9, 8, 29]]
>>> mystrategy.example()
[[28, 18], [17, 25], [26, 27], [20, 6], [15, 10], [1, 21], [23, 15], [7, 5], [9, 3], [8, 3], [3, 4], [19, 29], [18, 11], [6, 6], [8, 19], [14, 7], [25, 3], [26, 11], [24, 20], [22, 2], [19, 12], [19, 27], [13, 20], [16, 5], [6, 2], [4, 18], [10, 2], [26, 16], [24, 24], [11, 26]]
>>> mystrategy.example()
[[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
>>> mystrategy.example()
[[], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
>>> mystrategy.example()
[[6, 8, 22, 21, 22], [3, 0, 24, 5, 18], [16, 17, 25, 16, 11], [2, 12, 0, 3, 15], [0, 12, 12, 12, 14], [11, 20, 6, 6, 23], [5, 19, 2, 0, 12], [16, 0, 1, 24, 10], [2, 13, 21, 19, 15], [2, 14, 27, 6, 7], [22, 25, 18, 24, 9], [26, 21, 15, 18, 17], [7, 11, 22, 17, 21], [3, 11, 3, 20, 16], [22, 13, 18, 21, 11], [4, 27, 21, 20, 25], [4, 1, 13, 5, 13], [16, 19, 6, 6, 25], [19, 10, 14, 12, 14], [18, 13, 13, 16, 3], [12, 7, 26, 26, 12], [25, 21, 12, 23, 22], [11, 4, 24, 5, 27], [25, 10, 10, 26, 27], [8, 25, 20, 6, 23], [8, 0, 12, 26, 14], [7, 11, 6, 27, 26], [6, 24, 22, 23, 19]]

Pretty late, but for posterity: the easiest solution is to pick dimensions, then build up from the element strategy.
from hypothesis.strategies import composite, integers, lists
@composite
def complicated_rectangles(draw, max_N):
    list_len = draw(integers(1, max_N))
    sublist_len = draw(integers(0, 5))
    # Elements must be valid indices into the outer list: 0 <= value < list_len
    element_strat = integers(0, list_len - 1)
    sublist_strat = lists(
        element_strat, min_size=sublist_len, max_size=sublist_len)
    return draw(lists(
        sublist_strat, min_size=list_len, max_size=list_len))
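The "pick dimensions first, then build up from the elements" order can also be seen without Hypothesis. The sketch below swaps in the standard `random` module purely for illustration; it has none of Hypothesis's shrinking or replay behaviour, and `random_rectangle` is a hypothetical name.

```python
import random


def random_rectangle(max_n, rng):
    # 1. Pick the dimensions.
    list_len = rng.randint(1, max_n)      # number of sublists, N
    sublist_len = rng.randint(0, 5)       # shared sublist length
    # 2. Build elements that are valid for those dimensions: 0 <= v < N.
    return [
        [rng.randrange(list_len) for _ in range(sublist_len)]
        for _ in range(list_len)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [random_rectangle(50, rng) for _ in range(200)]
print(all(v < len(s) for s in samples for sub in s for v in sub))  # True
```

Because the element range is derived from the already-chosen outer length, no filtering or patching is needed afterwards.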

Related

How to dot product 1D and 2D lists in python without using NumPy or .dot?

With given 2D and 1D lists, I have to dot product them. But I have to calculate them without using .dot.
For example, I want to make these lists
matrix_A = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]]
vector_x = [0, 1, 2, 3]
to this output
result_list = [ 14 38 62 86 110 134 158 182]
How can I do this using only lists (no NumPy arrays and no .dot) in Python?
You could use a list comprehension with nested for loops.
matrix_A = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]]
vector_x = [0, 1, 2, 3]
result_list = [sum(a*b for a,b in zip(row, vector_x)) for row in matrix_A]
print(result_list)
Output:
[14, 38, 62, 86, 110, 134, 158, 182]
Edit: Removed the square brackets in the list comprehension following @fshabashev's comment.
If you do not mind using numpy, this is a solution
import numpy as np
matrix_A = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]]
vector_x = [0, 1, 2, 3]
res = np.sum(np.array(matrix_A) * np.array(vector_x), axis=1)
print(res)
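For readers who find the one-line comprehension dense, the same row-by-row sum of products can be spelled out with explicit loops. This is only a sketch; the function name `mat_vec` and the length check are additions for illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a 2D list by a 1D list, one row at a time."""
    result = []
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("row length must match vector length")
        total = 0
        # Pair each row entry with the matching vector entry and accumulate.
        for a, b in zip(row, vector):
            total += a * b
        result.append(total)
    return result


print(mat_vec([[0, 1, 2, 3], [4, 5, 6, 7]], [0, 1, 2, 3]))  # [14, 38]
```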

NumPy Array Fill Rows Downward By Indexed Sections

Let's say I have the following (fictitious) NumPy array:
arr = np.array(
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24],
[25, 26, 27, 28],
[29, 30, 31, 32],
[33, 34, 35, 36],
[37, 38, 39, 40]
]
)
And for row indices idx = [0, 2, 3, 5, 8, 9] I'd like to repeat the values in each row downward until it reaches the next row index:
np.array(
[[1, 2, 3, 4],
[1, 2, 3, 4],
[9, 10, 11, 12],
[13, 14, 15, 16],
[13, 14, 15, 16],
[21, 22, 23, 24],
[21, 22, 23, 24],
[21, 22, 23, 24],
[33, 34, 35, 36],
[37, 38, 39, 40]
]
)
Note that idx will always be sorted and have no repeat values. While I can accomplish this by doing something like:
for start, stop in zip(idx[:-1], idx[1:]):
    for i in range(start, stop):
        arr[i] = arr[start]
# Handle last index in `idx`
start, stop = idx[-1], arr.shape[0]
for i in range(start, stop):
    arr[i] = arr[start]
Unfortunately, I have many, many arrays like this and this can become slow as the size of the array gets larger (in both the number of rows as well as the number of columns) and the length of idx also increases. The final goal is to plot these as a heatmaps in matplotlib, which I already know how to do. Another approach that I tried was using np.tile:
for start, stop in zip(idx[:-1], idx[1:]):
    reps = max(0, stop - start)
    arr[start:stop] = np.tile(arr[start], (reps, 1))
# Handle last index in `idx`
start, stop = idx[-1], arr.shape[0]
reps = max(0, stop - start)  # recompute; otherwise the last loop value is reused
arr[start:stop] = np.tile(arr[start], (reps, 1))
But I am hoping that there's a way to get rid of the slow for-loop.
Try np.diff to find the repetition for each row, then np.repeat:
# this assumes `idx` is a standard list as in the question
np.repeat(arr[idx], np.diff(idx+[len(arr)]), axis=0)
Output:
array([[ 1, 2, 3, 4],
[ 1, 2, 3, 4],
[ 9, 10, 11, 12],
[13, 14, 15, 16],
[13, 14, 15, 16],
[21, 22, 23, 24],
[21, 22, 23, 24],
[21, 22, 23, 24],
[33, 34, 35, 36],
[37, 38, 39, 40]])
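To see why this works, it helps to look at the repeat counts that np.diff produces. The sketch below rebuilds the question's array with np.arange (an equivalent construction chosen for brevity) and assumes, as in the question, that idx is sorted and starts at row 0.

```python
import numpy as np

arr = np.arange(1, 41).reshape(10, 4)   # same values as the question's array
idx = [0, 2, 3, 5, 8, 9]

# Appending len(arr) closes the last section, so each difference is the
# number of rows that the corresponding indexed row has to cover.
counts = np.diff(idx + [len(arr)])
print(counts)  # [2 1 2 3 1 1]

filled = np.repeat(arr[idx], counts, axis=0)
print(filled.shape)  # (10, 4)
```

The counts sum to the number of rows, so the result has the same shape as the original array.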

How does np.argmax(axis=0) work on 3D arrays?

I am stuck on how np.argmax(arr, axis=0) works. I know how np.argmax(axis=0) works on 2D arrays, but this 3D one has really confused me.
My Code:
arr = np.array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]],
[[13, 14, 15],
[16, 17, 18],
[19, 20, 21],
[22, 23, 24]],
[[25, 26, 27],
[28, 29, 30],
[31, 32, 33],
[34, 35, 36]]])
Operation:
np.argmax(arr, axis = 0)
Output:
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2],
[2, 2, 2]], dtype=int64)
FYI - I do know how np.argmax(axis=0) works on 2D arrays. But this 3D one has really confused me.
You need a clearer picture of what axis=0 means here. It can be interpreted as the height level in a stack of rectangles, so your array consists of three levels:
level 0         level 1         level 2
[ 1,  2,  3]    [13, 14, 15]    [25, 26, 27]
[ 4,  5,  6]    [16, 17, 18]    [28, 29, 30]
[ 7,  8,  9]    [19, 20, 21]    [31, 32, 33]
[10, 11, 12]    [22, 23, 24]    [34, 35, 36]
Then argmax gives, for each cell, the index of the level at which the maximum value is attained. The maximum values are:
[25, 26, 27]
[28, 29, 30]
[31, 32, 33]
[34, 35, 36]
Every one of them sits on the topmost level (number 2),
so the argmax of every cell is 2.
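Because every maximum in the question's array is on the last level, the result is uniform and hides what is going on. A smaller made-up array with mixed winners, compared against an explicit per-cell loop, may make the "which level wins" reading clearer. This is a sketch; the array `a` is invented for the example.

```python
import numpy as np

a = np.array([[[1, 9], [5, 0]],    # level 0
              [[7, 3], [5, 8]]])   # level 1

winners = np.argmax(a, axis=0)

# The same thing by hand: for each (row, col) cell, collect the values
# across levels and record the first level holding the maximum.
manual = np.zeros(a.shape[1:], dtype=int)
for r in range(a.shape[1]):
    for c in range(a.shape[2]):
        column = [a[level, r, c] for level in range(a.shape[0])]
        manual[r, c] = column.index(max(column))

print(np.array_equal(winners, manual))  # True
```

Note the tie in the cell holding 5 on both levels: the first level with the maximum wins, so that cell gets 0.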

Use array to define indices for multidimensional numpy array

I have a multidimensional Numpy array; let's say it's
myArray = array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I know that running myArray[1,1,1], for instance, will return 13. However, I want to define indx = [1,1,1] and then call something to the effect of myArray[indx].
However, this does some other multidimensional indexing stuff.
I have also tried myArray[*indx] but that understandably throws a syntax error.
Currently my very ugly workaround is to define
def array_as_indices(array, matrix):
    st = ''
    for i in array:
        st += '%s,' % i
    st = st[:-1]
    return matrix[eval(st)]
which works but is quite inelegant and presumably slow.
Is there a more pythonic way to do what I'm looking for?
This is a duplicate of Unpacking tuples/arrays/lists as indices for Numpy Arrays, but you can just create a tuple
import numpy as np
def main():
    my_array = np.array(
        [
            [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
            [[9, 10, 11], [12, 13, 14], [15, 16, 17]],
            [[18, 19, 20], [21, 22, 23], [24, 25, 26]],
        ]
    )
    print(f"my_array[1,1,1]: {my_array[1,1,1]}")
    indx = (1, 1, 1)
    print(f"my_array[indx]: {my_array[indx]}")
if __name__ == "__main__":
    main()
will give
my_array[1,1,1]: 13
my_array[indx]: 13
The indices of a numpy array are addressed by tuples, not lists. Use indx = (1, 1, 1).
As an extension, if you want to call the indices (1, 1, 1) and (2, 2, 2), you can use
>>> indx = ([1, 2], [1, 2], [1, 2])
>>> x[indx]
array([13, 26])
The rationale behind the behavior with lists is that numpy treats a list as a sequence of indices along the first axis, so
>>> indx = [1, 1, 1]
>>> x[indx]
array([[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
It returns an array of three elements, each equal to x[1].
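If the indices already live in a list, as in the question, converting that list with tuple() at the point of use is enough. A minimal sketch, rebuilding the same 3x3x3 array with np.arange for brevity:

```python
import numpy as np

my_array = np.arange(27).reshape(3, 3, 3)  # same values as the question's array
indx = [1, 1, 1]

# A tuple selects one element per axis, exactly like my_array[1, 1, 1].
value = my_array[tuple(indx)]
print(value)  # 13
```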

Split list at a specific value

I am trying to write code that splits the lists in a list of lists in two whenever a certain value appears in the middle of a list, producing two lists where that value becomes the last element of the first list and the first element of the second one.
There can be more than one such middle element; a list containing n of them must be split into n+1 lists.
Example:
A = [[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],[16,17,18,19,20,21,22,23,24,25],[26,27,28,29]]
P = [4,7,13,20]
n = len(P) # in this case n = 4
I am looking for a result that looks like this:
A = [[0,1,2,3,4],[4,5,6,7],[7,8,9,10,11,12,13],[13,14,15],[16,17,18,19,20],[20,21,22,23,24,25],[26,27,28,29]]
Since n = 4, a single list containing all four values would be split into 5 lists. Note that the result here has 7 lists, because the values of P are spread over two sublists and the last list doesn't contain any value of P and therefore stays intact.
I haven't been able to produce anything as I am new to python and it is hard to formulate this problem.
Any help is appreciated!
You can first recover all indices of the provided values and then slice accordingly.
Code
def split_at_values(lst, values):
    indices = [i for i, x in enumerate(lst) if x in values]
    for start, end in zip([0, *indices], [*indices, len(lst)]):
        yield lst[start:end+1]
        # Note: remove +1 for separator to only appear in right side slice
Example
values = {4, 7, 13, 20}
lst = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
print(*split_at_values(lst, values))
Output
[0, 1, 2, 3, 4] [4, 5, 6, 7] [7, 8, 9, 10, 11, 12, 13] [13, 14, 15]
You can then apply this iteratively to your input list A to get the desired result. Alternatively, you can use itertools.chain.from_iterable.
from itertools import chain
values = {4, 7, 13, 20}
lst_A = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
[26, 27, 28, 29]]
output = list(chain.from_iterable(split_at_values(sublst, values) for sublst in lst_A))
print(output)
Output
[[0, 1, 2, 3, 4],
[4, 5, 6, 7],
[7, 8, 9, 10, 11, 12, 13],
[13, 14, 15],
[16, 17, 18, 19, 20],
[20, 21, 22, 23, 24, 25],
[26, 27, 28, 29]]
You can keep appending the sub-list items to the last sub-list of the output list, and if the current item is equal to the next item in P, append a new sub-list to the output with the same item and pop the item from P (A and P are the lists from the question; note that this consumes P):
output = []
for l in A:
    output.append([])
    for i in l:
        output[-1].append(i)
        if P and i == P[0]:
            output.append([i])
            P.pop(0)
With your sample input, output would become:
[[0, 1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10, 11, 12, 13], [13, 14, 15], [16, 17, 18, 19, 20], [20, 21, 22, 23, 24, 25], [26, 27, 28, 29]]
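If the list of split values needs to survive the call, the same loop can work on a copy. The sketch below wraps it in a function and uses collections.deque so that removing from the front is cheap; `split_in_order` is a hypothetical name, and like the loop above it assumes the split values appear in the data in the order given.

```python
from collections import deque


def split_in_order(lists, points):
    remaining = deque(points)  # a copy, so the caller's list is left untouched
    output = []
    for sub in lists:
        output.append([])
        for item in sub:
            output[-1].append(item)
            # When the next split value is reached, start a new sub-list
            # that begins with that same value.
            if remaining and item == remaining[0]:
                output.append([item])
                remaining.popleft()
    return output


A = [list(range(0, 16)), list(range(16, 26)), list(range(26, 30))]
P = [4, 7, 13, 20]
result = split_in_order(A, P)
print(len(result), P)  # 7 [4, 7, 13, 20]
```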
