I want to generate a list of two-element lists of integers, with the following conditions:
the first element should be smaller than the second, and
all the data should be unique.
I could generate each tuple with a custom function but don't know how to use that to satisfy the second condition.
from hypothesis import assume, strategies as st

@st.composite
def generate_data(draw):
    min_val, max_val = draw(st.lists(st.integers(1, 100), min_size=2, max_size=2))
    assume(min_val < max_val)
    return [min_val, max_val]
I could generate the data by iterating over generate_data a few times in this (inefficient?) way:
>>> [generate_data().example() for _ in range(3)]
[[5, 31], [1, 12], [33, 87]]
But how can I check that the data is unique?
E.g, the following values are invalid:
[[1, 2], [1, 5], ...] # (1 is repeated)
[[1, 2], [1, 2], ...] # (repeated data)
but the following is valid:
[[1, 2], [3, 4], ...]
I think the following strategy satisfies your requirements:
import hypothesis.strategies as st

@st.composite
def unique_pair_lists(draw):
    data = draw(st.lists(st.integers(), unique=True))
    if len(data) % 2 != 0:
        data.pop()
    result = [data[i:i + 2] for i in range(0, len(data), 2)]
    for pair in result:
        pair.sort()
    return result
The idea here is that we generate something that gives the right elements, and then we transform it into something of the right shape. Rather than trying to generate pairs of lists of integers, we just generate a list of unique integers and then group them into pairs (we drop the last element if there's an odd number of integers). We then sort each pair to ensure it's in the right order.
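For example, a test consuming this strategy could look like the sketch below (the test name and assertions are only illustrative):

from hypothesis import given

@given(unique_pair_lists())
def test_pairs_are_ordered_and_unique(pairs):
    flat = [x for pair in pairs for x in pair]
    assert len(flat) == len(set(flat))   # no integer appears twice anywhere
    assert all(a < b for a, b in pairs)  # each pair is in increasing order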
David's solution permits an integer to appear in two sub-lists - for totally unique integers I'd use the following:
@st.composite
def list_of_pairs_of_unique_elements(draw):
    seen = set()
    new_int = (
        st.integers(1, 100)
        .filter(lambda n: n not in seen)  # check that it's unique
        .map(lambda n: seen.add(n) or n)  # add it to `seen` before the next draw
    )
    return draw(st.lists(st.tuples(new_int, new_int).map(sorted)))
The .filter(...) method is probably what you're looking for.
.example() is only for interactive use - you'll get a warning (or error) if you use it inside @given().
If you might end up filtering out most elements in the range (e.g. an outer list of length > 30, meaning 60 of the 100 possible unique elements), you might get better performance by creating a list of possible elements and popping out of it rather than rejecting already-seen elements.
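A minimal sketch of that popping approach, assuming the same 1-100 range as above (the strategy name and the max_pairs cap are made up for illustration):

@st.composite
def unique_pairs_by_popping(draw, max_pairs=30):
    # Draw a shuffled copy of the candidate pool; popping pairs off it means
    # every integer is used at most once, so nothing ever has to be rejected.
    pool = list(draw(st.permutations(range(1, 101))))
    n_pairs = draw(st.integers(0, max_pairs))
    return [sorted((pool.pop(), pool.pop())) for _ in range(n_pairs)]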
I would like to merge sublists that share common elements. This is my Python code.
Tb = [[1], [1, 3], [1, 2], [5, 7]]
Expected output: Tc = [[1, 2, 3], [5, 7]]
Tc = []
for i in range(len(Tb)):
    print(i)
    for a in Tb[i]:
        for j in range(len(Tb)):
            if a in Tb[j]:
                print('yes')
                Tc.append(list(set(Tb[i]).union(set(Tb[j]))))
                if len(Tb[i]) >= len(Tb[j]):
                    Tb.pop(j)
                elif len(Tb[i]) <= len(Tb[j]):
                    Tb.pop(i)
    print(Tb)
    print('#' * 20)
print(Tc)
I got list index out of range error.
If you delete items from a list while you are iterating through it, the for-loop will eventually try to access items beyond the list size.
It is preferable to use a while loop for in-place deletion so that you can control the progression of the index in accordance with your item manipulations.
Tb = [[1], [1, 3], [1, 2], [5, 7]]
i = 1
while i < len(Tb):
    if set(Tb[i]).isdisjoint(Tb[i-1]):
        i += 1
    else:
        Tb[i-1] = sorted({*Tb[i], *Tb.pop(i-1)})
print(Tb)
[[1, 2, 3], [5, 7]]
Unless your list is huge, you should consider creating a second list with the merged sublists, and not have to worry about index progression at all:
Tb = [[1], [1, 3], [1, 2], [5, 7]]
Tb2 = Tb[:1]
for t in Tb[1:]:
    if set(t).isdisjoint(Tb2[-1]):
        Tb2.append(t)
    else:
        Tb2[-1] = sorted({*t, *Tb2[-1]})
print(Tb2)
[[1, 2, 3], [5, 7]]
I have the following list:
item_list = [1, 2, 3, 4, 5]
I want to compare each item in the list to the other items to generate comparison pairs, such that the same comparisons (x, y) and (y, x) are not repeated (i.e. I don't want both [1, 5] and [5, 1]). For the 5 items in the list, this would generate a total of 10 comparison pairs (n*(n-1)/2). I also want to randomize the order of the pairs such that both x- and y-values aren't the same as the adjacent x- and y-values.
For example, this is fine:
[1, 5]
[3, 2]
[5, 4]
[4, 2]
...
But not this:
[1, 5]
[1, 4] <-- here the x-value is the same as the previous x-value
[2, 4] <-- here the y-value is the same as the previous y-value
[5, 3]
...
I have only been able to come up with a method in which I manually create the pairs by zipping two lists together (example below), but this is obviously very time-consuming (and would be even more so if I wanted to increase the list of items to 10, which would generate 45 pairs). I also can't randomize the order each time, otherwise I could get repetitions of the same x- or y-values.
x_list = [1, 4, 1, 3, 1, 4, 1, 2, 5, 3]
y_list = [2, 5, 3, 5, 4, 2, 5, 3, 2, 4]
zip_list = zip(x_list, y_list)
paired_list = list(zip_list)
print(paired_list)
I hope that makes sense. I am very new to coding, so any help would be much appreciated!
Edit: For context, my experiment involves displaying two images next to each other on the screen. I have a total of 5 images (labeled 1-5), hence the 5-item list. For each image pair, the participant must select one of the two images, which is why I don't want the same image displayed at the same time (e.g. [1, 1]), and I don't need the same pair repeated (e.g. [1, 5] and [5, 1]). I also want to make sure that each time the screen displays a new pair of images, both images, in their respective positions on the screen, change. So it doesn't matter if an image repeats in the sequence, so as long as it changes position (e.g. [4, 3] followed by [5, 4] is ok).
carrvo's answer is good, but doesn't guarantee the requirement that each iteration-step causes the x-value to change and the y-value to change.
(I'm also not a fan of mutability, shuffling in place, but in some contexts it's more performant)
I haven't thought of an elegant, concise implementation, but I do see a slightly clever trick: Because each pair appears only once, we're already guaranteed to have either x or y change, so if we see a pair for which they don't both change, we can just swap them.
I haven't tested this.
from itertools import combinations
from random import sample  # not cryptographically secure


def human_random_pairs(items):
    n = len(items)
    random_pairs = sample(list(combinations(items, 2)),
                          n * (n - 1) // 2)

    def generator():
        old = random_pairs[0]
        yield old
        for new in random_pairs[1:]:
            collision = old[0] == new[0] or old[1] == new[1]  # or use any(), a comprehension, and zip; your choice
            old = tuple(reversed(new)) if collision else new
            yield old

    return tuple(generator())
This wraps the output in a tuple; you can use a list if you like, or depending on your usage you can probably unwrap the inner function and just yield directly from human_random_pairs, in which case it will "return" the iterable/generator.
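For reference, a sketch of that unwrapped, yield-directly variant might look like this (untested, same caveats as above):

def human_random_pairs_gen(items):
    n = len(items)
    random_pairs = sample(list(combinations(items, 2)), n * (n - 1) // 2)
    old = random_pairs[0]
    yield old
    for new in random_pairs[1:]:
        collision = old[0] == new[0] or old[1] == new[1]
        old = tuple(reversed(new)) if collision else new
        yield old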
Oh, actually we can use itertools.accumulate:
from itertools import accumulate, combinations, starmap
from operator import eq
from random import sample  # not cryptographically secure


def human_random_pairs(items):
    n = len(items)

    def maybe_flip_second(fst, snd):
        return tuple(reversed(snd)) if any(starmap(eq, zip(fst, snd))) else snd

    return tuple(  # this outer wrapper is optional
        accumulate(sample(list(combinations(items, 2)), n * (n - 1) // 2),  # len(combinations) = n! / r! / (n-r)!
                   maybe_flip_second)
    )
I had to look up how to generate combinations and random because I have not used them so often, but you should be looking for something like the following:
from itertools import combinations
from random import shuffle
item_list = range(1, 6) # [1, 2, 3, 4, 5]
paired_list = list(combinations(item_list, 2))
shuffle(paired_list)
print(paired_list)
Thank you for the contributions! I'm posting the solution I ended up using below for anyone who might be interested; it uses carrvo's code for generating random comparisons and the pair-reversal idea from ShapeOfMatter. Overall it does not look very elegant and can likely be simplified, but at least it generates the desired output.
from itertools import combinations
import random
# Create image pair comparisons and randomize order
no_of_images = 5
image_list = range(1, no_of_images+1)
pairs_list = list(combinations(image_list, 2))
random.shuffle(pairs_list)
print(pairs_list)
# Create new comparisons sequence with no x- or y-value repeats, by reversing pairs that clash
trial_list = []
trial_list.append(pairs_list[0]) # append first image pair
binary_list = [0] # check if preceding pairs have been reversed or not (0 = not reversed, 1 = reversed)
# For subsequent pairs, if x- or y-values are repeated, reverse the pair
for i in range(len(pairs_list)-1):
    # if previous pair was reversed, check against new pair
    if binary_list[i] == 1:
        if trial_list[i][0] == pairs_list[i+1][0] or trial_list[i][1] == pairs_list[i+1][1]:
            trial_list.append(tuple(reversed(pairs_list[i+1])))  # if x- or y-value repeats, reverse pair
            binary_list.append(1)  # flag reversal
        else:
            trial_list.append(pairs_list[i+1])
            binary_list.append(0)
    # if previous pair was not reversed, check against old pair
    elif binary_list[i] == 0:
        if pairs_list[i][0] == pairs_list[i+1][0] or pairs_list[i][1] == pairs_list[i+1][1]:
            trial_list.append(tuple(reversed(pairs_list[i+1])))  # if x- or y-value repeats, reverse pair
            binary_list.append(1)  # flag reversal
        else:
            trial_list.append(pairs_list[i+1])
            binary_list.append(0)
print(trial_list)
I'm trying to wrap my head around this whole thing and I can't seem to figure it out. Basically, I have a list of ints whose values add up to 15. I want to split the list into 2 parts while making the two parts' sums as close to each other as possible. Sorry if I'm not explaining this well.
Example:
list = [4,1,8,6]
I want to achieve something like this:
list = [[8, 1], [6, 4]]
adding the first list up equals 9, and the other equals 10. That's perfect for what I want as they are as close as possible.
What I have now:
my_list = [4,1,8,6]
total_list_sum = 15
def divide_chunks(l, n):
    # looping till length l
    for i in range(0, len(l), n):
        yield l[i:i + n]
n = 2
x = list(divide_chunks(my_list, n))
print (x)
But, that just splits it up into 2 parts.
Any help would be appreciated!
You could use a recursive algorithm and "brute force" partitioning of the list, starting with a target difference of zero and progressively increasing the tolerated difference between the two lists:
def sumSplit(left, right=[], difference=0):
    sumLeft, sumRight = sum(left), sum(right)
    # stop recursion if left is smaller than right
    if sumLeft < sumRight or len(left) < len(right):
        return
    # return a solution if sums match the tolerance target
    if sumLeft - sumRight == difference:
        return left, right, difference
    # recurse, brutally attempting to move each item to the right
    for i, value in enumerate(left):
        solution = sumSplit(left[:i] + left[i+1:], right + [value], difference)
        if solution:
            return solution
    if right or difference > 0:
        return
    # allow for an imperfect split (i.e. a larger difference) ...
    for targetDiff in range(1, sumLeft - min(left) + 1):
        solution = sumSplit(left, right, targetDiff)
        if solution:
            return solution
# sumSplit returns the two lists and the difference between their sums
print(sumSplit([4,1,8,6])) # ([1, 8], [4, 6], 1)
print(sumSplit([5,3,2,2,2,1])) # ([2, 2, 2, 1], [5, 3], 1)
print(sumSplit([1,2,3,4,6])) # ([1, 3, 4], [2, 6], 0)
Use itertools.combinations. First let's define some functions:
def difference(sublist1, sublist2):
    return abs(sum(sublist1) - sum(sublist2))

def complement(sublist, my_list):
    complement = my_list[:]
    for x in sublist:
        complement.remove(x)
    return complement
The function difference calculates the "distance" between lists, i.e., how similar the sums of the two lists are. complement returns the elements of my_list that are not in sublist.
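For instance, with the question's data, a quick illustrative check:

print(difference([4, 6], [1, 8]))        # 1
print(complement([4, 6], [4, 1, 8, 6]))  # [1, 8]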
Finally, what you are looking for:
from itertools import combinations

def divide(my_list):
    lower_difference = sum(my_list) + 1
    for i in range(1, len(my_list) // 2 + 1):
        for partition in combinations(my_list, i):
            partition = list(partition)
            remainder = complement(partition, my_list)
            diff = difference(partition, remainder)
            if diff < lower_difference:
                lower_difference = diff
                solution = [partition, remainder]
    return solution
test1 = [4,1,8,6]
print(divide(test1)) #[[4, 6], [1, 8]]
test2 = [5,3,2,2,2,1]
print(divide(test2)) #[[5, 3], [2, 2, 2, 1]]
Basically, it tries with every possible division of sublists and returns the one with the minimum "distance".
If you want to make it a little bit faster you could return the first combination whose difference is 0.
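A minimal sketch of that early-exit variant (it reuses difference and complement from above; the function name is made up):

def divide_early_exit(my_list):
    lower_difference = sum(my_list) + 1
    solution = None
    for i in range(1, len(my_list) // 2 + 1):
        for partition in combinations(my_list, i):
            partition = list(partition)
            remainder = complement(partition, my_list)
            diff = difference(partition, remainder)
            if diff == 0:
                return [partition, remainder]  # perfect split found, stop searching
            if diff < lower_difference:
                lower_difference = diff
                solution = [partition, remainder]
    return solution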
I think what you're looking for is a hill climbing algorithm. I'm not sure this will cover all cases, but it at least works for your example. I'll update this if I think of a counterexample.
Let's call your list of numbers vals.
vals.sort(reverse=True)
a, b = [], []
for v in vals:
    if sum(a) < sum(b):
        a.append(v)
    else:
        b.append(v)
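For what it's worth, tracing this on the question's data vals = [4, 1, 8, 6] gives a = [6, 4] and b = [8, 1], i.e. sums of 10 and 9.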
How can I use
itertools.combinations_with_replacement
while leaving out some particular types of combinations? In the case of
list(combinations_with_replacement([1, 2, 3, 4], 3))
I need to avoid (1,1,1), (2,2,2), (3,3,3), (4,4,4) while keeping all the rest.
Here is a good reference, but I could not find what I need.
You may combine itertools.combinations_with_replacement() with a simple filter() function:
from itertools import combinations_with_replacement

combinations = combinations_with_replacement([1, 2, 3, 4], 3)
filtered = filter(lambda c: len(set(c)) > 1, combinations)
You are in charge of selecting which combinations should be filtered out. Here a lambda function is used: if all the elements of a combination are the same, discard it.
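Continuing the snippet above, a quick check of what survives (the filter object is lazy, so materialize it to inspect):

print(list(filtered)[:4])
# [(1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 2, 2)]; the all-equal tuples are gone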
If the input list is sorted or contains distinct elements, this would also yield the intended result:
combinations = combinations_with_replacement([1, 2, 3, 4], 3)
filtered = (c for c in combinations if c[0] != c[-1]) # Use square brackets if a list is needed
The solution works because in each generated combination tuple, elements are sorted according to their indices in the input list. Therefore if c[0] == c[-1], then for any element e such that index(c[0])<=index(e)<=index(c[-1]), c[0]==e==c[-1] holds according to the constraint of the input.
I would like to change the way a list of lists is indexed.
Suppose my initial list contains two sublists: the first holds one list of three elements, and the second holds two lists of three elements. For example:
L = [[[1, 2, 3]], [[4, 5, 6], [7, 8, 9]]]
Then, say I want to take '4' from L: I must write
L[1][0][0].
Now I'd like to create a new list such that the last index becomes the first one:
Lnew = [[[1], [4, 7]], [[2], [5, 8]], [[3], [6, 9]]]
And then, to take '4', I have to do:
Lnew[0][1][0]
In a more general case, I'd like to create the list Lnew defined by:
Lnew[i][k][l] = L[k][l][i]
Is there a way to do this kind of permutation of the index without doing the following loops:
Lnew = []
for i in range(len(Payment_dates)):
    L1 = []
    for k in range(N+1):
        L2 = []
        for l in range(k+1):
            L2.append(L[k][l][i])
        L1.append(L2)
    Lnew.append(L1)
This is not very optimal in terms of complexity.
Thanks
What you'd like to achieve presupposes that all sublists have the same length.
If they have differing lengths, you may wish to append zeros to all sublists until they have the length of the longest sublist or (which is easier) until infinity.
The same behaviour can be achieved by using a function to access the elements of the list. You can call this function during runtime every time you need an element of the list:
def getElement(myList, i, k, l):
    if k < len(myList) and l < len(myList[k]) and i < len(myList[k][l]):
        return myList[k][l][i]
    else:
        return None  # or zero, or whatever you prefer
Depending on your code structure, you might not need this as a function and you can just put the conditions inside of your code.
You can also nest the if-conditions and throw different errors or return different values depending on what level the element does not exist.
If we neglect the time complexity of outputting a multidimensional list's element, this approach should decrease your time complexity from O(n^3) to O(1).
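As a quick usage sketch with the question's example data (the calls simply restate the access pattern Lnew[i][k][l] = L[k][l][i]):

L = [[[1, 2, 3]], [[4, 5, 6], [7, 8, 9]]]
print(getElement(L, 0, 1, 0))  # 4, i.e. L[1][0][0], which would be Lnew[0][1][0]
print(getElement(L, 0, 1, 5))  # None: the out-of-range index is handled gracefully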