randomly partition list without duplicates - python

I've got an array that contains each of a set of numbers n times. Example with n=2:
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
What I would like is a partition of this array in which the members of the partition
contain elements that are drawn randomly from the array
contain no duplicates
contain the same number of elements (up to rounding) k
Example output for k=4:
[[3,0,2,1], [0,1,4,2], [3,4]]
Invalid output for k=4:
[[3,0,2,2], [3,1,4,0], [1,4]]
(this is a partition but the first element of the partition contains duplicates)
What's the most pythonic way of achieving this?

A combination of collections.Counter and random.sample can be used:
from collections import Counter
import random
def random_partition(seq, k):
    cnts = Counter(seq)
    # as long as there are enough items to "sample" take a random sample
    while len(cnts) >= k:
        sample = random.sample(list(cnts), k)
        cnts -= Counter(sample)
        yield sample
    # Fewer different items than the sample size, just return the unique
    # items until the Counter is empty
    while cnts:
        sample = list(cnts)
        cnts -= Counter(sample)
        yield sample
This is a generator that yields the samples, so you can simply cast it to a list:
>>> l = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
>>> list(random_partition(l, 4))
[[1, 0, 2, 4], [1, 0, 2, 3], [3, 4]]
>>> list(random_partition(l, 2))
[[1, 0], [3, 0], [1, 4], [2, 3], [4, 2]]
>>> list(random_partition(l, 6))
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
>>> list(random_partition(l, 4))
[[4, 1, 0, 3], [1, 3, 4, 0], [2], [2]]
The last case shows that this method can give lopsided results when the random draws happen to use up the "wrong" values early: here both 2s are left over and end up in two separate single-element parts. If that must not happen, or should happen only rarely, you need to figure out how to weight the samples (for example using random.choices) so that values with more remaining copies are more likely to be drawn.
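One way to reduce the chance of such leftovers, sketched here under the assumption that trading some randomness for balance is acceptable, is to fill each part with the values that have the most copies remaining and to break ties randomly. The name balanced_random_partition is hypothetical, and this is an illustration rather than a uniform sampler over all valid partitions:

```python
import random
from collections import Counter


def balanced_random_partition(seq, k):
    """Yield duplicate-free parts of at most k items (hypothetical helper)."""
    counts = Counter(seq)
    while counts:
        values = list(counts)
        # Shuffle first so that ties between equally frequent values are
        # broken randomly; the sort below is stable and keeps that order.
        random.shuffle(values)
        values.sort(key=counts.__getitem__, reverse=True)
        part = values[:k]
        # Shuffle again so the order inside a part does not reveal the counts.
        random.shuffle(part)
        # Counter subtraction drops values whose count reaches zero.
        counts -= Counter(part)
        yield part


if __name__ == "__main__":
    data = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
    print(list(balanced_random_partition(data, 4)))
```

For the example list with k=4 this always yields parts of sizes 4, 4 and 2, because the value skipped in the first part is taken first in the second one.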

python replace value in array based on previous and following value in column

Given the following array, I want to replace each zero with the previous value in its column, as long as the zero is surrounded (above and below) by two values greater than zero.
I am aware of np.where, but it would consider the whole array instead of its columns.
I am not sure how to do it, and help would be appreciated.
This is the array:
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
and since the only zero that meets this condition is the second row/second column one,
the new array should be the following
new_a=np.array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
How do I accomplish this?
And what if I want to allow a longer gap of zeros between the nonzero values? For instance, the first column contains two consecutive zeros and the second column contains one, so with gaps of up to two zeros allowed the new array would be
new_a=np.array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])
In short, how do I solve this if the column-wise condition is a run of N or fewer consecutive zeros?
As a generic method, I would approach this using a convolution:
import numpy as np
from scipy.signal import convolve2d
# kernel for top/down neighbors
kernel = np.array([[1],
[0],
[1]])
# is the value a zero?
m1 = a==0
# count non-zeros neighbors
m2 = convolve2d(~m1, kernel, mode='same') > 1
mask = m1&m2
# replace matching values with previous row value
a[mask] = np.roll(a, 1, axis=0)[mask]
output:
array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
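For the single-zero case, the same neighbor test can also be written with shifted slices instead of a convolution. This is a minimal sketch, assuming only zeros with a strictly positive value directly above and below should be filled; the names above, middle, below, fill and out are illustrative:

```python
import numpy as np

a = np.array([[4, 3, 3, 2],
              [0, 0, 1, 2],
              [0, 4, 2, 4],
              [2, 4, 3, 0]])

above = a[:-2]    # value one row up, for every interior row
middle = a[1:-1]  # the interior rows themselves
below = a[2:]     # value one row down, for every interior row

# A zero qualifies only if both vertical neighbors are greater than zero.
fill = (middle == 0) & (above > 0) & (below > 0)

out = a.copy()
# out[1:-1] is a view, so assigning through it modifies out in place.
out[1:-1][fill] = above[fill]
print(out)
```

The first and last rows are never touched, since they have only one vertical neighbor.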
filling from surrounding values
Using pandas to benefit from ffill/bfill (you can forward-fill in pure NumPy, but it's more complex):
import pandas as pd
df = pd.DataFrame(a)
# limit for neighbors
N = 2
# identify non-zeros
m = df.ne(0)
# mask zeros
m2 = m.where(m)
# mask for values with 2 neighbors within limits
mask = m2.ffill(limit=N) & m2.bfill(limit=N)
df.mask(mask&~m).ffill()
array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])
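A pure NumPy version of the "N consecutive zeros or less" rule could look like the sketch below. It assumes a gap is a vertical run of zeros bounded by a nonzero value both above and below, and that the gap is filled with the value above it; fill_short_gaps is a hypothetical name, not a NumPy function:

```python
import numpy as np


def fill_short_gaps(a, n):
    """Fill column-wise zero runs of length <= n that have nonzero values
    both above and below, using the value above (hypothetical helper)."""
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    rows = np.arange(n_rows)[:, None]
    nonzero = a != 0

    # Row index of the nearest nonzero value at or above each cell (-1 if none).
    prev_idx = np.maximum.accumulate(np.where(nonzero, rows, -1), axis=0)
    # Row index of the nearest nonzero value at or below each cell (n_rows if none).
    next_idx = np.minimum.accumulate(
        np.where(nonzero, rows, n_rows)[::-1], axis=0)[::-1]

    gap_len = next_idx - prev_idx - 1
    fill = ~nonzero & (prev_idx >= 0) & (next_idx < n_rows) & (gap_len <= n)

    out = a.copy()
    cols = np.broadcast_to(np.arange(n_cols), a.shape)
    out[fill] = a[prev_idx[fill], cols[fill]]
    return out


a = np.array([[4, 3, 3, 2],
              [0, 0, 1, 2],
              [0, 4, 2, 4],
              [2, 4, 3, 0]])
print(fill_short_gaps(a, 2))
```

With n=1 only the zero in the second column is filled; with n=2 the two zeros in the first column are filled as well. The trailing zero in the last column is left alone because nothing follows it.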
That's one solution I found. I know it's basic but I think it works.
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
a_t = a.T
for i in range(len(a_t)):
    ar = a_t[i]
    for j in range(len(ar)-1):
        if (j>0) and (ar[j] == 0) and (ar[j+1] > 0):
            a_t[i][j] = a_t[i][j-1]
a = a_t.T

Python: How to create a list of integers depending on a specific distribution

Is there a way in python/numpy/scipy to dynamically create a list of integers in a specific range, where the range can vary and the numbers are ordered according to a distribution such as normal (Gaussian), exponential, or linear? I imagine something like this for range 3:
[1,2,3]
[2,1,2]
[1,2,1]
[3,2,1]
for range 4:
[1,2,3,4]
[2,1,1,2]
[1,2,2,1]
[4,3,2,1]
for range 5:
[1,2,3,4,5]
[2,1,0,1,2]
[1,2,3,2,1]
[5,4,3,2,1]
We can use np.minimum to generate the symmetric third row: it is the element-wise minimum of the ascending range and its reverse. The second row is the third row subtracted from 3. The first and last rows are just the range from 1 to n and its flipped version, respectively.
Thus, we would have one approach after row-stacking those rows to have a 2D array, like so -
import numpy as np
def ranged_arr(n):
    r = np.arange(n)+1
    row3 = np.minimum(r,r[::-1])
    return np.c_[r, 3-row3, row3, r[::-1]].T
We could also use np.vstack to do the stacking (np.row_stack is an older alias for it and is deprecated in recent NumPy releases) -
np.vstack((r, 3-row3, row3, r[::-1]))
Sample runs -
In [106]: ranged_arr(n=3)
Out[106]:
array([[1, 2, 3],
[2, 1, 2],
[1, 2, 1],
[3, 2, 1]])
In [107]: ranged_arr(n=4)
Out[107]:
array([[1, 2, 3, 4],
[2, 1, 1, 2],
[1, 2, 2, 1],
[4, 3, 2, 1]])
In [108]: ranged_arr(n=5)
Out[108]:
array([[1, 2, 3, 4, 5],
[2, 1, 0, 1, 2],
[1, 2, 3, 2, 1],
[5, 4, 3, 2, 1]])

Split a list into increasing sequences using itertools

I have a list with mixed sequences like
[1,2,3,4,5,2,3,4,1,2]
I want to know how I can use itertools to split the list into increasing sequences cutting the list at decreasing points. For instance the above would output
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
This is obtained by noting that the sequence first decreases at 2, so we cut there, and decreases again at 1, so we cut again there.
Another example is with the sequence
[3,2,1]
the output should be
[[3], [2], [1]]
In the event that the given sequence is increasing we return the same sequence. For example
[1,2,3]
returns the same result. i.e
[[1, 2, 3]]
For a repeating list like
[ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
the output should be
[[1, 2, 2, 2], [1, 2, 3, 3], [1, 1, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
What I did to achieve this is define the following function
def splitter(L):
    result = []
    tmp = 0
    initialPoint=0
    for i in range(len(L)):
        if (L[i] < tmp):
            tmpp = L[initialPoint:i]
            result.append(tmpp)
            initialPoint=i
        tmp = L[i]
    result.append(L[initialPoint:])
    return result
The function works, but I would like to do the same with itertools to make my code more efficient. Is there a way to do this with the itertools package and avoid the explicit loop?
With NumPy, you can use numpy.split, which takes the indices at which to split. Since you want to split wherever the value decreases, use numpy.diff to compute the differences between consecutive elements, check where the difference is smaller than zero, and use numpy.where to retrieve the corresponding indices. An example with the last case in the question:
import numpy as np
lst = [ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
np.split(lst, np.where(np.diff(lst) < 0)[0] + 1)
# [array([1, 2, 2, 2]),
# array([1, 2, 3, 3]),
# array([1, 1, 1, 2, 3, 4]),
# array([1, 2, 3, 4, 5, 6])]
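Psidom already has you covered with a good answer, but another solution would be to use scipy.signal.argrelmax to find the local maxima, then np.split. Note that argrelmax only reports strict local maxima, so it misses a drop that directly follows a run of equal values (as in 2, 2, 2, 1) or another drop (as in 3, 2, 1); on such inputs it does not give the same result as the np.diff approach.
import numpy as np
from scipy.signal import argrelmax
arr = np.random.randint(1000, size=10**6)
splits = np.split(arr, argrelmax(arr)[0]+1)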
Assume your original input array:
a = [1, 2, 3, 4, 5, 2, 3, 4, 1, 2]
First find the places where the splits shall occur:
p = [ i+1 for i, (x, y) in enumerate(zip(a, a[1:])) if x > y ]
Then create slices for each such split:
print([a[m:n] for m, n in zip([0] + p, p + [None])])
This prints:
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
I suggest using more descriptive names than p, n, m, etc. ;-)
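Since the question asked specifically for itertools, here is one possible sketch. It labels every element with the number of drops seen so far (via accumulate) and then groups on that label (via groupby). The name split_increasing is hypothetical, and no claim is made that this is faster than the explicit loop:

```python
from itertools import accumulate, groupby
from operator import itemgetter


def split_increasing(seq):
    """Split seq into non-decreasing runs, cutting wherever a value is
    smaller than its predecessor (hypothetical helper)."""
    seq = list(seq)
    if not seq:
        return []
    # 1 marks an element that starts a new run, 0 one that continues a run.
    drops = [0] + [int(cur < prev) for prev, cur in zip(seq, seq[1:])]
    # The running sum of the marks gives every element its run number.
    run_ids = accumulate(drops)
    return [[value for _, value in group]
            for _, group in groupby(zip(run_ids, seq), key=itemgetter(0))]


print(split_increasing([1, 2, 3, 4, 5, 2, 3, 4, 1, 2]))
```

Equal neighbors stay in the same run, which matches the repeating example in the question.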

Confounding recursive list append in Python

I'm trying to create a pair of functions that, given a list of "starting" numbers, will recursively add to each index position up to a defined maximum value (much in the same way that an odometer works in a car--each counter wheel increasing to 9 before resetting to 1 and carrying over onto the next wheel).
The code looks like this:
number_list = []
def counter(start, i, max_count):
    if start[len(start)-1-i] < max_count:
        start[len(start)-1-i] += 1
        return(start, i, max_count)
    else:
        for j in range (len(start)):
            if start[len(start)-1-i-j] == max_count:
                start[len(start)-1-i-j] = 1
            else:
                start[len(start)-1-i-j] += 1
                return(start, i, max_count)
def all_values(fresh_start, i, max_count):
    number_list.append(fresh_start)
    new_values = counter(fresh_start,i,max_count)
    if new_values != None:
        all_values(*new_values)
When I run all_values([1,1,1],0,3) and print number_list, though, I get:
[[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1],
[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1],
[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1],
[1, 1, 1], [1, 1, 1], [1, 1, 1]]
Which is unfortunate. Doubly so knowing that if I replace the first line of all_values with
print(fresh_start)
I get exactly what I'm after:
[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 1]
[1, 2, 2]
[1, 2, 3]
[1, 3, 1]
[1, 3, 2]
[1, 3, 3]
[2, 1, 1]
[2, 1, 2]
[2, 1, 3]
[2, 2, 1]
[2, 2, 2]
[2, 2, 3]
[2, 3, 1]
[2, 3, 2]
[2, 3, 3]
[3, 1, 1]
[3, 1, 2]
[3, 1, 3]
[3, 2, 1]
[3, 2, 2]
[3, 2, 3]
[3, 3, 1]
[3, 3, 2]
[3, 3, 3]
I have already tried making a copy of fresh_start (by way of temp = fresh_start) and appending that instead, but with no change in the output.
Can anyone offer any insight as to what I might do to fix my code? Feedback on how the problem could be simplified would be welcome as well.
Thanks a lot!
temp = fresh_start
does not make a copy. Appending doesn't make copies, assignment doesn't make copies, and pretty much anything that doesn't say it makes a copy doesn't make a copy. If you want a copy, slice it:
fresh_start[:]
is a copy.
Try the following in the Python interpreter:
>>> a = [1,1,1]
>>> b = []
>>> b.append(a)
>>> b.append(a)
>>> b.append(a)
>>> b
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
>>> b[2][2] = 2
>>> b
[[1, 1, 2], [1, 1, 2], [1, 1, 2]]
This is a simplified version of what's happening in your code. But why is it happening?
b.append(a) isn't actually making a copy of a and stuffing it into the list at b. It's storing a reference to a. It's like a bookmark in a web browser: when you open a webpage using a bookmark, you expect to see the webpage as it is now, not as it was when you bookmarked it. But that also means that if you have multiple bookmarks to the same page, and that page changes, you'll see the changed version no matter which bookmark you follow.
It's the same story with temp = a, and for that matter, a = [1,1,1]. temp and a are "bookmarks" to a particular list which happens to contain three ones. And b, in the example above, is a bookmark to a list... which contains three bookmarks to that same list that contains three ones.
So what you do is create a new list and copy in the elements of the old list. The quickest way to do that is to take a slice containing the whole list, as user2357112 demonstrated:
>>> a = [1,1,1]
>>> b = []
>>> b.append(a[:])
>>> b.append(a[:])
>>> b.append(a[:])
>>> b
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
>>> b[2][2] = 2
>>> b
[[1, 1, 1], [1, 1, 1], [1, 1, 2]]
Much better.
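The difference between a second name and a real copy can be checked directly with the is operator. The short sketch below uses illustrative variable names and also shows one caveat: a slice is a shallow copy, so lists nested inside it are still shared unless copy.deepcopy is used.

```python
import copy

a = [1, 1, 1]
alias = a         # a second name for the same list, nothing is copied
snapshot = a[:]   # shallow copy; list(a) and a.copy() behave the same way

a[2] = 2
print(alias is a, alias)        # True [1, 1, 2]
print(snapshot is a, snapshot)  # False [1, 1, 1]

# A slice copies only the outer list; inner lists are still shared.
nested = [[1, 1], [1, 1]]
shallow = nested[:]
deep = copy.deepcopy(nested)
nested[0][0] = 9
print(shallow[0][0], deep[0][0])  # 9 1
```

For the flat lists of integers in the question, a slice is enough.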
When I look at the desired output I can't help but think about using one of the numpy grid data production functions.
import numpy
first_column, second_column, third_column = numpy.mgrid[1:4,1:4,1:4]
numpy.dstack((first_column.flatten(),second_column.flatten(),third_column.flatten()))
Out[23]:
array([[[1, 1, 1],
[1, 1, 2],
[1, 1, 3],
[1, 2, 1],
[1, 2, 2],
[1, 2, 3],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[2, 1, 1],
[2, 1, 2],
[2, 1, 3],
[2, 2, 1],
[2, 2, 2],
[2, 2, 3],
[2, 3, 1],
[2, 3, 2],
[2, 3, 3],
[3, 1, 1],
[3, 1, 2],
[3, 1, 3],
[3, 2, 1],
[3, 2, 2],
[3, 2, 3],
[3, 3, 1],
[3, 3, 2],
[3, 3, 3]]])
Of course, the utility of this particular approach might depend on the variety of input you need to deal with, but I suspect this could be an interesting way to build the data and numpy is pretty fast for this kind of thing. Presumably if your input list has more elements you could have more min:max arguments fed into mgrid[] and then unpack / stack in a similar fashion.
Here is a simplified version of your program, which works. Comments will follow.
number_list = []
def _adjust_counter_value(counter, n, max_count):
    """
    We want the counter to go from 1 to max_count, then start over at 1.
    This function adds n to the counter and then returns a tuple:
    (new_counter_value, carry_to_next_counter)
    """
    assert max_count >= 1
    assert 1 <= counter <= max_count
    # Counter is in closed range: [1, max_count]
    # Subtract 1 so expected value is in closed range [0, max_count - 1]
    x = counter - 1 + n
    carry, x = divmod(x, max_count)
    # Add 1 so expected value is in closed range [1, max_count]
    counter = x + 1
    return (counter, carry)
def increment_counter(start, i, max_count):
    last = len(start) - 1 - i
    copy = start[:] # make a copy of the start
    add = 1 # start by adding 1 to index
    for i_cur in range(last, -1, -1):
        copy[i_cur], add = _adjust_counter_value(copy[i_cur], add, max_count)
        if 0 == add:
            return (copy, i, max_count)
    else:
        # if we have a carry out of the 0th position, we are done with the sequence
        return None
def all_values(fresh_start, i, max_count):
    number_list.append(fresh_start)
    new_values = increment_counter(fresh_start,i,max_count)
    if new_values != None:
        all_values(*new_values)
all_values([1,1,1],0,3)
import itertools as it
correct = [list(tup) for tup in it.product(range(1,4), range(1,4), range(1,4))]
assert number_list == correct
Since you want the counters to go from 1 through max_count inclusive, it's a little bit tricky to update each counter. Your original solution was to use several if statements, but here I have made a helper function that uses divmod() to compute each new digit. This lets us add any increment to any digit and will find the correct carry out of the digit.
Your original program never changed the value of i so my revised one doesn't either. You could simplify the program further by getting rid of i and just having increment_counter() always go to the last position.
If you run a for loop to the end without calling break or return, the else: case will then run if there is one present. Here I added an else: case to handle a carry out of the 0th place in the list. If there is a carry out of the 0th place, that means we have reached the end of the counter sequence. In this case we return None.
Your original program is kind of tricky. It has two explicit return statements in counter() and an implicit return at the end of the sequence. It does return None to signal that the recursion can stop, but the way it does it is too tricky for my taste. I recommend using an explicit return None as I showed.
Note that Python has a module itertools that includes a way to generate a counter series like this. I used it to check that the result is correct.
I'm sure you are writing this to learn about recursion, but be advised that Python isn't the best language for recursive solutions like this one. Python has a relatively shallow recursion limit (1000 frames by default) and does not automatically turn tail recursion into an iterative loop, so this will raise a RecursionError if your recursive calls nest deeply enough. The best solution in Python would be to use itertools.product() as I did to just directly generate the desired counter sequence.
Since your generated sequence is a list of lists, and itertools.product() produces tuples, I used a list comprehension to convert each tuple into a list, so the end result is a list of lists, and we can simply use the Python == operator to compare them.
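The itertools.product() call can be generalized to any number of wheels with its repeat argument, which avoids spelling out one range per position. A minimal sketch, where odometer is a hypothetical name:

```python
from itertools import product


def odometer(n_wheels, max_count):
    """Return every counter state from [1, ..., 1] up to
    [max_count, ..., max_count] in odometer order (hypothetical helper)."""
    digits = range(1, max_count + 1)
    # repeat=n_wheels is equivalent to passing `digits` n_wheels times.
    return [list(state) for state in product(digits, repeat=n_wheels)]


states = odometer(3, 3)
print(len(states), states[0], states[-1])
```

Because product() varies the last position fastest, the states come out in the same order as the printed output in the question.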

Combinations Including Select Elements (Python)

In order to make the set of all combinations of numbers 0 to x, with length y, we do:
from itertools import combinations
list_of_combinations = list(combinations(range(0, x+1), y))
list_of_combinations = list(map(list, list_of_combinations))
print(list_of_combinations)
This will output the result as a list of lists.
For example, x=4, y=3:
[[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 2, 3], [0, 2, 4], [0, 3, 4], [1, 2, 3], [1, 2, 4],
[1, 3, 4], [2, 3, 4]]
I am trying to do the above, but only outputting lists that have 2 members chosen beforehand.
For instance, I would like to only output the set of the combos that has 1 and 4 inside it. The output would then be (for x=4, y=3):
[[0, 1, 4], [1, 2, 4], [1, 3, 4]]
The best approach I have now is to make a list that is y-2 length with all numbers of the set without the chosen numbers, and then append the chosen numbers, but this seems very inefficient. Any help appreciated.
*Edit: I am doing this for large x and y, so I can't just write out all the combos and then search for the selected elements, I need to find a better method.
combinations() returns an iterable, so loop over that while producing the list:
[list(combo) for combo in combinations(range(x + 1), y) if 1 in combo]
This produces one list, the list of all combinations that match the criteria.
Demo:
>>> from itertools import combinations
>>> x, y = 4, 3
>>> [list(combo) for combo in combinations(range(x + 1), y) if 1 in combo]
[[0, 1, 2], [0, 1, 3], [0, 1, 4], [1, 2, 3], [1, 2, 4], [1, 3, 4]]
The alternative would be to produce combinations of length y - 1 from range(x + 1) with 1 removed, then add 1 back in (using bisect.insort() to avoid having to sort afterwards):
import bisect
from itertools import combinations
def combinations_with_guaranteed(x, y, *guaranteed):
    values = set(range(x + 1))
    values.difference_update(guaranteed)
    for combo in combinations(sorted(values), y - len(guaranteed)):
        combo = list(combo)
        for value in guaranteed:
            bisect.insort(combo, value)
        yield combo
then loop over that generator:
>>> list(combinations_with_guaranteed(4, 3, 1))
[[0, 1, 2], [0, 1, 3], [0, 1, 4], [1, 2, 3], [1, 2, 4], [1, 3, 4]]
>>> list(combinations_with_guaranteed(4, 3, 1, 2))
[[0, 1, 2], [1, 2, 3], [1, 2, 4]]
This avoids generating combinations only for the filter to discard them again.
It may well be that for larger values of y and more guaranteed numbers, just using yield sorted(combo + guaranteed), with combo left as the tuple that combinations() produces, is going to beat repeated bisect.insort() calls.
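Applied to the case in the question (every combination must contain both 1 and 4), the sorted() variant could be sketched as follows. The name combos_containing is hypothetical, and the sketch assumes the required values are distinct and lie in range(x + 1):

```python
from itertools import combinations


def combos_containing(x, y, required):
    """Yield sorted length-y combinations of range(x + 1) that contain
    every value in `required` (hypothetical helper)."""
    required = tuple(sorted(set(required)))
    # Only the remaining slots are chosen freely, from the other values.
    others = [value for value in range(x + 1) if value not in required]
    for rest in combinations(others, y - len(required)):
        # Both operands are tuples, so they can be concatenated and sorted.
        yield sorted(rest + required)


print(list(combos_containing(4, 3, (1, 4))))
```

Only the y - len(required) free positions are enumerated, so nothing is generated just to be filtered out afterwards. If more values are required than y allows, combinations() raises a ValueError for the negative length.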
This should do the trick:
filtered_list = list(filter(lambda combo: 1 in combo and 4 in combo, list_of_combinations))
To make your code nicer (use more generators), I'd use this
combs = combinations(range(0, x+1), y)
filtered_list = list(map(list, filter(lambda combo: 1 in combo and 4 in combo, combs)))
If you don't need the filtered_list to be a list and an iterator will do, you could even drop the outer list() call (in Python 3, map() and filter() are already lazy)
from itertools import combinations
combs = combinations(range(0, x+1), y)
filtered_list = map(list, filter(lambda combo: 1 in combo and 4 in combo, combs))
next(filtered_list)
> [0, 1, 4]
next(filtered_list)
> [1, 2, 4]
next(filtered_list)
> [1, 3, 4]
next(filtered_list)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> StopIteration
