Sample k random permutations without replacement in O(N) - python

I need a number of unique random permutations of a list without replacement, efficiently. My current approach:
total_permutations = math.factorial(len(population))
permutation_indices = random.sample(xrange(total_permutations), k)
k_permutations = [get_nth_permutation(population, x) for x in permutation_indices]
where get_nth_permutation does exactly what it sounds like, efficiently (meaning O(N)). However, this only works for len(population) <= 20, simply because 21! is so mindblowingly long that xrange(math.factorial(21)) won't work:
OverflowError: Python int too large to convert to C long
Is there a better algorithm to sample k unique permutations without replacement in O(N)?

Up to a certain point, it's unnecessary to use get_nth_permutation to get permutations. Just shuffle the list!
>>> import random
>>> l = range(21)
>>> def random_permutations(l, n):
...     while n:
...         random.shuffle(l)
...         yield list(l)
...         n -= 1
...
>>> list(random_permutations(l, 5))
[[11, 19, 6, 10, 0, 3, 12, 7, 8, 16, 15, 5, 14, 9, 20, 2, 1, 13, 17, 18, 4],
[14, 8, 12, 3, 5, 20, 19, 13, 6, 18, 9, 16, 2, 10, 4, 1, 17, 15, 0, 7, 11],
[7, 20, 3, 8, 18, 17, 4, 11, 15, 6, 16, 1, 14, 0, 13, 5, 10, 9, 2, 19, 12],
[10, 14, 5, 17, 8, 15, 13, 0, 3, 16, 20, 18, 19, 11, 2, 9, 6, 12, 7, 4, 1],
[1, 13, 15, 18, 16, 6, 19, 8, 11, 12, 10, 20, 3, 4, 17, 0, 9, 5, 2, 7, 14]]
The odds are overwhelmingly against duplicates appearing in this list for len(l) > 15 and n < 100000. If you need a guarantee, or for lower values of len(l), use a set to record and skip duplicates (though as you've observed in your comments, this will stall if n gets close to len(l)!). Something like:
def random_permutations(l, n):
    pset = set()
    while len(pset) < n:
        random.shuffle(l)
        pset.add(tuple(l))
    return pset
However, as len(l) gets longer and longer, random.shuffle becomes less reliable, because the number of possible permutations of the list increases beyond the period of the random number generator! So not all permutations of l can be generated that way. At that point, not only do you need to map get_nth_permutation over a sequence of random numbers, you also need a random number generator capable of producing every random number between 0 and len(l)! with relatively uniform distribution. That might require you to find a more robust source of randomness.
However, once you have that, the solution is as simple as Mark Ransom's answer.
To understand why random.shuffle becomes unreliable for large len(l), consider the following. random.shuffle only needs to pick random numbers between 0 and len(l) - 1. But it picks those numbers based on its internal state, and it can take only a finite (and fixed) number of states. Likewise, the number of possible seed values you can pass to it is finite. This means that the set of unique sequences of numbers it can generate is also finite; call that set s. For len(l)! > len(s), some permutations can never be generated, because the sequences that correspond to those permutations aren't in s.
What are the exact lengths at which this becomes a problem? I'm not sure. But for what it's worth, the period of the Mersenne Twister, as implemented by random, is 2**19937-1. The shuffle docs reiterate my point in a general way; see also what Wikipedia has to say on the matter.
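The threshold can be estimated by comparing n! directly against the period quoted above. This is a minimal sketch, assuming the period 2**19937-1 is the only limit that matters; the helper name is made up for illustration.

```python
def smallest_length_exceeding_period(period=2**19937 - 1):
    """Return the smallest n for which n! is larger than `period`."""
    n, factorial = 1, 1
    while factorial <= period:
        n += 1
        factorial *= n  # factorial now holds n!
    return n

threshold = smallest_length_exceeding_period()
print(threshold)
```

Lists shorter than the printed length have fewer permutations than the generator has states. That is a necessary condition for every permutation to be reachable, not a proof that random.shuffle actually reaches all of them.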

Instead of using xrange simply keep generating random numbers until you have as many as you need. Using a set makes sure they're all unique.
permutation_indices = set()
while len(permutation_indices) < k:
    permutation_indices.add(random.randrange(total_permutations))
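Putting the pieces together might look like the sketch below. The question does not show get_nth_permutation, so nth_permutation here is a hypothetical stand-in based on the factorial number system, and sample_permutations is likewise an illustrative name.

```python
import math
import random

def nth_permutation(population, index):
    """Return the permutation of `population` with the given lexicographic rank."""
    items = list(population)
    result = []
    for remaining in range(len(items), 0, -1):
        block = math.factorial(remaining - 1)
        # position selects the next element; index becomes the rank within that block
        position, index = divmod(index, block)
        result.append(items.pop(position))
    return result

def sample_permutations(population, k):
    """Draw k distinct permutations by drawing k distinct ranks."""
    total = math.factorial(len(population))
    if k > total:
        raise ValueError("k exceeds the number of permutations")
    indices = set()
    while len(indices) < k:
        indices.add(random.randrange(total))
    return [nth_permutation(population, i) for i in indices]

sample = sample_permutations(range(25), 5)
```

Because list.pop is linear, this stand-in is O(N^2) per permutation rather than O(N). The rejection loop is cheap as long as k is small compared to N!.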

I had an implementation of nth_permutation (not sure where I got it from) which I modified for your purpose. I believe it is fast enough for your needs:
>>> import math, random
>>> def get_nth_permutation(population):
...     total_permutations = math.factorial(len(population))
...     while True:
...         temp_population = list(population)
...         # randrange yields 0 <= n < total_permutations; n == total_permutations
...         # would produce an out-of-range index below
...         n = random.randrange(total_permutations)
...         size = len(temp_population)
...         def generate(s, n):
...             for x in range(s - 1, -1, -1):
...                 fact = math.factorial(x)
...                 d = n // fact  # integer division
...                 n -= d * fact
...                 yield temp_population.pop(d)
...         yield list(generate(size, n))
...
>>> nth_perm = get_nth_permutation(range(21))
>>> [next(nth_perm) for k in range(1,10)]

You seem to be searching for the Knuth Shuffle! Good luck!

You could use itertools.islice instead of xrange():
CPython implementation detail: xrange() is intended to be simple and
fast. Implementations may impose restrictions to achieve this. The C
implementation of Python restricts all arguments to native C longs
(“short” Python integers), and also requires that the number of
elements fit in a native C long. If a larger range is needed, an
alternate version can be crafted using the itertools module:
islice(count(start, step), (stop-start+step-1+2*(step<0))//step).

Related

Finding where a given number falls in a partition

Suppose I have a sorted array of integers say
partition = [0, 3, 7, 12, 18, 23, 27]
and then given a value
value = 9
I would like to return the interval on which my value sits. For example
bounds = function(partition, value)
print(bounds)
>>>[7,12]
Is there a function out there that might be able to help me or do I have to build this from scratch?
Try numpy.searchsorted(). From the documentation:
Find indices where elements should be inserted to maintain order.
import numpy as np
partition = np.array( [0, 3, 7, 12, 18, 23, 27] )
value = 9
idx = np.searchsorted(partition,value)
bound = (partition[idx-1],partition[idx])
print(bound)
>>> (7, 12)
The advantage of searchsorted is that it can give you the index for multiple values at once.
The bisect module is nice for doing this efficiently. It will return the index of the higher bound.
You'll need to do some error checking if the value can fall outside the bounds:
from bisect import bisect
partition = [0, 3, 7, 12, 18, 23, 27]
value = 9
top = bisect(partition, value)
print(partition[top-1], partition[top])
# 7 12
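The error checking mentioned above could be sketched as follows. The function name and the choice to return None for out-of-range values are assumptions for illustration; raising an exception would work just as well.

```python
from bisect import bisect

def find_interval(partition, value):
    """Return (lower, upper) with lower <= value < upper, or None if outside."""
    top = bisect(partition, value)
    # top == 0: value is below the first boundary
    # top == len(partition): value is at or above the last boundary
    if top == 0 or top == len(partition):
        return None
    return partition[top - 1], partition[top]

partition = [0, 3, 7, 12, 18, 23, 27]
print(find_interval(partition, 9))
print(find_interval(partition, 30))
```

Since bisect is bisect_right, a value equal to a boundary lands in the interval that starts at that boundary.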
def function(partition, value):
    # stop one short of the end so partition[i+1] is always a valid index
    for i in range(len(partition) - 1):
        if partition[i] < value and partition[i+1] > value:
            print([partition[i], partition[i+1]])
partition = [0, 3, 7, 12, 18, 23, 27,5,10]
value=9
function(partition,value)

Working around evaluation time discrepancy in generators

I found myself running into the gotcha under 'evaluation time discrepancy' from this list today, and am having a hard time working around it.
As a short demonstration of my problem, I make infinite generators that skip every nth number, with n going from 2 to 4:
from itertools import count
skip_lists = []
for idx in range(2, 5):
    # skip every 2nd, 3rd, 4th.. number
    skip_lists.append(x for x in count() if (x % idx) != 0)
# print first 10 numbers of every skip_list
for skip_list in skip_lists:
    for _, num in zip(range(10), skip_list):
        print("{}, ".format(num), end="")
    print()
Expected output:
1, 3, 5, 7, 9, 11, 13, 15, 17, 19,
1, 2, 4, 5, 7, 8, 10, 11, 13, 14,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
Actual output:
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
Once I remembered that great feature, I tried to "solve" it by binding the if clause variable to a constant that would be part of the skip_list:
from itertools import count
skip_lists = []
for idx in range(2, 5):
    # bind the skip distance
    skip_lists.append([idx])
    # same as in the first try, but use bound value instead of 'idx'
    skip_lists[-1].append(x for x in count() if (x % skip_lists[-1][0]) != 0)
# print first 10 numbers of every skip_list
for skip_list in (entry[1] for entry in skip_lists):
    for _, num in zip(range(10), skip_list):
        print("{}, ".format(num), end="")
    print()
But again:
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
Apart from an actual solution, I would also love to learn why my hack didn't work.
The value of idx is not looked up until you start iterating over the generators (generators are evaluated lazily). By then the loop has finished, so idx = 4, the last value it took, is what is found in the module scope.
You can bind idx for each appended generator by passing it to a function and reading the value from the function's scope when the generator is evaluated. This exploits the fact that the iterable source of a generator expression is evaluated when the generator expression is created; the function is called at each iteration of the loop, and idx is safely stored away in the function scope:
from itertools import count
skip_lists = []
def skip_count(skip):
    return (x for x in count() if (x % skip) != 0)
for idx in range(2, 5):
    # skip every 2nd, 3rd, 4th.. number
    skip_lists.append(skip_count(idx))
Illustration of generator expression's iterable source evaluation at gen. exp's creation:
>>> (i for i in 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
Your case is a bit trickier since the exclusions are actually done in a filter which is not evaluated at the gen exp's creation time:
>>> (i for i in range(2) if i in 5)
<generator object <genexpr> at 0x109a0da50>
All the more reason why the for loop and the filter both need to be moved into a scope that stores idx, not just the filter.
On a different note, you can use itertools.islice instead of the inefficient logic you're using to print a slice of the generator expressions:
from itertools import islice
for skip_list in skip_lists:
    for num in islice(skip_list, 10):
        print("{}, ".format(num), end="")
    print()
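As for why the second attempt in the question also failed: the filter expression skip_lists[-1][0] is itself evaluated lazily, so by the time the generators run, skip_lists[-1] is always the last entry and the divisor is 4 again. A compact, runnable sketch of the function-scope fix, combining the helper with islice (the first_ten name is only for illustration):

```python
from itertools import count, islice

def skip_count(skip):
    # `skip` is local to this call, so each generator keeps its own value
    return (x for x in count() if x % skip != 0)

skip_lists = [skip_count(idx) for idx in range(2, 5)]

# take the first 10 numbers of every generator
first_ten = [list(islice(gen, 10)) for gen in skip_lists]
for row in first_ten:
    print(", ".join(map(str, row)))
```

The three rows now match the expected output from the question.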

Compare nums need optimisation (codingame.com)

www.codingame.com
Task
Write a program which, using a given number of strengths,
identifies the two closest strengths and shows their difference with an integer
Info
n = Number of horses
pi = strength of each horse
d = difference
1 < n < 100000
0 < pi ≤ 10000000
My code currently
def get_dif(a, b):
    return abs(a - b)
horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
n = len(horse_str)
d = 10000001
for x in range(len(horse_str)):
    for y in range(x, len(horse_str) - 1):
        d = min([get_dif(horse_str[x], horse_str[y + 1]), d])
print(d)
Test cases
[3,5,8, 9] outputs: 1
[10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7] outputs: 1
Problem
They both work, but the next test gives me a very long list of horse strengths and I get: Process has timed out. This may mean that your solution is not optimized enough to handle some cases.
How can I optimise it? Thank you!
EDIT ONE
Default code given
import sys
import math
# Auto-generated code below aims at helping you parse
# the standard input according to the problem statement.
n = int(input())
for i in range(n):
    pi = int(input())
# Write an action using print
# To debug: print("Debug messages...", file=sys.stderr)
print("answer")
Since you can use the sort method (which is optimized, unlike a hand-written bubble sort or double loop, which has O(n**2) complexity and times out on a very big list), let me propose something:
Sort the list.
Compute the minimum of the absolute differences of adjacent values, passing a generator comprehension to the min function.
The minimum has to be the absolute difference of two adjacent values, because in a sorted list the closest value to any element is one of its neighbours. Since the list is sorted using a fast algorithm, the heavy lifting is done for you.
Like this:
horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
sh = sorted(horse_str)
print(min(abs(sh[i]-sh[i+1]) for i in range(len(sh)-1)))
I also get 1 as a result (I hope I didn't miss anything)
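The same idea can be wrapped in a function and checked against both test cases from the question. This is a sketch; the function name is made up, and pairing neighbours with zip is just one way to write the adjacent-difference step.

```python
def closest_difference(strengths):
    """Smallest difference between any two values, found via sorting."""
    ordered = sorted(strengths)
    # after sorting, the closest pair must be adjacent, so b - a is never negative
    return min(b - a for a, b in zip(ordered, ordered[1:]))

print(closest_difference([3, 5, 8, 9]))
print(closest_difference([10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]))
```

Sorting costs O(n log n) and the scan is linear, which is what replaces the quadratic double loop.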

Find lists which together contain all values from 0-23 in list of lists python

I have a list of lists. The lists within these list look like the following:
[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23],
[9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]
Every small list has 8 values from 0-23 and there are no value repeats within a small list.
What I need now are the three lists which have the values 0-23 stored. It is possible that there are a couple of combinations to accomplish it but I do only need one.
In this particular case the output would be:
[0,2,5,8,7,12,16,18], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15]
I thought to do something with the order but I'm not a python pro so it is hard for me to handle all the lists within the list (to compare all).
Thanks for your help.
The following appears to work:
from itertools import combinations, chain
lol = [[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]]
for p in combinations(lol, 3):
    if len(set(chain.from_iterable(p))) == 24:
        print(p)
        break # if only one is required
This displays the following:
([0, 2, 5, 8, 7, 12, 16, 18], [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15])
If it is guaranteed that three of the lists together contain the numbers 0-23, and you only want the first such combination, you can create combinations of length 3 and check that their pairwise set intersections are empty:
>>> li = [[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]]
>>> import itertools
>>> for t in itertools.combinations(li, 3):
...     if not set(t[0]) & set(t[1]) and not set(t[0]) & set(t[2]) and not set(t[1]) & set(t[2]):
...         print(t)
...         break
...
([0, 2, 5, 8, 7, 12, 16, 18], [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15])
Let's do a recursive solution.
We need a list of lists that contain these values:
target_set = set(range(24))
This is a function that recursively tries to find a list of lists that match exactly that set:
def find_covering_lists(target_set, list_of_lists):
    if not target_set:
        # Done
        return []
    if not list_of_lists:
        # Failed
        raise ValueError()
    # Two cases -- either the first element works, or it doesn't
    try:
        first_as_set = set(list_of_lists[0])
        if first_as_set <= target_set:
            # If it's a subset, call this recursively for the rest
            return [list_of_lists[0]] + find_covering_lists(
                target_set - first_as_set, list_of_lists[1:])
    except ValueError:
        pass # The recursive call failed to find a solution
    # If we get here, the first element failed.
    return find_covering_lists(target_set, list_of_lists[1:])
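A usage sketch for the recursive approach, with the function restated compactly so the snippet runs on its own. The variable names lol and solution and the error message are illustrative choices.

```python
def find_covering_lists(target_set, list_of_lists):
    if not target_set:
        return []  # everything is covered
    if not list_of_lists:
        raise ValueError("no combination covers the target set")
    first, rest = list_of_lists[0], list_of_lists[1:]
    first_as_set = set(first)
    if first_as_set <= target_set:
        try:
            return [first] + find_covering_lists(target_set - first_as_set, rest)
        except ValueError:
            pass  # using `first` led to a dead end, so try without it
    return find_covering_lists(target_set, rest)

lol = [[0, 2, 5, 8, 7, 12, 16, 18], [0, 9, 18, 23, 5, 8, 15, 16],
       [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15],
       [2, 8, 23, 0, 7, 16, 9, 15], [0, 5, 8, 7, 9, 11, 20, 16]]

solution = find_covering_lists(set(range(24)), lol)
print(solution)
```

Unlike the combinations-based answers, this does not assume that exactly three lists are needed. It only accepts lists that are disjoint from what has already been chosen.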

Python: Create a range of ordered numbers skipping the inverse of every Nth through Nth+D number

Greetings stackoverflow friends. I've decided to get a little wild this evening and party with for loops to iterate through a list I have created.
It appears the party has been pooped on, though, as the manner through which I would like to create a range is not readily apparent, neither through research nor playing around, and is proving bothersome.
The Desire: I would like to create a range of numbers much in a similar way that a range is usually created... by specifying range(start, stop, step) but with the minor alteration that I may additionally specify a step 'sweep' value such that range performed more like range(start, stop, step:sweep)
That is to say, if the glorious function above existed it could be used as follows:
range(0,16,3:5)
# [0,3,4,5,8,9,10,13,14,15]
Another example!
range(0,24,2:9)
# [0,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,20,21,22,23]
Yet another!
range(0,24,3:9)
# [0,3,4,5,6,7,8,9,12,13,14,15,16,17,18,21,22,23]
Last one.
swept_range(5,20,3,4)
# [7, 8, 11, 12, 15, 16, 19]
In English, I desire a simple way to create a range of ordered numbers holding on to every Nth through Nth + D number group where D is some positive number.
I've looked at slices to no avail.
I know MATLAB can succinctly do this but wasn't sure this exists in Python - does anyone?
How about this generator, using modular arithmetic:
def swept_range(start, stop, step=1, sweep=1):
    for i in range(start, stop):
        if not 0 < i % sweep < step:
            yield i
You could also use a list comprehension, if you need a sequence, rather than an iterator:
def swept_range(start, stop, step=1, sweep=1):
    return [i for i in range(start, stop) if not 0 < i % sweep < step]
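A quick check of the list-comprehension version against the four examples given in the question. The examples dictionary is only scaffolding for the comparison.

```python
def swept_range(start, stop, step=1, sweep=1):
    # keep i unless its remainder modulo sweep lies strictly between 0 and step
    return [i for i in range(start, stop) if not 0 < i % sweep < step]

# (start, stop, step, sweep) -> expected list, copied from the question
examples = {
    (0, 16, 3, 5): [0, 3, 4, 5, 8, 9, 10, 13, 14, 15],
    (0, 24, 2, 9): [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16,
                    17, 18, 20, 21, 22, 23],
    (0, 24, 3, 9): [0, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18,
                    21, 22, 23],
    (5, 20, 3, 4): [7, 8, 11, 12, 15, 16, 19],
}

for args, expected in examples.items():
    print(args, swept_range(*args) == expected)
```

Note that the pattern is anchored at multiples of sweep, not at start, which is why the last example begins at 7 rather than 5.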
def yrange(st, sp, N, D):
    return [st] + [j for i in range(st,sp,D) for j in range(i+N,i+D+1) if j < sp]
print(yrange(0, 16, 3, 5))
# [0, 3, 4, 5, 8, 9, 10, 13, 14, 15]
print(yrange(0, 24, 2, 9))
# [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23]
print(yrange(0, 24, 3, 9))
# [0, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 21, 22, 23]
def srange(start, stop, step=1, sweep=0):
    if sweep < 0 :
        raise Exception("sweep needs to be positive.")
    STEPPING = 0
    SWEEPING = 1
    state = STEPPING
    next = start
    res = []
    while next < stop:
        res.append(next)
        #ignores state if sweep is 0
        if state == STEPPING or sweep == 0 :
            state = SWEEPING
            next = next + step
        elif state == SWEEPING :
            next = next + 1
            if next % sweep == 0:
                state = STEPPING
    return res
