I would like to know how to use the python random.sample() function within a for-loop to generate multiple sample lists that are not identical.
For example, right now I have:
for i in range(3):
    sample = random.sample(range(10), k=2)
This will generate 3 sample lists containing two numbers each, but I would like to make sure none of those sample lists are identical. (It is okay if there are repeating values, i.e., (2,1), (3,2), (3,7) would be okay, but (2,1), (1,2), (5,4) would not.)
If you specifically need to "use random.sample() within a for-loop", then you could keep track of samples that you've seen, and check that new ones haven't been seen yet.
import random
seen = set()
for i in range(3):
    while True:
        sample = random.sample(range(10), k=2)
        print(f'TESTING: {sample = }')  # For demo
        fr = frozenset(sample)
        if fr not in seen:
            seen.add(fr)
            break
    print(sample)
Example output:
TESTING: sample = [0, 7]
[0, 7]
TESTING: sample = [0, 7]
TESTING: sample = [1, 5]
[1, 5]
TESTING: sample = [7, 0]
TESTING: sample = [3, 5]
[3, 5]
Here I made seen a set to allow fast lookups, and I converted sample to a frozenset so that order doesn't matter in comparisons. It has to be frozen because a set can't contain another set.
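However, this could be very slow with different inputs, especially with more iterations or a smaller range to draw samples from. In theory, its runtime is unbounded, because the loop could keep drawing samples it has already seen (in practice, random's number generator is finite), and it will never finish if you ask for more distinct samples than exist.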
Alternatives
There are other ways to do the same thing that could be much more performant. For example, you could take a big random sample, then chunk it into the desired size:
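import random

def chunks(seq, size):
    # Helper the snippet relies on: yield successive slices of length `size`
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

n = 3
k = 2
upper = 10
sample = random.sample(range(upper), k=k*n)  # requires k*n <= upper
for chunk in chunks(sample, k):
    print(chunk)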
Example output:
[6, 5]
[3, 0]
[1, 8]
With this approach, you'll never get any duplicate numbers like [[2,1], [3,2], [3,7]] because the sample contains all unique numbers.
This approach was inspired by Sven Marnach's answer on "Non-repetitive random number in numpy", which I coincidentally just read today.
It looks like you are trying to build a nested list of items drawn from the original list without repetition. You can try the code below.
import random
mylist = list(range(50))
def randomlist(mylist,k):
    length = lambda : len(mylist)
    newlist = []
    while length() >= k:
        newlist.append([mylist.pop(random.randint(0, length() - 1)) for _ in range(k)])
    newlist.append(mylist)
    return newlist
randomlist(mylist,6)
[[2, 20, 36, 46, 14, 30],
[4, 12, 13, 3, 28, 5],
[45, 37, 18, 9, 34, 24],
[31, 48, 11, 6, 19, 17],
[40, 38, 0, 7, 22, 42],
[23, 25, 47, 41, 16, 39],
[8, 33, 10, 43, 15, 26],
[1, 49, 35, 44, 27, 21],
[29, 32]]
This should do the trick.
import random
import math
# create set to store samples
a = set()
# number of distinct elements in the population
m = 10
# sample size
k = 2
# number of samples
n = 3
# this protects against an infinite loop (see Safety Note)
if n > math.comb(m, k):
    print(
        f"Error: {math.comb(m, k)} is the number of {k}-combinations "
        f"from a set of {m} distinct elements."
    )
    exit()
# the meat
while len(a) < n:
    a.add(tuple(sorted(random.sample(range(m), k = k))))
print(a)
With a set you are guaranteed to get a collection with no duplicate elements. In a set, you would be allowed to have (1, 2) and (2, 1) inside, which is why sorted is applied. So if [1, 2] is drawn, sorted([1, 2]) returns [1, 2]. And if [2, 1] is subsequently drawn, sorted([2, 1]) returns [1, 2], which won't be added to the set because (1, 2) is already in the set. We use tuple because objects in a set have to be hashable and list objects are not.
I hope this helps. Any questions, please let me know.
Safety Note
To avoid an infinite loop when you change 3 to some large number, you need to know the maximum number of possible samples of the type that you desire.
The relevant mathematical concept for this is a combination.
Suppose your first argument of random.sample() is range(m) where
m is some arbitrary positive integer. Note that this means that the
sample will be drawn from a population of m distinct members
without replacement.
Suppose that you wish to have n samples of length k in total.
The number of possible k-combinations from the set of m distinct elements is
m! / (k! * (m - k)!)
You can get this value via
from math import comb
num_comb = comb(m, k)
comb(m, k) gives the number of different ways to choose k elements from m elements without repetition and without order, which is exactly what we want.
So in the example above, m = 10, k = 2, n = 3.
With these m and k, the number of possible k-combinations from the set of m distinct elements is 45.
You need to ensure that n is at most 45 if you want to use those specific m and k and avoid an infinite loop.
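When the number of possible combinations is small enough to enumerate, the retry loop can be avoided entirely. The following is a minimal sketch under that assumption, using the same m, k, and n as above (the function name distinct_samples is made up for illustration). It lists every k-combination once and lets random.sample pick n of them, so duplicates cannot occur and an oversized n raises ValueError instead of looping forever.

```python
import itertools
import random

def distinct_samples(m, k, n):
    """Return n distinct, order-insensitive k-samples drawn from range(m)."""
    # Every possible k-combination appears exactly once, as a sorted tuple.
    all_combinations = list(itertools.combinations(range(m), k))
    # random.sample never repeats an element and raises ValueError
    # when n exceeds the number of available combinations.
    return random.sample(all_combinations, n)

print(distinct_samples(10, 2, 3))
```

This trades memory for certainty: the list holds comb(m, k) tuples, so it only suits small populations.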
So, I've been trying to make a random series generator in which each position has its own range of numbers, stored in an array:
so the possibilities are: [0-9, 0-9, 0-9, 0-59, 0-9, 0-9, 0-9].
The only problem is that I don't want any two series to be too similar: they may have no more than 2 numbers the same (in the same positions).
So here are some examples:
Good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 2]
Not good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 1]
So, if there are more than 2 numbers the same, it deletes the second one.
And the second problem is that I want 10,000 of these series.
Sorry if I didn't explain it well, the code would probably explain what I tried to explain.
TRIGGER WARNING!! CODE ISN'T EFFICIENT AT ALL!!
import random

TOTAL_SERIES = 10000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []

def create_series():
    series = []
    for i in range(len(placement_amount)):
        series.append(random.randint(0, placement_amount[i]))
    # Reject the candidate if it matches a stored series in more than 2 positions
    for i in all_series:
        count = 0
        for j in range(len(i)):
            if series[j] == i[j]:
                count += 1
        if count > 2:
            return
    all_series.append(series)

while len(all_series) < TOTAL_SERIES:
    create_series()
The code technically works, but it takes around 1 hour to generate 400 of these, since the longer it runs, the harder it gets to find a series that follows the rules.
So, my question is: how do I make it more efficient, so that it generates 10,000 series as fast as possible?
What I've tried so far:
Tried adding cuda so I'll be able to run the code on a gpu making it faster (have python 32-bit so can't)
Tried creating a few threads where each generates 10,000/threads amount and then run a code that deletes all the ones who don't follow the rules (the code just got stuck).
I'm open to hear how I can try these again but with a correct code or anything that will make it efficient.
The answer for me isn't code efficiency: it's simply impossible to make 10,000 series under this rule, because no two series may share the same first 3 numbers and there are only 10 * 10 * 10 = 1,000 possible combinations of those. So I changed the line:
if count > 2:
to
if count > 3:
Thanks everyone for the help, but if you got a way to make it more efficient it would be nice :D
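The limit mentioned above can be checked with a little arithmetic. The sketch below is an illustration only (max_series is a hypothetical helper name, and math.prod needs Python 3.8 or later). It assumes each entry of placement_amount is the inclusive maximum for its position, as in the question's code, and computes an upper bound on how many series can coexist; it does not show that the bound is actually reachable.

```python
import itertools
import math

placement_amount = [9, 9, 9, 59, 9, 9, 9]
# Each position holds a value from 0 to its maximum, so it has max + 1 choices.
choices = [amount + 1 for amount in placement_amount]

def max_series(shared_limit):
    """Upper bound on the number of series when any two series may agree
    in at most `shared_limit` positions."""
    group_size = shared_limit + 1
    # No two series may agree on every position of any group of this size,
    # so the group with the fewest value combinations caps the total.
    return min(math.prod(group)
               for group in itertools.combinations(choices, group_size))

print(max_series(2))  # bound for "no more than 2 numbers the same"
print(max_series(3))  # bound for "no more than 3 numbers the same"
```

With at most 2 shared positions the bound is 1,000, which is why the original loop could never reach 10,000; with at most 3 it is 10,000.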
Your original solution is O(P(N)*N); you can reduce it to O(P(N)) by using dicts keyed by the different index combinations:
- P(N) is the expected number of iterations to get N such series
- the constants are larger!
import itertools
import random
indexes=list(itertools.combinations(range(7),3))
big_dict={ k : {} for k in indexes }
TOTAL_SERIES = 1000  # 1000 is the upper bound (10*10*10 first triples); lower it if the loop stalls
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []
loops=0
while len(all_series) < TOTAL_SERIES:
    loops+=1
    candidate = tuple(random.randint(0, amount) for amount in placement_amount)
    if any( (candidate[index[0]],candidate[index[1]],candidate[index[2]]) in \
            big_dict[index] for index in indexes ):
        continue
    else:
        for index in indexes:
            big_dict[index][(candidate[index[0]],candidate[index[1]],candidate[index[2]])]=True
        all_series.append(candidate)
This must be the solution:
import random
def gen_series(pattern):
    return [random.randint(0, max_val) for max_val in pattern]
pattern = [9, 9, 9, 59, 9, 9, 9]
for i in range(100):
    print(gen_series(pattern))
I am trying to find the elements of an integer array or list that are unique and not divisible by any other element of the same array or list.
You can answer in any language, such as Python, Java, C, or C++.
I have tried this code in Python 3 and it works perfectly, but I am looking for a better solution in terms of time complexity.
Assume the array or list A is already sorted and contains unique elements:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
i = 0
j = 1
while i<len(A)-1:
    while j<len(A):
        if A[j]%A[i]==0:
            A.pop(j)
        else:
            j+=1
    i+=1
    j=i+1
For the given array A=[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], the answer would be ans=[2,3,5,7,11,13].
Another example: for A=[4,5,15,16,17,23,39], the answer would be ans=[4,5,17,23,39].
ans contains only unique numbers.
An element i of the array is kept only if (i%j)!=0 for every other element j (i!=j).
I think it's more natural to do it in reverse, by building a new list containing the answer instead of removing elements from the original list. If I'm thinking correctly, both approaches do the same number of mod operations, but you avoid the issue of removing an element from a list.
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
ans = []
for x in A:
    for y in ans:
        if x % y == 0:
            break
    else: ans.append(x)
Edit: Promoting the completion else.
This algorithm will perform much faster:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
if (A[-1]-A[0])/A[0] > len(A)*2:
    result = list()
    for v in A:
        for f in result:
            d,m = divmod(v,f)
            if m == 0: v=0;break
            if d<f: break
        if v: result.append(v)
else:
    retain = set(A)
    minMult = 1
    maxVal = A[-1]
    for v in A:
        if v not in retain : continue
        minMult = v*2
        if minMult > maxVal: break
        if v*len(A)<maxVal:
            retain.difference_update([m for m in retain if m >= minMult and m%v==0])
        else:
            # maxVal+1 so that maxVal itself is removed when it is a multiple of v
            retain.difference_update(range(minMult,maxVal+1,v))
        if maxVal%v == 0:
            maxVal = max(retain)
    result = sorted(retain)
print(result) # [2, 3, 5, 7, 11, 13]
In the spirit of the sieve of Eratosthenes, each number that is retained removes its multiples from the remaining eligible numbers. Depending on the magnitude of the highest value, it is sometimes more efficient to exclude multiples than to check for divisibility. The divisibility check takes several times longer for an equivalent number of factors to check.
At some point, when the data is widely spread out, assembling the result instead of removing multiples becomes faster (this last addition was inspired by Imperishable Night's post).
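Stripped of the tuning heuristics, the sieve idea described above can be sketched in a few lines. This is a simplified illustration, not the benchmarked code from this answer: the function name non_divisible is made up, and it assumes the input holds positive integers.

```python
def non_divisible(values):
    """Keep the values that are not a multiple of any other value in the input.

    Assumes positive integers; duplicates are collapsed.
    """
    remaining = set(values)
    if not remaining:
        return []
    limit = max(remaining)
    kept = []
    for v in sorted(remaining):
        if v not in remaining:
            continue  # already struck out as a multiple of a smaller value
        kept.append(v)
        # Strike out every larger multiple of v, sieve-style.
        remaining.difference_update(range(2 * v, limit + 1, v))
    return kept

print(non_divisible([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]))
print(non_divisible([4, 5, 15, 16, 17, 23, 39]))
```

Striking out multiples costs roughly limit / v steps per retained value, so this form suits inputs whose largest value is not huge compared with the list length.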
TEST RESULTS
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] (100000 repetitions)
Original: 0.55 sec
New: 0.29 sec
A = list(range(2,5000))+[9697] (100 repetitions)
Original: 3.77 sec
New: 0.12 sec
A = list(range(1001,2000))+list(range(4000,6000))+[9697**2] (10 repetitions)
Original: 3.54 sec
New: 0.02 sec
I know that this is totally insane, but I want to know what you think about this:
A = [4,5,15,16,17,23,39]
prova=[[x for x in A if x!=y and y%x==0] for y in A]
print([A[idx] for idx,x in enumerate(prova) if len(prova[idx])==0])
And I think it's still O(n^2).
If you care about speed more than algorithmic efficiency, numpy would be the package to use here in python:
import numpy as np
# Note: doesn't have to be sorted
a = [2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 29, 29]
a = np.unique(a)
result = a[np.all((a % a[:, None] + np.diag(a)), axis=0)]
# array([2, 3, 5, 7, 11, 13, 29])
This divides all elements by all other elements and stores the remainder in a matrix, checks which columns contain only non-0 values (other than the diagonal), and selects all elements corresponding to those columns.
This is O(n*M), where M is the largest integer in your list. The integers are all assumed to be positive. This also assumes your input list is sorted (I came to that assumption since all the lists you provided are sorted).
a = [4, 7, 7, 8]
# a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
# a = [4, 5, 15, 16, 17, 23, 39]
M = max(a)
used = set()
final_list = []
for e in a:
    if e in used:
        continue
    else:
        used.add(e)
        for i in range(e, M + 1):
            if not (i % e):
                used.add(i)
        final_list.append(e)
print(final_list)
Maybe this can be optimized even further...
If the list is not sorted, it must be sorted first for the above method to work. The time complexity will then be O(n log n + Mn), which reduces to O(n log n) only when M is O(log n).
I need a number of unique random permutations of a list without replacement, efficiently. My current approach:
total_permutations = math.factorial(len(population))
permutation_indices = random.sample(xrange(total_permutations), k)
k_permutations = [get_nth_permutation(population, x) for x in permutation_indices]
where get_nth_permutation does exactly what it sounds like, efficiently (meaning O(N)). However, this only works for len(population) <= 20, simply because 21! is so mindblowingly long that xrange(math.factorial(21)) won't work:
OverflowError: Python int too large to convert to C long
Is there a better algorithm to sample k unique permutations without replacement in O(N)?
Up to a certain point, it's unnecessary to use get_nth_permutation to get permutations. Just shuffle the list!
>>> import random
>>> l = list(range(21))
>>> def random_permutations(l, n):
...     while n:
...         random.shuffle(l)
...         yield list(l)
...         n -= 1
...
>>> list(random_permutations(l, 5))
[[11, 19, 6, 10, 0, 3, 12, 7, 8, 16, 15, 5, 14, 9, 20, 2, 1, 13, 17, 18, 4],
[14, 8, 12, 3, 5, 20, 19, 13, 6, 18, 9, 16, 2, 10, 4, 1, 17, 15, 0, 7, 11],
[7, 20, 3, 8, 18, 17, 4, 11, 15, 6, 16, 1, 14, 0, 13, 5, 10, 9, 2, 19, 12],
[10, 14, 5, 17, 8, 15, 13, 0, 3, 16, 20, 18, 19, 11, 2, 9, 6, 12, 7, 4, 1],
[1, 13, 15, 18, 16, 6, 19, 8, 11, 12, 10, 20, 3, 4, 17, 0, 9, 5, 2, 7, 14]]
The odds are overwhelmingly against duplicates appearing in this list for len(l) > 15 and n < 100000, but if you need guarantees, or for lower values of len(l), just use a set to record and skip duplicates if that's a concern (though as you've observed in your comments, if n gets close to len(l)!, this will stall). Something like:
def random_permutations(l, n):
    pset = set()
    while len(pset) < n:
        random.shuffle(l)
        pset.add(tuple(l))
    return pset
However, as len(l) gets longer and longer, random.shuffle becomes less reliable, because the number of possible permutations of the list increases beyond the period of the random number generator! So not all permutations of l can be generated that way. At that point, not only do you need to map get_nth_permutation over a sequence of random numbers, you also need a random number generator capable of producing every random number between 0 and len(l)! with relatively uniform distribution. That might require you to find a more robust source of randomness.
However, once you have that, the solution is as simple as Mark Ransom's answer.
To understand why random.shuffle becomes unreliable for large len(l), consider the following. random.shuffle only needs to pick random numbers between 0 and len(l) - 1. But it picks those numbers based on its internal state, and it can take only a finite (and fixed) number of states. Likewise, the number of possible seed values you can pass to it is finite. This means that the set of unique sequences of numbers it can generate is also finite; call that set s. For len(l)! > len(s), some permutations can never be generated, because the sequences that correspond to those permutations aren't in s.
What are the exact lengths at which this becomes a problem? I'm not sure. But for what it's worth, the period of the Mersenne Twister, as implemented by random, is 2**19937-1. The shuffle docs reiterate my point in a general way; see also what Wikipedia has to say on the matter.
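For a rough feel of where the period argument starts to bite, one can look for the smallest list length whose permutation count exceeds the period quoted above. This is only a sketch of that arithmetic (first_length_exceeding is a made-up name), and it gives an upper limit at best: it says nothing about how evenly shorter lists are shuffled.

```python
MT_PERIOD = 2**19937 - 1  # period of the Mersenne Twister, as quoted above

def first_length_exceeding(period):
    """Smallest n such that n! is larger than `period`."""
    n, factorial = 1, 1
    while factorial <= period:
        n += 1
        factorial *= n
    return n

print(first_length_exceeding(MT_PERIOD))
```

Python's arbitrary-precision integers make the comparison exact, so no logarithm approximations are needed.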
Instead of using xrange simply keep generating random numbers until you have as many as you need. Using a set makes sure they're all unique.
permutation_indices = set()
while len(permutation_indices) < k:
    permutation_indices.add(random.randrange(total_permutations))
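Putting this together with an index-to-permutation decoder gives a complete pipeline. The sketch below is written for Python 3 and is an assumption-laden illustration: nth_permutation is a stand-in for the question's get_nth_permutation (whose real implementation is not shown in the thread), and sample_permutations is a made-up name. The decoder uses the factorial number system, and the loop can be slow when k approaches the total number of permutations.

```python
import math
import random

def nth_permutation(items, index):
    """Decode `index` (0 <= index < n!) into a permutation of `items`
    using the factorial number system."""
    pool = list(items)
    result = []
    for remaining in range(len(pool), 0, -1):
        block = math.factorial(remaining - 1)
        position, index = divmod(index, block)
        result.append(pool.pop(position))
    return result

def sample_permutations(items, k):
    """Return k distinct random permutations of `items`."""
    items = list(items)
    total = math.factorial(len(items))
    if k > total:
        raise ValueError("k exceeds the number of permutations")
    chosen = set()
    while len(chosen) < k:
        # randrange accepts arbitrarily large Python ints
        chosen.add(random.randrange(total))
    return [nth_permutation(items, i) for i in chosen]

for perm in sample_permutations(range(25), 3):
    print(perm)
```

Because the indices live in a set, uniqueness of the permutations follows from uniqueness of the indices.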
I had one implementation of nth_permutation (not sure where I got it from), which I modified for your purpose. I believe this would be fast enough to suit your needs.
>>> import math, random
>>> def get_nth_permutation(population):
...     total_permutations = math.factorial(len(population))
...     while True:
...         temp_population = list(population)
...         # randrange keeps n in 0..total_permutations-1, so d is always a valid index
...         n = random.randrange(total_permutations)
...         size = len(temp_population)
...         def generate(s,n,population):
...             for x in range(s-1,-1,-1):
...                 fact = math.factorial(x)
...                 d = n//fact
...                 n -= d * fact
...                 yield temp_population[d]
...                 temp_population.pop(d)
...         next_perm = generate(size,n,population)
...         yield [e for e in next_perm]
...
>>> nth_perm = get_nth_permutation(range(21))
>>> [next(nth_perm) for k in range(1,10)]
You seem to be searching for the Knuth Shuffle! Good luck!
You could use itertools.islice instead of xrange():
CPython implementation detail: xrange() is intended to be simple and
fast. Implementations may impose restrictions to achieve this. The C
implementation of Python restricts all arguments to native C longs
(“short” Python integers), and also requires that the number of
elements fit in a native C long. If a larger range is needed, an
alternate version can be crafted using the itertools module:
islice(count(start, step), (stop-start+step-1+2*(step<0))//step).
I am after a string format to efficiently represent a set of indices.
For example "1-3,6,8-10,16" would produce [1,2,3,6,8,9,10,16]
Ideally I would also be able to represent infinite sequences.
Is there an existing standard way of doing this? Or a good library? Or can you propose your own format?
thanks!
Edit: Wow! - thanks for all the well considered responses. I agree I should use ':' instead. Any ideas about infinite lists? I was thinking of using "1.." to represent all positive numbers.
The use case is for a shopping cart. For some products I need to restrict product sales to multiples of X, for others any positive number. So I am after a string format to represent this in the database.
You don't need a string for that, This is as simple as it can get:
from types import SliceType
class sequence(object):
    def __getitem__(self, item):
        for a in item:
            if isinstance(a, SliceType):
                i = a.start
                step = a.step if a.step else 1
                while True:
                    if a.stop and i > a.stop:
                        break
                    yield i
                    i += step
            else:
                yield a
print list(sequence()[1:3,6,8:10,16])
Output:
[1, 2, 3, 6, 8, 9, 10, 16]
I'm using Python slice type power to express the sequence ranges. I'm also using generators to be memory efficient.
Please note that I treat the slice stop as inclusive (the loop only stops once i > a.stop); otherwise the ranges would be different, because the stop in slices is normally not included.
It supports steps:
>>> list(sequence()[1:3,6,8:20:2])
[1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 20]
And infinite sequences:
sequence()[1:3,6,8:]
1, 2, 3, 6, 8, 9, 10, ...
If you have to give it a string, then you can combine @ilya n.'s parser with this solution. I'll extend @ilya n.'s parser to support indexes as well as ranges:
def parser(input):
    ranges = [a.split('-') for a in input.split(',')]
    return [slice(*map(int, a)) if len(a) > 1 else int(a[0]) for a in ranges]
Now you can use it like this:
>>> print list(sequence()[parser('1-3,6,8-10,16')])
[1, 2, 3, 6, 8, 9, 10, 16]
If you're into something Pythonic, I think 1:3,6,8:10,16 would be a better choice, as x:y is a standard notation for index range and the syntax allows you to use this notation on objects. Note that the call
z[1:3,6,8:10,16]
gets translated into
z.__getitem__((slice(1, 3, None), 6, slice(8, 10, None), 16))
Even though this is a TypeError if z is a built-in container, you're free to create the class that will return something reasonable, e.g. as NumPy's arrays.
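To see that translation for yourself, a tiny class whose __getitem__ simply returns its argument is enough. This is a Python 3 sketch; ShowKey is a made-up name for illustration.

```python
class ShowKey:
    """Returns whatever key object Python passes to __getitem__."""
    def __getitem__(self, key):
        return key

key = ShowKey()[1:3, 6, 8:10, 16]
print(key)
```

The key arrives as a tuple that mixes slice objects and plain integers, which is exactly what a custom container has to unpack.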
You might also say that by convention 5: and :5 represent infinite index ranges (this is a bit stretched as Python has no built-in types with negative or infinitely large positive indexes).
And here's the parser (a beautiful one-liner with one glitch: a lone index such as 16 becomes slice(16), which is slice(None, 16, None) rather than a single index):
def parse(s):
    return [slice(*map(int, x.split(':'))) for x in s.split(',')]
There's one pitfall, however: 8:10 by definition includes only indices 8 and 9, because the upper bound is excluded. If that's unacceptable for your purposes, you certainly need a different format, and 1-3,6,8-10,16 looks good to me. The parser then would be
def myslice(start, stop=None, step=None):
    return slice(start, (stop if stop is not None else start) + 1, step)
def parse(s):
    return [myslice(*map(int, x.split('-'))) for x in s.split(',')]
Update: here's the full parser for a combined format:
from sys import maxsize as INF
def indices(s: 'string with indices list') -> 'indices generator':
    for x in s.split(','):
        splitter = ':' if (':' in x) or (x[0] == '-') else '-'
        ix = x.split(splitter)
        start = int(ix[0]) if ix[0] != '' else -INF
        if len(ix) == 1:
            # single index: make the range below cover exactly `start`
            stop = start + (splitter == ':')
        else:
            stop = int(ix[1]) if ix[1] != '' else INF
        step = int(ix[2]) if len(ix) > 2 else 1
        for y in range(start, stop + (splitter == '-'), step):
            yield y
This handles negative numbers as well, so
print(list(indices('-5, 1:3, 6, 8:15:2, 20-25, 18')))
prints
[-5, 1, 2, 6, 8, 10, 12, 14, 20, 21, 22, 23, 24, 25, 18]
Yet another alternative is to use ... (which Python recognizes as the built-in constant Ellipsis so you can call z[...] if you want) but I think 1,...,3,6, 8,...,10,16 is less readable.
This is probably about as lazily as it can be done, meaning it will be okay for even very large lists:
def makerange(s):
    for nums in s.split(","): # whole list comma-delimited
        range_ = nums.split("-") # number might have a dash - if not, no big deal
        start = int(range_[0])
        for i in xrange(start, start + 1 if len(range_) == 1 else int(range_[1]) + 1):
            yield i
s = "1-3,6,8-10,16"
print list(makerange(s))
output:
[1, 2, 3, 6, 8, 9, 10, 16]
import sys
class Sequencer(object):
    def __getitem__(self, items):
        if not isinstance(items, (tuple, list)):
            items = [items]
        for item in items:
            if isinstance(item, slice):
                for i in xrange(*item.indices(sys.maxint)):
                    yield i
            else:
                yield item
>>> s = Sequencer()
>>> print list(s[1:3,6,8:10,16])
[1, 2, 6, 8, 9, 16]
Note that I am using the xrange builtin to generate the sequence. That seems awkward at first because it doesn't include the upper number of sequences by default, however it proves to be very convenient. You can do things like:
>>> print list(s[1:10:3,5,5,16,13:5:-1])
[1, 4, 7, 5, 5, 16, 13, 12, 11, 10, 9, 8, 7, 6]
Which means you can use the step part of xrange.
This looked like a fun puzzle to go with my coffee this morning. If you settle on your given syntax (which looks okay to me, with some notes at the end), here is a pyparsing converter that will take your input string and return a list of integers:
from pyparsing import *
integer = Word(nums).setParseAction(lambda t : int(t[0]))
intrange = integer("start") + '-' + integer("end")
def validateRange(tokens):
    # use the result names defined on intrange above ("start" and "end")
    if tokens.start > tokens.end:
        raise Exception("invalid range, start must be <= end")
intrange.setParseAction(validateRange)
intrange.addParseAction(lambda t: list(range(t.start, t.end+1)))
indices = delimitedList(intrange | integer)
def mergeRanges(tokens):
    ret = set()
    for item in tokens:
        if isinstance(item,int):
            ret.add(item)
        else:
            ret.update(item)
    return sorted(ret)
indices.setParseAction(mergeRanges)
test = "1-3,6,8-10,16"
print indices.parseString(test)
This also takes care of any overlapping or duplicate entries, such as "3-8,4,6,3,4", and returns a list of just the unique integers.
The parser takes care of validating that ranges like "10-3" are not allowed. If you really wanted to allow this, and have something like "1,5-3,7" return 1,5,4,3,7, then you could tweak the intrange and mergeRanges parse actions to get this simpler result (and discard the validateRange parse action altogether).
You are very likely to get whitespace in your expressions; I assume that this is not significant. "1, 2, 3-6" would be handled the same as "1,2,3-6". Pyparsing does this by default, so you don't see any special whitespace handling in the code above (but it's there...)
This parser does not handle negative indices, but if that were needed too, just change the definition of integer to:
integer = Combine(Optional('-') + Word(nums)).setParseAction(lambda t : int(t[0]))
Your example didn't list any negatives, so I left it out for now.
Python uses ':' for a ranging delimiter, so your original string could have looked like "1:3,6,8:10,16", and Pascal used '..' for array ranges, giving "1..3,6,8..10,16" - meh, dashes are just as good as far as I'm concerned.
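For the dash format plus the open-ended "1.." notation floated in the question's edit, a lazy Python 3 generator is one possible sketch. The function name iter_indices and the exact meaning given to ".." are assumptions for illustration, and the sketch handles non-negative integers only.

```python
from itertools import count, islice

def iter_indices(spec):
    """Lazily expand a spec such as "1-3,6,8-10,16" or "1-2,5.."."""
    for part in spec.split(","):
        part = part.strip()
        if part.endswith(".."):
            # Assumed meaning: "5.." is every integer from 5 upwards.
            yield from count(int(part[:-2]))
        elif "-" in part:
            start, stop = map(int, part.split("-"))
            yield from range(start, stop + 1)  # the dash range is inclusive
        else:
            yield int(part)

print(list(iter_indices("1-3,6,8-10,16")))
print(list(islice(iter_indices("1-2,5.."), 5)))
```

Since an open-ended part never finishes, it only makes sense as the last part of a spec, and callers need something like islice to bound the iteration.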