How could I reduce the time complexity of this nested loop? - python

I'm in the process of learning to optimize code and implement more data structures and algorithms in my programs; however, I'm running into difficulties with this code block.
My primary goal is to reduce this from O(n**2) time complexity, mainly by not using a nested loop.
import random
import numpy as np

numArray = np.array([[20, 12, 10, 8, 6, 4], [10, 10, 10, 10, 10, 10]], dtype=int)
rolls = [[] for i in range(len(numArray[0]))]
for i in range(len(numArray[0])):
    for j in range(numArray[1, i]):
        rolls[i].append(random.randint(1, numArray[0, i]))
The code is supposed to generate x random integers (where x is the value at index i of the second numArray subarray, e.g. 10), each between 1 and the value at index i of the first numArray subarray (e.g. 20).
Then repeat this for each index in the first numArray subarray.
(In the full dice program the numArray subarrays are user-generated integers, but I assigned fixed numbers for simplicity's sake while optimizing.)

You could use np.random.randint since you're already importing numpy. It accepts a size argument to produce multiple random values in one go. One caveat: np.random.randint excludes its upper bound, while random.randint includes it, hence the + 1 below.
rolls = [list(np.random.randint(1, numArray[0][idx] + 1, val)) for idx, val in enumerate(numArray[1])]
This of course assumes that both rows in numArray are the same length, but it should get you somewhere at least.
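If the roll counts in the second row are all equal, as in the example here, you can go further and generate everything in a single vectorized call. A minimal sketch, assuming the counts stay equal and the Generator API is available (NumPy >= 1.17):
import numpy as np

numArray = np.array([[20, 12, 10, 8, 6, 4], [10, 10, 10, 10, 10, 10]], dtype=int)

rng = np.random.default_rng()
# one draw per (roll, die) pair; integers() excludes the upper bound, hence the + 1;
# numArray[0] + 1 broadcasts across the columns so each die gets its own range
rolls = rng.integers(1, numArray[0] + 1, size=(numArray[1][0], len(numArray[0]))).T
rolls[i] is then the array of rolls for die i, with no Python-level loop over individual rolls.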

Related

Find subarray with fixed end and largest average of any length

So here I am given an array, array1, with N positive integers in [0, 10000]; N might be up to 10^8. Given that we need to fix the end point, what is the contiguous subarray ending there with the largest average possible?
For example, let array1 be [3, 1, 9, 2, 7]. Since we fix the end point, the subarray with the largest average possible is [9, 2, 7], with average 6.
I have tried a purely linear search, but Python is slow at running 10^8 loop iterations, so that on its own is not a good enough algorithm.
Time Restriction - The time limit is 4 seconds.
EDIT: I really don't have any idea how to start improving this, so any hints would be appreciated. Is it possible to reduce it to O(log n)?
Clarification: the last element must be in the subarray, and the subarray's length needs to be > 1.
You can use itertools.combinations to do this task:
from itertools import combinations

def sublists(l, minsize, endpoint):
    # create empty list of all candidate sublists
    # O(1)
    candidates = []
    # obtain the possible (start, end) index pairs; the tuple unpacking
    # requires minsize == 2, which yields O(n**2) pairs
    for start, end in combinations(range(len(l)), minsize):
        # check if the last element is the endpoint
        # O(1)
        if l[end] == endpoint:
            # add the sublist to the candidates
            # O(k) for a slice of length k
            candidates.append(l[start:end+1])
    # create tuple pairs (average, sublist)
    pairs = [(sum(x) / len(x), x) for x in candidates]
    # get the sublist with the maximum average
    return max(pairs)[1]
Which works as follows:
>>> print(sublists([3, 1, 9, 2, 7], 2, 7))
[9, 2, 7]
Note: the algorithm above is not O(log n); with minsize == 2 it is in fact O(n**2), since it enumerates every (start, end) index pair. I don't think you can do this in O(log n) time at all, since any correct algorithm must at least read the input, which is already Ω(n). If you used a balanced BST you could perhaps find the maximum average in O(log n) time, but that would only cover the search step, not the algorithm as a whole.
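For what it's worth, since the subarray must end at the last element, the only candidates are the n - 1 suffixes of length > 1, which a single vectorized pass can score. A minimal sketch of that idea (my own addition, assuming NumPy is acceptable):
import numpy as np

def best_suffix(array1):
    a = np.asarray(array1, dtype=np.int64)
    # suffix sums: sums[i] = a[i] + a[i+1] + ... + a[-1]
    sums = np.cumsum(a[::-1])[::-1]
    lengths = np.arange(len(a), 0, -1)
    averages = sums / lengths
    # drop the length-1 suffix, since the subarray's length must be > 1
    start = np.argmax(averages[:-1])
    return list(a[start:])

print(best_suffix([3, 1, 9, 2, 7]))  # [9, 2, 7]
This keeps the work at O(n) while pushing the loop into compiled code, which matters for N near 10^8 under a 4-second limit.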

Python: subset of list as equally distributed as possible?

I have a range of possible values, for example:
possible_values = range(100)
I have a list with unsystematic (but unique) numbers within that range, for example:
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
I want to create a new list of length < len(somelist) including a subset of these values but as equally distributed as possible over the range of possible values. For example:
length_newlist = 2
newlist = some_function(somelist, length_newlist, possible_values)
print(newlist)
Which would then ideally output something like
[33, 77]
So I want neither a random sample nor a sample chosen from equally spaced integers. I'd like a sample based on a distribution (here a uniform distribution) with regard to the interval of possible values.
Is there a function or an easy way to achieve this?
What about taking the values of your list that are closest to evenly spaced pivots of the range? I.e.:
def some_function(somelist, length_list, possible_values):
    a = min(possible_values)
    b = max(possible_values)
    chunk_size = (b - a) / (length_list + 1)
    new_list = []
    for i in range(1, length_list + 1):
        index = a + i * chunk_size
        new_list.append(min(somelist, key=lambda x: abs(x - index)))
    return new_list
possible_values = range(100)
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
length_newlist = 2
newlist = some_function(somelist, length_newlist, possible_values)
print(newlist)
In any case, I'd also recommend taking a look at NumPy's random sampling functions, which could help you as well.
Suppose your range is 0..N-1 and you want a list of K <= N-1 values. First define an "ideal" list of K values representing your desired distribution over the full range (I am frankly not sure exactly what that distribution would be, but hopefully you do). Then take the closest matches to those values from your randomly chosen, greater-than-K-length sublist to get your properly distributed K-length random sublist.
I think you should check the random.sample(population, k) function. It samples the population into a k-length list.
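For completeness, a usage sketch; note this draws a uniform random sample, which is not quite the distribution-aware pick the question asks for:
import random

somelist = [0, 5, 10, 15, 20, 33, 77, 99]
print(random.sample(somelist, 2))  # e.g. [15, 99]; random, not evenly spread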

Prune a list of combinations based on sets

While this question is formulated using the Python programming language, I believe it is more of a programming logic problem.
I have a list of all possible combinations, i.e.: n choose k
I can prepare such a list using
import itertools
bits_list = list(itertools.combinations(range(n), k))
If n is 100 and k is 5, then the length of bits_list will be 75287520.
Now, I want to prune this list, such that numbers appear in groups, or they don't. Let's use the following sets as an example:
Set 1: [0, 1, 2]
Set 2: [57, 58]
Set 3: [10, 15, 20, 25]
Set 4: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Here, the elements of each set must appear in a member of bits_list together, or not at all.
So far, I have only been able to think of a brute-force if-else method of solving this problem, but the number of if-else conditions would be very large that way.
Here's what I have:
bits_list = [x for x in list(itertools.combinations(range(n), k))
             if all(y in x for y in [0, 1, 2]) or
                all(y not in x for y in [0, 1, 2])]
Now, this only covers Set 1; I would like to do this for many sets. If the length of a set is greater than the value of k, we can ignore that set (for example, k = 5 and Set 4).
Note that the ultimate aim is to have k iterate over a range, say [5:25], and work on the appended list. The size of the list grows exponentially here and, computationally speaking, it gets very expensive!
With k = 10, the Python interpreter interrupts the process before completion on an average laptop with 16 GB of RAM. I need to find a solution that fits in the memory of a relatively modern server (not a cluster or a server farm).
Any help is greatly appreciated!
P.S.: Intuitively, think of this problem as generating all the possible cases for people boarding a public bus or train system. Usually, you board an entire group or you don't board anyone.
UPDATE:
For the given sets above, if k = 5, then a valid member of bits_list would be [0, 1, 2, 57, 58], i.e. a combination of Set 1 and Set 2. If k = 10, then we could have built Set1 + Set2 + Set3 + NoSetElement as a possible member. #DonkeyKong's solution made me realize I hadn't mentioned this explicitly in my question.
I have a lot of sets; I intend to use enough sets to prune the full list of combinations such that the bits_list eventually fits into memory.
#9000's suggestion is perfectly valid here, that during each iteration, I can save the combinations as actual bits.
This still gets crushed by a memory error (which I don't see how you're getting away from if you insist on a list) at a certain point (around n=90, k=5), but it is much faster than your current implementation. For n=80 and k=5, my rudimentary benchmarking had my solution at 2.6 seconds and yours around 52 seconds.
The idea is to construct the disjoint and subset parts of your filter separately. The disjoint part is trivial, and the subset part is calculated by taking the itertools.product of all disjoint combinations of length k - set_len and the individual elements of your set.
from itertools import combinations, product, chain

n = 80
k = 5

set1 = {0, 1, 2}
nots = set(range(n)) - set1

disj_part = list(combinations(nots, k))
subs_part = [tuple(chain(x, els)) for x, *els in
             product(combinations(nots, k - len(set1)), *([e] for e in set1))]
full_l = disj_part + subs_part
If you actually represented your bits as bits, that is, 0/1 values in a binary representation of an integer n bits long with exactly k bits set, the amount of RAM you'd need to store the data would be drastically smaller.
Also, you'd be able to use bit operations to check if all bits in a mask are set (value & mask == mask), or all unset (value & mask == 0).
The brute force will probably take less time than you'd spend thinking about a more clever algorithm, so it's totally OK for a one-off filtering.
If you must execute this often and quickly, and your n is in the small hundreds or less, I'd rather use Cython to implement the brute-force algorithm efficiently than look for algorithmic improvements. Modern CPUs can efficiently operate on 64-bit numbers; you won't benefit much from not comparing part of a number.
OTOH if your n is really large, and the number of sets to compare to is also large, you could partition your bits for efficient comparison.
Let's suppose you can efficiently compare a chunk of 64 bits, and your bit lists contain e.g. 100 chunks each. Then you can do the same thing you'd do with strings: compare chunk by chunk, and if one of the chunks fails to match, do not compare the rest.
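A minimal sketch of the whole-group check using bitmasks (my own illustration, not code from this answer):
def groups_ok(value, masks):
    # value encodes one combination as an int with k bits set;
    # each set must be entirely present or entirely absent
    return all(value & m == m or value & m == 0 for m in masks)

masks = [0b111, (1 << 57) | (1 << 58)]  # Set 1 and Set 2 as bitmasks
combo = 0b111 | (1 << 57) | (1 << 58)   # {0, 1, 2, 57, 58}, a valid k = 5 member
print(groups_ok(combo, masks))          # True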
A faster implementation would be to replace the if and all() statements in:
bits_list = [x for x in list(itertools.combinations(range(n), k))
             if all(y in x for y in [0, 1, 2]) or
                all(y not in x for y in [0, 1, 2])]
with Python's set operations isdisjoint() and issubset():
import itertools

n, k = 100, 5
first_set = {0, 1, 2}
bits_generator = (set(x) for x in itertools.combinations(range(n), k))
# first_set.issubset(x) mirrors all(y in x for y in [0, 1, 2]);
# x.isdisjoint(first_set) mirrors all(y not in x for y in [0, 1, 2])
filter_bits = (x for x in bits_generator
               if first_set.issubset(x) or
                  x.isdisjoint(first_set))
answer_for_first_set = list(filter_bits)
I could keep going with generators, and with generators you won't run out of memory, but you will be waiting around and hastening the heat death of the universe. Not because of Python's runtime or other implementation details, but because some problems are just not feasible even in computer time if you pick large N and K values.
Based on the ideas from #Mitch's answer, I created a solution with slightly different thinking than originally presented in the question. Instead of creating the list (bits_list) of all combinations and then pruning the combinations that do not match the sets, I built bits_list from the sets.
import itertools

all_sets = [[0, 1, 2], [3, 4, 5], [6, 7], [8], [9, 19, 29], [10, 20, 30],
            [11, 21, 31], [12, 22, 32], ... [57, 58], ... [95], [96], [97]]

bits_list = [list(itertools.chain.from_iterable(x)) for y in [1, 2, 3, 4, 5]
             for x in itertools.combinations(all_sets, y)]
Here, instead of finding n choose k and then, for each k, looping to find the combinations that match the sets, I started from the sets, and even included the individual members as single-element sets, thereby removing the need for the two components (the disjoint and the subset parts) discussed in #Mitch's answer.
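A small self-contained illustration of that construction, with tiny made-up sets just to show the shape of the output:
import itertools

all_sets = [[0, 1, 2], [3, 4], [5]]
bits_list = [list(itertools.chain.from_iterable(x)) for y in [1, 2]
             for x in itertools.combinations(all_sets, y)]
print(bits_list)
# [[0, 1, 2], [3, 4], [5], [0, 1, 2, 3, 4], [0, 1, 2, 5], [3, 4, 5]]
Every member is a union of whole sets, so the pruning condition holds by construction; filtering by combination length then replaces pruning by membership.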

how to get the index of numpy.random.choice? - python

Is it possible to modify the numpy.random.choice function in order to make it return the index of the chosen element?
Basically, I want to create a list and select elements randomly without replacement
>>> import numpy as np
>>> a = [1, 4, 1, 3, 3, 2, 1, 4]
>>> np.random.choice(a)
4
>>> a
[1, 4, 1, 3, 3, 2, 1, 4]
a.remove(np.random.choice(a)) will remove the first element of the list with that value that it encounters (a[1] in the example above), which may not be the chosen element (e.g., a[7]).
Regarding your first question, you can work the other way around: randomly choose from the indices of the array a and then fetch the value.
>>> a = [1, 4, 1, 3, 3, 2, 1, 4]
>>> a = np.array(a)
>>> np.random.choice(np.arange(a.size))
6
>>> a[6]
1
But if you just need a random sample without replacement, replace=False will do. I can't remember when it was first added to np.random.choice; it might have been 1.7.0, so if you are running a very old NumPy it may not work. Keep in mind that the default is replace=True.
Here's one way to find out the index of a randomly selected element:
import random # plain random module, not numpy's
random.choice(list(enumerate(a)))[0]
=> 4 # just an example, index is 4
Or you could retrieve the element and the index in a single step:
random.choice(list(enumerate(a)))
=> (1, 4) # just an example, index is 1 and element is 4
numpy.random.choice(a, size=however_many, replace=False)
If you want a sample without replacement, just ask numpy to make you one. Don't loop and draw items repeatedly. That'll produce bloated code and horrible performance.
Example:
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.random.choice(a, size=5, replace=False)
array([7, 5, 8, 6, 2])
On a sufficiently recent NumPy (at least 1.17), you should use the new randomness API, which fixes a longstanding performance issue where the old API's replace=False code path unnecessarily generated a complete permutation of the input under the hood:
rng = numpy.random.default_rng()
result = rng.choice(a, size=however_many, replace=False)
This is a bit out of left field compared with the other answers, but I thought it might help with what it sounds like you're trying to do in a slightly larger sense. You can generate a random sample without replacement by shuffling the indices of the elements in the source array:
source = np.random.randint(0, 100, size=100) # generate a set to sample from
idx = np.arange(len(source))
np.random.shuffle(idx)
subsample = source[idx[:10]]
This will create a sample (here, of size 10) by drawing elements from the source set (here, of size 100) without replacement.
You can interact with the non-selected elements by using the remaining index values, i.e.:
notsampled = source[idx[10:]]
Maybe late, but it's worth mentioning this solution because I think it's the simplest way to do so:
import numpy as np

a = [1, 4, 1, 3, 3, 2, 1, 4]
n = len(a)
idx = np.random.choice(list(range(n)), p=np.ones(n) / n)
It means you are choosing from the indices uniformly. In the more general case, you can do weighted sampling (and return the index) this way:
probs = [.3, .4, .2, 0, .1]
n = len(probs)  # the weights must line up one-to-one with the indices
idx = np.random.choice(list(range(n)), p=probs)
If you repeat this many times (e.g. 1e5), the histogram of the chosen indices comes out like [0.30126, 0.39817, 0.19986, 0., 0.10071] in this case, which matches probs.
Anyway, the point is to choose from the indices with the desired probabilities, and then use the values (if you need them) afterwards.
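A quick way to check that histogram claim (my own sketch):
import numpy as np

probs = [.3, .4, .2, 0, .1]
draws = np.random.choice(len(probs), size=100000, p=probs)
print(np.bincount(draws, minlength=len(probs)) / len(draws))
# e.g. [0.30126 0.39817 0.19986 0.      0.10071]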
Instead of using choice, you can also simply random.shuffle your array, i.e.
random.shuffle(a) # will shuffle a in-place
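A usage sketch of that idea (my own illustration): after an in-place shuffle, popping from the end hands you elements in random order without replacement, and the rest of a stays available:
import random

a = [1, 4, 1, 3, 3, 2, 1, 4]
random.shuffle(a)  # shuffle a in-place
x = a.pop()        # a random element, removed from a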
Based on your comment:
The sample is already a. I want to work directly with a so that I can control how many elements are still left and perform other operations with a. – HappyPy
it sounds to me like you're interested in working with a after n randomly selected elements have been removed. Instead, why not work with N = len(a) - n randomly selected elements from a? Since you want them to remain in the original order, you can select from the indices as in #CTZhu's answer, but then sort them and grab the elements from the original list:
import numpy as np

n = 3  # number to 'remove'
a = np.array([1, 4, 1, 3, 3, 2, 1, 4])
i = np.random.choice(np.arange(a.size), a.size - n, replace=False)
i.sort()
a[i]
# array([1, 4, 1, 3, 1])
So now you can save that as a again:
a = a[i]
and work with a with n elements removed.
Here is a simple solution: just choose from a range of the indices.
import numpy as np

a = [100, 400, 100, 300, 300, 200, 100, 400]
I = np.random.choice(np.arange(len(a)))
print('index is ' + str(I) + ' number is ' + str(a[I]))
The question title and its description differ a bit. I just wanted the answer to the title question, which was getting only an (integer) index from numpy.random.choice(). Rather than any of the above, I settled on index = numpy.random.choice(len(array_or_whatever)) (tested on numpy 1.21.6).
Ex:
import numpy
a = [1, 2, 3, 4]
i = numpy.random.choice(len(a))
The problem I had with the other solutions was the unnecessary conversion to list, which recreates the entire collection in a new object (slow!).
Reference: https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html?highlight=choice#numpy.random.choice
Key point from the docs about the first parameter a:
a: 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
Since the question is very old, it's possible I'm coming at this from the convenience of newer versions that support exactly what the OP and I wanted.

Inserting and removing into/from sorted list in Python

I have a sorted list of integers, L, and I have a value X that I wish to insert into the list such that L's order is maintained. Similarly, I wish to quickly find and remove the first instance of X.
Questions:
How do I use the bisect module to do the first part, if possible?
Is L.remove(X) going to be the most efficient way to do the second part? Does Python detect that the list has been sorted and automatically use a logarithmic removal process?
Example code attempts:
i = bisect_left(L, y)
L.pop(i) #works
del L[bisect_left(L, i)] #doesn't work if I use this instead of pop
You use the bisect.insort() function:
bisect.insort(L, X)
L.remove(X) will scan the whole list until it finds X. Use del L[bisect.bisect_left(L, X)] instead (provided that X is indeed in L).
Note that removing from the middle of a list is still going to incur a cost as the elements from that position onwards all have to be shifted left one step. A binary tree might be a better solution if that is going to be a performance bottleneck.
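A usage sketch of the two operations together (my own illustration, not from the answer):
import bisect

L = [1, 3, 3, 7, 9]
bisect.insort(L, 5)              # insert, keeping order: [1, 3, 3, 5, 7, 9]
del L[bisect.bisect_left(L, 3)]  # remove the first instance of 3: [1, 3, 5, 7, 9]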
You could use Raymond Hettinger's IndexableSkiplist. It performs 3 operations in O(log n) time:
insert value
remove value
lookup value by rank
import skiplist
import random

random.seed(2013)
N = 10
skip = skiplist.IndexableSkiplist(N)
data = list(range(N))  # list() so it can be shuffled in place under Python 3
random.shuffle(data)

for num in data:
    skip.insert(num)

print(list(skip))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

for num in data[:N//2]:
    skip.remove(num)

print(list(skip))
# [0, 3, 4, 6, 9]
