Random generation in heavily nested loops


I have a little game I've been making for a school project, and it has worked up until now.
I used a very messy nested list system for multiple screens, each with a 2D array for objects on screen. These 2D "level" arrays are also arranged in their own 2D array, which makes up the "world". The strings correspond to an object tile, which is drawn using pygame.
My problem is that every level array is the same in the world array, and I can't understand why that is.
from random import randrange
def generate_world(load):
    # This bit not important
    if load is True:
        in_array()
    # This is
    else:
        for world_y in Game_world.game_array:
            for world_x in world_y:
                generate_clutter(world_x)
        print Game_world.game_array
        out_array()
        # Current_level.array = Level.new_level_array
def generate_clutter(world_x):
    for level_y in world_x:
        for level_x, _ in enumerate(level_y):
            ### GENERATE CLUTTER ###
            i = randrange(1, 24)
            if i == 19 or i == 20:
                level_y[level_x] = "g1"
            elif i == 21 or i == 22:
                level_y[level_x] = "g2"
            elif i == 23:
                level_y[level_x] = "c1"
            else:
                level_y[level_x] = "-"
I'm sure it's something simple I'm overlooking, but to me it seems the random generation should be carried out for every single list item individually, so I can't understand the duplication.
I know quadruple nested lists aren't pretty, but I think I'm in too deep to make any serious changes now.
EDIT:
This is the gist of how the lists/arrays are initially created. Their size never changes; existing strings are just replaced.
class World:
    def __init__(self, name, load):
        if load is False:
            n = [["-" for x in range(20)] for x in range(15)]
            self.game_array = [[n, n, n, n, n, n, n],
                               [n, n, n, n, n, n, n],
                               [n, n, n, n, n, n, n]]

In Python, everything is an object - even integer values. How you initialize an 'empty' array can have some surprising results.
Consider this initialization:
>>> l=[[1]*2]*2
>>> l
[[1, 1], [1, 1]]
You appear to have created a 2x2 matrix with each cell containing the value 1. In fact, you have created a list of two lists (each containing [1,1]). Deeper still, you have created a list of two references to a single list [1,1].
The results of this can be seen if you now modify one of the cells:
>>> l[0][0]=2
>>> l
[[2, 1], [2, 1]]
>>>
Notice that both l[0][0] and l[1][0] were modified.
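The sharing can be confirmed directly with the `is` operator, which tests object identity rather than equality:

```python
l = [[1] * 2] * 2
# The outer "* 2" copies the reference to the inner list, not the list itself.
print(l[0] is l[1])          # True: one list object reachable from two positions
print(id(l[0]) == id(l[1]))  # True: same identity
```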
To avoid this effect, you need to jump through some hoops:
>>> l2 = [[1 for _ in range(2)] for _ in range(2)]
>>> l2
[[1, 1], [1, 1]]
>>> l2[0][0]=2
>>> l2
[[2, 1], [1, 1]]
>>>
If you used the former approach to initialize Game_world.game_array, every assignment to level_y[level_x] will modify multiple cells in your array.
As an additional comment, your generate_clutter function can be simplified slightly using a dict:
def generate_clutter(world_x):
    clutter_map = {19: "g1", 20: "g1", 21: "g2", 22: "g2", 23: "c1"}
    for level_y in world_x:
        for level_x, _ in enumerate(level_y):
            level_y[level_x] = clutter_map.get(randrange(1, 24), '-')
This separates the logic of selecting the clutter representation from the actual mapping of values and will be much easier to expand and maintain.
Looking at your edit, n is a single level list that appears 21 times in game_array, so every "level" is the same object. The initialization needs to be something like:
self.game_array = [
    [
        [
            ["-" for _ in range(20)]
            for _ in range(15)
        ]
        for _ in range(7)
    ]
    for _ in range(3)
]
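As a sanity check, here is a minimal sketch (the function name `new_world` is hypothetical; the 3 x 7 world of 15 x 20 levels comes from the edit) that builds the world with nested comprehensions and confirms that all 21 levels are distinct objects:

```python
def new_world(world_rows=3, world_cols=7, level_rows=15, level_cols=20):
    # Hypothetical helper: each comprehension re-evaluates its body, so every
    # level, and every row inside a level, is a brand-new list.
    return [
        [
            [["-" for _ in range(level_cols)] for _ in range(level_rows)]
            for _ in range(world_cols)
        ]
        for _ in range(world_rows)
    ]

world = new_world()
levels = [level for world_row in world for level in world_row]
print(len(levels), len({id(level) for level in levels}))  # 21 21

# Writing one tile now only changes that one level.
world[0][0][0][0] = "g1"
print(world[0][1][0][0])  # -
```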

Related

how to avoid List assignment index out of range [duplicate]

How do I create an empty list that can hold 10 elements?
After that, I want to assign values in that list. For example:
xs = list()
for i in range(0, 9):
    xs[i] = i
However, that gives IndexError: list assignment index out of range. Why?
Editor's note:
In Python, lists do not have a set capacity, but it is not possible to assign to elements that aren't already present. Answers here show code that creates a list with 10 "dummy" elements to replace later. However, most beginners encountering this problem really just want to build a list by adding elements to it. That should be done using the .append method, although there will often be problem-specific ways to create the list more directly. Please see the questions "Why does this iterative list-growing code give IndexError: list assignment index out of range?" and "How can I repeatedly add elements to a list?" for details.

You cannot assign to a list like xs[i] = value unless the list is already initialized with at least i+1 elements. Instead, use xs.append(value) to add elements to the end of the list. (You could use the assignment notation if you were using a dictionary instead of a list.)
Creating a list of 10 placeholder elements:
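A small sketch showing both behaviours side by side: assignment to a missing index raises IndexError, while append grows the list:

```python
xs = []
try:
    xs[0] = 0                  # index 0 does not exist yet
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)  # IndexError: list assignment index out of range

for i in range(10):
    xs.append(i)               # append adds one element at the end
print(xs)                      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```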
>>> xs = [None] * 10
>>> xs
[None, None, None, None, None, None, None, None, None, None]
Assigning a value to an existing element of the above list:
>>> xs[1] = 5
>>> xs
[None, 5, None, None, None, None, None, None, None, None]
Keep in mind that something like xs[15] = 5 would still fail, as our list has only 10 elements.
In Python 2, range(x) creates the list [0, 1, 2, ..., x-1]:
# 2.X only. Use list(range(10)) in 3.X.
>>> xs = range(10)
>>> xs
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Using a function to create a list:
>>> def display():
...     xs = []
...     for i in range(9): # This is just to tell you how to create a list.
...         xs.append(i)
...     return xs
...
>>> print(display())
[0, 1, 2, 3, 4, 5, 6, 7, 8]
List comprehension (using the squares, because for a plain range you don't need to do all this; you can just return range(0,9)):
>>> def display():
...     return [x**2 for x in range(9)]
...
>>> print(display())
[0, 1, 4, 9, 16, 25, 36, 49, 64]

Try this instead:
lst = [None] * 10
The above will create a list of size 10, where each position is initialized to None. After that, you can add elements to it:
lst = [None] * 10
for i in range(10):
    lst[i] = i
Admittedly, that's not the Pythonic way to do things. Better to do this:
lst = []
for i in range(10):
    lst.append(i)
Or even simpler, in Python 2.x you can do this to initialize a list with values from 0 to 9:
lst = range(10)
And in Python 3.x:
lst = list(range(10))

varunl's currently accepted answer
>>> l = [None] * 10
>>> l
[None, None, None, None, None, None, None, None, None, None]
Works well for immutable values like numbers. Unfortunately, if you want to create a list of lists, you will run into referencing errors, because every element refers to the same inner list. Example in Python 2.7.6:
>>> a = [[]]*10
>>> a
[[], [], [], [], [], [], [], [], [], []]
>>> a[0].append(0)
>>> a
[[0], [0], [0], [0], [0], [0], [0], [0], [0], [0]]
>>>
As you can see, each element is pointing to the same list object. To get around this, you can create a method that will initialize each position to a different object reference.
def init_list_of_objects(size):
    list_of_objects = list()
    for i in range(0, size):
        list_of_objects.append(list())  # different object reference each time
    return list_of_objects
>>> a = init_list_of_objects(10)
>>> a
[[], [], [], [], [], [], [], [], [], []]
>>> a[0].append(0)
>>> a
[[0], [], [], [], [], [], [], [], [], []]
>>>
There is likely a default, built-in Python way of doing this (instead of writing a function), but I'm not sure what it is. I would be happy to be corrected!
Edit: It's [[] for _ in range(10)]
Example:
>>> [[random.random() for _ in range(2)] for _ in range(5)]
[[0.7528051908943816, 0.4325669600055032], [0.510983236521753, 0.7789949902294716], [0.09475179523690558, 0.30216475640534635], [0.3996890132468158, 0.6374322093017013], [0.3374204010027543, 0.4514925173253973]]

There are two "quick" methods:
x = length_of_your_list
a = [None]*x
# or
a = [None for _ in xrange(x)]  # xrange is Python 2 only; use range in Python 3
It appears that [None]*x is faster:
>>> from timeit import timeit
>>> timeit("[None]*100",number=10000)
0.023542165756225586
>>> timeit("[None for _ in xrange(100)]",number=10000)
0.07616496086120605
But if you are OK with a range (e.g. [0, 1, 2, 3, ..., x-1]), then range(x) might be fastest (this is Python 2, where range returns a list):
>>> timeit("range(100)",number=10000)
0.012513160705566406
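The timings above were taken in Python 2 (note xrange). In Python 3, range(100) returns a lazy range object, so timing it alone is not comparable to building a list. A minimal sketch for re-running the comparison in Python 3; results vary by machine and interpreter:

```python
from timeit import timeit

# Python 3 equivalents of the statements above: xrange is gone, and range()
# is lazy, so it is wrapped in list() to build an actual list.
statements = [
    "[None] * 100",
    "[None for _ in range(100)]",
    "list(range(100))",
]
results = {stmt: timeit(stmt, number=10000) for stmt in statements}
for stmt, seconds in results.items():
    print(f"{stmt:<28} {seconds:.4f} s")
```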

You can .append(element) to the list, e.g.:
xs.append(i)
What you are currently trying to do is access an element (xs[i]) that does not exist.

I'm surprised nobody suggested this simple approach to creating a list of empty lists. This is an old thread, but I'm adding it for completeness. This will create a list of 10 empty lists:
x = [[] for i in range(10)]

The accepted answer has some gotchas. For example:
>>> a = [{}] * 3
>>> a
[{}, {}, {}]
>>> a[0]['hello'] = 5
>>> a
[{'hello': 5}, {'hello': 5}, {'hello': 5}]
>>>
So each dictionary refers to the same object. The same holds true if you initialize with lists or other mutable objects.
You could do this instead:
>>> b = [{} for i in range(0, 3)]
>>> b
[{}, {}, {}]
>>> b[0]['hello'] = 6
>>> b
[{'hello': 6}, {}, {}]
>>>

How do I create an empty list that can hold 10 elements?
All lists can hold as many elements as you like, subject only to the limit of available memory. The only "size" of a list that matters is the number of elements currently in it.
However, that gives IndexError: list assignment index out of range. Why?
The first time through the loop, i is equal to 0. Thus, we attempt xs[0] = 0. This does not work because there are currently 0 elements in the list, so 0 is not a valid index.
We cannot use indexing to write list elements that don't already exist - we can only overwrite existing ones. Instead, we should use the .append method:
xs = list();
for i in range(0, 9):
    xs.append(i)
The next problem you will note is that your list will actually have only 9 elements, because the end point is skipped by the range function. (As side notes: [] works just as well as list(), the semicolon is unnecessary, and only one parameter is needed for range if you're starting from 0.) Addressing those issues gives:
xs = []
for i in range(10):
    xs.append(i)
However, this is still missing the mark - range is not some magical keyword that's part of the language the way for (or, say, def) is.
In 2.x, range is a function, which directly returns the list that we already wanted:
xs = range(10) # 2.x specific!
# In 3.x, we don't get a list; we can do a lot of things with the
# result, but we can't e.g. append or replace elements.
In 3.x, range is a cleverly designed class, and range(10) creates an instance. To get the desired list, we can simply feed it to the list constructor:
xs = list(range(10)) # correct in 3.x, redundant in 2.x

One simple way to create a 2D matrix of size n using nested list comprehensions:
m = [[None for _ in range(n)] for _ in range(n)]

I'm a bit surprised that the easiest way to create an initialised list is not in any of these answers. Just pass a range to the list constructor:
list(range(9))

Another option is to use NumPy for fixed-size arrays (of object references):
pip install numpy
import numpy as np
a = np.empty(10, dtype=object)
a[1] = 2
a[5] = "john"
a[3] = []
If you just want numbers, you can do this with NumPy:
a = np.arange(10)
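A runnable sketch of the object-array approach. It passes the builtin object as the dtype, because the np.object alias was deprecated in NumPy 1.20 and removed in 1.24. np.empty fills an object array with None, so the slots that were never assigned show up as None:

```python
import numpy as np

a = np.empty(10, dtype=object)   # 10 slots, each initially None
a[1] = 2
a[5] = "john"
print(a.tolist())  # [None, 2, None, None, None, 'john', None, None, None, None]

b = np.arange(10)                # integers 0..9
print(b.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```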

Here's my code for a 2D list in Python, which reads the number of rows from the input:
empty = []
row = int(input())
for i in range(row):
    temp = list(map(int, input().split()))
    empty.append(temp)
for i in empty:
    for j in i:
        print(j, end=' ')
    print('')

A list is always "iterable" and you can always add new elements to it:
insert: list.insert(indexPosition, value)
append: list.append(value)
extend: list.extend(value)
In your case, you had instantiated an empty list of length 0. Therefore, when you try to add any value to the list using the list index (i), it is referring to a location that does not exist. That is why you were getting the error "IndexError: list assignment index out of range".
You can try this instead:
s1 = list()
for i in range(0,9):
    s1.append(i)
print(s1)
To create a list of size 10 (let's say), you can first create an array with np.empty(10) and then convert it to a list using arrayName.tolist(). Alternatively, you can chain the two calls (note that np.empty leaves the values uninitialized, so the list will contain arbitrary floats):
np.empty(10).tolist()

I came across this SO question while searching for a similar problem. I had to build a 2D array and then replace some elements of each list (in the 2D array) with elements from a dict.
I then came across this SO question, which helped me; maybe it will help other beginners as well.
The key trick was to initialize the 2D array as a NumPy array and then use array[i,j] instead of array[i][j].
For reference, this is the piece of code where I had to use this:
nd_array = []
for i in range(30):
    nd_array.append(np.zeros(shape=(32, 1)))
new_array = []
for i in range(len(lines)):
    new_array.append(nd_array)
new_array = np.asarray(new_array)
for i in range(len(lines)):
    splits = lines[i].split(' ')
    for j in range(len(splits)):
        #print(new_array[i][j])
        new_array[i,j] = final_embeddings[dictionary[str(splits[j])]-1].reshape(32,1)
Now I know we can use list comprehensions, but for simplicity's sake I am using a nested for loop. Hope this helps others who come across this post.
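Since the loop above only produces a zero-filled array of shape (len(lines), 30, 32, 1), a sketch that allocates it in one call with np.zeros may be simpler. Here `n_lines` and the random `vector` are hypothetical stand-ins for `len(lines)` and one row of `final_embeddings`:

```python
import numpy as np

n_lines, n_words, dim = 4, 30, 32          # hypothetical sizes
new_array = np.zeros((n_lines, n_words, dim, 1))

rng = np.random.default_rng(0)
vector = rng.random(dim)                    # stand-in for one embedding row
# Tuple indexing [i, j] selects the (dim, 1) slot for word j of line i.
new_array[2, 5] = vector.reshape(dim, 1)
print(new_array.shape)                      # (4, 30, 32, 1)
```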

Not technically a list, but similar to a list in terms of functionality, and it has a fixed maximum length:
from collections import deque
my_deque_size_10 = deque(maxlen=10)
If it's full, i.e. it already holds 10 items, then appending another item discards the item at index 0 (FIFO). You can also append at either end: appendleft on a full deque discards the item at the opposite end.
Used in, say:
a rolling average of stats
piping a list through it, i.e. sliding a window over a list until you get a match against another deque object.
If you need a list when it's full, just use list(deque_object).
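A short sketch of the maxlen behaviour and of the rolling-average use case (the function `rolling_average` is hypothetical):

```python
from collections import deque

window = deque(maxlen=3)
for value in [1, 2, 3, 4]:
    window.append(value)   # once full, append drops the leftmost item
print(window)              # deque([2, 3, 4], maxlen=3)

window.appendleft(0)       # appendleft drops the rightmost item instead
print(window)              # deque([0, 2, 3], maxlen=3)

def rolling_average(values, size):
    """Hypothetical helper: yield the mean of the last `size` values seen."""
    recent = deque(maxlen=size)
    for v in values:
        recent.append(v)
        yield sum(recent) / len(recent)

print(list(rolling_average([2, 4, 6, 8], 2)))  # [2.0, 3.0, 5.0, 7.0]
```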

s1 = []
for i in range(11):
    s1.append(i)
print(s1)
To create a list, just use these brackets: []
To add something to a list, use list.append()

Make it more reusable as a function.
def createEmptyList(length, fill=None):
    '''
    Return a list of the given length with every element set to fill.
    Example:
    print(createEmptyList(3, -1))
    >> [-1, -1, -1]
    print(createEmptyList(4))
    >> [None, None, None, None]
    '''
    return [fill] * length
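[fill] * length has the same sharing problem described in other answers when fill is mutable. A sketch of a variant that calls a factory once per slot (the name `create_filled_list` is hypothetical):

```python
def create_filled_list(length, factory=lambda: None):
    """Hypothetical helper: call factory() once per slot, so mutable
    fills (lists, dicts) are not shared between slots."""
    return [factory() for _ in range(length)]

shared = [[]] * 3                  # what [fill] * length does with fill=[]
shared[0].append("x")
print(shared)                      # [['x'], ['x'], ['x']]

separate = create_filled_list(3, list)
separate[0].append("x")
print(separate)                    # [['x'], [], []]
print(create_filled_list(4))       # [None, None, None, None]
```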

This code generates a list that contains 10 random numbers.
import random
numrand = []
for i in range(0, 10):
    a = random.randint(1, 50)
    numrand.append(a)
    print(a, i)
print(numrand)

How to modify this power set function to only include 'valid' subsets?

I am using a binomial tree as a list with 2^T - 1 'nodes' and want to create a set of subsets that satisfy some given criteria (outlined below) on the elements of the list. Right now, I use the following code to generate a tree:
def gen_nodes(T):
    nodes = []
    for t in range(T):
        for i in range(2**t):
            nodes += [[t, 1 + i]]
    return nodes
For instance, for T = 1, we get the root
gen_nodes(1) = [[0,1]],
but for T = 2 and T = 3, we get
gen_nodes(2) = [[0,1], [1,1], [1,2]]
gen_nodes(3) = [[0,1], [1,1], [1,2], [2,1], [2,2], [2,3], [2,4]],
et cetera. Right now, I'm using a powerset function courtesy of this wonderful contributor:
def powerset(s):
    x = len(s)
    masks = [1 << i for i in range(x)]
    for i in range(1 << x):
        yield [ss for mask, ss in zip(masks, s) if i & mask]
This has worked great for me, but as I'm sure you've noticed by now, the number of subsets grows far too quickly: with 2^T - 1 nodes there are 2^(2^T - 1) subsets, something like O(2^(2^T)). Initially, I was going to create the entire set by brute force and then apply the constraints on valid subsets afterwards, but it seems like I'm going to run into overflow problems if I don't build those constraints into the powerset function.
Basically, I only want the lists e within the output of ls = list(powerset(gen_nodes(T))) such that for every node [t, i] in e with t > 0, [t - 1, i] or [t - 1, i - 1] is also in e.
Returning to the binary tree analogy, this says that [t, i] can be in e only if [t - 1, i] or [t - 1, i - 1] is in e; in other words, if [t, i] is in e, there must be at least one "path" from [0, 1] to [t, i] where each node in the path is also in e. I suspect this will reduce the number of subsets immensely, but I'm unsure how to implement it. I think I might have to forgo using the powerset function, but I'm not sure how to code it in that case, and would therefore appreciate any help I can get.
EDIT: I should include the desired output, as commented. Additionally, I've included the function that has been 'working' for me thus far, but it's horribly inefficient. First, let pset be the function that solves this problem, and let pg(i) = pset(gen_nodes(i)) for brevity. Then
pg(1) = [[[0, 1]]],
pg(2) = [[[0, 1]], [[0, 1], [1, 1]], [[0, 1], [1, 2]], [[0, 1], [1, 1], [1, 2]]]
Unfortunately, this set still grows very fast (pg(3) is 17 lists of length up to 6 pairs, pg(4) is 97 lists of length up to 10 pairs, etc), so I can't share much more on this post. However, I did develop a function that works, but seems to be horribly inefficient (pg(6) takes half a second, and then pg(7) takes 4 minutes to complete). It is attached below:
import time
def pset(lst):
    pw_set = [[]]
    start_time = time.time()
    for i in range(0, len(lst)):
        for j in range(0, len(pw_set)):
            ele = pw_set[j].copy()
            if lst[i] == [0, 1]:
                ele = ele + [lst[i]]
                pw_set = pw_set + [ele]
            else:
                if [lst[i][0] - 1, lst[i][1]] in ele or [lst[i][0] - 1, lst[i][1] - 1] in ele:
                    ele = ele + [lst[i]]
                    pw_set = pw_set + [ele]
    print("--- %s seconds ---" % (time.time() - start_time))
    return pw_set[1:]
Here, I just checked if the 'node' being added had at least one of the nodes preceding it in the set: if not, it was skipped. I checked up to pg(3) and the output is as desired, so I'm thinking it's working, just inefficient. Thus, I've (seemingly) solved the memory overflow problem, now I just need to make this efficient.
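Because the constraint only looks one level up, the valid subsets can be grown level by level instead of being filtered out of the full power set. The following is a minimal sketch under that reading of the rule (the names `valid_subsets` and `grow` are hypothetical, not from the question): at each level it only offers nodes whose parent [t - 1, i] or [t - 1, i - 1] was chosen on the previous level, so invalid subsets are never built. For T = 2 it reproduces the pg(2) list above, and for T = 3 it yields 17 subsets.

```python
from itertools import combinations

def valid_subsets(T):
    """Hypothetical sketch: yield every subset of the T-level tree that
    contains the root [0, 1] and in which each node [t, i] (t > 0) has
    [t - 1, i] or [t - 1, i - 1] in the subset."""
    def grow(chosen, prev_level, t):
        # Choosing nothing at level t is always valid (and ends the subset).
        yield chosen
        if t == T:
            return
        # Level t holds nodes 1 .. 2**t; keep only those with a chosen parent.
        candidates = [i for i in range(1, 2 ** t + 1)
                      if i in prev_level or i - 1 in prev_level]
        for k in range(1, len(candidates) + 1):
            for level in combinations(candidates, k):
                yield from grow(chosen + [[t, i] for i in level], set(level), t + 1)

    yield from grow([[0, 1]], {1}, 1)

print(list(valid_subsets(2)))
# [[[0, 1]], [[0, 1], [1, 1]], [[0, 1], [1, 2]], [[0, 1], [1, 1], [1, 2]]]
print(sum(1 for _ in valid_subsets(3)))  # 17
```

Since every yielded subset is valid, the work should grow with the number of valid subsets rather than with 2^(2^T - 1); the output itself still grows very quickly with T.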

Optimize testing all combinations of rows from multiple NumPy arrays

I have three NumPy arrays of ints, same number of columns, arbitrary number of rows each. I am interested in all instances where a row of the first one plus a row of the second one gives a row of the third one ([3, 1, 4] + [1, 5, 9] = [4, 6, 13]).
Here is some pseudocode:
for i, j in rows(array1), rows(array2):
    if i + j is in rows(array3):
        somehow store the rows this occurred at (e.g. (1,2,5) if the 1st row of
        array1 + the 2nd row of array2 give the 5th row of array3)
I will need to run this for very big matrices so I have two questions:
(1) I can write the above using nested loops but is there a quicker way, perhaps list comprehensions or itertools?
(2) What is the fastest/most memory-efficient way to store the triples? Later I will need to create a heatmap using two as coordinates and the first one as the corresponding value eg. point (2,5) has value 1 in the pseudo-code example.
Would be very grateful for any tips - I know this sounds quite simple but it needs to run fast and I have very little experience with optimization.
edit: My ugly code was requested in comments
import numpy as np
#random arrays
A = np.array([[-1,0],[0,-1],[4,1], [-1,2]])
B = np.array([[1,2],[0,3],[3,1]])
C = np.array([[0,2],[2,3]])
#triples stored as numbers with 2 coordinates in an otherwise-zero matrix
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            if np.array_equal((A[i,] + B[j,]), C[k,]):
                output_matrix[j, k] = i+1
print(output_matrix)

We can leverage broadcasting to perform all those summations and comparisons in a vectorized manner, then use np.where on the resulting boolean mask to get the indices of the matches, and finally index into output_matrix and assign:
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
mask = ((A[:,None,None,:] + B[None,:,None,:]) == C).all(-1)
I,J,K = np.where(mask)
output_matrix[J,K] = I+1
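A self-contained run of the broadcasting approach on the arrays from the question, with the shapes spelled out in comments:

```python
import numpy as np

A = np.array([[-1, 0], [0, -1], [4, 1], [-1, 2]])
B = np.array([[1, 2], [0, 3], [3, 1]])
C = np.array([[0, 2], [2, 3]])

# (|A|, 1, 1, n) + (1, |B|, 1, n) -> (|A|, |B|, 1, n); comparing with C of shape
# (|C|, n) broadcasts to (|A|, |B|, |C|, n). all(-1) requires every column to match.
mask = ((A[:, None, None, :] + B[None, :, None, :]) == C).all(-1)
I, J, K = np.where(mask)
matches = list(zip(I.tolist(), J.tolist(), K.tolist()))
print(matches)                  # [(0, 0, 0), (1, 1, 0), (3, 2, 1)]

output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype=int)
output_matrix[J, K] = I + 1
print(output_matrix.tolist())   # [[1, 0], [2, 0], [0, 4]]
```

The intermediate boolean array has |A| x |B| x |C| x n elements, so for very large inputs this trades memory for speed; also, if several rows of A match the same (j, k), only one of them is kept in output_matrix.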

(1) Improvements
You can use sets for the final result in the third matrix, as a + b = c must hold identically. This already replaces one nested loop with a constant-time lookup. I will show you an example of how to do this below, but we first ought to introduce some notation.
For a set-based approach to work, we need a hashable type. Lists will thus not work, but a tuple will: it is an ordered, immutable structure. There is, however, a problem: tuple addition is defined as appending, that is,
(0, 1) + (1, 0) = (0, 1, 1, 0).
This will not do for our use case: we need element-wise addition. As such, we subclass the built-in tuple as follows:
class AdditionTuple(tuple):
    def __add__(self, other):
        """
        Element-wise addition.
        """
        if len(self) != len(other):
            raise ValueError("Undefined behaviour!")
        return AdditionTuple(self[idx] + other[idx]
                             for idx in range(len(self)))
Here we override the default behaviour of __add__. Now that we have a data type amenable to our problem, let's prepare the data.
You give us the following to work with:
A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]
We convert it as follows:
from types import SimpleNamespace
A = [AdditionTuple(item) for item in A]
B = [AdditionTuple(item) for item in B]
C = {tuple(item): SimpleNamespace(idx=idx, values=[])
for idx, item in enumerate(C)}
That is, we modify A and B to use our new data-type, and turn C into a dictionary which supports (amortised) O(1) look-up times.
We can now do the following, eliminating one loop altogether:
from itertools import product
for a, b in product(enumerate(A), enumerate(B)):
    idx_a, a_i = a
    idx_b, b_j = b
    if a_i + b_j in C:  # a_i + b_j == c_k, identically
        C[a_i + b_j].values.append((idx_a, idx_b))
Then,
>>> print(C)
{(2, 3): namespace(idx=1, values=[(3, 2)]), (0, 2): namespace(idx=0, values=[(0, 0), (1, 1)])}
Where for each value in C, you get the index of that value (as idx), and a list of tuples of (idx_a, idx_b) whose elements of A and B together sum to the value at idx in C.
Let us briefly analyse the complexity of this algorithm. Redefining the lists A, B, and C as above is linear in the length of the lists. Iterating over A and B is of course in O(|A| * |B|), and the nested condition computes the element-wise addition of the tuples: this is linear in the length of the tuples themselves, which we shall denote k. The whole algorithm then runs in O(k * |A| * |B|).
This is a substantial improvement over your current O(k * |A| * |B| * |C|) algorithm.
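A variant of the same idea that skips the tuple subclass and does the element-wise sum with zip, as a sketch (the names `c_index` and `triples` are hypothetical). It assumes the rows of C are distinct, since a dict keeps only one index per row:

```python
from itertools import product

A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]

# Map each row of C (as a hashable tuple) to its index.
c_index = {tuple(row): k for k, row in enumerate(C)}

triples = []
for (i, a), (j, b) in product(enumerate(A), enumerate(B)):
    total = tuple(x + y for x, y in zip(a, b))  # element-wise sum, O(k)
    k = c_index.get(total)                      # average O(1) dict lookup
    if k is not None:
        triples.append((i, j, k))

print(triples)  # [(0, 0, 0), (1, 1, 0), (3, 2, 1)]
```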
(2) Matrix plotting
Use a dok_matrix, a sparse SciPy matrix representation. Then you can use any heatmap-plotting library you like on the matrix, e.g. Seaborn's heatmap.

Permutations with repetition without two consecutive equal elements

I need a function that generates all the permutations with repetition of an iterable, with the constraint that two consecutive elements must be different; for example:
sorted(f([0,1],3))==[(0,1,0),(1,0,1)]
#or
sorted(f([0,1],3))==[[0,1,0],[1,0,1]]
#I don't need the elements in the list to be sorted.
#the elements of the return can be tuples or lists, it doesn't change anything
Unfortunately, itertools.permutations doesn't work for what I need (each element of the iterable appears at most once in each result).
I've tried a bunch of definitions. First, I filtered the elements of itertools.product(iterable, repeat=r), but that is too slow for what I need:
from itertools import product
def crp0(iterable,r):
    l=[]
    for f in product(iterable,repeat=r):
        #print(f)
        b=True
        last=None #supposing no element of the iterable is None, which is fine for me
        for element in f:
            if element==last:
                b=False
                break
            last=element
        if b: l.append(f)
    return l
Second, I tried building r nested for loops, one inside the other (where r is the length of each sequence, i.e. the k in the math notation).
def crp2(iterable,r):
    a=list(range(0,r))
    s="\n"
    tab="    " #4 spaces
    l=[]
    for i in a:
        s+=(2*i*tab+"for a["+str(i)+"] in iterable:\n"+
            (2*i+1)*tab+"if "+str(i)+"==0 or a["+str(i)+"]!=a["+str(i-1)+"]:\n")
    s+=(2*i+2)*tab+"l.append(a.copy())"
    exec(s)
    return l
I know, there's no need to remind me: exec is ugly, exec can be dangerous, exec isn't easily readable... I know.
To better understand the function, I suggest replacing exec(s) with print(s).
Here is an example of the string passed to exec for crp2([0,1],2):
for a[0] in iterable:
    if 0==0 or a[0]!=a[-1]:
        for a[1] in iterable:
            if 1==0 or a[1]!=a[0]:
                l.append(a.copy())
But, apart from using exec, I need a better function because crp2 is still too slow (even if faster than crp0). Is there any way to recreate the code with r nested for loops without using exec? Is there any other way to do what I need?

You could prepare the sequences in two halves, then preprocess the second halves to find the compatible choices.
def crp2(I,r):
    r0=r//2
    r1=r-r0
    A=crp0(I,r0) # Prepare first half sequences
    B=crp0(I,r1) # Prepare second half sequences
    D = {} # Dictionary showing compatible second half sequences for each token
    for i in I:
        D[i] = [b for b in B if b[0]!=i]
    return [a+b for a in A for b in D[a[-1]]]
In a test with iterable=[0,1,2] and r=15, I found this method to be over a hundred times faster than just using crp0.
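A self-contained sketch of the two-halves idea, with a brute-force filter standing in for crp0 so the result can be checked (the names `no_repeats` and `split_halves` are hypothetical). The split assumes r >= 2; shorter lengths fall back to the brute force, since an empty first half has no last token:

```python
from itertools import product

def no_repeats(iterable, r):
    # Brute force: keep products with no two equal neighbours.
    return [p for p in product(iterable, repeat=r)
            if all(x != y for x, y in zip(p, p[1:]))]

def split_halves(iterable, r):
    """Join first-half and second-half sequences whose boundary tokens differ."""
    if r < 2:
        return no_repeats(iterable, r)
    r0 = r // 2
    first = no_repeats(iterable, r0)
    second = no_repeats(iterable, r - r0)
    # For each possible last token of a first half, precompute compatible second halves.
    compatible = {t: [b for b in second if b[0] != t] for t in iterable}
    return [a + b for a in first for b in compatible[a[-1]]]

print(len(split_halves([0, 1, 2], 7)))  # 192, i.e. 3 * 2**6
```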

You could try to return a generator instead of a list. With large values of r, your method will take a very long time to process product(iterable,repeat=r) and will return a huge list.
With this variant, you should get the first element very fast:
from itertools import product
def crp0(iterable, r):
    for f in product(iterable, repeat=r):
        last = f[0]
        b = True
        for element in f[1:]:
            if element == last:
                b = False
                break
            last = element
        if b:
            yield f
for no_repetition in crp0([0, 1, 2], 12):
    print(no_repetition)
# (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
# (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 2)
# ...

Instead of filtering the elements, you could generate a list directly with only the correct elements. This method uses recursion to create the Cartesian product, skipping x before recursing when it equals the previous element:
def product_no_repetition(iterable, r, last_element=None):
    if r == 0:
        return [[]]
    else:
        return [p + [x] for x in iterable if x != last_element
                for p in product_no_repetition(iterable, r - 1, x)]
for no_repetition in product_no_repetition([0, 1], 12):
    print(no_repetition)

I agree with @EricDuminil's comment that you do not want "permutations with repetition." You want a significant subset of the product of the iterable with itself multiple times. I don't know what name is best: I'll just call them products.
Here is an approach that builds each product line without building all the products and then filtering out the ones you want. My approach is to work primarily with the indices of the iterable rather than the iterable itself, and not with all the indices but ignoring the last one. So instead of working directly with [2, 3, 5, 7] I work with [0, 1, 2]. Then I work with the products of those indices. I can transform a product such as [1, 2, 2] where r=3 by comparing each index with the previous (already transformed) one: if an index is greater than or equal to the previous one, I increment the current index by one. This prevents two adjacent indices from being equal, and it also gets me back to using all the indices. So [1, 2, 2] is transformed to [1, 3, 2]: the middle 2 is not less than 1, so it becomes 3, and the final 2 is less than 3, so it stays. I now use those indices to select the appropriate items from the iterable, so the iterable [2, 3, 5, 7] with r=3 gets the line [3, 7, 5]. The first index is treated differently, since it has no previous index. My code is:
from itertools import product
def crp3(iterable, r):
    L = []
    for k in range(len(iterable)):
        for f in product(range(len(iterable)-1), repeat=r-1):
            ndx = k
            a = [iterable[ndx]]
            for j in range(r-1):
                ndx = f[j] if f[j] < ndx else f[j] + 1
                a.append(iterable[ndx])
            L.append(a)
    return L
Using %timeit in my Spyder/IPython configuration on crp3([0,1], 3) shows 8.54 µs per loop while your crp2([0,1], 3) shows 133 µs per loop. That shows a sizeable speed improvement! My routine works best where iterable is short and r is large--your routine finds len ** r lines (where len is the length of the iterable) and filters them while mine finds len * (len-1) ** (r-1) lines without filtering.
By the way, your crp2() does do filtering, as shown by the if lines in the code that is passed to exec. The sole if in my code does not filter a line; it modifies an item in the line. My code does return surprising results if the items in the iterable are not unique: if that is a problem, remove the duplicates first, e.g. with list(set(iterable)) (a plain set would not work, because the code indexes into the iterable). Note that I replaced your l name with L: I think l is too easy to confuse with 1 or I and should be avoided. My code could easily be changed to a generator: replace L.append(a) with yield a and remove the lines L = [] and return L.
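Following the suggestion in the last sentence, a generator version of crp3 might look like this (the name `crp3_gen` is hypothetical; like crp3, it assumes the items are distinct and r >= 1):

```python
from itertools import product

def crp3_gen(iterable, r):
    """Generator version of crp3: yields each line instead of storing them all."""
    items = list(iterable)
    n = len(items)
    for first in range(n):
        for f in product(range(n - 1), repeat=r - 1):
            ndx = first
            line = [items[ndx]]
            for step in f:
                # Skip over the previous index so neighbours always differ.
                ndx = step if step < ndx else step + 1
                line.append(items[ndx])
            yield line

lines = list(crp3_gen([2, 3, 5, 7], 3))
print(len(lines))          # 36, i.e. 4 * 3**2
print([3, 7, 5] in lines)  # True
```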

How about:
from itertools import product
result = [ x for x in product(iterable,repeat=r) if all(x[i-1] != x[i] for i in range(1,len(x))) ]

Elaborating on @peter-de-rivaz's idea (divide and conquer): when you divide the sequence to create into two subsequences, those subsequences have the same or nearly the same length. If r = 2*k is even, store the result of crp(k) in a list and merge it with itself. If r = 2*k+1, store the result of crp(k) in a list and merge it with itself and with L.
def large(L, r):
    if r <= 4: # do not end the divide: too slow
        return list(small(L, r))  # materialize: M is iterated twice below
    n = r//2
    M = large(L, n)
    if r%2 == 0:
        return [x + y for x in M for y in M if x[-1] != y[0]]
    else:
        return [x + y + (e,) for x in M for y in M for e in L if x[-1] != y[0] and y[-1] != e]
small is an adaptation of @eric-duminil's answer, using Python's famous for...else loop:
from itertools import product
def small(iterable, r):
    for seq in product(iterable, repeat=r):
        prev, *tail = seq
        for e in tail:
            if e == prev:
                break
            prev = e
        else:
            yield seq
A small benchmark:
print(timeit.timeit(lambda: crp2( [0, 1, 2], 10), number=1000))
#0.16290732200013736
print(timeit.timeit(lambda: crp2( [0, 1, 2, 3], 15), number=10))
#24.798989593000442
print(timeit.timeit(lambda: large( [0, 1, 2], 10), number=1000))
#0.0071403849997295765
print(timeit.timeit(lambda: large( [0, 1, 2, 3], 15), number=10))
#0.03471425700081454

Python: .append(0)

I would like to ask what the following does in Python.
It was taken from http://danieljlewis.org/files/2010/06/Jenks.pdf
I have entered comments telling what I think is happening there.
# Seems to be a function that returns a float vector
# dataList seems to be a vector of floats.
# numClass seems to be an int
def getJenksBreaks( dataList, numClass ):
    # dataList seems to be a vector of float. "Sort" seems to sort it ascendingly
    dataList.sort()
    # create a 1-dimensional vector
    mat1 = []
    # "in range" seems to be something like "for i = 0 to len(dataList)+1"
    for i in range(0,len(dataList)+1):
        # create a 1-dimensional-vector?
        temp = []
        for j in range(0,numClass+1):
            # append a zero to the vector?
            temp.append(0)
        # append the vector to a vector??
        mat1.append(temp)
    (...)
I am a little confused because in the pdf there are no explicit variable declarations. However I think and hope I could guess the variables.

Yes, the method append() adds elements to the end of the list. I think your interpretation of the code is correct.
But note the following:
x =[1,2,3,4]
x.append(5)
print(x)
[1, 2, 3, 4, 5]
while
x.append([6,7])
print(x)
[1, 2, 3, 4, 5, [6, 7]]
If you want something like
[1, 2, 3, 4, 5, 6, 7]
you may use extend()
x.extend([6,7])
print(x)
[1, 2, 3, 4, 5, 6, 7]

Python doesn't have explicit variable declarations. It's dynamically typed, variables are whatever type they get assigned to.
Your assessment of the code is pretty much correct.
One detail: the range function goes up to, but does not include, its stop value. So the +1 in the second argument to range causes the last iterated value to be len(dataList) and numClass, respectively. This looks suspicious: because range starts at 0, the loop performs a total of len(dataList) + 1 iterations.
Presumably dataList.sort() modifies the original value of dataList, which is the traditional behavior of the .sort() method.
It is indeed appending the new vector to the initial one, if you look at the full source code there are several blocks that continue to concatenate more vectors to mat1.

append is a list method used to add a value at the end of the list.
mat1 and temp together create a 2D array (e.g. [[], [], []]), i.e. a matrix of size m x n,
where m = len(dataList) + 1 and n = numClass + 1.
The resulting matrix is a zero matrix, as all its values are 0.
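To check the shape, here is the question's loop wrapped in a function (the name `zero_matrix_like_jenks` is hypothetical), next to an equivalent comprehension:

```python
def zero_matrix_like_jenks(data_list, num_class):
    # Same construction as the question's loop: one row per value of
    # range(0, len(data_list) + 1), one column per value of range(0, num_class + 1).
    mat1 = []
    for i in range(0, len(data_list) + 1):
        temp = []
        for j in range(0, num_class + 1):
            temp.append(0)
        mat1.append(temp)
    return mat1

mat1 = zero_matrix_like_jenks([4.0, 1.5, 3.2], 2)
print(len(mat1), len(mat1[0]))  # 4 3
# Equivalent comprehension; each row is a separate list object.
same = [[0] * (2 + 1) for _ in range(3 + 1)]
print(mat1 == same)             # True
```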

In Python, variables are implicitly declared. When you type this:
i = 1
i is set to a value of 1, which happens to be an integer. So we will talk of i as being an integer, although i is only a reference to an integer value. The consequence of that is that you don't need type declarations as in C++ or Java.
Your understanding, as reflected in your comments, is mostly correct. [] refers to a list. You can think of it as a linked list (although its actual implementation is closer to a C++ std::vector, for instance).
As Python variables are only references to objects in general, lists are effectively lists of references, and can potentially hold any kind of values. This is valid Python:
# A vector of numbers
vect = [1.0, 2.0, 3.0, 4.0]
But this is perfectly valid code as well:
# The list of my objects (named items so it doesn't shadow the built-in list):
items = [1, [2,"a"], True, 'foo', object()]
This list contains an integer, another list, a boolean... In Python, you usually rely on duck typing for your variable types, so this is not a problem.
Finally, one of the methods of list is sort, which sorts it in-place, as you correctly guessed, and the range function generates a range of numbers.
The syntax for x in L: ... iterates over the content of L (assuming it is iterable) and sets the variable x to each of the successive values in that context. For example:
>>> for x in ['a', 'b', 'c']:
...     print(x)
a
b
c
Since range generates a range of numbers, this is effectively the idiomatic way to write a for (i = 0; i < N; i += 1) type of loop:
>>> for i in range(4): # range(4) == [0,1,2,3] in Python 2
...     print(i)
0
1
2
3

We Keep Coding
