How to iterate over this n-dimensional dataset? - python

I have a dataset which has 4 dimensions (for now...) and I need to iterate over it.
To access a value in the dataset, I do this:
value = dataset[i,j,k,l]
Now, I can get the shape for the dataset:
shape = [4,5,2,6]
The values in shape represent the length of the dimension.
How, given the number of dimensions, can I iterate over all the elements in my dataset? Here is an example:
for i in range(shape[0]):
    for j in range(shape[1]):
        for k in range(shape[2]):
            for l in range(shape[3]):
                print('BOOM')
                value = dataset[i,j,k,l]
In the future, the shape may change. So for example, shape may have 10 elements rather than the current 4.
Is there a nice and clean way to do this with Python 3?

You could use itertools.product to iterate over the cartesian product 1 of some values (in this case the indices):
import itertools
shape = [4,5,2,6]
for idx in itertools.product(*[range(s) for s in shape]):
    value = dataset[idx]
    print(idx, value)
    # i would be "idx[0]", j "idx[1]" and so on...
However if it's a numpy array you want to iterate over, it could be easier to use np.ndenumerate:
import numpy as np
arr = np.random.random([4,5,2,6])
for idx, value in np.ndenumerate(arr):
    print(idx, value)
    # i would be "idx[0]", j "idx[1]" and so on...
1 You asked for clarification on what itertools.product(*[range(s) for s in shape]) actually does, so I'll explain it in more detail.
For example, if you have this loop:
for i in range(10):
    for j in range(8):
        # do whatever
This can also be written using product as:
for i, j in itertools.product(range(10), range(8)):
#                                        ^^^^^^^^---- the inner for loop
#                             ^^^^^^^^^-------------- the outer for loop
    # do whatever
That means product is just a handy way of reducing the number of independent for-loops.
If you want to convert a variable number of for-loops to a product you essentially need two steps:
# Create the "values" each for-loop iterates over
loopover = [range(s) for s in shape]
# Unpack the list using "*" operator because "product" needs them as
# different positional arguments:
prod = itertools.product(*loopover)
for idx in prod:
    i_0, i_1, ..., i_n = idx # index is a tuple that can be unpacked if you know the number of values.
    # The "..." has to be replaced with the variables in real code!
    # do whatever
That's equivalent to:
for i_0 in range(shape[0]):
    for i_1 in range(shape[1]):
        ... # more loops
            for i_n in range(shape[n]): # n is the last index of "shape", i.e. len(shape) - 1
                # do whatever
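As a quick sanity check of that equivalence, here is a minimal sketch. It assumes the dataset is a NumPy array; iter_indices is a hypothetical helper name, and the dataset itself is made up for illustration. It compares the index order from itertools.product with explicit nested loops and with np.ndindex, which yields the same index tuples without the values:

```python
import itertools
import numpy as np

def iter_indices(shape):
    """Hypothetical helper: yield every index tuple for the given shape."""
    return itertools.product(*(range(s) for s in shape))

shape = (4, 5, 2, 6)
# Made-up dataset so that each element equals its flat position.
dataset = np.arange(np.prod(shape)).reshape(shape)

# Explicit nested loops for the 4-dimensional case, for comparison.
nested = [(i, j, k, l)
          for i in range(shape[0])
          for j in range(shape[1])
          for k in range(shape[2])
          for l in range(shape[3])]

from_product = list(iter_indices(shape))
from_ndindex = list(np.ndindex(*shape))

print(from_product == nested)     # same tuples, same order
print(from_ndindex == nested)     # np.ndindex visits the indices in the same order
print(dataset[from_product[-1]])  # a tuple indexes a NumPy array directly
```

Because iter_indices only looks at shape, the same code keeps working if shape later grows to 10 entries.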

Related

Taking n elements at a time from 1d list and add them to 2d list

I have a list making up data, and I'd like to take 4 elements at a time from this list and put them in a 2d list where each 4-element increment is a new row of said list.
My first attempts involve input to 1d list:
list.append(input("Enter data type 1:"))
list.append(input("Enter data type 2:"))
etc.
and then I've tried to loop the list and to "switch" rows once the index reaches 4.
for x in range(n * 4):
    for idx, y in enumerate(list):
        if idx % 4 == 0:
            x = x + 1
        list[y] = result[x][y]
where I've initialised result according to the following:
ran = int(len(list)/4)
result=[[0 for x in range(ran)] for j in range(n)]
I've also attempted to ascribe a temporary empty list that will append to an initialised 2D list.
row.append(list)
result=[[x for x in row] for j in range(n + 1)]
#result[n]=row
print(result)
n = n + 1
row.clear()
list.clear()
so that each new loop starts with an empty row, takes input from user and copies it.
I'm at a loss for how to make result save the first entry and not be redefined at second,third,fourth entries.
I think this post is probably what you need. With np.reshape() you can just have your list filled with all the values you need and do the reshaping after in a single step.
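A minimal sketch of that idea, assuming the flat list is already complete and its length is a multiple of 4 (the sample values and the names data, row_len, rows and rows_np are made up for illustration). The plain-Python slicing version is shown next to np.reshape:

```python
import numpy as np

# Hypothetical flat input: two records of four fields each.
data = ["a1", "b1", "c1", "d1", "a2", "b2", "c2", "d2"]
row_len = 4

# Plain Python: slice the flat list in steps of row_len.
rows = [data[i:i + row_len] for i in range(0, len(data), row_len)]

# NumPy: -1 lets reshape infer the number of rows.
# This raises ValueError if len(data) is not a multiple of row_len.
rows_np = np.reshape(data, (-1, row_len))

print(rows)
print(rows_np.shape)
```

Either way, collecting all input first and splitting afterwards avoids having to track when to "switch" rows inside the input loop.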

i want to find out the index of the elements in an array of duplicate elements

a=[2, 1, 3, 5, 3, 2]
def firstDuplicate(a):
    for i in range(0,len(a)):
        for j in range(i+1,len(a)):
            while a[i]==a[j]:
                num=[j]
                break
    print(num)
print(firstDuplicate(a))
The output should be coming as 4 and 5 but it's coming as 4 only
You can find the indices of all duplicates in an array in O(n) time and O(1) extra space with something like the following:
def get_duplicate_indices(arr):
    inds = []
    for i, val in enumerate(arr):
        val = abs(val)
        if arr[val] >= 0:
            arr[val] = -arr[val]
        else:
            inds.append(i)
    return inds
get_duplicate_indices(a)
[4, 5]
Note that this will modify the array in place! If you want to keep your input array un-modified, replace the first few lines in the above with:
def get_duplicate_indices(a):
    arr = a.copy() # so we don't modify in place. Drawback is it's now O(n) extra space
    inds = []
    for i, val in enumerate(a):
        # ...
Essentially this uses the sign of each element in the array as an indicator of whether a number has been seen before. If we come across a negative value, it means the number we reached has been seen before, so we append the number's index to our list of already-seen indices.
Note that this can run into trouble if the values in the array are larger than the length of the array, but in this case we just extend the working array to be the same length as whatever the maximum value is in the input. Easy peasy.
There are some things wrong with your code. The following will collect the indexes of every first duplicate:
def firstDuplicate(a):
    num = [] # list to collect indexes of first dupes
    for i in range(len(a)-1): # second to last is enough here
        for j in range(i+1, len(a)):
            if a[i]==a[j]: # while-loop made little sense
                num.append(j) # grow list and do not override it
                break # stop inner loop after first duplicate
    print(num)
There are of course more performant algorithms to achieve this that are not quadratic.
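One such non-quadratic approach, as a sketch rather than the only option: remember the values seen so far in a set and record the index of every element that repeats an earlier value. It assumes the elements are hashable, and duplicate_indices is a name made up for this example. Note that it returns indices in positional order, which can differ from the order the nested-loop version collects them in.

```python
def duplicate_indices(values):
    """Return the index of every element that repeats an earlier value."""
    seen = set()      # values encountered so far
    indices = []
    for i, v in enumerate(values):
        if v in seen:
            indices.append(i)   # v already appeared at a smaller index
        else:
            seen.add(v)
    return indices

print(duplicate_indices([2, 1, 3, 5, 3, 2]))  # [4, 5]
```

This runs in O(n) time on average and uses O(n) extra space for the set, without modifying the input list.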

Iterating efficiently through indices of arbitrary order array

Say I have an arbitrary array of variable order N. For example:
A, a 2x3x3 array, is an order-3 array with 2, 3, and 3 dimensions along its three indices.
I would like to efficiently loop through each element. If I knew a priori the order then I could do something like (in python),
#for order 3
import numpy as np
shape = np.shape(A)
i = 0
while i < shape[0]:
    j = 0
    while j < shape[1]:
        k = 0
        while k < shape[2]:
            #code using i,j,k
            k += 1
        j += 1
    i += 1
Now suppose I don't know the order of A, i.e. I don't know a priori the length of shape. How can I permute the quickest through all elements of the array?
There are many ways to do this, e.g. iterating over a.ravel() or a.flat. However, looping over every single element of an array in a Python loop will never be particularly efficient.
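To make those options concrete, here is a small sketch. The array contents and the summing "work" are made up for illustration. It shows element-only iteration with A.flat, index-aware iteration with np.ndindex, and the vectorized equivalent, which is usually the one to prefer when the per-element work can be expressed as an array operation:

```python
import numpy as np

A = np.arange(18).reshape(2, 3, 3)  # made-up order-3 array

# Element-only iteration: works for any number of dimensions.
total_flat = 0
for value in A.flat:
    total_flat += value

# Index-aware iteration: np.ndindex yields one index tuple per element.
total_indexed = 0
for idx in np.ndindex(A.shape):
    total_indexed += A[idx]

# Vectorized: no Python-level loop at all.
total_vectorized = A.sum()

print(total_flat, total_indexed, total_vectorized)
```

None of the three depends on the length of A.shape, so the order of the array does not need to be known in advance.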
I don't think it matters which index you choose to permute over first, which index you choose to permute over second, etc. because your inner-most while statement will always be executed once per combination of i, j, and k.
If you need to keep the results of your operation (and assuming it's a function of A and i,j,k), you'd want to use something like this:
import itertools
import numpy as np
results = ((indices, code(A, indices))  # indices is the position tuple, e.g. (i, j, k)
           for indices in itertools.product(*(range(n) for n in np.shape(A))))
Then you can iterate the results getting out the position and return value of code for each position. Or convert the generator expression to a list if you need to access the results multiple times.
If the array is of the format array = [[[1,2,3,4],[1,2]],[[1],[1,2,3]]]
You could use the following structure:
array = [[[1,2,3,4],[1,2]],[[1],[1,2,3]]]
indices = []
def iter_array(array,indices):
    indices.append(0)
    for a in array:
        if isinstance(a[0],list):
            iter_array(a,indices)
        else:
            indices.append(0)
            for nonlist in a:
                #do something using each element in indices
                #print(indices)
                indices.append(indices.pop()+1)
            indices.pop()
        indices.append(indices.pop()+1)
    indices.pop()
iter_array(array,indices)
This should work for the usual nested list "arrays". I don't know if it would be possible to mimic this using numpy's array structure.

How is this 2D array being sized by FOR loops?

Question background:
This is the first piece of Python code I've looked at, so I'm assuming that my thread title correctly describes what this code is trying to achieve, i.e. setting a 2D array.
The code:
The code I'm looking at sets the size of a 2D array based on two for loops:
n = len(sentences)
values = [[0 for x in xrange(n)] for x in xrange(n)]
for i in range(0, n):
    for j in range(0, n):
        values[i][j] = self.sentences_intersection(sentences[i], sentences[j])
I could understand it if each side of the array was set with using the length property of the sentences variable, unless this is in effect what xrange is doing by using the loop size based on the length?
Any help explaining how the array is being set would be great.
This code is actually a bit redundant.
Firstly you need to realize that values is not an array, it is a list. A list is a dynamically sized one-dimensional structure.
The second line of the code uses a nested list comprehension to create one list of size n, each element of which is itself a list consisting of n zeros.
The second loop goes through this list of lists, and sets each element according to whatever sentences_intersection does.
The reason this is redundant is because lists don't need to be pre-allocated. Rather than doing two separate iterations, really the author should just be building up the lists with the correct values, then appending them.
This would be better:
n = len(sentences)
values = []
for i in range(0, n):
    inner = []
    for j in range(0, n):
        inner.append(self.sentences_intersection(sentences[i], sentences[j]))
    values.append(inner)
but you could actually do the whole thing in the list comprehension if you wanted:
values = [[self.sentences_intersection(sentences[i], sentences[j]) for j in xrange(n)] for i in xrange(n)]
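To check that pre-allocating and building the lists directly give the same result, here is a self-contained sketch in Python 3. The overlap function is a hypothetical stand-in for sentences_intersection (its real behaviour is not shown in the question), and the sample sentences are made up:

```python
def overlap(a, b):
    """Hypothetical stand-in for sentences_intersection: count shared words."""
    return len(set(a.split()) & set(b.split()))

sentences = ["the cat sat", "the dog sat", "a bird flew"]
n = len(sentences)

# Variant 1: pre-allocate an n x n grid of zeros, then fill it in.
prefilled = [[0 for _ in range(n)] for _ in range(n)]
for i in range(n):
    for j in range(n):
        prefilled[i][j] = overlap(sentences[i], sentences[j])

# Variant 2: build the rows directly; the outer comprehension picks the row i,
# the inner one picks the column j.
direct = [[overlap(sentences[i], sentences[j]) for j in range(n)] for i in range(n)]

print(prefilled == direct)
print(direct)
```

The order of the two for clauses matters when the function is not symmetric: the inner comprehension has to iterate over the second index to reproduce values[i][j].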

How do I get an empty list of any size in Python?

I basically want a Python equivalent of this Array in C:
int a[x];
but in python I declare an array like:
a = []
and the problem is I want to assign random slots with values like:
a[4] = 1
but I can't do that with Python, since the Python list is empty (of length 0).
If by "array" you actually mean a Python list, you can use
a = [0] * 10
or
a = [None] * 10
You can't do exactly what you want in Python (if I read you correctly). You need to put values in for each element of the list (or as you called it, array).
But, try this:
a = [0 for x in range(N)] # N = size of list you want
a[i] = 5 # as long as i < N, you're okay
For lists of other types, use something besides 0. None is often a good choice as well.
You can use numpy:
import numpy as np
Example from Empty Array:
np.empty([2, 2])
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]])
You can also grow a list with its extend method.
a= []
a.extend([None]*10)
a.extend([None]*20)
Just declare the list and append each element. For example:
a = []
a.append('first item')
a.append('second item')
If you (or other searchers of this question) were actually interested in creating a contiguous array to fill with integers, consider bytearray and memoryview:
# cast() is available starting Python 3.3
size = 10**6
ints = memoryview(bytearray(size)).cast('i')
ints.contiguous, ints.itemsize, ints.shape
# (True, 4, (250000,))
ints[0]
# 0
ints[0] = 16
ints[0]
# 16
It is also possible to create an empty array with a certain size:
array = [[] for _ in range(n)] # n equal to your desired size
array[0].append(5) # it appends 5 to an empty list, then array[0] is [5]
If you define it as array = [[]] * n instead, then modifying one item changes all of them the same way, because every slot refers to the same mutable inner list.
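The difference between repeating one inner list and building separate ones can be seen in a short sketch (the names shared and independent are illustrative only):

```python
n = 3

shared = [[]] * n                      # n references to one and the same inner list
independent = [[] for _ in range(n)]   # n distinct inner lists

shared[0].append(5)
independent[0].append(5)

print(shared)                  # [[5], [5], [5]]
print(independent)             # [[5], [], []]
print(shared[0] is shared[1])  # True: both slots are the same object
```

Repetition with * is fine for immutable fill values such as 0 or None; the aliasing only bites when the repeated element is itself mutable.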
x=[]
for i in range(0,5):
    x.append(i)
    print(x[i])
If you actually want a C-style array:
import array
a = array.array('i', x * [0])  # x is the desired length, as in the question's int a[x];
a[3] = 5
try:
    a[5] = 'a'
except TypeError:
    print('integers only allowed')
Note that there's no concept of un-initialized variable in python. A variable is a name that is bound to a value, so that value must have something. In the example above the array is initialized with zeros.
However, this is uncommon in python, unless you actually need it for low-level stuff. In most cases, you are better-off using an empty list or empty numpy array, as other answers suggest.
The (I think only) way to assign "random slots" is to use a dictionary, e.g.:
a = {} # initialize empty dictionary
a[4] = 1 # define the entry for index 4 to be equal to 1
a['French','red'] = 'rouge' # the entry for index (French,red) is "rouge".
This can be handy for "quick hacks", and the lookup overhead is irrelevant if you don't have intensive access to the array's elements.
Otherwise, it will be more efficient to work with pre-allocated (e.g., numpy) arrays of fixed size, which you can create with a = np.empty(10) (for an uninitialized vector of length 10) or a = np.zeros([5,5]) (for a 5x5 matrix initialized with zeros).
Remark: in your C example, you also have to allocate the array (your int a[x];) before assigning a (not so) "random slot" (namely, integer index between 0 and x-1).
References:
The dict datatype: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
Function np.empty(): https://numpy.org/doc/stable/reference/generated/numpy.empty.html
