Access each element of a multidimensional array without knowing the individual dimensions - python

I have a multidimensional array but I won't know the number of dimensions or the size of each dimension. How can I generalize the code such that I can access each element of the array individually?
import numpy as np
import random
myRand = np.random.rand(5, 6, 7)
#print (myRand.shape[0])
# This works great if I already know that myRand has 3 dimensions. What if I don't know that?
mySum = 0.0
for i in range(myRand.shape[0]):
for j in range(myRand.shape[1]):
for k in range(myRand.shape[2]):
# Do something with myRand[i,j,k]

You could use itertools to do so.
the following piece of code will generate the indices which you will be able to access the array while obtaining them:
import numpy as np
import itertools
v1 = np.random.randint(5, size=2)
v2 = np.random.randint(5, size=(2, 4))
v3 = np.random.randint(5, size=(2, 3, 2))
# v1
args1 = [list(range(e)) for e in list(v1.shape)]
print(v1)
for combination in itertools.product(*args1):
print(v1[combination])
# v2
args2 = [list(range(e)) for e in list(v2.shape)]
print(v2)
for combination in itertools.product(*args2):
print(v2[combination])
# v3
args3 = [list(range(e)) for e in list(v3.shape)]
print(v3)
for combination in itertools.product(*args3):
print(v3[combination])
Tested it on a simple arrays with different sizes and it works fine.

If you do not need to retain the indices within each of the dimensions for calculation, you can just use numpy.nditer
>>> a = np.arange(8).reshape((2, 2, 2))
>>> a
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
>>> for i in np.nditer(a):
... print(i)
...
0
1
2
3
4
5
6
7
You shouldn't really need to iterate over the array using a for-loop like this. There is usually a better way to perform whatever computation you are doing using numpy methods

Related

selecting random elements from each column of numpy array

I have an n row, m column numpy array, and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following python function to do this, but would like to implement something more efficient and faster:
def sample_array_cols(MyMatrix, nelements):
vmat = []
TempMat = MyMatrix.T
for v in TempMat:
v = np.ndarray.tolist(v)
subv = random.sample(v, nelements)
vmat = vmat + [subv]
return(np.array(vmat).T)
One question is whether there's a way to loop over each column without transposing the array (and then transposing back). More importantly, is there some way to map the random sample onto each column that would be faster than having a for loop over all columns? I don't have that much experience with numpy objects, but I would guess that there should be something analogous to apply/mapply in R that would work?
One alternative is to randomly generate the indices first, and then use take_along_axis to map them to the original array:
arr = np.random.randn(1000, 5000) # arbitrary
k = 10 # arbitrary
n, m = arr.shape
idx = np.random.randint(0, n, (k, m))
new = np.take_along_axis(arr, idx, axis=0)
Output (shape):
in [215]: new.shape
out[215]: (10, 500) # (k x m)
To sample each column without replacement just like your original solution
import numpy as np
matrix = np.arange(4*3).reshape(4,3)
matrix
Output
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
k = 2
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)
Output
array([[ 9, 1, 2],
[ 3, 4, 11]])
I would
Pre-allocate the result array, and fill in columns, and
Use numpy index based indexing
def sample_array_cols(matrix, n_result):
(n,m) = matrix.shape
vmat = numpy.array([n_result, m], dtype= matrix.dtype)
for c in range(m):
random_indices = numpy.random.randint(0, n, n_result)
vmat[:,c] = matrix[random_indices, c]
return vmat
Not quite fully vectorized, but better than building up a list, and the code scans just like your description.

parallelize zonal computation on numpy array

I try to compute mode on all cells of the same zone (same value) on a numpy array. I give you an example of code below. In this example sequential approach works fine but multiprocessed approach does nothing. I do not find my mistake.
Does someone see my error ?
I would like to parallelize the computation because my real array is a 10k * 10k array with 1M zones.
import numpy as np
import scipy.stats as ss
import multiprocessing as mp
def zone_mode(i, a, b, output):
to_extract = np.where(a == i)
val = b[to_extract]
output[to_extract] = ss.mode(val)[0][0]
return output
def zone_mode0(i, a, b):
to_extract = np.where(a == i)
val = b[to_extract]
output = ss.mode(val)[0][0]
return output
np.random.seed(1)
zone = np.array([[1, 1, 1, 2, 3],
[1, 1, 2, 2, 3],
[4, 2, 2, 3, 3],
[4, 4, 5, 5, 3],
[4, 6, 6, 5, 5],
[6, 6, 6, 5, 5]])
values = np.random.randint(8, size=zone.shape)
output = np.zeros_like(zone).astype(np.float)
for i in np.unique(zone):
output = zone_mode(i, zone, values, output)
# for multiprocessing
zone0 = zone - 1
pool = mp.Pool(mp.cpu_count() - 1)
results = [pool.apply(zone_mode0, args=(u, zone0, values)) for u in np.unique(zone0)]
pool.close()
output = results[zone0]
For positve integers in the arrays - zone and values, we can use np.bincount. The basic idea is that we will consider zone and values as row and cols on a 2D grid. So, can map those to their linear index equivalent numbers. Those would be used as bins for binned summation with np.bincount. Their argmax IDs would be the mode numbers. They are mapped back to zone-grid with indexing into zone.
Hence, the solution would be -
m = zone.max()+1
n = values.max()+1
ids = zone*n + values
c = np.bincount(ids.ravel(),minlength=m*n).reshape(-1,n).argmax(1)
out = c[zone]
For sparsey data (well spread integers in the input arrays), we can look into sparse-matrix to get the argmax IDs c. Hence, with SciPy's sparse-matrix -
from scipy.sparse import coo_matrix
data = np.ones(zone.size,dtype=int)
r,c = zone.ravel(),values.ravel()
c = coo_matrix((data,(r,c))).argmax(1).A1
For slight perf. boost, specify the shape -
c = coo_matrix((data,(r,c)),shape=(m,n)).argmax(1).A1
Solving for generic values
We will make use of pandas.factorize, like so -
import pandas as pd
ids,unq = pd.factorize(values.flat)
v = ids.reshape(values.shape)
# .. same steps as earlier with bincount, using v in place of values
out = unq[c[zone]]
Note that for tie-cases, it would pick random element off values. If you want to pick the first one, use pd.factorize(values.flat, sort=True).

generating matrix of pairs from two object vectors using numpy

I have two object arrays not necessarily of the same length:
import numpy as np
class Obj_A:
def __init__(self,n):
self.type = 'a'+str(n)
def __eq__(self,other):
return self.type==other.type
class Obj_B:
def __init__(self,n):
self.type = 'b'+str(n)
def __eq__(self,other):
return self.type==other.type
a = np.array([Obj_A(n) for n in range(2)])
b = np.array([Obj_B(n) for n in range(3)])
I would like to generate the matrix
mat = np.array([[[a[0],b[0]],[a[0],b[1]],[a[0],b[2]]],
[[a[1],b[0]],[a[1],b[1]],[a[1],b[2]]]])
this matrix has shape (len(a),len(b),2). Its elements are
mat[i,j] = [a[i],b[j]]
A solution is
mat = np.empty((len(a),len(b),2),dtype='object')
for i,aa in enumerate(a):
for j,bb in enumerate(b):
mat[i,j] = np.array([aa,bb],dtype='object')
but this is too expensive for my problem, which has O(len(a)) = O(len(b)) = 1e5.
I suspect there is a clean numpy solution involving np.repeat, np.tile and np.transpose, similar to the accepted answer here, but the output in this case does not simply reshape to the desired result.
I would suggest using np.meshgrid(), which takes two input arrays and repeats both along different axes so that looking at corresponding positions of the outputs gets you all possible combinations. For example:
>>> x, y = np.meshgrid([1, 2, 3], [4, 5])
>>> x
array([[1, 2, 3],
[1, 2, 3]])
>>> y
array([[4, 4, 4],
[5, 5, 5]])
In your case, you can put the two arrays together and transpose them into the proper configuration. Based on some experimentation I think this should work for you:
>>> np.transpose(np.meshgrid(a, b), (2, 1, 0))

Dot Product in Python without NumPy

Is there a way that you can preform a dot product of two lists that contain values without using NumPy or the Operation module in Python? So that the code is as simple as it could get?
For example:
V_1=[1,2,3]
V_2=[4,5,6]
Dot(V_1,V_2)
Answer: 32
Without numpy, you can write yourself a function for the dot product which uses zip and sum.
>>> def dot(v1, v2):
... return sum(x*y for x, y in zip(v1, v2))
...
>>> dot([1, 2, 3], [4, 5, 6])
32
As of Python 3.10, you can use zip(v1, v2, strict=True) to ensure that v1 and v2 have the same length.
def dot_product(x, y):
dp = 0
for i in range(len(x)):
dp += (x[i]*y[i])
return dp
sample1 = [1,2,3,4,5]
sample2 = [2,1,1,1,1]
dot_product(sample1, sample2) #16
We can simply use # operator from python.
For example:
import numpy as np
x = np.array([25, 2, 5])
y = np.array([0, 1, 2])
print(x#y)
12

Python NumPy: How to fill a matrix using an equation

I wish to initialise a matrix A, using the equation A_i,j = f(i,j) for some f (It's not important what this is).
How can I do so concisely avoiding a situation where I have two for loops?
numpy.fromfunction fits the bill here.
Example from doc:
>>> import numpy as np
>>> np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
One could also get the indexes of your array with numpy.indices and then apply the function f in a vectorized fashion,
import numpy as np
shape = 1000, 1000
Xi, Yj = np.indices(shape)
A = (2*Xi + 3*Yj).astype(np.int) # or any other function f(Xi, Yj)

Categories