Is there a reverse operation to scipy.linalg.block_diag? - python

I know scipy offers a handy function to convert from a list of arrays to a block diagonal matrix, e.g.
>>> from scipy.linalg import block_diag
>>> A = [[1, 0],
...      [0, 1]]
>>> B = [[3, 4, 5],
...      [6, 7, 8]]
>>> C = [[7]]
>>> D = block_diag(A, B, C)
>>> D
array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 3, 4, 5, 0],
       [0, 0, 6, 7, 8, 0],
       [0, 0, 0, 0, 0, 7]])
Is there a reverse operation to this? I.e. take a block diagonal matrix and a list of chunk sizes and decompose to a list of arrays.
a, b, c = foo(D, block_sizes=[(2,2), (2,3), (1,1)])
If there's no handy built-in way to accomplish this, is there a better (more performant) implementation than looping over the input array? The naive implementation probably looks something like this:
def foo(matrix, block_sizes):
    result = []
    curr_row, curr_col = 0, 0
    for nrows, ncols in block_sizes:
        result.append(matrix[curr_row:curr_row + nrows, curr_col:curr_col + ncols])
        curr_row += nrows
        curr_col += ncols
    return result

Look at the code for block_diag:
The core part allocates an out array, and then iteratively assigns the blocks to it:
out = np.zeros(np.sum(shapes, axis=0), dtype=out_dtype)
r, c = 0, 0
for i, (rr, cc) in enumerate(shapes):
    out[r:r + rr, c:c + cc] = arrs[i]
    r += rr
    c += cc
Were you imagining something more "performant"?
What you want is a list of arrays that (may) differ in shape. How are you going to get those without some sort of (python level) iteration?
Look at the code for np.split (and variants). It too iterates, taking multiple slices to produce a list of arrays.
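Since basic slicing returns views rather than copies, the Python-level loop costs time proportional to the number of blocks, not to the size of the matrix. A minimal sketch of the same idea with the offsets precomputed by np.cumsum (the name split_block_diag is hypothetical, not a SciPy function):

```python
import numpy as np

def split_block_diag(matrix, block_sizes):
    """Return views of the diagonal blocks of `matrix` (hypothetical helper)."""
    matrix = np.asarray(matrix)
    sizes = np.asarray(block_sizes).reshape(-1, 2)  # one (nrows, ncols) per block
    ends = np.cumsum(sizes, axis=0)                 # exclusive end offsets
    starts = ends - sizes                           # start offsets
    return [matrix[r0:r1, c0:c1]
            for (r0, c0), (r1, c1) in zip(starts, ends)]

D = np.array([[1, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 3, 4, 5, 0],
              [0, 0, 6, 7, 8, 0],
              [0, 0, 0, 0, 0, 7]])
a, b, c = split_block_diag(D, [(2, 2), (2, 3), (1, 1)])
```

The returned blocks share memory with D, so call .copy() on them if they must be modified independently.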

Related

Find first n non zero values in numpy 2d array

I would like to know the fastest way to extract the indices of the first n non zero values per column in a 2D array.
For example, with the following array:
arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
With n=2 I would have [0, 0, 1, 1, 2] as xs and [0, 3, 2, 5, 3] as ys. 2 values in the first and second columns and 1 in the third.
Here is how it is currently done:
x = []
y = []
n = 2
for i, c in enumerate(arr.T):
    a = c.nonzero()[0][:n]
    if len(a):
        x.extend([i]*len(a))
        y.extend(a)
In practice I have arrays of size (405, 256).
Is there a way to make it faster?
Here is a method that does not require sorting the array (a single linear scan is enough to find the non-zero values), although it is somewhat hard to read because it chains several functions:
n = 2
# Get indices with non null values, columns indices first
nnull = np.stack(np.where(arr.T != 0))
# split indices by unique value of column
cols_ids= np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] +1 )
# Take n in each (max) and concatenate the whole
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis = 1)
outputs:
array([[0, 0, 1, 1, 2],
[0, 3, 2, 5, 3]], dtype=int64)
Here is one approach using argsort, it gives a different order though:
n = 2
m = arr!=0
# non-zero values first
idx = np.argsort(~m, axis=0)
# get first 2 and ensure non-zero
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y,x = np.where(m2)
# slice
x, idx[y,x]
# (array([0, 1, 2, 0, 1]), array([0, 2, 3, 3, 5]))
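Use a shifted comparison on the column indices returned by nonzero on the transposed array: an entry is kept if it is among the first n overall, or if its column index differs from the column index n positions earlier: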
>>> n = 2
>>> i, j = arr.T.nonzero()
>>> mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
>>> i[mask], j[mask]
(array([0, 0, 1, 1, 2], dtype=int64), array([0, 3, 2, 5, 3], dtype=int64))
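The shifted comparison works because nonzero on the transposed array returns entries sorted by column and then by row, so an entry belongs to the first n of its column exactly when the entry n positions earlier sits in a different column. A minimal sketch wrapping this in a function (the function name is hypothetical, and n >= 1 is assumed):

```python
import numpy as np

def first_n_nonzero_per_column(arr, n):
    """Column and row indices of the first n non-zero entries per column.

    Assumes n >= 1 and a 2D array.
    """
    cols, rows = np.nonzero(arr.T)          # sorted by column, then by row
    keep = np.ones(cols.size, dtype=bool)   # the first n entries are always kept
    keep[n:] = cols[n:] != cols[:-n]        # keep if n positions back is another column
    return cols[keep], rows[keep]

arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
xs, ys = first_n_nonzero_per_column(arr, 2)
```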

How to put elements in specific locations in a np array in one line

I'm writing in Python 3.6, with Numpy 1.20.1. The problem is I have an np.ndarray called A with size (10, 3), and I have another np.ndarray called B with size (4, 3). For the 4 arrays of size 3, I would like to put them into 4 specific positions in the first array.
For example:
A = np.zeros((10, 3))
B = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
idx = [7,3,1,4]
And I would like to put each row of B into A at the position given by idx. So after the conversion, A should look like:
[[0, 0, 0],
 [7, 8, 9],
 [0, 0, 0],
 [4, 5, 6],
 [10, 11, 12],
 [0, 0, 0],
 [0, 0, 0],
 [1, 2, 3],
 [0, 0, 0],
 [0, 0, 0]]
I especially wonder if it's possible to accomplish this in one line of code.
I tried A[idx] = B, and it gives me error: IndexError: too many indices for array
Numpy version one liner.
A = np.zeros((10, 3))
B = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
idx = [7,3,1,4]
A[idx] = B[ np.arange(B.shape[0]) ] # Source from B.shape
OR
A[idx] = B[[0,1,2,3]] # Source as constants
Your code works perfectly fine for me.
As for the problem, you can try:
for i, row in zip(idx, B):
    A[i] = row
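With the shapes given in the question, plain integer-array assignment is already a one-liner. A minimal check follows; that the reported IndexError comes from a real A or idx that differs from the example is only an assumption:

```python
import numpy as np

A = np.zeros((10, 3))
B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
idx = [7, 3, 1, 4]

# Row k of B is written to row idx[k] of A.
A[idx] = B
```

Checking A.shape and np.shape(idx) just before the assignment is a quick way to see whether the real data matches the example.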

How do I rewrite the following code in Python without for loops?

I have an array "b" of size L^2 x L*(L+1), and an array "a" of size L x L.
Currently my code is:
for i in range(L):
    for j in range(L):
        b[i+j*L, i+j*(L+1)] = a[j, i]
What this means is that, for example for L=2, if the 2x2 array "a" has the form
ab
cd
then I want the 4x6 array "b" to be
a00000
0b0000
000c00
0000d0
How do I rewrite the same thing without using for loops?
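If what you want is to fill the main diagonal of b with the flattened values of a, NumPy has functions for this (note that the loop in the question writes row j of a starting at column j*(L+1), so for j > 0 its entries sit j columns to the right of the main diagonal, as in the 4x6 example above):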
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html
https://numpy.org/doc/stable/reference/generated/numpy.fill_diagonal.html
import numpy as np
# Set sample data
a = np.array([[1, 2], [3, 4]])
b = np.zeros([4,6])
# This is it:
np.fill_diagonal(b, a.flatten())
If you don't want to use a library, for example because this is a programming assignment, you can represent matrices as nested lists and use a list comprehension, like this:
# Prepare data
a = [[1, 2], [3, 4]]
L = len(a)
# "Build" the result
b = [[a[i//L][i%L] if i == j else 0 for j in range(L*(L+1))] for i in range(L*L)]
# Same, with better formatting:
b = [[a[i//L][i%L]
      if i == j else 0
      for j in range(L*(L+1))]
     for i in range(L*L)]
# b will be = [[1, 0, 0, 0, 0, 0],
#              [0, 2, 0, 0, 0, 0],
#              [0, 0, 3, 0, 0, 0],
#              [0, 0, 0, 4, 0, 0]]
Anyway you need to iterate through the items in 'a', so you are just replacing the 'for' constructions by a list comprehension. This might be more efficient for large matrices but arguably less clear.
Generalizing the answer from Milo:
L = (a.shape)[0]
b = np.zeros([L*L, L*(L+1)])
np.fill_diagonal(b, a.flatten())
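The loop in the question writes a[j, i] to b[i + j*L, i + j*(L+1)], so the target column is row + row // L rather than the main diagonal. A minimal sketch that reproduces the loop exactly with index arrays (the names rows and cols are illustrative):

```python
import numpy as np

L = 3
a = np.arange(1, L * L + 1).reshape(L, L)

rows = np.arange(L * L)        # row = i + j*L, which is also the flat index into a
cols = rows + rows // L        # col = i + j*(L+1) = row + j, with j = row // L

b = np.zeros((L * L, L * (L + 1)), dtype=a.dtype)
b[rows, cols] = a.ravel()      # one vectorized assignment instead of two loops
```

For L = 2 this places the four values at (0, 0), (1, 1), (2, 3) and (3, 4), matching the 4x6 example in the question.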

vectorizing numpy bincount

I have a 2D numpy array, A. I want to apply np.bincount() to each column of A to generate another 2D array, B, that is composed of the bincounts of each column of A.
My problem is that np.bincount() is a function that takes a 1d array-like. It's not an array method like B = A.max(axis=1) for example.
Is there a more pythonic/numpythic way to generate this B array other than a nasty for-loop?
import numpy as np
states = 4
rows = 8
cols = 4
A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))
for x in range(A.shape[1]):
    B[:,x] = np.bincount(A[:,x])
Using the same philosophy as in this post, here's a vectorized approach -
m = A.shape[1]
n = A.max()+1
A1 = A + (n*np.arange(m))
out = np.bincount(A1.ravel(),minlength=n*m).reshape(m,-1).T
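The offset trick works because adding n*k to column k moves each column's values into its own disjoint range [n*k, n*(k+1)), so a single np.bincount over the flattened array counts all columns at once. A small sketch under the assumption that A holds non-negative integers (the function name and the nbins parameter are hypothetical):

```python
import numpy as np

def bincount_columns_offset(A, nbins=None):
    """Per-column bincount via one flat np.bincount call.

    Assumes A is a 2D array of non-negative integers and, if given,
    nbins > A.max().
    """
    A = np.asarray(A)
    m = A.shape[1]
    n = int(A.max()) + 1 if nbins is None else nbins
    shifted = A + n * np.arange(m)                    # column k lives in [n*k, n*(k+1))
    counts = np.bincount(shifted.ravel(), minlength=n * m)
    return counts.reshape(m, n).T                     # shape (n, m): bins x columns

A = np.array([[0, 1],
              [2, 1],
              [2, 0]])
B = bincount_columns_offset(A)
```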
I would suggest using np.apply_along_axis, which allows you to apply a 1D function (in this case np.bincount) to 1D slices of a higher-dimensional array:
import numpy as np
states = 4
rows = 8
cols = 4
A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))
B = np.apply_along_axis(np.bincount, axis=0, arr=A)
You'll have to be careful, though. This (as well as your suggested for-loop) only works if the output of np.bincount has the same length for every column. If the maximum state is missing from one or more columns of A, np.bincount returns a shorter array for those columns, and the code will fail with a ValueError.
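One way around the length mismatch is to pass minlength through to np.bincount, since np.apply_along_axis forwards extra keyword arguments to the function it applies. A minimal sketch, assuming every value in A is smaller than states:

```python
import numpy as np

states = 4
# State 3 never occurs, and the two columns have different maxima.
A = np.array([[0, 1],
              [0, 1],
              [2, 1]])

# minlength pads every per-column result to the same length.
B = np.apply_along_axis(np.bincount, 0, A, minlength=states)
```

This is still a Python-level loop over columns, so it addresses correctness rather than speed.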
This solution using the numpy_indexed package (disclaimer: I am its author) is fully vectorized, thus does not include any python loops behind the scenes. Also, there are no restrictions on the input; not every column needs to contain the same set of unique values.
import numpy_indexed as npi
rowidx, colidx = np.indices(A.shape)
(bin, col), B = npi.count_table(A.flatten(), colidx.flatten())
This gives an alternative (sparse) representation of the same result, which may be much more appropriate if the B array does indeed contain many zeros:
(bin, col), count = npi.count((A.flatten(), colidx.flatten()))
Note that apply_along_axis is just syntactic sugar for a for-loop, and has the same performance characteristics.
Yet another possibility:
import numpy as np
def bincount_columns(x, minlength=None):
    nbins = x.max() + 1
    if minlength is not None:
        nbins = max(nbins, minlength)
    ncols = x.shape[1]
    count = np.zeros((nbins, ncols), dtype=int)
    colidx = np.arange(ncols)[None, :]
    np.add.at(count, (x, colidx), 1)
    return count
For example,
In [110]: x
Out[110]:
array([[4, 2, 2, 3],
       [4, 3, 4, 4],
       [4, 3, 4, 4],
       [0, 2, 4, 0],
       [4, 1, 2, 1],
       [4, 2, 4, 3]])
In [111]: bincount_columns(x)
Out[111]:
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2]])
In [112]: bincount_columns(x, minlength=7)
Out[112]:
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

List comprehensions work fine in Python shell but are ignored when running the script

I have the following numpy array d:
array([[0, 1, 4, 9, 4, 1, 0, 1, 4],
[1, 0, 1, 4, 1, 0, 1, 0, 1],
[1, 0, 1, 4, 1, 0, 1, 0, 1],
[4, 1, 0, 1, 0, 1, 4, 1, 0],
[4, 1, 0, 1, 0, 1, 4, 1, 0],
[9, 4, 1, 0, 1, 4, 9, 4, 1],
[1, 0, 1, 4, 1, 0, 1, 0, 1],
[0, 1, 4, 9, 4, 1, 0, 1, 4]])
I need to fill another 2d array (D) with a Dynamic Time Warping algorithm, which requires filling the first row and the first column first, then the rest of the array.
To do so, I have a DTWdistance(d) function, which receives the d array above as argument to compute and return the new D array.
I intend to use list comprehensions instead of for loops, but while the loops work as expected, the list comprehensions are totally ignored when I run the script. They work fine when run in a Python shell, though, so syntax errors can be ruled out.
Since the list comprehensions are being ignored, the D array is never computed, and the function returns the same d array without any changes.
For example:
D[0,1:] = [d[0,i] + D[0, i-1] for i in range(1, m)]
This should fill the first row of the D array (starting from i=1) with the following values: [1, 5, 14, 18, 19, 19, 20, 24].
However, with this list comprehension for the first row and for loops for the rest of the process, I get a D array in which every value is correct except for the first row: instead of the list above, the slice simply receives the values from the corresponding slice of the d array: [1, 4, 9, 4, 1, 0, 1, 4].
Consequently, when I use list comprehensions to compute the whole D array, I get nothing but the same d array.
I'm well aware of the existence of several DTW-oriented tools out there, but this is my own implementation, which fits a particular set of personal needs.
I would appreciate explanations about why the list comprehensions are being ignored, and if I may be doing something wrong or if this could be a bug, and how I can overcome it.
I'm using Python 3.4 x64, and Spyder IDE 2.3.5.2, on Windows 8.
TL;DR:
Tried to use list comprehensions instead of loops to calculate several lists and assign them to specific slices of a 2d array. They work fine in the Python shell but are ignored if run in script. I have no idea why.
As requested, an MCVE:
import numpy as np
def ldistance(x, y):
    m = len(x)
    n = len(y)
    # Euclidean distance
    d = np.array([[(x[j]-y[i])**2 for j in range(m)] for i in range(n)],
                 dtype=float)
    return d
def cdistance(d):
    n, m = d.shape
    D = np.zeros((n, m))
    # First element is identical in both matrices.
    D[0, 0] = d[0, 0]
    # Elements in first row [0, 1:]
    D[0, 1:] = [d[0, j] + D[0, j-1] for j in range(1, m)]
    # Elements in first column [1:, 0]
    D[1:, 0] = [d[i, 0] + D[i-1, 0] for i in range(1, n)]
    # Rest of the elements in the matrix [1:, 1:]
    D[1:, 1:] = [[d[i, j] + min(D[i-1, j-1], D[i-1, j], D[i, j-1])
                  for j in range(1, m)]
                 for i in range(1, n)]
    return D
# --
x = [0, 1, 2, 3, 2, 1, 0, 1, 2]
y = [0, 1, 1, 2, 2, 3, 1, 0]
d = ldistance(x, y)
D = cdistance(d)
The problem here is that you can't do this sort of thing:
D[0, 1:] = [d[0, j] + D[0, j-1] for j in range(1, m)]
As you can see, you are trying to use elements of D that would only be computed inside this same list comprehension (the D[0, j-1] part). D[0, 1:] is only assigned after the whole list expression has been evaluated, so except for j == 1, where D[0, 0] has already been assigned, every D[0, j-1] read inside the comprehension is still 0.
If you ever had this working in interactive mode, it could only be because you had previously filled D with meaningful values.
Explicit for loops work in this case because the earlier values of D that each step needs have already been assigned by the time you compute the values that depend on them.
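The first row and first column are plain running sums of d, so they can be filled without a comprehension by using np.cumsum, while the interior keeps explicit loops because each cell depends on cells computed just before it. A minimal sketch under those assumptions, not the asker's final code:

```python
import numpy as np

def cdistance(d):
    """Accumulated cost matrix; a sketch using np.cumsum for the borders."""
    d = np.asarray(d, dtype=float)
    n, m = d.shape
    D = np.zeros((n, m))
    D[0, :] = np.cumsum(d[0, :])   # D[0, j] = d[0, j] + D[0, j-1]
    D[:, 0] = np.cumsum(d[:, 0])   # D[i, 0] = d[i, 0] + D[i-1, 0]
    # Each interior cell needs already-computed neighbours, so loop in order.
    for i in range(1, n):
        for j in range(1, m):
            D[i, j] = d[i, j] + min(D[i-1, j-1], D[i-1, j], D[i, j-1])
    return D

x = np.array([0, 1, 2, 3, 2, 1, 0, 1, 2])
y = np.array([0, 1, 1, 2, 2, 3, 1, 0])
d = (x[None, :] - y[:, None]) ** 2   # d[i, j] = (x[j] - y[i])**2
D = cdistance(d)
```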
