Optimizing assignment into an array from various arrays - NumPy - python

I have four square matrices with dimension 3Nx3N, called A, B, C and D.
I want to combine them in a single matrix.
The code with for loops is the following:
import numpy
N = 3
A = numpy.random.random((3*N, 3*N))
B = numpy.random.random((3*N, 3*N))
C = numpy.random.random((3*N, 3*N))
D = numpy.random.random((3*N, 3*N))
final = numpy.zeros((6*N, 6*N))
for i in range(N):
    for j in range(N):
        for k in range(3):
            for l in range(3):
                final[6*i + k][6*j + l] = A[3*i+k][3*j+l]
                final[6*i + k + 3][6*j + l + 3] = B[3*i+k][3*j+l]
                final[6*i + k + 3][6*j + l] = C[3*i+k][3*j+l]
                final[6*i + k][6*j + l + 3] = D[3*i+k][3*j+l]
Is it possible to write the previous for loops in a numpythonic way?

Great problem for practicing array-slicing into multi-dimensional tensors/arrays!
We will initialize the output array as a multi-dimensional 6D array, slice it, and assign the four input arrays reshaped as 4D arrays. The intention is to avoid any stacking/concatenating, which would be expensive, especially when working with large arrays; instead we only reshape the inputs, which produces views.
Here's the implementation -
out = np.zeros((N,2,3,N,2,3),dtype=A.dtype)
out[:,0,:,:,0,:] = A.reshape(N,3,N,3)
out[:,0,:,:,1,:] = D.reshape(N,3,N,3)
out[:,1,:,:,0,:] = C.reshape(N,3,N,3)
out[:,1,:,:,1,:] = B.reshape(N,3,N,3)
out.shape = (6*N,6*N)
Just to explain a bit more, we had:

            |     |------------------- Axes for selecting A, B, C, D
np.zeros((N,2,3,N,2,3),dtype=A.dtype)

Thus, those two axes (the second and fifth), each of length 2, give 2x2 = 4 slots, which are used to select between the four inputs.
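As a quick sanity check (my addition, not part of the original answer), the reshapes really are views, so the four slice assignments are the only data copies made:
print(np.shares_memory(A, A.reshape(N, 3, N, 3)))   # True: reshaping a contiguous array returns a view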
Runtime test
Approaches -
def original_app(A, B, C, D):
    final = np.zeros((6*N,6*N),dtype=A.dtype)
    for i in range(N):
        for j in range(N):
            for k in range(3):
                for l in range(3):
                    final[6*i + k][6*j + l] = A[3*i+k][3*j+l]
                    final[6*i + k + 3][6*j + l + 3] = B[3*i+k][3*j+l]
                    final[6*i + k + 3][6*j + l] = C[3*i+k][3*j+l]
                    final[6*i + k][6*j + l + 3] = D[3*i+k][3*j+l]
    return final

def slicing_app(A, B, C, D):
    out = np.zeros((N,2,3,N,2,3),dtype=A.dtype)
    out[:,0,:,:,0,:] = A.reshape(N,3,N,3)
    out[:,0,:,:,1,:] = D.reshape(N,3,N,3)
    out[:,1,:,:,0,:] = C.reshape(N,3,N,3)
    out[:,1,:,:,1,:] = B.reshape(N,3,N,3)
    return out.reshape(6*N,6*N)
Timings and verification -
In [147]: # Setup input arrays
...: N = 200
...: A = np.random.randint(11,99,(3*N,3*N))
...: B = np.random.randint(11,99,(3*N,3*N))
...: C = np.random.randint(11,99,(3*N,3*N))
...: D = np.random.randint(11,99,(3*N,3*N))
...:
In [148]: np.allclose(slicing_app(A, B, C, D), original_app(A, B, C, D))
Out[148]: True
In [149]: %timeit original_app(A, B, C, D)
1 loops, best of 3: 1.63 s per loop
In [150]: %timeit slicing_app(A, B, C, D)
100 loops, best of 3: 9.26 ms per loop

I'll start with a couple of generic observations
For numpy arrays we normally use the [ , ] syntax rather than [][]
final[6*i + k][6*j + l]
final[6*i + k, 6*j + l]
For new arrays built from others, we often use things like reshape and slicing so that we can combine them as blocks rather than with iterative loops.
For a simple example, to take successive differences:
y = x[1:] - x[:-1]
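As an aside (my addition), np.diff is a built-in spelling of the same successive-difference operation:
y = np.diff(x)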
Regarding the title, 'matrix creation' is clearer. 'load' has more of the sense of reading data from a file, as in np.loadtxt.
=================
So with N=1,
In [171]: A=np.arange(0,9).reshape(3,3)
In [172]: B=np.arange(10,19).reshape(3,3)
In [173]: C=np.arange(20,29).reshape(3,3)
In [174]: D=np.arange(30,39).reshape(3,3)
In [178]: final
Out[178]:
array([[ 0, 1, 2, 30, 31, 32],
[ 3, 4, 5, 33, 34, 35],
[ 6, 7, 8, 36, 37, 38],
[20, 21, 22, 10, 11, 12],
[23, 24, 25, 13, 14, 15],
[26, 27, 28, 16, 17, 18]])
Which can be created with one call to bmat:
In [183]: np.bmat([[A,D],[C,B]]).A
Out[183]:
array([[ 0, 1, 2, 30, 31, 32],
[ 3, 4, 5, 33, 34, 35],
[ 6, 7, 8, 36, 37, 38],
[20, 21, 22, 10, 11, 12],
[23, 24, 25, 13, 14, 15],
[26, 27, 28, 16, 17, 18]])
bmat uses a mix of hstack and vstack. It also produces a np.matrix, hence the need for .A. @Divakar's solution is bound to be faster.
This does not match with N=3; the 3x3 blocks are out of order. But expanding the array to 6d (as Divakar does) and swapping some axes puts the sub-blocks into the right order.
For N=3:
In [57]: block=np.bmat([[A,D],[C,B]])
In [58]: b1=block.A.reshape(2,3,3,2,3,3)
In [59]: b2=b1.transpose(1,0,2,4,3,5)
In [60]: b3=b2.reshape(18,18)
In [61]: np.allclose(b3,final)
Out[61]: True
In quick time tests (N=3), my approach is about half the speed of slicing_app.
As a matter of curiosity, bmat works with a string input: np.bmat('A,D;C,B'). That's because np.matrix was trying, years ago, to give a MATLAB feel.
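A small illustration of that (my addition): with a string argument, bmat looks the names up in the caller's namespace, so with A, B, C, D defined as above:
assert np.allclose(np.bmat('A,D;C,B'), np.bmat([[A, D], [C, B]]))   # same block layout either way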

You can just concatenate them:
concatenate A and B horizontally
concatenate C and D horizontally
concatenate the AB result with the CD result vertically
example:
AB = numpy.concatenate([A,B],1)
CD = numpy.concatenate([C,D],1)
ABCD = numpy.concatenate([AB,CD],0)
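As an aside (not part of the original answer), on NumPy 1.13+ numpy.block builds the same 2x2 block layout in one call:
ABCD = numpy.block([[A, B], [C, D]])   # same result as the two horizontal concatenates plus the vertical one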
i hope that helps :)

Yet another way to do that, using view_as_blocks:
from skimage.util import view_as_blocks
def by_blocks():
    final = numpy.empty((6*N, 6*N))
    a, b, c, d, f = [view_as_blocks(X, (3, 3)) for X in [A, B, C, D, final]]
    f[0::2, 0::2] = a
    f[1::2, 1::2] = b
    f[1::2, 0::2] = c
    f[0::2, 1::2] = d
    return final
You just have to think block by block, letting view_as_blocks manage strides and shapes for you. It is as fast as other numpy solutions.
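For reference (my addition, assuming scikit-image is available), the block view has shape (2N, 2N, 3, 3), which is why the even/odd slices above pick out the interleaved 3x3 blocks:
blocks = view_as_blocks(numpy.zeros((6*N, 6*N)), (3, 3))
print(blocks.shape)   # (2*N, 2*N, 3, 3): the block grid first, then the 3x3 block itself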

Related

How can I construct an if statement for numpy arrays' elements comparison, to produce a new array with same dimension?

import numpy as np
a = np.array([[3.5,6,8,2]])
b = np.array([[6,2,8,2]])
c = np.array([[2,3,7,5]])
d = np.array([[3,2,5,1]])
if a > b:
    e = 2*a+6*c
else:
    e = 3*c + 4*d
print(e)
then I got
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
and if I type in print(e), I got
[2, 3, 7, 5, 2, 3, 7, 5, 2, 3, 7, 5, 3, 2, 5, 1, 3, 2, 5, 1, 3, 2, 5, 1, 3, 2, 5, 1]
The e I want to construct is an array with the same dimensions as a, b, c, d, where the if statement decides which equation is used to compute each element.
In other words, for the elements in the first place of a and b: 3.5 < 6, so e = 3*c + 4*d = 3*2 + 4*3 = 18
For the second elements: 6 > 2, so e = 2*a + 6*c = 2*6 + 6*3 = 30
Third: 8 = 8, so e = 3*c + 4*d = 3*7 + 4*5 = 41
Fourth: 2 = 2, so e = 3*c + 4*d = 3*5 + 4*1 = 19
e = [18,30,41,19]
I tried to find someone who had asked about constructing a script that does such things, but I could find none, and all the numpy questions about if statements (or the equivalent) did not help. Thanks. (It seems that a.any() or a.all(), as the error message recommends, did not help either.)
Use numpy.where:
Return elements chosen from x or y depending on condition.
e = np.where(a > b, 2*a+6*c, 3*c + 4*d)
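A quick check with the arrays from the question (my addition):
import numpy as np

a = np.array([[3.5, 6, 8, 2]])
b = np.array([[6, 2, 8, 2]])
c = np.array([[2, 3, 7, 5]])
d = np.array([[3, 2, 5, 1]])
print(np.where(a > b, 2*a + 6*c, 3*c + 4*d))   # [[18. 30. 41. 19.]]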
In [370]: a = np.array([[3.5,6,8,2]])
...: b = np.array([[6,2,8,2]])
...: c = np.array([[2,3,7,5]])
...: d = np.array([[3,2,5,1]])
...:
In [371]: a.shape
Out[371]: (1, 4)
In [372]: a[0].shape
Out[372]: (4,)
The problem with the if is that a>b is an array. There's no one True/False value to "switch" on:
In [373]: a>b
Out[373]: array([[False, True, False, False]])
where does the array "switch"; an equivalent way is:
In [376]: mask = a>b
In [377]: e = 3*c + 4*d
In [378]: e
Out[378]: array([[18, 17, 41, 19]])
In [379]: e[mask] = 2*a[mask] + 6*c[mask]
In [380]: e
Out[380]: array([[18, 30, 41, 19]])
np.where itself does not iterate (many pandas users seem to assume it does). It works with whole arrays: the condition array (my mask) and the whole-array values.
To use the if, we have to work with scalar values, not arrays. Wrap the if/else in a loop. For example:
In [381]: alist = []
In [382]: for i,j,k,l in zip(a[0],b[0],c[0],d[0]):
     ...:     if i>j:
     ...:         f = 2*i+6*k
     ...:     else:
     ...:         f = 3*k+4*l
     ...:     alist.append(f)
     ...:
In [383]: alist
Out[383]: [18, 30.0, 41, 19]
This works because i and j are single numbers, not arrays.

Physically transposing a large non-square numpy matrix

Is there any quicker way to physically transpose a large 2D numpy matrix than array.transpose.copy()? And are there any routines for doing it with efficient memory use?
It may be worth looking at what transpose does, just so we are clear about what you mean by 'physically transposing'.
Start with a small (4,3) array:
In [51]: arr = np.array([[1,2,3],[10,11,12],[22,23,24],[30,32,34]])
In [52]: arr
Out[52]:
array([[ 1, 2, 3],
[10, 11, 12],
[22, 23, 24],
[30, 32, 34]])
This is stored with a 1d data buffer, which we can display with ravel:
In [53]: arr.ravel()
Out[53]: array([ 1, 2, 3, 10, 11, 12, 22, 23, 24, 30, 32, 34])
and strides which tell it to step columns by 8 bytes, and rows by 24 (3*8):
In [54]: arr.strides
Out[54]: (24, 8)
We can ravel with the "F" order - that's going down the rows:
In [55]: arr.ravel(order='F')
Out[55]: array([ 1, 10, 22, 30, 2, 11, 23, 32, 3, 12, 24, 34])
While [53] is a view, [55] is a copy.
Now the transpose:
In [57]: arrt=arr.T
In [58]: arrt
Out[58]:
array([[ 1, 10, 22, 30],
[ 2, 11, 23, 32],
[ 3, 12, 24, 34]])
This is a view; we can traverse the [53] data buffer, going down the rows with 8-byte steps. Doing calculations with arrt is basically just as fast as with arr. With strided iteration, order 'F' is just as fast as order 'C'.
In [59]: arrt.strides
Out[59]: (8, 24)
the original order:
In [60]: arrt.ravel(order='F')
Out[60]: array([ 1, 2, 3, 10, 11, 12, 22, 23, 24, 30, 32, 34])
but doing a 'C' ravel creates a copy, same as [55]
In [61]: arrt.ravel(order='C')
Out[61]: array([ 1, 10, 22, 30, 2, 11, 23, 32, 3, 12, 24, 34])
Making a copy of the transpose produces an array whose data is the transpose laid out in 'C' order. This is your 'physical transpose':
In [62]: arrc = arrt.copy()
In [63]: arrc.strides
Out[63]: (32, 8)
Raveling a transpose in 'C' order as done in [61] does make a copy, but usually we don't need to make the copy explicitly. I think the only reason to do so is to avoid several redundant copies in later calculations.
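If all you want is "a C-contiguous copy of the transpose", np.ascontiguousarray is a convenient spelling of the same copy (a small sketch I'm adding, using the arrt from above):
print(arrt.flags['C_CONTIGUOUS'])      # False: the transposed view is not C-contiguous
arrc2 = np.ascontiguousarray(arrt)     # equivalent to arrt.copy(): a C-ordered copy
print(arrc2.flags['C_CONTIGUOUS'])     # True: the data has been physically reordered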
I assume that you need to do a row-wise operation that uses the CPU cache more efficiently if rows are contiguous in memory, and you don't have enough memory available to make a copy.
Wikipedia has an article on in-place matrix transposition. It turns out that such a transposition is nontrivial. Here is a follow-the-cycles algorithm as described there:
import numpy as np
from numba import njit

@njit  # comment out this line for debugging
def transpose_in_place(a):
    """In-place matrix transposition for a rectangular matrix.
    https://stackoverflow.com/a/62507342/6228891

    Parameter:
    - a: 2D array. Unless it's a square matrix, it will be scrambled
      in the process.

    Return:
    - transposed array, using the same in-memory data storage as the
      input array.

    This algorithm is typically 10x slower than a.T.copy().
    Only use it if you are short on memory.
    """
    if a.shape == (1, 1):
        return a  # special case
    n, m = a.shape

    # find max length L of permutation cycle by starting at a[0,1].
    # k is the index in the flat buffer; i, j are the indices in a.
    L = 0
    k = 1
    while True:
        j = k % m
        i = k // m
        k = n*j + i
        L += 1
        if k == 1:
            break

    permut = np.zeros(L, dtype=np.int32)

    # Now do the permutations, one cycle at a time
    seen = np.full(n*m, False)
    aflat = a.reshape(-1)  # flat view
    for k0 in range(1, n*m-1):
        if seen[k0]:
            continue
        # construct cycle
        k = k0
        permut[0] = k0
        q = 1  # size of permutation array
        while True:
            seen[k] = True
            # note that this is slightly faster than the formula
            # on Wikipedia, k = n*k % (n*m-1)
            i = k // m
            j = k - i*m
            k = n*j + i
            if k == k0:
                break
            permut[q] = k
            q += 1
        # apply cyclic permutation
        tmp = aflat[permut[q-1]]
        aflat[permut[1:q]] = aflat[permut[:q-1]]
        aflat[permut[0]] = tmp

    aT = aflat.reshape(m, n)
    return aT

def test_transpose(n, m):
    a = np.arange(n*m).reshape(n, m)
    aT = a.T.copy()
    assert np.all(transpose_in_place(a) == aT)

def roundtrip_inplace(a):
    a = transpose_in_place(a)
    a = transpose_in_place(a)

def roundtrip_copy(a):
    a = a.T.copy()
    a = a.T.copy()

if __name__ == '__main__':
    test_transpose(1, 1)
    test_transpose(3, 4)
    test_transpose(5, 5)
    test_transpose(1, 5)
    test_transpose(5, 1)
    test_transpose(19, 29)
Even though I'm using numba.njit here so that the loops in the transpose function are compiled, it's still quite a bit slower than a copy-transpose.
n, m = 1000, 10000
a_big = np.arange(n*m, dtype=np.float64).reshape(n, m)
%timeit -r2 -n10 roundtrip_copy(a_big)
54.5 ms ± 153 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
%timeit -r2 -n1 roundtrip_inplace(a_big)
614 ms ± 141 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Whatever you do will require O(n^2) time and memory. I would assume that .transpose and .copy (written in C) will be the most efficient choice for your application.
Edit: this assumes you actually need to copy the matrix

Two dimensional function not returning array of values?

I'm trying to plot a 2-dimensional function (specifically, a 2-d Laplace solution). I defined my function and it returns the right value when I put in specific numbers, but when I try running through an array of values (x,y below), it still returns only one number. I tried with a random function of x and y (e.g., f(x,y) = x^2 + y^2) and it gives me an array of values.
def V_func(x,y):
    a = 5
    b = 4
    Vo = 4
    n = np.arange(1,100,2)
    sum_list = []
    for indx in range(len(n)):
        sum_term = (1/n[indx])*(np.cosh(n[indx]*np.pi*x/a))/(np.cosh(n[indx]*np.pi*b/a))*np.sin(n[indx]*np.pi*y/a)
        sum_list = np.append(sum_list,sum_term)
    summation = np.sum(sum_list)
    V = 4*Vo/np.pi * summation
    return V
x = np.linspace(-4,4,50)
y = np.linspace(0,5,50)
V_func(x,y)
Out: 53.633709914177224
Try this:
def V_func(x,y):
    a = 5
    b = 4
    Vo = 4
    n = np.arange(1,100,2)
    # sum_list = []
    sum_list = np.zeros(50)
    for indx in range(len(n)):
        sum_term = (1/n[indx])*(np.cosh(n[indx]*np.pi*x/a))/(np.cosh(n[indx]*np.pi*b/a))*np.sin(n[indx]*np.pi*y/a)
        # sum_list = np.append(sum_list,sum_term)
        sum_list += sum_term
    # summation = np.sum(sum_list)
    # V = 4*Vo/np.pi * summation
    V = 4*Vo/np.pi * sum_list
    return V
Define a pair of arrays:
In [6]: x = np.arange(3); y = np.arange(10,13)
In [7]: x,y
Out[7]: (array([0, 1, 2]), array([10, 11, 12]))
Try a simple function of the two:
In [8]: x + y
Out[8]: array([10, 12, 14])
Since they have the same size, they can be summed (or otherwise combined) elementwise. The result has the same shape as the 2 inputs.
Now try 'broadcasting'. x[:,None] has shape (3,1)
In [9]: x[:,None] + y
Out[9]:
array([[10, 11, 12],
[11, 12, 13],
[12, 13, 14]])
The result is (3,3), the first 3 from the reshaped x, the second from y.
I can generate the pair of arrays with meshgrid:
In [10]: I,J = np.meshgrid(x,y,sparse=True, indexing='ij')
In [11]: I
Out[11]:
array([[0],
[1],
[2]])
In [12]: J
Out[12]: array([[10, 11, 12]])
In [13]: I + J
Out[13]:
array([[10, 11, 12],
[11, 12, 13],
[12, 13, 14]])
Note the added parameters in meshgrid. So that's how we go about generating 2d values from a pair of 1d arrays.
Now look at what sum does. As you use it in the function:
In [14]: np.sum(I + J)
Out[14]: 108
the result is a scalar. See the docs. If I specify an axis I get an array.
In [15]: np.sum(I + J, axis=0)
Out[15]: array([33, 36, 39])
If you gave V_func the right x and y, sum_list could be a 3d array. That axis-less sum reduces it to a scalar.
In code like this you need to keep track of array shapes. Include test prints if needed; don't just assume anything; test it. Pay attention to how dimensions grow and shrink as they pass through various operations.
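To tie this back to the question, here is one fully vectorized sketch (my own addition; the name V_func_grid and the keyword defaults are assumptions, not from the answers). It broadcasts the series index n against a 2-D grid of x and y, then sums over the n axis only:
import numpy as np

def V_func_grid(x, y, a=5, b=4, Vo=4):
    # give n its own leading axis so it broadcasts against the (len(x), len(y)) grid
    n = np.arange(1, 100, 2)[:, None, None]            # shape (50, 1, 1)
    X, Y = np.meshgrid(x, y, indexing='ij')            # each of shape (len(x), len(y))
    terms = (1/n) * np.cosh(n*np.pi*X/a) / np.cosh(n*np.pi*b/a) * np.sin(n*np.pi*Y/a)
    return 4*Vo/np.pi * terms.sum(axis=0)              # sum over n only -> (len(x), len(y))

x = np.linspace(-4, 4, 50)
y = np.linspace(0, 5, 50)
V = V_func_grid(x, y)
print(V.shape)                                         # (50, 50): one value per (x, y) grid point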

Efficient numpy indexing: Take first N rows of every block of M rows

x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
I want to grab first 2 rows of array x from every block of 5, result should be:
x[fancy_indexing] = [1,2, 6,7, 11,12]
It's easy enough to build up an index like that using a for loop.
Is there a one-liner slicing trick that will pull it off? Points for simplicity here.
Approach #1 Here's a vectorized one-liner using boolean-indexing -
x[np.mod(np.arange(x.size),M)<N]
Approach #2 If you are going for performance, here's another vectorized approach using NumPy strides -
n = x.strides[0]
shp = (x.size//M,N)
out = np.lib.stride_tricks.as_strided(x, shape=shp, strides=(M*n,n)).ravel()
Sample run -
In [61]: # Inputs
...: x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
...: N = 2
...: M = 5
...:
In [62]: # Approach 1
...: x[np.mod(np.arange(x.size),M)<N]
Out[62]: array([ 1, 2, 6, 7, 11, 12])
In [63]: # Approach 2
...: n = x.strides[0]
...: shp = (x.size//M,N)
...: out=np.lib.stride_tricks.as_strided(x,shape=shp,strides=(M*n,n)).ravel()
...:
In [64]: out
Out[64]: array([ 1, 2, 6, 7, 11, 12])
I first thought you needed this to work for 2d arrays, due to your phrasing of "first N rows of every block of M rows", so I'll leave my solution in that form.
You could work some magic by reshaping your array into 3d:
M = 5 # size of blocks
N = 2 # number of columns to cut
x = np.arange(3*4*M).reshape(4,-1) # (4, 3*M)-shaped dummy input
x = x.reshape(x.shape[0],-1,M)[:,:,:N+1].reshape(x.shape[0],-1) # keeps the first N+1 columns of every block of M
This will extract every column according to your preference. In order to use it for your 1d case you'd need to make your 1d array into a 2d one using x = x[None,:].
Reshape the array to multiple rows of five columns then take (slice) the first two columns of each row.
>>> x
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
>>> x.reshape(x.shape[0] // 5, 5)[:,:2]
array([[ 1, 2],
[ 6, 7],
[11, 12]])
Or
>>> x.reshape(x.shape[0] // 5, 5)[:,:2].flatten()
array([ 1, 2, 6, 7, 11, 12])
>>>
It only works with 1-d arrays that have a length that is a multiple of five.
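If the length is not an exact multiple of five, one option (my addition) is to trim the tail before reshaping:
>>> n_blocks = x.shape[0] // 5
>>> x[:n_blocks * 5].reshape(n_blocks, 5)[:, :2].flatten()
array([ 1,  2,  6,  7, 11, 12])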
import numpy as np
x = np.array(range(1, 16))
y = np.vstack([x[0::5], x[1::5]]).T.ravel()
y
// => array([ 1, 2, 6, 7, 11, 12])
Taking the first N rows of every block of M rows in the array [1, 2, ..., K]:
import numpy as np
K = 30
M = 5
N = 2
x = np.array(range(1, K+1))
y = np.vstack([x[i::M] for i in range(N)]).T.ravel()
y
// => array([ 1, 2, 6, 7, 11, 12, 16, 17, 21, 22, 26, 27])
Notice that .T is essentially free: it just swaps the dimensions and strides. The .ravel() here does have to copy, because the transposed array is no longer contiguous, but that copy is done efficiently in C.
If you insist on getting your slice using fancy indexing:
import numpy as np
K = 30
M = 5
N = 2
x = np.array(range(1, K+1))
fancy_indexing = [i*M+n for i in range(len(x)//M) for n in range(N)]
x[fancy_indexing]
// => array([ 1, 2, 6, 7, 11, 12, 16, 17, 21, 22, 26, 27])

(Python) How to get diagonal(A*B) without having to perform A*B?

Let's say we have two matrices A and B, and let matrix C be A*B (matrix multiplication, not element-wise). We wish to get only the diagonal entries of C, which can be done via np.diagonal(C). However, this causes unnecessary time overhead, because we are multiplying A with B in full even though we only need the multiplications of each row in A with the column of B that has the same 'id': row 1 of A with column 1 of B, row 2 of A with column 2 of B, and so on, that is, the multiplications that form the diagonal of C. Is there a way to efficiently achieve that using NumPy? I want to avoid using loops to control which row is multiplied with which column; instead, I wish for a built-in numpy method that does this kind of operation to optimize performance.
Thanks in advance..
I might use einsum here:
>>> a = np.random.randint(0, 10, (3,3))
>>> b = np.random.randint(0, 10, (3,3))
>>> a
array([[9, 2, 8],
[5, 4, 0],
[8, 0, 6]])
>>> b
array([[5, 5, 0],
[3, 5, 5],
[9, 4, 3]])
>>> a.dot(b)
array([[123, 87, 34],
[ 37, 45, 20],
[ 94, 64, 18]])
>>> np.diagonal(a.dot(b))
array([123, 45, 18])
>>> np.einsum('ij,ji->i', a,b)
array([123, 45, 18])
For larger arrays, it'll be much faster than doing the multiplication directly:
>>> a = np.random.randint(0, 10, (1000,1000))
>>> b = np.random.randint(0, 10, (1000,1000))
>>> %timeit np.diagonal(a.dot(b))
1 loops, best of 3: 7.04 s per loop
>>> %timeit np.einsum('ij,ji->i', a, b)
100 loops, best of 3: 7.49 ms per loop
[Note: originally I'd done the elementwise version, ii,ii->i, instead of matrix multiplication. The same einsum tricks work.]
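An equivalent without einsum (my addition), since diag(a.dot(b))[i] is just the dot product of row i of a with column i of b:
alt = (a * b.T).sum(axis=1)   # same values as np.einsum('ij,ji->i', a, b)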
def diag(A, B):
    # Note: this computes A[x][x] * B[x][x], i.e. the diagonal of the
    # element-wise product, not of the matrix product A @ B.
    diags = []
    for x in range(len(A)):
        diags.append(A[x][x] * B[x][x])
    return diags
I believe the above code is what you're looking for.
