Identity matrix stacking in NumPy

I need a 2n x n matrix in NumPy consisting of the n x n identity matrix and the negative n x n identity matrix stacked on top of one another.
This was my original solution, which works fine.
import numpy as np

def id_stack(n):
    id_ = np.identity(n)
    return np.vstack((id_, -id_))
id_stack(3)
# array([[ 1., 0., 0.],
# [ 0., 1., 0.],
# [ 0., 0., 1.],
# [-1., -0., -0.],
# [-0., -1., -0.],
# [-0., -0., -1.]])
Then I figured I could just set the diagonals instead and be faster like this, which also works.
def id_stack2(n):
    full = np.zeros((2*n, n))
    rng = np.arange(n)
    full[rng, rng] = 1
    full[rng + n, rng] = -1
    return full
I was wondering if there is an even faster way of accomplishing this, maybe using some kind of stride tricks?

As you probably noticed from your own examples, allocating one big buffer and setting elements in it is generally faster than allocating two smaller buffers and a big buffer to copy them into.
The neat thing about numpy is that you can get views to the same buffer without allocating a new array. For example:
output = np.zeros((2 * n, n))
A useful view in this case is
flat = output.ravel()
You can set every (n + 1)-th element to 1, starting from the first, for a total of n elements in the flattened view, and do the same for -1 starting at offset n * n. This requires only a simple slice assignment on the raveled view:
flat[:n * n:n + 1] = 1
flat[n * n::n + 1] = -1
This avoids creating the full range arrays, and triggering advanced indexing semantics, which are more memory intensive as well.
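Putting the pieces of this answer together, a minimal sketch might look like the following. The function name id_stack_flat is made up here for illustration, and the code assumes that np.zeros returns a C-contiguous array, so that ravel() yields a view rather than a copy.

```python
import numpy as np

def id_stack_flat(n):
    # Hypothetical name; one allocation, filled through a flat view.
    output = np.zeros((2 * n, n))
    flat = output.ravel()          # a view, because output is C-contiguous
    flat[:n * n:n + 1] = 1         # diagonal of the top n x n block
    flat[n * n::n + 1] = -1        # diagonal of the bottom n x n block
    return output

print(id_stack_flat(3))
```

Writing through flat modifies output directly, which is why the function can return output without any further copying.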

Related

Create identity matrices with arbitrary shape with numpy

Is there a faster / inbuilt way to generate identity matrices with arbitrary shape in the first dimensions and an identity in the last m dimensions?
import numpy as np
base_shape = (10, 11, 12)
n_dim = 4
# m = 2
frames2d = np.zeros(base_shape + (n_dim, n_dim))
for i in range(n_dim):
    frames2d[..., i, i] = 1
# m = 3
frames3d = np.zeros(base_shape + (n_dim, n_dim, n_dim))
for i in range(n_dim):
    frames3d[..., i, i, i] = 1

Approach #1
We can leverage np.einsum for a diagonal view inspired by this post and hence assign 1s there for our desired output. So, for say the m=3 case, after initializing with zeros, we can simply do -
diag_view = np.einsum('...iii->...i',frames3d)
diag_view[:] = 1
Generalizing to include those input params, it would be -
def ndeye_einsum(base_shape, n_dim, m):
    out = np.zeros(list(base_shape) + [n_dim]*m)
    diag_view = np.einsum('...'+'i'*m+'->...i',out)
    diag_view[:] = 1
    return out
So, to reproduce those same arrays, it would be -
frames2d = ndeye_einsum(base_shape, n_dim, m=2)
frames3d = ndeye_einsum(base_shape, n_dim, m=3)
Approach #2
Again, from the same linked post, we can also reshape to 2D and assign into step-sized sliced array along the cols, like so -
def ndeye_reshape(base_shape, n_dim, m):
    N = (n_dim**np.arange(m)).sum()
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out.reshape(-1,n_dim**m)[:,::N] = 1
    return out
This again works on a view and hence should be equally efficient as approach #1.
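To see why a step of N = 1 + n_dim + ... + n_dim**(m-1) lands exactly on the diagonal, note that element (i, i, ..., i) of a C-ordered block of shape (n_dim,)*m sits at flat offset i*N. The check below is a small sketch using np.ravel_multi_index; the variable names are chosen here for illustration only.

```python
import numpy as np

n_dim, m = 4, 3
N = (n_dim ** np.arange(m)).sum()      # 1 + 4 + 16 = 21

# Flat offsets of the diagonal entries (i, i, i) in a C-ordered (4, 4, 4) block
I = np.arange(n_dim)
diag_offsets = np.ravel_multi_index((I,) * m, (n_dim,) * m)

print(N, diag_offsets)                 # the offsets are the multiples of N
```

Since the last diagonal offset is n_dim**m - 1, slicing the flattened block with step N selects the n_dim diagonal entries and nothing else.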
Approach #3
Another way would be to use integer-based indexing. So, for example for assigning into frames3d in one-go, it would be -
I = np.arange(n_dim)
frames3d[..., I, I, I] = 1
Generalizing that becomes -
def ndeye_ellipsis_indexer(base_shape, n_dim, m):
    I = np.arange(n_dim)
    indexer = tuple([Ellipsis]+[I]*m)
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out[indexer] = 1
    return out
Extending to higher-dims with view
The dims along base_shape are basically replications of elements from the last m dims. As such, we can get those higher dims as a higher-dim array view with np.broadcast_to. We will basically create an m-dim identity array and then broadcast-view it into higher dims. This would be applicable across all three approaches posted earlier. To demonstrate how to use it on the einsum-based solution, we would have -
# Create m-dim "trailing-base" array, basically a m-dim identity array
def ndeye_einsum_trailingbase(n_dim, m):
    out = np.zeros([n_dim]*m)
    diag_view = np.einsum('i'*m+'->...i',out)
    diag_view[:] = 1
    return out
def ndeye_einsum_view(base_shape, n_dim, m):
    trail_base = ndeye_einsum_trailingbase(n_dim, m)
    return np.broadcast_to(trail_base, list(base_shape) + [n_dim]*m)
Thus, again we would have, e.g. -
frames3d = ndeye_einsum_view(base_shape, n_dim, m=3)
This would be a view into an m-dim array and hence efficient on both memory and performance. Note that np.broadcast_to returns a read-only view, so take a copy before writing to the result.
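The properties of such a broadcast view can be inspected directly. The snippet below is a sketch for the m = 2 case only, and it uses np.eye as the trailing base instead of the einsum helper above, which is an assumption made purely to keep the example short.

```python
import numpy as np

base_shape = (10, 11, 12)
n_dim = 4

eye = np.eye(n_dim)
view = np.broadcast_to(eye, base_shape + (n_dim, n_dim))

print(view.shape)                    # (10, 11, 12, 4, 4)
print(view.strides)                  # leading strides are 0: no data is replicated
print(np.shares_memory(view, eye))   # True
print(view.flags.writeable)          # False: the broadcast view is read-only

# Take a real copy if the result has to be modified independently.
writable = view.copy()
writable[0, 0, 0, 0, 0] = 5.0
```

The copy allocates the full array, so the memory saving only applies as long as the read-only view is sufficient.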

One approach to get an identity matrix along the last two dimensions of the array is to use np.broadcast_to and specify the shape the resulting ndarray should have (this does not generalize to higher dimensions):
base_shape = (10, 11, 12)
n_dim = 4
frame2d = np.broadcast_to(np.eye(n_dim), base_shape+(n_dim,)*2)
print(frame2d.shape)
# (10, 11, 12, 4, 4)
print(frame2d)
array([[[[[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]],
[[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]],
...

Expanding tensor using native tensorflow ops

I have a single dimensional data (floats) as shown below:
[-8., 18., 9., -3., 12., 11., -13., 38., ...]
I want to replace each negative element with an equivalent number of zeros.
My result would look something like this for the example above:
[0., 0., 0., 0., 0., 0., 0., 0., 18., 9., 0., 0., 0., 12., ...]
I am able to do this in Tensorflow by using tf.py_func().
But it turns out the graph is not serializable if I use that method.
Are there native tensorflow ops that can help me get the same result?

Not a straightforward task! Here is a pure TensorFlow implementation:
import tensorflow as tf
# Input vector
inp = tf.placeholder(tf.int32, [None])
# Find positive and negative indices
mask = inp < 0
num_inputs = tf.size(inp)
pos_idx, neg_idx = tf.dynamic_partition(tf.range(num_inputs), tf.cast(mask, tf.int32), 2)
# Negative values
negs = -tf.gather(inp, neg_idx)
total_neg = tf.reduce_sum(negs)
cum_neg = tf.cumsum(negs)
# Compute the final index of each positive element
pos_neg_idx = tf.cast(pos_idx[:, tf.newaxis] > neg_idx, inp.dtype)
neg_ref = tf.reduce_sum(pos_neg_idx, axis=1)
shifts = tf.gather(tf.concat([[0], cum_neg], axis=0), neg_ref) - neg_ref
final_pos_idx = pos_idx + shifts
# Compute the final size
final_size = num_inputs + total_neg - tf.size(negs)
# Make final vector by scattering positive values
result = tf.scatter_nd(final_pos_idx[:, tf.newaxis], tf.gather(inp, pos_idx), [final_size])
with tf.Session() as sess:
    print(sess.run(result, feed_dict={inp: [-1, 1, -2, 2, 1, -3]}))
Output:
[0 1 0 0 2 1 0 0 0]
There is some "more than necessary" computational cost in this solution, namely the computation of final indices of positive elements through pos_neg_idx, which is O(n^2), while it could be done iteratively in O(n). However, I cannot think of a way to replicate the loop iteratively, and a TensorFlow loop (using tf.while_loop) would be awkward and slow. In any case, unless you are using quite large vectors (with evenly distributed positive and negative values) it should not be a big issue.
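For comparison, the final positions can also be derived from a single cumulative sum, which avoids the pairwise comparison. The sketch below is written in NumPy rather than TensorFlow and is not part of the answer above. The function name expand_negatives is hypothetical. The same arithmetic should carry over to tf.cumsum and tf.scatter_nd, but that translation is untested here.

```python
import numpy as np

def expand_negatives(inp):
    # Hypothetical helper: replace each negative value -k with k zeros.
    inp = np.asarray(inp)
    # Each negative value -k occupies k output slots; every other value occupies 1.
    lengths = np.where(inp < 0, -inp, 1).astype(np.int64)
    # Exclusive cumulative sum: start offset of each input element in the output.
    starts = np.cumsum(lengths) - lengths
    result = np.zeros(lengths.sum(), dtype=inp.dtype)
    keep = inp >= 0
    result[starts[keep]] = inp[keep]
    return result

print(expand_negatives([-1, 1, -2, 2, 1, -3]))   # [0 1 0 0 2 1 0 0 0]
```

Every step here is a linear pass over the input, so the index computation is O(n) in the input length plus the size of the output.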

cosine similarity on large sparse matrix with numpy

The code below causes my system to run out of memory before it completes.
Can you suggest a more efficient means of computing the cosine similarity on a large matrix, such as the one below?
I would like to have the cosine similarity computed for each of the 65000 rows in my original matrix (mat) relative to all of the others so that the result is a 65000 x 65000 matrix where each element is the cosine similarity between two rows in the original matrix.
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
mat = np.random.rand(65000, 10)
sparse_mat = sparse.csr_matrix(mat)
similarities = cosine_similarity(sparse_mat)
After running that last line I always run out of memory and the program either freezes or crashes with a MemoryError. This occurs whether I run on my 8 gb local RAM or on a 64 gb EC2 instance.

Same problem here. I've got a big, non-sparse matrix. It fits in memory just fine, but cosine_similarity crashes for whatever unknown reason, probably because they copy the matrix one time too many somewhere. So I made it compare small batches of rows "on the left" instead of the entire matrix:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
def cosine_similarity_n_space(m1, m2, batch_size=100):
    assert m1.shape[1] == m2.shape[1]
    ret = np.ndarray((m1.shape[0], m2.shape[0]))
    for row_i in range(0, int(m1.shape[0] / batch_size) + 1):
        start = row_i * batch_size
        end = min([(row_i + 1) * batch_size, m1.shape[0]])
        if end <= start:
            break # cause I'm too lazy to elegantly handle edge cases
        rows = m1[start: end]
        sim = cosine_similarity(rows, m2) # rows is O(1) size
        ret[start: end] = sim
    return ret
No crashes for me; YMMV. Try different batch sizes to make it faster. I used to only compare 1 row at a time, and it took about 30X as long on my machine.
Stupid yet effective sanity check:
import random
while True:
    m = np.random.rand(random.randint(1, 100), random.randint(1, 100))
    n = np.random.rand(random.randint(1, 100), m.shape[1])
    assert np.allclose(cosine_similarity(m, n), cosine_similarity_n_space(m, n))

You're running out of memory because you're trying to store a 65000x65000 matrix. Note that the matrix you're constructing is not sparse at all. np.random.rand generates a random number between 0 and 1. So there aren't enough zeros for csr_matrix to actually compress your data. In fact, there are almost surely no zeros at all.
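The size of that result is easy to estimate. The arithmetic below is a back-of-the-envelope sketch that assumes the output is stored as a dense array of 8-byte floats (float64).

```python
n_rows = 65000
bytes_per_float64 = 8

dense_bytes = n_rows * n_rows * bytes_per_float64
print(dense_bytes)                 # 33800000000
print(dense_bytes / 1e9)           # 33.8 (GB)
print(dense_bytes / 2**30)         # roughly 31.5 (GiB)
```

So the output alone needs on the order of 30 GB before counting any intermediate copies.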
If you look closely at your MemoryError traceback, you can see that cosine_similarity tries to use the sparse dot product if possible:
MemoryError Traceback (most recent call last)
887 Y_normalized = normalize(Y, copy=True)
888
--> 889 K = safe_sparse_dot(X_normalized, Y_normalized.T, dense_output=dense_output)
890
891 return K
So the problem isn't with cosine_similarity, it's with your matrix. Try initializing an actual sparse matrix (with about 1% nonzero entries, for example) like this:
>>> a = np.zeros((65000, 10))
>>> i = np.random.rand(a.size)
>>> a.flat[i < 0.01] = 1 # Select 1% of indices and set to 1
>>> a = sparse.csr_matrix(a)
Then, on a machine with 32GB RAM (8GB RAM was not enough for me), the following runs with no memory error:
>>> b = cosine_similarity(a)
>>> b
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 1., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])

I would run it in chunks like this
from sklearn.metrics.pairwise import cosine_similarity
# Change chunk_size to control resource consumption and speed
# Higher chunk_size means more memory/RAM needed but also faster
chunk_size = 500
matrix_len = your_matrix.shape[0] # Not sparse numpy.ndarray
def similarity_cosine_by_chunk(start, end):
    if end > matrix_len:
        end = matrix_len
    return cosine_similarity(X=your_matrix[start:end], Y=your_matrix) # scikit-learn function
for chunk_start in range(0, matrix_len, chunk_size):
    cosine_similarity_chunk = similarity_cosine_by_chunk(chunk_start, chunk_start+chunk_size)
    # Handle cosine_similarity_chunk ( Write it to file_timestamp and close the file )
    # Do not open the same file again or you may end up with out of memory after few chunks
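The same chunking idea can be sketched without scikit-learn by normalizing the rows once and multiplying one chunk at a time. This swaps the cosine_similarity call for plain NumPy operations, and the generator name cosine_similarity_chunks is made up here. Treat it as an illustration of the pattern rather than a drop-in replacement.

```python
import numpy as np

def cosine_similarity_chunks(matrix, chunk_size=500):
    """Yield (start, block); block holds similarities of one chunk of rows against all rows."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # avoid dividing by zero for all-zero rows
    normalized = matrix / norms
    for start in range(0, matrix.shape[0], chunk_size):
        block = normalized[start:start + chunk_size] @ normalized.T
        yield start, block

rng = np.random.default_rng(0)
small = rng.random((1200, 10))
for start, block in cosine_similarity_chunks(small, chunk_size=500):
    # Write or reduce each block here instead of keeping them all in memory.
    print(start, block.shape)
```

Only one block of shape (chunk_size, n_rows) is alive at a time, so peak memory is controlled by chunk_size rather than by the full n_rows x n_rows result.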

Defining Error of An Array with Two Index

I get the following error:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 42, in <module>
    G[alpha][n]=compute_G(x,n)
NameError: name 'G' is not defined
Here is my code:
N = 20
N_cor = 25
N_cf = 25
a = 0.5
eps = 1.4
def update(x):
    for j in range(0,N):
        old_x = x[j]
        old_Sj = S(j,x)
        x[j] = x[j] + random.uniform(-eps,eps)
        dS = S(j,x) - old_Sj
        if dS>0 and exp(-dS)<random.uniform(0,1):
            x[j] = old_x
def S(j,x):
    jp = (j+1)%N
    jm = (j-1)%N
    return a*x[j]**2/2 + x[j]*(x[j]-x[jp]-x[jm])/a
def compute_G(x,n):
    g = 0
    for j in range(0,N):
        g = g + x[j]*x[(j+n)%N]
    return g/N
#def MCaverage(x,G):
import random
from math import exp
x=[]
for j in range(0,N):
    x.append(0.0)
    print"x(%d)=%f"%(j,x[j])
for j in range(0,5*N_cor):
    update(x)
for alpha in range(0,N_cf):
    for j in range(0,N_cor):
        update(x)
    for i in range(0,N):
        print"x(%d)=%f"%(i,x[i])
    for n in range(0,N):
        G[alpha][n]=compute_G(x,n)
for n in range(0,N):
    avg_G = 0
    for alpha in range(0,N_cf):
        avg_G = avg_G + G[alpha][n]
    avg_G = avg_G / N_cf
    print "G(%d) = %f"%(n,avg_G)
When I define G I get another error:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 43, in <module>
    G[alpha][n]=compute_G(x,n)
IndexError: list index out of range
Here is how i define G:
...
for alpha in range(0,N_cf):
    for j in range(0,N_cor):
        update(x)
    for n in range(0,N):
        G=[][]
        G[alpha][n]=compute_G(x,n)
...
What should I do to define an array with two indices, i.e. a two-dimensional matrix?

In Python a=[] defines a list, not an array. It certainly can be used to store a lot of elements all of the same numeric type, and one can define a mapping from two integers indexing a rectangular array to one list index. It's rather going against the grain, though. Hard to program and inefficiently stored, because lists are intended as ordered collections of objects which may be of arbitrary type.
What you probably need most is a direction to where to start reading. Here it is. Learn about Numpy http://www.numpy.org/, which is a Python module for use in typical scientific calculations with arrays of (mostly) numeric data in which all the elements are of the same type. Here is a brief taster, after you have installed numpy.
>>> import numpy as np # importing as np is conventional
>>> p = np.zeros( (6,4) ) # two dimensional, 24 elements in total
>>> for i in range(4): p[i,i]=1
>>> p
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
numpy arrays are efficient ways of manipulating as much data as you can fit into your computer's RAM.
Python's standard library also has an array.array datatype, but it is rarely used on its own. numpy is the support code that you'll usually not want to write for yourself. Not least, because when your arrays are millions or billions of elements, you can't afford the inefficiency of inner loops over their indices in an interpreted language like Python. Numpy offers you row-, column- and array-level operations whose underlying code is compiled and optimized, so it runs considerably faster.
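Applied to the question, a two-index G can be allocated up front with np.zeros and then filled by index. The sketch below reuses N, N_cf and compute_G from the question, but the random x is only placeholder data standing in for the update(x) step, and compute_G is rewritten with np.roll as an assumption about an equivalent vectorized form.

```python
import numpy as np

N = 20      # lattice sites, as in the question
N_cf = 25   # number of configurations, as in the question

def compute_G(x, n):
    # Same quantity as in the question, written with np.roll instead of an index loop.
    return np.mean(x * np.roll(x, -n))

rng = np.random.default_rng(0)
G = np.zeros((N_cf, N))              # allocate the 2-D array before filling it
for alpha in range(N_cf):
    x = rng.normal(size=N)           # placeholder data; the question updates x with update(x)
    for n in range(N):
        G[alpha, n] = compute_G(x, n)

avg_G = G.mean(axis=0)               # average over configurations, one value per n
print(avg_G.shape)                   # (20,)
```

The NameError and IndexError in the question both come from indexing G before it has any elements; allocating the full shape first avoids both.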

Python time optimisation of for loop using newaxis

I need to calculate n number of points(3D) with equal spacing along a defined line(3D).
I know the starting and end point of the line. First, I used
for k in range(nbin):
    step = k/float(nbin-1)
    bin_point.append(beam_entry+(step*(beamlet_intersection-beam_entry)))
Then I found that using append for large arrays takes more time, so I changed the code like this:
bin_point = [start_point+((k/float(nbin-1))*(end_point-start_point)) for k in range(nbin)]
I got a suggestion that using newaxis will further improve the time.
The modified code looks like this.
step = arange(nbin) / float(nbin-1)
bin_point = start_point + ( step[:,newaxis,newaxis]*((end_point - start_point))[newaxis,:,:] )
But I could not understand what newaxis does. I am also unsure whether the same code will work if the structure or shape of start_point and end_point changes. Similarly, how can I use newaxis to modify the following code?
for j in range(32): # for all los
    line_dist[j] = sqrt([sum(l) for l in (end_point[j]-start_point[j])**2])
Sorry for being so clunky; to be clearer, the structure of start_point and end_point is
array([ [[1,1,1],[],[],[]....[]],
[[],[],[],[]....[]],
[[],[],[],[]....[]]......,
[[],[],[],[]....[]] ])

Explanation of the newaxis version in the question: these are not matrix multiplies, ndarray multiply is element-by-element multiply with broadcasting. step[:,newaxis,newaxis] is num_steps x 1 x 1 and point[newaxis,:,:] is 1 x num_points x num_dimensions. Broadcasting together ndarrays with shape (num_steps x 1 x 1) and (1 x num_points x num_dimensions) will work, because the broadcasting rules are that every dimension should be either 1 or the same; it just means "repeat the array with dimension 1 as many times as the corresponding dimension of the other array". This results in an ndarray with shape (num_steps x num_points x num_dimensions) in a very efficient way; the i, j, k subscript will be the k-th coordinate of the i-th step along the j-th line (given by the j-th pair of start and end points).
Walkthrough:
>>> start_points = numpy.array([[1, 0, 0], [0, 1, 0]])
>>> end_points = numpy.array([[10, 0, 0], [0, 10, 0]])
>>> steps = numpy.arange(10)/9.0
>>> start_points.shape
(2, 3)
>>> steps.shape
(10,)
>>> steps[:,numpy.newaxis,numpy.newaxis].shape
(10, 1, 1)
>>> (steps[:,numpy.newaxis,numpy.newaxis] * start_points).shape
(10, 2, 3)
>>> (steps[:,numpy.newaxis,numpy.newaxis] * (end_points - start_points)) + start_points
array([[[ 1., 0., 0.],
[ 0., 1., 0.]],
[[ 2., 0., 0.],
[ 0., 2., 0.]],
[[ 3., 0., 0.],
[ 0., 3., 0.]],
[[ 4., 0., 0.],
[ 0., 4., 0.]],
[[ 5., 0., 0.],
[ 0., 5., 0.]],
[[ 6., 0., 0.],
[ 0., 6., 0.]],
[[ 7., 0., 0.],
[ 0., 7., 0.]],
[[ 8., 0., 0.],
[ 0., 8., 0.]],
[[ 9., 0., 0.],
[ 0., 9., 0.]],
[[ 10., 0., 0.],
[ 0., 10., 0.]]])
As you can see, this produces the correct answer :) In this case broadcasting (10,1,1) and (2,3) results in (10,2,3). What you had is broadcasting (10,1,1) and (1,2,3) which is exactly the same and also produces (10,2,3).
The code for the distance part of the question does not need newaxis: the inputs are num_points x num_dimensions, the output is num_points, so one dimension has to be removed. That is actually the axis you sum along. This should work:
line_dist = numpy.sqrt( numpy.sum( (end_point - start_point) ** 2, axis=1 ) )
Here numpy.sum(..., axis=1) means sum along that axis only, rather than over all elements: an ndarray with shape num_points x num_dimensions summed along axis=1 produces a result with shape (num_points,), which is correct.
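As a quick check, the distance formula can be run on the two lines from the walkthrough above. The array names follow the question. The np.linalg.norm call at the end is an alternative added here for comparison, not part of the original answer.

```python
import numpy as np

start_point = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
end_point = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])

# Sum the squared differences over the coordinate axis (axis=1), one length per line.
line_dist = np.sqrt(np.sum((end_point - start_point) ** 2, axis=1))
print(line_dist)          # [9. 9.]

# np.linalg.norm computes the same Euclidean length.
print(np.linalg.norm(end_point - start_point, axis=1))
```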
EDIT: removed code example without broadcasting.
EDIT: fixed up order of indexes.
EDIT: added line_dist

I'm not through understanding all you wrote, but some things I already can tell you; maybe they help.
newaxis is rather a marker than a function (in fact, it is plain None). It is used to add an (unused) dimension to a multi-dimensional value. With it you can make a 3D value out of a 2D value (or even more). Each dimension already there in the input value must be represented by a colon : in the index (assuming you want to use all values, otherwise it gets complicated beyond our usecase), the dimensions to be added are denoted by newaxis.
Example:
input is a one-dimensional vector (1D): 1,2,3
output shall be a matrix (2D).
There are two ways to accomplish this; the vector could fill the lines with one value each, or the vector could fill just the first and only line of the matrix. The first is created by vector[:,newaxis], the second by vector[newaxis,:]. Results of this:
>>> array([ 7,8,9 ])[:,newaxis]
array([[7],
[8],
[9]])
>>> array([ 7,8,9 ])[newaxis,:]
array([[7, 8, 9]])
(Dimensions of multi-dimensional values are represented by nesting of arrays of course.)
If you have more dimensions in the input, use the colon more than once (otherwise the deeper nested dimensions are simply ignored, i.e. the arrays are treated as simple values). I won't paste a representation of this here as it won't clarify things due to the optical complexity when 3D and 4D values are written on a 2D display using nested brackets. I hope it gets clear anyway.
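Since nested brackets get hard to read, printing shapes is often the clearer way to see what newaxis does. The following is a small sketch with made-up example arrays.

```python
import numpy as np

print(np.newaxis is None)              # True

vector = np.array([7, 8, 9])
print(vector[:, np.newaxis].shape)     # (3, 1): one value per row
print(vector[np.newaxis, :].shape)     # (1, 3): a single row

matrix = np.arange(6).reshape(2, 3)
print(matrix[:, np.newaxis, :].shape)  # (2, 1, 3): new axis between the two existing ones
```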

The newaxis reshapes the array in such a way so that when you multiply numpy uses broadcasting. Here is a good tutorial on broadcasting.
step[:, newaxis, newaxis] is the same as step.reshape((step.shape[0], 1, 1)) (if step is 1d). Either method for reshaping should be very fast because reshaping arrays in numpy is very cheap: it just makes a view of the array, and you only need to do it once.
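Both claims, the equivalence and the absence of a copy, can be checked directly. This is a minimal sketch with an arbitrary five-element step array.

```python
import numpy as np

step = np.arange(5) / 4.0

a = step[:, np.newaxis, np.newaxis]
b = step.reshape((step.shape[0], 1, 1))

print(a.shape, b.shape)                 # (5, 1, 1) (5, 1, 1)
print(np.array_equal(a, b))             # True
print(np.shares_memory(a, step))        # True: no data was copied
print(np.shares_memory(b, step))        # True
```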
