I want to compare each vector from one array with all vectors from another array and count how many symbols match per vector. Let me show an example.
I have two arrays, a and b.
For each vector in a, I want to compare it with each vector in b. I then want to return a new array of shape (len(a), 14), where each row holds the number of times the corresponding vector in a had 0, 1, 2, ..., 12, 13 matches with vectors from b. The desired result is shown in array c below.
I have already solved this problem using np.newaxis (see my function below), but my issue is that it uses so much memory that my computer can't handle it when a and b get larger. Hence, I am looking for a more memory-efficient way to do this calculation, since adding dimensions to the arrays hurts memory badly. One option is a plain for loop, but that is rather slow.
Is it possible to make these calculations more efficient?
a = array([[1., 1., 1., 2., 1., 1., 2., 1., 0., 2., 2., 2., 2.],
[0., 2., 2., 0., 1., 1., 0., 1., 1., 0., 2., 1., 2.],
[0., 0., 0., 1., 1., 0., 2., 1., 2., 0., 1., 2., 2.],
[1., 2., 2., 0., 1., 1., 0., 2., 0., 1., 1., 0., 2.],
[1., 2., 0., 2., 2., 0., 2., 0., 0., 1., 2., 0., 0.]])
b = array([[0., 2., 0., 0., 0., 0., 0., 1., 1., 1., 0., 2., 2.],
[1., 0., 1., 2., 2., 0., 1., 1., 1., 1., 2., 1., 2.],
[1., 2., 1., 2., 0., 0., 0., 1., 1., 2., 2., 0., 2.],
[0., 1., 2., 0., 2., 1., 0., 1., 2., 0., 0., 0., 2.],
[0., 2., 2., 1., 2., 1., 0., 1., 1., 1., 2., 2., 2.],
[0., 2., 2., 1., 0., 1., 1., 0., 1., 0., 2., 2., 1.],
[1., 0., 2., 2., 0., 1., 0., 1., 0., 1., 1., 2., 2.],
[1., 1., 0., 2., 1., 1., 1., 1., 0., 2., 0., 2., 2.],
[1., 2., 0., 0., 0., 1., 2., 1., 0., 1., 2., 0., 1.],
[1., 2., 1., 2., 2., 1., 2., 0., 2., 0., 0., 1., 1.]])
c = array([[0, 0, 0, 2, 1, 2, 2, 2, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 2, 3, 1, 2, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 3, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 3, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 4, 0, 3, 0, 1, 0, 0, 0, 0, 0]])
My solution:
import numpy as np

def new_method_test(a, b):
    # test[i, j] = number of positions where a[i] and b[j] agree
    test = (a[:, np.newaxis] == b).sum(axis=2)
    zero = (test == 0).sum(axis=1)
    one = (test == 1).sum(axis=1)
    two = (test == 2).sum(axis=1)
    three = (test == 3).sum(axis=1)
    four = (test == 4).sum(axis=1)
    five = (test == 5).sum(axis=1)
    six = (test == 6).sum(axis=1)
    seven = (test == 7).sum(axis=1)
    eight = (test == 8).sum(axis=1)
    nine = (test == 9).sum(axis=1)
    ten = (test == 10).sum(axis=1)
    eleven = (test == 11).sum(axis=1)
    twelve = (test == 12).sum(axis=1)
    thirteen = (test == 13).sum(axis=1)
    c = np.concatenate((zero, one, two, three, four, five, six, seven, eight,
                        nine, ten, eleven, twelve, thirteen),
                       axis=0).reshape(14, len(a)).T
    return c
Thank you for your help.
Welcome to Stack Overflow! I think a for loop is the way to go if you want to save memory (and it's really not that slow). Additionally, you can go straight from each per-vector comparison to a row of your c output matrix with np.bincount. This method should be roughly as fast as yours while using significantly less memory.
import numpy as np
# counts range from 0 to a.shape[1] matches, i.e. a.shape[1] + 1 possible values
c = np.empty((a.shape[0], a.shape[1] + 1), dtype=int)
for i in range(a.shape[0]):
    test_one_vector = (a[i, :] == b).sum(axis=1)
    c[i, :] = np.bincount(test_one_vector, minlength=a.shape[1] + 1)
Small side note: if you are really dealing with floating-point numbers in a and b, you should consider dropping the equality check (==) in favor of a proximity check such as np.isclose, as sketched below.
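For example, a minimal sketch of the same loop with the equality test swapped for np.isclose (the default tolerances are an assumption; adjust rtol/atol to your data):

import numpy as np

c = np.empty((a.shape[0], a.shape[1] + 1), dtype=int)
for i in range(a.shape[0]):
    # np.isclose counts values that agree within a tolerance, not only exact matches
    test_one_vector = np.isclose(a[i, :], b).sum(axis=1)
    c[i, :] = np.bincount(test_one_vector, minlength=a.shape[1] + 1)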
Related
I have 4 different NumPy arrays (2-dimensional), each with shape (112, 20).
How can I combine (stack) them into one 3-dimensional array of shape (112, 20, 4)?
Thanks for your support!
Use np.stack((arr1, arr2, arr3, arr4), axis=2):
arr1 = np.zeros((2,5))
arr2 = np.ones((2,5))
arr3 = np.ones((2,5))*2
arr4 = np.ones((2,5))*3
v = np.stack((arr1, arr2, arr3, arr4), axis=2)
v.shape returns (2, 5, 4)
Output:
array([[[0., 1., 2., 3.],
[0., 1., 2., 3.],
[0., 1., 2., 3.],
[0., 1., 2., 3.],
[0., 1., 2., 3.]],
[[0., 1., 2., 3.],
[0., 1., 2., 3.],
[0., 1., 2., 3.],
[0., 1., 2., 3.],
[0., 1., 2., 3.]]])
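Applied to the shapes from the question (the arrays a1..a4 below are hypothetical stand-ins for your data):

import numpy as np

# hypothetical stand-ins for the four (112, 20) arrays
a1, a2, a3, a4 = (np.random.rand(112, 20) for _ in range(4))

stacked = np.stack((a1, a2, a3, a4), axis=2)
print(stacked.shape)  # (112, 20, 4)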
I want to fill the ndarray x with values from array b along dimension i without using a for loop. The snippet below is what I'm currently using, but it's not that fast. Is there a better way?
for i in range(len(b)):
    x[..., i, :, :] = b[i]
Edit 1: This is almost what I'm looking for, but for higher dimensions it doesn't seem to work. x has 8 dimensions, and it's important that the shape of the ndarray stays the same. Any more ideas?
import numpy as np
x = np.ones((2,3,4))
b = np.arange(3)
for i in range(len(b)):
    x[:, i, :] = b[i]
x
Out[5]:
array([[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]],
[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]]])
y = np.tile(b,(4,1,2)).T
y
Out[7]:
array([[[0, 0, 0, 0]],
[[1, 1, 1, 1]],
[[2, 2, 2, 2]],
[[0, 0, 0, 0]],
[[1, 1, 1, 1]],
[[2, 2, 2, 2]]])
Edit 2: This seems to do the job
z[...] = b.reshape(1,-1,1)
z
Out[20]:
array([[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]],
[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]]])
There is a faster way. You can reshape b to add new dimensions and get the advantages of numpy broadcasting rules:
x[...,:,:,:] = b.reshape(-1,1,1)
Here I am assuming that b is a vector.
Another equivalent way to create the new dimensions is with np.newaxis:
x[...,:,:,:] = b[:, np.newaxis, np.newaxis]
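A minimal runnable sketch of that broadcasting assignment, using a hypothetical 4-D x whose third-from-last axis has the same length as b (matching the x[..., i, :, :] pattern from the question):

import numpy as np

x = np.empty((5, 3, 2, 4))
b = np.arange(3)

# broadcast b over the trailing axes instead of looping
x[..., :, :, :] = b.reshape(-1, 1, 1)
# equivalently: x[..., :, :, :] = b[:, np.newaxis, np.newaxis]

# check against the original loop version
x_loop = np.empty_like(x)
for i in range(len(b)):
    x_loop[..., i, :, :] = b[i]
assert np.array_equal(x, x_loop)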
Depending on the shape of your destination array, you can do something like this:
>>> import numpy as np
>>> x = np.ones((4,8))
>>> x
array([[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.]])
>>> b = np.arange(4)
>>> b
array([0, 1, 2, 3])
>>> x[:,1] = b
>>> x
array([[1., 0., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 2., 1., 1., 1., 1., 1., 1.],
[1., 3., 1., 1., 1., 1., 1., 1.]])
In this example we assigned b to column 1 of the 2D array x.
If instead you are trying to repeat b a certain number of times, you can use np.tile:
>>> x = np.tile(b, (8,1)).T
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3]])
In order to compute a confusion matrix (not the accuracy), a loop over the predicted and true labels may be needed. How can this be done in a NumPy manner, given that the following code does not give the needed result?
>> a = np.zeros((5, 5))
>> indices = np.array([
[0, 0],
[2, 2],
[4, 4],
[0, 0],
[2, 2],
[4, 4],
])
np.add.at(a, indices, 1)
>> a
>> array([
[4., 4., 4., 4., 4.],
[0., 0., 0., 0., 0.],
[4., 4., 4., 4., 4.],
[0., 0., 0., 0., 0.],
[4., 4., 4., 4., 4.]
])
# Wanted
>> array([
[2., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 2., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 2.]
])
The docs say: "If first operand has multiple dimensions, indices can be a tuple of array like index objects or slice objects." Using the following tuple form, the wanted result is reached:
np.add.at(a, (indices[:, 0], indices[:, 1]), 1)
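Put together as a complete sketch (the y_true/y_pred label vectors here are made-up examples, not data from the question):

import numpy as np

# made-up true and predicted labels for 5 classes
y_true = np.array([0, 2, 4, 0, 2, 4])
y_pred = np.array([0, 2, 4, 0, 2, 4])

conf = np.zeros((5, 5))
# a tuple of (row, column) index arrays increments exactly one cell
# per (true, predicted) pair, even when pairs repeat
np.add.at(conf, (y_true, y_pred), 1)
print(conf)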
I'm still playing around with TensorFlow and have been trying to use the gather_nd op, but the return value is not in the shape/format I want.
Input tensor, shape (2, 7, 4):
array([[[ 0., 0., 1., 2.],
[ 0., 0., 2., 2.],
[ 0., 0., 3., 3.],
[ 0., 0., 4., 3.],
[ 0., 0., 5., 4.],
[ 0., 0., 6., 4.],
[ 0., 0., 7., 5.]],
[[ 1., 1., 0., 2.],
[ 1., 2., 0., 2.],
[ 1., 3., 0., 3.],
[ 1., 4., 0., 3.],
[ 1., 5., 0., 4.],
[ 1., 6., 0., 5.],
[ 1., 7., 0., 5.]]], dtype=float32)
Indices returned by the tf.where op, shape (3, 2):
array([[0, 0],
[0, 1],
[1, 0]])
tf.gather_nd results, shape (3, 4):
array([[ 0., 0., 1., 2.],
[ 0., 0., 2., 2.],
[ 1., 1., 0., 2.]], dtype=float32)
Desired results, shape (2, variable, 4):
array([[[ 0., 0., 1., 2.],
[ 0., 0., 2., 2.]],
[[ 1., 1., 0., 2.]]], dtype=float32)
What's the best way to achieve this, keeping in mind that tf.where produces dynamic shapes and there are no guarantees of shape consistency across the 2nd dimension (axis=1)?
NB: Ignore this question - See my answer
I think it's a TensorFlow version problem. In my version (1.2.1), I get the exact desired output from your inputs. However, I also tried the following code, written against the older version.
import tensorflow as tf
indices = [[[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 0, 3]],
[[0, 1, 0], [0, 1, 1], [0, 1, 2], [0, 1, 3]],
[[1, 0, 0], [1, 0, 1], [1, 0, 2], [1, 0, 3]]]
params = [[[ 0., 0., 1., 2.],
[ 0., 0., 2., 2.],
[ 0., 0., 3., 3.],
[ 0., 0., 4., 3.],
[ 0., 0., 5., 4.],
[ 0., 0., 6., 4.],
[ 0., 0., 7., 5.]],
[[ 1., 1., 0., 2.],
[ 1., 2., 0., 2.],
[ 1., 3., 0., 3.],
[ 1., 4., 0., 3.],
[ 1., 5., 0., 4.],
[ 1., 6., 0., 5.],
[ 1., 7., 0., 5.]]]
output = tf.gather_nd(params, indices)
with tf.Session() as sess:
    print(sess.run(output))
Hope this helps.
I realized the flaw in my question:
# of tuples when 1st dim is 0 != # of tuples when 1st dim is 1
I'm not sure that what I'm asking is feasible...
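If a per-first-index grouping is still wanted, one possible direction (not from the original posts; a sketch assuming TF 1.x-style sessions and a reduced version of the params tensor above) is to gather the rows and then split them with tf.dynamic_partition, which returns a Python list of tensors that may have different lengths:

import tensorflow as tf

# indices as returned by tf.where, and a reduced version of params
indices = tf.constant([[0, 0], [0, 1], [1, 0]], dtype=tf.int64)
params = tf.constant([[[0., 0., 1., 2.],
                       [0., 0., 2., 2.]],
                      [[1., 1., 0., 2.],
                       [1., 2., 0., 2.]]])

gathered = tf.gather_nd(params, indices)  # shape [3, 4]
# group the gathered rows by their first index (0 or 1)
groups = tf.dynamic_partition(gathered, tf.cast(indices[:, 0], tf.int32), 2)

with tf.Session() as sess:
    print(sess.run(groups))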
I have a set of data arranged in columns, where the first column holds the x values. How do I read this in?
If you want to store both the x and the y values, you can do:
ydat = np.zeros((data.shape[1]-1,data.shape[0],2))
# write the x data
ydat[:,:,0] = data[:,0]
# write the y data
ydat[:,:,1] = data[:,1:].T
Edit:
If you want to store only the y data in the sub-arrays, you can simply do:
ydat = data[:,1:].T
Working example:
t = np.array([[ 0., 0., 1., 2.],
[ 1., 0., 1., 2.],
[ 2., 0., 1., 2.],
[ 3., 0., 1., 2.],
[ 4., 0., 1., 2.]])
a = t[:,1:].T
a
array([[ 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.]])
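And a runnable sketch of the x-and-y variant from above, applied to the same example array t (the layout follows the earlier snippet; adjust it to your real data):

import numpy as np

t = np.array([[0., 0., 1., 2.],
              [1., 0., 1., 2.],
              [2., 0., 1., 2.],
              [3., 0., 1., 2.],
              [4., 0., 1., 2.]])

# one (n_rows, 2) sub-array per y column: [:, :, 0] holds x, [:, :, 1] holds y
ydat = np.zeros((t.shape[1] - 1, t.shape[0], 2))
ydat[:, :, 0] = t[:, 0]      # broadcast the x column into every sub-array
ydat[:, :, 1] = t[:, 1:].T   # the y columns, one per sub-array
print(ydat.shape)            # (3, 5, 2)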