numpy fill ndarray with values of a 1D array - python

I want to fill Ndarray x with values from array b along dimension i without using a for loop. This snippet of code is what I'm currently using but it's not that fast. Is there a better way?
for i in range(len(b)):
    x[...,i,:,:] = b[i]
Edit 1: It's almost what I'm looking for but for higher dimensions it doesn't seem to work. x has a dimension of 8 and it's important that the shape of the Ndarray remains the same. Any more ideas?
import numpy as np
x = np.ones((2,3,4))
b = np.arange(3)
for i in range(len(b)):
    x[:,i,:] = b[i]
x
Out[5]:
array([[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]],
[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]]])
y = np.tile(b,(4,1,2)).T
y
Out[7]:
array([[[0, 0, 0, 0]],
[[1, 1, 1, 1]],
[[2, 2, 2, 2]],
[[0, 0, 0, 0]],
[[1, 1, 1, 1]],
[[2, 2, 2, 2]]])
Edit 2: This seems to do the job
z = np.ones((2,3,4))  # assumed: z has the same shape as x above
z[...] = b.reshape(1,-1,1)
z
Out[20]:
array([[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]],
[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]]])

There is a faster way. You can reshape b to add new dimensions and get the advantages of numpy broadcasting rules:
x[...,:,:,:] = b.reshape(-1,1,1)
Here I am assuming that b is a vector.
An equivalent way to add the new dimensions is to index with np.newaxis:
x[...,:,:,:] = b[:, np.newaxis, np.newaxis]
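For an array with many dimensions (such as the 8-dimensional x mentioned in the question), the same broadcasting idea can be wrapped in a small helper. This is only a sketch: the name fill_along_axis and its axis parameter are hypothetical, not part of NumPy, and it assumes len(b) equals x.shape[axis].

```python
import numpy as np

def fill_along_axis(x, b, axis):
    """Overwrite x in place so that every slice at index i along `axis` equals b[i].

    Hypothetical helper; assumes b is 1-D with len(b) == x.shape[axis].
    """
    b = np.asarray(b)
    shape = [1] * x.ndim      # length-1 axes broadcast over x
    shape[axis] = b.size      # b's values run along the chosen axis
    x[...] = b.reshape(shape) # in-place assignment keeps x's shape and dtype
    return x

x = np.ones((2, 3, 4))
b = np.arange(3)
fill_along_axis(x, b, axis=1)
print(x.shape)  # (2, 3, 4)
```

Because the assignment goes through x[...], the shape of x never changes; only its values are overwritten.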

Depending on the shape of your destination array you can do something like this
>>> import numpy as np
>>> x = np.ones((4,8))
>>> x
array([[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.]])
>>> b = np.arange(4)
>>> b
array([0, 1, 2, 3])
>>> x[:,1] = b
>>> x
array([[1., 0., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 2., 1., 1., 1., 1., 1., 1.],
[1., 3., 1., 1., 1., 1., 1., 1.]])
In this example we assigned b to column 1 of the 2D array x.
If instead you are trying to repeat b a certain number of times, you can use np.tile:
>>> x = np.tile(b, (8,1)).T
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3]])

Related

How to lengthen the last dimension of a numpy array and fill it up with another array?

I have a numpy array of shape (5, 4, 3) and another numpy array of shape (4,) and what I want to do is expand the last dimension of the first array
(5, 4, 3) -> (5, 4, 4)
and then broadcast the other array with shape (4,) such that it fills up the new array cells respectively.
Example:
np.ones((5,4,3))
array([[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]]])
becomes
array([[[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.]],
[[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.]],
[[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.]],
[[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.]],
[[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.],
[1., 1., 1., 0.]]])
And then I have another array
array([2., 3., 4., 5.])
which I somehow broadcast with the first one to fill the zeros:
array([[[1., 1., 1., 2.],
[1., 1., 1., 3.],
[1., 1., 1., 4.],
[1., 1., 1., 5.]],
[[1., 1., 1., 2.],
[1., 1., 1., 3.],
[1., 1., 1., 4.],
[1., 1., 1., 5.]],
[[1., 1., 1., 2.],
[1., 1., 1., 3.],
[1., 1., 1., 4.],
[1., 1., 1., 5.]],
[[1., 1., 1., 2.],
[1., 1., 1., 3.],
[1., 1., 1., 4.],
[1., 1., 1., 5.]],
[[1., 1., 1., 2.],
[1., 1., 1., 3.],
[1., 1., 1., 4.],
[1., 1., 1., 5.]]])
How can I accomplish this?
You can use numpy.c_ and numpy.tile:
A = np.ones((5,4,3), dtype='int')
B = np.array([2, 3, 4, 5])
np.c_[A, np.tile(B[:,None], (A.shape[0], 1, 1))]
output:
array([[[1, 1, 1, 2],
[1, 1, 1, 3],
[1, 1, 1, 4],
[1, 1, 1, 5]],
[[1, 1, 1, 2],
[1, 1, 1, 3],
[1, 1, 1, 4],
[1, 1, 1, 5]],
[[1, 1, 1, 2],
[1, 1, 1, 3],
[1, 1, 1, 4],
[1, 1, 1, 5]],
[[1, 1, 1, 2],
[1, 1, 1, 3],
[1, 1, 1, 4],
[1, 1, 1, 5]],
[[1, 1, 1, 2],
[1, 1, 1, 3],
[1, 1, 1, 4],
[1, 1, 1, 5]]])
How it works:
# reshape B to add one dimension
>>> B[:, None]
array([[2],
[3],
[4],
[5]])
# tile to match A's first dimension
>>> np.tile(B[:,None], (A.shape[0], 1, 1))
array([[[2],
[3],
[4],
[5]],
[[2],
[3],
[4],
[5]],
[[2],
[3],
[4],
[5]],
[[2],
[3],
[4],
[5]],
[[2],
[3],
[4],
[5]]])
There are various ways of doing this, but one of the simplest is to make an array of the desired final size and fill in the values.
I could start with a np.zeros((5,4,4)), and insert the np.ones((5,4,3)), but why not just start with all ones:
In [680]: res = np.ones((5,4,4))
Then we can easily copy the 4-element list/array to the last column with:
In [681]: res[:,:,-1] = [2,3,4,5]
In [682]: res
Out[682]:
array([[[1., 1., 1., 2.],
[1., 1., 1., 3.],
[1., 1., 1., 4.],
[1., 1., 1., 5.]],
[[1., 1., 1., 2.],
[1., 1., 1., 3.],
[1., 1., 1., 4.],
[1., 1., 1., 5.]],
...
The (4,) shape array is broadcast to (1,4), which easily fits in the (5,4) slot defined by res[:,:,-1].
Expanding the (4,) to (5,4,1) (with tile), and then concatenating that with the (5,4,3) also works.
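The concatenation route mentioned above can be sketched as follows. Here np.broadcast_to is swapped in for tile (it builds a read-only view instead of a copy), and the variable names A, B, col and res are chosen only for illustration.

```python
import numpy as np

A = np.ones((5, 4, 3))
B = np.array([2., 3., 4., 5.])

# View B as shape (1, 4, 1), then broadcast it to (5, 4, 1) without copying.
col = np.broadcast_to(B[None, :, None], (A.shape[0], B.size, 1))

# Append the new column along the last axis: (5, 4, 3) + (5, 4, 1) -> (5, 4, 4).
res = np.concatenate([A, col], axis=2)
print(res.shape)  # (5, 4, 4)
```

Unlike the fill-in approach, this allocates a new array, so it is the better fit when A must stay untouched.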
You can use numpy.append :
A=np.ones((5,4,3))
AA=np.zeros((5,4,1))
B=np.array([2., 3., 4., 5.])
C=np.append(A,AA, axis=2)
for i in range(np.shape(C)[0]):
    for j in range(np.shape(C)[1]):
        C[i,j,-1]=B[j]
print(C)
output:
[[[1. 1. 1. 2.]
[1. 1. 1. 3.]
[1. 1. 1. 4.]
[1. 1. 1. 5.]]
[[1. 1. 1. 2.]
[1. 1. 1. 3.]
[1. 1. 1. 4.]
[1. 1. 1. 5.]]
[[1. 1. 1. 2.]
[1. 1. 1. 3.]
[1. 1. 1. 4.]
[1. 1. 1. 5.]]
[[1. 1. 1. 2.]
[1. 1. 1. 3.]
[1. 1. 1. 4.]
[1. 1. 1. 5.]]
[[1. 1. 1. 2.]
[1. 1. 1. 3.]
[1. 1. 1. 4.]
[1. 1. 1. 5.]]]

Torch sum subsets of tensor

If the tensor is of shape [20, 5], then I need to take 10 rows at a time and sum them, so the result has shape [2, 5].
eg:
shape[20,5] -> shape[2, 5] (sum 10 at a time)
shape[100, 20] -> shape[10,20] (sum 10 at a time)
Is there any faster/optimal way to do this?
eg:
For [[1, 1], [1, 2], [3, 4], [1, 2]] I want [[2, 3], [4, 6]], taking the sum of every 2 rows.
The question is not completely clear, but I cannot post this as a comment, so here is an answer.
For the first case you have:
t1 = torch.tensor([[1., 1.], [1., 2.], [3., 4.], [1.,2.]])
t1.shape #=> torch.Size([4, 2])
t1
tensor([[1., 1.],
[1., 2.],
[3., 4.],
[1., 2.]])
To get the desired output you should reshape:
tr1 = t1.reshape([2, 2, 2])
res1 = torch.sum(tr1, axis = 1)
res1.shape #=> torch.Size([2, 2])
res1
tensor([[2., 3.],
[4., 6.]])
Let's take a tensor with all one elements (torch.ones) for the second case.
t2 = torch.ones((20, 5))
t2.shape #=> torch.Size([20, 5])
t2
tensor([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
So, reshaping to get the required (?) result, with each group of 10 consecutive rows forming one block that is summed along axis 1:
tr2 = t2.reshape((2, 10, 5))
res2 = torch.sum(tr2, axis = 1)
res2.shape #=> torch.Size([2, 5])
res2
tensor([[10., 10., 10., 10., 10.],
[10., 10., 10., 10., 10.]])
Is this what you are looking for?
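The reshape-and-sum idea can be generalized to any window size. The sketch below assumes a 2-D tensor whose row count is divisible by the window; the function name window_sum is hypothetical.

```python
import torch

def window_sum(x, window):
    """Sum consecutive groups of `window` rows of a 2-D tensor.

    Hypothetical helper; assumes x.shape[0] is divisible by `window`.
    """
    rows, cols = x.shape
    if rows % window != 0:
        raise ValueError("number of rows must be divisible by window")
    # (rows, cols) -> (rows // window, window, cols), then sum inside each group.
    return x.reshape(rows // window, window, cols).sum(dim=1)

t = torch.tensor([[1., 1.], [1., 2.], [3., 4.], [1., 2.]])
small = window_sum(t, 2)                   # rows 0+1 and rows 2+3
big = window_sum(torch.ones(20, 5), 10)    # shape [20, 5] -> [2, 5]
print(small)
print(big.shape)
```

Placing the group axis first and the window axis second is what keeps consecutive rows together; swapping them would sum interleaved rows instead.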
I am not aware of any off the shelf solution for that.
If having the average is enough you can use nn.AvgPool1d https://pytorch.org/docs/stable/generated/torch.nn.AvgPool1d.html#avgpool1d:
import torch, torch.nn as nn
batch_size, channels, length = 4, 5, 20  # example sizes
x = torch.rand(batch_size, channels, length)
pool = nn.AvgPool1d(kernel_size=10, stride=10)  # pools over the last dimension
avg = pool(x)
With this solution, just make sure you are averaging the correct dimension.
EDIT
I just realized you can get the sum by modifying the last line with avg = pool(x) * kernel_size!
You can also just write your own function that does the summing for you:
import torch
def SumWindow(x, window_size, dim):
    input_split = torch.split(x, window_size, dim)
    # keepdim=True keeps the summed axis so the pieces can be concatenated along it
    input_sum = [v.sum(dim=dim, keepdim=True) for v in input_split]  # may be expensive if there are too many tensors
    out = torch.cat(input_sum, dim=dim)
    return out

Smoothing one-hot encoded matrix rows

Assuming that I have the following matrix consisting of one-hot encoded rows:
X = np.array([[0., 0., 0., 1., 0.], [1., 0., 0., 0., 0.], [0., 0., 1., 0., 0.]])
What I aim to do is smooth/expand the one-hot encoding in a way such that I will obtain the following output:
Y = np.array([[0., 0., 1., 1., 1.], [1., 1., 0., 0., 0.], [0., 1., 1., 1., 0.]])
assuming that I want to smooth/expand 1 element to the left or the right of the one-hot element. Thank you for the help!
We can use convolution -
In [22]: from scipy.signal import convolve2d
In [23]: convolve2d(X,np.ones((1,3)),'same')
Out[23]:
array([[0., 0., 1., 1., 1.],
[1., 1., 0., 0., 0.],
[0., 1., 1., 1., 0.]])
With binary-dilation to be more memory-efficient -
In [43]: from scipy.ndimage import binary_dilation
In [46]: binary_dilation(X,np.ones((1,3), dtype=bool)).view('i1')
Out[46]:
array([[0, 0, 1, 1, 1],
[1, 1, 0, 0, 0],
[0, 1, 1, 1, 0]], dtype=int8)
Or, since we only have 0s and 1s, a uniform filter would also work; additionally, we can use it along a generic axis (axis=1 in our case), and it should perform better -
In [47]: from scipy.ndimage import uniform_filter1d
In [50]: (uniform_filter1d(X,size=3,axis=1)>0).view('i1')
Out[50]:
array([[0, 0, 1, 1, 1],
[1, 1, 0, 0, 0],
[0, 1, 1, 1, 0]], dtype=int8)
You could convolve X with an array of ones:
from scipy.signal import convolve2d
convolve2d(X, np.ones((1,3)), mode='same')
array([[0., 0., 1., 1., 1.],
[1., 1., 0., 0., 0.],
[0., 1., 1., 1., 0.]])
Solution based on standard np.convolve:
import numpy as np
np.array([np.convolve(x, np.array([1,1,1]), mode='same') for x in X])
Iterate rows using list comprehension to convolve, then convert back to np.array
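For the specific case of widening by one element on each side, a SciPy-free alternative is to combine the matrix with shifted copies of itself. This is a different technique from the convolution answers above (a shift-and-maximum sketch), and it assumes X contains only 0s and 1s.

```python
import numpy as np

X = np.array([[0., 0., 0., 1., 0.],
              [1., 0., 0., 0., 0.],
              [0., 0., 1., 0., 0.]])

Y = X.copy()
# Spread each 1 one step to the right: column j also takes the value of column j-1.
Y[:, 1:] = np.maximum(Y[:, 1:], X[:, :-1])
# Spread each 1 one step to the left: column j also takes the value of column j+1.
Y[:, :-1] = np.maximum(Y[:, :-1], X[:, 1:])
print(Y)
```

Both shifts read from the original X rather than from Y, so a 1 never spreads more than one position.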

Alternatives to np.newaxis for saving memory when comparing arrays

I want to compare each vector from one array with all vectors from another array, and count how many symbols match per vector. Let me show an example.
I have two arrays, a and b.
For each vector in a, I want to compare it with each vector in b. I then want to return a new array of shape (len(a), 14), where each row holds the number of times that vector in a had 0, 1, 2, 3, 4, ..., 12, 13 matches with vectors from b. The desired results are shown in array c below.
I have already solved this problem using np.newaxis, but my issue (see my function below) is that it takes up so much memory that my computer can't handle it when a and b get larger. Hence, I am looking for a more efficient way to do this calculation, since adding dimensions to the vectors is very costly in memory. One solution is to use a normal for loop, but this method is rather slow.
Is it possible to make these calculations more efficient?
a = array([[1., 1., 1., 2., 1., 1., 2., 1., 0., 2., 2., 2., 2.],
[0., 2., 2., 0., 1., 1., 0., 1., 1., 0., 2., 1., 2.],
[0., 0., 0., 1., 1., 0., 2., 1., 2., 0., 1., 2., 2.],
[1., 2., 2., 0., 1., 1., 0., 2., 0., 1., 1., 0., 2.],
[1., 2., 0., 2., 2., 0., 2., 0., 0., 1., 2., 0., 0.]])
b = array([[0., 2., 0., 0., 0., 0., 0., 1., 1., 1., 0., 2., 2.],
[1., 0., 1., 2., 2., 0., 1., 1., 1., 1., 2., 1., 2.],
[1., 2., 1., 2., 0., 0., 0., 1., 1., 2., 2., 0., 2.],
[0., 1., 2., 0., 2., 1., 0., 1., 2., 0., 0., 0., 2.],
[0., 2., 2., 1., 2., 1., 0., 1., 1., 1., 2., 2., 2.],
[0., 2., 2., 1., 0., 1., 1., 0., 1., 0., 2., 2., 1.],
[1., 0., 2., 2., 0., 1., 0., 1., 0., 1., 1., 2., 2.],
[1., 1., 0., 2., 1., 1., 1., 1., 0., 2., 0., 2., 2.],
[1., 2., 0., 0., 0., 1., 2., 1., 0., 1., 2., 0., 1.],
[1., 2., 1., 2., 2., 1., 2., 0., 2., 0., 0., 1., 1.]])
c = array([[0, 0, 0, 2, 1, 2, 2, 2, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 2, 3, 1, 2, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 3, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 3, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 4, 0, 3, 0, 1, 0, 0, 0, 0, 0]])
My solution:
def new_method_test(a,b):
    test = (a[:,np.newaxis] == b).sum(axis=2)
    zero = (test == 0).sum(axis=1)
    one = (test == 1).sum(axis=1)
    two = (test == 2).sum(axis=1)
    three = (test == 3).sum(axis=1)
    four = (test == 4).sum(axis=1)
    five = (test == 5).sum(axis=1)
    six = (test == 6).sum(axis=1)
    seven = (test == 7).sum(axis=1)
    eight = (test == 8).sum(axis=1)
    nine = (test == 9).sum(axis=1)
    ten = (test == 10).sum(axis=1)
    eleven = (test == 11).sum(axis=1)
    twelve = (test == 12).sum(axis=1)
    thirteen = (test == 13).sum(axis=1)
    c = np.concatenate((zero,one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thirteen), axis = 0).reshape(14,len(a)).T
    return c
Thank you for your help.
Welcome to Stack Overflow! I think a for loop is the way to go if you want to save memory (and it's really not that slow). Additionally, you can go directly from one test to your c output matrix with np.bincount. I think this method will be about as fast as yours, and it will use significantly less memory.
import numpy as np
n_bins = a.shape[1] + 1  # possible match counts run from 0 to a.shape[1]
c = np.empty((a.shape[0], n_bins), dtype=int)
for i in range(a.shape[0]):
    test_one_vector = (a[i,:]==b).sum(axis=1)
    c[i,:] = np.bincount(test_one_vector, minlength=n_bins)
Small side note: if you are really dealing with floating-point numbers in a and b, you should consider dropping the equality check (==) in favor of a proximity check such as np.isclose.
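A middle ground between the fully broadcast version and the row-by-row loop is to process a in blocks, so that the temporary comparison array stays bounded. The sketch below is one possible way to do this; the function name match_histogram and the chunk parameter are hypothetical, and the random test data only stands in for the real arrays.

```python
import numpy as np

def match_histogram(a, b, chunk=256):
    """For each row of a, count how many rows of b match it in 0, 1, ..., n positions.

    Hypothetical helper; processes `chunk` rows of a at a time to limit memory.
    """
    n_bins = a.shape[1] + 1
    c = np.empty((a.shape[0], n_bins), dtype=np.int64)
    for start in range(0, a.shape[0], chunk):
        block = a[start:start + chunk]
        # Temporary array of shape (len(block), len(b), n_columns), reduced right away.
        matches = (block[:, None, :] == b[None, :, :]).sum(axis=2)
        for k, row in enumerate(matches):
            c[start + k] = np.bincount(row, minlength=n_bins)
    return c

rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(5, 13)).astype(float)
b = rng.integers(0, 3, size=(10, 13)).astype(float)
c = match_histogram(a, b, chunk=2)
print(c.shape)  # (5, 14)
```

A larger chunk trades memory for fewer Python-level iterations; chunk=1 reduces to the plain loop.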

Remove all columns matching a value in Numpy

Let's suppose I have a matrix with a number of binary values:
matrix([[1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
Using np.sum(M, 0) produces:
matrix([[1., 1., 2., 2., 2.]])
How do I remove all of the columns from the matrix whose sum is exactly 1?
Easier to have an array here:
M = M.A
Now using simple slicing:
M[:, np.sum(M, 0)!=1]
array([[1., 0., 0.],
[1., 1., 1.],
[0., 1., 0.],
[0., 0., 1.]])
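Spelled out as a self-contained script, the boolean-mask selection might look like this. It is a sketch that starts from a plain ndarray rather than np.matrix; the names col_sums, keep and reduced are illustrative only.

```python
import numpy as np

M = np.array([[1., 1., 1., 0., 0.],
              [0., 0., 1., 1., 1.],
              [0., 0., 0., 1., 0.],
              [0., 0., 0., 0., 1.]])

col_sums = M.sum(axis=0)   # one sum per column: [1., 1., 2., 2., 2.]
keep = col_sums != 1       # boolean mask, True for columns to keep
reduced = M[:, keep]       # select all rows, only the kept columns
print(reduced)
```

The mask must be 1-D for this kind of column selection, which is why converting away from np.matrix first makes things simpler.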
You can convert the matrix to an array, find the indexes of the columns whose sum is 1, and then use those indexes to delete the columns. For example, you can do the following.
import numpy as np
M = np.matrix([[1, 1, 1, 0, 0], [0, 0, 1, 1, 1], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]])
# conversion to array
array = np.asarray(M)
# column sums as a 1D array
column_sums = array.sum(axis=0)
index_of_columns_with_sum_1 = [i for i, val in enumerate(column_sums) if val == 1]
# delete those columns (axis=1) from the data
array = np.delete(array, index_of_columns_with_sum_1, axis=1)
print(array)
