I am looking for a good way to compute this type of multiplication.
It simply multiplies each row x[i] by x element-wise, producing an array of shape (2, 2, 3).
>>> x
array([[0, 1, 2],
[3, 4, 5]])
>>> output
array([[[ 0, 1, 4],
[ 0, 4, 10]],
[[ 0, 4, 10],
[ 9, 16, 25]]])
I tried the code below and am wondering whether there is a faster version using numpy.
np.array([
np.multiply(x[i], x)
for i in range(x.shape[0])
])
There are two straightforward ways to do so: the first uses broadcasting, and the second uses einsum. I'd recommend using timeit to compare the speed of the various versions for the application you have in mind:
out_broadcast = x[:, None, :] * x
out_einsum = np.einsum('ij,kj->ikj',x,x)
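As a rough check, the sketch below confirms that both expressions reproduce the loop from the question and then times them. The repetition count of 1000 and the tiny 2x3 input are arbitrary choices for illustration; timings on real data may rank differently.

```python
import timeit
import numpy as np

x = np.arange(6).reshape(2, 3)

# Loop version from the question: one (2, 3) product per row of x.
out_loop = np.array([np.multiply(x[i], x) for i in range(x.shape[0])])

# Broadcasting: (2, 1, 3) * (2, 3) -> (2, 2, 3).
out_broadcast = x[:, None, :] * x

# einsum: out[i, k, j] = x[i, j] * x[k, j].
out_einsum = np.einsum('ij,kj->ikj', x, x)

assert np.array_equal(out_loop, out_broadcast)
assert np.array_equal(out_loop, out_einsum)

# Time each variant; number=1000 is an arbitrary repetition count.
t_broadcast = timeit.timeit(lambda: x[:, None, :] * x, number=1000)
t_einsum = timeit.timeit(lambda: np.einsum('ij,kj->ikj', x, x), number=1000)
print(f"broadcast: {t_broadcast:.4f}s  einsum: {t_einsum:.4f}s")
```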
I have an ndarray defined in the following way:
dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
dtype=np.float32)
This array represents a collection of images of size image_size * image_size.
So I can write dataset[0] and get a 2D table corresponding to the image with index 0.
Now I would like to have one additional field for each image in this array. For instance, for the image located at index 0 I would like to store the number 123, and for the image located at index 321 I would like to store the number 50000.
What is the simplest way to add this additional data field to the existing ndarray?
What is the appropriate way to access data in the new array after adding this additional dimension?
If you shuffle an index array instead of the dataset itself, you can keep track of the original 'identifiers':
idx = np.arange(len(image_files))
np.random.shuffle(idx)
shuffle_set = dataset[idx]
illustration:
In [20]: x = np.arange(12).reshape(6,2)
...: idx = np.arange(6)
...: np.random.shuffle(idx)
In [21]: x
Out[21]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
In [22]: x[idx] # shuffled
Out[22]:
array([[ 4, 5],
[ 0, 1],
[ 2, 3],
[ 6, 7],
[10, 11],
[ 8, 9]])
In [23]: idx1=np.argsort(idx)
In [24]: idx
Out[24]: array([2, 0, 1, 3, 5, 4])
In [25]: idx1
Out[25]: array([1, 2, 0, 3, 5, 4])
In [26]: Out[22][idx1] # recover original order
Out[26]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
Numpy arrays are fundamentally tensors, i.e., every axis has one fixed size, so the shape cannot vary from one element to the next. Take for example,
import numpy as np
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]]
])
print(x.shape) #Here we have two, 2x2s. Shape = (2,2,2)
If I want to associate x[0] to the number 5 and x[1] to the number 7, then that would be something like (if it was possible):
x = np.array([[[1,2],[3,4]],5,
[[5,6],[7,8]],7
])
But such a thing is impossible, since it would "in some sense" have a shape that corresponds to (2,((2,2),1)), or something else that is ambiguous. Such an object is not a numpy array or a tensor, because it doesn't have fixed axis sizes, and all numpy arrays must have fixed axis sizes. Hence, if you wish to store the new information, the straightforward way to do it is to create another array.
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]],
])
y = np.array([5,7])
Now x[0] corresponds to y[0] and x[1] corresponds to y[1]. x has shape (2,2,2) and y has shape (2,).
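A minimal sketch of how the two parallel arrays stay in step when shuffled: index both with the same permutation. The names `images`, `labels`, and `perm` and the seed are hypothetical, chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary, only for reproducibility

# Hypothetical stand-in for the image stack: image i is filled with the value i.
images = np.zeros((4, 2, 2), dtype=np.float32)
for i in range(len(images)):
    images[i] = i

# One extra value per image, stored in a separate 1D array.
labels = np.array([123, 7, 50000, 42])

# Apply the same permutation to both arrays so rows stay paired.
perm = rng.permutation(len(images))
shuffled_images = images[perm]
shuffled_labels = labels[perm]

for k in range(len(perm)):
    # Position k of both arrays refers to original image perm[k].
    assert shuffled_images[k, 0, 0] == perm[k]
    assert shuffled_labels[k] == labels[perm[k]]
```

The extra value for the image at position k is then simply shuffled_labels[k].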
TL;DR:
I am looking for a way to get a non-trivial, and in particular non-contiguous, view of a numpy ndarray.
E.g., given a 1D ndarray, x = np.array([1, 2, 3, 4]), is there a way to get a non trivial view of it, e.g. np.array([2, 4, 3, 1])?
Longer Version
The context of the question is the following: I have a 4D ndarray of shape (U, V, S, T) which I would like to reshape to a 2D ndarray of shape (U*S, V*T) in a non-trivial way, i.e. a simple np.reshape() does not do the trick, as I have a more complex indexing scheme in mind in which the reshaped array will not be contiguous in memory. The arrays in my case are rather large, and I would like to get a view and not a copy of the array.
Example
Given an array x(u, v, s, t) of shape (2, 2, 2, 2):
x = np.array([[[[1, 1], [1, 1]],[[2, 2], [2, 2]]],
[[[3, 3], [3, 3]], [[4, 4], [4, 4]]]])
I would like to get the view z(a, b) of the array:
np.array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
This corresponds to an indexing scheme of a = u * S + s and b = v * T + t, where in this case S = 2 = T.
What I have tried
Various approaches using np.reshape or even as_strided. Standard reshaping does not change the order of the elements as they appear in memory. I tried playing around with order='F' and transposing a bit, but could not work out a combination that gave me the correct result.
Since I know the indexing scheme, I tried to operate on the flattened view of the array using np.ravel(). My idea was to create an array of indices following the desired indexing scheme and apply it to the flattened array view, but unfortunately, fancy/advanced indexing gives a copy of the array, not a view.
Question
Is there any way to achieve the indexing view that I'm looking for?
In principle, I think this should be possible, as for example ndarray.sort() performs an in-place non-trivial reordering of the array. On the other hand, this is probably implemented in C/C++, so it might not even be possible in pure Python?
Let's review the basics of an array: it has a flat data buffer, a shape, strides, and a dtype. The last three attributes determine how the elements of the data buffer are viewed, whether as a simple 1d sequence, 2d, or higher dimensions.
A true view then uses the same data buffer, but applies a different shape, strides, or dtype to it.
To get [2, 4, 3, 1] from [1,2,3,4] requires starting at 2, jumping forward 2 to reach 4, then stepping back 1 to reach 3 and back 2 to reach 1. That's not a regular pattern that can be represented by strides.
arr[1::2] gives the [2,4], and arr[0::2] gives the [1,3].
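To make the stride argument concrete, here is a small sketch that contrasts a strided slice (a view) with advanced indexing (a copy). It uses np.shares_memory as the check; the variable names are only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Regular step: expressible as an offset plus a constant stride, so it is a view.
evens = arr[1::2]   # elements 2, 4
odds = arr[0::2]    # elements 1, 3
assert np.shares_memory(arr, evens)
assert evens.strides == (2 * arr.itemsize,)  # steps two elements at a time

# Irregular order: needs advanced indexing, which always returns a copy.
picked = arr[[1, 3, 2, 0]]   # elements 2, 4, 3, 1
assert not np.shares_memory(arr, picked)

# Writing through the view changes the original; writing to the copy does not.
evens[0] = 20
picked[0] = -1
print(arr)
```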
(U, V, S, T) to (U*S, V*T) requires a transpose to (U, S, V, T), followed by a reshape:
arr.transpose(0,2,1,3).reshape(U*S, V*T)
That will require a copy, no way around that.
In [227]: arr = np.arange(2*3*4*5).reshape(2,3,4,5)
In [230]: arr1 = arr.transpose(0,2,1,3).reshape(2*4, 3*5)
In [231]: arr1.shape
Out[231]: (8, 15)
In [232]: arr1
Out[232]:
array([[ 0, 1, 2, 3, 4, 20, 21, 22, 23, 24, 40, 41, 42,
43, 44],
[ 5, 6, 7, 8, 9, 25, 26, 27, 28, 29, 45, 46, 47,
48, 49],
....)
Or with your x
In [234]: x1 = x.transpose(0,2,1,3).reshape(4,4)
In [235]: x1
Out[235]:
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
Notice that the elements are in a different order:
In [254]: x.ravel()
Out[254]: array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
In [255]: x1.ravel()
Out[255]: array([1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4])
ndarray.sort is in-place and changes the order of bytes in the data buffer. It is operating at a low level that we don't have access to. It isn't a view of the original array.
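The view-versus-copy behaviour described in this answer can be checked directly. The sketch below, using the x from the question, assumes np.shares_memory as the test: the transpose alone is still a view, and it is the subsequent reshape that forces the copy.

```python
import numpy as np

x = np.array([[[[1, 1], [1, 1]], [[2, 2], [2, 2]]],
              [[[3, 3], [3, 3]], [[4, 4], [4, 4]]]])

# Transposing only permutes the strides, so the result still views x's buffer.
xt = x.transpose(0, 2, 1, 3)
assert np.shares_memory(x, xt)
assert not xt.flags['C_CONTIGUOUS']

# Merging the (U, S) and (V, T) axes cannot be expressed with strides alone,
# so reshape has to allocate a new buffer.
x1 = xt.reshape(4, 4)
assert not np.shares_memory(x, x1)

print(x1)
```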
Is there a more concise syntax for stacking arrays/matrices? In MATLAB, you can simply write [x, y] to stack horizontally and [x; y] to stack vertically, and this can be easily chained, such as [x, x; y, y]; in Python it seems to be more tedious, see below:
import numpy as np
x = np.array([[1, 1, 1], [1, 2, 3]])
y = x*10
np.vstack((x, y))
array([[ 1, 1, 1],
[ 1, 2, 3],
[10, 10, 10],
[10, 20, 30]])
np.hstack((x, y))
array([[ 1, 1, 1, 10, 10, 10],
[ 1, 2, 3, 10, 20, 30]])
np.vstack((np.hstack((x, x)), np.hstack((y, y))))
array([[ 1, 1, 1, 1, 1, 1],
[ 1, 2, 3, 1, 2, 3],
[10, 10, 10, 10, 10, 10],
[10, 20, 30, 10, 20, 30]])
MATLAB has its own interpreter, so it can interpret the ; etc. to suit its needs. numpy uses the Python interpreter, so it can't use or reuse basic syntactic characters like [],;. So the basic array constructor wraps a nested list of lists (it takes a list as argument):
np.array([[1,2,3], [4,5,6]])
But that nesting can be carried to any depth, np.array([]), np.array([[[[['foo']]]]]), because arrays can have 0,1, 2 etc dimensions.
MATLAB initially only had 2d matrices, and still can't have 1 or 0d.
In MATLAB that matrix is the basic object (cell and struct came later). In Python lists are the basic object (with tuples and dicts close behind).
np.matrix takes a string argument that imitates the MATLAB syntax: np.matrix('1 2; 3 4'). But np.matrix, like the original MATLAB, is fixed at 2d.
https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#matrix-objects
https://docs.scipy.org/doc/numpy/reference/generated/numpy.bmat.html#numpy.bmat
But seriously, who makes real, useful matrices with the 1, 2; 3, 4 syntax? Those are toys. I prefer to use np.arange(12).reshape(3,4) if I need a simple example.
numpy has added a np.stack which gives more ways of joining arrays into new constructs. And a np.block:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.block.html#numpy.block
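As a sketch of how np.block covers the chained MATLAB form [x, x; y, y] from the question, a nested list of arrays can be passed in a single call. The comparison against the nested hstack/vstack version is only there as a sanity check.

```python
import numpy as np

x = np.array([[1, 1, 1], [1, 2, 3]])
y = x * 10

# Inner lists are joined horizontally, the outer list vertically.
blocked = np.block([[x, x],
                    [y, y]])

# Same result as the nested hstack/vstack call from the question.
nested = np.vstack((np.hstack((x, x)), np.hstack((y, y))))
assert np.array_equal(blocked, nested)

print(blocked)
```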
I have a 1d np.array. Its length may vary according to user input, but it will always stay one-dimensional.
Please advise if there is an efficient way to create a symmetric 2d np.array from it? By 'symmetric' I mean that its elements will be according to the rule k[i, j] = k[j, i].
I realise it is possible to do with a python for loop and lists, but that is very inefficient.
Many thanks in advance!
EXAMPLE:
For example, we have x = np.array([1, 2, 3]). The desired result should be
M = np.array([[1, 2, 3],
[2, 1, 2],
[3, 2, 1]])
Interpretation #1
Seems like you are reusing elements at each row. So, with that sort of idea, an implementation using broadcasting would be -
def symmetricize(arr1D):
    ID = np.arange(arr1D.size)
    return arr1D[np.abs(ID - ID[:,None])]
Sample run -
In [170]: arr1D
Out[170]: array([59, 21, 70, 10, 42])
In [171]: symmetricize(arr1D)
Out[171]:
array([[59, 21, 70, 10, 42],
[21, 59, 21, 70, 10],
[70, 21, 59, 21, 70],
[10, 70, 21, 59, 21],
[42, 10, 70, 21, 59]])
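For reference, a short check of this interpretation against the example from the question: the index matrix |i - j| is built by broadcasting and is symmetric by construction, so the result is too. The function is repeated here only so the snippet runs on its own.

```python
import numpy as np

def symmetricize(arr1D):
    # idx - idx[:, None] broadcasts to an (n, n) grid of j - i;
    # its absolute value selects element |i - j| for position (i, j).
    idx = np.arange(arr1D.size)
    return arr1D[np.abs(idx - idx[:, None])]

x = np.array([1, 2, 3])
M = symmetricize(x)

assert np.array_equal(M, M.T)          # k[i, j] == k[j, i]
assert np.array_equal(M[0], x)         # first row is the input itself
print(M)
```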
Interpretation #2
Another interpretation I had is that you would like to assign the elements of the input 1D array into a symmetric 2D array without re-use: we fill in the upper triangular part once and then replicate those values in the lower triangular region, keeping symmetry between the row and column indices. This only works when the input has a suitable size, namely n*(n+1)/2 elements for some n, so as a pre-processing step we need to perform that error-checking. Once past the error-checking, we initialize an output array and use the row and column indices of a triangular array to assign the values twice: once as they are, and once with swapped indices to fill the other triangular part, which gives the symmetry.
It seemed like Scipy's squareform should be able to do this task, but from the docs, it doesn't look like it supports filling up the diagonal elements with the input array elements. So, let's give our solution a closely-related name.
Thus, we would have an implementation like so -
def squareform_diagfill(arr1D):
    # Candidate side length n such that n*(n+1)/2 == arr1D.size
    n = int(np.sqrt(arr1D.size*2))
    if (n*(n+1))//2!=arr1D.size:
        print("Size of 1D array not suitable for creating a symmetric 2D array!")
        return None
    else:
        # Row and column indices of the upper triangle, diagonal included
        R,C = np.triu_indices(n)
        out = np.zeros((n,n),dtype=arr1D.dtype)
        out[R,C] = arr1D
        out[C,R] = arr1D
        return out
Sample run -
In [179]: arr1D = np.random.randint(0,9,(12))
In [180]: squareform_diagfill(arr1D)
Size of 1D array not suitable for creating a symmetric 2D array!
In [181]: arr1D = np.random.randint(0,9,(10))
In [182]: arr1D
Out[182]: array([0, 4, 3, 6, 4, 1, 8, 6, 0, 5])
In [183]: squareform_diagfill(arr1D)
Out[183]:
array([[0, 4, 3, 6],
[4, 4, 1, 8],
[3, 1, 6, 0],
[6, 8, 0, 5]])
What you're looking for is a special Toeplitz matrix, which is easy to generate with scipy:
from numpy import concatenate, zeros
from scipy.linalg import toeplitz
toeplitz([1,2,3])
array([[1, 2, 3],
[2, 1, 2],
[3, 2, 1]])
Another special-matrix interpretation uses a Hankel matrix, which gives you the smallest square matrix for a given array.
from scipy.linalg import hankel
a=[1,2,3]
t=int(len(a)/2)+1
s=t-2+len(a)%2
hankel(a[:t],a[s:])
array([[1, 2],
[2, 3]])