I'm new to Python and NumPy and am trying to create 3-dimensional arrays. My problem is that the order of the dimensions is off compared to Matlab. In fact, the order doesn't make sense to me at all.
Creating a matrix:
x = np.zeros((2,3,4))
In my world this should result in 2 rows, 3 columns and 4 depth dimensions and it should be presented as:
[0 0 0 [0 0 0 [0 0 0 [0 0 0
0 0 0] 0 0 0] 0 0 0] 0 0 0]
Separating on each depth dimension.
Instead it is presented as
[0 0 0 0 [0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0] 0 0 0 0]
That is, 3 rows, 4 columns and 2 depth dimensions. In other words, the first dimension is the "depth". To add to the problem, when I import an image with OpenCV the color dimension is the last dimension, so there I see the color information as the depth dimension. This complicates things greatly if all I want to do is try something on a known, smaller 3-dimensional array.
Have I misunderstood something? If not, why the heck is NumPy using such an unintuitive way of working with 3-dimensional arrays?
You have a truncated array representation. Let's look at a full example:
>>> a = np.zeros((2, 3, 4))
>>> a
array([[[ 0., 0., 0., 0.],
        [ 0., 0., 0., 0.],
        [ 0., 0., 0., 0.]],
       [[ 0., 0., 0., 0.],
        [ 0., 0., 0., 0.],
        [ 0., 0., 0., 0.]]])
Arrays in NumPy are printed as the word array followed by the structure, similar to nested Python lists. Let's create a similar list:
>>> l = [[[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]],
[[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]]]
>>> l
[[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]]
The first level of this compound list l has exactly 2 elements, just as the first dimension of the array a (# of rows). Each of these elements is itself a list with 3 elements, which is equal to the second dimension of a (# of columns). Finally, the most nested lists have 4 elements each, same as the third dimension of a (depth/# of colors).
So you've got exactly the same structure (in terms of dimensions) as in Matlab, just printed in another way.
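The correspondence between the shape and the nesting levels can be checked directly. A minimal sketch (the variable names are only illustrative):

```python
import numpy as np

a = np.zeros((2, 3, 4))
nested = a.tolist()  # same data as nested Python lists

# Length of each nesting level, outermost first
levels = (len(nested), len(nested[0]), len(nested[0][0]))
print(a.shape)   # (2, 3, 4)
print(levels)    # (2, 3, 4)

# The index order follows the shape order: a[i, j, k]
a[1, 2, 3] = 7.0
print(a[1][2][3])  # 7.0
```

The outermost list corresponds to the first entry of the shape, and the innermost list to the last.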
Some caveats:
1. Matlab stores data column by column ("Fortran order"), while NumPy by default stores it row by row ("C order"). This doesn't affect indexing, but may affect performance. For example, in Matlab an efficient loop runs over columns (e.g. for n = 1:10 a(:, n) end), while in NumPy it is preferable to iterate over rows (e.g. for n in range(10): a[n, :]; note n in the first position, not the last).
2. If you work with color images in OpenCV, remember that:
2.1. It stores images in BGR format and not RGB, as most Python libraries do.
2.2. Most functions work on image coordinates (x, y), which are the opposite of matrix coordinates (i, j).
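The two OpenCV caveats can be illustrated with plain NumPy, without loading a real image. This is only a sketch: the tiny array below is a hypothetical stand-in for what an image loader would return.

```python
import numpy as np

# Hypothetical 2-row, 3-column image in BGR channel order
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[:, :, 2] = 255                 # last channel is red in BGR

rgb = bgr[:, :, ::-1]              # reverse the channel axis: BGR -> RGB (a view)
print(rgb[0, 0].tolist())          # [255, 0, 0]

# A point given in image coordinates (x, y) is indexed as [y, x]
x, y = 2, 1
bgr[y, x] = (255, 0, 0)            # make that pixel blue
print(bgr[1, 2].tolist())          # [255, 0, 0]
```

Because `rgb` is a view, the change made through `bgr` is visible in it as well.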
You are right: you are creating a matrix with 2 rows, 3 columns and a depth of 4. NumPy just prints matrices differently from Matlab:
Numpy:
>>> import numpy as np
>>> np.zeros((2,3,2))
array([[[ 0., 0.],
        [ 0., 0.],
        [ 0., 0.]],
       [[ 0., 0.],
        [ 0., 0.],
        [ 0., 0.]]])
Matlab
>> zeros(2, 3, 2)
ans(:,:,1) =
0 0 0
0 0 0
ans(:,:,2) =
0 0 0
0 0 0
However, you are computing the same matrix. Take a look at NumPy for Matlab users; it will guide you in converting Matlab code to NumPy.
For example, if you are using OpenCV, you can build an image with NumPy, taking into account that OpenCV uses the BGR representation:
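import cv2
import numpy as np
# uint8 is the usual 8-bit image type; the last axis holds the B, G, R channels
a = np.zeros((100, 100, 3), dtype=np.uint8)
a[:, :, 0] = 255  # blue square
b = np.zeros((100, 100, 3), dtype=np.uint8)
b[:, :, 1] = 255  # green square
c = np.zeros((100, 200, 3), dtype=np.uint8)
c[:, :, 2] = 255  # red rectangle, 100 rows by 200 columns
# Red strip on top, blue and green squares side by side below it
img = np.vstack((c, np.hstack((a, b))))
cv2.imshow('image', img)
cv2.waitKey(0)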
If you look at matrix c, you will see it is a 100x200x3 matrix, which is exactly what is shown in the image (in red, as we have set the R channel to 255 and the other two remain at 0).
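If a Matlab-style, page-by-page display is easier to read, it can be emulated by slicing along the last axis. A small sketch, using made-up values so the pages are distinguishable:

```python
import numpy as np

a = np.arange(12).reshape(2, 3, 2)   # 2 rows, 3 columns, 2 pages

# Print one 2x3 page per index of the last axis, like Matlab's ans(:,:,k)
for k in range(a.shape[2]):
    print(f"a[:, :, {k}] =")
    print(a[:, :, k])

page0 = a[:, :, 0]                   # the first page as a 2x3 matrix
```

Note that NumPy indices start at 0, so Matlab's ans(:,:,1) corresponds to a[:, :, 0].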
There is no need to go into such deep technicalities. Let me explain it in the easiest way. We all studied "sets" in mathematics at school. Just think of a 3D NumPy array as a collection of "sets".
x = np.zeros((2,3,4))
Simply Means:
2 Sets, 3 Rows per Set, 4 Columns
Example:
Input
x = np.zeros((2,3,4))
Output
Set # 1 ---- [[[ 0., 0., 0., 0.], ---- Row 1
[ 0., 0., 0., 0.], ---- Row 2
[ 0., 0., 0., 0.]], ---- Row 3
Set # 2 ---- [[ 0., 0., 0., 0.], ---- Row 1
[ 0., 0., 0., 0.], ---- Row 2
[ 0., 0., 0., 0.]]] ---- Row 3
Explanation:
See? We have 2 sets, 3 rows per set, and 4 columns.
Note: Whenever you see a group of numbers enclosed in double brackets at both ends, consider it a "set". Arrays with 3 or more dimensions are always built from these "sets".
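The "set, row, column" reading maps directly onto indexing. A short sketch with distinct values (the variable names are only illustrative):

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)   # 2 sets, 3 rows per set, 4 columns

second_set = x[1]          # one whole set, shape (3, 4)
last_row = x[1, 2]         # third row of the second set, shape (4,)
last_value = x[1, 2, 3]    # fourth column of that row

print(second_set.shape)    # (3, 4)
print(last_row)            # [20 21 22 23]
print(last_value)          # 23
```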
As much as people like to say "order doesn't matter, it's just convention", this breaks down at cross-domain interfaces, i.e. when transferring from C ordering to Fortran ordering or some other ordering scheme. There, precisely how your data is laid out and how the shape is represented in NumPy is very important.
By default, NumPy uses C ordering, which means the elements that are contiguous in memory are the elements of a row. You can also request Fortran ordering ("F"), in which the elements of a column are contiguous instead.
NumPy's shape also has its own display order. For a C-ordered array, the shape is listed largest stride first, i.e. for a 3D array the first entry is the least contiguous dimension (Z, pages, the "3rd dimension", etc.). So when executing:
np.zeros((2,3,4)).shape
you will get
(2,3,4)
which is actually (frames, rows, columns). Doing np.zeros((2,2,3,4)).shape instead would mean (metaframes, frames, rows, columns). This makes more sense when you think of creating multidimensional arrays in C-like languages. In C++, creating a non-contiguously defined 4D array results in an array [ of arrays [ of arrays [ of elements ]]]. This forces you to dereference the first array that holds all the other arrays (4th dimension), then the same all the way down (3rd, 2nd, 1st), resulting in syntax like:
double element = array4d[w][z][y][x];
In Fortran, this index ordering is reversed (x comes first: array4d[x][y][z][w]), from most contiguous to least contiguous, and in Matlab it gets weird.
Matlab tried to preserve the mathematical default ordering (row, column) while also using column-major storage internally for libraries, and it does not follow the C convention of dimensional ordering. In Matlab, you order this way:
double element = array4d[y][x][z][w];
which defies all convention and creates weird situations where you are sometimes indexing as if row-ordered and sometimes as if column-ordered (such as with matrix creation).
In reality, Matlab is the unintuitive one, not Numpy.
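The "largest stride first" claim can be inspected through the strides attribute, which gives the byte step along each axis. A minimal sketch, assuming the default float64 dtype (8 bytes per element):

```python
import numpy as np

c = np.zeros((2, 3, 4))              # default C order
f = np.asfortranarray(c)             # same shape, Fortran (column-major) layout

print(c.shape, c.strides)            # (2, 3, 4) (96, 32, 8)
print(f.shape, f.strides)            # (2, 3, 4) (8, 16, 48)
print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # True True
```

The shape is the same in both cases; only the memory layout, and therefore which axis is cheapest to walk along, differs.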
Read this article for better insight: numpy: Array shapes and reshaping arrays
Note: NumPy reports the shape of 3D arrays in the order layers, rows, columns.
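If data arrives with the "depth" axis last (as with image channels) and a layers-first layout is preferred, np.moveaxis can rearrange the axes. A sketch on a hypothetical small array:

```python
import numpy as np

# Hypothetical image-like array: 2 rows, 3 columns, 4 channels ("depth" last)
img = np.arange(24).reshape(2, 3, 4)

# Move the last axis to the front: (rows, cols, depth) -> (depth, rows, cols)
layers = np.moveaxis(img, -1, 0)
print(layers.shape)      # (4, 2, 3)
print(layers[0])         # first channel as a 2x3 matrix

same = np.array_equal(layers[0], img[:, :, 0])
```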
I was also confused by NumPy initially. When you write:
x = np.zeros((2,3,4))
it is interpreted as:
Generate a 3D array made of 2 matrices with 3 rows each, where each row contains 4 elements.
NumPy always assigns dimensions starting from the outermost and moving inward.
Rule of thumb: a 2D array is a matrix.
Hi, I want to join multiple arrays in Python, using NumPy, to form a multidimensional array. This happens inside a for loop; here is the pseudocode:
import numpy as np
h = np.zeros(4)
for x in range(3):
    x1 = some array of length of 4 returned from a previous function (3,5,6,7)
    h = np.concatenate((h,x1), axis =0)
The first iteration goes fine, but during the second iteration of the for loop I get the following error:
ValueError: all the input arrays must have same number of dimensions
The output array should look something like this
[[0,0,0,0],[3,5,6,7],[6,3,6,7]]
etc
So how can I join the arrays?
Thanks
You need to use vstack. It allows you to stack arrays. You take a sequence of arrays and stack them vertically to make a single array
import numpy as np
h = np.zeros(4)
for x in range(3):
    x1 = [3,5,6,7]
    h = np.vstack((h,x1))
    # not h = np.concatenate((h,x1), axis =0)
print(h)
Output:
[[ 0. 0. 0. 0.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]]
If you want to use concatenate only, you can also do it the following way:
import numpy as np
h1 = np.zeros(4)
for x in range(3):
    x1 = np.array([3,5,6,7])
    h1 = np.concatenate([h1,x1.T], axis =0)
print(h1.shape)
print(h1.reshape(4,4))
Output:
(16,)
[[ 0. 0. 0. 0.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]]
Both have different applications. You can choose according to your need.
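concatenate can also build the 2D result directly, provided every input is already 2D. A minimal sketch, with a fixed array standing in for the real data:

```python
import numpy as np

h = np.zeros((1, 4))                    # start 2-D: one row, four columns
for _ in range(3):
    x1 = np.array([3, 5, 6, 7])         # stand-in for the real data
    row = x1[np.newaxis, :]             # shape (4,) -> (1, 4)
    h = np.concatenate((h, row), axis=0)

print(h.shape)   # (4, 4)
```

This avoids the final reshape, at the cost of promoting each new row by hand.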
There are multiple ways of doing this. I'll list a few examples:
First, we import numpy and define a function that generates those arrays of length 4.
import numpy as np
def previous_function_returning_array_of_length_4(x):
    return np.array(range(4)) + x
The first way involves creating a list of arrays, then calling numpy.array() to convert the list to a 2D array.
h0 = np.zeros(4)
arrays = [h0]
for x in range(3):
    x1 = previous_function_returning_array_of_length_4(x)
    arrays.append(x1)
h = np.array(arrays)
You can do the same with np.vstack():
h0 = np.zeros(4)
arrays = [h0]
for x in range(3):
    x1 = previous_function_returning_array_of_length_4(x)
    arrays.append(x1)
h = np.vstack(arrays)
Alternatively, if you know how many arrays you are going to create, you can create the 2D array first and fill in the values:
h = np.zeros((4, 4))
for ii in range(3):
    x1 = previous_function_returning_array_of_length_4(ii)
    h[ii + 1, ...] = x1
There are more ways, but hopefully, this will give you an idea of what to do.
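One of those other ways is np.stack, which joins equally shaped arrays along a new axis. A sketch, with np.arange(4) + x standing in for the function's return value:

```python
import numpy as np

rows = [np.zeros(4)]
for x in range(3):
    rows.append(np.arange(4) + x)   # stand-in for the function's return value

h = np.stack(rows, axis=0)          # every item must have the same shape
print(h.shape)                      # (4, 4)
```

Unlike vstack, np.stack raises an error if the items differ in shape, which can help catch a stray 2D row early.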
It is best to collect values in a list, and perform the concatenate or array creation once, at the end.
h = [np.zeros(4)]
for x in range(3):
    x1 = np.array([3, 5, 6, 7])  # stand-in for the length-4 array returned by the previous function
    h.append(x1)  # list.append works in place and returns None, so do not reassign h
h = np.array(h)
# or h = np.vstack(h)
All the concatenate/stack/array functions take a list of multiple items. It is faster to append to a list than to concatenate 2 items at a time.
======================
Let's try your approach step by step:
In [189]: h=np.zeros(4)
In [190]: h
Out[190]: array([ 0., 0., 0., 0.]) # 1d array (4,) shape
In [191]: x1=np.array([3,5,6,7]) # another 1d
In [192]: h1=np.concatenate((h,x1),axis=0)
In [193]: h1
Out[193]: array([ 0., 0., 0., 0., 3., 5., 6., 7.])
In [194]: h1.shape
Out[194]: (8,) # also a 1d array, but with 8 items
In [195]: x1=np.array([6,3,6,7])
In [196]: h1=np.concatenate((h1,x1),axis=0)
In [197]: h1
Out[197]: array([ 0., 0., 0., 0., 3., 5., 6., 7., 6., 3., 6., 7.])
In this case I'm adding (4,) arrays one after the other, still getting a 1d array.
If I go back and create x1 as 2d (1,4):
In [198]: h=np.zeros(4)
In [199]: x1=np.array([[6,3,6,7]])
In [200]: h1=np.concatenate((h,x1),axis=0)
...
ValueError: all the input arrays must have same number of dimensions
I get this dimension error right away.
The fact that you get the error on the 2nd iteration suggests that the 1st x1 is (4,), but the 2nd is 2d.
When you have dimensions errors like this, check the shapes.
vstack adds dimensions to the inputs, as needed, so you can build 2d arrays:
In [207]: h=np.zeros(4)
In [208]: x1=np.array([3,5,6,7])
In [209]: h=np.vstack((h,x1))
In [210]: h
Out[210]:
array([[ 0., 0., 0., 0.],
[ 3., 5., 6., 7.]])
In [211]: x1=np.array([6,3,6,7])
In [212]: h=np.vstack((h,x1))
In [213]: h
Out[213]:
array([[ 0., 0., 0., 0.],
[ 3., 5., 6., 7.],
[ 6., 3., 6., 7.]])
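When the inputs have mixed shapes, the same promotion can be done explicitly with np.atleast_2d before concatenating. A sketch, using made-up pieces that mix (4,) and (1, 4) shapes:

```python
import numpy as np

pieces = [np.zeros(4), np.array([3, 5, 6, 7]), np.array([[6, 3, 6, 7]])]
print([p.shape for p in pieces])         # [(4,), (4,), (1, 4)]

# Promote every piece to 2-D first, then concatenate along rows
h = np.concatenate([np.atleast_2d(p) for p in pieces], axis=0)
print(h.shape)                           # (3, 4)
```

Printing the shapes first, as above, is usually the quickest way to see which input triggers the dimension error.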
It appears that a trailing 1 in the shape parameter simply transposes the base ndarray. Is that the correct way to think about it?
In [22]: np.ones((3, 2))
Out[22]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
In [23]: np.ones((3, 2, 1))
Out[23]:
array([[[ 1.],
[ 1.]],
[[ 1.],
[ 1.]],
[[ 1.],
[ 1.]]])
I'm wondering if there are performance reasons to specify the trailing 1 as well.
The two NumPy arrays differ in structure, while the number of values stays the same. For example, if you think of a spreadsheet, np.ones((3, 2)) would have 2 columns of data and 3 rows, while np.ones((3, 2, 1)) would have only 1 column and 2 rows on each of 3 sheets.
You can go from one to the other using reshape(), so just think about the block of data in the most helpful configuration, knowing you can change it later.
Example: np.ones((3,2)).reshape(1,2,1,3) (this now has 4 dimensions with the same number of values).
As for performance, there isn't any difference.
It's still one big block of data of the same type, referenced efficiently.
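To make the difference from a transpose concrete, the shapes can be compared side by side. A small sketch with distinct values so the element order is visible:

```python
import numpy as np

a = np.arange(6.0).reshape(3, 2)

t = a.T                    # transpose: swaps the axes
r = a.reshape(3, 2, 1)     # adds a length-1 axis, element order unchanged
n = a[..., np.newaxis]     # another way to add a trailing axis

print(t.shape, r.shape, n.shape)        # (2, 3) (3, 2, 1) (3, 2, 1)
print(np.shares_memory(a, r))           # True: reshape returned a view here
print(np.array_equal(r[:, :, 0], a))    # True
```

So the trailing 1 wraps each value in its own innermost bracket; it does not swap rows and columns.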
I had a weird behaviour trying to change the value of an element of a numpy array today, and I would like to understand why it didn't work. I have two arrays (a and b), and I want to change the values of b where a > 0.
a = array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
b = array([[ 5., 0., 0.],
[ 0., 5., 0.],
[ 0., 0., 5.]])
mask = a > 0
print b[mask][0]
=> 5.0
b[mask][0] = 10
print b[mask][0]
=> 5.0
Could someone please explain why the assignment b[mask][0] didn't change my value 5.0?
b[mask] is a copy of the selected elements of b. b[mask][0] = 10 is effectively:
c = b[mask]
c[0] = 10
The data elements of c are not (in general) a contiguous block of the elements of b.
b[mask] = 10
b[mask] = [10, 11, 12]
You can assign values to b[mask] when it is the only thing on the left. You need to change all the masked elements.
If you need to change one or two, then first change the mask so it selects only those elements.
In general
b[...][...] = ...
is not good practice. Sometimes it works (if the first indexing is a slice that produces a view), but you shouldn't count on it. It takes a while to fully grasp the difference between a view and a copy.
The [] get translated by the Python interpreter into calls to __getitem__ or __setitem__. The following pairs are equivalent:
c = b[mask]
c = b.__getitem__(mask)
b[mask] = 10
b.__setitem__(mask, 10)
b[mask][0] = 10
b.__getitem__(mask).__setitem__(0, 10)
b[mask][0] = 10 is 2 operations, a get followed by a set. The set operates on the result of the get. It modifies b only if the result of the get is a view.
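The view-versus-copy behaviour can be reproduced in a few lines. This sketch rebuilds a and b with np.eye and shows one possible way (via np.nonzero) to change a single masked element:

```python
import numpy as np

a = np.eye(3)
b = 5.0 * np.eye(3)
mask = a > 0

b[mask][0] = 10            # writes into a temporary copy; b is unchanged
print(b[0, 0])             # 5.0

# Select the single element explicitly, then assign in one indexing step
rows, cols = np.nonzero(mask)
b[rows[0], cols[0]] = 10
print(b[0, 0])             # 10.0

# Slicing, by contrast, returns a view, so chained assignment does reach b
b[1:][0, 1] = 7
print(b[1, 1])             # 7.0
```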
Input:
array length (Integer)
indexes (Set or List)
Output:
A boolean numpy array that has the value 1 at the given indexes and 0 elsewhere.
Example:
Input: array_length=10, indexes={2,5,6}
Output:
[0,0,1,0,0,1,1,0,0,0]
Here is my simple implementation:
def indexes2booleanvec(size, indexes):
    v = numpy.zeros(size)
    for index in indexes:
        v[index] = 1.0
    return v
Is there a more elegant way to implement this?
One way is to avoid the loop
In [7]: fill = np.zeros(array_length) # array_length = 10
In [8]: fill[indexes] = 1 # indexes = [2,5,6]
In [9]: fill
Out[9]: array([ 0., 0., 1., 0., 0., 1., 1., 0., 0., 0.])
Another way to do it (in one line):
np.isin(np.arange(array_length), indexes)
However, this is slower than the indexing solution above.
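Since the question asks for a boolean array and allows a set as input, a variant with dtype=bool may be closer to the stated goal. A sketch; the function name is made up for illustration:

```python
import numpy as np

def indexes_to_mask(size, indexes):
    """Return a boolean vector that is True at the given positions."""
    mask = np.zeros(size, dtype=bool)
    # A set cannot be used as an index directly, so convert it to a list first
    mask[list(indexes)] = True
    return mask

mask = indexes_to_mask(10, {2, 5, 6})
print(mask.astype(int))   # [0 0 1 0 0 1 1 0 0 0]
```

The list conversion matters: NumPy does not treat a Python set as a sequence of indices.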
I have a numpy array with three columns of the form:
x1 y1 f1
x2 y2 f2
...
xn yn fn
The (x,y) pairs may repeat. I would need another array such that each (x,y) pair appears once and the corresponding third column is the sum of all the f values that appeared next to (x,y).
For example, the array
1 2 4.0
1 1 5.0
1 2 3.0
0 1 9.0
would give
0 1 9.0
1 1 5.0
1 2 7.0
The order of rows is not relevant. What is the fastest way to do this in Python?
Thank you!
This would be one approach to solve it -
import numpy as np
# Input array
A = np.array([[1,2,4.0],
[1,1,5.0],
[1,2,3.0],
[0,1,9.0]])
# Extract xy columns
xy = A[:,0:2]
# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(xy.T)
sorted_xy = xy[sorted_idx,:]
# Differentiation along rows for sorted array
df1 = np.diff(sorted_xy,axis=0)
df2 = np.append([True],np.any(df1!=0,1),0)
# OR df2 = np.append([True],np.logical_or(df1[:,0]!=0,df1[:,1]!=0),0)
# OR df2 = np.append([True],np.dot(df1!=0,[True,True]),0)
# Get unique sorted labels
sorted_labels = df2.cumsum(0)-1
# Get labels
labels = np.zeros_like(sorted_idx)
labels[sorted_idx] = sorted_labels
# Get unique indices
unq_idx = sorted_idx[df2]
# Get counts and unique rows and setup output array
counts = np.bincount(labels, weights=A[:,2])
unq_rows = xy[unq_idx,:]
out = np.append(unq_rows,counts.ravel()[:,None],1)
Input & Output -
In [169]: A
Out[169]:
array([[ 1., 2., 4.],
[ 1., 1., 5.],
[ 1., 2., 3.],
[ 0., 1., 9.]])
In [170]: out
Out[170]:
array([[ 0., 1., 9.],
[ 1., 1., 5.],
[ 1., 2., 7.]])
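A shorter variant of the same idea, assuming a NumPy version whose np.unique supports the axis argument, lets np.unique do the sorting and labelling:

```python
import numpy as np

A = np.array([[1, 2, 4.0],
              [1, 1, 5.0],
              [1, 2, 3.0],
              [0, 1, 9.0]])

# Unique (x, y) rows plus, for every input row, the index of its unique row
unq_xy, inverse = np.unique(A[:, :2], axis=0, return_inverse=True)

# Sum the third column per group; ravel() guards against a 2-D inverse
sums = np.bincount(inverse.ravel(), weights=A[:, 2])
out = np.column_stack([unq_xy, sums])
print(out)
```

This trades the explicit lexsort and diff steps for a single call, which may or may not be faster on large inputs.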
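Thanks to @hpaulj, I finally found the simplest solution. If d contains the 3-column data:
ind = d[:, 0:2].astype(int)  # integer (x, y) index columns
x = np.zeros(shape=(N, N))
np.add.at(x, (ind[:, 0], ind[:, 1]), d[:, 2])  # unbuffered add, so repeated (x, y) pairs accumulate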
This solution assumes that the (x,y) indices in the first two columns are integer and smaller than N. This is what I need and should have mentioned in the post.
Edit: Note that the above solution produces an N-by-N matrix (mostly zeros) with the summed values at position (x,y) within the matrix.
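To get back from the accumulated matrix to the three-column form asked for in the question, the nonzero cells can be read out again. A sketch on the question's sample data; N = 3 is an assumption chosen to fit those indices:

```python
import numpy as np

d = np.array([[1, 2, 4.0],
              [1, 1, 5.0],
              [1, 2, 3.0],
              [0, 1, 9.0]])
N = 3                                   # assumed upper bound on the x and y indices

ij = d[:, :2].astype(int)
grid = np.zeros((N, N))
np.add.at(grid, (ij[:, 0], ij[:, 1]), d[:, 2])   # duplicates are accumulated

# Read the filled cells back out as (x, y, sum) rows
rows, cols = np.nonzero(grid)
out = np.column_stack([rows, cols, grid[rows, cols]])
print(out)
```

One caveat of this read-out: a pair whose values happen to sum to exactly zero would be dropped.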
Certainly easily done in Python:
arr = np.array([[1,2,4.0],
[1,1,5.0],
[1,2,3.0],
[0,1,9.0]])
d={}
for x, y, z in arr:
    d.setdefault((x,y), 0)
    d[x,y]+=z
>>> d
{(1.0, 2.0): 7.0, (0.0, 1.0): 9.0, (1.0, 1.0): 5.0}
Then translate back to numpy:
>>> np.array([[x,y,d[(x,y)]] for x,y in d.keys()])
array([[ 1., 2., 7.],
[ 0., 1., 9.],
[ 1., 1., 5.]])
If you have scipy, the sparse module does this kind of addition - again for an array where the 1st 2 columns are integers - ie. indexes.
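import numpy as np
from scipy import sparse
# csr_matrix((data, (row, col))): the values come from the 3rd column, the indices from the first two
M = sparse.csr_matrix((d[:,2], (d[:,0].astype(int), d[:,1].astype(int))))
M = M.tocoo() # there may be a short cut to this csr coo round trip
x = np.column_stack([M.row, M.col, M.data]) # needs testing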
For convenience in constructing certain kinds of linear algebra matrices, the csr sparse array format sums values with duplicate indices. It's implemented in compiled code so should be fairly fast. But putting the data into M and taking it back out might slow it down.
(ps. I haven't tested this script since I'm writing this on a machine without scipy).