Newbie here, trying to learn conceptually what it means to have an n-dimensional array in Python. For example, if I create an ndarray using the following code:
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
How exactly are the 1, 2, 3 and 4, 5, 6 blocks related?
What if I added another block, [6, 7, 8]? Can I think of them as separate rows of the same grid? I get that I can create an array of N dimensions by passing in N lists as above, but I can't quite grasp conceptually what it means for an array to have more than one dimension.
Thanks so much.
The number of dimensions is the same as the depth to which the lists are nested. Your example is 2D because it is a list of lists. Adding a third list doesn't change the dimension; what would make it 3D is a list of lists of lists.
If you print that array, you'll see it displayed as a 2D grid, where each inner list is a row and each index within a list is a column. It gets harder to imagine as you go to higher dimensions: a 3D grid can be rendered as a cube, but 4D and beyond are too difficult to visualize.
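A quick way to check this is the ndim attribute, which reports the nesting depth directly. A minimal sketch:

import numpy as np

print(np.array([1, 2, 3]).ndim)               # 1: a flat list
print(np.array([[1, 2, 3], [4, 5, 6]]).ndim)  # 2: a list of lists
print(np.array([[[1, 2]], [[3, 4]]]).ndim)    # 3: a list of lists of lists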
--Edit--
As pointed out, an array is not the same as nested lists. They are functionally very different. But the original question was about conceptualizing multidimensional arrays.
import numpy as np

array = np.arange(27)
print(array)                    # 1D: 27 values in a row
array = array.reshape(3, 9)
print(array)                    # 2D: 3 rows of 9
array = array.reshape(3, 3, 3)
print(array)                    # 3D: 3 planes of 3x3
If you run this script, you can see the same information displayed as a 1d, 2d, and 3d array.
# 1D
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]
# 2D
[[ 0 1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16 17]
[18 19 20 21 22 23 24 25 26]]
# 3D
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
The 1D array is akin to a number line. The 2D array is like a grid. The 3D array could represent each of the cells in a 3x3x3 cube. For learning how to operate on an array this isn't particularly useful, but for conceptually grasping the structure of an N-dimensional array I find it helpful.
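To make the cube picture concrete, each index selects along one axis, i.e. array[plane, row, column]. A small check against the reshaped 3D array printed above:

print(array[2, 1, 0])  # 21: plane 2, row 1, column 0
print(array[0])        # the first 3x3 plane: [[0 1 2] [3 4 5] [6 7 8]]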
For example, I have the matrix
a = np.arange(25).reshape(5, 5)
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
How do I make a 1D array of elements that I choose manually, for example [2,3], [4,1], [1,0] and [2,2], so that I get the following?
b = [13, 21, 5, 12]
The array should be a reference rather than a copy.
You can make a function for this (note that fancy indexing like this returns a copy, not the view/reference the question asks for):
import numpy as np

# defining the function
def get_value(matrix, row_list, col_list):
    # fancy indexing picks out all the (row, col) pairs in one step
    return matrix[row_list, col_list]

# initializing the array
a = np.arange(0, 25, 1).reshape(5, 5)

# getting the required values and printing
b = get_value(a, [2, 4, 1, 0], [3, 1, 0, 2])

# output
print(b)
Edit
I'll leave the previous answer as it is, just in case anyone else stumbles upon it and needs it.
What the question wants is to take a value from b (e.g. b[0], which is 13) and change that value in the original matrix a, based on where the passed value sits in a.
import numpy as np

def change_the_value(old_mat, val_to_change, new_val):
    # np.where returns the (row, col) indices of every matching element
    rows, cols = np.where(old_mat == val_to_change)
    # update the first occurrence in place
    old_mat[rows[0], cols[0]] = new_val

a = np.arange(0, 25, 1).reshape(5, 5)
b = [13, 16, 5, 12]
change_the_value(a, b[0], 0)
import numpy as np

a = np.arange(25).reshape(5, 5)
search = [[2, 3], [4, 1], [1, 0], [2, 2]]
for row, col in search:
    print(row, col, a[row][col])
output:
r c result
2 3 13
4 1 21
1 0 5
2 2 12
First of all, I've found that constructing a view of a NumPy array from an arbitrary, non-contiguous set of indices is not natively possible: NumPy relies on a regular, strided memory layout for an array, which is what enables its dramatic speed.
Here's a solution I found that works the best so far:
Instead of having a view of the array, I construct a collection of the indices I would like to process: [2,3], [4,1], [1,0], [2,2].
I chose a set as the collection type, because it excludes duplicates and its add and discard methods do not require a search. Keeping order was not necessary.
To use them for indexing an array, they have to be converted from a set of tuples, {(2,3),(4,1),(1,0),(2,2)}, to a tuple of arrays, (array([2,4,1,2]), array([3,1,0,2])).
This can be achieved by unzipping the set and constructing a tuple of arrays (note that sets are unordered, so the order of the pairs in the output may vary):
import numpy as np
a = np.arange(25).reshape(5, 5)
>>>[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
my_set = {(2,3),(4,1),(1,0),(2,2)}
uzip_set = list(zip(*my_set))
seq_from_set = (np.asarray(uzip_set[0]),np.asarray(uzip_set[1]))
print(seq_from_set)
>>>(array([2, 4, 1, 2]), array([3, 1, 0, 2]))
And array a can be manipulated by providing such a sequence of indices:
b = a[seq_from_set]
print(b)
>>>array([13, 21,  5, 12])
a[seq_from_set] = 0
print(a)
>>>[[ 0 1 2 3 4]
[ 0 6 7 8 9]
[10 11 0 0 14]
[15 16 17 18 19]
[20 0 22 23 24]]
The solution is a bit more involved than something native would be, but it works surprisingly fast. It allows easy management of the collection of indices and supports quick conversion to a sequence of index arrays on demand.
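To illustrate that management step, here is a small sketch; set_to_index is a hypothetical helper that just wraps the unzip-and-convert code from above:

def set_to_index(index_set):
    # convert {(row, col), ...} into a tuple of arrays for fancy indexing
    rows, cols = zip(*index_set)
    return (np.asarray(rows), np.asarray(cols))

my_set.add((0, 4))      # start tracking another cell
my_set.discard((1, 0))  # stop tracking this one
print(a[set_to_index(my_set)])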
I was writing a line of code and got some strange output from it.
a = np.arange(2,11).resize((3,3))
print(a)
a = np.arange(2,11).reshape((3,3))
print(a)
The first one gives me None, but the second one gives me a 3x3 matrix. Yet when I write the first version on separate lines, it doesn't give me None:
a = np.arange(2,11)
a.resize((3,3))
print(a)
What is the difference between resize and reshape in this case, and what are the differences in general?
That is because ndarray.resize modifies the array's shape in place and, like other in-place operations, returns None; since you assign its result back to a, a ends up as None. reshape instead returns a reshaped array (a view where possible):
a = np.arange(2,11)
a.shape
#(10,)
a.resize((3,3))
a.shape
# (3, 3)
np.arange(2,11).reshape((3,3)).shape
# (3, 3)
Both reshape and resize change the shape of the numpy array; the difference is that resize affects the original array, while reshape creates a new reshaped instance of the array.
Reshape:
import numpy as np
r = np.arange(16)
print('original r: \n',r)
print('\nreshaped array: \n',r.reshape((4,4)))
print('\narray r after reshape was applied: \n',r)
output
original r:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
reshaped array:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
array r after reshape was applied:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Resize:
import numpy as np
r = np.arange(16)
print('original r: \n',r)
print('\nresized array: \n',r.resize((4,4)))
print('\narray r after resize was applied: \n',r)
output:
original r:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
resized array:
None
array r after resize was applied:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
As you can see, reshape created a new, reshaped instance of the data while the original r stayed unchanged. resize, on the other hand, didn't create a new instance of r; the changes were applied to the original array directly.
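One more wrinkle worth knowing, as an aside (behaviour of current NumPy versions): the free function np.resize is different again from the ndarray.resize method. It returns a new array and repeats the data to fill any extra space, whereas the method works in place and pads with zeros:

import numpy as np

r = np.arange(4)
print(np.resize(r, (2, 4)))  # new array, data repeated: [[0 1 2 3] [0 1 2 3]]
r.resize((2, 4))             # in place, padded with zeros
print(r)                     # [[0 1 2 3] [0 0 0 0]]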
I have an array
a=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21...]
and I want to first choose the elements 7, 8, 9 and then take the same 3 elements out of every subsequent block of 10 items, to form a new array
b=[7 8 9 17 18 19 27 28 29 ....]
How could I implement this?
You could use a list comprehension to build a boolean mask:
import numpy as np

a = np.arange(123)
mask = [x % 10 in (7, 8, 9) for x in a]
b = a[mask]
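If you'd rather avoid the Python-level loop, a vectorized version of the same mask (using np.isin) should be equivalent:

b = a[np.isin(a % 10, [7, 8, 9])]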
You can use reshaping to convert it to 2-D, then select columns and finally flatten it back to 1-D:
b = a.reshape(-1,10)[:,7:10].flatten()
And if your array's length is not a multiple of 10, you can either crop it, or pad it with zeros first and then remove the extra zeros from the selection.
How to crop first:
a = a[:a.size//10*10]
And padding with zeros (note that ndarray.resize works in place, so there is no assignment):
a.resize((a.size // 10 + 1) * 10)
Removing the extra, unwanted zeros from the selection is then just another crop.
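Putting the padding variant together, a minimal end-to-end sketch of my own (assuming a 1-D array; the final step drops selections that fall into the padded tail):

import numpy as np

a = np.arange(123)                 # length is not a multiple of 10
n = a.size
a.resize((n // 10 + 1) * 10)       # in place; pads the tail with zeros
idx = np.arange(a.size).reshape(-1, 10)[:, 7:10].flatten()
b = a[idx[idx < n]]                # keep only selections inside the original range
print(b)                           # [  7   8   9  17  18  19 ... 117 118 119]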
I have a rather big 3-dimensional numpy array of shape (2000, 2500, 32) that I need to manipulate. Some rows are bad, so I need to delete several of them.
In order to detect which rows are "bad" I use the following function:
def badDetect(x):
    for i in xrange(10, 19):
        ptp = np.ptp(x[i*100:(i+1)*100])
        if ptp < 0.01:
            return True
    return False
which marks as bad any sequence of 2000 values that has a run of 100 values with a peak-to-peak value of less than 0.01.
When this is the case, I want to remove that sequence of 2000 values (which can be selected in numpy with a[:, x, y]).
numpy.delete seems to accept indices, but only for 2-dimensional arrays.
You will definitely have to reshape your input array, because cutting out "rows" from a 3D cube leaves a structure that cannot be properly addressed.
As we don't have your data, I'll use a small example first to explain how this solution works:
>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>>
>>> threshold = 18
>>> a = np.arange(5*3*2).reshape(5,3,2) # your dataset of 2000x2500x32
>>> # Taint the data:
... a[0,0,0] = 5
>>> a[a==22]=20
>>> print(a)
[[[ 5 1]
[ 2 3]
[ 4 5]]
[[ 6 7]
[ 8 9]
[10 11]]
[[12 13]
[14 15]
[16 17]]
[[18 19]
[20 21]
[20 23]]
[[24 25]
[26 27]
[28 29]]]
>>> a2 = a.reshape(-1, np.prod(a.shape[1:]))
>>> print(a2) # Will prove to be much easier to work with!
[[ 5 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 20 23]
[24 25 26 27 28 29]]
As you can see from the representation above, it already becomes much clearer over which windows you want to compute the peak-to-peak value. And you'll need this form if you're going to remove "rows" (they have now been transformed into columns) from this data structure, something you couldn't do in 3 dimensions!
>>> isize = a.itemsize # More generic, in case you have another dtype
>>> slice_size = 4 # How big each continuous slice is over which the Peak2Peak value is calculated
>>> slices = as_strided(a2,
... shape=(a2.shape[0] + 1 - slice_size, slice_size, a2.shape[1]),
... strides=(isize*a2.shape[1], isize*a2.shape[1], isize))
>>> print(slices)
[[[ 5 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 20 23]]
[[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 20 23]
[24 25 26 27 28 29]]]
So I took, as an example, a window size of 4 elements: if the peak-to-peak value within any of these 4-element slices (per dataset, so per column) is less than a certain threshold, I want to exclude it. That can be done like this:
>>> mask = np.all(slices.ptp(axis=1) >= threshold, axis=0) # These are the ones that are of interest
>>> print(a2[:,mask])
[[ 1 2 3 5]
[ 7 8 9 11]
[13 14 15 17]
[19 20 21 23]
[25 26 27 29]]
You can now clearly see that the tainted data has been removed. But remember, you could not have simply removed that data from a 3D array (but you could've masked it then).
Obviously, you'll have to set the threshold to .01 in your use-case, and the slice_size to 100.
Beware: while the as_strided form is extremely memory-efficient, computing the peak-to-peak values of this array and storing that result does require a good amount of memory in your case: 1901x(2500x32) values in the full scenario, i.e. when you do not ignore the first 1000 slices. In your case, where you're only interested in the slices from 1000:1900, you would have to adapt the code like so:
mask = np.all(slices[1000:1900,:,:].ptp(axis=1) >= threshold, axis=0)
And that would reduce the memory required to store this mask to "only" 900x(2500x32) values (of whatever data type you were using).
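As a side note, on NumPy 1.20+ the same sliding windows can be built more safely with numpy.lib.stride_tricks.sliding_window_view, which computes the strides for you (a sketch under that assumption, not part of the original answer):

from numpy.lib.stride_tricks import sliding_window_view

# windows over axis 0 of a2; the window becomes the last axis
windows = sliding_window_view(a2, window_shape=slice_size, axis=0)
mask = np.all(windows.ptp(axis=-1) >= threshold, axis=0)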
I am looking for how to resample a numpy array representing image data at a new size, preferably having a choice of the interpolation method (nearest, bilinear, etc.). I know there is
scipy.misc.imresize
which does exactly this by wrapping PIL's resize function. The only problem is that since it uses PIL, the numpy array has to conform to image formats, giving me a maximum of 4 "color" channels.
I want to be able to resize arbitrary images, with any number of "color" channels. I was wondering if there is a simple way to do this in scipy/numpy, or if I need to roll my own.
I have two ideas for how to concoct one myself:
a function that runs scipy.misc.imresize on every channel separately
create my own using scipy.ndimage.interpolation.affine_transform
The first one would probably be slow for large data, and the second one does not seem to offer any other interpolation method except splines.
Based on your description, you want scipy.ndimage.zoom.
Bilinear interpolation would be order=1, nearest is order=0, and cubic is the default (order=3).
zoom is specifically for regularly-gridded data that you want to resample to a new resolution.
As a quick example:
import numpy as np
import scipy.ndimage

x = np.arange(9).reshape(3, 3)

print('Original array:')
print(x)
print('Resampled by a factor of 2 with nearest interpolation:')
print(scipy.ndimage.zoom(x, 2, order=0))
print('Resampled by a factor of 2 with bilinear interpolation:')
print(scipy.ndimage.zoom(x, 2, order=1))
print('Resampled by a factor of 2 with cubic interpolation:')
print(scipy.ndimage.zoom(x, 2, order=3))
And the result:
Original array:
[[0 1 2]
[3 4 5]
[6 7 8]]
Resampled by a factor of 2 with nearest interpolation:
[[0 0 1 1 2 2]
[0 0 1 1 2 2]
[3 3 4 4 5 5]
[3 3 4 4 5 5]
[6 6 7 7 8 8]
[6 6 7 7 8 8]]
Resampled by a factor of 2 with bilinear interpolation:
[[0 0 1 1 2 2]
[1 2 2 2 3 3]
[2 3 3 4 4 4]
[4 4 4 5 5 6]
[5 5 6 6 6 7]
[6 6 7 7 8 8]]
Resampled by a factor of 2 with cubic interpolation:
[[0 0 1 1 2 2]
[1 1 1 2 2 3]
[2 2 3 3 4 4]
[4 4 5 5 6 6]
[5 6 6 7 7 7]
[6 6 7 7 8 8]]
Edit: As Matt S. pointed out, there are a couple of caveats for zooming multi-band images. I'm copying the portion below almost verbatim from one of my earlier answers:
Zooming also works for 3D (and nD) arrays. However, be aware that if you zoom by 2x, for example, you'll zoom along all axes.
from scipy import ndimage

data = np.arange(27).reshape(3, 3, 3)
print('Original:\n', data)
print('Zoomed by 2x gives an array of shape:', ndimage.zoom(data, 2).shape)
This yields:
Original:
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
Zoomed by 2x gives an array of shape: (6, 6, 6)
In the case of multi-band images, you usually don't want to interpolate along the "z" axis, creating new bands.
If you have something like a 3-band, RGB image that you'd like to zoom, you can do this by specifying a sequence of tuples as the zoom factor:
print('Zoomed by 2x along the last two axes:')
print(ndimage.zoom(data, (1, 2, 2)))
This yields:
Zoomed by 2x along the last two axes:
[[[ 0 0 1 1 2 2]
[ 1 1 1 2 2 3]
[ 2 2 3 3 4 4]
[ 4 4 5 5 6 6]
[ 5 6 6 7 7 7]
[ 6 6 7 7 8 8]]
[[ 9 9 10 10 11 11]
[10 10 10 11 11 12]
[11 11 12 12 13 13]
[13 13 14 14 15 15]
[14 15 15 16 16 16]
[15 15 16 16 17 17]]
[[18 18 19 19 20 20]
[19 19 19 20 20 21]
[20 20 21 21 22 22]
[22 22 23 23 24 24]
[23 24 24 25 25 25]
[24 24 25 25 26 26]]]
If you want to resample, then you should look at Scipy's cookbook for rebinning. In particular, the congrid function defined at the end will support rebinning or interpolation (equivalent to the function in IDL with the same name). This should be the fastest option if you don't want interpolation.
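If integer-factor rebinning is all you need, a minimal reshape-and-mean sketch of my own (assuming each new dimension divides the old one exactly) avoids interpolation entirely:

import numpy as np

def rebin(arr, new_shape):
    # average over equal-sized blocks of a 2-D array
    shape = (new_shape[0], arr.shape[0] // new_shape[0],
             new_shape[1], arr.shape[1] // new_shape[1])
    return arr.reshape(shape).mean(-1).mean(1)

print(rebin(np.arange(16).reshape(4, 4), (2, 2)))  # [[ 2.5  4.5] [10.5 12.5]]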
You can also use directly scipy.ndimage.map_coordinates, which will do a spline interpolation for any kind of resampling (including unstructured grids). I find map_coordinates to be slow for large arrays (nx, ny > 200).
For interpolation on structured grids, I tend to use scipy.interpolate.RectBivariateSpline. You can choose the order of the spline (linear, quadratic, cubic, etc) and even independently for each axis. An example:
import scipy.interpolate as interp

# x, y are the sample coordinates of the image im;
# new_x, new_y are the coordinates to resample at
f = interp.RectBivariateSpline(x, y, im, kx=1, ky=1)
new_im = f(new_x, new_y)
In this case you're doing a bi-linear interpolation (kx = ky = 1). The 'nearest' kind of interpolation is not supported, as all this does is a spline interpolation over a rectangular mesh. It's also not the fastest method.
If you're after bi-linear or bi-cubic interpolation, it is generally much faster to do two 1D interpolations:
f = interp.interp1d(y, im, kind='linear')
temp = f(new_y)
f = interp.interp1d(x, temp.T, kind='linear')
new_im = f(new_x).T
You can also use kind='nearest', but in that case get rid of the transposed arrays (the .T calls).
Have you looked at Scikit-image? Its transform.pyramid_* functions might be useful for you.
I've recently just found an issue with scipy.ndimage.interpolation.zoom, which I've submitted as a bug report: https://github.com/scipy/scipy/issues/3203
As an alternative (or at least for me), I've found that scikit-image's skimage.transform.resize works correctly: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize
However, it works differently from scipy's interpolation.zoom: rather than specifying a multiplier, you specify the output shape that you want. This works for 2D and 3D images.
For just 2D images, you can use transform.rescale and specify a multiplier or scale as you would with interpolation.zoom.
You can use interpolate.interp2d.
For example, considering an image represented by a numpy array arr, you can resize it to an arbitrary height and width as follows:
import numpy as np
from scipy.interpolate import interp2d

W, H = arr.shape[:2]
new_W, new_H = (600, 300)
coords = lambda n: np.linspace(0, 1, n)  # renamed from xrange to avoid shadowing the builtin
f = interp2d(coords(W), coords(H), arr, kind="linear")
new_arr = f(coords(new_W), coords(new_H))
Of course, if your image has multiple channels, you have to perform the interpolation for each one.
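A sketch of that per-channel loop, following the coordinate convention of the snippet above (resize_multichannel is a hypothetical helper, and note that interp2d is deprecated in recent SciPy releases):

import numpy as np
from scipy.interpolate import interp2d

def resize_multichannel(arr, new_W, new_H):
    # interpolate each channel separately, then restack along the last axis
    W, H = arr.shape[:2]
    coords = lambda n: np.linspace(0, 1, n)
    channels = []
    for c in range(arr.shape[2]):
        f = interp2d(coords(W), coords(H), arr[:, :, c], kind="linear")
        channels.append(f(coords(new_W), coords(new_H)))
    return np.dstack(channels)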
This solution scales the X and Y axes of the input image without affecting the RGB channels:
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt

plt.imshow(scipy.ndimage.zoom(image_np_array, zoom=(7, 7, 1), order=1))
Hope this is useful.