Can not reshape after numpy.bincount() (ValueError) - python

If I generate the b array using np.random.uniform() I can reshape it with no issues (so I can multiply it by the larger array a). But if I try the same line generating b using np.bincount(), I get a
ValueError: cannot reshape array of size 7 into shape (20,)
even though both the a and b arrays have the exact same shape in both blocks.
import numpy as np
a = np.random.uniform(0., 1., 20)
# Works
b = np.random.uniform(0., 1., 7)
b.resize(a.shape)
d = b * a
# Does not work
c = [0, 4, 5, 4, 1, 3, 4, 5, 6, 6, 5, 6, 4, 6, 3, 1, 5, 4, 6, 0]
b = np.bincount(c)
b.reshape(a.shape)
d = b * a

NumPy's resize can change the total number of elements. It discards elements if the new shape is smaller and fills the extra elements with zeros if the new shape is bigger (or repeats the array's values if you use the resize function). So it's no problem to "resize" an array from size 7 to size 20.
Return a new array with the specified shape.
If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape) which fills with zeros instead of repeated copies of a.
However reshape needs to keep the number of elements constant. That's why you can't reshape an array of length 7 to an array of size 20.
Gives a new shape to an array without changing its data.
Also, the reshape method (and function) doesn't change the array in-place. Only the resize method does that (the resize function doesn't either!).
Thanks @user2357112 for pointing that out!
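To make the difference concrete, here is a minimal sketch using the arrays from the question. The names reshape_failed, repeated and padded are only illustrative, and refcheck=False is an assumption added so the in-place resize also runs in interactive sessions.

```python
import numpy as np

a = np.random.uniform(0., 1., 20)
c = [0, 4, 5, 4, 1, 3, 4, 5, 6, 6, 5, 6, 4, 6, 3, 1, 5, 4, 6, 0]
b = np.bincount(c)            # 7 counts, one per value 0..6

# reshape must keep the element count, so 7 -> 20 is rejected.
try:
    b.reshape(a.shape)
    reshape_failed = False
except ValueError:
    reshape_failed = True

# The resize *function* returns a new array, repeating b to fill 20 slots.
repeated = np.resize(b, a.shape)

# The resize *method* works in place and pads with zeros.
# refcheck=False only skips the reference check, which can fail in
# interactive sessions where extra references to the array exist.
padded = b.copy()
padded.resize(a.shape, refcheck=False)

d = padded * a                # the shapes match now
```

So the first block in the question works only because it calls resize, not because the array came from np.random.uniform().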

Related

Rearrange elements in numpy array to plot a 3d array in 2d, preserving grouping

Hello, I have a 3d numpy array of shape (nc, nx, ny). I would like to plot it (with matplotlib) using x and y as the axes, but in such a way that all c values are close to each other and arranged in a rectangle around a central value (you can assume nc is a power of 2).
Thanks to @mad-physicist's answer I think I managed to come up with a general solution, but it is not very efficient.
My code looks like this:
import numpy as np
nc1, nc2 = 4, 4
(nx, ny, nc) = (2, 3, nc1 * nc2)
data = np.reshape(np.arange(nx * ny * nc), (nc, nx, ny))
res = np.zeros((nx * nc1, ny * nc2))
for i in range(nx):
    for j in range(ny):
        # lay out the nc values at (i, j) as an nc1 x nc2 block
        tmp = data[:, i, j].reshape(nc1, nc2)
        res[i * nc1:(i + 1) * nc1, j * nc2:(j + 1) * nc2] = tmp
I am trying to find a way to avoid the loops using numpy functions. Any suggestion on how to do this?
To clarify, here is a picture of what I have in mind.
Here is a tentative approach to rearrange the memory layout closer to what you want, so you can de-interlace in a single step:
y = np.moveaxis(data.reshape(nc // 2, 2, nx, ny), 1, -1).reshape(nc, -1)
result = np.concatenate((y[::2], y[1::2]))
You may have to play with the dimensions to make sure you are rearranging based on the ones you want.
Numpy assumes C order. So when you reshape, if your array is not C contiguous, it generally makes a copy. It also means that you can rearrange the memory layout of an array by say transposing and then reshaping it.
You want to take an array of shape (nc1 * nc2, nx, ny) into one of shape (nx * nc1, ny * nc2). One approach is to split the array into 4D first. Adding an extra dimension will not change the memory layout:
data = data.reshape(nc1, nc2, nx, ny)
Now you can move the dimensions around to get the memory layout you need. This only changes the strides and does not copy memory, since a transpose is just a view of the same buffer:
data = data.transpose(2, 0, 3, 1) # nx, nc1, ny, nc2
Notice that we moved nc1 after nx and nc2 after ny. In C order, we want the groups to be contiguous. Later dimensions are closer together than earlier ones. So raveling [[1, 2], [3, 4]] produces [1, 2, 3, 4], while raveling [[1, 3], [2, 4]] produces [1, 3, 2, 4].
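The raveling claim can be checked with a tiny sketch (the variable names are only illustrative):

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# C order walks the last axis fastest, so rows come out one after another.
flat = m.ravel()

# The transpose is [[1, 3], [2, 4]]; raveling it interleaves the original rows.
flat_t = m.T.ravel()
```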
Now you can get the final array. Since the data is no longer C contiguous after rearranging the dimensions, the following reshape will make the copy that actually rearranges the data:
data = data.reshape(nx * nc1, ny * nc2)
You can write this in one line:
result = data.reshape(nc1, nc2, nx, ny).transpose(2, 0, 3, 1).reshape(nx * nc1, ny * nc2)
In principle, this is as efficient as you can get: the first reshape changes the shape and strides; the transpose swaps around the order of the shape and strides. Only a single copy operation happens during the last reshape.
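As a sanity check, here is a sketch that compares the one-liner against the loop from the question, using the same sizes (nc1 = nc2 = 4, nx = 2, ny = 3). The names expected, transposed and result are only illustrative.

```python
import numpy as np

nc1, nc2 = 4, 4
nx, ny = 2, 3
nc = nc1 * nc2
data = np.arange(nc * nx * ny).reshape(nc, nx, ny)

# Reference result: the explicit double loop from the question.
expected = np.zeros((nx * nc1, ny * nc2), dtype=data.dtype)
for i in range(nx):
    for j in range(ny):
        block = data[:, i, j].reshape(nc1, nc2)
        expected[i * nc1:(i + 1) * nc1, j * nc2:(j + 1) * nc2] = block

# Loop-free version: split nc, reorder the axes, then merge them pairwise.
transposed = data.reshape(nc1, nc2, nx, ny).transpose(2, 0, 3, 1)
result = transposed.reshape(nx * nc1, ny * nc2)

same = np.array_equal(result, expected)
view_shares = np.shares_memory(data, transposed)   # still a view of data
copy_shares = np.shares_memory(data, result)       # the final reshape copied
```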

numpy's transpose method can't convert 1D row ndarray to a column one [duplicate]

This question already has answers here:
Transposing a 1D NumPy array
Let's consider a as a 1D row/horizontal array:
import numpy as np
N = 10
a = np.arange(N) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a.shape # (10,)
Now I want b to be a 1D column/vertical array, the transpose of a:
b = a.transpose() # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b.shape # (10,)
but the .transpose() method returns an identical ndarray with the exact same shape!
What I expected to see was
np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
which can be achieved by
c = a.reshape(a.shape[0], 1) # or c = a; c.shape = (c.shape[0], 1)
c.shape # (10, 1)
and to my surprise, it has a shape of (10, 1) instead of (1, 10).
In Octave/Scilab I could do:
N = 10
b = 0:(N-1)
a = b'
size(b) % ans = 1 10
size(a) % ans = 10 1
I understand that numpy ndarrays are not matrices (as discussed here), but the behavior of numpy's transpose function just doesn't make sense to me! I would appreciate it if you could help me understand how this behavior makes sense and what I am missing here.
P.S. So what I have understood so far is that b = a.transpose() is the equivalent of b = a; b.shape = b.shape[::-1] which if you had a "2D array" of (N, 1) would return a (1, N) shaped array, as you would expect from a transpose operator. However, numpy seems to treat the "1D array" of (N,) as a 0D scalar. I think they should have named this method something else, as this is very misleading/confusing IMHO.
To understand the numpy array better, you should take a look at this review paper: The NumPy array: a structure for efficient numerical computation
In short, numpy ndarrays have this attribute called the stride, which is
the number of bytes to skip in memory to proceed to the next element.
For a (10, 10) array of bytes, for example, the strides may be (10,
1), in other words: proceed one byte to get to the next column and ten
bytes to locate the next row.
For your ndarray a, a.strides = (8,), which shows that it has only one dimension, and that to get to the next element along this single dimension you need to advance 8 bytes in memory (each int is 64-bit here).
Strides are useful for representing transposes:
By modifying strides, for example, an array can be transposed or
reshaped at zero cost (no memory needs to be copied).
So if there was a 2-dimensional ndarray, say b = np.ones((3,5)) for example, then b.strides = (40, 8), while b.transpose().strides = (8, 40). So as you see, a transposed 2D ndarray is simply the exact same array, whose strides have been reordered. And since your 1D ndarray has only 1 dimension, swapping the values of its strides (i.e. taking its transpose) doesn't do anything.
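These numbers are easy to check yourself. A small sketch follows, where the dtype of a is set to int64 explicitly (an assumption, so the 8-byte stride holds on every platform):

```python
import numpy as np

b = np.ones((3, 5))                    # float64, 8 bytes per element
b_strides = b.strides                  # 40 bytes per row step, 8 per column step
bt_strides = b.transpose().strides     # same buffer, strides swapped
shares = np.shares_memory(b, b.transpose())

a = np.arange(10, dtype=np.int64)
a_strides = a.strides                  # a single stride: nothing to swap
at_shape = a.transpose().shape         # so the shape stays (10,)
```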
As you already mentioned, numpy arrays are not matrices. The definition of the transpose function is:
Permute the dimensions of an array.
This means that numpy's transpose method moves data from one dimension to another. Since a 1D array has only one dimension, there is no other dimension to move the data to. So you need to add a dimension before transpose has any effect. This behavior also makes sense for consistency with higher-dimensional (3D, 4D, ...) arrays.
There is a clean way to achieve what you want:
N = 10
a = np.arange(N)
a[:, np.newaxis]  # shape (10, 1)
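For comparison, here is a sketch of a few equivalent ways to get the column shape (col1, col2 and col3 are just illustrative names):

```python
import numpy as np

N = 10
a = np.arange(N)

col1 = a[:, np.newaxis]        # add a trailing axis of length 1
col2 = a.reshape(-1, 1)        # let reshape infer the first dimension
col3 = a[np.newaxis, :].T      # make it a (1, N) row first, then transpose

row = a[np.newaxis, :]         # the explicit "row vector" has shape (1, N)
```

Once the array is 2D, transpose behaves the way an Octave user would expect.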

Append value to each array in a numpy array

I have a numpy array of arrays, for example:
x = np.array([[1,2,3],[10,20,30]])
Now let's say I want to extend each array with [4,40], to generate the following resulting array:
[[1,2,3,4],[10,20,30,40]]
How can I do this without making a copy of the whole array? I tried to change the shape of the array in place but it throws a ValueError:
x[0] = np.append(x[0],4)
x[1] = np.append(x[1],40)
ValueError : could not broadcast input array from shape (4) into shape (3)
You can't do this. Numpy arrays allocate contiguous blocks of memory, if at all possible. Any change to the array size will force an inefficient copy of the whole array. You should use Python lists to grow your structure if possible, then convert the end result back to an array.
However, if you know the final size of the resulting array, you could instantiate it with something like np.empty() and then assign values by index, rather than appending. This does not change the size of the array itself, only reassigns values, so should not require copying.
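Both suggestions can be sketched roughly as follows, using the values from the question. The names rows, grown and out are only illustrative.

```python
import numpy as np

# Option 1: grow plain Python lists, convert to an array once at the end.
rows = [[1, 2, 3], [10, 20, 30]]
for row, value in zip(rows, [4, 40]):
    row.append(value)
grown = np.array(rows)

# Option 2: the final size is known up front, so allocate once and assign.
x = np.array([[1, 2, 3], [10, 20, 30]])
out = np.empty((2, 4), dtype=x.dtype)
out[:, :3] = x           # copy the existing columns
out[:, 3] = [4, 40]      # fill the new last column
```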
While @roganjosh is right that you cannot modify numpy arrays without making a copy (in the underlying process), there is a simpler way of appending each value of an ndarray to the end of each row of a 2d ndarray, by using numpy.column_stack:
>>> x = np.array([[1,2,3],[10,20,30]])
>>> x
array([[ 1,  2,  3],
       [10, 20, 30]])
>>> stack_y = np.array([4,40])
>>> stack_y
array([ 4, 40])
>>> np.column_stack((x, stack_y))
array([[ 1,  2,  3,  4],
       [10, 20, 30, 40]])
Create a new matrix
Insert the values of your old matrix
Then, insert your new values in the last positions
x = np.array([[1,2,3],[10,20,30]])
new_X = np.zeros((2, 4))
new_X[:2,:3] = x
new_X[0][-1] = 4
new_X[1][-1] = 40
x=new_X
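Note that np.reshape() and np.resize() do not help here: reshape() cannot change the number of elements, and resize() pads or repeats values in flattened (row-major) order instead of appending a column.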
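A quick check of what np.resize() actually produces for this input, next to np.column_stack (a sketch; resized and stacked are illustrative names):

```python
import numpy as np

x = np.array([[1, 2, 3], [10, 20, 30]])

# np.resize flattens x and repeats it until the new shape is full,
# so the rows get shifted instead of each gaining one value.
resized = np.resize(x, (2, 4))

# column_stack appends [4, 40] as a new last column.
stacked = np.column_stack((x, [4, 40]))
```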

What is the difference between resize and reshape when using arrays in NumPy?

I have just started using NumPy. What is the difference between resize and reshape for arrays?
Reshape doesn't change the data as mentioned here.
Resize changes the data as can be seen here.
Here are some examples:
>>> numpy.random.rand(2,3)
array([[ 0.6832785 , 0.23452056, 0.25131171],
[ 0.81549186, 0.64789272, 0.48778127]])
>>> ar = numpy.random.rand(2,3)
>>> ar.reshape(1,6)
array([[ 0.43968751, 0.95057451, 0.54744355, 0.33887095, 0.95809916,
0.88722904]])
>>> ar
array([[ 0.43968751, 0.95057451, 0.54744355],
[ 0.33887095, 0.95809916, 0.88722904]])
After reshape, the array itself didn't change; the call only returned a temporary reshaped array.
>>> ar.resize(1,6)
>>> ar
array([[ 0.43968751, 0.95057451, 0.54744355, 0.33887095, 0.95809916,
0.88722904]])
After resize the array changed its shape.
One major difference is that reshape() does not change your data, but resize() does. resize() first accommodates all the values of the original array. After that, if there is extra space (i.e. the new array is larger than the original), it adds its own values. As @David mentioned in the comments, which values resize() adds depends on how it is called.
You can call resize() in the following two ways.
numpy.resize()
ndarray.resize() - where ndarray is an n dimensional array you are resizing.
You can similarly call reshape as numpy.reshape() or ndarray.reshape(). Here the two are almost the same apart from the syntax.
One point to note is that reshape() always tries to return a view wherever possible; otherwise it returns a copy. You can't always tell in advance which one you will get, but you can make your code raise an error whenever the data would be copied.
For resize(), numpy.resize() returns a new copy of the array whereas ndarray.resize() works in-place. Neither of them returns a view.
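One way to see whether you got a view or a copy is np.shares_memory. A small sketch with illustrative names:

```python
import numpy as np

a = np.arange(6)

view = a.reshape(2, 3)            # contiguous data: reshape can return a view
is_view = np.shares_memory(a, view)

# The transpose is not C contiguous, so flattening it needs a copy.
flat_t = view.T.reshape(6)
is_copy = not np.shares_memory(a, flat_t)

resized = np.resize(a, (2, 3))    # the resize function builds a new array
resized_is_copy = not np.shares_memory(a, resized)

view[0, 0] = 99                   # writing through the view changes a as well
```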
Now to the question of what the values of the extra elements should be. The documentation says:
If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape) which fills with zeros instead of repeated copies of a.
So for ndarray.resize() it is the value 0, but for numpy.resize() it is the values of the array itself (of course, whatever can fit in the new size). The below code snippet will make it clear.
In [40]: arr = np.array([1, 2, 3, 4])
In [41]: np.resize(arr, (2,5))
Out[41]:
array([[1, 2, 3, 4, 1],
[2, 3, 4, 1, 2]])
In [42]: arr.resize((2,5))
In [43]: arr
Out[43]:
array([[1, 2, 3, 4, 0],
[0, 0, 0, 0, 0]])
You can also see that ndarray.resize() returns None and does the resizing in-place.
reshape() is able to change the shape only (i.e. the meta info), not the number of elements.
If the array has five elements, we may use e.g. reshape(5, ), reshape(1, 5),
reshape(1, 5, 1), but not reshape(2, 3).
reshape() in general doesn't modify the data itself, only the meta info about it;
the .reshape() method (of ndarray) returns the reshaped array, keeping the original array untouched.
resize() is able to change both the shape and the number of elements.
So for an array with five elements we may use resize(5, 1), but also resize(2, 2) or resize(7, 9).
The .resize() method (of ndarray) returns None and changes the original array itself (an in-place change).
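The five-element case can be sketched like this. The names are illustrative, and refcheck=False is an assumption added so the in-place calls also run in interactive sessions.

```python
import numpy as np

a = np.arange(5)

ok_shape = a.reshape(1, 5, 1).shape   # same 5 elements, new meta info

# 2 * 3 = 6 elements would be needed, so reshape refuses.
try:
    a.reshape(2, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True

small = a.copy()
ret = small.resize((2, 2), refcheck=False)   # in place, drops the last element

big = a.copy()
big.resize((7, 9), refcheck=False)           # in place, pads with zeros
```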
Suppose you have the following np.ndarray:
a = np.array([1, 2, 3, 4]) # Shape of this is (4,)
Now we try 'a.reshape'
a.reshape(1, 4)
array([[1, 2, 3, 4]])
a.shape # This will again return (4,)
We see that the shape of a hasn't changed.
Let's try 'a.resize' now:
a.resize(1,4)
a.shape # Now the shape changes to (1,4)
'resize' changed the shape of our original NumPy array a (It changes shape 'IN-PLACE').
One more point is:
np.reshape can take -1 in one dimension. np.resize can't.
Example as below:
arr = np.arange(20)
arr.resize(5, 2, 2)
arr.reshape(2, 2, -1)
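A short sketch of what the -1 does. The resize part is wrapped in try/except because, as far as I know, ndarray.resize rejects negative dimensions with a ValueError, but treat that as an assumption about your NumPy version.

```python
import numpy as np

arr = np.arange(20)

# reshape infers the -1 dimension from the total size: 20 / (2 * 2) = 5
shape_a = arr.reshape(2, 2, -1).shape
shape_b = arr.reshape(-1, 4).shape

# resize has no size to infer from, since it may change the element count.
try:
    arr.copy().resize((2, 2, -1), refcheck=False)
    resize_accepts_minus_one = True
except ValueError:
    resize_accepts_minus_one = False
```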

Convert 1D array into numpy matrix

I have a simple, one dimensional Python array with random numbers. What I want to do is convert it into a numpy Matrix of a specific shape. My current attempt looks like this:
import random
import numpy as np
randomWeights = []
for i in range(80):
    randomWeights.append(random.uniform(-1, 1))
W = np.mat(randomWeights)
W.reshape(8,10)
Unfortunately it always creates a matrix of the form:
[[random1, random2, random3, ...]]
So only the first element of one dimension gets used and the reshape command has no effect. Is there a way to convert the 1D array to a matrix so that the first x items will be row 1 of the matrix, the next x items will be row 2 and so on?
Basically this would be the intended shape:
[[1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, ... , 16],
[..., 800]]
I suppose I can always build a new matrix in the desired form manually by parsing through the input array. But I'd like to know if there is a simpler, more elegant solution with built-in functions I'm not seeing. If I have to build those matrices manually I'll have a ton of extra work in other areas of the code since all my source data comes in simple 1D arrays but will be computed as matrices.
reshape() doesn't reshape in place, you need to assign the result:
>>> W = W.reshape(8,10)
>>> W.shape
(8, 10)
You can also use W.resize(8, 10) (see ndarray.resize()), which changes the shape in place instead of returning a new array.
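A minimal sketch of the whole conversion, assuming a plain ndarray is acceptable in place of np.mat (random_weights is just a renamed copy of the list from the question):

```python
import random
import numpy as np

random_weights = [random.uniform(-1, 1) for _ in range(80)]

# reshape returns a new array object, so the result has to be assigned.
W = np.array(random_weights).reshape(8, 10)

# Each row holds the next 10 consecutive items of the 1D input.
first_row = W[0].tolist()
second_row_start = W[1, 0]
```

If the weights only need to be random, np.random.uniform(-1, 1, (8, 10)) builds the same shape directly.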
