When would you use `flatten()` instead of `reshape(-1)`?

When numpy has seemingly duplicate functions, there often ends up being some unique purpose for one or the other.
I am trying to figure out whether there are any situations where flatten() should be used instead of reshape(-1).

flatten returns a copy of the array. reshape will return a view if possible.
So, for example, if y = x.reshape(-1) is a view, then modifying y also modifies x:
In [149]: x = np.arange(3)
In [150]: y = x.reshape(-1)
In [151]: y[0] = 99
In [152]: x
Out[152]: array([99, 1, 2])
But since y = x.flatten() is a copy, modifying y will never modify x:
In [153]: x = np.arange(3)
In [154]: y = x.flatten()
In [155]: y[0] = 99
In [156]: x
Out[156]: array([0, 1, 2])
Here is an example of when reshape returns a copy instead of a view:
In [161]: x = np.arange(24).reshape(4,6)[::2, :]
In [163]: y = x.reshape(-1)
In [164]: y[0] = 99
In [165]: x
Out[165]:
array([[ 0, 1, 2, 3, 4, 5],
[12, 13, 14, 15, 16, 17]])
Since x is unaffected by an assignment made to y, we know y is a copy of
x, not a view.
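Rather than inferring view-versus-copy from a test assignment, you can ask numpy directly with np.shares_memory. The following is only an illustrative sketch; the array names are arbitrary and not part of the original answer:

```python
import numpy as np

x = np.arange(24).reshape(4, 6)   # contiguous array
s = x[::2, :]                     # strided slice, not contiguous

# reshape(-1) returns a view when the memory layout allows it ...
print(np.shares_memory(x, x.reshape(-1)))   # True
# ... and falls back to a copy when it does not
print(np.shares_memory(s, s.reshape(-1)))   # False
# flatten() always copies, even for a contiguous array
print(np.shares_memory(x, x.flatten()))     # False
```

So flatten() is the one to reach for when you need a guarantee that later writes cannot leak back into the original array.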

Related

What does '...' mean in a python slice [duplicate]

What is the meaning of x[...] below?
a = np.arange(6).reshape(2,3)
for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = 2 * x

While the proposed duplicate What does the Python Ellipsis object do? answers the question in a general python context, its use in an nditer loop requires, I think, added information.
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#modifying-array-values
Regular assignment in Python simply changes a reference in the local or global variable dictionary instead of modifying an existing variable in place. This means that simply assigning to x will not place the value into the element of the array, but rather switch x from being an array element reference to being a reference to the value you assigned. To actually modify the element of the array, x should be indexed with the ellipsis.
That section includes your code example.
So in my words, the x[...] = ... modifies x in-place; x = ... would have broken the link to the nditer variable, and not changed it. It's like x[:] = ... but works with arrays of any dimension (including 0d). In this context x isn't just a number, it's an array.
Perhaps the closest thing to this nditer iteration, without nditer is:
In [667]: for i, x in np.ndenumerate(a):
     ...:     print(i, x)
     ...:     a[i] = 2 * x
     ...:
(0, 0) 0
(0, 1) 1
...
(1, 2) 5
In [668]: a
Out[668]:
array([[ 0, 2, 4],
[ 6, 8, 10]])
Notice that I had to index and modify a[i] directly. I could not have used x = 2*x. In this iteration x is a scalar, and thus not mutable:
In [669]: for i,x in np.ndenumerate(a):
     ...:     x[...] = 2 * x
     ...
TypeError: 'numpy.int32' object does not support item assignment
But in the nditer case x is a 0d array, and mutable.
In [671]: for x in np.nditer(a, op_flags=['readwrite']):
     ...:     print(x, type(x), x.shape)
     ...:     x[...] = 2 * x
     ...:
0 <class 'numpy.ndarray'> ()
4 <class 'numpy.ndarray'> ()
...
And because it is 0d, x[:] cannot be used instead of x[...]
----> 3 x[:] = 2 * x
IndexError: too many indices for array
A simpler array iteration might also give insight:
In [675]: for x in a:
     ...:     print(x, x.shape)
     ...:     x[:] = 2 * x
     ...:
[ 0 8 16] (3,)
[24 32 40] (3,)
This iterates on the rows (1st dim) of a. x is then a 1d array, and can be modified with either x[:] = ... or x[...] = ....
And if I add the external_loop flag from the next section, x is now a 1d array, and x[:] = would work. But x[...] = still works and is more general. x[...] is used in all the other nditer examples.
In [677]: for x in np.nditer(a, op_flags=['readwrite'], flags=['external_loop']):
     ...:     print(x, type(x), x.shape)
     ...:     x[...] = 2 * x
[ 0 16 32 48 64 80] <class 'numpy.ndarray'> (6,)
Read and experiment with this nditer page all the way through to the end. By itself, nditer is not that useful in Python. It does not speed up iteration, at least not until you port your code to Cython. np.ndindex is one of the few non-compiled numpy functions that uses nditer.
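To see the difference between x[...] = ... and x = ... side by side, here is a small sketch. It assumes a reasonably recent NumPy, where a writable nditer is meant to be used as a context manager; the second array b is only there for comparison and is not from the original answer:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# A writable nditer is used as a context manager so that any buffered
# results are written back when the block exits.
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x        # writes into the element of a

b = np.arange(6).reshape(2, 3)
with np.nditer(b, op_flags=['readwrite']) as it:
    for x in it:
        x = 2 * x             # only rebinds the local name x; b is untouched

print(a)   # doubled
print(b)   # unchanged
```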

The ellipsis ... means as many : as needed.
For people who don't have time, here is a simple example:
In [64]: X = np.reshape(np.arange(9), (3,3))
In [67]: Y = np.reshape(np.arange(2*3*4), (2,3,4))
In [70]: X
Out[70]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [71]: X[:,0]
Out[71]: array([0, 3, 6])
In [72]: X[...,0]
Out[72]: array([0, 3, 6])
In [73]: Y
Out[73]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [74]: Y[:,0]
Out[74]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15]])
In [75]: Y[...,0]
Out[75]:
array([[ 0, 4, 8],
[12, 16, 20]])
In [76]: X[0,...,0]
Out[76]: array(0)
In [77]: Y[0,...,0]
Out[77]: array([0, 4, 8])
This makes it easy to manipulate only one dimension at a time.
One caveat: you can have only one ellipsis in any given indexing expression; otherwise the expression would be ambiguous about how many : each one stands for.

I believe a good parallel (one most people are used to) is to think of it this way:
import numpy as np
random_array = np.random.rand(2, 2, 2, 2)
In this case, [:, :, :, 0] and [..., 0] are the same.
You can use it to work on only specific dimensions. Say you have a batch of 50 RGB images of 128x128 pixels, with shape (50, 3, 128, 128). If you want to slice a piece out of every image in every color channel, you could write either image[:,:,50:70, 20:80] or image[...,50:70,20:80].
Just be aware that you can't use it more than once in the same expression: something like [...,0,...] is invalid.
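Both claims are easy to check. In the sketch below, the image batch is a hypothetical stand-in filled with zeros; only its shape matters:

```python
import numpy as np

# Hypothetical batch: 50 RGB images of 128x128 pixels, channels first
image = np.zeros((50, 3, 128, 128), dtype=np.uint8)

explicit = image[:, :, 50:70, 20:80]
with_ellipsis = image[..., 50:70, 20:80]
print(explicit.shape, with_ellipsis.shape)   # both (50, 3, 20, 60)

# A second ellipsis in the same index is rejected
try:
    image[..., 0, ...]
    two_ellipses_ok = True
except IndexError as err:
    two_ellipses_ok = False
    print("IndexError:", err)
```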

"Multiply" 1d numpy array with a smaller one and sum the result

I want to "multiply" (for lack of a better description) a numpy array X of size M with a smaller numpy array Y of size N, reusing Y for every N elements in X. Then I want to sum the resulting array (almost like a dot product).
I hope the example makes it clearer:
Example
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1,2,3]
Z = mymul(X, Y)
= [0*1, 1*2, 2*3, 3*1, 4*2, 5*3, 6*1, 7*2, 8*3, 9*1]
= [ 0, 2, 6, 3, 8, 15, 6, 14, 24, 9]
result = sum(Z) = 87
X and Y can be of varying lengths. Y is always smaller than X, but M is not necessarily divisible by N (e.g. M % N != 0).
I have some solutions but they are quite slow. I'm hoping there is a faster way to do this.
import numpy as np
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int)
Y = np.array([1,2,3], dtype=int)
# these work but are slow for large X, Y
# simple for-loop
t = 0
for i in range(len(X)):
    t += X[i] * Y[i % len(Y)]
print(t) #87
# extend Y M/N times so np.dot can be applied
Ytiled = np.tile(Y, int(np.ceil(len(X) / len(Y))))[:len(X)]
t = np.dot(X, Ytiled)
print(t) #87

Resize Y to same length as X and then use matrix-multiplication -
In [52]: np.dot(X, np.resize(Y,len(X)))
Out[52]: 87
Alternative to using np.resize would be with tiling. Hence, np.tile(Y,(m+n-1)//n)[:m] for m,n = len(X), len(Y), could replace np.resize(Y,len(X)) for a faster one.
Another without resizing Y to achieve memory-efficiency -
In [79]: m,n = len(X), len(Y)
In [80]: s = n*(m//n)
In [81]: X2D = X[:s].reshape(-1,n)
In [82]: X2D.dot(Y).sum() + np.dot(X[s:],Y[:m-s])
Out[82]: 87
Alternatively, we can use np.einsum('ij,j->',X2D,Y) to replace X2D.dot(Y).sum().
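The block-wise approach can be wrapped in a function and checked against the plain loop from the question. This is a sketch only; the names cyclic_dot and cyclic_dot_loop are made up for illustration:

```python
import numpy as np

def cyclic_dot(X, Y):
    """Sum of X[i] * Y[i % len(Y)] without materialising a tiled Y."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    m, n = len(X), len(Y)
    s = n * (m // n)                      # length of the part that splits into whole blocks
    whole = X[:s].reshape(-1, n).dot(Y).sum()
    tail = np.dot(X[s:], Y[:m - s])       # leftover elements, if any
    return whole + tail

def cyclic_dot_loop(X, Y):
    """Reference implementation: the plain loop from the question."""
    return sum(X[i] * Y[i % len(Y)] for i in range(len(X)))

X = np.arange(10)
Y = np.array([1, 2, 3])
print(cyclic_dot(X, Y))   # 87
```

Comparing against the loop on inputs other than 0..9 is worthwhile, because an evenly spaced X can hide alignment mistakes.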

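You can use convolve (documentation):
np.convolve(X, Y[::-1], 'full')[len(Y)-1::len(Y)].sum()
Remember to reverse the second array. In 'full' mode, entry len(Y)-1 is the first one where Y lines up with the start of X, and every len(Y)-th entry after that lines up with the next block. The centred 'same' mode shifts that alignment, so it does not give the cyclic product in general.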

Two dimensional function not returning array of values?

I'm trying to plot a 2-dimensional function (specifically, a 2-d Laplace solution). I defined my function and it returns the right value when I put in specific numbers, but when I try running through an array of values (x,y below), it still returns only one number. I tried with a random function of x and y (e.g., f(x,y) = x^2 + y^2) and it gives me an array of values.
def V_func(x,y):
    a = 5
    b = 4
    Vo = 4
    n = np.arange(1,100,2)
    sum_list = []
    for indx in range(len(n)):
        sum_term = (1/n[indx])*(np.cosh(n[indx]*np.pi*x/a))/(np.cosh(n[indx]*np.pi*b/a))*np.sin(n[indx]*np.pi*y/a)
        sum_list = np.append(sum_list,sum_term)
    summation = np.sum(sum_list)
    V = 4*Vo/np.pi * summation
    return V
x = np.linspace(-4,4,50)
y = np.linspace(0,5,50)
V_func(x,y)
Out: 53.633709914177224

Try this:
def V_func(x,y):
    a = 5
    b = 4
    Vo = 4
    n = np.arange(1,100,2)
    # sum_list = []
    sum_list = np.zeros(50)  # one accumulator slot per input point (x and y have 50 points)
    for indx in range(len(n)):
        sum_term = (1/n[indx])*(np.cosh(n[indx]*np.pi*x/a))/(np.cosh(n[indx]*np.pi*b/a))*np.sin(n[indx]*np.pi*y/a)
        # sum_list = np.append(sum_list,sum_term)
        sum_list += sum_term
    # summation = np.sum(sum_list)
    # V = 4*Vo/np.pi * summation
    V = 4*Vo/np.pi * sum_list
    return V

Define a pair of arrays:
In [6]: x = np.arange(3); y = np.arange(10,13)
In [7]: x,y
Out[7]: (array([0, 1, 2]), array([10, 11, 12]))
Try a simple function of the two:
In [8]: x + y
Out[8]: array([10, 12, 14])
Since they have the same size, they can be summed (or otherwise combined) elementwise. The result has the same shape as the 2 inputs.
Now try 'broadcasting'. x[:,None] has shape (3,1)
In [9]: x[:,None] + y
Out[9]:
array([[10, 11, 12],
[11, 12, 13],
[12, 13, 14]])
The result is (3,3), the first 3 from the reshaped x, the second from y.
I can generate the pair of arrays with meshgrid:
In [10]: I,J = np.meshgrid(x,y,sparse=True, indexing='ij')
In [11]: I
Out[11]:
array([[0],
[1],
[2]])
In [12]: J
Out[12]: array([[10, 11, 12]])
In [13]: I + J
Out[13]:
array([[10, 11, 12],
[11, 12, 13],
[12, 13, 14]])
Note the added parameters in meshgrid. So that's how we go about generating 2d values from a pair of 1d arrays.
Now look at what sum does. As you use it in the function:
In [14]: np.sum(I + J)
Out[14]: 108
the result is a scalar. See the docs. If I specify an axis I get an array.
In [15]: np.sum(I + J, axis=0)
Out[15]: array([33, 36, 39])
If you gave V_func the right x and y, sum_list could be a 3d array. That axis-less sum reduces it to a scalar.
In code like this you need to keep track of array shapes. Include test prints if needed; don't just assume anything; test it. Pay attention to how dimensions grow and shrink as they pass through various operations.
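Putting the pieces together, one possible way to evaluate the series on a full 2d grid is to give n its own leading axis and sum over only that axis. This is a sketch under assumptions: V_grid and its keyword parameters are hypothetical names, and the formula is copied from the question's V_func:

```python
import numpy as np

def V_grid(x, y, a=5, b=4, Vo=4, n_max=100):
    """Evaluate the series from the question on the grid x[i], y[j]."""
    n = np.arange(1, n_max, 2)[:, None, None]              # shape (50, 1, 1)
    X, Y = np.meshgrid(x, y, sparse=True, indexing='ij')   # shapes (nx, 1) and (1, ny)
    terms = (1 / n) * np.cosh(n * np.pi * X / a) / np.cosh(n * np.pi * b / a) * np.sin(n * np.pi * Y / a)
    # terms has shape (50, nx, ny); sum over the series index only
    return 4 * Vo / np.pi * terms.sum(axis=0)

x = np.linspace(-4, 4, 50)
y = np.linspace(0, 5, 50)
V = V_grid(x, y)
print(V.shape)   # (50, 50)
```

The result has one value per (x, y) pair, which is the shape a 2d plotting function such as a contour plot expects.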

Elementwise multiplication of numpy matrix and column array

Using numpy, I want to multiply a matrix x by a column array y, elementwise:
x = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = numpy.array([1, 2, 3])
z = numpy.multiply(x, y)
print(z)
This gives the output as if y is a row array:
[[ 1 4 9]
[ 4 10 18]
[ 7 16 27]]
However, I want the output as if y is a column array:
[[ 1 2 3]
[ 8 10 12]
[21 24 27]]
So how can I manipulate y to achieve this? If I use:
y = numpy.transpose(y)
then y remains the same shape.

Enclose it in another list to make it 2D:
>>> y2 = numpy.transpose([y])
>>> y2
array([[1],
[2],
[3]])
>>> numpy.multiply(x, y2)
array([[ 1, 2, 3],
[ 8, 10, 12],
[21, 24, 27]])

The reason you can't transpose y is that it's initialized as a 1-D array. Transposing an array only makes sense in two (or more) dimensions.
To get around these mixed-dimension issues, numpy actually provides a set of convenience functions to sanitize your inputs:
y = np.array([1, 2, 3])
y1 = np.atleast_1d(y) # Converts array to 1-D if less than that
y2 = np.atleast_2d(y) # Converts array to 2-D if less than that
y3 = np.atleast_3d(y) # Converts array to 3-D if less than that
I also think np.column_stack falls under this convenience category, as it puts together 1-D and 2-D arrays as columns like you would expect, rather than having to figure out the right series of reshapes and stacks.
y1 = np.array([1, 2, 3])
y2 = np.array([2, 4, 6])
y3 = np.array([[2, 6], [2, 4], [7, 7]])
y = np.column_stack((y1, y2, y3))
I think these functions aren't as well known as they should be, and I find them much easier, more flexible, and safer than manually fiddling with reshape or array dimensions. They also avoid making copies when possible, which can be a small performance speedup.
To answer your question, you should use np.atleast_2d to convert your array to a 2-D array, then transpose it.
y = np.atleast_2d(y).T
The other quick way, which leaves y untouched, is to transpose x and then transpose the result back.
z = (x.T * y).T
This can obscure the intent of the code, though it is probably faster.
If performance is important, that can inform which method you use. Some timings on my computer:
%timeit x * np.atleast_2d(y).T
100000 loops, best of 3: 7.98 us per loop
%timeit (x.T*y).T
100000 loops, best of 3: 3.27 us per loop
%timeit x * np.transpose([y])
10000 loops, best of 3: 20.2 us per loop
%timeit x * y.reshape(-1, 1)
100000 loops, best of 3: 3.66 us per loop

You can use reshape:
y = y.reshape(-1,1)

The y variable has a shape of (3,). If you construct it this way:
y = numpy.array([1, 2, 3], ndmin=2)
...it will have a shape of (1,3), which you can transpose to get the result you want:
y = numpy.array([1, 2, 3], ndmin=2).T
z = numpy.multiply(x, y)
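The approaches suggested in the answers all produce the same result, which is quick to confirm. The sketch below also includes y[:, None], a common spelling of the same reshape that the answers above do not mention:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([1, 2, 3])

# The output the question asks for
expected = np.array([[1, 2, 3], [8, 10, 12], [21, 24, 27]])

candidates = {
    "y[:, None]": x * y[:, None],
    "y.reshape(-1, 1)": x * y.reshape(-1, 1),
    "np.atleast_2d(y).T": x * np.atleast_2d(y).T,
    "(x.T * y).T": (x.T * y).T,
}
for name, result in candidates.items():
    print(name, np.array_equal(result, expected))
```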

Euclidean distances between several images and one base image

I have a matrix X of dimensions (30x8100) and another one Y of dimensions (1x8100). I want to generate an array containing the difference between them (X[1]-Y, X[2]-Y,..., X[30]-Y)
Can anyone help?

All you need for that is
X - Y
Since several people have offered answers that seem to try to make the shapes match manually, I should explain:
Numpy will automatically expand Y's shape so that it matches that of X. This is called broadcasting: shapes are compared starting from the last axis, and any axis of length 1 is stretched to match the other array. When the shapes do not line up on their own, you can insert a length-1 axis yourself (for example with None or reshape) to say which direction to expand. Here, since Y has a dimension of length 1, that is the axis that is expanded to length 30 to match X's shape.
For example,
In [87]: import numpy as np
In [88]: n, m = 3, 5
In [89]: x = np.arange(n*m).reshape(n,m)
In [90]: y = np.arange(m)[None,...]
In [91]: x.shape
Out[91]: (3, 5)
In [92]: y.shape
Out[92]: (1, 5)
In [93]: (x-y).shape
Out[93]: (3, 5)
In [106]: x
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [107]: y
Out[107]: array([[0, 1, 2, 3, 4]])
In [108]: x-y
Out[108]:
array([[ 0, 0, 0, 0, 0],
[ 5, 5, 5, 5, 5],
[10, 10, 10, 10, 10]])
But this is not really a euclidean distance, as your title seems to suggest you want:
df = np.asarray(x - y) # the difference between the images
dst = np.sqrt(np.sum(df**2, axis=1)) # their euclidean distances
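At the sizes from the question, the same computation can be written with np.linalg.norm. This is a sketch with random stand-in data, since the actual images are not given:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 8100))   # stand-in for the 30 images
Y = rng.random((1, 8100))    # stand-in for the base image

# Broadcasting stretches Y along the first axis, giving a (30, 8100) difference
dst_manual = np.sqrt(np.sum((X - Y) ** 2, axis=1))
dst_norm = np.linalg.norm(X - Y, axis=1)   # same distances, one per image
print(dst_manual.shape)   # (30,)
```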

Use an array and numpy broadcasting to subtract Y from it.
Initialize the matrix:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6]])
Accessing the second row in a:
>>> a[1]
array([4, 5, 6])
Subtract Y from the array:
>>> Y = array([3,9,0])
>>> a - Y
array([[-2, -7, 3],
[ 1, -4, 6]])

Just iterate over the rows of your numpy array and subtract Y from each one; numpy will make a new array with the differences!
import numpy as np
final_array = []
# X is a numpy array that is 30x8100 and Y is a numpy array that is 1x8100
for row in X:
    output = row - Y
    final_array.append(output)
On each pass, output holds X[0] - Y, X[1] - Y, etc., so final_array ends up as a list of 30 arrays, each holding the X - Y values you need. Simple as that. Just make sure you convert your matrices to numpy arrays first.
Edit: Since numpy broadcasting will do the iteration, all you need is one line once you have your two arrays:
final_array = X - Y
And then that is your array with the differences!

import numpy
a1 = numpy.array(X)  # make sure you have a 2-D numpy array like [[1,2,3],[4,5,6],...]
a2 = numpy.array(Y).ravel()  # make sure you have a 1-D numpy array like [1,2,3,...]
a2 = numpy.tile(a2, (len(a1), 1))  # repeat a2 once per row of a1 (a2 is now the same shape as a1)
print(a1 - a2)  # elementwise difference between a1 and a2 (or X and Y)
