I am trying to understand the signature functionality in numpy.vectorize. I have found some examples, but they did not help me much in understanding it.
>>import scipy.stats
>>pearsonr = np.vectorize(scipy.stats.pearsonr, signature='(n),(n)->(),()')
>>pearsonr([[0, 1, 2, 3]], [[1, 2, 3, 4], [4, 3, 2, 1]])
(array([ 1., -1.]), array([ 0., 0.]))
>>convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')
>>convolve(np.eye(4), [1, 2, 1])
array([[1., 2., 1., 0., 0., 0.],
[0., 1., 2., 1., 0., 0.],
[0., 0., 1., 2., 1., 0.],
[0., 0., 0., 1., 2., 1.]])
>>>import numpy as np
>>>qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')
>>>qr(np.random.normal(size=(1, 3, 2)))
(array([[-0.31622777, -0.9486833 ],
[-0.9486833 , 0.31622777]]),
array([[-3.16227766, -4.42718872, -5.69209979],
[ 0. , -0.63245553, -1.26491106]]))
>>>import scipy
>>>logm = np.vectorize(scipy.linalg.logm, signature='(m,m)->(m,m)')
>>>logm(np.random.normal(size=(1, 3, 2)))
array([[[ 1.08226288, -2.29544602],
[ 2.12599894, -1.26335203]]])
Can someone please explain the functionality and syntax of the signatures
signature='(n),(n)->(),()'
signature='(n),(m)->(k)'
signature='(m,n)->(m,k),(k,n)'
signature='(m,m)->(m,m)'
used in the aforementioned examples? If we didn't use the signatures, how would the examples be implemented in a simpler, naive way?
Any help is highly appreciated.
The aforementioned examples can be found here and here.
I think the explanation would be clearer if we knew the 'signature' of the individual functions - what they expect, and what they produce. But I can make some deductions from the code you show.
>>pearsonr = np.vectorize(scipy.stats.pearsonr, signature='(n),(n)->(),()')
>>pearsonr([[0, 1, 2, 3]], [[1, 2, 3, 4], [4, 3, 2, 1]])
(array([ 1., -1.]), array([ 0., 0.]))
This is called with (1,4) and (2,4) arrays (well, lists that become such arrays). The core dimension n is the last one (size 4); the remaining loop dimensions, (1,) and (2,), broadcast together to (2,). The stats function is then called twice, once for each pair of rows, getting two (4,) arrays and returning 2 scalar values (the correlation coefficient and the p-value).
>>convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')
>>convolve(np.eye(4), [1, 2, 1])
array([[1., 2., 1., 0., 0., 0.],
[0., 1., 2., 1., 0., 0.],
[0., 0., 1., 2., 1., 0.],
[0., 0., 0., 1., 2., 1.]])
This is called with (4,4) and (3,) arrays. I think convolve gets called 4 times, once for each row of the eye, getting the same [1,2,1] each time. The result is a 4 row array (with 6 columns, determined by convolve itself, not by vectorize).
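To answer the "naive way" part of the question for this case, here is a minimal sketch of what the signature version does, written as an explicit loop. The variable names (a, v, naive, vectorized) are mine, not from the docs.

```python
import numpy as np

# Inputs from the example above: four rows of length 4, one kernel of length 3.
a = np.eye(4)
v = np.array([1, 2, 1])

# Naive version: loop over the leading ("loop") dimension by hand and
# call np.convolve on one pair of 1-D arrays at a time.
naive = np.stack([np.convolve(row, v) for row in a])

# Vectorized version: the signature says "two 1-D inputs, one 1-D output".
convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')
vectorized = convolve(a, v)

print(naive.shape)                        # (4, 6)
print(np.array_equal(naive, vectorized))  # True
```

The signature does not make the loop faster; it only tells vectorize which trailing dimensions belong to each call and which leading dimensions to iterate (and broadcast) over.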
>>>import numpy as np
>>>qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')
>>>qr(np.random.normal(size=(1, 3, 2)))
(array([[-0.31622777, -0.9486833 ],
[-0.9486833 , 0.31622777]]),
array([[-3.16227766, -4.42718872, -5.69209979],
[ 0. , -0.63245553, -1.26491106]]))
Signature: np.linalg.qr(a, mode='reduced')
a : array_like, shape (M, N)
'reduced' : returns q, r with dimensions (M, K), (K, N) (default)
The vectorize signature just repeats the information in the docs.
a is a (1,3,2) shape array, so qr is called once (1st dimension), with a (3,2) array. The result is 2 arrays, with (3,k) and (k,2) shapes. When I run it I get an added size 1 dimension, (1,3,2) and (1,2,2). The numbers differ because of random:
In [120]: qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')
...: qr(np.random.normal(size=(1, 3,2)))
Out[120]:
(array([[[-0.61362528, 0.09161174],
[ 0.63682861, -0.52978942],
[-0.46681188, -0.84316692]]]),
array([[[-0.65301725, -1.00494992],
[ 0. , 0.8068886 ]]]))
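A quick way to check the shape reasoning is to use a bigger batch. This is only a sketch; the batch size of 5 and the seeded generator are my own choices.

```python
import numpy as np

qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')

# Five independent (3, 2) matrices stacked along a leading loop dimension.
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3, 2))

q, r = qr(a)
print(q.shape, r.shape)       # (5, 3, 2) (5, 2, 2)

# Each q[i] @ r[i] should reconstruct a[i]; matmul broadcasts over the batch.
print(np.allclose(q @ r, a))  # True
```

The leading 5 is the loop dimension that vectorize iterates over; the trailing two dimensions are what each individual np.linalg.qr call sees and returns.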
>>>import scipy
>>> logm = np.vectorize(scipy.linalg.logm, signature='(m,m)->(m,m)')
>>>logm(np.random.normal(size=(1, 3, 2)))
array([[[ 1.08226288, -2.29544602],
[ 2.12599894, -1.26335203]]])
scipy.linalg.logm expects a square array, and returns one of the same shape.
Calling logm with a (1,3,2) produces an error, because (3,2) is not a square array:
ValueError: inconsistent size for core dimension 'm': 2 vs 3
Calling scipy.linalg.logm directly produces the same error, worded differently:
linalg.logm(np.random.normal(size=(3, 2)))
ValueError: expected square array_like input
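The core-dimension check can be reproduced without scipy. In this sketch np.linalg.inv stands in for scipy.linalg.logm (my substitution, only to keep the example NumPy-only); the point is that vectorize rejects the input based on the signature alone.

```python
import numpy as np

# Same signature as the logm example: one square input, one square output.
inv = np.vectorize(np.linalg.inv, signature='(m,m)->(m,m)')

try:
    inv(np.ones((1, 3, 2)))  # core shape (3, 2) cannot satisfy (m, m)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# A square core shape passes the check and keeps the loop dimension.
ok = inv(np.eye(2)[None, :, :])
print(ok.shape)  # (1, 2, 2)
```

So repeating a dimension name in the signature acts as a constraint: both core dimensions labelled m must have the same size.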
When I say the function is called twice, or something like that, I'm ignoring the test call that's used to determine the return dtype.
I'm reading a csv file and focusing on col3, where the rows hold lists of different lengths. Initially they were read as type str, but I fixed that using pd.eval.
df = pd.read_csv('datafile.csv', converters={'col3': pd.eval})
row e.g. [0, 100, -200, 300, -150...]
There are many rows of different sizes and I want to calculate the element wise average, where I have followed this solution.
I first ran into the Numpy VisibleDeprecationWarning error which I fixed using this.
But for the last step of the solution using np.nanmean I'm running into a new error which is
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My code looks like this so far:
import pandas as pd
import numpy as np
import itertools
df = pd.read_csv('datafile.csv', converters={'col3': pd.eval})
datafile = df[(df['col1'] == 'Red') & (df['col2'] == Name) & ((df['col4'] == 'EX') | (df['col5'] == 'EX'))]
np.warnings.filterwarnings('ignore', category=np.VisibleDeprecationWarning)
ar = np.array(list(itertools.zip_longest(df['col3'], fillvalue=np.nan)))
print(ar)
np.nanmean(ar,axis=1)
the arrays print like this
And the error points to the last line.
The error seems to point to the arrays being of type object, but I'm not sure how to fix it.
Make a ragged array:
In [23]: arr = np.array([np.arange(5), np.ones(5),np.zeros(3)],object)
In [24]: arr
Out[24]:
array([array([0, 1, 2, 3, 4]), array([1., 1., 1., 1., 1.]),
array([0., 0., 0.])], dtype=object)
Note the shape and dtype.
Try to use mean on it:
In [25]: np.mean(arr)
Traceback (most recent call last):
Input In [25] in <cell line: 1>
np.mean(arr)
File <__array_function__ internals>:180 in mean
File /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432 in mean
return _methods._mean(a, axis=axis, dtype=dtype,
File /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:180 in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
ValueError: operands could not be broadcast together with shapes (5,) (3,)
Applying mean to each element array works:
In [26]: [np.mean(a) for a in arr]
Out[26]: [2.0, 1.0, 0.0]
Trying to use zip_longest:
In [27]: import itertools
In [28]: list(itertools.zip_longest(arr))
Out[28]:
[(array([0, 1, 2, 3, 4]),),
(array([1., 1., 1., 1., 1.]),),
(array([0., 0., 0.]),)]
No change. We can use it by unpacking the arr - but it has padded the arrays in the wrong way:
In [29]: list(itertools.zip_longest(*arr))
Out[29]: [(0, 1.0, 0.0), (1, 1.0, 0.0), (2, 1.0, 0.0), (3, 1.0, None), (4, 1.0, None)]
zip_longest can be used to pad lists, but it takes more thought than this.
If we make an array from that list:
In [35]: np.array(list(itertools.zip_longest(*arr,fillvalue=np.nan)))
Out[35]:
array([[ 0., 1., 0.],
[ 1., 1., 0.],
[ 2., 1., 0.],
[ 3., 1., nan],
[ 4., 1., nan]])
and transpose it, we can take the nanmean:
In [39]: np.array(list(itertools.zip_longest(*arr,fillvalue=np.nan))).T
Out[39]:
array([[ 0., 1., 2., 3., 4.],
[ 1., 1., 1., 1., 1.],
[ 0., 0., 0., nan, nan]])
In [40]: np.nanmean(_, axis=1)
Out[40]: array([2., 1., 0.])
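Putting the steps together on plain lists, as a sketch only: rows below is made-up data standing in for the values of df['col3'], and I compute both the per-row mean and the position-wise mean since the question could mean either.

```python
import itertools
import numpy as np

# Hypothetical ragged rows, standing in for the lists read from the csv.
rows = [[0, 100, -200, 300], [10, 20], [1, 2, 3]]

# Unpack with * so zip_longest pads across rows, then transpose back
# so that each original row is a row of the padded array again.
padded = np.array(list(itertools.zip_longest(*rows, fillvalue=np.nan))).T
print(padded.shape)  # (3, 4)

row_means = np.nanmean(padded, axis=1)  # one mean per original row
col_means = np.nanmean(padded, axis=0)  # one mean per position across rows
print(row_means)     # [50. 15.  2.]
```

The key difference from the code in the question is the * in zip_longest(*rows, ...); without it each row is wrapped in a 1-tuple and nothing gets padded.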
I am working with a thematic raster of land use classes. The goal is to split the raster into smaller tiles of a given size. For example, I have a raster of 1490 pixels and I want to split it into tiles of 250x250 pixels. To get tiles of equal size, I would want to increase the width of the raster to 1500 pixels to fit in exactly 6 tiles. To do so, I need to increase the size of the raster by 10 pixels.
I am currently opening the raster with the rasterio library, which returns a NumPy ndarray. Is there a function to add a buffer around this array? The goal would be something like this:
import numpy as np
a = np.array([
[1,4,5],
[4,5,5],
[1,2,2]
])
a_with_buffer = a.buffer(a, 1) # 2nd argument refers to the buffer size
Then a_with_buffer would look as follows:
[0,0,0,0,0],
[0,1,4,5,0],
[0,4,5,5,0],
[0,1,2,2,0],
[0,0,0,0,0]
You can use np.pad:
>>> np.pad(a, 1)
array([[0, 0, 0, 0, 0],
[0, 1, 4, 5, 0],
[0, 4, 5, 5, 0],
[0, 1, 2, 2, 0],
[0, 0, 0, 0, 0]])
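For the tiling use case in the question (1490 pixels padded up to 1500 for 250-pixel tiles), np.pad also accepts a separate (before, after) amount per axis. A minimal sketch, where pad_to_multiple is a hypothetical helper of mine and the padding is put on the bottom and right edges only:

```python
import numpy as np

def pad_to_multiple(arr, tile):
    """Zero-pad a 2-D array on the bottom/right so both sides are multiples of tile."""
    pad_rows = (-arr.shape[0]) % tile  # 0 if already a multiple
    pad_cols = (-arr.shape[1]) % tile
    return np.pad(arr, ((0, pad_rows), (0, pad_cols)),
                  mode='constant', constant_values=0)

# Stand-in for the raster band read with rasterio.
raster = np.ones((1490, 1490), dtype=np.uint8)
padded = pad_to_multiple(raster, 250)
print(padded.shape)  # (1500, 1500)
```

If the buffer should be split evenly around the raster instead, the pad widths can be divided between the before and after entries; for a thematic raster, constant_values could also be set to the nodata class instead of 0.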
You can create an array with np.zeros and then insert a at the index you want, like below.
Try this:
>>> a = np.array([[1,4,5],[4,5,5],[1,2,2]])
>>> b = np.zeros((5,5))
>>> b[1:1+a.shape[0],1:1+a.shape[1]] = a
>>> b
array([[0., 0., 0., 0., 0.],
[0., 1., 4., 5., 0.],
[0., 4., 5., 5., 0.],
[0., 1., 2., 2., 0.],
[0., 0., 0., 0., 0.]])
I have big 3D matrices indicating the position of agents in a 3D space. The values of the matrix are 0 if there is no agent at that position and 1 if there is one.
My problem is that I want the agents to 'grow', in the sense that I want each of them to be represented by, let's say, a cube (3x3x3) of ones. I've already got a way to do it, but I'm having trouble when the agent is close to the borders.
For example, I have a matrix of positions 100x100x100, if I know my agent is at position (x, y, z) I will do:
positions_matrix = numpy.zeros((100, 100, 100))
positions_matrix[x - 1: x + 2, y - 1: y + 2, z - 1: z + 2] += numpy.ones((3, 3, 3))
Of course in my real code I'm looping over more positions, but this is basically it. This works, but the problem comes when the agent is too close to the border: the sum can't be made because the matrix resulting from the slicing is smaller than the ones matrix.
Any idea how to solve it, or whether numpy or any other package has an implementation for this? I couldn't manage to find one, although I'm pretty sure I'm not the first one to face this.
A slightly more programmatic way of solving the problem:
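import numpy as np
x, y, z = 0, 50, 99  # example position touching both borders
m = np.zeros((100, 100, 100))
# Clip each slice to the array bounds; the stop is exclusive, so the upper limit is d, not d - 1.
slicing = tuple(
    slice(max(0, x_i - 1), min(x_i + 2, d))
    for x_i, d in zip((x, y, z), m.shape))
ones_shape = tuple(s.stop - s.start for s in slicing)
m[slicing] += np.ones(ones_shape)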
But it is otherwise the same as the accepted answer.
You should cut at the lower and upper bounds, using something like:
import numpy as np
m = np.zeros((100, 100, 100))
# Slice stops are exclusive, so the upper bound is the axis length itself.
x_min, x_max = max(0, x-1), min(x+2, m.shape[0])
y_min, y_max = max(0, y-1), min(y+2, m.shape[1])
z_min, z_max = max(0, z-1), min(z+2, m.shape[2])
m[x_min:x_max, y_min:y_max, z_min:z_max] += np.ones((x_max-x_min, y_max-y_min, z_max-z_min))
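The clipping idea can be wrapped in a small function and checked at a corner. This is a sketch under my own assumptions: add_cube and radius are hypothetical names, the slice stop is clipped to the full axis length (stops are exclusive), and a scalar 1 is broadcast instead of building a ones array of matching shape.

```python
import numpy as np

def add_cube(m, center, radius=1):
    """Add 1 to the cube around center, truncated at the array borders."""
    slicing = tuple(
        slice(max(0, c - radius), min(c + radius + 1, d))
        for c, d in zip(center, m.shape))
    m[slicing] += 1  # scalar broadcasts to whatever shape survived clipping
    return m

m = np.zeros((100, 100, 100))
add_cube(m, (0, 50, 99))   # touches two borders: 2 * 3 * 2 = 12 cells
add_cube(m, (50, 50, 50))  # interior: full 3 * 3 * 3 = 27 cells
print(m.sum())             # 39.0
```

Broadcasting a scalar avoids computing the shape of the clipped region at all; if a non-constant kernel is needed, the kernel has to be sliced by the same amounts that were clipped.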
There is a solution using np.put, and its 'clip' option.
It just requires a little gymnastics because the function requires indices in the flattened matrix; fortunately, the function np.ravel_multi_index does the job:
import itertools
import numpy as np
x, y, z = 2, 0, 4
positions_matrix = np.zeros((100,100,100))
indices = np.array( list( itertools.product( (x-1, x, x+1), (y-1, y, y+1), (z-1, z, z+1)) ))
flat_indices = np.ravel_multi_index(indices.T, positions_matrix.shape, mode='clip')
positions_matrix.put(flat_indices, 1+positions_matrix.take(flat_indices))
# positions_matrix[2,1,4] is now 1.0
The nice thing about this solution is that you can play with other modes, for instance 'wrap' (if your agents live on a donut ;-) or in a periodic space).
I'll explain how it works on a smaller 2D matrix:
import itertools
import numpy as np
positions_matrix = np.zeros((8,8))
ones = np.ones((3,3))
x, y = 0, 4
indices = np.array( list( itertools.product( (x-1, x, x+1), (y-1, y, y+1) )))
# array([[-1, 3],
# [-1, 4],
# [-1, 5],
# [ 0, 3],
# [ 0, 4],
# [ 0, 5],
# [ 1, 3],
# [ 1, 4],
# [ 1, 5]])
flat_indices = np.ravel_multi_index(indices.T, positions_matrix.shape, mode='clip')
# array([ 3, 4, 5, 3, 4, 5, 11, 12, 13])
positions_matrix.put(flat_indices, ones, mode='clip')
# positions_matrix is now:
# array([[0., 0., 0., 1., 1., 1., 0., 0.],
# [0., 0., 0., 1., 1., 1., 0., 0.],
# [0., 0., 0., 0., 0., 0., 0., 0.],
# [ ...
By the way, in this case mode='clip' was redundant for put.
Well, I just cheated: put does an assignment. The += 1 requires both take and put:
positions_matrix.put(flat_indices, ones.flat + positions_matrix.take(flat_indices))
# notice that ones has to be flattened, or alternatively the result of take could be reshaped (3,3)
# positions_matrix is now:
# array([[0., 0., 0., 2., 2., 2., 0., 0.],
# [0., 0., 0., 2., 2., 2., 0., 0.],
# [0., 0., 0., 0., 0., 0., 0., 0.],
# [ ...
There is one important difference in this solution compared to the others: the ones matrix is always (3,3),
which may or may not be an advantage.
The trick is in this flat_indices list, that has repeating entries (result of clip).
It may thus require some precautions, if you add a non constant sub-matrix at max indices:
x, y = 1, 7
values = 1 + np.arange(9)
indices = np.array( list( itertools.product( (x-1, x, x+1), (y-1, y, y+1) )))
flat_indices = np.ravel_multi_index(indices.T, positions_matrix.shape, mode='clip')
positions_matrix.put(flat_indices, values, mode='clip')
# positions_matrix is now:
# array([[0., 0., 0., 2., 2., 2., 1., 3.],
# [0., 0., 0., 2., 2., 2., 4., 6.],
# [0., 0., 0., 0., 0., 0., 7., 9.],
... you were probably expecting the last column to be 2 5 8.
Currently, you could work on flat_indices, for example by putting -1 in the out-of-bounds locations.
But it'd all be easier if np.put accepted non-flat indices, or if there was a clip mode='ignore'.
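As a workaround for the missing 'ignore' mode, the out-of-bounds index pairs can be dropped before writing. Note this is a different technique from put/clip: a sketch using a boolean mask plus np.add.at, with the same hypothetical 8x8 setup as above.

```python
import itertools
import numpy as np

positions_matrix = np.zeros((8, 8))
x, y = 1, 7
values = 1 + np.arange(9)

indices = np.array(list(itertools.product((x-1, x, x+1), (y-1, y, y+1))))

# Keep only the index pairs that fall inside the matrix on every axis.
in_bounds = np.all((indices >= 0) & (indices < positions_matrix.shape), axis=1)
rows, cols = indices[in_bounds].T

# np.add.at performs an unbuffered in-place add, so repeated indices accumulate.
np.add.at(positions_matrix, (rows, cols), values[in_bounds])

print(positions_matrix[:3, 6:])  # last column is 2, 5, 8
```

Here the part of the sub-matrix that falls outside the array is simply discarded, which matches the behaviour of the slicing answers rather than that of clip.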
I'm trying to pad a numpy array, and I cannot seem to find the right approach from the documentation for numpy. I have an array:
a = array([2, 1, 3, 5, 7])
This represents the indices for an array I wish to create. So at index 2 or 1 or 3 etc. I would like to have a one in the array, and zeros everywhere else in the target array. Sort of like an array mask. I would also like to specify the overall length of the target array, l. So my ideal function would look something like:
>>> foo(a,l)
array([0,1,1,1,0,1,0,1,0,0])
where l=10 for the above example.
EDIT:
So I wrote this function:
def padwithones(a,l) :
    p = np.zeros(l)
    for i in a :
        p = np.insert(p,i,1)
    return p
Which gives:
Out[19]:
array([ 0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0.,
0., 0.])
Which isn't correct!
What you're looking for is basically a one-hot array:
def onehot(foo, l):
    a = np.zeros(l, dtype=np.int32)
    a[foo] = 1
    return a
Example:
In [126]: onehot([2, 1, 3, 5, 7], 10)
Out[126]: array([0, 1, 1, 1, 0, 1, 0, 1, 0, 0])
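It may help to see why the np.insert loop in the question went wrong: np.insert returns a new, longer array rather than overwriting a position. A small sketch comparing the two (variable names are mine):

```python
import numpy as np

p = np.zeros(10)

# np.insert adds an element, so the result is one longer than the input.
grown = np.insert(p, 2, 1)
print(grown.size)  # 11

# Assigning through an index list overwrites in place and keeps the length.
mask = np.zeros(10, dtype=int)
mask[[2, 1, 3, 5, 7]] = 1
print(mask)        # [0 1 1 1 0 1 0 1 0 0]
```

Calling insert five times therefore yields 15 elements, with earlier ones shifted by later insertions, which is the output shown in the question.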
In NumPy, how can you efficiently make a 1-D object into a 2-D object where the singleton dimension is inferred from the current object (i.e. a list should go to either a 1xlength or lengthx1 vector)?
# This comes from some other, unchangeable code that reads data files.
my_list = [1,2,3,4]
# What I want to do:
my_numpy_array[some_index,:] = numpy.asarray(my_list)
# The above doesn't work because of a broadcast error, so:
my_numpy_array[some_index,:] = numpy.reshape(numpy.asarray(my_list),(1,len(my_list)))
# How to do the above without the call to reshape?
# Is there a way to directly convert a list, or vector, that doesn't have a
# second dimension, into a 1 by length "array" (but really it's still a vector)?
In the most general case, the easiest way to add extra dimensions to an array is by using the keyword None when indexing at the position to add the extra dimension. For example
my_array = numpy.array([1,2,3,4])
my_array[None, :] # shape 1x4
my_array[:, None] # shape 4x1
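A short check of the resulting shapes, as a sketch; np.newaxis is just an alias for None, and reshape with -1 is another way to get the same result without spelling out the length.

```python
import numpy as np

v = np.array([1, 2, 3, 4])

row = v[np.newaxis, :]   # same as v[None, :]
col = v[:, np.newaxis]   # same as v[:, None]
print(row.shape, col.shape)    # (1, 4) (4, 1)

# reshape infers the -1 dimension from the number of elements.
print(v.reshape(1, -1).shape)  # (1, 4)
print(np.newaxis is None)      # True
```

All of these return views on the original data, so no copy is made.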
Why not simply add square brackets?
>>> my_list
[1, 2, 3, 4]
>>> numpy.asarray([my_list])
array([[1, 2, 3, 4]])
>>> numpy.asarray([my_list]).shape
(1, 4)
.. wait, on second thought, why is your slice assignment failing? It shouldn't:
>>> my_list = [1,2,3,4]
>>> d = numpy.ones((3,4))
>>> d
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
>>> d[0,:] = my_list
>>> d[1,:] = numpy.asarray(my_list)
>>> d[2,:] = numpy.asarray([my_list])
>>> d
array([[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.]])
even:
>>> d[1,:] = (3*numpy.asarray(my_list)).T
>>> d
array([[ 1., 2., 3., 4.],
[ 3., 6., 9., 12.],
[ 1., 2., 3., 4.]])
import numpy as np
a = np.random.random(10)
# The function is np.atleast_2d; it turns the (10,) array into shape (1, 10).
sel = np.atleast_2d(a)[idx]
What about expand_dims?
np.expand_dims(np.array([1,2,3,4]), 0)
has shape (1,4) while
np.expand_dims(np.array([1,2,3,4]), 1)
has shape (4,1).
You can always use dstack() to replicate your array:
import numpy
my_list = numpy.array([1,2,3,4])
my_list_2D = numpy.dstack((my_list, my_list))  # shape (1, 4, 2)