Why does meshgrid have one more dimension than its input?

I am sorry if this is obvious, but I am having trouble understanding why np.meshgrid produces an array whose shape has one more dimension than the inputs:
grid = np.meshgrid(
    np.linspace(-1, 1, 5),
    np.linspace(-1, 1, 4),
    np.linspace(-1, 1, 3), indexing='ij')
np.shape(grid)
(3, 5, 4, 3)
To me it should have been: (5, 4, 3)
or
grid = np.meshgrid(
    np.linspace(-1, 1, 5),
    np.linspace(-1, 1, 4), indexing='ij')
np.shape(grid)
(2, 5, 4)
To me it should have been: (5, 4)
I would be very grateful if somebody could explain this to me. Thanks a lot!

In [92]: grid = np.meshgrid(
...: np.linspace(-1, 1, 5),
...: np.linspace(-1, 1, 4), indexing='ij')
...:
In [93]: grid
Out[93]:
[array([[-1. , -1. , -1. , -1. ],
[-0.5, -0.5, -0.5, -0.5],
[ 0. , 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5, 0.5],
[ 1. , 1. , 1. , 1. ]]),
array([[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ]])]
grid is a list with two arrays. The first array has numbers from the first argument (the one with 5 elements). The second has numbers from the second argument.
Why should np.shape(grid) be (5, 4)? What layout were you expecting?
np.shape(grid) actually does np.array(grid).shape, which is why there's an added dimension.
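To see it concretely, inspect the list and its elements separately (a minimal check, reusing the 2-argument example):
import numpy as np
grid = np.meshgrid(
    np.linspace(-1, 1, 5),
    np.linspace(-1, 1, 4), indexing='ij')
print(type(grid), len(grid))         # <class 'list'> 2
print(grid[0].shape, grid[1].shape)  # (5, 4) (5, 4)
# np.shape first converts the list to a single array, which stacks
# the two (5, 4) arrays along a new leading axis:
print(np.array(grid).shape)          # (2, 5, 4)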

Related

How to initialize locations of numpy array using dictionary keys and values?

I have the following numpy array, which is basically a 3-channel image:
arr = np.zeros((6, 4, 3), dtype=np.float32)
# dictionary of values, key is array location
values_of_channel_0 = {
    (0, 2): 1,
    (1, 0): 1,
    (1, 3): 5,
    (2, 1): 2,
    (2, 2): 3,
    (2, 3): 1,
    (3, 0): 1,
    (3, 2): 2,
    (4, 0): 2,
    (4, 2): 20,
    (5, 0): 1,
    (5, 2): 10,
    (5, 3): 1
}
I am trying to find the most elegant way to set all the values of channel 0 (the first of the three channels) according to the dictionary. Here is what I tried:
locations = list(values_of_channel_0.keys())
values = list(values_of_channel_0.values())
arr[locations, 0] = values  # trying to set channel 0
But this fails.
Is there a way in which this can be done without looping over keys and values?
What's wrong with a simple loop? Something will have to iterate over the key/value pairs you provide in your dictionary in any case.
import numpy as np
arr = np.zeros((6, 4, 3), dtype=np.float32)
# dictionary of values, key is array location
values_of_channel_0 = {
    (0, 2): 1,
    (1, 0): 1,
    (1, 3): 5,
    (2, 1): 2,
    (2, 2): 3,
    (2, 3): 1,
    (3, 0): 1,
    (3, 2): 2,
    (4, 0): 2,
    (4, 2): 20,
    (5, 0): 1,
    (5, 2): 10,
    (5, 3): 1
}
for (a, b), v in values_of_channel_0.items():
    arr[a, b, 0] = v
print(arr)
Result:
[[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 1. 0. 0.]
[ 0. 0. 0.]]
[[ 1. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 5. 0. 0.]]
[[ 0. 0. 0.]
[ 2. 0. 0.]
[ 3. 0. 0.]
[ 1. 0. 0.]]
[[ 1. 0. 0.]
[ 0. 0. 0.]
[ 2. 0. 0.]
[ 0. 0. 0.]]
[[ 2. 0. 0.]
[ 0. 0. 0.]
[20. 0. 0.]
[ 0. 0. 0.]]
[[ 1. 0. 0.]
[ 0. 0. 0.]
[10. 0. 0.]
[ 1. 0. 0.]]]
If you insist on not looping for the assignment, you can construct a data structure that can be assigned at once:
channel_0 = [[values_of_channel_0.get((b, a), 0) for a in range(4)] for b in range(6)]
arr[..., 0] = channel_0
But this is clearly rather pointless and not even more efficient. If you have some control over how values_of_channel_0 is constructed, you could consider constructing it as a nested list or array of the right dimensions immediately, to allow for this type of assignment.
Users @mechanicpig and @michaelszczesny offer a very clean alternative (which will be more efficient, since it relies on the efficient implementation of zip()):
arr[(*zip(*values_of_channel_0), 0)] = list(values_of_channel_0.values())
Edit: you asked for an explanation of the left-hand side.
This hinges on the unpacking operator *. *values_of_channel_0 spreads all the keys of the dictionary values_of_channel_0 into a call to zip(). Since these keys are all 2-tuples of int, zip will yield two tuples, one with all the first coordinates (0, 1, 1, ...) and the second with the second coordinates (2, 0, 3, ...).
Since the call to zip() is also preceded by *, these two values will be spread to index arr[], together with a final coordinate 0. So this:
arr[(*zip(*values_of_channel_0), 0)] = ...
Is essentially the same as:
arr[((0, 1, 1, ...), (2, 0, 3, ...), 0)] = ...
That's a slice of arr with exactly the same number of elements as the dictionary, including all the elements with the needed coordinates. And so assigning list(values_of_channel_0.values()) to it works and has the desired effect of assigning the matching values to the correct coordinates.
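If the nested unpacking is hard to follow, here is a minimal sketch with a three-entry dictionary (d is a hypothetical stand-in for values_of_channel_0) that prints each intermediate value:
import numpy as np
arr = np.zeros((6, 4, 3), dtype=np.float32)
d = {(0, 2): 1, (1, 0): 1, (1, 3): 5}  # small subset for illustration
print(tuple(zip(*d)))  # ((0, 1, 1), (2, 0, 3))
# so arr[(*zip(*d), 0)] is the same as arr[(0, 1, 1), (2, 0, 3), 0]
arr[(*zip(*d), 0)] = list(d.values())
print(arr[0, 2, 0], arr[1, 0, 0], arr[1, 3, 0])  # 1.0 1.0 5.0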

Numpy: How to stack a single array into each row of a bigger array and turn it into a 2D array?

I have a numpy array named heartbeats with 100 rows. Each row has 5 elements.
I also have a single array named time_index with 5 elements.
I need to prepend the time index to each row of heartbeats.
heartbeats = np.array([
    [-0.58, -0.57, -0.55, -0.39, -0.40],
    [-0.31, -0.31, -0.32, -0.46, -0.46]
])
time_index = np.array([-2, -1, 0, 1, 2])
What I need:
array([[-2, -0.58],
       [-1, -0.57],
       [ 0, -0.55],
       [ 1, -0.39],
       [ 2, -0.40],
       [-2, -0.31],
       [-1, -0.31],
       [ 0, -0.32],
       [ 1, -0.46],
       [ 2, -0.46]])
I only wrote two rows of heartbeats to illustrate.
Assuming you are using numpy, the exact output array you are looking for can be made by stacking a repeated version of time_index with the raveled version of heartbeats:
np.stack((np.tile(time_index, len(heartbeats)), heartbeats.ravel()), axis=-1)
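For the two-row example this produces the requested (10, 2) layout; a quick sanity check:
import numpy as np
heartbeats = np.array([
    [-0.58, -0.57, -0.55, -0.39, -0.40],
    [-0.31, -0.31, -0.32, -0.46, -0.46]
])
time_index = np.array([-2, -1, 0, 1, 2])
out = np.stack((np.tile(time_index, len(heartbeats)), heartbeats.ravel()), axis=-1)
print(out.shape)  # (10, 2)
print(out[:2])    # [[-2.   -0.58]
                  #  [-1.   -0.57]]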
Another approach, using broadcasting:
In [13]: heartbeats = np.array([
...: [-0.58, -0.57, -0.55, -0.39, -0.40],
...: [-0.31, -0.31, -0.32, -0.46, -0.46]
...: ])
...: time_index = np.array([-2, -1, 0, 1, 2])
Make a target array:
In [14]: res = np.zeros(heartbeats.shape + (2,), heartbeats.dtype)
In [15]: res[:,:,1] = heartbeats # insert a (2,5) into a (2,5) slot
In [17]: res[:,:,0] = time_index[None] # insert a (5,) into a (2,5) slot
In [18]: res
Out[18]:
array([[[-2. , -0.58],
[-1. , -0.57],
[ 0. , -0.55],
[ 1. , -0.39],
[ 2. , -0.4 ]],
[[-2. , -0.31],
[-1. , -0.31],
[ 0. , -0.32],
[ 1. , -0.46],
[ 2. , -0.46]]])
and then reshape to 2d:
In [19]: res.reshape(-1,2)
Out[19]:
array([[-2. , -0.58],
[-1. , -0.57],
[ 0. , -0.55],
[ 1. , -0.39],
[ 2. , -0.4 ],
[-2. , -0.31],
[-1. , -0.31],
[ 0. , -0.32],
[ 1. , -0.46],
[ 2. , -0.46]])
[17] takes a (5,) array, expands it to (1,5), and then to (2,5) for the insert. Read up on broadcasting.
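If broadcasting is new to you, here is a minimal standalone illustration of that (5,) -> (1,5) -> (2,5) stretch:
import numpy as np
a = np.zeros((2, 5))
b = np.array([-2, -1, 0, 1, 2])  # shape (5,)
# during assignment b is treated as (1, 5) and repeated along
# the first axis to fill the (2, 5) target
a[:, :] = b
print(a)
# [[-2. -1.  0.  1.  2.]
#  [-2. -1.  0.  1.  2.]]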
As an alternative, you can repeat time_index with np.concatenate, once per row of heartbeats:
concatenated = np.concatenate([time_index] * heartbeats.shape[0])
# [-2 -1 0 1 2 -2 -1 0 1 2]
# result = np.dstack((concatenated, heartbeats.reshape(-1))).squeeze()
result = np.array([concatenated, heartbeats.reshape(-1)]).T
Using np.concatenate may be faster than np.tile. This solution is faster than Mad Physicist's, but the fastest is the broadcasting approach in hpaulj's answer.
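Those relative speeds depend on array sizes and hardware, so measure on your own data; a rough timeit sketch of the three approaches (the function names are just for illustration):
import timeit
import numpy as np

heartbeats = np.random.rand(100, 5)
time_index = np.array([-2, -1, 0, 1, 2])

def with_tile():
    return np.stack((np.tile(time_index, len(heartbeats)),
                     heartbeats.ravel()), axis=-1)

def with_concatenate():
    rep = np.concatenate([time_index] * heartbeats.shape[0])
    return np.array([rep, heartbeats.reshape(-1)]).T

def with_broadcasting():
    res = np.zeros(heartbeats.shape + (2,), heartbeats.dtype)
    res[:, :, 1] = heartbeats
    res[:, :, 0] = time_index
    return res.reshape(-1, 2)

for f in (with_tile, with_concatenate, with_broadcasting):
    print(f.__name__, timeit.timeit(f, number=10000))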

scipy sparse matrix division

I have been trying to divide a python scipy sparse matrix by the vector of its row sums. Here is my code:
sparse_mat = bsr_matrix((l_data, (l_row, l_col)), dtype=float)
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
However, it throws an error no matter how I try it:
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 381, in __div__
return self.__truediv__(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 427, in __truediv__
raise NotImplementedError
NotImplementedError
Anyone with an idea of where I am going wrong?
You can circumvent the problem by creating a sparse diagonal matrix from the reciprocals of your row sums and then multiplying it with your matrix. In the product the diagonal matrix goes left and your matrix goes right.
Example:
>>> a
array([[0, 9, 0, 0, 1, 0],
[2, 0, 5, 0, 0, 9],
[0, 2, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0],
[0, 9, 5, 3, 0, 7],
[1, 0, 0, 8, 9, 0]])
>>> b = sparse.bsr_matrix(a)
>>>
>>> c = sparse.diags(1/b.sum(axis=1).A.ravel())
>>> # on older scipy versions the offsets parameter (default 0)
... # is a required argument, thus
... # c = sparse.diags(1/b.sum(axis=1).A.ravel(), 0)
...
>>> a/a.sum(axis=1, keepdims=True)
array([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
[ 0. , 1. , 0. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.375 , 0.20833333, 0.125 , 0. , 0.29166667],
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
>>> (c @ b).todense() # on Python < 3.5 replace c @ b with c.dot(b)
matrix([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
[ 0. , 1. , 0. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.375 , 0.20833333, 0.125 , 0. , 0.29166667],
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
Something funny is going on. I have no problem performing the element division. I wonder if it's a Py2 issue. I'm using Py3.
In [1022]: A=sparse.bsr_matrix([[2,4],[1,2]])
In [1023]: A
Out[1023]:
<2x2 sparse matrix of type '<class 'numpy.int32'>'
with 4 stored elements (blocksize = 2x2) in Block Sparse Row format>
In [1024]: A.A
Out[1024]:
array([[2, 4],
[1, 2]], dtype=int32)
In [1025]: A.sum(axis=1)
Out[1025]:
matrix([[6],
[3]], dtype=int32)
In [1026]: A/A.sum(axis=1)
Out[1026]:
matrix([[ 0.33333333, 0.66666667],
[ 0.33333333, 0.66666667]])
or to try the other example:
In [1027]: b=sparse.bsr_matrix([[0, 9, 0, 0, 1, 0],
...: [2, 0, 5, 0, 0, 9],
...: [0, 2, 0, 0, 0, 0],
...: [2, 0, 0, 0, 0, 0],
...: [0, 9, 5, 3, 0, 7],
...: [1, 0, 0, 8, 9, 0]])
In [1028]: b
Out[1028]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
with 14 stored elements (blocksize = 1x1) in Block Sparse Row format>
In [1029]: b.sum(axis=1)
Out[1029]:
matrix([[10],
[16],
[ 2],
[ 2],
[24],
[18]], dtype=int32)
In [1030]: b/b.sum(axis=1)
Out[1030]:
matrix([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
....
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
The result of this sparse/dense division is also dense, whereas c*b (c is the sparse diagonal) is sparse:
In [1039]: c*b
Out[1039]:
<6x6 sparse matrix of type '<class 'numpy.float64'>'
with 14 stored elements in Compressed Sparse Row format>
The sparse sum is a dense matrix. It is 2d, so there's no need to expand its dimensions. In fact, if I try that, I get an error:
In [1031]: A/(A.sum(axis=1)[:,None])
....
ValueError: shape too large to be a matrix.
Per this message, to keep the matrix sparse, you access the data values and use the (nonzero) indices:
sums = np.asarray(A.sum(axis=1)).squeeze() # this is dense
A.data /= sums[A.nonzero()[0]]
If dividing by the nonzero row mean instead of the sum, one can:
nnz = A.getnnz(axis=1) # this is also dense
means = sums / nnz
A.data /= means[A.nonzero()[0]]
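Putting the pieces together, a self-contained sketch of in-place row normalization (assuming CSR in canonical form with no explicitly stored zeros, and no all-zero rows, which would divide by zero):
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[0., 9., 0., 1.],
                                [2., 0., 5., 9.],
                                [0., 2., 0., 0.]]))
sums = np.asarray(A.sum(axis=1)).squeeze()  # dense 1-d array of row sums
A.data /= sums[A.nonzero()[0]]              # divide each stored value by its row's sum
print(A.toarray())
# [[0.     0.9    0.     0.1   ]
#  [0.125  0.     0.3125 0.5625]
#  [0.     1.     0.     0.    ]]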

Error in scipy sparse diags matrix construction

When using scipy.sparse.spdiags or scipy.sparse.diags I have noticed what I consider to be a bug in the routines, e.g.
scipy.sparse.spdiags([1.1,1.2,1.3],1,4,4).toarray()
returns
array([[ 0. , 1.2, 0. , 0. ],
[ 0. , 0. , 1.3, 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ]])
That is, for positive diagonals it drops the first k data values. One might argue that there is some grand programming reason for this and that I just need to pad with zeros. OK, annoying as that may be, one can use scipy.sparse.diags, which gives the correct result. However, this routine has a bug that can't be worked around:
scipy.sparse.diags([1.1,1.2],0,(4,2)).toarray()
gives
array([[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ],
[ 0. , 0. ]])
nice, and
scipy.sparse.diags([1.1,1.2],-2,(4,2)).toarray()
gives
array([[ 0. , 0. ],
[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2]])
but
scipy.sparse.diags([1.1,1.2],-1,(4,2)).toarray()
gives an error saying ValueError: Diagonal length (index 0: 2 at offset -1) does not agree with matrix size (4, 2). Obviously the answer is
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
and for extra random behaviour we have
scipy.sparse.diags([1.1],-1,(4,2)).toarray()
giving
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.1],
[ 0. , 0. ]])
Anyone know if there is a function for constructing diagonal sparse matrices that actually works?
Executive summary: spdiags works correctly, even if the matrix input isn't the most intuitive. diags has a bug that affects some offsets in rectangular matrices. There is a bug fix on scipy github.
The example for spdiags is:
>>> data = array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
>>> diags = array([0,-1,2])
>>> spdiags(data, diags, 4, 4).todense()
matrix([[1, 0, 3, 0],
[1, 2, 0, 4],
[0, 2, 3, 0],
[0, 0, 3, 4]])
Note that the 3rd column of data always appears in the 3rd column of the sparse matrix. The other columns also line up, but they are omitted where they 'fall off the edge'.
The input to this function is a matrix, while the input to diags is a ragged list. The diagonals of the sparse matrix all have different numbers of values, so the specification has to accommodate this one way or another: spdiags does so by ignoring some values, diags by taking a ragged list input.
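For comparison, the matrix from the spdiags example can be built with diags and a ragged list, each diagonal supplying exactly the values it holds (dtype promotion to float is expected here):
>>> from scipy import sparse
>>> sparse.diags([[1, 2, 3, 4], [1, 2, 3], [3, 4]], [0, -1, 2]).toarray()
array([[1., 0., 3., 0.],
       [1., 2., 0., 4.],
       [0., 2., 3., 0.],
       [0., 0., 3., 4.]])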
The sparse.diags([1.1,1.2],-1,(4,2)) error is puzzling.
The spdiags equivalent does work:
In [421]: sparse.spdiags([[1.1,1.2]],-1,4,2).A
Out[421]:
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
The error is raised in this block of code:
for j, diagonal in enumerate(diagonals):
    offset = offsets[j]
    k = max(0, offset)
    length = min(m + offset, n - offset)
    if length <= 0:
        raise ValueError("Offset %d (index %d) out of bounds" % (offset, j))
    try:
        data_arr[j, k:k+length] = diagonal
    except ValueError:
        if len(diagonal) != length and len(diagonal) != 1:
            raise ValueError(
                "Diagonal length (index %d: %d at offset %d) does not "
                "agree with matrix size (%d, %d)." % (
                    j, len(diagonal), offset, m, n))
        raise
The actual matrix constructor in diags is:
dia_matrix((data_arr, offsets), shape=(m, n))
This is the same constructor that spdiags uses, but without any manipulation.
In [434]: sparse.dia_matrix(([[1.1,1.2]],-1),shape=(4,2)).A
Out[434]:
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
In dia format, the inputs are stored exactly as given (complete with any extra values in the data matrix). For a dia_matrix M constructed this way:
In [436]: M.data
Out[436]: array([[ 1.1, 1.2]])
In [437]: M.offsets
Out[437]: array([-1], dtype=int32)
As @user2357112 points out, length = min(m + offset, n - offset) is wrong, producing 3 in the test case. Changing it to length = min(m + k, n - k) makes all the cases for this (4,2) matrix work. But it fails with the transpose: diags([1.1,1.2], 1, (2, 4)).
The correction, as of Oct 5, for this issue is:
https://github.com/pv/scipy-work/commit/529cbde47121c8ed87f74fa6445c05d71353eb6c
length = min(m + offset, n - offset, min(m,n))
With this fix, diags([1.1,1.2], 1, (2, 4)) works.
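On a scipy release that includes the fix, both rectangular cases should come out as expected (a sketch, assuming a fixed version):
>>> from scipy import sparse
>>> sparse.diags([1.1, 1.2], -1, (4, 2)).toarray()
array([[0. , 0. ],
       [1.1, 0. ],
       [0. , 1.2],
       [0. , 0. ]])
>>> sparse.diags([1.1, 1.2], 1, (2, 4)).toarray()
array([[0. , 1.1, 0. , 0. ],
       [0. , 0. , 1.2, 0. ]])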

Recover data from matplotlib scatter plot [duplicate]

This question already has an answer here:
Extracting data from a scatter plot in Matplotlib
(1 answer)
Closed 2 years ago.
From a matplotlib scatter plot, I'm trying to recover the point data. Consider:
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
plt.scatter(x, y)
ax = fig.get_children()[1]
pc = ax.get_children()[2]
for path in pc.get_paths():
print
print('path:')
print(path)
print
print('segments:')
for vert, code in path.iter_segments():
print(code, vert)
plt.show()
This yields
path:
Path(array([[ 0. , -0.5 ],
[ 0.13260155, -0.5 ],
[ 0.25978994, -0.44731685],
[ 0.35355339, -0.35355339],
[ 0.44731685, -0.25978994],
[ 0.5 , -0.13260155],
[ 0.5 , 0. ],
[ 0.5 , 0.13260155],
[ 0.44731685, 0.25978994],
[ 0.35355339, 0.35355339],
[ 0.25978994, 0.44731685],
[ 0.13260155, 0.5 ],
[ 0. , 0.5 ],
[-0.13260155, 0.5 ],
[-0.25978994, 0.44731685],
[-0.35355339, 0.35355339],
[-0.44731685, 0.25978994],
[-0.5 , 0.13260155],
[-0.5 , 0. ],
[-0.5 , -0.13260155],
[-0.44731685, -0.25978994],
[-0.35355339, -0.35355339],
[-0.25978994, -0.44731685],
[-0.13260155, -0.5 ],
[ 0. , -0.5 ],
[ 0. , -0.5 ]]), array([ 1, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 79], dtype=uint8))
segments:
(1, array([ 0. , -0.5]))
(4, array([ 0.13260155, -0.5 , 0.25978994, -0.44731685, 0.35355339,
-0.35355339]))
(4, array([ 0.44731685, -0.25978994, 0.5 , -0.13260155, 0.5 , 0.
]))
(4, array([ 0.5 , 0.13260155, 0.44731685, 0.25978994, 0.35355339,
0.35355339]))
(4, array([ 0.25978994, 0.44731685, 0.13260155, 0.5 , 0. ,
0.5 ]))
(4, array([-0.13260155, 0.5 , -0.25978994, 0.44731685, -0.35355339,
0.35355339]))
(4, array([-0.44731685, 0.25978994, -0.5 , 0.13260155, -0.5 , 0.
]))
(4, array([-0.5 , -0.13260155, -0.44731685, -0.25978994, -0.35355339,
-0.35355339]))
(4, array([-0.25978994, -0.44731685, -0.13260155, -0.5 , 0. ,
-0.5 ]))
(79, array([ 0. , -0.5]))
/usr/local/lib/python2.7/dist-packages/matplotlib/collections.py:590:
FutureWarning: elementwise comparison failed; returning scalar instead, but in
the future will perform elementwise comparison
if self._edgecolors == str('face'):
but I don't see how any of that data correlates with the actual scatter input data. Perhaps it's not the ax.get_children()[2] path collection I need to look at?
Given the PathCollection returned by plt.scatter, you could call its get_offsets method:
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
s = plt.scatter(x, y)
print(s.get_offsets())
# [[ 0. 0. ]
# [ 0.25 0.25]
# [ 0.5 0.5 ]
# [ 0.75 0.75]
# [ 1. 1. ]]
Or, given the axes object, ax, you could access the PathCollection via ax.collections, and then call get_offsets:
In [110]: ax = fig.get_axes()[0]
In [129]: ax.collections[0].get_offsets()
Out[131]:
array([[ 0. , 0. ],
[ 0.25, 0.25],
[ 0.5 , 0.5 ],
[ 0.75, 0.75],
[ 1. , 1. ]])
You could also get the z coordinate, in case you used 3D data:
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
z = np.linspace(0.0, 10, 5)
s = plt.scatter(x, y, c=z)
cbar = plt.colorbar(s)
To retrieve the x, y and z information:
ax = fig.get_axes()[0]
x_r = ax.collections[0].get_offsets()[:, 0]
y_r = ax.collections[0].get_offsets()[:, 1]
z_r = ax.collections[0].get_array()
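Continuing the snippet above, a quick round-trip check that the recovered arrays match the original inputs:
recovered = np.column_stack((x_r, y_r, z_r))
print(np.allclose(recovered, np.column_stack((x, y, z))))  # True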
