Recover data from matplotlib scatter plot [duplicate] - python

From a matplotlib scatter plot, I'm trying to recover the point data. Consider
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
plt.scatter(x, y)
ax = fig.get_children()[1]
pc = ax.get_children()[2]
for path in pc.get_paths():
    print()
    print('path:')
    print(path)
    print()
    print('segments:')
    for vert, code in path.iter_segments():
        print(code, vert)
plt.show()
This yields
path:
Path(array([[ 0. , -0.5 ],
[ 0.13260155, -0.5 ],
[ 0.25978994, -0.44731685],
[ 0.35355339, -0.35355339],
[ 0.44731685, -0.25978994],
[ 0.5 , -0.13260155],
[ 0.5 , 0. ],
[ 0.5 , 0.13260155],
[ 0.44731685, 0.25978994],
[ 0.35355339, 0.35355339],
[ 0.25978994, 0.44731685],
[ 0.13260155, 0.5 ],
[ 0. , 0.5 ],
[-0.13260155, 0.5 ],
[-0.25978994, 0.44731685],
[-0.35355339, 0.35355339],
[-0.44731685, 0.25978994],
[-0.5 , 0.13260155],
[-0.5 , 0. ],
[-0.5 , -0.13260155],
[-0.44731685, -0.25978994],
[-0.35355339, -0.35355339],
[-0.25978994, -0.44731685],
[-0.13260155, -0.5 ],
[ 0. , -0.5 ],
[ 0. , -0.5 ]]), array([ 1, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 79], dtype=uint8))
segments:
(1, array([ 0. , -0.5]))
(4, array([ 0.13260155, -0.5 , 0.25978994, -0.44731685, 0.35355339,
-0.35355339]))
(4, array([ 0.44731685, -0.25978994, 0.5 , -0.13260155, 0.5 , 0.
]))
(4, array([ 0.5 , 0.13260155, 0.44731685, 0.25978994, 0.35355339,
0.35355339]))
(4, array([ 0.25978994, 0.44731685, 0.13260155, 0.5 , 0. ,
0.5 ]))
(4, array([-0.13260155, 0.5 , -0.25978994, 0.44731685, -0.35355339,
0.35355339]))
(4, array([-0.44731685, 0.25978994, -0.5 , 0.13260155, -0.5 , 0.
]))
(4, array([-0.5 , -0.13260155, -0.44731685, -0.25978994, -0.35355339,
-0.35355339]))
(4, array([-0.25978994, -0.44731685, -0.13260155, -0.5 , 0. ,
-0.5 ]))
(79, array([ 0. , -0.5]))
/usr/local/lib/python2.7/dist-packages/matplotlib/collections.py:590:
FutureWarning: elementwise comparison failed; returning scalar instead, but in
the future will perform elementwise comparison
if self._edgecolors == str('face'):
but I don't see how any of that data correlates with the actual scatter input data. Perhaps ax.get_children()[2] is not the path collection I need to look at?

Given the PathCollection returned by plt.scatter, you could call its get_offsets method:
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
s = plt.scatter(x, y)
print(s.get_offsets())
# [[ 0. 0. ]
# [ 0.25 0.25]
# [ 0.5 0.5 ]
# [ 0.75 0.75]
# [ 1. 1. ]]
Or, given the axes object, ax, you could access the PathCollection via ax.collections, and then call get_offsets:
In [110]: ax = fig.get_axes()[0]
In [129]: ax.collections[0].get_offsets()
Out[131]:
array([[ 0. , 0. ],
[ 0.25, 0.25],
[ 0.5 , 0.5 ],
[ 0.75, 0.75],
[ 1. , 1. ]])
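The paths printed in the question describe the marker outline (the shape drawn at every point), not the data, which is why they do not match the inputs. A minimal sketch contrasting the two, assuming the non-interactive Agg backend so that it runs without a display (the variable names marker_vertices and offsets are only illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
from matplotlib import pyplot as plt
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)

fig, ax = plt.subplots()
pc = ax.scatter(x, y)  # scatter returns a PathCollection

# get_paths() describes the marker outline, shared by every point.
marker_vertices = pc.get_paths()[0].vertices

# get_offsets() holds the (x, y) position of each marker in data coordinates.
offsets = np.asarray(pc.get_offsets())

print(marker_vertices.min(axis=0), marker_vertices.max(axis=0))
print(offsets)
```

Keeping the return value of scatter avoids having to index into get_children(), whose ordering is an implementation detail.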

You can also recover a third variable, z, if you passed it to scatter as the color values (c=z):
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
z = np.linspace(0.0, 10, 5)
s = plt.scatter(x, y, c=z)
cbar=plt.colorbar(s)
To retrieve x, y, and z:
ax=fig.get_axes()[0]
x_r=ax.collections[0].get_offsets()[:,0]
y_r=ax.collections[0].get_offsets()[:,1]
z_r=ax.collections[0].get_array()
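As a self-contained check of this round trip, the sketch below rebuilds an (N, 3) array from the collection and compares it with the inputs. It assumes the Agg backend so that no display is needed; the name recovered is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
z = np.linspace(0.0, 10.0, 5)

fig, ax = plt.subplots()
s = ax.scatter(x, y, c=z)
fig.colorbar(s)

pc = ax.collections[0]                       # the scatter's PathCollection
x_r, y_r = np.asarray(pc.get_offsets()).T    # positions in data coordinates
z_r = np.asarray(pc.get_array())             # the values passed as c, not RGBA colors
recovered = np.column_stack((x_r, y_r, z_r))
print(recovered)
```

Note that get_array() returns the scalar values that were mapped through the colormap; it is only populated when c was given as a sequence of numbers.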

Related

Why does meshgrid have one more dimension than input?

I am sorry if this is obvious, but I am having trouble understanding why np.meshgrid seems to produce an array whose shape has one more dimension than the input:
grid = np.meshgrid(
np.linspace(-1, 1, 5),
np.linspace(-1, 1, 4),
np.linspace(-1, 1, 3), indexing='ij')
np.shape(grid)
(3, 5, 4, 3)
To me it should have been: (5, 4, 3)
or
grid = np.meshgrid(
np.linspace(-1, 1, 5),
np.linspace(-1, 1, 4), indexing='ij')
np.shape(grid)
(2, 5, 4)
To me it should have been: (5, 4)
I would be very grateful if somebody could explain this to me. Thanks a lot!
In [92]: grid = np.meshgrid(
...: np.linspace(-1, 1, 5),
...: np.linspace(-1, 1, 4), indexing='ij')
...:
In [93]: grid
Out[93]:
[array([[-1. , -1. , -1. , -1. ],
[-0.5, -0.5, -0.5, -0.5],
[ 0. , 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5, 0.5],
[ 1. , 1. , 1. , 1. ]]),
array([[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ],
[-1. , -0.33333333, 0.33333333, 1. ]])]
grid is a list with two arrays. The first array has numbers from the first argument (the one with 5 elements). The second has numbers from the second argument.
Why should np.shape(grid) be (5, 4)? What layout were you expecting?
np.shape(grid) actually does np.array(grid).shape, which is why there's an added dimension.
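A short sketch of where the extra dimension comes from: each coordinate array already has the expected (5, 4) shape, and the leading axis only appears when the sequence of arrays is converted to a single array. Stacking along the last axis instead (the name stacked is only illustrative) gives one coordinate pair per grid cell.

```python
import numpy as np

grid = np.meshgrid(
    np.linspace(-1, 1, 5),
    np.linspace(-1, 1, 4),
    indexing='ij')

print(len(grid))                 # one coordinate array per input
print([g.shape for g in grid])   # each coordinate array has the grid's shape
print(np.shape(grid))            # converting the sequence adds a leading axis

# Stack along the last axis to get one (x, y) pair per grid cell.
stacked = np.stack(grid, axis=-1)
point = stacked[3, 2]            # coordinates of cell (3, 2)
print(stacked.shape, point)
```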

numpy - align 2 vectors with potentially missing values

I have two NumPy matrices with slightly different alignment:
X
id, value
1, 0.78
2, 0.65
3, 0.77
...
...
98, 0.88
99, 0.77
100, 0.87
Y
id, value
1, 0.79
2, 0.65
3, 0.78
...
...
98, 0.89
100, 0.80
Y is simply missing a particular ID.
I would like to perform vector operations on X and Y (e.g. correlation, difference...etc). Meaning I need to drop the corresponding missing value in X. How would I do that?
This assumes the values in x and y are identical apart from a single missing element, so the extra element in x equals the difference between the sums.
This solution is O(n); a naive pairwise comparison would be O(n^2).
Data generation:
import numpy as np
# x = np.arange(10)
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]] # exclude 6
print(x)
np.random.shuffle(y)
print(y)
Solution:
Note that np.isclose() is used for the floating-point comparison.
sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))
print(value_index)
Delete the relevant index:
deleted = np.delete(x, value_index)
print(deleted)
out:
[0.36373441 0.5030346 0.895204 0.03352821 0.20693263 0.28651572
0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346 0.895204
0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346 0.895204 0.03352821 0.20693263 0.28651572
0.97969841 0.77368822 0.80105397]
Use in1d:
>>> X
array([[ 1. , 0.53],
[ 2. , 0.72],
[ 3. , 0.44],
[ 4. , 0.35],
[ 5. , 0.32],
[ 6. , 0.14],
[ 7. , 0.52],
[ 8. , 0.4 ],
[ 9. , 0.1 ],
[10. , 0.1 ]])
>>> Y
array([[ 1. , 0.19],
[ 2. , 0.96],
[ 3. , 0.24],
[ 4. , 0.44],
[ 5. , 0.12],
[ 6. , 0.91],
[ 7. , 0.7 ],
[ 8. , 0.54],
[10. , 0.09]])
>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1. , 0.53],
[ 2. , 0.72],
[ 3. , 0.44],
[ 4. , 0.35],
[ 5. , 0.32],
[ 6. , 0.14],
[ 7. , 0.52],
[ 8. , 0.4 ],
[10. , 0.1 ]])
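If ids can be missing from either side, or the rows are not in the same order, np.intersect1d with return_indices=True gives the matching row positions in both arrays at once. The sketch below uses made-up data in the question's layout (column 0 is the id, column 1 the value) and assumes the ids are unique within each array.

```python
import numpy as np

# Hypothetical data: ids 1..10 in X, the same ids minus id 9 in Y.
X = np.column_stack((np.arange(1, 11), np.linspace(0.1, 1.0, 10)))
Y = np.delete(X, 8, axis=0)   # drop the row with id 9
Y[:, 1] += 0.01               # make Y's values slightly different from X's

# Positions of the shared ids in each array (ids assumed unique).
common_ids, ix, iy = np.intersect1d(X[:, 0], Y[:, 0], return_indices=True)
X_aligned = X[ix]
Y_aligned = Y[iy]

# The aligned value columns can now be used in vector operations.
diff = X_aligned[:, 1] - Y_aligned[:, 1]
corr = np.corrcoef(X_aligned[:, 1], Y_aligned[:, 1])[0, 1]
print(common_ids, diff, corr)
```

Recent NumPy releases recommend np.isin over np.in1d for the one-sided mask shown above; the call X[np.isin(X[:, 0], Y[:, 0])] selects the same rows.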
If the missing entries are stored as NaN, you can try this:
X = X[~np.isnan(X)]
Y = Y[~np.isnan(Y)]
Then you can perform whatever operation you want.

Using for loop to replace values in a matrix but only the last replaced value is kept

x_n = np.arange(0, 1.0, 0.25)
u_m = np.arange(0, 1.0, 0.5)
for x in range(len(x_n)):
    for u in range(len(u_m)):
        zeros_array = np.zeros( (len(x_n), len(u_m)) )
        zeros_array[x,u] = x_n[x] - u_m[u]
zeros_array
#result
array([[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0.25]])
Only the last replaced value is kept. I want to know how to keep all the replaced values.
You're initializing a new zeros_array on every iteration of the loop, so when the loop ends only the last assignment survives. To fix this, define zeros_array once outside the loop and update it inside:
x_n = np.arange(0, 1.0, 0.25)
u_m = np.arange(0, 1.0, 0.5)
zeros_array = np.zeros((len(x_n), len(u_m)))
for x in range(len(x_n)):
    for u in range(len(u_m)):
        zeros_array[x, u] = x_n[x] - u_m[u]
print(zeros_array)
Output:
[[ 0. -0.5 ]
[ 0.25 -0.25]
[ 0.5 0. ]
[ 0.75 0.25]]
You have the initialization of zeros_array inside the loop, so it is re-created on every iteration.
Do this instead:
zeros_array = np.zeros((len(x_n), len(u_m)))
for x in range(len(x_n)):
    for u in range(len(u_m)):
        zeros_array[x, u] = x_n[x] - u_m[u]
output:
array([[ 0. , -0.5 ],
[ 0.25, -0.25],
[ 0.5 , 0. ],
[ 0.75, 0.25]])
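For this particular computation the loops can be avoided altogether. A minimal sketch using NumPy broadcasting, which should produce the same 4 x 2 table of differences (the name diff is only illustrative):

```python
import numpy as np

x_n = np.arange(0, 1.0, 0.25)   # 4 values
u_m = np.arange(0, 1.0, 0.5)    # 2 values

# x_n[:, None] has shape (4, 1); broadcasting it against u_m (shape (2,))
# yields a (4, 2) array with diff[i, j] = x_n[i] - u_m[j].
diff = x_n[:, None] - u_m

# np.subtract.outer computes the same table explicitly.
diff_outer = np.subtract.outer(x_n, u_m)
print(diff)
```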

Matplotlib RegularPolygon collection location on the canvas

I am trying to plot a feature map (SOM) using Python.
To keep it simple, imagine a 2D plot where each unit is represented as a hexagon.
As shown in this topic: Hexagonal Self-Organizing map in Python, the hexagons are located side by side, formatted as a grid.
I managed to write the following piece of code, and it works perfectly for a set number of polygons and for only a few shapes (6 x 6 or 10 x 4 hexagons, for example). However, one important feature of a method like this is to support any grid shape from 3 x 3 upward.
def plot_map(grid,
             d_matrix,
             w=10,
             title='SOM Hit map'):
    """
    Plot hexagon map where each neuron is represented by a hexagon. The hexagon
    color is given by the distance between the neurons (D-Matrix). Scaled
    hexagons will appear on top of the background image if the hits array
    is provided. They are scaled according to the number of hits on each
    neuron.
    Args:
    - grid: Grid dictionary (keys: centers, x, y ),
    - d_matrix: array containing the distances between each neuron
    - w: width of the map in inches
    - title: map title
    Returns the Matplotlib SubAxis instance
    """
    n_centers = grid['centers']
    x, y = grid['x'], grid['y']
    fig = plt.figure(figsize=(1.05 * w, 0.85 * y * w / x), dpi=100)
    ax = fig.add_subplot(111)
    ax.axis('equal')
    # Discover difference between centers
    collection_bg = RegularPolyCollection(
        numsides=6,  # a hexagon
        rotation=0,
        sizes=(y * (1.3 * 2 * math.pi * w) ** 2 / x,),
        edgecolors=(0, 0, 0, 1),
        array=d_matrix,
        cmap=cm.gray,
        offsets=n_centers,
        transOffset=ax.transData,
    )
    ax.add_collection(collection_bg, autolim=True)
    ax.axis('off')
    ax.autoscale_view()
    ax.set_title(title)
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="5%", pad=0.05)
    plt.colorbar(collection_bg, cax=cax)
    return ax
I've tried to make something that automatically understands the grid shape. It didn't work (and I'm not sure why). An undesired space always appears between the hexagons.
Summarising: I would like to generate a 3x3, 6x6, or 10x4 (and so on) grid of hexagons with no spaces in between, for given points and a set plot width.
As requested, here is the data for the hexagon locations. As you can see, it is always the same pattern.
3x3
{'centers': array([[ 1.5 , 0.8660254 ],
[ 2.5 , 0.8660254 ],
[ 3.5 , 0.8660254 ],
[ 1. , 1.73205081],
[ 2. , 1.73205081],
[ 3. , 1.73205081],
[ 1.5 , 2.59807621],
[ 2.5 , 2.59807621],
[ 3.5 , 2.59807621]]),
'x': array([ 3.]),
'y': array([ 3.])}
6x6
{'centers': array([[ 1.5 , 0.8660254 ],
[ 2.5 , 0.8660254 ],
[ 3.5 , 0.8660254 ],
[ 4.5 , 0.8660254 ],
[ 5.5 , 0.8660254 ],
[ 6.5 , 0.8660254 ],
[ 1. , 1.73205081],
[ 2. , 1.73205081],
[ 3. , 1.73205081],
[ 4. , 1.73205081],
[ 5. , 1.73205081],
[ 6. , 1.73205081],
[ 1.5 , 2.59807621],
[ 2.5 , 2.59807621],
[ 3.5 , 2.59807621],
[ 4.5 , 2.59807621],
[ 5.5 , 2.59807621],
[ 6.5 , 2.59807621],
[ 1. , 3.46410162],
[ 2. , 3.46410162],
[ 3. , 3.46410162],
[ 4. , 3.46410162],
[ 5. , 3.46410162],
[ 6. , 3.46410162],
[ 1.5 , 4.33012702],
[ 2.5 , 4.33012702],
[ 3.5 , 4.33012702],
[ 4.5 , 4.33012702],
[ 5.5 , 4.33012702],
[ 6.5 , 4.33012702],
[ 1. , 5.19615242],
[ 2. , 5.19615242],
[ 3. , 5.19615242],
[ 4. , 5.19615242],
[ 5. , 5.19615242],
[ 6. , 5.19615242]]),
'x': array([ 6.]),
'y': array([ 6.])}
11x4
{'centers': array([[ 1.5 , 0.8660254 ],
[ 2.5 , 0.8660254 ],
[ 3.5 , 0.8660254 ],
[ 4.5 , 0.8660254 ],
[ 5.5 , 0.8660254 ],
[ 6.5 , 0.8660254 ],
[ 7.5 , 0.8660254 ],
[ 8.5 , 0.8660254 ],
[ 9.5 , 0.8660254 ],
[ 10.5 , 0.8660254 ],
[ 11.5 , 0.8660254 ],
[ 1. , 1.73205081],
[ 2. , 1.73205081],
[ 3. , 1.73205081],
[ 4. , 1.73205081],
[ 5. , 1.73205081],
[ 6. , 1.73205081],
[ 7. , 1.73205081],
[ 8. , 1.73205081],
[ 9. , 1.73205081],
[ 10. , 1.73205081],
[ 11. , 1.73205081],
[ 1.5 , 2.59807621],
[ 2.5 , 2.59807621],
[ 3.5 , 2.59807621],
[ 4.5 , 2.59807621],
[ 5.5 , 2.59807621],
[ 6.5 , 2.59807621],
[ 7.5 , 2.59807621],
[ 8.5 , 2.59807621],
[ 9.5 , 2.59807621],
[ 10.5 , 2.59807621],
[ 11.5 , 2.59807621],
[ 1. , 3.46410162],
[ 2. , 3.46410162],
[ 3. , 3.46410162],
[ 4. , 3.46410162],
[ 5. , 3.46410162],
[ 6. , 3.46410162],
[ 7. , 3.46410162],
[ 8. , 3.46410162],
[ 9. , 3.46410162],
[ 10. , 3.46410162],
[ 11. , 3.46410162]]),
'x': array([ 11.]),
'y': array([ 4.])}
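The listings above follow a regular pattern: unit spacing along x, rows sqrt(3)/2 apart, and every other row (starting with the first) shifted right by half a unit. A sketch of a generator for such centers is shown below; hex_centers is a hypothetical helper written for illustration, not part of the original code.

```python
import numpy as np

def hex_centers(nx, ny):
    """Hypothetical helper: centers of an nx-by-ny offset-row hexagonal grid.

    Columns are one unit apart, rows are sqrt(3)/2 apart, and odd-numbered
    rows (1st, 3rd, ...) are shifted right by half a unit.
    """
    cols = np.arange(1, nx + 1, dtype=float)
    rows = []
    for row in range(1, ny + 1):
        shift = 0.5 if row % 2 == 1 else 0.0
        xs = cols + shift
        ys = np.full(nx, row * np.sqrt(3) / 2)
        rows.append(np.column_stack((xs, ys)))
    return np.vstack(rows)

# Build a grid dictionary in the same layout as the listings.
grid_3x3 = {'centers': hex_centers(3, 3),
            'x': np.array([3.]),
            'y': np.array([3.])}
print(grid_3x3['centers'])
```

Generating the centers this way makes it easy to try arbitrary grid shapes when checking the plot for gaps.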
I've managed to find a workaround by calculating the figure size in inches according to the given dpi. Afterwards, I compute the pixel distance between two adjacent points (by plotting them using a hidden scatter plot). This way I could calculate the hexagon apothem and correctly estimate the size of the hexagon's inner circle (as matplotlib expects).
No gaps in the end!
import matplotlib.pyplot as plt
from matplotlib import colors, cm
from matplotlib.collections import RegularPolyCollection
from mpl_toolkits.axes_grid1 import make_axes_locatable
import math
import numpy as np
def plot_map(grid,
             d_matrix,
             w=1080,
             dpi=72.,
             title='SOM Hit map'):
    """
    Plot hexagon map where each neuron is represented by a hexagon. The hexagon
    color is given by the distance between the neurons (D-Matrix)
    Args:
    - grid: Grid dictionary (keys: centers, x, y ),
    - d_matrix: array containing the distances between each neuron
    - w: width of the map in pixels (converted to inches using dpi)
    - dpi: figure resolution in dots per inch
    - title: map title
    Returns the Matplotlib SubAxis instance
    """
    n_centers = grid['centers']
    x, y = grid['x'], grid['y']
    # Size of figure in inches
    xinch = (x * w / y) / dpi
    yinch = (y * w / x) / dpi
    fig = plt.figure(figsize=(xinch, yinch), dpi=dpi)
    ax = fig.add_subplot(111, aspect='equal')
    # Get pixel size between two data points
    xpoints = n_centers[:, 0]
    ypoints = n_centers[:, 1]
    ax.scatter(xpoints, ypoints, s=0.0, marker='s')
    ax.axis([min(xpoints)-1., max(xpoints)+1.,
             min(ypoints)-1., max(ypoints)+1.])
    xy_pixels = ax.transData.transform(np.vstack([xpoints, ypoints]).T)
    xpix, ypix = xy_pixels.T
    # In matplotlib, 0,0 is the lower left corner, whereas it's usually the
    # upper left for most image software, so we'll flip the y-coords
    width, height = fig.canvas.get_width_height()
    ypix = height - ypix
    # discover radius and hexagon
    apothem = .9 * (xpix[1] - xpix[0]) / math.sqrt(3)
    area_inner_circle = math.pi * (apothem ** 2)
    collection_bg = RegularPolyCollection(
        numsides=6,  # a hexagon
        rotation=0,
        sizes=(area_inner_circle,),
        edgecolors=(0, 0, 0, 1),
        array=d_matrix,
        cmap=cm.gray,
        offsets=n_centers,
        transOffset=ax.transData,
    )
    ax.add_collection(collection_bg, autolim=True)
    ax.axis('off')
    ax.autoscale_view()
    ax.set_title(title)
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="10%", pad=0.05)
    plt.colorbar(collection_bg, cax=cax)
    return ax

getting a list of coordinates from a 2D matrix

Let's say I have a 10 x 20 matrix of values (so 200 data points)
values = np.random.rand(10,20)
with a known regular spacing between coordinates so that the x and y coordinates are defined by
coord_x = np.arange(0,5,0.5)   # gives [0.0, 0.5, 1.0, 1.5, ..., 4.5]
coord_y = np.arange(0,5,0.25)  # gives [0.0, 0.25, 0.50, 0.75, ..., 4.75]
I'd like to get an array representing each coordinates points so that
the shape of the array is (200,2), 200 being the total number of points and the extra dimension simply representing x and y such as
coord[0][0]=0.0, coord[0][1]=0.0
coord[1][0]=0.0, coord[1][1]=0.25
coord[2][0]=0.0, coord[2][1]=0.50
...
coord[19][0]=0.0, coord[19][1]=4.75
coord[20][0]=0.5, coord[20][1]=0.0
coord[21][0]=0.5, coord[21][1]=0.25
coord[22][0]=0.5, coord[22][1]=0.50
...
coord[199][0]=4.5, coord[199][1]=4.75
That would be a fairly easy thing to do with a double for loop, but I wonder if there is a more elegant solution using built-in numpy (or other) functions.
I think meshgrid may be what you're looking for.
Here's an example, with a smaller number of data points:
>>> from numpy import fliplr, dstack, meshgrid, linspace
>>> x, y, nx, ny = 4.5, 4.5, 3, 10
>>> Xs = linspace(0, x, nx)
>>> Ys = linspace(0, y, ny)
>>> fliplr(dstack(meshgrid(Xs, Ys)).reshape(nx * ny, 2))
array([[ 0. , 0. ],
[ 0. , 2.25],
[ 0. , 4.5 ],
[ 0.5 , 0. ],
[ 0.5 , 2.25],
[ 0.5 , 4.5 ],
[ 1. , 0. ],
[ 1. , 2.25],
[ 1. , 4.5 ],
[ 1.5 , 0. ],
[ 1.5 , 2.25],
[ 1.5 , 4.5 ],
[ 2. , 0. ],
[ 2. , 2.25],
[ 2. , 4.5 ],
[ 2.5 , 0. ],
[ 2.5 , 2.25],
[ 2.5 , 4.5 ],
[ 3. , 0. ],
[ 3. , 2.25],
[ 3. , 4.5 ],
[ 3.5 , 0. ],
[ 3.5 , 2.25],
[ 3.5 , 4.5 ],
[ 4. , 0. ],
[ 4. , 2.25],
[ 4. , 4.5 ],
[ 4.5 , 0. ],
[ 4.5 , 2.25],
[ 4.5 , 4.5 ]])
I think you meant coord_y = np.arange(0,5,0.25) in your question. You can do
from numpy import meshgrid,column_stack
x,y=meshgrid(coord_x,coord_y)
coord = column_stack((x.T.flatten(),y.T.flatten()))
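An equivalent sketch that avoids the transposes by asking meshgrid for matrix ('ij') indexing, so that x varies along the first axis and y along the second. It uses the question's coord_x and coord_y definitions; the names xx and yy are only illustrative.

```python
import numpy as np

coord_x = np.arange(0, 5, 0.5)    # 10 values
coord_y = np.arange(0, 5, 0.25)   # 20 values

# With indexing='ij', both outputs have shape (len(coord_x), len(coord_y)).
xx, yy = np.meshgrid(coord_x, coord_y, indexing='ij')

# Pair the coordinates cell by cell, then flatten to one (x, y) row per point.
coord = np.stack((xx, yy), axis=-1).reshape(-1, 2)
print(coord.shape)
print(coord[:3])
```

Because the reshape runs over the last axis first, y varies fastest in the result, which matches the ordering requested in the question.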
