matplotlib connecting the dots in scatter plot


I am trying to visualize when each process was running (alive) and when it was idle. For each process, a_x_axis holds the times at which the process started running, and a_alive_for holds the times at which the corresponding alive period ended (start time plus duration). That gives two data points per run. I want to connect each pair with a line, joining the first green dot to the first red dot, the second green dot to the second red dot, and so on, so that I can see the alive and idle time for each process in a large data set. I looked into scatter plot examples but could not find a way to do this.
import matplotlib.pyplot as plt
a_x_axis = [32, 30, 40, 50, 60, 78]
a_live = [1, 3, 2, 1, 2, 4]
a_alive_for = [a + b for a, b in zip(a_x_axis, a_live)]
b_x_axis = [22, 25, 45, 55, 60, 72]
b_live = [1, 3, 2, 1, 2, 4]
b_alive_for = [a + b for a, b in zip(b_x_axis, b_live)]
a_y_axis = []
b_y_axis = []
for i in range(0, len(a_x_axis)):
    a_y_axis.append('process-1')
    b_y_axis.append('process-2')
print("size of a: %s" % len(a_x_axis))
print("size of a: %s" % len(a_y_axis))
plt.xlabel('time (s)')
plt.scatter(a_x_axis, [1]*len(a_x_axis))
plt.scatter(a_alive_for, [1]*len(a_x_axis))
plt.scatter(b_x_axis, [2]*len(b_x_axis))
plt.scatter(b_alive_for, [2]*len(b_x_axis))
plt.show()

You need:
import matplotlib.pyplot as plt
a_x_axis = [32, 30, 40, 50, 60, 78]
a_live = [1, 3, 2, 1, 2, 4]
a_alive_for = [a + b for a, b in zip(a_x_axis, a_live)]
b_x_axis = [22, 25, 45, 55, 60, 72]
b_live = [1, 3, 2, 1, 2, 4]
b_alive_for = [a + b for a, b in zip(b_x_axis, b_live)]
a_y_axis = []
b_y_axis = []
for i in range(0, len(a_x_axis)):
    a_y_axis.append('process-1')
    b_y_axis.append('process-2')
print("size of a: %s" % len(a_x_axis))
print("size of a: %s" % len(a_y_axis))
plt.xlabel('time (s)')
plt.scatter(a_x_axis, [1]*len(a_x_axis))
plt.scatter(a_alive_for, [1]*len(a_x_axis))
plt.scatter(b_x_axis, [2]*len(b_x_axis))
plt.scatter(b_alive_for, [2]*len(b_x_axis))
for i in range(0, len(a_x_axis)):
    plt.plot([a_x_axis[i],a_alive_for[i]], [1,1], 'green')
for i in range(0, len(b_x_axis)):
    plt.plot([b_x_axis[i],b_alive_for[i]], [2,2], 'green')
plt.show()

scatter is simply not the tool for drawing lines; plot is. plot also accepts 2D arrays of x- and y-coordinates and draws one line per column, so you don't have to iterate over the lists manually. You would need something like
plt.plot([a_x_axis, a_alive_for], [[1]*n,[1]*n], 'green')
with n = len(a_x_axis).
However, you could structure your data much better in numpy arrays or pandas DataFrames, where you can also set titles for columns. (Is that what you wanted to achieve by appending 'process-x' to your data lists?)
Also, the colors of your markers do not seem to be chosen on purpose; if you want them to match the lines, you could even drop scatter completely and let plot draw the markers (for example with a format string such as 'o-').
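A minimal sketch of the 2D-array call described above, applied to both processes from the question. The names starts, durations and lines are my own, and the green/red marker colors are an assumption based on the question's wording rather than its code.

```python
import matplotlib.pyplot as plt

# Data from the question: start times per process and the shared durations
starts = {"process-1": [32, 30, 40, 50, 60, 78],
          "process-2": [22, 25, 45, 55, 60, 72]}
durations = [1, 3, 2, 1, 2, 4]

fig, ax = plt.subplots()
lines = []
for row, (name, begin) in enumerate(starts.items(), start=1):
    end = [b + d for b, d in zip(begin, durations)]
    n = len(begin)
    # x and y are both 2 x n: matplotlib draws one line per column
    lines += ax.plot([begin, end], [[row] * n, [row] * n], color="green")
    ax.scatter(begin, [row] * n, color="green", zorder=3)  # wake-up times
    ax.scatter(end, [row] * n, color="red", zorder=3)      # end of each alive period

ax.set_yticks([1, 2])
ax.set_yticklabels(list(starts))
ax.set_xlabel("time (s)")
# plt.show()  # uncomment to display the figure
```

Each column of the 2 x n inputs becomes its own Line2D, so no explicit loop over the individual runs is needed.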

Related

Query the value of the four neighbors of an element in a numpy 2D array

I have a 2D array of 5*5 like this:
>>> np.random.seed(100)
>>> a = np.random.randint(0,100, (5,5))
>>> a
array([[ 8, 24, 67, 87, 79],
       [48, 10, 94, 52, 98],
       [53, 66, 98, 14, 34],
       [24, 15, 60, 58, 16],
       [ 9, 93, 86,  2, 27]])
If I have an initial position, is there any way to quickly and easily get the values of its four neighbors? The method I'm using now is a bit cumbersome:
Suppose the current position is [x, y] (if x=2, y=3, the value in the array is 14). Then the position above it is [x-1, y], the one below is [x+1, y], the one to the left is [x, y-1], and the one to the right is [x, y+1]. I use the following lines of code to get the values of the neighbors.
curr_val = a[2, 3]
up_val = a[2-1, 3]
bott_val = a[2+1, 3]
left_val = a[2, 3-1]
right_val = a[2, 3+1]
So my question is: is there a more convenient function in numpy that can do this, and even query the values of all four neighbors at once?

You can also use:
mask = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]]).astype(bool)
a[i-1:i+2, j-1:j+2][mask]
output:
array([53, 93, 94, 86])
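One caveat with the slice above: for i = 0 or j = 0 the start index becomes -1 and the window comes back empty. A possible workaround, sketched here under my own assumptions (the helper name cross_neighbors and the fill value -1 are hypothetical), is to pad the array by one cell first:

```python
import numpy as np

CROSS = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=bool)

def cross_neighbors(arr, i, j, fill=-1):
    """Values above, left, right and below arr[i, j]; `fill` marks a missing neighbor."""
    # Pad by one cell on every side so the 3x3 window always exists
    padded = np.pad(arr, 1, mode="constant", constant_values=fill)
    window = padded[i:i + 3, j:j + 3]  # centred on (i, j) of the original array
    return window[CROSS]               # row-major order: up, left, right, down

# The array printed in the question
a = np.array([[ 8, 24, 67, 87, 79],
              [48, 10, 94, 52, 98],
              [53, 66, 98, 14, 34],
              [24, 15, 60, 58, 16],
              [ 9, 93, 86,  2, 27]])

print(cross_neighbors(a, 2, 3))  # neighbors of the value 14
print(cross_neighbors(a, 0, 0))  # corner: two entries are the fill value
```

Pick a fill value that cannot occur in the real data, or filter it out afterwards.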

This is not the shortest method, but a flexible way could be to use a mask and a convolution to build this mask.
The advantage is that you can use any mask easily, just change the kernel.
from scipy.signal import convolve2d
kernel = [[0,1,0], # define points to pick around the target
          [1,0,1],
          [0,1,0]]
mask = np.zeros_like(a, dtype=bool) # build empty mask
mask[x,y] = True # set target(s)
# boolean indexing
a[convolve2d(mask, kernel, mode='same').astype(bool)]
output: array([52, 98, 34, 58])

This approach is fast, taking only microseconds to compute. Sometimes the shortest solution is not the best: this one is very simple to understand and needs no packages beyond numpy.
It also works for edge cases (positions on the border of the array).
def neighbors(matrix: np.ndarray, x: int, y: int):
    x_len, y_len = np.array(matrix.shape) - 1  # largest valid index on each axis
    nbr = []
    if x > x_len or y > y_len:
        return nbr
    if x != 0:
        nbr.append(matrix[x-1][y])
    if y != 0:
        nbr.append(matrix[x][y-1])
    if x != x_len:
        nbr.append(matrix[x+1][y])
    if y != y_len:
        nbr.append(matrix[x][y+1])
    return nbr
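The same bounds check can also be written without the chain of if statements. This is only a sketch of an alternative, not a faster or more official method; the function name four_neighbors and the offsets table are my own.

```python
import numpy as np

def four_neighbors(arr, x, y):
    """Return the existing neighbors of arr[x, y] in the order up, down, left, right."""
    offsets = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
    coords = offsets + (x, y)
    # Keep only the coordinates that fall inside the array
    inside = np.all((coords >= 0) & (coords < arr.shape), axis=1)
    rows, cols = coords[inside].T
    return arr[rows, cols]

# The array printed in the question
a = np.array([[ 8, 24, 67, 87, 79],
              [48, 10, 94, 52, 98],
              [53, 66, 98, 14, 34],
              [24, 15, 60, 58, 16],
              [ 9, 93, 86,  2, 27]])

print(four_neighbors(a, 2, 3))  # all four neighbors of the value 14
print(four_neighbors(a, 0, 0))  # corner: only "down" and "right" exist
```

Changing the offsets table (for example adding the four diagonals) is enough to query a different neighborhood.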

How to get equally spaced grid points in an irregularly shaped figure?

I have an irregularly shaped image and I want to get equally spaced grid points inside that.
For example, the image that I have is this: (image: the shape I have)
I am thinking of using OpenCV to get the corner coordinates and that is easy. But I do not know how to pass all the corner coordinates or divide my shape in identifiable geometric shapes and do this.
Right now, I have hard coded the coordinates and created a function to pass the coordinates.
import numpy as np
import matplotlib.pyplot as plt
import functools
def gridFunc(arr):
    center = np.mean(arr, axis=0)
    x = np.arange(min(arr[:, 0]), max(arr[:, 0]) + 0.04, 0.4)
    y = np.arange(min(arr[:, 1]), max(arr[:, 1]) + 0.04, 0.4)
    a, b = np.meshgrid(x, y)
    points = np.stack([a.reshape(-1), b.reshape(-1)]).T
    def normal(a, b):
        v = b - a
        n = np.array([v[1], -v[0]])
        # normal needs to point out
        if (center - a) @ n > 0:
            n *= -1
        return n
    # a point is inside a convex polygon if it lies behind every edge's outward normal
    mask = functools.reduce(np.logical_and, [((points - a) @ normal(a, b)) < 0 for a, b in zip(arr[:-1], arr[1:])])
    #plt.plot(arr[:, 0], arr[:, 1])
    #plt.gca().set_aspect('equal')
    #plt.scatter(points[mask][:, 0], points[mask][:, 1])
    #plt.show()
    return points[mask]
arr1 = np.array([[0, 7],[3, 10],[3, 4],[0, 7]])
arr2 = np.array([[3, 0], [3, 14], [12, 14], [12, 0], [3,0]])
arr3 = np.array([[12, 4], [12, 10], [20, 10], [20, 4], [12, 4]])
arr_1 = gridFunc(arr1)
arr_2 = gridFunc(arr2)
arr_3 = gridFunc(arr3)
res = np.append(arr_1, arr_2)
res = np.reshape(res, (-1, 2))
res = np.append(res, arr_3)
res = np.reshape(res, (-1, 2))
plt.scatter(res[:,0], res[:,1])
plt.show()
The result I get is shown here (image: the grid I get). But I am doing this manually, and I want to extend it to other shapes as well.
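One way to avoid splitting the shape into convex pieces by hand is to test the grid against the whole outline at once with matplotlib.path.Path.contains_points. The sketch below assumes the outline is available as an ordered list of vertices (for instance from a contour-detection step); the outline used here was traced by me from the three arrays in the question, and the name grid_in_polygon is hypothetical.

```python
import numpy as np
from matplotlib.path import Path

def grid_in_polygon(vertices, spacing):
    """Return regularly spaced grid points that lie inside the polygon."""
    vertices = np.asarray(vertices, dtype=float)
    xmin, ymin = vertices.min(axis=0)
    xmax, ymax = vertices.max(axis=0)
    xs = np.arange(xmin, xmax + spacing / 2, spacing)
    ys = np.arange(ymin, ymax + spacing / 2, spacing)
    gx, gy = np.meshgrid(xs, ys)
    points = np.column_stack([gx.ravel(), gy.ravel()])
    inside = Path(vertices).contains_points(points)  # works for non-convex outlines too
    return points[inside]

# Outline of the combined shape (triangle + tall rectangle + right rectangle)
outline = [(0, 7), (3, 10), (3, 14), (12, 14), (12, 10), (20, 10),
           (20, 4), (12, 4), (12, 0), (3, 0), (3, 4)]
pts = grid_in_polygon(outline, spacing=1.0)
print(len(pts), "grid points inside the outline")
```

Points that fall exactly on an edge may or may not be counted as inside, so shift the grid slightly if the boundary matters.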

When using cut in a pandas dataframe to bin it, why is the binning not properly done?

I have a dataframe that I want to bin (i.e., group into sub-ranges) by one column, and take the mean of the second column for each of the bins:
import pandas as pd
import numpy as np
data = pd.DataFrame(columns=['Score', 'Age'])
data.Score = [1, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 2, 1, 1, 2, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 1, 0, -2, 1]
data.Age = [29, 59, 44, 52, 60, 53, 45, 47, 57, 54, 35, 32, 48, 31, 49, 43, 67, 32, 31, 42, 37, 45, 52, 59, 56, 57, 48, 45, 56, 31]
_, bins = np.histogram(data.Age, 10)
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
labels[0] = '{}-{}'.format(bins[0], bins[1])
binned = pd.cut(data.Age, bins=bins, labels=labels, include_lowest=True, precision=0)
df = data.groupby(binned)['Score'].mean().reset_index()
df
There are 2 issues with this binning:
1. There is a gap of 1 between the upper bound of the (n-1)th bin and the lower bound of the nth bin (which means the binning is not continuous, and data points that lie in this gap are skipped).
2. The last few bin limits have a lot of digits after the decimal place. I have used the precision=0 flag in the cut, but it seems to be of no use: no matter what x I use in precision=x, it still produces the bins with the last few bins having a lot of digits after the decimal point.
The second point causes problem when, for instance, I try to plot df, where it ruins the look of the x-axis:
import matplotlib.pyplot as plt
plt.plot([str(i) for i in df.Age], df.Score, 'o-')
Why is this occurring in spite of the precision=0 flag, which I set to indicate that I want only integers as the bin limits, not floats? And how do I fix it?
I'm temporarily solving this issue by converting the bin values to ints manually:
_, bins = np.histogram(data.Age, 10)
for i in range(len(bins)): # my fix
    bins[i] = int(bins[i])
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
labels[0] = '{}-{}'.format(bins[0], bins[1])
binned = pd.cut(data.Age, bins=bins, labels=labels, include_lowest=True, precision=0)
df = data.groupby(binned)['Score'].mean().reset_index()
df
But this feels like a hack, and I think it should have a "proper" solution instead of a hacky fix. And although it fixed the second issue, I'm not sure if this fixes the first issue.

Both issues you mention result from one line in your code:
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
The gap comes from the i + 1 (it only affects the label text; the bins passed to cut are still contiguous), and the long decimals come from the floating-point approximation of the bin edges being formatted in the same line. precision has no effect here because it only applies to labels that pandas generates itself.
Therefore, modify it to
labels = [f'{i:.1f}-{j:.1f}' for i, j in zip(bins[:-1], bins[1:])]
which rounds each edge to one decimal place.
There is also no need for labels[0] = '{}-{}'.format(bins[0], bins[1]).
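If integer bin limits are what you are after, another option is to build integer edges yourself instead of taking them from np.histogram. A minimal sketch, assuming integer ages and a bin width of 5 (the width and the variable names are my assumptions, not part of the question):

```python
import numpy as np
import pandas as pd

ages = pd.Series([29, 59, 44, 52, 60, 53, 45, 47, 57, 54, 35, 32, 48, 31, 49,
                  43, 67, 32, 31, 42, 37, 45, 52, 59, 56, 57, 48, 45, 56, 31])
scores = pd.Series([1, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 2, 1, 1,
                    2, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 1, 0, -2, 1])

width = 5                                  # assumed bin width
lo = ages.min() // width * width           # round the minimum down to a multiple of width
hi = (ages.max() // width + 1) * width     # round the maximum up past the largest age
edges = np.arange(lo, hi + width, width)   # integer edges: 25, 30, ..., 70

# right=False makes the bins half-open, [25, 30), [30, 35), ..., so nothing falls in a gap
labels = [f"{a}-{b - 1}" for a, b in zip(edges[:-1], edges[1:])]
binned = pd.cut(ages, bins=edges, labels=labels, right=False)
means = scores.groupby(binned, observed=False).mean()
print(means)
```

The labels read "25-29", "30-34" and so on, which is only a faithful description of the bins because the ages are whole numbers.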

Matplotlib: Hysteresis loop using Mirrored or Split x axis

In an experiment, a load cell advances in equal increments of distance with time, compresses a sample; stops when a specified distance from the start point is reached; then retracts in equal increments of distance with time back to the starting position.
A plot of pressure (load cell reading) on the y axis against distance on the x axis produces a familiar hysteresis loop. A plot of pressure (load cell reading) on the y axis against time on the x axis produces an asymmetric peak with the maximum pressure in the centre, corresponding to the maximum advancement point of the sensor.
Instead of the above, I'd like to plot pressure on the y axis against distance on the x axis, with the additional constraint that the x axis is labelled starting at 0, with maximum pressure at the middle of the x axis, and 0 again at the right hand end of the x-axis. In other words, the curve will be identical in shape to the plot of pressure v time, but will be of pressure v distance, where the left half of the plot indicates the distance of the probe from its starting position during advancement; and the right half of the plot indicates distance of the probe from its starting position during retraction.
My actual datasets contain thousands of rows of data but by way of illustration, a minimal dummy dataset would look something like the following, where the 3 columns correspond to Time, Distance of probe from origin, and Pressure measured by probe respectively:
[
[0,0,0],
[1,2,10],
[2,4,30],
[3,6,60],
[4,4,35],
[5,2,15],
[6,0,0]
]
I can't work out how to get Matplotlib to construct the x-axis so that the range goes from 0 to a maximum, then back to 0 again. I'd be grateful for advice on how to achieve this plot in the simplest and most elegant way. Many thanks.

As you have time, you can use it for the x axis values and just change the x tick labels:
import numpy as np
import matplotlib.pyplot as plt
# Time, Distance, Pressure
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = data.shape[0] // max_ticks + 1  # integer division: a slice step must be an int
ax.plot(data[:, 0], data[:, 2]) # Pressure(time)
ax.set_xticks(data[::skip, 0])
ax.set_xticklabels(data[::skip, 1]) # Pressure(Distance(time)) ?
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
The skip is just there so you don't end up with too many ticks on the plot; change it as you like.
As noted in the comments, the above only holds for uniform changes in distance as a function of time. For non-uniform changes, you'll have to use something like:
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[3.5, 5.4, 40],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
def find_max_pos(data, column=0):
    return np.argmax(data[:, column])
def reverse_unload(data, unload_start):
    # prepare new_data with new column:
    new_shape = np.array(data.shape)
    new_shape[1] += 1
    new_data = np.empty(new_shape)
    # copy all correct data
    new_data[:, 0] = data[:, 0]
    new_data[:, 1] = data[:, 1]
    new_data[:, 2] = data[:, 2]
    new_data[:unload_start+1, 3] = data[:unload_start+1, 1]
    # use gradient to fill the rest
    gradient = -np.gradient(data[:, 1])
    for i in range(unload_start + 1, data.shape[0]):
        new_data[i, 3] = new_data[i-1, 3] + gradient[i]
    return new_data
data = reverse_unload(data, find_max_pos(data, 1))
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = data.shape[0] // max_ticks + 1  # integer division: a slice step must be an int
ax.plot(data[:, 3], data[:, 2]) # Pressure("Distance")
ax.set_xticks(data[::skip, 3])
ax.set_xticklabels(data[::skip, 1])
ax.grid() # added for clarity
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
Using the measured values as the ticks means the tick labels are not nice round numbers, so I found it easier to map the automatic ticks from matplotlib to the correct values:
import numpy as np
import matplotlib.pyplot as plt
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[3.5, 5.4, 40],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
def find_max_pos(data, column=0):
    return np.argmax(data[:, column])
def reverse_unload(data):
    unload_start = find_max_pos(data, 1)
    # prepare new_data with new column:
    new_shape = np.array(data.shape)
    new_shape[1] += 1
    new_data = np.empty(new_shape)
    # copy all correct data
    new_data[:, 0] = data[:, 0]
    new_data[:, 1] = data[:, 1]
    new_data[:, 2] = data[:, 2]
    new_data[:unload_start+1, 3] = data[:unload_start+1, 1]
    # use gradient to fill the rest
    gradient = data[unload_start:-1, 1]-data[unload_start+1:, 1]
    for i, j in enumerate(range(unload_start + 1, data.shape[0])):
        new_data[j, 3] = new_data[j-1, 3] + gradient[i]
    return new_data
def create_map_function(data):
    """
    Return function that maps values of distance
    folded over the maximum pressure applied.
    """
    max_index = find_max_pos(data, 1)
    x0, y0 = data[max_index, 1], data[max_index, 1]
    x1, y1 = 2*data[max_index, 1], 0
    m = (y1 - y0) / (x1 - x0)
    b = y0 - m*x0
    def map_function(x):
        if x < x0:
            return x
        else:
            return m*x+b
    return map_function
def process_data(data):
    data = reverse_unload(data)
    map_function = create_map_function(data)
    fig, ax = plt.subplots()
    ax.plot(data[:, 3], data[:, 2])
    ax.set_xticklabels([map_function(x) for x in ax.get_xticks()])
    ax.grid()
    ax.set_ylabel('Pressure [Pa?]')
    ax.set_xlabel('Distance [m?]')
    fig.show()
if __name__ == '__main__':
    process_data(data)
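The folded axis above can also be sketched more compactly by plotting against the total distance travelled and letting a tick formatter fold the labels back. This is only an illustration under two assumptions of mine: there is a single advance/retract cycle, and the turning point is the maximum distance. The names travelled, peak and fold are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Time, Distance, Pressure (the non-uniform dummy data from above)
data = np.array([[0, 0, 0], [1, 2, 10], [2, 4, 30], [3, 6, 60],
                 [3.5, 5.4, 40], [4, 4, 35], [5, 2, 15], [6, 0, 0]])
distance, pressure = data[:, 1], data[:, 2]

# Unfolded x coordinate: path length covered by the probe so far
travelled = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(distance)))])
peak = distance.max()  # assumed turning point of the probe

def fold(x, pos=None):
    """Label an unfolded x position with the probe's distance from the origin."""
    return f"{peak - abs(x - peak):g}"

fig, ax = plt.subplots()
ax.plot(travelled, pressure)
ax.xaxis.set_major_formatter(FuncFormatter(fold))
ax.set_xlabel('Distance [m?]')
ax.set_ylabel('Pressure [Pa?]')
# plt.show()  # uncomment to display the figure
```

Because the formatter is applied to whatever ticks matplotlib chooses, the labels stay round numbers and remain correct when zooming.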

Update: I have found a workaround to the problem of rounding ticks to the nearest integer: the np.around function, which rounds to a specified number of decimal places (default = 0) and rounds values exactly halfway between two candidates to the nearest even value, e.g. 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc. More info here: https://docs.scipy.org/doc/numpy1.10.4/reference/generated/numpy.around.html
So berna1111's code becomes:
import numpy as np
import matplotlib.pyplot as plt
# Time, Distance, Pressure
data = [[0, 0, 0],
[1, 1.9, 10], # Dummy data including decimals to demonstrate rounding
[2, 4.1, 30],
[3, 6.1, 60],
[4, 3.9, 35],
[5, 1.9, 15],
[6, -0.2, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = data.shape[0] // max_ticks + 1  # integer division: a slice step must be an int
ax.plot(data[:, 0], data[:, 2]) # Pressure(time)
ax.set_xticks(data[::skip, 0])
ax.set_xticklabels(np.absolute(np.around((data[::skip, 1])))) # Pressure(Distance(time)); rounded to nearest integer
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
According to the numpy documentation, np.around should round the final value of -0.2 for Distance to '0.0'; however, it rounds to '-0.0' instead. This appears to be the floating-point negative zero: the rounded result keeps the sign of the input. Since all my xticklabels in this particular case need to be positive integers or zero, I can correct this behaviour by using the np.absolute function as shown above. Everything now seems to work OK for my requirements, but if I'm missing something, or there's a better solution, please let me know.
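A small sketch of the negative-zero behaviour, in case it helps someone else. The values are made up for illustration; adding 0.0 is one alternative to np.absolute that removes the minus sign from zero while leaving genuinely negative values alone.

```python
import numpy as np

values = np.array([-0.2, 0.5, 1.5, 2.5, -3.4])
rounded = np.around(values)
print(rounded)              # the first entry is displayed with a minus sign
print(np.signbit(rounded))  # True where the sign bit is set, including for -0.0

# Adding positive zero turns -0.0 into 0.0 and leaves every other value unchanged
normalised = rounded + 0.0
print(normalised)
```

np.absolute would also turn the -3.0 into 3.0, which is fine for the distances here but not for data that can legitimately be negative.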

Automatically assign color to nodes in Graphviz

I'm using Python and Graphviz to draw a cluster graph consisting of nodes.
I want to assign a different color to each node, depending on an attribute, e.g. its x-coordinate.
Here's how I produce the graph:
import functools
import graphviz as gv  # assumed: gv refers to the graphviz package
def add_nodes(graph, nodes):
    for n in nodes:
        if isinstance(n, tuple):
            graph.node(n[0], **n[1])
        else:
            graph.node(n)
    return graph
A = [[517, 1, [409], 10, 6],
     [534, 1, [584], 10, 12],
     [614, 1, [247], 11, 5],
     [679, 1, [228], 13, 7],
     [778, 1, [13], 14, 14]]
nodesgv = []
for node in A:
    nodesgv.append((str(node[0]),{'label': str(node[0]), 'color': ???, 'style': 'filled'}))
graph = functools.partial(gv.Graph, format='svg', engine='neato')
add_nodes(graph(), nodesgv).render(('img/test'))
Now I want to assign a color to each node following the order of the first value of each node.
More specifically, what I want is:
a red node (517)
a yellow node (534)
a green node (614)
a blue node (679)
and a purple node (778)
I know how to assign colors to the graph, but what I'm looking for is something similar to the c=x part when using matplotlib:
plt.scatter(x, y, c=x, s=node_sizes)
The problem is that I don't know the number of nodes (clusters) beforehand, so if, for example, I've got 7 nodes, I still want a graph with 7 nodes that starts with a red one and ends with a purple one.
So is there any attribute in Graphviz that can do this?
Or can anyone tell me how the colormap in matplotlib works?
Sorry for the lack of clarity. T^T

Oh, I figured out a way to get what I want.
I'm recording it here in case someone else has the same problem.
You can rescale a colormap and assign the corresponding color to each node.
import colorsys
import functools
import graphviz as gv  # assumed: gv refers to the graphviz package
import matplotlib as mpl
import matplotlib.cm as cm
def add_nodes(graph, nodes):
    for n in nodes:
        if isinstance(n, tuple):
            graph.node(n[0], **n[1])
        else:
            graph.node(n)
    return graph
A = [[517, 1, [409], 10, 6],
     [534, 1, [584], 10, 12],
     [614, 1, [247], 11, 5],
     [679, 1, [228], 13, 7],
     [778, 1, [13], 14, 14]]
nodesgv = []
Arange = [ a[0] for a in A]
# rescale the node values to [0, 1] so they can index the colormap
norm = mpl.colors.Normalize(vmin = min(Arange), vmax = max(Arange))
cmap = cm.jet
for index, i in enumerate(A):
    x = i[0]
    m = cm.ScalarMappable(norm = norm, cmap = cmap)
    mm = m.to_rgba(x)
    # Graphviz accepts colors given as an "H, S, V" string
    M = colorsys.rgb_to_hsv(mm[0], mm[1], mm[2])
    nodesgv.append((str(i[0]),{'label': str((i[1])), 'color': "%f, %f, %f" % (M[0], M[1], M[2]), 'style': 'filled'}))
graph = functools.partial(gv.Graph, format='svg', engine='neato')
add_nodes(graph(), nodesgv).render(('img/test'))
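The color-mapping step can be tried on its own, without Graphviz installed. The sketch below is a variation on the code above, not the same method: it produces '#rrggbb' strings (which Graphviz also accepts) instead of HSV triples, and it uses the 'rainbow_r' colormap, which I picked because it runs from red to purple. The helper name node_colors is hypothetical.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_hex

def node_colors(values, cmap_name="rainbow_r"):
    """Map each value to a '#rrggbb' string along the named colormap."""
    cmap = plt.get_cmap(cmap_name)
    # Rescale so the smallest value hits one end of the colormap and the largest the other
    norm = Normalize(vmin=min(values), vmax=max(values))
    return [to_hex(cmap(norm(v))) for v in values]

ids = [517, 534, 614, 679, 778]  # first column of A in the question
for node_id, color in zip(ids, node_colors(ids)):
    print(node_id, color)
```

Because the normalization uses the minimum and maximum of whatever list is passed in, the same function covers 5 nodes, 7 nodes, or any other count.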
