Matplotlib Scatter plot [duplicate]


I am trying to make a scatter plot and annotate data points with different numbers from a list.
So, for example, I want to plot y vs z and annotate with corresponding numbers from n.
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(z, y, marker='o')
Any ideas?

I'm not aware of any plotting method that takes an array or list of labels, but you could call annotate() while iterating over the values in n.
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.subplots()
ax.scatter(z, y)
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
annotate() has many formatting options; see the matplotlib website for details.
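As a sketch of a few of those options (the offsets, alignment and color here are arbitrary choices, not part of the answer above): passing xytext together with textcoords='offset points' shifts each label by a fixed number of points from its marker, so the label stays clear of the point regardless of the data scale. Extra keyword arguments such as ha, va, fontsize and color are passed on to the text.

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)
for txt, xi, yi in zip(n, z, y):
    # Place each label 5 points right and 5 points up from its marker,
    # independent of the data scale of the axes.
    ax.annotate(str(txt), (xi, yi), xytext=(5, 5), textcoords='offset points',
                ha='left', va='bottom', fontsize=9, color='darkred')
```

Because the offset is in points rather than data units, it looks the same whether the axes span 0 to 1 or 0 to 10,000.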

In case anyone is trying to apply the above solutions using plt.scatter() instead of plt.subplots(), I tried running the following code:
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.scatter(z, y)
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
But I ran into the error "cannot unpack non-iterable PathCollection object", pointing specifically at the line fig, ax = plt.scatter(z, y).
I eventually solved the error using the following code:
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
plt.scatter(z, y)
for i, txt in enumerate(n):
    plt.annotate(txt, (z[i], y[i]))
I didn't expect there to be a difference between plt.scatter() and plt.subplots(), but I should have known better: plt.subplots() returns a (Figure, Axes) tuple, whereas plt.scatter() returns the single PathCollection it drew, which cannot be unpacked into fig, ax.
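If you start with plt.scatter() but still want an Axes object to call annotate() on, one option is to fetch the current Axes with plt.gca(). A minimal sketch that also shows what plt.scatter() actually returns:

```python
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

points = plt.scatter(z, y)  # a single PathCollection, not a (fig, ax) tuple
print(type(points).__name__)  # PathCollection
ax = plt.gca()  # the Axes that plt.scatter drew into
for txt, xi, yi in zip(n, z, y):
    ax.annotate(txt, (xi, yi))
```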

In matplotlib versions earlier than 2.0, ax.scatter is not needed to plot text without markers. From version 2.0 on, you'll need ax.scatter to set the proper axis range and to draw markers for the text. The text-only version looks like this:
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.subplots()
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
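On current matplotlib versions, annotations do not take part in autoscaling, so with text alone the view stays at the default 0 to 1 range and the labels with y around 3 fall outside it. If you want text only, without markers, a minimal sketch is to set the limits yourself from the data (the helper padded() and its 10% margin are illustrative choices, not a matplotlib API):

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
for txt, xi, yi in zip(n, z, y):
    # Center each label on its data point; no marker is drawn.
    ax.annotate(txt, (xi, yi), ha='center', va='center')

def padded(lo, hi, frac=0.1):
    """Hypothetical helper: return (lo, hi) widened by `frac` of the range on each side."""
    pad = (hi - lo) * frac
    return lo - pad, hi + pad

# Annotations do not update the data limits, so set the view explicitly.
ax.set_xlim(*padded(min(z), max(z)))
ax.set_ylim(*padded(min(y), max(y)))
```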
And in this link you can find an example in 3d.

You may also use pyplot.text (see here).
import numpy as np
import matplotlib.pyplot as plt
def plot_embeddings(M_reduced, word2Ind, words):
    """
    Plot in a scatterplot the embeddings of the words specified in the list "words".
    Include a label next to each point.
    """
    for word in words:
        x, y = M_reduced[word2Ind[word]]
        plt.scatter(x, y, marker='x', color='red')
        plt.text(x+.03, y+.03, word, fontsize=9)
    plt.show()
M_reduced_plot_test = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1], [0, 0]])
word2Ind_plot_test = {'test1': 0, 'test2': 1, 'test3': 2, 'test4': 3, 'test5': 4}
words = ['test1', 'test2', 'test3', 'test4', 'test5']
plot_embeddings(M_reduced_plot_test, word2Ind_plot_test, words)

I'd like to add that you can also use arrows and text boxes to annotate the labels. Here is what I mean:
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.subplots()
ax.scatter(z, y)
ax.annotate(n[0], (z[0], y[0]), xytext=(z[0]+0.05, y[0]+0.3),
            arrowprops=dict(facecolor='red', shrink=0.05))
ax.annotate(n[1], (z[1], y[1]), xytext=(z[1]-0.05, y[1]-0.3),
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="angle3,angleA=0,angleB=-90"))
ax.annotate(n[2], (z[2], y[2]), xytext=(z[2]-0.05, y[2]-0.3),
            arrowprops=dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1))
ax.annotate(n[3], (z[3], y[3]), xytext=(z[3]+0.05, y[3]-0.2),
            arrowprops=dict(arrowstyle="fancy"))
ax.annotate(n[4], (z[4], y[4]), xytext=(z[4]-0.1, y[4]-0.2),
            bbox=dict(boxstyle="round", alpha=0.1),
            arrowprops=dict(arrowstyle="simple"))
plt.show()
Which will generate the following graph:
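With more than a handful of points, one annotate() call per point gets tedious. A sketch of the same idea in a loop, applying one arrow style and one box style to every label (the offsets and the above/below alternation are arbitrary choices to reduce crowding):

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)
for i, (txt, xi, yi) in enumerate(zip(n, z, y)):
    # Alternate labels above and below their points.
    dy = 20 if i % 2 == 0 else -25
    ax.annotate(str(txt), (xi, yi),
                xytext=(10, dy), textcoords='offset points',
                bbox=dict(boxstyle='round', alpha=0.1),
                arrowprops=dict(arrowstyle='->'))
```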

For a limited set of values, matplotlib is fine. But when you have many values, the labels start to overlap other data points, and with limited space you can't simply leave values out. In that case it's better to use an interactive plot that lets you zoom in and out.
Using plotly:
import plotly.express as px
df = px.data.gapminder().query("year==2007 and continent=='Americas'")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", text="country", log_x=True, size_max=100, color="lifeExp")
fig.update_traces(textposition='top center')
fig.update_layout(title_text='Life Expectancy', title_x=0.5)
fig.show()

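Python 3.6+:
import matplotlib.pyplot as plt
coordinates = [('a', 1, 2), ('b', 3, 4), ('c', 5, 6)]
for label, x, y in coordinates: plt.annotate(label, (x, y))
plt.axis([0, 6, 0, 7])  # annotate() does not autoscale, so set the limits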

This might be useful when you need to annotate points individually at different times (i.e., not in a single for loop):
ax = plt.gca()
ax.annotate('your_label', (x, y))
where x and y are your target coordinates (float or int).

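As a one-liner using a list comprehension and numpy (dtype=object keeps the labels from n as integers instead of converting them to float):
[ax.annotate(x[0], (x[1], x[2])) for x in np.array([n, z, y], dtype=object).T]
The setup is the same as in Rutger's answer.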
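Note that a plain np.array([n, z, y]) builds a single float array, so the integer labels from n would be drawn as 58.0, 651.0, and so on. A sketch comparing that with zip(), which keeps each list's own element type:

```python
import numpy as np
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

# A plain numpy array upcasts the integer labels to float.
print(np.array([n, z, y]).T[0][0])  # 58.0

fig, ax = plt.subplots()
ax.scatter(z, y)
# zip() keeps the original ints, so the labels read 58, 651, ...
annotations = [ax.annotate(txt, (xi, yi)) for txt, xi, yi in zip(n, z, y)]
print(annotations[0].get_text())  # 58
```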

Related

How to draw a scatter graph with 2 y-axis

I am trying to plot a scatter graph with X, Y and Y2. The main objective is to study the relationship between these three features.
import matplotlib.cm as cm
import matplotlib.pyplot as plt
X = [110, 120, 130, 140, 150]
Y = [0.1, 0.2, 0.3, 0.4, 0.5]
Y2 = [5, 4, 3, 2, 1]
plt.title('X vs Y vs Y2')
plt.xlabel('X')
plt.ylabel('Y')
points1 = plt.scatter(X, Y,
                      c=Y, cmap="rainbow", alpha=1)  # set style options
cbar = plt.colorbar(points1)
cbar.set_label('Y2')
But I got something like this:
The points of the graph show the relationship between X and Y. I want the Y2 relationship to be shown by the colorbar labelled on the right side of the plot. I expected it to look like this:
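If the goal is for the color of each point to encode Y2, a likely fix (a sketch, not checked against the expected image) is to pass Y2 rather than Y as the c argument; the position then comes from (X, Y) and the colorbar shows the Y2 scale:

```python
import matplotlib.pyplot as plt

X = [110, 120, 130, 140, 150]
Y = [0.1, 0.2, 0.3, 0.4, 0.5]
Y2 = [5, 4, 3, 2, 1]

fig, ax = plt.subplots()
ax.set_title('X vs Y vs Y2')
ax.set_xlabel('X')
ax.set_ylabel('Y')
# Position comes from (X, Y); color comes from Y2.
points = ax.scatter(X, Y, c=Y2, cmap='rainbow')
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('Y2')
```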

How to add a single marker in a bar graph

I have the following code generating a bar graph. However, I need a star marker on the last bar (number 10 in the graph) to show that there is no data for it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data.csv')
df = pd.DataFrame(data)
plt.figure(figsize=(3,2))
X = list(df.iloc[:, 0])
Y = list(df.iloc[:, 1])
Z= list(df.iloc[:, 2])
X_axis = np.arange(len(X))
plt.bar(X_axis - 0.2, Y, 0.4, label='Actual',color='#436bad')
plt.bar(X_axis + 0.2, Z, 0.4, label='Predicted',color='#c5c9c7')
plt.legend(loc=2, prop={'size': 6.5})
labels=['1','2','3','4','5','6','7','8','9','10']
plt.xticks(X,labels,rotation=60)
plt.xlabel("Node no")
plt.ylabel("Accuracy (%)")
plt.ylim(60,95)

You can use plt.text to put a * wherever you want, as below. (I can't run your code because I don't have your data, so here is a standalone example.)
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1,3,7])
y = [2, 3, 2]
z = [1, 2, 3]
plt.bar(x-0.1, y, width=0.2, color='b', align='center')
plt.bar(x+0.1, z, width=0.2, color='g', align='center')
labels=[1,2,3,4,5,6,7,8,9,10]
plt.xticks(range(1,11),labels,rotation=60)
# every tick position without bars gets a star
str_x = [l for l in labels if l not in x]
for s_x in str_x:
    plt.text(s_x, 0.1, '*', ha='center', fontsize=26)
Output:
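Another option, closer to the question's layout, is to draw the star as a real marker with scatter(marker='*'). A sketch with made-up numbers standing in for data.csv (the values, the has_data mask and the star's y position of 62 are all hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for data.csv: ten nodes, the last one without data.
nodes = np.arange(10)
actual = np.array([80, 82, 78, 85, 88, 84, 79, 90, 86, 0.0])
predicted = np.array([79, 83, 80, 84, 87, 85, 81, 89, 85, 0.0])
has_data = nodes < 9  # hypothetical: node 10 (index 9) has no data

fig, ax = plt.subplots(figsize=(3, 2))
ax.bar(nodes[has_data] - 0.2, actual[has_data], 0.4, label='Actual', color='#436bad')
ax.bar(nodes[has_data] + 0.2, predicted[has_data], 0.4, label='Predicted', color='#c5c9c7')
ax.set_ylim(60, 95)

# Put a star near the bottom of the axes for every node without data.
missing = nodes[~has_data]
ax.scatter(missing, np.full(len(missing), 62), marker='*', s=80, color='k')

ax.set_xticks(nodes)
ax.set_xticklabels([str(i + 1) for i in nodes], rotation=60)
ax.legend(loc=2, prop={'size': 6.5})
```

Using the same positions for the bars and the ticks (nodes here) keeps the labels 1 to 10 aligned with the bar pairs.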

how to generate a series of histograms on matplotlib?

I would like to generate a series of histograms like the one shown below:
The above visualization was done in TensorFlow, but I'd like to reproduce the same visualization in matplotlib.
EDIT:
Using plt.fill_between as suggested by @SpghttCd, I have the following code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
plt.show()
This works great, but is it possible to use histogram instead of a continuous curve?

EDIT:
A joypy-based approach, as mentioned in the comment by october:
import numpy as np
import pandas as pd
import joypy
import matplotlib.cm as cm
df = pd.DataFrame()
for i in range(0, 400, 20):
    df[i] = np.random.normal(i/410*5, size=30)
joypy.joyplot(df, overlap=2, colormap=cm.OrRd_r, linecolor='w', linewidth=.5)
For finer control of the colors, you can define a color gradient function that accepts a fractional index plus start and stop color tuples:
def color_gradient(x=0.0, start=(0, 0, 0), stop=(1, 1, 1)):
    r = np.interp(x, [0, 1], [start[0], stop[0]])
    g = np.interp(x, [0, 1], [start[1], stop[1]])
    b = np.interp(x, [0, 1], [start[2], stop[2]])
    return (r, g, b)
Usage:
joypy.joyplot(df, overlap=2, colormap=lambda x: color_gradient(x, start=(.78, .25, .09), stop=(1.0, .64, .44)), linecolor='w', linewidth=.5)
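If you'd rather not write the interpolation by hand, matplotlib's LinearSegmentedColormap.from_list builds a colormap that interpolates linearly between the same start and stop colors. It is the same kind of Colormap object as cm.OrRd_r above, so it should be usable as joypy's colormap argument; the sketch below only checks the colors it returns against the linear formula used by color_gradient() (the name 'orange_gradient' is arbitrary):

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

start, stop = (.78, .25, .09), (1.0, .64, .44)
# A two-color colormap that interpolates linearly from `start` to `stop`.
gradient = LinearSegmentedColormap.from_list('orange_gradient', [start, stop])

for frac in (0.0, 0.5, 1.0):
    rgb = gradient(frac)[:3]  # colormaps return RGBA; drop the alpha channel
    # Same linear interpolation as color_gradient(), written inline.
    expected = tuple(s + frac * (e - s) for s, e in zip(start, stop))
    print(frac, np.round(rgb, 2), np.round(expected, 2))
```

The colormap uses a 256-entry lookup table by default, so intermediate colors can differ from the exact interpolation in the third decimal place.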
Examples with different start and stop tuples:
Original answer:
You could iterate over the data arrays you'd like to plot and draw each one with plt.fill_between, setting the fill colors to a gradient and the line color to white:
Creating some sample data:
import numpy as np
t = np.linspace(-1.6, 1.6, 11)
y = np.cos(t)**2
y2 = lambda : y + np.random.random(len(y))/5-.1
Plotting the series:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
for i in range(10):
    plt.fill_between(t+i, y2()+10-i/10, 10-i/10, facecolor=colors[i], edgecolor='w')
If you want this tailored more closely to your example, please consider providing some sample data.
EDIT:
As I commented below, I'm not sure I understand exactly what you want, or whether it is the best choice for your task. So here is code that plots, alongside the approach from your edit, two examples of presenting a set of histograms so that they are easier to compare:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm
N = 10
np.random.seed(42)
colors = cm.OrRd_r(np.linspace(.2, .6, N))
fig1 = plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
data = np.random.binomial(20, .3, (N, 100))
fig2, axs = plt.subplots(N, figsize=(10, 6))
for i, d in enumerate(data):
    axs[i].hist(d, range(20), color=colors[i], label=str(i))
fig2.legend(loc='upper center', ncol=5)
fig3, ax = plt.subplots(figsize=(10, 6))
ax.hist(data.T, range(20), color=colors, label=[str(i) for i in range(N)])
fig3.legend(loc='upper center', ncol=5)
This leads to the following plots:
your plot from your edit:
N histograms in N subplots:
N histograms side by side in one plot:
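To get the stacked look from the question with histograms instead of a continuous curve, one option is to keep fill_between but pass step='post', so each layer is drawn as a step histogram. A minimal sketch, assuming shared bin edges for all rows and each histogram scaled to a peak height of 1 (ridgeline_hist and its spacing parameter are hypothetical names chosen here):

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

def ridgeline_hist(data, bins=20, spacing=0.7):
    """Draw one filled step histogram per row of `data`, stacked top to bottom."""
    n_rows = len(data)
    colors = cm.OrRd_r(np.linspace(.2, .6, n_rows))
    edges = np.histogram_bin_edges(data, bins=bins)  # same bins for every row
    fig, ax = plt.subplots()
    for i, row in enumerate(data):
        counts, _ = np.histogram(row, bins=edges)
        heights = counts / counts.max()  # scale each histogram to a peak of 1
        baseline = (n_rows - 1 - i) * spacing  # first row at the top
        # step='post' needs one value per edge, so repeat the last bin's height
        top = np.append(heights, heights[-1]) + baseline
        ax.fill_between(edges, top, baseline, step='post',
                        facecolor=colors[i], edgecolor='w')
    ax.set_yticks([])
    return fig, ax

np.random.seed(42)
data = np.random.binomial(20, .3, (10, 100))
fig, ax = ridgeline_hist(data, bins=np.arange(21))
```

With spacing below 1 the layers overlap, and since lower rows are drawn later they sit in front, similar to the joypy output.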

Drawing heat map in python

I have two lists, x and y, representing coordinates in 2D. For example x = [1,4,0.5,2,5,10,33,0.04] and y = [2,5,44,0.33,2,14,20,0.03]. x[i] and y[i] represent one point in 2D. I also have a list of "heat" values for each (x, y) point, for example z = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8,0.95]. Of course, x, y and z are much longer than in this example.
Now I would like to plot a heat map in 2D where x and y represents the axis coordinates and z represents the color. How can this be done in python?

This code produces a heat map. With a few more data points, the plot starts looking pretty nice and I've found it to be very quick in general even for >100k points.
import matplotlib.pyplot as plt
import matplotlib.tri as tri
import numpy as np
x = [1,4,0.5,2,5,10,33,0.04]
y = [2,5,44,0.33,2,14,20,0.03]
z = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8, 0.95]
levels = [0.7, 0.75, 0.8, 0.85, 0.9]
plt.figure()
ax = plt.gca()
ax.set_aspect('equal')
CS = ax.tricontourf(x, y, z, levels, cmap=plt.get_cmap('jet'))
cbar = plt.colorbar(CS, ticks=np.sort(np.array(levels)),ax=ax, orientation='horizontal', shrink=.75, pad=.09, aspect=40,fraction=0.05)
cbar.ax.set_xticklabels(list(map(str,np.sort(np.array(levels))))) # horizontal colorbar
cbar.ax.tick_params(labelsize=8)
plt.title('Heat Map')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.show()
Produces this image:
Or, if you're looking for a more gradual color change, change the tricontourf line to this:
CS = ax.tricontourf(x, y, z, np.linspace(min(levels), max(levels), 256), cmap=plt.get_cmap('jet'))
and then the plot will change to:

Based on this answer, you might want to do something like:
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
xs0 = [1,4,0.5,2,5,10,33,0.04]
ys0 = [2,5,44,0.33,2,14,20,0.03]
zs0 = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8,0.95]
N = 30j
extent = (np.min(xs0), np.max(xs0), np.min(ys0), np.max(ys0))
xs, ys = np.mgrid[extent[0]:extent[1]:N, extent[2]:extent[3]:N]
# matplotlib.mlab.griddata has been removed from newer matplotlib; scipy's griddata takes (points, values, xi)
resampled = griddata((xs0, ys0), zs0, (xs, ys), method='linear')
plt.imshow(np.fliplr(resampled).T, extent=extent, interpolation='none')
plt.colorbar()
The example here might also help: http://matplotlib.org/examples/pylab_examples/griddata_demo.html
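If you don't need contour bands or a regular grid, ax.tripcolor can color the Delaunay triangulation of the scattered points directly, and shading='gouraud' interpolates the z values smoothly across each triangle. A sketch using the example lists from the question; with so few, widely spread points the result will look coarse:

```python
import matplotlib.pyplot as plt

x = [1, 4, 0.5, 2, 5, 10, 33, 0.04]
y = [2, 5, 44, 0.33, 2, 14, 20, 0.03]
z = [0.77, 0.88, 0.65, 0.55, 0.89, 0.9, 0.8, 0.95]

fig, ax = plt.subplots()
# Triangulate the (x, y) points and color by z, interpolated within each triangle.
mesh = ax.tripcolor(x, y, z, shading='gouraud', cmap='viridis')
ax.plot(x, y, 'k.', markersize=3)  # mark the original sample points
fig.colorbar(mesh, ax=ax, label='z')
```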

scatter plot with aligned annotations at each data point

I want to produce a scatter plot with dozens of points, which could potentially be very close to each other. I've tried the method of annotation from the answer to the question:
>> matplotlib scatter plot with different text at each data point
but you can see that the labels / annotations overlap when the points are close enough to each other. Is there any library or method to generate such plots with individual annotations that don't collide with each other, nor with borders of the plot?
import matplotlib.pyplot as plt
z = [0.15, 0.3, 0.45, 0.46, 0.6, 0.75]
y = [2.56422, 3.77284, 3.52623, 3.52623, 3.51468, 3.02199]
n = [58, 651, 393, "393(2)", 203, 123]
fig, ax = plt.subplots()
ax.scatter( z, y )
for i, txt in enumerate( n ):
    ax.annotate( txt, ( z[i] + .01, y[i] + .01 ) )
plt.show()

I have written a library adjustText which does exactly this. https://github.com/Phlya/adjustText
import matplotlib.pyplot as plt
from adjustText import adjust_text
z = [0.15, 0.3, 0.45, 0.46, 0.6, 0.75]
y = [2.56422, 3.77284, 3.52623, 3.52623, 3.51468, 3.02199]
n = [58, 651, 393, "393(2)", 203, 123]
fig, ax = plt.subplots()
ax.scatter( z, y )
texts = []
for i, txt in enumerate( n ):
    texts.append(ax.text(z[i], y[i], txt))
adjust_text(texts)
plt.show()
