I'm trying to visualize data where each X value has multiple Y values, and I would like to distinguish each Y value visually. This is the example code:
import matplotlib.pyplot as plt
xLables = ['A1','A2','A3','A4','A5']
YValues = [[1,2,3,4],[1,2,3,4,5,6,7],[1,2,3],[5,6,7],[1,2,3]]
X = [xLables[i] for i, data in enumerate(YValues) for j in range(len(data))]
Y = [val for data in YValues for val in data]
plt.scatter(X, Y)
plt.grid()
plt.show()
When I plot this, I get the attached plot.
Each X label has its own Y values. For example, A1 has 1, 2, 3, 4; A2 has 1, 2, 3, 4, 5, 6, 7; and so on.
I have two questions:
(1) Can I have different markers for different Y values, e.g. all 1's as stars, all 2's as diamonds, all 10's as circles?
Something like this, maybe:
(2) Is there a better way to plot such 2D data and distinguish the values when each X has multiple Y values?
Any suggestions or help are highly appreciated.
Thanks
I tried adding markers and different colors, but they apply to all Y values of each X, not to specific Y values.
My solution is rather ad hoc, but it replicates your target drawing using your data, so I feel confident posting it here.
import matplotlib.pyplot as plt
labels = ['A1','A2','A3','A4','A5']
Y2D = [[1,2,3,4],[1,2,3,4,5,6,7],[1,2,3],[5,6,7],[1,2,3]]
# prepare a dictionary with the characteristics
# we want to change according to the Y value
d = {1:dict(marker="*", s=150, color="red"),
     2:dict(marker="o", s=100, color="yellow"),
     3:dict(marker="o", s= 60, color="blue"),
     4:dict(marker="o", s=100, color="green"),
     5:dict(marker="o", s=100, color="red"),
     6:dict(marker="*", s=150, color="blue"),
     7:dict(marker="o", s=100, color="lightgray")}
# an outer loop on the abscissae and the lists of Y values
for x, ys in zip(labels, Y2D):
    # an inner loop on the Y values, plotted separately
    for y in ys:
        # here the point is to unpack the values contained
        # in the "inner" dictionary, addressing the outer by Y
        # zorder=5 places the dots above the grid
        plt.scatter(x, y, ec='k', zorder=5, **d[y])
plt.grid(1)
plt.show()
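A variant of the same idea, sketched under the assumption that a legend is wanted: grouping the points by Y value first gives one scatter() call, and therefore one legend entry, per value. The styles dictionary mirrors d (with a diamond for 2, as the question asked). Using numeric x positions with set_xticks/set_xticklabels is an assumption of this sketch, made to keep A1..A5 in their original order, because a categorical axis orders labels by first appearance and the value-1 group does not contain A4.

```python
import matplotlib.pyplot as plt

labels = ['A1', 'A2', 'A3', 'A4', 'A5']
Y2D = [[1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3], [5, 6, 7], [1, 2, 3]]

# one style per Y value, in the spirit of the dictionary d above (customize freely)
styles = {1: dict(marker="*", s=150, color="red"),
          2: dict(marker="D", s=80, color="yellow"),
          3: dict(marker="o", s=60, color="blue"),
          4: dict(marker="o", s=100, color="green"),
          5: dict(marker="o", s=100, color="red"),
          6: dict(marker="*", s=150, color="blue"),
          7: dict(marker="o", s=100, color="lightgray")}

# group the x positions by Y value: {y: [index of every label whose list contains y]}
groups = {}
for i, ys in enumerate(Y2D):
    for y in ys:
        groups.setdefault(y, []).append(i)

fig, ax = plt.subplots()
for y in sorted(groups):
    xs = groups[y]
    # one scatter call, and therefore one legend entry, per Y value
    ax.scatter(xs, [y] * len(xs), edgecolors='k', zorder=5, label=str(y), **styles[y])

# numeric positions plus tick labels keep A1..A5 in their original order
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.grid(True)
ax.legend(title="Y value")
plt.show()
```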
I think the easiest way is to use one plt.scatter() call per Y value.
import matplotlib.pyplot as plt
xLables = ['A1','A2','A3','A4','A5']
YValues = [[1,2,3,4],[1,2,3,4,5,6,7],[1,2,3],[5,6,7],[1,2,3]]
markers = ['.', 'o', '^', 'v', '>', '<', '*']  # to be customized
Y = [None for i in range(len(xLables))]
for y in range(len(markers)):
    for x in range(len(xLables)):
        Y[x] = y+1 if y+1 in YValues[x] else None  # values start at 1; None = no point
    if any(Y):  # something to display?
        plt.scatter(xLables, Y, marker=markers[y])
plt.grid()
plt.show()
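For question (2): if colour is acceptable instead of marker shape, a single scatter() call can encode the Y value through the c argument and a colormap, with a colorbar as the key. A minimal sketch; the 'viridis' colormap, the marker size and the black edges are arbitrary choices, not something the question specified:

```python
import matplotlib.pyplot as plt

xLables = ['A1', 'A2', 'A3', 'A4', 'A5']
YValues = [[1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3], [5, 6, 7], [1, 2, 3]]

# flatten into parallel lists, as in the question
X = [xLables[i] for i, data in enumerate(YValues) for _ in data]
Y = [val for data in YValues for val in data]

fig, ax = plt.subplots()
# c=Y maps each point's Y value to a colour; one call draws every point
sc = ax.scatter(X, Y, c=Y, cmap='viridis', s=80, edgecolors='k', zorder=5)
fig.colorbar(sc, ax=ax, label='Y value')
ax.grid(True)
plt.show()
```

Colour scales well to many distinct values, whereas a handful of marker shapes is easier to tell apart in print or for colour-blind readers.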
Add your marker types to a list and iterate over them, one point at a time.
from matplotlib import pyplot as plt
import matplotlib
xLables = ['A1','A2','A3','A4','A5']
YValues = [[1,2,3,4],[1,2,3,4,5,6,7],[1,2,3],[5,6,7],[1,2,3]]
X = [xLables[i] for i, data in enumerate(YValues) for j in range(len(data))]
Y = [val for data in YValues for val in data]
plt.scatter(X, Y, marker=matplotlib.markers.CARETDOWNBASE)
plt.grid()
markers = ['8', '+', '.', 'o', '*', '^', 's', 'p', 'h']
# one marker per point, in plotting order (reused cyclically when there are more points than markers)
for i in range(len(X)):
    plt.plot(X[i], Y[i], marker=markers[i % len(markers)])
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.show()
Note: reorder the markers list however you like; the points pick up their markers in that order.
You can set markers with the marker argument.
A list of them can be found here: https://matplotlib.org/stable/api/markers_api.html
I think you can use it like this:
plt.scatter([1, 2, 3], [1, 1, 1], marker=11), which means that you'll have to put the values you want to share a marker in the same list. As far as I know, scatter() accepts only one marker per call, not a list of markers.
As far as I understand, your code would look something like this, with one call per Y value:
plt.scatter(['A1', 'A2', 'A3', 'A5'], [1, 1, 1, 1], marker='*')  # all 1's
plt.scatter(['A1', 'A2', 'A3', 'A5'], [2, 2, 2, 2], marker='D')  # all 2's
plt.scatter(['A1', 'A2', 'A3', 'A5'], [3, 3, 3, 3], marker='o')  # all 3's
plt.scatter(['A1', 'A2'], [4, 4], marker='s')  # all 4's
You could turn this into a for loop if it has to work for different sizes, but I suppose you'll figure it out from here.
Good luck, I hope this helps.
Background:
I have a list_of_x_and_y_list that contains x and y values and looks like this:
[[(44800, 14888), (132000, 12500), (40554, 12900)], [(None, 193788), (101653, 78880), (3866, 160000)]]
I have another data_name_list ["data_a","data_b"] so that
"data_a" = [(44800, 14888), (132000, 12500), (40554, 12900)]
"data_b" = [(None, 193788), (101653, 78880), (3866, 160000)]
The length of list_of_x_and_y_list (and of data_name_list) is greater than 20.
Question:
How can I create a scatter plot in which each item in data_name_list is drawn in its own single colour?
What I have tried:
import matplotlib.pyplot as plt
list_of_x_and_y_list = [[(44800, 14888), (132000, 12500), (40554, 12900)],
                        [(None, 193788), (101653, 78880), (3866, 160000)]]
data_name_list = ["data_a", "data_b"]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax = plt.axes(facecolor='#FFFFFF')
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
print(list_of_x_and_y_list)
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    for x_and_y in x_and_y_list:
        print(x_and_y)
        x, y = x_and_y
        ax.scatter(x, y, label=data_name, color=color)  # "label=data_name" creates
                                                        # a huge list as a legend!
                                                        # :(
plt.title('Matplot scatter plot')
plt.legend(loc=2)
file_name = "3kstc.png"
fig.savefig(file_name, dpi=fig.dpi)
print("Generated: {}".format(file_name))
The Problem:
The legend appears to be a very long list, which I don't know how to rectify:
Relevant Research:
Matplotlib scatterplot
Scatter Plot
Scatter plot in Python using matplotlib
The reason you get a long, repeated list as a legend is that you are providing each point as a separate series; matplotlib does not automatically group your data based on the labels.
A quick fix is to iterate over the list and use zip(*data) to split each series into two tuples, so that the x tuple contains all the x-values and the y tuple all the y-values.
Then you can pass these tuples to ax.scatter together with the label.
I felt that names like list_of_x_and_y_list were unnecessarily long and complicated, so in my code I've used shorter names.
import matplotlib.pyplot as plt
data_series = [[(44800, 14888), (132000, 12500), (40554, 12900)],
               [(None, 193788), (101653, 78880), (3866, 160000)]]
data_names = ["data_a","data_b"]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.set_facecolor('#FFFFFF')
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
for data, data_name, color in zip(data_series, data_names, colors):
    x, y = zip(*data)  # split the (x, y) pairs into an x tuple and a y tuple
    ax.scatter(x, y, label=data_name, color=color)
plt.title('Matplot scatter plot')
plt.legend(loc=1)
plt.show()
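One caveat worth checking, since the question mentions more than 20 series: zip() stops at its shortest input, and the default property cycle has only 10 colours, so only the first 10 series would be drawn. A sketch that repeats the colours with itertools.cycle; the 25 generated series are made-up test data:

```python
import itertools
import matplotlib.pyplot as plt

# made-up test data: 25 small series, more than the 10 colours of the default cycle
data_series = [[(i, i + j) for j in range(3)] for i in range(25)]
data_names = ["data_{}".format(i) for i in range(25)]

colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

fig, ax = plt.subplots()
# itertools.cycle repeats the colour list, so zip() is limited by data_series, not by colors
for data, data_name, color in zip(data_series, data_names, itertools.cycle(colors)):
    x, y = zip(*data)
    ax.scatter(x, y, label=data_name, color=color)
ax.legend(loc=1, ncol=2, fontsize='small')
plt.show()
```

Repeated colours are still ambiguous in the legend; if every series needs a distinct colour, a larger qualitative colormap such as tab20 is one option, although with many more series colour alone stops being a reliable key.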
To get only one legend entry per data_name, pass data_name as the label only once; all the other calls should use label=None.
The simplest way to achieve this with the current code is to set data_name to None right after the scatter call in the inner loop:
from matplotlib import pyplot as plt
from random import randint
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.set_facecolor('#FFFFFF')
# create some random data, suppose the sublists have different lengths
list_of_x_and_y_list = [[(randint(1000, 4000), randint(2000, 5000)) for col in range(randint(2, 10))]
                        for row in range(10)]
data_name_list = list('abcdefghij')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    for x_and_y in x_and_y_list:
        x, y = x_and_y
        ax.scatter(x, y, label=data_name, color=color)
        data_name = None  # later points of this series get no legend entry
plt.legend(loc=2)
plt.show()
Some things can be simplified, making the code 'more pythonic', for example:
for x_and_y in x_and_y_list:
    x, y = x_and_y
can be written as:
for x, y in x_and_y_list:
Another issue is that, with a lot of data, calling scatter for every point can be rather slow. All the x and y values belonging to the same list can be plotted together, for example using list comprehensions:
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    xs = [x for x, y in x_and_y_list]
    ys = [y for x, y in x_and_y_list]
    ax.scatter(xs, ys, label=data_name, color=color)
scatter can even take a list of colors, one per point, but plotting all the points in one go wouldn't allow a separate legend label per data_name.
Very often, numpy is used to store numerical data. This has some advantages, such as vectorization for quick calculations. With numpy the code would look like:
import numpy as np
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    xys = np.array(x_and_y_list)
    ax.scatter(xys[:,0], xys[:,1], label=data_name, color=color)
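Another common way to get one legend entry per data_name, sketched here as an alternative to the label=None trick: keep labelling every point, then collapse the duplicates with ax.get_legend_handles_labels() before building the legend. This assumes all points that share a label are styled identically, so it does not matter which of their handles survives:

```python
from random import randint
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# random data as in the answer above: 10 series with 2 to 10 points each
list_of_x_and_y_list = [[(randint(1000, 4000), randint(2000, 5000)) for _ in range(randint(2, 10))]
                        for _ in range(10)]
data_name_list = list('abcdefghij')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    for x, y in x_and_y_list:
        ax.scatter(x, y, label=data_name, color=color)  # every point is labelled

# collapse duplicate labels: the dict keeps one handle per label, keys in first-seen order
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
ax.legend(unique.values(), unique.keys(), loc=2)
plt.show()
```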
I'm trying to make a 2d plot using pyplot. I'm reading in a file with several columns, each of which contains around 100 values between 1 and 10. I'm plotting column 5 against column 6, which is fine.
What I also want to do is label the resulting line with the integer values from column 0. So the line will have 11 marked points, at the positions (x, y) where column 0 is an integer, and I'd like those points to be labelled with that integer.
I'd really appreciate any help with this; it's driving me crazy!
From your question, I'm not 100% clear exactly what you're wanting to do.
Do you just want to label every vertex in a line? Or do you only want to label vertices where the value is an integer? Or do you want to interpolate where integer "crossings" would lie along the line and label those?
First off, for loading your text file, look into numpy.loadtxt, if you aren't using it already. In your particular case, you could do something like:
z, x, y = np.loadtxt('data.txt', usecols=[0, 5, 6]).T
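A small self-contained check of that loadtxt call, using an in-memory file (io.StringIO) with made-up numbers in seven columns. Passing unpack=True is an equivalent alternative to the trailing .T:

```python
import io
import numpy as np

# made-up two-row file with seven columns (0..6); only columns 0, 5 and 6 are read
text = io.StringIO("1.0 0 0 0 0 2.5 3.5\n"
                   "2.0 0 0 0 0 4.5 5.5\n")
# unpack=True returns one array per selected column, like the .T above
z, x, y = np.loadtxt(text, usecols=(0, 5, 6), unpack=True)
print(z, x, y)
```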
At any rate, as a quick example of the simplest option (labeling every vertex):
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(10)
y = 2 * x
z = x ** 2
fig, ax = plt.subplots()
ax.plot(x, y, 'bo-')
for X, Y, Z in zip(x, y, z):
    # Annotate the points 5 _points_ above and to the left of the vertex
    ax.annotate('{}'.format(Z), xy=(X,Y), xytext=(-5, 5), ha='right',
                textcoords='offset points')
plt.show()
Now, for the second option, we might have something more like this (similar to what @mathematical.coffee suggested):
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-0.6, 5.6, 0.2)
y = 2 * x
z = x**2
fig, ax = plt.subplots()
ax.plot(x, y, 'bo-')
# Note the threshold... I'm assuming you want 1.000001 (or 0.999999) to be considered an int.
# Otherwise, you'd use "z % 1 == 0", but beware exact float comparisons!!
# Measuring the distance to the nearest integer also catches values just below it (e.g. 3.9999999).
integers = np.abs(z - np.round(z)) < 1e-6
for (X, Y, Z) in zip(x[integers], y[integers], z[integers]):
    ax.annotate('{:.0f}'.format(Z), xy=(X,Y), xytext=(-10, 10), ha='right',
                textcoords='offset points',
                arrowprops=dict(arrowstyle='->', shrinkA=0))
plt.show()
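The third option (interpolating where z crosses each integer) is not shown above, so here is a minimal sketch of it. It assumes z increases monotonically along the line, because np.interp requires increasing sample points; the data are made up, and the positions are linear interpolations between vertices, which matches the straight segments that plot draws:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 7)      # vertices at 0, 0.5, ..., 3
y = 2 * x
z = x ** 2                    # assumed: z increases monotonically along the line

# every integer level that falls inside the range of z
levels = np.arange(np.ceil(z.min()), np.floor(z.max()) + 1)
# np.interp needs increasing sample points, hence the monotonic assumption on z
xi = np.interp(levels, z, x)
yi = np.interp(levels, z, y)

fig, ax = plt.subplots()
ax.plot(x, y, 'b-')
ax.plot(xi, yi, 'ro')
for X, Y, Z in zip(xi, yi, levels):
    ax.annotate('{:.0f}'.format(Z), xy=(X, Y), xytext=(-10, 10), ha='right',
                textcoords='offset points')
plt.show()
```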
I actually want to recreate an image like the following:
Especially the little X markers on the x-axis.
I have a list like this:
list = [[100,-3],[200,None],[120,-2] ... ]
and I do:
for x in list:
    if x[1] is not None:
        plot(x[0],x[1],'ok')
    else:
        ### Plot on the axes ###
        pass
But while I am plotting, I do not know what the axis limits will be. I know that some values are None, for example (250, None), so I want to plot on the x-axis at x = 250, but I have no idea what min(ylim()) will eventually be.
I know I can do plot(250,-5,'X',zorder=999999), but only when I know where the bottom of the axis is. (I cannot compute the min and max in advance, because the real data is a list inside a list inside a dictionary, etc.)
The trick is to use a custom transformation: the regular data transformation for the x axis and the axes transformation for the y axis. Matplotlib calls this a blended transformation, which you can build with transforms.blended_transform_factory. You'll find more information in the Matplotlib transformations tutorial.
And as @ThePredator already pointed out, you have to set clip_on=False; otherwise your markers will be clipped.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
fig, ax = plt.subplots()
# the x coords of this transformation are data, and the
# y coords are axes
trans = transforms.blended_transform_factory(ax.transData, ax.transAxes)
# data points on the axes
x = np.random.rand(5)*100. + 200.
y = [0]*5
ax.plot(x, y, 'kx', transform=trans, markersize=10, markeredgewidth=2,
        clip_on=False)
# regular data
x = np.random.rand(5)*100. + 200.
y = np.random.rand(5)*100. + 200.
ax.plot(x, y, 'ro')
plt.show()
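Matplotlib also exposes this particular blend ready-made: ax.get_xaxis_transform() returns a transform with x in data coordinates and y in axes coordinates. A sketch applied to data shaped like the question's list (the values are made up), drawing the None entries as red crosses on the bottom edge:

```python
import matplotlib.pyplot as plt

# made-up data in the question's format: [x, y] pairs where y may be None
points = [[100, -3], [200, None], [120, -2], [250, None], [150, -4]]

fig, ax = plt.subplots()
# x in data coordinates, y in axes coordinates (0 = bottom edge, 1 = top edge)
trans = ax.get_xaxis_transform()

for x, y in points:
    if y is not None:
        ax.plot(x, y, 'ok')
    else:
        # y=0 in axes coordinates stays on the bottom edge whatever the final ylim is
        ax.plot(x, 0, 'rx', transform=trans, markersize=10, markeredgewidth=2,
                clip_on=False)
plt.show()
```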
You can use the clip_on=False option. In your case, you can also set your y limits explicitly. Example:
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
y = [0,0,0,0,0,0]
plt.plot(x,y,'x',markersize=20,clip_on=False,zorder=100)
plt.ylim(0,1)
plt.show()
You can use get_ylim() to get the current y limits, i.e. where the x-axis sits, and then plot on it.
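A sketch of that get_ylim() approach, assuming all the known points can be plotted before the missing ones. The limits are then frozen with set_ylim() so that adding the new markers cannot trigger a rescale that moves the axis away from them; the data are made up:

```python
import matplotlib.pyplot as plt

points = [[100, -3], [200, None], [120, -2], [250, None], [150, -4]]  # made-up data

fig, ax = plt.subplots()
# 1) plot the known points first so autoscaling settles the y range
for x, y in points:
    if y is not None:
        ax.plot(x, y, 'ok')

# 2) read the current bottom of the y axis and freeze the limits
ymin, ymax = ax.get_ylim()
ax.set_ylim(ymin, ymax)

# 3) plot the points with a missing y on the bottom edge
for x, y in points:
    if y is None:
        ax.plot(x, ymin, 'rx', markersize=10, markeredgewidth=2, clip_on=False)
plt.show()
```

Unlike the blended transform, this only works once the final y range is known, which is exactly the situation the question wanted to avoid.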
I have a dataset like this, where I have a set of values for xs and I plot the corresponding line graph with the values of ys.
import numpy as np
import matplotlib.pyplot as plt
xs = np.array([1,2,5,6,9,10,11])
ys = pow(xs,2)
print(ys)
plt.plot(xs, ys, linestyle='-', marker='o')
plt.show()
As you can see, by default plot connects the points with a line. But I want the line to drop to 0 at the missing points. How do I do this? Should I manipulate the data to fill the missing values with zeros (with numpy, maybe), or is there a way to do this with matplotlib's plot?
To be precise, I need to plot:
xs = np.array([1,2,0,0,5,6,0,0,9,10,11,0,0,0,0])
ys = pow(xs,2)
But as of now, my xs is np.array([1,2,5,6,9,10,11]). How do I fill in the missing elements in the range 1 to 15? I looked at masked_array, which is something different. Is there any other fill option in numpy?
Since you want to plot points that aren't in your data set, it will be hard to do directly in matplotlib. But constructing the points is easy enough with numpy's put:
import numpy as np
import matplotlib.pyplot as plt
xs = np.array([1,2,5,6,9,10,11])
ys = xs**2
x = np.arange(12)
y = np.zeros(12, dtype=np.int32)
np.put(y, xs, ys)  # y[xs] = ys, every other position stays 0
plt.plot(x, y, 'o', clip_on=False)
plt.show()
If you aren't dealing with an integer X axis, you can do this:
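import numpy as np
import matplotlib.pyplot as plt
xs = np.array([1.0,2,5,6,9,10,11])
ys = xs**2
x = np.arange(0, 12, 0.5)
y = np.zeros(x.shape)
# mask is 1 at grid points equal to a value in xs (searchsorted's index steps up just after them)
mask = np.r_[np.diff(np.searchsorted(xs, x)), 0]
y[mask == 1] = ys
plt.plot(x, y, 'o', clip_on=False)
plt.show()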
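For the integer case, plain fancy indexing is an equivalent alternative to np.put on a 1-D array, and a '-o' format keeps the connecting line the question asked for, so the line drops to 0 wherever a value is missing. A sketch using a 0 to 15 grid (the grid length is an assumption based on the question's "range 1:15"):

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.array([1, 2, 5, 6, 9, 10, 11])
ys = xs ** 2

# full integer grid 0..15 filled with zeros, then the known positions overwritten
x = np.arange(16)
y = np.zeros_like(x)
y[xs] = ys          # same effect as np.put(y, xs, ys) for a 1-D array

# '-o' keeps the connecting line, which now drops to 0 at the missing points
plt.plot(x, y, '-o', clip_on=False)
plt.show()
```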