I have a data set of clusters, and the output looks something like this:
1 [1,2]
2 [1,6]
1 [2,4]
Here 1, 2, ... are the cluster ids and [1,2] and so on are the points. I want to plot each point with its x coordinate and y coordinate on the two axes, label each point in the graph with its cluster id, and give points from different clusters different colors. How do I go about it?
Thanks
If one axis is the cluster id I don't get how you fit both the x and y coordinates onto the other axis. So I plotted the x,y on the x and y axis and used the cluster id as a label; you can swap which value goes into which axis, I guess:
import matplotlib.pyplot as plt
from ast import literal_eval

data = """1 [1,2]
2 [1,6]
1 [2,4]"""

def cluster_color(clusternum):
    '''Some kind of map of cluster ID to a color'''
    if clusternum == 1:
        return [(1, 0, 0)]
    if clusternum == 2:
        return [(0, 0, 1)]
    else:
        return [(0, 1, 0)]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

def datalines(pt):
    '''Pick the coordinates and label out of this text format'''
    label, xy = pt.split(' ')
    xy = literal_eval(xy)
    assert(len(xy) == 2)
    return xy[0], xy[1], label

for pt in data.splitlines():
    if len(pt) > 0:
        x, y, cluster = datalines(pt)
        ax.scatter(x, y, c=cluster_color(float(cluster)))
        ax.text(x + .01, y + .01, cluster)

# plt.show() blocks until the window is closed, so the figure stays visible in a script
plt.show()
N.b.: if you have a lot of data, don't call scatter for each point separately; append x, y , cluster to three separate lists and scatter the lists.
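The batching suggested in the note could look roughly like the sketch below. It assumes the same text format as above; parse_clusters and scatter_clusters are hypothetical helper names, ax is assumed to be a Matplotlib Axes created by the caller, and the colour table simply mirrors cluster_color.

```python
from ast import literal_eval

# Assumed colour table, mirroring cluster_color() above
CLUSTER_COLORS = {1: (1, 0, 0), 2: (0, 0, 1)}
DEFAULT_COLOR = (0, 1, 0)


def parse_clusters(text):
    """Split the 'id [x,y]' lines into three parallel lists."""
    xs, ys, labels = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, xy = line.split(' ', 1)
        x, y = literal_eval(xy)
        xs.append(x)
        ys.append(y)
        labels.append(int(label))
    return xs, ys, labels


def scatter_clusters(ax, text):
    """Draw all points with one scatter call, then label each point."""
    xs, ys, labels = parse_clusters(text)
    colors = [CLUSTER_COLORS.get(label, DEFAULT_COLOR) for label in labels]
    ax.scatter(xs, ys, c=colors)
    for x, y, label in zip(xs, ys, labels):
        ax.text(x + 0.01, y + 0.01, str(label))
    return xs, ys, labels
```

With fig, ax = plt.subplots(), calling scatter_clusters(ax, data) followed by plt.show() draws all points in a single scatter call; the text labels are still added one at a time.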
I am trying to create a function in Python that draws a count plot of the x most frequent values in a column.
import matplotlib.pyplot as plt
import seaborn as sns

def counting(df, label, number):
    a = df[label].value_counts()[:number]
    b = a.to_frame()
    return b

def barplot(df, label, number = 3):
    fig = plt.figure() #Plotting the graph
    fig.set_size_inches([10, 5]) #Size of the figure
    sns.countplot(x = label, data=counting(df, label, number))
    plt.title(label)
    plt.show()
    return
I am seeing the label's values on the x axis. But the values should go on the Y axis, and the label names should go on X axis.
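One possible way to get the names on the x axis and their counts on the y axis is to compute the counts first and draw them as plain bars. sns.countplot tallies rows itself, so handing it data that has already been counted is likely what puts the counts on the x axis. The sketch below swaps in collections.Counter for value_counts; top_counts and plot_top_counts are hypothetical helper names, and ax is assumed to be a Matplotlib Axes created by the caller.

```python
from collections import Counter


def top_counts(values, number=3):
    """Return the `number` most frequent values (as strings) and their counts."""
    pairs = Counter(values).most_common(number)
    names = [str(name) for name, _ in pairs]
    counts = [count for _, count in pairs]
    return names, counts


def plot_top_counts(ax, values, title, number=3):
    """Draw one bar per value: names on the x axis, counts on the y axis."""
    names, counts = top_counts(values, number)
    ax.bar(names, counts)
    ax.set_title(title)
    ax.set_ylabel("count")
    return names, counts
```

Assuming df and label as in the question, usage would be along the lines of fig, ax = plt.subplots(figsize=(10, 5)), then plot_top_counts(ax, df[label], label) and plt.show().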
I have some data as numpy arrays x, y, v as shown in the code below.
This is actually dummy data for velocity (v) of dust particles in a x-y plane.
I have binned my data into 4 bins and for each bin I have calculated mean of entries in each bin and made a heat map.
Now what I want to do is make a histogram/distribution of v in each bin with 0 as the centre of the histogram.
I do not want to plot the mean anymore, just want to divide my data into the same bins as this code and for each bin I want to generate a histogram of the values in the bins.
How should I do it?
I think this is a way to model the spectrum of an emission line from the gas particles. Any help is appreciated! Thanks!
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
x = np.array([-10,-2,4,12,3,6,8,14,3])
y = np.array([5,5,-6,8,-20,10,2,2,8])
v = np.array([4,-6,-10,40,22,-14,20,8,-10])
x_bins = np.linspace(-20, 20, 3)
y_bins = np.linspace(-20, 20, 3)
H, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins], weights = v)
pstat = stats.binned_statistic_2d(x, y, v, statistic='mean', bins = [x_bins, y_bins])
plt.xlabel("x")
plt.ylabel("y")
plt.imshow(pstat.statistic.T, origin='lower', cmap='RdBu',
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar().set_label('mean', rotation=270)
EDIT: Please note that my original data is huge. My arrays for x, y and v are very large and I am using a 30x30 grid, that is, not just 4 quadrants but 900 bins. I might also need to increase the number of bins. So I want a way to automatically divide the 'v' data into the regularly spaced bins and then plot a histogram of the 'v' data in each bin.
I would iterate over the zipped x and y, check which quadrant each point falls in, and append its v to that quadrant's list. After that, you can plot whatever you'd like:
x = np.array([-10,-2,4,12,3,6,8,14,3])
y = np.array([5,5,-6,8,-20,10,2,2,8])
v = np.array([4,-6,-10,40,22,-14,20,8,-10])
q1 = []
q2 = []
q3 = []
q4 = []
for i, (x1,y1) in enumerate(zip(x,y)):
    if x1<0 and y1>=0:
        q1.append(v[i])
    elif x1>=0 and y1>=0:
        q2.append(v[i])
    elif x1>=0 and y1<0:
        q3.append(v[i])
    elif x1<0 and y1<0:
        q4.append(v[i])
print(q1)
print(q2)
print(q3)
print(q4)
#[4, -6]
#[40, -14, 20, 8, -10]
#[-10, 22]
#[]
plt.hist(q1, density=True)
plt.hist(q2, density=True)
plt.hist(q3, density=True)
#q4 is empty
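For the 30x30 grid mentioned in the question's edit, one list per quadrant does not scale. A possible generalisation is sketched below, using np.digitize in place of the if/elif chain. group_by_bin is a hypothetical helper name, and in this sketch points outside the edges (including points exactly on the last edge) are simply dropped.

```python
import numpy as np


def group_by_bin(x, y, v, x_edges, y_edges):
    """Map each occupied (ix, iy) bin to the array of v values that fall in it."""
    x, y, v = np.asarray(x), np.asarray(y), np.asarray(v)
    nx, ny = len(x_edges) - 1, len(y_edges) - 1

    # Bin index along each axis; -1 or nx/ny means "outside the grid"
    ix = np.digitize(x, x_edges) - 1
    iy = np.digitize(y, y_edges) - 1
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    # Sort the values by flattened bin id, then cut the sorted array per bin
    flat = ix[inside] * ny + iy[inside]
    order = np.argsort(flat, kind="stable")
    sorted_flat = flat[order]
    sorted_v = v[inside][order]
    bin_ids, starts = np.unique(sorted_flat, return_index=True)
    chunks = np.split(sorted_v, starts[1:])
    return {(int(b // ny), int(b % ny)): chunk
            for b, chunk in zip(bin_ids, chunks)}


x = np.array([-10, -2, 4, 12, 3, 6, 8, 14, 3])
y = np.array([5, 5, -6, 8, -20, 10, 2, 2, 8])
v = np.array([4, -6, -10, 40, 22, -14, 20, 8, -10])
edges = np.linspace(-20, 20, 3)
groups = group_by_bin(x, y, v, edges, edges)
```

For the sample data, groups[(0, 1)], groups[(1, 1)] and groups[(1, 0)] hold the same values as q1, q2 and q3 above, and the empty quadrant has no entry. Each array can then be passed to plt.hist, for example with bins=np.linspace(-vmax, vmax, n) so the histogram is centred on 0; vmax and n are placeholders to choose for your data.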
If I have a Python MatPlotLib figure (for example, a matplotlib.axes._subplots.AxesSubplot object), is there a way to extract from it positions of the points and the error bars? I.e. I want to get arrays containing x,y-coordinates and y-errors.
Example:
import numpy as np
import seaborn as sb
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
p = sb.regplot(x=x, y=y, x_bins=10, fit_reg=None)
How can I extract from 'p' positions of the points and the error bars?
Thank you for your help!
The errorbar data are stored in p.lines, since seaborn plots them using plt.plot.
You can access their positions using line.get_xdata() and line.get_ydata().
The point data are stored in p.collections, since they are plotted internally in seaborn using plt.scatter.
Getting at the point positions from the PathCollection object takes one extra step, as shown in this answer: Get positions of points in PathCollection created by scatter(): i.e. you have to set the offset_position first, before accessing the offsets.
Here's an example to get both the point data and the errorbar data from the
matplotlib Axes object, p.
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
p = sb.regplot(x=x, y=y, x_bins=10, fit_reg=None)
# First, get the positions of the points:
coll = p.collections[0]
coll.set_offset_position('data')
points_xy = coll.get_offsets()
print(points_xy)
#[[-1.65295679 3.05723876]
# [-1.29981986 1.60258005]
# [-0.94417279 0.8999881 ]
# [-0.56964819 0.38035406]
# [-0.20253243 0.0774201 ]
# [ 0.15535504 0.024336 ]
# [ 0.5362322 0.30849082]
# [ 0.90482003 0.85788122]
# [ 1.26136841 1.66294418]
# [ 1.63048127 3.02934186]]
# Next, get the positions of the errorbars
xerr = []
yerr = []
for line in p.lines:
    xerr.append(line.get_xdata()[0])
    yerr.append(line.get_ydata().tolist())
print(xerr)
# [-1.6529567859649865, -1.2998198636006264, -0.94417278886439027, -0.56964818931133276, -0.20253243328132031, 0.15535504153419355, 0.53623219583456194, 0.90482002911787607, 1.2613684083224488, 1.6304812696399549]
print(yerr)
# [[2.908807029542707, 3.200571530218434], [1.4449980200239572, 1.751504207194087], [0.7633753040974505, 1.029774999216172], [0.26593411110949544, 0.4753543268237353], [-0.0030674495857816496, 0.15582564460187567], [-0.052610243112427575, 0.09899773706322114], [0.21019700161329888, 0.41120457637300634], [0.7328000635837721, 0.9826379405190817], [1.508513523393156, 1.8184617796582343], [2.885113765027557, 3.1670479251950376]]
plt.show()
Here points_xy is an array of (x,y) coordinates of the points, xerr is a list of the x-coordinates of the errorbars (which are, of course, the same as the x-coordinates in points_xy), and yerr is a list of pairs of y-coordinates: the bottom and top of each errorbar.
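If the extracted values are to be redrawn with plt.errorbar, the bottom/top pairs need to be turned into offsets from each point. A minimal sketch, assuming points_xy and yerr have the shapes printed above; to_errorbar_arrays is a hypothetical helper name.

```python
import numpy as np


def to_errorbar_arrays(points_xy, yerr_pairs):
    """Convert point positions and (bottom, top) pairs into x, y and a (2, N) error array."""
    points_xy = np.asarray(points_xy, dtype=float)
    pairs = np.asarray(yerr_pairs, dtype=float)
    x = points_xy[:, 0]
    y = points_xy[:, 1]
    lower = y - pairs.min(axis=1)   # distance from each point down to the bar's bottom
    upper = pairs.max(axis=1) - y   # distance from each point up to the bar's top
    return x, y, np.vstack([lower, upper])
```

The third return value has the lower offsets in its first row and the upper offsets in its second, which is the layout plt.errorbar accepts for asymmetric bars, e.g. plt.errorbar(x, y, yerr=err, fmt='o').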
If you know the points are in the center of the error bars (it looks like they are, for this example), then this should do it:
import numpy as np
import seaborn as sb
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
p = sb.regplot(x=x, y=y, x_bins=10, fit_reg=None)
def get_data(p):
    x_list = []
    lower_list = []
    upper_list = []
    for line in p.lines:
        x_list.append(line.get_xdata()[0])
        lower_list.append(line.get_ydata()[0])
        upper_list.append(line.get_ydata()[1])
    y = 0.5 * (np.asarray(lower_list) + np.asarray(upper_list))
    y_error = np.asarray(upper_list) - y
    x = np.asarray(x_list)
    return x, y, y_error

get_data(p)
Here the returned y_error will be the magnitude of the error bars.
This is my plot:
If I were to draw your attention to the axis labelled 'B' you'll see that everything is not as it should be.
The plot was produced using this:
def newPoly3D(self):
    from matplotlib.cm import autumn
    # This passes a pandas dataframe of shape (data on rows x 4 columns)
    df = self.loadData()
    fig = plt.figure(figsize=(10,10))
    ax = fig.gca(projection='3d')
    vels = [1.42,1.11,0.81,0.50]
    which_joints = df.columns
    L = len(which_joints)
    dmin,dmax = df.min().min(),df.max().max()
    dix = df.index.values
    offset=-5
    for i,j in enumerate(which_joints):
        ax.add_collection3d(plt.fill_between(dix,df[j],
                                             dmin,
                                             lw=1.5,
                                             alpha=0.3/float(i+1.),
                                             facecolor=autumn(i/float(L))),
                            zs=vels[i],
                            zdir='y')
    ax.grid(False)
    ax.set_xlabel('A')
    ax.set_xlim([0,df.index[-1]])
    ax.set_xticks([])
    ax.xaxis.set_ticklabels([])
    ax.set_axis_off
    ax.set_ylabel('B')
    ax.set_ylim([0.4, max(vels)+0.075])
    ax.set_yticks(vels)
    ax.tick_params(direction='out', pad=10)
    ax.set_zlabel('C')
    ax.set_zlim([dmin,dmax])
    ax.xaxis.labelpad = -10
    ax.yaxis.labelpad = 15
    ax.zaxis.labelpad = 15
    # Note the inversion of the axis
    plt.gca().invert_yaxis()
First, I want to align the ticks on the y axis (labelled B) with each coloured face. As you can see, they are now offset slightly downwards.
Second, I want to align the y axis tick labels with those ticks. As you can see, they are currently offset a long way downwards, and I do not know why.
EDIT:
Here is some example data; each column represents one coloured face on the above plot.
-13.216256 -7.851065 -9.965357 -25.502654
-13.216253 -7.851063 -9.965355 -25.502653
-13.216247 -7.851060 -9.965350 -25.502651
-13.216236 -7.851052 -9.965342 -25.502647
-13.216214 -7.851038 -9.965324 -25.502639
-13.216169 -7.851008 -9.965289 -25.502623
-13.216079 -7.850949 -9.965219 -25.502592
-13.215900 -7.850830 -9.965078 -25.502529
Here we are again, with a simpler plot, reproduced with this data:
k = 10
df = pd.DataFrame(np.array([range(k),
                            [x + 1 for x in range(k)],
                            [x + 4 for x in range(k)],
                            [x + 9 for x in range(k)]]).T,columns=list('abcd'))
If you want to try this with the above function, comment out the df line in the function and change its signature to def newPoly3D(df): so that you can pass in the test df above.
I have a three-dimensional array.
The first dimension has 4 elements.
The second dimension has 10 elements.
The third dimension has 5 elements.
I want to plot the contents of this array as follows.
Each element of the first dimension gets its own graph (four graphs on the page)
The values of the second dimension correspond to the y values of the graphs. (there are 10 lines on each graph)
The values of the third dimension correspond to the x values of the graphs (each of the 10 lines has 5 x values)
I'm pretty new to python, and even newer to graphing.
I figured out how to correctly load my array with the data...and I'm not even trying to get the 'four graphs on one page' aspect working.
For now I just want one graph to work correctly.
Here's what I have so far, once my array is set up and correctly loaded. Right now the graph shows up, but it's blank, and the x-axis includes negative values, even though none of my data is negative.
for n in range(1):
    for m in range(10):
        for o in range(5):
            plt.plot(quadnumcounts[n][m][o])
plt.xlabel("Trials")
plt.ylabel("Frequency")
plt.show()
Any help would be really appreciated!
Edit. Further clarification. Let's say my array is loaded as follows:
myarray[0][1][0] = 22
myarray[0][1][1] = 10
myarray[0][1][2] = 15
myarray[0][1][3] = 25
myarray[0][1][4] = 13
I want there to be a line, with the y values 22, 10, 15, 25, 13, and the x values 1, 2, 3, 4, 5 (since it's 0 indexed, I can just +1 before printing the label)
Then, let's say I have
myarray[0][2][0] = 10
myarray[0][2][1] = 17
myarray[0][2][2] = 9
myarray[0][2][3] = 12
myarray[0][2][4] = 3
I want that to be another line, following the same rules as the first.
Here's how to make the 4 plots with 10 lines in each.
import matplotlib.pyplot as plt

for i, fig_data in enumerate(quadnumcounts):
    # Set current figure to the i'th subplot in the 2x2 grid
    plt.subplot(2, 2, i + 1)
    # Set axis labels for current figure
    plt.xlabel('Trials')
    plt.ylabel('Frequency')
    for line_data in fig_data:
        # Plot a single line; x values run from 1 to len(line_data)
        xs = [k + 1 for k in range(len(line_data))]
        ys = line_data
        plt.plot(xs, ys)
# Now that we have created all plots, show the result
plt.show()
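As for why the loop in the question draws a blank graph: each plt.plot(quadnumcounts[n][m][o]) call receives a single number, which Matplotlib draws as a one-point line at x = 0. That is invisible without a marker, and the autoscaled x axis around 0 would also explain the negative tick values. Passing a whole row of five values per call gives one visible line. Below is a sketch of the same idea written against an Axes object; plot_lines is a hypothetical helper name, and the data for one graph is assumed to be convertible to a NumPy array of shape (lines, points).

```python
import numpy as np


def plot_lines(ax, graph_data):
    """Plot every row of a (lines, points) block as one line on the given Axes."""
    graph_data = np.asarray(graph_data)
    # x values 1..n_points, shared by all lines on this graph
    xs = np.arange(1, graph_data.shape[1] + 1)
    for ys in graph_data:
        ax.plot(xs, ys)
    ax.set_xlabel("Trials")
    ax.set_ylabel("Frequency")
    return xs
```

For the four graphs, something like fig, axes = plt.subplots(2, 2) followed by for ax, graph_data in zip(axes.flat, quadnumcounts): plot_lines(ax, graph_data) and a final plt.show() should give one subplot per element of the first dimension.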
Here is an example of creating subplots of your data. You have not provided the dataset, so I used x as an angle from 0 to 360 degrees and y as trigonometric functions of x (sine and cosine).
Code example:
import numpy as np
import pylab as plt
x = np.arange(0, 361) # 0 to 360 degrees
y = []
y.append(1*np.sin(x*np.pi/180.0))
y.append(2*np.sin(x*np.pi/180.0))
y.append(1*np.cos(x*np.pi/180.0))
y.append(2*np.cos(x*np.pi/180.0))
z = [[x, y[0]], [x, y[1]], [x, y[2]], [x, y[3]]] # 3-dimensional array
# plot graphs
for count, (x_data, y_data) in enumerate(z):
    plt.subplot(2, 2, count + 1)
    plt.plot(x_data, y_data)
    plt.xlabel('Angle')
    plt.ylabel('Amplitude')
    plt.grid(True)
plt.show()
Output:
UPDATE:
Using the sample data you provided in your update, you could proceed as follows:
import numpy as np
import pylab as plt
y1 = (10, 17, 9, 12, 3)
y2 = (22, 10, 15, 25, 13)
y3 = tuple(reversed(y1)) # generated for explanation
y4 = tuple(reversed(y2)) # generated for explanation
mydata = [y1, y2, y3, y4]
# plot graphs
for count, y_data in enumerate(mydata):
    x_data = range(1, len(y_data) + 1)
    print(x_data)
    print(y_data)
    plt.subplot(2, 2, count + 1)
    plt.plot(x_data, y_data, '-*')
    plt.xlabel('Trials')
    plt.ylabel('Frequency')
    plt.grid(True)
plt.show()
Note that the dimensions are slightly different from yours. Here they are such that mydata[0][0] == 10, mydata[1][3] == 25, etc. The output is shown below: