I need to build a graph from a group of files. My script and its output are below.
import sys
import matplotlib.pyplot as plt
import matplotlib.image as img
import pandas as pd
import numpy as np
import glob
df=ReadMultPRYFiles(f"/data/beegfs/projects/XOMG2201-FLD/databases/orient/RL53744.00/RL*RP_15*")
# Define variables
X = df['x num']
Y = df['y num']
z = df['value']
# Plot the x, y, and z coordinates as a scatter plot with color representing z
plt.scatter(X, Y, c=z, cmap='rainbow', s=20, marker = 's',zorder=10)
# Y ticks frequency
plt.yticks(np.arange(min(Y), max(Y), 10))
# Add labels to the x and y axes
plt.xlabel('REC_X')
plt.ylabel('REC_Y')
# display
plt.show()
All good, but I would like the Y axis to show only the values I actually have, from 15264 to 15808, without interpolated values or values outside the range. The interval may vary, unfortunately.
To have yticks only for the existing y values, change the following line
plt.yticks(np.arange(min(Y), max(Y), 10))
to
plt.yticks(Y.sort_values().tolist())
Performance Improvements
The answer above is a little inefficient. We only need the unique values on the Y axis, so the following does the same job more efficiently:
plt.yticks(np.sort(Y.unique()).tolist())
Here NumPy does the sorting, and the sorting and conversion to a list are performed only on the unique values.
plt.yticks(np.unique(Y))
as suggested by JohanC, also works well and quickly; np.unique already returns the unique values in sorted order.
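As a quick check of the np.unique approach, the sketch below uses made-up Y values (the sample numbers, the variable names and the Agg backend line are assumptions for illustration, not taken from the original data) and confirms that the ticks are exactly the distinct values present:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical y values with repeats and an irregular interval
Y = pd.Series([15264, 15296, 15264, 15808, 15520, 15296])
X = pd.Series(range(len(Y)))

# np.unique returns the distinct values, already sorted
ticks = np.unique(Y)

fig, ax = plt.subplots()
ax.scatter(X, Y, marker="s")
ax.set_yticks(ticks)  # one tick per value that actually occurs
```

With many distinct values the tick labels can still crowd each other, so this is most useful when the number of unique y values is small.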
EDIT: I responded in the comments, but I've tried the method in the marked post. My z data is not calculated from my x and y, so I can't use a function like that.
I have xyz data that looks like the below:
NEW: the xyz data in the file I produce; I extract these as x, y, z
I am desperately trying to get a plot that has x against y with z as the colour.
y is binned data that goes from (for instance) 2.5 to 0.5 in uneven bins, so the y values are all the same for one set of x and z data. The x data is temperature and the z data is density information.
So I'm expecting a plot that looks like a bunch of stacked rectangles, with a gradient of colour within each y bin, which spans many x values.
However, all the code I've tried doesn't like my z values, and the best I can do is:
The axes look right, but the colour bar goes from the bottom to the top of the y axis instead of plotting one z value for each x value at the correct y value.
I got this to work with this code:
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np
import pandas
import scipy.interpolate
data=pandas.read_csv('Data.csv',delimiter=',', header=0,index_col=False)
x=data.tempbin
y=data.sizefracbin
z=data.den
x=x.values
y=y.values
z=z.values
X,Y=np.meshgrid(x,y)
Z=[]
for i in range(len(x)):
    Z.append(z)
Z=np.array(Z)
plt.pcolormesh(X,Y,Z)
plt.colorbar()
plt.show()
I've tried everything I could find online such as in the post here: matplotlib 2D plot from x,y,z values
But either there is a problem reshaping my z values or it just gives me empty plots with various errors all to do (I think) with my z values.
Am I missing something? Thank you for your help!
Edit in response to ImportanceOfBeingErnest:
I tried this:
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np
import pandas
import scipy.interpolate
data=pandas.read_csv('Data.csv',delimiter=',', header=0,index_col=False)
data.sort_values('sizefrac')
x=data.tempbin
y=data.sizefrac
z=data.INP
x=x.values
y=y.values
z=z.values
X=x[1:].reshape(N,N)
Y=y[1:].reshape(N,N)
Z=z[1:].reshape(N,N)
plt.pcolormesh(X,Y,Z)
plt.colorbar()
plt.show()
and got a very empty plot. It just showed me the axes and colourbar as in my attached image, but pure white inside the axes! No error or anything...
For the reshaping, I need to remove a data point from each array, because otherwise the reshaping won't work.
Adapting the linked question to your problem, you should get:
import numpy as np
import matplotlib.pyplot as plt
x = list(range(10))*10
y = np.repeat(list(range(10)), 10)
# build example z data (the product of x and y)
z = np.multiply(x, y)
N = int(len(z)**.5)
Z = z.reshape(N, N)
plt.imshow(Z[::-1], extent=(np.amin(x), np.amax(x), np.amin(y), np.amax(y)), aspect = 'auto')
plt.show()
The answer was found by Silmathoron in a comment on his answer above. The answer itself did not help, but in the comments he noticed that the X, Y data was not gridded in a way which would create rectangles on the plot, and he also mentioned that Z needed to be one smaller than X and Y. From this I could fix my code. Thanks, all!
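A minimal sketch of that fix, assuming hypothetical bin edges and random densities in place of Data.csv (all names and numbers here are invented for illustration): pcolormesh is given the bin edges, and Z has one fewer entry than the edges along each axis, so every cell is drawn as a rectangle even when the y bins are uneven.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical bin edges: 6 temperature bins and 3 uneven size-fraction bins
x_edges = np.linspace(-30.0, 0.0, 7)
y_edges = np.array([0.5, 0.8, 1.5, 2.5])

# One z value per cell: shape is (number of y bins, number of x bins)
rng = np.random.default_rng(0)
Z = rng.random((len(y_edges) - 1, len(x_edges) - 1))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, Z)  # edges are one longer than Z along each axis
fig.colorbar(mesh, ax=ax)
ax.set_xlabel("temperature bin")
ax.set_ylabel("size fraction bin")
```

If the file stores one row per (x, y) pair instead, the z column would first have to be pivoted into this 2D layout.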
I frequently find myself working in log units for my plots, for example taking np.log10(x) of data before binning it or creating contour plots. The problem is, when I then want to make the plots presentable, the axes are in ugly log units, and the tick marks are evenly spaced.
If I let matplotlib do all the conversions, i.e. by calling ax.set_xscale('log'), then I get very nice looking axes. However, I can't do that with my data since it is, e.g., already binned in log units. I could manually change the tick labels, but that wouldn't make the tick spacing logarithmic. I suppose I could also manually specify the position of every minor tick so that it has log spacing, but is that the only way to achieve this? That is a bit tedious, so it would be nice if there were a better way.
For concreteness, here is a plot:
I want to have the tick labels as 10^x and 10^y (so '1' is '10', '2' is '100', etc.), and I want the minor ticks to be drawn as ax.set_xscale('log') would draw them.
Edit: For further concreteness, suppose the plot is generated from an image, like this:
import matplotlib.pyplot as plt
import scipy.misc
img = scipy.misc.face()  # deprecated since SciPy 1.10; newer releases provide scipy.datasets.face()
x_range = [-5,3] # log10 units
y_range = [-55, -45] # log10 units
p = plt.imshow(img,extent=x_range+y_range)
plt.show()
and all we want to do is change the axes appearance as I have described.
Edit 2: Ok, ImportanceOfBeingErnest's answer is very clever but it is a bit more specific to images than I wanted. I have another example, of binned data this time. Perhaps their technique still works on this, though it is not clear to me if that is the case.
import numpy as np
import pandas as pd
import datashader as ds
from matplotlib import pyplot as plt
import scipy.stats as sps
v1 = sps.lognorm(loc=0, scale=3, s=0.8)
v2 = sps.lognorm(loc=0, scale=1, s=0.8)
x = np.log10(v1.rvs(100000))
y = np.log10(v2.rvs(100000))
x_range=[np.min(x),np.max(x)]
y_range=[np.min(y),np.max(y)]
df = pd.DataFrame.from_dict({"x": x, "y": y})
#------ Aggregate the data ------
cvs = ds.Canvas(plot_width=30, plot_height=30, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'x', 'y')
# Create contour plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(agg, extent=x_range+y_range)
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
The general answer to this question is probably given in this post:
Can I mimic a log scale of an axis in matplotlib without transforming the associated data?
However here an easy option might be to scale the content of the axes and then set the axes to a log scale.
A. image
You may plot your image on a logarithmic scale but make all pixels the same size in log units. Unfortunately, imshow does not support this kind of image (any more), but one may use pcolormesh for that purpose.
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
img = scipy.misc.face()  # deprecated since SciPy 1.10; newer releases provide scipy.datasets.face()
extx = [-5,3] # log10 units
exty = [-45, -55] # log10 units
x = np.logspace(extx[0],extx[-1],img.shape[1]+1)
y = np.logspace(exty[0],exty[-1],img.shape[0]+1)
X,Y = np.meshgrid(x,y)
c = img.reshape((img.shape[0]*img.shape[1],img.shape[2]))/255.0
m = plt.pcolormesh(X,Y,X[:-1,:-1], color=c, linewidth=0)
m.set_array(None)
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
plt.show()
B. contour
The same concept can be used for a contour plot.
import numpy as np
from matplotlib import pyplot as plt
x = np.linspace(-1.1,1.9)
y = np.linspace(-1.4,1.55)
X,Y = np.meshgrid(x,y)
agg = np.exp(-(X**2+Y**2)*2)
fig, ax = plt.subplots()
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
exp = lambda x: 10.**(np.array(x))
cf = ax.contourf(exp(X), exp(Y),agg, extent=exp([x.min(),x.max(),y.min(),y.max()]))
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
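The same idea may carry over to data that was binned in log units. In this sketch, np.histogram2d stands in for the datashader aggregation and the sample data is invented; the bin edges are converted back with 10**edges, and the axes are then set to a log scale so that matplotlib draws its usual log ticks:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data that is already in log10 units
rng = np.random.default_rng(0)
logx = rng.normal(0.5, 0.35, 10000)
logy = rng.normal(0.0, 0.35, 10000)

# Bin in log units; counts has shape (number of x bins, number of y bins)
counts, xedges, yedges = np.histogram2d(logx, logy, bins=30)

fig, ax = plt.subplots()
# Undo the log10 on the bin edges only; pcolormesh expects (rows=y, cols=x), hence the transpose
ax.pcolormesh(10.0**xedges, 10.0**yedges, counts.T)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

The bins stay equally wide on screen because they were equally spaced in log units to begin with.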
How to draw something like this?
There is a horizontal line until the next data point shows up, then a vertical line to adjust to the new y value. The usual plot function in matplotlib just draws a straight line between two data points, which isn't what I need.
You may use one of the drawstyles "steps-pre", "steps-mid", "steps-post" to get a step-like appearance of your curve.
plt.plot(x,y, drawstyle="steps-pre")
Full example:
import matplotlib.pyplot as plt
import numpy as np; np.random.seed()
x = np.arange(12)
y = np.random.rand(12)
styles = ["default","steps-pre","steps-mid", "steps-post"]
fig, axes = plt.subplots(nrows=len(styles), figsize=(4,7))
for ax, style in zip(axes, styles):
    ax.plot(x,y, drawstyle=style)
    ax.set_title("drawstyle={}".format(style))
fig.tight_layout()
plt.show()
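Matplotlib also offers ax.step, which, as far as I know, simply sets the same drawstyle on a regular line; where="post" corresponds to "steps-post". A small sketch with made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data
x = np.arange(6)
y = np.array([2.0, 3.5, 1.0, 4.0, 2.5, 3.0])

fig, ax = plt.subplots()
# Hold each y value until the next x, then jump vertically
(line,) = ax.step(x, y, where="post")
ax.plot(x, y, "o", color="gray")  # mark the original data points
```

Use where="pre" or where="mid" to move the vertical jump to the start or the middle of each interval.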
Just as @cricket_007 said in the comments: make each y value repeat at the next x value. Below is a way to achieve this with numpy.
EDIT:
Thanks to the comment by @ImportanceOfBeingErnest, I replaced the original code that extended the data with a much simpler solution.
from matplotlib import pyplot as plt
import numpy as np
#producing some sample data
x = np.linspace(0,1,20)
y = np.random.rand(x.shape[0])
#extending data to repeat each y value at the next x value
##x1 = np.zeros(2*x.shape[0]-1)
##x1[::2] = x
##x1[1::2] = x[1:]
x1 = np.repeat(x,2)[1:]
##y1 = np.zeros(2*y.shape[0]-1)
##y1[::2] = y
##y1[1::2] = y[:-1]
y1 = np.repeat(y,2)[:-1]
plt.plot(x1, y1)
plt.show()
The result looks like this:
I'm working on a graphic with matplotlib in Python 3.4 that represents:
x = (months)
y = (12 values)
import matplotlib.pyplot as plt
import numpy as np
import calendar
N = 12
mult = 12500
x = np.arange(N)
y = mult *np.random.randn(12)
plt.plot(x, y, 'r')
plt.xticks(x, calendar.month_name[1:13], rotation=20 )
plt.yticks(y, y)
plt.grid('on')
plt.margins(0.05)
plt.show()
The labels of the yticks are the values in y, but when some values are very similar the ylabels overlap.
Example:
I've tried the linespacing property, but it only works within each label; it doesn't affect the spacing between labels.
How do I give some spacing to the ylabels or avoid that overlapping?
As pointed out in a comment by @jme: "As a person who reads a lot of graphs, I'd suggest that labeling the y-axis this way is kind of disorienting. I'd label the y-axis with regular intervals, and label the individual points with their y-coords." Like this, for example:
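A minimal sketch of that suggestion, assuming random data like in the question (the label offset, the font size and the number format are arbitrary choices): the y axis keeps matplotlib's regular ticks, and each point is annotated with its own value.

```python
import calendar
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data, one value per month
rng = np.random.default_rng(1)
x = np.arange(12)
y = 12500 * rng.standard_normal(12)

fig, ax = plt.subplots()
ax.plot(x, y, "r", marker="o")
ax.set_xticks(x)
ax.set_xticklabels(calendar.month_name[1:13], rotation=20)

# Label each point with its y value, shifted 6 points upwards
labels = [
    ax.annotate(f"{yi:.0f}", (xi, yi), textcoords="offset points",
                xytext=(0, 6), ha="center", fontsize=8)
    for xi, yi in zip(x, y)
]
ax.grid(True)
ax.margins(0.05)
```

Because the annotations sit next to their points, similar y values no longer pile up in one spot on the axis.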
I have two arrays x, y obtained from a machine learning calculation, and I wish to make a scatter plot with the reference data x on the diagonal, so as to better visualize the predicted values y against the true ones x. Could you please suggest how to do this in Python or gnuplot?
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
plt.scatter(x, y, c=colors)
plt.plot( [0,1],[0,1] )
plt.savefig('a.png')
This will produce:
Check this page for more information.
A simple example:
import matplotlib.pyplot as plt
import numpy as np
x=np.linspace(0,100,101)
y=np.random.normal(x) # add some noise
plt.plot(x,y,'r.') # x vs y
plt.plot(x,x,'k-') # identity line
plt.xlim(0,100)
plt.ylim(0,100)
plt.show()
In matplotlib, you can also draw an "infinite" line in order to avoid having to define the exact coordinates. For example, if you have an axes ax, you can do:
pt = (0, 0)
ax.axline(pt, slope=1, color='black')
where pt is a point that the line passes through. Note that if pt isn't within the limits of the plot, the limits will be modified to include it.
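Put together for the predicted-versus-true case, a sketch could look like the following. The data is synthetic and the names x_true and y_pred are made up for illustration; ax.axline needs Matplotlib 3.3 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical reference values and noisy "predictions"
rng = np.random.default_rng(0)
x_true = rng.random(50)
y_pred = x_true + rng.normal(0.0, 0.05, 50)

fig, ax = plt.subplots()
ax.scatter(x_true, y_pred, s=15)
# Identity line y = x through the origin; it extends across whatever the axis limits are
line = ax.axline((0, 0), slope=1, color="black")
ax.set_aspect("equal")
ax.set_xlabel("true")
ax.set_ylabel("predicted")
```

Points close to the black line are well predicted; the vertical distance from the line is the prediction error.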