Python ggplot is great, but it is still new, and I often need to fall back on traditional matplotlib techniques to modify my plots. But I'm not sure how to either pass an axis instance to ggplot, or get one back from it.
So let's say I build a plot like so:
import ggplot as gp
(explicit import)
p = gp.ggplot(gp.aes(x='basesalary', y='compensation'), data = df)
p + gp.geom_histogram(binwidth = 10000)
No problems so far. But now let's say I want the y-axis in log scale. I'd like to be able to do this:
plt.gca().set_yscale('log')
Unfortunately, plt.gca() doesn't access the axis created by ggplot. I end up with two figures: the histogram from ggplot in linear scale, and an empty figure with a log-scale y axis.
I've tried a few variations with both gca() and gcf() without success.
There might have been some changes since 2013 when this question was asked. The way to produce a matplotlib figure from a ggplot is
g.make()
after that, figure and axes can be obtained via
fig = plt.gcf()
ax = plt.gca()
or, if there are more axes, axes = fig.axes.
Then, additional features can be added in matplotlib, like also shown in this question's answer.
Finally the plot can be saved using the usual savefig command.
Complete example:
import ggplot as gp
import matplotlib.pyplot as plt
# produce ggplot
g = gp.ggplot(gp.aes(x='carat', y='price'), data=gp.diamonds)
g = g + gp.geom_point()
g = g + gp.ylab(' ')+ gp.xlab(' ')
# Make
g.make()
# obtain figure from ggplot
fig = plt.gcf()
ax = plt.gca()
# adjust some of the ggplot axes' parameters
ax.set_title("ggplot plot")
ax.set_xlabel("Some x label")
plt.savefig(__file__+".png")
plt.show()
[This is outdated with current ggpy]
There is now a scale_y_log(). If you want to do something in matplotlib, you can get the current figure/axis with
g = ggplot(...)
fig = g.draw()
#or
g.draw() # or print(g)
fig = plt.gcf()
ax = plt.gca()
Your version fails because ggplot only draws the plot on print(g), in the ggplot.__repr__() method (which calls ggplot.draw()). So there is simply no matplotlib figure right after constructing the ggplot object, only after print(g) (or g.draw()). g.draw() also returns the figure, so you don't need to use plt.gcf().
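The empty second figure in the question follows from how pyplot tracks figures: plt.gca() creates a new figure and axes when no figure is open yet. Here is a minimal matplotlib-only sketch of that behaviour (no ggplot involved; the Agg backend is my own choice so the snippet runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot as plt

plt.close("all")                  # start with no open figures
figs_before = plt.get_fignums()   # nothing has been drawn yet

ax = plt.gca()                    # no current figure, so pyplot creates one
ax.set_yscale("log")              # this only affects the brand-new, empty axes
figs_after = plt.get_fignums()

print(figs_before, figs_after)
```

So calling plt.gca() before the ggplot object has been drawn can only ever give you a fresh, empty axes.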
Did you try:
p = gp.ggplot(gp.aes(x='basesalary', y='compensation'), data = df)
p + gp.geom_histogram(binwidth = 10000) + gp.scale_y_log()
Not sure if it works just like that though, just guessing from looking at the code...
Related
As the title states I want to return a plt or figure (still not sure what the difference between the two things are) using matplotlib. The main idea behind it is so I can save the plt/figure later.
import seaborn as sns
from matplotlib import pyplot as plt
def graph(df, id):
    # size of the graph
    xlims = (-180, 180)
    ylims = (-180, 180)
    # dictate the colors of the scatter plot based on the grouping of hot or cold
    color_dict = {'COLD': 'blue',
                  'HOT': 'red'}
    title_name = f"{id}"
    ax = sns.scatterplot(data=df, hue='GRP', x='X_GRID', y='Y_GRID',
                         legend=False, palette=color_dict)
    ax.set_title(title_name)
    ax.set(xlim=xlims)
    ax.set(ylim=ylims)
    if show_grid:
        # pass in the prev graph so I can overlay grid
        ax = self.__get_grid( ax)
    circle1 = plt.Circle(xy=(0, 0), radius=150, color='black', fill=False, zorder=3)
    ax.add_patch(circle1)
    ax.set_aspect('equal')
    plt.axis('off')
    plt.savefig(title_name + '_in_ftn.png')
    fig = plt.figure()
    plt.clf()
    return (fig, title_name + '.png')
plots = []
# dfs is just a tuple of df, id for example purposes
for df, id in dfs:
    plots.append(graph(df, id))
for plot, file_name in plots:
    plot.savefig(file_name)
    plot.clf()
When using plot.savefig(filename) it saves, but the saved file is blank which is wrong. Am I not properly returning the object I want to save? If not what should I return to be able to save it?
I kind of having it work, but not really. I am currently saving two figures for testing purposes. For some reason when I use the fig=plt.figure() and saving it outside the function the title of the figure and the filename are different (even though they should be the same since the only difference is .png)
However, when saving it inside the function the title name of the figure and the filename name are the same.
Your code has multiple issues that I'll try to discuss here:
Your confusion around plt
First of all, there is no such thing as "a plt". plt is the custom name you are giving to the matplotlib.pyplot module when you are importing it with the line import matplotlib.pyplot as plt. You are basically just renaming the module with an easy to type abbreviation. If you had just written import matplotlib, you would have to write matplotlib.pyplot.axis('off') instead of plt.axis('off').
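A quick way to convince yourself that plt is nothing more than a name bound to the module (a small sketch; the Agg backend is only my choice so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot
import matplotlib.pyplot as plt

# Both names refer to the very same module object
same_module = plt is matplotlib.pyplot
print(type(plt).__name__, same_module)
```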
Mix of procedural and object oriented approach
You are using a mix of the procedural and object oriented approach for matplotlib.
Either you call your methods on the axis object (ax) or you call functions that implicitly handle the axis and figure. For example, you could either create an axis and then call ax.plot(...), or instead use plt.plot(...), which implicitly creates the figure and axis. In your case, you mainly use the object oriented approach on the axis object that is returned by the seaborn function. To stay consistent, you should use ax.axis('off') instead of plt.axis('off').
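To make the two styles concrete, here is a small side-by-side sketch with made-up data (the Agg backend is only my choice so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot as plt

plt.close("all")

# Procedural (pyplot) style: figure and axes are handled implicitly
plt.figure()
plt.plot([1, 2, 3], [5, 2, 4])
plt.title("pyplot style")
implicit_ax = plt.gca()           # the axes pyplot created behind the scenes

# Object oriented style: keep explicit handles and call methods on them
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [5, 2, 4])
ax.set_title("object-oriented style")
ax.axis("off")                    # the axes method, instead of plt.axis('off')
```

Both produce the same kind of plot; the second one just never depends on which figure happens to be "current".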
You create a new blank figure
When you are calling the seaborn function sns.scatterplot, you are implicitly creating a matplotlib figure and axis object. You catch that axis object in the variable ax. You then use plt.savefig to save your image in the function, which works by implicitly getting the figure corresponding to the currently used axis object. However, you are then creating a new figure by calling fig = plt.figure(), which is of course blank, and then returning it. What you should do is get the figure currently used by the axis object you are working with. You can get it by calling fig = plt.gcf() (which stands for "get current figure"), which would be the procedural approach, or better, use fig = ax.get_figure().
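The difference between the figure that owns your axes and a freshly created one can be checked directly. A small sketch with made-up data (plain matplotlib instead of seaborn, and the Agg backend chosen only so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot as plt

plt.close("all")
fig, ax = plt.subplots()
ax.scatter([0, 1, 2], [2, 0, 1])

fig_from_ax = ax.get_figure()   # the figure that actually holds the plot
blank_fig = plt.figure()        # a brand-new figure with nothing in it

print(fig_from_ax is fig, len(fig_from_ax.axes), len(blank_fig.axes))
```

Saving blank_fig gives an empty image, which is what happened with the figure returned from your function.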
What you should do instead is something like this:
import seaborn as sns
from matplotlib import pyplot as plt
def graph(df, id):
    # size of the graph
    xlims = (-180, 180)
    ylims = (-180, 180)
    # dictate the colors of the scatter plot based on the grouping of hot or cold
    color_dict = {'COLD': 'blue',
                  'HOT': 'red'}
    title_name = f"{id}"
    ax = sns.scatterplot(data=df, hue='GRP', x='X_GRID', y='Y_GRID',
                         legend=False, palette=color_dict)
    ax.set_title(title_name)
    ax.set(xlim=xlims)
    ax.set(ylim=ylims)
    if show_grid:
        # pass in the prev graph so I can overlay grid
        ax = self.__get_grid( ax)
    circle1 = plt.Circle(xy=(0, 0), radius=150, color='black', fill=False, zorder=3)
    ax.add_patch(circle1)
    ax.set_aspect('equal')
    ax.axis('off')
    fig = ax.get_figure()
    fig.savefig(title_name + '_in_ftn.png')
    return (fig, title_name + '.png')
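For the calling side, here is a rough sketch of how the returned figures could be saved later. It uses plain matplotlib and made-up data instead of your dataframes, and make_scatter_figure is a hypothetical stand-in for graph(). I also create a fresh figure per call, which is my own addition, so that successive plots do not end up on the same axes:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot as plt


def make_scatter_figure(xs, ys, title):
    """Hypothetical helper: draw on a fresh figure and return it."""
    fig, ax = plt.subplots()   # new figure per call, so plots never overlap
    ax.scatter(xs, ys)
    ax.set_title(title)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig, title + ".png"


# Made-up data standing in for the (df, id) pairs in the question
datasets = [([0, 1, 2], [2, 0, 1], "a"), ([0, 1, 2], [1, 2, 0], "b")]
plots = [make_scatter_figure(xs, ys, name) for xs, ys, name in datasets]

out_dir = tempfile.mkdtemp()   # throwaway output directory for the example
saved_paths = []
for fig, file_name in plots:
    path = os.path.join(out_dir, file_name)
    fig.savefig(path)          # save through the Figure object itself
    plt.close(fig)             # release the figure once it is written
    saved_paths.append(path)
```

With seaborn the same idea should carry over by passing the axes along, i.e. sns.scatterplot(..., ax=ax).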
I have the following code for generating a time series plot
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
series = pd.Series([np.sin(ii*np.pi) for ii in range(30)],
index=pd.date_range(start='2019-01-01', end='2019-12-31',
periods=30))
series.plot(ax=ax)
I want to set an automatic limit for x and y, I tried using ax.margins() but it does not seem to work:
ax.margins(y=0.1, x=0.05)
# even with
# ax.margins(y=0.1, x=5)
What I am looking for is an automatic method like padding=0.1 (10% of whitespace around the graph)
Pandas and matplotlib seem to be confused rather often while collaborating when axes have dates. For some reason in this case ax.margins doesn't work as expected with the x-axis.
Here is a workaround which does seem to do the job, explicitly moving the xlims:
xmargins = 0.05
ymargins = 0.1
ax.margins(y=ymargins)
x0, x1 = plt.xlim()
plt.xlim(x0-xmargins*(x1-x0), x1+xmargins*(x1-x0))
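The arithmetic behind this workaround can be wrapped in a small helper. This is just a sketch: padded_limits is a name I made up, and I use plain matplotlib with datetime objects and tight limits to imitate the situation, not pandas itself:

```python
import datetime as dt
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot as plt


def padded_limits(lo, hi, fraction):
    """Hypothetical helper: widen (lo, hi) by `fraction` of its span on each side."""
    pad = fraction * (hi - lo)
    return lo - pad, hi + pad


# Made-up date series, roughly one point every 12 days
dates = [dt.datetime(2019, 1, 1) + dt.timedelta(days=12 * i) for i in range(30)]
values = [math.sin(i / 3) for i in range(30)]

fig, ax = plt.subplots()
ax.plot(dates, values)
ax.set_xlim(dates[0], dates[-1])      # tight limits, no whitespace at all

x0, x1 = ax.get_xlim()                # limits as matplotlib date numbers
ax.set_xlim(*padded_limits(x0, x1, 0.05))
new_x0, new_x1 = ax.get_xlim()
```

With fraction=0.05 the visible x range ends up 10% wider than the data, 5% on each side.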
Alternatively, you could work directly with matplotlib's plot, which does work as expected applying the margins to the date axis.
ax.plot(series.index, series)
ax.margins(y=0.1, x=0.05)
PS: This post talks about setting use_sticky_edges to False and calling autoscale_view after setting the margins, but that doesn't seem to work here either.
ax.use_sticky_edges = False
ax.autoscale_view(scaley=True, scalex=True)
You can use ax.set_xlim and ax.set_ylim to set the x and y limits of your plot respectively.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
series = pd.Series([np.sin(ii*np.pi) for ii in range(30)],
index=pd.date_range(start='2019-01-01', end='2019-12-31',
periods=30))
# set xlim to be between certain dates
ax.set_xlim((pd.to_datetime('2019-01-01'), pd.to_datetime('2019-01-31')))
# set ylim to be between certain values
ax.set_ylim((-0.5, 0.5))
series.plot(ax=ax)
I'd like to do something like this:
import matplotlib.pyplot as plt
%matplotlib inline
fig1 = plt.figure(1)
plt.plot([1,2,3],[5,2,4])
plt.show()
In one cell, and then redraw the exact same plot in another cell, like so:
plt.figure(1) # attempting to reference the figure I created earlier...
# four things I've tried:
plt.show() # does nothing... :(
fig1.show() # throws warning about backend and does nothing
fig1.draw() # throws error about renderer
fig1.plot([1,2,3],[5,2,4]) # This also doesn't work (jupyter outputs some
# text saying matplotlib.figure.Figure at 0x..., changing the backend and
# using plot don't help with that either), but regardless in reality
# these plots have a lot going on and I'd like to recreate them
# without running all of the same commands over again.
I've messed around with some combinations of this stuff as well but nothing works.
This question is similar to IPython: How to show the same plot in different cells? but I'm not particularly looking to update my plot, I just want to redraw it.
I have found a solution to do this. The trick is to create a figure with an axis fig, ax = plt.subplots() and use the axis to plot. Then we can just call fig at the end of any other cell we want to replot the figure.
import matplotlib.pyplot as plt
import numpy as np
x_1 = np.linspace(-.5,3.3,50)
y_1 = x_1**2 - 2*x_1 + 1
fig, ax = plt.subplots()
plt.title('Reusing this figure', fontsize=20)
ax.plot(x_1, y_1)
ax.set_xlabel('x',fontsize=18)
ax.set_ylabel('y',fontsize=18, rotation=0, labelpad=10)
ax.legend(['Eq 1'])
ax.axis('equal');
This produces
Now we can add more things by using the ax object:
t = np.linspace(0,2*np.pi,100)
h, a = 2, 2
k, b = 2, 3
x_2 = h + a*np.cos(t)
y_2 = k + b*np.sin(t)
ax.plot(x_2,y_2)
ax.legend(['Eq 1', 'Eq 2'])
fig
Note how I just wrote fig in the last line, making the notebook output the figure once again.
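The reason this works, as far as I understand it, is that the Figure object stays alive as long as you hold a reference to it, even after pyplot has closed it (which I assume is roughly what the inline backend does at the end of a cell). A sketch outside the notebook, rendering to an in-memory buffer instead of displaying:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot as plt

plt.close("all")
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [5, 2, 4])

plt.close(fig)                         # pyplot forgets the figure...
open_after_close = plt.get_fignums()

ax.plot([1, 2, 3], [4, 3, 5])          # ...but our handles still work
buffer = io.BytesIO()
fig.savefig(buffer, format="png")      # and the figure can still be rendered
```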
I hope this helps!
I'm trying to show multiple figures at once, but with an offset so I don't have to move the first figure to check that it showed all the figures (plots).
So here's an example:
from pylab import *
figure(0)
plot()
figure(1)
plot()
show()
These figures are shown on top of each other, but I want them to look like this when I run my program:
EDIT:
Any suggestions?
I usually do this with Figure.add_subplot:
fig = figure(0)
ax = fig.add_subplot(211)
ax.plot(...)
ax = fig.add_subplot(212)
ax.plot(...)
show()
If you're wondering what the magic 211 and 212 mean, see this question.
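In short, the three digits are rows, columns and position, and they can also be passed as separate arguments. A small sketch (the Agg backend is only my choice so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.pyplot as plt

fig = plt.figure()
ax_top = fig.add_subplot(211)          # 2 rows, 1 column, first slot
ax_bottom = fig.add_subplot(2, 1, 2)   # same grid, second slot (same as 212)

ax_top.plot([0, 1, 2], [0, 1, 4])
ax_bottom.plot([0, 1, 2], [4, 1, 0])
```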
If you're using the tkagg backend, you can do:
import matplotlib.pyplot as plt
for i in range(5):
    fig = plt.figure()
    fig.canvas._tkcanvas.master.geometry('800x600+{:d}+{:d}'.format(70*i,70*i))
plt.show()
I think that the same treatment could be used for others backends...
Regards
I want to automatically generate a series of plots which are clipped to patches. If I try and reuse a patch object, it moves position across the canvas.
This script (based on an answer to a previous question by Yann) demonstrates what is happening.
import pylab as plt
import scipy as sp
import matplotlib.patches as patches
sp.random.seed(100)
x = sp.random.random(100)
y = sp.random.random(100)
patch = patches.Circle((.75,.75),radius=.25,fc='none')
def doplot(x,y,patch,count):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    im = ax.scatter(x,y)
    ax.add_patch(patch)
    im.set_clip_path(patch)
    plt.savefig(str(count) + '.png')
for count in range(4):
    doplot(x,y,patch,count)
The first plot looks like this:
But in the second '1.png', the patch has moved..
However replotting again doesn't move the patch. '2.png' and '3.png' look exactly the same as '1.png'.
Could anyone point me in the right direction of what I'm doing wrong??
In reality, the patches I'm using are relatively complex and take some time to generate - I'd prefer to not have to remake them every frame if possible.
The problem can be avoided by using the same axes for each plot, with ax.cla() called to clear the plot after each iteration.
import pylab as plt
import scipy as sp
import matplotlib.patches as patches
sp.random.seed(100)
patch = patches.Circle((.75,.75),radius=.25,fc='none')
fig = plt.figure()
ax = fig.add_subplot(111)
def doplot(x,y,patch,count):
    ax.set_xlim(-0.2,1.2)
    ax.set_ylim(-0.2,1.2)
    x = sp.random.random(100)
    y = sp.random.random(100)
    im = ax.scatter(x,y)
    ax.add_patch(patch)
    im.set_clip_path(patch)
    plt.savefig(str(count) + '.png')
    ax.cla()
for count in range(4):
    doplot(x,y,patch,count)
An alternative to unutbu's answer is to use the copy module, which can copy objects. It is very hard to see how things are changing after one calls add_patch, but they are. The axes, figure, extents, clip_box, transform and window_extent properties of the patch are changed. Unfortunately, superficially printing each of these properties results in the same string, so it looks like they are not changing. But the underlying attributes of some or all of these properties (e.g. extents is a Bbox) are probably changed.
The copy call will allow you to get a unique patch for each figure you make, without knowing what kind of patch it is. This still does not answer why this happens, but as I wrote above it's an alternative solution to the problem:
import copy
def doplot(x,y,patch,count):
    newPatch = copy.copy(patch)
    fig = plt.figure(dpi=50)
    ax = fig.add_subplot(111)
    im = ax.scatter(x,y)
    ax.add_patch(newPatch)
    im.set_clip_path(newPatch)
    plt.savefig(str(count) + '.png')
Also you can use fig.savefig(str(count) + '.png'). This explicitly saves the figure fig, whereas the plt.savefig call saves the current figure, which happens to be the one you want.
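One of the things add_patch changes can be observed directly: the patch's axes attribute. Below is a rough sketch of the copy approach that keeps an untouched template and checks this attribute. The variable names and the in-memory buffer are my own choices for illustration, and I use numpy's random generator instead of scipy's:

```python
import copy
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only for this sketch
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(100)
x, y = rng.random(100), rng.random(100)

# The template is never added to any axes itself
template = patches.Circle((0.75, 0.75), radius=0.25, fc="none")

attached_to_own_axes = []
for count in range(3):
    patch = copy.copy(template)        # fresh shallow copy for every figure
    fig, ax = plt.subplots()
    points = ax.scatter(x, y)
    ax.add_patch(patch)                # this binds the copy to `ax`
    points.set_clip_path(patch)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")  # explicit save via the Figure object
    attached_to_own_axes.append(patch.axes is ax)
    plt.close(fig)
```

The template stays unattached, while each copy is bound to exactly one axes.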