Basic Pandas matplotlib plotting - python

Sorry to ask such a basic question, but after hours (and hours) of frustration I'm turning to the list for some expert help.
I have two pandas dataframes, df1 and df2. df1 has columns A and B, while df2 has columns C and D. I want to use matplotlib to make a scatterplot of A vs. B, with labelled axes, and a histogram of C, also with a title on the x axis. Then I want to save both figures in pdf files.
I can accomplish the former with
import matplotlib.pyplot as plt
plt.scatter(df1['A'],df1['B'])
plt.xlabel('X title')
plt.ylabel('Y title')
plt.savefig('myfig1.pdf')
But I can't get the histogram to work, and if it does, it creates a graph with both the scatterplot and the histogram in it.
Any help greatly appreciated.

It sounds like you just need to make another figure for the histogram,
import matplotlib.pyplot as plt
fig1 = plt.figure()
plt.scatter(df1['A'],df1['B'])
plt.xlabel('X title')
plt.ylabel('Y title')
plt.savefig('myfig1.pdf')
fig2 = plt.figure()
... <histogram code>
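For completeness, here is one way the whole thing could look with the pyplot state machine. The DataFrames below are made-up stand-ins for df1 and df2 (assuming numeric columns A, B, C, D), and the bin count, labels and file names are placeholders.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the question's DataFrames
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"A": rng.random(50), "B": rng.random(50)})
df2 = pd.DataFrame({"C": rng.normal(size=200), "D": rng.normal(size=200)})

# Figure 1: scatter plot of A vs. B
fig1 = plt.figure()
plt.scatter(df1["A"], df1["B"])
plt.xlabel("X title")
plt.ylabel("Y title")
plt.savefig("myfig1.pdf")

# Figure 2: a fresh figure, so the histogram does not land on top of the scatter plot
fig2 = plt.figure()
plt.hist(df2["C"], bins=20)
plt.xlabel("C title")
plt.savefig("myfig2.pdf")
```

The only essential line is the second plt.figure(): without it, plt.hist draws into the figure that already holds the scatter plot.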
Or you can assign the axes to variables so you don't have to do everything in order:
import random
import matplotlib.pyplot as plt
x = [random.random() for i in range(50)]
y = [random.random() for i in range(50)]
# create both figures (and their axes) up front
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
fig2 = plt.figure()
ax2 = fig2.add_subplot(111)
# now fill them in any order you like
ax1.scatter(x, y)
ax1.set_xlabel('X title')
ax1.set_ylabel('Y title')
fig1.savefig('myfig1.pdf')
ax2.hist(y)
fig2.savefig('myfig2.pdf')
Note that when you set properties through an axes object's own methods, most of the plt functions become set_* methods. For example, instead of plt.ylabel('my_y') you write ax1.set_ylabel('my_y'). You can still use the plt functions, but they apply to whichever axes is currently active. Keeping ax1 and ax2 in variables gives you more freedom about the order in which you do things.
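To make the "current axes" behaviour concrete, here is a small sketch (the label strings are placeholders): pyplot functions such as plt.xlabel act on whichever axes is current, ax.set_xlabel always targets its own axes, and plt.sca switches which axes is current.

```python
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots()
fig2, ax2 = plt.subplots()   # the most recently created axes (ax2) is now current

plt.xlabel("applied to ax2")             # pyplot acts on the current axes
ax1.set_xlabel("set on ax1 explicitly")  # the method targets ax1 regardless

plt.sca(ax1)                             # make ax1 (and fig1) current again
plt.ylabel("now applied to ax1")
```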

Related

Show legend that matplotlib dynamically created

My df has 4 columns: x, y, z, and grouping. I have created a 3D plot, with the assigned color of each point being decided by what grouping it belongs to in that row. For reference, a "grouping" can be any number from 1 to 6. The code is shown below:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter3D(df.x, df.y, df.z, c=df.grouping)
plt.show()
I would like to show a legend on the plot that shows which color belongs to which grouping. Previously, I was using Seaborn for a 2D plot and the legend was automatically plotted. How can I add this feature with matplotlib?
If the values to be colormapped are numeric, the solution can be as simple as:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
a = np.random.rand(3,40)
c = np.random.randint(1,7, size=a.shape[1])
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
sc = ax.scatter3D(*a, c=c)
plt.legend(*sc.legend_elements())
plt.show()
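Applied to a DataFrame laid out like the one in the question, the same idea might look like the sketch below. The random DataFrame is a made-up stand-in for df (columns x, y, z and an integer grouping from 1 to 6), and the legend title is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical DataFrame shaped like the question's: x, y, z and a grouping 1..6
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((40, 3)), columns=["x", "y", "z"])
df["grouping"] = rng.integers(1, 7, size=len(df))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
sc = ax.scatter(df.x, df.y, df.z, c=df.grouping)

# legend_elements() returns one proxy handle and label per distinct colour value
handles, labels = sc.legend_elements()
legend = ax.legend(handles, labels, title="grouping")
```

Passing the handles to ax.legend rather than plt.legend makes it explicit which axes receives the legend.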

Histogram at specific coordinates inside axes

What I want to achieve with Python 3.6 is something like this:
Obviously made in Paint and missing some ticks on the x-axis. Is something like this possible? Essentially, can I control exactly where to plot a histogram (and with what orientation)?
I specifically want them to be on the same axes just like the figure above and not on separate axes or subplots.
fig = plt.figure()
ax2Handler = fig.gca()
ax2Handler.scatter(np.array(np.arange(0,len(xData),1)), xData)
ax2Handler.hist(xData,bins=60,orientation='horizontal',density=True)
This and other approaches (such as inverting the axes) gave me no results. xData is loaded from a pandas DataFrame.
# This also doesn't work as intended
fig = plt.figure()
axHistHandler = fig.gca()
axScatterHandler = fig.gca()
axHistHandler.invert_xaxis()
axHistHandler.hist(xData,orientation='horizontal')
axScatterHandler.scatter(np.array(np.arange(0,len(xData),1)), xData)
A. using two axes
There is simply no reason not to use two different axes. The plot from the question can easily be reproduced with two different axes:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
xData = np.random.rand(1000)
fig,(ax,ax2)= plt.subplots(ncols=2, sharey=True)
fig.subplots_adjust(wspace=0)
ax2.scatter(np.linspace(0,1,len(xData)), xData, s=9)
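# density=True replaces the normed=True keyword, which newer matplotlib versions no longer accept
ax.hist(xData,bins=60,orientation='horizontal',density=True)
ax.invert_xaxis()
ax.spines['right'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax2.tick_params(axis="y", left=False)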
plt.show()
B. using a single axes
Just for the sake of answering the question: In order to plot both in the same axes, one can shift the bars by their length towards the left, effectively giving a mirrored histogram.
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
xData = np.random.rand(1000)
fig,ax= plt.subplots(ncols=1)
ax.scatter(np.linspace(0,1,len(xData)), xData, s=9)
xlim1 = ax.get_xlim()
_,__,bars = ax.hist(xData,bins=60,orientation='horizontal',density=True)
for bar in bars:
    # shift each bar left by its own length so it ends at x=0 (mirrored histogram)
    bar.set_x(-bar.get_width())
xlim2 = ax.get_xlim()
ax.set_xlim(-xlim2[1],xlim1[1])
plt.show()
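If you want the histogram anchored at an arbitrary x position instead of mirrored at 0, a different technique than the one above is to bin the data yourself with np.histogram and draw the bars with ax.barh, using its left argument as the baseline. This is only a sketch: x0 and width_scale are hypothetical parameters you would tune, and the data is random.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
xData = rng.random(1000)

fig, ax = plt.subplots()
ax.scatter(np.linspace(0, 1, len(xData)), xData, s=9)

# Bin the data ourselves so we control exactly where the bars are drawn
counts, edges = np.histogram(xData, bins=30, density=True)
heights = np.diff(edges)

x0 = 1.2           # hypothetical x position of the histogram's baseline
width_scale = 0.1  # hypothetical factor turning densities into x-axis lengths

# Horizontal bars that start at x0 and extend to the right
bars = ax.barh(edges[:-1], counts * width_scale, height=heights,
               left=x0, align="edge", alpha=0.6)
```

Passing negative widths (-counts * width_scale) would make the bars grow leftwards from x0 instead, which reproduces the orientation in the question's sketch.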
You might be interested in seaborn jointplots:
# Import and fake data
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(2,1000)
# actual plot
jg = sns.jointplot(x=data[0], y=data[1], marginal_kws={"bins":100})
jg.ax_marg_x.set_visible(False) # remove the top axis
plt.subplots_adjust(top=1.15) # fill the empty space
produces this:
See more examples of bivariate distribution representations, available in Seaborn.

How to make xticks evenly spaced despite their value?

I am trying to generate a plot with x-axis being a geometric sequence while the y axis is a number between 0.0 and 1.0. My code looks like this:
from matplotlib import pyplot as plt
plt.xticks(X)
plt.plot(X,Y)
plt.show()
which generates a plot like this:
As you can see, I am explicitly setting the x-axis ticks to the ones belonging to the geometric sequence.
My question: is it possible to make the x-ticks evenly spaced regardless of their values, since the initial terms of the sequence are small and crowded together? Something like a logarithmic scale would be ideal if I were dealing with powers of a base, but I don't think it suits a geometric sequence like this one.
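For what it's worth, the terms of any geometric sequence x_n = x_0 * r^n satisfy log(x_n) = log(x_0) + n * log(r), so they are evenly spaced on a logarithmic axis whatever the ratio r, not only for powers of a base. A minimal sketch, assuming a made-up sequence with x_0 = 3 and r = 1.5 and random Y values:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical geometric sequence x_n = 3 * 1.5**n (ratio 1.5, not a power of 10 or 2)
n = np.arange(10)
X = 3 * 1.5**n
Y = np.random.rand(len(X))

fig, ax = plt.subplots()
ax.plot(X, Y, marker="o")
ax.set_xscale("log")                          # equal ratios -> equal distances
ax.set_xticks(X)                              # one tick per term of the sequence
ax.set_xticklabels([f"{v:.3g}" for v in X])
ax.minorticks_off()                           # drop the default log-scale minor ticks

# Render once so the limits are final, then map the terms to pixel positions
fig.canvas.draw()
pix = ax.transData.transform(np.column_stack([X, Y]))[:, 0]
spacing = np.diff(pix)                        # gaps between consecutive ticks, all equal
```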
You can do it by plotting your variable as a function of the "natural" variable that parametrizes your curve. For example:
import numpy as np
import matplotlib.pyplot as plt
n = 12
a = np.arange(n)
x = 2**a
y = np.random.rand(n)
fig = plt.figure(1, figsize=(7,7))
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.plot(x,y)
ax1.xaxis.set_ticks(x)
ax2.plot(a, y) #we plot y as a function of a, which parametrizes x
ax2.xaxis.set_ticks(a) #set the ticks to be a
ax2.xaxis.set_ticklabels(x) # change the ticks' names to x
which produces:
I had the same problem and spent several hours trying to find something appropriate. But it turns out to be really easy: you don't need any parameterization or fiddling with x-tick positions.
All you need to do is plot your x-values as strings rather than ints: plot(x.astype('str'), y)
By modifying the code from the previous answer you will get:
import numpy as np
import matplotlib.pyplot as plt
n = 12
a = np.arange(n)
x = 2**a
y = np.random.rand(n)
fig = plt.figure(1, figsize=(7,7))
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.plot(x,y)
ax1.xaxis.set_ticks(x)
ax2.plot(x.astype('str'), y)
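The even spacing comes from matplotlib treating string x-values as categories: each distinct string is mapped to the next integer position (0, 1, 2, ...) in order of first appearance. A small sketch that checks those positions; note the implication that repeated x-values would collapse onto one position and that the order follows the data, not numeric value.

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.arange(12)
x = 2**a
y = np.random.rand(len(a))

fig, ax = plt.subplots()
(line,) = ax.plot(x.astype(str), y)

# The line stores the category positions that the strings were converted to
positions = line.get_xydata()[:, 0]
```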
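Seaborn has several categorical plots that handle this kind of task natively, such as pointplot:
sns.pointplot(x="x", y="y", data=df, ax=ax)
Example:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({"x": 2**np.arange(12), "y": np.random.rand(12)})
fig, [ax1, ax2] = plt.subplots(2, figsize=(7,7))
sns.lineplot(data=df, x="x", y="y", ax=ax1) #relational plot
sns.pointplot(data=df, x="x", y="y", ax=ax2) #categorical plot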
If you are using a pandas DataFrame:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
n = 12
df = pd.DataFrame(dict(
X=2**np.arange(n),
Y=np.random.randint(1, 9, size=n),
)).set_index('X')
# index is reset in order to use as xticks
df.reset_index(inplace=True)
fig = plt.figure()
ax1 = plt.subplot(111)
df['Y'].plot(kind='bar', ax=ax1, figsize=(7, 7), use_index=True)
# set_ticklabels used to place original indexes
ax1.xaxis.set_ticklabels(df['X'])
Alternatively, just convert the x-values from int to str:
X = list(map(str, X))
plt.xticks(X)
plt.plot(X,Y)
plt.show()

Preserving the xticks in multiple bar plots in matplotlib

1) I can't see the text-based xticks, which are stored as a list in the variable x. When I have a bar plot of only a single column, I can see the xticks as text, but not with more columns.
2) How can I control the font properties of the xticks and of the y-axis values?
Thank you.
import matplotlib.pyplot as plt
import pylab as pl
import numpy as np
#load text and columns into different variables
data = np.genfromtxt('a', names=True, dtype=None, usecols=("X", "N2", "J2", "V2", "asd", "xyz"))
x = data['X']
n = data['N2']
j = data['J2']
v = data['V2']
#make x axis string based labels
r=np.arange(1,25,1.5)
plt.xticks(r,x) #make sure dimension of x and n matches
plt.figure(figsize=(3.2,2), dpi=300, linewidth=3.0)
ax = plt.subplot(111)
ax.bar(r,v,width=0.9,color='red',edgecolor='black', lw=0.5, align='center')
plt.axhline(y=0,linewidth=1.0,color='black') #horizontal line at y=0
plt.axis([0.5,16.5,-0.4,0.20])
ax.bar(r,j,width=0.6,color='green',edgecolor='black', lw=0.5, align='center')
ax.bar(r,n,width=0.3,color='blue',edgecolor='black', lw=0.5, align='center')
plt.axhline(y=0,linewidth=1,color='black') #horizontal line at y=0
plt.axis([0.5,24.5,-0.36,0.15])
plt.savefig('fig',dpi=300,format='png',orientation='landscape')
The way you're doing it, you just need to move the call to plt.xticks(r,x) to somewhere after you create the figure you're working on. Otherwise pyplot will create a new figure for you.
However, I would also consider switching to the more explicit object-oriented interface to matplotlib.
This way you'd use:
fig, ax = plt.subplots(1,1) # your only call to plt
ax.bar(r,v,width=0.9,color='red',edgecolor='black', lw=0.5, align='center')
ax.bar(r,j,width=0.6,color='green',edgecolor='black', lw=0.5, align='center')
ax.bar(r,n,width=0.3,color='blue',edgecolor='black', lw=0.5, align='center')
ax.set_xticks(r)
ax.set_xticklabels(x)
ax.axhline(y=0,linewidth=1,color='black')
fig.savefig('fig',dpi=300,format='png',orientation='landscape')
# or use plt.show() to see the figure interactively or inline, depending on backend
# (see Joe Kington's comment below)
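For the second question (font properties), a sketch of two common options: Text keyword arguments passed to set_xticklabels apply to each x tick label, and tick_params sets label size and colour for an axis in one call. The arrays below are hypothetical stand-ins for the question's r, x, v, j and n, and the font choices are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the question's r, x, v, j, n arrays
r = np.arange(1, 25, 1.5)
x = [f"item{i}" for i in range(len(r))]
v, j, n = (np.random.uniform(-0.3, 0.1, len(r)) for _ in range(3))

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(r, v, width=0.9, color="red", edgecolor="black", lw=0.5)
ax.bar(r, j, width=0.6, color="green", edgecolor="black", lw=0.5)
ax.bar(r, n, width=0.3, color="blue", edgecolor="black", lw=0.5)

ax.set_xticks(r)
# Text properties passed here are applied to every x tick label
ax.set_xticklabels(x, fontsize=7, fontfamily="serif", rotation=45)
# tick_params changes the y-axis tick labels (size, colour) in one call
ax.tick_params(axis="y", labelsize=8, labelcolor="darkblue")
ax.axhline(y=0, linewidth=1, color="black")
```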

How to add a second x-axis in matplotlib

I have a very simple question. I need a second x-axis on my plot, and I want this axis to have a certain number of ticks that correspond to certain positions on the first axis.
Let me illustrate with an example. Here I am plotting the dark matter mass as a function of the expansion factor, defined as 1/(1+z), which ranges from 0 to 1.
semilogy(1/(1+z),mass_acc_massive,'-',label='DM')
xlim(0,1)
ylim(1e8,5e12)
I would like to have another x-axis, on the top of my plot, showing the corresponding z for some values of the expansion factor. Is that possible? If yes, how can I have xtics ax
Taking a cue from the comments on @Dhara's answer, it sounds like you want to place a list of new_tick_locations on the new x-axis, labelled by a function that maps the old x-axis values to the new ones. The tick_function below takes a numpy array of points, maps them to new values and formats them:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twiny()
X = np.linspace(0,1,1000)
Y = np.cos(X*20)
ax1.plot(X,Y)
ax1.set_xlabel(r"Original x-axis: $X$")
new_tick_locations = np.array([.2, .5, .9])
def tick_function(X):
    V = 1/(1+X)
    return ["%.3f" % z for z in V]
ax2.set_xlim(ax1.get_xlim())
ax2.set_xticks(new_tick_locations)
ax2.set_xticklabels(tick_function(new_tick_locations))
ax2.set_xlabel(r"Modified x-axis: $1/(1+X)$")
plt.show()
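Applied to the question's own transformation, the same approach might look like the sketch below: choose the redshifts you want labelled, convert them to expansion-factor positions with a = 1/(1+z), and use those positions as the top-axis ticks. The mass curve is made up purely so there is something to plot, and the chosen redshifts are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data standing in for the question's mass accretion history
a = np.linspace(0.05, 1, 200)          # expansion factor a = 1/(1+z)
mass = 1e8 * np.exp(8 * a)             # made-up, monotonically growing mass

fig, ax1 = plt.subplots()
ax1.semilogy(a, mass, "-", label="DM")
ax1.set_xlim(0, 1)
ax1.set_xlabel("expansion factor $a = 1/(1+z)$")

ax2 = ax1.twiny()
ax2.set_xlim(ax1.get_xlim())           # keep both x-axes aligned

z_ticks = np.array([0, 0.5, 1, 2, 4])  # redshifts we want labelled
a_ticks = 1 / (1 + z_ticks)            # where those redshifts sit on the a-axis
ax2.set_xticks(a_ticks)
ax2.set_xticklabels([f"{z:g}" for z in z_ticks])
ax2.set_xlabel("redshift $z$")
```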
You can use twiny to create two x-axis scales. For example:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twiny()
a = np.cos(2*np.pi*np.linspace(0, 1, 60))  # num must be an int in current NumPy
ax1.plot(range(60), a)
ax2.plot(range(100), np.ones(100)) # Create a dummy plot
ax2.cla()
plt.show()
Ref: http://matplotlib.sourceforge.net/faq/howto_faq.html#multiple-y-axis-scales
Output:
From matplotlib 3.1 onwards you can use ax.secondary_xaxis:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1,13, num=301)
y = (np.sin(x)+1.01)*3000
# Define function and its inverse
f = lambda x: 1/(1+x)
g = lambda x: 1/x-1
fig, ax = plt.subplots()
ax.semilogy(x, y, label='DM')
ax2 = ax.secondary_xaxis("top", functions=(f,g))
ax2.set_xlabel("1/(x+1)")
ax.set_xlabel("x")
plt.show()
If you want your upper axis to be a function of the lower axis tick values, you can do as below. Please note: get_xticks() sometimes returns ticks outside the visible range, which you have to allow for when converting.
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()  # a second fig.add_subplot(111) here would stack another axes on top
ax1.plot(range(5), range(5))
ax1.grid(True)
ax2 = ax1.twiny()
ax2.set_xticks( ax1.get_xticks() )
ax2.set_xbound(ax1.get_xbound())
ax2.set_xticklabels([x * 2 for x in ax1.get_xticks()])
title = ax1.set_title("Upper x-axis ticks are lower x-axis ticks doubled!")
title.set_y(1.1)
fig.subplots_adjust(top=0.85)
fig.savefig("1.png")
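One way to allow for the out-of-range ticks mentioned above is to filter get_xticks() against the visible bounds before copying them to the upper axis. A minimal sketch, reusing the doubling rule from the answer:

```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax1.plot(range(5), range(5))
ax2 = ax1.twiny()

lo, hi = ax1.get_xbound()
# Keep only the lower-axis ticks that actually fall inside the visible range
ticks = [t for t in ax1.get_xticks() if lo <= t <= hi]

ax2.set_xbound(lo, hi)
ax2.set_xticks(ticks)
ax2.set_xticklabels([f"{t * 2:g}" for t in ticks])
```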
Gives:
Answering your question in the comments on Dhara's answer: "I would like on the second x-axis these tics: (7,8,99) corresponding to the x-axis position 10, 30, 40. Is that possible in some way?"
Yes, it is.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(111)
a = np.cos(2*np.pi*np.linspace(0, 1, 60))  # num must be an int in current NumPy
ax1.plot(range(60), a)
ax1.set_xlim(0, 60)
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax2 = ax1.twiny()
ax2.set_xlabel("x-transformed")
ax2.set_xlim(0, 60)
ax2.set_xticks([10, 30, 40])
ax2.set_xticklabels(['7','8','99'])
plt.show()
You'll get:
I'm forced to post this as an answer instead of a comment due to low reputation.
I had a similar problem to Matteo's. The difference was that I had no map from my first x-axis to my second x-axis, only the x-values themselves. So I wanted to set the data on my second x-axis directly, not the ticks; however, there is no axes.set_xdata. I was able to use Dhara's answer to do this with a modification:
ax2.lines = []  # older matplotlib only; Axes.lines can no longer be assigned in recent versions
instead of using:
ax2.cla()
which, when I used it, also cleared my plot from ax1.
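In recent matplotlib versions Axes.lines is a read-only view, so a sketch of an equivalent that should work there is to remove the dummy line artists one at a time. Pinning ax2's x-limits first (an addition here, not part of the answers above) keeps the range from depending on the removed line.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax1.plot(range(60), np.cos(2 * np.pi * np.linspace(0, 1, 60)))

ax2 = ax1.twiny()
ax2.plot(range(100), np.ones(100))   # dummy plot, only to give ax2 an x-range
ax2.set_xlim(0, 99)                  # pin the range so it survives removing the line

# Remove only ax2's lines; iterate over a copy because removal mutates ax2.lines
for line in list(ax2.lines):
    line.remove()
```

Unlike ax2.cla(), this touches nothing but the artists on ax2, so the curve on ax1 stays in place.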
