generate series of plots with pandas dataframe - python

I have to generate a series of scatter plots (roughly 100 in total).
I have created an example to illustrate the problem.
First do an import.
import pandas as pd
Create a pandas dataframe.
# Create dataframe
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
'report_value': [4, 24, 31, 2, 3, 5, 10],
'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)
print(df)
Output:
coverage_id name report_value
0 m1 Jason 4
1 m2 Jason 24
2 m3 Tina 31
3 m4 Tina 2
4 m5 Tina 3
5 m6 Jason 5
6 m7 Tina 10
The goal is to generate two scatter plots without using a for-loop. The name of the person, Jason or Tina, should be displayed in the title. The report_value should be on the y-axis in both plots and the coverage_id (which is a string) on the x-axis.
I thought I should start with:
df.groupby('name')
This gives me the dataframe grouped by name, and I then need to apply the plotting operation to every group.
I don't know how to proceed and get Python to make the two plots for me.
Thanks a lot for any help.

I think you can use this solution, but first it is necessary to convert the string column to numeric codes, then plot, and finally set the x tick labels:
import matplotlib.pyplot as plt
import numpy as np
# Map each coverage_id string to an integer code; u holds the unique labels
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
groups = df.groupby('name')
# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.coverage_id,
            group.report_value,
            marker='o',
            linestyle='',
            ms=12,
            label=name)
# One tick per unique coverage_id, labelled with the original strings
ax.set(xticks=range(len(u)), xticklabels=u)
ax.legend()
plt.show()
Another solution with seaborn.pairplot:
import seaborn as sns
import numpy as np
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
# Newer seaborn versions use height in place of the old size parameter
g = sns.pairplot(x_vars=["coverage_id"], y_vars=["report_value"], data=df, hue="name", height=5)
g.set(xticks=range(len(u)), xticklabels=u, xlim=(0, None))
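Both snippets above put the two names into one plot with a legend, while the question asked for one plot per name with the name in the title. A minimal sketch of that variant follows. It still loops over the groups, as the answer does, and the helper name scatter_per_name and the Agg backend line are assumptions made for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to show windows
import matplotlib.pyplot as plt
import pandas as pd

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)


def scatter_per_name(frame):
    """Draw one scatter subplot per name (hypothetical helper)."""
    groups = frame.groupby('name')
    # One column of axes per group; squeeze=False always returns a 2D array
    fig, axes = plt.subplots(1, groups.ngroups, sharey=True, squeeze=False)
    for ax, (name, group) in zip(axes.flat, groups):
        # matplotlib treats string x values as categories, so no conversion is needed
        ax.scatter(group['coverage_id'], group['report_value'])
        ax.set_title(name)
        ax.set_xlabel('coverage_id')
    axes[0, 0].set_ylabel('report_value')
    return fig, axes


fig, axes = scatter_per_name(df)
```

For roughly 100 names, one figure per group (calling plt.subplots() inside the loop and saving each figure with fig.savefig) may be easier to read than one wide grid.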

Related

how to set x_axis label(not xtick label) for all subplots in relplot?

I tried drawing subplots with seaborn's relplot. Because the original dataset varies, I sometimes don't know in advance how many subplots there will be.
I set col_wrap to limit the number of columns, but sometimes the result doesn't look good. For example, with col_wrap = 3 and 5 subplots I get the figure below:
As the figure shows, the x-axis label only appears on C, D and E, which seems strange. I want the x-axis label to be shown on all subplots (from A to E).
I already know that facet_kws={'sharex': 'col'} allows plots to have independent axis scales (according to the question "set axis limits on individual facets of seaborn facetgrid").
But I want to set x-axis labels for all subplots, and I haven't found a solution for it.
Methods like set_xlabels on the FacetGrid object seem to be useless here, because the official documentation says they only label the x axis "on the bottom row of the grid".
FacetGrid.set_xlabels(label=None, clear_inner=True, **kwargs)
Label the x axis on the bottom row of the grid.
The following are my example data and my code:
city date value
0 A 1 9
1 B 1 20
2 C 1 4
3 D 1 33
4 E 1 2
5 A 2 22
6 B 2 32
7 C 2 27
8 D 2 32
9 E 2 18
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_excel("data/example_data.xlsx")
# print(df)
g = sns.relplot(data=df, x="date", y="value", kind="line", col="city", col_wrap=3,
                errorbar=None, facet_kws={'sharex': 'col'})
(g.set_axis_labels("x_axis", "y_axis")
  .set_titles("{col_name}")
  .tight_layout()
  .add_legend()
 )
plt.subplots_adjust(top=0.94, wspace=None, hspace=0.4)
plt.show()
Thanks in advance.
In order to reduce superfluous information, Seaborn makes these inner labels invisible. You can make them visible again:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': np.repeat([1, 2], 5),
                   'value': np.random.randint(1, 20, 10),
                   'city': np.tile([*'abcde'], 2)})
# print(df)
g = sns.relplot(data=df, x="date", y="value", kind="line", col="city", col_wrap=3,
                errorbar=None, facet_kws={'sharex': 'col'})
g.set_titles("{col_name}")
g.add_legend()
for ax in g.axes.flat:
    ax.set_xlabel('x axis', visible=True)
    ax.set_ylabel('y axis', visible=True)
plt.subplots_adjust(top=0.94, wspace=None, hspace=0.4)
plt.show()

How to plot a grouped bar plot of count from pandas

I have a dataframe with the following columns:
gender class
male A
female A
male B
female B
male B
female A
I want to plot a double bar graph with the columns as each gender and the values as the count of how many of each gender are in class A vs B respectively.
So the bars should be grouped by gender and there should be 2 bars - one for each class.
How do I visualize this? I see this example but I'm really confused
speed = [0.1, 17.5, 40, 48, 52, 69, 88]
lifespan = [2, 8, 70, 1.5, 25, 12, 28]
index = ['snail', 'pig', 'elephant',
'rabbit', 'giraffe', 'coyote', 'horse']
df = pd.DataFrame({'speed': speed,
'lifespan': lifespan}, index=index)
speed lifespan
snail 0.1 2.0
pig 17.5 8.0
elephant 40.0 70.0
rabbit 48.0 1.5
giraffe 52.0 25.0
coyote 69.0 12.0
horse 88.0 28.0
ax = df.plot.bar(rot=0)
My index is just row 0 to the # of rows, so I'm confused how I can configure df.plot.bar to work with my use case. Any help would be appreciated!
Use pandas.DataFrame.pivot_table to reshape the dataframe from a long to wide format. The index will be the x-axis, and the columns will be the groups when plotted with pandas.DataFrame.plot
pd.crosstab(df['gender'], df['class']) can also be used to reshape with an aggregation.
Alternatively, use seaborn.countplot and hue='class', or the figure level version seaborn.catplot with kind='count', both of which can create the desired plot without reshaping the dataframe.
If one of the desired columns is in the index, either specify df.index or reset the index with df = df.reset_index()
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'gender': ['male', 'female', 'male', 'female', 'male', 'female'], 'class': ['A', 'A', 'B', 'B', 'B', 'A']}
df = pd.DataFrame(data)
# pivot the data and aggregate
dfp = df.pivot_table(index='gender', columns='class', values='class', aggfunc='size')
# plot
dfp.plot(kind='bar', figsize=(5, 3), rot=0)
plt.show()
plt.figure(figsize=(5, 3))
sns.countplot(data=df, x='gender', hue='class')
plt.show()
sns.catplot(kind='count', data=df, x='gender', hue='class', height=3, aspect=1.4)
plt.show()
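The pd.crosstab route mentioned above is not shown in the code, so here is a minimal sketch of it using the same sample data. The variable name counts and the Agg backend line are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to show windows
import matplotlib.pyplot as plt
import pandas as pd

data = {'gender': ['male', 'female', 'male', 'female', 'male', 'female'],
        'class': ['A', 'A', 'B', 'B', 'B', 'A']}
df = pd.DataFrame(data)

# Rows become the x-axis groups (gender), columns become the bars (class)
counts = pd.crosstab(df['gender'], df['class'])

# One group of bars per gender, one bar per class
ax = counts.plot(kind='bar', rot=0, figsize=(5, 3))
ax.set_ylabel('count')
```

The resulting table has the same wide shape as the pivot_table output, so it plots the same way.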

How to merge two plots in Pandas?

I want to merge two plots. This is my dataframe:
df_inc.head()
id date real_exe_time mean mean+30% mean-30%
0 Jan 31 33.14 43.0 23.0
1 Jan 30 33.14 43.0 23.0
2 Jan 33 33.14 43.0 23.0
3 Jan 38 33.14 43.0 23.0
4 Jan 36 33.14 43.0 23.0
My first plot:
df_inc.plot.scatter(x = 'date', y = 'real_exe_time')
Then my second plot:
df_inc.plot(x='date', y=['mean','mean+30%','mean-30%'])
When I try to merge with:
fig=plt.figure()
ax = df_inc.plot(x='date', y=['mean','mean+30%','mean-30%']);
df_inc.plot.scatter(x = 'date', y = 'real_exe_time', ax=ax)
plt.show()
I got the following:
How can I merge them the right way?
You should not repeat your mean values as an extra column. df.plot() for categorical data will be plotted against the index - hence you will see the original scatter plot (also plotted against the index) squeezed into the left corner.
You could create instead an additional aggregation dataframe that you can plot then into the same graph:
import matplotlib.pyplot as plt
import pandas as pd
#test data generation
import numpy as np
n=30
np.random.seed(123)
df = pd.DataFrame({"date": np.random.choice(list("ABCDEF"), n), "real_exe_time": np.random.randint(1, 100, n)})
df = df.sort_values(by="date").reset_index(drop=True)
#aggregate data for plotting
df_agg = df.groupby("date")["real_exe_time"].agg(mean="mean").reset_index()
df_agg["mean+30%"] = df_agg["mean"] * 1.3
df_agg["mean-30%"] = df_agg["mean"] * 0.7
#plot both into the same subplot
ax = df.plot.scatter(x = 'date', y = 'real_exe_time')
df_agg.plot(x='date', y=['mean','mean+30%','mean-30%'], ax=ax)
plt.show()
Sample output:
You could also consider using seaborn that has, for instance, pointplots for categorical data aggregation.
I'm guessing that you haven't converted the date column to datetime objects, so the first thing you should do is this:
#Transform the date to datetime object
df_inc['date']=pd.to_datetime(df_inc['date'],format='%b')
fig=plt.figure()
ax = df_inc.plot(x='date', y=['mean','mean+30%','mean-30%']);
df_inc.plot.scatter(x = 'date', y = 'real_exe_time', ax=ax)
plt.show()
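If the mean is a single constant for the whole dataset, as in the sample rows of the question, another option is to skip the extra columns and draw horizontal reference lines on the scatter plot's axes. This is only a sketch under that assumption. It uses the five sample rows from the question, so the mean it computes differs from the 33.14 shown there, and the Agg backend line is an assumption for headless running.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to show windows
import matplotlib.pyplot as plt
import pandas as pd

# Only the five rows shown in the question, not the full dataset
df_inc = pd.DataFrame({'date': ['Jan'] * 5,
                       'real_exe_time': [31, 30, 33, 38, 36]})

mean = df_inc['real_exe_time'].mean()

fig, ax = plt.subplots()
# Scatter against the row index so every observation gets its own x position
ax.scatter(df_inc.index, df_inc['real_exe_time'], label='real_exe_time')
# Horizontal lines for the mean and the +/- 30% band
for factor, label in [(1.0, 'mean'), (1.3, 'mean+30%'), (0.7, 'mean-30%')]:
    ax.axhline(mean * factor, linestyle='--', label=label)
ax.set_xlabel('id')
ax.set_ylabel('real_exe_time')
ax.legend()
```

Because both the points and the lines are drawn on the same Axes with the same x units, nothing gets squeezed into a corner.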

Drawing 3 dimension using python matplotlib

I'm trying to draw the following chart using python.
Can you help me out?
thanks
You can try this.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'realtime':[2,3,4,2,4],
'esttime':[1,1,3,1,4],
'time of 5 mins': ['09:15','09:20','09:25','09:30','09:35']})
df
realtime esttime time of 5 mins
0 2 1 9:15
1 3 1 9:20
2 4 3 9:25
3 2 1 9:30
4 4 4 9:35
Convert your time of 5 mins to valid datetime object using pd.to_datetime.
df['time of 5 mins']=pd.to_datetime(df['time of 5 mins'],format='%H:%M').dt.strftime('%H:%M')
Now, use time of 5 mins as the x-axis, plot realtime and esttime on the y-axis, and use annotate (matplotlib.pyplot.annotate or Axes.annotate) to show the third dimension as point labels.
index= ['A', 'B', 'C', 'D', 'E']
plt.plot(df['time of 5 mins'],df['esttime'],marker='o',alpha=0.8,color='#CD5C5C',lw=0.8)
plt.plot(df['time of 5 mins'],df['realtime'],marker='o',alpha=0.8,color='green',lw=0.8)
ax= plt.gca() #gca is get current axes
for i,txt in enumerate(index):
    ax.annotate(txt,(df['time of 5 mins'][i],df['realtime'][i]))
    ax.annotate(txt,(df['time of 5 mins'][i],df['esttime'][i]))
plt.show()
To make the plot more complete, add a legend, xlabel, ylabel, and title, and stretch the x and y axis ranges a little so that the plot is easier to read. More details are in the matplotlib.pyplot documentation.
import matplotlib.pyplot as plt
import numpy as np
y = [2, 3, 4, 2, 4]
y2 = [1, 1, 3, 1, 4]
a = ['9:15', '9:20', '9:25', '9:30', '9:35']
x = np.arange(5)
fig = plt.figure()
ax = plt.subplot(111)
ax.plot(x, y, label='Real Time')
ax.plot(x, y2, label='Estimated Time')
plt.xticks(x, labels=a)
plt.xlabel('Time')
chartBox = ax.get_position()
ax.set_position([chartBox.x0, chartBox.y0, chartBox.width*0.6, chartBox.height])
ax.legend(loc='upper center', bbox_to_anchor=(1.45, 0.8), shadow=True, ncol=1)
plt.show()

Pandas dataframe plotting - issue when switching from two subplots to single plot w/ secondary axis

I have two sets of data I want to plot together on a single figure. I have a set of flow data at 15 minute intervals I want to plot as a line plot, and a set of precipitation data at hourly intervals, which I am resampling to a daily time step and plotting as a bar plot. Here is what the format of the data looks like:
2016-06-01 00:00:00 56.8
2016-06-01 00:15:00 52.1
2016-06-01 00:30:00 44.0
2016-06-01 00:45:00 43.6
2016-06-01 01:00:00 34.3
At first I set this up as two subplots, with precipitation and flow rate on different axes. This works totally fine. Here's my code:
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime
filename = 'manhole_B.csv'
plotname = 'SSMH-2A B'
plt.style.use('bmh')
# Read csv with precipitation data, change index to datetime object
pdf = pd.read_csv('precip.csv', delimiter=',', header=None, index_col=0)
pdf.columns = ['Precipitation[in]']
pdf.index.name = ''
pdf.index = pd.to_datetime(pdf.index)
pdf = pdf.resample('D').sum()
print(pdf.head())
# Read csv with flow data, change index to datetime object
qdf = pd.read_csv(filename, delimiter=',', header=None, index_col=0)
qdf.columns = ['Flow rate [gpm]']
qdf.index.name = ''
qdf.index = pd.to_datetime(qdf.index)
# Plot
f, ax = plt.subplots(2)
qdf.plot(ax=ax[1], rot=30)
pdf.plot(ax=ax[0], kind='bar', color='r', rot=30, width=1)
ax[0].get_xaxis().set_ticks([])
ax[1].set_ylabel('Flow Rate [gpm]')
ax[0].set_ylabel('Precipitation [in]')
ax[0].set_title(plotname)
f.set_facecolor('white')
f.tight_layout()
plt.show()
2 Axis Plot
However, I decided I want to show everything on a single axis, so I modified my code to put precipitation on a secondary axis. Now my flow data has disappeared from the plot, and even when I set the axis ticks to an empty set, I get these 00:15, 00:30 and 00:45 tick marks along the x-axis.
Secondary-y axis plots
Any ideas why this might be occurring?
Here is my code for the single axis plot:
f, ax = plt.subplots()
qdf.plot(ax=ax, rot=30)
pdf.plot(ax=ax, kind='bar', color='r', rot=30, secondary_y=True)
ax.get_xaxis().set_ticks([])
Here is an example:
Setup
In [1]: from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.DataFrame({'x' : np.arange(10),
'y1' : np.random.rand(10,),
'y2' : np.square(np.arange(10))})
df
Out[1]: x y1 y2
0 0 0.451314 0
1 1 0.321124 1
2 2 0.050852 4
3 3 0.731084 9
4 4 0.689950 16
5 5 0.581768 25
6 6 0.962147 36
7 7 0.743512 49
8 8 0.993304 64
9 9 0.666703 81
Plot
In [2]: fig, ax1 = plt.subplots()
        ax1.plot(df['x'], df['y1'], 'b-')
        ax1.set_xlabel('Series')
        ax1.set_ylabel('Random', color='b')
        for tl in ax1.get_yticklabels():
            tl.set_color('b')
        ax2 = ax1.twinx() # Note twinx, not twiny. I was wrong when I commented on your question.
        ax2.plot(df['x'], df['y2'], 'ro')
        ax2.set_ylabel('Square', color='r')
        for tl in ax2.get_yticklabels():
            tl.set_color('r')
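One likely cause of the behaviour in the question is that pandas draws bar plots at integer positions (0, 1, 2, ...) while the line plot uses real datetime positions, so the two do not line up on a shared x-axis. A minimal sketch of the twinx approach applied to that kind of data follows, drawing the bars with matplotlib directly so both series share datetime x values. The flow and precipitation values are made-up stand-ins for the CSV files, and the Agg backend line is an assumption for headless running.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to show windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 3 days of 15-minute flow and hourly precipitation
flow_index = pd.date_range('2016-06-01', periods=4 * 24 * 3, freq='15min')
flow = pd.Series(50 + 10 * np.sin(np.arange(len(flow_index)) / 20), index=flow_index)
precip_index = pd.date_range('2016-06-01', periods=24 * 3, freq='60min')
precip = pd.Series(0.01, index=precip_index)

# Resample precipitation to daily totals, as in the question
daily = precip.resample('D').sum()

fig, ax = plt.subplots()
ax.plot(flow.index, flow.values)
ax.set_ylabel('Flow Rate [gpm]')

# Second y-axis sharing the same datetime x-axis
ax2 = ax.twinx()
# width is in days on a date axis; align='edge' starts each bar at midnight
ax2.bar(daily.index, daily.values, width=1.0, align='edge', color='r', alpha=0.3)
ax2.set_ylabel('Precipitation [in]')
fig.autofmt_xdate(rotation=30)
```

Calling ax.plot and ax2.bar directly keeps both series in datetime units, which is the main design choice here.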
