I'm trying to create some boxplots of pandas dataframes.
I have dataframes that typically look like this (I wasn't sure of a good way to show it, so I just took a screenshot).
I am creating a boxplot for each dataframe (after transposing) using the df.boxplot() method. It comes out almost exactly how I want it using the code below:
ax = crit_storm_df[tp_cols].T.boxplot()
ax.set_xlabel("Duration (m)")
ax.set_ylabel("Max Flow (cu.m/sec)")
ax.set_xlim(0, None)
ax.set_ylim(0, None)
ax.set_title(crit_storm_df.name)
plt.show()
Example picture of the output graph. What's lacking, though, is a legend with one entry for each column of the dataframe pictured above. Since I transposed the df before plotting, that means a legend entry for each row of the transposed frame, i.e. "tp01", "tp02", etc.
Anyone know what I should be doing instead? Is there a way to do this through the df.boxplot() method or do I need to do something in matplotlib?
I tried ax.legend() but it doesn't do anything except give me a warning:
No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
Any help would be appreciated, thanks!
If you simply want your boxes to have different colors, you can use seaborn. That is its default behavior:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(10, 4),
columns=['Col1', 'Col2', 'Col3', 'Col4'])
ax = sns.boxplot(data=df)
plt.legend(ax.patches, df.columns)
plt.show()
Edit: adding legend
Output:
The following code produces a graph similar to the one you want:
import matplotlib.pyplot as plt
import pandas as pd
data = {
'Duration': [10, 15, 20, 25, 30, 45, 60, 90, 120, 180, 270, 360],
'tp01': [13.1738, 13.1662, 14.3903, 14.2772, 14.3223, 12.5686, 14.8710, 8.9785, 9.2224, 7.4957, 3.6493, 5.7982],
'tp02': [13.1029, 14.2570, 16.5373, 12.6589, 11.0455, 12.6777, 8.1715, 9.3830, 8.3498, 6.0930, 6.4310, 7.4538],
'tp03': [14.5263, 13.6724, 11.4800, 13.4982, 12.3987, 11.6688, 10.4089, 7.0736, 5.8004, 10.1354, 5.5874, 5.6749],
'tp04': [14.7589, 11.6993, 12.5825, 13.5627, 11.9481, 10.7803, 8.9388, 5.7076, 12.7690, 9.7546, 9.5004, 5.9912],
'tp05': [15.5543, 14.1007, 11.7304, 13.3218, 12.4318, 9.5237, 11.9014, 5.6778, 14.2627, 3.7422, 6.4555, 3.3458],
'tp06': [13.5196, 12.5939, 12.5679, 11.4414, 9.3590, 9.6083, 9.6704, 10.5239, 9.1028, 6.0336, 7.0258, 5.9800],
'tp07': [14.7476, 13.3925, 13.0324, 13.3649, 14.7832, 8.1078, 7.1307, 15.4406, 5.0187, 6.9497, 3.6492, 4.8642],
'tp08': [13.3995, 14.3639, 12.7579, 10.6844, 10.3281, 10.2541, 8.8257, 8.8773, 8.3498, 5.7315, 7.8469, 6.7316],
'tp09': [16.7954, 17.1788, 15.9850, 10.8780, 12.5249, 10.2174, 7.5735, 7.3753, 7.1157, 4.8536, 9.1581, 5.6369],
'tp10': [15.7671, 16.1570, 11.6122, 15.2340, 13.2356, 13.2270, 11.6810, 7.1157, 8.0048, 5.5782, 6.0876, 5.7982],
}
df = pd.DataFrame(data).set_index("Duration")
fig, ax = plt.subplots()
df.T.plot(kind='box', ax=ax)
labels = df.columns
lines = [plt.Line2D([0, 1], [0, 1], color=c, marker='o', markersize=10) for c in plt.rcParams['axes.prop_cycle'].by_key()['color'][:len(labels)]]
ax.legend(lines, labels, loc='best')
ax.set_xlabel("Duration (m)")
ax.set_ylabel("Max Flow (cu.m/sec)")
ax.set_xlim(0, None)
ax.set_ylim(0, None)
ax.set_xticklabels(df.index)
plt.show()
Result:
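The warning from ax.legend() appears because none of the artists drawn by the boxplot carries a label. One possible way around that, sketched here under assumptions, is to overlay each "tp" series as labelled markers on top of the boxes so that the legend has something to pick up. The three-duration subset of the data above and the marker styling are illustrative choices, not part of the original question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative subset of the data above: rows are durations, columns are tp01..tp03
df = pd.DataFrame(
    {
        "tp01": [13.1738, 14.3903, 8.9785],
        "tp02": [13.1029, 16.5373, 9.3830],
        "tp03": [14.5263, 11.4800, 7.0736],
    },
    index=pd.Index([10, 20, 90], name="Duration"),
)

fig, ax = plt.subplots()
positions = list(range(1, len(df) + 1))

# One box per duration: each column of the transposed array is one duration
ax.boxplot(df.T.to_numpy(), positions=positions)

# Overlay every tp series as labelled points; these labels feed the legend
for name in df.columns:
    ax.plot(positions, df[name], marker="o", linestyle="None", label=name)

ax.set_xticks(positions)
ax.set_xticklabels(df.index)
ax.set_xlabel("Duration (m)")
ax.set_ylabel("Max Flow (cu.m/sec)")
ax.legend()
# plt.show()  # uncomment to display the figure
```

The boxes themselves stay unlabelled, so the legend lists only "tp01", "tp02" and "tp03".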
I have a dataframe with this data and want to plot it with a bar graph with x-axis labels being months
import pandas as pd
data = {'Birthday': ['1900-01-31', '1900-02-28', '1900-03-31', '1900-04-30', '1900-05-31', '1900-06-30', '1900-07-31', '1900-08-31', '1900-09-30', '1900-10-31', '1900-11-30', '1900-12-31'],
'Players': [32, 25, 27, 19, 27, 18, 18, 21, 23, 21, 26, 23]}
df = pd.DataFrame(data)
Birthday Players
1900-01-31 32
1900-02-28 25
1900-03-31 27
1900-04-30 19
1900-05-31 27
1900-06-30 18
1900-07-31 18
1900-08-31 21
1900-09-30 23
1900-10-31 21
1900-11-30 26
1900-12-31 23
This is what I have
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig = plt.figure(figsize=(12, 7))
locator = mdates.MonthLocator()
fmt = mdates.DateFormatter('%b')
X = plt.gca().xaxis
X.set_major_locator(locator)
X.set_major_formatter(fmt)
plt.bar(month_df.index, month_df.Players, color = 'maroon', width=10)
but the result is this with the label starting from Feb instead of Jan
Bar plot x-axis tick locations are 0 indexed, not datetimes
This solution applies to any plot with a discrete axis (e.g. bar, hist, heat, etc.).
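A quick way to see the 0-indexed tick locations is to inspect them on a small pandas bar plot. This is a minimal sketch using the first three rows of the data from the question; the variable names are only illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three rows of the question's data, with Birthday as a datetime column
df = pd.DataFrame({
    "Birthday": pd.to_datetime(["1900-01-31", "1900-02-28", "1900-03-31"]),
    "Players": [32, 25, 27],
})

ax = df.plot(kind="bar", x="Birthday", y="Players", legend=False)

# The bars sit at integer positions, not at the datetime values
ticks = ax.get_xticks()
print(ticks)
```

Because the ticks are plain integers, a date locator or formatter from matplotlib.dates has no real dates to work with on this axis.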
Similar to this answer, the easiest solution follows (skip to step 3 if the str column already exists):
1. Convert the 'Birthday' column to a datetime dtype with pd.to_datetime.
2. Extract the abbreviated month name to a separate column.
3. Order the column with pd.Categorical. The built-in calendar module is used to supply an ordered list of abbreviated month names, or the list can be typed manually.
4. Plot the dataframe with pandas.DataFrame.plot, which uses matplotlib as the default backend.
Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3
import pandas as pd
import matplotlib.pyplot as plt
from calendar import month_abbr as ma # ordered abbreviated month names
# convert the Birthday column to a datetime dtype
df.Birthday = pd.to_datetime(df.Birthday)
# create a month column
df['month'] = df.Birthday.dt.strftime('%b')
# convert the column to categorical and ordered
df.month = pd.Categorical(df.month, categories=ma[1:], ordered=True)
# plot the dataframe
ax = df.plot(kind='bar', x='month', y='Players', figsize=(12, 7), rot=0, legend=False)
If there are many repeated months, so that the data must be aggregated, combine the data using pandas.DataFrame.groupby and apply an aggregation function such as .mean() or .sum().
dfg = df.groupby('month').Players.sum()
ax = dfg.plot(kind='bar', figsize=(12, 7), rot=0, legend=False)
Typically, matplotlib's bar does not do a very good job with datetimes, for various reasons. It's easy to manually set your x tick locations and labels as below. Setting them this way uses a fixed locator and a fixed formatter under the hood, but it lets you take control quite easily.
import pandas as pd
import matplotlib.pyplot as plt
#generate data
data = pd.Series({
'1900-01-31' : 32, '1900-02-28' : 25, '1900-03-31' : 27,
'1900-04-30' : 19, '1900-05-31' : 27, '1900-06-30' : 18,
'1900-07-31' : 18, '1900-08-31' : 21, '1900-09-30' : 23,
'1900-10-31' : 21, '1900-11-30' : 26, '1900-12-31' : 23,
})
#make plot
fig, ax = plt.subplots(figsize=(12, 7))
ax.bar(range(len(data)), data, color = 'maroon', width=0.5, zorder=3)
#ax.set_xticks uses a fixed locator
ax.set_xticks(range(len(data)))
#ax.set_xticklabels uses a fixed formatter
ax.set_xticklabels(pd.to_datetime(data.index).strftime('%b'))
#format plot a little bit
ax.spines[['top','right']].set_visible(False)
ax.tick_params(axis='both', left=False, bottom=False, labelsize=13)
ax.grid(axis='y', color='gray', dashes=(8,3), alpha=0.5)
I'm not familiar with matplotlib.dates, but because you are using pandas, there are simple ways of doing what you need with pandas.
Here is my code:
import pandas as pd
import calendar
from matplotlib import pyplot as plt
# data
data = {'Birthday': ['1900-01-31', '1900-02-28', '1900-03-31', '1900-04-30', '1900-05-31', '1900-06-30', '1900-07-31', '1900-08-31', '1900-09-30', '1900-10-31', '1900-11-30', '1900-12-31'],
'Players': [32, 25, 27, 19, 27, 18, 18, 21, 23, 21, 26, 23]}
df = pd.DataFrame(data)
# convert column to datetime
df["Birthday"] = pd.to_datetime(df["Birthday"], format="%Y-%m-%d")
# groupby month and plot bar plot
df.groupby(df["Birthday"].dt.month)["Players"].sum().plot(kind="bar", color = "maroon")
# set plot properties
plt.xlabel("Birthday Month")
plt.ylabel("Count")
plt.xticks(ticks = range(0,12) ,labels = calendar.month_name[1:])
# show plot
plt.show()
Output:
I want 'Age' on the x-axis, 'Pos' on the y-axis, and the 'Player' names as point labels. But for some reason, I'm not able to label the points.
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import adjustText as at
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
y='Pos',
c='DarkBlue', xticks=([15,20,25,30,35,40]))
y = df.Player
texts = []
for i, txt in enumerate(y):
plt.text()
at.adjust_text(texts, arrowprops=dict(arrowstyle="simple, head_width=0.25, tail_width=0.05", color='black', lw=0.5, alpha=0.5))
plt.show()
Summary of the data :
df.head()
Player Pos Age
0 Thibaut Courtois GK 28
1 Karim Benzema FW 32
2 Sergio Ramos DF 34
3 Raphael Varane DF 27
4 Luka Modric MF 35
Error :
ConversionError: Failed to convert value(s) to axis units: 'GK'
This is the plot so far; not able to label these points:
EDIT:
This is what I wanted but of all points:
Also, Could anyone help me in re-ordering the labels on the yaxis.
Like, I wanted FW,MF,DF,GK as my order but the plot is in MF,DF,FW,GK.
Thanks.
A similar solution was described here. Essentially, you want to annotate the points in your scatter plot.
I have stripped down your code. Note that you need to plot the data with matplotlib (and not with pandas) after building the DataFrame with df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age']). This way, you can use the ax.annotate() method.
import matplotlib.pyplot as plt
import pandas as pd
# build data
data = [
['Thibaut Courtois', 'GK', 28],
['Karim Benzema', 'FW', 32],
['Sergio Ramos','DF', 34],
['Raphael Varane', 'DF', 27],
['Luka Modric', 'MF', 35],
]
# create pandas DataFrame
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
# open figure + axis
fig, ax = plt.subplots()
# plot
ax.scatter(x=df['Age'],y=df['Pos'],c='DarkBlue')
# set labels
ax.set_xlabel('Age')
ax.set_ylabel('Pos')
# annotate points in axis
for idx, row in df.iterrows():
ax.annotate(row['Player'], (row['Age'], row['Pos']) )
# force matplotlib to draw the graph
plt.show()
This is what you'll get as output:
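The question also asks about re-ordering the labels on the y-axis. With string values, matplotlib places categories in the order it first encounters them. One possible approach, sketched here as an assumption rather than as part of the answer above, is to map each position to an explicit integer and then label the ticks manually. The `order` list and the `y_pos` name are hypothetical, and the list is drawn from bottom to top, so reverse it if you want the opposite direction.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [
        ["Thibaut Courtois", "GK", 28],
        ["Karim Benzema", "FW", 32],
        ["Sergio Ramos", "DF", 34],
        ["Raphael Varane", "DF", 27],
        ["Luka Modric", "MF", 35],
    ],
    columns=["Player", "Pos", "Age"],
)

# Desired category order, listed from the bottom of the axis to the top
order = ["GK", "DF", "MF", "FW"]
y_pos = df["Pos"].map({pos: i for i, pos in enumerate(order)})

fig, ax = plt.subplots()
ax.scatter(df["Age"], y_pos, c="DarkBlue")

# Put one tick per category and label it with the category name
ax.set_yticks(range(len(order)))
ax.set_yticklabels(order)
ax.set_xlabel("Age")
ax.set_ylabel("Pos")

# Annotate each point with the player's name at its numeric position
for name, age, y in zip(df["Player"], df["Age"], y_pos):
    ax.annotate(name, (age, y))
# plt.show()  # uncomment to display the figure
```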
I'm trying to create a matplotlib bar chart with categories on the X-axis, but I can't get the categories right. Here's a minimal example of what I'm trying to do.
data = [[46, 11000], [97, 15000], [27, 24000], [36, 9000], [9, 17000]]
df = pd.DataFrame(data, columns=['car_id', 'price'])
fig1, ax1 = plt.subplots(figsize=(10,5))
ax1.set_title('Car prices')
ax1.bar(df['car_id'], df['price'])
plt.xticks(np.arange(len(df)), list(df['car_id']))
plt.legend()
plt.show()
I need the five categories (car_id) on the X-axis. What Am I doing wrong? :-/
You can turn car_id into category:
df['car_id'] = df['car_id'].astype('category')
df.plot.bar(x='car_id')
Output:
You can also plot just the price column and relabel:
ax = df.plot.bar(y='price')
ax.set_xticklabels(df['car_id'])
You confused the positions and the labels in xticks. Here you specify the positions np.arange(len(df)) and the labels list(df['car_id']), so matplotlib puts the labels at the specified positions np.arange(len(df)), i.e. array([0, 1, 2, 3, 4]), while the bars are drawn at the car_id values.
If the positions and the labels are the same here, just replace plt.xticks(np.arange(len(df)), list(df['car_id'])) by plt.xticks(df['car_id']).
If you want them to be evenly spaced, your approach is right, but you also need to change ax1.bar(df['car_id'], df['price']) to ax1.bar(np.arange(len(df)), df['price']), so that the bar x-positions are evenly spaced.
Full code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = [[46, 11000], [97, 15000], [27, 24000], [36, 9000], [9, 17000]]
df = pd.DataFrame(data, columns=['car_id', 'price'])
fig1, ax1 = plt.subplots(figsize=(10,5))
ax1.set_title('Car prices')
ax1.bar(np.arange(len(df)), df['price'])
ax1.set_xticks(np.arange(len(df)))
ax1.set_xticklabels(df['car_id'])
plt.show()
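As a possible shortcut, matplotlib treats string x-values as categories and spaces them evenly in the order given, so converting car_id to strings avoids setting ticks by hand. This is a sketch of that alternative, not the approach used above; the `labels` and `bars` names are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [[46, 11000], [97, 15000], [27, 24000], [36, 9000], [9, 17000]]
df = pd.DataFrame(data, columns=["car_id", "price"])

fig, ax = plt.subplots(figsize=(10, 5))
ax.set_title("Car prices")

# String x-values are handled as categories: one evenly spaced bar per car_id
labels = df["car_id"].astype(str).tolist()
bars = ax.bar(labels, df["price"])
# plt.show()  # uncomment to display the figure
```

The bars end up centered at positions 0 to 4, each labelled with its car_id.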
I have two pandas data frames having same column names.
Dataframe 1:
Dataframe 2:
Both data frames have the same column names. I need to visualize both dfs in the same scatter plot, where the X-axis holds the values present in the 'function' column, i.e. D1_1_2, D1_2_3, etc.
A single scatter plot is required for all the entries (or labels), e.g. 'D1_1_2', 'D1_2_3', etc., in the 'function' column as the X-axis. The Y-axis can dynamically pick the numeric values.
Use different colors for the values of each data frame.
Add spacing or jitter between overlapping values.
I need help with this.
The example below should give you an idea of how to do what you are looking for:
import pandas as pd
import matplotlib.pyplot as plt
index = ["D1_1-2", "D1_2-3", "D1_3-4"]
df1 = pd.DataFrame({"count": [10, 20, 25]}, index=index)
df2 = pd.DataFrame({"count": [15, 11, 30]}, index=index)
ax = df1.plot(style='ro', legend=False)
df2.plot(style='bo',ax=ax, legend=False)
plt.show()
The key is asking plot of df2 to use the axis from plot of df1.
The plot you get for this is as follows:
Approach with jitter:
If you want to add jitter to your data, one approach is as follows: instead of reusing the previous plot's axis, we concatenate the dataframes and iterate over the columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
index = ["D1_1-2", "D1_2-3", "D1_3-4", "D1_4-5", "D1_5-6", "D1_6-7", "D1_7-8", "D1_8-9", "D1_1-3", "D1_2-3", "D1_3-5", "D1_5-7"]
df1 = pd.DataFrame({"count": [10, 20, 25, 30, 32, 35, 25, 15, 5, 17, 11, 2]}, index=index)
df2 = pd.DataFrame({"count": [15, 11, 30, 30, 20, 30, 25, 27, 5, 16, 11, 5]}, index=index)
#We ensure we use different column names for df1 and df2
df1.columns = ["count1"]
df2.columns = ["count2"]
#We concatenate the dataframes
df = pd.concat([df1, df2],axis=1)
#Function to add jitter to the array
def rand_jitter(arr):
stdev = .01*(max(arr)-min(arr))
return arr + np.random.randn(len(arr)) * stdev
# We iterate between the two columns of the concatenated dataframe
for i,d in enumerate(df):
y = df[d]
arr = range(1,len(y)+1)
x = rand_jitter(arr)
plt.plot(x, y, mfc = ["red","blue"][i], mec='k', ms=7, marker="o", linestyle="None")
# We set the ticks as the index labels and rotate the labels to avoid overlapping
plt.xticks(arr, index, rotation='vertical')
plt.show()
Finally, this results in the following graph:
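If random jitter is not wanted, a deterministic alternative is to shift each dataframe's points by a small fixed offset around the category position, which also makes it easy to add a legend. This is only a sketch: the `offset` value of 0.1 and the "df1"/"df2" legend labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

index = ["D1_1-2", "D1_2-3", "D1_3-4"]
df1 = pd.DataFrame({"count": [10, 20, 25]}, index=index)
df2 = pd.DataFrame({"count": [15, 11, 30]}, index=index)

fig, ax = plt.subplots()
x = np.arange(len(index))
offset = 0.1  # hypothetical horizontal shift, tune to taste

# Shift df1 slightly left and df2 slightly right of each category position
ax.plot(x - offset, df1["count"], "ro", label="df1")
ax.plot(x + offset, df2["count"], "bo", label="df2")

# Label the category positions with the index values
ax.set_xticks(x)
ax.set_xticklabels(index)
ax.legend()
# plt.show()  # uncomment to display the figure
```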