Clustered barchart in matplotlib? - python

How do I plot a bar chart similar to the one in "Clustered bar plot in gnuplot" using Python and matplotlib? My data looks like this:
date|name|empid|app|subapp|hours
20140101|A|0001|IIC|I1|2.5
20140101|A|0001|IIC|I2|3
20140101|A|0001|IIC|I3|4
20140101|A|0001|CAR|C1|2.5
20140101|A|0001|CAR|C2|3
20140101|A|0001|CAR|C3|2
20140101|A|0001|CAR|C4|2
I'm trying to plot the subapp hours grouped by app for the same person. I couldn't find an example in the matplotlib demo pages.
EDIT: None of the examples cited below seem to work when each category has a different number of bars, as above.

The cited examples don't handle an unequal number of bars per category, but another approach works. Here is an example.
Note: I use pandas to manipulate your data. If you don't know it, give it a try (http://pandas.pydata.org/):
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv("data.csv", sep="|")
grouped = df.groupby('app')['hours']
colors = "rgbcmyk"
fig, ax = plt.subplots()
initial_gap = 0.1   # space before the first cluster
start = initial_gap
width = 1.0         # total width of one cluster
gap = 0.05          # space between clusters
for app, group in grouped:
    size = group.shape[0]
    # left edges of the bars, spread evenly across the cluster
    ind = np.linspace(start, start + width, size + 1)[:-1]
    w = width / size
    start = start + width + gap
    # ind holds left edges, so use align='edge' (bars are centered by default in matplotlib >= 2.0)
    ax.bar(ind, group, w, align='edge', color=list(colors[:size]))
# one tick at the center of each cluster; set the positions before the labels
tick_loc = (np.arange(len(grouped)) * (width + gap)) + initial_gap + width / 2
ax.set_xticks(tick_loc)
ax.set_xticklabels([app for app, _ in grouped])
plt.show()
The file data.csv contains:
date|name|empid|app|subapp|hours
20140101|A|0001|IIC|I1|2.5
20140101|A|0001|IIC|I2|3
20140101|A|0001|IIC|I3|4
20140101|A|0001|CAR|C1|2.5
20140101|A|0001|CAR|C2|3
20140101|A|0001|CAR|C3|2
20140101|A|0001|CAR|C4|2
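The position arithmetic from the loop above can also be pulled into a small helper, which makes it easier to check by hand. This is only a sketch: cluster_positions is a hypothetical name (not a matplotlib function), and its defaults simply mirror width, gap and initial_gap from the answer.

```python
import numpy as np

def cluster_positions(sizes, width=1.0, gap=0.05, initial_gap=0.1):
    """Return (left_edges, bar_widths, tick_centers) for clusters that
    contain sizes[i] bars each. Hypothetical helper, not part of matplotlib."""
    left_edges, bar_widths, tick_centers = [], [], []
    start = initial_gap
    for size in sizes:
        w = width / size                           # every cluster spans the same total width
        left_edges.append(start + w * np.arange(size))
        bar_widths.append(w)
        tick_centers.append(start + width / 2)     # tick sits in the middle of the cluster
        start += width + gap
    return left_edges, bar_widths, tick_centers

# In the sample data CAR has 4 bars and IIC has 3 (groupby sorts the keys).
edges, widths, ticks = cluster_positions([4, 3])
```

Each edges[i] could then be passed to ax.bar(..., align='edge') together with widths[i], and ticks to ax.set_xticks.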

Related

How to plot Multiline Graphs Via Seaborn library in Python?

I have written code that looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
exp1= sns.lineplot(data=df1)
plt.savefig('exp1.png')
exp1_smooth= sns.lmplot(x='Size', y='Time', data=df, ci=None, order=4, truncate=False)
plt.savefig('exp1_smooth.png')
That gives me Graph_1:
Size (which should be the x-axis) shows up as a constant line, but as you can see in my code it varies (10, 100, 1000).
How does this produce a constant line? I want a multi-line graph with x-axis = Size (T) and y-axis = Encrypt_Time and Decrypt_Time (power1 and power2).
I also wanted to plot a smoothed version of the same graph, but that gives me an error. What needs to be done to get a smooth multi-line graph with the same axes?
I don't think that is the problem: the line for Size looks constant, but it is not.
The Size values lie in the range 10-1000, while the smallest division of the y-axis is 20,000 (20 times bigger), so the line looks horizontal on your graph.
You can try larger values to see the slope clearly.
If you want Size as the x-axis, you can try the example below:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
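fig, ax = plt.subplots()
# draw both series on the same axes, with Size on the x-axis
sns.lineplot(data=df1, x='Size', y='Encrypt_Time', label='Encrypt_Time', ax=ax)
sns.lineplot(data=df1, x='Size', y='Decrypt_Time', label='Decrypt_Time', ax=ax)
ax.set_ylabel('Time')
plt.show()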

How to save an interactive Matplotlib figure

How can I save an interactive Matplotlib figure to Excel, or just as a figure? I want to share the figures with non-technical users. I could do that with Plotly and share a link, but because of data privacy concerns they don't prefer that method. Is there another way to do this?
I have interactive images like the one below. Some points are stacked together, so users need to zoom, which is why I need to save the images in an interactive mode. Then non-technical users can zoom in and check the results.
import pandas as pd
import matplotlib
import mplcursors
data = [['Alex',10000,15000,6705],['Bob',12000,11050,7050],['Clarke',10300,10450,9050],['Alan',10100,15500,6205],['Sam',10300,15050,6785]]
df = pd.DataFrame(data,columns=['Name','Year2019','Year2020','Year2021'])
dfLen = len(df)
x =df['Year2020']
if dfLen > 0:
    %matplotlib notebook
    import matplotlib.pyplot as plt
    fig , ax = plt.subplots(figsize=(8,6))
    a = df['Year2019']/100
    # Create a scatter plot (Use s - make the size of each vendors usage)
    plt.scatter('Year2019', 'Year2020',s= x , c='Year2019', data=df)
    plt.title("2019 Vs 2020 sold items",fontsize=12)
    plt.xlabel('2019', fontsize=10)
    plt.ylabel('2020',fontsize=10)
    list1 = list(df['Name'])
    i = 0
    for row in df.itertuples():
        h = list1[i]
        i=i+1
        h = str(h)
        c = row.Year2019
        d = row.Year2020
        ax.text(c,d,s = h, size = 8)
    crs = mplcursors.cursor(ax,hover=True)
    crs.connect("add", lambda sel: sel.annotation.set_text(
        'Point x - {} ,\n'
        'y- {}'.format(sel.target[0], sel.target[1])))
    plt.show()

Annotating scatterplot points with DF column text Matplotlib

I'm fairly new to Python and I'm currently struggling with annotating plots.
I've come from R, so I'm used to annotating scatterplot points with minimal code.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
url = ('https://fbref.com/en/share/nXtrf')
df = pd.read_html(url)[0]
df = df[['Unnamed: 1_level_0', 'Unnamed: 2_level_0', 'Play', 'Perf']].copy()
df.columns = df.columns.droplevel()
df = df[['Player','Squad','Min','SoTA','Saves']]
df = df.drop([25])
df['Min'] = pd.to_numeric(df['Min'])
df['SoTA'] = pd.to_numeric(df['SoTA'])
df['Saves'] = pd.to_numeric(df['Saves'])
df['Min'] = df[df['Min'] > 1600]['Min']
df = df.dropna()
df.plot(x = 'Saves', y = 'SoTA', kind = "scatter")
I've tried numerous ways to annotate this plot. I'd like each point to be annotated with the corresponding value from the 'Player' column.
I've tried a label_point function that I found while looking for a workaround, but I keep getting KeyError: 0 with most approaches.
Any assistance would be great. Thanks.
You could loop through the columns and add a text for each entry. Note that you need to save the ax returned by df.plot(...).
ax = df.plot(x='Saves', y='SoTA', kind="scatter")
for x, y, player in zip(df['Saves'], df['SoTA'], df['Player']):
    ax.text(x, y, f'{player}', ha='left', va='bottom')
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmin, xmax + 0.15 * (xmax - xmin)) # some more margin to fit the texts
An alternative is to use the mplcursors library to show an annotation while hovering (or after a click):
import mplcursors
mplcursors.cursor(hover=True)
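If the labels sit too close to the markers, ax.annotate with an offset given in points is a common alternative to ax.text. A minimal sketch follows; the table is made-up stand-in data (not the fbref table), and only the column names match the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data; only the column names come from the question.
df = pd.DataFrame({
    'Player': ['Keeper A', 'Keeper B', 'Keeper C'],
    'SoTA': [120, 95, 140],
    'Saves': [85, 70, 98],
})

ax = df.plot(x='Saves', y='SoTA', kind='scatter')
for x, y, player in zip(df['Saves'], df['SoTA'], df['Player']):
    # xytext is an offset of 4 points right and 4 points up from the marker
    ax.annotate(player, xy=(x, y), xytext=(4, 4), textcoords='offset points')
```

Iterating with zip over the columns avoids looking rows up by index label, so it keeps working after rows have been dropped.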

Python Seaborn Chart - Shadow Area

Sorry for the noob question, but how can I add a shaded area/color between the upper and lower lines in a seaborn chart?
The code I've been working on is the following:
plt.figure(figsize=(18,10))
sns.set(style="darkgrid")
palette = sns.color_palette("mako_r", 3)
sns.lineplot(x="Date", y="Value", hue='Std_Type', style='Value_Type', sizes=(.25, 2.5), palette = palette, data=tbl4)
The idea is to get an effect like the one below (the example from the seaborn website):
But I could not replicate the effect, although my data is structured in much the same way as fmri (the seaborn example).
From the seaborn link:
import seaborn as sns
sns.set(style="darkgrid")
# Load an example dataset with long-form data
fmri = sns.load_dataset("fmri")
# Plot the responses for different events and regions
sns.lineplot(x="timepoint", y="signal",
             hue="region", style="event",
             data=fmri)
Do you have any ideas?
I tried changing the chart style, but if I switch to distplot or relplot, for example, the x-axis cannot show the timeframe...
Check this code:
# import
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style = 'darkgrid')
# data generation
# 'MS' = month start; the 'M' alias is deprecated in recent pandas versions
time = pd.date_range(start = '2006-01-01', end = '2020-01-01', freq = 'MS')
tbl4 = pd.DataFrame({'Date': time,
                     'down': 1 - 0.5*np.random.randn(len(time)),
                     'up': 4 + 0.5*np.random.randn(len(time))})
tbl4 = tbl4.melt(id_vars = 'Date',
                 value_vars = ['down', 'up'],
                 var_name = 'Std_Type',
                 value_name = 'Value')
# figure plot
fig, ax = plt.subplots(figsize=(18,10))
sns.lineplot(ax = ax,
             x = 'Date',
             y = 'Value',
             hue = 'Std_Type',
             data = tbl4)
# fill area
plt.fill_between(x = tbl4[tbl4['Std_Type'] == 'down']['Date'],
                 y1 = tbl4[tbl4['Std_Type'] == 'down']['Value'],
                 y2 = tbl4[tbl4['Std_Type'] == 'up']['Value'],
                 alpha = 0.3,
                 facecolor = 'green')
plt.show()
which gives me this plot:
Since I do not have access to your data, I generated random values. Replace them with yours.
The shaded area is drawn with plt.fill_between, where you specify the x array (common to both curves), the lower and upper limits of the area as y1 and y2 and, optionally, a color and its transparency with the facecolor and alpha parameters respectively.
You cannot do it through the ci parameter, since that is used to show the confidence interval of your data. The band in the fmri example appears because that dataset has several observations per x value, which seaborn aggregates.
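The fill above relies on the 'down' and 'up' rows being in the same Date order. A sketch that makes this alignment explicit, assuming the same long-form columns (Date, Std_Type, Value) and using plain matplotlib for the lines, pivots to wide form first so that both curves share one index. The random data is again only a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder long-form data with the same column names as tbl4.
rng = np.random.default_rng(0)
dates = pd.date_range("2006-01-01", periods=24, freq="MS")
long_df = pd.DataFrame({
    "Date": np.tile(dates, 2),
    "Std_Type": np.repeat(["down", "up"], len(dates)),
    "Value": np.concatenate([1 - 0.5 * rng.standard_normal(len(dates)),
                             4 + 0.5 * rng.standard_normal(len(dates))]),
})

# Wide form: one row per Date, one column per Std_Type, so the two
# curves are aligned on the same x values by construction.
wide = long_df.pivot(index="Date", columns="Std_Type", values="Value")

fig, ax = plt.subplots()
ax.plot(wide.index, wide["down"], label="down")
ax.plot(wide.index, wide["up"], label="up")
ax.fill_between(wide.index, wide["down"], wide["up"], alpha=0.3, facecolor="green")
ax.legend()
```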

How to add axis offset in matplotlib plot?

I'm drawing several point plots in seaborn on the same graph. The x-axis is ordinal, not numerical; the ordinal values are the same for each point plot. I would like to shift each plot a bit to the side, the way the dodge=... parameter of pointplot does for multiple lines within a single plot, but in this case for multiple different plots drawn on top of each other. How can I do that?
Ideally, I'd like a technique that works for any matplotlib plot, not just seaborn specifically. Adding an offset to the data won't work easily, since the data is not numerical.
Here is an example in which the plots overlap and become hard to read (dodge within each plot works okay):
import pandas as pd
import seaborn as sns
df1 = pd.DataFrame({'x':list('ffffssss'), 'y':[1,2,3,4,5,6,7,8], 'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
sns.pointplot(data=df1, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='<')
sns.pointplot(data=df2, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='>')
I could use something other than seaborn, but the automatic confidence / error bars are very convenient so I'd prefer to stick with seaborn here.
Answering this for the most general case first.
A dodge can be implemented by shifting the artists in the figure by some amount. It might be useful to use points as units of that shift. E.g. you may want to shift your markers on the plot by 5 points.
This shift can be accomplished by adding a translation to the data transform of the artist. Here I propose a ScaledTranslation.
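To see what such a translation does, here is a small sketch that compares where a data point lands in display (pixel) coordinates with and without the offset. The dpi of 100 and the 9-point dodge are arbitrary values chosen for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib import transforms

fig, ax = plt.subplots(dpi=100)

dodge_points = 9  # arbitrary shift, in points (1 point = 1/72 inch)
# dpi_scale_trans converts inches to pixels, so the offset is given in inches
offset = transforms.ScaledTranslation(dodge_points / 72., 0, fig.dpi_scale_trans)
shifted = ax.transData + offset

base_px = ax.transData.transform((0.5, 0.5))
shifted_px = shifted.transform((0.5, 0.5))
shift_px = shifted_px - base_px  # horizontal shift is dodge_points / 72 * dpi pixels
```

Because the offset is applied after the data transform, the shift stays the same physical size no matter how the axes are scaled or zoomed.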
Now to keep this most general, one may write a function which takes the plotting method, the axes and the data as input, and in addition some dodge to apply, e.g.
draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
The full functional code:
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
import pandas as pd
def draw_dodge(*args, **kwargs):
    func = args[0]
    dodge = kwargs.pop("dodge", 0)
    ax = kwargs.pop("ax", plt.gca())
    # shift by `dodge` points on top of the usual data transform
    trans = ax.transData + transforms.ScaledTranslation(dodge/72., 0,
                                                        ax.figure.dpi_scale_trans)
    artist = func(*args[1:], **kwargs)
    # the plotting call may return nested containers of artists
    def iterate(artist):
        if hasattr(artist, '__iter__'):
            for obj in artist:
                iterate(obj)
        else:
            artist.set_transform(trans)
    iterate(artist)
    return artist
X = ["a", "b"]
Y = np.array([[1,2],[2,2],[3,2],[1,4]])
Dodge = np.arange(len(Y),dtype=float)*10
Dodge -= Dodge.mean()
fig, ax = plt.subplots()
for y,d in zip(Y,Dodge):
    draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
ax.margins(x=0.4)
plt.show()
You may use this with ax.plot, ax.scatter, etc., but not with any of the seaborn functions, because they don't return any useful artist to work with.
Now for the case in question, the remaining problem is to get the data in a useful format. One option would be the following.
df1 = pd.DataFrame({'x':list('ffffssss'),
                    'y':[1,2,3,4,5,6,7,8],
                    'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
# one dodge position per plotted series: hue groups times data frames
N = len(np.unique(df1["h"].values))*len([df1,df2])
Dodge = np.linspace(-N,N,N)/N*10
fig, ax = plt.subplots()
k = 0
for df in [df1,df2]:
    for (n, grp) in df.groupby("h"):
        # select "y" so the non-numeric "h" column is not aggregated
        x = grp.groupby("x")["y"].mean()
        std = grp.groupby("x")["y"].std()
        draw_dodge(ax.errorbar, x.index, x.values,
                   yerr=std.values, ax=ax,
                   dodge=Dodge[k], marker="o", label=n)
        k+=1
ax.legend()
ax.margins(x=0.4)
plt.show()
You can use linspace to easily shift your graphs to where you want them to start and end. It also makes it easy to scale the graphs so that they span the same visual width:
import numpy as np
import matplotlib.pyplot as plt
start_offset = 3
end_offset = start_offset
y1 = np.random.randint(0, 10, 20)  # 20 random ints from 0 to 9 (the upper bound is exclusive)
y2 = np.random.randint(0, 10, 10)  # 10 random ints from 0 to 9
x1 = np.linspace(0, 20, y1.size)   # y1.size evenly spaced x values from 0 to 20
x2 = np.linspace(0, 20, y2.size)   # same x range with fewer points, so both curves span the same width
plt.plot(x1, y1)
plt.plot(x2, y2)
plt.show()
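The variables start_offset and end_offset are defined above but never used. One way they might be applied, purely as an assumption about the intent, is to pass them as the linspace endpoints so that the second curve starts later and ends earlier than the first. The name x_max is introduced here only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
start_offset = 3
end_offset = start_offset
x_max = 20  # illustrative name for the right end of the x range

y1 = rng.integers(0, 10, 20)  # 20 random ints from 0 to 9
y2 = rng.integers(0, 10, 10)  # 10 random ints from 0 to 9

x1 = np.linspace(0, x_max, y1.size)                          # full range
x2 = np.linspace(start_offset, x_max - end_offset, y2.size)  # shifted in from both ends
```

Plotting (x1, y1) and (x2, y2) with plt.plot would then draw the second curve between x = 3 and x = 17.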
