Python bar plot with irregular spacing - python

I am using a bar chart to plot query frequencies, but I consistently see uneven spacing between the bars. The gaps look like they should be related to the ticks, but they are in different positions.
This shows up in larger plots
And smaller ones
import matplotlib.pyplot as plt

def TestPlotByFrequency(df, f_field, freq, description):
    # Plot the first `freq` values of column `f_field` as bars
    top = df[f_field][0:freq]
    fig, ax = plt.subplots()
    ax.bar(top.index, top.values)
    plt.show()
This is not related to the data either: none of the top entries share the same frequency count.
count
0 8266
1 6603
2 5829
3 4559
4 4295
5 4244
6 3889
7 3827
8 3769
9 3673
10 3606
11 3479
12 3086
13 2995
14 2945
15 2880
16 2847
17 2825
18 2719
19 2631
20 2620
21 2612
22 2590
23 2583
24 2569
25 2503
26 2430
27 2287
28 2280
29 2234
30 2138
Is there any way to make these consistent?

The problem has to do with aliasing, as the bars are too thin to really be separated. Depending on the subpixel value where a bar starts, the white space will be visible or not. The dpi of the plot can be set either for the displayed figure or when saving the image. However, if you have too many bars, increasing the dpi will only help a little.
As suggested in this post, you can also save the image as svg to get a vector format. Depending on where you want to use it, it can be rendered perfectly.
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
matplotlib.rcParams['figure.dpi'] = 300
t = np.linspace(0.0, 2.0, 50)
s = 1 + np.sin(2 * np.pi * t)
df = pd.DataFrame({'time': t, 'voltage': s})
fig, ax = plt.subplots()
ax.bar(df['time'], df['voltage'], width = t[1]*.95)
plt.savefig("test.png", dpi=300)
plt.show()
Image with 100 dpi:
Image with 300 dpi:

Related

Is it possible to plot a barchart with upper and lower limits of the bins with Pandas,seaborn or Matplotlib

I would like to know how I can plot a bar chart with the upper and lower limits of the bins given by the values in the age_classes column of the dataframe shown below, using pandas, seaborn or matplotlib. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a chart like this:
then you can make it with sns.barplot, setting age_classes as x and one of the columns (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax = ax,
data = df,
x = 'age_classes',
y = 'total_cases')
plt.show()
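If the numeric limits of each bin are needed (for example to position bars by age rather than by category), the labels can be split into lower and upper bounds. This is only a sketch: parse_age_class is a hypothetical helper, and treating ">90" as an open-ended class starting at 90 is an assumption about what the label means.

```python
def parse_age_class(label):
    """Return (lower, upper) for labels such as '0-9'.

    The upper limit is None for open-ended classes such as '>90'
    (assumed to mean "90 and above"). Labels without numeric limits,
    such as 'Unknown', return None.
    """
    label = label.strip()
    if label.startswith(">"):
        return (int(label[1:]), None)
    if "-" in label:
        lower, upper = label.split("-", 1)
        return (int(lower), int(upper))
    return None


for label in ["0-9", "10-19", ">90", "Unknown"]:
    print(label, parse_age_class(label))
```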

Making Categorical or Grouped Bar Graph with secondary Axis Line Graph

I need to compare different sets of daily data between 4 shifts (categorical / groupby), using bar graphs and line graphs. I have looked everywhere and have not found a working solution that doesn't involve generating new pivots and such.
I've used both matplotlib and seaborn, and while I can do one or the other (different colored bars/lines for each shift), once I incorporate the other, either one disappears or other anomalies happen, such as only one plot point showing. There are solutions for representing a single series of data on both chart types, but none that cover multiple categories or groups for both.
Data Example:
report_date wh_id shift Head_Count UTL_R
3/17/19 55 A 72 25%
3/18/19 55 A 71 10%
3/19/19 55 A 76 20%
3/20/19 55 A 59 33%
3/21/19 55 A 65 10%
3/22/19 55 A 54 20%
3/23/19 55 A 66 14%
3/17/19 55 1 11 10%
3/17/19 55 2 27 13%
3/17/19 55 3 18 25%
3/18/19 55 1 23 100%
3/18/19 55 2 16 25%
3/18/19 55 3 12 50%
3/19/19 55 1 28 10%
3/19/19 55 2 23 50%
3/19/19 55 3 14 33%
3/20/19 55 1 29 25%
3/20/19 55 2 29 25%
3/20/19 55 3 10 50%
3/21/19 55 1 17 20%
3/21/19 55 2 29 14%
3/21/19 55 3 30 17%
3/22/19 55 1 12 14%
3/22/19 55 2 10 100%
3/22/19 55 3 17 14%
3/23/19 55 1 16 10%
3/23/19 55 2 11 100%
3/23/19 55 3 13 10%
tm_daily_df = pd.read_csv('fg_TM_Daily.csv')
tm_daily_df = tm_daily_df.set_index('report_date')
fig2, ax2 = plt.subplots(figsize=(12,8))
ax3 = ax2.twinx()
group_obj = tm_daily_df.groupby('shift')
g = group_obj['Head_Count'].plot(kind='bar', x='report_date', y='Head_Count',ax=ax2,stacked=False,alpha = .2)
g = group_obj['UTL_R'].plot(kind='line',x='report_date', y='UTL_R', ax=ax3,marker='d', markersize=12)
plt.legend(tm_daily_df['shift'].unique())
This code has gotten me the closest I've been able to get. Notice that even with stacked = False, they are still stacked. I changed the setting to True, and nothing changes.
All I need is for the bars to be next to each other, with the same color scheme representing each shift.
The graph:
Here are two solutions (stacked and unstacked). Based on your question, we will:
- plot Head_Count on the left y axis and UTL_R on the right y axis
- use report_date as the x axis
- use shift as the hue of the graph
The stacked version uses pandas default plotting feature, and the unstacked version uses seaborn.
EDIT
From your request, I added a 100% stacked graph. While it is not quite exactly what you asked in the comment, the graph type you asked may create some confusion when reading (are the values based on the upper line of the stack or the width of the stack). An alternative solution may be using a 100% stacked graph.
Stacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
dfg['Head_Count'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.6)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
Stacked 100%
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
# Create `Head_Count_Pct` column
for date in dfg.index.get_level_values('report_date').unique():
    for shift in dfg.loc[date, :].index.get_level_values('shift').unique():
        dfg.loc[(date, shift), 'Head_Count_Pct'] = dfg.loc[(date, shift), 'Head_Count'].sum() / dfg.loc[(date, 'A'), 'Head_Count'].sum()
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
pal = sns.color_palette("Set1")
dfg[dfg.index.get_level_values('shift').isin(['1','2','3'])]['Head_Count_Pct'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.5, color=pal)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None, color=pal)
ax.set_title('My Graph')
plt.show()
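The Head_Count_Pct computation in the loop above divides each shift's head count by the head count of shift A on the same date. A minimal pandas-free sketch of that arithmetic, using the first two dates of the sample data, might look like this (head_count_share is a hypothetical name, and using shift A as the daily total follows the answer's code rather than anything stated in the question):

```python
# (report_date, shift, Head_Count) rows taken from the sample data
rows = [
    ("3/17/19", "A", 72), ("3/17/19", "1", 11), ("3/17/19", "2", 27), ("3/17/19", "3", 18),
    ("3/18/19", "A", 71), ("3/18/19", "1", 23), ("3/18/19", "2", 16), ("3/18/19", "3", 12),
]


def head_count_share(rows, total_shift="A"):
    """Share of each shift's head count relative to `total_shift` on the same date."""
    totals = {date: count for date, shift, count in rows if shift == total_shift}
    return {
        (date, shift): count / totals[date]
        for date, shift, count in rows
        if shift != total_shift
    }


shares = head_count_share(rows)
print(shares[("3/17/19", "2")])  # 27 / 72
```

Note that shifts 1 to 3 do not add up to shift A in the sample data, so the stacked bars will not necessarily reach 100%.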
Unstacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(15,6))
ax2 = ax.twinx()
sns.barplot(x=dfg.index.get_level_values('report_date'),
y=dfg.Head_Count,
hue=dfg.index.get_level_values('shift'), ax=ax, alpha=0.7)
sns.lineplot(x=dfg.index.get_level_values('report_date'),
y=dfg.UTL_R,
hue=dfg.index.get_level_values('shift'), ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
EDIT #2
Here is the graph from your second request (stacked, but stack n+1 does not start where stack n ends).
It is slightly more involved, as we have to do several things:
- we need to manually assign a color to each shift in our df
- once the colors are assigned, we iterate through each date and 1) sort the Head_Count values in descending order (so that the largest stack is at the back when we plot the graph), and 2) plot the data and assign the color to each stack
- then we can create our second y axis and plot our UTL_R values
- then we need to assign the correct color to our legend labels
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def assignColor(shift):
    # Matplotlib's single-letter color codes are lowercase
    if shift == 'A':
        return 'r'
    if shift == '1':
        return 'b'
    if shift == '2':
        return 'g'
    if shift == '3':
        return 'y'
# map a color to a shift
df['color'] = df['shift'].apply(assignColor)
fig, ax = plt.subplots(figsize=(15,6))
# plot our Head_Count values
for date in df.report_date.unique():
    d = df[df.report_date == date].sort_values(by='Head_Count', ascending=False)
    y = d.Head_Count.values
    x = date
    color = d.color
    b = plt.bar(x,y, color=color)
# Plot our UTL_R values
ax2 = ax.twinx()
sns.lineplot(x=df.report_date, y=df.UTL_R, hue=df['shift'], marker='o', legend=None)
# Assign the color label color to our legend
leg = ax.legend(labels=df['shift'].unique(), loc=1)
legend_maping = dict()
for shift in df['shift'].unique():
    legend_maping[shift] = df[df['shift'] == shift].color.unique()[0]
i = 0
for leg_lab in leg.texts:
    # Note: Matplotlib 3.7 renamed legendHandles to legend_handles
    leg.legendHandles[i].set_color(legend_maping[leg_lab.get_text()])
    i += 1
How about this?
tm_daily_df['UTL_R'] = tm_daily_df['UTL_R'].str.replace('%', '').astype('float') / 100
pivoted = tm_daily_df.pivot_table(values=['Head_Count', 'UTL_R'],
index='report_date',
columns='shift')
pivoted
# Head_Count UTL_R
# shift 1 2 3 A 1 2 3 A
# report_date
# 3/17/19 11 27 18 72 0.10 0.13 0.25 0.25
# 3/18/19 23 16 12 71 1.00 0.25 0.50 0.10
# 3/19/19 28 23 14 76 0.10 0.50 0.33 0.20
# 3/20/19 29 29 10 59 0.25 0.25 0.50 0.33
# 3/21/19 17 29 30 65 0.20 0.14 0.17 0.10
# 3/22/19 12 10 17 54 0.14 1.00 0.14 0.20
# 3/23/19 16 11 13 66 0.10 1.00 0.10 0.14
fig, ax = plt.subplots()
pivoted['Head_Count'].plot.bar(ax=ax)
pivoted['UTL_R'].plot.line(ax=ax, legend=False, secondary_y=True, marker='D')
ax.legend(loc='upper left', title='shift')
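For readers who want to see what the two preparation steps above do, here is a small sketch in plain Python. The helper names are hypothetical, and unlike pivot_table (which averages duplicates) this version assumes exactly one row per report_date and shift.

```python
def percent_to_fraction(text):
    """Convert a string such as '25%' to the float 0.25."""
    return float(text.replace("%", "")) / 100


def pivot(rows):
    """Turn (report_date, shift, value) rows into {report_date: {shift: value}}."""
    table = {}
    for date, shift, value in rows:
        table.setdefault(date, {})[shift] = value
    return table


# UTL_R values for 3/17/19, taken from the sample data
rows = [
    ("3/17/19", "A", "25%"), ("3/17/19", "1", "10%"),
    ("3/17/19", "2", "13%"), ("3/17/19", "3", "25%"),
]
utl_r = pivot([(d, s, percent_to_fraction(v)) for d, s, v in rows])
print(utl_r["3/17/19"])
```

Each inner dictionary corresponds to one row of the pivoted frame, with one entry per shift column.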

Add hue category labels in seaborn stripplot

I have two DataFrames that I am plotting as a stripplot. I am able to plot them pretty much as I wish, but I would like to know if it is possible to add the category labels for the "hue".
The plot currently looks like this:
However, I would like to add the labels of the categories (there are only two of them) to each "column" for each letter. So that it looks something like this:
The DataFrames look like this (although these are just edited snippets):
Case Letter Size Weight
0 upper A 20 bold
1 upper A 23 bold
2 lower A 61 bold
3 lower A 62 bold
4 upper A 78 bold
5 upper A 95 bold
6 upper B 23 bold
7 upper B 40 bold
8 lower B 47 bold
9 upper B 59 bold
10 upper B 61 bold
11 upper B 99 bold
12 lower C 23 bold
13 upper D 23 bold
14 upper D 66 bold
15 lower D 99 bold
16 upper E 5 bold
17 upper E 20 bold
18 upper E 21 bold
19 upper E 22 bold
...and...
Case Letter Size Weight
0 upper A 4 normal
1 upper A 6 normal
2 upper A 7 normal
3 upper A 8 normal
4 upper A 9 normal
5 upper A 12 normal
6 upper A 25 normal
7 upper A 26 normal
8 upper A 38 normal
9 upper A 42 normal
10 lower A 43 normal
11 lower A 57 normal
12 lower A 90 normal
13 upper B 4 normal
14 lower B 6 normal
15 upper B 8 normal
16 upper B 9 normal
17 upper B 12 normal
18 upper B 21 normal
19 lower B 25 normal
The relevant code I have is:
fig, ax = plt.subplots(figsize=(10, 7.5))
plt.tight_layout()
sns.stripplot(x=new_df_normal['Letter'], y=new_df_normal['Size'],
hue=new_df_normal['Case'], jitter=False, dodge=True,
size=8, ax=ax, marker='D',
palette={'upper': 'red', 'lower': 'red'})
plt.setp(ax.get_legend().get_texts(), fontsize='16') # for legend text
plt.setp(ax.get_legend().get_title(), fontsize='18') # for legend title
ax.set_xlabel("Letter", fontsize=20)
ax.set_ylabel("Size", fontsize=20)
ax.set_ylim(0, 105)
ax.tick_params(labelsize=20)
ax2 = ax.twinx()
sns.stripplot(x=new_df_bold['Letter'], y=new_df_bold['Size'],
hue=new_df_bold['Case'], jitter=False, dodge=True,
size=8, ax=ax2, marker='D',
palette={'upper': 'green', 'lower': 'green'})
ax.legend_.remove()
ax2.legend_.remove()
ax2.set_xlabel("", fontsize=20)
ax2.set_ylabel("", fontsize=20)
ax2.set_ylim(0, 105)
ax2.tick_params(labelsize=20)
Is it possible to add those category labels ("bold" and "normal") for each column?
With seaborn's scatterplot you could use the style (or even size) parameter, but you might not end up with your intended layout (see the scatterplot documentation).
Or you could use catplot and play with rows and columns (see the seaborn documentation for catplot).
Unfortunately, seaborn does not natively provide what you are looking for: another level of nesting beyond the hue parameter in stripplot (see the stripplot documentation). Some open seaborn tickets might be related, e.g. this ticket. But I've come across similar feature requests in seaborn that were refused, see this ticket.
One last possibility is to dive into the matplotlib primitives to manipulate your seaborn diagram (since seaborn is built on top of matplotlib). Needless to say, it would require a lot of effort, and might end up defeating the purpose of using seaborn in the first place ;)
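As a starting point for the matplotlib route, the x positions of the dodged columns can be estimated and then passed to ax.text. This is a sketch under assumptions rather than a documented seaborn API: it assumes categories sit at integer positions 0, 1, 2, ..., that the dodged group spans a total width of 0.8, and that the hue levels are listed in the order seaborn draws them. Both function names are hypothetical.

```python
def dodge_offsets(n_hue_levels, group_width=0.8):
    """Centered x offsets of each hue level within one category.

    group_width=0.8 is an assumption about seaborn's default dodge width.
    """
    each = group_width / n_hue_levels
    return [-group_width / 2 + (i + 0.5) * each for i in range(n_hue_levels)]


def label_positions(categories, hue_levels, group_width=0.8):
    """Return (x, label) pairs, one per dodged column."""
    offsets = dodge_offsets(len(hue_levels), group_width)
    return [
        (index + offset, hue)
        for index, _ in enumerate(categories)
        for offset, hue in zip(offsets, hue_levels)
    ]


positions = label_positions(["A", "B", "C", "D", "E"], ["upper", "lower"])
# With a seaborn/matplotlib Axes `ax` these could be drawn with, for example:
# for x, text in positions:
#     ax.text(x, 102, text, ha="center", rotation=90)
```

If the labels land off-center, the assumed width or hue order probably does not match the plot and should be adjusted.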
Setting dodge=True enables this:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.violinplot(x="day", y="total_bill", hue="smoker",
data=tips, palette="muted")
sns.stripplot(x="day", y="total_bill", hue="smoker",
data=tips, palette="muted", dodge=True)
EDIT:
And with the df provided by the OP:
import pandas as pd
df = pd.read_csv('./ongenz.tsv', sep='\t')
sns.stripplot(x=df['Letter'], y=df['Size'], data=df, hue=df['Case'], dodge=True)

Interpolate Temperature Data On Urban Area Using Cartopy

I'm trying to interpolate temperature data observed over an urban area formed by 5 locations. I am using cartopy to interpolate and draw the map. However, when I run the script the temperature interpolation is not shown, and I only get the layer of the urban area with the color palette. Can someone help me fix this error? The shapefile components are available at:
https://www.dropbox.com/s/0u76k3yegvr09sx/LimiteAMG.shp?dl=0
https://www.dropbox.com/s/yxsmm3v2ey3ngsp/LimiteAMG.cpg?dl=0
https://www.dropbox.com/s/yx05n31dfkggbb6/LimiteAMG.dbf?dl=0
https://www.dropbox.com/s/a6nk0xczgjeen2d/LimiteAMG.prj?dl=0
https://www.dropbox.com/s/royw7s51n2f0a6x/LimiteAMG.qpj?dl=0
https://www.dropbox.com/s/7k44dcl1k5891qc/LimiteAMG.shx?dl=0
Data
Lat Lon tmax
0 20.8208 -103.4434 22.8
1 20.7019 -103.4728 17.7
2 20.6833 -103.3500 24.9
3 20.6280 -103.4261 NaN
4 20.7205 -103.3172 26.4
5 20.7355 -103.3782 25.7
6 20.6593 -103.4136 NaN
7 20.6740 -103.3842 25.8
8 20.7585 -103.3904 NaN
9 20.6230 -103.4265 NaN
10 20.6209 -103.5004 NaN
11 20.6758 -103.6439 24.5
12 20.7084 -103.3901 24.0
13 20.6353 -103.3994 23.0
14 20.5994 -103.4133 25.0
15 20.6302 -103.3422 NaN
16 20.7400 -103.3122 23.0
17 20.6061 -103.3475 NaN
18 20.6400 -103.2900 23.0
19 20.7248 -103.5305 24.0
20 20.6238 -103.2401 NaN
21 20.4753 -103.4451 NaN
Code:
import cartopy
import cartopy.crs as ccrs
from matplotlib.colors import BoundaryNorm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import cartopy.io.shapereader as shpreader
from metpy.calc import get_wind_components
from metpy.cbook import get_test_data
from metpy.gridding.gridding_functions import interpolate, remove_nan_observations
from metpy.plots import add_metpy_logo
from metpy.units import units
to_proj = ccrs.PlateCarree()
data=pd.read_csv('/home/borisvladimir/Documentos/Datos/EMAs/EstacionesZMG/RedZMG.csv',usecols=(1,2,3),names=['Lat','Lon','tmax'],na_values=-99999,header=0)
fname='/home/borisvladimir/Dropbox/Diversos/Shapes/LimiteAMG.shp'
adm1_shapes = list(shpreader.Reader(fname).geometries())
lon = data['Lon'].values
lat = data['Lat'].values
xp, yp, _ = to_proj.transform_points(ccrs.Geodetic(), lon, lat).T
x_masked, y_masked, t = remove_nan_observations(xp, yp, data['tmax'].values)
#Interpola temp usando Cressman
tempx, tempy, temp = interpolate(x_masked, y_masked, t, interp_type='cressman', minimum_neighbors=3, search_radius=400000, hres=35000)
temp = np.ma.masked_where(np.isnan(temp), temp)
levels = list(range(-20, 20, 1))
cmap = plt.get_cmap('viridis')
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig = plt.figure(figsize=(15, 10))
view = fig.add_subplot(1, 1, 1, projection=to_proj)
view.add_geometries(adm1_shapes, ccrs.PlateCarree(),edgecolor='black', facecolor='white', alpha=0.5)
view.set_extent([-103.8, -103, 20.3, 21.099 ], ccrs.PlateCarree())
ZapLon,ZapLat=-103.50,20.80
GuadLon,GuadLat=-103.33,20.68
TonaLon,TonaLat=-103.21,20.62
TlaqLon,TlaqLat=-103.34,20.59
TlajoLon,TlajoLat=-103.44,20.47
plt.text(ZapLon,ZapLat,'Zapopan',transform=ccrs.Geodetic())
plt.text(GuadLon,GuadLat,'Guadalajara',transform=ccrs.Geodetic())
plt.text(TonaLon,TonaLat,'Tonala',transform=ccrs.Geodetic())
plt.text(TlaqLon,TlaqLat,'Tlaquepaque',transform=ccrs.Geodetic())
plt.text(TlajoLon,TlajoLat,'Tlajomulco',transform=ccrs.Geodetic())
mmb = view.pcolormesh(tempx, tempy, temp,transform=ccrs.PlateCarree(),cmap=cmap, norm=norm)
plt.colorbar(mmb, shrink=.4, pad=0.02, boundaries=levels)
plt.show()
The problem is in the call to MetPy's interpolate function. With hres=35000, it generates a grid spaced at 35km. However, your data points are spaced much more closely than that; as a result, the generated grid has only two points, shown as the red points below (black points are the original stations with non-masked data):
Both grid points fall outside the bounds of your data points, so they end up masked. If instead we set hres to something much lower, say 5km (i.e. 5000), then a much more sensible result comes out:
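The effect of hres on the grid size can be illustrated with simple arithmetic. The sketch below is not MetPy's actual grid generation (which may count points differently); grid_nodes is a hypothetical helper, and the 37 km by 25 km extent is an assumed, approximate spread of the stations that have data.

```python
def grid_nodes(extent_m, hres_m):
    """Nodes that fit along one axis when spaced hres_m apart, starting at one edge."""
    return int(extent_m // hres_m) + 1


# Assumed extent of the stations with data (approximate, in meters)
width_m, height_m = 37_000, 25_000

for hres in (35_000, 5_000):
    nx = grid_nodes(width_m, hres)
    ny = grid_nodes(height_m, hres)
    print(f"hres={hres}: {nx} x {ny} = {nx * ny} grid points")
# hres=35000: 2 x 1 = 2 grid points
# hres=5000: 8 x 6 = 48 grid points
```

Under these assumptions a 35 km spacing leaves only a couple of nodes, while 5 km gives a few dozen, enough for pcolormesh to draw a visible field.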

How to adjust month on axis without overlap using matplotlib?

I'm trying to plot the following data with matplotlib.
Month A B C
0 2014/06 41 17 3
1 2014/07 48 11 7
2 2014/08 58 20 4
3 2014/09 43 16 6
4 2014/10 73 13 7
5 2014/11 69 22 16
6 2014/12 65 34 9
7 2015/01 69 27 12
I'm having the following code:
x = np.arange(len(df["Month"].values))
y1=df["A"].values.astype(int)
y2=df["B"].values.astype(int)
y3=df["C"].values.astype(int)
my_xticks = df["Month"].values
plt.xticks(x, my_xticks)
plt.plot(x,y1)
plt.plot(x,y2)
plt.plot(x,y3)
plt.show()
The problem is that the months overlap each other on the x-axis. Can Python adjust this automatically? I need not only to rotate the labels, but also to skip some months automatically; otherwise the axis is too crowded.
Matplotlib has a function which can automatically format your x axis when they are dates - autofmt_xdate. This automatically rotates the labels, and positions the ticks. They can be changed from the defaults by passing arguments to this function. They can also, of course, be changed manually, but this requires (slightly) more effort.
You can easily reduce the number of dates shown by sampling every 2nd element of the list, using the slice notation [::2]
# Code here that creates a list of dates called list_of_dates...
print (list_of_dates)
# ['2016-08', '2016-09', '2016-10', '2016-11', '2016-12', '2017-01',
# '2017-02', '2017-03', '2017-04', '2017-05', '2017-06', '2017-07',
# '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2018-01']
x = np.arange(0, len(list_of_dates), 1)
plt.xticks(x[::2], list_of_dates[::2])
plt.plot(x, np.random.randn(len(list_of_dates)))
# plt.gcf() means "get current figure"
plt.gcf().autofmt_xdate(ha="center")
plt.show()
Which gives:
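When the number of months is not known in advance, the slice step can be derived from a maximum label count instead of being hard-coded to 2. A minimal sketch, where thin_labels and max_labels are hypothetical names introduced for illustration:

```python
import math


def thin_labels(labels, max_labels):
    """Pick evenly spaced tick positions so that at most max_labels are shown."""
    step = max(1, math.ceil(len(labels) / max_labels))
    positions = list(range(0, len(labels), step))
    return positions, [labels[i] for i in positions]


months = ["2014/06", "2014/07", "2014/08", "2014/09",
          "2014/10", "2014/11", "2014/12", "2015/01"]
positions, shown = thin_labels(months, max_labels=4)
print(positions)  # [0, 2, 4, 6]
print(shown)      # ['2014/06', '2014/08', '2014/10', '2014/12']
```

The two return values can be passed straight to plt.xticks(positions, shown) in place of the [::2] slices.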
