Can't plot comparative (double) histogram from Pandas table - python

Here's the table from the dataframe:
   Points_groups  Qty Contracts  Qty Gones
1  350+           108            275
2  300-350        725            1718
3  250-300        885            3170
4  200-250        2121           10890
5  150-200        3120           7925
6  100-150        653            1318
7  50-100         101            247
8  0-50           45             137
I'd like to get a comparative (double) histogram out of it, with the two quantity columns drawn as bars next to each other.
The bars should be positioned along the x-axis, which is built from the 'Points_groups' column.
I've already tried a bunch of options, but none of them worked.
For example:
df.plot(kind ='hist')
plt.xlabel('Points_groups')
plt.ylabel("Number Of Students");
or
sns.distplot(df['Qty Gones'])
sns.distplot(df['Qty Contracts'])
plt.show()
or
df.hist(column='Points_groups', by=['Qty Contracts', 'Qty Gones'], bins=2, grid=False, rwidth=0.9, color='purple', sharex=True);

Since you already have the distribution in your pandas dataframe, the plot you need can be achieved with the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Df = pd.DataFrame({'key': ['red', 'green', 'blue'], 'A': [1, 2, 1], 'B': [2, 4, 3]})
X_axis = np.arange(len(Df['key']))
plt.bar(X_axis - 0.2, Df['A'], 0.4, label = 'A')
plt.bar(X_axis + 0.2, Df['B'], 0.4, label = 'B')
X_label = list(Df['key'].values)
plt.xticks(X_axis, X_label)
plt.legend()
plt.show()
Since I don't have access to your data, I made a mock dataframe. The result is a grouped bar chart in which bars A and B sit side by side above each key.
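For completeness, here is a minimal sketch of the same grouped-bar approach applied to the numbers from the table in the question. The values are typed in by hand from that table, and the bar width of 0.4 is just an arbitrary choice, so treat this as an illustration rather than the definitive solution.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Values copied by hand from the table in the question.
df = pd.DataFrame({
    "Points_groups": ["350+", "300-350", "250-300", "200-250",
                      "150-200", "100-150", "50-100", "0-50"],
    "Qty Contracts": [108, 725, 885, 2121, 3120, 653, 101, 45],
    "Qty Gones": [275, 1718, 3170, 10890, 7925, 1318, 247, 137],
})

x = np.arange(len(df))  # one slot on the x-axis per score group
width = 0.4             # width of each bar (arbitrary choice)

fig, ax = plt.subplots()
# Shift the two series left and right of the slot centre so they sit side by side.
ax.bar(x - width / 2, df["Qty Contracts"], width, label="Qty Contracts")
ax.bar(x + width / 2, df["Qty Gones"], width, label="Qty Gones")
ax.set_xticks(x)
ax.set_xticklabels(df["Points_groups"])
ax.set_xlabel("Points_groups")
ax.set_ylabel("Number Of Students")
ax.legend()
plt.show()
```

If you prefer to stay within pandas, df.plot(x="Points_groups", kind="bar") should produce a similar grouped chart from the same dataframe.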

Related

How to plot distributions for several bivariate groups of variable using Python

I am analysing data which is organised as following:
There is a separate pandas dataframe for each group (A, B and C).
Each dataframe representing a group has 4 subgroups (columns), and the rows hold their corresponding observations.
For example, a single group of data looks like:
subgroup-1  subgroup-2  subgroup-3  subgroup-4
12          4           NaN         9
15          3           4           NaN
16          8           3           11
17          12          8           13
11          17          12          14
I want to visualise the distributions of each subgroup across the different groups. Can anyone let me know what the available options are in Python to do this (the chart types I can use)? Thanks.
I tried using histograms and density plots, but all of them work only for 2 variables.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# pandas Dataframes
group_A = pd.DataFrame(np.random.rand(50, 4) , columns=['subgroup-1' , 'subgroup-2' , 'subgroup-3' , 'subgroup-4'])
group_B = pd.DataFrame(np.random.rand(50, 4) , columns=['subgroup-1' , 'subgroup-2' , 'subgroup-3' , 'subgroup-4'])
group_C = pd.DataFrame(np.random.rand(50, 4) , columns=['subgroup-1' , 'subgroup-2' , 'subgroup-3' , 'subgroup-4'])
def plot_hist(subgroup):
    np.random.seed(19680801)
    n_bins = 10
    x = np.dstack([group_A[subgroup] , group_B[subgroup] , group_C[subgroup]])[0]
    fig, axes = plt.subplots(nrows=2, ncols=2)
    ax0, ax1, ax2, ax3 = axes.flatten()
    ax0.hist(x, n_bins, density=True, histtype='bar', label = ['A', 'B', 'C'])
    ax0.legend(prop={'size': 10})
    ax0.set_title('bars with legend')
    ax1.hist(x, n_bins, density=True, histtype='bar', stacked=True)
    ax1.set_title('stacked bar')
    ax2.hist(x, n_bins, histtype='step', stacked=True, fill=False)
    ax2.set_title('stack step (unfilled)')
    # Make a multiple-histogram of data-sets with different length.
    x_multi = [np.random.randn(n) for n in [10000, 5000, 2000]]
    ax3.hist(x_multi, n_bins, histtype='bar')
    ax3.set_title('different sample sizes')
    fig.tight_layout()
    plt.show()
plot_hist('subgroup-1')
reference
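If the goal is to see all four subgroups at once, one possible variation is to draw one panel per subgroup and overlay the three groups in each panel. The sketch below assumes mock data in the same layout as above (the dictionary name groups and the random values are my own placeholders), and it drops NaN values before plotting since the real data contains some.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
subgroups = ["subgroup-1", "subgroup-2", "subgroup-3", "subgroup-4"]

# Mock data (an assumption): three groups with 50 observations each.
groups = {name: pd.DataFrame(rng.random((50, 4)), columns=subgroups)
          for name in ["A", "B", "C"]}

# One panel per subgroup, with the groups overlaid as translucent histograms.
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
for ax, subgroup in zip(axes.flat, subgroups):
    for name, frame in groups.items():
        # dropna() keeps missing observations out of the histogram.
        ax.hist(frame[subgroup].dropna(), bins=10, range=(0, 1),
                alpha=0.5, label=name)
    ax.set_title(subgroup)
axes.flat[0].legend(title="group")
fig.tight_layout()
plt.show()
```

Using the same bins and range in every panel keeps the bars comparable between groups; for real data the range would need to match the observed minimum and maximum.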

Is it possible to plot a barchart with upper and lower limits of the bins with Pandas,seaborn or Matplotlib

I would like to know how I can plot a bar chart in which the upper and lower limits of the bins are given by the values in the age_classes column of the dataframe shown below, using pandas, seaborn or matplotlib. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a bar chart with one bar per age class, you can make it with sns.barplot, setting age_classes as x and one of the columns (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax = ax,
            data = df,
            x = 'age_classes',
            y = 'total_cases')
plt.show()
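If the bars should actually span from the lower to the upper limit of each class on a numeric axis, a rough sketch with plain matplotlib could look like the following. It assumes the labels have the form 'lower-upper', so I left out '>90' and 'Unknown', which have no two finite limits; treating '0-9' as the interval from 0 up to 10 is also my assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First rows of the sample from the question; '>90' and 'Unknown' are left out
# because they do not have both a lower and an upper limit.
df = pd.DataFrame({
    "age_classes": ["0-9", "10-19", "20-29", "30-39", "40-49"],
    "total_cases": [693, 931, 4530, 7466, 13701],
})

# Split 'lower-upper' labels into two integer columns.
limits = df["age_classes"].str.extract(r"^(\d+)-(\d+)$").astype(int)
df["lower"] = limits[0]
df["upper"] = limits[1] + 1  # assumption: '0-9' covers ages 0 up to (not including) 10

fig, ax = plt.subplots()
# align="edge" starts each bar at its lower limit; the width spans to the upper limit.
ax.bar(df["lower"], df["total_cases"], width=df["upper"] - df["lower"],
       align="edge", edgecolor="black")
ax.set_xticks(list(df["lower"]) + [df["upper"].iloc[-1]])
ax.set_xlabel("age")
ax.set_ylabel("total_cases")
plt.show()
```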

Seaborn scatter plot from pandas dataframe colours based on third column

I have a pandas dataframe, with columns 'groupname', 'result', and 'temperature'. I've plotted a Seaborn swarmplot, where x='groupname' and y='result', which shows the results data separated into the groups.
What I also want to do is to colour the markers according to their temperature, using a colormap, so that for example the coldest are blue and hottest red.
Plotting the chart is very simple:
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
data = {'groupname': ['G0', 'G0', 'G0', 'G0', 'G1', 'G1', 'G1'], 'shot': [1, 2, 3, 4, 1, 2, 3], 'temperature': [20, 25, 35, 10, -20, -17, -6], 'result': [10.0, 10.1, 10.5, 15.0, 15.1, 13.5, 10.5]}
results = pd.DataFrame(data)
groupname shot temperature result
0 G0 1 20 10.0
1 G0 2 25 10.1
2 G0 3 35 10.5
3 G0 4 10 15.0
4 G1 1 -20 15.1
5 G1 2 -17 13.5
6 G1 3 -6 10.5
plt.figure()
sns.stripplot(data=results, x="groupname", y="result")
plt.show()
But now I'm stuck trying to colour the points, I've tried a few things like:
sns.stripplot(data=results, x="groupname", y="result", cmap=matplotlib.cm.get_cmap('Spectral'))
which doesn't seem to do anything.
Also tried:
sns.stripplot(data=results, x="groupname", y="result", hue='temperature')
which does colour the points depending on the temperature, however the colours are random rather than mapped.
I feel like there is probably a very simple way to do this, but haven't been able to find any examples.
Ideally looking for something like:
sns.stripplot(data=results, x="groupname", y="result", colorscale='temperature')
Hello, the keyword you are looking for is "palette".
The following should work:
sns.stripplot(data=results, x="groupname", y="result", hue='temperature', palette="vlag")
http://man.hubwiz.com/docset/Seaborn.docset/Contents/Resources/Documents/generated/seaborn.stripplot.html
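If you also want a colorbar that shows the temperature scale, one alternative is to skip seaborn for this plot and use a plain matplotlib scatter. This is only a sketch of that alternative, not the palette approach above: the "coolwarm" colormap and the amount of horizontal jitter are my own choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

results = pd.DataFrame({
    "groupname": ["G0", "G0", "G0", "G0", "G1", "G1", "G1"],
    "temperature": [20, 25, 35, 10, -20, -17, -6],
    "result": [10.0, 10.1, 10.5, 15.0, 15.1, 13.5, 10.5],
})

# Turn group names into integer x positions and add a little jitter
# so that points in the same group do not overlap (jitter size is arbitrary).
codes, labels = pd.factorize(results["groupname"])
rng = np.random.default_rng(0)
x = codes + rng.uniform(-0.1, 0.1, size=len(results))

fig, ax = plt.subplots()
# c= maps each temperature onto the continuous colormap (blue = cold, red = hot).
points = ax.scatter(x, results["result"], c=results["temperature"], cmap="coolwarm")
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_xlabel("groupname")
ax.set_ylabel("result")
fig.colorbar(points, ax=ax, label="temperature")
plt.show()
```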

matplotlib histogram with frequency and counts

I have data (from a space-delimited text file with two columns) which is already binned, but only with a bin width of 1. I want to increase this width to about 5. How can I do this using numpy/matplotlib in Python?
Using,
data = loadtxt('file.txt')
x = data[:, 0]
y = data[:, 1]
plt.bar(x,y)
creates too many bars and using,
plt.hist(data)
doesn't plot the histogram appropriately. I guess I don't understand how matplotlib's histogram plotting works.
See some of the data below.
264 1
265 1
266 4
267 2
268 2
269 2
270 2
271 2
272 5
273 3
274 2
275 6
276 7
277 3
278 7
279 5
280 9
281 4
282 8
283 11
284 9
285 15
286 19
287 11
288 12
289 10
290 13
291 18
292 20
293 14
294 15
What if you use numpy.reshape to transform your data before using plt.bar, for example:
In [83]: import numpy as np
In [84]: import matplotlib.pyplot as plt
In [85]: data = np.array([[1,2,3,4,5,6], [4,3,8,9,1,2]]).T
In [86]: data
Out[86]:
array([[1, 4],
       [2, 3],
       [3, 8],
       [4, 9],
       [5, 1],
       [6, 2]])
In [87]: y = data[:,1].reshape(-1,2).sum(axis=1)
In [89]: y
Out[89]: array([ 7, 17, 3])
In [91]: x = data[:,0].reshape(-1,2).mean(axis=1)
In [92]: x
Out[92]: array([ 1.5, 3.5, 5.5])
In [96]: plt.bar(x, y)
Out[96]: <Container object of 3 artists>
In [97]: plt.show()
I am not an expert at matplotlib but I find hist to be incredibly useful. The examples on the matplotlib site give a great overview of some of the features.
I don't know how to use your provided sample data without transforming it. I altered your example to dequantize those data before creating a histogram.
I calculated the bin size using this question's first answer.
import matplotlib.pyplot as plt
import numpy as np
data = np.loadtxt('file.txt')
dequantized = data[:,0].repeat(data[:,1].astype(int))
dequantized[0:7]
# Each row's first column is repeated the number of times found in the
# second column creating a single array.
# array([ 264., 265., 266., 266., 266., 266., 267.])
def bins(xmin, xmax, binwidth, padding):
    # Returns an array of integers which can be used to represent bins
    return np.arange(
        xmin - (xmin % binwidth) - padding,
        xmax + binwidth + padding,
        binwidth)
histbins = bins(min(dequantized), max(dequantized), 5, 5)
plt.figure(1)
plt.hist(dequantized, histbins)
plt.show()
This histogram displayed looks like this.
I hope this example is useful.
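Another option that avoids expanding the data is to pass the counts as weights, since both np.histogram and plt.hist accept a weights argument. Below is a minimal sketch using the first ten rows of the sample data; the way I build the bin edges (starting on a multiple of the bin width) is just one possible choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# First rows of the sample data: value in the first column, count in the second.
data = np.array([[264, 1], [265, 1], [266, 4], [267, 2], [268, 2],
                 [269, 2], [270, 2], [271, 2], [272, 5], [273, 3]])
x, y = data[:, 0], data[:, 1]

binwidth = 5
# Start on a multiple of binwidth; the +1 guarantees an edge above the largest value.
start = x.min() - x.min() % binwidth
edges = np.arange(start, x.max() + binwidth + 1, binwidth)

# weights=y makes every x value contribute its count instead of 1.
counts, _ = np.histogram(x, bins=edges, weights=y)
# counts now holds 1, 11 and 12 for the bins 260-265, 265-270 and 270-275.

plt.hist(x, bins=edges, weights=y)
plt.show()
```

The total of the new bins equals the total of the original counts, which is a quick sanity check that nothing was lost in the rebinning.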

Pandas Plotting with Multi-Index

After performing a groupby.sum() on a DataFrame I'm having some trouble trying to create my intended plot.
import pandas as pd
import numpy as np
np.random.seed(365)
rows = 100
data = {'Month': np.random.choice(['2014-01', '2014-02', '2014-03', '2014-04'], size=rows),
'Code': np.random.choice(['A', 'B', 'C'], size=rows),
'ColA': np.random.randint(5, 125, size=rows),
'ColB': np.random.randint(0, 51, size=rows),}
df = pd.DataFrame(data)
Month Code ColA ColB
0 2014-03 C 59 47
1 2014-01 A 24 9
2 2014-02 C 77 50
dfg = df.groupby(['Code', 'Month']).sum()
              ColA  ColB
Code Month
A    2014-01   124   102
     2014-02   398   282
     2014-03   474   198
     2014-04   830   237
B    2014-01   477   300
     2014-02   591   167
     2014-03   522   192
     2014-04   367   169
C    2014-01   412   180
     2014-02   275   205
     2014-03   795   291
     2014-04   901   309
How can I create a subplot (kind='bar') for each Code, where the x-axis is the Month and the bars are ColA and ColB?
I found the unstack(level) method to work perfectly, which has the added benefit of not needing a priori knowledge about how many Codes there are.
ax = dfg.unstack(level=0).plot(kind='bar', subplots=True, rot=0, figsize=(9, 7), layout=(2, 3))
plt.tight_layout()
Using the following DataFrame ...
# using pandas version 0.14.1
from pandas import DataFrame
import pandas as pd
import matplotlib.pyplot as plt
data = {'ColB': {('A', 4): 3.0,
                 ('C', 2): 0.0,
                 ('B', 4): 51.0,
                 ('B', 1): 0.0,
                 ('C', 3): 0.0,
                 ('B', 2): 7.0,
                 ('Code', 'Month'): '',
                 ('A', 3): 5.0,
                 ('C', 1): 0.0,
                 ('C', 4): 0.0,
                 ('B', 3): 12.0},
        'ColA': {('A', 4): 66.0,
                 ('C', 2): 5.0,
                 ('B', 4): 125.0,
                 ('B', 1): 5.0,
                 ('C', 3): 41.0,
                 ('B', 2): 52.0,
                 ('Code', 'Month'): '',
                 ('A', 3): 22.0,
                 ('C', 1): 14.0,
                 ('C', 4): 51.0,
                 ('B', 3): 122.0}}
df = DataFrame(data)
... you can plot the following (using cross-section):
f, a = plt.subplots(3,1)
df.xs('A').plot(kind='bar',ax=a[0])
df.xs('B').plot(kind='bar',ax=a[1])
df.xs('C').plot(kind='bar',ax=a[2])
One for A, one for B and one for C, x-axis: 'Month', the bars are ColA and ColB.
Maybe this is what you are looking for.
Creating the desired visualization is all about shaping the dataframe to fit the plotting API.
seaborn can easily aggregate long form data from a dataframe without .groupby or .pivot_table.
Given the original dataframe df, the easiest option is to convert it to a long form with pandas.DataFrame.melt, and then plot with seaborn.catplot, which is a high-level API for matplotlib.
Change the default estimator from mean to sum
The 'Month' column in the OP is a string type. In general, it's better to convert the column to datetime dtype with pd.to_datetime
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2, seaborn 0.11.2
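As a small illustration of the datetime conversion mentioned above, here is a sketch using the first three rows of the OP's dataframe. The explicit format string is my assumption about how the 'Month' strings are laid out.

```python
import pandas as pd

# First three rows of the dataframe from the question.
df = pd.DataFrame({"Month": ["2014-03", "2014-01", "2014-02"],
                   "Code": ["C", "A", "C"],
                   "ColA": [59, 24, 77],
                   "ColB": [47, 9, 50]})

# Parse 'YYYY-MM' strings; each value becomes the first day of that month.
df["Month"] = pd.to_datetime(df["Month"], format="%Y-%m")

# Datetime values sort chronologically, and .dt exposes the date parts.
df = df.sort_values("Month")
print(df["Month"].dt.month.tolist())
```

For the four months in this question a plain string sort happens to give the same order, so the conversion mainly pays off once the data spans formats or ranges where string order and date order differ.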
seaborn.catplot
import seaborn as sns
dfm = df.melt(id_vars=['Month', 'Code'], var_name='Cols')
Month Code Cols value
0 2014-03 C ColA 59
1 2014-01 A ColA 24
2 2014-02 C ColA 77
3 2014-04 B ColA 114
4 2014-01 C ColA 67
# specify row and col to get a plot like that produced by the accepted answer
sns.catplot(kind='bar', data=dfm, col='Code', x='Month', y='value', row='Cols', order=sorted(dfm.Month.unique()),
col_order=sorted(df.Code.unique()), estimator=sum, ci=None, height=3.5)
sns.catplot(kind='bar', data=dfm, col='Code', x='Month', y='value', hue='Cols', estimator=sum, ci=None,
order=sorted(dfm.Month.unique()), col_order=sorted(df.Code.unique()))
pandas.DataFrame.plot
pandas uses matplotlib as the default plotting backend.
To produce the plot like the accepted answer, it's better to use pandas.DataFrame.pivot_table instead of .groupby, because the resulting dataframe is in the correct shape, without the need to unstack.
dfp = df.pivot_table(index='Month', columns='Code', values=['ColA', 'ColB'], aggfunc='sum')
dfp.plot(kind='bar', subplots=True, rot=0, figsize=(9, 7), layout=(2, 3))
plt.tight_layout()
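To see why pivot_table can stand in for groupby followed by unstack, here is a small sketch that builds both and compares them. The six-row dataframe is a made-up, deterministic stand-in for the random data in the question.

```python
import pandas as pd

# Small deterministic stand-in (made up) for the question's random data.
df = pd.DataFrame({
    "Month": ["2014-01", "2014-01", "2014-02", "2014-02", "2014-01", "2014-02"],
    "Code": ["A", "B", "A", "B", "A", "B"],
    "ColA": [1, 2, 3, 4, 5, 6],
    "ColB": [10, 20, 30, 40, 50, 60],
})

# Route 1: groupby, then move 'Code' from the index into the columns.
unstacked = df.groupby(["Code", "Month"]).sum().unstack(level=0)

# Route 2: pivot_table builds the same wide shape in one step.
pivoted = df.pivot_table(index="Month", columns="Code",
                         values=["ColA", "ColB"], aggfunc="sum")

print(pivoted)
# Both routes end up with the same (column, Code) pairs and the same sums.
print(list(unstacked.columns) == list(pivoted.columns))
print((unstacked.to_numpy() == pivoted.to_numpy()).all())
```

Either result has 'Month' as the index and one column per (value column, Code) pair, which is the shape that .plot(kind='bar', subplots=True) turns into one subplot per column.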
