Plot a histogram of an attribute from a dataframe - python

I have a dataframe with the weight and the number of measures of each user. The df looks like:
id_user  weight  number_of_measures
1        92.16   4
2        80.34   5
3        71.89   11
4        81.11   7
5        77.23   8
6        92.37   2
7        88.18   3
I would like to see a histogram with an attribute of the table on the x-axis (weight, but I want to do it for both weight and number_of_measures) and the frequency on the y-axis.
Does anyone know how to do it with matplotlib?

Ok, it seems to be quite easy:
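import pandas as pd
import matplotlib.pyplot as plt
# The table from the question (id_user as index); DataFrame.hist draws one histogram per numeric column
df = pd.DataFrame({'weight': [92.16, 80.34, 71.89, 81.11, 77.23, 92.37, 88.18],
                   'number_of_measures': [4, 5, 11, 7, 8, 2, 3]}, index=pd.RangeIndex(1, 8, name='id_user'))
hist = df.hist(bins=50)
plt.show()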
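If you want more control than `df.hist` gives (for example a bin count that suits only seven users, and a y-axis explicitly labelled as frequency), a minimal sketch with plain matplotlib might look like the following. The helper name `plot_histograms` and the `bins=5` default are illustrative choices, not part of the answer above; the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question, indexed by id_user
df = pd.DataFrame(
    {"weight": [92.16, 80.34, 71.89, 81.11, 77.23, 92.37, 88.18],
     "number_of_measures": [4, 5, 11, 7, 8, 2, 3]},
    index=pd.RangeIndex(1, 8, name="id_user"),
)

def plot_histograms(frame, columns, bins=5):
    """Draw one frequency histogram per column, side by side; return the figure and bin counts."""
    # squeeze=False keeps axes 2-D, so this also works for a single column
    fig, axes = plt.subplots(1, len(columns), figsize=(4 * len(columns), 3), squeeze=False)
    counts = {}
    for ax, col in zip(axes[0], columns):
        n, edges, _ = ax.hist(frame[col], bins=bins, edgecolor="black")
        ax.set_xlabel(col)
        ax.set_ylabel("frequency")
        counts[col] = n
    fig.tight_layout()
    return fig, counts

fig, counts = plot_histograms(df, ["weight", "number_of_measures"])
```

Each entry of `counts` holds the bar heights matplotlib computed, so you can check that every user landed in exactly one bin.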


Create separate graph of each series and save as pdf in Python [duplicate]

This question already has answers here: Pandas dataframe groupby plot; Saving plots (AxesSubPlot) generated from python pandas with matplotlib's savefig; How to save a Seaborn plot into a file.
I have a pandas dataframe as below:
    Well Name  READTIME   WL
0   A          02-Jul-20  12
1   B          03-Aug-22  18
2   C          05-Jul-21  14
3   A          03-May-21  16
4   B          01-Jan-19  19
5   C          12-Dec-20  20
6   D          14-Nov-21  14
7   A          01-Mar-22  17
8   B          15-Feb-21  11
9   C          10-Oct-20  10
10  D          14-Sep-21  5
groupByName = df.groupby(['Well Name', 'READTIME'])
After grouping by 'Well Name' and 'READTIME', I got the following:
Well Name  READTIME    WL
A          2020-07-02  12
           2021-05-03  16
           2022-03-01  17
B          2019-01-01  19
           2021-02-15  11
           2022-08-03  18
C          2020-10-10  10
           2020-12-12  20
           2021-07-05  14
D          2021-09-14  5
           2021-11-14  14
I got the following graph by running this code:
sns.relplot(data=df, x="READTIME", y="WL", hue="Well Name", kind="line", height=4, aspect=3)
I want a separate graph for each "Well Name", saved as a PDF. I would really appreciate your help with this. Thank you.
To separate out the plots, you can iterate over the four unique Well Names in your dataset and filter the dataset for each Well Name before plotting:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# I saved your data as an Excel file
df = pd.read_excel('Book1.xlsx')
print(df)
# Get the unique Well Names, in a stable (sorted) order
well_names = sorted(df['Well Name'].unique())
for wn in well_names:
    # Create a dataframe containing only rows with this Well Name
    this_wn = df[df['Well Name'] == wn]
    # Plot, save, and show; relplot returns a FacetGrid, which has its own savefig method
    g = sns.relplot(data=this_wn, x="READTIME", y="WL", hue="Well Name", kind="line", height=4, aspect=3)
    g.savefig(f'{wn}.png')
    plt.show(block=True)
This generated the following 4 image files:
For saving in a PDF file, please see this answer.
Alternatively, passing row='Well Name' to relplot produces a faceted figure with one subplot row per well:
sns.relplot(data=df, x="READTIME", y="WL", hue="Well Name", kind="line", row='Well Name', height=4, aspect=3)
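Since the question asks for PDF output, here is a minimal sketch that writes one page per well into a single PDF with matplotlib's `PdfPages`. It is an alternative to the answers above, not a copy of them: it plots with plain matplotlib instead of seaborn, rebuilds the question's data inline, and assumes READTIME strings use the `%d-%b-%y` format (e.g. `02-Jul-20`). The helper `wells_to_pdf` and the file name `wells.pdf` are hypothetical. In recent seaborn versions the figure behind a `relplot` result is available as its `.figure` attribute and could be passed to `pdf.savefig` the same way.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when no display is available
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Data from the question
df = pd.DataFrame({
    "Well Name": list("ABCABCDABCD"),
    "READTIME": ["02-Jul-20", "03-Aug-22", "05-Jul-21", "03-May-21", "01-Jan-19", "12-Dec-20",
                 "14-Nov-21", "01-Mar-22", "15-Feb-21", "10-Oct-20", "14-Sep-21"],
    "WL": [12, 18, 14, 16, 19, 20, 14, 17, 11, 10, 5],
})

def wells_to_pdf(frame, path):
    """Write one line plot per well to a multi-page PDF and return the page count."""
    # Assumption: dates are stored as strings like '02-Jul-20'
    frame = frame.assign(READTIME=pd.to_datetime(frame["READTIME"], format="%d-%b-%y"))
    with PdfPages(path) as pdf:
        for name, group in frame.sort_values("READTIME").groupby("Well Name"):
            fig, ax = plt.subplots(figsize=(8, 3))
            ax.plot(group["READTIME"], group["WL"], marker="o")
            ax.set_title(f"Well {name}")
            ax.set_xlabel("READTIME")
            ax.set_ylabel("WL")
            pdf.savefig(fig)  # each call appends one page
            plt.close(fig)    # free the figure once it has been written
        pages = pdf.get_pagecount()
    return pages

out_path = os.path.join(tempfile.mkdtemp(), "wells.pdf")
n_pages = wells_to_pdf(df, out_path)
```

Sorting by READTIME before grouping keeps each well's line in chronological order, because groupby preserves row order within a group.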

Plotting three dimensions of categorical data in Python

My data has three categorical variables I'm trying to visualize:
City (one of five)
Occupation (one of four)
Blood type (one of four)
So far, I've succeeded in grouping the data in a way that I think will be easy to work with:
import numpy as np, pandas as pd
# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
'Occupation': np.random.choice(occupations,500),
'Blood Type':np.random.choice(bloodtypes,500)})
# You need to make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)
# This is now what I'd like to plot
df.groupby(by=['City','Occupation','Blood Type']).count().unstack(level=1)
Returns:
                        Dummy
Occupation              Doctor Drone security officer Engineer Lawyer
City        Blood Type
Anaheim     A                7                      7        7      7
            AB               6                     10        8      5
            B                2                     10        4      2
            O                4                      3        3      6
Atlantis    A                6                      5        5      7
            AB              12                      7        7     10
            B                7                      4        7      3
            O                7                      4        6      4
Las Vegas   A                8                      4        8      5
            AB               5                      6        8      9
            B                6                     10        6      6
            O                6                      9        5      9
Los Angeles A                7                      4        8      8
            AB               9                      8        8      8
            B                3                      6        4      1
            O                9                     11       11      9
Tijuana     A                3                      4        5      3
            AB               9                      5        5      7
            B                3                      6        4      9
            O                3                      5        5      8
My goal is to create something like the Seaborn swarmplot shown below, which comes from the Seaborn documentation. Seaborn applies jitter to the quantitative data so that you can see the individual data points and their hues:
With my data, I'd like to plot City on the x-axis and Occupation on the y-axis, applying jitter to each, and then hue by Blood type. However, sns.swarmplot requires one of the axes to be quantitative:
sns.swarmplot(data=df,x='City',y='Occupation',hue='Blood Type')
returns an error.
An acceptable alternative might be to create 20 categorical bar plots, one for each intersection of City and Occupation, which I would do by running a for loop over each category, but I can't imagine how I'd feed that to matplotlib subplots to get them in a 4x5 grid.
The most similar question I could find was in R, and the asker only wanted to indicate the most common value for the third variable, so I didn't get any good ideas from there.
Thanks for any help you can provide.
Alright, I got to work on the "acceptable alternative" today and I have found a solution using basically pure matplotlib (but I stuck the Seaborn styling on top of it, just because).
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import seaborn as sns
# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
                   'Occupation': np.random.choice(occupations,500),
                   'Blood Type': np.random.choice(bloodtypes,500)})
# Make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)
# This is now what I'd like to plot
grouped = df.groupby(by=['City','Occupation','Blood Type']).count().unstack()
# List of blood types, to use later as categories in subplots
kinds = grouped.columns.levels[1]
# Colors for the bar graph (plt.get_cmap; matplotlib.cm.get_cmap was removed in newer matplotlib)
colors = [plt.get_cmap('viridis')(v) for v in np.linspace(0,1,len(kinds))]
sns.set(context="talk")
nxplots = len(grouped.index.levels[0])  # one row of subplots per city
nyplots = len(grouped.index.levels[1])  # one column of subplots per occupation
fig, axes = plt.subplots(nxplots,
                         nyplots,
                         sharey=True,
                         sharex=True,
                         figsize=(10,12))
fig.suptitle('City, occupation, and blood type')
# Plot the data: one bar chart of blood-type counts per (city, occupation) cell
for a, b in enumerate(grouped.index.levels[0]):
    for i, j in enumerate(grouped.index.levels[1]):
        axes[a,i].bar(kinds, grouped.loc[b,j], color=colors)
        axes[a,i].xaxis.set_ticks([])
# Invisible full-figure axes that only carries the shared axis labels
axeslabels = fig.add_subplot(111, frameon=False)
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
plt.grid(False)
axeslabels.set_ylabel('City',rotation='horizontal',y=1,weight="bold")
axeslabels.set_xlabel('Occupation',weight="bold")
# x- and y-axis labels: occupations under the bottom row, cities beside the first column
for i, j in enumerate(grouped.index.levels[1]):
    axes[nxplots - 1, i].set_xlabel(j)
for i, j in enumerate(grouped.index.levels[0]):
    axes[i,0].set_ylabel(j)
# Tune this manually to make room for the legend
fig.subplots_adjust(right=0.82)
fig.legend([Patch(facecolor = i) for i in colors],
           kinds,
           title="Blood type",
           loc="center right")
plt.show()
Returns this:
I'd appreciate any feedback, and I'd still love it if someone could provide the preferred solution.
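Closer to the original swarmplot idea, one possible approach is to jitter both categorical axes yourself: map City and Occupation to integer positions, add a small random offset to each, and colour the points by blood type. This is a minimal sketch with plain matplotlib, not seaborn's swarm algorithm (points are scattered uniformly rather than packed without overlap); the helper `jittered_codes`, the `width=0.35` jitter and the fixed random seed are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
cities = ['Tijuana', 'Las Vegas', 'Los Angeles', 'Anaheim', 'Atlantis']
occupations = ['Doctor', 'Lawyer', 'Engineer', 'Drone security officer']
bloodtypes = ['A', 'B', 'AB', 'O']
df = pd.DataFrame({'City': rng.choice(cities, 500),
                   'Occupation': rng.choice(occupations, 500),
                   'Blood Type': rng.choice(bloodtypes, 500)})

def jittered_codes(series, categories, rng, width=0.35):
    """Map each category to its integer position and add uniform jitter in [-width, width)."""
    codes = pd.Categorical(series, categories=categories).codes
    return codes + rng.uniform(-width, width, size=len(series))

x = jittered_codes(df['City'], cities, rng)
y = jittered_codes(df['Occupation'], occupations, rng)

fig, ax = plt.subplots(figsize=(10, 6))
palette = plt.get_cmap('viridis')(np.linspace(0, 1, len(bloodtypes)))
for bt, color in zip(bloodtypes, palette):
    mask = (df['Blood Type'] == bt).to_numpy()
    # One scatter call per blood type, so the legend gets one entry each
    ax.scatter(x[mask], y[mask], s=12, color=tuple(color), label=bt)
ax.set_xticks(range(len(cities)))
ax.set_xticklabels(cities)
ax.set_yticks(range(len(occupations)))
ax.set_yticklabels(occupations)
ax.legend(title='Blood Type', loc='upper left', bbox_to_anchor=(1, 1))
fig.tight_layout()
```

Keeping the jitter below 0.5 guarantees that every point stays closer to its own tick than to a neighbouring category.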

Seaborn distplot only whole numbers

How can I make a distplot with seaborn to only have whole numbers?
My data is an array of numbers between 0 and ~18. I would like to plot the distribution of the numbers.
Impressions
0 210
1 1084
2 2559
3 4378
4 5500
5 5436
6 4525
7 3329
8 2078
9 1166
10 586
11 244
12 105
13 51
14 18
15 5
16 3
dtype: int64
Code I'm using:
sns.distplot(Impressions,
# bins=np.arange(Impressions.min(), Impressions.max() + 1),
# kde=False,
axlabel=False,
hist_kws={'edgecolor':'black', 'rwidth': 1})
plt.xticks = range(current.Impressions.min(), current.Impressions.max() + 1, 1)
Plot looks like this:
What I'm expecting:
The xlabels should be whole numbers
Bars should touch each other
The KDE line should simply connect the tops of the bars. By the looks of it, the current one assumes there are 0s between x and x + 1, hence the downward spikes. (This isn't required; I can turn off the KDE.)
Am I using the correct tool for the job, or shouldn't distplot be used for whole numbers?
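Your problem can be solved with the code below:
import seaborn as sns # for data visualization
import numpy as np # for numeric computing
import matplotlib.pyplot as plt # for data visualization
arr = np.array([1,2,3,4,5,6,7,8,9])
# Bin edges halfway between whole numbers, so each value gets its own bar centred on its tick
# (with bins=arr, the last bin [8, 9] would hold both 8 and 9)
sns.distplot(arr, bins=np.arange(arr.min(), arr.max() + 2) - 0.5, kde=False)
plt.xticks(arr)
plt.show()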
This way, you can plot a histogram with seaborn's sns.distplot() function.
Note: whatever you pass to bins and plt.xticks() should be in ascending order.
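Two remarks, hedged: `plt.xticks = range(...)` in the question rebinds the name `plt.xticks` instead of calling the function, so no ticks are set; `plt.xticks(range(...))` is the call. Also, seaborn 0.11 deprecated `distplot`; its replacement `sns.histplot` accepts `discrete=True`, which centres one bar on each whole number. The sketch below does the same with plain matplotlib, assuming the Impressions table lists how often each whole number 0 to 16 occurs; the helper `integer_bin_edges` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Assumption: the table gives the count of each whole number 0..16
values = np.arange(17)
counts = np.array([210, 1084, 2559, 4378, 5500, 5436, 4525, 3329, 2078,
                   1166, 586, 244, 105, 51, 18, 5, 3])
data = np.repeat(values, counts)  # expand back to one observation per impression

def integer_bin_edges(x):
    """Bin edges at k - 0.5, so each whole number k sits in the middle of its own bar."""
    return np.arange(x.min(), x.max() + 2) - 0.5

edges = integer_bin_edges(data)
fig, ax = plt.subplots()
# Adjacent half-integer edges make the bars touch, one bar per whole number
heights, _, _ = ax.hist(data, bins=edges, edgecolor="black")
ax.set_xticks(values)  # whole-number tick labels
```

Because every integer falls strictly inside its own bin, the bar heights reproduce the original counts exactly.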

Using pandas series date as xtick label

I have this dataframe called 'dfArrivalDate' (with the first 11 rows shown)
arrival_date count
0 2013-06-08 9
1 2013-06-27 8
2 2013-03-06 8
3 2013-06-01 8
4 2013-06-28 6
5 2012-11-28 6
6 2013-06-11 5
7 2013-06-29 5
8 2013-06-09 4
9 2013-06-03 3
10 2013-05-31 3
sortedArrivalDate = transform.sort('arrival_date')
I wanted to plot them in a bar chart to see the count by arrival date. I called
sortedArrivalDate.plot(kind = 'bar')
but I'm getting the index as the x-axis tick labels of my bar chart. I figured I need to use 'xticks'.
sortedArrivalDate.plot(kind = 'bar', xticks = sortedArrivalDate.arrival_date)
but I run into the error: TypeError: Cannot compare type 'Timestamp' with type 'float'
I tried a different approach.
fig, ax = plt.subplots()
ax.plot(sortedArrivalDate.arrival_date, sortedArrivalDate.count)
This time the error is ValueError: x and y must have same first dimension
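I think this might be an easy fix, and since I don't have much experience coding in pandas and matplotlib, I might be missing something very simple here. Could you point me in the right direction? Thanks.
If I understand correctly, you want the dates on the x-axis: sort with sort_values (DataFrame.sort no longer exists in current pandas) and pass the column names as x and y. Your ax.plot attempt failed because sortedArrivalDate.count is the DataFrame.count method rather than the 'count' column; sortedArrivalDate['count'] selects the column.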
df = df.sort_values(by='arrival_date')
df.plot(x='arrival_date', y='count', kind='bar')
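If the tick labels should show only the date (pandas may render datetime values with a `00:00:00` time part), one option is to format the dates as strings before plotting. A minimal matplotlib sketch, assuming `arrival_date` is a datetime64 column; the helper `plot_counts_by_date` is hypothetical and the data are the 11 rows shown in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "arrival_date": pd.to_datetime(["2013-06-08", "2013-06-27", "2013-03-06", "2013-06-01",
                                    "2013-06-28", "2012-11-28", "2013-06-11", "2013-06-29",
                                    "2013-06-09", "2013-06-03", "2013-05-31"]),
    "count": [9, 8, 8, 8, 6, 6, 5, 5, 4, 3, 3],
})

def plot_counts_by_date(frame):
    """Bar chart of 'count' per date, in date order, with YYYY-MM-DD tick labels."""
    ordered = frame.sort_values(by="arrival_date")
    labels = ordered["arrival_date"].dt.strftime("%Y-%m-%d").tolist()
    fig, ax = plt.subplots(figsize=(8, 3))
    # String x values are treated as categories, placed in order of appearance
    ax.bar(labels, ordered["count"])
    ax.tick_params(axis="x", labelrotation=90)
    return ax, labels

ax, labels = plot_counts_by_date(df)
```

Note that `ordered["count"]` uses bracket indexing; attribute access (`ordered.count`) would return the DataFrame.count method instead of the column.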

how to plot two barh in one axis in pyqtgraph?

I need something like this:
demo data:
bottom10
Out[12]:
0 -9.823127e+08
1 -8.069270e+08
2 -6.030317e+08
3 -5.709379e+08
4 -5.224355e+08
5 -4.755464e+08
6 -4.095561e+08
7 -3.989287e+08
8 -3.885740e+08
9 -3.691114e+08
Name: amount, dtype: float64
top10
Out[13]:
0 9.360520e+08
1 9.078776e+08
2 6.603838e+08
3 4.967611e+08
4 4.409362e+08
5 3.914972e+08
6 3.547471e+08
7 3.538894e+08
8 3.368558e+08
9 3.189895e+08
Name: amount, dtype: float64
The same question for matplotlib is here: how to plot two barh in one axis
But there is no ax.twiny() in pyqtgraph. Is there any other way?
I found a graphics item, "BarGraphItem", which is not listed in the official documentation (PyQtGraph's Widgets List). It can be rotated to produce horizontal bars like matplotlib's barh. It's not perfect, but it works!
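import pyqtgraph as pg
import pandas as pd
import numpy as np
bottom10 = pd.DataFrame({'amount': -np.sort(np.random.rand(10))})
top10 = pd.DataFrame({'amount': np.sort(np.random.rand(10))[::-1]})
# Common extent for both sets of bars: 30% beyond the largest absolute amount
maxtick = max(top10.amount.max(), -bottom10.amount.min()) * 1.3
win1 = pg.plot()
# Positive amounts: vertical bars starting at 0, rotated by -90 degrees to become horizontal
axtop = pg.BarGraphItem(x=range(len(top10)), height=top10.amount, width=0.6, brush='r')
axtop.setRotation(-90)  # Qt's QGraphicsItem.setRotation; the item starts unrotated
win1.addItem(axtop)
# Negative amounts: bars of length -amount that end at maxtick, so they grow from the opposite side
axbt = pg.BarGraphItem(x=range(len(top10)), height=-bottom10.amount, y0=maxtick + bottom10.amount, width=0.6, brush='g')
axbt.setRotation(-90)
win1.addItem(axbt)
pg.exec()  # start the Qt event loop (available in recent pyqtgraph releases)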
