Seaborn distplot only whole numbers - python

How can I make a distplot with seaborn to only have whole numbers?
My data is an array of numbers between 0 and ~18. I would like to plot the distribution of the numbers.
Impressions
0 210
1 1084
2 2559
3 4378
4 5500
5 5436
6 4525
7 3329
8 2078
9 1166
10 586
11 244
12 105
13 51
14 18
15 5
16 3
dtype: int64
Code I'm using:
sns.distplot(Impressions,
# bins=np.arange(Impressions.min(), Impressions.max() + 1),
# kde=False,
axlabel=False,
hist_kws={'edgecolor':'black', 'rwidth': 1})
plt.xticks = range(current.Impressions.min(), current.Impressions.max() + 1, 1)
Plot looks like this:
What I'm expecting:
The xlabels should be whole numbers
Bars should touch each other
The kde line should simply connect the top of the bars. By the looks of it, the current one assumes to have 0s between (x, x + 1), hence why the downward spike (This isn't required, I can turn off kde)
Am I using the correct tool for the job, or should distplot not be used for whole numbers?

Your problem can be solved with the code below:
import seaborn as sns # for data visualization
import numpy as np # for numeric computing
import matplotlib.pyplot as plt # for data visualization
arr = np.array([1,2,3,4,5,6,7,8,9])
sns.distplot(arr, bins = arr, kde = False)
plt.xticks(arr)
plt.show()
In this way, you can plot a histogram using seaborn's sns.distplot() function.
Note: whatever data you pass to bins and plt.xticks() should be in ascending order.
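One caveat with bins=arr: the array is read as a list of bin edges, so nine values give eight bins, and the last bin (8 to 9) collects both 8 and 9. Below is a minimal sketch of one way around that, assuming plain matplotlib is acceptable in place of distplot: put the edges at half-integers so every whole number gets its own bar and the bars touch. The impressions array is hypothetical sample data, not the asker's series.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample standing in for the Impressions values
rng = np.random.default_rng(0)
impressions = rng.poisson(lam=5, size=1000)

# One bin per integer k, with edges at k - 0.5 and k + 0.5
edges = np.arange(impressions.min(), impressions.max() + 2) - 0.5

fig, ax = plt.subplots()
counts, _, _ = ax.hist(impressions, bins=edges, edgecolor="black")

# Ticks on the whole numbers only (xticks is called, not assigned to)
ax.set_xticks(np.arange(impressions.min(), impressions.max() + 1))
```

Call plt.show() to display the figure. Note that plt.xticks is a function; assigning to it, as in plt.xticks = range(...), replaces the function without changing any ticks. In seaborn 0.11 and later, where distplot is deprecated, sns.histplot(impressions, discrete=True) should give a similar result.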

Related

Plot with Histogram an attribute from a dataframe

I have a dataframe with the weight and the number of measures of each user. The df looks like:
id_user  weight  number_of_measures
1        92.16   4
2        80.34   5
3        71.89   11
4        81.11   7
5        77.23   8
6        92.37   2
7        88.18   3
I would like to see a histogram with an attribute of the table on the x-axis (weight, but I want to do it for both attributes) and the frequency on the y-axis.
Does anyone know how to do it with matplotlib?
Ok, it seems to be quite easy:
import pandas as pd
import matplotlib.pyplot as plt
hist = df.hist(bins=50)
plt.show()
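Note that df.hist(bins=50) draws one histogram for every numeric column, id_user included. Here is a small sketch of plotting just the two attributes of interest, assuming the dataframe from the question is rebuilt by hand; the bin count of 5 is an arbitrary choice for seven rows.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data copied from the question's table
df = pd.DataFrame({
    "id_user": [1, 2, 3, 4, 5, 6, 7],
    "weight": [92.16, 80.34, 71.89, 81.11, 77.23, 92.37, 88.18],
    "number_of_measures": [4, 5, 11, 7, 8, 2, 3],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, column in zip(axes, ["weight", "number_of_measures"]):
    # Series.plot.hist puts the attribute on the x-axis, frequency on the y-axis
    df[column].plot.hist(bins=5, ax=ax)
    ax.set_xlabel(column)
```

Call plt.show() afterwards to display both panels.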

Python- compress lower end of y-axis in contourf plot

The issue
I have a contourf plot I made with a pandas dataframe that plots some 2-dimensional value with time on the x-axis and vertical pressure level on the y-axis. The field, time, and pressure data I'm pulling are all from a netCDF file. I can plot it fine, but I'd like to scale the y-axis to better represent the real atmosphere. (The default scaling is linear, but the pressure levels in the file imply a different kind of scaling.) Basically, it should look something like the plot below on the y-axis. It's like a log scale, but compressing the bottom part of the axis instead of the top. (I don't know the term for this... like a log scale but inverted?) It doesn't need to be exact.
Working example (written in Jupyter notebook)
#modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker, colors
#data
time = np.arange(0,10)
lev = np.array([900,800,650,400,100])
df = pd.DataFrame(np.arange(50).reshape(5,10),index=lev,columns=time)
df.index.name = 'Level'
print(df)
0 1 2 3 4 5 6 7 8 9
Level
900 0 1 2 3 4 5 6 7 8 9
800 10 11 12 13 14 15 16 17 18 19
650 20 21 22 23 24 25 26 27 28 29
400 30 31 32 33 34 35 36 37 38 39
100 40 41 42 43 44 45 46 47 48 49
#lists for plotting
levtick = np.arange(len(lev))
clevels = np.arange(0,55,5)
#Main plot
fig, ax = plt.subplots(figsize=(10, 5))
im = ax.contourf(df,levels=clevels,cmap='RdBu_r')
#x-axis customization
plt.xticks(time)
ax.set_xticklabels(time)
ax.set_xlabel('Time')
#y-axis customization
plt.yticks(levtick)
ax.set_yticklabels(lev)
ax.set_ylabel('Pressure')
#title and colorbar
ax.set_title('Some mean time series')
cbar = plt.colorbar(im,values=clevels,pad=0.01)
tick_locator = ticker.MaxNLocator(nbins=11)
cbar.locator = tick_locator
cbar.update_ticks()
The Question
How can I scale the y-axis such that values near the bottom (900, 800) are compressed while values near the top (200) are expanded and given more plot space, like in the sample above my code? I tried using ax.set_yscale('function', functions=(forward, inverse)) but didn't understand how it works. I also tried simply ax.set_yscale('log'), but log isn't what I need.
You can use a custom scale transformation with ax.set_yscale('function', functions=(forward, inverse)) as you suggested. From the documentation:
forward and inverse are callables that return the scale transform
and its inverse.
In this case, define in forward() the function you want, such as the inverse of the log function, or a more custom one for your need. Call this function before your y-axis customization.
def forward(x):
    return 2**x
def inverse(x):
    return np.log2(x)
ax.set_yscale('function', functions=(forward,inverse))
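To see what this pair of functions does to the spacing, here is a small self-contained sketch. It assumes, as in the question's code, that contourf(df) places the five levels at row positions 0 to 4 rather than at the pressure values themselves, and it only inspects the transform on those positions. The base 2 is the answer's choice and can be tuned.

```python
import numpy as np
import matplotlib.pyplot as plt

def forward(x):
    # Data coordinate -> position along the axis; grows faster for larger x
    return 2.0 ** np.asarray(x, dtype=float)

def inverse(x):
    # Position along the axis -> data coordinate; undoes forward()
    return np.log2(x)

levtick = np.arange(5)          # row positions of the levels 900 ... 100
positions = forward(levtick)    # where each row lands on the scaled axis
gaps = np.diff(positions)       # space between neighbouring levels

fig, ax = plt.subplots()
ax.set_yscale('function', functions=(forward, inverse))
ax.set_ylim(levtick[0], levtick[-1])
ax.set_yticks(levtick)
```

Here gaps comes out as 1, 2, 4, 8, so the rows near the top of the plot (low pressure) get progressively more room than the rows near the bottom.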

Plotting three dimensions of categorical data in Python

My data has three categorical variables I'm trying to visualize:
City (one of five)
Occupation (one of four)
Blood type (one of four)
So far, I've succeeded in grouping the data in a way that I think will be easy to work with:
import numpy as np, pandas as pd
# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
'Occupation': np.random.choice(occupations,500),
'Blood Type':np.random.choice(bloodtypes,500)})
# You need to make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)
# This is now what I'd like to plot
df.groupby(by=['City','Occupation','Blood Type']).count().unstack(level=1)
Returns:
Dummy
Occupation Doctor Drone security officer Engineer Lawyer
City Blood Type
Anaheim A 7 7 7 7
AB 6 10 8 5
B 2 10 4 2
O 4 3 3 6
Atlantis A 6 5 5 7
AB 12 7 7 10
B 7 4 7 3
O 7 4 6 4
Las Vegas A 8 4 8 5
AB 5 6 8 9
B 6 10 6 6
O 6 9 5 9
Los Angeles A 7 4 8 8
AB 9 8 8 8
B 3 6 4 1
O 9 11 11 9
Tijuana A 3 4 5 3
AB 9 5 5 7
B 3 6 4 9
O 3 5 5 8
My goal is to create something like the Seaborn swarmplot shown below, which comes from the Seaborn documentation. Seaborn applies jitter to the quantitative data so that you can see the individual data points and their hues:
With my data, I'd like to plot City on the x-axis and Occupation on the y-axis, applying jitter to each, and then hue by Blood type. However, sns.swarmplot requires one of the axes to be quantitative:
sns.swarmplot(data=df,x='City',y='Occupation',hue='Blood Type')
returns an error.
An acceptable alternative might be to create 20 categorical bar plots, one for each intersection of City and Occupation, which I would do by running a for loop over each category, but I can't imagine how I'd feed that to matplotlib subplots to get them in a 4x5 grid.
The most similar question I could find was in R, and the asker only wanted to indicate the most common value for the third variable, so I didn't get any good ideas from there.
Thanks for any help you can provide.
Alright, I got to work on the "acceptable alternative" today and I have found a solution using basically pure matplotlib (but I stuck the Seaborn styling on top of it, just because).
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import seaborn as sns
# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
                   'Occupation': np.random.choice(occupations,500),
                   'Blood Type':np.random.choice(bloodtypes,500)})
# Make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)
# This is now what I'd like to plot
grouped = df.groupby(by=['City','Occupation','Blood Type']).count().unstack()
# List of blood types, to use later as categories in subplots
kinds = grouped.columns.levels[1]
# colors for bar graph (plt.get_cmap is used because matplotlib.cm.get_cmap
# has been removed from recent matplotlib releases)
colors = [plt.get_cmap('viridis')(v) for v in np.linspace(0,1,len(kinds))]
sns.set(context="talk")
nxplots = len(grouped.index.levels[0])
nyplots = len(grouped.index.levels[1])
fig, axes = plt.subplots(nxplots,
                         nyplots,
                         sharey=True,
                         sharex=True,
                         figsize=(10,12))
fig.suptitle('City, occupation, and blood type')
# plot the data
for a, b in enumerate(grouped.index.levels[0]):
    for i, j in enumerate(grouped.index.levels[1]):
        axes[a,i].bar(kinds,grouped.loc[b,j],color=colors)
        axes[a,i].xaxis.set_ticks([])
axeslabels = fig.add_subplot(111, frameon=False)
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
plt.grid(False)
axeslabels.set_ylabel('City',rotation='horizontal',y=1,weight="bold")
axeslabels.set_xlabel('Occupation',weight="bold")
# x- and y-axis labels
for i, j in enumerate(grouped.index.levels[1]):
    # label the bottom row of subplots with the occupation names
    axes[-1,i].set_xlabel(j)
for i, j in enumerate(grouped.index.levels[0]):
    axes[i,0].set_ylabel(j)
# Tune this manually to make room for the legend
fig.subplots_adjust(right=0.82)
fig.legend([Patch(facecolor = i) for i in colors],
           kinds,
           title="Blood type",
           loc="center right")
Returns this:
I'd appreciate any feedback, and I'd still love it if someone could provide the preferred solution.
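On the dummy column: a possible alternative, sketched here with the same made-up categories, is groupby(...).size(), which counts rows per group without needing a column to count. The fixed random seed is an assumption added only to make the sketch repeatable.

```python
import numpy as np
import pandas as pd

cities = ['Tijuana', 'Las Vegas', 'Los Angeles', 'Anaheim', 'Atlantis']
occupations = ['Doctor', 'Lawyer', 'Engineer', 'Drone security officer']
bloodtypes = ['A', 'B', 'AB', 'O']

# Seeded generator so the random sample is the same on every run
rng = np.random.default_rng(0)
df = pd.DataFrame({'City': rng.choice(cities, 500),
                   'Occupation': rng.choice(occupations, 500),
                   'Blood Type': rng.choice(bloodtypes, 500)})

# size() counts the rows in each (City, Occupation, Blood Type) group,
# so no dummy column is required; missing combinations become 0
counts = (df.groupby(['City', 'Occupation', 'Blood Type'])
            .size()
            .unstack('Blood Type', fill_value=0))
```

The result has one row per (City, Occupation) pair and one column per blood type, so each row holds the bar heights for one subplot of the grid.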

How do I show the proper count value in seaborn?

CH Gayle 17
YK Pathan 16
AB de Villiers 15
DA Warner 14
SK Raina 13
RG Sharma 13
MEK Hussey 12
AM Rahane 12
MS Dhoni 12
G Gambhir 12
I have a series like this. I want to plot the player on the x axis and their respective value on the y axis. I tried this code:
man_of_match=(matches['player_of_match'].value_counts())
sns.countplot(x=(man_of_match),data=matches,color='B')
sns.plt.show()
But with this code, it plots the frequency of the numeric value, i.e. 12 gets plotted on the x axis and its count on the y axis becomes 4. Similarly, for 13 on the x axis it shows 2 on the y axis.
How do I make the x axis show the name of the player and the y axis the corresponding value for that player?
sns.countplot is meant to do the counting for you. You are counting yourself with value_counts and then plotting the counts of counts. Pass the player_of_match column directly to sns.countplot:
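# pass the column as x= and use a lowercase color code; recent seaborn and
# matplotlib releases no longer accept the positional form or uppercase 'B'
ax = sns.countplot(x=matches['player_of_match'], color='b')
plt.sca(ax)
plt.xticks(rotation=90)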
If you want to limit it to the top 10 players, use value_counts as you did, but plot the result directly through pandas' matplotlib-based plotting.
ax = matches['player_of_match'].value_counts().head(10).plot.bar(width=.8, color='r')
ax.set_xlabel('player_of_match')
ax.set_ylabel('count')
You can get it to look a lot like the seaborn plot
kws = dict(width=.8, color=sns.color_palette('pastel'))
ax = matches['player_of_match'].value_counts().head(10).plot.bar(**kws)
ax.set_xlabel('player_of_match')
ax.set_ylabel('count')
ax.grid(False, axis='x')
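The "counts of counts" effect can be checked without any plotting. The sketch below uses a tiny hypothetical matches frame (the names are borrowed from the question, the award counts are invented) to show what each step produces.

```python
import pandas as pd

# Hypothetical data: one row per match, naming the player of the match
matches = pd.DataFrame({
    "player_of_match": ["CH Gayle"] * 3 + ["YK Pathan"] * 2
                       + ["MS Dhoni"] * 2 + ["SK Raina"],
})

# Awards per player: this is what should go on the y axis
per_player = matches["player_of_match"].value_counts()

# Counting those counts again is what countplot did in the question:
# it reports how many players share each award total
counts_of_counts = per_player.value_counts()
```

Here per_player maps each name to a total, while counts_of_counts is indexed by the totals themselves, which is why the question's plot showed numbers such as 12 on the x axis.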

Plotting rows corresponding to a given y-value separately

Suppose I have a table of data-
No. 200 400 600 800
1 13 14 17 18
2 16 18 20 21
3 20 15 18 19
and so on...
where each column represents a y-value for a given x-value. The first line is the x-value and the first column is the number of each dataset.
How can I read in and plot each row separately?
For an idea of how I would like my results to be for the table I have quoted above see the following images. I have plotted each plot individually.
http://postimg.org/image/yw46zw7er/92d01c08/
http://postimg.org/image/c1kf2nqwp/29a8b1c8/
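Matplotlib plots 2D arrays column by column, so here you just need to transpose your data. Assuming the data is in a text file called data.csv laid out like the table above:
import numpy as np
import matplotlib.pyplot as plt
# skip the header line and drop the leading No. column so only the y-values remain
data = np.loadtxt('data.csv', skiprows=1)[:, 1:]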
x = [200, 400, 600, 800]
plt.plot(x, data.T)
plt.legend((1,2,3))
plt.show()
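Since the question asks for each row to be plotted separately, here is a hedged variant that gives every dataset its own subplot. It is a sketch under the assumption that the file is whitespace-separated exactly like the table in the question; the table is held in a string here so the example runs without a file on disk.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# The question's table, held in memory as a stand-in for the real file
text = """No. 200 400 600 800
1 13 14 17 18
2 16 18 20 21
3 20 15 18 19
"""

# x-values come from the header line, minus the leading "No." label
x = np.array(text.splitlines()[0].split()[1:], dtype=float)

# Remaining lines: first column is the dataset number, the rest are y-values
raw = np.loadtxt(io.StringIO(text), skiprows=1)
numbers = raw[:, 0].astype(int)
y = raw[:, 1:]

# One subplot per dataset, all sharing the same x-axis
fig, axes = plt.subplots(len(y), 1, sharex=True, figsize=(5, 6))
for ax, number, row in zip(axes, numbers, y):
    ax.plot(x, row, marker="o")
    ax.set_title(f"Dataset {number}")
```

Call plt.show() to display the figure; for a real file, pass its path to np.loadtxt and read the header line from the file instead of the string.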
