How to draw 2 histograms in 1 table? - python

I was planning to combine these 2 histograms under 1 table. Also, they need to be side by side, i.e., data cannot overlap each other.
import matplotlib.pyplot as plt
df.hist(column='oq_len', bins = 25, color = 'blue')
df.hist(column='cq_len', bins = 25, color = 'red')
plt.show()

Using seaborn, the histograms can be combined in one plot. multiple='dodge' places the bars for the different columns next to each other. shrink=0.8 makes the bars a bit narrower (by default they occupy all available space).
The original plot gives the impression that the values are all integers, so the bin edges should take this into account. sns.histplot's discrete=True parameter takes care of that.
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'cq_len': np.random.geometric(p=0.25, size=500).clip(0, 15),
'oq_len': np.random.geometric(p=0.31, size=500).clip(0, 15)})
sns.set_style('whitegrid')
plt.figure(figsize=(12, 5))
ax = sns.histplot(data=df, discrete=True, multiple='dodge', shrink=0.8)
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.margins(x=0.01)
plt.show()
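For comparison, a similar side-by-side layout can be sketched with matplotlib alone: passing a list of arrays to ax.hist draws the bars of each dataset next to each other inside every bin. The seeded random data and the half-integer bin edges below are assumptions for illustration (they give each integer value its own bin), not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated stand-in for the question's DataFrame (assumption)
df = pd.DataFrame({'oq_len': rng.geometric(p=0.31, size=500).clip(0, 15),
                   'cq_len': rng.geometric(p=0.25, size=500).clip(0, 15)})

# Half-integer edges 0.5, 1.5, ..., 15.5: one bin per integer value 1..15
bin_edges = np.arange(0.5, 16.5, 1)

fig, ax = plt.subplots(figsize=(12, 5))
# A list of datasets makes hist() place the bars side by side within each bin
counts, edges, _ = ax.hist([df['oq_len'], df['cq_len']], bins=bin_edges,
                           color=['blue', 'red'], label=['oq_len', 'cq_len'])
ax.set_xticks(np.arange(1, 16))
ax.legend()
```

With an interactive backend, a final plt.show() displays the figure.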

Related

Plot data from two DataFrame with only one colorbar in a scatter plot

I have two DataFrames for two different datasets, each containing the columns RA, Dec, and Vel. I need to plot them on the same scatter plot and show one colorbar instead of two. There's a similar question using pure matplotlib here, but I need to do it using the scatter plot function from pandas. Here's my experiment so far:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data1 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-20,10,5)})
data2 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-10,20,5)})
fig, ax = plt.subplots(figsize=(12, 10))
data1.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='^',ax=ax,label='Methanol',vmin=-20, vmax=20)
data2.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='o',ax=ax,label='Water',vmin=-20, vmax=20)
ax.set_xlabel(r'$\Delta$RA (arcsec.)')
ax.set_ylabel(r'$\Delta$Dec. (arcsec.)')
ax.set_title('Maser Spot')
ax.invert_xaxis()
ax.legend(loc=2)
Using this code, I managed to plot the two DataFrames in one scatter plot, but it shows two colorbars, as you can see here:
Test Case.
Any help is appreciated.
You can just pass colorbar=False to the first plot call.
The final code will be:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data1 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-20,10,5)})
data2 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-10,20,5)})
fig, ax = plt.subplots(figsize=(12, 10))
data1.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='^',ax=ax,label='Methanol',vmin=-20, vmax=20,
colorbar=False)
data2.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='o',ax=ax,label='Water',vmin=-20, vmax=20)
ax.set_xlabel(r'$\Delta$RA (arcsec.)')
ax.set_ylabel(r'$\Delta$Dec. (arcsec.)')
ax.set_title('Maser Spot')
ax.invert_xaxis()
ax.legend(loc=2)
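To see why this works: each pandas scatter call with a numeric c column normally adds its own colorbar axes to the figure, so suppressing it on one call leaves a single bar. Because both calls share cmap, vmin and vmax, that one bar is valid for both datasets. A minimal check, where the seeded data and the make_spots helper are hypothetical and only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

def make_spots(vel_low, vel_high, n=5):
    """Hypothetical helper: random spots with RA, Dec and Vel columns."""
    return pd.DataFrame({'RA': rng.integers(-100, 100, n),
                         'Dec': rng.integers(-100, 100, n),
                         'Vel': rng.integers(vel_low, vel_high, n)})

data1 = make_spots(-20, 10)
data2 = make_spots(-10, 20)

# Settings shared by both calls, so one colorbar describes both datasets
shared = dict(x='RA', y='Dec', c='Vel', cmap='rainbow', vmin=-20, vmax=20)

fig, ax = plt.subplots()
data1.plot.scatter(marker='^', ax=ax, colorbar=False, **shared)
data2.plot.scatter(marker='o', ax=ax, **shared)

# The figure now holds the data axes plus a single colorbar axes
n_axes = len(fig.axes)
```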

Multiple graphs instead one using Matplotlib

The code below takes a dataframe, filters it by a string in one column, and then plots the values of another column.
I plotted the values as a histogram and that worked fine until I added the mean, median and standard deviation. Now I am just getting an empty graph, whereas all of the variables mentioned below should be plotted together in one graph with their labels.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv(r'C:/Users/output.csv', delimiter=";", encoding='unicode_escape')
df['Plot_column'] = df['Plot_column'].str.split(',').str[0]
df['Plot_column'] = df['Plot_column'].astype('int64', copy=False)
X=df[df['goal_colum']=='start running']['Plot_column'].values
dev_x= X
mean_=np.mean(dev_x)
median_=np.median(dev_x)
standard_=np.std(dev_x)
plt.hist(dev_x, bins=5)
plt.plot(mean_, label='Mean')
plt.plot(median_, label='Median')
plt.plot(standard_, label='Std Deviation')
plt.title('Data')
https://matplotlib.org/3.1.1/gallery/statistics/histogram_features.html
There are two major ways to plot in matplotlib: pyplot (the easy way) and ax (the hard way). Ax lets you customize your plot more, and you should work towards using it. Try something like the following:
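num_bins = 50
fig, ax = plt.subplots()
# the histogram of the data
n, bins, patches = ax.hist(dev_x, num_bins, density=True)
# ax.plot() with a single number draws no visible line,
# so mark each statistic with a labelled vertical line instead
ax.axvline(np.mean(dev_x), color='k', linestyle='--', label='Mean')
ax.axvline(np.median(dev_x), color='r', linestyle=':', label='Median')
ax.axvline(np.std(dev_x), color='g', linestyle='-.', label='Std Deviation')
ax.legend()
# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()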

how to expand or "zoom" pandas plot() figure?

I am plotting with pandas plot() functions as follows:
In:
from matplotlib.pyplot import *
from datetime import date
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig, ax = subplots()
df['session_duration_seconds'].sort_index().value_counts().plot(figsize=(25,10), fontsize=24)
ax.legend(['session_duration_seconds'],fontsize=22)
ax.set_xlabel("Title", fontsize=22)
ax.set_ylabel("Title", fontsize=22)
ax.grid()
However, my plot looks very compressed. I would like to expand the plot to show the following section of the figure in more detail:
Out:
Thus, my question is: how can I expand, or zoom in on, that portion of the image with pandas plot()?
Just an example to show how this could work:
df = pd.DataFrame({'Values': [1000, 1, 2, 3 , 4 , 2, 5]})
df.plot()
Now let's restrict the y-range
import matplotlib.pyplot as plt
df.plot()
plt.ylim(0, 10)
and we see the details of the curve.
Note that the curve is so steep near 0 due to the huge slope induced by the first y-value of 1000.
You can also scale the y-axis directly from within the pandas plot function:
df.plot(logy=True)
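pandas plot() returns the matplotlib Axes it drew on, so the same zoom can also be applied through that object, on either axis. A small sketch reusing the toy data from this answer (the limits are arbitrary choices for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import pandas as pd

df = pd.DataFrame({'Values': [1000, 1, 2, 3, 4, 2, 5]})

# plot() hands back the Axes; restrict both ranges on it
ax = df.plot()
ax.set_ylim(0, 10)
ax.set_xlim(1, 6)

# The limits can also be passed directly to plot()
ax2 = df.plot(ylim=(0, 10))
```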

Add Second Colorbar to a Seaborn Heatmap / Clustermap

I was trying to help someone add a colorbar for the vertical blue bar in the image below. We tried many variations of plt.colorbar(row_colors) (like above and below sns.clustermap()) and looked around online for 2 hours, but no luck. We just want to add a colorbar for the blues, please help!
import pickle
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
feat_mat, freq, label = pickle.load(open('file.pkl', 'rb'))
feat_mat_df = pd.DataFrame(feat_mat[4])
freq_df = pd.DataFrame(freq)
freq_df_transposed = freq_df.transpose()
labels = freq_df_transposed[4]
my_palette = dict(zip(set(labels), sns.color_palette("PuBu", len(set(labels)))))
row_colors = labels.map(my_palette)
sns.clustermap(feat_mat_df, metric="euclidean", standard_scale=1, method="complete", cmap="coolwarm", row_colors = row_colors)
plt.show()
This is what he based his code on: #405 Dendrogram with heatmap and coloured leaves
I think something like this should work for your purposes. I didn't have a clustermap example available, but the logic is the same. Basically, you take the list of colors you made and imshow it, then hide the imshow plot and draw the colorbar in its place.
In my example, I use make_axes_locatable to place extra axes next to the plot with your data and put the colorbar inside them: https://matplotlib.org/2.0.2/mpl_toolkits/axes_grid/users/overview.html. I find placing new axes for other objects (legends, colormaps or otherwise) easier than trying to draw them on the same axes.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import seaborn as sns
from mpl_toolkits.axes_grid1 import make_axes_locatable
import random
uniform_data = np.random.rand(10, 12)
fig, ax = plt.subplots(1,1, figsize = (5,5))
divider = make_axes_locatable(ax)
axDivY = divider.append_axes( 'right', size=0.2, pad= 0.1)
axDivY2 = divider.append_axes( 'right', size=0.2, pad= 0.2)
# we will use this for the colorscale bar
axDivY3 = divider.append_axes( 'right', size=0.2, pad= 0.2)
ax1 = sns.heatmap(uniform_data, ax=ax, cbar_ax=axDivY)
# the palette you were using to make the label column on the clustermap
# some simulated labels for your data with values
color_label_list =[random.randint(0,20) for i in range(20)]
pal = sns.color_palette("PuBu", len(set(color_label_list)))
n = len(pal)
size = 1
# plot the colors with imshow to make a colormap later
ax2 = axDivY2.imshow(np.array([color_label_list]),
cmap=mpl.colors.ListedColormap(list(pal)),
interpolation="nearest", aspect="auto")
# turn off the axes so they aren't visible; note that you need ax.axis('off') if you have an older matplotlib
axDivY2.set_axis_off()
axDivY2.set_visible(False)
# plot the colorbar on the other axes (which is on top of the one that we turned off)
plt.colorbar(ax2, cax = axDivY3) ;
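As a possible variation on the same idea, a colorbar does not strictly need a drawn image: matplotlib can build one from a ScalarMappable that only carries a colormap and a norm. The sketch below is an alternative to the imshow trick, not the method used above. A plain heatmap stands in for the clustermap, and the per-row labels are simulated, so all data and names here are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.cm import ScalarMappable
from matplotlib.colors import BoundaryNorm, ListedColormap

rng = np.random.default_rng(0)
data = rng.random((10, 12))
row_values = rng.integers(0, 5, size=10)  # simulated per-row labels (assumption)
levels = np.unique(row_values)

# One discrete PuBu color per distinct label value
cmap = ListedColormap(list(sns.color_palette("PuBu", len(levels))))
# Boundaries at -0.5, 0.5, ... so band i is centered on index i
norm = BoundaryNorm(np.arange(len(levels) + 1) - 0.5, cmap.N)

fig, (ax, cax1, cax2) = plt.subplots(
    1, 3, figsize=(7, 5), gridspec_kw={'width_ratios': [20, 1, 1]})
sns.heatmap(data, ax=ax, cbar_ax=cax1)

# Second colorbar built from the colormap and norm alone, no image required
cbar = fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), cax=cax2,
                    ticks=np.arange(len(levels)))
cbar.set_ticklabels([str(v) for v in levels])
```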

set constant width to every bar in a bar plot

I am trying to plot a bar plot where each bin has a different length, and as a result I end up with a very ugly result. What I would like to do is still be able to define bins of different lengths but have all the bars plotted with the same fixed width. How can I do that? Here is what I have done so far:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_palette("deep", desat=.6)
sns.set_context(rc={"figure.figsize": (8, 4)})
np.random.seed(9221999)
data = [0,2,30,40,50,10,50,40,150,70,150,10,3,70,70,90,10,2]
bins = [0,1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100,200]
plt.hist(data, bins=bins);
EDIT
This question has been marked as duplicate, but in fact none of the proposed links solved my problem; the 1st is a very crappy workaround and the 2nd doesn't solve the problem at all, as it sets all bars' width to a certain number.
Here you go, with seaborn, as you please. But you have to understand that seaborn itself uses matplotlib to create plots.
AND: Please delete your other question, now it really is a duplicate.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_palette("deep", desat=.6)
sns.set_context(rc={"figure.figsize": (8, 4)})
data = [0,2,30,40,50,10,50,40,150,70,150,10,3,70,70,90,10,2]
bins = [0,1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100,200]
bin_middles = bins[:-1] + np.diff(bins)/2.
bar_width = 1.
m, bins = np.histogram(data, bins)
# align='edge' so that each bar sits between two consecutive bin-edge ticks
plt.bar(np.arange(len(m)) + (1-bar_width)/2., m, width=bar_width, align='edge')
ax = plt.gca()
ax.set_xticks(np.arange(len(bins)))
ax.set_xticklabels(['{:.0f}'.format(i) for i in bins])
plt.show()
Personally I think, that plotting your data like this is confusing. Having non-linear (or non-log) axis scaling is usually not a good idea.
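One way to make such an equal-width plot less ambiguous is to label every bar with the bin range it covers, so the reader does not mistake the axis for a linear scale. This is a sketch of that idea using the question's data; the label format, bar width, and figure size are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

data = [0,2,30,40,50,10,50,40,150,70,150,10,3,70,70,90,10,2]
bins = [0,1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100,200]

counts, _ = np.histogram(data, bins)
positions = np.arange(len(counts))  # equally spaced slots, one per bin
# Text label such as '10-20' for each bin
labels = ['{}-{}'.format(lo, hi) for lo, hi in zip(bins[:-1], bins[1:])]

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(positions, counts, width=0.9)
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=90)
fig.tight_layout()
```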
Do you want to place a bar with a fixed width at the center of each bin?
If so, try something similar to this:
import numpy as np
import matplotlib.pyplot as plt
data = [0,2,30,40,50,10,50,40,150,70,150,10,3,70,70,90,10,2]
bins = [0,1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100,200]
counts, _ = np.histogram(data, bins)
centers = np.mean([bins[:-1], bins[1:]], axis=0)
plt.bar(centers, counts, width=5, align='center')
plt.show()
