How to expand or "zoom" a pandas plot() figure? - python

I am plotting with pandas plot() functions as follows:
In:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(25, 10))
df['session_duration_seconds'].sort_index().value_counts().plot(ax=ax, fontsize=24)
ax.legend(['session_duration_seconds'],fontsize=22)
ax.set_xlabel("Title", fontsize=22)
ax.set_ylabel("Title", fontsize=22)
ax.grid()
However, my plot looks very zoomed out. I would like to expand it in order to show the following section of the figure in more detail:
Out:
Thus, my question is: how can I expand, or zoom in on, that portion of the image with pandas plot?

Here is a small example to show how this could work:
import pandas as pd
df = pd.DataFrame({'Values': [1000, 1, 2, 3, 4, 2, 5]})
df.plot()
Now let's restrict the y-range:
import matplotlib.pyplot as plt
df.plot()
plt.ylim(0, 10)
and we can see the details of the curve.
Note that the curve is so steep near x=0 because of the huge slope induced by the first y-value of 1000.
You can also scale the y-axis directly from within the pandas plot function:
df.plot(logy=True)
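As a further sketch of the same idea, the limits can also be passed to the pandas plot call itself through its xlim and ylim parameters, or set afterwards on the Axes object that the call returns. The x-range of 1 to 6 below is only an illustrative choice, and the Agg backend is selected so that the snippet runs without a display; in an interactive session you would drop that line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Values': [1000, 1, 2, 3, 4, 2, 5]})

# pandas applies ylim to the Axes it creates and returns that Axes
ax = df.plot(ylim=(0, 10))

# the returned Axes can be adjusted further, e.g. to zoom in along x as well
ax.set_xlim(1, 6)
```

Keeping a reference to the returned Axes avoids relying on pyplot's notion of the "current" axes, which matters once several figures are open.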

Related

How to draw 2 histograms in 1 table?

I was planning to combine these 2 histograms under 1 table. Also, they need to be side by side, i.e., data cannot overlap each other.
import matplotlib.pyplot as plt
df.hist(column='oq_len', bins = 25, color = 'blue')
df.hist(column='cq_len', bins = 25, color = 'red')
plt.show()
Using seaborn, histograms can be combined. multiple='dodge' will place the bars for different columns next to each other. shrink=0.8 will make the bars a bit narrower (by default they occupy all available space).
The original plot gives the impression that the values are all integers, so the bin edges should take this into account. sns.histplot's discrete=True parameter takes care of that.
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'cq_len': np.random.geometric(p=0.25, size=500).clip(0, 15),
'oq_len': np.random.geometric(p=0.31, size=500).clip(0, 15)})
sns.set_style('whitegrid')
plt.figure(figsize=(12, 5))
ax = sns.histplot(data=df, discrete=True, multiple='dodge', shrink=0.8)
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.margins(x=0.01)
plt.show()
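If seaborn is not available, a similar side-by-side layout can probably be had from matplotlib alone: when hist() receives a list of arrays, it draws the bars of each dataset next to each other within every bin. The sketch below is not part of the answer above; the random data and the seed are stand-ins for the real columns, and the bin edges are centred on integers by hand.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # hypothetical data standing in for the real columns
df = pd.DataFrame({'cq_len': rng.geometric(p=0.25, size=500).clip(0, 15),
                   'oq_len': rng.geometric(p=0.31, size=500).clip(0, 15)})

# integer-centred bin edges: 0.5, 1.5, ..., 15.5
bins = np.arange(0.5, 16.5, 1)

fig, ax = plt.subplots(figsize=(12, 5))
# a list of arrays makes hist() place the bars of each dataset side by side
counts, edges, _ = ax.hist([df['oq_len'], df['cq_len']], bins=bins,
                           color=['blue', 'red'], label=['oq_len', 'cq_len'])
ax.set_xticks(range(1, 16))
ax.legend()
```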

Change y axis range of a secondary axis in python Matplotlib

I have two plots overlaid on each other generated by the following code:
import matplotlib.pyplot as plt
import pandas as pd
width=.5
t=pd.DataFrame({'bars':[3.4,3.1,5.1,5.1,3.8,4.2,5.2,4.0,3.6],'lines':[2.4,2.2,2.4,2.1,2.0,2.1,1.9,1.8,1.9]})
t['bars'].plot(kind='bar',width=width)
t['lines'].plot(secondary_y=True, color='red')
ax=plt.gca()
plt.xlim([-width,len(t['bars'])-width])
ax.set_xticklabels(('1','2','3','4','5','6','7','8','9'))
plt.show()
I want to be able to scale the range of the second y axis to go from 0.0 to 2.5 (instead of 1.8 to 2.4) in steps of .5. How can I define this without changing the bar chart at all?
pandas returns the Axes object on which it plots when you call the plot function. Just save that object and modify the limits using the object-oriented approach.
import matplotlib.pyplot as plt
import pandas as pd
width=.5
t=pd.DataFrame({'bars':[3.4,3.1,5.1,5.1,3.8,4.2,5.2,4.0,3.6],'lines':[2.4,2.2,2.4,2.1,2.0,2.1,1.9,1.8,1.9]})
ax1 = t['bars'].plot(kind='bar',width=width)
ax2 = t['lines'].plot(secondary_y=True, color='red')
ax2.set_ylim(0, 2.5)
ax1.set_xlim([-width,len(t['bars'])-width])
ax1.set_xticklabels(('1','2','3','4','5','6','7','8','9'))
plt.show()
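The question also asks for steps of .5 on the secondary axis. One way to get them, sketched here as an addition to the answer rather than part of it, is to set the tick positions explicitly on the secondary Axes. The saved bar_ylim is only there to check that the bar chart's own y-range is left alone.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

width = .5
t = pd.DataFrame({'bars': [3.4, 3.1, 5.1, 5.1, 3.8, 4.2, 5.2, 4.0, 3.6],
                  'lines': [2.4, 2.2, 2.4, 2.1, 2.0, 2.1, 1.9, 1.8, 1.9]})

ax1 = t['bars'].plot(kind='bar', width=width)
bar_ylim = ax1.get_ylim()  # remember the bar chart's y-range

ax2 = t['lines'].plot(secondary_y=True, color='red')
ax2.set_ylim(0, 2.5)
# ticks at 0.0, 0.5, ..., 2.5 on the secondary axis only
ax2.set_yticks(np.linspace(0, 2.5, 6))

ax1.set_xlim([-width, len(t['bars']) - width])
```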

Plot data from two DataFrame with only one colorbar in a scatter plot

I have two DataFrames for two different datasets, each containing the columns RA, Dec, and Vel. I need to plot them in the same scatter plot and show one colorbar instead of two. There's a similar question using pure matplotlib here, but I need to do it using the scatter plot function from pandas. Here's my experiment so far:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data1 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-20,10,5)})
data2 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-10,20,5)})
fig, ax = plt.subplots(figsize=(12, 10))
data1.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='^',ax=ax,label='Methanol',vmin=-20, vmax=20)
data2.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='o',ax=ax,label='Water',vmin=-20, vmax=20)
ax.set_xlabel(r'$\Delta$RA (arcsec.)')
ax.set_ylabel(r'$\Delta$Dec. (arcsec.)')
ax.set_title('Maser Spot')
ax.invert_xaxis()
ax.legend(loc=2)
Using this code, I managed to plot the two DataFrames in one scatter plot. But it shows two colorbars, as you can see here:
Test Case.
Any help is appreciated.
You can just add colorbar=False to the first plot.
The final code will be:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data1 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-20,10,5)})
data2 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-10,20,5)})
fig, ax = plt.subplots(figsize=(12, 10))
data1.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='^',ax=ax,label='Methanol',vmin=-20, vmax=20,
colorbar=False)
data2.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='o',ax=ax,label='Water',vmin=-20, vmax=20)
ax.set_xlabel(r'$\Delta$RA (arcsec.)')
ax.set_ylabel(r'$\Delta$Dec. (arcsec.)')
ax.set_title('Maser Spot')
ax.invert_xaxis()
ax.legend(loc=2)
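A variation on this, offered only as a sketch and not as part of the answer: switch the automatic colorbar off for both plots and add a single one by hand from a shared normalization, so the colorbar does not depend on which DataFrame happens to be plotted last. The norm object, the 'Vel' colorbar label and the seeded random data are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)  # hypothetical data, as in the question
data1 = pd.DataFrame({'RA': rng.integers(-100, 100, 5),
                      'Dec': rng.integers(-100, 100, 5),
                      'Vel': rng.integers(-20, 10, 5)})
data2 = pd.DataFrame({'RA': rng.integers(-100, 100, 5),
                      'Dec': rng.integers(-100, 100, 5),
                      'Vel': rng.integers(-10, 20, 5)})

norm = Normalize(vmin=-20, vmax=20)  # one colour scale shared by both datasets
fig, ax = plt.subplots(figsize=(12, 10))
data1.plot.scatter(x='RA', y='Dec', c='Vel', cmap='rainbow', marker='^', ax=ax,
                   label='Methanol', vmin=norm.vmin, vmax=norm.vmax, colorbar=False)
data2.plot.scatter(x='RA', y='Dec', c='Vel', cmap='rainbow', marker='o', ax=ax,
                   label='Water', vmin=norm.vmin, vmax=norm.vmax, colorbar=False)

# a single colorbar built from the shared normalization and colormap
cbar = fig.colorbar(ScalarMappable(norm=norm, cmap='rainbow'), ax=ax, label='Vel')
ax.invert_xaxis()
ax.legend(loc=2)
```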

How to add error bars in matplotlib for multiple groups from dataframe?

I've run multiple regressions and stored the coefficients and standard errors into a data frame like this:
I wanted to make a graph that shows how the coefficient changes for each group over time, like so:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14,8))
sns.set(style= "whitegrid")
sns.lineplot(x="time", y="coef",
hue="group",
data=eventstudy)
plt.axhline(y=0 , color='r', linestyle='--')
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show
plt.savefig('eventstudygraph.png')
Which produces:
But I would like to include error bars using the 'stderr' data from my main data set.
I think I can do it using 'plt.errorbar', but I can't seem to figure out how to make it work. At the moment, I've tried adding the 'plt.errorbar' line and experimenting with different iterations:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14,8))
sns.set(style= "whitegrid")
sns.lineplot(x="time", y="coef",
hue="group",
data=eventstudy)
plt.axhline(y=0 , color='r', linestyle='--')
plt.errorbar("time", "coef", xerr="stderr", data=eventstudy)
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show
plt.savefig('eventstudygraph.png')
As you can see, it seems to be creating its own group/line in the graph. I think I would know how to use 'plt.errorbar' if I had just one group, but I don't have a clue how to make it work for 3 groups. Is there some way of making 3 versions of 'plt.errorbar' so I can create the error bars for each group separately? Or is there something simpler?
You need to iterate through the different groups and plot the error bars for each one separately. What you have above plots all the error bars in one go. Note also that vertical error bars on the coefficient need yerr, not xerr:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(111)
df = pd.DataFrame({"time":[1,2,3,4,5]*3,"coef":np.random.uniform(-0.5,0.5,15),
"stderr":np.random.uniform(0.05,0.1,15),
"group":np.repeat(['Monthly','3 Monthly','6 Monthly'],5)})
fig,ax = plt.subplots(figsize=(14,8))
sns.set(style= "whitegrid")
lvls = df.group.unique()
for i in lvls:
    ax.errorbar(x = df[df['group']==i]["time"],
                y=df[df['group']==i]["coef"],
                yerr=df[df['group']==i]["stderr"],label=i)
ax.axhline(y=0 , color='r', linestyle='--')
ax.legend()
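The same loop can also be written with groupby, which yields each group's rows directly instead of repeating the boolean mask three times. This is a sketch under the same made-up data as above; the marker and capsize settings are cosmetic assumptions, not something the question asked for.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(111)  # hypothetical coefficients and standard errors
df = pd.DataFrame({"time": [1, 2, 3, 4, 5] * 3,
                   "coef": rng.uniform(-0.5, 0.5, 15),
                   "stderr": rng.uniform(0.05, 0.1, 15),
                   "group": np.repeat(['Monthly', '3 Monthly', '6 Monthly'], 5)})

fig, ax = plt.subplots(figsize=(14, 8))
# one errorbar call per group; sort=False keeps the groups in order of appearance
for name, grp in df.groupby("group", sort=False):
    ax.errorbar(grp["time"], grp["coef"], yerr=grp["stderr"],
                marker="o", capsize=3, label=name)
ax.axhline(y=0, color='r', linestyle='--')
ax.legend(bbox_to_anchor=(1, 1), loc=2)
```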

Multiple graphs instead one using Matplotlib

The code below takes a DataFrame, filters it by a string in one column, and then plots the values of another column.
I plotted the values as a histogram, and that worked fine until I added the mean, median and standard deviation. Now I just get an empty graph, whereas all of the variables mentioned below should be plotted together in one graph with their labels.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv(r'C:/Users/output.csv', delimiter=";", encoding='unicode_escape')
df['Plot_column'] = df['Plot_column'].str.split(',').str[0]
df['Plot_column'] = df['Plot_column'].astype('int64', copy=False)
X=df[df['goal_colum']=='start running']['Plot_column'].values
dev_x= X
mean_=np.mean(dev_x)
median_=np.median(dev_x)
standard_=np.std(dev_x)
plt.hist(dev_x, bins=5)
plt.plot(mean_, label='Mean')
plt.plot(median_, label='Median')
plt.plot(standard_, label='Std Deviation')
plt.title('Data')
https://matplotlib.org/3.1.1/gallery/statistics/histogram_features.html
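There are two main ways to plot in matplotlib: the pyplot interface (the easy way) and the object-oriented Axes interface (the hard way). The Axes interface lets you customize your plot more, and it is worth moving towards it. Note that calling plot() with a single number draws one point at x=0 rather than a line, so the statistics are drawn below as vertical lines instead. Try something like the following:
num_bins = 50
fig, ax = plt.subplots()
# the histogram of the data, normalized to a density
n, bins, patches = ax.hist(dev_x, num_bins, density=True)
# mark the mean and median as vertical lines at their x positions
ax.axvline(np.mean(dev_x), color='r', linestyle='--', label='Mean')
ax.axvline(np.median(dev_x), color='g', linestyle='-.', label='Median')
# the standard deviation is a spread, not a position; one option is to shade mean ± std
ax.axvspan(np.mean(dev_x) - np.std(dev_x), np.mean(dev_x) + np.std(dev_x), alpha=0.2, label='Mean ± Std Deviation')
ax.set_title('Data')
ax.legend()
# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()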
