Plotting multiple Pandas autocorrelation plots in different plots - python

My question is somewhat related to this one. I have a Pandas DataFrame and I want to plot the autocorrelation function of value separately for each item in category. Below is what I've tried; it plots all of the autocorrelation functions on the same plot. How can I plot them separately and also control the plot size?
# Import libraries
import pandas as pd
from pandas.plotting import autocorrelation_plot
# Create DataFrame
df = pd.DataFrame({
'category': ['sav','sav','sav','sav','sav','check','check','check','check','check','cd','cd','cd','cd','cd'],
'value': [1.2,1.3,1.5,1.7,1.8, 10,13,17,20,25, 7,8,8.5,9,9.3]
})
# Loop through for each item in category and plot autocorrelation function
for cat in df['category'].unique():
    s = df[df['category']==cat]['value']
    s = s.diff().iloc[1:] # first-order difference to de-trend
    ax = autocorrelation_plot(s)

One easy way is to force rendering after each iteration with plt.show():
import matplotlib.pyplot as plt
# Loop through each item in category and plot its autocorrelation function
for cat in df['category'].unique():
    # create a new figure; adjust figsize to control the plot size
    plt.figure(figsize=(10,6))
    s = df[df['category']==cat]['value']
    s = s.diff().iloc[1:] # first-order difference to de-trend
    ax = autocorrelation_plot(s)
    plt.show() # render this figure before starting the next one
Also, the syntax can be simplified with groupby:
for cat, data in df.groupby('category')['value']:
    plt.figure(figsize=(10,6))
    autocorrelation_plot(data.diff().iloc[1:])
    plt.title(cat)
    plt.show()
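If the plots should stay together in one figure instead, a possible variant passes an explicit Axes to autocorrelation_plot through its ax parameter. This is a minimal sketch, assuming one subplot row per category; the figure size is an arbitrary choice, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import autocorrelation_plot

df = pd.DataFrame({
    'category': ['sav'] * 5 + ['check'] * 5 + ['cd'] * 5,
    'value': [1.2, 1.3, 1.5, 1.7, 1.8, 10, 13, 17, 20, 25, 7, 8, 8.5, 9, 9.3],
})

groups = df.groupby('category')['value']

# One row per category; figsize controls the overall size (values are illustrative)
fig, axes = plt.subplots(len(groups), 1, figsize=(8, 3 * len(groups)), squeeze=False)

for (cat, values), ax in zip(groups, axes.ravel()):
    # First-order difference to de-trend, then draw on this category's own Axes
    autocorrelation_plot(values.diff().iloc[1:], ax=ax)
    ax.set_title(cat)

fig.tight_layout()
```

In an interactive session, a single plt.show() after the loop displays the whole figure.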

Related

No Output: Bar Graph Using Matplotlib

I have a DataFrame of Airbnb data where each row represents one Airbnb listing. I am trying to plot two columns as a bar plot using Matplotlib.
fig,ax= plt.subplots()
ax.bar(airbnb['neighbourhood_group'],airbnb['revenue'])
plt.show()
I expect this graph to show every neighbourhood group on the x axis and the average revenue per neighbourhood group on the y axis (I assume a bar graph takes the mean value per category by default).
This line of code keeps running without raising any error, as if it had entered an infinite while loop.
Can someone please suggest what could be wrong?
In the following I use a sample DataFrame, since none was provided.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample DataFrame
y = np.random.rand(10,2)
y[:,0]= np.arange(10)
df = pd.DataFrame(y, columns=["neighbourhood_group", "revenue"])
Note that np.random produces different values for the revenue column every time you run the program.
# bar plot
ax = df.plot(x="neighbourhood_group", y="revenue", kind="bar")
Regarding your statement that your code seems to run in an endless loop: it could be that the DataFrame contains so much data that drawing the bar chart takes a very long time. To say that for sure, however, you would have to provide a dataset.
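One thing worth checking: ax.bar does not aggregate, it draws one rectangle per row it is given, so a large listing table can take a very long time to render. A minimal sketch of averaging first and plotting only the per-group means follows. The small airbnb DataFrame and its values are a hypothetical stand-in for the real data, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real listing data
airbnb = pd.DataFrame({
    'neighbourhood_group': ['Brooklyn', 'Manhattan', 'Brooklyn',
                            'Queens', 'Manhattan', 'Queens'],
    'revenue': [100.0, 250.0, 140.0, 80.0, 310.0, 60.0],
})

# Aggregate explicitly: one mean revenue value per neighbourhood group
mean_revenue = airbnb.groupby('neighbourhood_group')['revenue'].mean()

fig, ax = plt.subplots()
ax.bar(mean_revenue.index, mean_revenue.values)  # one bar per group
ax.set_xlabel('neighbourhood_group')
ax.set_ylabel('mean revenue')
```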

Creating whisker plots from grouped pandas Series

I have a dataset of values arriving in 5min timestamped intervals that I'm visualising grouped by hours of day, like this
I want to turn this into a whisker/box plot for the added information. However, the implementations of matplotlib, seaborn and pandas of this plot all want an array of raw data to compute the plot's contents themselves.
Is there a way to create whisker plots from pre-computed/grouped mean, median, std and quartiles? I would like to avoid reinventing the wheel with a comparatively inefficient grouping algorithm to build per-day datasets just for this.
This is some code to produce toy data and a version of the current plot.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# some toy data in a 15-day range
data = [1.5+np.sin(x)*5 for x in np.arange(0, 403.3, .1)]
s = pd.Series(data=data, index=pd.date_range('2019-01-01', '2019-01-15', freq='5min'))
s.groupby(s.index.hour).mean().plot(kind='bar')
plt.show()
Adding to @Quang Hoang's solution: you can use hlines() to display the median as well. Here ax and data are the objects from that solution, and wd is the bar width:
wd = 0.8 # bar width
ax.bar(data.index, data['q75'] - data['q25'], bottom=data['q25'], width=wd)
ax.hlines(y=data['median'], xmin=data.index-wd/2, xmax=data.index+wd/2, color='black', linewidth=1)
I don't think there is anything for that. But you can create a whisker plot fairly simply with two plot commands:
# precomputed data:
data = (s.groupby(s.index.hour)
         .agg(['mean','std','median',
               lambda x: x.quantile(.25),
               lambda x: x.quantile(.75)])
       )
data.columns = ['mean','std','median','q25','q75']
# plot the whiskers with `errorbar` from `mean` and `std`
fig, ax = plt.subplots(figsize=(12,6))
ax.errorbar(data.index, data['mean'],
            yerr=data['std']*1.96,
            linestyle='none',
            capsize=5
           )
# plot the boxes with `bar` at bottoms from quantiles
ax.bar(data.index, data['q75']-data['q25'], bottom=data['q25'])
Output:
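As an alternative, matplotlib also provides Axes.bxp, which draws box-and-whisker artists from precomputed statistics instead of raw data. The following is a hedged sketch, not the answer's method: it assumes the whiskers should end at the per-hour minimum and maximum (the question does not say where they should end), it rebuilds the toy series from the index length, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data in a 15-day range, one value every 5 minutes
idx = pd.date_range('2019-01-01', '2019-01-15', freq='5min')
s = pd.Series(1.5 + 5 * np.sin(np.arange(len(idx)) * 0.1), index=idx)

# Precompute the statistics per hour of day
grouped = s.groupby(s.index.hour)
stats = grouped.agg(['min', 'max', 'median'])
stats['q25'] = grouped.quantile(0.25)
stats['q75'] = grouped.quantile(0.75)

# bxp expects one dict per box with the keys med, q1, q3, whislo, whishi
bxpstats = [
    {'label': str(hour), 'med': row['median'],
     'q1': row['q25'], 'q3': row['q75'],
     'whislo': row['min'], 'whishi': row['max']}  # assumed whisker ends
    for hour, row in stats.iterrows()
]

fig, ax = plt.subplots(figsize=(12, 6))
# showfliers=False because no 'fliers' entries were precomputed
artists = ax.bxp(bxpstats, showfliers=False)
ax.set_xlabel('hour of day')
```

If a mean marker is wanted as well, each dict can carry a 'mean' entry and bxp can be called with showmeans=True.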

Grid of plots with lines overplotted in matplotlib

I have a dataframe that consists of a bunch of x,y data that I'd like to see in scatter form along with a line. The dataframe consists of data with its form repeated over multiple categories. The end result I'd like to see is some kind of grid of the plots, but I'm not totally sure how matplotlib handles multiple subplots of overplotted data.
Here's an example of the kind of data I'm working with:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
category = np.arange(1,10)
frames = []
for i in category:
    x = np.arange(0,100)
    y = 2*x + 10
    data = np.random.normal(0,1,100) * y
    frames.append(pd.DataFrame({'x':x, 'y':y, 'data':data, 'category':i}))
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once
total_data = pd.concat(frames)
We have x data, we have y data which is a linear model of some kind of generated dataset (the data variable).
I had been able to generate individual plots based on subsetting the master dataset, but I'd like to see them all side-by-side in a 3x3 grid in this case. However, calling the plots within the loop just overplots them all onto one single image.
Is there a good way to take the following code block and make a grid out of the category subsets? Am I overcomplicating it by doing the subset within the plot call?
plt.scatter(total_data['x'][total_data['category']==1], total_data['data'][total_data['category']==1])
plt.plot(total_data['x'][total_data['category']==1], total_data['y'][total_data['category']==1], linewidth=4, color='black')
If there's a simpler way to generate the by-category scatter plus line, I'm all for it. I don't know if seaborn has a similar or more intuitive method to use than pyplot.
You can use either sns.FacetGrid or manual plt.plot. For example:
import seaborn as sns
g = sns.FacetGrid(data=total_data, col='category', col_wrap=3)
g = g.map(plt.scatter, 'x', 'data')
g = g.map(plt.plot, 'x', 'y', color='k')
Gives:
Or manual plt with groupby:
fig, axes = plt.subplots(3,3)
for (cat, data), ax in zip(total_data.groupby('category'), axes.ravel()):
    ax.scatter(data['x'], data['data'])
    ax.plot(data['x'], data['y'], color='k')
gives:
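For reference, the manual groupby approach can be written as a self-contained sketch. The shared axes, per-panel titles, figure size, and fixed random seed are assumptions added for readability, not part of the original answer, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = np.arange(0, 100)
y = 2 * x + 10

# Same shape of data as in the question: nine categories of noisy points
total_data = pd.concat(
    [pd.DataFrame({'x': x, 'y': y,
                   'data': rng.normal(0, 1, 100) * y,
                   'category': i})
     for i in range(1, 10)],
    ignore_index=True,
)

# sharex/sharey keep the nine panels on a common scale
fig, axes = plt.subplots(3, 3, figsize=(9, 9), sharex=True, sharey=True)

# Pair each group with its own Axes instead of drawing on the current one
for (cat, group), ax in zip(total_data.groupby('category'), axes.ravel()):
    ax.scatter(group['x'], group['data'], s=10)
    ax.plot(group['x'], group['y'], color='k', linewidth=2)
    ax.set_title(f'category {cat}')

fig.tight_layout()
```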

Scatterplot groupby columns in python

I am trying to make a scatterplot over two different types of categorical variables, each with three different levels. Right now I am using the seaborn library in python:
sns.pairplot(x_vars = ['UTM_x'], y_vars = ['UTM_y'], data = df, hue = "Mobility_Provider", height = 5)
sns.pairplot(x_vars = ['UTM_x'], y_vars = ['UTM_y'], data = df, hue = "zone_number", height = 5)
which gives me two separate scatter plots, one grouped by Mobility_Provider and one grouped by zone_number. However, I was wondering if it's possible to combine these two graphs, e.g. with different levels of Mobility_Provider shown in different colours and different levels of zone_number shown with different marker shapes.
Thanks a lot!
A sample plot would be:
Plot1
Plot2
Each row of the df has x and y values, and two categorical variables ("Mobility_Provider" and "zone_number")
This can be easily done using seaborn's scatterplot, just use
hue = "Mobility_Provider",style="zone_number"
Something like this
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import pandas as pd
df = pd.DataFrame({'x':[1,2,3,4], 'y':[1,2,3,4], 'Mobility_Provider':[0,0,1,1],
                   'zone_number':[0,1,0,1]})
sns.scatterplot(x="x", y="y",s=100,hue='Mobility_Provider',style='zone_number', data=df)
plt.show()
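The same idea can also be sketched with matplotlib alone, in case seaborn is not available: group by both columns and choose the colour from one and the marker from the other. The colour and marker mappings below are hypothetical choices for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [1, 2, 3, 4],
                   'Mobility_Provider': [0, 0, 1, 1],
                   'zone_number': [0, 1, 0, 1]})

# Hypothetical mappings: colour encodes the provider, marker encodes the zone
colors = {0: 'tab:blue', 1: 'tab:orange'}
markers = {0: 'o', 1: 's'}

fig, ax = plt.subplots()
for (provider, zone), group in df.groupby(['Mobility_Provider', 'zone_number']):
    ax.scatter(group['x'], group['y'], s=100,
               color=colors[provider], marker=markers[zone],
               label=f'provider {provider}, zone {zone}')
ax.legend()
```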

Plotting Pandas groupby groups using subplots and loop

I am trying to generate a grid of subplots based off of a Pandas groupby object. I would like each plot to be based off of two columns of data for one group of the groupby object. Fake data set:
C1,C2,C3,C4
1,12,125,25
2,13,25,25
3,15,98,25
4,12,77,25
5,15,889,25
6,13,56,25
7,12,256,25
8,12,158,25
9,13,158,25
10,15,1366,25
I have tried the following code:
import pandas as pd
import csv
import matplotlib as mpl
import matplotlib.pyplot as plt
import math
#Path to CSV File
path = "..\\fake_data.csv"
#Read CSV into pandas DataFrame
df = pd.read_csv(path)
#GroupBy C2
grouped = df.groupby('C2')
#Figure out number of rows needed for 2 column grid plot
#Also accounts for odd number of plots
nrows = int(math.ceil(len(grouped)/2.))
#Setup Subplots
fig, axs = plt.subplots(nrows,2)
for ax in axs.flatten():
    for i,j in grouped:
        j.plot(x='C1',y='C3', ax=ax)
plt.savefig("plot.png")
But it generates 4 identical subplots with all of the data plotted on each (see example output below):
I would like to do something like the following to fix this:
for i,j in grouped:
    j.plot(x='C1',y='C3',ax=axs)
    next(axs)
but I get this error
AttributeError: 'numpy.ndarray' object has no attribute 'get_figure'
I will have a dynamic number of groups in the groupby object I want to plot, and many more elements than the fake data I have provided. This is why I need an elegant, dynamic solution and each group data set plotted on a separate subplot.
Sounds like you want to iterate over the groups and the axes in parallel, so rather than having nested for loops (which iterate over all groups for each axis), you want something like this:
for (name, group), ax in zip(grouped, axs.flat):
    group.plot(x='C1',y='C3', ax=ax)
You have the right idea in your second code snippet, but you're getting an error because axs is an array of axes, whereas plot expects a single Axes object. A NumPy array is also not an iterator, so next(axs) cannot work directly. It should work to create an iterator before the loop with axes_iter = iter(axs.flat), fetch ax = next(axes_iter) at the start of each iteration in place of next(axs), and change the argument of plot to ax=ax.
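Putting this together for a dynamic number of groups, here is a minimal sketch under a few assumptions: the fake data is read from an inline string instead of the CSV path, squeeze=False keeps axs two-dimensional even when only one row is needed, and leftover axes are hidden when the number of groups is odd. The titles and the Agg backend line are additions for illustration only.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The fake data set from the question, inlined so the sketch is self-contained
csv_text = """C1,C2,C3,C4
1,12,125,25
2,13,25,25
3,15,98,25
4,12,77,25
5,15,889,25
6,13,56,25
7,12,256,25
8,12,158,25
9,13,158,25
10,15,1366,25
"""
df = pd.read_csv(io.StringIO(csv_text))
grouped = df.groupby('C2')

# Two-column grid with enough rows for every group
ncols = 2
nrows = math.ceil(len(grouped) / ncols)
fig, axs = plt.subplots(nrows, ncols, squeeze=False)

# One group per Axes, walked in parallel
for (name, group), ax in zip(grouped, axs.flat):
    group.plot(x='C1', y='C3', ax=ax, title=f'C2 = {name}')

# Hide any Axes that did not receive a group
for ax in axs.flat[len(grouped):]:
    ax.set_visible(False)

fig.tight_layout()
```

With the three groups in the fake data this yields a 2x2 grid whose last panel is hidden.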
