statsmodels autocorrelation doesn't use all values - python

I have some values over time that I plot together with their autocorrelation:
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', names=['value'], header=0)
fig, axes = plt.subplots(2, sharex=True)
axes[0].plot(df.value)
plot_acf(df.value, ax=axes[1])
plt.show()
This returns one plot, although I expected a different one.
If I use the plain acf function without the plot helper, I get a few more values in the plot, but still not all:
import pandas as pd
from statsmodels.tsa.stattools import acf
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', names=['value'], header=0)
fig, axes = plt.subplots(2, sharex=True)
axes[0].plot(df.value)
axes[1].plot(acf(df.value))
plt.show()
Why is that? I use the same variable df.value in both plots.
Edit:
If I use pandas I get this plot, which doesn't seem right. I'd really like to use the first function I mentioned, since it gives the best visualisation:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', names=['value'], header=0)
fig, axes = plt.subplots(2, sharex=True)
axes[0].plot(df.value)
df_value_acf = [df.value.autocorr(i) for i in range(1,len(df.value))]
axes[1].plot(df_value_acf)
plt.show()

Related

how to set IQR on seaborn box plot to 10% 90% [duplicate]

I want to plot a seaborn boxplot with the box spanning min to max, rather than the first to third quartile. Can I control this in matplotlib or seaborn? I know I can control the whiskers, but what about the boxes?
Here is an approach that mimics seaborn's boxplot with a horizontal bar plot built from an aggregated dataframe.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# set sns plot style
sns.set()
tips = sns.load_dataset('tips')
fig, (ax1, ax2) = plt.subplots(nrows=2)
sns.boxplot(x='total_bill', y='day', data=tips, ax=ax1)
day_min_max = tips[['day', 'total_bill']].groupby('day').agg(['min', 'max', 'median'])
day_min_max.columns = day_min_max.columns.droplevel(0) # drop the outer column level, leaving 'min', 'max' and 'median'
ax2.use_sticky_edges = False
sns.barplot(y=day_min_max.index, x=day_min_max['median'] - day_min_max['min'], left=day_min_max['min'], ec='k', ax=ax2)
sns.barplot(y=day_min_max.index, x=day_min_max['max'] - day_min_max['median'], left=day_min_max['median'], ec='k', ax=ax2)
plt.tight_layout()
plt.show()
Depicting the first and third quartiles is the defining characteristic of a boxplot, so I don't think that this option exists. However, if you want to use the minima and maxima, you are not going to plot any whiskers, and hence you can simply use a barplot instead:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = np.random.rand(10, 3)
sns.barplot(x=np.arange(10), y=np.ptp(data, axis=1), bottom=data.min(axis=1))  # np.ptp, since the ndarray.ptp method was removed in NumPy 2.0
plt.show()
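If a box spanning the 10th to 90th percentile is what is wanted, another option is matplotlib's Axes.bxp, which draws boxes from precomputed statistics. The following is only a sketch under assumptions: the two groups are synthetic data, and the percentile values are simply placed in the q1 and q3 slots that bxp expects.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic groups (assumption, not data from the question)
rng = np.random.default_rng(0)
groups = {"a": rng.normal(size=200), "b": rng.normal(1, 2, size=200)}

stats = []
for name, values in groups.items():
    p10, med, p90 = np.percentile(values, [10, 50, 90])
    stats.append({
        "label": name,
        "q1": p10,               # box lower edge: 10th percentile instead of Q1
        "med": med,
        "q3": p90,               # box upper edge: 90th percentile instead of Q3
        "whislo": values.min(),  # whiskers reach to min and max
        "whishi": values.max(),
        "fliers": [],
    })

fig, ax = plt.subplots()
ax.bxp(stats, showfliers=False)
# call plt.show() or fig.savefig(...) to view the result
```

This keeps the whiskers, which the bar plot approaches above give up.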

Merge Count Plot and Mean in same plot SNS

I am trying to create a count plot and also add another plot on it which would actually be the mean of the other columns.
The sample data is in the below link:
Sample Data
I have used the below code to create the sns count plot:
df = pd.read_csv("latestfile.csv")
df.sort_values(by=["Business"],inplace=True)
sns.countplot(data=df,x=df["Business"],hue="location")
and I generate the below:
Now I use the groupby and use the below code to get the desired data:
dfg = df.groupby(["Business","location"])['Ageing'].mean().reset_index()
dfg.set_index("Business",inplace=True)
But how do I plot this on the same count plot, on a different y axis?
I can't think of a way to do it.
Below is what I am finally looking for:
Of course, you can squeeze another bar plot into the countplot graph:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.read_csv("test.csv")
df.sort_values(by=["Business"],inplace=True)
ax1 = sns.countplot(data=df, x="Business", hue="location", palette="muted", edgecolor="black")
for patch in ax1.patches:
    patch.set_x(patch.get_x() + 0.3 * patch.get_width())
ax1.legend(title="Count")
ax2 = ax1.twinx()
sns.barplot(data=df, x="Business", y="Ageing", hue="location", palette="bright", ci=None, ax=ax2, edgecolor="white")
ax2.legend(title="Ageing")
ax1.autoscale_view()
plt.show()
However, I would definitely prefer two subplots:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.read_csv("test.csv")
df.sort_values(by=["Business"],inplace=True)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
sns.countplot(data=df, x="Business", hue="location", ax=ax1)
ax1.legend(title="Count")
sns.barplot(data=df, x="Business", y="Ageing", hue="location", ci=None, ax=ax2)
ax2.legend(title="Ageing")
plt.show()
Since you would now rather show the distribution, you can combine the countplot with a stripplot:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.read_csv("test.csv")
df.sort_values(by=["Business"],inplace=True)
ax1 = sns.countplot(data=df, x="Business", hue="location")
ax2 = ax1.twinx()
sns.stripplot(data=df, x="Business", y="Ageing", hue="location", jitter=True, dodge=True, ax=ax2, linewidth=1)
ax1.legend(title="", loc="upper center")
ax2.legend_.remove()
plt.show()
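The bar plots above rely on sns.barplot aggregating with the mean by default, so they should show the same numbers as the groupby from the question. A small check of both aggregates with pandas alone might look like this; the five-row frame is hypothetical sample data that only reuses the question's column names.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "Business": ["A", "A", "A", "B", "B"],
    "location": ["X", "X", "Y", "X", "Y"],
    "Ageing": [10, 20, 30, 40, 50],
})

# One row per (Business, location): what countplot counts and what barplot averages
summary = df.groupby(["Business", "location"])["Ageing"].agg(["count", "mean"])
print(summary)
```

Comparing this table with the plotted bar heights is a quick way to confirm that both axes show what is intended.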

How to create specific plots using Pandas and then store them as PNG files?

So I am trying to create histograms for each specific variable in my dataset and then save it as a PNG file.
My code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
x=combined_databook.groupby('x_1').hist()
x.figure.savefig("x.png")
I keep getting "AttributeError: 'Series' object has no attribute 'figure'"
Use matplotlib to create figure and axes objects, then tell pandas which axes to plot on using the ax argument. Finally, use matplotlib (or the fig object) to save the figure.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample Data (3 groups, normally distributed)
df = pd.DataFrame({'gp': np.random.choice(list('abc'), 1000),
                   'data': np.random.normal(0, 1, 1000)})
fig, ax = plt.subplots()
df.groupby('gp').hist(ax=ax, ec='k', grid=False, bins=20, alpha=0.5)
fig.savefig('your_fig.png', dpi=200)
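Instead of using *.hist(), I would plot with matplotlib directly (matplotlib.pyplot.hist() for histograms) and save through the figure object.
Example of the create, plot and save pattern, shown here with a line plot: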
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
y =[10, 20,30,40,100,200,300,400,1000,2000]
x = np.arange(10)
fig = plt.figure()
ax = plt.subplot(111)
ax.plot(x, y, label='y = Values')
plt.title('my plot')
ax.legend()
plt.show()
fig.savefig('tada.png')
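If the goal is one PNG per group rather than a single overlaid figure, the two answers can be combined by looping over the groups. This is a sketch only: the sample data, the temporary output directory and the hist_<group>.png file names are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data (assumption): 3 groups, normally distributed
rng = np.random.default_rng(0)
df = pd.DataFrame({"gp": rng.choice(list("abc"), 1000),
                   "data": rng.normal(0, 1, 1000)})

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
saved = []
for name, group in df.groupby("gp"):
    fig, ax = plt.subplots()
    ax.hist(group["data"], bins=20, edgecolor="k")
    ax.set_title(f"gp = {name}")
    path = out_dir / f"hist_{name}.png"
    fig.savefig(path, dpi=200)
    plt.close(fig)  # release the figure before creating the next one
    saved.append(path)

print(saved)
```

Creating a fresh figure per group avoids the AttributeError from the question, because savefig is called on a Figure rather than on the Series that groupby(...).hist() returns.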

How to fit cell size to the content in a dataframe plot table?

I'm trying to find a way to make the row heights of a Pandas DataFrame plot table fit their content. If that is not possible, is there an alternative way to draw this kind of plot?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
fig, ax = plt.subplots(1, 1,figsize =(8,2))
df = pd.DataFrame(np.round(np.random.rand(5, 3),2), columns=['a', 'b', 'c'],index=['1\na','2\na','3\na','4\na','5\na'])
ax.get_xaxis().set_visible(False) # Hide Ticks
df.plot(table=True, ax=ax)
fig.dpi = 600
Just remove every "\na" from your DataFrame index list and your problem should be solved.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
fig, ax = plt.subplots(1, 1,figsize =(8,2))
df = pd.DataFrame(np.round(np.random.rand(5, 3),2), columns=['a', 'b', 'c'],index=['1','2','3','4','5'])
ax.get_xaxis().set_visible(False) # Hide Ticks
df.plot(table=True, ax=ax)
fig.dpi = 600
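If the two-line index labels need to stay, one possible alternative is to draw the table separately with pandas.plotting.table and stretch its rows with the matplotlib Table.scale method. This is a sketch, not a tested fix for the original figure: the scale factor of 2 is an assumption chosen for two-line labels, and the table is drawn on its own axes rather than under a line plot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table

rng = np.random.default_rng(0)
df = pd.DataFrame(np.round(rng.random((5, 3)), 2),
                  columns=["a", "b", "c"],
                  index=["1\na", "2\na", "3\na", "4\na", "5\na"])

fig, ax = plt.subplots(figsize=(8, 3))
ax.axis("off")                      # show only the table on this axes
tbl = table(ax, df, loc="center")   # returns a matplotlib Table

height_before = tbl.get_celld()[(1, 0)].get_height()
tbl.scale(1, 2)                     # keep widths, double every row height
height_after = tbl.get_celld()[(1, 0)].get_height()
# call plt.show() or fig.savefig(...) to view the result
```

A larger second argument to scale gives more room for labels with more lines.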

Python seaborn legends cut off

The figure resulting from the Python code below unfortunately cuts off part of the legends. How can I avoid this? Did I miss a parameter in the sns call or is this due to how I've set up my PyCharm IDE?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('gm_2008_region.csv')
df = df.drop('Region', axis=1)
plt.figure()
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')
plt.show()
This is the resulting figure:
The .csv file can be found here.
Try adding plt.subplots_adjust(bottom=0.28) as follows:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('gm_2008_region.csv')
df = df.drop('Region', axis=1)
plt.figure()
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')
plt.subplots_adjust(bottom=0.28)
plt.show()
Giving you:
You might want to change the figsize of plt.figure, for example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('gm_2008_region.csv')
df = df.drop('Region', axis=1)
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')
plt.show()
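Instead of hand-tuning the bottom margin, matplotlib can also compute the margins itself with tight_layout. The sketch below uses synthetic data with deliberately long column names as a stand-in for gm_2008_region.csv, which is an assumption since that file is not available here.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the CSV (assumption): long names to provoke clipping
rng = np.random.default_rng(0)
columns = [f"a_rather_long_column_name_{i}" for i in range(6)]
df = pd.DataFrame(rng.normal(size=(50, 6)), columns=columns)

fig, ax = plt.subplots()
sns.heatmap(df.corr(), square=True, cmap="RdYlGn", ax=ax)
fig.tight_layout()  # adjust margins so tick labels fit inside the figure
# call plt.show() or fig.savefig(...) to view the result
```

Unlike a fixed bottom=0.28, this adapts when the label lengths or the figure size change.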
