Need help changing x axis intervals on sns.lmplot - python

As you can see from the picture below, some of the data is cut off on the end. Does anyone know how to fix that? Also I want the intervals for weeks on the x axis to be (1,2,3...13) for weeks 1-13. Thanks.

Since you did not provide the data, I used an example below:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(111)
df = pd.DataFrame({'Week':np.tile(np.arange(1,15),2),
'Score':np.random.uniform(np.repeat([0,1],14),
np.repeat([1,2],14),28),
'Win':np.repeat(['0','1'],14)
})
sns.lmplot returns a FacetGrid so you can set the axis ticks like this:
g = sns.lmplot(data=df,x='Week',y='Score',hue='Win')
g.set(xlim = (0.5,14.5))
g.set(xticks=range(14))

Related

Plotting tendency line in Python

I want to plot a tendency line on top of a data plot. This must be simple but I have not been able to figure out how to get to it.
Let us say I have the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randint(0,100,size=(100, 1)), columns=list('A'))
sns.lineplot(data=df)
ax.set(xlabel="Index",
ylabel="Variable",
title="Sample")
plt.show()
The resulting plot is:
What I would like to add is a tendency line. Something like the red line in the following:
I thank you for any feedback.
A moving average is one method (my first thought, and already suggested).
Another method is to use a polynomial fit. Since you had 100 points in your original data, I picked a 10th order fit (square root of data length) in the example below. With some modification of your original code:
idx = [i for i in range(100)]
rnd = np.random.randint(0,100,size=100)
ser = pd.Series(rnd, idx)
fit = np.polyfit(idx, rnd, 10)
pf = np.poly1d(fit)
plt.plot(idx, rnd, 'b', idx, pf(idx), 'r')
This code provides a plot like this:
You can do something like this using Rolling Average:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.randint(0,100,size=(100, 1))
df["rolling_avg"] = df.A.rolling(7).mean().shift(-3)
sns.lineplot(data=df)
plt.show()
You could also do a Regression plot to analyse how data can be interpolated using:
ax = sns.regplot(x=df.index, y="A",
data=df,
scatter_kws={"s": 10},
order=10,
ci=None)

python plotting multiple bars

I've been trying to do it for several hours and I have a mistake every time. I want to create 3 bar plots in one graph. The y-axis is to be between 0 and 1000.
The end result should be this
Thats my code:
import matplotlib.pyplot as plt
import numpy as np
import csv
df = pd.read_csv('razemKM.csv')
dfn = pd.read_csv('razemNPM.csv')
print(df)
y=[0,1000]
a=(df["srednia"]-df["odchStand"])
a1=df["srednia"]
a2=(df["srednia"]+df["odchStand"])
plt.bar(y,a,width=0.1,color='r')
plt.bar(y,a1,width=0.1,color='g')
plt.bar(y,a2,width=0.1,color='y')
plt.show()
You can use pandas plot function:
df['Sum'] = df["srednia"]+df["odchStand"]
df['Dif'] = df["srednia"]-df["odchStand"]
df.plot.bar(y=['Diff','srednia', 'Sum'],width=0.1)
plt.show()

Seaborn Scatterplot X Values Missing

I have a scatter plot im working with and for some reason im not seeing all the x values on my graph
#%%
from pandas import DataFrame, read_csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file = r"re2.csv"
df = pd.read_csv(file)
#sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='city')
g.map(plt.scatter, 'type', 'price').add_legend()
This is an image of a small subset of my plots, you can see that Res is displaying, the middle bar should be displaying Con and the last would be Mlt. These are all defined in the type column from my data set but are not displaying.
Any clue how to fix?
Python is doing what you tell it to do. Just pick different features, presumably things that make more sense for plotting, if you want to generate a more interesting plots. See this generic example below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips);
Personally, I like plotly plots, which are dynamic, more than I like seaborn plots.
https://plotly.com/python/line-and-scatter/

Simple Graph Does Not Represent Data

This is a very straightforward question. I have and x axis of years and a y axis of numbers increasing linearly by 100. When plotting this with pandas and matplotlib I am given a graph that does not represent the data whatsoever. I need some help to figure this out because it is such a small amount of code:
The CSV is as follows:
A,B
2012,100
2013,200
2014,300
2015,400
2016,500
2017,600
2018,700
2012,800
2013,900
2014,1000
2015,1100
2016,1200
2017,1300
2018,1400
The Code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("CSV/DSNY.csv")
data.set_index("A", inplace=True)
data.plot()
plt.show()
The graph this yields is:
It is clearly very inconsistent with the data - any suggestions?
The default behaviour of matplotlib/pandas is to draw a line between successive data points, and not to mark each data point with a symbol.
Fix: change data.plot() to data.plot(style='o'), or df.plot(marker='o', linewidth=0).
Result:
All you need is sort A before plotting.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("CSV/DSNY.csv").reset_index()
data = data.sort_values('A')
data.set_index("A", inplace=True)
data.plot()
plt.show()

Smoothing curve for matplotlib.pyplot using pandas or numpy/scipy

I have a series of data which consists of values from several experiments (1-40, in the MWE it is 1-5). The overall amount of entries in my original data is ~4.000.000, which I try to smooth in order to display it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import spline
from statsmodels.nonparametric.smoothers_lowess import lowess
df = pd.DataFrame()
df["values"] = np.random.randint(100000, 200000, 1000)
df["id"] = [1,2,3,4,5] * 200
plt.figure(1, figsize=(11.69,8.27))
# Both fail for my amount of data:
plt.plot(spline(df["values"], df["id"], range(100)), "r-")
plt.plot(lowess(df["values"], df["id"]), "r-")
Both, scipy.interplate and statsmodels.nonparametric.smoothers_lowess.lowess, throw out of memory exceptions for my data. Is there any efficient way to solve this like in, e.g., GNU R using ggplot2 and geom_smooth()?
I can't quite tell what you're getting at with all the dimensions to your data, but one very simple thing you can try is to just use the 'markevery' kwarg like so:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(1,100,1E7)
y=x**2
plt.figure(1, figsize=(11.69,8.27))
plt.plot(x,y,markevery=100)
plt.show()
This will only plot every nth point (n=100 here).
If that doesn't help then you may want to try just a simple numpy interpolation with fewer samples like so:
x_large=np.linspace(1,100,1E7)
y_large=x**2
x_small=np.linspace(1,100,1E3)
y_small=np.interp(x_small,x_large,y_large)
plt.plot(x_small,y_small)

Categories