'MatplotlibDeprecationWarning' - Warning when trying to plot histogram - python

I'm trying to use matplotlib to plot a histogram but keep running into this warning:
MatplotlibDeprecationWarning: The resize_event function was deprecated in Matplotlib 3.6 and will be removed two minor releases later. Use callbacks.process('resize_event', ResizeEvent(...)) instead.
Here's my code; feedback on how I can clean up the logical expressions also welcome.
lower_quartile = df['2020 Population'].quantile(0.25)
mid_quartile = df['2020 Population'].quantile(0.5)
upper_quartile = df['2020 Population'].quantile(0.75)
new_data = df.loc[df['2020 Population'] > lower_quartile]
final_2020_range = new_data.loc[df['2020 Population'] < upper_quartile]
check = final_2020_range['2020 Population']
plt.hist(check)

Seems like you can find your answer here:
https://github.com/matplotlib/matplotlib/issues/23921
In short: it's a bug, and it will be corrected in 3.6.1

I'm facing the same issue and found a workaround:
import matplotlib
matplotlib.use('TkAgg')
The warning will still appear, but the plots are now displayed.
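On the side question about tidying up the logical expressions, here is a minimal sketch. It assumes df has a numeric '2020 Population' column; the sample values below are made up for illustration. Both conditions are combined into a single boolean mask built from the same column, rather than filtering new_data with a mask taken from df.

```python
import pandas as pd

# Made-up sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"2020 Population": [10, 20, 30, 40, 50, 60, 70, 80]})

population = df["2020 Population"]

# One call returns both quartiles; unpacking iterates over the values.
lower_quartile, upper_quartile = population.quantile([0.25, 0.75])

# Combine both conditions in one mask. Each comparison needs its own
# parentheses because & binds more tightly than > and <.
in_range = (population > lower_quartile) & (population < upper_quartile)
check = population[in_range]
```

The resulting check can be passed to plt.hist(check) exactly as in the question.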

Related

Jupyter Notebook graph has very inaccurate scale?

This is my first post and I hope it's okay. My mentor gave me a use case he found online to teach me machine learning in Jupyter. I had a problem with the graphing section, even though I'm sure the code in that part is accurate:
df21.plot(figsize=(20,10), fontsize=12,subplots=True, style=["-","o"], title = "Pump 2 - 100 Values")
plt.show()
The graphs seem to appear as two points or a single straight line, even though the df21 dataset I'm using has 100 rows and the values are not binary:
[Image: Graphs just look like straight lines]
[Image: Screenshot of the use case]
I tried changing the format to plot only the points and found that the points are actually all there; the scale of the axes is just incredibly squished:
[Image: Graph with only plots]
And I have no idea what to do now, and I haven't been able to find any solutions online. Any advice is appreciated!
After going through the use case you linked and trying the code myself, I did not find any problem with the plotting section. There is probably a problem in the parts of your code that come before the plotting. This is the code from the use case:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/sa-mw-dach/manuela-dev/master/ml-models/anomaly-detection/raw-data.csv')
df['time'] = pd.to_datetime(df['ts'], unit='ms')
df.set_index('time', inplace=True)
df.drop(columns=['ts'], inplace=True)
df21 = df.head(200)
df21.plot(figsize=(20,10), fontsize=12, subplots=True, style=["-","o"], title = "Pump 2 - 100 Values")
plt.show()
And this is the output:
Maybe try a different environment to run your notebook; I ran it on Google Colab.

Pandas: where is autocorrelation_plot?

I'm trying to plot an autocorrelation_plot() of a time series using pandas.
According to this SO post, pandas.tools was removed in 0.24.0 and the autocorrelation_plot function can now be found in the pandas.plotting module. However, the API reference does not mention this function.
I'm able to plot an autocorrelation by importing the function but where can I find the documentation?
import numpy as np
from pandas.plotting import autocorrelation_plot # works fine
slope = -1
offset = 250
noise_scale = 100
npts = 100
x = np.linspace(0, 100, npts)
y = slope*x + noise_scale*np.random.rand(npts) + offset
autocorrelation_plot(y)
Python: 3.7.2
Pandas: 0.24.1
I think this would probably be more appropriate as an issue in GitHub.
In any case, autocorrelation_plot and the similar plots (andrews_curves, radviz,...) are probably going to be moved out of pandas, into a separate package. So you can expect to have to call something like pandas_matplotlib.autocorrelation_plot() in the future (see #28177).
In the meantime, I'm adding it and some other missing functions to the documentation in #28179. When the pull request is merged, you'll be able to see the docs in https://dev.pandas.io. But there is nothing very interesting for autocorrelation_plot:
Have a look at:
https://github.com/pandas-dev/pandas/blob/v0.24.1/pandas/plotting/_misc.py#L600
Looks like it was buried in the plotting._misc source code.
You can at least find a reference and a short doc here: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#visualization-autocorrelation
Btw, you can search the docs for any keyword: https://pandas.pydata.org/pandas-docs/stable/search.html?q=autocorrelation_plot&check_keywords=yes&area=default#
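Since the documentation is thin, it may help to see the quantity being plotted. The sketch below is not pandas' code: it is a plain NumPy version of the usual sample autocorrelation at lag h, which is what the linked _misc.py source appears to compute for each lag. The helper name autocorrelation is made up for this example, and the data mirrors the question's y (with a seeded generator so it is reproducible).

```python
import numpy as np


def autocorrelation(values, lag):
    """Sample autocorrelation at `lag`, normalised by the lag-0 value."""
    data = np.asarray(values, dtype=float)
    n = data.size
    mean = data.mean()
    # Lag-0 autocovariance (the biased variance estimate).
    c0 = np.sum((data - mean) ** 2) / n
    # Autocovariance between the series and itself shifted by `lag`.
    ch = np.sum((data[: n - lag] - mean) * (data[lag:] - mean)) / n
    return ch / c0


rng = np.random.default_rng(0)
x = np.linspace(0, 100, 100)
y = -1 * x + 100 * rng.random(100) + 250

# Autocorrelation for the first few lags.
lags = [autocorrelation(y, h) for h in range(1, 6)]
```

Each value lies between -1 and 1, and lag 0 always gives exactly 1.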

nltk dispersion_plot() function not working. Has the line-style "|" been removed from matplotlib?

I am trying to draw lexical dispersion plots using nltk dispersion_plot() function. My code is
from nltk.book import *
text4.dispersion_plot(["freedom","citizens"])
The resulting plot I get is:
[Image: the resulting plot]
After some Google searching and going through the code of the dispersion_plot() function (https://www.nltk.org/_modules/nltk/draw/dispersion.html), I found that it uses "b|" as its line style in the plot() function. But according to the matplotlib documentation, there are only four possible line styles {'-', '--', '-.', ':'} (https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html).
So my question is whether the line style "|" existed earlier but has since been removed, which would explain why dispersion_plot() is unable to draw plots, or whether there is some other reason.
And also what is the workaround for this problem?
I had a similar issue with my dispersion_plot (the problem appeared when I ran Jupyter on Google Colaboratory).
This cleared it up:
plt.style.use('default')
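On the question of whether "|" was removed: in a matplotlib format string, "|" is a marker (a short vertical line), not a line style, so "b|" means blue vertical-line markers with no connecting line. The sketch below checks this on made-up points. The Agg backend is an assumption chosen only so the code runs without a display. Why the active style affected the plot is not established in this thread.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# "b|" = blue colour + vertical-line marker; no line style is given.
(line,) = ax.plot([1, 5, 9], [0, 1, 0], "b|")

marker = line.get_marker()        # the marker parsed from the format string
linestyle = line.get_linestyle()  # 'None' means the points are not connected
```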

How to graph a function in Python using plotnine library

I've been a longtime R user, recently transitioning over to Python, and I've been trying to carry over my knowledge of plotting with ggplot2, since it is so intuitive. Plotnine is supposedly the most ggplot2-esque plotting library, and I've successfully recreated most graphs with it, except critically how to plot regular functions.
In base R, you can easily define an equation as a function, pass it to a stat_function() layer, and set the limits of the graph in place of the data argument to plot a parabola or the like. However, the syntax for setting the graph's limits must be different in Python (perhaps using numpy?), and equations are defined using sympy, which is another divergence for me.
So how would I go about plotting functions with plotnine? The above two hurdles are the two differences with ggplot2 that I think are causing me trouble, since plotnine has so few examples online.
P.S. This is an example of what I want to recreate in Python using plotnine:
library(ggplot2)
basic_plot <- function(x) x^2 + 2.5
graph <- ggplot(data.frame(x=c(-5,5)), aes(x=x)) +
  stat_function(fun = basic_plot)
graph
You do not need numpy, it works just fine the "standard" way! :)
from plotnine import *
import pandas as pd
(ggplot(pd.DataFrame(data={"x": [-5, 5]}), aes(x="x"))
+ stat_function(fun=lambda x: x**2+2.5))
One of the main differences that caused me problems was the same as the one posted in the question. Specifically:
in R: aes(x = x) or aes(x)
in plotnine: aes(x = 'x')

LinAlgError: SVD did not converge in Linear Least Squares when trying polyfit

If I try to run the script below I get the error: LinAlgError: SVD did not converge in Linear Least Squares. I have used the exact same script on a similar dataset and there it works. I have tried to search for values in my dataset that Python might interpret as a NaN but I cannot find anything.
My dataset is quite large and impossible to check by hand. (But I think my dataset is fine). I also checked the length of stageheight_masked and discharge_masked but they are the same. Does anyone know why there is an error in my script and what can I do about it?
import numpy as np
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from scipy import polyfit, polyval
kwargs = dict(delimiter = '\t',\
skip_header = 0,\
missing_values = 'NaN',\
converters = {0:matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},\
dtype = float,\
names = True,\
)
rating_curve_Gillisstraat = np.genfromtxt('G:\Discharge_and_stageheight_Gillisstraat.txt',**kwargs)
discharge = rating_curve_Gillisstraat['discharge'] #change names of columns
stageheight = rating_curve_Gillisstraat['stage'] - 131.258
#mask NaN
discharge_masked = np.ma.masked_array(discharge,mask=np.isnan(discharge)).compressed()
stageheight_masked = np.ma.masked_array(stageheight,mask=np.isnan(discharge)).compressed()
#sort
sort_ind = np.argsort(stageheight_masked)
stageheight_masked = stageheight_masked[sort_ind]
discharge_masked = discharge_masked[sort_ind]
#regression
a1,b1,c1 = polyfit(stageheight_masked, discharge_masked, 2)
discharge_predicted = polyval([a1,b1,c1],stageheight_masked)
print 'regression coefficients'
print (a1,b1,c1)
#create upper and lower uncertainty
upper = discharge_predicted*1.15
lower = discharge_predicted*0.85
#create scatterplot
plt.scatter(stageheight,discharge,color='b',label='Rating curve')
plt.plot(stageheight_masked,discharge_predicted,'r-',label='regression line')
plt.plot(stageheight_masked,upper,'r--',label='15% error')
plt.plot(stageheight_masked,lower,'r--')
plt.axhline(y=1.6,xmin=0,xmax=1,color='black',label='measuring range')
plt.title('Rating curve Catsop')
plt.ylabel('discharge')
plt.ylim(0,2)
plt.xlabel('stageheight[m]')
plt.legend(loc='upper left', title='Legend')
plt.grid(True)
plt.show()
I don't have your data file, but it is almost always the case that when you get that error you have NaNs or infinities in your data. Look for both of those using pd.notnull or np.isfinite.
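A minimal sketch of that check, using made-up numbers in place of the question's data file. Note that the script in the question builds both masks from np.isnan(discharge) only, so a NaN in the stage column would pass through to polyfit. Here the mask requires both arrays to be finite.

```python
import numpy as np

# Made-up values: one NaN in stage, one infinity in discharge.
stage = np.array([0.1, 0.2, np.nan, 0.4, 0.5, 0.6])
discharge = np.array([0.02, 0.05, 0.10, np.inf, 0.27, 0.38])

# Keep only the rows where BOTH arrays hold finite numbers.
finite = np.isfinite(stage) & np.isfinite(discharge)
stage_clean = stage[finite]
discharge_clean = discharge[finite]

# Second-degree fit on the cleaned data, as in the question.
coefficients = np.polyfit(stage_clean, discharge_clean, 2)
```

Printing (~finite).sum() before the fit shows how many rows were dropped, which is a quick way to confirm whether bad values were present at all.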
As others have pointed out, the problem is likely that there are rows without numeric values for the algorithm to work with. This is an issue with most regressions.
That's the problem. The solution, then, is to do something about it, and what to do depends on the data. Often you can replace the NaNs with 0s, for example with Pandas .fillna(0). Sometimes you might have to interpolate the missing values, and Pandas .interpolate() is probably the simplest way to do that. Or, when it's not a time series, you might be able to simply drop the rows that contain NaNs, for example with the Pandas .dropna() method. Sometimes the problem is not the NaNs but the infs or other values, and there are other solutions for that: https://stackoverflow.com/a/55293137/12213843
Exactly which way to go depends on the data, and it's up to you to interpret the data. Domain knowledge goes a long way toward interpreting it well.
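The three options mentioned above behave as follows on a small made-up Series. This is only an illustration of what each method does, not a recommendation of which one suits a given dataset. Infinities are first converted to NaN so that the methods treat them as missing.

```python
import numpy as np
import pandas as pd

# Made-up data with one NaN and one infinity.
s = pd.Series([1.0, np.nan, 3.0, np.inf, 5.0])

# Treat infinities as missing so the methods below handle them too.
s = s.replace([np.inf, -np.inf], np.nan)

filled = s.fillna(0)            # missing values become 0
interpolated = s.interpolate()  # linear interpolation between neighbours
dropped = s.dropna()            # rows with missing values are removed
```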
As ski_squaw mentions, the error is most often due to NaNs. For me, however, this error appeared after a Windows update. I was using numpy version 1.16, and moving to numpy 1.19.3 solved the issue. (Run pip install numpy==1.19.3 --user in the cmd.)
This GitHub issue explains it in more detail:
https://github.com/numpy/numpy/issues/16744
Numpy 1.19.3 doesn't work on Linux and 1.19.4 doesn't work on Windows.
I developed my code on Windows 8.
Now I'm using Windows 10 and the problem popped up!
It was resolved as @Joris said:
pip install numpy==1.19.3
My example after the fix:
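import numpy as np

def calculating_slope(x):
    # x is a pandas Series; treat infinities as missing and drop them
    x = x.replace(np.inf, np.nan).replace(-np.inf, np.nan).dropna()
    if len(x) > 1:
        # slope of a first-degree fit against the positional index
        slope = np.polyfit(range(len(x)), x, 1)[0]
    else:
        slope = 0
    return slope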
