Using polyfit to create a 3rd-degree polynomial with dynamic CSVs - python

What the data look like (although I may end up with more than three columns):
TotalArea,Pressure,Intensity
12054.2,-0.067,39.579
11980.2,-0.061,41.011
11948,-0.055,42.08
11889.5,-0.04,45.732
11863.6,-0.03,50.573
My goal: I would like to take this CSV file and fit a polynomial relating the Intensity column to the TotalArea column.
My code (omitting anything I believe to be purely decorative):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Graph = pd.read_csv("C:Data.csv")
Pl = Graph.dropna()
Bottom = Pl["TotalArea"]
Right = Pl["Intensity"]
arr = Pl.values
x = Bottom
y2 = Right
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
xx = arr[:, [0]]
b = xx.ravel()
print(b)
yy = arr[:, [2]]
c = xx.ravel()
y3 = np.polyfit(b, c, 3)
ax2 = ax1.twinx()
ax2.plot(x, y2, color = "r", label='Intensity /Area')
plt.show()
My error: (has to do with polyfit values)
Traceback (most recent call last):
File "/mnt/WinPartition/Users/tomchi/Documents/Programming/Eclipse/PythonDevFiles/so_test.py", line 47, in <module>
ax2.plot(x, y3)
File "/usr/local/lib/python3.5/dist-packages/matplotlib/__init__.py", line 1855, in inner
return func(ax, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/matplotlib/axes/_axes.py", line 1527, in plot
for line in self._get_lines(*args, **kwargs):
File "/usr/local/lib/python3.5/dist-packages/matplotlib/axes/_base.py", line 406, in _grab_next_args
for seg in self._plot_args(this, kwargs):
File "/usr/local/lib/python3.5/dist-packages/matplotlib/axes/_base.py", line 383, in _plot_args
x, y = self._xy_from_xy(x, y)
File "/usr/local/lib/python3.5/dist-packages/matplotlib/axes/_base.py", line 242, in _xy_from_xy
"have shapes {} and {}".format(x.shape, y.shape))
ValueError: x and y must have same first dimension, but have shapes (310,) and (2,)
So, my question is: what exactly does this mean? Is it due to the pandas DataFrame? Can I solve this quickly? Is there any more information I can provide?
I realise now that polyfit only gives me the coefficients of my polynomial:
[ -2.27230868e-23 2.74362531e-19 1.00000000e+00 -1.90568829e-12]
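Because np.polyfit returns coefficients (highest power first) rather than fitted y values, plotting x against y3 compares a full column with a handful of numbers, hence the shape mismatch. To draw the curve, the polynomial has to be evaluated at the x values, for example with np.polyval. Note also that in the snippet c is built from xx instead of yy, so the fit maps TotalArea onto itself; that is consistent with the printed coefficients being roughly [0, 0, 1, 0]. The sketch below uses the five sample rows from the question and selects columns by name, which keeps working when extra columns appear; the in-memory CSV is only a stand-in for the real file.

```python
import io

import numpy as np
import pandas as pd

# The five sample rows from the question; in practice use pd.read_csv(<your file>)
csv_text = """TotalArea,Pressure,Intensity
12054.2,-0.067,39.579
11980.2,-0.061,41.011
11948,-0.055,42.08
11889.5,-0.04,45.732
11863.6,-0.03,50.573
"""

df = pd.read_csv(io.StringIO(csv_text)).dropna()

# Select columns by name, so extra columns in a "dynamic" CSV do not shift positions
x = df["TotalArea"].to_numpy()
y = df["Intensity"].to_numpy()   # y comes from Intensity, not from TotalArea again

coeffs = np.polyfit(x, y, 3)     # 4 coefficients, highest power first
y_fit = np.polyval(coeffs, x)    # fitted values, same shape as x

# Sort by x so the fitted curve is drawn left to right
order = np.argsort(x)
x_sorted, y_fit_sorted = x[order], y_fit[order]

# With matplotlib: ax2.plot(x_sorted, y_fit_sorted, color="b")  # shapes now match
print(coeffs.shape, y_fit.shape)
```

np.poly1d(coeffs) gives an equivalent callable polynomial if you prefer writing p(x).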

Related

matplotlib triplot and tricontourf

I'm attempting to plot a 2D dataset having unstructured coordinates in matplotlib using tricontourf. I'm able to generate a plot of the 'mesh' with triplot, however when I use the same Triangulation object for tricontourf, I get an error (see below). What am I missing? Example:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
def lower(x):
    return 2 + 1*x
def upper(x):
    return 60 + 4*x
ni = 10
nj = 12
x = np.linspace(1,15,ni)
## make a trapezoid
xy = np.zeros((ni,nj,2),dtype=np.float32)
for i in range(len(x)):
    y = np.linspace(lower(x[i]),upper(x[i]),nj)
    xy[i,:,0] = x[i]
    xy[i,:,1] = y
## add noise
xy += -0.1 + 0.2*np.random.rand(ni,nj,2)
## make tris 'indices list'
xi, yi = np.meshgrid(range(ni), range(nj), indexing='xy')
inds_list = np.stack((xi,yi), axis=2)
inds_list = np.reshape(inds_list, (ni*nj,2), order='C')
inds_list = np.ravel_multi_index((inds_list[:,0],inds_list[:,1]), (ni,nj), order='C')
inds_list = np.reshape(inds_list, (ni,nj), order='F')
tris = np.zeros((2*(ni-1)*(nj-1),3), dtype=np.int64)
ci=0
for i in range(ni-1):
    for j in range(nj-1):
        tris[ci,0] = inds_list[i+1, j+1]
        tris[ci,1] = inds_list[i, j+1]
        tris[ci,2] = inds_list[i, j ]
        ci+=1
        tris[ci,0] = inds_list[i, j ]
        tris[ci,1] = inds_list[i+1, j ]
        tris[ci,2] = inds_list[i+1, j+1]
        ci+=1
triangulation = mpl.tri.Triangulation(x=xy[:,:,0].ravel(), y=xy[:,:,1].ravel(), triangles=tris)
fig1 = plt.figure(figsize=(4, 4/(16/9)), dpi=300)
ax1 = plt.gca()
ax1.triplot(triangulation, lw=0.5)
#ax1.tricontourf(triangulation)
fig1.tight_layout(pad=0.25)
plt.show()
...produces a plot of the mesh. However, uncommenting the line with ax1.tricontourf throws the error:
Traceback (most recent call last):
File "test.py", line 54, in <module>
ax1.tricontourf(triangulation)
File "C:\Users\steve\AppData\Roaming\Python\Python38\site-packages\matplotlib\tri\tricontour.py", line 307, in tricontourf
return TriContourSet(ax, *args, **kwargs)
File "C:\Users\steve\AppData\Roaming\Python\Python38\site-packages\matplotlib\tri\tricontour.py", line 29, in __init__
super().__init__(ax, *args, **kwargs)
File "C:\Users\steve\AppData\Roaming\Python\Python38\site-packages\matplotlib\contour.py", line 812, in __init__
kwargs = self._process_args(*args, **kwargs)
File "C:\Users\steve\AppData\Roaming\Python\Python38\site-packages\matplotlib\tri\tricontour.py", line 45, in _process_args
tri, z = self._contour_args(args, kwargs)
File "C:\Users\steve\AppData\Roaming\Python\Python38\site-packages\matplotlib\tri\tricontour.py", line 60, in _contour_args
z = np.ma.asarray(args[0])
IndexError: list index out of range
I am using:
Python version: 3.8.9
matplotlib version: 3.5.1
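I would say you need to provide the array of values to contour, one value per point of the triangulation. tricontourf reads z from args[0], and when only the Triangulation is passed, args is empty, hence the IndexError. For example: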
x= xy[:,:,0].ravel()
z= np.random.rand(x.shape[0])
ax1.tricontourf(triangulation, z)
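A self-contained sketch of the same fix: tricontourf takes the Triangulation plus one z value per point. The scattered points and z = sin(x)cos(y) are hypothetical, and the triangulation here is matplotlib's default Delaunay one rather than the hand-built trapezoid mesh from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
import numpy as np

rng = np.random.default_rng(0)
px = rng.uniform(0.0, 10.0, 60)   # hypothetical scattered x coordinates
py = rng.uniform(0.0, 10.0, 60)   # hypothetical scattered y coordinates
z = np.sin(px) * np.cos(py)       # one value per point: what tricontourf contours

triang = mtri.Triangulation(px, py)   # no triangles given, so Delaunay is used

fig, ax = plt.subplots()
ax.triplot(triang, lw=0.5, color="k")
cs = ax.tricontourf(triang, z, levels=10)   # z is the missing positional argument
fig.colorbar(cs, ax=ax)
```

For the mesh in the question, z needs ni*nj entries in the same order as xy[:,:,0].ravel().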

Plotting a line between a pair of y coordinates

I am trying to plot the following, where data1, data2 and data3 are vectors:
data1 = np.array(means1)
print('data1=',data1)
data2 = np.array(ci_l)
print('data2',data2)
data3 = np.array(ci_h)
print('data3',data3)
x = data1
y = np.concatenate([data2[:,None],data3[:,None]], axis=1)
print('x=', x,'y=',y)
plt.plot(x, [i for (i,j) in y], 'rs', markersize = 4)
plt.plot(x, [j for (i,j) in y], 'bo', markersize = 4)
plt.show()
As you can see in the code, I have two y points for each x point. When I run the code, I get the following output:
data1= [[22.8]
[31.6]
[27.4]
[30.4]
[30.6]]
data2 [[21.80474319]
[30.60474319]
[26.40474319]
[29.40474319]
[29.60474319]]
data3 [[23.79525681]
[32.59525681]
[28.39525681]
[31.39525681]
[31.59525681]]
x= [[22.8]
[31.6]
[27.4]
[30.4]
[30.6]] y= [[[21.80474319]
[23.79525681]]
[[30.60474319]
[32.59525681]]
[[26.40474319]
[28.39525681]]
[[29.40474319]
[31.39525681]]
[[29.60474319]
[31.59525681]]]
and this figure: [figure: red squares and blue circles for the two y values at each x]
My question is: how do I plot a line connecting each pair of y values? My problem is similar to this one:
"Matplotlib how to draw vertical line between two Y points"
I tried adding the following line to the code, as suggested there:
plt.plot((x,x),([i for (i,j) in y], [j for (i,j) in y]),c='black')
but I get the following error:
Traceback (most recent call last):
File "/Users/oltiana/Desktop/datamining/chapter4.py", line 151, in <module>
plt.plot((x,x),([i for (i,j) in y], [j for (i,j) in y]),c='black')
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/pyplot.py", line 3019, in plot
return gca().plot(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 1605, in plot
lines = [*self._get_lines(*args, data=data, **kwargs)]
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/axes/_base.py", line 315, in __call__
yield from self._plot_args(this, kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/axes/_base.py", line 504, in _plot_args
raise ValueError(f"x and y can be no greater than 2D, but have "
ValueError: x and y can be no greater than 2D, but have shapes (2, 5, 1) and (2, 5, 1)
I tried to solve the problem using shape and reshape, but it still does not work. Any suggestions would be helpful. Thank you!
I notice that your data arrays are all two-dimensional: each number is the only element of a list of its own ([[22.8], [31.6], ...] instead of [22.8, 31.6, ...]).
That's why you're getting the shape error: (x, x) stacks two (5, 1) arrays into a (2, 5, 1) array, and plt.plot accepts at most 2D input. There are a few ways to fix this, but one easy way is to call .flatten() on each array. This makes it one-dimensional, and your code will then work.
data1 = np.array(means1).flatten()
data2 = np.array(ci_l).flatten()
data3 = np.array(ci_h).flatten()
...
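A quick NumPy check of that explanation, using the data1 values printed in the question: stacking two (5, 1) columns gives exactly the (2, 5, 1) shape from the ValueError, while stacking the flattened arrays gives (2, 5), which plt.plot draws as five two-point lines (one per column).

```python
import numpy as np

col = np.array([[22.8], [31.6], [27.4], [30.4], [30.6]])  # like data1: shape (5, 1)

pair_cols = np.array((col, col))
print(pair_cols.shape)   # (2, 5, 1): 3-D, which plt.plot rejects

flat = col.flatten()     # shape (5,)
pair_flat = np.array((flat, flat))
print(pair_flat.shape)   # (2, 5): 2-D, one column per line segment
```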
Try writing
for x1, y1y2 in zip(x, y):
    plt.plot([x1, x1], y1y2, 'k-')  # 'k-' (solid black) prevents automatic coloring
before
plt.plot(x, [i for (i,j) in y], 'ro', markersize = 4)
plt.plot(x, [j for (i,j) in y], 'bs', markersize = 4)
This draws a two-point line for each pair of points.
It works visually, but it may mess up automatic legends.
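Instead of a loop, Axes.vlines can draw all the vertical segments in one call from the x positions and the lower and upper y values. A sketch using the values printed in the question, flattened to 1D first:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Values copied from the printed output, stored as (5, 1) columns like the question
data1 = np.array([[22.8], [31.6], [27.4], [30.4], [30.6]])
data2 = np.array([[21.80474319], [30.60474319], [26.40474319], [29.40474319], [29.60474319]])
data3 = np.array([[23.79525681], [32.59525681], [28.39525681], [31.39525681], [31.59525681]])

# Flatten to 1-D so each array has shape (5,)
x = data1.ravel()
lo = data2.ravel()
hi = data3.ravel()

fig, ax = plt.subplots()
segments = ax.vlines(x, lo, hi, colors="k")   # one vertical segment per x
ax.plot(x, lo, "rs", markersize=4)
ax.plot(x, hi, "bo", markersize=4)
```

vlines returns a single LineCollection, so it adds one artist (and at most one legend entry) rather than one line per pair.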

Linear Regression : ValueError: x and y must have same first dimension, but have shapes (10, 1) and (1, 1)

I am still a newbie at Python. I am trying to show the linear regression graph, but I get an error.
This is the code:
linear_regression = lm.LinearRegression()
linear_x = data.Luas.values.reshape(-1,1)
linear_y = data.Harga.values.reshape(-1,1)
linear_regression.fit(linear_x,linear_y)
print("Intercept = ",linear_regression.intercept_)
print("Coefficient = ",linear_regression.coef_)
print("Equation from the Linear Regression fit:")
print("Y=",linear_regression.intercept_,"+",linear_regression.coef_,"X")
print("Prediction for land area (X) = 1800")
print("So:")
result = linear_regression.predict([[1800]])
print("Land price (Y) = ",result,"million")
plt.scatter(linear_x,linear_y,color='black')
plt.plot(linear_x,linear_regression.predict([[1800]]),color='blue')
plt.title('Land Area vs Price')
plt.ylabel('Land Price (million)')
plt.xlabel('Land Area')
plt.show()
The error:
Traceback (most recent call last):
File "D:/Tugas/smt6/data mining/tugasKlasifikasi/klasifikasi.py", line 55, in <module>
plt.plot(linear_x,linear_regression.predict([[1800]]),color='blue')
File "C:\Users\Thor\Anaconda3\envs\coba\lib\site-packages\matplotlib\pyplot.py", line 2796, in plot
is not None else {}), **kwargs)
File "C:\Users\Thor\Anaconda3\envs\coba\lib\site-packages\matplotlib\axes\_axes.py", line 1665, in plot
lines = [*self._get_lines(*args, data=data, **kwargs)]
File "C:\Users\Thor\Anaconda3\envs\coba\lib\site-packages\matplotlib\axes\_base.py", line 225, in __call__
yield from self._plot_args(this, kwargs)
File "C:\Users\Thor\Anaconda3\envs\coba\lib\site-packages\matplotlib\axes\_base.py", line 391, in _plot_args
x, y = self._xy_from_xy(x, y)
File "C:\Users\Thor\Anaconda3\envs\coba\lib\site-packages\matplotlib\axes\_base.py", line 270, in _xy_from_xy
"have shapes {} and {}".format(x.shape, y.shape))
ValueError: x and y must have same first dimension, but have shapes (10, 1) and (1, 1)
Thank you very much for helping!
Your problem is not related to linear regression. It appears when you try to plot:
plt.plot(linear_x,linear_regression.predict([[1800]]),color='blue')
The issue is that linear_x has shape (10, 1) while your prediction has shape (1, 1), a single point, so you cannot plot one against the other: x and y must have the same first dimension.
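To draw the regression line, one option is to predict on linear_x itself, which yields one prediction per x value, and to show the single 1800 prediction as a separate marker. A sketch assuming scikit-learn's LinearRegression (the question's lm is presumably sklearn.linear_model); the data below is hypothetical, standing in for data.Luas and data.Harga.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for data.Luas / data.Harga (10 rows, like the question)
linear_x = np.linspace(500.0, 2000.0, 10).reshape(-1, 1)   # shape (10, 1)
linear_y = 0.5 * linear_x + 100.0                           # shape (10, 1)

model = LinearRegression().fit(linear_x, linear_y)

single = model.predict([[1800]])   # shape (1, 1): one predicted point
line_y = model.predict(linear_x)   # shape (10, 1): one prediction per x

fig, ax = plt.subplots()
ax.scatter(linear_x, linear_y, color="black")
ax.plot(linear_x, line_y, color="blue")           # first dimensions match: 10 and 10
ax.scatter([1800], single.ravel(), color="red")   # the single prediction as a marker
ax.set_xlabel("Land Area")
ax.set_ylabel("Land Price")
```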

python matplotlib scatter plot colors error

I am trying to create a scatter plot on an x and y grid where every point is colored by a preassigned value:
{x: 1, y: 2, value: n}
I have a list of x and y values and another list for the values, and tried this:
# make range of x (0 to 359) and y (-90 to 89)
x, y = np.meshgrid(range(0, 360), range(-90, 90))
colors = [a very long list (64800 values, one for each point)]
print(colors)
plt.scatter(x, y, c=colors)
plt.colorbar()
plt.show()
Errors:
Traceback (most recent call last):
File "C:\python3.6.6\lib\site-packages\matplotlib\colors.py", line 158, in to_rgba
rgba = _colors_full_map.cache[c, alpha]
KeyError: (1.0986122886681098, None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\python3.6.6\lib\site-packages\matplotlib\axes\_axes.py", line 4210, in scatter
colors = mcolors.to_rgba_array(c)
File "C:\python3.6.6\lib\site-packages\matplotlib\colors.py", line 259, in to_rgba_array
result[i] = to_rgba(cc, alpha)
File "C:\python3.6.6\lib\site-packages\matplotlib\colors.py", line 160, in to_rgba
rgba = _to_rgba_no_colorcycle(c, alpha)
File "C:\python3.6.6\lib\site-packages\matplotlib\colors.py", line 211, in _to_rgba_no_colorcycle
raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))
ValueError: Invalid RGBA argument: 1.0986122886681098
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/amit neumark/Documents/עמית/alpha/grbs data/grbs/find_burst_location.py", line 168, in <module>
main()
File "C:/Users/amit neumark/Documents/עמית/alpha/grbs data/grbs/find_burst_location.py", line 161, in main
ra2, dec2 = chi_square(model, relations)
File "C:/Users/amit neumark/Documents/עמית/alpha/grbs data/grbs/find_burst_location.py", line 33, in chi_square
create_plot(sums)
File "C:/Users/amit neumark/Documents/עמית/alpha/grbs data/grbs/find_burst_location.py", line 134, in create_plot
plt.scatter(x, y, c=colors)
File "C:\python3.6.6\lib\site-packages\matplotlib\pyplot.py", line 2793, in scatter
verts=verts, edgecolors=edgecolors, data=data, **kwargs)
File "C:\python3.6.6\lib\site-packages\matplotlib\__init__.py", line 1785, in inner
return func(ax, *args, **kwargs)
File "C:\python3.6.6\lib\site-packages\matplotlib\axes\_axes.py", line 4223, in scatter
.format(nc=n_elem, xs=x.size, ys=y.size)
ValueError: 'c' argument has 64800 elements, which is not acceptable for use with 'x' with size 64800, 'y' with size 64800.
The problem is in your x and y data, not in the colors (c) parameter. Your x and y are currently 2D arrays (from meshgrid), but they should be a list of positions. One way to get that is to flatten the 2D meshgrids into 1D arrays; the one-to-one correspondence between x and y data points is preserved. The meshgrids work as they are for 3D scatter plots.
I am using random colors here to illustrate the solution.
import numpy as np
import matplotlib.pyplot as plt
x, y = np.meshgrid(range(0, 360), range(-90, 90))
colors = np.random.random(360*180)
plt.scatter(x.flatten(), y.flatten(), c=colors)
plt.colorbar()
plt.show()
It might make more sense to plot using something like imshow or pcolormesh, which create a "heatmap" across a grid of x,y coordinates. The x,y meshgrid is optional for these functions.
colors = np.arange(64800)
plt.pcolormesh(colors.reshape(360, 180).T)
# OR #
x, y = np.meshgrid(range(0, 360), range(-90, 90))
plt.pcolormesh(x, y, colors.reshape(360, 180).T)
Pay attention to how you reshape colors: you can fill either by rows or by columns. NumPy's default is row-major (C) order, in which the last axis varies fastest. This also matters in the other answer, where the meshgrids are flattened.
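A small sketch of that ordering issue on a 4 x 3 stand-in grid (instead of 360 x 180), with a hypothetical position-dependent colour value so that any misalignment would be visible. Flattening keeps x, y and the colour values aligned point by point, and a colour list ordered x-major (all y values for the first x, then the next x) needs the reshape-then-transpose used in the pcolormesh answer.

```python
import numpy as np

# Small stand-in grid: 4 x values and 3 y values instead of 360 and 180
xs = np.arange(4)
ys = np.arange(-1, 2)
x, y = np.meshgrid(xs, ys)   # both shape (3, 4): rows follow y, columns follow x

# Hypothetical position-dependent colour value, so ordering mistakes would show up
grid = 10 * x + y            # shape (3, 4), same layout as the meshgrid

# Flattening is row-major (C order): x varies fastest, and x.ravel(), y.ravel()
# and grid.ravel() stay aligned point by point, as scatter needs
flat = grid.ravel()
print(flat.tolist())

# A 1-D colour list ordered x-major must be reshaped to (len(xs), len(ys))
# and transposed, like colors.reshape(360, 180).T in the answer above
x_major = np.array([10 * xi + yi for xi in xs for yi in ys])
print(np.array_equal(x_major.reshape(len(xs), len(ys)).T, grid))
```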

ValueError when using lmfit LognormalModel

I have been using lmfit for about a day now and needless to say I know very little about the library. I have been using several built-in models for curve fitting and all of them work flawlessly with the data except the Lognormal Model.
Here is my code:
from numpy import *
from lmfit.models import LognormalModel
import pandas as pd
import scipy.integrate as integrate
import matplotlib.pyplot as plt
data = pd.read_csv('./data.csv', delimiter = ",")
x = data.ix[:, 0]
y = data.ix[:, 1]
print (x)
print (y)
mod = LognormalModel()
pars = mod.guess(y, x=x)
out = mod.fit(y, pars , x=x)
print(out.best_values)
print(out.fit_report(min_correl=0.25))
out.plot()
plt.plot(x, y, 'bo')
plt.plot(x, out.init_fit, 'k--')
plt.plot(x, out.best_fit, 'r-')
plt.show()
and the error output is:
Traceback (most recent call last):
File "Cs_curve_fit.py", line 17, in <module>
pvout = pvmod.fit(y, amplitude= 1, center = 1, sigma =1 , x=x)
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\model.py", line 731, in fit
output.fit(data=data, weights=weights)
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\model.py", line 944, in fit
self.init_fit = self.model.eval(params=self.params, **self.userkws)
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\model.py", line 569, in eval
return self.func(**self.make_funcargs(params, kwargs))
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\lineshapes.py", line 162, in lognormal
x[where(x <= 1.e-19)] = 1.e-19
File "C:\Users\NAME\Anaconda3\lib\site-packages\pandas\core\series.py", line 773, in __setitem__
setitem(key, value)
File "C:\Users\NAME\Anaconda3\lib\site-packages\pandas\core\series.py", line 755, in setitem
raise ValueError("Can only tuple-index with a MultiIndex")
ValueError: Can only tuple-index with a MultiIndex
First, the error message you show cannot have come from the code you post. The error message says that line 17 of the file "Cs_curve_fit.py" reads
pvout = pvmod.fit(y, amplitude= 1, center = 1, sigma =1 , x=x)
but that is not anywhere in your code. Please post the actual code and the actual output.
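Second, the problem appears to be happening because x reaches lmfit as a pandas Series rather than a 1D NumPy array: the traceback shows lmfit's lognormal function doing x[where(x <= 1.e-19)] = 1.e-19, and the Series rejects that tuple index ("Can only tuple-index with a MultiIndex"). Not being able to trust your code or output, I would suggest converting the data to 1D NumPy arrays yourself as a first test. Lmfit should be able to handle pandas Series, but it only does a simple coercion to 1D NumPy arrays.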
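A sketch of that first test without lmfit itself: read the columns positionally with .iloc (the .ix indexer used in the question was removed in pandas 1.0), convert them to 1D float arrays, and check that the in-place clipping from the traceback now works. The CSV contents are hypothetical, standing in for ./data.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-column data standing in for ./data.csv
csv_text = "x,y\n0.0,0.1\n0.5,1.0\n1.0,2.5\n1.5,2.0\n"
data = pd.read_csv(io.StringIO(csv_text))

# Positional column selection with .iloc, then an explicit copy to a 1-D float ndarray
x = data.iloc[:, 0].to_numpy(dtype=float, copy=True)
y = data.iloc[:, 1].to_numpy(dtype=float, copy=True)

# The in-place clipping seen in the traceback works on an ndarray
x[np.where(x <= 1.e-19)] = 1.e-19
print(type(x).__name__, x.shape, x[0])
```

The arrays can then be passed to the model as before, e.g. mod.fit(y, pars, x=x).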
