Linear regression (Plotting a regression line)

I am trying to plot a regression line for my assignment as shown below.
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
sb.set()
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
#Structure of data that I am working with
hsedata = pd.read_csv('train.csv')
salePrice = pd.DataFrame(hsedata['SalePrice'])
grLivArea = pd.DataFrame(hsedata['GrLivArea'])
SPtrain, SPtest, LAtrain, LAtest = train_test_split(salePrice, grLivArea, test_size = 0.2)
[Screenshots: SPtrain output, LAtrain output]
#Linear Regression
linreg = LinearRegression()
linreg.fit(LAtrain, SPtrain)
print('Intercept: ', linreg.intercept_)
print('Gradient: ', linreg.coef_)
Intercept: [22485.24894138]
Gradient: [[103.92233101]]
salePricePredict = pd.DataFrame(linreg.predict(LAtrain))
print(salePricePredict.shape)
f = plt.figure(figsize=(16,8))
plt.scatter(LAtrain, SPtrain)
plt.scatter(LAtrain, salePricePredict, color = 'r')
plt.show()
[Screenshot: scatterplot regression line]
Up to this point, I managed to plot the regression line with plt.scatter() without any issue, as shown above. However, I want to draw a straight line instead, using the code below:
salePricePredict = pd.DataFrame(linreg.predict(LAtrain))
f = plt.figure(figsize=(16,8))
plt.scatter(LAtrain, SPtrain)
plt.plot(LAtrain, salePricePredict, color = 'r')
plt.show()
But this produces the following error:
TypeError Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~\anaconda3\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~\anaconda3\lib\site-packages\pandas\_libs\index.pyx:142, in pandas._libs.index.IndexEngine.get_loc()
TypeError: '(slice(None, None, None), None)' is an invalid key
During handling of the above exception, another exception occurred:
InvalidIndexError Traceback (most recent call last)
Input In [24], in <cell line: 4>()
2 f = plt.figure(figsize=(16,8))
3 plt.scatter(LAtrain, SPtrain)
----> 4 plt.plot(LAtrain, salePricePredict, color = 'r')
5 plt.show()
File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:2757, in plot(scalex, scaley, data, *args, **kwargs)
2755 #_copy_docstring_and_deprecators(Axes.plot)
2756 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 2757 return gca().plot(
2758 *args, scalex=scalex, scaley=scaley,
2759 **({"data": data} if data is not None else {}), **kwargs)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:1632, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1390 """
1391 Plot y versus x as lines and/or markers.
1392
(...)
1629 (``'green'``) or hex strings (``'#008000'``).
1630 """
1631 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1632 lines = [*self._get_lines(*args, data=data, **kwargs)]
1633 for line in lines:
1634 self.add_line(line)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:312, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
310 this += args[0],
311 args = args[1:]
--> 312 yield from self._plot_args(this, kwargs)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:487, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs)
484 kw[prop_name] = val
486 if len(xy) == 2:
--> 487 x = _check_1d(xy[0])
488 y = _check_1d(xy[1])
489 else:
File ~\anaconda3\lib\site-packages\matplotlib\cbook\__init__.py:1327, in _check_1d(x)
1321 with warnings.catch_warnings(record=True) as w:
1322 warnings.filterwarnings(
1323 "always",
1324 category=Warning,
1325 message='Support for multi-dimensional indexing')
-> 1327 ndim = x[:, None].ndim
1328 # we have definitely hit a pandas index or series object
1329 # cast to a numpy array.
1330 if len(w) > 0:
File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3628, in Index.get_loc(self, key, method, tolerance)
3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
-> 3628 self._check_indexing_error(key)
3629 raise
3631 # GH#42269
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:5637, in Index._check_indexing_error(self, key)
5633 def _check_indexing_error(self, key):
5634 if not is_scalar(key):
5635 # if key is not a scalar, directly raise an error (the code below
5636 # would convert to numpy arrays and raise later any way) - GH29926
-> 5637 raise InvalidIndexError(key)
InvalidIndexError: (slice(None, None, None), None)
However, if I add .squeeze() to both arguments of .plot(), as shown below, it works as intended:
salePricePredict = pd.DataFrame(linreg.predict(LAtrain))
f = plt.figure(figsize=(16,8))
plt.scatter(LAtrain, SPtrain)
plt.plot(LAtrain.squeeze(), salePricePredict.squeeze(), color = 'r')
plt.show()
[Screenshot: plot with .squeeze() added]
I was wondering why this is the case, and what .squeeze() does to my input data type. I've tried reading some documentation and explanations of what it does, but I got nowhere with it.
I've also tried comparing the data types I was working with by using .shape, but .squeeze() does not seem to change anything. Any explanation is greatly appreciated!

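Squeezing removes dimensions of size 1. For a NumPy array this turns shape (1, 3) into (3,); for a pandas DataFrame with a single column, .squeeze() returns a Series.
Both LAtrain and your prediction are single-column DataFrames, i.e. two-dimensional objects. The traceback shows that matplotlib checks its input with x[:, None], which a DataFrame treats as a column lookup and rejects with InvalidIndexError. After .squeeze(), plot() receives one-dimensional Series and works.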
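Where the two-dimensional prediction comes from: because the target passed to fit() is a one-column DataFrame, scikit-learn stores coef_ with shape (1, 1) and predict() returns an (n, 1) array. A minimal sketch, using small synthetic data in place of train.csv (the column names are reused only for illustration), shows this and checks that predict() is just intercept + gradient * x:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for LAtrain / SPtrain (not the real house-price data):
# sale price = 20000 + 100 * living area, exactly.
LAtrain = pd.DataFrame({"GrLivArea": [800.0, 1200.0, 1500.0, 2000.0, 2600.0]})
SPtrain = pd.DataFrame({"SalePrice": 20000.0 + 100.0 * LAtrain["GrLivArea"]})

linreg = LinearRegression()
linreg.fit(LAtrain, SPtrain)

# With a single-column DataFrame as the target, coef_ has shape (1, 1)
# and intercept_ has shape (1,), matching the [[...]] and [...] printouts.
print(linreg.coef_.shape, linreg.intercept_.shape)

# predict() returns an (n, 1) array: intercept + gradient * x for each row.
pred = linreg.predict(LAtrain)
manual = linreg.intercept_ + LAtrain.to_numpy() @ linreg.coef_.T
print(pred.shape, np.allclose(pred, manual))
```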
Try doing the following:
X = np.array([[1,2,3]])
print(X.shape)
print(X.squeeze().shape)
Notice that X has shape (1, 3) before .squeeze() and (3,) after it; it has become a one-dimensional vector.
You can see the same effect with plt.plot():
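The pandas method behaves analogously. A short sketch with made-up values shows that .squeeze() on a single-column DataFrame returns a Series, and that it returns a new object rather than modifying the DataFrame it is called on:

```python
import pandas as pd

# A one-column DataFrame, like LAtrain or salePricePredict (values are made up)
frame = pd.DataFrame({"GrLivArea": [856, 1262, 1786]})

series = frame.squeeze()  # drops the length-1 column axis

print(type(frame).__name__, frame.shape)    # DataFrame (3, 1)
print(type(series).__name__, series.shape)  # Series (3,)
# squeeze() returns a new object; the original DataFrame is unchanged
print(frame.shape)                          # (3, 1)
```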
a = np.array([[1,2,3]])
b = np.array([1, 2, 3])
plt.plot(a.squeeze(), b) # This works
plt.plot(a,b) # This throws a ValueError
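Applied to the original regression plot, a minimal sketch (synthetic data standing in for train.csv, and the non-interactive Agg backend so it runs without a display) converts both single-column DataFrames to 1-D before calling plot(). .squeeze() and .to_numpy().ravel() are shown as two interchangeable ways to do that:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for the GrLivArea / SalePrice columns
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=50)
LAtrain = pd.DataFrame({"GrLivArea": area})
SPtrain = pd.DataFrame({"SalePrice": 20000 + 100 * area + rng.normal(0, 20000, size=50)})

linreg = LinearRegression().fit(LAtrain, SPtrain)
salePricePredict = pd.DataFrame(linreg.predict(LAtrain))

# Convert the single-column DataFrames to 1-D before plotting:
# .squeeze() gives a Series, .to_numpy().ravel() gives a flat NumPy array.
x = LAtrain.squeeze()
y = salePricePredict.to_numpy().ravel()

fig, ax = plt.subplots(figsize=(16, 8))
ax.scatter(x, SPtrain.squeeze())
line, = ax.plot(x, y, color="r")  # the predictions all lie on one straight line
```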

Related

Issue with Trackpy - InvalidIndexError when fitting EMSD to a power law from the tutorial

Possibly a silly problem, but I'm stuck and I'd really appreciate someone's help. I'm trying to replicate the examples given in the walkthrough tutorial for trackpy (https://soft-matter.github.io/trackpy/dev/tutorial/walkthrough.html). I've been successful up to the point where I had to fit the ensemble mean-squared displacement to a power law (In [34]):
plt.figure()
plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
plt.xlabel('lag time $t$');
tp.utils.fit_powerlaw(em) # performs linear best fit in log space, plots
This is the error message I get. Can someone suggest a solution?
Thanks!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx:142, in pandas._libs.index.IndexEngine.get_loc()
TypeError: '(slice(None, None, None), None)' is an invalid key
During handling of the above exception, another exception occurred:
InvalidIndexError Traceback (most recent call last)
Input In [178], in <cell line: 4>()
2 plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
3 plt.xlabel('lag time $t$');
----> 4 tp.utils.fit_powerlaw(em)
File ~/opt/anaconda3/lib/python3.9/site-packages/trackpy/utils.py:50, in fit_powerlaw(data, plot, **kwargs)
48 if plot:
49 from trackpy import plots
---> 50 plots.fit(data, fits, logx=True, logy=True, legend=False, **kwargs)
51 return values
File ~/opt/anaconda3/lib/python3.9/site-packages/trackpy/plots.py:50, in make_axes.<locals>.wrapper(*args, **kwargs)
47 # Delete legend keyword so remaining ones can be passed to plot().
48 legend = kwargs.pop('legend', False)
---> 50 result = func(*args, **kwargs)
52 if legend:
53 handles, labels = kwargs['ax'].get_legend_handles_labels()
File ~/opt/anaconda3/lib/python3.9/site-packages/trackpy/plots.py:652, in fit(data, fits, inverted_model, logx, logy, ax, **kwargs)
650 ax.set_yscale('log')
651 if not inverted_model:
--> 652 fitlines = ax.plot(fits.index, fits, **kwargs)
653 else:
654 fitlines = ax.plot(fits.reindex(data.dropna().index),
655 data.dropna(), **kwargs)
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/axes/_axes.py:1632, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1390 """
1391 Plot y versus x as lines and/or markers.
1392
(...)
1629 (``'green'``) or hex strings (``'#008000'``).
1630 """
1631 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1632 lines = [*self._get_lines(*args, data=data, **kwargs)]
1633 for line in lines:
1634 self.add_line(line)
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/axes/_base.py:312, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
310 this += args[0],
311 args = args[1:]
--> 312 yield from self._plot_args(this, kwargs)
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/axes/_base.py:488, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs)
486 if len(xy) == 2:
487 x = _check_1d(xy[0])
--> 488 y = _check_1d(xy[1])
489 else:
490 x, y = index_of(xy[-1])
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/cbook/__init__.py:1327, in _check_1d(x)
1321 with warnings.catch_warnings(record=True) as w:
1322 warnings.filterwarnings(
1323 "always",
1324 category=Warning,
1325 message='Support for multi-dimensional indexing')
-> 1327 ndim = x[:, None].ndim
1328 # we have definitely hit a pandas index or series object
1329 # cast to a numpy array.
1330 if len(w) > 0:
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3628, in Index.get_loc(self, key, method, tolerance)
3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
-> 3628 self._check_indexing_error(key)
3629 raise
3631 # GH#42269
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:5637, in Index._check_indexing_error(self, key)
5633 def _check_indexing_error(self, key):
5634 if not is_scalar(key):
5635 # if key is not a scalar, directly raise an error (the code below
5636 # would convert to numpy arrays and raise later any way) - GH29926
-> 5637 raise InvalidIndexError(key)
InvalidIndexError: (slice(None, None, None), None)
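The comment on the fit_powerlaw call says it performs a linear best fit in log space. As an illustration of that idea only (not trackpy's own code, and with synthetic data rather than the tutorial's em), fitting msd = A * t**n reduces to fitting a straight line to log(msd) against log(t), whose slope is n and whose intercept is log(A):

```python
import numpy as np

# Synthetic "MSD" that follows an exact power law: msd = 2.0 * t**0.5
lag_time = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0])
msd = 2.0 * lag_time ** 0.5

# log(msd) = n * log(t) + log(A): a straight line in log-log space.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope, intercept = np.polyfit(np.log(lag_time), np.log(msd), deg=1)
n = slope
A = np.exp(intercept)
print(n, A)  # approximately 0.5 and 2.0
```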

Making a sns.pairplot using scikit wine dataset

This seems simple enough, but I can't find a solution online.
I am trying to create an sns.pairplot in Python. I have downloaded the wine dataset, kept the features that I need, and run the plot.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.datasets import load_wine
# Load the wine dataset
wine = load_wine()
wine = list(zip(wine.data, wine.target))
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
#This is the code that should run the plot
b=sns.pairplot(df, vars = df.columns[1 :], hue = "target", height = 2.5)
But I get this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'target'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-108-1107acc27949> in <module>
----> 1 b=sns.pairplot(df, vars = df.columns[1 :], hue = "target", height = 2.5)
2
3 plt.show()
~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~\anaconda3\lib\site-packages\seaborn\axisgrid.py in pairplot(data, hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers, height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
1923 # Set up the PairGrid
1924 grid_kws.setdefault("diag_sharey", diag_kind == "hist")
-> 1925 grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue,
1926 hue_order=hue_order, palette=palette, corner=corner,
1927 height=height, aspect=aspect, dropna=dropna, **grid_kws)
~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~\anaconda3\lib\site-packages\seaborn\axisgrid.py in __init__(self, data, hue, hue_order, palette, hue_kws, vars, x_vars, y_vars, corner, diag_sharey, height, aspect, layout_pad, despine, dropna, size)
1212 index=data.index)
1213 else:
-> 1214 hue_names = categorical_order(data[hue], hue_order)
1215 if dropna:
1216 # Filter NA from the list of unique hue names
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'target'
The solution linked to this question: How to convert a Scikit-learn dataset to a Pandas dataset unfortunately doesn't seem to work here.
I also tried 'class' instead of 'target'. Could it be that the zip function above isn't working correctly, so the program can't identify 'target'?
Thank you in advance!

From what you typed, it works like this:
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_wine
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
#This is the code that should run the plot
b=sns.pairplot(df, vars = df.columns[1 :], height = 2.5)
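The question is which features you want to highlight, and why.
The DataFrame you build contains only the feature columns, so there is no 'target' column for hue to refer to; that is what the KeyError says. Note also that df.columns[1:] cuts alcohol, the first feature, from the plot.
Second, a pairplot plots features against each other, not against the target or class.
So all in all, I don't understand what you are trying to do here.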

Irrelevant columns breaking dask groupby vars, but not means (and not in pandas)

Below is an MCVE of the behavior of an up-to-date dask installation on my system. It contrasts the dask error with pandas, using the small CSV quoted at the end of this question.
Setup:
import numpy as np
import dask.dataframe as dd
df = dd.read_csv('hmda_lar_head_4var.csv',
engine='c',
usecols=['tract_to_msamd_income','as_of_year','agency_abbr','action_taken_name'],
dtype={'tract_to_msamd_income': np.float64,
'as_of_year':np.uint16,
'agency_abbr':'category',
'action_taken_name':'category'
})
The line that breaks: df.groupby('as_of_year').var().compute()
The error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname)
136 try:
--> 137 yield
138 except Exception as e:
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/core.py in _emulate(func, *args, **kwargs)
3058 with raise_on_meta_error(funcname(func)):
-> 3059 return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
3060
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/compatibility.py in apply(func, args, kwargs)
46 if kwargs:
---> 47 return func(*args, **kwargs)
48 else:
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/groupby.py in _var_agg(g, levels, ddof)
210 result /= div
--> 211 result[(n - ddof) == 0] = np.nan
212 assert isinstance(result, pd.DataFrame)
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
2425 elif isinstance(key, DataFrame):
-> 2426 self._setitem_frame(key, value)
2427 else:
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/pandas/core/frame.py in _setitem_frame(self, key, value)
2463 self._check_setitem_copy()
-> 2464 self._where(-key, value, inplace=True)
2465
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/pandas/core/generic.py in _where(self, cond, other, inplace, axis, level, try_cast, raise_on_error)
4952 if not is_bool_dtype(dt):
-> 4953 raise ValueError(msg.format(dtype=dt))
4954
ValueError: Boolean array expected for the condition, not float64
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-154-4d73414200e5> in <module>()
----> 1 df.groupby('as_of_year').var().compute()
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/groupby.py in var(self, ddof, split_every, split_out)
796 combine_kwargs={'levels': levels},
797 split_every=split_every, split_out=split_out,
--> 798 split_out_setup=split_out_on_index)
799
800 if isinstance(self.obj, Series):
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/core.py in apply_concat_apply(args, chunk, aggregate, combine, meta, token, chunk_kwargs, aggregate_kwargs, combine_kwargs, split_every, split_out, split_out_setup, split_out_setup_kwargs, **kwargs)
3010 meta_chunk = _emulate(apply, chunk, args, chunk_kwargs)
3011 meta = _emulate(apply, aggregate, [_concat([meta_chunk])],
-> 3012 aggregate_kwargs)
3013 meta = make_meta(meta)
3014
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/core.py in _emulate(func, *args, **kwargs)
3057 """
3058 with raise_on_meta_error(funcname(func)):
-> 3059 return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
3060
3061
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/contextlib.py in __exit__(self, type, value, traceback)
75 value = type()
76 try:
---> 77 self.gen.throw(type, value, traceback)
78 raise RuntimeError("generator didn't stop after throw()")
79 except StopIteration as exc:
/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname)
148 ).format(" in `{0}`".format(funcname) if funcname else "",
149 repr(e), tb)
--> 150 raise ValueError(msg)
151
152
ValueError: Metadata inference failed in `apply`.
Original error is below:
------------------------
ValueError('Boolean array expected for the condition, not float64',)
Traceback:
---------
File "/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/utils.py", line 137, in raise_on_meta_error
yield
File "/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/core.py", line 3059, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/compatibility.py", line 47, in apply
return func(*args, **kwargs)
File "/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/dask/dataframe/groupby.py", line 211, in _var_agg
result[(n - ddof) == 0] = np.nan
File "/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/pandas/core/frame.py", line 2426, in __setitem__
self._setitem_frame(key, value)
File "/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/pandas/core/frame.py", line 2464, in _setitem_frame
self._where(-key, value, inplace=True)
File "/Users/laszlo.sandor/miniconda3/envs/idp/lib/python3.5/site-packages/pandas/core/generic.py", line 4953, in _where
raise ValueError(msg.format(dtype=dt))
Compare this with the behavior of the line that produces correct results: df.groupby('as_of_year').mean().compute()
If you set things up the same way in pandas:
import pandas as pd
df = pd.read_csv('hmda_lar_head_4var.csv',
engine='c',
usecols=['tract_to_msamd_income','as_of_year','agency_abbr','action_taken_name'],
dtype={'tract_to_msamd_income': np.float64,
'as_of_year':np.uint16,
'agency_abbr':'category',
'action_taken_name':'category'
})
You see that both df.groupby('as_of_year').var() and df.groupby('as_of_year').mean() produce the correct results.
If you load only two columns, one you are grouping by and another that is meaningful to report the variance for, dask has no problem reporting the variance:
import numpy as np
import dask.dataframe as dd
df = dd.read_csv('hmda_lar_head_4var.csv',
engine='c',
usecols=['tract_to_msamd_income','as_of_year'],
dtype={'tract_to_msamd_income': np.float64,
'as_of_year':np.uint16
})
This is small comfort, as (I think) you cannot easily specify which columns to compute groupby methods for, only select which ones to report. That is, some columns can break dask groupby var even if you ask to report only other columns.
Here is the CSV for MCVE:
tract_to_msamd_income,as_of_year,agency_abbr,action_taken_name
"85.02999877929688","2007","FRS","Loan originated"
"103.12000274658203","2007","FRS","Application withdrawn by applicant"
"127.87000274658203","2007","FRS","Loan originated"
"103.12000274658203","2007","FRS","Application denied by financial institution"
"131.14999389648438","2007","FRS","Loan originated"
"85.02999877929688","2007","FRS","Application withdrawn by applicant"
"103.12000274658203","2007","FRS","Application withdrawn by applicant"
"95.76000213623047","2007","FRS","Loan originated"
"103.12000274658203","2007","FRS","Application withdrawn by applicant"

ValueError: invalid literal for float(): 17/08/2015

I'm getting this error: "ValueError: invalid literal for float(): 17/08/2015". This is the last row in the file I'm reading, and it follows the same format as the others. The code for the script is below.
I'm wondering: is the error actually occurring throughout the file, and only being flagged for this row because it is the last of the errors?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
rankings = pd.read_csv('data/rankingunitsdata.csv', parse_dates='date')
rankings.plot('date','rankingpos')
x = rankings.date.values
y = rankings.rankingpos.values
plt.plot(x,y, 'o')
plt.xlabel('Ranking Position')
plt.ylabel('Date')
plt.show()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-b6d9eb0809d3> in <module>()
----> 1 plt.plot(x,y, 'o')
2 plt.xlabel('Ranking Position')
3 plt.ylabel('Date')
4 plt.show()
C:\Anaconda3\lib\site-packages\matplotlib\pyplot.py in plot(*args, **kwargs)
3097 ax.hold(hold)
3098 try:
-> 3099 ret = ax.plot(*args, **kwargs)
3100 draw_if_interactive()
3101 finally:
C:\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in plot(self, *args, **kwargs)
1372
1373 for line in self._get_lines(*args, **kwargs):
-> 1374 self.add_line(line)
1375 lines.append(line)
1376
C:\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in add_line(self, line)
1502 line.set_clip_path(self.patch)
1503
-> 1504 self._update_line_limits(line)
1505 if not line.get_label():
1506 line.set_label('_line%d' % len(self.lines))
C:\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _update_line_limits(self, line)
1513 Figures out the data limit of the given line, updating self.dataLim.
1514 """
-> 1515 path = line.get_path()
1516 if path.vertices.size == 0:
1517 return
C:\Anaconda3\lib\site-packages\matplotlib\lines.py in get_path(self)
872 """
873 if self._invalidy or self._invalidx:
--> 874 self.recache()
875 return self._path
876
C:\Anaconda3\lib\site-packages\matplotlib\lines.py in recache(self, always)
573 x = ma.asarray(xconv, np.float_)
574 else:
--> 575 x = np.asarray(xconv, np.float_)
576 x = x.ravel()
577 else:
C:\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
472
473 """
--> 474 return array(a, dtype, copy=False, order=order)
475
476 def asanyarray(a, dtype=None, order=None):
ValueError: could not convert string to float: '17/08/2015'

The error occurs because you are trying to plot data with dates stored as strings on the x-axis, while plt.plot() expects numerical (or datetime) values. Hence it fails when it tries to convert '17/08/2015' to a float, which cannot work.
You need to convert your x-values to datetime objects first, for example with pd.to_datetime(). Older answers then use plt.plot_date, but plot_date is deprecated in recent Matplotlib versions, and plt.plot() accepts datetime values directly.
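A minimal sketch of that conversion, using a few made-up rows in the question's day/month/year format (the real data/rankingunitsdata.csv is not available). read_csv documents parse_dates as a bool, list or dict, so parse_dates=['date'] with dayfirst=True is the documented spelling of what the question attempts; the sketch converts explicitly with pd.to_datetime and a format string instead, which removes any day/month ambiguity:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows in the same dd/mm/yyyy format as the question's file
csv_text = """date,rankingpos
15/08/2015,12
16/08/2015,9
17/08/2015,7
"""
rankings = pd.read_csv(io.StringIO(csv_text))

# Convert the strings to datetimes; the explicit format avoids any
# month/day ambiguity (e.g. 05/08/2015).
rankings["date"] = pd.to_datetime(rankings["date"], format="%d/%m/%Y")

fig, ax = plt.subplots()
ax.plot(rankings["date"], rankings["rankingpos"], "o")
ax.set_xlabel("Date")
ax.set_ylabel("Ranking position")
```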

matplotlib - Error passing line argument through **kwargs

I have a function that plots a line, something like this:
def tmp_plot(*args, **kwargs):
    plt.plot([1,2,3,4,5],[1,2,3,4,5], *args, **kwargs)
and when I call it, passing line as a keyword argument like this:
tmp_plot(line = '-')
I get this error:
TypeError: set_lineprops() got multiple values for keyword argument 'line'
but it works fine with the color argument.
I'm using matplotlib 1.4.3 and python 2.7.7
Any clues?

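You can see in the traceback below where the clash happens: Matplotlib calls its internal set_lineprops(seg, **kwargs), and the error message shows that set_lineprops has a parameter named line, which already receives the line object seg positionally. Your own line keyword argument is therefore a duplicate of it: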
In [1]: import matplotlib.pyplot as plt
In [2]: plt.plot([1,2,3], [1,4,9], line='-')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-82-f298702afcfe> in <module>()
----> 1 plt.plot([1,2,3], [1,4,9], line='-')
/Users/xnx/anaconda/envs/py33/lib/python3.3/site-packages/matplotlib/pyplot.py in plot(*args, **kwargs)
2985 ax.hold(hold)
2986 try:
-> 2987 ret = ax.plot(*args, **kwargs)
2988 draw_if_interactive()
2989 finally:
/Users/xnx/anaconda/envs/py33/lib/python3.3/site-packages/matplotlib/axes.py in plot(self, *args, **kwargs)
4137 lines = []
4138
-> 4139 for line in self._get_lines(*args, **kwargs):
4140 self.add_line(line)
4141 lines.append(line)
/Users/xnx/anaconda/envs/py33/lib/python3.3/site-packages/matplotlib/axes.py in _grab_next_args(self, *args, **kwargs)
317 return
318 if len(remaining) <= 3:
--> 319 for seg in self._plot_args(remaining, kwargs):
320 yield seg
321 return
/Users/xnx/anaconda/envs/py33/lib/python3.3/site-packages/matplotlib/axes.py in _plot_args(self, tup, kwargs)
305 ncx, ncy = x.shape[1], y.shape[1]
306 for j in range(max(ncx, ncy)):
--> 307 seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
308 ret.append(seg)
309 return ret
/Users/xnx/anaconda/envs/py33/lib/python3.3/site-packages/matplotlib/axes.py in _makeline(self, x, y, kw, kwargs)
257 **kw
258 )
--> 259 self.set_lineprops(seg, **kwargs)
260 return seg
261
TypeError: set_lineprops() got multiple values for argument 'line'
In any case, perhaps you meant ls or linestyle instead of line?
In [83]: plt.plot([1,2,3], [1,4,9], ls='-')
Out[83]: [<matplotlib.lines.Line2D at 0x10ed65610>]

I would guess that matplotlib's internals unpack an internal dictionary of parameters in addition to the caller-provided ones, without stripping out duplicates, so both you and matplotlib's internals provide separate keyword arguments with the same name via two parallel routes.
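The clash can be reproduced in plain Python: passing a keyword with the same name as a parameter that is already filled positionally raises "got multiple values for argument". In this sketch set_props is a hypothetical stand-in for the internal helper, not Matplotlib's real code, and tmp_plot is a variant of the question's function that returns the created lines and passes linestyle instead of line:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def set_props(line, **kwargs):
    # Hypothetical stand-in: first parameter is named `line`, like set_lineprops
    return line, kwargs

collision_error = None
try:
    # `line` is filled positionally AND passed as a keyword -> TypeError
    set_props("a Line2D object", line="-")
except TypeError as exc:
    collision_error = str(exc)

def tmp_plot(*args, **kwargs):
    # Forward everything to plt.plot and return the created Line2D objects
    return plt.plot([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], *args, **kwargs)

lines = tmp_plot(linestyle="--")  # a real Line2D property, no collision
```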
