I'm working through the following machine learning tutorial:
http://machinelearningmastery.com/machine-learning-in-python-step-by-step/
I'm on Section 4.2, and my code throws this error:
NameError: name 'scatter_matrix' is not defined
Here is my code:
import pandas
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
scatter_matrix(dataset)
plt.show()
There's at least one Stack Overflow question on scatter_matrix ("Pandas scatter_matrix - plot categorical variables"), but I haven't been able to figure out what's missing.
You will have to import it like this:
from pandas.plotting import scatter_matrix
Alternatively, since you've already imported pandas as pd, you could call it like this:
pd.scatter_matrix(dataset)
However, pandas.scatter_matrix() is deprecated and is no longer available in recent pandas versions, so use pandas.plotting.scatter_matrix() instead.
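A minimal runnable sketch of the fix follows. So that it runs without downloading the UCI file, it builds a small random stand-in with the tutorial's column names (the values are made up); in the real script, keep the pandas.read_csv(url, names=names) line and just add the import.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix  # the import the original code was missing

# Synthetic stand-in for the iris data, using the tutorial's column names
rng = np.random.default_rng(0)
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(rng.uniform(0.1, 8.0, size=(20, 4)), columns=names)
dataset["class"] = ["Iris-setosa", "Iris-versicolor"] * 10

# Only numeric columns are plotted, so 'class' is skipped and the grid is 4 x 4
axes = scatter_matrix(dataset)
plt.close("all")  # in an interactive session, call plt.show() instead
```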
I am trying to plot a DataFrame obtained with get_data_yahoo from pandas_datareader.data, using matplotlib.pyplot in a Python IDE, but I get a KeyError for the x-coordinate in prices.plot no matter what I try. Please help!
I have tried this:
import matplotlib.pyplot as plt
from pandas import Series,DataFrame
import pandas_datareader.data as pdweb
import datetime
prices=pdweb.get_data_yahoo(['CVX','XOM','BP'],start=datetime.datetime(2020,2,24),
end=datetime.datetime(2020,3,20))['Adj Close']
prices.plot(x="Date",y=["CVX","XOM","BP"])
plt.imshow()
plt.show()
I have also tried this:
prices=DataFrame(prices.to_dict())
prices.plot(x="Timestamp",y=["CVX","XOM","BP"])
plt.imshow()
plt.show()
P.S.: I am also getting some kind of warning; please explain it if you can :)
The issue is that Date isn't an actual column in the downloaded data; it's the index. So just run this before plotting:
prices = prices.reset_index()
This converts the index into a regular column and generates a new integer-labelled index.
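The sketch below illustrates the reset_index fix on a small hand-made frame shaped like the 'Adj Close' data. The prices are invented, and the index name Date is assumed to match what the download returns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the downloaded 'Adj Close' frame: dates live in the index
dates = pd.date_range("2020-02-24", periods=5, freq="D", name="Date")
prices = pd.DataFrame(
    {"CVX": [100.0, 98.5, 97.2, 95.0, 96.1],
     "XOM": [55.0, 54.2, 53.8, 52.9, 53.5],
     "BP": [35.0, 34.6, 34.1, 33.7, 33.9]},
    index=dates,
)

print("Date" in prices.columns)  # False: Date is the index name, not a column

prices = prices.reset_index()    # move the index into a regular 'Date' column
print(prices.columns.tolist())   # ['Date', 'CVX', 'XOM', 'BP']

# x="Date" now refers to a real column, so no KeyError
ax = prices.plot(x="Date", y=["CVX", "XOM", "BP"])
plt.close(ax.figure)
```

Alternatively, leave the index in place and omit x= entirely: DataFrame.plot uses the index as the x-axis by default.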
As for the warnings: pandas emits a lot of them, and they can be annoying! You can turn them off with the standard-library warnings module:
import warnings
warnings.filterwarnings('ignore')
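If you'd rather not silence everything, you can filter a single warning category inside a warnings.catch_warnings() block, which restores the previous filter settings when it exits. In this sketch, noisy() is a hypothetical stand-in for whichever pandas call triggers the warning.

```python
import warnings

def noisy():
    # Hypothetical stand-in for library code that emits a deprecation-style warning
    warnings.warn("this API will change", FutureWarning)
    return 42

# Suppress only FutureWarning, and only inside this block
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy()
```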
I have a CSV file that I load into Jupyter, and I am trying to delete multiple columns at once. I thought del would be the best approach, but I can't get it to work:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
tmbd_movies = pd.read_csv('tmdb-movies.csv')
tmbd_movies.head()
del(tmbd_movies['imdb_id','homepage','tagline','keywords','overview'])
The goal was to remove the following columns:
imdb_id, homepage, tagline, keywords, overview
You want this:
tmbd_movies.drop(columns=['imdb_id', 'homepage', 'tagline', 'keywords', 'overview'], inplace=True)
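As for why del didn't work: del removes one column per key, and df['a', 'b', ...] is treated as a lookup of a single column whose label is the tuple ('a', 'b', ...), which doesn't exist. The sketch below contrasts the two approaches on a tiny hypothetical frame with a few of the TMDB column names; it assigns the result of drop rather than using inplace=True, which is equivalent here.

```python
import pandas as pd

# Hypothetical miniature of the TMDB file with a few of its columns
movies = pd.DataFrame({
    "imdb_id": ["tt0001", "tt0002"],
    "homepage": ["a", "b"],
    "tagline": ["x", "y"],
    "original_title": ["Movie A", "Movie B"],
})

# del works, but only for one column label at a time
del movies["tagline"]

# drop(columns=...) removes several columns at once and returns a new DataFrame
movies = movies.drop(columns=["imdb_id", "homepage"])
print(movies.columns.tolist())  # ['original_title']
```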
(My first ever Stack Overflow question.)
I'm trying to plot bitcoin's market cap against the date using pandas and matplotlib in Python.
Here is my code:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#read in CSV file using Pandas built in method
df = pd.read_csv("btc.csv", index_col=0, parse_dates=True)
Here are some details about the data frame:
[Image: DataFrame details]
matplotlib code:
#Plot marketcap(usd)
plt.plot(df.index, df["marketcap(USD)"])
plt.show()
Result:
[Image: incorrect result]
The plot looks like scribbles that move backwards. How can I fix this?
You can plot your Pandas Series "marketcap(USD)" directly using:
df["marketcap(USD)"].plot()
See the Pandas documentation on Basic Plotting
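If you want to keep the plain matplotlib version, a likely cause (assuming the rows in btc.csv are not in chronological order; the file isn't shown, but many price exports are newest-first) is that plt.plot joins points in the order they appear, so the line doubles back on itself. Sorting the DatetimeIndex first should fix that. The data below is invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical market-cap data stored out of chronological order
df = pd.DataFrame(
    {"marketcap(USD)": [3.0e11, 2.8e11, 2.9e11, 2.5e11]},
    index=pd.to_datetime(["2018-01-04", "2018-01-02", "2018-01-03", "2018-01-01"]),
)

# plt.plot draws segments between consecutive rows, so sort by date first
df = df.sort_index()

fig, ax = plt.subplots()
ax.plot(df.index, df["marketcap(USD)"])
plt.close(fig)
```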
I am trying to work with Dask because my DataFrame has become too large for pandas alone to process. I read my dataset in as follows, but the result looks odd and I'm not sure why it isn't outputting a DataFrame:
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
import dask.bag as db
import json
%matplotlib inline
Leads = db.read_text('Leads 6.4.18.txt')
Leads
This returns (instead of my pandas dataframe):
dask.bag<bag-fro..., npartitions=1>
Then, when I try to rename a few columns:
Leads_updated = Leads.rename(columns={'Business Type': 'Business_Type', 'Lender Type': 'Lender_Type'})
Leads_updated
I get:
AttributeError: 'Bag' object has no attribute 'rename'
Can someone please explain what I am doing wrong? The objective is to use Dask for all these steps, since the data is too big for regular Python/pandas. My understanding is that Dask's syntax should be the same as pandas'.
I'm having an import issue with pandas.tools.plotting. I try to import scatter_matrix via
from pandas.tools.plotting import scatter_matrix
But I get the following error message from Visual Studio Code:
[pylint] E0611:No name 'scatter_matrix' in module 'pandas.tools.plotting'
I also tried
from pandas.tools import scatter_matrix
but it didn't work either. Why can't I import the scatter matrix?
I am using:
Python 3.6.4
pandas 0.22.0
pandas.tools.plotting was deprecated in pandas 0.20 in favor of pandas.plotting, so import scatter_matrix from there, as shown in the pandas visualization docs:
from pandas.plotting import scatter_matrix
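For example, with X a feature DataFrame, y the class labels used for coloring, and cmap a matplotlib colormap defined elsewhere:
scatter = pd.plotting.scatter_matrix(X, c=y, marker='o', s=40, hist_kwds={'bins': 15}, figsize=(9, 9), cmap=cmap)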
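For a self-contained version of that call, here is a sketch with hypothetical random features X, integer labels y, and the built-in 'viridis' colormap standing in for cmap. scatter_matrix passes extra keyword arguments such as c, s, and cmap through to matplotlib's scatter, so points are colored by label.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
# Hypothetical feature matrix X and integer class labels y
X = pd.DataFrame(rng.normal(size=(30, 3)), columns=["f1", "f2", "f3"])
y = rng.integers(0, 3, size=30)

# Returns a 2-D array of Axes, one per pair of columns
axes = scatter_matrix(X, c=y, marker="o", s=40, hist_kwds={"bins": 15},
                      figsize=(9, 9), cmap="viridis")
plt.close("all")
```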