I am new to Python and have a problem that I want to solve.
I used the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
path='https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/automobileEDA.csv'
df = pd.read_csv(path)
sns.regplot(x="engine-size", y="price", data=df)
plt.ylim(0,)
When I run the code the I don't get the fitted regression line, only the Scatterplot shows up.
I also get following error:
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
Can someone help?
Related
I am new to python and I am trying to create a heatmap off of a pivot table. Below is the code I am using. There are NaN values in my df but I made them 0's and I am still getting the same answer. I have also read that it has something to do with the way I have named my file similar to python's standard library but I do not recall doing this or know how to change it.
sns.heatmap(pivot, annot = True)
plt.yticks(rotation=0)
plt.xticks(rotation=90)
plt.show()
The error looks like this:
ImportError: cannot import name 'roperator'
These are the import I have done:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import matplotlib.pyplot as plt
import seaborn as sns
(My first ever StackOverflow question)
I'm trying to plot bitcoin's market-cap against the date using pandas and matplotlib in Python.
Here is my code:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#read in CSV file using Pandas built in method
df = pd.read_csv("btc.csv", index_col=0, parse_dates=True)
Here are some details about the data frame:
dataframe details
matplotlib code:
#Plot marketcap(usd)
plt.plot(df.index, df["marketcap(USD)"])
plt.show()
Result:
Incorrect result
The plot seems to be more like scribbles that seem to move backwards. How could I fix this?
You can plot your Pandas Series "marketcap(USD)" directly using:
df["marketcap(USD)"].plot()
See the Pandas documentation on Basic Plotting
I am trying to work with Dask because my dataframe has become large and that pandas by itself can't simply process it. I read my dataset in as follows and get the following result that looks odd, not sure why its not outputting the dataframe:
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
import dask.bag as db
import json
%matplotlib inline
Leads = db.read_text('Leads 6.4.18.txt')
Leads
This returns (instead of my pandas dataframe):
dask.bag<bag-fro..., npartitions=1>
Then when I try to rename a few columns:
Leads_updated = Leads.rename(columns={'Business Type':'Business_Type','Lender
Type':'Lender_Type'})
Leads_updated
I get:
AttributeError: 'Bag' object has no attribute 'rename'
Can someone please explain what I am not doing correctly. The ojective is to just use Dask for all these steps since it is too big for regular Python/Pandas. My understanding is the syntax used under Dask should be the same as Pandas.
I am currently facing an import issue with pandas.tools.plotting. I try to import the scatter matrix via
from pandas.tools.plotting import scatter_matrix
But I get the following error message from visual studio code:
[pylint] E0611:No name 'scatter_matrix' in module
'pandas.tools.plotting'
I also tried
from pandas.tools import scatter_matrix
but it didn't work either. Why can't I import the scatter matrix?
I am using
python 3.6.4
pandas 0.22.0
You need to use this line of code to import pandas scatter_matrix. As seen in the docs of pandas visualization.
from pandas.plotting import scatter_matrix
e.g.
scatter = pd.plotting.scatter_matrix(X, c = y, marker = 'o', s=40, hist_kwds={'bins':15}, figsize=(9,9), cmap = cmap)
Working through following the Machine Learning Tutorial:
http://machinelearningmastery.com/machine-learning-in-python-step-by-step/
Specifically, Section 4.2. Unfortunately, my code is throwing an error
NameError: name 'scatter_matrix' is not defined
Here is my code:
import pandas
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
scatter_matrix(dataset)
plt.show()
There's at least one Stack Overflow question on scatter_matrix, but I haven't able to figure out what's missing.
Pandas scatter_matrix - plot categorical variables
You will have to import it like this:
from pandas.plotting import scatter_matrix
Cause you've imported the Pandas. You could use it like below:
pd.scatter_matrix(dataset)
However, pandas.scatter_matrix() is deprecated. use pandas.plotting.scatter_matrix() instead