scikit-learn breaks pandas installation - python

I have a problem getting pandas and sklearn to work together. Importing any module from sklearn makes pandas run amok.
This is a minimal example of my problem:
#!/usr/bin/env python
import pandas as pd
import sklearn.metrics as sk
df_train = pd.DataFrame()
print(df_train)
Which prints:
/usr/local/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.
warnings.warn(d.msg, DeprecationWarning)
If I comment out the line that imports sklearn.metrics, everything works correctly.
Help? :}
Jose

You can ignore the warning message with:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning,
module="pandas", lineno=570)
which should be safe for now. As @Jeff notes, it'll be fixed in pandas 0.13.
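The filter above is tied to one pandas line number (570), so it stops matching if that line moves in another pandas release. A minimal sketch of a more targeted alternative: it swaps in a filter on the warning message instead of lineno, and scopes it with warnings.catch_warnings so the global filters are restored afterwards. emit_deprecation() is a hypothetical stand-in for pandas' config code; the block records warnings so you can see which ones get through:

```python
import warnings

def emit_deprecation():
    """Hypothetical stand-in for pandas' config code issuing the 'height' warning."""
    warnings.warn("height has been deprecated.", DeprecationWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make every warning visible for the demo
    # filterwarnings inserts at the front of the filter list, so it wins over "always".
    # message is a regex matched case-insensitively against the start of the warning text.
    warnings.filterwarnings("ignore", message="height has been deprecated",
                            category=DeprecationWarning)
    emit_deprecation()                                   # suppressed by the filter
    warnings.warn("something else", DeprecationWarning)  # still recorded

print(len(caught))  # 1
```

Outside a demo, drop record=True and the simplefilter("always") line, and wrap only the code that triggers the pandas warning.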

Related

Error: No module named 'xlrd'. How to import Excel with Python and pandas properly? Please close this

I just realized that there may be something wrong with my local dev environment.
I tried my code on Colab and it worked well:
import pandas as pd
df = pd.read_excel('hurun-2018-top50.xlsx')
Thank you all.
Please close this question.
------- Original description follows ---------
I am trying to import an Excel file with Python and pandas.
I have already installed the "xlrd" module with pip.
I googled a lot and tried several different methods, but none of them worked.
Here is my code.
import pandas as pd
from pandas import ExcelFile
from pandas import ExcelWriter
df = pd.read_excel('hurun-2018-top50.xlsx', index_col=0)
df = pd.read_excel('hurun-2018-top50.xlsx', sheet_name='Sheet1')
df = pd.read_excel('hurun-2018-top50.xlsx')
Any response will be appreciated.
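When pip reports that xlrd is installed but the import still fails, a common cause (an assumption here, though consistent with the same code working on Colab) is that pip installed the package into a different Python environment than the one running the script. A minimal stdlib-only sketch that prints the running interpreter and whether it can import the Excel reader packages; check_excel_engines is a hypothetical helper name:

```python
import importlib.util
import sys

def check_excel_engines(names=("xlrd", "openpyxl")):
    """Hypothetical helper: map each package name to whether this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# The interpreter actually running this code; pip must install into this one.
print("Interpreter:", sys.executable)
for name, found in check_excel_engines().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package shows as MISSING, install it with that same interpreter (python -m pip install openpyxl, using the path printed above as python). Note also that xlrd 2.0 and later read only legacy .xls files, so recent pandas versions need openpyxl for .xlsx files such as hurun-2018-top50.xlsx.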

Error in data.py module "cannot import name 'wb'"

Pandas has worked fine for me for years. All of a sudden, today, I am getting this error:
File "C:\Users\Excel\Anaconda3\lib\site-packages\dautil\data.py", line 3, in <module>
from pandas.io import wb
ImportError: cannot import name 'wb'
It seems like the error is coming from data.py. Here is a screenshot.
This seemed to happen all of a sudden, and the error is triggered when I run a few different processes that call this process. I uninstalled and re-installed pandas. I am still getting the same error.
The documentation says
Starting in 0.19.0, pandas no longer supports pandas.io.data or
pandas.io.wb, so you must replace your imports from pandas.io with
those from pandas_datareader:
So, as per the documentation, you should be doing this:
from pandas.io import data, wb # becomes
from pandas_datareader import data, wb
Even with pandas_datareader, a similar import error may happen. If that is your case, you have two solutions.
With pandas >= 0.23, make sure that your pandas_datareader is >= 0.7. Alternatively, if for some reason you don't want to upgrade (or downgrade) pandas_datareader, you can do:
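import pandas as pd
# Older pandas_datareader releases import is_list_like from pandas.core.common,
# which pandas 0.23 no longer provides; restore the alias before importing it
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader as web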
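The version rule in this answer can also be checked programmatically. A minimal sketch with hypothetical helper names (version_tuple, needs_is_list_like_patch); it uses a naive numeric parse of version strings rather than a full version parser such as packaging.version:

```python
import re

def version_tuple(version):
    """Hypothetical helper: turn '0.23.4' (or '0.7.0rc1') into (0, 23, 4)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

def needs_is_list_like_patch(pandas_version, datareader_version):
    """Hypothetical helper: True when pandas >= 0.23 is paired with pandas_datareader < 0.7."""
    return (version_tuple(pandas_version) >= (0, 23)
            and version_tuple(datareader_version) < (0, 7))

print(needs_is_list_like_patch("0.23.4", "0.6.0"))  # True
print(needs_is_list_like_patch("0.23.4", "0.7.0"))  # False
```

In practice you would pass pandas.__version__ and pandas_datareader.__version__, and apply the is_list_like patch only when the helper returns True.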

Using Dask with Python causes issues when running Pandas code

I am trying to work with Dask because my dataframe has become so large that pandas by itself can't process it. I read my dataset in as follows and get an odd-looking result; I'm not sure why it's not outputting the dataframe:
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
import dask.bag as db
import json
%matplotlib inline
Leads = db.read_text('Leads 6.4.18.txt')
Leads
This returns (instead of my pandas dataframe):
dask.bag<bag-fro..., npartitions=1>
Then when I try to rename a few columns:
Leads_updated = Leads.rename(columns={'Business Type':'Business_Type','Lender Type':'Lender_Type'})
Leads_updated
I get:
AttributeError: 'Bag' object has no attribute 'rename'
Can someone please explain what I am doing wrong? The objective is to use Dask for all these steps, since the data is too big for regular Python/pandas. My understanding is that the syntax under Dask should be the same as in pandas.

AttributeError: module 'pandas' has no attribute 'read_csv'

Where did I go wrong?
import pandas as pd
import numpy as np
msft = pd.read_csv("week_51.csv")
print(msft.head())
Step 1
Test your pandas installation:
import pandas as pd
pd.test()
Note: for this you need pytest, which comes with most popular distributions.
Step 2
If the test fails, reinstall pandas. There are several methods to choose from depending on your setup (for example pip or conda).
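Besides a broken installation, another frequent cause of this exact AttributeError (not confirmed for this asker) is a file named pandas.py in the script's directory, which Python imports instead of the real package. A minimal stdlib-only sketch that reports where "import pandas" would load from; module_origin is a hypothetical helper name:

```python
import importlib.util
import os

def module_origin(name):
    """Hypothetical helper: file a top-level 'import name' would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

origin = module_origin("pandas")
print("pandas would be imported from:", origin)
# The real package loads from .../pandas/__init__.py, never from a file named pandas.py.
if origin is not None and os.path.basename(origin) == "pandas.py":
    print("A local pandas.py is shadowing the real package; rename it.")
```

If the path points at your own pandas.py, rename that file and delete any stale pandas.pyc or __pycache__ entry next to it.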

Scatter_Matrix Will Not Display Using Pandas and

I'm working through the following machine learning tutorial:
http://machinelearningmastery.com/machine-learning-in-python-step-by-step/
Specifically, Section 4.2. Unfortunately, my code throws an error:
NameError: name 'scatter_matrix' is not defined
Here is my code:
import pandas
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
scatter_matrix(dataset)
plt.show()
There's at least one Stack Overflow question on scatter_matrix, but I haven't been able to figure out what's missing.
Pandas scatter_matrix - plot categorical variables
You will have to import it like this:
from pandas.plotting import scatter_matrix
Alternatively, since you've already imported pandas as pd, you could use it like this:
pd.scatter_matrix(dataset)
However, pandas.scatter_matrix() is deprecated; use pandas.plotting.scatter_matrix() instead.
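Putting the import fix together with the tutorial code gives the following minimal sketch. To keep it runnable offline, it builds a hypothetical three-row stand-in for the iris data instead of downloading the UCI file, and it uses the non-interactive Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in for the iris CSV the tutorial downloads from the UCI URL.
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
     [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'],
     [6.3, 3.3, 6.0, 2.5, 'Iris-virginica']],
    columns=names,
)

# Only numeric columns are plotted, so the text 'class' column is skipped.
axes = scatter_matrix(dataset, figsize=(6, 6))
print(axes.shape)  # (4, 4)
plt.close("all")  # in an interactive session, call plt.show() here instead
```

With the real dataset, keep dataset = pandas.read_csv(url, names=names) and call plt.show() with your normal backend.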
