Syntax for importing scipy and sklearn modules - python

I use a standard setup: Win10, Anaconda-2018.12, Python-3.7, MKL-2019.1, mkl-service-1.1.2, Jupyter ipython-7.2.
I'm wondering why the following syntax works for import statements with the numpy modules but does not work for scipy or sklearn modules:
import scipy as sp
import numpy as np
A = np.random.random_sample((3, 3)) + np.identity(3)
b = np.random.rand((3))
x = sp.sparse.linalg.bicgstab(A,b)
> AttributeError Traceback (most recent call last)
> <ipython-input-1-35204bb7c2bd> in <module>()
> 3 A = np.random.random_sample((3, 3)) + np.identity(3)
> 4 b = np.random.rand((3))
> ----> 5 x = sp.sparse.linalg.bicgstab(A,b)
> AttributeError: module 'scipy' has no attribute 'sparse'
or with sklearn
import sklearn as sk
iris = sk.datasets.load_iris()
> AttributeError Traceback (most recent call last)
> <ipython-input-2-f62557c44a49> in <module>()
> 2 import sklearn as sk
> ----> 3 iris = sk.datasets.load_iris()
> AttributeError: module 'sklearn' has no attribute 'datasets'
This syntax, however, does work (but it is not really lean for rarely used commands):
import sklearn.datasets as datasets
iris = datasets.load_iris()
and
from scipy.sparse.linalg import bicgstab as bicgstab
x = bicgstab(A,b)
x[0]
array([ 0.44420803, -0.0877137 , 0.54352507])
What type of problem is this? Could it be eliminated with reasonable effort?

The "problem"
The behavior you're running into is actually a feature of SciPy, though it may seem like a bug at first glance. Some of the subpackages of scipy are quite large and have many members. To avoid lag when running import scipy (and to save system memory), scipy is structured so that most subpackages are not imported automatically. This is described in the SciPy docs. Your second traceback shows that sklearn behaves the same way for sklearn.datasets.
The fix
You can work around the apparent problem by exercising the standard Python import syntax/semantics a bit:
import numpy as np
A = np.random.random_sample((3, 3)) + np.identity(3)
b = np.random.rand((3))
import scipy as sp
# this won't work, raises AttributeError
# x = sp.sparse.linalg.bicgstab(A,b)
import scipy.sparse.linalg
# now that same line will work
x = sp.sparse.linalg.bicgstab(A,b)
print(x)
# output: (array([ 0.28173264, 0.13826848, -0.13044883]), 0)
Basically, if a call to sp.pkg_x.func_y is raising an AttributeError, then you can fix it by adding a line before it like:
import scipy.pkg_x
Of course, this assumes that scipy.pkg_x is a valid scipy subpackage to begin with.
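The same mechanism can be reproduced without scipy. The following is a minimal sketch, assuming a throwaway package called demo_pkg_lazy (a hypothetical name, created in a temporary directory only for this illustration) whose __init__.py is empty, so that its submodule is not imported automatically:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package "demo_pkg_lazy" (hypothetical name) with one submodule.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_lazy")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # empty __init__: submodules are not imported automatically
with open(os.path.join(pkg_dir, "tools.py"), "w") as f:
    f.write("def answer():\n    return 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg_lazy

# Before the submodule is imported, it is not an attribute of the package.
before = hasattr(demo_pkg_lazy, "tools")

import demo_pkg_lazy.tools  # importing the submodule binds it on the parent

after = hasattr(demo_pkg_lazy, "tools")
result = demo_pkg_lazy.tools.answer()
print(before, after, result)
```

Importing the submodule once anywhere in the process is enough: afterwards it is reachable through the parent package under any alias.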

Related

I'm getting the type error on writing the following code

Code on Jupyter Notebook:
import pandas as pd
import matplotlib as plt
%matplotlib inline
import numpy as np
data = pd.read_csv("E:Datascience\Bivariate\Titanic.csv")
data.head()
data.shape
data['Survived'].value_counts()
data=pd.get_dummies(data)
data.fillna(0,inplace=True)
data.shape
train=data[0:699]
test=data[700:890]
x_train=train.drop('Survived',axis=1)
y_train = train['Survived']
x_test=test.drop('Survived',axis=1)
true_p=test['Survived']
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression
logreg.fit(x_train,y_train)
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-48-cdb43f357e36> in <module>
----> 1 logreg.fit(x_train,y_train)
TypeError: fit() missing 1 required positional argument: 'y'
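Use this:
logreg=LogisticRegression()
instead of
logreg=LogisticRegression
Without the parentheses, logreg is bound to the LogisticRegression class itself rather than to an instance. Calling logreg.fit(x_train,y_train) on the class treats x_train as self and y_train as X, which is why Python reports that y is missing. Creating an instance first will solve your issue.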
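The difference between the class and an instance can be seen without sklearn. This is only a sketch: Model is a hypothetical stand-in class, not part of any library, with a fit method that has the same (self, X, y) shape.

```python
class Model:
    """Hypothetical stand-in for an estimator such as LogisticRegression."""

    def fit(self, X, y):
        self.n_samples_ = len(X)
        return self


X = [[0.0], [1.0], [2.0]]
y = [0, 1, 1]

not_instantiated = Model  # the class itself, no parentheses
try:
    # X is taken as self and y as X, so the parameter y is left unfilled.
    not_instantiated.fit(X, y)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

fitted = Model().fit(X, y)  # an instance: self is supplied automatically
print(error_message)
print(fitted.n_samples_)
```

The exact wording of the TypeError varies slightly between Python versions, but it always names y as the missing argument.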

Why am I not able to get the VIF using statsmodels api

I was looking at the following official documentation from statsmodels:
https://www.statsmodels.org/stable/generated/statsmodels.stats.outliers_influence.variance_inflation_factor.html
But when I try to run this code on a practice dataset (statsmodels.api already imported as sm)
variance_inflation_factor=sm.stats.outliers_influence.variance_inflation_factor()
vif=pd.DataFrame()
vif['VIF']=[variance_inflation_factor(X_train.values,i) for i in range(X_train.shape[1])]
vif['Predictors']=X_train.columns
I get the error message: module 'statsmodels.stats.api' has no attribute 'outliers_influence'
Can anyone tell me what is the appropriate way to get this working?
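There is no need to define variance_inflation_factor by calling sm.stats.outliers_influence.variance_inflation_factor() with no arguments. As the error message says, sm.stats (statsmodels.stats.api) does not expose outliers_influence, and variance_inflation_factor is a function that takes two inputs: the matrix of predictors and the index of the column to test. Import it directly from its module instead: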
import pandas as pd
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_train = pd.DataFrame(np.random.standard_normal((1000,5)), columns=[f"x{i}" for i in range(5)])
vif=pd.DataFrame()
vif['VIF']=[variance_inflation_factor(X_train.values,i) for i in range(X_train.shape[1])]
vif['Predictors']=X_train.columns
print(vif)
which produces
VIF Predictors
0 1.002882 x0
1 1.004265 x1
2 1.001945 x2
3 1.004227 x3
4 1.003989 x4

Using power results in ValueError: a <= 0

I have written the following code but it fails with a ValueError.
from numpy import *
from pylab import *
t = arange(-10, 10, 20/(1001-1))
x = 1./sqrt(2*pi)*exp(power(-(t*t), 2))
Specifically, the error message I'm receiving is:
Traceback (most recent call last):
File "D:\WinPython-64bit-3.4.4.3Qt5\notebooks\untitled1.py", line 6, in <module>
x = 1./sqrt(2*pi)*exp(power(-(t*t), 2))
File "mtrand.pyx", line 3214, in mtrand.RandomState.power (numpy\random\mtrand\mtrand.c:24592)
ValueError: a <= 0
Any idea what the issue might be here?
Both numpy and pylab define a function called power, but they are completely different. Because you imported pylab after numpy using import *, the pylab version is the one you end up with. What is pylab.power? From the docstring:
power(a, size=None)
Draws samples in [0, 1] from a power distribution with positive exponent a - 1.
The moral of the story: don't use import *. In this case, it is common to use import numpy as np:
import numpy as np
t = np.arange(-10, 10, 20/(1001-1))
x = 1./np.sqrt(2*np.pi)*np.exp(np.power(-(t*t), 2))
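The clobbering can be shown in isolation with two standard-library modules that both define sqrt. This is a sketch, not the pylab case itself: math and cmath stand in for numpy and pylab, and exec with an explicit namespace dictionary is used only so the star imports can run in any scope.

```python
import cmath
import math

namespace = {}

# First star import: sqrt refers to math.sqrt.
exec("from math import *", namespace)
first_sqrt = namespace["sqrt"]

# Second star import silently rebinds the same name to cmath.sqrt.
exec("from cmath import *", namespace)
second_sqrt = namespace["sqrt"]

print(first_sqrt is math.sqrt)    # True
print(second_sqrt is cmath.sqrt)  # True: the later import won
print(second_sqrt(-1))            # 1j, where math.sqrt(-1) would raise ValueError
```

No warning is issued when the name is rebound, which is why the mistake only shows up later as a confusing error.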
Further reading:
Why is "import *" bad?
Idioms and Anti-Idioms in Python (That's in the Python 2 documentation, but it also applies to Python 3.)

Only length-1 arrays can be converted to Python scalars with log

from numpy import *
from pylab import *
from scipy import *
from scipy.signal import *
from scipy.stats import *
testimg = imread('path')
hist = hist(testimg.flatten(), 256, range=[0.0,1.0])[0]
hist = hist + 0.000001
prob = hist/sum(hist)
entropia = -1.0*sum(prob*log(prob))#here is error
print 'Entropia: ', entropia
I have this code and I do not know what the problem could be. Thanks for any help.
This is an example of why you should never use from module import *. You lose sight of where functions come from. When you use multiple from module import * calls, one module's namespace may clobber another module's namespace. Indeed, based on the error message, that appears to be what is happening here.
Notice that when log refers to numpy.log, then -1.0*sum(prob*np.log(prob)) can be computed without error:
In [43]: -1.0*sum(prob*np.log(prob))
Out[43]: 4.4058820963782122
but when log refers to math.log, then a TypeError is raised:
In [44]: -1.0*sum(prob*math.log(prob))
TypeError: only length-1 arrays can be converted to Python scalars
The fix is to use explicit module imports and explicit references to functions from the module's namespace:
import numpy as np
import matplotlib.pyplot as plt
testimg = np.random.random((10,10))
hist = plt.hist(testimg.flatten(), 256, range=[0.0,1.0])[0]
hist = hist + 0.000001
prob = hist/sum(hist)
# entropia = -1.0*sum(prob*np.log(prob))
entropia = -1.0*(prob*np.log(prob)).sum()
print 'Entropia: ', entropia
# prints something like: Entropia: 4.33996609845
The code you posted does not produce the error, but somewhere in your actual code log must be getting bound to math.log instead of numpy.log. Using import module and referencing functions with module.function will help you avoid this kind of error in the future.
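The scalar-only behaviour of math.log can be checked with the standard library alone. In this sketch the probabilities are made up for illustration, and a plain list stands in for the array, so the TypeError message differs from the one numpy arrays produce:

```python
import math

prob = [0.25, 0.25, 0.25, 0.25]  # made-up probabilities for illustration

# math.log accepts a single number, so passing the whole sequence fails.
try:
    math.log(prob)
    scalar_call_failed = False
except TypeError:
    scalar_call_failed = True

# Applying it element by element works and gives the entropy.
entropy = -sum(p * math.log(p) for p in prob)
print(scalar_call_failed, entropy)
```

For four equally likely outcomes the entropy equals log(4), which is a quick sanity check on the formula.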

scipy.interpolate.interpnd complaining about 'Delaunay' object has no attribute 'simplices'

I'm dusting off some code I wrote a few months ago, and for some reason it doesn't work anymore... In a nutshell, I'm using scipy.interpolate.LinearNDInterpolator objects to interpolate models and compare to data. Now, when I attempt to call the interpolator object with the coordinates at which I would like the interpolation, I get the following error:
In [9]: a([[3500, 3.5, 1.5]])
AttributeError Traceback (most recent call last)
<ipython-input-9-91f2103e7a0c> in <module>()
----> 1 a([[3500, 3.5, 1.5]])
/usr/lib64/python2.7/site-packages/scipy/interpolate/interpnd.so in scipy.interpolate.interpnd.NDInterpolatorBase.__call__ (scipy/interpolate/interpnd.c:3133)()
/usr/lib64/python2.7/site-packages/scipy/interpolate/interpnd.so in scipy.interpolate.interpnd.LinearNDInterpolator._evaluate_double (scipy/interpolate/interpnd.c:3954)()
/usr/lib64/python2.7/site-packages/scipy/interpolate/interpnd.so in scipy.interpolate.interpnd.LinearNDInterpolator._do_evaluate (scipy/interpolate/interpnd.c:4684)()
AttributeError: 'Delaunay' object has no attribute 'simplices'
I have never seen this error before, and the code has worked previously. Did something just change in scipy that I'm not aware of?
Thanks for looking!
I guess you are using an older version of the library.
The Delaunay class has two different accessors for simplices:
"Delaunay.simplices" and "Delaunay.vertices"
shown here (newest docs): http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.Delaunay.html
Of the two, Delaunay.vertices is marked "deprecated".
On Ubuntu 13.04, however, the simplices attribute does not exist, because it still uses scipy 0.11.0:
http://docs.scipy.org/doc/scipy-0.11.0/reference/generated/scipy.spatial.Delaunay.html#scipy.spatial.Delaunay
Try this minimal example, or just rewrite your simplices call to use vertices:
from __future__ import print_function
import numpy as np
from scipy.spatial import Delaunay
import sys
my_molecule = np.random.rand(400,3) #points for query
points = np.random.rand(1000, 3) #points used for Triangulation
diag = Delaunay(points)
simplices = diag.find_simplex(my_molecule)
for point,simplex in zip(my_molecule,simplices):
    if simplex == -1:
        print ("Point not included in diag.")
        continue
    print ("Doing vertices call: ")
    spoints = diag.vertices[simplex]
    print ("Doing simplices call: ")
    spoints = diag.simplices[simplex]
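If your own code has to run against both old and new scipy versions, one option is a small accessor that tries the newer attribute name first. This is only a sketch: get_simplices and the two stand-in classes are hypothetical names used for illustration, and it does not help when the AttributeError is raised inside scipy's own compiled code, as in the traceback in the question.

```python
def get_simplices(triangulation):
    """Return the simplex index array under either attribute name."""
    if hasattr(triangulation, "simplices"):
        return triangulation.simplices
    # Older releases only expose the same data as "vertices".
    return triangulation.vertices


class OldStyleTriangulation:
    """Hypothetical stand-in for a Delaunay object from an old scipy."""
    vertices = [[0, 1, 2], [1, 2, 3]]


class NewStyleTriangulation:
    """Hypothetical stand-in for a Delaunay object from a newer scipy."""
    simplices = [[0, 1, 2], [1, 2, 3]]


old_result = get_simplices(OldStyleTriangulation())
new_result = get_simplices(NewStyleTriangulation())
print(old_result == new_result)
```

With a real triangulation you would pass the Delaunay object itself, for example get_simplices(diag)[simplex].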
