AttributeError: module 'preprocessor' has no attribute 'clean' - python

I am trying to use the preprocessor library to clean text stored in a pandas DataFrame. I've installed the latest version (https://pypi.org/project/tweet-preprocessor/), but I receive this error message:
import preprocessor as p
#forming a separate feature for cleaned tweets
for i,v in enumerate(df['text']):
    df.loc[v,'text'] = p.clean(i)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-183-94e08e1aff33> in <module>
1 #forming a separate feature for cleaned tweets
2 for i,v in enumerate(df['text']):
----> 3 df.loc[v,'text'] = p.clean(i)
AttributeError: module 'preprocessor' has no attribute 'clean'

You probably have the preprocessor module installed as well, which is entirely distinct from the tweet-preprocessor module. However, confusingly, the import preprocessor as p statement can be used for both. When both modules are installed, Python ignores tweet-preprocessor and automatically opts for preprocessor, which does not contain a clean function, hence the error you received.
To resolve this, I had to uninstall both modules with the following commands:
pip uninstall preprocessor
pip uninstall tweet-preprocessor
Then I closed all shells for a fresh start and typed:
pip install tweet-preprocessor
And finally:
>>> import preprocessor as p
>>> p.clean('#this and that')
'and that'
Merely uninstalling preprocessor did not work. Python kept importing the module despite it being uninstalled. I am not sure why, but I suspect it has something to do with caches that Python keeps in the background.
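To see which file Python would actually load for a given import name, you can ask the import system for the module's spec without importing it. This is a minimal sketch, assuming Python 3.4 or later; describe_import is a hypothetical helper name, and 'json' stands in for 'preprocessor' so the snippet runs on any machine.

```python
import importlib.util

def describe_import(name):
    """Return the file that `import name` would load, or None if nothing is found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is usually the path of the module file or the package's __init__.py
    return spec.origin

# Replace 'json' with 'preprocessor' to see which installed package wins.
print(describe_import('json'))
```

If the printed path points at a package you did not expect, that package is the one masking the import.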

Try installing first:
pip install tweet-preprocessor
Then:
import preprocessor as p

Related

AttributeError: module 'sst' has no attribute 'train_reader'

I am very new to sentiment analysis. I am trying to use the Stanford Sentiment Treebank (sst) and ran into an error.
from nltk.tree import Tree
import os
import sst
trees = r"C:\Users\m\data\trees"  # raw string, so the \t in \trees is not read as a tab
tree, score = next(sst.train_reader(trees))
[Output]:
AttributeError Traceback (most recent call last)
<ipython-input-19-4101f90b0b16> in <module>()
----> 1 tree, score = next(sst.train_reader(trees))
AttributeError: module 'sst' has no attribute 'train_reader'
I think you're looking for https://github.com/JonathanRaiman/pytreebank, not https://pypi.org/project/sst/.
On the Python side, the error is clear: the module you imported has no train_reader attribute. Even in the right package, though, I did not see a train_reader, but I could be wrong.
UPDATE:
I'm not entirely sure why sst is missing the train_reader attribute for you. If you're using conda, make sure you didn't accidentally install the 'sst' package. The 'sst' in your code appears to refer to a privately created module, and that one should work.
I got your import working. Here is what I did:
Installed everything specified in the requirements.txt file.
import sst was still giving me an error, so I installed nltk and sklearn to resolve that. (FYI, I'm not using conda. I'm just using pip and virtualenv for my own private package settings. I ran pip install nltk and pip install sklearn.)
At this point, import sst worked for me.
I guess you're importing the sst package selenium-simple-test, which is not what you're looking for.
Try sst.discover(). If you get the error
TypeError: discover() missing 4 required positional arguments: 'test_loader', 'package', 'dir_path', and 'names'
then you are using the selenium-simple-test package.

AttributeError: module 'shodan' has no attribute 'Shodan'

I need to perform a BULK whois query using shodan API.
I came across this code
import shodan
api = shodan.Shodan('inserted my API-KEY- within single quotes')
info = api.host('8.8.8.8')
After running the module I get the following error:
Traceback (most recent call last):
  File "C:/Users/PIPY/AppData/Local/Programs/Python/Python37/dam.py", line 1, in <module>
    import shodan
  File "C:/Users/PIPY/AppData/Local/Programs/Python/Python37\shodan.py", line 2, in <module>
    api = shodan.Shodan('the above inserted API KEY')
AttributeError: module 'shodan' has no attribute 'Shodan'
I'm learning python and have limited scripting/programming experience.
Could you please help me out?
Cheers
You seem to have both dam.py and shodan.py in the same directory. Python searches the script's own directory first when importing, so your shodan.py masks the installed shodan package.
Try renaming shodan.py to e.g. shodan_test.py (and of course fixing up any imports, etc.).
I have solved the issue by reinstalling the shodan module from the Scripts directory: C:\Users\PIPY\AppData\Local\Programs\Python\Python37\Scripts>pip install shodan
Thank you for the help AKX.
I had this same issue, but after renaming my file to something other than shodan.py, I also had to delete the compiled bytecode file shodan.pyc to avoid the error.
Also, if you have more than one version of Python installed, e.g. python2 and python3, use
python -m pip install shodan instead of pip install shodan, to ensure that you are installing the library for the same version of Python that you are using to execute your script.
If you are executing your script with python3 shodan_test.py, then use python3 -m pip install shodan
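The shadowing described in these answers can be checked for directly. The following is a rough sketch, not a complete detector: find_shadowing_files is a hypothetical helper that only looks for a stray name.py or name.pyc next to the script (a directory with the same name could shadow the package too, and is not covered here).

```python
import os
import tempfile

def find_shadowing_files(module_name, directory):
    """List files in `directory` that would shadow an installed `module_name`."""
    candidates = [module_name + '.py', module_name + '.pyc']
    return [f for f in candidates if os.path.isfile(os.path.join(directory, f))]

# Demonstration in a throwaway directory that contains a stray shodan.py.
with tempfile.TemporaryDirectory() as script_dir:
    open(os.path.join(script_dir, 'shodan.py'), 'w').close()
    print(find_shadowing_files('shodan', script_dir))
```

In a real script you would pass the directory of the script itself, for example os.path.dirname(os.path.abspath(__file__)).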

importing a package that doesn't exist

I've never seen an import issue like this before. I removed a directory from site-packages and the corresponding package is still importable.
python2
> import google
> print(google.__path__)
['/home/bamboo/.local/lib/python2.7/site-packages/google']
However this directory doesn't actually exist
ls: cannot access /home/bamboo/.local/lib/python2.7/site-packages/google: No such file or directory
I've removed everything that I'm aware of that is related to it, but there must still be something hanging around.
Digging another level deeper I tried to reload google.
python2
> import google;
> reload(google);
ImportError: No module named google
So apparently it recognizes it is gone on reload.
Checking out sys.modules you get
python2
> import sys
> print(sys.modules)
{'google': <module 'google' (built-in)>, 'copy_reg': <module 'copy_reg' from '/usr/lib/python2.7/copy_reg.pyc'> ...
which indicates that apparently google is a built-in.
Note on motivation: Usually this sort of issue would be weird, but not a show stopper. The problem for me is that the google package is masking a different package of the same name.
tl;dr: use pip to uninstall Google packages completely.
There are two issues here:
strange import/reload behaviour of the google package
removal of the google package
import/reload behavior
I can reproduce the import/reload behaviour by installing the (Google) protobuf package (many Google packages will behave in the same way).
$ mktmpenv -p $(which python2)
...
$ python --version
Python 2.7.13
$ pip install protobuf
...
Installing collected packages: six, protobuf
Successfully installed protobuf-3.5.1 six-1.11.0
>>> import google
>>> print google.__path__
['~/virtual-envs/tmp-66cd9b4d01a8dec6/lib/python2.7/site-packages/google']
>>> import sys
>>> print sys.modules['google']
<module 'google' (built-in)>
>>> reload(google)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named google
I suspect what's going on here is that Google prefer to have all Google packages installed under a single google package, but this package is not designed to be importable, hence the unexpected reload behaviour. However importing subpackages by name works as expected:
>>> import protobuf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named protobuf
>>> from google import protobuf
>>> protobuf.__path__
['~/virtual-envs/tmp-66cd9b4d01a8dec6/lib/python2.7/site-packages/google/protobuf']
>>> reload(protobuf)
<module 'google.protobuf' from '~/virtual-envs/tmp-66cd9b4d01a8dec6/lib/python2.7/site-packages/google/protobuf/__init__.pyc'>
>>>
Removal of the google package
The question states:
I removed a directory from site-packages and the corresponding package is still importable.
This can also be reproduced:
$ rm -rf ~/virtual-envs/tmp-66cd9b4d01a8dec6/lib/python2.7/site-packages/google
$ python
>>> import google
>>> print google.__path__
['~/virtual-envs/tmp-66cd9b4d01a8dec6/lib/python2.7/site-packages/google']
>>>
The problem here is that simply removing the google directory and its contents is not enough to completely uninstall whatever Google packages are present.
The site-packages directory still contains the file protobuf-3.5.1-py2.7-nspkg.pth, which contains this code (split into separate lines for readability, the original is a single line of semi-colon separated statements):
import sys, types, os
has_mfs = sys.version_info > (3, 5)
p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('google',))
importlib = has_mfs and __import__('importlib.util')
has_mfs and __import__('importlib.machinery')
m = has_mfs and sys.modules.setdefault('google', importlib.util.module_from_spec(importlib.machinery.PathFinder.find_spec('google', [os.path.dirname(p)])))
m = m or sys.modules.setdefault('google', types.ModuleType('google'))
mp = (m or []) and m.__dict__.setdefault('__path__',[])
(p not in mp) and mp.append(p)
The line
m = m or sys.modules.setdefault('google', types.ModuleType('google'))
is creating the google module in sys.modules if it doesn't already exist - this is why the google module is importable even after the directory has been deleted.
The correct way to remove the google module is by uninstalling google packages using pip:
pip uninstall protobuf
If pip isn't available in the build environment, then it's a case of identifying any related files and folders (*dist-info/, *.pth) in site-packages and removing them manually.
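For the manual route, a small script can list the candidates before anything is deleted. This is a sketch under assumptions: leftover_install_files is a hypothetical helper, it matches only on a file-name prefix, and it adds *.egg-info to the patterns mentioned above. It lists files and removes nothing.

```python
import glob
import os
import sysconfig

def leftover_install_files(site_dir, project):
    """Return .pth files and *-info directories in site_dir whose names start with project."""
    patterns = [project + '*.pth', project + '*.dist-info', project + '*.egg-info']
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(site_dir, pattern)))
    return sorted(found)

# 'purelib' is the site-packages directory of the running interpreter.
site_packages = sysconfig.get_paths()['purelib']
print(leftover_install_files(site_packages, 'protobuf'))
```

Review the list by hand before removing anything, since a prefix match can also catch unrelated projects with similar names.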

ImportError dependency install resulting in NameError

I've been writing a little script to bootstrap an environment for me, but ran into some confusion when attempting to handle module import errors. My intention was to catch any import error for the yaml module, then use apt to install the module, and re-import it...
def install_yaml():
    print "Attempting to install python-yaml"
    print "=============== Beginning of Apt Output ==============="
    if subprocess.call(["apt-get", "-y", "install", "python-yaml"]) != 0 :
        print "Failure whilst installing python-yaml"
        sys.exit(1)
    print "================= End of Apt Output =================="
    #if all has gone to plan attempt to import yaml
    import yaml
    reload(yaml)
try:
    import yaml
except ImportError:
    print "Failure whilst importing yaml"
    install_yaml()
grains_config = {}
grains_config['bootstrap version'] = __version__
grains_config['bootstrap time'] = "{0}".format(datetime.datetime.now())
with open("/tmp/doc.yaml", 'w+') as grains_file:
    yaml.dump(grains_config, grains_file, default_flow_style=False)
Unfortunately when run I get a NameError
Traceback (most recent call last):
File "importtest-fail.py", line 32, in <module>
yaml.dump(grains_config, grains_file, default_flow_style=False)
NameError: name 'yaml' is not defined
After some research I discovered the reload builtin (Reload a previously imported module), which sounded like what I wanted to do, but it still results in a NameError on the yaml module's first use.
Does anyone have any suggestions that would allow me to handle the import exception, install the dependencies and "re-import" it?
I could obviously wrap the Python script in some bash to do the initial dependency install, but it's not a very clean solution.
Thanks
You imported yaml as a local in install_yaml(). You'd have to mark it as a global instead:
global yaml
inside the function, or better still, move the import out of the function and put it right after calling install_yaml().
Personally, I'd never auto-install a dependency this way. Just fail and leave it to the administrator to install the dependency properly. They could be using other means (such as a virtualenv) to manage packages, for example.
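The scoping point can be shown in a few lines. This is an illustrative sketch in Python 3 (the question's code is Python 2); load_module and import_locally are hypothetical names, and json stands in for yaml so the snippet runs without extra packages. Instead of a global statement, the helper returns the module object and the caller binds it at module level.

```python
import importlib

def load_module(name):
    """Import `name` and return the module object instead of binding a local name."""
    return importlib.import_module(name)

def import_locally():
    import json  # binds `json` only inside this function

import_locally()
# The function-local import did not create a module-level name.
print('json' in globals())

# Bind the name at module level explicitly, then use it.
json = load_module('json')
print(json.dumps({'ok': True}))
```

Returning the module keeps the install-and-import logic in one function while leaving the name binding visible at the call site.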

Check if Python Package is installed

What's a good way to check if a package is installed while within a Python script? I know it's easy from the interpreter, but I need to do it within a script.
I guess I could check if there's a directory on the system that's created during the installation, but I feel like there's a better way. I'm trying to make sure the Skype4Py package is installed, and if not I'll install it.
My ideas for accomplishing the check
check for a directory in the typical install path
try to import the package and if an exception is thrown, then install package
If you mean a Python script, just do something like this:
Python 3 (as written this needs 3.8+ because of the := operator): use sys.modules and find_spec:
import importlib.util
import sys
# For illustrative purposes.
name = 'itertools'
if name in sys.modules:
    print(f"{name!r} already in sys.modules")
elif (spec := importlib.util.find_spec(name)) is not None:
    # If you choose to perform the actual import ...
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    print(f"{name!r} has been imported")
else:
    print(f"can't find the {name!r} module")
Python 3:
try:
    import mymodule
except ImportError as e:
    pass # module doesn't exist, deal with it.
Python 2:
try:
    import mymodule
except ImportError, e:
    pass # module doesn't exist, deal with it.
As of Python 3.4, you can use the importlib.util.find_spec() function:
import importlib.util
# For illustrative purposes.
package_name = 'pandas'
spec = importlib.util.find_spec(package_name)
if spec is None:
    print(package_name +" is not installed")
Updated answer
A better way of doing this is:
import subprocess
import sys
reqs = subprocess.check_output([sys.executable, '-m', 'pip', 'freeze'])
installed_packages = [r.decode().split('==')[0] for r in reqs.split()]
The result:
print(installed_packages)
[
"Django",
"six",
"requests",
]
Check if requests is installed:
if 'requests' in installed_packages:
    pass  # Do something
Why this way? Sometimes you have app name collisions. Importing from the app namespace doesn't give you the full picture of what's installed on the system.
Note that the proposed solution works:
When using pip to install from PyPI or from any other alternative source (like pip install http://some.site/package-name.zip or any other archive type).
When installing manually using python setup.py install.
When installing from system repositories, like sudo apt install python-requests.
Cases when it might not work:
When installing in development mode, like python setup.py develop.
When installing in development mode, like pip install -e /path/to/package/source/.
Old answer
A better way of doing this is:
import pip
installed_packages = pip.get_installed_distributions()
For pip>=10.x use:
from pip._internal.utils.misc import get_installed_distributions
Why this way? Sometimes you have app name collisions. Importing from the app namespace doesn't give you the full picture of what's installed on the system.
As a result, you get a list of pkg_resources.Distribution objects. See the following as an example:
print installed_packages
[
"Django 1.6.4 (/path-to-your-env/lib/python2.7/site-packages)",
"six 1.6.1 (/path-to-your-env/lib/python2.7/site-packages)",
"requests 2.5.0 (/path-to-your-env/lib/python2.7/site-packages)",
]
Make a list of it:
flat_installed_packages = [package.project_name for package in installed_packages]
[
"Django",
"six",
"requests",
]
Check if requests is installed:
if 'requests' in flat_installed_packages:
    pass  # Do something
If you want to have the check from the terminal, you can run
pip3 show package_name
and if nothing is returned, the package is not installed.
If perhaps you want to automate this check, so that for example you can install it if missing, you can have the following in your bash script:
pip3 show package_name 1>/dev/null #pip for Python 2
if [ $? == 0 ]; then
    echo "Installed" #Replace with your actions
else
    echo "Not Installed" #Replace with your actions, 'pip3 install --upgrade package_name' ?
fi
Open your command prompt and type
pip3 list
As an extension of this answer:
For Python 2.*, pip show <package_name> will perform the same task.
For example, pip show numpy will return the following or similar:
Name: numpy
Version: 1.11.1
Summary: NumPy: array processing for numbers, strings, records, and objects.
Home-page: http://www.numpy.org
Author: NumPy Developers
Author-email: numpy-discussion@scipy.org
License: BSD
Location: /home/***/anaconda2/lib/python2.7/site-packages
Requires:
Required-by: smop, pandas, tables, spectrum, seaborn, patsy, odo, numpy-stl, numba, nfft, netCDF4, MDAnalysis, matplotlib, h5py, GridDataFormats, dynd, datashape, Bottleneck, blaze, astropy
In the Terminal type
pip show some_package_name
Example
pip show matplotlib
You can use the pkg_resources module from setuptools. For example:
import pkg_resources
package_name = 'cool_package'
try:
cool_package_dist_info = pkg_resources.get_distribution(package_name)
except pkg_resources.DistributionNotFound:
print('{} not installed'.format(package_name))
else:
print(cool_package_dist_info)
Note that there is a difference between a Python module and a Python package. A package can contain multiple modules, and a module's name might not match the package name.
if pip list | grep -q '^PACKAGENAME\s'; then
    # installed ...
    :
else
    # not installed ...
    :
fi
You can use this:
class myError(Exception):
    pass # Or do some thing like this.
try:
    import mymodule
except ImportError as e:
    raise myError("an error occurred")
Method 1
To check whether a package exists, use the pip3 list command
# pip3 list displays all installed packages, and grep searches the output for a particular package
pip3 list | grep your_package_name_here
Method 2
You can use ImportError
try:
    import your_package_name
except ImportError as error:
    print(error,':( not found')
Method 3
!pip install your_package_name
import your_package_name
...
...
I'd like to add some thoughts/findings of mine to this topic.
I'm writing a script that checks all requirements for a custom made program. There are many checks with python modules too.
There's a little issue with the
try:
    import ..
except:
    ..
solution.
In my case, one of the Python packages is called python-nmap, but you import it with import nmap, so the names don't match. A test based on the package name therefore wrongly reports the package as missing. When the import does succeed, it also loads the whole module, which uses memory that a simple check doesn't need.
I also found that
import pip
installed_packages = pip.get_installed_distributions()
installed_packages will contain only the packages that have been installed with pip.
On my system pip freeze returns over 40 python modules, while installed_packages has only 1, the one I installed manually (python-nmap).
Below is another solution. It may not be directly relevant to the question, but I think it's good practice to keep the test function separate from the one that performs the install, so it might be useful for some.
This is the solution that worked for me. It is based on the answer to "How to check if a python module exists without importing it":
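# Note: imp was deprecated in Python 3.4 and removed in 3.12; importlib.util.find_spec is its replacement.
from imp import find_module
def checkPythonmod(mod):
    try:
        # find_module raises ImportError if the module cannot be located
        find_module(mod)
        return True
    except ImportError:
        return False
NOTE: this solution can't find the module by the name python-nmap either; I have to use nmap instead (easy to live with). In this case, though, the module is not loaded into memory at all.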
I would like to comment on @ice.nicer's reply but I cannot, so ...
My observation is that packages with dashes are saved with underscores, not only with dots as pointed out in @dwich's comment.
For example, you do pip3 install sphinx-rtd-theme, but:
importlib.util.find_spec('sphinx_rtd_theme') returns a ModuleSpec object
importlib.util.find_spec('sphinx-rtd-theme') returns None
importlib.util.find_spec('sphinx.rtd.theme') raises ModuleNotFoundError
Moreover, some names are totally changed.
For example, you do pip3 install pyyaml but it is saved simply as yaml
I am using Python 3.8.
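Because the name used with pip and the name used with import can differ, it can help to query the installed distribution by its pip name instead of its import name. Here is a minimal sketch using importlib.metadata from the standard library (available from Python 3.8); distribution_version is a hypothetical helper name.

```python
from importlib import metadata

def distribution_version(dist_name):
    """Return the installed version of a distribution (the pip name), or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# These are the pip names from the examples above, not the import names.
for dist_name in ['sphinx-rtd-theme', 'pyyaml']:
    print(dist_name, distribution_version(dist_name))
```

This answers "is the distribution installed?" without importing anything, while find_spec answers "is this name importable?".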
If you'd like your script to install missing packages and continue, you could do something like this (using the 'krbV' module from the 'python-krbV' package as an example):
import subprocess
import sys
for m, pkg in [('krbV', 'python-krbV')]:
    try:
        setattr(sys.modules[__name__], m, __import__(m))
    except ImportError:
        # pip.main() was removed in pip 10, so run pip through the current interpreter instead
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', pkg])
        setattr(sys.modules[__name__], m, __import__(m))
A quick way is to use python command line tool.
Simply type import <your module name>
You see an error if module is missing.
$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
>>> import sys
>>> import jocker
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named jocker
$
Hmmm ... the closest I saw to a convenient answer was using the command line to try the import. But I prefer to even avoid that.
How about 'pip freeze | grep pkgname'? I tried it and it works well. It also shows you the version it has and whether it is installed under version control (install) or editable (develop).
I've always used pylibcheck to check whether a lib is installed. Simply install it with pip install pylibcheck, and the code could look like this:
import pylibcheck
if not pylibcheck.checkPackage("mypackage"):
    pass  # not installed
It also supports tuples and lists, so you can check multiple packages at once:
import pylibcheck
packages = ["package1", "package2", "package3"]
if pylibcheck.checkPackage(packages):
    pass  # not installed
You can also install libs with it if you want; I recommend checking the official PyPI page.
The top voted solution which uses techniques like importlib.util.find_spec and sys.modules and catching import exceptions works for most packages but fails in some edge cases (such as the beautifulsoup package) where the package name used in imports is somewhat different (bs4 in this case) than the one used in setup file configuration. For these edge cases, this solution doesn't work unless you pass the package name used in imports instead of the one used in requirements.txt or pip installations.
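For my use case, I needed to write a package checker that checks installed packages based on requirements.txt, so this solution didn't work. What I ended up using was subprocess.check_output to call the pip module explicitly to check for the package installation:
import subprocess
import sys
packages = ['requests', 'beautifulsoup4']  # example names only; read these from requirements.txt
not_found = []
for pkg in packages:
    try:
        # Run pip under the current interpreter (the 'py' launcher exists only on Windows)
        subprocess.check_output([sys.executable, '-m', 'pip', 'show', pkg])
    except subprocess.CalledProcessError:
        not_found.append(pkg)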
It's a bit slower than the other methods but more reliable and handles the edge cases.
Go with option #2. If ImportError is thrown, then the package is not installed (or not in sys.path).
Is there any chance to use the snippets given below? When I run this code, it returns "module pandas is not installed"
a = "pandas"
try:
    import a
    print("module ",a," is installed")
except ModuleNotFoundError:
    print("module ",a," is not installed")
But when I run the code given below:
try:
    import pandas
    print("module is installed")
except ModuleNotFoundError:
    print("module is not installed")
It returns "module pandas is installed".
What is the difference between them?
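The difference is that the import statement takes a literal module name, not a variable: import a looks for a module named a and never reads the string stored in the variable a. To import a module whose name is held in a string, importlib.import_module can be used. A minimal sketch follows; is_importable is a hypothetical helper name.

```python
import importlib

def is_importable(name):
    """Try to import the module whose name is stored in the string `name`."""
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True

a = "pandas"
print("module", a, "is installed" if is_importable(a) else "is not installed")
```

Note that this performs a real import when the module exists, so it carries the same cost as the try/import approach.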
