No module named 'gensim' in AWS Glue - python

I am trying to use Gensim in an AWS Glue ETL job. I created and tested the gensim wheel in SageMaker, and it appears to work correctly. I uploaded the wheel file to S3 and added its path to the "Python library path" field in the Glue job details. The Glue script still fails with a "gensim not found" error.
Gensim setup.py file
from setuptools import setup
setup(
name="gensim",
version="4.1.0",
packages=['gensim'],
install_requires=['Cython', 'numpy', 'scipy', 'smart-open']
)
After creating the wheel file, I tested it and it installed gensim properly.
sh-4.2$ pip install dist/gensim-4.1.0-py3-none-any.whl
Processing ./dist/gensim-4.1.0-py3-none-any.whl
Requirement already satisfied: Cython in /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/Cython-3.0.0a11-py3.7.egg (from gensim==4.1.0) (3.0.0a11)
Requirement already satisfied: smart-open in /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/smart_open-6.2.0-py3.7.egg (from gensim==4.1.0) (6.2.0)
Requirement already satisfied: numpy in /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages (from gensim==4.1.0) (1.21.6)
Requirement already satisfied: scipy in /home/ec2-user/.local/lib/python3.7/site-packages (from gensim==4.1.0) (1.7.3)
Installing collected packages: gensim
Successfully installed gensim-4.1.0
Glue job output logs show that it is installed properly
2022-09-28T18:33:40.657+05:30 Processing ./glue-python-libs-rutqf0ex/gensim-4.1.0-py3-none-any.whl
2022-09-28T18:33:40.730+05:30 Collecting smart-open
2022-09-28T18:33:40.749+05:30 Downloading smart_open-6.2.0-py3-none-any.whl (58 kB)
2022-09-28T18:33:41.500+05:30 Collecting Cython
2022-09-28T18:33:41.508+05:30 Downloading Cython-0.29.32-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (2.0 MB)
2022-09-28T18:33:42.216+05:30 Collecting numpy
2022-09-28T18:33:42.221+05:30 Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
2022-09-28T18:33:43.126+05:30 Collecting scipy
2022-09-28T18:33:43.143+05:30 Downloading scipy-1.5.4-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)
2022-09-28T18:33:44.258+05:30 Installing collected packages: smart-open, Cython, numpy, scipy, gensim
2022-09-28T18:33:49.744+05:30 Successfully installed Cython-0.29.32 gensim-4.1.0 numpy-1.19.5 scipy-1.5.4 smart-open-6.2.0
2022-09-28T18:33:53.981+05:30 Processing ./glue-python-libs-rutqf0ex/bio-1.3.9-py3-none-any.whl
Glue job error
ModuleNotFoundError: No module named 'gensim'
I came across these two errors in the glue error log.
ERROR: botocore 1.12.232 has requirement urllib3<1.26,>=1.20; python_version >= "3.4", but you'll have urllib3 1.26.12 which is incompatible.
Traceback (most recent call last):
File "/tmp/runscript.py", line 211, in <module>
runpy.run_path(temp_file_path, run_name='__main__')
File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmp/glue-python-scripts-cr7714l4/train-lda-model.py", line 11, in <module>
ModuleNotFoundError: No module named 'gensim'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/runscript.py", line 230, in <module>
raise e_type(e_value).with_traceback(new_stack)
File "/tmp/glue-python-scripts-cr7714l4/train-lda-model.py", line 11, in <module>
ModuleNotFoundError: No module named 'gensim'

Since "gensim" is a custom library, you need to package the dependency as a zip file, upload it to S3, and set the S3 path in the Glue job parameter called "Python library path". See the AWS documentation.
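Before re-uploading anything, it may help to confirm what the archive actually contains. Wheels and zip files are both zip archives, so the standard zipfile module can list their contents. The sketch below is only a diagnostic under that assumption; the helper name list_packages is hypothetical, and it does not reproduce how Glue itself loads libraries.

```python
import sys
import zipfile


def list_packages(archive_path):
    """Return sorted top-level directory names that contain an __init__.py."""
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
    packages = set()
    for name in names:
        parts = name.split("/")
        # A top-level package looks like "<package>/__init__.py".
        if len(parts) == 2 and parts[1] == "__init__.py":
            packages.add(parts[0])
    return sorted(packages)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_archive.py dist/gensim-4.1.0-py3-none-any.whl
    print(list_packages(sys.argv[1]))
```

If "gensim" is missing from the printed list, the archive does not contain an importable gensim package, whatever the pip log says about the distribution being installed.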

Related

How do I solve IMPORT ERROR problem on Spyder

When I run the following code on the Spyder console:
!pip install pyqt5 lxml --upgrade
!cd labelImg && pyrcc5 -o libs/resources.py resources.qrc
I get the following:
Collecting pyqt5
Using cached PyQt5-5.15.7-cp37-abi3-win_amd64.whl (6.8 MB)
Requirement already satisfied: lxml in c:\users\bakang\anaconda3\lib\site-packages (4.9.1)
Requirement already satisfied: PyQt5-sip<13,>=12.11 in c:\users\bakang\anaconda3\lib\site-packages (from pyqt5) (12.11.0)
Requirement already satisfied: PyQt5-Qt5>=5.15.0 in c:\users\bakang\anaconda3\lib\site-packages (from pyqt5) (5.15.2)
Installing collected packages: pyqt5
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\Bakang\\anaconda3\\Lib\\site-packages\\PyQt5\\QtCore.pyd'
Consider using the `--user` option or check the permissions.
Traceback (most recent call last):
File "C:\Users\Bakang\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Bakang\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Bakang\anaconda3\lib\site-packages\PyQt5\pyrcc_main.py", line 23, in <module>
from .pyrcc import *
ImportError: DLL load failed while importing pyrcc: The specified procedure could not be found.
What could be the problem? Please help.

Python PyDictionary Import error from futures

So I run python -m pip install PyDictionary and I get this output:
Collecting PyDictionary
Using cached PyDictionary-2.0.1-py3-none-any.whl (6.1 kB)
Requirement already satisfied: requests in d:\python\lib\site-packages (from PyDictionary) (2.27.1)
Collecting goslate
Using cached goslate-1.5.2.tar.gz (16 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: click in d:\python\lib\site-packages (from PyDictionary) (8.1.3)
Collecting bs4
Using cached bs4-0.0.1.tar.gz (1.1 kB)
Preparing metadata (setup.py) ... done
Collecting beautifulsoup4
Using cached beautifulsoup4-4.11.1-py3-none-any.whl (128 kB)
Requirement already satisfied: colorama in d:\python\lib\site-packages (from click->PyDictionary) (0.4.4)
Collecting futures
Using cached futures-3.0.5.tar.gz (25 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [27 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 14, in <module>
File "D:\Python\lib\site-packages\setuptools\__init__.py", line 189, in <module>
monkey.patch_all()
File "D:\Python\lib\site-packages\setuptools\monkey.py", line 99, in patch_all
patch_for_msvc_specialized_compiler()
File "D:\Python\lib\site-packages\setuptools\monkey.py", line 169, in patch_for_msvc_specialized_compiler
patch_func(*msvc14('_get_vc_env'))
File "D:\Python\lib\site-packages\setuptools\monkey.py", line 149, in patch_params
mod = import_module(mod_name)
File "D:\Python\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "D:\Python\lib\site-packages\setuptools\_distutils\_msvccompiler.py", line 20, in <module>
import unittest.mock
File "D:\Python\lib\unittest\mock.py", line 26, in <module>
import asyncio
File "D:\Python\lib\asyncio\__init__.py", line 8, in <module>
from .base_events import *
File "D:\Python\lib\asyncio\base_events.py", line 18, in <module>
import concurrent.futures
File "C:\Users\localuser\AppData\Local\Temp\pip-install-vsl457j6\futures_9b56e18f578949cd955c2218e6840e1e\concurrent\futures\__init__.py", line 8, in <module>
from concurrent.futures._base import (FIRST_COMPLETED,
File "C:\Users\localuser\AppData\Local\Temp\pip-install-vsl457j6\futures_9b56e18f578949cd955c2218e6840e1e\concurrent\futures\_base.py", line 357
raise type(self._exception), self._exception, self._traceback
^
SyntaxError: invalid syntax
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
How do I fix this so I can install PyDictionary?
I am using Python 3.10.4.
I have installed Python on drive D:, but that causes no other problems for pip.
The main problem I can see is with importing futures. From my research, futures is already included with Python 3, but when I try to uninstall it, pip says it is skipping futures because it is not installed. Maybe that is the problem, but I don't know how to solve it.
Does anyone have any ideas?
This seems to be a known issue with this package. When issues like this occur, it is always best to go and check the official GitHub repository of the project and look through the open issues.
https://github.com/geekpradd/PyDictionary/issues/52#issuecomment-1105459595
I don't have any privileges on this repo, but I did some debugging, and I think the actual issue here is that goslate depends on futures, and futures >=3.0.0 isn't installable on Python 3 at all. The PyPI page calls this out, saying: "It does not work on Python 3 due to Python 2 syntax being used in the codebase. Python 3 users should not attempt to install it, since the package is already included in the standard library." -- https://pypi.org/project/futures/
I am able to work around the issue by installing futures <3.0.0 before installing PyDictionary:
Alternatively, try the Py-Dictionary package, although it returns only a few results. This library uses dictionary.com.
pip install Py-Dictionary
from pydictionary import Dictionary
word = Dictionary("lastname")
To store the results in variables:
meanings_list = word.meanings()
synonyms_list = word.synonyms()
antonyms_list = word.antonyms()
To print them:
word.print_meanings()
word.print_synonyms()
word.print_antonyms()
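As a quick check of the PyPI note quoted above, concurrent.futures can be imported on Python 3 without installing anything, because it ships with the standard library. This is a minimal sketch; the square function and its inputs are made up for illustration.

```python
import concurrent.futures


def square(n):
    """Trivial task used only to exercise the executor."""
    return n * n


# ThreadPoolExecutor comes from the standard library on Python 3,
# so no "futures" backport from PyPI is involved here.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(square, [1, 2, 3]))

print(results)
```

If this runs, the interpreter already provides everything the futures backport was meant to supply on Python 2.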

What does this traceback error mean when running this python script in Anaconda?

New to programming. Installed Anaconda on a Windows 10 machine. Had some issues running updates.
While in Base environment, I installed my first Git repo successfully:
(base) C:\Users\samsung\Anaconda3\pkgs>pip install git+git://github.com/json-transformations/jsonflatten.git
Collecting git+git://github.com/json-transformations/jsonflatten.git
Cloning git://github.com/json-transformations/jsonflatten.git to c:\users\samsung\appdata\local\temp\pip-req-build-zeiezw
Running command git clone -q git://github.com/json-transformations/jsonflatten.git 'C:\Users\samsung\AppData\Local\Temp\p
Collecting jsoncut
Downloading jsoncut-0.6-py2.py3-none-any.whl (17 kB)
Requirement already satisfied: click>=6.0 in c:\users\samsung\anaconda3\lib\site-packages (from jsonflatten==0.2) (7.0)
Requirement already satisfied: colorama in c:\users\samsung\anaconda3\lib\site-packages (from jsoncut->jsonflatten==0.2) (0
Requirement already satisfied: pygments in c:\users\samsung\anaconda3\lib\site-packages (from jsoncut->jsonflatten==0.2) (2
Building wheels for collected packages: jsonflatten
Building wheel for jsonflatten (setup.py) ... done
Created wheel for jsonflatten: filename=jsonflatten-0.2-py2.py3-none-any.whl size=8116 sha256=029aafde944303cbfe872e86a13
Stored in directory: C:\Users\samsung\AppData\Local\Temp\pip-ephem-wheel-cache-so8173tt\wheels\8f\02\52\37295acfd1368a3d2
Successfully built jsonflatten
Installing collected packages: jsoncut, jsonflatten
Successfully installed jsoncut-0.6 jsonflatten-0.2
(base) C:\Users\samsung\Anaconda3\pkgs>pip install jsonflatten
Requirement already satisfied: jsonflatten in c:\users\samsung\anaconda3\lib\site-packages (0.2)
Requirement already satisfied: click>=6.0 in c:\users\samsung\anaconda3\lib\site-packages (from jsonflatten) (7.0)
Requirement already satisfied: jsoncut in c:\users\samsung\anaconda3\lib\site-packages (from jsonflatten) (0.6)
Requirement already satisfied: pygments in c:\users\samsung\anaconda3\lib\site-packages (from jsoncut->jsonflatten) (2.5.2)
Requirement already satisfied: colorama in c:\users\samsung\anaconda3\lib\site-packages (from jsoncut->jsonflatten) (0.4.3)
I then ran jsonflatten forecast.json as a test (as well as jsonflatten C:\Users\samsung\.spyder-py3\forecast.json), as the readme suggests, and got the output below. I ran it from base because myenv (Python) printed a message saying jsonflatten is not recognized as a command.
(base) C:\Users\samsung\.spyder-py3>jsonflatten C:\Users\samsung\.spyder-py3\forecast.json
Traceback (most recent call last):
File "c:\users\samsung\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\samsung\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\samsung\Anaconda3\Scripts\jsonflatten.exe\__main__.py", line 7, in <module>
File "c:\users\samsung\anaconda3\lib\site-packages\click\core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "c:\users\samsung\anaconda3\lib\site-packages\click\core.py", line 717, in main
rv = self.invoke(ctx)
File "c:\users\samsung\anaconda3\lib\site-packages\click\core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\samsung\anaconda3\lib\site-packages\click\core.py", line 555, in invoke
return callback(*args, **kwargs)
File "c:\users\samsung\anaconda3\lib\site-packages\click\decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "c:\users\samsung\anaconda3\lib\site-packages\jsonflatten\cli.py", line 63, in main
output(ctx, results, indent=4, is_json=True)
File "c:\users\samsung\anaconda3\lib\site-packages\jsoncut\cli.py", line 59, in output
output = highlighter.highlight_json(output)
File "c:\users\samsung\anaconda3\lib\site-packages\jsoncut\highlighter.py", line 53, in highlight_json
return pygments.highlight(d, JsonLexer(), formatter)
File "c:\users\samsung\anaconda3\lib\site-packages\pygments\__init__.py", line 85, in highlight
return format(lex(code, lexer), formatter, outfile)
File "c:\users\samsung\anaconda3\lib\site-packages\pygments\__init__.py", line 64, in format
formatter.format(tokens, realoutfile)
File "c:\users\samsung\anaconda3\lib\site-packages\pygments\formatters\terminal.py", line 101, in format
return Formatter.format(self, tokensource, outfile)
File "c:\users\samsung\anaconda3\lib\site-packages\pygments\formatter.py", line 95, in format
return self.format_unencoded(tokensource, outfile)
File "c:\users\samsung\anaconda3\lib\site-packages\pygments\formatters\terminal.py", line 126, in format_unencoded
outfile.write(ansiformat(color, line.rstrip('\n')))
File "c:\users\samsung\anaconda3\lib\site-packages\pygments\console.py", line 68, in ansiformat
result.append(codes[attr])
KeyError: 'darkgray'
There is a guide for troubleshooting software in Anaconda: https://www.anaconda.com/what-to-do-when-things-go-wrong-in-anaconda/ but this is a brand-new install.
Does this look like an Anaconda issue, an issue with how I am running the software or an issue with the software itself?
You're getting an error in site-packages, from a module that is a dependency of something you installed.
Anaconda is just a distribution, not the runtime.
The error is raised in pygments\console.py, which is likely responsible for coloring the output of jsonflatten.
Check whether there is a CLI flag to disable colorized output, or use Python's json.tool or a separately installed jq instead of jsonflatten. (I'm not saying those offer what you need, but they also parse JSON on the CLI.)
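If only flattened output is needed, the standard library is enough and no colorizing code gets involved. The sketch below is an assumption-laden stand-in, not jsonflatten's actual algorithm: the flatten helper, the dotted key format, and the sample document are all made up for illustration.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths.

    Empty dicts and lists produce no entries in this simple version.
    """
    if isinstance(value, dict):
        items = [(str(key), child) for key, child in value.items()]
    elif isinstance(value, list):
        items = [(str(index), child) for index, child in enumerate(value)]
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, path))
    return flat


# Hypothetical input; in practice this could be json.load() on forecast.json.
sample = json.loads('{"city": {"name": "Oslo", "temps": [3, 5]}}')
flat = flatten(sample)
print(json.dumps(flat, indent=4))
```

For plain pretty-printing without flattening, python -m json.tool forecast.json does the job from the command line.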

Pylint installation failed on windows

I tried to install pylint on windows (using visual studio code). I have this exception, I can't find a solution.
I already tried to completely reinstall python but I have the exact same error.
On other PC, the same repro step works fine.
>"C:\Program Files (x86)\Python36-32\python" -m pip install pylint
Collecting pylint
Downloading pylint-1.7.2-py2.py3-none-any.whl (644kB)
100% |████████████████████████████████| 645kB 1.9MB/s
Collecting colorama; sys_platform == "win32" (from pylint)
Downloading colorama-0.3.9-py2.py3-none-any.whl
Collecting astroid>=1.5.1 (from pylint)
Downloading astroid-1.5.3-py2.py3-none-any.whl (269kB)
100% |████████████████████████████████| 276kB 4.1MB/s
Collecting isort>=4.2.5 (from pylint)
Downloading isort-4.2.15-py2.py3-none-any.whl (43kB)
100% |████████████████████████████████| 51kB 5.7MB/s
Collecting mccabe (from pylint)
Downloading mccabe-0.6.1-py2.py3-none-any.whl
Collecting six (from pylint)
Downloading six-1.10.0-py2.py3-none-any.whl
Collecting lazy-object-proxy (from astroid>=1.5.1->pylint)
Downloading lazy_object_proxy-1.3.1-cp36-cp36m-win32.whl
Collecting wrapt (from astroid>=1.5.1->pylint)
Downloading wrapt-1.10.11.tar.gz
Installing collected packages: colorama, six, lazy-object-proxy, wrapt, astroid, isort, mccabe, pylint
Running setup.py install for wrapt ... error
Exception:
Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 5: invalid continuation byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pip\basecommand.py", line 215, in main
status = self.run(options, args)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pip\commands\install.py", line 342, in run
prefix=options.prefix_path,
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pip\req\req_set.py", line 784, in install
**kwargs
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pip\req\req_install.py", line 878, in install
spinner=spinner,
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
line = console_to_str(proc.stdout.readline())
File "C:\Program Files (x86)\Python36-32\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 5: invalid continuation byte
Do you have any idea what the problem could be?
EDIT:
As Shankar said, I installed astroid manually. It didn't work the first time. I encountered this issue: python easy_install pylint Error: The system cannot find the file specified
The installation finally worked but nothing changed for pylint.
Here is the log I receive when I try to run pylint
Traceback (most recent call last):
File "c:\program files (x86)\python36-32\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\program files (x86)\python36-32\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Program Files (x86)\Python36-32\Scripts\pylint.exe\__main__.py", line 9, in <module>
File "c:\program files (x86)\python36-32\lib\site-packages\pylint\__init__.py", line 12, in run_pylint
from pylint.lint import Run
File "c:\program files (x86)\python36-32\lib\site-packages\pylint\lint.py", line 43, in <module>
import astroid
File "c:\program files (x86)\python36-32\lib\site-packages\astroid\__init__.py", line 57, in <module>
from astroid.nodes import *
File "c:\program files (x86)\python36-32\lib\site-packages\astroid\nodes.py", line 30, in <module>
from astroid.node_classes import (
File "c:\program files (x86)\python36-32\lib\site-packages\astroid\node_classes.py", line 26, in <module>
from astroid import decorators
File "c:\program files (x86)\python36-32\lib\site-packages\astroid\decorators.py", line 12, in <module>
import wrapt
File "c:\program files (x86)\python36-32\lib\site-packages\wrapt\__init__.py", line 4, in <module>
from .wrappers import (ObjectProxy, CallableObjectProxy, FunctionWrapper,
ModuleNotFoundError: No module named 'wrapt.wrappers'
Thanks
Install
Pylint requires astroid package (the later the better).
https://github.com/PyCQA/astroid
Installation should be as simple as
python -m pip install astroid
Pylint requires isort package (the later the better).
https://github.com/timothycrosley/isort
Installation should be as simple as
python -m pip install isort
If you want to install from a source distribution, extract the tarball and run the following commands
python setup.py install
You’ll have to install dependencies in a similar way. For debian and rpm packages, use your usual tools according to your Linux distribution.
More information about installation and available distribution format may be found in the user manual in the doc subdirectory.
After these two dependencies are installed, try installing pylint again.
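On the UnicodeDecodeError in the pip log from the question: it means a byte 0xe9 in the build output was not valid UTF-8. In Western European Windows code pages such as cp1252, 0xe9 is an accented "e", which would be consistent with localized console or compiler output. The sketch below reproduces that kind of failure; the sample word and the choice of cp1252 are assumptions, since the question does not say which code page was in use.

```python
# Hypothetical sample: a word with an accented character, encoded the way a
# Western European Windows console (cp1252) might emit it.
raw = "Cr\u00e9ation".encode("cp1252")  # b'Cr\xe9ation'


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


text_utf8, error = try_decode(raw, "utf-8")
text_cp1252, _ = try_decode(raw, "cp1252")

print(error)               # same kind of failure as in the pip traceback
print(ascii(text_cp1252))  # decodes cleanly with the matching code page
```

This only illustrates why the decode step fails; it does not by itself explain the later missing wrapt.wrappers module.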

pip not finding an installed python package

I'm trying to install the statsmodels, but I'm getting a dependency error that statsmodels requires patsy. However patsy is already installed:
Baby-Whip$ sudo pip install patsy
Downloading/unpacking patsy
Downloading patsy-0.3.0-py2.py3-none-any.whl (224kB): 224kB downloaded
Requirement already satisfied (use --upgrade to upgrade): numpy in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from patsy)
Installing collected packages: patsy
Successfully installed patsy
Cleaning up...
Then when I try to install statsmodels:
Baby-Whip$ sudo pip install statsmodels
Downloading/unpacking statsmodels
Downloading statsmodels-0.5.0.tar.gz (5.5MB): 5.5MB downloaded
Running setup.py (path:/private/tmp/pip_build_root/statsmodels/setup.py) egg_info for package statsmodels
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/private/tmp/pip_build_root/statsmodels/setup.py", line 463, in <module>
check_dependency_versions(min_versions)
File "/private/tmp/pip_build_root/statsmodels/setup.py", line 122, in check_dependency_versions
raise ImportError("statsmodels requires patsy. http://patsy.readthedocs.org")
ImportError: statsmodels requires patsy. http://patsy.readthedocs.org
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/private/tmp/pip_build_root/statsmodels/setup.py", line 463, in <module>
check_dependency_versions(min_versions)
File "/private/tmp/pip_build_root/statsmodels/setup.py", line 122, in check_dependency_versions
raise ImportError("statsmodels requires patsy. http://patsy.readthedocs.org")
ImportError: statsmodels requires patsy. http://patsy.readthedocs.org
Running pip freeze also lists patsy as an installed package. What am I missing here? Any help would be greatly appreciated.
See the comment above for the answer. I needed to install six, and then run the statsmodels install with the workaround above.
Sidenote: kill me
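When pip reports a package as installed but an import check still fails, it can help to ask the interpreter itself what it can see. The sketch below is a generic diagnostic, not the workaround referred to above; the helper name describe_module is hypothetical, and the module names checked are the ones mentioned in this thread.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a top-level module would load from, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Shows which interpreter is running and what it can import.
    print("interpreter:", sys.executable)
    for module_name in ("patsy", "six"):
        print(module_name, "->", describe_module(module_name))
```

Running it with the same python that pip uses shows whether patsy and its dependency six are both importable from that interpreter.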
