Python version: 3.6.9
I've used pickle to dump a machine learning model into a file, and when I try to run a prediction on it using Flask, it fails with ModuleNotFoundError: No module named 'predictors'. How can I fix this error so that it recognizes my model, whether I try to run a prediction via Flask or via the Python command (e.g. python predict_edu.py)?
Here is my file structure:
- video_discovery
  - __init__.py
  - data_science
    - model
    - __init__.py
    - predict_edu.py
    - predictors.py
    - train_model.py
Here's my predict_edu.py file:
import pickle

with open('model', 'rb') as f:
    bow_model = pickle.load(f)
Here's my predictors.py file:
from sklearn.base import TransformerMixin

# Basic function to clean the text
def clean_text(text):
    # Removing spaces and converting text into lowercase
    return text.strip().lower()

# Custom transformer using spaCy
class predictor_transformer(TransformerMixin):
    def transform(self, X, **transform_params):
        # Cleaning Text
        return [clean_text(text) for text in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def get_params(self, deep=True):
        return {}
Here's how I train my model:
python data_science/train_model.py
Here's my train_model.py file:
import pickle

from sklearn.pipeline import Pipeline
from predictors import predictor_transformer

# pipeline = Pipeline([("cleaner", predictor_transformer()), ('vectorizer', bow_vector), ('classifier', classifier_18p)])
pipeline = Pipeline([("cleaner", predictor_transformer())])

with open('model', 'wb') as f:
    pickle.dump(pipeline, f)
My Flask app is in: video_discovery/__init__.py
Here's how I run my Flask app:
FLASK_ENV=development FLASK_APP=video_discovery flask run
I believe the issue may be occurring because I'm training the model by running the Python script directly instead of using Flask, so there might be some namespace issues, but I'm not sure how to fix this. It takes a while to train my model, so I can't exactly wait on an HTTP request.
What am I missing that might fix this issue?
It seems a bit strange that you get that error when executing predict_edu.py, as it is in the same directory as predictors.py; an absolute import such as from predictors import predictor_transformer (without the leading dot) should normally work as expected when the script is run from that directory. However, below are a few options that you could try out, if the error persists.
Option 1
You could add the parent directory of the predictors module to sys.path (Python's module search path, not the system PATH variable) before attempting to import the module, as described here. This should work fine for smaller projects.
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from predictors import predictor_transformer
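Note that whatever process unpickles the model needs the same sys.path tweak, because the pickle stream stores the class under the top-level module name predictors. A minimal sketch of how this could look in the Flask app (assuming the layout from the question; untested):

# video_discovery/__init__.py (sketch)
import sys
import pickle
from pathlib import Path

# Make data_science/ importable, so that unpickling can resolve the
# top-level module name 'predictors' recorded in the model file.
DATA_SCIENCE_DIR = Path(__file__).resolve().parent / 'data_science'
sys.path.insert(0, str(DATA_SCIENCE_DIR))

with open(DATA_SCIENCE_DIR / 'model', 'rb') as f:
    bow_model = pickle.load(f)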
Option 2
Use relative imports, e.g., from .predictors import ..., and make sure you run the script from the parent directory of your package, as shown below. The -m option "searches sys.path for the named module and executes its contents as the __main__ module", not as a top-level script. Read more about the -m option in the following references: [1], [2], [3], [4], [5], [6]. Read more about "relative imports" here: [1], [2], [3], [4].
python -m video_discovery.data_science.predict_edu
However, the PEP 8 style guide recommends using absolute imports in general:

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path).
In certain cases, however, absolute imports can get quite verbose, depending on the complexity of the directory structure, as shown below. On the other hand, "relative imports can be messy, particularly for shared projects where directory structure is likely to change". They are also "not as readable as absolute ones, and it is hard to tell the location of the imported resources". Read more about Python Import and Absolute vs Relative Imports.
from package1.subpackage2.subpackage3.subpackage4.module5 import function6
Option 3
Include the directory containing your package directory in PYTHONPATH and use absolute imports instead. PYTHONPATH is used to set the search path for user-defined modules, so that they can be directly imported into a Python script; it is a string with a list of directories that Python adds to the sys.path directory list. Its primary use is to allow users to import modules that have not yet been made into an installable Python package. Read more about it here and here.
For instance, let's say you have a package named video_discovery (under /Users/my_user/code/video_discovery) and want to add the directory /Users/my_user/code to the PYTHONPATH:
On Mac
Open Terminal.app
Open the file ~/.bash_profile in your text editor – e.g. atom ~/.bash_profile
Add the following line to the end: export PYTHONPATH="/Users/my_user/code"
Save the file.
Close Terminal.app
Start Terminal.app again, to read in the new settings, and type echo $PYTHONPATH. It should show something like /Users/my_user/code.
On Linux
Open your favorite terminal program
Open the file ~/.bashrc in your text editor – e.g. atom ~/.bashrc
Add the following line to the end: export PYTHONPATH=/home/my_user/code
Save the file.
Close your terminal application.
Start your terminal application again, to read in the new settings, and type echo $PYTHONPATH. It should show something like /home/my_user/code.
On Windows
Open This PC (or Computer), right-click inside and select Properties.
From the computer properties dialog, select Advanced system settings on the left.
From the advanced system settings dialog, choose the Environment variables button.
In the Environment variables dialog, click the New button in the top half of the dialog, to make a new user variable.
Give the variable name as PYTHONPATH, and in the value add the path to your module directory. Choose OK and OK again to save this variable.
Now open a cmd window and type echo %PYTHONPATH% to confirm the environment variable is correctly set. Remember to open a new cmd window to run your Python program, so that it picks up the new settings in PYTHONPATH.
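Once the variable is set, a quick sanity check from a fresh shell session (using the hypothetical paths from the example above):

import os
import sys

# Entries from PYTHONPATH are added to sys.path at interpreter startup.
print(os.environ.get('PYTHONPATH'))       # /Users/my_user/code
print('/Users/my_user/code' in sys.path)  # True

# Absolute imports rooted at that directory should now resolve:
from video_discovery.data_science.predictors import predictor_transformer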
Option 4
Another solution would be to install the package in an editable state (all edits made to the .py files will be automatically included in the installed package), as described here and here. However, the amount of work required to get this to work might make Option 3 a better choice for you.
The contents for the setup.py should be as shown below, and the command for installing the package should be pip install -e . (-e flag stands for "editable" and . stands for "current directory").
from setuptools import setup, find_packages
setup(name='myproject', version='1.0', packages=find_packages())
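Once the package is installed in editable mode, absolute imports work from any working directory, e.g. (assuming the layout from the question):

from video_discovery.data_science.predictors import predictor_transformer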
From https://docs.python.org/3/library/pickle.html:
pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored.
When you run python data_science/train_model.py and import from predictors, Python imports predictors as a top-level module, so the pickled pipeline records predictor_transformer under that module name.
However, when you run a prediction via Flask from the parent folder of video_discovery, the same class lives in the video_discovery.data_science.predictors module, so unpickling cannot find a top-level module named predictors.
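Here is a minimal standalone sketch (with a hypothetical stand-in class) showing that the pickle stream records the defining module path, not the class body:

import pickle

class predictor_transformer:  # stand-in for the real transformer
    pass

payload = pickle.dumps(predictor_transformer())
print(b'predictor_transformer' in payload)  # True: only the class *name* is stored
print(b'__main__' in payload)               # True: plus the module it was defined in

# Unpickling does the equivalent of `from __main__ import predictor_transformer`;
# if that module path does not exist in the loading process, pickle raises
# ModuleNotFoundError -- the error from the question.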
Use relative imports and run from a consistent path
train_model.py: Use relative import
# from predictors import predictor_transformer # -
from .predictors import predictor_transformer # +
Train model: Run train_model with video_discovery as top-level module
# python data_science/train_model.py # -
python -m video_discovery.data_science.train_model # +
Run a prediction via a Python command: Run predict_edu with video_discovery as top-level module
# python predict_edu.py # -
python -m video_discovery.data_science.predict_edu # +
Run a prediction via Flask: (no change, already run with video_discovery as top-level module)
FLASK_ENV=development FLASK_APP=video_discovery flask run
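One more detail: retrain the model after switching to the relative import, so the pickle records the new module path. It is also worth opening the model file via a path anchored to the source file rather than the working directory, since flask run and python -m may be launched from different places. A sketch for predict_edu.py (assuming the layout above):

import pickle
from pathlib import Path

# Resolve the model file relative to this source file, so loading works
# regardless of the current working directory.
MODEL_PATH = Path(__file__).resolve().parent / 'model'

with open(MODEL_PATH, 'rb') as f:
    bow_model = pickle.load(f)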
Related
I have a project structure like this:

PRJ_V2
    venv
    logs
    run.py
    MyPackage
        __init__.py
        myclass.py
        myclass2.py
    Analysis
        predictive.ipynb
In myclass.py I have a class Myclass.
In run.py I can import it with from MyPackage.myclass import Myclass and run the program without problems,
but in predictive.ipynb I can't. Also, as I am making changes in myclass, I need to import it with importlib.import_module, to allow me to refresh the module.
I've tried all combinations without success, like importlib.import_module("myclass", "MyPackage") or importlib.import_module("myclass", "..") (and "...").
With a "regular" import, from ..MyPackage.myclass import Myclass throws "attempted relative import beyond top-level package", and from MyPackage.myclass import Myclass throws a "No module named MyPackage" error.
I'm a bit saturated from reading questions here without finding a solution, and I still don't understand how the system really works, or whether there is another way to do it. I'm using Python version 3.7.
The only condition here is that run.py must keep working as it does now (it is called from a scheduled system script, which does a "cd PRJ_V2" to change directory, activates the venv and executes ".\venv\Scripts\python.exe run.py"), and at the same time I need to use the notebook to do manual analysis.
Thanks in advance.
There are many ways of solving this issue; adding the path, for instance, will work, but maybe the easiest way is to go to the proper project root directory using Jupyter magic commands:
You can check your directory running in the first cell:
%pwd
And then you can go to the parent directory using:
%cd ..
After this, if you are in the project root directory, your imports will work normally. Next time, you can just run your Jupyter session from the parent directory and you won't need to navigate there.
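As for the reloading part of the question: once the import works, you can refresh the module after editing it with importlib.reload instead of re-importing (a sketch):

import importlib
import MyPackage.myclass as myclass_module

# ... after editing myclass.py, refresh without restarting the kernel:
importlib.reload(myclass_module)
Myclass = myclass_module.Myclass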
Check the path where you run jupyter notebook. Where did you run it from? If you run Jupyter from the project root,
$PRJ_V2 >> jupyter notebook
your kernel will import from that folder, so import MyPackage will succeed.
You should also check MyPackage/__init__.py, where you import your class. On Python 3 you need a relative import there:
from .myclass import Myclass
I made a script to validate and clean a complex DataFrame, with calls to a database. I divided the script into modules, and I import them frequently between files to use certain functions and variables that are spread across the different files and directories.
Please see the image for how the files are distributed across directories:
[image: directory structure]
An example would look like this:
# Module to clean addresses: address_cleaner.py
def address_cleaner(dataframe):
    ...
    return dataframe

# Pipeline to run all cleaning functions in order: pipeline.py
from file1 import function1
...

def pipeline():
    df = function1
    ...function2(df)
    ...function3(df)
    ...
    return None

# Executable file where I request environment variables and run the pipeline: executable.py
from pipeline import pipeline
import os
...
pipeline()
...
When I run this on Unix:
% cd myproject
% python executable.py
This is one of the import cases, which I import to avoid hardcoding environment variable string names:
File "path/to/executable.py", line 1, in <module>
from environment_constants import SSH_USERNAME, SSH_PASSWORD, PROD_USERNAME, PROD_PASSWORD
ModuleNotFoundError: No module named 'environment_constants
I get a ModuleNotFoundError when I run executable.py on Unix. It calls the pipeline shown above, and it seems as if none of the imports I did between files (to use a function, variable or constant from them), especially those in different directories, reach each other. These directories all belong to the same parent directory, "CD-cleaner".
Is there a way to make these files read from each other even if they are in different folders of the project?
Thanks in advance.
Either create proper Python packages that you can install and manage with pip, or (easier) just always use your project root as your working directory. Set the import paths accordingly, and then run files from the root folder, e.g. python generate/generate_dataframe.py while you are in the joor-cd-cleaner directory.
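If you cannot guarantee the working directory, another option is a small bootstrap at the top of executable.py that pins the project root onto sys.path (a sketch; it assumes executable.py sits one level below the root and environment_constants.py lives at the root, so adjust to your actual layout):

import sys
from pathlib import Path

# Prepend the project root so sibling modules resolve regardless of
# where the script is launched from.
ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(ROOT))

from environment_constants import SSH_USERNAME, SSH_PASSWORD  # now resolves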
I am working on a package which has a folder structure like:
Root
|--- Source
|--- Testing
     |--- Test_Utils
          |--- test_fixtures.py
     |--- test_integration.py
     |--- test_unit.py
There are a fair amount of relative references flying around (e.g. in test_integration.py I need to import classes from the files in the Source folder, as well as test harnesses and data from the Test_Utils folder).
So far I've managed this by using complete references e.g.:
from Root.Testing.Test_Utils.test_fixtures import *
This seemed to work fine until actually trying to run nosetests. Nose seems to only find test files in the active directory (not the root working directory), so I have to cd Testing before running nosetests... at which point the references to Root break with:
ModuleNotFoundError: No module named 'Root'
How can I get around this apparent incompatibility (without using pytest, since I am using test generators (i.e. using yield), which I believe are deprecated in pytest)?
PYTHONPATH sets the search path for importing python modules.
If you are using a Mac or a GNU/Linux distro, add this to your ~/.bashrc:

# add this line to ~/.bashrc
export PYTHONPATH=$PYTHONPATH:/path/to/folder  # the path to the directory containing your Root folder

# after this we need to reload ~/.bashrc
$ source ~/.bashrc
# if source ~/.bashrc doesn't work, simply restart your terminal

# after this we can echo our PYTHONPATH environment variable to check that it was added successfully
$ echo $PYTHONPATH
# it should output your folder
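With that in place, the imports rooted at Root resolve from any directory, so nosetests can be run from inside Testing as well. A quick check (path hypothetical):

import sys

# The PYTHONPATH entry appears on sys.path in any new interpreter:
print('/path/to/folder' in sys.path)  # True

# So this now works regardless of the current directory:
from Root.Testing.Test_Utils.test_fixtures import *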
I created two Python files, and the directory/file relations are as follows:

mytest---
    |---mycommon.py
    |---myMainDir---
        |---myMain.py
In mycommon.py:
def myFunc(a):
    ...
And in myMain.py:
import os
import sys

sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))

import mycommon

mycommon.myFunc("abc")
Then I created an exe using PyInstaller:

pyinstaller -F mytest\myMainDir\myMain.py

myMain.exe is created, but when run, it reports that it cannot find the mycommon module.
PyInstaller's official manual describes this issue:
Some Python scripts import modules in ways that PyInstaller cannot detect: for example, by using the __import__() function with variable data, or manipulating the sys.path value at run time. If your script requires files that PyInstaller does not know about, you must help it.
It also suggests what should be done in such a case:
If Analysis recognizes that a module is needed, but cannot find that module, it is often because the script is manipulating sys.path. The easiest thing to do in this case is to use the --paths= option to list all the other places that the script might be searching for imports:
pyi-makespec --paths=/path/to/thisdir --paths=/path/to/otherdir myscript.py
These paths will be added to the current sys.path during analysis.
Therefore, please specify the --paths argument while building the application. The manual states that specifying the -p argument is equivalent:
-p dir_list, --paths=dir_list
Set the search path(s) for imported modules (like using PYTHONPATH). Use this option to help PyInstaller to search in the right places when your code modifies sys.path for imports. Give one or more paths separated by ; (under Windows) or : (all other platforms), or give the option more than once to give multiple paths to search.
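Applied to the layout in the question, that would be something like (untested sketch; run from the directory containing mytest):

pyinstaller -F --paths=mytest mytest\myMainDir\myMain.py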
I also had to fight a bit to get PyInstaller to correctly import Python scripts in a subfolder, where the path to the subfolder was set relatively via sys.path.insert.
The answer by Yoel was correct for me, but I needed careful setting of paths in Windows. Here is what I did:
My main py is:
D:\_Development\pCompareDBSync\pCompareDBSync\pCompareDBSync.py
My imported py is:
D:\_Development\pCompareDBSync\pCompareDBSync\py\pCompareNVR.py
(I have many such imported .py files in the folder .\py\, but here I just use a single one as an example.)
So in my main .py, I have the following:
import sys

sys.path.insert(0, 'py')

try:
    from pCompareNVR import fgetNV_sN_dict
    from pCompareNVR import findNVRJobInDBSync
    from pCompareNVR import getNVRRecords
    from pCompareNVR import saveNVRRecords
    from pCompareNVR import compareNVRs
except Exception as e:
    print('Can not import files: ' + str(e))
    input("Press Enter to exit!")
    sys.exit(0)
pyinstaller --onefile pCompareDBSync.py
-> pCompareDBSync.exe that did NOT include py/pCompareNVR.py
I had to include the absolute path to the main .py and the imported .py files:
pyinstaller --onefile --paths=D:\_Development\pCompareDBSync\pCompareDBSync\ --paths=D:\_Development\pCompareDBSync\pCompareDBSync\py pCompareDBSync.py
-> pCompareDBSync.exe that did now include py/pCompareNVR.py -> OK
And that solved this issue for me!
I am having the same issue as the OP (and as this comes up a lot in Google searches, I thought I would add my experience).
I have a similar folder layout, save for a common folder containing mycommon.py in the same location. I am running PyInstaller from myMainDir as part of a CI build step.
I had tried the suggested solutions: setting --paths, declaring the hidden imports in the spec file, etc. I still could not get it working.
I ended up 'solving' (read hacking) the problem by adding a step in the build script to copy the common folder into myMainDir before running PyInstaller.
I have two directories in my project:
project/
    src/
    scripts/
"src" contains my polished code, and "scripts" contains one-off Python scripts.
I would like all the scripts to have "../src" added to their sys.path, so that they can access the modules under the "src" tree. One way to do this is to write a scripts/__init__.py file, with the contents:
scripts/__init__.py:
import sys
sys.path.append("../src")
This works, but has the unwanted side-effect of putting all of my scripts in a package called "scripts". Is there some other way to get all my scripts to automatically call the above initialization code?
I could just edit the PYTHONPATH environment variable in my .bashrc, but I want my scripts to work out-of-the-box, without requiring the user to fiddle with PYTHONPATH. Also, I don't like having to make account-wide changes just to accommodate this one project.
Even if you have other plans for distribution, it might be worth putting together a basic setup.py in your src folder. That way, you can run setup.py develop to have setuptools put a link to your code onto your default path (meaning any changes you make will be reflected in-place without having to "reinstall", and all modules will "just work," no matter where your scripts are). It'd be a one-time step, but that's still one more step than zero, so it depends on whether that's more trouble than updating .bashrc. If you use pip, the equivalent would be pip install -e /path/to/src.
The more-robust solution--especially if you're going to be mirroring/versioning these scripts on several developers' machines--is to do your development work inside a controlled virtual environment. It turns out virtualenv even has built-in support for making your own bootstrap customizations. It seems like you'd just need an after_install() hook to either tweak sitecustomize, run pip install -e, or add a plain .pth file to site-packages. The custom bootstrap could live in your source control along with the other scripts, and would need to be run once for each developer's setup. You'd also have the normal benefits of using virtualenv (explicit dependency versioning, isolation from system-wide configuration, and standardization between disparate machines, to name a few).
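To illustrate, generating such a bootstrap script with the legacy virtualenv API (the create_bootstrap_script/after_install mechanism referenced above was removed in virtualenv 20, so treat this as a sketch for older versions; paths are hypothetical):

# make_bootstrap.py (sketch, legacy virtualenv < 20)
import virtualenv

EXTRA = """
import os, subprocess

def after_install(options, home_dir):
    # Install the project's src into the fresh environment in editable mode.
    pip = os.path.join(home_dir, 'bin', 'pip')
    subprocess.call([pip, 'install', '-e', '/path/to/project/src'])
"""

with open('bootstrap.py', 'w') as f:
    f.write(virtualenv.create_bootstrap_script(EXTRA))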
If you really don't want to have any setup steps whatsoever and are willing to only run these scripts from inside the 'project' directory, then you could plop in an __init__.py as such:
project/
    src/
        some_module.py
    scripts/
        __init__.py  # special "magic"
        some_script.py
And these are what your files could look like:
# file: project/src/some_module.py
print("importing %r" % __name__)

def some_function():
    print("called some_function() inside %s" % __name__)

--------------------------------------------------------

# file: project/scripts/some_script.py
import some_module

if __name__ == '__main__':
    some_module.some_function()

--------------------------------------------------------

# file: project/scripts/__init__.py
import sys
from os.path import dirname, abspath, join

print("doing magic!")
sys.path.insert(0, join(dirname(dirname(abspath(__file__))), 'src'))
Then you'd have to run your scripts like so:
[~/project] $ python -m scripts.some_script
doing magic!
importing 'some_module'
called some_function() inside some_module
Beware! The scripts can only be called like this from inside project/:
[~/otherdir] $ python -m scripts.some_script
ImportError: no module named scripts
To enable that, you're back to editing .bashrc, or using one of the options above. The last option should really be a last resort; as @Simon said, you're really fighting the language at that point.
If you want your scripts to be runnable (I assume from the command line), they have to be on the path somewhere.
Something sounds odd about what you're trying to do though. Can you show us an example of exactly what you're trying to accomplish?
You can add a file called 'pathHack.py' in the project dir and put something like this into it:
import os
import sys

pkgDir = os.path.dirname(__file__)
sys.path.insert(0, os.path.join(pkgDir, 'scripts'))
Then, in a Python file in your project dir, start with:
import pathHack
And now you can import stuff from the scripts dir without the 'scripts.' prefix. If you have only one file in this directory, and you don't care about hiding this kind of thing, you may inline this snippet.
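For instance (module names hypothetical):

# any file in the project dir
import pathHack     # side effect: puts project/scripts on sys.path
import some_script  # resolves via project/scripts, no 'scripts.' prefix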