Python Module Missing Exception in Docker Container, But Clearly Exists

I have a Docker container that, when run, starts a single Python script. On my local machine the script executes without issue, but inside the container it is unable to find the relevant library files (no external libraries, just modules from my own repo) that I've confirmed do exist within the container.
The repository does contain multiple Dockerfiles in different directories for easier deployment, but removing them did not change this behavior.

Adding sys.path.append("/tmp/") fixed the issue (/tmp/ being the directory that contains the uppermost directory referenced in the import statements).
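For reference, a minimal sketch of that workaround at the top of the container's entrypoint script, assuming the repo's top-level package lives under /tmp/ (the package and module names below are hypothetical):

import sys

# /tmp/ contains the uppermost directory referenced by the imports,
# so it must be on the module search path before those imports run.
sys.path.append("/tmp/")

from mypackage import mymodule  # hypothetical import from the repo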

Related

PyInstaller on a large multi-directory project

I am trying to compile my Python program into a single executable. The libraries I am using are pandas, NumPy, configparser, and cx_Oracle. I am also running in an Anaconda environment, using Python 3.9. The file structure is as follows:
Code
    Calculation
        calualtions.py
        transformations.py
    Db
        config_parse.py
        db1.py
        db2.py
        save_to_db.py
    ifAdded.py
    file_generation.py
I also have a config file, used to set the local input and output directories, and which is also responsible for providing the database connection string. The script is tested locally and works as intended: it listens for changes in a selected directory, makes calculations using some of the database data, and records the result in a local directory. The entry point is ifAdded.py at the root of the project.
I tried using PyInstaller, but ran into a problem after the project was compiled. The script runs, but suddenly exits with a KeyError at the spot where the config is processed. My guess is that the configparser library is not bundled correctly with the project. I tried passing it as a hidden-import option, but it did not help. How can I bundle this project correctly, with the config file outside the project, so it can be adjusted after compilation?
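One common approach (a sketch, not the asker's code): resolve the config path relative to the frozen executable rather than the source tree, so the file can sit next to the compiled binary and stay editable after compilation. The file name settings.ini is an assumption:

import configparser
import os
import sys

def config_path(filename="settings.ini"):  # hypothetical file name
    # Under PyInstaller, sys.frozen is set and sys.executable points at
    # the bundled executable, so look for the config next to it.
    if getattr(sys, "frozen", False):
        base = os.path.dirname(sys.executable)
    else:
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, filename)

parser = configparser.ConfigParser()
if not parser.read(config_path()):
    raise SystemExit("config file not found: " + config_path())

Note that ConfigParser.read() silently returns an empty list when the file is missing, and later section lookups then fail with a KeyError, which matches the symptom described above.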

How to prevent Python from searching the current working directory for modules?

I have a Python script which imports the datetime module. It works well until one day I run it in a directory which contains a Python script named datetime.py. Of course, there are a few ways to resolve the issue. First, I can run the script in a directory that does not contain a script named datetime.py. Second, I can rename the datetime.py script. However, neither approach is perfect: if you ship a Python script, you never know where users will run it. Another possible fix is to prevent Python from searching the current working directory for modules. I tried removing the empty path ('') from sys.path; this works in an interactive Python shell but not in a Python script. The invoked script still searches the current path for modules. I wonder whether there is any way to stop Python from searching the current path for modules?
if __name__ == '__main__':
    if '' in sys.path:
        sys.path.remove('')
    ...
Notice that it doesn't work even if I put the following code at the beginning of the script.
import sys
if '' in sys.path:
    sys.path.remove('')
Below are some related questions on Stack Overflow.
Removing path from Python search module path
pandas ImportError C extension when io.py in same directory
initialization of multiarray raised unreported exception python
Are you sure that Python is searching for that module in the current directory, and not in the script directory? I don't think Python adds the current directory to sys.path, except in one case. Doing so could even be a security risk (akin to having . on the UNIX PATH).
According to the documentation:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first.
So '' as a representation of the current directory appears only if Python is run from the interpreter (that's why your interactive-shell test worked) or if the script is read from standard input (something like cat modquest/qwerty.py | python). Neither is a particularly 'normal' way of running Python scripts, generally.
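A quick way to see this for yourself (check.py is a hypothetical file name):

# check.py
import sys
print(repr(sys.path[0]))

# python check.py        -> the script's directory
# cat check.py | python  -> ''  (empty string = current working directory)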
I'm guessing that your datetime.py sits side by side with your actual script (in the script directory), and that you just happen to be running the script from that directory (that is, script directory == current directory).
If that's the actual scenario, and your script is standalone (meaning just one file, no local imports), you could do this:
import os
import sys

sys.path.remove(os.path.abspath(os.path.dirname(sys.argv[0])))
But keep in mind that this will bite you in the future, once the script gets bigger and you split it into multiple files, only to spend several hours trying to figure out why it is not importing the local files...
Another option is to use -I, but that may be overkill:
-I
Run Python in isolated mode. This also implies -E and -s. In isolated mode sys.path contains neither the script’s directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too. Further restrictions may be imposed to prevent the user from injecting malicious code.
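For example, python -I yourscript.py leaves both the script's directory and '' off sys.path, so a stray datetime.py next to the script can no longer shadow the standard library module.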

ModuleNotFoundError when running script from Terminal

I have the following folder structure:
app
    __init__.py
    utils
        __init__.py
        transform.py
    products
        __init__.py
        fish.py
In fish.py I'm importing transform as follows: import utils.transform.
When I run fish.py from PyCharm, it works perfectly fine. However, when I run fish.py from the Terminal, I get the error ModuleNotFoundError: No module named 'utils'.
The command I use in the Terminal (from the app folder): python products/fish.py.
I've already looked into the solutions suggested here: Importing files from different folder. Adding the application folder's path to sys.path helps. However, I am wondering if there is another way to make it work without adding two lines of code to fish.py, because I have many scripts in the /products directory and do not want to add two lines of code to each of them.
I looked into some open source projects, and I saw many examples of importing modules from a parallel folder without adding anything into sys.path, e.g. here:
https://github.com/jakubroztocil/httpie/blob/master/httpie/plugins/builtin.py#L5
How to make it work for my project in the same way?
You probably want to run python -m products.fish. The difference between that and python products/fish.py is that the former is roughly equivalent to doing import products.fish in the shell (but with __name__ set to __main__), while the latter does not have awareness of its place in a package hierarchy.
This expands on @Mad Physicist's answer.
First, assuming app is itself a package (since you added __init__.py to it) and utils and products are its subpackages, you should change the import to import app.utils.transform, and run Python from the root directory (the parent of app). The rest of this answer assumes you've done this. (If you didn't intend app to be the root package, tell me in a comment.)
The problem is that you're running app.products.fish as if it were a script, i.e. by giving the full path of the file to the python command:
python app/products/fish.py
This makes Python think this fish.py file is a standalone script that isn't part of any package. As defined in the docs (see here, under <script>), this means that Python will search for modules in the same directory as the script, i.e. app/products/:
If the script name refers directly to a Python file, the directory containing that file is added to the start of sys.path, and the file is executed as the __main__ module.
But of course, the app folder is not in app/products/, so it will throw an error if you try to import app or any subpackage (e.g. app.utils).
The correct way to start a script that is part of a package is to use the -m (module) switch (reference), which takes a module path as an argument and executes that module as a script (but keeping the current working directory as a module search path):
If this option is given, [...] the current directory
will be added to the start of sys.path.
So you should use the following to start your program:
python -m app.products.fish
Now when app.products.fish tries to import the app.utils.transform module, it will search for app in your current working directory (which contains the app/... tree) and succeed.
As a personal recommendation: don't put runnable scripts inside packages. Use packages only to store all the logic and functionality (functions, classes, constants, etc.) and write a separate script to run your application as you wish, putting it outside the package. This will save you from this kind of problem (including the double import trap), and also has the advantage that you can write several run configurations for the same package by just making a separate startup script for each.
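For example, a minimal runner placed next to (not inside) the app package might look like this; the main() function inside fish.py is an assumption:

# run_fish.py, sitting in the directory that contains app/
from app.products import fish

if __name__ == "__main__":
    fish.main()  # hypothetical entry-point function defined in fish.py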

Bundle pdflatex to run on an AWS Lambda with a custom AMI image

My goal is to create an AWS Lambda function to compile .tex files into .pdf using the pdflatex tool through Python.
I've built an EC2 instance using Amazon's AMI and installed pdflatex using yum:
yum install texlive-collection-latex.noarch
This way I can use pdflatex, and my Python code works, compiling my .tex into a .pdf the way I want.
Now I need to create a .zip bundle containing the pdflatex tool; latexcodec (a Python library I've used, no problem with this one); and my Python files: handler (the Lambda function handler) and worker (which compiles my .tex file).
This bundle is the deployment package needed to upload my code and libraries to AWS Lambda.
The problem is: pdflatex has a lot of dependencies, and I'd have to gather everything in one place. I've found a script which does that for me:
http://www.metashock.de/2012/11/export-binary-with-lib-dependencies/
I've set my PATH so that the pdflatex binary is found in the new directory, and ran into an issue: pdflatex couldn't find some of its dependencies. I was able to fix that by pointing an environment variable at the folder the script moved everything to:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/ec2-user/lambda/local/lib64:/home/ec2-user/lambda/local/usr/lib64"
At this point I was running pdflatex directly, through bash. But my Python script raised an error when trying to use pdflatex:
mktexfmt: No such file or directory
I can't find the format file `pdflatex.fmt'!
I was also able to solve this by moving the pdflatex.fmt and texmf.cnf files to my bundle folder and setting some environment variables as well:
export TEXFORMATS=/home/ec2-user/lambda/local/usr/bin
And now, my current problem, the python script keeps throwing the following error:
---! /home/ec2-user/lambda/local/usr/bin/pdflatex.fmt doesn't match pdftex.pool
(Fatal format file error; I'm stymied)
I've found some possible solutions: deleting a .texmf-var folder (which in my case does not exist) and using fmtutil (which I don't have in my AMI image).
1 - Am I missing any environment variables?
2 - Or am I moving my pdflatex binary and all its dependencies the wrong way?
3 - Is there a correct way to move a binary and all its dependencies so it can be used on another machine (considering the env variables)?
The Lambda environment is a container, not a regular EC2 instance. All files in your .zip are deployed to /var/task/ inside the container. Also, everything is mounted read-only except the /tmp directory, so it's impossible to run yum, for example.
For your case, I'd recommend putting the binaries in your zip and invoking them as /var/task/<binary name>. Remember to bundle a statically compiled binary built for a Linux compatible with the container's kernel.
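As a sketch of what the handler side might look like (the binary location and file names are assumptions, not taken from the question):

import os
import subprocess

def handler(event, context):
    env = dict(os.environ)
    env["HOME"] = "/tmp"  # /tmp is the only writable directory in Lambda
    # /var/task/ is where the contents of the deployment .zip are mounted
    result = subprocess.run(
        ["/var/task/pdflatex", "-output-directory", "/tmp", "/tmp/input.tex"],
        env=env,
        capture_output=True,
        text=True,
    )
    return {"returncode": result.returncode, "log": result.stdout[-1000:]}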
samoconnor is doing pretty much exactly what you want in https://github.com/samoconnor/lambdalatex. Note that he sets environment variables in his handler function:
os.environ['PATH'] += ":/var/task/texlive/2017/bin/x86_64-linux/"
os.environ['HOME'] = "/tmp/latex/"
os.environ['PERL5LIB'] = "/var/task/texlive/2017/tlpkg/TeXLive/"
That might do the trick for you as well.

Error adding local file to docker image with docker-py

I have been working with docker-py, in order to build images and launch containers all in one script. So far it has been very smooth. However, I am currently having issues with the ADD/COPY commands in the Dockerfile string variable.
I need to add a file from the source directory directly into the image. With standard Dockerfiles, I have been able to achieve this successfully, using the docker ADD command. But using docker-py, it throws the exception:
Exception: Error building docker image: lstat simrun.py: no such file or directory
The script simrun.py is stored in the same directory as the docker-py script, so I cannot understand why I would be receiving this exception. The relevant line in dockerpy.py is:
ADD ./simrun.py /opt
Is there something that I've missed, or will this functionality just not work in docker-py yet?
You need to set the Docker build context directory using the path parameter.
See here
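For example, with the modern docker SDK for Python (newer than the docker-py version the question was written against), passing path makes that directory the build context, so ADD ./simrun.py resolves against it:

import docker

client = docker.from_env()

# "path" is the build context; files referenced by ADD/COPY are looked up
# relative to this directory, not the directory of the calling script.
image, build_logs = client.images.build(
    path=".",                 # directory containing simrun.py and the Dockerfile
    dockerfile="Dockerfile",
    tag="simrun:latest",      # arbitrary example tag
)

Note that a fileobj-based build (passing the Dockerfile as a string or file object) sends no build context at all, which is why ADD/COPY of local files fails; writing the Dockerfile into the context directory avoids this.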
