Trying to run python code from repo without instructions - python

I am trying to run the code from this repo, without success. There are no instructions on how to run it. I suspect I should run FactcheckingRANLP/Factchecking_clean/classification/lstm_train.py and then run .../lstm_test.py.
The problem is that the code imports its own files as a package, referencing folders and files that are in different directories. For example, running lstm_train.py gives:
File "lstm_train.py", line 3, in <module>
from classification.lstm_utils import *
ModuleNotFoundError: No module named 'classification'
This is the tree structure of the classification folder:
.
├── classification
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── lstm_repres.py
│   ├── lstm_test.py
│   ├── lstm_train.py
│   ├── lstm_train.pyc
│   ├── lstm_utils.py
│   ├── lstm_utils.pyc
│   ├── __pycache__
│   │   ├── __init__.cpython-36.pyc
│   │   ├── lstm_train.cpython-36.pyc
│   │   └── lstm_utils.cpython-36.pyc
│   └── svm_run.py
I would like to know how I can make Python run the lstm_train.py and lstm_test.py files in such a way that the import statements inside them are resolved correctly. I prefer not to modify the code, as this could introduce a lot of errors.

You could add the path pointing to the classification folder to your python path variable.
I suggest using the sys package:
import sys
sys.path.append('<<<PathToRepo>>>/FactcheckingRANLP/Factchecking_clean')
With the directory that contains the classification package added to your Python path, the import statements should work.
EDIT:
Correction; in the initial post I suggested adding the path to .../classification to your path variable, instead the parent folder .../Factchecking_clean is required as the file imports the module 'classification'.
Also, in Lucas Azevedo's answer, the parent directory path is added in the repository lstm_train file itself. While this definitely works, I still think it should be possible without editing the original repository.
I took a look at the repo in question and files like lstm_train.py are scripts which should be executed with the python working directory set as '<<<PathToRepo>>>/FactcheckingRANLP/Factchecking_clean'.
There are a few ways to do so:
You could open the project in a Python IDE and configure your execution to use the directory .../Factchecking_clean as the working directory. In PyCharm, for example, this can be done by importing the repo directory .../Factchecking_clean as a project and setting the working directory in the run configuration.
[Image: setting the working directory for execution in PyCharm]
I think the repository was developed with this execution configuration set up.
Another possibility is to execute the python script from within another python file. This seems to be rather inconvenient to me, regardless you could do so by creating a separate python file with:
import sys
import os
sys.path.append('<<<PathToRepo>>>/FactcheckingRANLP/Factchecking_clean')
os.chdir('<<<PathToRepo>>>/FactcheckingRANLP/Factchecking_clean')
exec(open('./classification/lstm_train.py').read())
This adds the Factchecking_clean directory to the Python path (using sys.path.append()) so that imports like classification.lstm_utils resolve. The working directory is set by os.chdir(), and finally exec(open('<<<filepath>>>').read()) executes the lstm_train file with the correct working directory and path variable set up.
Executing the new python file with the code above works for me (without editing the original repository).
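As an alternative to exec(open(...).read()), the standard library's runpy.run_path can run a script file as if it were started directly. This is only a minimal sketch under the same assumptions as above; the helper name run_repo_script and its parameters are hypothetical and not part of the repository:

```python
import os
import runpy
import sys


def run_repo_script(project_root, script_relpath):
    """Run a script with project_root as working directory and on sys.path.

    Hypothetical helper: project_root would be .../Factchecking_clean and
    script_relpath something like 'classification/lstm_train.py'.
    """
    project_root = os.path.abspath(project_root)
    previous_cwd = os.getcwd()
    sys.path.insert(0, project_root)  # lets 'from classification... import' resolve
    os.chdir(project_root)            # relative data paths resolve from here
    try:
        # run_name="__main__" makes 'if __name__ == "__main__":' blocks execute
        return runpy.run_path(script_relpath, run_name="__main__")
    finally:
        # Restore the interpreter state so the caller is not affected.
        os.chdir(previous_cwd)
        sys.path.remove(project_root)
```

The function returns the globals of the executed script, which can be handy for inspecting results afterwards. Modules imported by the script stay cached in sys.modules after the call.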
However, as scripts like lstm_train.py actually are used to execute specific parts of the code provided in the rest of the repository modules, I think editing these files for experimental purposes is fine. In general, when working with repositories like this I recommend using an IDE (like pycharm) with a correctly set up configuration (method 1).

Ended up having to modify the repository's code, following the suggestion of Jeff.
import sys,os
parent_dir_path = os.path.abspath(__file__+"/../..")
sys.path.append(parent_dir_path)
Adding the path up to classification doesn't work, because the import statement itself names the classification folder. It is the parent directory that has to be added.
__file__ gives the current file name and os.path.abspath resolves the path navigation done with /../..
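The same "two levels up from the script" lookup can also be written with pathlib. A minimal sketch; the function name package_parent_dir is hypothetical:

```python
from pathlib import Path


def package_parent_dir(script_file):
    """Return the directory two levels above script_file.

    For .../Factchecking_clean/classification/lstm_train.py this is
    .../Factchecking_clean, the directory that has to be on sys.path.
    """
    return Path(script_file).resolve().parent.parent
```

Inside lstm_train.py this would be used as sys.path.append(str(package_parent_dir(__file__))), placed before the classification imports.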

Related

Proper ways to set the path of my app in Python

I have a question in how to properly create a path in Python (Python 3.x).
I developed a small scraping app in Python with the following directory structure.
root
├── Dockerfile
├── README.md
├── tox.ini
├── src
│   └── myapp
│       ├── __init__.py
│       ├── do_something.py
│       └── do_something_else.py
└── tests
    ├── __init__.py
    ├── test_do_something.py
    └── test_do_something_else.py
When I want to run my code, I can go to the src/myapp directory and run
python do_something.py
But, because do_something.py has an import statement from do_something_else.py, it fails like:
Traceback (most recent call last):
File "src/myapp/do_something.py", line 1, in <module>
from src.myapp.do_something_else import do_it
ModuleNotFoundError: No module named 'src'
So, I eventually decided to use the following command to specify the python path:
PYTHONPATH=../../ python do_something.py
to make sure that the path is seen.
But, what are the better ways to feed the path so that my app can run?
I want to know this because when I run pytest via tox, the directory that I would run the command tox would be at the root so that tox.ini is seen by tox package. If I do that, then I most likely run into a similar problem due to the Python path not properly set.
Questions I want to ask specifically are:
where should I run my main code when creating my own project like this? root as like python src/myapp/do_something.py? Or, go to the src/myapp directory and run like python do_something.py?
once, the directory where I should execute my program is determined, what is the correct way to import modules from other py file? Is it ok to use from src.myapp.do_something_else import do_it (this means I must add path from src directory)? Or, different way to import?
What are the ways I can have Python recognize the path? I am aware there are several ways to make the path accessible, as below:
a. write export PYTHONPATH=<path_of_my_choice>:$PYTHONPATH to make the path accessible temporarily, or write that line in my .bashrc to make it permanent (but it's hard to reproduce when I want to automate creating a Python environment via ansible or other automation tools)
b. write import sys; sys.path.append(<root>) to have the root as an accessible path
c. use pytest-pythonpath package (but this is not really a generic answer)
Thank you so much for your inputs!
my environment
OS: MacOS and Amazon Linux 2
Python Version: 3.7
Dependency in Python: pytest, tox
I would suggest using setup.py to make this a Python package. Then you can install it in development mode with python setup.py develop (pip install -e . does the same job with current tooling). This way it will be available in your Python environment without needing to specify the PYTHONPATH.
For testing, you can simply install the package with python setup.py install.
Hope that helps.
Two simple steps should make it happen. Python experts can comment if this is a good way to do it (especially going by the concluding caution raised towards the end of this post).
I would have done it like below.
First I would have put a "__init__.py" in root so that hierarchy looks like below. This way python will treat the folder as a package.
root
├── Dockerfile
├── README.md
├── tox.ini
├── __init__.py
├── src
│   └── myapp
│       ├── __init__.py
│       ├── do_something.py
│       └── do_something_else.py
└── tests
    ├── __init__.py
    ├── test_do_something.py
    └── test_do_something_else.py
Then in "do_something.py", I would have added these lines at the top. In the second line please put the full path to the "root" directory.
import sys
sys.path += ['/home/SomeUserName/SomeFolderPath/root']
from src.myapp.do_something_else import do_it
Please note that the second line will essentially modify the sys.path by adding the root folder path (I guess until the interpreter quits). If this is not what you can afford then I am sorry.
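If the lasting change to sys.path is a concern, one option is to scope the extra entry with a context manager so it is removed again afterwards. A minimal sketch; temporary_sys_path is a hypothetical helper, not a standard-library function:

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_sys_path(path):
    """Put path at the front of sys.path only inside the with-block."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Remove the entry we added, even if the block raised.
        try:
            sys.path.remove(path)
        except ValueError:
            pass  # something inside the block already removed it
```

Imports would then go inside a with temporary_sys_path('/home/SomeUserName/SomeFolderPath/root'): block. Modules imported there remain usable afterwards because they are cached in sys.modules; only the search path entry is removed.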

How to get full path of resource file from within python script?

Ok, so I made a Python script placed inside a package. The tree looks something like this:
├── foo
│   ├── __init__.py
│   ├── funcs
│   │   ├── __init__.py
│   │   └── stuff.py
│   ├── resources
│   │   └── haarcascade_frontalface_default.xml
│   └── scripts
│      ├── __init__.py
│      └── script.py
└── setup.py
So inside the script file, I'm using OpenCV's cv2 to detect faces, and for that cv2.CascadeClassifier requires the path of the XML file located under /resources. Because this is a script, I need to be able to run it from anywhere, so a path relative to the working directory sadly doesn't do the trick. How can I get the absolute path to the XML file from within script.py? You can assume that the script and the XML file are always located relative to each other as in the example above. Thanks :))
PS: Bonus if the solution works with eggs as well. Much appreciated
Currently the best way to do this is importlib.resources. Since Python 3.7 it is available in the standard library. For earlier versions, there is a backport called importlib_resources.
Follow the documentation.
In your case this should more or less look like this :
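import importlib.resources
# path() is a context manager that yields a pathlib.Path
with importlib.resources.path('foo.resources', 'haarcascade_frontalface_default.xml') as xml_path:
    print(xml_path)  # use the path while inside the with-block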
There are many advantages to this, most importantly it is standard and it will work wherever the package is installed even if it's in a zip file.
In your case, you might have to add an __init__.py file to your resources directory.
Using the os module works, but if you have access to a python version >= 3.4, then pathlib is an alternative that handles itself a little easier and performs better across platforms:
from pathlib import Path
import cv2
# when using pathlib.Path, slashes get automatically transformed into the
# correct path-division character, depending on the platform
RESOURCES_PATH = Path(__file__).parent.parent / "resources"
face_cascade = cv2.CascadeClassifier()
# load() expects a plain string, so convert the Path explicitly
face_cascade.load(str(RESOURCES_PATH / "haarcascade_frontalface_default.xml"))
If you find yourself defining lots of these kinds of constants, consider putting all of them in a file like foo/util.py so that they are easily reusable within your project and don't need to be re-declared or imported from a script.
An even better option in python versions >=3.7 is using importlib.resources.path, which resolves resources automatically from the package root, so you don't need to find it by hand by walking up from __file__:
import importlib.resources
import cv2
face_cascade = cv2.CascadeClassifier()
with importlib.resources.path("foo.resources", "haarcascade_frontalface_default.xml") as haar_resource:
    # haar_resource is a pathlib.Path object here as well; convert it to str for OpenCV
    face_cascade.load(str(haar_resource))
This is a lot more elegant and should be the preferred solution given it's available.
I'm not sure I understand the question correctly, but maybe os.path will help? Something like:
>>> import os
>>> os.path.abspath("directory/somefile.txt")
'C:/somedirectory/directory/directory/somefile.txt'

unit test structure and import

I want to know how to properly import a file in my test file without using __init__.py in the test root folder. The last sentence of this article states that the test directory should not have an init file.
I don't know much about python so I would like to know:
1) Why not
2) How do I import the file tested.py into test_main.py to test its functionality, without using an init file as a script that inserts paths into PYTHONPATH?
My project has the following structure:
.
├── __init__.py
├── my
│   ├── __init__.py
│   ├── main.py
│   └── my_sub
│       ├── __init__.py
│       └── tested.py
└── test
    └── test_main.py
The files contain the following code:
#/Python_test/__init__.py
import my.main
#/Python_test/my/__init__.py
#empty
#/Python_test/my/main.py
from .my_sub.tested import inc
print(inc(5))
#/Python_test/my/my_sub/__init__.py
#empty
#/Python_test/my/my_sub/tested.py
def inc(x):
    return x+1
#/Python_test/test/test_main.py
from Python_pytest.my.my_sub.tested import func
def test_answer():
    assert func(3) == 3
When I run the code from command line python __init__.py it prints 6, which is correct.
I would like to test the function inc() in tested.py file.
1. I installed pytest package as a testing framework,
2. created test file similar to the one from tutorial here called test_main.py.
3. Added __init__.py with a code that finds path of the root directory and adds it to sys.path
It worked well, but then I read that it shouldn't be done this way. So how should it be done? I have a hard time reading unit-tested code from GitHub repositories that are tested and don't use the init file (one of them is boto3). I can't find any clue that suggests how to do it properly.
I also tried to use relative imports this way
from ..my.main import func
but it throws ValueError: attempted relative import beyond top-level package. Which is ok. But I tried it anyway.
Now I don't know how to do that. Tutorials on importing usually state that we should add the paths of imported modules to sys.path (if they are not present already), but how should I do that when there shouldn't be an init file to hold that functionality?

Python can't import my package

I have the following directory structure:
myapp
├── a
│   ├── amodule.py
│   └── __init__.py
├── b
│   ├── bmodule.py
│   ├── __init__.py
└── __init__.py
In a/amodule.py
I have this snippet which calls a simple function in b/bmodule.py
from myapp.b import bmodule
bmodule.myfunc()
But when i run python a/amodule.py I get this error:
File "a/amodule.py", line 1, in <module>
from myapp.b import bmodule
ImportError: No module named 'myapp'
What am I doing wrong?
You need to put your project root onto your Python path.
You can set the PYTHONPATH environment variable,
or you can alter sys.path before importing,
or you can use an IDE like PyCharm that will do this kind of thing for you
(although it will probably be from b import blah).
There are likely other ways to resolve this issue as well.
Watch out for circular imports...
(In Python 3 you can also do relative imports, although I am not a big fan of this feature.)
from ..b import blah
the best way to allow
from myapp.b import whatever
would be to edit your .bashrc file to always add your parent path to the PYTHONPATH
export PYTHONPATH=$PYTHONPATH:/home/lee/Code
now every time you log into the system python will treat your Code folder as a default place to look for import modules, regardless of where the file is executed from
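To check that a PYTHONPATH entry really ends up on the import path, one can start a child interpreter with the variable set and look at its sys.path. A minimal sketch; the helper name child_sys_path is hypothetical:

```python
import os
import subprocess
import sys


def child_sys_path(extra_dir):
    """Return sys.path of a child interpreter started with extra_dir on PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # os.pathsep is ':' on Linux/macOS and ';' on Windows
    env["PYTHONPATH"] = extra_dir if not existing else extra_dir + os.pathsep + existing
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling child_sys_path('/home/lee/Code') should list that directory among the entries, which is what makes from myapp.b import whatever resolvable regardless of where the script is started from.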

Import issues in Python

I am having an import error problem in a test script. It looks like it has something to do with the directory structure.
I have the following folder structure:
A
├── F1
│   ├── __init__.py
│   └── Src
│       └── F2
│           └── __init__.py
└── tests1
    └── tests1
        └── test_script.py
A/F1/Src/F2
F1 has "__init__.py" in its level
F2 has "__init__.py" in its level
In the same level as F1, there is another folder "tests1"
tests1/tests1/test_script.py
in test_script.py, I have a line which says
from F1.src.F2 import C
With the above, I get an error saying, no module named "F1.src.F2"
Does someone know what is going on here?
from F1.src.F2 import C is an absolute import. In order for it to work, "F1" has to be located somewhere on your Python path (sys.path). Generally this also includes the current directory if you ran Python on the command line.
So if the A directory is not one of the directories on your Python path and is not your current working directory, there is no reason the import would work.
Module names are case sensitive. You have Src in one place and src in another, but I'm not sure that reflects your actual directory structure or just what you typed here.
Using a relative import will not work if you are running test_script.py as a script (Which is what it sounds like.) So, what you really want to do is make sure that either you run the script from the A directory, or go whole hog, make your project into a legit package with a setup.py and use a test runner such as tox.
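Both points above, that the package root has to be on sys.path and that module names are case sensitive, can be checked with importlib.util.find_spec before running the tests. A minimal sketch; is_importable is a hypothetical helper:

```python
import importlib.util


def is_importable(dotted_name):
    """Return True if dotted_name can be located on the current sys.path.

    Note: for a dotted name, find_spec imports the parent packages,
    so their __init__.py files are executed.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return False
```

Run from the A directory, is_importable("F1.Src.F2") and is_importable("F1.src.F2") can be compared to see whether the spelling of the folder name is what breaks the import.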
I just had to create a shared library with the "egg" file.
As simple as that but it occurred to me late!
