Confused about using relative and absolute path in Python - python

I have a script called "test.py" and refers to a config file called "cfg.yaml". These two reside in the same directory called "test/scripts".
test/scripts/test.py
test/script/cfg.yaml
Now I am writing a bash script inside "test/data1/data2" called task.sh
From inside of task.sh, I want to make a call to the python script
test.sh contents are as below:
#!/bin/sh
python ../../scripts/test.py
test.py opens and reads the cfg.yaml like open("cfg.yaml") but when the test.sh is called, it fails because "cfg.yaml" is NOT referred with relative path. How do I resolve this?

try
#!/bin/sh
cd ../../scripts
python test.py
I assume in test.py you are referencing the yaml with a local path that expects the script to be running from the scripts directory (not from data1/data2)
as an aside you can always print os.getcwd() to see what your working directory is while running the python script (or just pwd in bash)

If you want to refer to a file in terms of its relative location to the script, you can't just use a relative path for that, because the script may not be in the current working directory. In fact, in your case, it's guaranteed not to be in the current working directory.
The usual quick&dirty solution is to get the script path explicitly:
import os
import sys
scriptdir = os.path.abspath(os.path.dirname(sys.argv[0]))
That sys.argv[0] is the full name of your script—which on most platforms means either an absolute path, or a relative path that's at least still valid at this point. If you're worried about those other platforms, you may want to look at __file__. Of course if you're running the script out of a zipfile or similar, neither one will be useful.
open(os.path.join(scriptdir, "cfg.yaml"))
(Or you can even forcibly os.chdir(scriptdir) if you want, but that can be a bad idea if you're, e.g., accepting pathnames from the user.)
The right solution is to write your script and data files to be installed (even if you don't actually install anything and run it out of the source tree), and then use pkg_resources to find the data files. That will also solve a whole lot of other problems for you—but it does require a bit more learning first. See the Python Packaging User Guide if you want to get started on that road, and Package Data and Data Files as particular starting points.

Related

File Not Found Error in Scrapy Where File Exists [duplicate]

My book states:
Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory
As I am on OSX, my root folder is /. When I type in os.getcwd() in my Python shell, I get /Users/apple/Documents. Why am I getting the Documents folder in my cwd? Is it saying that Python is using Documents folder? Isn't there any path heading to Python that begins with / (the root folder)? Also, does every program have a different cwd?
Every process has a current directory. When a process starts, it simply inherits the current directory from its parent process; and it's not, for example, set to the directory which contains the program you are running.
For a more detailed explanation, read on.
When disks became large enough that you did not want all your files in the same place, operating system vendors came up with a way to structure files in directories. So instead of saving everything in the same directory (or "folder" as beginners are now taught to call it) you could create new collections and other new collections inside of those (except in some early implementations directories could not contain other directories!)
Fundamentally, a directory is just a peculiar type of file, whose contents is a collection of other files, which can also include other directories.
On a primitive operating system, that was where the story ended. If you wanted to print a file called term_paper.txt which was in the directory spring_semester which in turn was in the directory 2021 which was in the directory studies in the directory mine, you would have to say
print mine/studies/2021/spring_semester/term_paper.txt
(except the command was probably something more arcane than print, and the directory separator might have been something crazy like square brackets and colons, or something;
lpr [mine:studies:2021:spring_semester]term_paper.txt
but this is unimportant for this exposition) and if you wanted to copy the file, you would have to spell out the whole enchilada twice:
copy mine/studies/2021/spring_semester/term_paper.txt mine/studies/2021/spring_semester/term_paper.backup
Then came the concept of a current working directory. What if you could say "from now on, until I say otherwise, all the files I am talking about will be in this particular directory". Thus was the cd command born (except on old systems like VMS it was called something clunkier, like SET DEFAULT).
cd mine/studies/2021/spring_semester
print term_paper.txt
copy term_paper.txt term_paper.backup
That's really all there is to it. When you cd (or, in Python, os.chdir()), you change your current working directory. It stays until you log out (or otherwise exit this process), or until you cd to a different working directory, or switch to a different process or window where you are running a separate command which has its own current working directory. Just like you can have your file browser (Explorer or Finder or Nautilus or whatever it's called) open with multiple windows in different directories, you can have multiple terminals open, and each one runs a shell which has its own independent current working directory.
So when you type pwd into a terminal (or cwd or whatever the command is called in your command language) the result will pretty much depend on what you happened to do in that window or process before, and probably depends on how you created that window or process. On many Unix-like systems, when you create a new terminal window with an associated shell process, it is originally opened in your home directory (/home/you on many Unix systems, /Users/you on a Mac, something more or less like C:\Users\you on recent Windows) though probably your terminal can be configured to open somewhere else (commonly Desktop or Documents inside your home directory on some ostensibly "modern" and "friendly" systems).
Many beginners have a vague and incomplete mental model of what happens when you run a program. Many will incessantly cd into whichever directory contains their script or program, and be genuinely scared and confused when you tell them that you don't have to. If frobozz is in /home/you/bin then you don't have to
cd /home/you/bin
./frobozz
because you can simply run it directly with
/home/you/bin/frobozz
and similarly if ls is in /bin you most definitely don't
cd /bin
./ls
just to get a directory listing.
Furthermore, like the ls (or on Windows, dir) example should readily convince you, any program you run will look in your current directory for files. Not the directory the program or script was saved in. Because if that were the case, ls could only produce a listing of the directory it's in (/bin) -- there is nothing special about the directory listing program, or the copy program, or the word processor program; they all, by design, look in the current working directory (though again, some GUI programs will start with e.g. your Documents directory as their current working directory, by design, at least if you don't tell them otherwise).
Many beginners write scripts which demand that the input and output files are in a particular directory inside a particular user's home directory, but this is just poor design; a well-written program will simply look in the current working directory for its input files unless instructed otherwise, and write output to the current directory (or perhaps create a new directory in the current directory for its output if it consists of multiple files).
Python, then, is no different from any other programs. If your current working directory is /Users/you/Documents when you run python then that directory is what os.getcwd() inside your Python script or interpreter will produce (unless you separately os.chdir() to a different directory during runtime; but again, this is probably unnecessary, and often a sign that a script was written by a beginner). And if your Python script accepts a file name parameter, it probably should simply get the operating system to open whatever the user passed in, which means relative file names are relative to the invoking user's current working directory.
python /home/you/bin/script.py file.txt
should simply open(sys.argv[1]) and fail with an error if file.txt does not exist in the current directory. Let's say that again; it doesn't look in /home/you/bin for file.txt -- unless of course that is also the current working directory of you, the invoking user, in which case of course you could simply write
python script.py file.txt
On a related note, many beginners needlessly try something like
with open(os.path.join(os.getcwd(), "input.txt")) as data:
...
which needlessly calls os.getcwd(). Why is it needless? If you have been following along, you know the answer already: the operating system will look for relative file names (like here, input.txt) in the current working directory anyway. So all you need is
with open("input.txt") as data:
...
One final remark. On Unix-like systems, all files are ultimately inside the root directory / which contains a number of other directories (and usually regular users are not allowed to write anything there, and system administrators with the privilege to do it typically don't want to). Every relative file name can be turned into an absolute file name by tracing the path from the root directory to the current directory. So if the file we want to access is in /home/you/Documents/file.txt it means that home is in the root directory, and contains you, which contains Documents, which contains file.txt. If your current working directory were /home you could refer to the same file by the relative path you/Documents/file.txt; and if your current directory was /home/you, the relative path to it would be Documents/file.txt (and if your current directory was /home/you/Music you could say ../Documents/file.txt but let's not take this example any further now).
Windows has a slightly different arrangement, with a number of drives with single-letter identifiers, each with its own root directory; so the root of the C: drive is C:\ and the root of the D: drive is D:\ etc. (and the directory separator is a backslash instead of a slash, although you can use a slash instead pretty much everywhere, which is often a good idea for preserving your sanity).
Your python interpreter location is based off of how you launched it, as well as subsequent actions taken after launching it like use of the os module to navigate your file system. Merely starting the interpreter will place you in the directory of your python installation (not the same on different operating systems). On the other hand, if you start by editing or running a file within a specific directory, your location will be the folder of the file you were editing. If you need to run the interpreter in a certain directory and you are using idle for example, it is easiest to start by creating a python file there one way or another and when you edit it you can start a shell with Run > Python Shell which will already be in that directory. If you are using the command line interpreter, navigate to the folder where you want to run your interpreter before running the python/python3/py command. If you need to navigate manually, you can of course use the following which has already been mentioned:
import os
os.chdir('full_path_to_your_directory')
This has nothing to do with osx in particular, it's more of a concept shared by all unix-based systems, and I believe Windows as well. os.getcwd() is the equivalent of the bash pwd command - it simply returns the full path of the current location in which you are in. In other words:
alex#suse:~> cd /
alex#suse:/> python
Python 2.7.12 (default, Jul 01 2016, 15:34:22) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getcwd()
'/'
It depends from where you started the python shell/script.
Python is usually (except if you are working with virtual environments) accessible from any of your directory. You can check the variables in your path and Python should be available. So the directory you get when you ask Python is the one in which you started Python. Change directory in your shell before starting Python and you will see you will it.
os.getcwd() has nothing to do with OSX in particular. It simply returns the directory/location of the source-file. If my source-file is on my desktop it would return C:\Users\Dave\Desktop\ or let say the source-file is saved on an external storage device it could return something like G:\Programs\. It is the same for both unix-based and Windows systems.

How to prevent Python from search the current working directory for modules?

I have a Python script which imports the datetime module. It works well until someday I can run it in a directory which has a Python script named datetime.py. Of course, there are a few ways to resolve the issue. First, I run the script in a directory that does not contain the script datetime.py. Second, I can rename the Python script datetime.py. However, neither of the 2 approaches are perfect ways. Suppose one ship a Python script, he never knows where users will run the Python script. Another possible fix is to prevent Python from search the current working directory for modules. I tried to remove the empty path ('') from sys.path but it works in an interactive Python shell but not in a Python script. The invoked Python script still searches the current path for modules. I wonder whether there is any way to disable Python from searching the current path for modules?
if __name__ == '__main__':
if '' in sys.path:
sys.path.remove('')
...
Notice that it deosn't work even if I put the following code to the beginning of the script.
import sys
if '' in sys.path:
sys.path.remove('')
Below are some related questions on StackOverflow.
Removing path from Python search module path
pandas ImportError C extension when io.py in same directory
initialization of multiarray raised unreported exception python
Are you sure that Python is searching for that module in the current directory, and not on the script directory? I don't think Python adds the current directory to the sys.path, except in one case. Doing so could even be a security risk (akin to having . on the UNIX PATH).
According to the documentation:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first
So, '' as a representation of the current directory happens only if run from interpreter (that's why your interactive shell test worked) or if the script is read from the standard input (something like cat modquest/qwerty.py | python). Neither is a rather 'normal' way of running Python scripts, generally.
I'm guessing that your datetime.py stands side by side with your actual script (on the script directory), and that it just happens that you're running the script from that directory (that is script directory == current directory).
If that's the actual scenario, and your script is standalone (meaning just one file, no local imports), you could do this:
sys.path.remove(os.path.abspath(os.path.dirname(sys.argv[0])))
But keep in mind that this will bite you in the future, once the script gets bigger and you split it into multiple files, only to spend several hours trying to figure out why it is not importing the local files...
Another option is to use -I, but that may be overkill:
-I
Run Python in isolated mode. This also implies -E and -s. In isolated mode sys.path contains neither the script’s directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too. Further restrictions may be imposed to prevent the user from injecting malicious code.

How is git able to run C scripts in the current directory as a command on terminal?

I have been looking into bash and shell recently and been trying to work out how git is able to make a terminal command that runs C scripts in the current directory e.g git init, git push/pull etc.
I have been trying to simulate it in python, by making an executable python script in my home directory,
#!/usr/bin/env python
import os
print("current_dir: ",os.getcwd()) #prints /Users/usr/folder/script.py
Then I create a .command file that calls the python script
cd
cd FOLDER/
python3 SCRIPT.py
and editing the bash profile to export a variable to run the .command file.
export mycommand=/Users/urn/folder/command.command
Although this is not even nearly close to the way git achieves its command line. For example, when I run my script is not actually a terminal command it is just an environment variable, hence the $.
$mycommand
Secondly, this goes to the directory of the python file and runs the script from with that directory, therefore the output will always be the same
/Users/usr/folder/script.py
Although in git it runs the file in the current directory. Therefore the print statement would change depending on the terminal directory.
How am I able to create my own 'terminal command' such as git init to run my python script in whatever directory I'm in. Ps, I'm on mac.
Any help would be greatly appreciated :)
It sounds like you are missing at least two basic concepts:
The search path: when you issue a command that is not a shell function or built-in and that has an unqualified name (one with no / characters), the shell searches a list of zero or more directories for a matching executable file. This list of directories is the "path", and it is stored in the PATH environment variable. You can change it if you wish. This is how the shell finds the git program when you do not specify a path to it.
Executable scripts: Python, shell, Perl, etc. programs that must be run via an interpreter can be made executable by name alone by including an appropriate shebang line as the very first line and assigning an executable mode. You include an appropriate shebang line in your example Python program, in fact, but you seem not to understand its significance, because you explicitly launch the script via the python3 command. The shebang line is just another comment to Python, but it is meaningful to the system.
It seems like you probably also are missing some other concepts, like the fact that your script doesn't need to be in the current working directory for you to run it via the python3 launcher, path notwithstanding. Just specify its full pathname. Alternatively, Python has its own variation on a path, PYTHONPATH, by which it can locate modules and packages. These are alternatives, however -- the first two points are enough to achieve your apparent objective. Specifically,
Keep the shebang line in your script, though your default python is probably v2.7, so if you really want to run it specifically via python3 then modify the shebang line to say so:
#!/usr/bin/env python3
Make sure the file has executable mode. For example,
chmod 0755 /Users/urn/bin/SCRIPT.py
Then you should be able to execute the script from anywhere via its full pathname.
To access it from anywhere via its simple name, ensure that the directory containing it is in your path. For this purpose, it would be wise to choose an appropriate directory, such as /usr/local/bin or /Users/urn/bin (you may need to create the directory first). Whichever you choose, ensure that that directory is in your PATH. For example, edit /Users/urn/.bash_profile, creating it if necessary, and ensure that it contains (say) the commands
PATH=$PATH:/Users/urn/bin
export PATH
That will take effect in new Terminal windows you open afterward, but not automatically in any that are already open. In those windows, you will be able to run the script, from anywhere, via its simple name.

python FileNotFoundError

I was writing a program that accesses a .txt file at the start of the program with the open() function. It ran without any errors on the IDE and I was also able to read the text file when running from the IDE without any issues. Although when I ran from the Python Launcher it threw a "FileNotFoundError"
Here's my code:
directions_object = open('warcards_directions.txt','r')
Further to Dan D's comment.. try putting this on the line in front of your open() call:
from os.path import abspath
print(abspath('warcards_directions.txt'))
You'll see that python looks in different places depending on where you run it from .. because it looks for files relative to the current working directory, which changes depending on how you run python.
This is a common problem for new comers. See here How to import files in python using sys.path.append? for some solutions (note the underlying problem in that post is the same as this one.. the fact that they're trying to import a file, and here we're trying to open one is not too important).
Also I'll add that I often reference things relative to the script itself... like this:
from os.path import abspath, join, dirname
script_dir = dirname(__file__)
txt_path = abspath(join(script_dir, "..", "path", "to", "warcards_directions.txt"))
This works if your txt file and your python script stay in the same place relative to each other (but might be installed in different places).
E.g. above assumes your script lives in C:\Foo\scripts\script.py and your text file lives in C:\Foo\path\to\warcards_directions.txt. The method above will work fine where ever you run the script from and it'll work if you move or rename the C:\Foo dir (e.g. to C:\Program Files\Bar). But it'll break if you decide to move scripts.py down a directory into C:\Foo (at which point you change the way txt_path is initialized to fix).
When you said "python launcher", do you mean the command line?
python myScript.py
If you, you will need to cd into the directory where the file is at before you can execute the script. Otherwise, provide the full path to the txt file in your script.

Python - Interpreting paths differently when running in shell vs. when running within another script?

I am experiencing an odd problem that I'm not quite sure how to tackle. I have a Python-selenium script that uses relative paths to log results to a text-file.
Here is the part of the script which sets up the log-file:
log_file = './demo-logfiles/log_file_template.txt'
sys.stdout = open('log_file_template.txt', 'a',)
As you can see, it uses a relative path to a folder. If I run this script as:
python demo.py firefox MAC, it runs flawlessly and the logfile gets sent to the proper folder.
If I run this exact Python script from within a larger shell-script, it returns an error that the './demo-logfiles/log_file_template.txt' doesn't exist.
I have found that if I change the script to '../demo-logfiles/log_file_template.txt' it works in the larger shell script, but stops working if I run it normally.
It either works in one, or the other. What is the reason for the relative directories being interpreted in different ways? I would not like to have two separate scripts for running in Python/shell.
The original python script is in the directory /blah/blah/DEMO/demo.py, and the the shell script that runs it is in /blah/blah/DEMO/demo-autotest/autotest_logger.sh
I have confirmed that this problem occurs for any script I try to run. I shouldn't have to change the original Python code to make it work with the shell script. I already accounted for it in the shell script, and it successfully runs the file.
Thanks.
You should never use a "." (or any relative path) in a directory path in a script unless you really mean you want to refer to the directory that the user is running the script from. If you want to refer to a location relative to the script that's running, you can do the following:
import os
import sys
directory = os.path.dirname(os.path.abspath(__file__))
sys.stdout = open(os.path.join(directory, "demo-logfiles", "log_file_template.txt"), "a")
Best practices side note: you should probably use the logging module rather than reassigning sys.stdout.
The term "relative directories" means a path is relative to something. You probably assume it's relative to the script which contains the path but that's not correct.
Relative paths are relative to the current directory of the process which interprets the script, i.e. the folder in which you started python. If you're in the shell, you can see the current directory with echo $PWD
If you start python in /blah/blah, then that becomes the current directory and all relative paths are relative to /blah/blah.
See the answer of David Hollman for how to get the path of the current script and then how to build paths relative to that.

Categories