File not found Jupyter notebook

I am having trouble loading a file in jupyter notebook.
Here is my project tree:
-- home
---- cdsw
------ my_main.py
------ notebooks
-------- my_notebook.ipynb
------ dns
-------- assets
---------- stopwords.txt
-------- bilans
---------- my_module.py
Note that '/home/cdsw/' is in my PYTHONPATH (for the same interpreter from which I launch Jupyter).
In my_module.py I have these lines:
PATH_STOPWORDS: Final = os.path.join("dns", "assets", "stopwords.txt")
STOPWORDS: Final = load_stopwords(PATH_STOPWORDS)
load_stopwords is basically just an open(PATH_STOPWORDS, 'r').
So my problem is that when I import dns.bilans.my_module inside my_main.py it works fine: the file is correctly loaded.
Yet, when I import it from my_notebook.ipynb, it does not:
FileNotFoundError: [Errno 2] No such file or directory: 'dns/assets/stopwords.txt'
So my_module is indeed found by the Jupyter kernel (because it reads the code lines of the file), but the kernel can't use the relative path provided the way a run from a terminal does.
When I use open(relpath, 'r') inside a module, I don't need to spell out the whole project tree, right? Indeed it DOES work in my_main.py ...
I really don't get it ...
The output of os.getcwd() in jupyter is "/home/cdsw/notebooks".

This existing SO question suggests how to find files relative to the position of a Python code file. It isn't exactly the same question, however, and I believe that this technique is so important for every Python programmer to understand, that I'm going to provide a more thorough answer.
Given a piece of Python code, one can compute the path of the directory of the source file containing that code via:
here = os.path.dirname(__file__)
Having the position of the relevant source file, it is easy to compute an absolute path to any data file that has a well known location relative to that source file. In this case, the way to do that is:
stopwords_path = os.path.join(here, '..', 'assets', 'stopwords.txt')
This path can be supplied to open() or used in any other way to refer to the stopwords.txt data file. Here, the way to use this path would be:
load_stopwords(stopwords_path)
I use this technique to not only find files that accompany code in a particular module, but also to find files that are in other locations throughout my source tree. As long as the code and data file exist in the same source repository, or are shipped together in a single Python package, the relative path will not change from installation to installation, and so this technique will work.
In general, you should avoid relying on paths that are relative to the current working directory. Whenever possible, you should also avoid having to tell your code where to find something. For any situation, ask yourself how you can obtain a reliable absolute path that you can then use to locate whatever it is you want to access.
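As a minimal sketch of the technique described above, the helper below wraps the os.path.dirname(__file__) idea in a function. The name resource_path is hypothetical (it is not part of the question or of any library), and the example call hard-codes the module path from the question's project tree only so that the snippet can run outside that project.

```python
import os


def resource_path(module_file, *parts):
    """Build an absolute path from the directory that contains module_file."""
    here = os.path.dirname(os.path.abspath(module_file))
    # normpath collapses the ".." segments so the result is easier to read
    return os.path.normpath(os.path.join(here, *parts))


# Hypothetical use inside dns/bilans/my_module.py:
# PATH_STOPWORDS = resource_path(__file__, "..", "assets", "stopwords.txt")
example = resource_path(
    os.path.join(os.sep, "home", "cdsw", "dns", "bilans", "my_module.py"),
    "..", "assets", "stopwords.txt",
)
print(example)
```

Because the path is anchored at the module's own location, it should not matter whether the import happens from my_main.py or from a notebook whose working directory is /home/cdsw/notebooks.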

Related

Python - File Path not found if script run from another directory

I'm trying to run a script that works without issue when I run it in the console, but fails if I try to run it from another directory (via IPython %run <script.py>).
The issue comes from this line, where it references a folder called "Pickles".
with open('Pickles/'+name+'_'+date.strftime('%y-%b-%d'),'rb') as f:
    obj = pickle.load(f)
In Console:
python script.py <---works!
When run from IPython (Jupyter) in another folder, it raises a FileNotFoundError.
How can I make any path references within my scripts more robust, without putting the whole extended path?
Thanks in advance!

Since running in the console the way you show works, the Pickles directory must be in the same directory as the script. You can make use of this fact so that you don't have to hard code the location of the Pickles directory, but also don't have to worry about setting the "current working directory" to be the directory containing Pickles, which is what your current code requires you to do.
Here's how to make your code work no matter where you run it from:
with open(os.path.join(os.path.dirname(__file__), 'Pickles', name + '_' + date.strftime('%y-%b-%d')), 'rb') as f:
    obj = pickle.load(f)
os.path.dirname(__file__) provides the path to the directory containing the script that is currently running.
Generally speaking, it's good practice to always fully specify the locations of things you interact with in the filesystem. A common way to do this is shown here.
UPDATE: I updated my answer to be more correct by not assuming a specific path separator character. I had chosen to use '/' only because the original code in the question already did this. It is also the case that the code given in the original question, and the code I gave originally, will work fine on Windows. The open() function will accept either type of path separator and will do the right thing on Windows.
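The same idea can be sketched with pathlib, which some people find easier to read than nested os.path calls. The function names pickle_path and load_pickle are hypothetical, and script_file stands in for the script's own __file__; the only assumption taken from the question is that the Pickles folder sits next to the script.

```python
import pickle
from datetime import date
from pathlib import Path


def pickle_path(script_file, name, day):
    """Return the path of a pickle stored in the Pickles folder next to script_file."""
    base = Path(script_file).resolve().parent
    return base / "Pickles" / f"{name}_{day.strftime('%y-%b-%d')}"


def load_pickle(script_file, name, day):
    """Load the pickle regardless of the current working directory."""
    with open(pickle_path(script_file, name, day), "rb") as f:
        return pickle.load(f)


# Hypothetical call from inside script.py:
# obj = load_pickle(__file__, name, date.today())
```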

You have to use absolute paths. Also, to be cross-platform, use os.path.join:
First get the path of your script from the variable __file__
Get the directory of this file with os.path.dirname(__file__)
Build the full path with os.path.join(os.path.dirname(__file__), "Pickles", f"{name}_{date.strftime('%y-%b-%d')}")
This gives you:
with open(os.path.join(os.path.dirname(__file__), "Pickles", f"{name}_{date.strftime('%y-%b-%d')}"), 'rb') as f:
    obj = pickle.load(f)

Navigating directories

After getting the path to the current working directory using:
cwd = os.getcwd()
How would one go up one folder: C:/project/analysis/ to C:/project/ and enter a folder called data (C:/project/data/)?

In general it is a bad idea to 'enter' a directory (i.e. change the current directory), unless that is explicitly part of the behaviour of the program.
To open a file in a directory one level over from where you are, you can use .. to navigate up one level.
In your case you can open a file using the path ../data/<filename>; in other words, use relative file names.
If you really need to change the current working directory you can use os.chdir() but remember this could well have side effects - for example if you import modules from your local directory then using os.chdir() will probably impact that import.
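If the goal is only to refer to the sibling folder, the path can be computed without changing the working directory at all. A small sketch with pathlib follows; sibling_dir is a hypothetical helper name, and the literal path is the one from the question.

```python
from pathlib import Path


def sibling_dir(directory, name):
    """Go up one level from directory and down into the folder called name."""
    return Path(directory).parent / name


# With the real working directory this would be sibling_dir(Path.cwd(), "data")
data_dir = sibling_dir("C:/project/analysis/", "data")
print(data_dir.as_posix())
```

Files can then be opened with data_dir / "<filename>", and the process never leaves its original directory, so imports from the local folder are unaffected.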

As per Python documentation, you could try this:
os.chdir("../data")

Python os.getcwd() is not working on subfolders in VSCODE

I have a Python file, converted from a Jupyter Notebook, and there is a subfolder called 'datasets' inside this file's folder. When I try to open a file that is inside that 'datasets' folder with this code:
import pandas as pd
# Load the CSV data into DataFrames
super_bowls = pd.read_csv('/datasets/super_bowls.csv')
It says that there is no such file or folder. Then I add this line
os.getcwd()
And the output is the top-level folder of the project, not the subfolder where this Python file is. I think that may be the reason why it's not working.
So, how can I open that CSV file with relative paths? I don't want to use an absolute path because this code is going to be used on other computers.
Why is os.getcwd() not returning the actual folder path?

My observation is that dot notation (such as .. to move to the parent directory) sometimes does not work, depending on the operating system. What I generally do to make it OS-agnostic is this:
import pandas as pd
import os
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
super_bowls = pd.read_csv(os.path.join(__location__, 'datasets', 'super_bowls.csv'))
This works equally well on my Windows and Ubuntu machines.
I am not sure if there are other and better ways to achieve this. Would like to hear back if there are.

(edited)
Per your comment below, the current working directory is
/Users/ivanparra/AprendizajePython/
while the file is in
/Users/ivanparra/AprendizajePython/Jupyter/datasets/super_bowls.csv
For that reason, going to the datasets subfolder of the current working directory (CWD) takes you to /Users/ivanparra/AprendizajePython/datasets which either doesn't exist or doesn't contain the file you're looking for.
You can do one of two things:
(1) Use an absolute path to the file, as in
super_bowls = pd.read_csv("/Users/ivanparra/AprendizajePython/Jupyter/datasets/super_bowls.csv")
(2) use the right relative path, as in
super_bowls = pd.read_csv("./Jupyter/datasets/super_bowls.csv")
There's also (3): use os.path.join to concatenate the CWD and the relative path. It's basically the same as (2).
(you can also use

The answer really lies in the response by user2357112:
os.getcwd() is working fine. The problem is in your expectations. The current working directory is the directory where Python is running, not the directory of any particular source file. – user2357112 supports Monica May 22 at 6:03
The solution is:
data_dir = os.path.dirname(__file__)

Try this code
super_bowls = pd.read_csv( os.getcwd() + '/datasets/super_bowls.csv')

I noticed this problem a few years ago. I think it's a matter of design style. The problem is that your workspace folder is just a folder, not a project folder. Most of the time, your relative references are based on the current file.
VSCode actually supports setting cwd dynamically, but that's not the default. If your work folder is not a rigorous and professional project, I recommend adding the following setting to launch.json. This is the simplest answer you need.
"cwd": "${fileDirname}"

Thanks to everyone that tried to help me. Thanks to the Roy2012 response, I got a code that works for me.
import pandas as pd
import os
currentPath = os.path.dirname(__file__)
# Load the CSV data into DataFrames
super_bowls = pd.read_csv(currentPath + '/datasets/super_bowls.csv')
os.path.dirname(__file__) gives me the directory of the current file, and lets me work with paths relative to it:
'/Users/ivanparra/AprendizajePython/Jupyter'
and with that it works like a charm!!
P.S.: As a side note, os.getcwd() can give different results in a Jupyter Notebook and in a Python file. Inside the notebook, it returned the folder containing the notebook, but for the Python file it returned the top-level folder of the project.
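One caveat for code that has to run both as a script and in a notebook: __file__ is not defined inside notebook cells, so os.path.dirname(__file__) raises a NameError there. The sketch below is one possible fallback, not an established recipe; base_dir is a hypothetical helper that takes the caller's globals() and falls back to the working directory when __file__ is missing.

```python
import os


def base_dir(namespace):
    """Return the folder of the running file, or the CWD when __file__ is absent."""
    file_ = namespace.get("__file__")
    if file_ is None:
        # Typical for notebook cells and the interactive prompt
        return os.getcwd()
    return os.path.dirname(os.path.abspath(file_))


# Hypothetical use in a script or a notebook cell:
# csv_path = os.path.join(base_dir(globals()), "datasets", "super_bowls.csv")
print(base_dir({}))
```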

Weird python file path behavior

I have this folder structure, within edi_standards.py I want to open csv/transaction_groups.csv
But the code only works when I access it like this os.path.join('standards', 'csv', 'transaction_groups.csv')
What I think it should be is os.path.join('csv', 'transaction_groups.csv') since both edi_standards.py and csv/ are on the same level in the same folder standards/
This is the output of printing __file__ in case you doubt what I say:
>>> print(__file__)
~/edi_parser/standards/edi_standards.py

When you run a Python file, the Python interpreter does not change the current directory to the directory of the file you're running.
In your case, you're probably running (from ~/edi_parser):
standards/edi_standards.py
For this you have to hack something using __file__, taking the dirname and building the relative path of your resource file:
os.path.join(os.path.dirname(__file__),"csv","transaction_groups.csv")
Anyway, it's good practice not to rely on the current directory to open resource files. This method works whatever the current directory is.

I agree with Jean-Francois's answer above.
I would like to add that os.path.join does not put the absolute path of your current working directory in front of its first argument.
For example consider below code
>>> os.path.join('Functions','hello')
'Functions/hello'
See another example
>>> os.path.join('Functions','hello','/home/naseer/Python','hai')
'/home/naseer/Python/hai'
Official Documentation
states that whenever an absolute path is given as an argument to os.path.join, all previous path arguments are discarded and joining continues from the absolute path argument.
The point I would like to highlight is that os.path.join only concatenates path components; it does not resolve a relative path against any directory. So you have to supply an absolute path to reliably locate your file.
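To make the difference concrete, here is a short sketch contrasting os.path.join with os.path.abspath. The component names are the ones from the examples above (with a shortened absolute part); the variable names are only for illustration.

```python
import os

# join() only glues components together; the result is still relative
relative = os.path.join("Functions", "hello")

# abspath() is what anchors a relative path at the current working directory
absolute = os.path.abspath(relative)

# An absolute component makes join() drop everything that came before it
restarted = os.path.join("Functions", "hello", os.sep + "home", "hai")

print(relative)
print(absolute)
print(restarted)
```

Note that abspath still depends on the working directory, so for resource files it is usually more robust to start from os.path.dirname(__file__) instead.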

Primer needed in python pathnames

I am a very novice coder, and Python is my first (and, practically speaking, only) language. I am charged as part of a research job with manipulating a collection of data analysis scripts, first by getting them to run on my computer. I was able to do this, essentially by removing all lines of coding identifying paths, and running the scripts through a Jupyter terminal opened in the directory where the relevant modules and CSV files live so the script knows where to look (I know that Python defaults to the location of the terminal).
Here are the particular blocks of code whose function I don't understand
import sys
sys.path.append('C:\Users\Ben\Documents\TRACMIP_Project\mymodules/')
import altdata as altdata
I have replaced the pathname in the original code with the path name leading to the directory where the module is; the file containing all the CSV files that end up being referenced here is also in mymodules.
This works depending on where I open the terminal, but the only way I can get it to work consistently is by opening the terminal in mymodules, which is fine for now but won't work when I need to work by accessing the server remotely. I need to understand better precisely what is being done here, and how it relates to the location of the terminal (all the documentation I've found is overly technical for my knowledge level).
Here is another segment I don't understand
import os.path
csvfile = 'csv/' + model +'_' + exp + '.csv'
if os.path.isfile(csvfile): # csv file exists
hcsvfile = open(csvfile )
I get here that it's looking for the CSV file, but I'm not sure how. I'm also not sure why then on some occasions depending on where I open the terminal it's able to find the module but not the CSV files.
I would love an explanation of what I've presented, but more generally I would like information (or a link to information) explaining paths and how they work in scripts in modules, as well as what are ways of manipulating them. Thanks.

sys.path
This is a simple list of directories where Python will look for modules and packages (.py files and directories with an __init__.py file; see the modules tutorial). Extending this list allows you to load modules (custom libs, etc.) from non-default locations (usually you need to change it at runtime; for static directories you can modify a startup script to set the needed environment variables).
os.path
This module implements some useful functions on pathnames.
... and allows you to find out whether a file exists, whether it is a link, a directory, etc.
Why did loading the *.csv fail?
Because sys.path is responsible for module loading, and only for that. When you use a relative path:
csvfile = 'csv/' + model +'_' + exp + '.csv'
open() will look in the current working directory:
file is either a string or bytes object giving the pathname (absolute or relative to the current working directory)...
You need to use absolute paths, constructing them with the os.path module.
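The dependence on the working directory can be shown with a small self-contained experiment. Everything here is made up for the demonstration: exists_from is a hypothetical helper, and the csv/foo_bar.csv file is created in a temporary directory rather than taken from the real project.

```python
import os
import tempfile


def exists_from(cwd, relative_path):
    """Report whether relative_path is a file when cwd is the working directory."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        return os.path.isfile(relative_path)
    finally:
        os.chdir(previous)  # always restore the original working directory


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "csv"))
    with open(os.path.join(root, "csv", "foo_bar.csv"), "w") as f:
        f.write("a,b\n")
    relative = os.path.join("csv", "foo_bar.csv")
    found_in_root = exists_from(root, relative)
    found_in_csv = exists_from(os.path.join(root, "csv"), relative)

print(found_in_root, found_in_csv)
```

The same relative path is found from one directory and missed from the other, while nothing in sys.path plays any role in the lookup.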

I agree with cdarke's comment that you are probably running into an issue with backslashes. Replacing the line with:
sys.path.append(r'C:\Users\Ben\Documents\TRACMIP_Project\mymodules')
will likely solve your problem. Details below.
In general, Python treats paths as if they're relative to the current directory (where your terminal is running). When you feed it an absolute path-- which is a path that includes the root directory, like the C:\ in C:\Users\Ben\Documents\TRACMIP_Project\mymodules-- then Python doesn't care about the working directory anymore, it just looks where you tell it to look.
Backslashes are used to write special characters within strings, such as line breaks (\n) and tabs (\t). The snag you've hit is that Python paths are strings first, paths second. So the backslash sequences in your path (\U, \B, \D, \T and \m) are processed as escape sequences before the string is ever treated as a path. In Python 3, \U starts a Unicode escape, so this literal is rejected with a SyntaxError; the other sequences are not recognized escapes and are left as written, but they are easy to get wrong (a folder name starting with t or n would turn into a tab or a newline). If you prefix the string with 'r', Python ignores the special meaning of the backslash and just interprets it as a literal backslash (what you want).
The reason it still works if you run the script from the mymodules directory is that Python automatically looks in the working directory for files when asked. sys.path.append(path) tells Python to include that directory when it looks for modules, so that you can import modules from that directory no matter where you're running the script. A faulty path will still get added, but it's meaningless: there is no directory where it points, so there's nothing to find there.
As for path manipulation in general, the "safest" way is to use the functions in os.path, which are platform-independent and will give the correct output whether you're working in a Windows or a Unix environment (usually).
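The effect of the 'r' prefix discussed above is easy to check in isolation. The path C:\temp\new below is a made-up example chosen because it contains \t and \n; it is not the path from the question.

```python
# Without the prefix, \t and \n are turned into a tab and a newline
escaped = "C:\temp\new"

# A raw string keeps every backslash as a literal character
raw = r"C:\temp\new"

# Doubling the backslashes is the other common way to write the same path
doubled = "C:\\temp\\new"

print(repr(escaped))
print(repr(raw))
print(raw == doubled)
```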
EDIT: Forgot to cover the second part. Since Python paths are strings, you can build them using string operations. That's what is happening with the line
csvfile = 'csv/' + model +'_' + exp + '.csv'
Presumably model and exp are strings that appear in the filenames in the csv/ folder. With model = "foo" and exp = "bar", you'd get csv/foo_bar.csv which is a relative path to a file (that is, relative to your working directory). The code makes sure a file actually exists at that path and then opens it. Assuming the csv/ folder is in the same path as you added in sys.path.append, this path should work regardless of where you run the file, but I'm not 100% certain on that. EDIT: outoftime pointed out that sys.path.append only works for modules, not opening files, so you'll need to either expand csv/ into an absolute path or always run in its parent directory.
Also, Python on Windows accepts both forward slashes and backslashes in paths, but you should probably not mix them. All backslashes or all forward slashes only. os.path.join inserts the platform's separator between the components for you (and os.path.normpath converts forward slashes to backslashes on Windows). I'd probably change the line to
csvfile = os.path.join('csv', model + '_' + exp + '.csv')
for consistency's sake.
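Putting the two halves of this answer together, a sketch of the CSV lookup anchored at an explicit base directory might look like the following. csv_path and base are hypothetical names, and the TRACMIP_Project/mymodules location is only a stand-in; in the real script the base would more likely come from os.path.dirname(os.path.abspath(__file__)).

```python
import os


def csv_path(base_dir, model, exp):
    """Build the path of <base_dir>/csv/<model>_<exp>.csv."""
    return os.path.join(base_dir, "csv", model + "_" + exp + ".csv")


# Stand-in for the folder that holds the modules and the csv/ directory
base = os.path.abspath(os.path.join("TRACMIP_Project", "mymodules"))
csvfile = csv_path(base, "foo", "bar")

if os.path.isfile(csvfile):  # same existence check as in the original script
    print("found", csvfile)
else:
    print("missing", csvfile)
```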
