How do I read a csv file in a separate parent folder?


I have a directory with the following subdirectories:
|---Data
|
|---Notebooks
The Data directory contains csv files, while the Notebooks directory contains my Jupyter notebooks. How do I access a file in the Data directory from a notebook in the Notebooks directory?
My initial idea was this:
df = pd.read_csv('../Data/csvFile')
However, this raises a FileNotFoundError in pandas.

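Normally, a relative path is resolved against the current working directory of the Python process, not against the location of your code. You can use the following trick to make it relative to the file you're running instead:
import pandas as pd
from pathlib import Path
# Path(__file__).resolve().parent is the Notebooks folder; one more .parent is the project folder
df = pd.read_csv(Path(__file__).resolve().parent.parent / 'Data' / 'csvFile')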
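However, the __file__ variable is not defined in Jupyter notebooks, so you'll have to use this hack instead to get the directory the notebook is in:
import pandas as pd
from pathlib import Path
# _dh is IPython's directory history; _dh[0] is the working directory the kernel started in
df = pd.read_csv(Path(globals()['_dh'][0]).parent / 'Data' / 'csvFile')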
Note that this hack only works when running under Jupyter/IPython. Unfortunately, I don't have a one-size-fits-all approach, unless you count detecting whether you're in Jupyter and then choosing the method based on that.
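A minimal sketch of that detection, assuming the code lives in (or runs from) the Notebooks folder. Instead of the _dh hack it falls back to the current working directory, which Jupyter sets to the notebook's folder by default. project_root is a hypothetical helper name.

```python
from pathlib import Path


def project_root() -> Path:
    """Return the folder that contains both Data/ and Notebooks/.

    Assumes this code lives in (or runs from) the Notebooks folder.
    """
    try:
        # Defined when the code runs as a .py file; raises NameError in a notebook.
        here = Path(__file__).resolve().parent
    except NameError:
        # In Jupyter the kernel's working directory defaults to the notebook's folder.
        here = Path.cwd()
    return here.parent


# Usage (csvFile is the file name from the question):
# import pandas as pd
# df = pd.read_csv(project_root() / 'Data' / 'csvFile')
print(project_root())
```

The fallback is only as reliable as Jupyter's default working directory: if something calls os.chdir() earlier in the notebook, the result moves with it.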

Related

How to find the path of a specific file from another folder

Problem definition:
I am running Python scripts from Excel using xlwings, and I have a file that I use in these scripts.
I used to hard-code a path specific to my computer (e.g. C:\users\myuser\some_folder\myfile).
Now I am sending these scripts to my colleagues, and I want the path of the file to be configured automatically according to where the file is on their machines. I tried using the os package.
I tried the answers from a couple of related posts, and finally tried moving the file into the same location and using os.getcwd() to get the working directory, concatenated with the file name, to build the path.
These methods worked fine when running the Python scripts on their own (not from Excel), but not when running them from Excel, because for some reason the working directory then changes to C:\ProgramData\Anaconda3\
and the script no longer sees the file. Also, as I understand it, these methods use the path of the directory from which the script is run,
so they only see files in C:\ProgramData\Anaconda3\.
My first thought was to search for the folder by name, but the problem is that I do not know where the end user will store the folder.
What I am thinking now is to find a way to locate, starting from C:\ProgramData\Anaconda3\ (where Python runs when called from Excel), the folder the file is stored in, and from there get the file path. After searching the web I did not find a suitable solution for my case.
So, is there a way to do this using os or any other package?

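__file__ contains the path of the executing Python script, and it is not affected by the current working directory. (Since Python 3.9 the main script's __file__ is an absolute path; on older versions wrap it in os.path.abspath() to be safe.)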
# example script located at /usr/bin/foo.py
print(__file__)
output:
/usr/bin/foo.py
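Applied to the xlwings case, a sketch assuming the data file ships in the same folder as the script. path_next_to and myfile are hypothetical names.

```python
import os


def path_next_to(anchor_file: str, filename: str) -> str:
    """Build the path of `filename` in the same folder as `anchor_file`."""
    # abspath turns a relative __file__ (possible before Python 3.9) into an
    # absolute path; dirname then strips the script name, leaving its folder.
    script_dir = os.path.dirname(os.path.abspath(anchor_file))
    return os.path.join(script_dir, filename)


# Inside the script that Excel/xlwings runs, pass the module's own __file__.
# When __file__ is absolute (the normal case), the result does not depend on os.getcwd():
# data_path = path_next_to(__file__, "myfile")
```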

How to set up a relative path I think?

I am new to programming, so if you need more detail please let me know.
I am using Anaconda Prompt for a lab exercise, and I had to download data for this lab. The instructions say to save it in the same place the anaconda3 folder was downloaded to, so that I can use a relative path to get the data.
The problem I am running into is that when I copy and paste the command I was given, which is
df = pd.read_csv('../data/gapminder.tsv', sep='\t')
it fails with a "No such file or directory" error.
I know I could reach the file with an absolute path instead; I am just curious where I should save the gapminder file so that the command I was given works.

If you want that path to work, the folder containing your notebook (strictly, the current working directory, which Jupyter sets to the notebook's folder by default) and the folder named "data" must be siblings inside the same parent folder:
some_parent_folder_name/
|---anaconda_notebooks_or_something/
|   |---your_notebook_file_name
|
|---data/
    |---gapminder.tsv
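To check the rule without touching real folders, this sketch recreates the layout in a temporary directory (the extra "subfolder" level is hypothetical) and shows that '../data/gapminder.tsv' is resolved against the current working directory, not against where the file was saved:

```python
import os
import tempfile
from pathlib import Path

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)                                   # stands in for some_parent_folder_name
    notebooks = root / "anaconda_notebooks_or_something"
    nested = notebooks / "subfolder"                   # hypothetical extra level
    data = root / "data"
    nested.mkdir(parents=True)
    data.mkdir()
    (data / "gapminder.tsv").write_text("country\tyear\nAfghanistan\t1952\n")

    rel = Path("../data/gapminder.tsv")                # the path from the lab command
    try:
        os.chdir(notebooks)
        found_from_notebooks = rel.exists()            # '..' is the parent folder
        os.chdir(nested)
        found_from_nested = rel.exists()               # '..' is now the notebooks folder
    finally:
        os.chdir(original_cwd)                         # leave before the temp dir is deleted

print(found_from_notebooks, found_from_nested)  # True False
```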

How to get the file path of the IPython notebook in use? (the equivalent of __file__)

I want to change directory to the parent directory of my Jupyter notebook.
I cannot get the notebook path using os.path.basename(os.path.dirname(os.path.realpath(__file__))) because __file__ is not defined.
How can I get the directory of the .ipynb file I am using, so that I can os.chdir() to it?

You can't
https://github.com/ipython/ipython/issues/10123
The reason is that your code always runs in the kernel, and in theory multiple notebooks could connect to the same kernel.
However, by default, when you start a notebook, the kernel's current working directory is set to the notebook's directory. So the closest equivalent is to call os.getcwd().
I published a deliberately minimal example notebook to demonstrate this: the notebook reports its own directory, and another one in a subdirectory also reports the correct path.
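For the original goal (changing to the notebook's parent directory), a sketch that relies on that default. It records os.getcwd() only once, so re-running the cell does not keep climbing up the tree. NOTEBOOK_DIR is a hypothetical name, and the approach gives the wrong answer if something changed the working directory before the cell first ran.

```python
import os

# Capture the notebook's folder only on the first run of this cell; Jupyter
# starts the kernel with the notebook's folder as the working directory.
if "NOTEBOOK_DIR" not in globals():
    NOTEBOOK_DIR = os.getcwd()

os.chdir(os.path.dirname(NOTEBOOK_DIR))  # move to the notebook's parent folder
print(os.getcwd())
```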

Need to read a csv file that is in a different path than the one where jupyter notebook is run on

I want to read a csv file on my local drive, e.g. C:\Users\Studyfolder\abc.csv
My Python libraries are installed in another directory, an environment created for Python 3 libraries, e.g. C:\Users\Anaconda3_2\envs\py3
In Anaconda Prompt I have cd'd into C:\Users\Anaconda3_2\envs\py3, since that is where all the Python libraries are installed.
In Jupyter Notebook, I want to read the csv file into a DataFrame. Understandably, when I run
df = pd.read_csv('abc.csv'), the file is not found, because it is not in the directory I cd'd into in Anaconda Prompt.
Should I save all my data files in the same directory where the Python libraries are installed, or is there a better way to read the file without saving it in that directory?
P.S. I'm new to Jupyter notebooks and Python in general.
import pandas as pd
df = pd.read_csv('abc.csv')
df.head()
FileNotFoundError Traceback (most recent call last)
in
1 # load abc data into data frame
----> 2 df = pd.read_csv('abc.csv')

If you activate your Anaconda environment, Jupyter should be tied to that environment's interpreter. In that case it doesn't matter where you start your notebook from; it will always have access to the libraries installed there. For example:
conda activate py3
This ties the interpreter to that environment, which you can check from a notebook:
import sys
sys.path
['C:\\Users\\Anaconda3_2\\envs\\py3'...]
So you can start Jupyter from anywhere, as long as you pass a valid path to read_csv. A full (absolute) path works from any directory:
# I'm at C:\Users\Anaconda3
df = pd.read_csv("C:\\Users\\Studyfolder\\abc.csv")
If you want to use relative paths, they depend on the kernel's working directory, which is the folder you launched jupyter notebook from when the notebook sits there:
# Still at C:\Users\Anaconda3
df = pd.read_csv("..\\Studyfolder\\abc.csv")
where .. means "go up one directory".
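To see how .. is resolved, you can join the relative path onto the working directory and normalize the result. This sketch uses ntpath (the Windows flavour of os.path, importable on any OS) so it runs anywhere; C:\Users\Anaconda3 is just the example directory from above.

```python
import ntpath  # Windows path rules, usable on any platform

cwd = r"C:\Users\Anaconda3"           # example working directory from the answer
relative = r"..\Studyfolder\abc.csv"  # path passed to pd.read_csv

joined = ntpath.join(cwd, relative)
resolved = ntpath.normpath(joined)    # collapses the '..' component
print(joined)    # C:\Users\Anaconda3\..\Studyfolder\abc.csv
print(resolved)  # C:\Users\Studyfolder\abc.csv
```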

In general, I would suggest keeping your scripts outside the Python library directory, in a directory structure that works best for your projects. The simple reason is that installed libraries are already on the interpreter's module search path (sys.path) and hence importable from anywhere, whereas your project files are only found from your project folders.
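In your case, there are two approaches you can take:
1. Provide the file location as a path relative to your current directory, written as a raw string so the backslashes are not treated as escape sequences: df = pd.read_csv(r'..\..\..\Studyfolder\abc.csv')
2. Add the folder path to your system path by adding these statements before reading your file:
import sys
sys.path.append("C:\\Users\\Studyfolder")
(Note that sys.path only controls where import looks for modules; it does not change how pd.read_csv resolves a relative file name. To make pd.read_csv('abc.csv') work, change the working directory instead, e.g. os.chdir("C:\\Users\\Studyfolder").)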
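Backslashes in Windows paths interact with Python's string escapes: in the literal '..\..\..\Studyfolder\abc.csv', the \a is read as an escape sequence. This sketch does only string handling, so it runs on any OS; it shows the problem and three equivalent spellings that avoid it.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal, "\a" is the escape for the BEL control
# character, so "...\abc.csv" does not end in "abc.csv".
print("\abc.csv" == "\x07bc.csv")  # True

raw = r"..\..\..\Studyfolder\abc.csv"          # raw string: backslashes kept as-is
doubled = "..\\..\\..\\Studyfolder\\abc.csv"   # same text with escaped backslashes
forward = "../../../Studyfolder/abc.csv"       # forward slashes are also accepted on Windows

print(raw == doubled)                                    # True
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True
print(PureWindowsPath(raw).name)                         # abc.csv
```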

Accessing Root in Jupyter Notebook

I've started a notebook from the D:/ drive, but navigated a few directories down. My notebook is at D:/dir1/dir2/notebook.ipynb. In my current notebook, I want to execute a script in the root of D:/, where my notebook session was started from.
I want to avoid relative path changes, and was hoping there's a way to access the directory where I started the notebook (the location corresponding to localhost:xxxx/tree). Is that possible?

This is a slightly hacky way, but it works:
import glob
import json
import os
import jupyter_core.paths

# Folder where running notebook servers write their info files
jrd = jupyter_core.paths.jupyter_runtime_dir()
with open(glob.glob(os.path.join(jrd, 'nbserver-*.json'))[0]) as json_file:
    root_dir = json.load(json_file)['notebook_dir']
The glob is needed because the JSON file you are looking for has the server's process ID (PID) in its name.
Therefore this method is only guaranteed to work if you have a single notebook server running. If you know the PID, you don't need glob.
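A variant of the same idea that also tries newer installs. It assumes (based on Jupyter Server's conventions; worth checking against your version) that Jupyter Server based frontends such as Notebook 7 and JupyterLab write jpserver-<PID>.json files with a root_dir key, rather than nbserver-<PID>.json with notebook_dir. find_server_root is a hypothetical helper; taking the runtime directory as a parameter keeps it testable.

```python
import glob
import json
import os
from typing import Optional


def find_server_root(runtime_dir: str) -> Optional[str]:
    """Return the root directory of the first Jupyter server found in runtime_dir."""
    # (file pattern, JSON key) pairs: classic Notebook first, then Jupyter Server (assumed).
    candidates = [("nbserver-*.json", "notebook_dir"), ("jpserver-*.json", "root_dir")]
    for pattern, key in candidates:
        for path in sorted(glob.glob(os.path.join(runtime_dir, pattern))):
            with open(path) as f:
                info = json.load(f)
            if key in info:
                return info[key]
    return None  # no server info file found


# In a notebook:
# import jupyter_core.paths
# root_dir = find_server_root(jupyter_core.paths.jupyter_runtime_dir())
```

Like the original, it takes the first match, so with several servers running you still need the PID to pick the right file.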
