Python add path of data directory

I want to add a path to my data directory in python, so that I can read/write files from that directory without including the path to it all the time.
For example, my working directory is /user/working, where I am currently working in the file /user/working/foo.py. All of my data is in the directory /user/data, where I want to access the file /user/data/important_data.csv.
In foo.py, I could now just read the csv with pandas using
import pandas as pd
df = pd.read_csv('../data/important_data.csv')
which totally works. I just want to know if there is a way to include /user/data as a main path for the file so I can just read the file with
import pandas as pd
df = pd.read_csv('important_data.csv')
The only idea I had was adding the path via sys.path.append('/user/data'), which didn't work (I guess it only works for importing modules).
Is anyone able to provide any ideas if this is possible?
PS: My real problem is of course more complex, but this minimal example should be enough to handle my problem.

It looks like you can use os.chdir for this purpose.
import os
os.chdir('/user/data')
See https://note.nkmk.me/en/python-os-getcwd-chdir/ for more details.
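One thing to keep in mind is that os.chdir changes the working directory for the whole process, so every later relative path is affected too. A minimal sketch of limiting that effect, assuming a hypothetical helper named working_directory (it is not part of the answer above), might look like this:

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)
```

Inside a block such as with working_directory('/user/data'): a call like pd.read_csv('important_data.csv') would resolve against /user/data, and the previous directory is restored afterwards. On Python 3.11 and later, contextlib.chdir offers the same behavior out of the box.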

If you are keeping everything in /user/data, why not use f-strings to make this easy? You could assign the directory to a variable in a config and then use it in the string like so:
In a config somewhere:
data_path = "/user/data"
Reading later...
df = pd.read_csv(f"{data_path}/important_data.csv")
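The same idea can also be written with pathlib, which takes care of the path separator. This is only a sketch: DATA_DIR and data_file are hypothetical names, and the directory is the one assumed in the question.

```python
from pathlib import Path

# Assumed location of the data directory (taken from the question).
DATA_DIR = Path("/user/data")


def data_file(name):
    """Return the full path of a file inside DATA_DIR."""
    return DATA_DIR / name
```

Reading then becomes df = pd.read_csv(data_file("important_data.csv")), since pandas accepts path objects as well as strings.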

Related

What kind of file path I can use when importing csv into pandas dataframe?

I would like to import a csv into a dataframe in a way that if the code is copied to another computer the path of the file still points to the correct place inside the project.
I tried this:
csv_filename = '.price/data/table.csv'
df = pd.read_csv('csv_filename', sep=';')
It doesn't work.
If I use the full path (C:\Users\eniko\PycharmProjects\pythonProject3\price\data\table.csv) it works perfect.
So my question would be if there is a method to point to the file inside the Pycharm project when importing the csv, instead of using the full path of the file location?
Thanks in advance for any help.

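Remove the quotes around csv_filename, so that the variable is passed instead of the literal string 'csv_filename'. Judging by the full path in the question, the relative path should also start with 'price/' (or './price/') rather than '.price/'. It is resolved against the current working directory, which in PyCharm is usually the project folder:
csv_filename = 'price/data/table.csv'
df = pd.read_csv(csv_filename, sep=';')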

Python os.getcwd() is not working on subfolders in VSCODE

I have a Python file, converted from a Jupyter Notebook, and there is a subfolder called 'datasets' in the same folder as this file. When I try to open a file that is inside that 'datasets' folder with this code:
import pandas as pd
# Load the CSV data into DataFrames
super_bowls = pd.read_csv('/datasets/super_bowls.csv')
It says that there is no such file or folder. Then I add this line
os.getcwd()
And the output is the top-level folder of the project, not the subfolder where this Python file is located. I think that may be the reason why it's not working.
So, how can I open that csv file with a relative path? I don't want to use an absolute path because this code is going to be used on other computers.
Why is os.getcwd() not returning the actual folder path?

In my experience, the dot (.) notation for relative paths sometimes does not behave the same way on every operating system. What I generally do to make it OS-agnostic is this:
import pandas as pd
import os
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
super_bowls = pd.read_csv(__location__ + '/datasets/super_bowls.csv')
This works equally well on my Windows and Ubuntu machines.
I am not sure if there are other and better ways to achieve this. Would like to hear back if there are.

(edited)
Per your comment below, the current working directory is
/Users/ivanparra/AprendizajePython/
while the file is in
/Users/ivanparra/AprendizajePython/Jupyter/datasets/super_bowls.csv
For that reason, going to the datasets subfolder of the current working directory (CWD) takes you to /Users/ivanparra/AprendizajePython/datasets which either doesn't exist or doesn't contain the file you're looking for.
You can do one of two things:
(1) Use an absolute path to the file, as in
super_bowls = pd.read_csv("/Users/ivanparra/AprendizajePython/Jupyter/datasets/super_bowls.csv")
(2) use the right relative path, as in
super_bowls = pd.read_csv("./Jupyter/datasets/super_bowls.csv")
There's also (3): use os.path.join to join the CWD to the relative path. It's basically the same as (2).
(you can also use

The answer really lies in the response by user2357112:
os.getcwd() is working fine. The problem is in your expectations. The current working directory is the directory where Python is running, not the directory of any particular source file. – user2357112 supports Monica May 22 at 6:03
The solution is:
data_dir = os.path.dirname(__file__)
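To illustrate the point, a path anchored to the source file's directory stays the same no matter where Python was started. The sketch below assumes a hypothetical helper named path_next_to; in a real script the first argument would be __file__.

```python
from pathlib import Path


def path_next_to(source_file, *parts):
    """Build a path relative to the directory containing `source_file`."""
    return Path(source_file).resolve().parent.joinpath(*parts)


# Inside a script this would typically be called as:
#   csv_path = path_next_to(__file__, "datasets", "super_bowls.csv")
#   super_bowls = pd.read_csv(csv_path)
```

Because the result is an absolute path, it does not depend on os.getcwd() at all.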

Try this code
super_bowls = pd.read_csv( os.getcwd() + '/datasets/super_bowls.csv')

I noticed this problem a few years ago. I think it's a matter of design style. The problem is that your workspace folder is just a folder, not a project folder, while most of the time your relative references are based on the current file.
VS Code actually supports setting the cwd dynamically, but that's not the default. If your work folder is not a rigorous, professional project, I recommend adding the following setting to launch.json. This is the simplest answer you need.
"cwd": "${fileDirname}"

Thanks to everyone that tried to help me. Thanks to the Roy2012 response, I got a code that works for me.
import pandas as pd
import os
currentPath = os.path.dirname(__file__)
# Load the CSV data into DataFrames
super_bowls = pd.read_csv(currentPath + '/datasets/super_bowls.csv')
The os.path.dirname gives me the path of the current file, and let me work with relative paths.
'/Users/ivanparra/AprendizajePython/Jupyter'
and with that it works like a charm!!
P.S.: As a side note, the behavior of os.getcwd() is quite different in a Jupyter Notebook than a python file. Inside the notebook, that function gives the current file path, but in a python file, gives the top folder path.
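Since the same code may run both as a .py file and inside a notebook, one possible way to handle both cases is sketched below. The helper name base_directory is hypothetical, and the sketch assumes that __file__ is simply undefined in a notebook cell, which is the usual situation in Jupyter.

```python
from pathlib import Path


def base_directory():
    """Return the script's directory, or the CWD when __file__ is undefined."""
    try:
        # Defined when the code runs from a .py file.
        return Path(__file__).resolve().parent
    except NameError:
        # In a notebook cell __file__ is usually not defined.
        return Path.cwd()
```

A call such as pd.read_csv(base_directory() / 'datasets' / 'super_bowls.csv') would then work in both environments, provided the notebook was started from the folder that contains it.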

Import Excel File Using Pandas

I'm trying to import an Excel file that I have stored in a folder within a GitHub repository. Based on that, the file path should be
"C:\\Users\\'username'\\Documents\\GitHub\\'repository'\\'folder'\\'filename'.xlsx"
But when I enter the code
import pandas as pd
xlsfile="C:\\Users\\'username'\\Documents\\GitHub\\'repository'\\'folder'\\'filename'.xlsx"
xl1=pd.read_excel(xlsfile,sheet_name='sheet',skiprows=21)
I get an error that says the file path I entered doesn't exist. I know that the entire path to the file exists because my working directory also contains the file, so what could I be doing wrong?
I have no experience coding. Thanks.

Remove the "'" in your filename? Is your sheet really named 'sheet'? I think the default is 'Sheet1', etc.

There can be multiple things going on. As Joe stated, you probably don't have ' ' around your file names; I'm assuming they were included to show where you would put your local file path (i.e. replace 'username' with Jack.Donaghue and so on). An example of this would look something like: "C:/Users/Jack_Donague/Documents/GitHub/YourRepoName/data/datafilename.xlsx"
Also, as colbster pointed out, confirm what your sheet is named. I've also experienced some issues with \ vs / in file names, since I'm working on Windows 10.
I would recommend trying
import pandas as pd
xlsfile="C:/Users/'username'/Documents/GitHub/'repository'/'folder'/'filename'.xlsx"
xl1=pd.read_excel(xlsfile,sheet_name='sheet',skiprows=21)
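On the \ vs / point, the sketch below shows that the usual spellings of a Windows path are equivalent, and where a plain string can go wrong. The path itself is hypothetical and only there for illustration; PureWindowsPath is used so the comparison behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical path used only for illustration.
raw_style = PureWindowsPath(r"C:\Users\jack\Documents\data\report.xlsx")
slash_style = PureWindowsPath("C:/Users/jack/Documents/data/report.xlsx")
escaped_style = PureWindowsPath("C:\\Users\\jack\\Documents\\data\\report.xlsx")

# All three spellings describe the same location.
same_location = raw_style == slash_style == escaped_style

# A plain string with single backslashes can silently change:
# "\t" below is read as a tab character, not a backslash followed by "t".
broken = "C:\temp\report.xlsx"
contains_tab = "\t" in broken
```

The doubled backslashes in the question are therefore fine as such; the literal quote characters inside the path are the more likely cause of the error.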

Adding a path to pandas to_csv function

I have a small chunk of code using Pandas that reads an incoming CSV, performs some simple computations, adds a column, and then turns the dataframe into a CSV using to_csv.
I was running it all in a Jupyter notebook and it worked great, the output csv file would be there right in the directory when I ran it. I have now changed my code to be run from the command line, and when I run it, I don't see the output CSV files anywhere. The way that I did this was saving the file as a .py, saving it into a folder right on my desktop, and putting the incoming csv in the same folder.
From similar questions on stackoverflow I am gathering that right before I use to_csv at the end of my code I might need to add the path into that line as a variable, such as this.
path = 'C:\Users\ab\Desktop\conversion'
final2.to_csv(path, 'Combined Book.csv', index=False)
However after adding this, I am still not seeing this output CSV file in the directory anywhere after running my pretty simple .py code from the command line.
Does anyone have any guidance? Let me know what other information I could add for clarity. I don't think sample code of the pandas computations is necessary, it is as simple as adding a column with data based on one of my incoming columns.

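Join the path and the filename together and pass the result to to_csv. Use a raw string for the Windows path, because '\U' in a normal string literal is a syntax error in Python 3:
import os
path = r'C:\Users\ab\Desktop\conversion'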
output_file = os.path.join(path,'Combined Book.csv')
final2.to_csv(output_file, index=False)
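If the target folder might not exist yet, the join can be combined with creating it first. This is a small sketch with a hypothetical helper name, build_output_path, and is not part of the original answer:

```python
import os


def build_output_path(directory, filename):
    """Join directory and filename, creating the directory if needed."""
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, filename)
```

The result can be passed straight to to_csv, for example final2.to_csv(build_output_path(path, 'Combined Book.csv'), index=False).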

I'm pretty sure that you have mixed up the arguments, as shown here. The path should include the filename in it.
path = r'C:\Users\ab\Desktop\conversion\Combined_Book.csv'
final2.to_csv(path, index=False)
Otherwise you are trying to overwrite the whole folder 'conversion', and the second positional argument ('Combined Book.csv') is taken as a complicated value separator.

I think what you are looking for is an absolute path, as below:
import pandas as pd
.....
final2.to_csv(r'C:\Users\ab\Desktop\conversion\Combined Book.csv', index=False)
Or, as another example:
path_to_file = r"C:\Users\ab\Desktop\conversion\Combined Book.csv"
final2.to_csv(path_to_file, encoding="utf-8")

This is a late answer, but it may be useful for someone facing a similar issue. It is better to get the CSV folder path dynamically instead of hardcoding it. We can do so using os.getcwd(), and then join the folder path with the CSV file name using os.path.join(os.getcwd(), 'csvFileName').
Example:
import os
path = os.getcwd()
export_path = os.path.join(path,'Combined Book.csv')
final2.to_csv(export_path, index=False, header=True)

Creating a standalone file using Pandas code

I have little to no background in Python or computer science so I’ll try my best to explain what I want to accomplish. I have a Pandas script in Jupyter notebook that edits an Excel .csv file and exports it as an Excel .xlsx file. Basically the reason why we want to do this is because we get these same Excel spreadsheets full of unwanted and disorganized data from the same source. I want other people at my office that don’t have Python to be able to use this script to edit these spreadsheets. From what I understand, this involves creating a standalone file.
Here is my code from Pandas that exports a new spreadsheet:
import pandas as pd
from pandas import ExcelWriter
test = pd.DataFrame.from_csv('J:/SDGE/test.csv', index_col=None)
t = test
for col in ['Bill Date']:
    t[col] = t[col].ffill()
T = t[t.Meter.notnull()]
T = T.reset_index(drop=True)
writer = ExcelWriter('PythonExport.xlsx')
T.to_excel(writer,'Sheet5')
writer.save()
How can I make this code into a standalone executable file? I've seen other forums with responses to similar problems, but I still don't understand how to do this.

First, you need to change some parts in your code to make it work for anybody, without the need for them to edit the Python code.
Secondly, you will need to convert your file to an executable (.exe).
There is only one part in your code that needs to be changed to work for everyone: the csv file name and directory
Since your code only works when the file "test.csv" is in the "J:/SDGE/" directory, you can follow one of the following solutions:
Tell everyone who uses the program that the file must be in a precise public directory and named "test.csv" in order to work. (bad)
Change your program to allow for input from the user. This is a little more complex, but is the solution that people probably want:
Add an import for a file selector at the top:
from tkinter.filedialog import askopenfilename
Replace
'J:/SDGE/test.csv'
With
askopenfilename()
This should be the final python script:
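import pandas as pd
from pandas import ExcelWriter
from tkinter.filedialog import askopenfilename  # added this

# Let the user pick the CSV file. pd.read_csv replaces DataFrame.from_csv,
# which was removed in pandas 1.0.
test = pd.read_csv(askopenfilename())
t = test
for col in ['Bill Date']:
    # Forward-fill missing bill dates.
    t[col] = t[col].ffill()
# Keep only the rows that have a meter value.
T = t[t.Meter.notnull()]
T = T.reset_index(drop=True)
# The with-block saves and closes the workbook
# (writer.save() was removed in pandas 2.0).
with ExcelWriter('PythonExport.xlsx') as writer:
    T.to_excel(writer, sheet_name='Sheet5')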
However, you want this as an executable program, that way others don't have to have python installed and know how to run the script. There are several ways to turn your new .py file into an executable. I would look into this thread.

If you want to run a python script on anyone's system, you will need to have Python installed in that system.
Once you have that, just create a .bat file for the command that you'd be using to execute the python file through CMD.
Step 1: Open Notepad and create a new file
Step 2: Write the following command in the file (replace the path and filename with your own)
python file.py
Step 3: Save it as script.bat (Select All Types from the list of file types while saving)
Now you can run that batch file like any other program and it will run the code for you. The only thing you need to ensure when you distribute the batch file and the Python script is that both files are kept in the same location. Otherwise you will have to put the full path in front of file.py.
