Adding a path to pandas to_csv function


I have a small chunk of code using Pandas that reads an incoming CSV, performs some simple computations, adds a column, and then turns the dataframe into a CSV using to_csv.
I was running it all in a Jupyter notebook and it worked great, the output csv file would be there right in the directory when I ran it. I have now changed my code to be run from the command line, and when I run it, I don't see the output CSV files anywhere. The way that I did this was saving the file as a .py, saving it into a folder right on my desktop, and putting the incoming csv in the same folder.
From similar questions on stackoverflow I am gathering that right before I use to_csv at the end of my code I might need to add the path into that line as a variable, such as this.
path = 'C:\Users\ab\Desktop\conversion'
final2.to_csv(path, 'Combined Book.csv', index=False)
However after adding this, I am still not seeing this output CSV file in the directory anywhere after running my pretty simple .py code from the command line.
Does anyone have any guidance? Let me know what other information I could add for clarity. I don't think sample code of the pandas computations is necessary, it is as simple as adding a column with data based on one of my incoming columns.

Join the path and the filename together and pass the result to to_csv:
import os
path = r'C:\Users\ab\Desktop\conversion'  # raw string, so the backslashes are kept literally
output_file = os.path.join(path, 'Combined Book.csv')
final2.to_csv(output_file, index=False)
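The same idea can be wrapped in a small helper. This is only a minimal sketch: save_csv is a hypothetical name, not part of pandas, and it assumes the object passed in has a to_csv method like a DataFrame. It also creates the target folder first, because to_csv does not create missing directories.

```python
from pathlib import Path


def save_csv(df, folder, filename):
    """Write df to folder/filename and return the absolute path used."""
    out_dir = Path(folder)
    # Create the folder if it is missing; to_csv does not create directories.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / filename
    # The first argument of DataFrame.to_csv is the full file path.
    df.to_csv(out_file, index=False)
    return out_file.resolve()


# Hypothetical usage with the DataFrame from the question:
# saved = save_csv(final2, r'C:\Users\ab\Desktop\conversion', 'Combined Book.csv')
# print(saved)
```

Printing the returned path shows exactly where the file ended up, which helps when the output seems to be missing.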

I'm pretty sure you have mixed up the arguments. The path should include the filename:
path = r'C:\Users\ab\Desktop\conversion\Combined_Book.csv'
final2.to_csv(path, index=False)
Otherwise you are asking pandas to write to the folder 'conversion' itself and to use 'Combined Book.csv' as the value separator, because the second positional argument of to_csv is sep.

I think the below is what you are looking for: pass an absolute path that includes the filename.
import pandas as pd
.....
final2.to_csv(r'C:\Users\ab\Desktop\conversion\Combined Book.csv', index=False)
Or, with the path in a variable:
path_to_file = r"C:\Users\ab\Desktop\conversion\Combined Book.csv"
final2.to_csv(path_to_file, encoding="utf-8")

This is a late answer, but it may be useful for someone facing a similar issue. It is better to get the CSV folder path dynamically instead of hardcoding it. We can do so using os.getcwd(), which returns the current working directory (the folder the script was launched from), and then join it with the CSV file name using os.path.join(os.getcwd(), 'csvFileName').
Example:
import os
path = os.getcwd()
export_path = os.path.join(path,'Combined Book.csv')
final2.to_csv(export_path, index=False, header=True)
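Keep in mind that os.getcwd() is the folder the command was launched from, which is not always the folder that holds the script. A minimal sketch, assuming you want the output next to the .py file instead; script_dir is a hypothetical helper name:

```python
from pathlib import Path


def script_dir(script_file):
    """Return the folder that contains the given script file."""
    return Path(script_file).resolve().parent


# Inside a .py file, __file__ holds that file's own path, so:
# export_path = script_dir(__file__) / 'Combined Book.csv'
# final2.to_csv(export_path, index=False)
```

With this, the CSV lands beside the script regardless of which directory the command line was in when the script was started.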

Related

Python add path of data directory

I want to add a path to my data directory in python, so that I can read/write files from that directory without including the path to it all the time.
For example I have my working directory at /user/working where I am currently working in the file /user/working/foo.py. I also have all of my data in the directory /user/data where I want to access the file /user/data/important_data.csv.
In foo.py, I could now just read the csv with pandas using
import pandas as pd
df = pd.read_csv('../data/important_data.csv')
which totally works. I just want to know if there is a way to include /user/data as a main path for the file so I can just read the file with
import pandas as pd
df = pd.read_csv('important_data.csv')
The only idea I had was adding the path via sys.path.append('/user/data'), which didn't work (I guess it only works for importing modules).
Is anyone able to provide any ideas if this is possible?
PS: My real problem is of course more complex, but this minimal example should be enough to handle my problem.

It looks like you can use os.chdir for this purpose.
import os
os.chdir('/user/data')
See https://note.nkmk.me/en/python-os-getcwd-chdir/ for more details.

If you are keeping everything in /user/data, why not use f-strings to make this easy? You could assign the directory to a variable in a config and then use it in the string like so:
In a config somewhere:
data_path = "/user/data"
Reading later...
df = pd.read_csv(f"{data_path}/important_data.csv")
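A variant of the same idea with pathlib, as a rough sketch. DATA_DIR and data_file are hypothetical names, and the directory is the one assumed in the question:

```python
from pathlib import Path

# Assumed location, taken from the question; adjust to your setup.
DATA_DIR = Path("/user/data")


def data_file(name, base=DATA_DIR):
    """Build the full path of a file inside the data directory."""
    return base / name


# Hypothetical usage:
# df = pd.read_csv(data_file("important_data.csv"))
```

Unlike os.chdir, this leaves the working directory untouched, so other relative paths in the program keep their meaning.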

pandas to_csv - can't save a named csv into a specific path

Trying the simple task of converting a df to a csv, then saving it locally to a specific path. I'll censor the file name with "..." because I'm not sure about the privacy policy.
I checked many topics and I still run into errors, so I'll detail all the steps I went through, with the output:
Attempt #1:
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path, 'myDataFrame.csv')
TypeError: "delimiter" must be a 1-character string
Attempt #2: After searching about "delimiter" issues, I discovered it was linked to the sep argument, so I used a solution from this link to set a proper separator and encoding.
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path, "myDataFrame.csv", sep=b'\t', encoding='utf-8')
TypeError: to_csv() got multiple values for argument 'sep'
After playing with different arguments, it seems that my code uses 'myDataFrame.csv' as the sep argument, because when I remove it the code no longer returns an error. However, no file is created at the specified path, or anywhere else (I checked the desktop, My Documents, and the default "Downloads" folder).
I checked the documentation pandas.DataFrame.to_csv and I didn't find a parameter that would allow me to set a specific name to the csv I want to locally create. I've been running in circles trying things that lead me to the same errors, without finding a way to save anything locally.
Even the simple code below does not create any csv, and I really don't understand why:
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
df.to_csv("myDataFrame.csv")
I just want to simply save a dataframe into a csv, give it a name, and specify the path to where this csv is saved on my hard drive.

You receive the error because you are passing too many values. The path and the file name must be provided in one string:
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path +'/myDataFrame.csv')
This saves your file at the path you like.
However, the file from your last attempt should have been saved in the current working directory of the Python process, which you can print with os.getcwd(). Check that folder; the file should be there.
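To see where a bare filename such as "myDataFrame.csv" ends up, you can resolve it against the current working directory. A small sketch; where_will_it_land is a hypothetical helper name:

```python
from pathlib import Path


def where_will_it_land(filename):
    """Return the absolute path a relative filename resolves to right now."""
    # Relative paths are resolved against the current working directory,
    # which is what Path.cwd() and os.getcwd() report.
    return Path.cwd() / filename


# Hypothetical usage before or after df.to_csv("myDataFrame.csv"):
# print(where_will_it_land("myDataFrame.csv"))
```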

There is no separate argument for the filename.
Provide the name of the file as part of the path.
Change the code to this:
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp\myDataFrame.csv'
df.to_csv(path)

What kind of file path I can use when importing csv into pandas dataframe?

I would like to import a csv into a dataframe in a way that if the code is copied to another computer the path of the file still points to the correct place inside the project.
I tried this:
csv_filename = '.price/data/table.csv'
df = pd.read_csv('csv_filename', sep=';')
It doesn't work.
If I use the full path (C:\Users\eniko\PycharmProjects\pythonProject3\price\data\table.csv) it works perfect.
So my question would be if there is a method to point to the file inside the Pycharm project when importing the csv, instead of using the full path of the file location?
Thanks in advance for any help.

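Remove the quotes around csv_filename so that the variable, not the literal string 'csv_filename', is passed. Judging by the full path in the question, the relative path should probably also start with './' rather than '.':
csv_filename = './price/data/table.csv'
df = pd.read_csv(csv_filename, sep=';')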

Creating a standalone file using Pandas code

I have little to no background in Python or computer science so I’ll try my best to explain what I want to accomplish. I have a Pandas script in Jupyter notebook that edits an Excel .csv file and exports it as an Excel .xlsx file. Basically the reason why we want to do this is because we get these same Excel spreadsheets full of unwanted and disorganized data from the same source. I want other people at my office that don’t have Python to be able to use this script to edit these spreadsheets. From what I understand, this involves creating a standalone file.
Here is my code from Pandas that exports a new spreadsheet:
import pandas as pd
from pandas import ExcelWriter
test = pd.DataFrame.from_csv('J:/SDGE/test.csv', index_col=None)
t = test
for col in ['Bill Date']:
    t[col] = t[col].ffill()
T = t[t.Meter.notnull()]
T = T.reset_index(drop=True)
writer = ExcelWriter('PythonExport.xlsx')
T.to_excel(writer,'Sheet5')
writer.save()
How can I make this code into a standalone executable file? I've seen other forums with responses to similar problems, but I still don't understand how to do this.

First, you need to change some parts in your code to make it work for anybody, without the need for them to edit the Python code.
Secondly, you will need to convert your file to an executable (.exe).
There is only one part in your code that needs to be changed to work for everyone: the csv file name and directory
Since your code only works when the file "test.csv" is in the "J:/SDGE/" directory, you can follow one of the following solutions:
Tell everyone who uses the program that the file must be in a precise public directory and named "test.csv" in order to work. (bad)
Change your program to allow for input from the user. This is a little more complex, but is the solution that people probably want:
Add an import for a file selector at the top:
from tkinter.filedialog import askopenfilename
Replace
'J:/SDGE/test.csv'
With
askopenfilename()
This should be the final python script:
import pandas as pd
from tkinter.filedialog import askopenfilename  # added this
# DataFrame.from_csv was removed in pandas 1.0; read_csv is its replacement
test = pd.read_csv(askopenfilename())
t = test
for col in ['Bill Date']:
    t[col] = t[col].ffill()
T = t[t.Meter.notnull()]
T = T.reset_index(drop=True)
# ExcelWriter.save() was removed in pandas 2.0; the with block saves on exit
with pd.ExcelWriter('PythonExport.xlsx') as writer:
    T.to_excel(writer, sheet_name='Sheet5')
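Two details are worth handling once other people use the script: the user may cancel the dialog, and the export is easier to find if it lands next to the chosen input file. The following is only a sketch; choose_input and output_path_for are hypothetical helper names, and the file-type filter is an assumption:

```python
from pathlib import Path


def choose_input():
    """Ask the user for a CSV file; return None if the dialog is cancelled."""
    # Imported here so the rest of the module works without a display.
    from tkinter.filedialog import askopenfilename

    selected = askopenfilename(filetypes=[("CSV files", "*.csv")])
    # askopenfilename returns an empty value when the user cancels.
    return Path(selected) if selected else None


def output_path_for(input_path, new_name="PythonExport.xlsx"):
    """Place the export next to the chosen input file."""
    return Path(input_path).parent / new_name


# Hypothetical usage:
# source = choose_input()
# if source is not None:
#     target = output_path_for(source)
```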
However, you want this as an executable program, that way others don't have to have python installed and know how to run the script. There are several ways to turn your new .py file into an executable. I would look into this thread.

If you want to run a python script on anyone's system, you will need to have Python installed in that system.
Once you have that, just create a .bat file for the command that you'd be using to execute the python file through CMD.
Step 1: Open Notepad and create a new file
Step 2: Write the command as follows in the file (replace the path and filename with your own)
python file.py
Step 3: Save it as script.bat (Select All Types from the list of file types while saving)
Now you can run that batch file like any other program and it will run the code for you. When you distribute the batch file and the Python script, make sure both files are kept in the same location; otherwise you will have to put the full path in front of file.py.

"CSV file does not exist" for a filename with embedded quotes

I am currently learning Pandas for data analysis and having some issues reading a csv file in Atom editor.
When I am running the following code:
import pandas as pd
df = pd.read_csv("FBI-CRIME11.csv")
print(df.head())
I get an error message, which ends with
OSError: File b'FBI-CRIME11.csv' does not exist
Here is the directory to the file: /Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv".
When i try to run it this way:
df = pd.read_csv(Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv")
I get another error:
NameError: name 'Users' is not defined
I have also put this directory into the "Project Home" field in the editor settings, though I am not quite sure if it makes any difference.
I bet there is an easy way to get it to work. I would really appreciate your help!

Have you tried?
df = pd.read_csv("Users/alekseinabatov/Documents/Python/FBI-CRIME11.csv")
or maybe
df = pd.read_csv('Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv"')
(If the file name has quotes)

Just referring to the filename like
df = pd.read_csv("FBI-CRIME11.csv")
generally only works if the file is in the same directory as the script.
If you are using windows, make sure you specify the path to the file as follows:
PATH = "C:\\Users\\path\\to\\file.csv"

Had an issue with the path, it turns out that you need to specify the first '/' to get it to work!
I am using VSCode/Python on macOS

I experienced the same problem and solved it as follows:
dataset = pd.read_csv('C:\\Users\\path\\to\\file.csv')

Being on jupyter notebook it works for me including the relative path only. For example:
df = pd.read_csv ('file.csv')
But, for example, in vscode I have to put the complete path:
df = pd.read_csv ('/home/code/file.csv')

You are missing '/' before Users. Judging from the file path, I assume you are using a Mac, where the root directory is '/'.

I had the same issue, but it was happening because my file was called "geo_data.csv.csv" - new laptop wasn't showing file extensions, so the name issue was invisible in Windows Explorer.
Very silly, I know, but if this solution doesn't work for you, try that :-)
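One way to spot hidden double extensions, stray quotes, or case differences is to list what is actually in the folder. A rough sketch; similar_files is a hypothetical helper name:

```python
from pathlib import Path


def similar_files(folder, expected_name):
    """List files in folder whose names contain the expected stem, ignoring case."""
    stem = Path(expected_name).stem.lower()
    matches = []
    for entry in Path(folder).iterdir():
        if entry.is_file() and stem in entry.name.lower():
            matches.append(entry.name)
    return sorted(matches)


# Hypothetical usage; repr() makes stray quotes or spaces visible:
# for name in similar_files(".", "geo_data.csv"):
#     print(repr(name))
```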

Just change the CSV file name. Once I changed it for me, it worked fine. Previously I gave data.csv then I changed it to CNC_1.csv.

What worked for me:
import csv
import pandas as pd
import os
base = os.path.normpath(r"path")
with open(base, 'r') as csvfile:
    readCSV = csv.reader(csvfile, delimiter='|')
    data = []
    for row in readCSV:
        data.append(row)
df = pd.DataFrame(data[1:], columns=data[0][0:15])
print(df)
This reads in the file, splits each row on |, and appends the rows to a list, which is then converted to a pandas DataFrame (taking the first 15 columns).

Make sure your source file is saved in .csv format. I tried everything: adding the full path to the file, including and removing header=0, and adding skiprows=0, but nothing worked, because I had saved the Excel data file in workbook format and not in CSV format. So first check your file extension.
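If the extension cannot be trusted, the first bytes of the file give a hint. This is a heuristic sketch, not a full format check: looks_like_excel is a hypothetical helper name, and the ZIP signature also matches any other ZIP archive, not only .xlsx workbooks.

```python
def looks_like_excel(path):
    """Return True if the file starts with an Excel workbook signature."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    # .xlsx files are ZIP archives; legacy .xls files use the OLE2 container.
    is_zip = header.startswith(b"PK\x03\x04")
    is_ole2 = header == b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
    return is_zip or is_ole2


# Hypothetical usage:
# if looks_like_excel("data.csv"):
#     print("This is a workbook; re-save it as CSV or use pd.read_excel")
```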

Adnane's answer helped me.
Here's my full code on mac, hope this helps someone. All my csv files are saved in /Users/lionelyu/Documents/Python/Python Projects/
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
path = '/Users/lionelyu/Documents/Python/Python Projects/'
aapl = pd.read_csv(path + 'AAPL_CLOSE.csv',index_col='Date',parse_dates=True)
cisco = pd.read_csv(path + 'CISCO_CLOSE.csv',index_col='Date',parse_dates=True)
ibm = pd.read_csv(path + 'IBM_CLOSE.csv',index_col='Date',parse_dates=True)
amzn = pd.read_csv(path + 'AMZN_CLOSE.csv',index_col='Date',parse_dates=True)

Run the "pwd" command first in the CLI to find your current working directory, and then add the name of the file to that path!

Try this
import os
cd = os.getcwd()
dataset_train = pd.read_csv(cd+"/Google_Stock_Price_Train.csv")

In my case I just removed .csv from the end. I am using ubuntu.
pd.read_csv("/home/mypc/Documents/pcap/s2csv")

Sometimes the problem is not a Python or IDE fault but a simple logical error: we assume a file is a .csv when it is actually an Excel worksheet.
When you try to open that file, the reader will throw an error.
To resolve the issue, open the target file in Microsoft Excel and save it in .csv format.
Note that the encoding matters, because it must match when you open the file:
with open('YourTargetFile.csv', 'r', encoding='UTF-8') as file:
Now try to open your file like this:
import csv
with open('plain.csv', 'r', encoding='UTF-8') as file:
    load = csv.reader(file)
    for line in load:
        print(line)

What works for me is
dataset = pd.read_csv('FBI_CRIME11.csv')
Highlight it and press enter. It also depends on the IDE you are using. I am using Anaconda with Spyder or Jupyter.

I am using a Mac. I had the same problem wherein .csv file was in the same folder where the python script was placed, however, Spyder still was unable to locate the file. I changed the file name from capital letters to all small letters and it worked.
