I am currently learning Pandas for data analysis and having some issues reading a csv file in Atom editor.
When I am running the following code:
import pandas as pd
df = pd.read_csv("FBI-CRIME11.csv")
print(df.head())
I get an error message, which ends with
OSError: File b'FBI-CRIME11.csv' does not exist
Here is the directory to the file: /Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv".
When i try to run it this way:
df = pd.read_csv(Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv")
I get another error:
NameError: name 'Users' is not defined
I have also put this directory into the "Project Home" field in the editor settings, though I am not quite sure if it makes any difference.
I bet there is an easy way to get it to work. I would really appreciate your help!
Have you tried?
df = pd.read_csv("Users/alekseinabatov/Documents/Python/FBI-CRIME11.csv")
or maybe
df = pd.read_csv('Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv"')
(If the file name has quotes)
Just referring to the filename like
df = pd.read_csv("FBI-CRIME11.csv")
generally only works if the file is in the same directory as the script.
If you are using windows, make sure you specify the path to the file as follows:
PATH = "C:\\Users\\path\\to\\file.csv"
Had an issue with the path, it turns out that you need to specify the first '/' to get it to work!
I am using VSCode/Python on macOS
I also experienced the same problem I solved as follows:
dataset = pd.read_csv('C:\\Users\\path\\to\\file.csv')
Being on jupyter notebook it works for me including the relative path only. For example:
df = pd.read_csv ('file.csv')
But, for example, in vscode I have to put the complete path:
df = pd.read_csv ('/home/code/file.csv')
You are missing '/' before Users. I assume that you are using a MAC guessing from the file path names. You root directory is '/'.
I had the same issue, but it was happening because my file was called "geo_data.csv.csv" - new laptop wasn't showing file extensions, so the name issue was invisible in Windows Explorer.
Very silly, I know, but if this solution doesn't work for you, try that :-)
Just change the CSV file name. Once I changed it for me, it worked fine. Previously I gave data.csv then I changed it to CNC_1.csv.
What worked for me:
import csv
import pandas as pd
import os
base =os.path.normpath(r"path")
with open(base, 'r') as csvfile:
readCSV = csv.reader(csvfile, delimiter='|')
data=[]
for row in readCSV:
data.append(row)
df = pd.DataFrame(data[1:],columns=data[0][0:15])
print(df)
This reads in the file , delimit by |, and appends to list which is converted to a pandas df (taking 15 columns)
Make sure your source file is saved in .csv format. I tried all the steps of adding the full path to the file, including and deleting the header=0, adding skiprows=0 but nothing works as I saved the excel file(data file) in workbook format and not in CSV format. so keep in mind to first check your file extension.
Adnane's answer helped me.
Here's my full code on mac, hope this helps someone. All my csv files are saved in /Users/lionelyu/Documents/Python/Python Projects/
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
path = '/Users/lionelyu/Documents/Python/Python Projects/'
aapl = pd.read_csv(path + 'AAPL_CLOSE.csv',index_col='Date',parse_dates=True)
cisco = pd.read_csv(path + 'CISCO_CLOSE.csv',index_col='Date',parse_dates=True)
ibm = pd.read_csv(path + 'IBM_CLOSE.csv',index_col='Date',parse_dates=True)
amzn = pd.read_csv(path + 'AMZN_CLOSE.csv',index_col='Date',parse_dates=True)
Run "pwd" command first in cli to find out what is your current project's direction and then add the name of the file to your path!
Try this
import os
cd = os.getcwd()
dataset_train = pd.read_csv(cd+"/Google_Stock_Price_Train.csv")
In my case I just removed .csv from the end. I am using ubuntu.
pd.read_csv("/home/mypc/Documents/pcap/s2csv")
Sometimes we ignore a little bit issue which is not a Python or IDE fault
its logical error
We assumed a file .csv which is not a .csv file its a Excell Worksheet file have a look
When you try to open that file using Import compiler will through the error
have a look
To Resolve the issue
open your Target file into Microsoft Excell and save that file in .csv format
it is important to note that Encoding is important because it will help you to open the file when you try to open it with
with open('YourTargetFile.csv','r',encoding='UTF-8') as file:
So you are set to go
now Try to open your file as this
import csv
with open('plain.csv','r',encoding='UTF-8') as file:
load = csv.reader(file)
for line in load:
print(line)
Here is the Output
What works for me is
dataset = pd.read_csv('FBI_CRIME11.csv')
Highlight it and press enter. It also depends on the IDE you are using. I am using Anaconda Spyder or Jupiter.
I am using a Mac. I had the same problem wherein .csv file was in the same folder where the python script was placed, however, Spyder still was unable to locate the file. I changed the file name from capital letters to all small letters and it worked.
Related
I am not sure why I am getting this error although sometimes my code works fine!
Excel file format cannot be determined, you must specify an engine manually.
Here below is my code with steps:
1- list of columns of customers Id:
customer_id = ["ID","customer_id","consumer_number","cus_id","client_ID"]
2- The code to find all xlsx files in a folder and read them:
l = [] #use a list and concat later, faster than append in the loop
for f in glob.glob("./*.xlsx"):
df = pd.read_excel(f).reindex(columns=customer_id).dropna(how='all', axis=1)
df.columns = ["ID"] # to have only one column once concat
l.append(df)
all_data = pd.concat(l, ignore_index=True) # concat all data
I added the engine openpyxl
df = pd.read_excel(f, engine="openpyxl").reindex(columns = customer_id).dropna(how='all', axis=1)
Now I got a different error:
BadZipFile: File is not a zip file
pandas version: 1.3.0
python version: python3.9
os: MacOS
is there a better way to read all xlsx files from a folder ?
Found it. When an excel file is opened for example by MS excel a hidden temporary file is created in the same directory:
~$datasheet.xlsx
So, when I run the code to read all the files from the folder it gives me the error:
Excel file format cannot be determined, you must specify an engine manually.
When all files are closed and no hidden temporary files ~$filename.xlsx in the same directory the code works perfectly.
Also make sure you're using the correct pd.read_* method. I ran into this error when attempting to open a .csv file with read_excel() instead of read_csv(). I found this handy snippet here to automatically select the correct method by Excel file type.
if file_extension == 'xlsx':
df = pd.read_excel(file.read(), engine='openpyxl')
elif file_extension == 'xls':
df = pd.read_excel(file.read())
elif file_extension == 'csv':
df = pd.read_csv(file.read())
https://stackoverflow.com/a/32241271/17411729
link to an answer on how to remove hidden files
Mac = go to folder press cmd + shift + .
will show the hidden file, delete it, run it back.
In macOS, an "invisible file" named ".DS_Store" is automatically generated in each folder. For me, this was the source of the issue. I solved the problem with an if statement to bypass the "invisible file" (which is not an xlsx, so thus would trigger the error)
for file in os.scandir(test_folder):
filename = os.fsdecode(file)
if '.DS_Store' not in filename:
execute_function(file)
You can filter out the unwanted temp files by checking if file starts with "~".
import os
for file in os.listdir(folder path):
if not file.startswith("~") and file.endswith(".xlsx"):
print(file)
I also got an 'Excel file format...' error when I manually changed the 'CSV' suffix to 'XLS'. All I had to do was open excel and save it to the format I wanted.
Looks like an easy fix for this one. Go to your excel file, whether it is xls or xlsx or any other extension, and do "save as" from file icon. When prompted with options. Save it as CSV UTF-8(Comma delimited)(*.csv)
In my case, I usedxlrd. So in terminal:
pip install xlrd
If pandas is not installed, install it:
pip install pandas
Now read the excel file this way:
import pandas as pd
df = pd.read_excel("filesFolder/excelFile.xls", engine='xlrd')
I would like to import a csv into a dataframe in a way that if the code is copied to another computer the path of the file still points to the correct place inside the project.
I tried this:
csv_filename = '.price/data/table.csv'
df = pd.read_csv('csv_filename', sep=';')
It doesn't work.
If I use the full path (C:\Users\eniko\PycharmProjects\pythonProject3\price\data\table.csv) it works perfect.
So my question would be if there is a method to point to the file inside the Pycharm project when importing the csv, instead of using the full path of the file location?
Thanks in advance for any help.
Remove the quotes around csv_filename
csv_filename = '.price/data/table.csv'
df = pd.read_csv(csv_filename, sep=';')
i'm writing a software that reads a csv file at after some steps creates another csv file as output, the software is working fine but when i try to create an executable with pyinstaller i have an error saying that my software can't find the input csv file. Here is how i am reading the csv file as input, i've also tryed to change the pathname with no luck:
import pandas as pd
def lettore():
RawData = pd.read_csv('rawdata.csv', sep=';')
return RawData
how can i solve the problem?
Your code searches for the file it the same folder where the exe is launched.
It is equivalent to
import os
import pandas
filepath = os.path.join(os.getcwd(), 'filename.csv')
df = pd.read_csv(filepath)
Do not use relative paths when you create an exe.
I can give you two other options:
Use an input to get the right file path when running the exe (or eventually use argparse).
filepath = input("insert your csv: ")
df = pd.read_csv(filepath)
Define an absolute path and build it in your code (you cannot change it after building and the program will read the file only from that path).
Edit: after reading your comment, see also
How to reliably open a file in the same directory as a Python script
I'm trying to import and excel file that I have stored in a folder within a GitHub repository. Based on that the file path should be
"C:\\Users\\'username'\\Documents\\GitHub\\'repository'\\'folder'\\'filename'.xlsx"
But when I enter the code
import pandas as pd
xlsfile="C:\\Users\\'username'\\Documents\\GitHub\\'repository'\\'folder'\\'filename'.xlsx"
xl1=pd.read_excel(xlsfile,sheet_name='sheet',skiprows=21)
I get an error that says the file path I entered doesn't exist. I know that the entire path to the file exists because my working directory also contains the file, so what could I be doing wrong?
I have no experience coding. Thanks.
Remove the "'" in your filename? Is your sheet really named 'sheet'? I think the default is 'sheet1' ect.
There can be multiple things, as Joe stated you probably don't have ' ' around your file names, I'm assuming that they included those so that you input your local filepath in there (i.e. replace 'username' with Jack.Donaghue and so on) an example of this would look something like:"C:/Users/Jack_Donague/Documents/GitHub/YourRepoName/data/datafilename.xlsx"
Also as colbster pointed out to confirm what your sheet is named. I've also experienced some issues with \ vs / in the file names since I'm working on Windows10.
I would recommend trying
import pandas as pd
xlsfile="C:/Users/'username'/Documents/GitHub/'repository'/'folder'/'filename'.xlsx"
xl1=pd.read_excel(xlsfile,sheet_name='sheet',skiprows=21)
I have a small chunk of code using Pandas that reads an incoming CSV, performs some simple computations, adds a column, and then turns the dataframe into a CSV using to_csv.
I was running it all in a Jupyter notebook and it worked great, the output csv file would be there right in the directory when I ran it. I have now changed my code to be run from the command line, and when I run it, I don't see the output CSV files anywhere. The way that I did this was saving the file as a .py, saving it into a folder right on my desktop, and putting the incoming csv in the same folder.
From similar questions on stackoverflow I am gathering that right before I use to_csv at the end of my code I might need to add the path into that line as a variable, such as this.
path = 'C:\Users\ab\Desktop\conversion'
final2.to_csv(path, 'Combined Book.csv', index=False)
However after adding this, I am still not seeing this output CSV file in the directory anywhere after running my pretty simple .py code from the command line.
Does anyone have any guidance? Let me know what other information I could add for clarity. I don't think sample code of the pandas computations is necessary, it is as simple as adding a column with data based on one of my incoming columns.
Join the path and the filename together and pass that to pd.to_csv:
import os
path = 'C:\Users\ab\Desktop\conversion'
output_file = os.path.join(path,'Combined Book.csv')
final2.to_csv(output_file, index=False)
Im pretty sure that you have mixed up the arguments, as shown here. The path should include the filename in it.
path = 'C:\Users\ab\Desktop\conversion\Combined_Book.csv'
final2.to_csv(path, index=False)
Otherwise you are trying to overwrite the whole folder 'conversions' and add a complicated value separator.
I think below is what you are looking for , absolute path
import pandas as pd
.....
final2.to_csv('C:\Users\ab\Desktop\conversion\Combined Book.csv', index=False)
OR for an example:
path_to_file = "C:\Users\ab\Desktop\conversion\Combined Book.csv"
final2.to_csv(path_to_file, encoding="utf-8")
Though late answer but would be useful for someone facing similar issues. It is better to dynamically get the csv folder path instead of hardcoding it. We can do so using os.getcwd(). Later join the csv folder path with csv file name using os.path.join(os.getcwd(),'csvFileName')
Example:
import os
path = os.getcwd()
export_path = os.path.join(path,'Combined Book.csv')
final2.to_csv(export_path, index=False, header=True)