Import Excel File Using Pandas


I'm trying to import an Excel file that I have stored in a folder within a GitHub repository. Based on that, the file path should be
"C:\\Users\\'username'\\Documents\\GitHub\\'repository'\\'folder'\\'filename'.xlsx"
But when I enter the code
import pandas as pd
xlsfile="C:\\Users\\'username'\\Documents\\GitHub\\'repository'\\'folder'\\'filename'.xlsx"
xl1=pd.read_excel(xlsfile,sheet_name='sheet',skiprows=21)
I get an error that says the file path I entered doesn't exist. I know that the entire path to the file exists because my working directory also contains the file, so what could I be doing wrong?
I have no experience coding. Thanks.

Remove the "'" in your file name? Is your sheet really named 'sheet'? I think the default is 'Sheet1', etc. (If you omit sheet_name, pd.read_excel reads the first sheet.)

There could be several things wrong. As Joe stated, you probably don't have ' ' around your file names; I'm assuming they were included as placeholders for your local file path (i.e. replace 'username' with Jack.Donaghue and so on). An example would look something like: "C:/Users/Jack.Donaghue/Documents/GitHub/YourRepoName/data/datafilename.xlsx"
Also, as colbster pointed out, confirm what your sheet is named. I've also experienced some issues with \ vs / in file names since I'm working on Windows 10.
I would recommend trying:
import pandas as pd
xlsfile="C:/Users/username/Documents/GitHub/repository/folder/filename.xlsx"
xl1=pd.read_excel(xlsfile,sheet_name='sheet',skiprows=21)
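When pandas says a path does not exist, it can help to find out which part of the path is wrong. The sketch below is a hypothetical helper, deepest_existing_parent, that walks up from the full path and returns the deepest folder that actually exists. The example path built with Path.home() is an assumption standing in for the placeholders in the question.

```python
from pathlib import Path


def deepest_existing_parent(path):
    """Return `path` itself if it exists, otherwise the deepest parent folder that does exist."""
    p = Path(path)
    # Check the full path first, then each parent from the innermost folder up to the root.
    for candidate in (p, *p.parents):
        if candidate.exists():
            return candidate
    return None  # nothing exists, not even the root or the current directory


if __name__ == "__main__":
    # Hypothetical path: replace the placeholder folder names with your real ones, without quotes.
    xlsfile = Path.home() / "Documents" / "GitHub" / "repository" / "folder" / "filename.xlsx"
    if xlsfile.exists():
        print("Found:", xlsfile)  # safe to pass to pd.read_excel now
    else:
        print("Not found; deepest existing folder:", deepest_existing_parent(xlsfile))
```

If the printed folder is, say, the repository folder, then the folder or file name after it is misspelled or still contains the stray quotes.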

Related

Python add path of data directory

I want to add a path to my data directory in python, so that I can read/write files from that directory without including the path to it all the time.
For example, I have my working directory at /user/working, where I am currently working in the file /user/working/foo.py. I also have all of my data in the directory /user/data, where I want to access the file /user/data/important_data.csv.
In foo.py, I could now just read the csv with pandas using
import pandas as pd
df = pd.read_csv('../data/important_data.csv')
which totally works. I just want to know if there is a way to include /user/data as a main path for the file so I can just read the file with
import pandas as pd
df = pd.read_csv('important_data.csv')
The only idea I had was adding the path via sys.path.append('/user/data'), which didn't work (I guess it only works for importing modules).
Does anyone know whether this is possible?
PS: My real problem is of course more complex, but this minimal example should be enough to handle my problem.

It looks like you can use os.chdir for this purpose.
import os
os.chdir('/user/data')
See https://note.nkmk.me/en/python-os-getcwd-chdir/ for more details.
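A minimal sketch of the os.chdir approach, assuming a temporary folder stands in for /user/data. It switches back to the original working directory afterwards, because os.chdir changes where every relative path in the process resolves, not just the ones passed to pandas.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: a temporary folder stands in for /user/data.
with tempfile.TemporaryDirectory() as data_dir:
    Path(data_dir, "important_data.csv").write_text("a,b\n1,2\n")

    original_cwd = os.getcwd()
    os.chdir(data_dir)  # from here on, relative paths resolve against data_dir
    try:
        df = pd.read_csv("important_data.csv")  # bare file name, no directory
    finally:
        os.chdir(original_cwd)  # switch back so the rest of the program is unaffected

print(df)
```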

If you are keeping everything in /user/data, why not use f-strings to make this easy? You could assign the directory to a variable in a config and then use it in the string like so:
In a config somewhere:
data_path = "/user/data"
Reading later...
df = pd.read_csv(f"{data_path}/important_data.csv")
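The same idea can be written with pathlib, where the / operator joins path parts and picks the right separator for the operating system. DATA_DIR and data_path below are hypothetical names for illustration, not part of pandas.

```python
from pathlib import Path

# Assumption: the data directory from the question.
DATA_DIR = Path("/user/data")


def data_path(filename, base=DATA_DIR):
    """Return the full path of `filename` inside the data directory (hypothetical helper)."""
    return Path(base) / filename


# Usage: df = pd.read_csv(data_path("important_data.csv"))
print(data_path("important_data.csv"))
```

Unlike os.chdir, this leaves the working directory alone, so other relative paths in the program keep their meaning.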

pandas to_csv - can't save a named csv into a specific path

Trying the simple task of converting a df to a csv, then saving it locally to a specific path. I'll censor the file name with "..." because I'm not sure about the privacy policy.
I checked many topics and I still run into errors, so I'll detail all the steps I went through, with the output:
Attempt #1:
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path, 'myDataFrame.csv')
TypeError: "delimiter" must be a 1-character string
Attempt #2: After searching about "delimiter" issues, I discovered it was linked to the sep argument, so I used a solution I found to set a proper separator and encoding.
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path, "myDataFrame.csv", sep=b'\t', encoding='utf-8')
TypeError: to_csv() got multiple values for argument 'sep'
After playing with different arguments, it seems my code uses 'myDataFrame.csv' as the sep argument, because when I remove it the code does not return an error, but no file is created at the specified path or anywhere else (I checked the desktop, My Documents, and the default Downloads folder).
I checked the pandas.DataFrame.to_csv documentation and didn't find a parameter that would let me give a specific name to the CSV I want to create locally. I've been running in circles trying things that lead me to the same errors, without finding a way to save anything locally.
Even the simple code below does not create any csv, and I really don't understand why:
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
df.to_csv("myDataFrame.csv")
I just want to simply save a dataframe into a csv, give it a name, and specify the path to where this csv is saved on my hard drive.

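You receive the error because you are passing too many positional values: the second positional argument of to_csv is sep, so 'myDataFrame.csv' is taken as the separator. The path and the file name must be provided in one string: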
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path +'/myDataFrame.csv')
This saves your file at the path you like.
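However, the file from your last attempt should have been saved in the current working directory (the directory Python was started from, as reported by os.getcwd()), which is not necessarily the folder of your script. Check there; the file should exist.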
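To see where a bare file name actually ends up, you can print its absolute path right after saving. This is a small sketch with a stand-in DataFrame (an assumption; the Titanic data would behave the same way). Note that it writes myDataFrame.csv into whatever the current working directory is.

```python
import os

import pandas as pd

# Small stand-in DataFrame (assumption: any DataFrame behaves the same way here).
df = pd.DataFrame({"Name": ["A", "B"], "Survived": [0, 1]})

# A bare file name is resolved against the current working directory,
# which is not necessarily the folder that contains the script or notebook.
df.to_csv("myDataFrame.csv", index=False)
print("Saved to:", os.path.abspath("myDataFrame.csv"))
print("Working directory:", os.getcwd())
```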

There is no separate argument for the file name.
Provide the name of the file as part of the path.
Change the code to this:
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp\myDataFrame.csv'
df.to_csv(path)
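A sketch that combines both answers: join the folder and the file name with os.path.join, and pass options such as sep and encoding by keyword so that nothing is mistaken for another positional parameter. The temporary folder is an assumption standing in for D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp, and the DataFrame is stand-in data.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Survived": [0, 1], "Pclass": [3, 1]})  # stand-in data

# Assumption: a temporary folder stands in for the real project folder.
folder = tempfile.mkdtemp()
out_file = os.path.join(folder, "myDataFrame.csv")  # folder + file name in one path

# Keyword arguments make the intent explicit: the path comes first, everything else is named.
df.to_csv(out_file, sep="\t", encoding="utf-8", index=False)
print("Written:", out_file)
```

On Windows, remember to use a raw string (r'D:\...') or forward slashes for the real folder.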

Adding a path to pandas to_csv function

I have a small chunk of code using Pandas that reads an incoming CSV, performs some simple computations, adds a column, and then turns the dataframe into a CSV using to_csv.
I was running it all in a Jupyter notebook and it worked great, the output csv file would be there right in the directory when I ran it. I have now changed my code to be run from the command line, and when I run it, I don't see the output CSV files anywhere. The way that I did this was saving the file as a .py, saving it into a folder right on my desktop, and putting the incoming csv in the same folder.
From similar questions on Stack Overflow, I gather that I might need to add the path to the to_csv call at the end of my code as a variable, like this:
path = 'C:\Users\ab\Desktop\conversion'
final2.to_csv(path, 'Combined Book.csv', index=False)
However after adding this, I am still not seeing this output CSV file in the directory anywhere after running my pretty simple .py code from the command line.
Does anyone have any guidance? Let me know what other information I could add for clarity. I don't think sample code of the pandas computations is necessary, it is as simple as adding a column with data based on one of my incoming columns.

Join the path and the file name together and pass the result to final2.to_csv:
import os
path = r'C:\Users\ab\Desktop\conversion'  # raw string: '\U' in a normal string is a syntax error
output_file = os.path.join(path,'Combined Book.csv')
final2.to_csv(output_file, index=False)

I'm pretty sure you have mixed up the arguments: the first argument of to_csv is the output path, and it should include the file name.
path = r'C:\Users\ab\Desktop\conversion\Combined_Book.csv'
final2.to_csv(path, index=False)
Otherwise pandas treats the folder 'conversion' itself as the output file and 'Combined Book.csv' as the value separator (sep).

I think what you are looking for is an absolute path:
import pandas as pd
# ... your computations that build final2 ...
final2.to_csv(r'C:\Users\ab\Desktop\conversion\Combined Book.csv', index=False)
Or, for example:
path_to_file = r"C:\Users\ab\Desktop\conversion\Combined Book.csv"
final2.to_csv(path_to_file, encoding="utf-8")

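This is a late answer, but it may be useful for someone facing a similar issue. It is better to get the CSV folder path dynamically instead of hardcoding it, which you can do with os.getcwd(), and then join the folder path with the CSV file name using os.path.join(os.getcwd(), 'csvFileName'). Note that os.getcwd() returns the directory you launched Python from, which is not necessarily the folder containing the script.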
Example:
import os
path = os.getcwd()
export_path = os.path.join(path,'Combined Book.csv')
final2.to_csv(export_path, index=False, header=True)
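If the output should always land next to the .py file, no matter which directory the command line was in, the script's own location can be used instead of os.getcwd(). A sketch with a hypothetical helper, output_path, that takes the script path (normally __file__) as an argument:

```python
from pathlib import Path


def output_path(script_file, filename):
    """Return `filename` placed in the same folder as `script_file` (hypothetical helper)."""
    # resolve() turns a relative __file__ into an absolute path before taking its folder.
    return Path(script_file).resolve().parent / filename


# Inside a .py script you would call it as:
# final2.to_csv(output_path(__file__, "Combined Book.csv"), index=False)
```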

Opening data using pandas in Python 3.7

Hi, I tried to open some data that I downloaded to my Documents folder using pandas with Python 3.7,
but it doesn't work.
This is my code:
import pandas as pd
users=pd.read_csv("ml-100k/u.user",sep="|",names=["User ID","Age","Gender",
"occupation" ,"zipcode"])
users.head()
The error is:
FileNotFoundError: File b'ml-100k/u.user' does not exist
How can the file not exist if I downloaded it?
Thanks :)

It seems your issue is that your data file is not in the path of your python session. There are a few ways to fix this.
First, check the file name itself. pd.read_csv() does not require a .csv extension, so a file such as u.user can be read as-is, but the name you pass must match the file exactly. I also advise making file names code-friendly: replace whitespace with _ or - and remove non-alphanumeric characters such as #*/().
One solution is to provide the full path to the pd.read_csv() function.
pd.read_csv("/home/user/folder/file_name.csv",
sep="|",names=["User ID","Age","Gender","occupation" ,"zipcode"])
If you are using IPython or Jupyter Notebook, you can navigate to the folder that contains your file with the cd path_to_file_folder command and simply pass the file name:
pd.read_csv("file_name.csv",sep="|",
names=["User ID","Age","Gender","occupation" ,"zipcode"])
For more robust solutions check this discussion.
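A self-contained sketch showing that a pipe-separated file without a .csv extension reads fine once the path is right. The temporary ml-100k folder and the two rows are assumptions: sample rows in the same user id | age | gender | occupation | zip code layout, not copied from the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: a temporary folder stands in for the downloaded ml-100k folder.
folder = Path(tempfile.mkdtemp()) / "ml-100k"
folder.mkdir()
# Two sample rows in the same pipe-separated layout as u.user.
(folder / "u.user").write_text("1|24|M|technician|85711\n2|53|F|other|94043\n")

# The file has no header row, so the column names are supplied with names=.
users = pd.read_csv(folder / "u.user", sep="|",
                    names=["User ID", "Age", "Gender", "Occupation", "Zipcode"])
print(users.head())
```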

"CSV file does not exist" for a filename with embedded quotes

I am currently learning Pandas for data analysis and having some issues reading a csv file in Atom editor.
When I am running the following code:
import pandas as pd
df = pd.read_csv("FBI-CRIME11.csv")
print(df.head())
I get an error message, which ends with
OSError: File b'FBI-CRIME11.csv' does not exist
Here is the path to the file: /Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv".
When I try to run it this way:
df = pd.read_csv(Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv")
I get another error:
NameError: name 'Users' is not defined
I have also put this directory into the "Project Home" field in the editor settings, though I am not quite sure if it makes any difference.
I bet there is an easy way to get it to work. I would really appreciate your help!

Have you tried:
df = pd.read_csv("/Users/alekseinabatov/Documents/Python/FBI-CRIME11.csv")
or maybe
df = pd.read_csv('/Users/alekseinabatov/Documents/Python/"FBI-CRIME11.csv"')
(if the file name itself contains quotes)

Just referring to the filename like
df = pd.read_csv("FBI-CRIME11.csv")
only works if the file is in the current working directory, which is often, but not always, the directory of the script.
If you are using Windows, make sure you specify the path to the file as follows:
PATH = "C:\\Users\\path\\to\\file.csv"

I had an issue with the path; it turns out you need to include the leading '/' to get it to work!
I am using VS Code/Python on macOS.

I had the same problem and solved it as follows:
dataset = pd.read_csv('C:\\Users\\path\\to\\file.csv')

In Jupyter Notebook, a relative path alone works for me. For example:
df = pd.read_csv('file.csv')
But in VS Code, for example, I have to use the complete path:
df = pd.read_csv('/home/code/file.csv')

You are missing the '/' before Users. I assume you are using a Mac, judging from the path names. Your root directory is '/'.
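A quick way to see what the missing '/' does: without it, the path is relative and gets appended to the current working directory. This sketch only inspects the path strings from the question; the expected values in the comments are for macOS and Linux.

```python
import os

relative = "Users/alekseinabatov/Documents/Python/FBI-CRIME11.csv"
absolute = "/Users/alekseinabatov/Documents/Python/FBI-CRIME11.csv"

print(os.path.isabs(relative))  # False
print(os.path.isabs(absolute))  # True on macOS and Linux
# The relative version is looked up under whatever directory Python is running in:
print(os.path.abspath(relative))
```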

I had the same issue, but it was happening because my file was called "geo_data.csv.csv": my new laptop wasn't showing file extensions, so the naming problem was invisible in Windows Explorer.
Very silly, I know, but if the other solutions don't work for you, check for that :-)
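Hidden double extensions (and, as another answer below notes, differences in letter case) are easy to spot by listing the folder from Python instead of trusting the file browser. A sketch with a hypothetical helper, similar_files, that lists every file whose name starts with the expected name without its extension, ignoring case:

```python
from pathlib import Path


def similar_files(folder, expected_name):
    """List files in `folder` whose names start with the stem of `expected_name`, ignoring case."""
    stem = Path(expected_name).stem.lower()  # "geo_data.csv" -> "geo_data"
    return sorted(p.name for p in Path(folder).iterdir() if p.name.lower().startswith(stem))


# Usage: print(similar_files(".", "geo_data.csv"))
```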

Just change the CSV file name. Once I changed it, it worked fine for me. Previously I had named it data.csv; then I changed it to CNC_1.csv.

What worked for me:
import csv
import pandas as pd
import os
base = os.path.normpath(r"path")  # replace "path" with the path to your file
with open(base, 'r') as csvfile:
    readCSV = csv.reader(csvfile, delimiter='|')
    data = []
    for row in readCSV:
        data.append(row)
# The first row holds the column names; the remaining rows are the data.
df = pd.DataFrame(data[1:], columns=data[0][0:15])
print(df)
This reads the file, splits each line on |, appends the rows to a list, and converts the list into a pandas DataFrame, using the first row as the header and keeping 15 columns.
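For comparison, pd.read_csv can do the splitting itself. A minimal sketch, assuming the file has a header row; an in-memory sample stands in for the file at base. dtype=str keeps every value as text, as csv.reader does, and .iloc[:, :15] keeps at most the first 15 columns.

```python
import io

import pandas as pd

# Assumption: a small in-memory sample stands in for the pipe-delimited file at `base`.
sample = io.StringIO("id|name|score\n1|a|10\n2|b|20\n")

# sep="|" splits on pipes; dtype=str leaves values as strings; iloc keeps up to 15 columns.
df = pd.read_csv(sample, sep="|", dtype=str).iloc[:, :15]
print(df)
```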

Make sure your source file is saved in .csv format. I tried every step: adding the full path to the file, including and deleting header=0, adding skiprows=0. Nothing worked, because I had saved the Excel file (the data file) in workbook format and not in CSV format. So remember to check your file's format first.

Adnane's answer helped me.
Here's my full code on Mac; I hope this helps someone. All my CSV files are saved in /Users/lionelyu/Documents/Python/Python Projects/
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
path = '/Users/lionelyu/Documents/Python/Python Projects/'
aapl = pd.read_csv(path + 'AAPL_CLOSE.csv',index_col='Date',parse_dates=True)
cisco = pd.read_csv(path + 'CISCO_CLOSE.csv',index_col='Date',parse_dates=True)
ibm = pd.read_csv(path + 'IBM_CLOSE.csv',index_col='Date',parse_dates=True)
amzn = pd.read_csv(path + 'AMZN_CLOSE.csv',index_col='Date',parse_dates=True)

Run the pwd command in the terminal first to find out your current directory, and then add the file name to that path!

Try this:
import os
import pandas as pd
cd = os.getcwd()
dataset_train = pd.read_csv(cd+"/Google_Stock_Price_Train.csv")

In my case I just removed .csv from the end of the path. I am using Ubuntu.
pd.read_csv("/home/mypc/Documents/pcap/s2csv")

Sometimes the problem is not Python's or the IDE's fault but a logical error: we assume a file is a .csv file when it is actually an Excel worksheet.
When you try to open that file as a CSV, Python will throw an error.
To resolve the issue, open your target file in Microsoft Excel and save it in .csv format.
Note that the encoding matters too, because you will need to specify it when you open the file:
with open('YourTargetFile.csv','r',encoding='UTF-8') as file:
Then you are set to go. Now try to open your file like this:
import csv
with open('plain.csv','r',encoding='UTF-8') as file:
    load = csv.reader(file)
    for line in load:
        print(line)
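One way to check whether a ".csv" file is really an Excel workbook before converting it: .xlsx files are ZIP archives and start with the bytes PK\x03\x04, while legacy .xls files start with the OLE2 header D0 CF 11 E0 A1 B1 1A E1. The helper below, looks_like_excel, is a hypothetical heuristic (any ZIP file would also match), not a pandas feature.

```python
def looks_like_excel(path):
    """Guess whether `path` is really an Excel workbook by checking its first bytes (heuristic)."""
    with open(path, "rb") as f:
        head = f.read(8)
    # .xlsx files are ZIP archives; legacy .xls files use the OLE2 compound-file header.
    return head.startswith(b"PK\x03\x04") or head == bytes.fromhex("D0CF11E0A1B11AE1")


# Usage: if looks_like_excel("plain.csv"), re-save it as CSV or read it with pd.read_excel instead.
```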

What works for me is
dataset = pd.read_csv('FBI_CRIME11.csv')
Highlight it and press Enter. It also depends on the IDE you are using; I am using Anaconda Spyder or Jupyter.

I am using a Mac. I had the same problem: the .csv file was in the same folder as the Python script, but Spyder still could not locate it. I changed the file name from capital letters to all lowercase letters and it worked.

