CSV file not found in JupyterLab (Python 3) - python

Goal: import a csv file with pandas
Code used:
import pandas as pd
data = pd.read_csv('data/master_data_complete.csv')
I have done the following:
uploaded the csv file to the folder 'data'
checked to make sure the original file is in fact saved as a csv
tried creating a new folder called 'mydata' and uploading the csv there (same result with this new filepath)
quit and reloaded jupyterlab
right-clicked the file I wish to import and chose 'copy file path', so I'm sure the file path is accurate
Outcome:
FileNotFoundError: [Errno 2] File b'data/master_data_complete.csv' does not exist: b'data/master_data_complete.csv'

I think the problem is maybe, and I repeat maybe, in the path.
You are using a relative path
'data/master_data_complete.csv'
which is resolved against the current working directory. Writing it as
'./data/master_data_complete.csv'
means the same thing, and both work only if the notebook and the data folder are in the same folder
-parent
    -notebook
    -data
        -master_data_complete.csv

Update
import pandas as pd
import os
os.getcwd()
I found the working directory and added the file name onto the end of it. I was able to import it.
'/home/jovyan/data/demo'
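The check described in the update can be sketched roughly as follows. This is only a minimal sketch, assuming the CSV sits in a data folder under the working directory; resolve_from_cwd is a made-up helper name, not something pandas provides.

```python
import os

def resolve_from_cwd(relative_path):
    """Return the absolute path that a relative path currently resolves to."""
    # Relative paths are resolved against the current working directory,
    # which is not necessarily the folder the notebook file is stored in.
    return os.path.join(os.getcwd(), relative_path)

# Assumed layout: <working directory>/data/master_data_complete.csv
full_path = resolve_from_cwd(os.path.join("data", "master_data_complete.csv"))
print("Working directory:", os.getcwd())
print("Looking for:", full_path)
print("Exists:", os.path.exists(full_path))
```

If the last line prints False, the path is the problem rather than the file itself. Once it prints True, full_path can be passed to pd.read_csv.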

Related

Find an Excel file in a directory, compress it and send it to another folder

I have an Excel file WK6 that is downloaded in the below folder:
C:\Users\kj\Scripts\Sh\Result\Wk6
The Python script should first navigate to the above directory, then find the Excel file WK6 (the name of the Excel file changes each week) and compress it. Then it should move it to some other directory.
Please help me understand how I can find and compress the file in Python.
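For zipping the file you can use the following:
from zipfile import ZipFile
import os
path = r'C:\Users\kj\Scripts\Sh\Result\Wk6'
os.chdir(path)
# the with block closes the archive, so it is completely written to disk
with ZipFile('<new path>/name.zip', 'w') as archive:
    archive.write('Wk6.csv')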
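Since the file name changes every week, the "find" part could be handled with a glob pattern instead of a fixed name. Below is a rough sketch, assuming the Excel file has an .xls/.xlsx style extension and that writing the zip straight into the target folder is acceptable (instead of zipping first and moving afterwards). zip_excel_files and its parameters are hypothetical names chosen for this example.

```python
import glob
import os
from zipfile import ZipFile, ZIP_DEFLATED

def zip_excel_files(source_dir, target_dir, pattern="*.xls*"):
    """Zip every file in source_dir that matches pattern into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    created = []
    for excel_path in sorted(glob.glob(os.path.join(source_dir, pattern))):
        base_name = os.path.splitext(os.path.basename(excel_path))[0]
        zip_path = os.path.join(target_dir, base_name + ".zip")
        # ZIP_DEFLATED compresses the data; the default only stores it.
        with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as archive:
            # arcname keeps just the file name inside the archive,
            # not the full folder structure.
            archive.write(excel_path, arcname=os.path.basename(excel_path))
        created.append(zip_path)
    return created
```

It would be called with the folder from the question as source_dir and the destination folder as target_dir, and it returns the list of zip files it created (an empty list if nothing matched).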

Importing DataFrames - no such file or directory

I am trying to import a csv file as a dataframe into a Jupyter notebook.
rest_relation = pd.read_csv('store_id_relation.csv', delimiter=',')
But I get this error
FileNotFoundError: [Errno 2] No such file or directory: 'store_id_relation.csv'
The store_id_relation.csv is definitely in the data folder, and I have tried adding data\ to the file location but I get the same error. What's going wrong here?
Using the filename only works if the file is located in the current working directory. You can check the current working directory using os.getcwd().
import os
current_directory = os.getcwd()
print(current_directory)
One way you can make your code work is to change the working directory to the one where the file is located.
os.chdir("Path to wherever your file is located")
Or you can substitute the filename with the full path to the file. The full path would look something like C:\Users\Documents\store_id_relation.csv.
Always try to pass the full path whenever possible
By default, read_csv looks for the file in the current working directory
Provide the full path to the read_csv function
However, in your case, note that data and Data are different names wherever file paths are case sensitive (on Linux, for example)
You can try the below path to fetch the file and convert it into a dataframe
rest_relation = pd.read_csv('Data\\store_id_relation.csv', delimiter=',')
If you don't want to change the argument passed to read_csv(), make sure the current working directory is the one that contains your .csv file, for example by changing into that directory before running the Python file. Otherwise, simply change the path passed to read_csv().
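If it is unclear where the file really is, or how the folder name is capitalised, a small search can help. This is only a sketch: find_file is a hypothetical helper, and it walks every subfolder of the starting directory, so it may be slow on a large tree.

```python
from pathlib import Path

def find_file(filename, start_dir="."):
    """Return paths under start_dir whose file name matches, ignoring case."""
    wanted = filename.lower()
    return sorted(
        p for p in Path(start_dir).rglob("*")
        if p.is_file() and p.name.lower() == wanted
    )

# Example call, searching from the current working directory:
# matches = find_file("store_id_relation.csv")
# print(matches)
```

Any path it returns shows the exact spelling of the folder names, and can be passed to pd.read_csv as it is.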

How to read a CSV from a folder without knowing the file name in Python

I need to read a CSV file from a folder that is generated by another module. If that module fails, it won't generate the folder containing the CSV file.
Example:
path = 'c/files' --- fixed path
When the module runs successfully, it creates a folder called output with a file in it.
path = 'c/files/output/somename.csv'
But here is the catch: every time it generates an output folder, the CSV file has a different name.
First I need to check whether the output folder and a CSV file exist.
After that I need to read that CSV.
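The following checks for the existence of the output folder and of a CSV file in it, then reads the CSV file:
import os
import pandas as pd
if 'output' in os.listdir('c/files'):
    # keep only the CSV files in the output folder
    csv_files = [i for i in os.listdir('c/files/output') if i.endswith('.csv')]
    if csv_files:
        # listdir returns bare names, so join with the folder before reading
        new_file = pd.read_csv(os.path.join('c/files/output', csv_files[0]))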
glob.glob can help.
glob.glob('c/files/output/*.csv') returns either an empty list or a list with (hopefully) the path to a single file
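You may also try to get the latest file based on creation time, after you have checked that the directory exists (see the answer above). Something like this:
import glob
import os
list_of_files = glob.glob("c/files/output/*.csv")
latest_file = max(list_of_files, key=os.path.getctime)
latest_file is the most recently created file in that directory. Note that getctime is the creation time on Windows, but the time of the last metadata change on Unix, and that max raises a ValueError if the list is empty.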
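The suggestions above could be combined into one helper along these lines. It is a sketch under assumptions: find_latest_csv is a hypothetical name, and it uses the last-modified time (os.path.getmtime) instead of getctime, because getctime does not mean creation time on every platform.

```python
import glob
import os

def find_latest_csv(output_dir):
    """Return the most recently modified CSV in output_dir, or None."""
    if not os.path.isdir(output_dir):
        # The folder is missing, e.g. because the module failed.
        return None
    csv_files = glob.glob(os.path.join(output_dir, "*.csv"))
    if not csv_files:
        # The folder exists but holds no CSV file.
        return None
    # Swapped-in choice: compare by modification time, not getctime.
    return max(csv_files, key=os.path.getmtime)
```

With the paths from the question this would be csv_path = find_latest_csv('c/files/output'), followed by pd.read_csv(csv_path) only when csv_path is not None.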

For loop for converting all PDFs in a directory to Excel files not working - python

I am trying to convert all pdfs in a folder into excel files. To do so, I am using the following code, though I am receiving the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'filepath.pdf'
Here is the non-functioning code:
# import packages needed
import glob
!pip install tabula-py
import tabula
# set up working directory
my_dir = 'C:/Users/myfolderwithpdfs'
# transform the pdfs into excel files
for filepath in glob.iglob('my_dir/*.pdf'):
    tabula.convert_into("filepath.pdf","filepath.xlsx", output_format="xlsx")
When I use either only the for loop to print the list of my files (as follows)
for filepath in glob.iglob('my_dir/*.pdf'):
    print(filepath)
or transform a single file
tabula.convert_into("myfilename.pdf", "myfilename.xlsx", output_format="xlsx")
I encounter no problems or errors with my code.
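You should correct my_dir in the loop: inside the quotes it is a literal string, so glob looks for a directory called "my_dir". Replace it with the actual directory. You should also pass the filepath variable created by the loop rather than the string "filepath.pdf", and give convert_into an output path.
# import packages needed
import glob
import os
import tabula
# transform the pdfs into excel files
for filepath in glob.iglob('C:/Users/myfolderwithpdfs/*.pdf'):
    # write each result next to its pdf, swapping the extension
    output_path = os.path.splitext(filepath)[0] + '.xlsx'
    tabula.convert_into(filepath, output_path, output_format="xlsx")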
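To keep the directory in a variable, as the question tried to do, the pattern has to be built from the variable's value. A minimal sketch using only the standard library follows; pdf_to_xlsx_pairs is a hypothetical helper that prepares the input and output paths and leaves the conversion call itself to the loop.

```python
import glob
import os

def pdf_to_xlsx_pairs(pdf_dir):
    """Return (pdf_path, xlsx_path) pairs for every PDF in pdf_dir."""
    # os.path.join uses the *value* of pdf_dir; the string 'my_dir/*.pdf'
    # would instead look for a folder literally named "my_dir".
    pattern = os.path.join(pdf_dir, "*.pdf")
    pairs = []
    for pdf_path in sorted(glob.glob(pattern)):
        # Same folder and base name, different extension.
        xlsx_path = os.path.splitext(pdf_path)[0] + ".xlsx"
        pairs.append((pdf_path, xlsx_path))
    return pairs
```

Each pair can then be handed to the conversion call inside a for loop, and an empty list is a quick sign that the pattern matched nothing.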

How do I get Pandas to recognize a *.zip file?

Objective: I want to read a CSV file contained in a *.zip file
My problem is that Python acts as if there is no file.
FileNotFoundError: [Errno 2] No such file or directory: 'N:\\l\\xyz.zip'
The code I am using is the below (using a windows machine):
import pandas as pd
data = pd.read_csv(r"N:\l\xyz.zip",
compression="zip")
Any help would be appreciated.
# without the r prefix the backslashes must be doubled, since "\x" starts a hex escape
data = pd.read_csv("N:\\l\\xyz.zip",
                   compression="zip")
EDIT: This is on an S3 bucket.
I have used the code below for a zipped file in the working directory and it worked:
df = pd.read_csv("test_wbcd.zip", compression="zip")
So, the problem should be with the file path. Kindly check your file location.
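One way to check the file location before involving pandas is to test the path with the standard library. This is only a sketch, and describe_zip is a hypothetical helper name. If the file really lives in an S3 bucket that is not mounted as a drive, a local path like the one in the question would be expected to show up as not existing here.

```python
import os
import zipfile

def describe_zip(path):
    """Report whether path exists, is a zip archive, and what it contains."""
    if not os.path.exists(path):
        return {"exists": False, "is_zip": False, "members": []}
    if not zipfile.is_zipfile(path):
        return {"exists": True, "is_zip": False, "members": []}
    with zipfile.ZipFile(path) as archive:
        # read_csv with compression="zip" expects a single data file inside.
        return {"exists": True, "is_zip": True, "members": archive.namelist()}
```

Calling it with the same path that is passed to read_csv shows whether the failure comes from the path, from the file not being a zip archive, or from the archive holding more than one file.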
