How do I get Pandas to recognize a *.zip file? - python

Objective: I want to read a CSV file contained in a *.zip file
My problem is that Python acts as if the file does not exist:
FileNotFoundError: [Errno 2] No such file or directory: 'N:\\l\\xyz.zip'
The code I am using is below (on a Windows machine):
import pandas as pd
data = pd.read_csv(r"N:\l\xyz.zip",
compression="zip")
Any help would be appreciated.
EDIT: This file is on an S3 bucket.

I used the code below for a zipped file in the working directory and it worked:
df = pd.read_csv("test_wbcd.zip", compression="zip")
So the problem is likely with the file path. Kindly check your file location.
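For reference, here is a minimal, self-contained sketch that builds a small zip archive and reads the CSV inside it with compression="zip" (the file names and data are made up for the example):

```python
import os
import tempfile
import zipfile

import pandas as pd

# Illustrative setup: a zip archive containing a single CSV file.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "xyz.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("xyz.csv", "a,b\n1,2\n3,4\n")

# pandas reads the CSV straight out of the archive when it holds one file.
df = pd.read_csv(zip_path, compression="zip")
print(df.shape)  # (2, 2)
```

If this works but the real path does not, the path itself (not the zip handling) is the problem.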

Related

Having issues reading a .csv file present in a directory

I have the following path defined:
label_csv_file = r"C:/Users/username/Downloads/Personal Stuff/Python Directory files/Scripts/CSV Copy Scripts/label.csv"
I was trying to read the CSV using pandas:
import pandas as pd
labelpd = pd.read_csv(label_csv_file)
I get the following error:
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/username/Downloads/Personal Stuff/Python Directory files/Scripts/CSV Copy Scripts/label.csv'
But the file is clearly there.
I blacked out my name; instead of username, my actual name is in the path, so I don't understand why I am having the issue. I have also changed label.csv to just label. It's not just pandas; I was having issues even with
open(label_csv_file, 'rb'),
where the label_csv_file path had been set. What is the issue here?
I have tried changing the file name to just label, since it is already a .csv file, but that does not work either. I have also changed the file path to just say /label, and that doesn't work either.
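One way to see exactly what Python sees is to list the containing directory. A minimal sketch, using a temporary directory and a made-up file name since the original path is machine-specific, reproducing the classic hidden-double-extension case (Windows Explorer hides known extensions, so label.csv.csv displays as label.csv):

```python
import os
import tempfile

# Illustrative setup: the file on disk is really "label.csv.csv".
tmpdir = tempfile.mkdtemp()
open(os.path.join(tmpdir, "label.csv.csv"), "w").close()

expected = os.path.join(tmpdir, "label.csv")
print(os.path.exists(expected))  # False - the name on disk differs
print(os.listdir(tmpdir))        # ['label.csv.csv'] reveals the mismatch
```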

pandas cannot read csv in same directory

I have been having this issue for about 2 months. It didn't bother me at first, but now that I'm trying to import a file with pd, or even a normal txt file with open(), it gives me this exception:
File "C:\Users\lcc_zarkos\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\common.py", line 642, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'marketing.xlsx'
If I use the full path, it runs normally.
People would say "just use the full path then", but this is a really bad solution when it comes to running this program on multiple devices with different paths.
So I hope you have some solutions.
here is the code:
import pandas as pd
df = pd.read_csv('marketing.xlsx')
(screenshots: VS Code editor and project folder)
Edit: it has nothing to do with the code itself; it's more likely Windows or my PC.
FileNotFoundError means that the file path you've given to pandas points to a non-existent file.
In your case this can be caused by:
you haven't put your file in the current working directory of your script
you have a typo in your file name
In either case, I would print the file path and check the location in a file browser; you will find your mistake:
print(os.path.join(os.getcwd(), 'marketing.xlsx'))
I see a space in the file name there. I tried on Mac, and it works very well:
['main.py', 'marketing .xlsx', 'requirements.txt']
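That check can be scripted; printing repr() of each directory entry makes stray spaces visible. A small sketch that recreates the situation in a temporary directory:

```python
import os
import tempfile

# Illustrative setup: the file was saved with a space before the extension.
tmpdir = tempfile.mkdtemp()
open(os.path.join(tmpdir, "marketing .xlsx"), "w").close()

for name in os.listdir(tmpdir):
    print(repr(name))  # repr() makes embedded or trailing spaces obvious

print(os.path.exists(os.path.join(tmpdir, "marketing.xlsx")))  # False
```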

CSV file not found in JupyterLab (Python 3)

Goal: import a csv file with pandas
Code used:
import pandas as pd
data = pd.read_csv('data/master_data_complete.csv')
I have done the following:
uploaded the csv file to the folder 'data'
checked to make sure the original file is in fact saved as a csv
tried creating a new folder called 'mydata' and uploading the csv there (same result with this new filepath)
quit and reloaded jupyterlab
right click the file I wish to import and 'copy file path', so I'm sure the file path is accurate
Outcome:
FileNotFoundError: [Errno 2] File b'data/master_data_complete.csv' does not exist: b'data/master_data_complete.csv'
I think the problem is maybe, and I repeat maybe, in the path. The path
'data/master_data_complete.csv'
is resolved relative to the notebook's working directory. You can make that explicit:
'./data/master_data_complete.csv'
but it only works if the notebook and the data folder are in the same folder:
-parent
  -notebook
  -data
    -master_data_complete.csv
Update
I found the working directory and added the file name onto the end of it, and was then able to import the file:
import pandas as pd
import os
os.getcwd()
'/home/jovyan/data/demo'
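The same approach in script form, with an illustrative temporary layout standing in for the real one: anchor the relative path to a known directory instead of relying on whatever the current working directory happens to be.

```python
import os
import tempfile

import pandas as pd

# Illustrative layout: a 'data' folder containing the CSV.
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "data"))
csv_path = os.path.join(workdir, "data", "master_data_complete.csv")
with open(csv_path, "w") as f:
    f.write("x,y\n1,2\n")

# Build the full path explicitly from a known base directory.
data = pd.read_csv(os.path.join(workdir, "data", "master_data_complete.csv"))
print(len(data))  # 1
```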

DataFrame.to_csv throws error '[Errno 2] No such file or directory'

I am trying to write a DataFrame to a .csv file:
now = datetime.datetime.now()
date = now.strftime("%Y-%m-%d")
enrichedDataDir = "/export/market_data/temp"
enrichedDataFile = enrichedDataDir + "/marketData_optam_" + date + ".csv"
dbutils.fs.ls(enrichedDataDir)
df.to_csv(enrichedDataFile, sep='; ')
This throws me the following error
IOError: [Errno 2] No such file or directory:
'/export/market_data/temp/marketData_optam_2018-10-12.csv'
But when I do
dbutils.fs.ls(enrichedDataDir)
Out[72]: []
there is no error! And when I go one directory level up:
enrichedDataDir = "/export/market_data"
dbutils.fs.ls(enrichedDataDir)
Out[74]:
[FileInfo(path=u'dbfs:/export/market_data/temp/', name=u'temp/', size=0L)
FileInfo(path=u'dbfs:/export/market_data/update/', name=u'update/', size=0L)]
This works, too. That tells me I really do have all the folders I want to access. But I don't know why the .to_csv call throws the error. I have also checked the permissions, which are fine!
The main problem was that I am using Microsoft Azure Data Lake Store to store those .csv files, and for whatever reason it is not possible to write to Azure Data Lake Store through df.to_csv.
Because I was trying to use df.to_csv, I was using a Pandas DataFrame instead of a Spark DataFrame.
I changed to
from pyspark.sql import *
df = spark.createDataFrame(result,['CustomerId', 'SalesAmount'])
and then wrote the csv via the following line:
df.coalesce(2).write.format("csv").option("header", True).mode("overwrite").save(enrichedDataFile)
And it works.
Here is a more general answer.
If you want to load a file from DBFS into a Pandas dataframe, you can use this trick.
Move the file from DBFS to the local file store:
%fs cp dbfs:/FileStore/tables/data.csv file:/FileStore/tables/data.csv
Read the data from the file path:
data = pd.read_csv('file:/FileStore/tables/data.csv')
Thanks
Have you tried creating the directory and opening the file first? (Replace the last row of your first example with the code below. Note exist_ok=True, so makedirs does not fail when the directory already exists.)
from os import makedirs
makedirs(enrichedDataDir, exist_ok=True)
with open(enrichedDataFile, 'w') as output_file:
df.to_csv(output_file, sep='; ')
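A self-contained sketch of that idea (the paths are placeholders; note also that to_csv requires a single-character sep, so ';' rather than '; '):

```python
import os
import tempfile

import pandas as pd

# Placeholder paths standing in for the real /export/market_data/temp layout.
enriched_dir = os.path.join(tempfile.mkdtemp(), "export", "market_data", "temp")
enriched_file = os.path.join(enriched_dir, "marketData_optam_2018-10-12.csv")

# Create the target directory first; exist_ok avoids an error on reruns.
os.makedirs(enriched_dir, exist_ok=True)

df = pd.DataFrame({"CustomerId": [1, 2], "SalesAmount": [10.0, 20.0]})
df.to_csv(enriched_file, sep=";", index=False)
print(os.path.exists(enriched_file))  # True
```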
Check the permissions on the SAS token you used for the container when you mounted this path. If it starts with "sp=racwdlmeopi" then you have a SAS token with immutable storage; your token should start with "sp=racwdlmeop".

import csv into pandas dataframe

I am facing difficulty importing a CSV file into Python from my desktop. It seems that the file or the location is not being recognized while reading.
I have tried several different methods to import it, but every time it gives the same error:
IOError: [Errno 2] No such file or directory: '/Users/uditasingh/Desktop/Analysis/monthly_visits.csv'
for the code:
import csv
cr = csv.reader(open("/Users/uditasingh/Desktop/Analysis/monthly_visits.csv","rb"))
I obtained the path of the CSV file from the file's properties.
I don't understand what is going wrong.
Please help!
Thanks
My bet is that it searches for the Users directory in the code's working directory and obviously can't find it. Try using the full path, i.e. 'C:/Users/......'.
Try writing:
cr = csv.reader(open("\Users\uditasingh\Desktop\Analysis....))
