I'm trying to retrieve csv-formatted data with pandas from a .ods file on a shared folder (mounted using NFS on my machine), and I have trouble getting the data when someone else is working on the file.
In that case the file is locked, which makes perfect sense to avoid concurrent editing. One can see it when opening the file with LibreOffice, for example, or just by looking at the folder, where a .~lock file is present.
However, in my case I'm just trying to open the file to read it with pandas, not edit it. LibreOffice offers this possibility, for instance. Why can't pandas provide that functionality?
To be more precise, here is the command:
sheet_df = pd.read_excel(filepath, sheet_name= "Sheet2", engine="odf", skiprows=3)
and the output
File "/Users/user_name/job.py", line 148, in read_file
sheet_df = pd.read_excel(filepath, sheet_name= "Sheet2", engine="odf", skiprows=3)
File "/Users/user_name/.pyenv/versions/virtualenv_prod/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/Users/user_name/.pyenv/versions/virtualenv_prod/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 364, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/Users/user_name/.pyenv/versions/virtualenv_prod/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1233, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/Users/user_name/.pyenv/versions/virtualenv_prod/lib/python3.9/site-packages/pandas/io/excel/_odfreader.py", line 35, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/Users/user_name/.pyenv/versions/virtualenv_prod/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 420, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/Users/user_name/.pyenv/versions/virtualenv_prod/lib/python3.9/site-packages/pandas/io/excel/_odfreader.py", line 46, in load_workbook
return load(filepath_or_buffer)
File "/Users/user_name/.pyenv/versions/virtualenv_prod/lib/python3.9/site-packages/odf/opendocument.py", line 982, in load
z = zipfile.ZipFile(odffile)
File "/Users/user_name/.pyenv/versions/3.9.2/lib/python3.9/zipfile.py", line 1257, in __init__
self._RealGetContents()
File "/Users/user_name/.pyenv/versions/3.9.2/lib/python3.9/zipfile.py", line 1322, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
I'm using Python 3.9.2 on macOS Big Sur, by the way.
Am I missing something, or is pandas.read_excel unable to open a file read-only?
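As a starting point for diagnosing this, the two symptoms described above can be checked from Python before calling pd.read_excel: whether a LibreOffice lock file sits next to the document, and whether the bytes currently visible on the mount form a valid zip archive (an .ods file is a zip container, which is why the failure surfaces as BadZipFile). This is only a sketch. The helper name inspect_ods is made up for illustration, and the lock-file pattern .~lock.<filename># is the one LibreOffice normally uses, assumed here rather than taken from the question.

```python
import os
import zipfile


def inspect_ods(filepath):
    """Report whether a lock file exists and whether the file is a readable zip.

    Hypothetical helper for diagnosis only; it does not read any sheet data.
    """
    folder, name = os.path.split(os.path.abspath(filepath))
    # Assumed LibreOffice convention: ".~lock.<filename>#" next to the document.
    lock_path = os.path.join(folder, ".~lock." + name + "#")
    return {
        "exists": os.path.isfile(filepath),
        "locked": os.path.exists(lock_path),
        # zipfile.is_zipfile returns False instead of raising on a bad archive.
        "valid_zip": zipfile.is_zipfile(filepath),
    }
```

If valid_zip comes back False only while someone has the document open, that would suggest the content seen through the mount is incomplete at that moment, rather than pandas refusing read-only access.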
Related
I have a main directory containing multiple subdirectories, which in turn contain Excel files. I want to loop through the directories, read the Excel files into pandas dataframes, and build one collated dataframe containing the data from all the Excel files.
The code that I've written so far finds the Excel files in the subdirectories, but I am unable to get them into a dataframe. Can someone help me with this? The code is as follows:
import os
import pandas as pd
fin = pd.DataFrame()
rootdir = 'C:\\Divyam Projects\\ISB Work\\NHB Data'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        df = pd.read_csv(file)
        fin.append(df)
print(fin)
In the above code I declare a dataframe 'fin' and then try to append to it the data from the different Excel files. The code gives this error:
Traceback (most recent call last):
File "c:\Divyam Projects\ISB Work\NHB Data\main.py", line 8, in <module>
df = pd.read_csv(file)
File "C:\Users\divya\anaconda3\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\divya\anaconda3\lib\site-packages\pandas\io\parsers.py", line 454, in _read
data = parser.read(nrows)
File "C:\Users\divya\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "C:\Users\divya\anaconda3\lib\site-packages\pandas\io\parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 859, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 874, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 928, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 915, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 2070, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 3
The sample header for the excel file is
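A minimal sketch of how the collation could be done, under a few assumptions: the files really are Excel workbooks (so pd.read_excel is used rather than pd.read_csv), they share the same columns, and an Excel engine such as openpyxl is installed. The function names find_files and collate are hypothetical. Two things differ from the loop above: each file name is joined with its subdirectory, and the frames are combined with pd.concat, because DataFrame.append returned a new object instead of modifying fin and was removed in pandas 2.0.

```python
import os


def find_files(rootdir, suffixes=(".xlsx", ".xls")):
    """Return the full path of every file under rootdir with a matching suffix."""
    matches = []
    for subdir, _dirs, files in os.walk(rootdir):
        for name in files:
            if name.lower().endswith(suffixes):
                # os.walk yields bare file names, so join them with subdir.
                matches.append(os.path.join(subdir, name))
    return sorted(matches)


def collate(rootdir):
    """Read every Excel file under rootdir and stack the rows into one DataFrame."""
    import pandas as pd  # imported here so find_files works without pandas

    frames = [pd.read_excel(path) for path in find_files(rootdir)]
    # pd.concat builds the combined frame in one step from the list of frames.
    return pd.concat(frames, ignore_index=True)
```

Calling collate(rootdir) would then replace the whole loop. Note that pd.concat raises a ValueError when no matching file is found.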
I get the error below when trying to open a file with the .xlsx extension.
I have tried pandas with openpyxl as the engine, and the openpyxl library directly, but the error remains the same.
Code -
import pandas as pd
filepath=r'C:\Users\smriti.rastogi\eclipseworkspace\demoproject\testfile1.xlsx'
readFile = pd.read_excel(filepath, sheet_name='Sheet1')
readFile.head()
Any help is appreciated.
File ".\test.py", line 25, in <module>
readFile = pd.read_excel(filepath, sheet_name='Sheet1')
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 364, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 1233, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_openpyxl.py", line 522, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 420, in __init__
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_openpyxl.py", line 533, in load_workbook
return load_workbook(
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
reader.read()
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "C:\Users\smriti.rastogi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xfid'
One of the styles in your document is probably corrupted. Try copying and pasting your data into a new blank sheet as plain text, then retry with the new document.
This worked. What is xfid? I haven't found any relevant links about it. It would be helpful if you could share some resources.
Unfortunately, there are no resources about this problem. You have to read the source code along the traceback to find the bug.
I am trying to import a csv file saved in a local folder. When I use an Anaconda Python notebook I have no problems, but with Zeppelin I do have issues.
The code I am using, which works fine in Anaconda, is:
#import csv data
frequency=pd.read_csv("C:\\Users\\L18938\\Desktop\\Vehicle_to_grid\\analysis\\Frequency_March_2018.csv", nrows=86401)
However, when running it on Zeppelin, I receive:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 389, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 730, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4025)
File "pandas/parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:8031)
IOError: File C:\Users\L18938\Desktop\Vehicle_to_grid\analysis\Frequency_March_2018.csv does not exist
Obviously, the file exists and there are no errors in the path spelling.
I have tried / and double \, but nothing changes. I also tried
os.chdir("C:/Users/L18938/Desktop/Vehicle_to_grid/analysis")
or
os.listdir("C:/Users/L18938/Desktop/Vehicle_to_grid/analysis")
Any idea? Thank you in advance.
Your traceback shows that the Python interpreter is running with Unix-style file paths (/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py).
Under Anaconda you are on pure Windows, and your traceback would look something like (C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py).
Anaconda reaches the file through a Windows-style path, whereas Zeppelin reaches files through a Unix-style path.
Your issue is definitely related to how you specify the path in Zeppelin. You can't use a Windows path there, but you may try something like this:
frequency=pd.read_csv("file:///C:/Users/L18938/Desktop/Vehicle_to_grid/analysis/Frequency_March_2018.csv", nrows=86401)
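To confirm which kind of system the Zeppelin interpreter is running on, and whether a given path is visible from it, a small check like the following can be run in the notebook first. It is only a sketch: describe_path is a hypothetical helper name, and the check says nothing about how to make the Windows file reachable if it turns out not to be.

```python
import os
import platform


def describe_path(path):
    """Summarise how the running interpreter sees a path."""
    return {
        "system": platform.system(),         # e.g. "Linux" or "Windows"
        "cwd": os.getcwd(),                  # where relative paths resolve from
        "is_absolute": os.path.isabs(path),  # judged by the interpreter's own OS rules
        "exists": os.path.exists(path),
    }
```

If "system" is not "Windows" and "exists" is False for the C:/ path, the interpreter is looking at a different file system from the one where the csv file is stored.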
I am trying to walk a directory tree, and for each csv encountered on the walk I would like to open the file and read columns 0 and 15 into a data-frame (after which I'll process it and move on to the next file). I can walk the directory tree using the following:
rootdir = r'C:/Users/stacey/Documents/Alco/auditopt/'
for dirName,sundirList, fileList in os.walk(rootdir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        df = pd.read_csv(fname, header=1, usecols=[0,15],parse_dates=[0], dayfirst=True,index_col=[0], names=['date', 'total_pnl_per_pos'])
        print(df)
but I'm getting the error message:
FileNotFoundError: File b'auditopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv' does not exist.
I am trying to read files which do exist. They are in an MS Excel .csv format so I don't know if that is an issue - if it is, would someone let me know how I read an MS Excel .csv into a data-frame please.
The full stack trace is as follows:
Found directory: C:/Users/stacey/Documents/Alco/auditopt/
Found directory: C:/Users/stacey/Documents/Alco/auditopt/roll_597_oe_2017-03-10
tradeopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv
Traceback (most recent call last):
File "<ipython-input-24-3753e367432d>", line 1, in <module>
runfile('C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py', wdir='C:/Users/stacey/Documents/scripts')
File "C:\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py", line 49, in <module>
main()
File "C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py", line 36, in main
df = pd.read_csv(fname, header=1, usecols=[0,15],parse_dates=[0], dayfirst=True,index_col=[0], names=['date', 'total_pnl_per_pos'])
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 389, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
self._make_engine(self.engine)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)
File "pandas\parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8449)
FileNotFoundError: File b'tradeopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv' does not exist
When reading in the file, you need to provide the full path. os.walk yields bare file names rather than full paths, so you'll need to build the path yourself.
Use os.path.join to make this easy.
import os
full_path = os.path.join(dirName, fname)
df = pd.read_csv(full_path, ...)
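As an alternative to joining the names by hand, pathlib can return full paths directly. This is a different technique from the os.walk loop in the question, shown here only as a sketch; csv_paths is a hypothetical helper name.

```python
from pathlib import Path


def csv_paths(rootdir):
    """Yield the full path of every .csv file below rootdir, in sorted order."""
    for path in sorted(Path(rootdir).rglob("*.csv")):
        # rglob can also match directories named "*.csv", so keep files only.
        if path.is_file():
            yield path
```

Each yielded path already includes its directory, so it can be passed to pd.read_csv(str(path), ...) with the same keyword arguments as in the question.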
I wrote the following script, which runs perfectly in PyCharm, but when I run it in a terminal it gives me these errors:
File "/Users/Chris/PycharmProjects/firstfile/trial.py", line 6, in <module>
r = pf.read_csv('python.csv')
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3427)
File "pandas/parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6861)
IOError: File python.csv does not exist
Could someone point me in the right direction? I am guessing that it has to do with the csv file not being in the right path or directory. Right now I have the csv file saved in the same folder as my .py project. I also checked that I have the right packages installed, so I do not think that is the cause.
import csv
import pandas as pf
r = pf.read_csv('python.csv')
r.head()
print r.describe()
tradeDates = r['Trade Date'].unique()
r.name = 'Trade Date'
for trades in tradeDates:
    outfilename = trades
    printName = outfilename + ".csv"
    print printName
    r[r['Trade Date'] == trades].to_csv(printName, index=False)
When you run python /Users/Chris/PycharmProjects/firstfile/trial.py, Python looks for the csv file in your current working directory, not in /Users/Chris/PycharmProjects/firstfile.
You either need to change to that directory before running the code, or use the full path in trial.py like this:
import csv
import pandas as pf
r = pf.read_csv('/Users/Chris/PycharmProjects/firstfile/python.csv')
r.head()
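A third option, sketched below, is to build the path relative to the script itself, so that it works from any working directory without hard-coding /Users/Chris/.... The helper name beside_script is hypothetical, and the sketch assumes the csv file stays in the same folder as the script, as described in the question.

```python
import os


def beside_script(script_file, filename):
    """Build the path of filename in the same folder as script_file."""
    # abspath makes the result independent of the current working directory
    # as long as it is computed before any os.chdir call.
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)
```

Inside trial.py this would be used as r = pf.read_csv(beside_script(__file__, 'python.csv')).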