File doesn't exist error before user inputs file name - python

I am working with streamlit in python to produce a tool that takes a user's input of a csv filename, and then carries out cleaning/tabulating of the data within the file.
I have encountered an issue where before the user has entered their filename, my streamlit site shows a "FileNotFoundError: [Errno 2] No such file or directory:"
This is expected because the user has not entered their filename yet - however once filename is entered the code runs smoothly. I am hoping to overcome this issue but as a relative newcomer to Python I am quite unsure how!
Please see code snippet below
autocall_gbp_file = str(st.text_input("Please type in your Autocall File Name (GBP)"))
filepath = "M:/Desktop/AutomationProject/"
express_gbp = pd.read_csv(filepath + autocall_gbp_file + ".csv")
st.write('Saved!')
The exact error I get before any user input has been taken is:
FileNotFoundError: [Errno 2] No such file or directory:
'M:/Desktop/AutomationProject/.csv'
Traceback:
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 475, in _run_script
exec(code, module.__dict__)
File "M:\Desktop\AutomationProject\AutocallApp.py", line 179, in <module>
express_gbp = pd.read_csv(filepath+autocall_gbp_file+".csv")
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 1217, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\common.py", line 789, in get_handle
handle = open(
Thanks in advance to anyone who can offer a suggestion!

The usual pattern, in Streamlit and in Python generally, is to check that the value is non-empty before using it:
if autocall_gbp_file:
    express_gbp = pd.read_csv(filepath + autocall_gbp_file + ".csv")
When the Streamlit app runs before the user has typed anything, st.text_input returns an empty string, which is why the path in the error is 'M:/Desktop/AutomationProject/.csv'. An empty string is falsy, so with if autocall_gbp_file: the pandas read_csv call only runs after someone has entered a value.
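A minimal sketch of that guard as a small helper, assuming the folder and the ".csv" suffix from the question. The name csv_path_or_none is hypothetical, and the Streamlit calls are shown only in comments so the helper can be tried on its own:

```python
from pathlib import Path
from typing import Optional


def csv_path_or_none(folder: str, typed_name: str) -> Optional[Path]:
    """Return the CSV path for the typed name, or None if nothing was typed."""
    name = typed_name.strip()
    if not name:
        # The text box is still empty (or only whitespace): nothing to read yet.
        return None
    if not name.lower().endswith(".csv"):
        name += ".csv"
    return Path(folder) / name


# Hypothetical use inside the Streamlit script from the question:
#     path = csv_path_or_none(filepath, autocall_gbp_file)
#     if path is None:
#         st.info("Enter a file name to continue.")
#     else:
#         express_gbp = pd.read_csv(path)
#         st.write("Saved!")
```

Returning None rather than raising keeps the first run of the script quiet, since Streamlit reruns the whole script each time the input changes.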
Separately, you're better off building this with st.file_uploader rather than st.text_input, because a deployed Streamlit app doesn't necessarily have access to the user's filesystem or the same drive mappings as the machine you are developing on. With st.file_uploader, the user provides the actual file contents, not a reference to where the file might be located.
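A rough sketch of the uploader approach, assuming the upload is a plain CSV. st.file_uploader returns None until a file is chosen and a file-like object afterwards, so the same kind of guard applies. The helper name read_uploaded_csv and the sample columns are made up for illustration, and an io.BytesIO object stands in for the upload so the snippet runs without Streamlit:

```python
import io
from typing import Optional

import pandas as pd


def read_uploaded_csv(uploaded) -> Optional[pd.DataFrame]:
    """Read an uploaded CSV, or return None if nothing has been uploaded yet."""
    if uploaded is None:
        # No file chosen yet, so there is nothing to parse.
        return None
    return pd.read_csv(uploaded)


# Hypothetical use in the Streamlit script:
#     uploaded = st.file_uploader("Upload your Autocall file (GBP)", type="csv")
#     express_gbp = read_uploaded_csv(uploaded)

# Stand-in for an uploaded file (made-up data).
fake_upload = io.BytesIO(b"isin,price\nXS0001,101.5\n")
frame = read_uploaded_csv(fake_upload)
print(frame.shape)
```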

Related

OSError: [Errno 22] Invalid argument: - Changing backslash to forward slash not helping! (Windows)

I am working with streamlit to create a tool that takes user input (csv file name) and cleans/produces output as a dataframe. I continuously get OSError: [Errno 22] Invalid argument: 'M:/Desktop/AutomationProject/'
I am aware of the existing answers for this error. They all say that changing backslashes to forward slashes on Windows is a quick fix, but after doing this I still have the same issue.
Note that my tool still works once the file name is entered; it just consistently shows the error below.
Thanks in advance for your help!
Code:
st.header('1 - Express Autocalls')
autocall_gbp_file = str(st.text_input("Please type in your Autocall File Name (GBP)"))
express_gbp = pd.read_csv("M:/Desktop/AutomationProject/" + autocall_gbp_file)
OSError: [Errno 22] Invalid argument: 'M:/Desktop/AutomationProject/'
Traceback:
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 475, in _run_script
exec(code, module.__dict__)
File "M:\Desktop\AutomationProject\AutocallApp.py", line 176, in <module>
express_gbp = pd.read_csv("M:/Desktop/AutomationProject/" + autocall_gbp_file)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\parsers\readers.py", line 1217, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "C:\Users\adavie18\.conda\envs\projectenv\lib\site-packages\pandas\io\common.py", line 789, in get_handle
handle = open(
The usual best practice for keeping OS paths consistent across platforms in Python is to use the os module:
import os
# Manual concatenation hard-codes the separator, so the script is not
# consistent across operating systems and can be difficult to format
# correctly for Windows ("data.csv" is a placeholder file name):
path1 = "Desktop/" + "folder1/" + "folder2/" + "data.csv"
with open(path1, "r") as file:
    pass
# Instead, let os.path.join insert the separator for the current platform:
path2 = os.path.join("Desktop", "folder1", "folder2", "data.csv")
with open(path2, "r") as file:
    pass
# Now your script can find your Windows files,
# and the same script works on macOS and Linux.
This keeps paths consistent across platforms, so you can avoid meticulous string formatting.
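Note that the path in the error message, 'M:/Desktop/AutomationProject/', ends at the folder, which indicates the text input was still empty when read_csv ran. A minimal sketch of checking for that case with os.path.isfile before reading; the helper name existing_file_or_none and the file names are hypothetical:

```python
import os
import tempfile


def existing_file_or_none(folder, typed_name):
    """Join folder and name; return the path only if it is a regular file."""
    candidate = os.path.join(folder, typed_name)
    # os.path.isfile is False for folders and for paths that do not exist.
    return candidate if os.path.isfile(candidate) else None


# Joining with an empty name gives back the folder itself, which is the
# path shown in the error message.
print(os.path.join("M:/Desktop/AutomationProject/", ""))

# Small demonstration in a temporary folder (names are placeholders).
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "gbp.csv"), "w") as handle:
        handle.write("a,b\n1,2\n")
    print(existing_file_or_none(folder, ""))         # None: that is the folder
    print(existing_file_or_none(folder, "gbp.csv"))  # full path to the file
```

In the Streamlit script, pd.read_csv would then only be called when the helper returns a path.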

Pandas and glob: convert all xlsx files in folder to csv – TypeError: __init__() got an unexpected keyword argument 'xfid'

I have a folder with many xlsx files that I'd like to convert to csv files.
During my research, I found several threads about this topic, such as this or that one. Based on these, I wrote the following code using glob and pandas:
import glob
import pandas as pd
path = r'/Users/.../xlsx files'
excel_files = glob.glob(path + '/*.xlsx')
for excel in excel_files:
    out = excel.split('.')[0]+'.csv'
    df = pd.read_excel(excel) # error occurs here
    df.to_csv(out)
But unfortunately, I got the following error message that I could not interpret in this context and I could not figure out how to solve this problem:
Traceback (most recent call last):
File "<input>", line 11, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1131, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 475, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 391, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xfid'
Does anyone know how to fix this? Thanks a lot for your help!
I had the same problem. After some hours of thinking and searching, I realized the problem is actually the file itself. I opened it in MS Excel and saved it again. Alakazam, problem solved.
The file had been downloaded, so I thought it was a "security" error, or just an error in how the file was created. xD
EDIT:
It's not a security problem, but an error in how the file was generated. The re-saved, working file is about twice the size (in KB) of the broken one.
Another solution: with xlrd==1.2.0 the file can be opened, and you can then pass the Book (the file opened by xlrd) to read_excel.
import pandas as pd
import xlrd
# df = pd.read_excel('TabelaPrecos.xlsx')
# The line above would give the same result
a = xlrd.open_workbook('TabelaPrecos.xlsx')
b = pd.read_excel(a)
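Separately from the xfid error, the output name in the question is built with excel.split('.')[0], which cuts at the first dot anywhere in the path. A small sketch of the same step with os.path.splitext, which only removes the final extension; the helper name csv_name_for and the sample path are hypothetical:

```python
import os


def csv_name_for(xlsx_path):
    """Swap only the final extension of xlsx_path for .csv."""
    root, _extension = os.path.splitext(xlsx_path)
    return root + ".csv"


# Hypothetical path with a dot in a folder name:
sample = "/Users/me/reports.2021/prices.xlsx"
print(sample.split(".")[0] + ".csv")  # /Users/me/reports.csv
print(csv_name_for(sample))           # /Users/me/reports.2021/prices.csv
```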

Writing data to an existing excel sheet using openpyxl

I'm quite new to coding in general.
What I want to achieve is a script that runs through a list of employers in Excel and generates a new hour-sheet every week. By generating I mean copying an empty hour-sheet for every employer, renaming it, and changing the week number and employer name in the newly made copy.
I didn't start with a loop, because I first wanted to write the part that changes the employer name and week number. I've already searched the internet for answers, but I can't get the code to work and keep getting error messages.
So here is my code so far:
import os
import shutil
import time
from openpyxl import load_workbook
#calculate the year and week number
from time import strftime
year = (time.strftime("%Y"))
week = str(int(time.strftime("%W"))+1)
year_week = year + "_" + week
#create weekly houresheets per employer
employer = "Adam"
hsheets_dir = "C:\\test\\"
old_file_name = "blanco.xlsx"
new_file_name = employer + "_" + year_week + ".xlsx"
dest_filename = (hsheets_dir + new_file_name)
shutil.copy2((hsheets_dir + old_file_name), dest_filename)
#change employer name and week number
def insert_xlsx(dest, empl, wk):
    #Open the copied xlsx file
    print(dest)
    wb = load_workbook(filename=dest)
    #Get the sheet named "Auto"
    ws = wb["Auto"]
    ws.cell(row=1, column=2).value = empl
    ws.cell(row=2, column=2).value = wk
    wb.save(dest)
insert_xlsx(dest_filename, employer, week)
And here is the error message i keep getting:
Traceback (most recent call last):
File "G:\ALL\Urenverantwoording\Wekelijks\Genereer_weekstaten.py", line 46, in <module>
insert_xlsx(dest_filename, employer, week)
File "G:\ALL\Urenverantwoording\Wekelijks\Genereer_weekstaten.py", line 44, in insert_xlsx
wb.save(dest)
File "C:\Python34\lib\site-packages\openpyxl\workbook\workbook.py", line 298, in save
save_workbook(self, filename)
File "C:\Python34\lib\site-packages\openpyxl\writer\excel.py", line 198, in save_workbook
writer.save(filename, as_template=as_template)
File "C:\Python34\lib\site-packages\openpyxl\writer\excel.py", line 181, in save
self.write_data(archive, as_template=as_template)
File "C:\Python34\lib\site-packages\openpyxl\writer\excel.py", line 87, in write_data
self._write_worksheets(archive)
File "C:\Python34\lib\site-packages\openpyxl\writer\excel.py", line 114, in _write_worksheets
write_worksheet(sheet, self.workbook.shared_strings,
File "C:\Python34\lib\site-packages\openpyxl\writer\worksheet.py", line 302, in write_worksheet
xf.write(comments)
File "C:\Python34\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Python34\lib\site-packages\openpyxl\xml\xmlfile.py", line 51, in element
self._write_element(el)
File "C:\Python34\lib\site-packages\openpyxl\xml\xmlfile.py", line 78, in _write_element
xml = tostring(element)
File "C:\Python34\lib\xml\etree\ElementTree.py", line 1126, in tostring
short_empty_elements=short_empty_elements)
File "C:\Python34\lib\xml\etree\ElementTree.py", line 778, in write
short_empty_elements=short_empty_elements)
File "C:\Python34\lib\xml\etree\ElementTree.py", line 943, in _serialize_xml
short_empty_elements=short_empty_elements)
File "C:\Python34\lib\xml\etree\ElementTree.py", line 943, in _serialize_xml
short_empty_elements=short_empty_elements)
File "C:\Python34\lib\xml\etree\ElementTree.py", line 935, in _serialize_xml
v = _escape_attrib(v)
File "C:\Python34\lib\xml\etree\ElementTree.py", line 1093, in _escape_attrib
_raise_serialization_error(text)
File "C:\Python34\lib\xml\etree\ElementTree.py", line 1059, in _raise_serialization_error
"cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize 3 (type int)
Can someone point me in the right direction?
Many thanks
Based on your responses, I think the problem lies with your existing hour-sheet Excel spreadsheet:
Try starting with a copy of your existing spreadsheet with all of the entries removed. Hopefully this will also work.
If this fails, start with a new blank spreadsheet.
Copy the existing data across bit by bit, rerunning your script after each step.
By doing this you might be able to isolate the feature that is not compatible with openpyxl.
Alternatively, you could write the whole sheet from your Python script rather than modifying a partially filled-in one. A file written entirely by openpyxl should then be fully compatible.
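On the file naming in the question: time.strftime("%W") counts weeks from the first Monday of the year, with earlier days in week 0, and the script adds one to it. A small sketch of building the same kind of name from datetime.date.isocalendar() instead, assuming ISO week numbers are what the hour-sheets should carry; the helper name hour_sheet_name is hypothetical:

```python
import datetime


def hour_sheet_name(employer, day):
    """Build '<employer>_<ISO year>_<ISO week>.xlsx' for the given date."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return "{}_{}_{}.xlsx".format(employer, iso_year, iso_week)


print(hour_sheet_name("Adam", datetime.date(2015, 1, 1)))  # Adam_2015_1.xlsx
```

Using the ISO year together with the ISO week keeps names consistent around New Year, where the first days of January can belong to the last week of the previous year.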

Pandas sometimes writes empty or damaged files

I've been using pandas for a while and I think it is a great tool. I made a program that generates some Excel files from data collected by the user. The end user has been testing and using it for 6 months; it never failed until yesterday, when it generated a damaged Excel file. When I opened it with a text editor, it was totally blank. The code that generates this file is:
escritor = pandas.ExcelWriter(direccion, engine='xlsxwriter')
listaTotal.to_excel(escritor, index = False)
escritor.save()
and:
escritor = pandas.ExcelWriter(direccion + '.xlsx', engine='xlsxwriter')
self.listaFact.to_excel(escritor, index = False, startrow = 1, startcol = 0, sheet_name = 'Hoja1')
escritor.save()
The second code fragment also uses some format options for the 'xlsxwriter', an example here:
format = workbook.add_format()
format.set_font_size(9)
format.set_font_name('Sans Serif 12cpi')
format.set_border()
format.set_text_wrap()
This error happened twice; about 1 month ago and yesterday. I can't duplicate the error, I don't know what happened. And also the traceback is here, it shows the problem when the program reads the file, but this file was generated by the code posted before:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\Tkinter.py", line 1532, in __call__
return self.func(*args)
File "C:\Users\WINNER\Documents\Visual Studio 2013\Projects\PythonApplication4\PythonApplication4\PythonApplication4.py", line 792, in botonGenerarPedido
self.generarPedido()
File "C:\Users\WINNER\Documents\Visual Studio 2013\Projects\PythonApplication4\PythonApplication4\PythonApplication4.py", line 904, in generarPedido
self.generarVentasDia()
File "C:\Users\WINNER\Documents\Visual Studio 2013\Projects\PythonApplication4\PythonApplication4\PythonApplication4.py", line 927, in generarVentasDia
listaTotal = pandas.io.excel.read_excel(direccion)
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 151, in read_excel
return ExcelFile(io, engine=engine).parse(sheetname=sheetname, **kwds)
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 188, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python27\lib\site-packages\xlrd\book.py", line 91, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Python27\lib\site-packages\xlrd\book.py", line 1230, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Python27\lib\site-packages\xlrd\book.py", line 1224, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\x00\x00\x00\x00\x00\x00\x00\x00'
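The traceback shows the file starting with null bytes rather than workbook data. Since an .xlsx file is a zip archive, one cheap way to catch this right after writing is to check that the file is non-empty and a readable zip. A minimal sketch using only the standard library; the helper name looks_like_valid_xlsx is hypothetical, and the check covers only the container, not the workbook contents:

```python
import os
import tempfile
import zipfile


def looks_like_valid_xlsx(path):
    """Cheap sanity check: non-empty file that is a readable zip archive."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns None when every member passes its CRC check.
        return archive.testzip() is None


# Demonstration with a file of null bytes, like the one in the traceback.
with tempfile.TemporaryDirectory() as folder:
    damaged = os.path.join(folder, "damaged.xlsx")
    with open(damaged, "wb") as handle:
        handle.write(b"\x00" * 64)
    print(looks_like_valid_xlsx(damaged))  # False
```

Running such a check after the writer has saved the file would let the program regenerate the file or warn the user, instead of failing later when the file is read back.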

Strategy to open a corrupt csv file in pandas

I have a bunch of csv files that I am loading in pandas just fine, but one file is acting up. I'm opening it this way:
df = pd.DataFrame.from_csv(csv_file)
error:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 1268, in from_csv
encoding=encoding,tupleize_cols=False)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 198, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 479, in __init__
self._make_engine(self.engine)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 586, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 957, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 477, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4434)
File "parser.pyx", line 599, in pandas.parser.TextReader._get_header (pandas/parser.c:5831)
pandas.parser.CParserError: Passed header=0 but only 0 lines in file
To me, this means that there is some sort of corruption in the file. From a quick look it seems fine, but it is a big file and visually checking every single line is not an option. What would be a good strategy to troubleshoot a csv file that pandas won't open?
thank you
It looks like pandas is treating line 0 as the header. Try calling:
df = pd.DataFrame.from_csv(csv_file,header=None)
or
df = pd.read_csv(csv_file,header=None)
However, it's strange that the file seems to have zero lines (i.e. it's empty). Maybe the file path is wrong?
If you are on Linux, open the file with head to inspect it, then fix it with awk or sed. On Windows, you could try vim to inspect and fix it. In short, it is probably best not to fix the file in pandas. You most likely have odd line endings (since the error message says 0 lines), so inspecting the file with head, cat, or vim is needed to determine the line endings and decide how best to fix or handle them.
I ran into the same issue:
/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/io/parsers.pyc in __init__(self, src, **kwds)
970 kwds['allow_leading_cols'] = self.index_col is not False
971
--> 972 self._reader = _parser.TextReader(src, **kwds)
973
974 # XXX
/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/parser.so in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4628)()
/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/parser.so in pandas.parser.TextReader._get_header (pandas/parser.c:6068)()
CParserError: Passed header=0 but only 0 lines in file
My code was:
df = pd.read_csv('/Users/steven/Documents/Mywork/Python/sklearn/beer/data')
Eventually I found my mistake: I had passed the path of a directory, not a file, to read_csv.
The correct code is:
df = pd.read_csv('/Users/steven/Documents/Mywork/Python/sklearn/beer/data/beer_reviews.csv')
It runs correctly.
So I think the cause of your issue lies in the path you passed. Maybe it is the path of a directory, as in my case. Maybe the file is empty or corrupt, or in the wrong encoding.
I hope this helps.
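The checks suggested in the answers above (directory instead of file, empty file, odd line endings) can be run before handing the path to pandas. A minimal sketch using only the standard library; the function name diagnose_csv and its messages are hypothetical, and it only looks at the first few kilobytes:

```python
import os
import tempfile


def diagnose_csv(path, sample_size=4096):
    """Return a short description of why a CSV path might fail to parse."""
    if not os.path.exists(path):
        return "path does not exist"
    if os.path.isdir(path):
        return "path is a directory, not a file"
    if os.path.getsize(path) == 0:
        return "file is empty"
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample.strip(b"\x00"):
        return "file starts with only null bytes"
    # Count line-ending styles in the sample to spot odd terminators.
    crlf = sample.count(b"\r\n")
    lf = sample.count(b"\n") - crlf
    cr = sample.count(b"\r") - crlf
    return "looks readable: {} CRLF, {} LF, {} CR line endings in sample".format(crlf, lf, cr)


# Demonstration: passing a folder, as in the last answer.
with tempfile.TemporaryDirectory() as folder:
    print(diagnose_csv(folder))  # path is a directory, not a file
```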
