I want to overwrite specific cells in an existing Excel file. I searched and found this answer, "writing to existing workbook using xlwt", and applied it as follows:
def write_to_excel(self):
    # First I must open the specified Excel file; since open_file is in the same class, we can get it using self.sheet.
    bookwt = copy(self.workbook)
    sheetwt = bookwt.get_sheet(0)
    # Now I must know which column was estimated so that I can overwrite it.
    colindex = self.columnBox.current()  # returns the index of the estimated column
    for i in range(1, self.grid.shape[0]):
        if str(self.sheet.cell_value(i, colindex)).lower() == self.missingBox.get().lower():
            # write the estimated value
            sheetwt.write(i, colindex, self.grid[i])
    bookwt.save(self.filename + '.out' + os.path.splitext(self.filename)[-1])
Notice that self.workbook is already set in another method of the same class, like this:
def open_file(self, file_name):
    try:
        self.workbook = xlrd.open_workbook(file_name)
I don't really know what '.out' + os.path.splitext(self.filename)[-1] means, but it seems to cause the modified file to be saved in the same directory as the original one under a different name.
After running the program, a new Excel file is saved in the same directory as the original, but with the odd name data.xlsx.out.xlsx, and it doesn't open. I think this is caused by '.out' + os.path.splitext(self.filename)[-1]. I removed that part in order to overwrite the original file instead of saving a copy, but after running the program I can no longer open the original file; I get an error message saying that the file can't be opened because the file format or extension is not valid.
What I really want is to modify the original file, not to create a modified copy.
EDIT: SiHa's answer can modify the existing file without creating a copy if only the file name is specified, like this:
bookwt.save(self.filename)
And it can save a new copy this way:
filepath, fileext = os.path.splitext(self.filename)
bookwt.save(filepath + '_out' + fileext)
Or with the line from the code in my question. However, with all of these methods the same problem remains: after the file is modified, it can't be opened. After searching, I found that the problem can be solved by changing the extension of the original file from .xlsx to .xls. After making this change, the problem was solved. This is the link where I found the solution: http://www.computing.net/answers/office/the-file-formatfile-extension-is-not-valid/19454.html
Thank You.
To explain the line in question:
(self.filename + '.out' means: concatenate '.out' to the end of the original filename.
+ os.path.splitext(self.filename)[-1]) means: split the filename into a tuple of (root, extension), then concatenate the last element (the extension) back onto the end again.
So you end up with data.xlsx.out.xlsx
You should just be able to use bookwt.save(self.filename), although you may run into errors if you still have the file open for reading. It may be safer to create a copy in a similar manner to the above:
filepath, fileext = os.path.splitext(self.filename)
bookwt.save(filepath + '_out' + fileext)
Which should give you data_out.xlsx
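As a quick check of the explanation above, here is a minimal sketch that compares the two naming schemes. The helper name build_output_name and the sample name data.xlsx are only illustrative; they are not part of the original code.

```python
import os.path


def build_output_name(filename, suffix="_out"):
    """Insert a suffix between the root and the extension of a filename."""
    root, ext = os.path.splitext(filename)  # 'data.xlsx' -> ('data', '.xlsx')
    return root + suffix + ext


# The expression from the question appends '.out' plus a second copy of the extension.
original = "data.xlsx" + ".out" + os.path.splitext("data.xlsx")[-1]
print(original)                        # data.xlsx.out.xlsx
print(build_output_name("data.xlsx"))  # data_out.xlsx
```

Note that this only changes the name of the saved file; it does not change the format of the data written into it.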
You can save Excel files as CSV files. This means that when they are opened by Python they show the values as plain text separated by commas. For example, a spreadsheet with columns A to B and rows 1 to 2 would look like this:
A1,B1
A2,B2
This means that you can edit them like normal text files, and Excel can still open them.
I am currently using two different types of python scripts. One extracts data and saves it as a CSV file, and the other characterizes data. Both work perfectly separately, but I am trying to find a way to characterize the data from the outputted CSV file without having to run them separately. Importing script1 into script2 is easy, but reading the CSV file from script1 is what I can't figure out. I am going to provide the output of script1 and where I am trying to insert it in script2:
# create file or append to file
filename = '%s.csv' % gps
if os.path.exists(filename):
    append_write = 'a'  # append if already exists
else:
    append_write = 'w'  # make a new file if not
# save file
with open('!/usr/bin/env python/%s.csv' % gps, mode=append_write) as features_file:
    features_writer = csv.writer(features_file, delimiter=' ', quotechar='"', quoting=csv.QUOTE_MINIMAL)
(!/usr/bin/env python has replaced the directory I am actually saving this CSV file in due to privacy reasons.)
I am trying to then place the file from this output into the following command:
x_new = pd.read_csv('filename %s.csv gps' , names = attributes)
I have tried a variety of ways to input the script1 output into this command, but can't find the correct way to do this. Please help me out. If any further information is needed please let me know.
This line is definitely wrong:
x_new = pd.read_csv('filename %s.csv gps' , names = attributes)
You probably mean:
x_new = pd.read_csv(filename , names = attributes)
Or
x_new = pd.read_csv('%s.csv' % gps , names = attributes)
but I don't think the filename part should be there either.
But you don't need to write it to a file at all. You can just keep manipulating the data. And if you want to save the file anyway, you don't have to read it back; you can still just keep manipulating the array/frame you already have.
Also, open(filename, "a") is all you need: in append mode the file is created if it doesn't exist. (Mode "w" also creates the file, but it truncates an existing one.)
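To illustrate the last two points, here is a minimal sketch using only the standard library. It builds the path once, appends to it without checking whether it exists, and reads it back through the same variable. The helper names, the temporary directory, the gps value and the row contents are all made up for the example.

```python
import csv
import os
import tempfile


def append_row(path, row):
    """Append one row to a CSV file; mode 'a' creates the file if needed."""
    with open(path, mode="a", newline="") as features_file:
        csv.writer(features_file).writerow(row)


def read_rows(path):
    """Read every row back using the same path that was written to."""
    with open(path, newline="") as features_file:
        return list(csv.reader(features_file))


# Hypothetical stand-ins for the question's directory and gps value.
out_dir = tempfile.mkdtemp()
gps = "1126259446"
filename = os.path.join(out_dir, "%s.csv" % gps)

append_row(filename, ["snr", "12.5"])  # file does not exist yet: it is created
append_row(filename, ["snr", "13.1"])  # file exists now: the row is appended
rows = read_rows(filename)
print(rows)  # [['snr', '12.5'], ['snr', '13.1']]
```

The same filename variable could then be passed to pd.read_csv(filename, names=attributes) in the second script.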
You should try using formatted string literals (f-strings) and putting the variable directly into your file path (this assumes you're using Python 3.6+):
x_new = pd.read_csv(f'{gps}.csv')
I am trying to code a function where I grab data from my database, which already works correctly.
This is my code for the headers prior to adding the actual records:
with open('csv_template.csv', 'a') as template_file:
    #declares the variable template_writer ready for appending
    template_writer = csv.writer(template_file, delimiter=',')
    #appends the column names of the excel table prior to adding the actual physical data
    template_writer.writerow(['Arrangement_ID','Quantity','Cost'])
    #closes the file after appending
    template_file.close()
This is my code for the records which is contained in a while loop and is the main reason that the two scripts are kept separate.
with open('csv_template.csv', 'a') as template_file:
    #declares the variable template_writer ready for appending
    template_writer = csv.writer(template_file, delimiter=',')
    #appends the data of the current fetched values of the sql statement within the while loop to the csv file
    template_writer.writerow([transactionWordData[0],transactionWordData[1],transactionWordData[2]])
    #closes the file after appending
    template_file.close()
Once this data is ready, I open the file in Excel, and I would like it to be in a format that I can print immediately. However, when I print, the Excel columns are too narrow and the content gets cut off.
I have tried altering the default column width within Excel, hoping that it would keep that format permanently, but that doesn't seem to be the case: every time I re-open the CSV file in Excel, it resets completely to the default column width.
Here is my code for opening the CSV file in Excel from Python; the commented-out line is the code I actually want to use once I can format the spreadsheet ready for printing.
#finds the os path of the csv file depending where it is in the file directories
file_path = os.path.abspath("csv_template.csv")
#opens the csv file in excel ready to print
os.startfile(file_path)
#os.startfile(file_path, 'print')
If anyone has any solutions to this or ideas please let me know.
Unfortunately, I don't think this is possible with the CSV file format, since it is just plain-text comma-separated values and doesn't support formatting.
I have tried altering the default column width within excel but every time that I re-open the csv file in excel it seems to reset back to the default column width.
If you save the file in an Excel format once you have edited it, that should solve this problem.
Alternatively, instead of the csv library you could use xlsxwriter, which does allow you to set the width of the columns in your code.
See https://xlsxwriter.readthedocs.io and https://xlsxwriter.readthedocs.io/worksheet.html#worksheet-set-column.
Hope this helps!
The CSV format is nothing more than a text file whose lines follow a given pattern: a fixed number of fields (your data) delimited by commas. In contrast, an .xlsx file is a zipped package of XML files that also stores formatting information. Therefore you may want to write to an Excel file instead, for example using the pandas library.
Since the headers are strings, you can pad them with spaces to try to make the columns wider, like this:
template_writer.writerow(['Arrangement_ID ','Quantity ','Cost '])
I'm stuck on a very basic I/O problem in Python. I'd like to insert some lines into an existing file (called ofe, output file), extracted from a source file (called ife, input file), according to arguments passed by the user and stored in a list called lineRange (which has an index idx and values lineNumber).
This is the result:
for ifeidx,ifeline in enumerate(ife,1): #for each line of the input file...
    with open(outFile,'r+') as ofe:
        for idx,lineNumber in enumerate(lineRange,1): #... check if it's present in desired list of lines...
            if (ifeidx == lineNumber): #...if found...
                ofeidx = 0
                for ofeidx, ofeline in enumerate(ofe,1):
                    if (ofeidx == idx): #...just scroll the output file and find the exact position in the desired list...
                        ofe.write(ifeline) #...put the desired line in correct order. !!! This is always appending at the end of out file!!!!
                        break
The problem is that the write() method always points to the end of the file, so the lines are appended instead of being inserted while scrolling through the output file.
I really don't understand what's happening, since the file is opened in read+write (r+) mode, not in append (a) or read+append (a+) mode.
I'm also aware that the code will (should) overwrite the output file's lines. Additional information: the OS is Windows 7, the Python version is 2.7, and the development tool is Eclipse with PyDev 3.7.1.xx.
Any suggestion on what I'm doing wrong?
You can start by reading the whole file with readlines(), which returns a list. After that you just need to call list.insert(index, value) and write the list back to the file.
with open(outFile, "r") as f:
    data = f.readlines()
data.insert(index, value)
with open(outFile, "w+") as f:
    f.writelines(data)  # write() expects a single string; writelines() accepts the list
Of course, you should change this approach if you are dealing with a huge file.
By the way, if you are not using the with statement, you should close the file at the end.
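Here is a self-contained sketch of the read, insert, rewrite approach described above. The function name insert_line and the demo file contents are made up for illustration, and it assumes the file is small enough to hold in memory. A file opened with r+ can only overwrite bytes at the current position; it cannot push later content down, which is why the whole file is rewritten.

```python
import os
import tempfile


def insert_line(path, index, text):
    """Insert text as a new line at a 0-based index by rewriting the file."""
    with open(path, "r") as f:
        lines = f.readlines()
    if not text.endswith("\n"):
        text += "\n"  # keep one entry per line
    lines.insert(index, text)
    with open(path, "w") as f:
        f.writelines(lines)


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("first\nthird\n")

insert_line(demo_path, 1, "second")
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(repr(result))  # 'first\nsecond\nthird\n'
```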
I'm creating an xlsx output with xlsxwriter in a temporary file using the tempfile module. I store the path to this temporary file in a variable that I later use in another script to open it.
The problem is that sometimes opening the file fails with the error :
"[Errno 2] No such file or directory: '/tmp/xls5TnVsx'"
Sorry I don't have an exact idea about the frequency of this problem occurring but it seems like it happens from time to time, so I don't understand why...
This is how I save into a temporary file :
f = tempfile.NamedTemporaryFile(prefix="xls",delete=False)
xlsfilename = f.name
Then to create the xlsx output :
wb = xlsxwriter.Workbook(filename)
ws = wb.add_worksheet(sheetName)
# Write header
....
# Write data
for row, row_data in enumerate(data, start=1):
    for column, key in enumerate(headers):
        ....
wb.close()
f.close()
Then, in a Python CGI script, I use the variable xlsfilename, which is the path to the file, to open it:
print "Content-type: application/msexcel"
print "Content-Disposition: attachment; filename="+xlsfilename
print
try :
    print open(xlsfilename,"rb").read()
finally:
    try:
        xlsfilename.close()
    except:
        pass
    os.unlink(xlsfilename)
What am I doing wrong here and any ideas on how to solve this by maybe using another method to storing into a temporary file?
I believe the issue here is that your program is overwriting the created file with its own output, as the
wb = xlsxwriter.Workbook(filename)
statement creates a new file. The conditions under which this might be deleted will depend on when the named temporary file is deleted (technically this happens on close()).
You should think about using mkstemp instead, since you already explicitly delete the file you are creating. Overwriting that file, whose name is guaranteed to be unique and which is not deleted automatically, should be more controllable.
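A rough sketch of what the mkstemp variant could look like, using only the standard library. The helper name make_temp_path and the placeholder bytes are assumptions for the example; in the real script the workbook writer would take the place of the plain open(..., "wb") call.

```python
import os
import tempfile


def make_temp_path(prefix="xls", suffix=".xlsx"):
    """Create a uniquely named temporary file and return its path.

    mkstemp() never deletes the file itself, so the caller must remove it.
    """
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix)
    os.close(fd)  # close our descriptor; a writer can reopen the path later
    return path


xlsfilename = make_temp_path()
try:
    # Placeholder for the real workbook writer (hypothetical content).
    with open(xlsfilename, "wb") as out:
        out.write(b"workbook bytes would go here")
    # Read the file back, as the CGI script does before sending it.
    with open(xlsfilename, "rb") as src:
        payload = src.read()
finally:
    os.unlink(xlsfilename)  # explicit cleanup, as in the CGI script
```

With this layout the file's lifetime is controlled in one place: it exists from the mkstemp() call until the os.unlink() call.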
So I've been using Python 3.2, and OpenPyXL's iterable workbook as demonstrated here in the "Optimized Reader" example.
My problem arises when I try to use this strategy to read a file or files that I've extracted from a simple .zip archive (both manually and through the python zipfile package). When I call .get_highest_column() I get "A" and .get_highest_row() I get 1, and when asked to print each cell's value as shown here:
wb = load_workbook(filename = file_name, use_iterators = True)
ws = wb.worksheets[0] # Only need to read the first sheet, nothing fancy
for row in ws.iter_rows():
    for entry in row:
        print(entry.internal_value)
It prints the values in A1, A2, A3, A4, A5, A6, and A7, regardless of how large the file actually is. There isn't any reason for this in the file itself, and it will open in Excel perfectly fine. I'm quite stumped as to why it does it like this, but I assume that the unzipped XLSX is formatted differently prior to being saved from within Excel, and OpenPyXL cannot interpret it correctly. I even renamed the '.xlsx' to '.zip' so that I could explore the file and examine the differences, but couldn't tell much except that the one saved from Excel also has a subfolder called "theme" within the "xl" folder that the previous version does not, with font and formatting data.
IMPORTANT NOTE: When I open it and re-save it with the same filename from within Excel and then run this bit of code, it works perfectly - returns correct greatest row and column values, and correctly prints every cell value. I've tried instead saving the workbook through OpenPyXL immediately after opening it, but this yields the same erroneous results.
Basically, I need to discover a method to properly extract a .xlsx file from a .zip file so that it can be read with OpenPyXL. There are many many files that need to be processed like this, so it must be external to Excel, and hopefully as efficient as possible.
Cheers!
It sounds like this has nothing to do with the extraction from the zipfile, as the problem also occurs if you manually extract the files.
I would try to store the files opened and saved with Excel in a zipfile and see what happens. If that works, then clearly the way the original .xlsx files were generated is the problem.
I strongly suspect that to be the case.
If that is the problem, see if you can extract the .xlsx files (they are zipfiles themselves) and compare the one you re-saved with Excel to the original problematic one. XML does not compare easily, as Excel can rearrange most things at will, but you might be able to do a diff.
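As a first step before diffing the XML itself, you could compare which members each archive contains. This is a minimal sketch with the standard zipfile module. The two demo archives are tiny stand-ins built on the fly, not real workbooks, and the member name xl/theme/theme1.xml is only an assumed example of the "theme" subfolder mentioned in the question.

```python
import os
import tempfile
import zipfile


def archive_members(path):
    """Return the set of member names inside a zip-based file such as .xlsx."""
    with zipfile.ZipFile(path) as archive:
        return set(archive.namelist())


def members_only_in_second(first_path, second_path):
    """List members present in the second archive but not in the first."""
    return sorted(archive_members(second_path) - archive_members(first_path))


# Demo with two tiny stand-in archives (not real workbooks).
tmp_dir = tempfile.mkdtemp()
original = os.path.join(tmp_dir, "original.xlsx")
resaved = os.path.join(tmp_dir, "resaved.xlsx")
with zipfile.ZipFile(original, "w") as z:
    z.writestr("xl/workbook.xml", "<workbook/>")
with zipfile.ZipFile(resaved, "w") as z:
    z.writestr("xl/workbook.xml", "<workbook/>")
    z.writestr("xl/theme/theme1.xml", "<theme/>")

extra = members_only_in_second(original, resaved)
print(extra)  # ['xl/theme/theme1.xml']
```

Members that exist in both archives can then be read with ZipFile.read() and compared individually.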