I'm trying to write a VLOOKUP function to a column of cells using OpenPyXL. Everything about the code works just fine, except that Excel crashes when I try to open the document after writing the functions to the cells.
I've tried writing the exact same functions wrapped in parentheses, then opening the Excel document and removing the parentheses manually, which works perfectly: Excel then calculates the values exactly as it should.
I'm wondering if there's a formatting error going on? Is there anything I have overlooked when trying to write functions using Openpyxl?
Basically the code I want to work:
wb = load_workbook(path_result + '/' + 'File.xlsx')
ws = wb['Main 2018-04-17']
ws['B2'].value = "=VLOOKUP('Main 2018-04-17'!A2;'Data 2018-04-17'!C2:E100;2;FALSE)"
wb.save(path_result + '/' + 'File.xlsx')
This is covered in the documentation: you must use a comma to separate arguments. See http://openpyxl.readthedocs.io/en/stable/usage.html#using-formulae
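As a minimal sketch of the comma-separated form, the snippet below builds a small workbook in memory and writes the same VLOOKUP with commas instead of semicolons. The sheet names come from the question; the sample rows and the variable names are made up for illustration, and nothing is saved to disk here.

```python
from openpyxl import Workbook

# Build a throwaway workbook with the two sheet names from the question.
wb = Workbook()
ws = wb.active
ws.title = "Main 2018-04-17"
data = wb.create_sheet("Data 2018-04-17")

# Hypothetical sample data: lookup key in column C, result in column D.
ws["A2"] = "k1"
data["C2"] = "k1"
data["D2"] = 10

# Arguments are separated by commas, whatever separator the local Excel UI shows.
formula = "=VLOOKUP('Main 2018-04-17'!A2,'Data 2018-04-17'!C2:E100,2,FALSE)"
ws["B2"] = formula

# openpyxl stores any string starting with "=" as a formula (data type "f").
print(ws["B2"].data_type, ws["B2"].value)

# wb.save("File.xlsx") would then write the file as in the question.
```

openpyxl does not evaluate the formula itself; Excel calculates the value when the saved file is opened.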
I'm a total novice when it comes to programming. I'm trying to write a Python 3 program that will produce an Excel workbook based on the contents of a CSV file. So far, I understand how to create the workbook, and I'm able to dynamically create worksheets based on the contents of the CSV file, but I'm having trouble writing to each individual worksheet.
Note, in the example that follows, I'm providing a static list, but my program dynamically creates a list of names based on the contents of the CSV file: the number of names that will be appended to the list varies from 1 to 60, depending on the assay in question.
import xlsxwriter
workbook = xlsxwriter.Workbook('C:\\Users\\Jabocus\\Desktop\\Workbook.xlsx')
list = ["a", "b", "c", "d"]
for x in list:
    worksheet = workbook.add_worksheet(x)
    worksheet.write("A1", "Hello!")
workbook.close()
If I run the program as it appears above, I get a SyntaxError, and IPython points to workbook.close() as the source of the problem.
However, if I exclude the line where I try to write "Hello!" to cell A1 in every worksheet, the program runs as I'd expect: I end up with Workbook.xlsx on my desktop, and it has 4 worksheets named a, b, c, and d.
The for loop seemed like a good choice to me, because my program will need to handle a variety of CSV formats (I'd rather write one program that can process data from every assay at my lab than a program for each assay).
My hope was that by using worksheet.write() in the way that I did, Python would know that I want to write to the worksheet that I just created (i.e. I thought worksheet would act as the name for each worksheet during each iteration of the loop despite explicitly naming each worksheet something new).
It feels like the iteration is the problem here, and I know that it has something to do with how I'm trying to reference each Worksheet in the write() step (because I'm not giving any of the Worksheet objects an explicit name), but I don't know how to proceed. What's a good way that I might approach this problem?
I'm not sure exactly what is wrong with your code, but I can tell you this:
I copied your code exactly (except for changing the path to be my desktop) and it worked fine.
I believe your issue could be one of three things:
You have a buggy/old version of XlsxWriter
You have a file called Workbook.xlsx on your Desktop already that is corrupted or otherwise causing issues (for example, it is open in another program).
You have some code other than what you posted.
To account for all of these possibilities, I would recommend that you:
Reinstall XlsxWriter:
In a Command Prompt run pip uninstall XlsxWriter followed by pip install XlsxWriter
Change the filename of the workbook you are opening:
workbook = xlsxwriter.Workbook('C:\\Users\\Jabocus\\Desktop\\Workbook2.xlsx')
Try running the code that you posted exactly, then incrementally add to it until it stops working.
Did you try something like worksheet.write(0, 0, 'Hello') instead of worksheet.write('A1', 'Hello')?
I have 2 files, let's say 't1.xlsx' and 't2.xlsx'.
What I want to do is run the VLOOKUP function inside the t1 file using the data from the t2 file.
I try to paste
sheet["O2"].value = "=VLOOKUP(C:C;'C:\\Users\\KKK\\Desktop\\sheets\\excellent\\[t2.xlsx]baza'!$A$2:$AI$10480;25;0)"
where baza is a sheet name, but sadly when I try to open the file, Excel says it cannot be opened due to an error and offers me the repair tool.
rest of the code:
import openpyxl
wb = openpyxl.load_workbook('t1.xlsx')
sheets = wb.get_sheet_names()
sheet = wb.get_sheet_by_name('Sheet1')
[VLOOKUP STUFF FROM BEFORE]
wb.save("t1.xlsx")
With more complicated formulae you should always check the syntax in the XML because they are often stored differently than they appear in Excel. This is covered in the documentation. You might be okay simply using a comma as a separator, but I suspect you'll also have to change the path of the file and use a Python raw string (the r prefix).
When using openpyxl to create spreadsheets based on untrusted input (for example, data exports from a web application for admin analysis), formulas can be a vector for script injection. If Excel executes malicious formulas in a spreadsheet, they can take over the admin's machine or exfiltrate data.
For example, this simple workbook adds a formula:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active  # active is a property, not a method
ws.append(["=1 + 2"])
wb.save(filename='/tmp/formula.xlsx')  # save() belongs to the workbook, not the worksheet
When opening /tmp/formula.xlsx in Excel, the formula is executed. =1 + 2 is benign, but it could also be something more evil like =2+5+cmd|' /C calc'!A0. [reference]
How can I write data to a worksheet to ensure that it is not interpreted as a formula? It would be convenient to retain formatting for non-executable data like dates and numbers, rather than coercing everything to strings.
You're right that code injection is a risk, though it's arguably Excel's job to sandbox here and if you're worried about this then you really ought to think about additional protections.
We do expose the calculation node of the workbook settings, so I think setting wb.calculation.fullCalcOnLoad = False might do what you need. But you'll probably need to read the specification to be certain.
I had this issue recently. I went with a hack which adds a tab character to the start of a string value if it starts with an =. Something like this:
if isinstance(value, str) and value.startswith("="):
    value = "\t" + value
Another method would be to use cell.set_explicit_value:
ws['A1'].set_explicit_value(value, data_type="s")
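If set_explicit_value is not available in your openpyxl release (an assumption worth checking against the installed version), a similar effect can be sketched by assigning the value and then overriding the cell's data_type attribute. The helper name write_literal is hypothetical, and only strings are forced to text so that numbers and dates keep their own types.

```python
from openpyxl import Workbook


def write_literal(ws, coordinate, value):
    """Write value to a cell; strings are stored as text, never as formulas."""
    cell = ws[coordinate]
    cell.value = value
    if isinstance(value, str):
        # Override the inferred type ("f" for strings starting with "=").
        cell.data_type = "s"
    return cell


wb = Workbook()
ws = wb.active

# Default behaviour: a string starting with "=" is treated as a formula.
formula_cell = ws["A1"]
formula_cell.value = "=1 + 2"

# Forced to text: same content, but typed as a string.
literal_cell = write_literal(ws, "A2", "=1 + 2")

# Non-string values are left alone and keep their numeric type.
number_cell = write_literal(ws, "A3", 42)

print(formula_cell.data_type, literal_cell.data_type, number_cell.data_type)
```

This only covers how openpyxl types the cell. Open a sample file to confirm that your spreadsheet application shows the content as text.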
So I've been using Python 3.2, and OpenPyXL's iterable workbook as demonstrated here in the "Optimized Reader" example.
My problem arises when I try to use this strategy to read a file or files that I've extracted from a simple .zip archive (both manually and through the python zipfile package). When I call .get_highest_column() I get "A" and .get_highest_row() I get 1, and when asked to print each cell's value as shown here:
wb = load_workbook(filename = file_name, use_iterators = True)
ws = wb.worksheets[0] # Only need to read the first sheet, nothing fancy
for row in ws.iter_rows():
    for entry in row:
        print(entry.internal_value)
It prints the values in A1, A2, A3, A4, A5, A6, and A7, regardless of how large the file actually is. There isn't any reason for this in the file itself, and it will open in Excel perfectly fine. I'm quite stumped as to why it does it like this, but I assume that the unzipped XLSX is formatted differently prior to being saved from within Excel, and OpenPyXL cannot interpret it correctly. I even renamed the '.xlsx' to '.zip' so that I could explore the file and examine the differences, but couldn't tell much except that the one saved from Excel also has a subfolder called "theme" within the "xl" folder that the previous version does not, with font and formatting data.
IMPORTANT NOTE: When I open it and re-save it with the same filename from within Excel and then run this bit of code, it works perfectly - returns correct greatest row and column values, and correctly prints every cell value. I've tried instead saving the workbook through OpenPyXL immediately after opening it, but this yields the same erroneous results.
Basically, I need to discover a method to properly extract a .xlsx file from a .zip file so that it can be read with OpenPyXL. There are many many files that need to be processed like this, so it must be external to Excel, and hopefully as efficient as possible.
Cheers!
It sounds like this has nothing to do with the extraction from the zipfile, as the problem also occurs if you manually extract the files.
I would try to store the files opened and saved with Excel in a zipfile and see what happens. If that works, then clearly the way the original .xlsx files were generated is the problem.
I strongly suspect that to be the case.
If that is the problem, see if you can extract the .xlsx files (they are zipfiles themselves) and compare the one you re-saved with Excel to the original problematic one. XML does not compare easily, as Excel can rearrange most things at will, but you might be able to do a diff.
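A rough way to start that comparison without unpacking anything by hand is to read both .xlsx files with the standard zipfile module and list which members exist only on one side or differ in content. The function name compare_xlsx_members and the demo archives below are made up for illustration; with real files you would pass the two paths instead.

```python
import io
import zipfile


def compare_xlsx_members(original, resaved):
    """Return (only_in_original, only_in_resaved, differing) member names."""
    with zipfile.ZipFile(original) as a, zipfile.ZipFile(resaved) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        # Byte-for-byte comparison of members present in both archives.
        differing = sorted(n for n in names_a & names_b if a.read(n) != b.read(n))
    return sorted(names_a - names_b), sorted(names_b - names_a), differing


def make_archive(members):
    """Build a small in-memory zip, standing in for a real .xlsx file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    buffer.seek(0)
    return buffer


# Hypothetical stand-ins for the problematic file and the Excel re-save.
demo_original = make_archive({
    "xl/workbook.xml": "<workbook/>",
    "xl/worksheets/sheet1.xml": "<worksheet version='original'/>",
})
demo_resaved = make_archive({
    "xl/workbook.xml": "<workbook/>",
    "xl/worksheets/sheet1.xml": "<worksheet version='resaved'/>",
    "xl/theme/theme1.xml": "<theme/>",
})

result = compare_xlsx_members(demo_original, demo_resaved)
print(result)
```

Members reported as differing are the ones worth opening in a text diff. Expect noise, since Excel rewrites much of the XML on save.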
How can I get inputs from Excel and use those inputs in Python?
Take a look at xlrd
This is the best reference I found for learning how to use it: http://www.dev-explorer.com/articles/excel-spreadsheets-and-python
Not sure if this is exactly what you're talking about, but:
If you have a very simple Excel file (i.e. basically just one table filled with string values, nothing fancy), and all you want to do is basic processing, then I'd suggest just converting it to a CSV (comma-separated values) file. This can be done by "saving as..." in Excel and selecting CSV.
This is just a file with the same data as the Excel file, except represented by lines of values separated with commas:
cell A:1, cell A:2, cell A:3
cell B:1, cell B:2, cell B:3
This is then very easy to parse using standard Python functions (i.e., readlines to get each line of the file, then it's just a list that you can split on ",").
This is of course only helpful in some situations, like when you get a log from a program and want to quickly run a Python script which handles it.
Note: As was pointed out in the comments, splitting the string on "," is actually not very good, since you run into all sorts of problems. Better to use the csv module (which another answer here teaches how to use).
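To make that note concrete, here is a small sketch comparing a naive split on "," with csv.reader on a row that contains a quoted comma. The sample text and variable names are made up for illustration.

```python
import csv
import io

# One data row whose second field contains a comma inside quotes.
raw = 'name,comment\nAlice,"likes tea, not coffee"\n'

# Naive approach: the quoted field is cut in two and keeps its quote characters.
naive = [line.split(",") for line in raw.splitlines()]

# csv.reader understands the quoting and returns two clean fields.
parsed = list(csv.reader(io.StringIO(raw)))

print(naive[1])
print(parsed[1])
```

The naive version yields three pieces for the data row, while csv.reader yields the two fields that were actually written.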
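import win32com.client

# file_path, row, column and Input are placeholders you need to define yourself
Excel = win32com.client.Dispatch("Excel.Application")
Excel.Workbooks.Open(file_path)
Cells = Excel.ActiveWorkbook.ActiveSheet.Cells
Cells(row, column).Value = Input  # write a value into a cell
Output = Cells(row, column).Value  # read the value back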
If you can save as a csv file with headers:
Attrib1, Attrib2, Attrib3
value1.1, value1.2, value1.3
value2.1, ...
Then I would highly recommend looking at the built-in csv module.
With that you can do things like:
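import csv

# skipinitialspace drops the blank after each comma, so the keys are 'Attrib1', 'Attrib2', ...
with open("csvFile.csv", newline="") as f:
    for row in csv.DictReader(f, skipinitialspace=True):
        print(row['Attrib1'], row['Attrib2'])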