I want to write to an Excel sheet via pywin32. I can do that without problems, but I couldn't format a range of cells in the sheet. I want to center the values inside the cells, and I also need to fill the cells with a color. How can I do it?
Thanks in advance.
I've not specifically done this using python before, but I'm assuming you're using the COM automation interface to excel.
This page has an example that seems to cover both alignment and filling cells with colour in C#, so it should be fairly easy to adapt to python. Assuming you have a Worksheet object called sheet, and the Excel automation object is called Excel, I'm guessing it might look a bit like this:
# Format A1:D1 as center-aligned
sheet.Range("A1", "D1").VerticalAlignment = Excel.XlVAlign.xlVAlignCenter
sheet.Range("A1", "D1").HorizontalAlignment = Excel.XlHAlign.xlHAlignCenter
sheet.Range("A1", "D1").Interior.ColorIndex = Excel.XlColorIndex.Red
If you don't have access to the Excel.XlAlign and XlColorIndex constants from Python, you can replace them with the specific integers they represent, though I'm not entirely sure where you could get them from. Probably from a VBA reference site or similar. (The link I provided doesn't seem to let you expand each of the entries in the list, so you may need to look elsewhere.)
EDIT: Just had a play about with excel automation via the python console, and it seems to work alright:
>>> from win32com.client import Dispatch
>>> xlApp = Dispatch("Excel.Application")
>>> xlWb = xlApp.Workbooks.Add()
>>> xlSht = xlWb.WorkSheets(1)
>>> xlSht.Range("A1", "D1").VerticalAlignment = 1
>>> xlSht.Range("A1", "D1").Interior.ColorIndex = 6
>>> # The background color of A1-D1 should now be yellow
>>> xlSht.Cells(1, 1).VerticalAlignment = 1
If you can't find any good reference on what the various alignment/colour constants are, then I'd just play about with python on the console like this, then open the resulting worksheet in excel and have a look at the results to figure things out.
You can find the official reference for the office 2003 automation API here
Specifically, you'll probably find the Range documentation most useful.
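For reference, the integer values behind the constants discussed above can be spelled out directly. The sketch below is only an illustration: `apply_header_format` is a hypothetical helper name, and the stand-in object in the demo replaces a real Range so the snippet runs without Excel. The values used are that of `xlCenter` (-4108, shared by horizontal and vertical centering) and the default-palette colour indexes 3 (red) and 6 (yellow).

```python
from types import SimpleNamespace

# Values of the Excel enumeration members, written as plain integers so they
# also work with late-bound Dispatch objects.
XL_CENTER = -4108        # xlCenter / xlHAlignCenter / xlVAlignCenter
COLOR_INDEX_RED = 3      # default palette
COLOR_INDEX_YELLOW = 6   # default palette


def apply_header_format(rng, color_index=COLOR_INDEX_YELLOW):
    """Center the contents of a Range-like object and fill it with a colour."""
    rng.HorizontalAlignment = XL_CENTER
    rng.VerticalAlignment = XL_CENTER
    rng.Interior.ColorIndex = color_index
    return rng


# Stand-in for xlSht.Range("A1", "D1"), so the sketch runs without Excel.
fake_range = SimpleNamespace(Interior=SimpleNamespace())
apply_header_format(fake_range)
print(fake_range.HorizontalAlignment, fake_range.Interior.ColorIndex)
```

With a real worksheet the call would be `apply_header_format(xlSht.Range("A1", "D1"))`. If the application object is created with `win32com.client.gencache.EnsureDispatch`, the same values should also be reachable as `win32com.client.constants.xlCenter`.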
My organization has a clean export for bills of materials (BOM). I would like to automatically parse the excel file to check the BOM for certain attributes.
At the moment, I'm using Python with openpyxl.
I can read the excel workbook and worksheet just fine, but I cannot seem to find the attribute that contains the "outline level" of each row (I fully concede that I may be using the wrong terminology... another term candidate might be "group").
When I look at my excel file using excel, I see this at the left of the screen:
I would like to extract the 1 2 3 4 5 from each of the rows and to tell what grouping they were in.
My initial code is:
from pathlib import Path
import openpyxl as xl
path = Path('<path-to-my-file>.xlsx')
wb = xl.load_workbook(filename=path)
sh = wb.worksheets[0]
# ... would like to put outline level reading code here
From reading other questions, I suspect that I need to look at the row_dimension.group method of the worksheet, but I can't seem to get a handle on the syntax or the exact attribute that I'm looking for.
Thanks for the post. I was struggling with the same problem, and seeing your post gave me an idea!
I overcame it with the following code:
from pathlib import Path
import openpyxl as xl
path = Path('<path-to-my-file>.xlsx')
wb = xl.load_workbook(filename=path)
sh = wb.worksheets[0]
for row in sorted(sh.row_dimensions):
    # outlineLevel and outline_level refer to the same value
    outline1 = sh.row_dimensions[row].outlineLevel
    outline2 = sh.row_dimensions[row].outline_level
    print(row, sh.row_dimensions[row], outline1, outline2)
Maybe you can use the following code to gather the outline level of each row as an integer. I use similar code, with a few more lines, to find the maximum outline level in a sheet.
for index in range(ws.min_row, ws.max_row + 1):
    row_level = ws.row_dimensions[index].outline_level + 1
Here row_level is the outline level, which you can use as required. But please double-check the +1: if I remember correctly, you need to increase the stored value by one to get the true level.
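Putting the two answers together, a small helper can collect the outline level of every row. This is a sketch, not tested against the asker's file: `row_outline_levels` is a hypothetical name, and the demo uses a stand-in worksheet (a `defaultdict` of objects with an `outline_level` attribute) so it runs without openpyxl or a workbook. As far as I know, openpyxl reports 0 for rows that are not grouped, which is why adding 1 gives the number shown on Excel's outline buttons.

```python
from collections import defaultdict
from types import SimpleNamespace


def row_outline_levels(ws):
    """Map each row number to its outline level (0 = not grouped)."""
    levels = {}
    # max_row + 1 so that the last row is included in the range.
    for index in range(ws.min_row, ws.max_row + 1):
        levels[index] = ws.row_dimensions[index].outline_level or 0
    return levels


# Stand-in worksheet: row 2 is at level 1 and row 3 at level 2.
dims = defaultdict(lambda: SimpleNamespace(outline_level=0))
dims[2] = SimpleNamespace(outline_level=1)
dims[3] = SimpleNamespace(outline_level=2)
fake_ws = SimpleNamespace(min_row=1, max_row=4, row_dimensions=dims)

levels = row_outline_levels(fake_ws)
print(levels)
print("deepest level:", max(levels.values()))
```

With openpyxl, the call would be `row_outline_levels(sh)` on a worksheet loaded with `load_workbook`.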
I am trying to create a timeline slicer using win32com in Python. I am currently using win32com to manipulate Excel data, but my client wants me to set the upper and lower limits of the timeline slicer to certain months. I have googled a lot and have come to the conclusion that the only way I could do it is by coding it in VBA and calling it from Python, like here. I have no experience in VBA, and I was wondering if there is a way to do this with win32com in Python directly instead of going through VBA.
Edit:
After using "Assign Macro" in Excel, this is the code regarding my timeslicer:
ActiveWorkbook.SlicerCaches("NativeTimeline_Goods_Receipt_Date").TimelineState. _
SetFilterDateRange "01/01/2020", "30/04/2020"
Now I need to convert it to Python and assign the start date and end date to variables. So far I have this:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
test_wb = excel.Workbooks.Open(test_file)
date_sl = test_wb.SlicerCaches("NativeTimeline_Goods_Receipt_Date")
Apparently Excel has a built-in macro recorder that records anything you click, so if you want to manipulate the filter/slicer, you can right-click the element and choose "Assign Macro". Then you can click away as it records your clicks. Once you're done, you can view the result by choosing "Assign Macro" again; in the pop-up window, choose your_filter/slicer_name_Click and it will show you the VBA code. All you have to do is change it to fit Python syntax.
Updated answer for converting the VBA into python code:
By referring to this link, I was able to convert the VBA into Python code and set the dates to whatever I choose.
So the VBA code is this:
ActiveWorkbook.SlicerCaches("NativeTimeline_Goods_Receipt_Date").TimelineState. _
SetFilterDateRange "01/01/2020", "30/04/2020"
And the python version of it is:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.DisplayAlerts = False
excel.Visible = True
test_wb = excel.Workbooks.Open(test_file, None, False)
date_sl = test_wb.SlicerCaches("NativeTimeline_Goods_Receipt_Date")
date_sl.TimelineState.SetFilterDateRange("01/01/2020", "30/04/2020")
In my case, I need to set the dates based on when I run the code, so I can assign the dates to variables and substitute them for the hardcoded dates.
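To illustrate that last step, the two date strings can be built with the standard library and passed to SetFilterDateRange in place of the literals. A minimal sketch, assuming the slicer accepts the same dd/mm/yyyy strings as the recorded macro; `month_span` is a hypothetical helper name.

```python
import calendar
from datetime import date


def month_span(year, first_month, last_month, fmt="%d/%m/%Y"):
    """Return (start, end) strings covering first_month..last_month of year."""
    start = date(year, first_month, 1)
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(year, last_month)[1]
    end = date(year, last_month, last_day)
    return start.strftime(fmt), end.strftime(fmt)


start_date, end_date = month_span(2020, 1, 4)
print(start_date, end_date)
# With the workbook from the answer open, the call would then be:
# date_sl.TimelineState.SetFilterDateRange(start_date, end_date)
```

For the months used above this reproduces the two strings from the recorded macro, and using `date.today()` for the year and month would tie the range to the day the script runs.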
As the title says, I have a dataset with about 13000 rows and 255 columns that needs to be exported to an xls/xlsx file. (I actually have more than 255 columns, but the RODBC package seems to limit the number of exported columns to 255, so I trimmed it a bit.)
I tried the RODBC and xlsx packages; both take more than 5 minutes for the export. Is there a more efficient way of doing this?
I know a little bit of Python (I have used it to connect to Outlook and list the emails in a mailbox), so a way to export using Python instead is also welcome.
update 01
Quite a few people suggested using csv. That may not be possible in my case, because one field contains free text and I cannot control which characters are entered in it, which makes choosing a separator difficult.
update 02
Thank you for the suggestions, but I found that the R packages are fine only if the data frame is relatively small, and they are slow even for a data frame in which all columns are character. Any suggestions?
There are lots of options:
Use xlsx with multiple sheets (you've tried this and it's too slow, I know)
Use write.csv, which should be faster and is readable by Excel
Use odbcConnectExcel2007 within RODBC
Use the package bigmemory to help you manage the large dataframe, especially if you can make it into a sparse matrix
XLConnect which worked for this guy with the same problem
Write it to a SQL database with RODBC or RPostgreSQL, etc., and then make a connection to the DB within Excel. I do this a lot. Here's a related resource.
Use Pandas
Create a tab-delimited text file and then import it to Excel: write.table (table,sep="\t",quote=FALSE,row.names=FALSE,file=file.name)
Use fread
Try a cloud-based solution (I'm not sure if this will actually be faster, but it would at least be a trendy solution with extra benefits such as providing a nice way to store your data safely and let you query whatever you need from it using Excel on any computer)
RExcel
XLLoop
Finally, here's a nice little article on "A Million Ways to Connect R and Excel" which you may find useful, though I think I've actually given you more options than the article does.
I would start with the most simple solutions, like fread, then work your way to the relatively more complicated solutions if you're still not getting the results you want.
Depending on the exact nature of your project, you might even benefit from parallelism or multicore processing. Those don't boost your I/O speed in most cases, but it could speed up any processing/transformation of your data which takes place in your process, thus making your overall data pipeline faster.
Python is also very well-equipped to handle this problem, but there are so many solutions within R, hopefully you won't need to resort to switching languages just to write out data. Still, you could try
XlsxWriter in Constant Memory mode, or
Optimized Reader and Writer of the openpyxl package
if you want to try a Python-based solution.
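On the CSV concern raised in the question: a free-text field does not have to rule out delimited files, because a CSV writer that quotes its fields keeps embedded separators, quotes and line breaks intact. The following Python sketch uses only the standard library; the sample rows are made up for illustration.

```python
import csv
import io

rows = [
    ["id", "comment"],
    [1, 'free text with , commas; semicolons and "quotes"'],
    [2, "a line break\ninside the field"],
]

# newline="" lets the csv module control line endings itself.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerows(rows)

# Reading the text back recovers every field unchanged (as strings).
buffer.seek(0)
round_trip = list(csv.reader(buffer))
print(round_trip[1][1])
```

For a real export, the `StringIO` would be replaced by a file opened with `open(path, "w", newline="")`, where `path` is whatever output file you choose.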
Try the openxlsx package; it's quite fast.
https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf
Install package openxlsx
load the library openxlsx
use write.xlsx() or writeData() command to write into xlsx file
A small example of basic operations using openxlsx library
taken from openxlsx documentation
library(openxlsx)
## setup a workbook with 3 worksheets
wb <- createWorkbook()
addWorksheet(wb = wb, sheetName = "Sheet 1", gridLines = FALSE)
writeDataTable(wb = wb, sheet = 1, x = iris)
addWorksheet(wb = wb, sheetName = "mtcars (Sheet 2)", gridLines = FALSE)
writeData(wb = wb, sheet = 2, x = mtcars)
addWorksheet(wb = wb, sheetName = "Sheet 3", gridLines = FALSE)
writeData(wb = wb, sheet = 3, x = Formaldehyde)
worksheetOrder(wb)
names(wb)
worksheetOrder(wb) <- c(1,3,2) # switch position of sheets 2 & 3
writeData(wb, 2, 'This is still the "mtcars" worksheet', startCol = 15)
worksheetOrder(wb)
names(wb) ## ordering within workbook is not changed
saveWorkbook(wb, "worksheetOrderExample.xlsx", overwrite = TRUE)
worksheetOrder(wb) <- c(3,2,1)
saveWorkbook(wb, "worksheetOrderExample2.xlsx", overwrite = TRUE)
I made a script that opens a .xls file, writes a few new values in it, then saves the file.
Later, the script opens it again, and wants to find the answers in some cells which contain formulas.
If I call that cell with openpyxl, I get the formula (ie: "=A1*B1").
And if I activate data_only, I get nothing.
Is there a way to let Python calculate the .xls file? (or should I try PyXll?)
I realize this question is old, but I ran into the same problem and extensive searching didn't produce an answer.
The solution is in fact quite simple so I will post it here for posterity.
Let's assume you have an xlsx file that you have modified with openpyxl. As Charlie Clark mentioned openpyxl will not calculate the formulas, but if you were to open the file in excel the formulas would be automatically calculated. So all you need to do is open the file and then save it using excel.
To do this you can use the win32com module.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'absolute/path/to/your/file')
# this must be the absolute path (r'C:/abc/def/ghi')
workbook.Save()
workbook.Close()
excel.Quit()
That's it. I've seen all these suggestions to use Pycel or Koala, but that seems like a bit of overkill if all you need to do is tell excel to open and save.
Granted this solution is only for windows.
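The same steps can be wrapped in a function. This is a sketch under the assumptions of the answer above (Windows, Excel installed, pywin32 available); `recalculate_with_excel` and `absolute_path` are hypothetical names. It resolves the path to an absolute one first and uses try/finally so that Excel is closed even if opening the file fails.

```python
import os


def absolute_path(path):
    """Excel needs an absolute path; resolve relative ones against the cwd."""
    return os.path.abspath(path)


def recalculate_with_excel(path):
    """Open the workbook in Excel and save it so formula results are stored."""
    # Imported here so the module can be loaded on machines without pywin32.
    import win32com.client as win32

    excel = win32.gencache.EnsureDispatch("Excel.Application")
    try:
        workbook = excel.Workbooks.Open(absolute_path(path))
        workbook.Save()
        workbook.Close()
    finally:
        excel.Quit()


print(absolute_path("book.xlsx"))
```

After `recalculate_with_excel("book.xlsx")` has run, loading the file with openpyxl and `data_only=True` should return the calculated values.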
There is actually a project that takes Excel formulas and evaluates them using Python: Pycel. Pycel uses Excel itself (via COM) to extract the formulas, so in your case you would skip that part. The project probably has something useful that you can use, but I can't vouch for its maturity or completeness. It was not really developed for the general public.
There is also a newer project called Koala which builds on both Pycel and OpenPyXL.
Another approach, if you can't use Excel but you can calculate the results of the formulas yourself (in your Python code), is to write both the value and the formula into a cell (so that when you read the file, you can just pull the value, and not worry about the formula at all). As of this writing, I haven't found a way to do it in OpenPyXL, but XlsxWriter can do it. From the documentation:
XlsxWriter doesn’t calculate the value of a formula and instead stores the value 0 as the formula result. It then sets a global flag in the XLSX file to say that all formulas and functions should be recalculated when the file is opened. This is the method recommended in the Excel documentation and in general it works fine with spreadsheet applications. However, applications that don’t have a facility to calculate formulas, such as Excel Viewer, or some mobile applications will only display the 0 results.
If required, it is also possible to specify the calculated result of the formula using the options value parameter. This is occasionally necessary when working with non-Excel applications that don’t calculate the value of the formula. The calculated value is added at the end of the argument list:
worksheet.write_formula('A1', '=2+2', num_format, 4)
With this approach, when it's time to read the value, you would use OpenPyXL's data_only option. (For other people reading this answer: If you use xlrd, then only the value is available anyway.)
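Why this works: in an .xlsx file, a formula cell can carry both the formula text (an `f` element) and the last calculated value (a `v` element) in the sheet XML. The data_only option reads the stored value, so a cell whose value was never written yields nothing. The sketch below parses a hand-written fragment with the standard library to show the two parts; the fragment is made up for illustration and is not taken from a real workbook, and `formulas_and_values` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

SHEET_XML = """
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1"><f>2+2</f><v>4</v></c>
      <c r="B1"><f>A1*3</f></c>
    </row>
  </sheetData>
</worksheet>
"""


def formulas_and_values(sheet_xml):
    """Return {cell reference: (formula, cached value or None)}."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for cell in root.iterfind(".//m:c", NS):
        cells[cell.get("r")] = (
            cell.findtext("m:f", default=None, namespaces=NS),
            cell.findtext("m:v", default=None, namespaces=NS),
        )
    return cells


for ref, (formula, value) in formulas_and_values(SHEET_XML).items():
    print(ref, "formula:", formula, "cached value:", value)
```

Here A1 has both parts, while B1 has only a formula, which is the situation a reader using data_only would see as an empty cell.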
Finally, if you do have Excel, then perhaps the most straightforward and reliable thing you can do is automate the opening and resaving of your file in Excel (so that it will calculate and write the values of the formulas for you). xlwings is an easy way to do this from either Windows or Mac.
The formula module works for me. For detail please refer to https://pypi.org/project/formulas/
from os import path
from openpyxl import load_workbook
import formulas
# The variable spreadsheet holds the full path (including filename) of the Excel spreadsheet with unevaluated formulae
fpath = path.basename(spreadsheet)
dirname = path.dirname(spreadsheet)
xl_model = formulas.ExcelModel().loads(fpath).finish()
xl_model.calculate()
xl_model.write(dirpath=dirname)
#Use openpyxl to open the updated excel spreadsheet now
wb = load_workbook(filename=spreadsheet,data_only=True)
ws = wb.active
I ran into the same problem, and after some time researching I ended up using pyoo (https://pypi.org/project/pyoo/), which is for OpenOffice/LibreOffice, so it is available on all platforms, and it is more straightforward since it communicates natively and doesn't require saving/closing the file. I tried several other libraries but found the following problems:
xlwings: Only works if you have Excel installed on Windows/macOS, so I couldn't evaluate it.
koala: Seems to be broken after the networkx 2.4 update.
openpyxl: As pointed out by others, it isn't able to calculate formulas, so I was looking into combining it with pycel to get values. In the end I didn't try it because I found pyoo. openpyxl + pycel might not work as of now, since pycel also relies on the networkx library.
No, and in openpyxl there never will be. I think there is a Python library that purports to implement an engine for such formulae, which you could use.
xlcalculator can do this job. https://github.com/bradbase/xlcalculator
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
filename = r'use_case_01.xlsm'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
# First!A2
# value is 0.1
#
# Fourth!A2
# formula is =SUM(First!A2+1)
val1 = evaluator.evaluate('Fourth!A2')
print("value 'evaluated' for Fourth!A2:", val1)
evaluator.set_cell_value('First!A2', 88)
# now First!A2 value is 88
val2 = evaluator.evaluate('Fourth!A2')
print("New value for Fourth!A2 is", val2)
Which results in the following output:
file_name use_case_01.xlsm ignore_sheets []
value 'evaluated' for Fourth!A2: 1.1
New value for Fourth!A2 is 89
I am trying to edit several excel files (.xls) without changing the rest of the sheet. The only thing close so far that I've found is the xlrd, xlwt, and xlutils modules. The problem with these is it seems that xlrd evaluates formulae when reading, then puts the answer as the value of the cell. Does anybody know of a way to preserve the formulae so I can then use xlwt to write to the file without losing them? I have most of my experience in Python and CLISP, but could pick up another language pretty quick if they have better support. Thanks for any help you can give!
I had the same problem... and eventually found the following module:
from openpyxl import load_workbook
def Write_Workbook():
    wb = load_workbook(path)  # path: location of the existing workbook
    ws = wb["Sheet_name"]  # get_sheet_by_name() is deprecated in newer openpyxl
    c = ws.cell(row=2, column=1)
    c.value = Some_value
    wb.save(path)
==> Doing this, my file got saved preserving all formulas inserted before.
Hope this helps!
I've used the xlwt.Formula function before to be able to get hyperlinks into a cell. I imagine it will also work with other formulas.
Update: Here's a snippet I found in a project I used it in:
link = xlwt.Formula('HYPERLINK("%s";"View Details")' % url)
sheet.write(row, col, link)
As of now, xlrd doesn't read formulas. It's not that it evaluates them, it simply doesn't read them.
For now, your best bet is to programmatically control a running instance of Excel, either via pywin32 or Visual Basic or VBScript (or some other Microsoft-friendly language which has a COM interface). If you can't run Excel, then you may be able to do something analogous with OpenOffice.org instead.
We've just had this problem and the best we can do is to manually re-write the formulas as text, then convert them to proper formulas on output.
So open Excel and replace =SUM(C5:L5) with "=SUM(C5:L5)", including the quotes. If your formula contains a double quote, replace it with two double quotes to escape it, so = "a" & "b" becomes "= ""a"" & ""b"" ".
Then in your Python code, loop over every cell in the source and output sheets and do:
output_sheet.write(row, col, xlwt.ExcelFormula.Formula(source_cell[1:-1]))
We use this SO answer to make a copy of the source sheet to be the output sheet, which even preserves styles, and avoids overwriting the hand written text formulas from above.