How to edit Excel (xlsx and xlsm) in Python

I am very new to Python and this is my first project in Python.
What I am doing is:
1. Retrieve the data from SQL Server.
2. Put the data in a predefined Excel template (a specific worksheet).
3. If there is already data in this sheet, it should be replaced, and only the column names should remain.
4. Another sheet in the Excel template contains a pivot table of the data from step 2.
5. I need to refresh this pivot with the new data from sheet1.
6. The number of rows in sheet1 can change depending on the data from the database.
I am fine with step 1 but unable to perform the Excel operations.
I tried openpyxl but could not understand much of it.
https://openpyxl.readthedocs.io/en/stable/
code:
from openpyxl import load_workbook
wb2 = load_workbook('CnA_Rec.xlsx')
print (wb2.sheetnames)
rawsheet = wb2.get_sheet_by_name('RawData')
print (rawsheet.cell_range)
Error with the above code:
AttributeError: 'Worksheet' object has no attribute 'cell_range'
I can access an individual cell but not a range.
I need to select the current range and replace it with new data.
Reference link: https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.cell_range.html
Can anyone point me to an online example or some sample code for this?

So, let's go with openpyxl. Where exactly are you stuck? This is a very basic start; we can change the script as we go.
import openpyxl
wb = openpyxl.load_workbook('hello_world.xlsx')
# do magic with openpyxl here and save
ws = wb.worksheets[0]  # first worksheet
ws.cell(row=1, column=3).value = 'Hello'  # example: write to C1
ws.cell(row=2, column=3).value = 'World'  # example: write to C2
# fill A2:A19 with labels
for i in range(2, 20):
    ws.cell(row=i, column=1).value = 'Row:' + str(i)
# read A1:A10 back into a list
data = [ws.cell(row=i, column=1).value for i in range(1, 11)]
print(data)
wb.save('hello_world.xlsx')
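Steps 2 and 3 of the question (keep the header row, throw away the old data, write the new rows) can be sketched with openpyxl's `delete_rows` and `cell`. This is a minimal sketch, not a complete solution: the helper name `replace_sheet_data`, the header and the sample rows are hypothetical, and it runs on an in-memory workbook so it can be tried without the template. With the real template you would load `CnA_Rec.xlsx` and use `wb['RawData']` instead. Refreshing the pivot table is not covered here.

```python
from openpyxl import Workbook


def replace_sheet_data(ws, rows):
    """Hypothetical helper: keep row 1 (column names), replace everything below it."""
    # Remove all old data rows; row 1 (the header) is left untouched.
    if ws.max_row > 1:
        ws.delete_rows(2, ws.max_row - 1)
    # Write the new rows starting at row 2, column A.
    for r, row in enumerate(rows, start=2):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c, value=value)


# In-memory stand-in for the template; in practice something like:
#   wb = load_workbook('CnA_Rec.xlsx'); ws = wb['RawData']
wb = Workbook()
ws = wb.active
ws.title = "RawData"
for row in [("id", "name", "amount"), (1, "old", 10), (2, "old", 20), (3, "old", 30)]:
    ws.append(row)

# e.g. rows fetched from SQL Server (made-up values)
new_rows = [(10, "alpha", 1.5), (11, "beta", 2.5)]
replace_sheet_data(ws, new_rows)

print(ws.dimensions)  # prints A1:C3 (header plus two new rows)
```

Because the number of rows changes from run to run, keep in mind that, according to the openpyxl documentation, deleting rows does not adjust formulas, tables or charts that refer to the moved cells; presumably the same applies to a pivot table's source range.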

Related

Python: How to save excel workbook without ruining dynamic spill/array formulas

Short description of the problem:
I am currently accessing an Excel workbook from Python with openpyxl.
I have some dynamic spill formulas in sheet1, like filter(), byrow() and unique().
With the python script, I am doing some operations in sheet2, but I am not touching sheet1 (where the dynamic spill formulas are located).
When I use the workbook.save() method in Python, the dynamic formulas in sheet1 are ruined: they become static and lose the dynamic behavior they had before Python touched the file.
What can I do? Use a parameter in .save()? Use another method?
Detailed description of the problem (with pictures):
I have a workbook called Original, with the following three sheets:
nums
dynamic
dump
In "nums" I have a cell for ID (AA), and a column with some numerical values (picture1).
In "dynamic" I have some dynamic formulas like byrow() and filter() that update automatically with the values in the ID cell and the Values column of "nums" (picture2).
The sheet "dump" is empty for now.
I have a second workbook called Some_data, which has one sheet containing a 3-column dataframe (picture3).
I am dumping the 3-column dataframe of Some_data into the empty "dump" sheet of Original with a Python script, and then using the workbook.save() method to save the new workbook.
The code is here:
import pandas as pd
from openpyxl import load_workbook
Some_data = "filepath of the workbook"  # placeholder for the real path
Original = "filepath of the workbook"  # placeholder for the real path
df = pd.read_excel(Some_data, engine="openpyxl")
wb = load_workbook(filename=Original)
ws = wb["dump"]
rownr = 2  # start writing below the header row
for index, row in df.iterrows():
    ws["B" + str(rownr)] = row["col1"]
    ws["C" + str(rownr)] = row["col2"]
    ws["D" + str(rownr)] = row["col3"]
    rownr += 1
wb.save("filepath of new workbook")  # placeholder for the real path
Now the "dump" sheet of the newly saved workbook is populated.
The problem is that the dynamic formulas in the "dynamic" sheet have been ruined, even though the Python script does not interact with either "nums" or "dynamic".
First, the dynamic array formulas (like filter) now have brackets around them (picture4), and they are no longer dynamic: there is no blue line around the array when it is selected, and they do not update automatically (picture5).
I need help with this. I want to save the Excel file without ruining the dynamic array formulas.
Thank you for your help, in advance.
Frode
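As far as I understand it, Excel marks a formula as a dynamic (spilling) array through cell metadata kept in a separate part of the .xlsx package, `xl/metadata.xml` (an entry named `XLDAPR`), and openpyxl does not carry that part over when it rewrites the file, so Excel falls back to showing the formulas as legacy array formulas. Treat that as an assumption to verify rather than a confirmed diagnosis. A quick way to check it on your own files is to inspect both workbooks with Python's standard `zipfile` module; the helper name below is hypothetical.

```python
import zipfile


def has_dynamic_array_metadata(path):
    """Hypothetical helper: True if the .xlsx package at `path` contains
    Excel's dynamic-array cell metadata (an XLDAPR entry in xl/metadata.xml)."""
    with zipfile.ZipFile(path) as zf:
        # An .xlsx file is a zip archive; look for the metadata part first.
        if "xl/metadata.xml" not in zf.namelist():
            return False
        return b"XLDAPR" in zf.read("xl/metadata.xml")
```

Calling it on the original workbook and then on the file written by wb.save() should show whether the metadata was dropped during the save.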

Does this library assume the Google Spreadsheet will have only one sheet?

I am trying to use this library to pull data from a Google Spreadsheet with two sheets in it. I can get data from the first sheet but not from the second one. This works:
sheet = client.open("sheetname").sheet1
but if I change sheet1 to sheet2, I get the following error:
sheet = client.open("filename").sheet2
AttributeError: 'Spreadsheet' object has no attribute 'sheet2'
How do I fix this? Any help is appreciated!
.sheet1 is used as a shortcut.
To get the second sheet, try this:
sheet = client.open("filename").get_worksheet(1)
Here 1 means the second sheet (indexing starts at 0).
References:
Official documentation
In this case, you can use get_worksheet, worksheet and worksheets.
Sample script:
sh = client.open("###Spreadsheet name###") # or client.open_by_key(spreadsheetId)
worksheet = sh.get_worksheet(1) # Use the index of the sheet. 0 is the 1st sheet.
worksheet = sh.worksheet('Sheet2') # Use the sheet name of the sheet.
worksheet = sh.worksheets()[1] # In this case, all sheets are included in the array.
Note:
Currently, it seems that only the first sheet has such a shortcut (sh.sheet1); there is no sh.sheet2.
Reference:
Selecting a Worksheet

Is there a way to save data in named Excel cells using Python?

I have used openpyxl to output values to Excel in my Python code. However, I now find myself in a situation where the cell locations in the Excel file may change depending on the user. To avoid problems in the program, I want to name the cells the code should write its output to. Is there any way to have Python interact with named ranges in Excel?
For a workbook-level defined name:
import openpyxl
wb = openpyxl.load_workbook("c:/tmp/SO/namerange.xlsx")
mycell = wb.defined_names['mycell']
# destinations yields (sheet title, cell reference) pairs
for title, coord in mycell.destinations:
    ws = wb[title]
    ws[coord] = "Update"
    print("{} {} updated".format(ws, coord))
wb.save('updated.xlsx')
I was able to find the parameters of the named range using defined_names. After that, I just treated it like a normal Excel cell.
from openpyxl import load_workbook
openWB = load_workbook('test.xlsx')
rangeDestination = openWB.defined_names['testCell']
print(rangeDestination)
# attr_text looks like "Sheet1!$A$3"; a sheet name containing spaces comes back
# quoted ("'My Sheet'!$A$3") and would need its quotes stripped first
sheetName = str(rangeDestination.attr_text).split('!')[0]
cellName = str(rangeDestination.attr_text).split('!')[1]
sheetToWrite = openWB[sheetName]
sheetToWrite[cellName] = 'TEST-A3'
print(sheetName)
print(cellName)
openWB.save('test.xlsx')
openWB.close()
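Building on the destinations approach above, the lookup can be wrapped in a small helper that also handles names pointing at a block of cells. This is a sketch: write_to_defined_name and add_defined_name are hypothetical names, and the demo creates its own defined names in memory. Creating a name is where openpyxl versions differ (defined_names.add() in 3.1, defined_names.append() in older releases), hence the fallback.

```python
from openpyxl import Workbook
from openpyxl.workbook.defined_name import DefinedName


def write_to_defined_name(wb, name, value):
    """Hypothetical helper: write `value` into every cell a workbook-level name points to."""
    defn = wb.defined_names[name]
    for title, coord in defn.destinations:  # (sheet title, "$B$2"-style reference)
        target = wb[title][coord]
        if isinstance(target, tuple):  # a range such as "$D$1:$E$2" -> rows of cells
            for row in target:
                for cell in row:
                    cell.value = value
        else:  # a single cell
            target.value = value


def add_defined_name(wb, name, ref):
    """Hypothetical helper: create a workbook-level defined name."""
    defn = DefinedName(name, attr_text=ref)
    try:
        wb.defined_names.add(defn)  # openpyxl >= 3.1
    except AttributeError:
        wb.defined_names.append(defn)  # older openpyxl


# Demo on an in-memory workbook.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
add_defined_name(wb, "mycell", "Sheet1!$B$2")
add_defined_name(wb, "myblock", "Sheet1!$D$1:$E$2")
write_to_defined_name(wb, "mycell", "Update")
write_to_defined_name(wb, "myblock", 0)
print(ws["B2"].value)  # prints Update
```

Since the code only knows the names, users can move the named cells around in Excel without breaking the script.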

Python: Write a dataframe to an already existing excel which contains a sheet with images

I have been working on this for too long now. I have an Excel file with one sheet (sheetname = 'abc') containing images, and I want a Python script that writes a dataframe to a second, separate sheet (sheetname = 'def') in the same Excel file. Can anybody provide some example code? Every time I try to write the dataframe, the first sheet with the images gets emptied.
This is what I tried:
import numpy as np
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('filename_of_file_with_pictures_in_it.xlsx')
writer = pd.ExcelWriter('filename_of_file_with_pictures_in_it.xlsx', engine = 'openpyxl')
writer.book = book
x1 = np.random.randn(100, 2)
df = pd.DataFrame(x1)
df.to_excel(writer, sheet_name = 'def')
writer.save()
book.close()
It saves the random numbers in the sheet named 'def', but the first sheet 'abc' is now empty.
What is going wrong here? I hope somebody can help me with this.
Interesting question! With openpyxl you can easily add values and keep the formulas, but you cannot retain the graphs. Even with the latest version (2.5.4), graphs are not kept. So I decided to address the issue with xlwings:
import numpy as np
import xlwings as xw
wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht = wb.sheets.add('SheetMod')  # new sheet; existing sheets are left alone
sht.range('A1').value = np.random.randn(100, 2)
wb.save(r"path_new_file.xlsx")
With this snippet I managed to insert the random values and save a new copy of the modified xlsx. When you run it, the Excel file opens automatically and shows the new sheet, without changing the existing ones (graphs and formulas included). Make sure you install all the dependencies to get xlwings running on your system; xlwings drives an installed copy of Excel, so Excel itself must be available. Hope this helps!
You'll need to use an Excel reader such as openpyxl in combination with pandas for this. pandas' to_excel function only writes, so it does not care what the file already contains.
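In recent pandas versions, assigning writer.book as in the question is, as far as I know, no longer supported (it was deprecated in 1.5 and removed in 2.0); the documented way to add a sheet to an existing workbook is ExcelWriter with mode="a", optionally with if_sheet_exists (pandas 1.3+). A minimal sketch follows. It still goes through openpyxl, so whether the images on 'abc' survive depends on how much of the drawing content openpyxl can read back; check the result on a copy of your file. The file name with_pictures.xlsx, the helper name and the demo sheet contents are made up.

```python
import numpy as np
import pandas as pd
from openpyxl import Workbook, load_workbook


def write_df_to_new_sheet(path, df, sheet_name):
    """Hypothetical helper: add `df` as sheet `sheet_name` to an existing .xlsx."""
    # mode="a" opens the existing file instead of overwriting it;
    # if_sheet_exists="replace" rewrites only this sheet if it already exists.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name)


# Demo: a stand-in workbook with an existing sheet 'abc' (no images here).
path = "with_pictures.xlsx"
wb = Workbook()
wb.active.title = "abc"
wb.active["A1"] = "keep me"
wb.save(path)

df = pd.DataFrame(np.random.randn(100, 2))
write_df_to_new_sheet(path, df, "def")

check = load_workbook(path)
print(check.sheetnames, check["abc"]["A1"].value)
```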

How to get the value from merged cells in xlsx file using python?

I am trying to get the value from the merged cell in row 11 spanning columns B and C. See the screenshot for clarification.
I tried the following code using the xlrd package, but it does not print anything.
import xlrd
path = "C:/myfilepath/data.xlsx"
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)
sheet.cell_value(10,1)
sheet.cell_value(10,2)
I am not able to output the value of these merged cells using the xlrd package in Python.
The above code should print the cell value, i.e. PCHGFT001KS.
I don't know how xlrd works, but I do know how the lovely openpyxl works. You should use openpyxl! It's a robust tool for working with xlsx files (NOT xls).
import openpyxl
excel = "C:/myfilepath/data.xlsx"  # path from the question
wb = openpyxl.load_workbook(excel)
ws = wb.worksheets[0]  # first sheet
# a merged range stores its value in its top-left cell, here B11
print(ws['B11'].value)
Extra:
If you want to unmerge those blocks, you can do the following.
# copy the ranges into a list first, because unmerging modifies ws.merged_cells
for items in list(ws.merged_cells.ranges):
    ws.unmerge_cells(str(items))
wb.save(excel)
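If you don't know in advance which cell of a merged block you'll be asked for (the question mentions both B and C in row 11), a small helper can map any cell to the top-left cell of its merged range, where openpyxl keeps the value. A sketch; merged_value is a hypothetical name, and the demo builds the merged block in memory using the value from the question.

```python
from openpyxl import Workbook


def merged_value(ws, coord):
    """Hypothetical helper: value shown at `coord`, following merged ranges to their top-left cell."""
    cell = ws[coord]
    for rng in ws.merged_cells.ranges:
        if (rng.min_row <= cell.row <= rng.max_row
                and rng.min_col <= cell.column <= rng.max_col):
            # Only the top-left cell of a merged range holds the value.
            return ws.cell(row=rng.min_row, column=rng.min_col).value
    return cell.value  # not part of any merged range


wb = Workbook()
ws = wb.active
ws["B11"] = "PCHGFT001KS"
ws.merge_cells("B11:C11")

print(merged_value(ws, "B11"), merged_value(ws, "C11"))  # prints PCHGFT001KS PCHGFT001KS
```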
