How to fill a block of cells in xlsxwriter? - python

I had a problem with my xlsxwriter code, it works perfectly fine but I need to figure out how to select a block of cells. For example - from A1 to J10 as it depicted on screenshot.
Does xlsxwriter have such a function? I've searched several formats, such as:
worksheet.write('A1:J1', '...')
But it write only on A1. So for example - how can I fill all highlighted area with one word without writing code for all of 10 rows?

Here is how I write my output from a pandas dataframe to an excel template.
Please note that if data is already present in the cells where you are trying to write the dataframe, it will not be overwritten and the dataframe will be written to a new sheet which is my i have included a step to clear existing data from the template. I have not tried to write output on merged cells so that might throw an error.
Setup
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
file_path='Template.xlsx'
book=load_workbook(file_path)
writer = pd.ExcelWriter(file_path, engine='openpyxl')
writer.book = book
sheet_name="Template 1"
sheet=book[sheet_name]
Set first row and first column in the excel template where output is to be pasted.
If my output is to be pasted starting in cell N2, row_start will be 2 and col_start will be 14
row_start=2
col_start=14
Clear existing data in excel template
for c_idx, col in enumerate(df.columns,col_start):
for r_idx in range(row_start,10001):
sheet.cell(row=r_idx, column=c_idx, value="")
Write dataframe to excel template
rows=dataframe_to_rows(df,index=False)
for r_idx, row in enumerate(rows,row_start):
for c_idx, col in enumerate(row,col_start):
sheet.cell(row=r_idx, column=c_idx, value=col)
writer.save()
writer.close()

Related

Format and manipulate data across multiple Excel sheets in Python using openpyxl before converting to Dataframe

I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.
I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data_sheets.xlsx')
for sheet in wb.worksheets:
sheet.merged_cells
for merge in list(sheet.merged_cells):
sheet.unmerge_cells(range_string=str(merge))
sheet.insert_cols(3, 1)
print(sheet)
wb.save('workbook_test.xlsx')
#concat once worksheets have been edited
df= pd.concat(pd.read_excel('workbook_test.xlsx, sheet_name= None), ignore_index= True)
Before stacking the data however, I would like to make the following additonal (sequential) changes to every sheet:
Extract from row 1 the right 8 characters (in excel the equivalent of this would be =RIGHT(A1, 8) - this is to pull the unique code off of each sheet, which will look like '(000000)'.
Populate column C from rows 6-282 with the unique code.
Delete rows 1-5
The end result would make each sheet within the workbook look like this:
Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!
Here is a 100% openpyxl approach to achieve what you're looking for :
from openpyxl import load_workbook
wb = load_workbook("workbook_test.xlsx")
for ws in wb:
ws.unmerge_cells("A1:O1") #unmerge first row till O
ws_uid = ws.cell(row=1, column=1).value[-8:] #get the sheet's UID
for num_row in range(6, 282):
ws.cell(row=num_row, column=3).value = '="{}"'.format(ws_uid) #write UID in Column C
ws.delete_rows(1, 5) #delete first 5 rows
wb.save("workbook_test.xlsx")
NB : This assume there is already an empty column (C).

Formatting of Excel sheets in Python

In my project I am opening an Excel file with multiple sheets. I want to manipulate "sheet2" in Python (which works fine) and after that overwrite the old "sheet2" with the new one but KEEP the formatting.. so something like this:
import pandas as pd
update_sheet2 = pd.read_excel(newest_isaac_file, sheet_name='sheet2')
#do stuff with the sheet
with pd.ExcelWriter(filepath, engine='openpyxl', if_sheet_exists='replace', mode='a',
KEEP_FORMATTING = True) as writer:
df.to_excel(writer, sheet_name=sheetname, index=index)
In other words: Is there a way to get the formatting from an existing Excel sheet?
I could not find anything about that. I know I can manually set the formatting in Python but the formatting of the existing sheet is really complicated and has to stay the same.
thanks for your help!
As per your comment, try this code. It will open a file (Sample.xlsx), go to a sheet (Sheet1), insert new row at 15, copy the text and formatting from row 22 and paste it in the empty row (row 15). Code and final screen shot attached.
import openpyxl
from copy import copy
wb=openpyxl.load_workbook('Sample.xlsx') #Load workbook
ws=wb['Sheet1'] #Open sheet
ws.insert_rows(15, 1) #Insert one row at 15 and move everything one row downwards
for row in ws.iter_rows(min_row=22, max_row=22, min_col=1, max_col=ws.max_column): # Read values from row 22
for cell in row:
ws.cell(row=15, column=cell.column).value = cell.value #Update value to row 22 to new row 15
ws.cell(row=15, column=cell.column)._style = copy(cell._style) #Copy formatting
wb.save('Sample.xlsx')
How excel looks after running the code

Add Data Frame to Excel in Specific Cell

I am using openpyxl to write a pandas data frame to an excel spreadsheet. I am reading the excel workbook in from a template as the client wants a specific header, format, etc.
If I want to add the table starting at row 17, how do I modify the code below to achieve this? There are formatted, merged cells in rows 1 through 16 so when I try to save in current form it gives me AttributeError: 'MergedCell' object attribute 'value' is read-only. I saw a similar stack question but respondents advised just adding a bunch of blank cells before adding the data frame--a solution that won't work for me.
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = load_workbook('proto_temp.xlsx')
ws_rq = wb["Results Query"]
rows = dataframe_to_rows(df)
for r_idx, row in enumerate(rows, 1):
for c_idx, value in enumerate(row, 1):
ws_rq.cell(row=r_idx, column=c_idx, value=value)
wb.save("df_to_xl.xlsx")
Question:
Is there data after row 17?
If Answer == NO, then see below
for r in dataframe_to_rows(df, index=True, header=True):
ws.append(r)
If Answer == YES, then the answer is slightly more complicated.
Options
Structure your code so you run the code above, before other cells are written below.
Read XLSX into DF, make the changes and then write it back.
Lastly, you could loop through a range and write cell-by-cell (feels too computationally expensive - but I figured I'd suggest it anyways).
sheet.cell(row=2, column=2).value = 2
Hope this provides value :)

Can I modify specific sheet from Excel file and write back to the same without modifying other sheets using Pandas | openpyxl

I'll try to explain my problem with an example:
Let's say I have an Excel file test.xlsx which has five tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4 and sheet5. I am interested to read and modify data in sheet2.
My sheet2 has some columns whose cells are dropdowns and those dropdown values are defined in sheet4 and sheet5. I don't want to touch sheet4 and sheet5. (I mean sheet4 & sheet5 have some references to cells on Sheet2).
I know that I can read all the sheets in excel file using pd.read_excel('test.xlsx', sheetnames=None) which basically gives all sheets as a dictionary(OrderedDict) of DataFrames.
Now I want to modify my sheet2 and save it without disturbing others. So is it posibble to do this using Python Pandas library.
[UPDATE - 4/1/2019]
I am using Pandas read_excel to read whatever sheet I need from my excel file, validating the data with the data in database and updating the status column in the excelfile.
So for writing back the status column in excel I am using openpyxl as shown in the below pseudo code.
import pandas as pd
import openpyxl
df = pd.read_excel(input_file, sheetname=my_sheet_name)
df = df.where((pd.notnull(df)), None)
write_data = {}
# Doing some validations with the data and building my write_data with key
# as (row_number, column_number) and value as actual value to put in that
# cell.
at the end my write_data looks something like this:
{(2,1): 'Hi', (2,2): 'Hello'}
Now I have defined a seperate class named WriteData for writing data using openpyxl
# WriteData(input_file, sheet_name, write_data)
book = openpyxl.load_workbook(input_file, data_only=True, keep_vba=True)
sheet = book.get_sheet_by_name(sheet_name)
for k, v in write_data.items():
row_num, col_num = k
sheet.cell(row=row_num, column=col_num).value = v
book.save(input_file)
Now when I am doing this operation it is removing all the formulas and diagrams. I am using openpyxl 2.6.2
Please correct me if I am doing anything wrong! Is there any better way to do?
Any help on this will be greatly appreciated :)
To modify a single sheet at a time, you can use pandas excel writer:
sheet2 = pd.read_excel("test.xlsx", sheet = "sheet2")
##modify sheet2 as needed.. then to save it back:
with pd.ExcelWriter("test.xlsx") as writer:
sheet2.to_excel(writer, sheet_name="sheet2")

Read column / row merged in excel using python

I'm having trouble reading the information from a spreadsheet that has merged rows and columns. I've tried using merged_cell to get the value but I don't understand how it works. Can anyone help me?
Using the script below, I can get the values​​, except for the merged cells. How can I get all of the cells, including the merged ones?
from xlrd import open_workbook,cellname
workbook = open_workbook('file.xls')
sheet = workbook.sheet_by_index(0)
for row_index in range(sheet.nrows):
for col_index in range(sheet.ncols):
print cellname(row_index,col_index),'-',
print sheet.cell_value(row_index,col_index)

Categories