Openpyxl copy and paste as values in new workbook - python

I am trying to copy the first 100 rows of a source file to a new destination file with openpyxl. My source file has formulas, but I want to paste them as values in the new workbook. When I add data_only=True (see code below), only the plain values of my source sheet are copied; the formula cells are left empty in the destination file. How do I copy everything and paste it as values in the destination sheet?
WB1 = load_workbook("sample_book.xlsx")
WB1_WS1 = WB1["Ark2"]
WB2 = Workbook()
#Create new worksheet in new workbook
for i in range(1,2):
    WB2.create_sheet(f"WS{i}")
#Delete first sheet
WB2.remove(WB2.worksheets[0])
#Define the ranges and sheets
copy_ranges = [100]
copy_to_sheets = ["WS1"]
# Copy the values from the rows in WB1 to WB2
for i in range (len(copy_ranges, data_only=True)):
    #Set the sheet to copy to
    ws = WB2[copy_to_sheets[i]]
    #initialize row offset
    offset = 1
    for s in range (i):
        offset+=copy_ranges[s]
    #copy the row and append
    for j in range(offset, offset + copy_ranges[i]):
        #if j==0:
        # continue
        for row in WB1_WS1.iter_rows(min_row=j,max_row=j,min_col=1,max_col=WB1_WS1.max_column):
            values_row = [cell.value for cell in row]
            ws.append(values_row)
#save
WB2.save("WB2.xlsx")

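You are using len() incorrectly. len() returns the number of items in a list and takes no data_only argument; data_only belongs in load_workbook(), for example load_workbook("sample_book.xlsx", data_only=True). copy_ranges is a 1-item list, so len(copy_ranges) is 1. If you want the first item in the list, use an index: copy_ranges[0] is 100.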
I don't quite follow the 'offset' part of the code, and there is an issue with
offset = 1
for s in range (i):
    offset+=copy_ranges[s]
On any iteration where i > 1, s reaches a value of 1 or more, so offset+=copy_ranges[s] will throw an IndexError: copy_ranges is a 1-item list and you are trying to access a non-existent element.
Here are two ways to copy the first 100 rows.
If you want the formulas in WB2, don't pass in the data_only parameter.
## VERSION 1: Output will have formulas from WB1
from openpyxl import load_workbook, Workbook
WB1 = load_workbook('int_column.xlsx')
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
If you set data_only=True, the value displayed in each cell of WB1 is copied to WB2. For a formula cell this is the result cached in the file when a spreadsheet application such as Excel last saved it; if no cached result exists, openpyxl returns None.
## VERSION 2: Output will have value displayed in cells in WB1
from openpyxl import load_workbook, Workbook
WB1 = load_workbook('int_column.xlsx', data_only=True)
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
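A more compact variant of the row copy, as a minimal sketch: iter_rows can be capped with max_row and, in openpyxl 2.6 and later, asked for plain values with values_only=True, so each row can go straight to append. The in-memory source workbook, its contents and the helper name copy_first_rows are assumptions made only for illustration; with a real file you would load it with load_workbook(..., data_only=True) to receive cached formula results instead of formula strings.

```python
from openpyxl import Workbook


def copy_first_rows(src_ws, dst_ws, n_rows=100):
    """Append the values of the first n_rows rows of src_ws to dst_ws."""
    for values in src_ws.iter_rows(min_row=1, max_row=n_rows, values_only=True):
        dst_ws.append(values)


# Hypothetical source data: 150 rows of two columns
src_wb = Workbook()
src_ws = src_wb.active
for i in range(1, 151):
    src_ws.append([i, i * 2])

dst_wb = Workbook()
dst_ws = dst_wb.active
copy_first_rows(src_ws, dst_ws)
```

Because append always writes to the next free row, the destination sheet ends up with the same 100 rows starting at row 1.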

Related

Data only not working with openpyxl in python

I'm a beginner in Python and I'm developing a program that takes some data from one .xlsx file and puts it into another .xlsx file.
To do so, I decided to use openpyxl. Here is the beginning of my code:
path1 = "sourceFile.xlsx"
path2 = "targetFile.xlsx"
sheet1 = openpyxl.load_workbook(path1, data_only=True)
sheet2 = openpyxl.load_workbook(path2)
As you can see, I use data_only=True to read only the values of my source file. My problem is that with this solution, None is returned for a few cells of the source file. When I remove the data_only=True parameter, the formula is returned instead, "=B28" in this case. That is not what I want, because cell B28 of the target file does not have the same value as cell B28 of the source file.
I have already searched for solutions but, surprisingly, found nothing. Any ideas are welcome!
If B28's value in the original file is different from the one in the output file, then the issue is likely with the code you're using to copy the cells. When asked how you're extracting the cells, you gave code for extracting the value of a single cell. How are you extracting ALL the cells? With a for-loop? If you share that code, we can analyze this problem further.
I'm including code which copies values from one file to another; you should be able to tweak it to your needs.
from openpyxl import load_workbook, Workbook
## VERSION 1: Output will have formulas from WB1
WB1 = load_workbook('int_column.xlsx')
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
## VERSION 2: Output will have value displayed in cells in WB1
WB1 = load_workbook('int_column.xlsx', data_only=True)
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
Please post more code if you need further assistance.
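The None values described in the question are what openpyxl returns when a formula cell has no cached result stored in the file. openpyxl never evaluates formulas; data_only=True only reads the result that a spreadsheet application saved alongside the formula. A minimal sketch of that behaviour, using an in-memory workbook written by openpyxl itself (so no cached results exist); the cell contents are made up for illustration:

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a workbook with one formula and save it to memory.
wb = Workbook()
ws = wb.active
ws["A1"] = 5
ws["A2"] = "=A1*2"
buffer = BytesIO()
wb.save(buffer)

# Default load: the formula text comes back.
buffer.seek(0)
formula_view = load_workbook(buffer)["Sheet"]["A2"].value

# data_only=True: only a cached result would be returned, and openpyxl
# did not write one, so the cell reads as None.
buffer.seek(0)
value_view = load_workbook(buffer, data_only=True)["Sheet"]["A2"].value

print(formula_view, value_view)
```

If the source file is opened and saved once in Excel or a similar application, the calculated results should be stored in the file, and data_only=True can then return them.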

How to move a copied worksheet to the first position?

I would like to copy an Excel worksheet in Python using openpyxl. However, it defaults to placing the copied worksheet at the end. I want it at the front. The copy_worksheet doesn't allow specifying the position, unlike create_sheet. I'd rather not have to recreate the template.
I've considered sorting the sheets, but I'm not sure how to implement that.
Suppose I have a file called number.xlsx with an existing worksheet titled "blank" that I want to copy.
from openpyxl import load_workbook
from datetime import datetime
n = float(input("Number: "))
today = datetime.now()
m = today.month
d = today.day
y = str(today.year)
wb = load_workbook('number.xlsx')
if y in wb.sheetnames:
    ws = wb[y]
    ws.cell(row = 2 + d, column = 1 + m).value = n
    wb.save('number.xlsx')
else:
    ws = wb.copy_worksheet(wb["blank"]) #I want the copied sheet at the front, not the back
    ws.title = y
    ws.cell(row = 2 + d, column = 1 + m).value = n
    wb.save('number.xlsx')
You can use the move_sheet(sheet, offset=0) method for this. The new position is calculated as the sheet's current index plus offset. copy_worksheet adds the copy at the end of the workbook, so you need to pass a negative offset to move the sheet to index 0.
from openpyxl import load_workbook
wb = load_workbook("text.xlsx")
ws = wb.copy_worksheet(wb["sample"])
ws.title = "NewNameForCopiedSheet"
wb.move_sheet("NewNameForCopiedSheet", -(len(wb.sheetnames)-1))
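The same idea can be tried without any file on disk. This is a minimal sketch with made-up sheet names; it uses wb.index() to get the copy's current position, so the offset does not depend on counting the sheets by hand:

```python
from openpyxl import Workbook

# Hypothetical workbook: a template sheet named "blank" plus two others
wb = Workbook()
wb.active.title = "2022"
wb.create_sheet("2023")
wb.create_sheet("blank")

# copy_worksheet always appends the copy at the end
copied = wb.copy_worksheet(wb["blank"])
copied.title = "2024"

# Moving by minus the current index puts the sheet at position 0
wb.move_sheet(copied, offset=-wb.index(copied))

print(wb.sheetnames)
```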
Here is an example.
wb._sheets is the list that controls the order of tabs/sheets. Note that it is a private attribute, so it may change between openpyxl versions.
Get the position of the sheet you want to rearrange and modify the list of sheets with the new positions.
from openpyxl import Workbook
wb=Workbook()
# wb.create_sheet("Sheet")
wb.create_sheet("Sheet2")
wb.create_sheet("Sheet3")
wb.create_sheet("SheetA")
wb.create_sheet("ASheet")
wb.create_sheet("blank")
wb.save('book_original.xlsx')
blank_sheet_position = wb.worksheets.index(wb['blank']) #get position of new sheet
blank_sheet_new_position = 0 #position where you want to move
sheets = wb._sheets.copy()
sheets.insert(blank_sheet_new_position, sheets.pop(blank_sheet_position)) #modifying the sheets list
wb._sheets = sheets
wb.save('book_myorder.xlsx')

Iterate through worksheets adding data to each iteration

I have an Excel file in which all data is listed in rows (first image). I need to take this data and list it in column A of individual worksheets in a newly created workbook (it needs to look like the second image). I am having trouble getting the 'for' loop right so that the data is written to each separate worksheet. My code currently writes all the data to the same worksheet.
import openpyxl
import os
import time
wb = openpyxl.load_workbook('IP-Results.xlsx') #load input file
sheet = wb.get_sheet_by_name('IP-Results-32708') #get sheet from input file
wbOutput = openpyxl.Workbook() #open a new workbook
wbOutput.remove_sheet(wbOutput.get_sheet_by_name('Sheet')) #remove initial worksheet named 'sheet'
for cell in sheet['A']: #iterate through firewall names in column A and make those the title of the sheets in new workbook
    value = cell.value
    wbOutput.create_sheet(title=cell.value)
inputwb = wb
inputsheet = inputwb.active
outputwb = wbOutput
outputsheet = outputwb.active
maxRow = inputsheet.max_row
maxCol = inputsheet.max_column
for i in range(1, max(maxRow, maxCol) +1):
    for j in range(1, min(maxRow, maxCol) + 1):
        for sheet in outputwb.get_sheet_names():
            outputsheet.cell(row=i, column=j).value = inputsheet.cell(row=j, column=i).value
            outputsheet.cell(row=j, column=i).value = inputsheet.cell(row=i, column=j).value
wbOutput.save("Decom-" + time.strftime("%m-%d-%Y")+ ".xlsx")
'outputsheet' is assigned once and refers to a single sheet in wbOutput, the active one:
outputwb = wbOutput
outputsheet = outputwb.active
Then the main loop writes to outputsheet, which always refers to that same worksheet, causing all your data to appear on the same sheet:
for i in range(1, max(maxRow, maxCol) +1):
    for j in range(1, min(maxRow, maxCol) + 1):
        for sheet in outputwb.get_sheet_names():
            outputsheet.cell(row=i, column=j).value = inputsheet.cell(row=j, column=i).value
            outputsheet.cell(row=j, column=i).value = inputsheet.cell(row=i, column=j).value
The easiest solution would be dropping the third inner loop and looking up each target sheet by name with get_sheet_by_name (deprecated in current openpyxl versions in favour of outputwb[sheet_name]):
for i in range(1, maxRow + 1):
    sheet_name = inputsheet.cell(row=i, column=1).value
    a_sheet = outputwb.get_sheet_by_name(sheet_name)
    for j in range(1, maxCol + 1):
        # row i of the input becomes column A of its own sheet
        a_sheet.cell(row=j, column=1).value = inputsheet.cell(row=i, column=j).value
I can't test the code at the moment, but the general idea should work.
edit
Although it might be worth redesigning to something like this pseudo code:
for each inputwb_row in inputworkbook:
    new_sheet = create a new_sheet in outputworkbook
    set new_sheet.title = inputworkbook.cell[inputwb_row, 1].value
    for each column in inputwb_row:
        new_sheet.cell[column, 1].value = inputworkbook.cell[inputwb_row, column].value
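The redesign outlined in the pseudo code could look roughly like this in Python. It is only a sketch under assumptions: the input data (firewall names in column A with values to their right) is invented and built in memory instead of being loaded from IP-Results.xlsx, and every name in column A is assumed to be a valid, unique sheet title.

```python
from openpyxl import Workbook

# Hypothetical input: one row per firewall, name in column A, data to the right
input_wb = Workbook()
input_ws = input_wb.active
input_ws.append(["fw-east", "10.0.0.1", "10.0.0.2"])
input_ws.append(["fw-west", "10.1.0.1", "10.1.0.2"])

output_wb = Workbook()
output_wb.remove(output_wb.active)  # drop the default empty sheet

for row in input_ws.iter_rows(values_only=True):
    name = row[0]
    if name is None:
        continue  # skip rows without a name
    new_sheet = output_wb.create_sheet(title=str(name))
    # Write the row's values down column A of the new sheet
    for row_index, value in enumerate(row, start=1):
        new_sheet.cell(row=row_index, column=1, value=value)
```

Creating the sheet and filling it in the same loop iteration avoids the separate lookup by name, so no write can land on the wrong worksheet.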

Copy paste column range using OpenPyxl

Hi, I am trying to copy and paste the column range W7:W46 into another worksheet. The code I have so far,
col_j = New_Burden['W']
for idx, cell in enumerate(col_j,1):
    ws1.cell(row = idx, column = 10).value = cell.value
is able to copy over the entire column, but unfortunately transfers the various headers as well. One solution I have tried is:
for row in New_Burden['W7:W46']:
    for cell in row:
        ws1.cell(row = 2, column = 10).value = cell.value
But that only copies the first value, W7.
Copy a range (W7:W46) from one worksheet to another worksheet.
If the ranges do not overlap, this also works within the same worksheet.
from openpyxl import Workbook
from openpyxl.utils import range_boundaries
# Create a new Workbook
wb = Workbook()
ws = wb.worksheets[0]
# Define the start cell (J2) in the new worksheet
min_col, min_row, max_col, max_row = range_boundaries('J2')
# Iterate over the range you want to copy
for row, row_cells in enumerate(New_Burden['W7:W46'], min_row):
    for column, cell in enumerate(row_cells, min_col):
        # Copy the value of the source cell to the target worksheet cell
        ws.cell(row=row, column=column).value = cell.value
If you want to do the above with multiple different columns, wrap it in a function:
def copy_range(source_range, target_start):
    # Define the start cell (target_start) in the new worksheet
    min_col, min_row, max_col, max_row = range_boundaries(target_start)
    # Iterate over the range you want to copy
    for row, row_cells in enumerate(New_Burden[source_range], min_row):
        for column, cell in enumerate(row_cells, min_col):
            # Copy the value of the source cell to the target worksheet cell
            ws.cell(row=row, column=column).value = cell.value
for source_range, target_start in [('W7:W46','J2'), ('Y7:Y46','A2')]:
    copy_range(source_range, target_start)
Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2
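A self-contained variant of the same approach, as a sketch: here the source and target worksheets are passed in as parameters instead of being read from the surrounding scope. The sample data standing in for the New_Burden sheet and the sheet name "target" are assumptions for illustration only.

```python
from openpyxl import Workbook
from openpyxl.utils import range_boundaries


def copy_range(src_ws, source_range, dst_ws, target_start):
    """Copy the values of source_range in src_ws to dst_ws, starting at target_start."""
    start_col, start_row, _, _ = range_boundaries(target_start)
    for row_idx, row_cells in enumerate(src_ws[source_range], start_row):
        for col_idx, cell in enumerate(row_cells, start_col):
            dst_ws.cell(row=row_idx, column=col_idx, value=cell.value)


# Hypothetical data standing in for the New_Burden sheet
wb = Workbook()
src = wb.active
for r in range(7, 47):
    src.cell(row=r, column=23, value=r * 10)  # column 23 is W
dst = wb.create_sheet("target")

copy_range(src, "W7:W46", dst, "J2")
```

The 40 source cells W7:W46 land in J2:J41 of the target sheet, so the header rows above W7 are never touched.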

Openpyxl: How to copy a row after checking if a cell contains specific value

I have a worksheet that is updated every week with thousands of rows, and I need to transfer rows from this worksheet after filtering. I am using the code below to find the cells which have the value I need and then transfer the entire row to another sheet, but after saving the file I get the "IndexError: list index out of range" exception.
The code I use is as follows:
import openpyxl
wb1 = openpyxl.load_workbook('file1.xlsx')
wb2 = openpyxl.load_workbook('file2.xlsx')
ws1 = wb1.active
ws2 = wb2.active
for row in ws1.iter_rows():
    for cell in row:
        if cell.value == 'TrueValue':
            n = 'A' + str(cell.row) + ':' + ('GH' + str(cell.row))
            for row2 in ws1.iter_rows(n):
                ws2.append(row2)
wb2.save("file2.xlsx")
The original code I used, which worked, is below. It has to be modified because it produces large files (over 40 MB) that MS Excel cannot open.
n = 'A3' + ':' + ('GH'+ str(ws1.max_row))
for row in ws1.iter_rows(n):
    ws2.append(row)
Thanks.
I'm not entirely sure what you're trying to do, but I suspect the problem is that you have nested your copy loop.
Try the following:
row_nr = 1
for row in ws1:
    for cell in row:
        if cell.value == "TrueValue":
            row_nr = cell.row
            break
    if row_nr > 1:
        break
for row in ws1.iter_rows(min_row=row_nr, max_col=190):
    ws2.append((cell.value for cell in row))
Question: I get the "IndexError: list index out of range" exception.
From ws1.iter_rows(n) I get
UserWarning: Using a range string is deprecated. Use ws[range_string]
and from ws2.append(row2)
ValueError: Cells cannot be copied from other worksheets
The reason is that row2 holds a list of Cell objects instead of a list of values.
Question: ... need to transfer rows from this worksheet after filtering
The following does what you want, for instance:
# If you want to start appending row data at row 2,
# set the private self._current_row to 1
ws2.cell(row=1, column=1).value = ws2.cell(row=1, column=1).value
# Define min/max column range to copy
from openpyxl.utils import range_boundaries
min_col, min_row, max_col, max_row = range_boundaries('A:GH')
# Define cell index (0-based) used to check the value
check = 0 # == A
for row in ws1.iter_rows():
    if row[check].value == 'TrueValue':
        # Copy row values
        # Tuple indices are 0-based, so min_col has to be reduced by 1
        ws2.append((cell.value for cell in row[min_col-1:max_col]))
Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2
Use a list to hold the items in each column for the particular row.
Then append the list to your ws2.
...
def iter_rows(ws,n): #produce the list of items in the particular row
    for row in ws[n]: # ws[range_string] replaces the deprecated ws.iter_rows(range_string)
        yield [cell.value for cell in row]
for row in ws1.iter_rows():
    for cell in row:
        if cell.value == 'TrueValue':
            n = 'A' + str(cell.row) + ':' + ('GH' + str(cell.row))
            list_to_append = list(iter_rows(ws1,n))
            for items in list_to_append:
                ws2.append(items)
I was able to solve this with lists for my project.
import openpyxl
#load data file
wb1 = openpyxl.load_workbook('original.xlsx')
sheet1 = wb1.active
print("loaded 1st file")
#new template file
wb2 = openpyxl.load_workbook('blank.xlsx')
sheet2 = wb2.active
print("loaded 2nd file")
header = sheet1[1:1] #grab header row
listH =[]
for h in header:
    listH.append(h.value)
sheet2.append(listH)
colOfInterest= 11 # this is my col that contains the value I'm checking against
for rowNum in range(2, sheet1.max_row +1): #iterate over each row, starting with 2 to skip the header from the original file
    if sheet1.cell(row=rowNum, column=colOfInterest).value is not None: #interested in non blank values in column 11
        listA = [] # list which will hold my data
        row = sheet1[rowNum:rowNum] #creates a tuple of row's data
        #print (str(rowNum)) # for debugging to show what rows are copied
        for cell in row: # for each cell in the row
            listA.append(cell.value) # add each cell's data as an element in the list
        if listA[10] == 1: # condition1 I'm checking for by looking up the index in the list
            sheet2.append(listA) # appending the sheet2's next available row
        elif listA[10] > 1: # condition2 I'm checking for by looking up the index in the list
            # do something else and store it in bar
            sheet2.append(bar) # appending the sheet2's next available row
print("saving file...")
wb2.save('result.xlsx') # save file
print("Done!")
Tested with: Python 3.7 openpyxl 2.5.4
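For newer openpyxl versions (2.6 and later), the filtering can also be sketched with values_only=True, which makes iter_rows yield plain tuples of values that append accepts directly. The sample rows below are invented for illustration, and both workbooks are created in memory instead of being loaded from file1.xlsx and file2.xlsx.

```python
from openpyxl import Workbook

MARKER = "TrueValue"  # value the question filters on

# Hypothetical source rows; only some contain the marker
wb1 = Workbook()
ws1 = wb1.active
ws1.append(["id", "flag", "note"])
ws1.append([1, "TrueValue", "keep"])
ws1.append([2, "other", "drop"])
ws1.append([3, "TrueValue", "keep too"])

wb2 = Workbook()
ws2 = wb2.active

# values_only=True yields plain tuples, which append() accepts directly
for values in ws1.iter_rows(values_only=True):
    if MARKER in values:
        ws2.append(values)
```

Each matching row is appended once, even if the marker occurs in several of its cells. To restrict the copy to columns A:GH as in the question, max_col=190 can be passed to iter_rows.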
