Data only not working with openpyxl in python - python

I'm a beginner in Python and I'm developing a program that takes some data from one .xlsx file and puts it into another .xlsx file.
To do so, I decided to use openpyxl. Here is the beginning of my code:
path1 = "sourceFile.xlsx"
path2 = "targetFile.xlsx"
sheet1 = openpyxl.load_workbook(path1, data_only=True)
sheet2 = openpyxl.load_workbook(path2)
As you can see, I use data_only=True to take only the data from my source file. My problem is that with this setting, None is returned for a few cells of the source file. When I remove the data_only=True parameter, the formula is returned ("=B28" in this case). That's not what I want, because cell B28 of the target file does not have the same value as cell B28 of the source file.
I have already searched for solutions but, surprisingly, found nothing. If you have any ideas, they are welcome!

If B28's value in the original file is different from the one in the output file, then the issue is likely in the code you're using to copy the cells. When asked how you're extracting the cells, you gave code for extracting the value of a single cell. How are you extracting ALL the cells? With a for loop? If you share that code, we can analyze the problem further.
I'm including code that copies values from one file to another; you should be able to tweak it to your needs.
from openpyxl import load_workbook, Workbook
## VERSION 1: Output will have formulas from WB1
WB1 = load_workbook('int_column.xlsx')
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
## VERSION 2: Output will have value displayed in cells in WB1
WB1 = load_workbook('int_column.xlsx', data_only=True)
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
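One possible cause of the None values, offered as an assumption because the question does not say how sourceFile.xlsx was produced: with data_only=True, openpyxl does not evaluate formulas. It returns the result that was cached in the file the last time a spreadsheet application calculated and saved it. A file written by openpyxl itself carries no cached results, so its formula cells read back as None. The sketch below reproduces this with a temporary file; the name demo.xlsx is a placeholder chosen for the example.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a small workbook containing one formula and save it with openpyxl.
# The path is a hypothetical placeholder inside a temporary directory.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.xlsx")

wb = Workbook()
ws = wb.active
ws["B28"] = 42
ws["A1"] = "=B28"
wb.save(demo_path)

# Default load: the formula text is returned.
formula_value = load_workbook(demo_path).active["A1"].value

# data_only=True: openpyxl looks for a cached result, and none was stored
# because no spreadsheet application has calculated and saved this file.
cached_value = load_workbook(demo_path, data_only=True).active["A1"].value

print(repr(formula_value))  # '=B28'
print(repr(cached_value))   # None
```

If this is what is happening, opening the source file in Excel (or another application that recalculates formulas) and saving it once should store the cached values that data_only=True reads.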
Please post more code if you need further assistance.

Related

Openpyxl copy and paste as values in new workbook

I am trying to copy the first 100 rows in a source file to a new destination file with openpyxl. My source file has formulas, but I want to copy and paste as values in the new workbook. When I add data_only=True, (see code below), it copies only the values of my source sheet and therefore not the data in the formula cells - these are just empty in the destination file. How do I copy everything and paste as values in the destination sheet?
WB1 = load_workbook("sample_book.xlsx")
WB1_WS1 = WB1["Ark2"]
WB2 = Workbook()
#Create new worksheet in new workbook
for i in range(1,2):
    WB2.create_sheet(f"WS{i}")
#Delete first sheet
WB2.remove(WB2.worksheets[0])
#Define the ranges and sheets
copy_ranges = [100]
copy_to_sheets = ["WS1"]
# Copy the values from the rows in WB1 to WB2
for i in range (len(copy_ranges, data_only=True)):
    #Set the sheet to copy to
    ws = WB2[copy_to_sheets[i]]
    #initialize row offset
    offset = 1
    for s in range (i):
        offset+=copy_ranges[s]
    #copy the row and append
    for j in range(offset, offset + copy_ranges[i]):
        #if j==0:
        #    continue
        for row in WB1_WS1.iter_rows(min_row=j,max_row=j,min_col=1,max_col=WB1_WS1.max_column):
            values_row = [cell.value for cell in row]
            ws.append(values_row)
#save
WB2.save("WB2.xlsx")
You are using len() incorrectly. len() returns the number of items in a list and accepts no keyword arguments, so len(copy_ranges, data_only=True) raises a TypeError; data_only is a parameter of load_workbook(). copy_ranges is a 1-item list, so len(copy_ranges) = 1. If you want to access the first item in the list, you need to use an index: copy_ranges[0] = 100.
I don't quite follow the 'offset' part of the code, and there is an issue with
offset = 1
for s in range (i):
    offset+=copy_ranges[s]
On any iteration where i > 1, s will reach 1 or more, which means offset+=copy_ranges[s] will throw an IndexError, because copy_ranges is a 1-item list and you are trying to access a non-existent element.
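The difference between len() and indexing, and a safer way to compute the offset, can be sketched in plain Python. This assumes the offset is meant to be the first source row of block i, which the question does not state explicitly; block_start_row is a hypothetical helper name.

```python
copy_ranges = [100]

# len() counts the items in the list; an index retrieves one item.
n_blocks = len(copy_ranges)      # number of blocks to copy
first_block = copy_ranges[0]     # size of the first block


def block_start_row(copy_ranges, i):
    """Return the 1-based first row of block i.

    Assumes block i starts right after all earlier blocks. Slicing with
    [:i] never raises IndexError, unlike indexing each element by hand.
    """
    return 1 + sum(copy_ranges[:i])


print(n_blocks, first_block)                                   # 1 100
print([block_start_row([100, 50, 25], i) for i in range(3)])   # [1, 101, 151]
```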
Here are two ways to copy the first 100 rows.
If you want the formulas in WB2, don't pass the data_only parameter:
## VERSION 1: Output will have formulas from WB1
from openpyxl import load_workbook, Workbook
WB1 = load_workbook('int_column.xlsx')
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
If you set data_only=True, the value displayed in each cell of WB1 will be copied to WB2:
## VERSION 2: Output will have value displayed in cells in WB1
WB1 = load_workbook('int_column.xlsx', data_only=True)
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100: # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
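A more compact variant of the same copy is sketched below, using iter_rows() with values_only=True together with append(). It is an alternative to the cell-by-cell loop, not the answer's own code; copy_first_rows is a hypothetical helper, and the workbooks are created in memory so the example runs without any input file. With a workbook loaded from disk, whether formulas or stored values are copied still depends on the data_only argument passed to load_workbook().

```python
from openpyxl import Workbook


def copy_first_rows(src_ws, dst_ws, n_rows=100):
    """Append the values of the first n_rows rows of src_ws to dst_ws."""
    last_row = min(n_rows, src_ws.max_row)
    # values_only=True yields tuples of cell values instead of Cell objects.
    for values in src_ws.iter_rows(min_row=1, max_row=last_row, values_only=True):
        dst_ws.append(values)


# In-memory demonstration: five source rows, copy only the first three.
src_wb = Workbook()
src = src_wb.active
for i in range(1, 6):
    src.append([i, i * 10])

dst_wb = Workbook()
dst = dst_wb.active
copy_first_rows(src, dst, n_rows=3)

print(list(dst.iter_rows(values_only=True)))
```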

Copy column to another sheet in Python

I've been trying to copy a variable length column to another sheet through openpyxl. What I'm looking to do is copy, for example, column B from row 2 up to row = sheet.max_row and paste it into another sheet within the same workbook. Specifying the first cell in the sheet in which it will start pasting in would be nice too.
I've tried following this tutorial (copy and paste cell ranges into another workbook), to no avail.
So far I have my code set up like this:
import openpyxl
wb = openpyxl.load_workbook('workbook1.xlsx')
wb.create_sheet('sheet2') # this is where I want the cells to be pasted into
sheet = wb['sheet1'] # name of the sheet that is being analyzed
wb.save('workbook1.xlsx') #
Does anyone have any code that could help? If not, what resources are available to look at for information on how to solve this problem?
ws1 = wb.active # source
ws2 = wb['sheet2'] # destination
for cell in ws1['B:B']: #column B
    print('Printing from ' + str(cell.column) + str(cell.row))
    ws2.cell(row = cell.row, column = 1, value = cell.value)
wb.save('workbook1.xlsx')
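The question also asks to start from row 2 and to choose the first destination cell, which the loop above does not cover. A minimal sketch of one way to do that follows; copy_column and its parameters are hypothetical names, and the workbook is built in memory rather than loaded from workbook1.xlsx.

```python
from openpyxl import Workbook


def copy_column(src_ws, dst_ws, src_col, first_row=2, dst_row=1, dst_col=1):
    """Copy column src_col of src_ws, from first_row down to max_row,
    into dst_ws starting at cell (dst_row, dst_col)."""
    rows = src_ws.iter_rows(min_row=first_row, max_row=src_ws.max_row,
                            min_col=src_col, max_col=src_col, values_only=True)
    # Each row is a 1-tuple because only one column is requested.
    for offset, (value,) in enumerate(rows):
        dst_ws.cell(row=dst_row + offset, column=dst_col, value=value)


# In-memory demonstration with a header row in row 1.
wb = Workbook()
sheet1 = wb.active
sheet1.title = "sheet1"
for row in (["id", "name"], [1, "a"], [2, "b"], [3, "c"]):
    sheet1.append(row)
sheet2 = wb.create_sheet("sheet2")

# Copy column B (index 2) from row 2 onward, pasting at A3 of sheet2.
copy_column(sheet1, sheet2, src_col=2, first_row=2, dst_row=3, dst_col=1)
copied = [sheet2.cell(row=r, column=1).value for r in range(3, 6)]
print(copied)
```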

Excel field to Excel field comparison using Python

I have a requirement where I need to compare two Excel files and create a third Excel file containing True (where the column value matches) and False (where the match fails), using Python.
Can someone please assist with a piece of code and an explanation?
Much appreciated, thanks in advance.
If you could specify what tools you plan on using, that would be great. We can accomplish the task in Python using the openpyxl library.
Assuming that you are using python 3 with openpyxl, and your files are located in directory "C:\Users\Me\files" and are called "file1.xlsx" and "file2.xlsx":
import os
import openpyxl
from openpyxl.utils import get_column_letter
path = 'C:\\Users\\Me\\files'
# open the Excel workbooks; os.path.join adds the missing path separator
wb1 = openpyxl.load_workbook(os.path.join(path, 'file1.xlsx'))
ws1 = wb1.active
wb2 = openpyxl.load_workbook(os.path.join(path, 'file2.xlsx'))
ws2 = wb2.active
# create new workbook
wb3 = openpyxl.Workbook()
ws3 = wb3.active
wb3.save(os.path.join(path, 'file3.xlsx'))
# compare each element; rows and columns are 1-based, so start at 1
# (get_column_letter(0) raises a ValueError)
for row in range(1, ws1.max_row + 1):
    for column in range(1, ws1.max_column + 1):
        column_letter = get_column_letter(column)
        cell = column_letter + str(row)
        if ws1[cell].value == ws2[cell].value:
            ws3[cell].value = 'True'
        else:
            ws3[cell].value = 'False'
wb3.save(os.path.join(path, 'file3.xlsx'))
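The comparison can also be wrapped in a function. The sketch below is one possible variant, not the answer's own code: compare_sheets is a hypothetical helper that walks the larger of the two used ranges, so extra rows or columns in either sheet show up as False, and it writes Python booleans rather than the strings 'True' and 'False'. The workbooks are built in memory to keep the example self-contained.

```python
from openpyxl import Workbook


def compare_sheets(ws_a, ws_b, ws_out):
    """Write True/False into ws_out for every cell position used by either sheet."""
    max_row = max(ws_a.max_row, ws_b.max_row)
    max_col = max(ws_a.max_column, ws_b.max_column)
    for r in range(1, max_row + 1):          # openpyxl indices are 1-based
        for c in range(1, max_col + 1):
            same = ws_a.cell(row=r, column=c).value == ws_b.cell(row=r, column=c).value
            ws_out.cell(row=r, column=c, value=same)


# In-memory demonstration: the second sheet differs in one cell and has an extra row.
wb_a, wb_b, wb_out = Workbook(), Workbook(), Workbook()
ws_a, ws_b, ws_out = wb_a.active, wb_b.active, wb_out.active
for row in ([1, 2], [3, 4]):
    ws_a.append(row)
for row in ([1, 9], [3, 4], [5, 6]):
    ws_b.append(row)

compare_sheets(ws_a, ws_b, ws_out)
result = [list(r) for r in ws_out.iter_rows(values_only=True)]
print(result)
```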

How to filter column data using openpyxl

I am trying to apply a filter to an existing Excel file, and export it to another Excel file. I would like to extract rows that only contain the value 16, then export the table to another excel file (as shown in the picture below).
I have tried reading the openpyxl documentation multiple times and googling for solutions but I still can't make my code work. I have also attached the code and files below
import openpyxl
# Create a reference to each Excel workbook
wb1 = openpyxl.load_workbook('test_data.xlsx')
wb2 = openpyxl.load_workbook('test_data_2.xlsx')
# Reference the worksheets in each workbook
sh1 = wb1["data_set_1"]
sh2 = wb2["Sheet1"]
sh1.auto_filter.ref = "A:A"
sh1.auto_filter.add_filter_column(0, ["16"])
sh1.auto_filter.add_sort_condition("B2:D6")
sh1_row_number = sh1.max_row
sh1_col_number = sh1.max_column
rangeSelected = []
for i in range(1, sh1_row_number+1, 1):
    rowSelected = []
    for j in range(1, sh1_col_number+1, 1):
        rowSelected.append(sh1.cell(row = i, column = j))
    rangeSelected.append(rowSelected)
    del rowSelected
for i in range(1, sh1_row_number+1, 1):
    for j in range(1, sh1_col_number+1, 1):
        sh2.cell(row = i, column = j).value = rangeSelected[i-1][j-1].value
wb1.save("test_data.xlsx")
wb2.save("test_data_2.xlsx")
The picture shows the desired result.
The auto filter doesn't actually filter the data, it is just for visualization.
You probably want to filter while looping through the workbook. Please note with this code I assume you have the table headers already in the second workbook. It does not overwrite the data, it appends to the table.
import openpyxl
# Create a reference to each Excel workbook
wb1 = openpyxl.load_workbook('test_data.xlsx')
wb2 = openpyxl.load_workbook('test_data_2.xlsx')
# Reference the worksheets in each workbook
sh1 = wb1["data_set_1"]
sh2 = wb2["data_set_1"] # use same sheet name, different workbook
for row in sh1.iter_rows():
    if row[0].value == 16: # filter on first column with value 16
        sh2.append((cell.value for cell in row))
wb1.save("test_data.xlsx")
wb2.save("test_data_2.xlsx")
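The point that the auto filter does not actually filter the data can be checked on a small in-memory workbook. In this sketch the column names and values are made up for illustration; it configures a filter the same way the question does and then shows that iterating the sheet still returns every row, while a plain Python condition does the actual filtering.

```python
from openpyxl import Workbook

# Hypothetical sample data: a header row plus four data rows.
wb = Workbook()
ws = wb.active
ws.append(["value", "label"])
for v in (16, 3, 16, 7):
    ws.append([v, f"row-{v}"])

# Configure a filter on the first column of the range A1:B5.
ws.auto_filter.ref = "A1:B5"
ws.auto_filter.add_filter_column(0, ["16"])

# The filter is only stored as settings for the spreadsheet application:
# every data row is still returned when iterating.
all_values = [row[0] for row in ws.iter_rows(min_row=2, values_only=True)]

# Filtering in Python instead, as in the loop above.
kept = [row for row in ws.iter_rows(min_row=2, values_only=True) if row[0] == 16]

print(all_values)
print(kept)
```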

Python-Excel: How to write lines of a row to an existing excel file?

I have searched the site but I could not find anything related to the following question.
I have an existing spreadsheet that I am going to pull data from on a daily basis; the information in the spreadsheet will change every day.
What I want to do is create a file that tracks certain information from it: pull the data from the spreadsheet and write it to another spreadsheet. Adding the data to the new spreadsheet should not overwrite the existing data. I would really appreciate help with this. See the code below:
import os
import openpyxl
import xlrd
wb=openpyxl.load_workbook('Test_shorts.xlsx','r')
sheet = wb.active
rows = sheet.max_row
col = sheet.max_column
rows = rows+1
print(rows)
new =[]
for x in range (2, 3):
    for y in range(1,10):
        z= sheet.cell(row=x,column=y).value
        new.append(z)
print(new)
If you want to copy the whole worksheet, you can use the workbook's copy_worksheet() method directly. It creates a copy of the worksheet you pass in, here the active one.
I don't know your data, but I am sure you can finish it yourself. I hope this helps.
from openpyxl import load_workbook
file_name = "Test_shorts.xlsx"
wb = load_workbook(file_name)
sheet = wb.active
target = wb.copy_worksheet(sheet) # copy_worksheet() returns the copied sheet
# you can code to append new data here
for x in range (2, 3):
    new_row = [] # append() expects a whole row (a list), not a single value
    for y in range(1,10):
        print(x,y)
        z= sheet.cell(row=x,column=y).value
        new_row.append(z)
    target.append(new_row) # add the collected row below the copied data
wb.save(file_name)
As commented, a loop over the cells is required, so I altered your code a little.
from openpyxl import load_workbook
file_name = "Test_shorts.xlsx"
wb = load_workbook(file_name)
current_sheet = wb.active
new_sheet = wb.create_sheet("New", 1)
for row in current_sheet.rows:
    col = 0 # set the column to 0 when 1 row ends
    for cell in row:
        col += 1 # cell.column will return 'ABC's so I defined col for the column
        new_sheet.cell(cell.row, col, cell.value)
wb.save(file_name)
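For the "should not overwrite the existing data" part of the question, one approach is to keep a separate tracking workbook and append one row to it per run. The sketch below assumes that layout; append_row and the file name tracking.xlsx are hypothetical, and a temporary directory stands in for the real location. It relies on Worksheet.append(), which writes below the last used row.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def append_row(log_path, values):
    """Append one row to the workbook at log_path, creating the file if missing.

    Returns the number of the last used row after the append.
    """
    if os.path.exists(log_path):
        wb = load_workbook(log_path)   # keep the rows already stored
    else:
        wb = Workbook()                # first run: start a new workbook
    ws = wb.active
    ws.append(list(values))
    wb.save(log_path)
    return ws.max_row


# Demonstration with made-up values, simulating two daily runs.
log_path = os.path.join(tempfile.mkdtemp(), "tracking.xlsx")
first = append_row(log_path, ["day1", 10])
second = append_row(log_path, ["day2", 12])

stored = list(load_workbook(log_path).active.iter_rows(values_only=True))
print(first, second, stored)
```

In the question's setting, the values passed in would be the list built from row 2 of Test_shorts.xlsx.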
