Exporting to Excel Using Openpyxl - specific rows and columns - python

I have successfully populated a QTableWidget with data using pandas.
Now I want to export that data to specific rows and columns of an existing Excel file, so that I don't lose the stylesheet and other data in that file. Please help me find a solution that works.
The goal is to write into specific rows and columns (rows 7 to 30, columns 1 to 13) with openpyxl, modifying only the values of the existing Excel file. I know "append" adds the whole data table at the bottom of the sheet, and I don't know which function to use instead.
def kalkuacje_exportuj(self):
    columnHeaders = []
    # create column header list
    for j in range(self.ui.tableWidget.model().columnCount()):
        columnHeaders.append(self.ui.tableWidget.horizontalHeaderItem(j).text())
    df = pd.DataFrame(columns=columnHeaders)
    # create dataframe object recordset
    for row in range(self.ui.tableWidget.rowCount()):
        for col in range(self.ui.tableWidget.columnCount()):
            df.at[row, columnHeaders[col]] = self.ui.tableWidget.item(row, col).text()
    from openpyxl import load_workbook
    from openpyxl.utils.dataframe import dataframe_to_rows
    wb = load_workbook('OFERTA_SZABLON.xlsx')
    # ws1 = wb.sheetnames()
    ws1 = wb["DETALE wyceniane osobno"]
    # for row in ws1.iter_rows(min_row=7,
    #                          max_row=30,
    #                          min_col=1,
    #                          max_col=13):
    for row in range(7, 30):
        for col in range(1, 13):
            # append() always writes below the last used row, so this adds
            # the whole table again at the bottom on every pass of the loops
            for r in dataframe_to_rows(df, index=False, header=False):
                ws1.append(r)
    # for cell in row:
    #     print(cell)
    wb.save('OFERTA_SZABLON.xlsx')
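To make the difference concrete, here is a small sketch on a throwaway in-memory workbook (the values and the "footer" in row 10 are made up): append() always writes to the row after the last row that already has a cell, while ws.cell(row=..., column=..., value=...) writes exactly where you point it and leaves the rest of the sheet, including styles, alone.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Pretend the template already has something in row 10 (e.g. a footer)
ws["A10"] = "footer"

# append() does not care where you "want" the data: it goes after the last used row
ws.append([1, 2, 3])
print(ws["A11"].value)  # 1, the appended row landed on row 11

# cell() writes into one specific position and touches nothing else
ws.cell(row=7, column=1, value="first data row")
print(ws["A7"].value)  # first data row
```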

I solved the problem like this:
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

wb = load_workbook('OFERTA_SZABLON.xlsx')
ws1 = wb["DETALE wyceniane osobno"]
offset_row = 5  # the first DataFrame row goes to sheet row 1 + 5 = 6
offset_col = 0  # the first DataFrame column goes to column A
row = 1
for row_data in dataframe_to_rows(df, index=False, header=False):
    col = 1
    for cell_data in row_data:
        # cell(row, column, value) overwrites one cell in place,
        # so the template's formatting is kept
        ws1.cell(row + offset_row, col + offset_col, cell_data)
        col += 1
    row += 1
wb.save('OFERTA_SZABLON.xlsx')
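The same idea can be wrapped in a small helper that writes a block starting at a given cell and refuses to spill outside a fixed area (rows 7 to 30 and columns 1 to 13 in the question). write_block and its parameters are hypothetical names for this sketch; it accepts any iterable of row sequences, for example dataframe_to_rows(df, index=False, header=False) or df.values.tolist().

```python
from openpyxl import Workbook


def write_block(ws, rows, first_row=7, first_col=1, last_row=30, last_col=13):
    """Hypothetical helper: write rows into ws starting at (first_row, first_col).

    Only cell values are overwritten, so existing styles in the template stay.
    Raises ValueError when the data would leave the target area; checks happen
    while writing, so an oversized input can leave a partially written block.
    """
    for r_offset, row_values in enumerate(rows):
        r = first_row + r_offset
        if r > last_row:
            raise ValueError(f"data does not fit: row {r} is past row {last_row}")
        for c_offset, value in enumerate(row_values):
            c = first_col + c_offset
            if c > last_col:
                raise ValueError(f"data does not fit: column {c} is past column {last_col}")
            ws.cell(row=r, column=c, value=value)


# Usage with plain lists standing in for the DataFrame rows
wb = Workbook()
ws = wb.active
write_block(ws, [["part A", 2, 10.5], ["part B", 1, 4.0]])
print(ws["A7"].value, ws["B8"].value)  # part A 1
```

With a real template you would call load_workbook('OFERTA_SZABLON.xlsx'), pick the sheet, call write_block(ws1, dataframe_to_rows(df, index=False, header=False)) and save.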

I cannot figure this out for the life of me.
The answer above gives me an error at load_workbook('OFERTA_SZABLON.xlsx'), which makes no sense, and Workbook.load_workbook('') isn't a thing anyway.
dataframe_to_rows doesn't seem to exist either.
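For what it's worth, both names do exist, but as module-level functions that have to be imported: load_workbook comes from the top-level openpyxl package and dataframe_to_rows from openpyxl.utils.dataframe (it needs pandas installed). A minimal sketch with a small made-up DataFrame, saving to an in-memory buffer so no file is needed:

```python
import io

import pandas as pd
from openpyxl import Workbook, load_workbook  # load_workbook is a function, not a Workbook method
from openpyxl.utils.dataframe import dataframe_to_rows  # requires pandas

# Small made-up DataFrame standing in for the table data
df = pd.DataFrame({"name": ["part A", "part B"], "qty": [2, 1]})

rows = list(dataframe_to_rows(df, index=False, header=False))
print(rows)  # one plain list per DataFrame row, without index or header

# Round trip through an in-memory file to show load_workbook in use
wb = Workbook()
for r in rows:
    wb.active.append(r)
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)
reloaded = load_workbook(buffer)  # accepts a path or a file-like object
print(reloaded.active["A2"].value)
```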

Related

Reading data from excel and rewriting it with a new column PYTHON

I recently wrote a program that reads data from Excel, edits it and writes it back along with new columns. It works, but performance is the problem: with 1,000 rows it finishes in under 2 minutes, but with 10-15k rows it can take 3-4 hours. The more rows there are, the disproportionately slower it gets, which doesn't make sense to me.
My code:
Reading from the .xls file:
import xlrd

def xls_to_dict(workbook_url):
    workbook_dict = {}
    book = xlrd.open_workbook(workbook_url)
    sheets = book.sheets()
    for sheet in sheets:
        columns = sheet.row_values(0)  # header row
        rows = []
        for row_index in range(1, sheet.nrows):
            row = sheet.row_values(row_index)
            rows.append(row)
        workbook_dict[sheet.name] = rows  # store each sheet instead of returning after the first one
    return workbook_dict

data = xls_to_dict(filename)
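Writing to the Excel file:
from xlrd import open_workbook
from xlutils.copy import copy

rb = open_workbook(filename, formatting_info=True)  # formatting_info only works for .xls files
r_sheet = rb.sheet_by_index(0)
wb = copy(rb)
w_sheet = wb.get_sheet(0)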
I read about a package called pandas that reads .xlsx files and tried it, but I couldn't get the data out of the DataFrame as a dictionary, so I couldn't edit it and write it back to compare the performance.
My code:
import pandas as pd
fee = pd.read_excel(filename)
My input file has these columns:
ID, NAME, FAMILY, DOB, Country, Description
My output file should have these columns:
ID, NAME, FAMILY, DOB, Country, ModifiedDescription, NATIONALITY
Any advice will be appreciated.
You can avoid iterating over the rows yourself by loading each sheet into a DataFrame and getting its values as a list.
import pandas as pd
from openpyxl import load_workbook
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

def xls_to_dict(workbook_url):
    xl = pd.ExcelFile(workbook_url)
    workbook_dict = {}
    for sheet in xl.sheet_names:
        df = pd.read_excel(xl, sheet)
        columns = df.columns  # ends up holding the last sheet's column names
        rows = df.values.tolist()
        workbook_dict[sheet] = rows
    return workbook_dict, columns

data, columns = xls_to_dict(filename)
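On the question's point about getting a dictionary out of the DataFrame: DataFrame.to_dict(orient="records") returns one dict per row, keyed by column name, which can be edited like any dict and turned back into a DataFrame. The column names and values below are made up to mirror the question's layout. For thousands of rows, a whole-column operation such as df["Description"].str.upper() is usually much faster than looping over the dicts.

```python
import pandas as pd

# Hypothetical data shaped like the question's input columns
df = pd.DataFrame({
    "ID": [1, 2],
    "NAME": ["Anna", "Ben"],
    "Country": ["PL", "DE"],
    "Description": ["first", "second"],
})

records = df.to_dict(orient="records")  # list of {column: value} dicts, one per row
for rec in records:
    rec["Description"] = rec["Description"].upper()  # edit as a normal dict

edited = pd.DataFrame(records)  # back to a DataFrame with the same column order
print(edited)
```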
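For saving, you can also drop the for loop by working on a DataFrame:
xl = pd.ExcelFile(filename)
sheet_name = xl.sheet_names[0]  # first sheet, by index
df = pd.read_excel(xl, sheet_name)
df["DOB"] = pd.to_datetime(df["DOB"])
df["age"] = df["DOB"].apply(lambda x: abs(relativedelta(datetime.today(), x).years))
df["nationality"] = None  # placeholder: put the logic that calculates nationality here
xl.close()  # release the file before writing to it again
# writer.book = ... and writer.save() no longer work in pandas 2.0; append mode with
# if_sheet_exists='overlay' (pandas >= 1.4) writes into the existing sheet and keeps
# the workbook's other sheets. The openpyxl engine only writes .xlsx files.
with pd.ExcelWriter(filename, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name=sheet_name, index=False)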
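The speed-up comes from computing whole columns at once instead of touching cells one by one. As a hedged sketch of what the nationality logic could look like, assuming nationality can be derived from the country code (the lookup table below is invented for illustration), Series.map with a dict fills the column in one vectorised step:

```python
import pandas as pd

# Hypothetical lookup table; replace it with the real country -> nationality rules
COUNTRY_TO_NATIONALITY = {"PL": "Polish", "DE": "German", "FR": "French"}

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Country": ["PL", "FR", "XX"],
    "Description": ["a", "b", "c"],
})

# One call for the whole column; codes missing from the table become NaN, then "Unknown"
df["NATIONALITY"] = df["Country"].map(COUNTRY_TO_NATIONALITY).fillna("Unknown")
# String methods work on the whole column too
df["ModifiedDescription"] = df["Description"].str.upper()
print(df)
```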

Python-Excel: How to write lines of a row to an existing excel file?

I have searched the site but I could not find anything related to the following question.
I have an existing spreadsheet that I pull data from on a daily basis; the information in it changes every day.
What I want to do is create a file that tracks certain information from this spreadsheet: pull the data from it and write it to another spreadsheet. Adding the data to the new spreadsheet should not overwrite the existing data. I would really appreciate help with this. See code below:
import openpyxl

# load_workbook has no 'r' mode argument (its second parameter is read_only)
wb = openpyxl.load_workbook('Test_shorts.xlsx')
sheet = wb.active
rows = sheet.max_row
col = sheet.max_column
rows = rows + 1
print(rows)
new = []
for x in range(2, 3):
    for y in range(1, 10):
        z = sheet.cell(row=x, column=y).value
        new.append(z)
print(new)
If you want to copy the whole worksheet, you can use the copy_worksheet() method directly. It creates a copy of your active worksheet inside the same workbook.
I don't know your data, but I am sure you can finish it yourself. Hope this helps.
from openpyxl import load_workbook

file_name = "Test_shorts.xlsx"
wb = load_workbook(file_name)
sheet = wb.active
target = wb.copy_worksheet(sheet)  # copy of the active sheet, added to the same workbook
# you can code to append new data here
new = wb[target.title]  # get the copied sheet (the same object as target)
row_values = []
for x in range(2, 3):
    for y in range(1, 10):
        print(x, y)
        z = sheet.cell(row=x, column=y).value
        row_values.append(z)
new.append(row_values)  # append() takes a whole row (a list), not a single value
wb.save(file_name)
As mentioned in the comments, you need to loop over the cells, so I altered your code a little.
from openpyxl import load_workbook

file_name = "Test_shorts.xlsx"
wb = load_workbook(file_name)
current_sheet = wb.active
new_sheet = wb.create_sheet("New", 1)  # new sheet at position 1 (the second tab)
for row in current_sheet.rows:
    # enumerate gives a 1-based column number, whatever cell.column returns in your version
    for col, cell in enumerate(row, start=1):
        new_sheet.cell(row=cell.row, column=col, value=cell.value)
wb.save(file_name)
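The question itself asks for something slightly different: pull one row from the daily file and add it to a separate tracking workbook without overwriting earlier days. A minimal sketch of that, assuming the row of interest is row 2, columns A to I (as in the question's loops); append_row_to_tracker and its parameters are hypothetical names, and the files are created in a temporary folder so the example runs on its own.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def append_row_to_tracker(source_path, tracker_path, row=2, min_col=1, max_col=9):
    """Hypothetical helper: copy one row of the source's active sheet to the
    bottom of the tracker workbook, creating the tracker on first use."""
    # data_only=True reads the last calculated value of formula cells
    # (an assumption: we want to track values, not formulas)
    source = load_workbook(source_path, data_only=True)
    values = next(source.active.iter_rows(min_row=row, max_row=row,
                                          min_col=min_col, max_col=max_col,
                                          values_only=True))
    tracker = load_workbook(tracker_path) if os.path.exists(tracker_path) else Workbook()
    tracker.active.append(values)  # lands below the last used row, so old rows stay
    tracker.save(tracker_path)


# Usage on throwaway files: run it twice, as if on two different days
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Test_shorts.xlsx")
    wb = Workbook()
    wb.active.append([f"col{i}" for i in range(1, 10)])  # header row
    wb.active.append(list(range(1, 10)))                  # row 2: the day's numbers
    wb.save(src)

    tracker_file = os.path.join(tmp, "tracker.xlsx")
    append_row_to_tracker(src, tracker_file)
    append_row_to_tracker(src, tracker_file)
    result_rows = [list(r) for r in load_workbook(tracker_file).active.iter_rows(values_only=True)]

print(result_rows)
```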

Read specific columns from excel for python

import xlrd

workbook = xlrd.open_workbook(filename)
sheet = workbook.sheet_by_index(0)
array = []
for i in range(2, 9):
    array.append([sheet.cell(i, j).value for j in range(2, 5)])
Excel Image
I have this code and it works, but it's not doing what I want it to do: it pulls the data from all three columns of the Excel file (see the Excel image). I only want it to pull data from column C and column E and store each as a pair in the array. How can I do that? I know there are options for skipping columns and rows, but I'm not sure how to use them in the code I have.
Using openpyxl:
from openpyxl import load_workbook

def iter_rows(ws):
    result = []
    for row in ws.iter_rows():
        rowlist = []
        for cell in row:
            rowlist.append(cell.value)
        result.append(rowlist)
    return result

wb = load_workbook(filename='/home/piyush/testtest.xlsx')
first_sheet = wb.sheetnames[0]
print(first_sheet)
worksheet = wb[first_sheet]
fileList = list(iter_rows(worksheet))
col1 = []
col2 = []
for col in fileList:
    col1.append(col[2])  # list index 2 = column C (indexes are 0-based)
    col2.append(col[4])  # list index 4 = column E
pairs = []
for a in zip(col1, col2):
    print(a)
    pairs.append(a)  # append as pair in another array
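With openpyxl 2.6 or later, iter_rows() can do most of the slicing: min_row/max_row and min_col/max_col limit the range and values_only=True yields plain values, so picking columns C and E is just unpacking each tuple. A minimal sketch on a made-up in-memory sheet; Excel rows 3 to 9 correspond to the question's 0-based xlrd range(2, 9).

```python
from openpyxl import Workbook

# Made-up sheet with values in columns C, D and E on rows 3..9
wb = Workbook()
ws = wb.active
for i in range(3, 10):
    for col, letter in ((3, "c"), (4, "d"), (5, "e")):
        ws.cell(row=i, column=col, value=f"{letter}{i}")

pairs = [
    (c, e)  # keep column C and column E, skip D
    for c, _d, e in ws.iter_rows(min_row=3, max_row=9, min_col=3, max_col=5, values_only=True)
]
print(pairs[0])  # ('c3', 'e3')
```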
Using pandas:
import pandas as pd

xl = pd.ExcelFile("/home/piyush/testtest.xlsx")
df = xl.parse("Sheet1")
pairs = df.iloc[:, [2, 4]]  # positions 2 and 4 (0-based) = columns C and E
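pandas can also skip the unwanted columns while reading: read_excel(..., usecols="C,E") accepts Excel column letters, and skiprows can drop rows at the top. The sketch builds a small workbook in memory with openpyxl so it runs without a file (the sheet layout is made up); with a real file you would pass the path instead of the buffer.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Build a small .xlsx in memory: header in row 1, data in columns A..E
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["A", "B", "C", "D", "E"])
ws.append([1, 2, 3, 4, 5])
ws.append([6, 7, 8, 9, 10])
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

# Read only columns C and E; the first row becomes the column names
df = pd.read_excel(buffer, sheet_name="Sheet1", usecols="C,E")
pairs = list(df.itertuples(index=False, name=None))  # plain (C, E) tuples
print(pairs)
```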

Is there any method to get the number of rows and columns present in .xlsx sheet using openpyxl?

In xlrd,
sheet.ncols
sheet.nrows
would give the column and row count.
Is there any such method in openpyxl ?
Given a variable sheet, the number of rows and columns can be determined in one of the following ways:
openpyxl 2.x and 3.x syntax (including 3.0.5)
rows = sheet.max_row
columns = sheet.max_column
Old openpyxl 1.x syntax
rows = sheet.get_highest_row()
columns = sheet.get_highest_column()
Note that sheet.nrows and sheet.ncols are xlrd attributes, not openpyxl ones.
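One caveat: max_row and max_column report the extent of every cell openpyxl has created, not only cells holding values, so a template with formatted but empty rows (or code that merely reads a far-away cell) inflates them. A small sketch on an in-memory sheet, with a count of rows that actually contain data:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "header"
ws["B2"] = 42
print(ws.max_row, ws.max_column)  # 2 2

_ = ws["A10"].value  # just reading an empty cell creates it
print(ws.max_row)  # 10, although row 10 is empty

# Count only the rows that contain at least one value
rows_with_data = sum(
    1 for row in ws.iter_rows(values_only=True) if any(v is not None for v in row)
)
print(rows_with_data)  # 2
```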
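The xlsxwriter Worksheet object has the attributes dim_colmax, dim_colmin, dim_rowmax and dim_rowmin. xlsxwriter cannot read files, so they only describe the cells written through it in the current session, not the contents of an existing file.
Below is a small example:
import pandas as pd

RESULTS_SHEET_NAME = "Results"  # placeholder sheet name
df = pd.DataFrame({"value": [1, 2, 3]})  # placeholder data
with pd.ExcelWriter("some_excel.xlsx", engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name=RESULTS_SHEET_NAME, index=False)
    worksheet = writer.sheets[RESULTS_SHEET_NAME]  # only exists after to_excel()
    last_row = worksheet.dim_rowmax  # zero-based index of the last row written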
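This is the logic (it walks up column C, index 3, from max_row until it finds a cell with a value):
number_of_rows = sheet_obj.max_row
last_row_index_with_data = 0
while number_of_rows > 0:  # stop after row 1 so an empty column never calls cell(0, 3)
    if sheet_obj.cell(number_of_rows, 3).value is not None:
        last_row_index_with_data = number_of_rows
        break
    number_of_rows -= 1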
Building on Dani's solution (I don't have enough reputation to comment there), I added a manual cap to reduce the time spent searching. Note that it only looks at the first 1000 rows, so data below row 1000 is ignored:
## iteration to find the last row with values in it
nrows = ws.max_row
if nrows > 1000:
    nrows = 1000
lastrow = 0
while nrows > 0:
    if ws.cell(nrows, 3).value is not None:
        lastrow = nrows
        break
    nrows -= 1
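A variant of the two loops above that needs no hard-coded cap and simply returns 0 for an empty column: it walks up from max_row and stops at row 1. last_row_with_data is a hypothetical helper name; column is a 1-based index, as in ws.cell().

```python
from openpyxl import Workbook


def last_row_with_data(ws, column):
    """Hypothetical helper: return the last row in `column` that holds a value, or 0."""
    for row in range(ws.max_row, 0, -1):  # max_row, max_row - 1, ..., 1
        if ws.cell(row=row, column=column).value is not None:
            return row
    return 0  # the column is completely empty


wb = Workbook()
ws = wb.active
ws["C1"] = "head"
ws["C4"] = 7
ws["A9"] = "note in another column"
print(last_row_with_data(ws, 3))  # 4
print(last_row_with_data(ws, 2))  # 0
```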
A solution using pandas to get the row and column counts of all sheets. It uses df.shape; note that the row count does not include the header row, which pandas turns into column names.
import pandas as pd

xl = pd.ExcelFile('file.xlsx')
sheetnames = xl.sheet_names  # get sheet names
for sheet in sheetnames:
    df = xl.parse(sheet)
    dimensions = df.shape
    print('sheetname', ' --> ', sheet)
    print(f'row count on "{sheet}" is {dimensions[0]}')
    print(f'column count on "{sheet}" is {dimensions[1]}')
    print('-----------------------------')
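pd.read_excel(..., sheet_name=None) reads every sheet in one call and returns a dict of DataFrames keyed by sheet name, which shortens the loop above. The in-memory workbook (sheet names and contents) is made up so the sketch runs on its own; with a real file you would pass 'file.xlsx'.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Made-up workbook with two sheets, each with a header row
wb = Workbook()
first = wb.active
first.title = "Orders"
first.append(["id", "qty"])
first.append([1, 5])
first.append([2, 3])
second = wb.create_sheet("Notes")
second.append(["text"])
second.append(["hello"])
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

sheets = pd.read_excel(buffer, sheet_name=None)  # {sheet name: DataFrame}
for name, df in sheets.items():
    rows, cols = df.shape  # data rows only; the header row became column names
    print(f'"{name}": {rows} data rows, {cols} columns')
```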
Try xlrd (note that xlrd 2.0 and later only read legacy .xls files, so for an .xlsx file this needs an older xlrd release):
import xlrd

location = r"Filelocation\filename.xlsx"  # raw string, so "\f" is not read as an escape sequence
wb = xlrd.open_workbook(location)
s1 = wb.sheet_by_index(0)
print(" No. of rows: ", s1.nrows)
print(" No. of columns: ", s1.ncols)

python program with output in an excel spreadsheet

I am trying to write the output retrieved from a large Excel workbook to another spreadsheet using Python, but I can't: it gives me errors such as ValueError: column index (256) not an int in range(256) (raised by raise ValueError("column index (%r) not an int in range(256)" % arg)) and Exception: Unexpected data type.
I understand these errors to some extent but can't fix my code. I have written a small script below. It would be great if someone could tell me where I am going wrong and correct me.
import xlrd
import xlwt

wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')
file_location = "path/lookup_V1.xlsx"  # note: xlrd >= 2.0 only opens .xls files
workbook = xlrd.open_workbook(file_location)
sheet1 = workbook.sheet_by_index(0)
print(sheet1.name)
sheet2 = workbook.sheet_by_index(1)
print(sheet2.name)
print(workbook.nsheets)
st1c = sheet1.ncols
st2r = sheet2.nrows
st1 = st1c + 1
st2 = st2r + 1
print("fill..")
for i in range(0, st1c):
    s1 = sheet1.col_values(i)
    i + 1  # has no effect
    s1.sort()
    print(s1)
for col in range(st1c):
    for row in range(st2r):
        print("filling sheet...")
        col = col + 1  # grows on every row, so it soon passes 255, the last column xlwt allows
        row = row + 1
        ws.write(row, col)  # no value is passed, so only empty cells are written
print("here")
wb.save('testfile.xls')
Try this:
import xlrd
import xlwt

wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')
file_location = "test.xls"
workbook = xlrd.open_workbook(file_location)
sheet1 = workbook.sheet_by_index(0)
st1c = sheet1.ncols
st1r = sheet1.nrows
for col in range(st1c):
    for row in range(st1r):
        value = sheet1.col_values(col, row, row + 1)
        # According to the documentation: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.Sheet.col_values-method
        # col_values returns a slice of the values of the cells in the given column.
        # That's why we have to take index 0 below; it's not elegant, but it works.
        ws.write(row, col, value[0])
wb.save('testfile.xls')
This solution works for a single sheet. You can then turn it into a function and iterate over different sheets and workbooks.
More elegant solution:
import xlrd
import xlwt

wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')
file_location = "test.xls"
workbook = xlrd.open_workbook(file_location)
sheet1 = workbook.sheet_by_index(0)
st1c = sheet1.ncols
st1r = sheet1.nrows
for col in range(st1c):
    for row in range(st1r):
        # More elegant: read a single cell directly
        value = sheet1.cell_value(row, col)
        ws.write(row, col, value)
wb.save('testfile.xls')
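xlwt only writes the old .xls format, which is limited to 256 columns (indexes 0 to 255) and 65,536 rows; the "column index (256) not an int in range(256)" error is xlwt enforcing that column limit. If the source is an .xlsx file, a hedged alternative is to do the same cell-by-cell copy with openpyxl, which reads and writes .xlsx. The sketch copies values only (no styles) and builds a made-up 300-column source in memory so it runs as is; with real files you would use load_workbook(path) and save with target_wb.save("testfile.xlsx").

```python
from openpyxl import Workbook

# Made-up source sheet wider than the .xls format allows (300 columns)
source_wb = Workbook()
source = source_wb.active
for col in range(1, 301):
    source.cell(row=1, column=col, value=col)
    source.cell(row=2, column=col, value=col * 10)

target_wb = Workbook()
target = target_wb.active
target.title = "A Test Sheet"

# Copy every value to the same position (openpyxl rows and columns are 1-based)
for row in source.iter_rows():
    for cell in row:
        target.cell(row=cell.row, column=cell.column, value=cell.value)

print(target.max_column, target.cell(row=2, column=300).value)  # 300 3000
```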
