I'm doing some testing using python-excel modules. I can't seem to find a way to delete a row in an excel sheet using these modules and the internet hasn't offered up a solution. Is there a way to delete a row using one of the python-excel modules?
In my case, I want to open an excel sheet, read the first row, determine if it contains some valid data, if not, then delete it.
Any suggestions are welcome.
xlwt provides as the module name suggests Excel writer (creation rather than modification) funcionality.
xlrd on the other hand provides Excel reader funcionality.
If your source excel file is rather simple (no fancy graphs, pivot tables, etc.), you should proceed this way:
with xlrd module read the contents of the targeted excel file, and then with xlwt module create new excel file which contains the necessary rows.
If you, however are running this on windows platform , you might be able to manipulate Excel directly through Microsoft COM objects, see old book reference.
I was having the same issue but found a walk around:
Use a custom filter process (Reader>Filter1>Filter2>...>Writer) to generate a copy of the source excel file but with a blank column inserted at the front. Let's call this file augmented.xls.
Then, read augmented.xls into a xlrd.Workbook object, rb, using xlrd.open_workbook().
Use xlutils.copy.copy() to convert rb into a xlwt.Workbook object, wb.
Set the value of the first column of each of the to-be-deleted rows as "x" (or other values as a marker) in wb.
Save wb back to augmented.xls.
Use another custom filter process to generate a resulting excel file from augmented.xls by omitting those rows with "x" in the first column and shifting all columns one column left (equivalent to deleting the first column of markers).
Information and examples of defining a filter process can be found in http://www.simplistix.co.uk/presentations/python-excel.pdf
Hope this help in some way.
You can use the library openpyxl. When opening a file it is both for reading and for writing. Then, with a simple function you can achieve that:
from openpyxl import load_workbook
wb = load_workbook(filename)
ws = wb.active()
first_row = ws[1]
# Your code here using first_row
if first_row not valid:
ws.delete_rows(1, amount=1)
Related
I am doing some excel reports for work and am given a book exported from SSRS daily. The book is nicely set up, with groupings applied to every sheet for an effect similar to pivot tables.
However the book comes with 32 sheets, and I eventually need to send out each sheet individually as a distinct report. Right now I am splitting them up manually, but I am wondering if there is a way to automate this while preserving the grouping.
I previously tried something like:
import xlrd
import pandas as pd
targetWorkbook = xlrd.open_workbook(r'report.xlsx', on_demand=True)
xlsxDoc = pd.ExcelFile('report.xlsx')
for sheet in targetWorkbook.sheet_names():
reportDF = pd.read_excel(xlsxDoc, sheet)
reportDF.to_excel("report - {}.xlsx".format(sheet))
However since I'm converting each sheet to a pandas datagrams, the grouping is lost.
There are multiple ways to read/interact with excel docs in python, but I can't find a clear way to pick out a sheet and save it as its own document without losing the grouping.
This is my full answer. I have used the Worksheets().Move() method. The main idea is to use win32com.client library.
This was tested and works on my Windows 10 system with Excel 2013 installed, and Python 3.7. The grouping format was moved intact with the worksheets. I am still working on getting the looping to work. I will revise my answer again when I get the looping to work.
My example has 3 worksheets, each with different grouping (subtotal) formats.
#
# Refined .Move() method, save new file using Active Worksheet property.
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb0 = excel.Workbooks.Open(r'C:\python\so\original.xlsx')
excel.Visible = True
# Move sheet1.
wb0.Worksheets(1).Move()
excel.Application.ActiveWorkbook.SaveAs(r'C:\python\so\sheet1.xlsx')
# Move sheet2, which is now the front sheet.
wb0.Worksheets(1).Move()
excel.Application.ActiveWorkbook.SaveAs(r'C:\python\so\sheet2.xlsx')
# Save single remaining sheet as sheet3.
wb0.SaveAs(r'C:\python\so\sheet3.xlsx')
wb0.Close()
excel.Application.Quit()
You would also need to install pywin32, which is not a standard library item.
https://github.com/mhammond/pywin32
pip install pywin32
I need to read this .xlsm database and some of the cells values I need are derived from Excel functions. To accomplish this I used:
from openpyxl import load_workbook
wb = load_workbook('file.xlsm', data_only=True, keep_vba=True)
ws = wb['Plan1']
And then, for every cell I wanted to read:
ws.cell(row=row, column=column).value
This works fine for getting the data out. But the problem comes with saving. When I do:
wb.save('file.xlsm')
It saves the file, but all the formulas inside the sheets are lost
My dilemma is reading the cell's displayed values on one of the database's sheet without modifying them, writing the code's output in a new sheet and saving it.
Read the file once in read-only and data-only mode to look at the values and another time keeping the VBA around. And save under a different name.
I'm having troubles writing something that I believe should be relatively easy.
I have a template excel file, that has some visualizations on it with a few spreadsheets. I want to write a scripts that loads the template, inserts an existing dataframe rows to specific cells on each sheet, and saves the new excel file as a new file.
The template already have all the cells designed and the visualization, so i will want to insert this data only without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a Pandas DataFrame called df which we have to insert in the Excel file ("MyWorkbook") sheet named "MySheet" from the cell B5, we can just use insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
dataframe=df,
row_range=(5, 0),
col_range=(2, 0))
0 as the value of the second element of row_range or col_range means that there is no ending row or column specified, if you need specific ending row/column you can replace 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modyfing below to suit your needs should be quite straight-forward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
I have an Excel file and I am using Python to fill its rows and columns.
I want to use the following function to find the first empty row in the table and fill it:
from xlwt import Workbook, easyxf
def next_available_row(sheet):
str_list = filter(None, sheet.col_values(1)) # error
return str(len(str_list)+1)
wb=Workbook()
sheet=wb.add_sheet('sheet1')
sheet.write(0,0,'item')
sheet.write(0,1,'cost')
sheet.write(next_available_row(sheet),0,'potato')
sheet.write(next_available_row(sheet),1,4)
but I get the following error:
AttributeError: 'sheet' object has no attribute 'col_values'
What should I do?
The library you are using, xlwt, is for writing .xls spreadsheets only, and does not have the method col_values (to read its contents), as the error message already states (correctly).
The function next_available_row() (from How to find the first empty row of a google spread sheet using python GSPREAD?) that you want to use to search for an empty cell is based on a different library, gspread, and that is apparently not for Excel files (e.g. .xls, note there are several versions of this file type).
So you probably are looking for an entirely different library, one that reads and writes Excel files.
http://www.python-excel.org/ lists several libraries (including your xlrd):
https://pypi.python.org/pypi/xlrd
https://pypi.python.org/pypi/xlwt
https://pypi.python.org/pypi/XlsxWriter
https://pypi.python.org/pypi/openpyxl
Or maybe try to manage something by reading the file first, e.g. with xlwt's sister project, xlrd.
Seems that has no col_values method on xlwt API. http://xlwt.readthedocs.io/en/latest/api.html
Maybe using together the xlrd you can reach your goal.
http://xlrd.readthedocs.io/en/latest/api.html?highlight=col_values#xlrd-sheet
I am trying to add data to an existing excel file, the problem I am facing is that the data is getting imported but the equation and the format is being deleted in original file.
I attached my code below
import xlwt
import xlrd
from xlutils.copy import copy
#open the excel file
rb=xlrd.open_workbook('Voltage_T.xlsx')
#make a writable copy of the opened excel file
wb=copy(rb)
#read the first sheet to write to within the writable copy
w_sheet=wb.get_sheet(0)
#write or modify the value at 2nd row first column
w_sheet.write(0,1,'WWW.GOOGLE.COM')
#the last step saving the work book
wb.save('Voltage_WW.xls')
You need to set formatting_info to true
rb=xlrd.open_workbook('Voltage_T.xlsx', formatting_info = True)
However xlrd doesn't support xlsx with formatting_info at the moment. So if you really have to use .xlsx you will need another library.
I didn't used it myself so I can't tell you if it's a good library but thanks to a quick search on google XlsxWriter seems to answer your needs.