I have an Excel file where the first four rows contain some header text and the actual dataset starts from row 5. I am trying to build a simple function that reads the Excel file and outputs the same file after deleting the first 4 rows.
This is what my code looks like before I put it into a function.
import pandas as pd
from openpyxl import load_workbook, Workbook
wb = load_workbook('FILEPATH/excel.xlsx')
ws = wb['Sheet1']
ws = ws.delete_rows(0,4)
wb.save(r"FILEPATH/deleted_row.xlsx")
When I run the code it executes without errors, but when I try to open the Excel file it gives me errors and says that the file is corrupted. A point to note is that the Excel file has some formatting on the top rows. Is that what is causing the issue?
Any help is appreciated.
EDIT: This is what the errors look like and the file does not open.
In openpyxl, rows are numbered from 1, not 0. So, to delete the first 4 rows, change the delete_rows() call from
ws = ws.delete_rows(0,4)
to
ws.delete_rows(1, 4)
delete_rows() modifies the worksheet in place and returns None, so there is no need to assign its result back to ws.
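Wrapped in a function, the fix might look like the sketch below. The function name `drop_header_rows` and its parameters are hypothetical, and it assumes the data lives on a sheet named `Sheet1`, as in the question.

```python
from openpyxl import load_workbook


def drop_header_rows(src_path, dst_path, sheet_name="Sheet1", n_rows=4):
    """Delete the first n_rows rows of one sheet and save a copy."""
    wb = load_workbook(src_path)
    ws = wb[sheet_name]
    # Rows are 1-indexed; delete_rows changes the sheet in place.
    ws.delete_rows(1, n_rows)
    wb.save(dst_path)
```

It could then be called as `drop_header_rows("FILEPATH/excel.xlsx", "FILEPATH/deleted_row.xlsx")`, which writes the trimmed copy and leaves the source file untouched.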
Related
I have a problem when I'm trying to save and then read an Excel file in Python. This is my function:
import openpyxl
import xlrd
from xlutils.copy import copy
import pandas as pd
def write_excel():
    wb = openpyxl.load_workbook('8de69ccb60047ce5.xlsx')
    sheet = wb.active
    sheet['D18'] = 3
    wb.save('8de69ccb60047ce5.xls')
    df1 = pd.read_excel('8de69ccb60047ce5.xls', sheet_name='Лист1', header=None, skiprows=1, usecols="H,I")
    print(df1)
    workbook = xlrd.open_workbook('8de69ccb60047ce5.xls')
    worksheet = workbook.sheet_by_index(0)
    print(worksheet.cell(17, 8).value)
    print(worksheet.cell(18, 8).value)
I'm changing cell D18, saving the file, and then trying to read other cells that contain formulas, but I get nothing (cells without formulas are read correctly).
But if I open the file manually and save it in Excel, those lines of code read the cells correctly.
The problem is the line wb.save('8de69ccb60047ce5.xls'). It saves the changes to the file, but it doesn't save the file correctly (I don't know how to describe it). How can I read a cell with a formula after changing the file in Python?
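Save the file with an .xlsx extension using the save function, for example as sample_book.xlsx. openpyxl only writes the .xlsx format, so a file saved under an .xls name is still an .xlsx file internally.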
wb.save(filename = 'sample_book.xlsx')
For more info check out this link: https://www.soudegesu.com/en/post/python/create-excel-with-openpyxl/#save-file
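A likely reason the formula cells come back empty: openpyxl never evaluates formulas, so a workbook it saves carries the formula text but no cached result until Excel (or another spreadsheet application) opens, recalculates, and saves it. The sketch below illustrates that behaviour with a throwaway file whose name is made up for the demonstration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "formula_demo.xlsx")

wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["A2"] = "=A1+1"  # stored as a formula; openpyxl does not evaluate it
wb.save(path)

# Default load: formula cells return the formula text.
formula_text = load_workbook(path).active["A2"].value

# data_only=True: formula cells return the cached result, which is
# missing here because no spreadsheet application has calculated it.
cached_value = load_workbook(path, data_only=True).active["A2"].value

print(formula_text)
print(cached_value)
```

Readers such as pandas and xlrd also rely on the cached result, which is why the values only appear after the file has been opened and saved in Excel.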
I have created a basic workbook with one worksheet, and in this worksheet I have created a table (Insert>Table). Nothing complex in this table, just the values 1, 2, 3 (and the column header of course).
I have written this simple code:
import openpyxl
thefilename = r"C:\Users\Myfile.xlsx"
book = openpyxl.load_workbook(thefilename)
book.active
book.save(thefilename)
Then, when I try to open the Excel file, it is corrupted and impossible to open again.
This looks like a bug, but how can I detect whether my Excel file has a table, and how can I remove it?
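Recent openpyxl releases expose the tables of a worksheet through `ws.tables`, a dictionary-like object keyed by table name. The sketch below assumes such a release is installed; the helper names `list_tables` and `remove_tables` are hypothetical.

```python
from openpyxl import load_workbook


def list_tables(path):
    """Return {sheet title: [table names]} for every sheet with a table."""
    wb = load_workbook(path)
    found = {}
    for ws in wb.worksheets:
        names = list(ws.tables.keys())
        if names:
            found[ws.title] = names
    return found


def remove_tables(src_path, dst_path):
    """Drop every table definition (cell values stay) and save a copy."""
    wb = load_workbook(src_path)
    for ws in wb.worksheets:
        # Copy the keys first, since entries are deleted while looping.
        for name in list(ws.tables.keys()):
            del ws.tables[name]
    wb.save(dst_path)
```

Saving to a separate `dst_path` keeps the original workbook intact in case the result is not what you expect.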
Is there a way to have pandas read in only the values from Excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the Excel file before running the code. I am just working with the basic read_excel function of pandas:
import pandas as pd
df = pd.read_excel(filename, sheet_name="Sheet1")
This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't save the file manually and then run this again, it reads the formulas as NaN instead of the values. Is there a workaround that will read just the values from Excel with pandas?
That is strange. The normal behaviour of pandas is to read values, not formulas. Most likely the problem is in your Excel files: either your formulas point to other files, or they return a value that pandas sees as NaN.
In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).
In the second case, you could solve it by setting explicit NaN values in read_excel:
pd.read_excel(path, sheet_name="Sheet1", na_values=[your na identifiers])
As for the first case, as a workaround to make your work easier, you can automate what you are doing by hand using xlwings:
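import pandas as pd
import xlwings as xl
def df_from_excel(path):
    # Open the workbook in a hidden Excel instance and save it,
    # so that the formula results are stored in the file.
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)
df = df_from_excel(path)  # path to your file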
If you want to keep those formulas in your Excel file, just save the file in a different location (book.save(different location)). Then you can get rid of the temporary files with shutil.
I had this problem and resolved it by moving a graph below the first row I was reading. It looks like the position of graphs may cause problems.
You can use xlrd to read the values.
First, if you are also updating the values automatically with Python, you should refresh your Excel sheet. You can use the function below:
import os
import xlrd
import win32com.client
file = "myxl.xls"
def refresh_file(file):
    # Open the workbook in Excel, refresh and recalculate it, then save.
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb = xlapp.Workbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncQueriesDone()
    wb.Save()
    xlapp.Quit()
After the file is refreshed, you can start reading the content.
workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)
You can loop through the data however you need and add conditions while reading it, which gives you a lot more flexibility.
Is there a way I can save an xlsx file as csv? I also need to remove the formulas.
Edit: My Excel column B "price" is updated via a web service add-in every 10 seconds (stock prices). If I save the file using openpyxl with the option data_only=True, I do not get the most recent price; instead I get old values (the values stored the last time Excel read the sheet).
Original file
A        B
StockId  Price
13i      16.1353
14i      15.4252 --> formula = RTD(A3,"Last", "HSC","xxx")
New file created using openpyxl (data only true): formula removed, but price is not the most recent
A        B
StockId  Price
13i      15.1353
14i      15.3252
If I use win32com instead of openpyxl to read the Excel file, the output file still keeps the formula. Is there any way I can remove the formula?
import win32com.client
xl = win32com.client.Dispatch("Excel.Application")
wb = xl.Workbooks.Open(r"C:\Code\test.xlsx")
ws = xl.ActiveSheet
wb.SaveAs(r"C:\Code\test.csv")
wb.Close()
xl.Quit()
data_only=True applies only to reading files with openpyxl: the option is meaningless for writing files.
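To get values rather than formulas into a CSV file with openpyxl, one option is to load the workbook with `data_only=True` and write the rows out with the standard `csv` module. This is only a sketch: the function name `xlsx_values_to_csv` is hypothetical, and the values are whatever Excel cached at its last save, so live RTD prices are only as fresh as that save.

```python
import csv

from openpyxl import load_workbook


def xlsx_values_to_csv(xlsx_path, csv_path, sheet_name=None):
    """Write the cached cell values of one sheet to a CSV file."""
    wb = load_workbook(xlsx_path, data_only=True)
    # Fall back to the active sheet when no name is given.
    ws = wb[sheet_name] if sheet_name else wb.active
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)
```

For the file in the question this could be called as `xlsx_values_to_csv(r"C:\Code\test.xlsx", r"C:\Code\test.csv")`, ideally right after Excel has saved the workbook.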
I'm reading an existing Excel file using the openpyxl package and trying to save it. The file gets saved, but when I open it no data is present. I used the following code, and my requirement is to open the file in use_iterators = True mode only.
from openpyxl import load_workbook
wb = load_workbook(filename = 'large_file.xlsx', use_iterators = True)
ws = wb.get_sheet_by_name(name = 'big_data')
for row in ws.iter_rows():
    for cell in row:
        print(cell.internal_value)
wb.save("large_file.xlsx")
Can you show how to save and close the file after saving without losing the data?
Try loading with use_iterators = False, as use_iterators = True loads the data information differently, such that it may not contain all the information you wish to save.
openpyxl writes an entirely new Excel file based on the information it has read in, so it is not as if you make a small change and just update the file. (This also means that if certain features aren't supported in openpyxl, such as VB macros, they won't exist in the file you've saved.)
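In current openpyxl releases the `use_iterators` flag has, to my knowledge, been replaced by `read_only`, and leaving it at its default of `False` loads the full workbook so that it can be saved again. A minimal sketch under that assumption; the function name `print_and_copy` is hypothetical, and the sheet name is taken from the question.

```python
from openpyxl import load_workbook


def print_and_copy(src_path, dst_path, sheet_name="big_data"):
    """Load a workbook fully, print its values, and save a copy."""
    wb = load_workbook(src_path)  # read_only defaults to False
    ws = wb[sheet_name]
    for row in ws.iter_rows(values_only=True):
        print(row)
    # Saving under a different name keeps the original intact in case
    # openpyxl drops a feature it does not support.
    wb.save(dst_path)
    wb.close()
```

Loading the whole workbook costs more memory on large files, which is the trade-off against the iterator mode.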