What is an efficient way to make excel report from python? - python

For the report I use this code:
from openpyxl import load_workbook
ReportName = "tets.xlsx"
new_row_data = [
["value", 'value2', "value3"]]
wb = load_workbook(ReportName)
# Select Second Worksheet
ws = wb.worksheets[1]
# Append 1 or 2(if multiple) new Rows - Columns A - D
for row_data in new_row_data:
    # Append Row Values
    ws.append(row_data)
wb.save(ReportName) #save the file
It raises no errors, but sometimes the report does not appear in the Excel file: the file is saved, yet when I open it I don't see the values.
Do you know a better or more robust way to generate the report automatically?

I would recommend using CSV files, which can be loaded into Excel. Python is very good at working with CSVs.
Here is a link that might be useful:
https://docs.python.org/3/library/csv.html
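As a minimal sketch of that suggestion, the report rows from the question could be appended to a CSV file with the standard csv module. The function name append_rows is an assumption made for illustration and is not part of the original code.

```python
import csv

def append_rows(csv_path, rows):
    """Append each row in `rows` to the CSV file at `csv_path`."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Calling append_rows("report.csv", [["value", "value2", "value3"]]) (the file name is hypothetical) adds one line to the file and creates the file if it does not exist yet.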

Related

Convert XLSX to CSV without losing values from formulas

I've tried a few methods, including pandas:
df = pd.read_excel('file.xlsx')
df.to_csv('file.csv')
But every time I convert my xlsx file over to csv format, I lose all data within columns that include a formula. I have a formula that concatenates values from two other cells + '#domain' to create user emails, but this entire column returns blank in the csv product.
The formula is basically this:
=CONCATENATE(B2,".",E2,"#domain")
The conversion is part of a larger code workflow, but it won't work if this column is left blank. The only thing I've tried that worked was this API, but I'd rather not pay a subscription if this can be done locally on the machine.
Any ideas? I'll try whatever you throw at me - bear in mind I'm new to this, but I will do my best!
You can try opening the Excel file with the openpyxl library in data-only mode. Instead of the raw formulas, you then get the values that Excel last calculated and stored in the file, just the way you see them in Excel itself.
import openpyxl
wb = openpyxl.load_workbook(filename, data_only=True)
Watch out: if you open your original file in data-only mode and then save it with openpyxl, all your formulas will be lost. This happened to me once and it was horrible, so I recommend working on a copy of your file.
Now that you have the workbook with values only, you can use the built-in csv library to generate a proper CSV file (idea from this post: How to save an Excel worksheet as CSV):
import csv
sheet = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as f:
    c = csv.writer(f)
    for r in sheet.iter_rows(): # generator; was sh.rows
        c.writerow([cell.value for cell in r])

Python: Write a dataframe to an already existing excel which contains a sheet with images

I have been working on this for too long now. I have an Excel file with one sheet (sheetname = 'abc') with images in it, and I want a Python script that writes a dataframe to a second, separate sheet (sheetname = 'def') in the same Excel file. Can anybody provide some example code? Every time I try to write the dataframe, the first sheet with the images gets emptied.
This is what I tried:
book = load_workbook('filename_of_file_with_pictures_in_it.xlsx')
writer = pd.ExcelWriter('filename_of_file_with_pictures_in_it.xlsx', engine = 'openpyxl')
writer.book = book
x1 = np.random.randn(100, 2)
df = pd.DataFrame(x1)
df.to_excel(writer, sheet_name = 'def')
writer.save()
book.close()
It saves the random numbers in the sheet with the name 'def', but the first sheet 'abc' now becomes empty.
What goes wrong here? Hopefully somebody can help me with this.
Interesting question! With openpyxl you can easily add values and keep the formulas, but you cannot retain the graphs. Even with the latest version (2.5.4), graphs do not survive. So I decided to address the issue with xlwings:
import numpy as np
import xlwings as xw
wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht = wb.sheets.add('SheetMod')
sht.range('A1').value = np.random.randn(100, 2)
wb.save(r"path_new_file.xlsx")
With this snippet I managed to insert the random set of values and save a new copy of the modified xlsx. When you run it, the Excel file opens automatically and shows the new sheet, without changing the existing ones (graphs and formulas included). Make sure you install all the dependencies xlwings needs to run on your system. Hope this helps!
You'll need to use an Excel 'reader' like openpyxl or similar in combination with pandas for this. pandas' to_excel function is write-only, so it does not care what is already inside the file when it opens it.
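One way to combine the two, sketched here under assumptions: pandas can ask openpyxl to load the existing workbook first by opening the writer in append mode (mode="a"), so the sheets already in the file are kept instead of the file being recreated. The helper name add_sheet is hypothetical, the if_sheet_exists argument needs pandas 1.3 or later, and whether images survive the round trip depends on what your openpyxl version can read, so try it on a copy first.

```python
import pandas as pd

def add_sheet(path, df, sheet_name):
    """Write `df` to `sheet_name` in an existing workbook, keeping other sheets."""
    # mode="a" makes pandas load the existing file with openpyxl
    # instead of starting from an empty workbook.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
```

For the question above, the call would look like add_sheet('filename_of_file_with_pictures_in_it.xlsx', df, 'def').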

pandas read excel values not formulas

Is there a way to have pandas read in only the values from excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the excel file before running the code. I am just working with the basic read excel function of pandas,
import pandas as pd
df = pd.read_excel(filename, sheet_name="Sheet1")  # older pandas spelled this sheetname
This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't go in and save the file after doing that and try to run this again, it will read the formulas as NaN instead of just the values. Is there a work around that anyone knows of that will just read values from excel with pandas?
That is strange. The normal behaviour of pandas is to read values, not formulas. The problem is likely in your Excel files: your formulas probably point to other files, or they return a value that pandas sees as NaN.
In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).
In the second case, you could solve it by setting explicit NaN values in read_excel:
pd.read_excel(path, sheet_name="Sheet1", na_values = [your na identifiers])
As for the first case, as a workaround to make your work easier, you can automate what you are doing by hand using xlwings:
import pandas as pd
import xlwings as xl
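def df_from_excel(path):
    # Open the workbook in a hidden Excel instance and save it,
    # so that Excel recalculates and stores the formula results
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)
df = df_from_excel("path/to/your/file.xlsx")  # replace with the path to your file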
If you want to keep those formulas in your excel file just save the file in a different location (book.save(different location)). Then you can get rid of the temporary files with shutil.
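One plausible explanation for the NaN values in the question, offered as an assumption rather than a diagnosis: openpyxl stores formulas but never calculates them, so a file saved from Python carries no cached results until Excel opens and saves it. A minimal sketch of that behaviour (the function name is made up for illustration):

```python
from io import BytesIO
from openpyxl import Workbook, load_workbook

def cached_value_after_openpyxl_save():
    """Save a formula with openpyxl, reload in data-only mode, return the result."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = 2
    ws["A2"] = 3
    ws["A3"] = "=A1+A2"  # stored as a formula, never evaluated by openpyxl

    # Save to memory instead of disk to keep the sketch self-contained
    buffer = BytesIO()
    wb.save(buffer)
    buffer.seek(0)

    # data_only=True asks for the cached result, which was never written
    reloaded = load_workbook(buffer, data_only=True)
    return reloaded.active["A3"].value

print(cached_value_after_openpyxl_save())
```

The printed value is None rather than 5, which is the gap that opening and saving the file in Excel (by hand or through xlwings) fills in.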
I had this problem and resolved it by moving a graph below the first row I was reading. It looks like the position of the graphs may cause problems.
You can use xlrd to read the values.
First, refresh the Excel sheet if you are also updating its values automatically with Python. You can use the function below:
import os
import xlrd
import win32com.client

file = "myxl.xls"

def refresh_file(file):
    # Start a separate Excel instance through COM (Windows only, needs pywin32)
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb = xlapp.Workbooks.Open(path)
    wb.RefreshAll()
    # Wait for pending asynchronous queries before saving
    xlapp.CalculateUntilAsyncQueriesDone()
    wb.Save()
    xlapp.Quit()
After the file refresh, you can start reading the content.
workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)
You can loop through the data however you need and apply conditions while reading it, which gives you a lot more flexibility.

getting a row using xlwt

Anyone know how to reference a given row of newSheet shown below
import xlwt
outFile = xlwt.Workbook()
newSheet = outFile.add_sheet('Sheet 1', cell_overwrite_ok=True)
#Write a bunch of data to newSheet
For example I want to reference the first row so I can find which column has a certain header.
EDIT: I'd like to be able to run this code somehow
newSheet.col(firstRow.index('some pattern')).width = 3000
xlwt is only for writing Excel files. Use xlrd for reading.
If you have written the file yourself, you should know what you wrote where.
Just keep track, in a dict or list, of where you wrote each header.
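A minimal sketch of that idea: record each header's column index in a dict at the moment it is written. The helper name write_headers and the header values are hypothetical, and write_cell stands for any function that takes (row, column, value), such as xlwt's newSheet.write.

```python
def write_headers(write_cell, headers, row=0):
    """Write `headers` into `row` and return a {header: column index} map."""
    columns = {}
    for col, header in enumerate(headers):
        write_cell(row, col, header)
        columns[header] = col  # remember where this header went
    return columns

# With xlwt this might be used as:
#   columns = write_headers(newSheet.write, ['id', 'some pattern'])
#   newSheet.col(columns['some pattern']).width = 3000
```

Because the lookup happens in the dict, nothing has to be read back from the sheet.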

Reading .xlsx format in python

I've got to read .xlsx file every 10min in python.
What is the most efficient way to do this?
I've tried using xlrd, but it doesn't read .xlsx. According to the documentation it does, but I can't get it to work: I get "Unsupported format, or corrupt file" exceptions.
What is the best way to read xlsx?
I need to read comments in cells too.
xlrd has not yet released a version that reads xlsx. In the meantime, Eric Gazoni built a package called openpyxl, which reads xlsx files and does limited writing of them.
Use openpyxl. Some basic examples:
import openpyxl
# Open Workbook
wb = openpyxl.load_workbook(filename='example.xlsx', data_only=True)
# Get All Sheets
a_sheet_names = wb.sheetnames  # get_sheet_names() is deprecated
print(a_sheet_names)
# Get Sheet Object by names
o_sheet = wb["Sheet1"]  # get_sheet_by_name() is deprecated
print(o_sheet)
# Get Cell Values
o_cell = o_sheet['A1']
print(o_cell.value)
o_cell = o_sheet.cell(row=2, column=1)
print(o_cell.value)
o_cell = o_sheet['H1']
print(o_cell.value)
# Sheet Maximum filled Rows and columns
print(o_sheet.max_row)
print(o_sheet.max_column)
There are multiple ways to read XLSX files using Python. Two are illustrated below. Both require openpyxl, and if you want to parse directly into pandas you also need pandas, e.g. pip install pandas openpyxl
Option 1: pandas direct
Primary use case: load just the data for further processing.
Using the read_excel() function in pandas would be your best choice. Note that pandas should fall back to openpyxl automatically, but in the event of format issues it's best to specify the engine directly.
import pandas as pd
df_pd = pd.read_excel("path/file_name.xlsx", engine="openpyxl")
Option 2: openpyxl direct
Primary use case: getting or editing specific Excel document elements such as comments (requested by OP), formatting properties or formulas.
Using load_workbook() followed by comment extraction using the comment attribute for each cell would be achieved by the following.
from openpyxl import load_workbook
wb = load_workbook(filename = "path/file_name.xlsx")
ws = wb.active
ws["A1"].comment # <- loop through row & columns to extract all comments
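The loop mentioned in that last comment could look roughly like this. It is a sketch, not the answerer's code: the function name collect_comments and the demo workbook are made up for illustration, and with a real file you would pass in a worksheet obtained from load_workbook() instead.

```python
from openpyxl import Workbook
from openpyxl.comments import Comment

def collect_comments(ws):
    """Return {cell coordinate: comment text} for every commented cell in `ws`."""
    comments = {}
    for row in ws.iter_rows():
        for cell in row:
            if cell.comment is not None:
                comments[cell.coordinate] = cell.comment.text
    return comments

# Small in-memory demonstration with one commented cell
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws["A1"] = "total"
demo_ws["A1"].comment = Comment("Check this value", "reviewer")
demo_ws["B2"] = 42
print(collect_comments(demo_ws))
```

Cells without a comment have cell.comment set to None, so they are skipped.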
