I want to write multiple DataFrames of varying sizes to Excel as my code runs.
Some tables will contain source data, and other tables will contain Excel formulas that operate on that source data.
Rather than tracking the range of cells that I wrote the source data to, I want the formula DataFrame to contain an Excel reference to the source-data DataFrame.
This can be done with Excel's Defined Names or with Excel's Table feature.
For example, in my formula DataFrame I can have =INDEX(my_Defined_Name_source_data, 4, 3) * 2, and the Excel Name my_Defined_Name_source_data is all I need to index my source data.
Openpyxl documents how to write Tables here: https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html?highlight=tables
Tables don't support the merged cells that a multi-index df.to_excel() call will create.
So I'm looking at Defined Names instead. There's almost no documentation for writing Defined Names in openpyxl using
wb.defined_names.append()
This is what I've found https://openpyxl.readthedocs.io/en/stable/api/openpyxl.workbook.defined_name.html?highlight=definednames
What I'm asking for help with: how do I write a DataFrame to Excel and also give it an Excel Defined Name? Documentation and online examples are almost non-existent.
I'm also gratefully accepting suggestions for alternative approaches, since I seem to be using a feature that almost nobody else uses.
The "xlsxwriter" library allows you to create an Excel Data Table, so I wrote the following function to take a DataFrame, write it to Excel, and then transform the data to a Data Table.
def dataframe_to_excel_table(df, xl_file, xl_tablename, xl_sheet='Sheet1'):
    """
    Pass a dataframe, a filename, the name of the table and the Excel sheet name.
    Save an Excel file of the df, formatted as a named Excel 'Data table'.
    * Requires the "xlsxwriter" library ($ pip install XlsxWriter)
    :param df: a Pandas dataframe object
    :param xl_file: File name of the Excel file to create
    :param xl_sheet: String containing the sheet/tab name
    :param xl_tablename: Data table name in the Excel file
    :return: Nothing / New Excel file
    """
    # Excel doesn't like multi-indexed df's. Convert to 1 value per column/row.
    # See https://stackoverflow.com/questions/14507794
    df.reset_index(inplace=True)  # Expand multiindex
    # Write dataframe to Excel
    writer = pd.ExcelWriter(path=xl_file,
                            engine='xlsxwriter',
                            datetime_format='yyyy mm dd hh:mm:ss')
    df.to_excel(writer, index=False, sheet_name=xl_sheet)
    # Get dimensions of data to size the table
    num_rows, num_cols = df.shape
    # Make a list of dictionaries of the form [{'header': col_name}, ...]
    # to pass so the table doesn't overwrite the column header names.
    # https://xlsxwriter.readthedocs.io/example_tables.html#ex-tables
    dataframes_cols = df.columns.tolist()
    col_list = [{'header': col} for col in dataframes_cols]
    # Convert the data in the Excel file to an Excel data table
    worksheet = writer.sheets[xl_sheet]
    worksheet.add_table(0, 0,                    # begin in cell 'A1'
                        num_rows, num_cols - 1,  # header row + data rows
                        {'name': xl_tablename,
                         'columns': col_list})
    writer.close()  # close and save the file (writer.save() was removed in pandas 2.0)
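For reference, a call to the helper above might look like this (the DataFrame contents, file name, sheet name, and table name below are just placeholders):

import pandas as pd

source_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
dataframe_to_excel_table(source_df, 'source_data.xlsx',
                         'my_source_table', xl_sheet='Data')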
I fixed this by simply switching from openpyxl to XlsxWriter:
https://xlsxwriter.readthedocs.io/example_defined_name.html?highlight=names
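For anyone else landing here, this is a minimal sketch of that XlsxWriter approach (the file name, sheet name, and defined name are illustrative, not a canonical recipe): write the DataFrame first, then register a workbook-level defined name covering the range it occupies.

import pandas as pd
from xlsxwriter.utility import xl_rowcol_to_cell

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, index=False, sheet_name='Sheet1')
    num_rows, num_cols = df.shape
    # Last cell of the written block: one header row plus num_rows data rows.
    last_cell = xl_rowcol_to_cell(num_rows, num_cols - 1,
                                  row_abs=True, col_abs=True)
    # Register a workbook-level name that formulas such as
    # =INDEX(my_Defined_Name_source_data, 4, 3) can refer to.
    writer.book.define_name('my_Defined_Name_source_data',
                            f'=Sheet1!$A$1:{last_cell}')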
I wanted to automate comparing two Excel spreadsheets, updating old data (call this spreadsheet Old_Data.xlsx) with new data (from a different Excel document, called New_Data.xlsx), and placing the updated data into a different sheet in Old_Data.xlsx.
I am able to successfully create the new sheet in Old_Data.xlsx and see the changes between the two data sets. However, in the new sheet an index appears, labeling the rows of data from 0 to n. I've tried hiding this index so the information on each sheet in Old_Data.xlsx appears the same, but I cannot seem to get rid of it. See the code below:
from openpyxl import load_workbook
# import xlwings as xl
import pandas as pd
import jinja2
# Load the workbook that is going to updated with new information.
wb = load_workbook('OldData.xlsx')
# Define the file path for all of the old and new data.
old_path = 'OldData.xlsx'
new_path = 'NewData.xlsx'
# Load the data frames for each Spreadsheet.
df_old = pd.read_excel(old_path)
print(df_old)
df_new = pd.read_excel(new_path)
print(df_new)
# Keep all original information while showing the differences in information,
# and write the result to a new sheet in the workbook.
difference = pd.merge(df_old, df_new, how='right')
difference = difference.style.format.hide()
print(difference)
# Append the difference to an existing Excel File
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
difference.to_excel(writer, sheet_name="1-25-2023")
This is an image of the table in the second sheet that I am creating: (https://i.stack.imgur.com/7Amdf.jpg)
I've tried adding the code:
difference = difference.style.format.hide
to get rid of the index column, but I have not succeeded.
Pass index=False as an argument in the last line of your code. It should look something like this:
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
difference.to_excel(writer, sheet_name="1-25-2023", index = False)
I think this should solve your problem.
I have a pandas dataframe abc which I created as follows:
abc = pd.DataFrame({"A":[1,2,3],"B":[2,3,4]})
I added some additional attributes to the dataframe as follows:
abc.attrs = {"Name":"John", "Country":"Nepal"}
I'd like to save the pandas dataframe into an Excel file in xlsx or CSV format. I can do that using abc.to_excel("filename.xlsx") or abc.to_csv("filename.csv") where filename is the required name of the file.
However, I am not able to get the attributes into the saved file. I'd like to save the dataframe to an Excel file such that the first row gives the Name and the second row gives the Country, each as a key and a value in two columns (i.e. "Name, John" then "Country, Nepal"), followed by the data itself.
How can I do that?
Unfortunately, .to_excel() and .to_csv() do not provide any explicit functionality to insert meta information ahead of the actual dataframe as documented for the Excel and CSV write functions.
Regardless, one could exploit the header argument to hardcode this preamble into the frame. This can be achieved, for example, with
abc.to_csv("filename.csv", header=[str(k) + ',' + str(v) + '\n' for k,v in abc.attrs.items()])
Please note, however, that data tables store homogeneous data across rows and columns. Adding meta information on top makes the data harder to read and process. Consider adding it (a) to the file name, (b) to a distinct table, or (c) dropping it altogether.
Additionally, it should be noted that, as of now (pandas 1.4.3), the attrs feature is experimental and could change or disappear in a future version, which makes any implementation built on it brittle.
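Alternatively, here is a minimal sketch (not a pandas feature, just plain file handling; the file name is a placeholder) that writes the metadata rows first and then appends the DataFrame below them, avoiding the header hack above:

import pandas as pd

abc = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
abc.attrs = {"Name": "John", "Country": "Nepal"}

with open("filename.csv", "w", newline="") as f:
    # Write one "key,value" row per attribute ahead of the data.
    for key, value in abc.attrs.items():
        f.write(f"{key},{value}\n")
    # Then write the actual table below the metadata rows.
    abc.to_csv(f, index=False)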
I initially made an empty Excel file with column names (5 columns in each sheet) and named sheets (4 sheets).
When I try to write data (one scalar value at a time, say 5) to an Excel sheet using ExcelWriter and to_excel in pandas, it deletes the previous data as well as the other sheets.
I don't want to aggregate the data in a variable and write it all at once, because this is part of a time-consuming experiment and I want to save data regularly.
If the same can be done with plain Python (without pandas), kindly suggest how.
From the pandas documentation, you need to create an ExcelWriter that opens the Excel file in append mode:
with pd.ExcelWriter('path_to_file.xlsx', mode='a', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='sheet_name')
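If you'd rather skip pandas entirely, a minimal sketch using openpyxl directly (the file name, sheet name, and cell coordinates here are illustrative) is to load the workbook, set a single cell, and save; existing data and the other sheets are left untouched:

from openpyxl import load_workbook

# Adjust the file, sheet, and cell to your own layout.
wb = load_workbook('path_to_file.xlsx')
ws = wb['Sheet1']

# Write one scalar value into a specific cell without disturbing anything else.
ws.cell(row=2, column=3, value=5)

wb.save('path_to_file.xlsx')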
I am having a lot of trouble trying to read in a large Excel file (.xlsx) and write some of its tabs/sheets to a smaller Excel file.
In one class, I return a dict of dataframes. The key is the name of the sheet/tab that the dataframe came from (a string); the value is the actual dataframe, with all of its original columns. In this class, I extract certain dataframes from the original Excel file.
I am able to print out my key:value pairs after extracting the dataframes of my choice, and it all looks fine. However, I believe my real problem is writing the actual data to one Excel file. I only get the first dataframe, without the sheet name that it came from (it becomes the default 'Sheet1'), and nothing else.
Here is the code that writes my dict to an excel file:
def frames_to_excel(self, df_dict, path):
    """Write a dictionary of dataframes to separate sheets, within 1 file."""
    writer = pd.ExcelWriter(path, engine='xlsxwriter')
    for tab_name, dframe in df_dict.items():
        dframe.to_excel(writer, sheet_name=tab_name)
    writer.save()
- "path" is the select output path to write the whole dict to a xlsx fle.
- "df_dict" is the dict of dataframes.
I am very sorry for the confusion. My bug was not at all in the code I posted, or in any of the classes that parse the data from the original Excel file. The problem was this line of code:
excel_path = re.sub(r"(?i)original|_original", "_custom", os.path.basename(excel_path))
By calling the basename function from the os library, I was using only the file name instead of the entire full path:
writer = pd.ExcelWriter(excel_path, engine='xlsxwriter')
Therefore, I was not writing the data to the path I expected, and I was looking at old output from my program from about 5 days ago. Thanks for everyone's help.
The fix (use the proper full path that you expect):
excel_path = re.sub(r"(?i)original|_original", "_custom", excel_path)
I have an Excel sheet and I am reading it using pandas in Python.
Now I want to read the Excel file based on a column: if the column has some value, do not read that row; if the column is empty, read it and store the values in a list.
Here is a screenshot: Excel Example
Now, in the above image, when the uniqueidentifier column is 'yes' the row should not be read, but if it is empty the reading should start from that row.
How do I do that using Python, and how do I get the index so that, after I have performed some function, I can write back to that blank unique identifier column to mark the row as read?
This is possible for CSV files. There you could do:
iter_csv = pd.read_csv('file.csv', iterator=True, chunksize=100000)
df = pd.concat([chunk[chunk['UniqueIdentifier'].isna()] for chunk in iter_csv])
But pd.read_excel does not offer to return an iterator object; maybe some other Excel readers can, but I don't know which ones. Nevertheless, you could export your Excel file as CSV and use the CSV solution.
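If converting to CSV is not an option, a rough pandas-only sketch (the file name and the 'UniqueIdentifier' column name are assumptions based on the screenshot) is to read the whole sheet, keep only the rows whose identifier cell is empty, and then write the marker back:

import pandas as pd

df = pd.read_excel('file.xlsx')

# Rows whose identifier cell is empty have not been processed yet.
unread = df['UniqueIdentifier'].isna()
values = df.loc[unread].values.tolist()   # store the unread rows in a list

# ... perform whatever processing is needed on `values` ...

# Mark those rows as read and write the sheet back out.
df.loc[unread, 'UniqueIdentifier'] = 'yes'
df.to_excel('file.xlsx', index=False)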