Xlsxwriter: format three cell ranges in same worksheet - python

I would like to format A1:E14 as US Dollars, F1:K14 as percentages and A15:Z1000 as US Dollars. Is there a way to do this in XlsxWriter?
I know how to format full columns as Dollars/Percentages, but I don't know how to format parts of columns -- whatever I do last will overwrite Columns F:K.
The data starts out in pandas, so I'm happy to solve the problem there. The following does not seem to work:
sheet.set_column('A1:E14', None, money_format)
More Code:
with pd.ExcelWriter(write_path) as writer:
    book = writer.book
    money_fmt = book.add_format({'num_format': '$#,##0'})
    pct_fmt = book.add_format({'num_format': '0.00%'})
    # call func that creates a worksheet named total with no format
    df.to_excel(writer, sheet_name='Total', startrow=0)
    other_df.to_excel(writer, sheet_name='Total', startrow=15)
    writer.sheets['Total'].set_column('A1:E14', 20, money_fmt)
    writer.sheets['Total'].set_column('F1:K14', 20, pct_fmt)
    writer.sheets['Total'].set_column('F15:Z1000', 20, money_fmt)

I cannot see a way to achieve per-cell formatting using just XlsxWriter with pandas, but it is possible to apply the formatting in a separate step using openpyxl, as follows:
import openpyxl
import pandas as pd


def write_format(ws, cell_range, number_format):
    # Apply an Excel number format to every cell in an A1-style range
    for row in ws[cell_range]:
        for cell in row:
            cell.number_format = number_format


sheet_name = "Total"
with pd.ExcelWriter(write_path) as writer:
    # Write the unformatted data first (same calls as in the question)
    df.to_excel(writer, sheet_name=sheet_name, startrow=0)
    other_df.to_excel(writer, sheet_name=sheet_name, startrow=15)

# Re-open the saved file and format the ranges in a second pass
wb = openpyxl.load_workbook(write_path)
ws = wb[sheet_name]
money_fmt = '$#,##0_-'
pct_fmt = '0.00%'
write_format(ws, 'A1:E14', money_fmt)
write_format(ws, 'F1:K14', pct_fmt)
write_format(ws, 'F15:Z1000', money_fmt)
wb.save(write_path)
When I attempted this with XlsxWriter, writing the formats always overwrote the existing data from pandas, and if pandas is then made to rewrite the data, it overwrites any applied formatting. There does not appear to be any method to apply formatting to an existing cell without overwriting its contents. For example, the documentation for write_blank() states:
This method is used to add formatting to a cell which doesn’t contain a string or number value.
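If you are willing to skip df.to_excel() and write the cells yourself, XlsxWriter can still format per cell, because every write() call takes its own format. The sketch below assumes the three ranges from the question, a DataFrame written without its index and with its header in row 1, and that later rules win where ranges overlap; RULES, parse_range, pick_format and write_with_range_formats are hypothetical helper names, not part of XlsxWriter.

```python
import xlsxwriter
from xlsxwriter.utility import xl_cell_to_rowcol

# Hypothetical rules taken from the question; later entries win on overlap.
RULES = [("A1:E14", "money"), ("F1:K14", "pct"), ("A15:Z1000", "money")]


def parse_range(cell_range):
    """Convert an A1-style range like 'A1:E14' to zero-indexed (row1, col1, row2, col2)."""
    first, last = cell_range.split(":")
    row1, col1 = xl_cell_to_rowcol(first)
    row2, col2 = xl_cell_to_rowcol(last)
    return row1, col1, row2, col2


def pick_format(row, col, rules=RULES):
    """Return the key of the last rule whose range contains the zero-indexed cell, else None."""
    chosen = None
    for cell_range, key in rules:
        row1, col1, row2, col2 = parse_range(cell_range)
        if row1 <= row <= row2 and col1 <= col <= col2:
            chosen = key
    return chosen


def write_with_range_formats(path, df, sheet_name="Total"):
    """Write df (header in row 1, no index) with a number format chosen per cell."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet(sheet_name)
    formats = {
        "money": workbook.add_format({"num_format": "$#,##0"}),
        "pct": workbook.add_format({"num_format": "0.00%"}),
    }
    for col, name in enumerate(df.columns):
        worksheet.write(0, col, name)  # header row, left unformatted
    for row, values in enumerate(df.itertuples(index=False), start=1):
        for col, value in enumerate(values):
            # formats.get(None) is None, which means the default cell format
            worksheet.write(row, col, value, formats.get(pick_format(row, col)))
    workbook.close()
```

The trade-off is that pandas' own header styling and index column are skipped, so this suits plain tables better than styled exports.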

Related

Can Pandas to_excel support hyperlink style now?

I can't find an answer (or one I know how to implement) for using the Excel "hyperlink" style on a column when exporting with pd.to_excel.
I can find plenty of (OLD) answers on using xlsxwriter or openpyxl. But none using the current pandas functionality.
I think it might be possible now with the updates to the .style function? But I don't know how to implement the CSS2.2 rules to emulate the hyperlink style.
import pandas as pd
df = pd.DataFrame({'ID':1, 'link':['=HYPERLINK("http://www.someurl.com", "some website")']})
df.to_excel('test.xlsx')
The desired output is for the link column to use the standard blue underlined text that turns purple once you have clicked the link.
Is there a way to use the built-in Excel styling? Or would you have to pass various CSS properties through a dictionary using .style?
Here is one way to do it using xlsxwriter as the Excel engine:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2],
                   'link': ['=HYPERLINK("http://www.python.org", "some website")',
                            '=HYPERLINK("http://www.python.org", "some website")']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the default URL format.
url_format = workbook.get_default_url_format()
# Apply it to the appropriate column, and widen the column.
worksheet.set_column(2, 2, 40, url_format)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
In the output, note that the second link has been clicked and is shown in a different color.
Note that it would be preferable to use the XlsxWriter worksheet.write_url() method, since that looks like a native Excel URL to the end user and doesn't need the trick above of getting and applying the URL format. However, that method can't be used directly from a pandas DataFrame (unlike the formula), so you would need to iterate through the link column of the DataFrame and programmatically overwrite the formulas with actual links.
Something like this:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2],
                   'link': ['=HYPERLINK("http://www.python.org", "some website")',
                            '=HYPERLINK("http://www.python.org", "some website")']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test2.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the worksheet handle.
worksheet = writer.sheets['Sheet1']
# Widen the column for clarity
worksheet.set_column(2, 2, 40)
# Overwrite the urls
worksheet.write_url(1, 2, "http://www.python.org", None, "some website")
worksheet.write_url(2, 2, "http://www.python.org", None, "some website")
# Close the Pandas Excel writer and output the Excel file.
writer.close()
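The second snippet hard-codes one write_url() call per row. A minimal sketch of the loop it describes, assuming every link cell holds a formula of the exact form =HYPERLINK("url", "text") and that the index is written in column A as in the examples above; the regex and the helper names parse_hyperlink and write_links are hypothetical, not part of pandas or XlsxWriter:

```python
import re

import pandas as pd

# Matches only the simple form =HYPERLINK("url", "text"); anything else is skipped.
HYPERLINK_RE = re.compile(r'^=HYPERLINK\("([^"]*)",\s*"([^"]*)"\)$')


def parse_hyperlink(formula):
    """Return (url, text) from a simple HYPERLINK formula, or None if it doesn't match."""
    match = HYPERLINK_RE.match(str(formula))
    if match is None:
        return None
    return match.group(1), match.group(2)


def write_links(path, df, link_col="link", sheet_name="Sheet1"):
    """Write df with pandas, then overwrite each HYPERLINK formula with a native link."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
        worksheet = writer.sheets[sheet_name]
        # to_excel() writes the index in column 0, so DataFrame column j lands in column j + 1.
        col = df.columns.get_loc(link_col) + 1
        worksheet.set_column(col, col, 40)
        for i, formula in enumerate(df[link_col]):
            parsed = parse_hyperlink(formula)
            if parsed is not None:
                url, text = parsed
                # Row 0 holds the header, so data row i lands in worksheet row i + 1.
                worksheet.write_url(i + 1, col, url, None, text)
```

For the question's data, write_links("test2.xlsx", df) should match the two hard-coded write_url() calls; cells that don't match the pattern are left as pandas wrote them.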

Is there a way to write a dataframe starting from a specific cell in Python

I am trying to write a DataFrame into an existing Excel sheet, which will use my imported data to do some operations. I tried to use the openpyxl library and the dataframe_to_rows function to do it.
It did write the DataFrame to the right sheet, but it didn't start writing at the first cell (the Excel formulas expect the data to start at a specific cell).
Here is my code:
import openpyxl as op
from openpyxl.utils.dataframe import dataframe_to_rows

sheet_name = ['Zones', 'Bilan', 'CTA', "Annuel", "Ventilation"]  # The names of the different sheets where I need to import my DataFrames
vect_Data = [Data1, Data2, Data3, sortie_annuel, Ventil_data]  # The DataFrames I need to import into the sheets
wb = op.load_workbook(filename=nom_excel)
for i in range(len(sheet_name)):
    ws = wb[sheet_name[i]]
    # Clear the old values
    for cells in ws:
        for cell in cells:
            cell.value = None
    for r in dataframe_to_rows(vect_Data[i], index=False, header=True):
        ws.append(r)
wb.save(filename=nom_excel)
Is there a way to force the dataframe_to_rows function to begin from a specific cell?
Thank you for your answers.
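dataframe_to_rows() has no start-cell option, but you can place each value yourself with ws.cell(). A likely reason the original loop lands lower than expected is that setting cell.value = None does not remove the cells, so ws.append() still writes after the sheet's previous last row. A minimal sketch, assuming the target cell is given in A1 notation (write_df_at is a hypothetical helper name):

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.cell import column_index_from_string, coordinate_from_string
from openpyxl.utils.dataframe import dataframe_to_rows


def write_df_at(ws, df, top_left="A1", index=False, header=True):
    """Write df into ws so that its first row starts at the top_left cell (e.g. "C3")."""
    col_letter, start_row = coordinate_from_string(top_left)  # "C3" -> ("C", 3)
    start_col = column_index_from_string(col_letter)  # "C" -> 3
    rows = dataframe_to_rows(df, index=index, header=header)
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            ws.cell(row=start_row + r_offset, column=start_col + c_offset, value=value)


# Small demo on an in-memory workbook
wb = Workbook()
ws = wb.active
demo = pd.DataFrame({"zone": ["Z1", "Z2"], "area": [12.5, 30.0]})
write_df_at(ws, demo, "C3")
print(ws["C3"].value, ws["D5"].value)
```

In the question's loop, write_df_at(ws, vect_Data[i], "B4") would replace the ws.append() loop, where "B4" is only a placeholder for whichever cell the Excel formulas expect.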

Convert an Excel file with many sheets (with spaces in the sheet names) into a pandas DataFrame

I would like to convert an Excel file to a pandas DataFrame. All the sheet names have spaces in them, for instance 'part 1 of 22', 'part 2 of 22', and so on. In addition, the first column is the same for all the sheets.
I would like to convert this Excel file to a single DataFrame. However, I don't know what happens with the names in Python: I was able to import the sheets, but I do not know the names of the resulting DataFrames.
After this I would like to use another for loop with pd.merge() to create a single DataFrame:
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    print(sheet_name.info())
Using only the code snippet you have shown, each sheet (each DataFrame) will be assigned to the variable sheet_name. Thus, this variable is overwritten on each iteration and you will only have the last sheet as a DataFrame assigned to that variable.
To achieve what you want to do you have to store each sheet, loaded as a DataFrame, somewhere, a list for example. You can then merge or concatenate them, depending on your needs.
Try this:
all_my_sheets = []
for sheet_name in Matrix.sheet_names:
    df = pd.read_excel(Matrix, sheet_name)
    all_my_sheets.append(df)
Or, even better, using a list comprehension:
all_my_sheets = [pd.read_excel(Matrix, sheet_name) for sheet_name in Matrix.sheet_names]
You can then concatenate them into one DataFrame like this:
final_df = pd.concat(all_my_sheets, sort=False)
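Since every sheet shares the same first column, the pd.merge() step from the question can be done without naming each DataFrame: pd.read_excel(path, sheet_name=None) returns a dict keyed by sheet name (spaces included), and functools.reduce can fold pd.merge over its values. A sketch, assuming the shared key is the first column of every sheet (merge_sheets is a hypothetical helper name, and the in-memory dict stands in for the real file):

```python
from functools import reduce

import pandas as pd


def merge_sheets(sheets, key=None):
    """Merge a {sheet name: DataFrame} dict into one DataFrame on a shared key column."""
    frames = list(sheets.values())
    if key is None:
        key = frames[0].columns[0]  # assumption: the first column is the shared key
    return reduce(lambda left, right: pd.merge(left, right, on=key), frames)


# In-memory stand-in for pd.read_excel(path, sheet_name=None)
sheets = {
    "part 1 of 22": pd.DataFrame({"id": [1, 2], "a": [10, 20]}),
    "part 2 of 22": pd.DataFrame({"id": [1, 2], "b": [30, 40]}),
}
final_df = merge_sheets(sheets)
print(final_df)
```

With a real file this would be final_df = merge_sheets(pd.read_excel(path, sheet_name=None)). Columns other than the key that appear in more than one sheet will collide, so rename them first if that applies.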
You might consider using the openpyxl package:
from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(filename=file_path, read_only=True)
# Assuming all your sheets have the same header row
header = None
records = []
for name in wb.sheetnames:
    ws = wb[name]
    rows = ws.iter_rows(values_only=True)
    sheet_header = next(rows)
    if header is None:
        # Keep the header from the first sheet only, so it isn't duplicated
        header = list(sheet_header)
    records.extend(list(row) for row in rows)
wb.close()
# Create your df
df = pd.DataFrame(records, columns=header)
It may be easiest to call read_excel() once and pass it a list of sheet names, which gives you a dictionary of DataFrames keyed by sheet name.
So, the first step would look like this:
dfs = pd.read_excel(file_path, sheet_name=["Sheet 1", "Sheet 2", "Sheet 3"])
Note that the sheet names you use in the list must match those in the Excel file exactly. Then, if you wanted to concatenate these sheets vertically, you would just call:
final_df = pd.concat(list(dfs.values()), axis=0, ignore_index=True)
Columns are aligned by name, so ideally all three sheets have the same column headers; a column missing from one sheet is filled with NaN for that sheet's rows. It sounds like you want to merge the information, which would be done differently; we can't help you with the merge without more information.
I hope this helps!

Can I modify specific sheet from Excel file and write back to the same without modifying other sheets using Pandas | openpyxl

I'll try to explain my problem with an example:
Let's say I have an Excel file test.xlsx which has five tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4 and Sheet5. I want to read and modify data in Sheet2.
My sheet2 has some columns whose cells are dropdowns and those dropdown values are defined in sheet4 and sheet5. I don't want to touch sheet4 and sheet5. (I mean sheet4 & sheet5 have some references to cells on Sheet2).
I know that I can read all the sheets in an Excel file using pd.read_excel('test.xlsx', sheet_name=None), which returns all the sheets as a dictionary (OrderedDict) of DataFrames.
Now I want to modify Sheet2 and save it without disturbing the others. Is it possible to do this using the Python pandas library?
[UPDATE - 4/1/2019]
I am using pandas read_excel to read whichever sheet I need from my Excel file, validating the data against the data in a database and updating the status column in the Excel file.
To write the status column back to Excel I am using openpyxl, as shown in the pseudocode below:
import pandas as pd
import openpyxl

df = pd.read_excel(input_file, sheet_name=my_sheet_name)
df = df.where(pd.notnull(df), None)
write_data = {}
# Doing some validations with the data and building my write_data with key
# as (row_number, column_number) and value as actual value to put in that
# cell.
At the end, my write_data looks something like this:
{(2, 1): 'Hi', (2, 2): 'Hello'}
I have defined a separate class named WriteData for writing the data using openpyxl:
# WriteData(input_file, sheet_name, write_data)
book = openpyxl.load_workbook(input_file, data_only=True, keep_vba=True)
sheet = book[sheet_name]
for k, v in write_data.items():
    row_num, col_num = k
    sheet.cell(row=row_num, column=col_num).value = v
book.save(input_file)
When I run this, it removes all the formulas and diagrams. I am using openpyxl 2.6.2.
Please correct me if I am doing anything wrong! Is there a better way to do this?
Any help on this will be greatly appreciated :)
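To modify a single sheet at a time, you can use the pandas Excel writer in append mode:
sheet2 = pd.read_excel("test.xlsx", sheet_name="Sheet2")
## modify sheet2 as needed, then save it back.
# mode="a" opens the existing workbook instead of recreating it (which would drop the other sheets);
# if_sheet_exists="replace" (pandas 1.3+) rewrites only this sheet.
with pd.ExcelWriter("test.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    sheet2.to_excel(writer, sheet_name="Sheet2", index=False)
Because the sheet is rebuilt from the DataFrame, formatting and dropdowns (data validation) on Sheet2 itself are not kept.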
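For the update in the question, the loss of formulas comes from data_only=True: with that option openpyxl loads the values Excel last cached instead of the formulas, so saving writes those values back in place of the formulas. A sketch of the WriteData step with the default data_only=False (write_cells is a hypothetical name; keep_vba is only needed for macro-enabled .xlsm files, so it is omitted here):

```python
import openpyxl


def write_cells(path, sheet_name, write_data):
    """Write {(row, col): value} pairs into one sheet and save the workbook in place.

    data_only is left at its default (False), so formula cells are loaded as
    formulas rather than as their cached values and are saved back as formulas.
    """
    book = openpyxl.load_workbook(path)
    sheet = book[sheet_name]
    for (row_num, col_num), value in write_data.items():
        sheet.cell(row=row_num, column=col_num).value = value
    book.save(path)
```

openpyxl does not round-trip every Excel feature, so it is worth trying this on a copy first and checking that charts and other drawings survive.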

XlsxWriter write_formula() not working always show Zero (0)

I'm working with XlsxWriter and trying to write a simple formula into a sheet:
import xlsxwriter
workbook = xlsxwriter.Workbook('filename1.xlsx')
format_val=workbook.add_format()
worksheet = workbook.add_worksheet()
worksheet.write(0,1,5)
worksheet.write(1,1,2)
worksheet.write_formula(3,0, '=SUM(B1:B2)')
workbook.close()
The cell shows 0, and only when I press = and re-enter the formula does the output show 7.
The output is always zero (0). I know that XlsxWriter doesn’t calculate the result of a formula and instead stores the value 0 as the formula result. It then sets a global flag in the XLSX file to say that all formulas and functions should be recalculated when the file is opened.
I have tried the following. It works for me, but it doesn't feel like a proper solution, and when I try it with Upper Case it doesn't work:
format_val.set_num_format('0')  # set_num_format() changes format_val in place and returns None
worksheet.write_formula(5, 1, '=2+1', format_val, 4)  # the last argument is stored as the cached result
How can I get the calculated value to show when the file is opened?
If you use pandas.ExcelWriter with the 'xlsxwriter' engine and then close it with .close() (.save() in older pandas versions), the formula works as you expected. For your case:
import pandas as pd
writer = pd.ExcelWriter('filename1.xlsx', engine='xlsxwriter')
workbook = writer.book
format_val=workbook.add_format()
worksheet = workbook.add_worksheet()
worksheet.write(0,1,5)
worksheet.write(1,1,2)
worksheet.write_formula(3,0, '=SUM(B1:B2)')
writer.close()
However, this writes the formula in the fourth row and first column (A4). If you need it in the third row and second column (B3), you should use:
worksheet.write_formula(2,1, '=SUM(B1:B2)')
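Whichever way the file is written, XlsxWriter stores 0 as the formula's cached result unless you supply one; Excel recalculates on open, but applications that don't recalculate will show that cached value. write_formula() takes the result as an optional final argument (the slot the question's '=2+1' example fills with 4), so one option is to compute it in Python. A sketch, assuming the inputs are known when the file is written (write_sum_with_cached_result is a hypothetical helper):

```python
import xlsxwriter


def write_sum_with_cached_result(path, values=(5, 2)):
    """Write values down column B and =SUM(...) in A4 with its result precomputed."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    for row, value in enumerate(values):
        worksheet.write(row, 1, value)  # B1, B2, ...
    formula = f"=SUM(B1:B{len(values)})"
    # write_formula(row, col, formula, cell_format, value): the final argument is
    # stored as the cached result instead of XlsxWriter's default of 0.
    worksheet.write_formula(3, 0, formula, None, sum(values))
    workbook.close()
```

Calling write_sum_with_cached_result("filename1.xlsx") reproduces the question's sheet, with 7 stored as the cached result of A4.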
