How to flag an anomaly in a data frame (row wise)? - python

Python newbie here. I would like to flag sporadic numbers that are obviously off from the rest of the row.
In simple terms, flag numbers that don't seem to belong to their row. Numbers in the 100s and the 100000s are considered 'off from the rest'.
import pandas as pd
# initialise data of lists.
data = {'A':['R1', 'R2', 'R3', 'R4', 'R5'],
'B':[12005, 18190, 1021, 13301, 31119,],
'C':[11021, 19112, 19021,15, 24509 ],
'D':[10022,19910, 19113,449999, 25519],
'E':[14029, 29100, 39022, 24509, 412271],
'F':[52119,32991,52883,69359,57835],
'G':[41218, 52991,1021,69152,79355],
'H': [43211,7672991,56881,211,77342],
'J': [31211,42901,53818,62158,69325],
}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df.describe()
I am trying to do something exactly like this:
# I need help with step 1
# my code/pseudocode
# step 1: identify the values in each row that don't belong to the group
#         (mask is a hypothetical name for the step-1 result: a boolean DataFrame, True where a value is off)
# step 2: flag the identified values and export to Excel
style_df = mask.applymap(lambda x: "background-color: yellow" if x else "")  # flags the values that meet the criteria
with pd.ExcelWriter("flagged_data.xlsx", engine="openpyxl") as writer:
    df.style.apply(lambda x: style_df, axis=None).to_excel(writer, index=False)

I used two conditions here: one checks for values less than 1000 and the other for values greater than 99999. Cells that meet either condition are highlighted in red (a light red fill with dark red text).
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a format. Light red fill with dark red text.
format1 = workbook.add_format({'bg_color': '#FFC7CE',
'font_color': '#9C0006'})
first_row = 1
first_col = 2
last_row = len(df)
last_col = 9
worksheet.conditional_format(first_row, first_col, last_row, last_col,
{'type': 'cell',
'criteria': '<',
'value': 1000,
'format': format1})
worksheet.conditional_format(first_row, first_col, last_row, last_col,
{'type': 'cell',
'criteria': '>',
'value': 99999,
'format': format1})
# Close the Pandas Excel writer and output the Excel file
# (writer.save() was removed in pandas 2.0; close() writes the file)
writer.close()
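Since Excel's "between" test is inclusive, the two rules above can probably be collapsed into a single 'not between' rule. The sketch below assumes a small sample frame (not the question's full data) written with index=False, so the numeric block starts in Excel column 1 rather than 2; the output file name is hypothetical. Using the writer as a context manager closes, and therefore saves, it on exit.

```python
import pandas as pd

# Small sample frame shaped like the question's data (column A holds row labels)
df = pd.DataFrame({'A': ['R1', 'R2'], 'B': [12005, 15], 'C': [7672991, 19112]})

with pd.ExcelWriter('pandas_conditional_single_rule.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    red = workbook.add_format({'bg_color': '#FFC7CE', 'font_color': '#9C0006'})
    # Row 0 is the header; with index=False, column 0 holds the labels in 'A',
    # so the numeric block spans rows 1..len(df) and columns 1..df.shape[1]-1
    worksheet.conditional_format(1, 1, len(df), df.shape[1] - 1, {
        'type': 'cell',
        'criteria': 'not between',  # Excel's "between" includes both limits
        'minimum': 1000,
        'maximum': 99999,
        'format': red,
    })
```

Computing the range from len(df) and df.shape avoids hard-coding last_row and last_col when the frame grows.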

If you don't need machine-learning outlier detection or a Hampel filter, and you already know the limits of your filter, you can simply do:
def highlight_outliers(s):
    # force to numeric and coerce strings to NaN
    s = pd.to_numeric(s, errors='coerce')
    # hand-picked limits; NaN compares as False, so the label column is never highlighted
    is_outlier = (s < 1500) | (s > 1000000)
    return ['background-color: yellow' if v else '' for v in is_outlier]
styled = df.style.apply(highlight_outliers, axis=1)
styled.to_excel("flagged_data.xlsx", index=False)

I think you should define more precisely what you consider "off from the rest"; this is very important when working with data.
Do you want to flag the outliers of your column B distribution, for example? You could compute the quartiles of each distribution and collect (in a dict, for instance) the values that fall outside the quartile range, commonly more than 1.5 × the interquartile range (IQR) below the first quartile or above the third quartile. But you would obviously need more than the 5 rows you showed.
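A minimal sketch of that quartile idea, assuming the common Tukey-fence convention (flag values more than k × IQR below Q1 or above Q3, with k = 1.5). The function name iqr_outliers and the sample data are illustrative only. It works column by column, so, as noted above, it only becomes meaningful with many more rows than the question's example.

```python
import pandas as pd

def iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR], column by column."""
    num = df.apply(pd.to_numeric, errors='coerce')  # non-numeric cells become NaN
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on the column labels;
    # NaN comparisons are False, so label columns are never flagged
    return (num < q1 - k * iqr) | (num > q3 + k * iqr)

sample = pd.DataFrame({'x': [10, 12, 11, 13, 12, 100], 'y': [5, 5, 6, 5, 6, 5]})
print(iqr_outliers(sample))
```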
There are also whole fields dedicated to identifying outliers with machine learning. The assumptions you make to define what counts as "off from the rest" are very important.
Read this if you'd like more info on specifics of outlier detection:
https://towardsdatascience.com/a-brief-overview-of-outlier-detection-techniques-1e0b2c19e561
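Because the question asks for a row-wise check, here is a different, hypothetical heuristic (not one of the techniques named above): treat a value as "off" when its order of magnitude differs from the median order of magnitude of its row by more than max_log_dist. The default of 0.75 (a factor of about 5.6) is an assumption picked so that the sample data's hundreds and hundred-thousands get flagged; it would need tuning on real data.

```python
import numpy as np
import pandas as pd

def flag_row_outliers(df: pd.DataFrame, max_log_dist: float = 0.75) -> pd.DataFrame:
    """Flag cells whose order of magnitude is far from the rest of their row.

    A cell is flagged when |log10(value) - median of log10(row)| > max_log_dist.
    The 0.75 default is a hypothetical threshold tuned to the sample data.
    """
    num = df.apply(pd.to_numeric, errors='coerce')  # labels such as 'R1' become NaN
    logs = np.log10(num.where(num > 0))             # log10 is only defined for positives
    dist = logs.sub(logs.median(axis=1), axis=0).abs()
    return dist > max_log_dist                      # NaN distances compare as False

data = {'A': ['R1', 'R2', 'R3', 'R4', 'R5'],
        'B': [12005, 18190, 1021, 13301, 31119],
        'C': [11021, 19112, 19021, 15, 24509],
        'D': [10022, 19910, 19113, 449999, 25519],
        'E': [14029, 29100, 39022, 24509, 412271],
        'F': [52119, 32991, 52883, 69359, 57835],
        'G': [41218, 52991, 1021, 69152, 79355],
        'H': [43211, 7672991, 56881, 211, 77342],
        'J': [31211, 42901, 53818, 62158, 69325]}
df = pd.DataFrame(data)
mask = flag_row_outliers(df)

# Print the row label, column and value of every flagged cell
rows, cols = np.nonzero(mask.to_numpy())
for r, c in zip(rows, cols):
    print(df.iat[r, 0], df.columns[c], df.iat[r, c])
```

The returned boolean frame has the same shape as df, so it can stand in for the step 1 result in the question: map True to "background-color: yellow" and pass the styled frame to df.style.apply(..., axis=None) before exporting.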

Related

PANDAS: Stylize Dataframe [duplicate]

I'm trying to output a Pandas dataframe into an Excel file using xlsxwriter. However, I'm trying to apply some rule-based formatting: specifically, merging cells that have the same value, but I'm having trouble working out how to write the loop. (New to Python here!)
In short, I'm trying to merge the cells under the Name column when they have the same values.
Here is what I have thus far:
#This is the logic you use to merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4','value you want in merged cells', merge_format)
#Merge Car type Loop thought process...
#1.Loop through data frame where row n Name = row n -1 Name
#2.Get the length of the rows that have the same Name
#3.Based off the length run the merge_range function from xlsxwriter, worksheet.merge_range('range_found_from_loop','Name', merge_format)
for row_index in range(1, len(car_report)):
    if car_report.loc[row_index, 'Name'] == car_report.loc[row_index-1, 'Name']:
        # find starting point based off index, then get range by adding number of rows to starting point. for example lets say rows 0-2 are similar I would get 'A0:A2' which I can then put in the code below
        # from there apply worksheet.merge_range('A0:A2','[input value]', merge_format)
        pass
Any help is greatly appreciated!
Thank you!
Your logic is almost correct; however, I approached your problem in a slightly different way:
1) Sort the column, so that all equal values are grouped together.
2) Reset the index (using reset_index(), probably with drop=True).
3) Capture the rows where a new value starts. To do that, create a list and add row 1 first, since the first group always starts there.
4) Then iterate over the rows in that list and check some conditions:
4a) If a value occurs in only one row, merge_range() does not work because it cannot merge a single cell. In that case, use the write() method instead of merge_range().
4b) With this algorithm you'll get an IndexError when handling the last value of the list, because each start row is compared with the next entry in the list and the last entry has no next position. So if we get an IndexError (which means we are handling the last value), we merge or write down to the last row of the dataframe.
4c) Finally, I did not take into account whether the column contains blank or null cells; in that case the code needs to be adjusted.
Lastly, the code might look a bit confusing: keep in mind that in pandas the first data row is index 0 (the header is separate), while in xlsxwriter the header is row 0 and the first data row is row 1.
Here is a working example to achieve exactly what you want to do:
import pandas as pd
# Create a test df (already grouped by Name, so steps 1 and 2 are skipped here)
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Create the list where we'll capture the Excel rows where a name appears for the 1st time:
# add the 1st data row, then check from the 2nd row until the end of the df
startCells = [1]
for row in range(2, len(df)+1):
    if df.loc[row-1, 'Name'] != df.loc[row-2, 'Name']:
        startCells.append(row)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
lastRow = len(df)
for row in startCells:
    try:
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            worksheet.write(row, 0, df.loc[row-1, 'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1, 'Name'], merge_format)
    except IndexError:
        if row == lastRow:
            worksheet.write(row, 0, df.loc[row-1, 'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1, 'Name'], merge_format)
# writer.save() was removed in pandas 2.0; close() writes the file
writer.close()
Alternate Approach:
One can use the unique() function to find the indices of each unique value (each car name in this example). Using the test data above:
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
for car in df['Name'].unique():
    # find indices and add one to account for the header row
    u = df.loc[df['Name'] == car].index.values + 1
    if len(u) < 2:
        pass  # do not merge cells if there is only one car name
    else:
        # merge cells using the first and last indices
        # (assumes each name's rows are contiguous, e.g. after sorting)
        worksheet.merge_range(u[0], 0, u[-1], 0, car, merge_format)
writer.close()
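Both approaches above assume that each name's rows are contiguous; the unique() version, for instance, would merge straight across other names if a name reappeared later. A sketch that only merges consecutive runs, assuming a default RangeIndex, labels runs with shift() and cumsum(); merge_runs is a hypothetical helper name.

```python
import pandas as pd

def merge_runs(names: pd.Series) -> list[tuple[int, int, str]]:
    """Return (first_excel_row, last_excel_row, value) for each run of 2+ equal values.

    Assumes a default RangeIndex; +1 converts to xlsxwriter rows (row 0 is the header).
    """
    run_id = (names != names.shift()).cumsum()  # new id whenever the value changes
    runs = []
    for _, run in names.groupby(run_id):
        if len(run) > 1:  # merge_range() needs at least two cells
            runs.append((int(run.index[0]) + 1, int(run.index[-1]) + 1, run.iloc[0]))
    return runs

df = pd.DataFrame({'Name': ['Tesla', 'Tesla', 'Toyota', 'Ford', 'Ford', 'Ford'],
                   'Type': ['Model X', 'Model Y', 'Corolla', 'Bronco', 'Fiesta', 'Mustang']})
runs = merge_runs(df['Name'])
print(runs)  # [(1, 2, 'Tesla'), (4, 6, 'Ford')]
```

Each tuple can then be passed as worksheet.merge_range(first, 0, last, 0, name, merge_format) inside the same writer block as in the answers above; single-row names keep the value already written by to_excel().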
I think this is a better answer to your problem
import pandas as pd
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Use the groupby() function to group the rows by 'Name'
grouped = df.groupby('Name')
# Use the first() function to get one row per group (indexed by Name)
first_rows = grouped.first()
# 'start_row' / 'end_row': first and last index label of each group
# (idxmax() on a boolean mask only finds the first match, so use min()/max())
first_rows['start_row'] = first_rows.index.map(lambda x: df.index[df['Name'] == x].min())
first_rows['end_row'] = first_rows.index.map(lambda x: df.index[df['Name'] == x].max())
# Collect the merge ranges: +1 skips the header row, and single rows are skipped
# because merge_range() needs at least two cells (assumes contiguous rows per name)
merge_ranges = []
for name, row in first_rows.iterrows():
    if row['end_row'] > row['start_row']:
        merge_ranges.append((int(row['start_row']) + 1, 0, int(row['end_row']) + 1, 0, name))
# Write the dataframe to an excel file and apply the merge ranges
with pd.ExcelWriter('test.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    worksheet = writer.sheets['Sheet1']
    merge_format = writer.book.add_format({'align': 'center', 'valign': 'vcenter'})
    # Write the name itself into each merged range (an empty string would erase it)
    for first_row, first_col, last_row, last_col, name in merge_ranges:
        worksheet.merge_range(first_row, first_col, last_row, last_col, name, merge_format)
Alternate approach: other than xlsxwriter, you can also use a pivot table.
dataframe = pd.pivot_table(df, index=[column name...])
dataframe.to_excel('output.xlsx')  # to_excel() takes the output path (hypothetical file name)
Should "just work" with set_index() and to_excel()
my_index_cols = ['Name'] # this can also be a list of multiple columns
df.set_index(my_index_cols).to_excel('filename.xlsx', index=True, header=None)
see also: https://stackoverflow.com/a/68208815/2098573

Python - Function that adjusts columns and colors cells in multiple spread sheets

Hello everyone, I hope you are all doing well.
Currently, I have a project I am working on that deals with a lot of data and I'm creating numerous pandas DataFrames with all the data I have and trying to compile it all into an excel file with each DataFrame having its own excel sheet. What I want to do is create a function that automatically adds each sheet to the excel file, expands the columns in each sheet, and colors cells in each sheet accordingly.
For example...
sheet14 would look something like what is attached...
Each sheet looks just like this but could have various amounts of rows but always the same amount of columns.
What I want to do is color the cells of Col1 that have a length of 1 green, length of 3 yellow, length of 5 purple, and so on.
How can I do this? I can do it easily for one sheet, but automating it is tedious: the multiple-sheets part makes it difficult for me because I have never had to deal with that.
Just so you know, cycled_data_aggregate looks like,
[DataFrame, 'A', 'A']
It is a <class 'list'> which contains,
[<class 'pandas.core.frame.DataFrame'>, <class 'str'>, <class 'str'>]
Thank you all so much if you help! Hope I explained everything well enough. If not just a general explanation would help as the code I made is pretty weird likely haha! :)
import pandas as pd
import openpyxl
from openpyxl.styles import Color, PatternFill, Font, Border, Side
import xlsxwriter
from xlsxwriter.utility import xl_rowcol_to_cell
out_path = "C:\\....\\....xlsx"
# set_column() below is an xlsxwriter method, so request that engine explicitly
writer = pd.ExcelWriter(out_path, engine='xlsxwriter')
def MultipleSheetAdder(cycled_data_aggregate, overwrite_sheet_name, true_false):
    # If the function for cycled_data_aggregate returns None...
    if cycled_data_aggregate is None:
        return None
    # The sheet's data
    cycled_data = cycled_data_aggregate[0]
    # If you want to overwrite what the sheet name is called and not use the
    # cycled_data_aggregate's returned data
    if true_false:
        sheet_name = overwrite_sheet_name
    else:
        sheet_name = cycled_data_aggregate[1]
    cycled_data.to_excel(writer, sheet_name=sheet_name)
    for column in cycled_data:
        column_length = max(cycled_data[column].astype(str).map(len).max(), len(column)) + 3
        # +1 because to_excel() writes the index in column 0
        col_idx = cycled_data.columns.get_loc(column) + 1
        writer.sheets[sheet_name].set_column(col_idx, col_idx, column_length)
    # Add section here to change colors of specific rows in the first two columns depending on what
    # values they are.
    # {INSERT CODE HERE}
    return None  # Does this function need to even return anything?
MultipleSheetAdder(Function(raw_data), '', False)
writer.close()
One way to add the colours is with conditional formatting. Here is an example based on your data:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Col1': ['1.2.4', '2.2', '1.2.2', '2', '1.7.4'],
'Col2': [200, 100, 130, 140, 300],
'Col3': ['Text 1', 'Text 2', 'Text 3', 'Text 4', 'Text 5']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'bg_color': 'green'})
format2 = workbook.add_format({'bg_color': 'yellow'})
format3 = workbook.add_format({'bg_color': 'purple'})
# Apply a conditional format to the cell range.
max_row = df.shape[0]
worksheet.conditional_format(1, 0, max_row, 0, {'type': 'formula',
'criteria': '=LEN($A2)=1',
'format': format1})
worksheet.conditional_format(1, 0, max_row, 0, {'type': 'formula',
'criteria': '=LEN($A2)=3',
'format': format2})
worksheet.conditional_format(1, 0, max_row, 0, {'type': 'formula',
'criteria': '=LEN($A2)=5',
'format': format3})
# Close the Pandas Excel writer and output the Excel file
# (writer.save() was removed in pandas 2.0; close() writes the file)
writer.close()
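To cover the multi-sheet part of the question, the same conditional formats can be wrapped in a function that is called once per DataFrame with a shared writer. This is a sketch under assumptions: add_sheet and LENGTH_COLORS are hypothetical names, each frame is non-empty and has the column to color first, and the writer uses the xlsxwriter engine (set_column() and conditional_format() are xlsxwriter methods).

```python
import pandas as pd

# Hypothetical mapping from text length to fill color, following the question
LENGTH_COLORS = {1: 'green', 3: 'yellow', 5: 'purple'}

def add_sheet(writer: pd.ExcelWriter, df: pd.DataFrame, sheet_name: str) -> None:
    """Write df to its own sheet, widen its columns and color column A by text length."""
    df.to_excel(writer, sheet_name=sheet_name, index=False)
    workbook = writer.book
    worksheet = writer.sheets[sheet_name]
    # Widen each column to its longest value or header, plus some padding
    for col_idx, column in enumerate(df.columns):
        width = max(int(df[column].astype(str).map(len).max()), len(str(column))) + 3
        worksheet.set_column(col_idx, col_idx, width)
    # One formula rule per length; $A2 is relative to the first data row
    for length, color in LENGTH_COLORS.items():
        fmt = workbook.add_format({'bg_color': color})
        worksheet.conditional_format(1, 0, len(df), 0, {
            'type': 'formula',
            'criteria': f'=LEN($A2)={length}',
            'format': fmt,
        })

# Example: several frames, one sheet each, all written through one writer
sheets = {
    'sheet1': pd.DataFrame({'Col1': ['1.2.4', '2.2', '2'], 'Col2': [200, 100, 130]}),
    'sheet2': pd.DataFrame({'Col1': ['1', '1.7.4'], 'Col2': [140, 300]}),
}
with pd.ExcelWriter('multi_sheet.xlsx', engine='xlsxwriter') as writer:
    for name, frame in sheets.items():
        add_sheet(writer, frame, name)
```

Keeping the writer outside the function means the file is opened once and closed once, after the last sheet has been added.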

Python: xlsxwriter highlight cells by range without condition

I have a dataframe with 3 columns.
I would like to highlight column a in orange, column b in green and column c in yellow, but only down to the last row of data.
Using xlsxwriter, I found examples for highlighting an entire column with ".add_format", but I don't want the entire column to be highlighted.
How can I use xlsxwriter to highlight specific cells without using ".conditional_format"?
df = {'a': ['','',''],
      'b': [1,2,2],
      'c': [1,2,2]}
With xlsxwriter I apply formats in two different ways: mainly with the set_column() function (if you don't mind the format extending to the end of the sheet), and with for loops when I don't want the format to extend to the end of the sheet (for example, borders and background colors).
So this is how you can apply format to your dataframe:
import pandas as pd
# Create a test df
data = {'a': ['','',''], 'b': [1,2,2], 'c': [1,2,2]}
df = pd.DataFrame(data)
# Write the df to Excel through xlsxwriter
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define the formats
format_orange = workbook.add_format({'bg_color': 'orange'})
format_green = workbook.add_format({'bg_color': 'green'})
format_bold = workbook.add_format({'bold': True, 'align': 'center'})
# Iterate through the rows and rewrite each data cell with its format (row+1 skips the header)
for row in range(df.shape[0]):
    worksheet.write(row+1, 0, df.iloc[row,0], format_orange)
# Alternative syntax
#for row in range(df.shape[0]):
#    worksheet.write(f'A{row+2}', df.iloc[row,0], format_orange)
for row in range(df.shape[0]):
    worksheet.write(row+1, 1, df.iloc[row,1], format_green)
# Here you can use the faster set_column function as you do not apply color
worksheet.set_column('C:C', 15, format_bold)
# Finally write the file (writer.save() was removed in pandas 2.0)
writer.close()
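The loops above can be generalized into a small helper that takes a column-to-color mapping, which also covers the yellow column c from the question. This is a sketch: color_columns is a hypothetical name, and it assumes the data has no NaN values (xlsxwriter's write() rejects NaN unless the workbook is created with the nan_inf_to_errors option).

```python
import pandas as pd

def color_columns(writer: pd.ExcelWriter, df: pd.DataFrame, sheet_name: str,
                  colors: dict) -> None:
    """Write df, then rewrite the data cells of selected columns with a fill color.

    Only Excel rows 1..len(df) are rewritten (row 0 is the header), so the color
    stops at the last data row instead of running down the whole column.
    """
    df.to_excel(writer, sheet_name=sheet_name, index=False)
    workbook = writer.book
    worksheet = writer.sheets[sheet_name]
    for column, color in colors.items():
        fmt = workbook.add_format({'bg_color': color})
        col_idx = df.columns.get_loc(column)
        # tolist() turns numpy scalars into plain Python values for xlsxwriter
        for row_idx, value in enumerate(df[column].tolist(), start=1):
            worksheet.write(row_idx, col_idx, value, fmt)  # '' becomes a formatted blank

df = pd.DataFrame({'a': ['', '', ''], 'b': [1, 2, 2], 'c': [1, 2, 2]})
with pd.ExcelWriter('colored.xlsx', engine='xlsxwriter') as writer:
    color_columns(writer, df, 'Sheet1', {'a': 'orange', 'b': 'green', 'c': 'yellow'})
```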

how to do excel's 'format as table' in python

I'm using xlwt to create tables in Excel. Excel has a "Format as Table" feature, which gives the table automatic filters for each column. Is there a way to do this using Python?
You can do it with Pandas also. Here's an example:
import pandas as pd
df = pd.DataFrame({
'city': ['New York', 'London', 'Prague'],
'population': [19.5, 7.4, 1.3],
'date_of_birth': ['1625', '43', 'early 8th century'],
'status_of_magnetism': ['nice to visit', 'nice to visit', 'definitely MUST visit']
})
# initialize ExcelWriter and set df as output
writer = pd.ExcelWriter(r'D:\temp\sample.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Cities', index=False)
# worksheet is an instance of Excel sheet "Cities" - used for inserting the table
worksheet = writer.sheets['Cities']
# workbook is an instance of the whole book - used i.e. for cell format assignment
workbook = writer.book
Then define the cell format (e.g. rotate the text and set the vertical and horizontal alignment) via workbook.add_format:
header_cell_format = workbook.add_format()
header_cell_format.set_rotation(90)
header_cell_format.set_align('center')
header_cell_format.set_align('vcenter')
Then...
# create list of dicts for header names
# (columns property accepts {'header': value} as header name)
col_names = [{'header': col_name} for col_name in df.columns]
# add table with coordinates: first row, first col, last row, last col;
# header names or formatting can be inserted into dict
worksheet.add_table(0, 0, df.shape[0], df.shape[1]-1, {
    'columns': col_names,
    # 'style' = the "Format as Table" style name; it is case sensitive
    # (look up the exact name in Excel)
    'style': 'Table Style Medium 10'
})
Alternatively, worksheet.add_table('A1:D{}'.format(df.shape[0] + 1), {...}) can be used (the +1 accounts for the header row, because A1 notation is 1-based), but for a df with more columns or a shifted start position, the column letters (AA, AB, ...) would have to be calculated instead of the hard-coded "D".
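Instead of computing the letters by hand, xlsxwriter ships a helper, xl_rowcol_to_cell() in xlsxwriter.utility, that converts zero-based row/column numbers to A1 notation. A sketch of a hypothetical table_range() function built on it:

```python
import pandas as pd
from xlsxwriter.utility import xl_rowcol_to_cell

def table_range(df: pd.DataFrame, first_row: int = 0, first_col: int = 0) -> str:
    """A1-style range covering the header row plus every data row of df."""
    top_left = xl_rowcol_to_cell(first_row, first_col)
    # The header occupies first_row, so the last data row is first_row + len(df)
    bottom_right = xl_rowcol_to_cell(first_row + df.shape[0], first_col + df.shape[1] - 1)
    return f'{top_left}:{bottom_right}'

df = pd.DataFrame({'city': ['New York', 'London', 'Prague'],
                   'population': [19.5, 7.4, 1.3],
                   'date_of_birth': ['1625', '43', 'early 8th century'],
                   'status_of_magnetism': ['nice to visit', 'nice to visit', 'definitely MUST visit']})
print(table_range(df))  # A1:D4
```

The returned string can be passed as the first argument of worksheet.add_table(), which accepts A1-style ranges as well as row/column numbers, and it keeps working past column Z.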
And finally, the following loop rewrites the headers and applies header_cell_format. The headers were already written by worksheet.add_table(...), so this looks redundant, but it is a way to make Excel's AutoFit work: without it, all header cells would keep the default width (or the default height, if you use the 90° rotation), so either not all of the content would be visible, or set_shrink() would have to be applied... and then the content wouldn't be readable :).
(tested in Office 365)
# skip the loop completely if AutoFit for the header is not needed
for i, col in enumerate(col_names):
    # apply header_cell_format to the cell at [row 0, column i] and write the text value from col_names
    worksheet.write(0, i, col['header'], header_cell_format)
# close the writer to save the Excel file with the data from the DataFrame
# (writer.save() was removed in pandas 2.0)
writer.close()
OK, after searching the web, I realized that with xlwt it's not possible to do it, but with XlsxWriter it's possible and very easy and convenient.
If you want to apply table formatting to a DataFrame that you output to Excel using XlsxWriter, see the docs at https://xlsxwriter.readthedocs.io/example_pandas_table.html
Following a recommendation in the comments, here is my original, less elegant solution, format_tbl:
import pandas as pd
def format_tbl(writer, sheet_name, df):
    outcols = df.columns
    # the chr() arithmetic below only covers single letters, i.e. columns A-Z
    if len(outcols) > 26:
        raise ValueError('table width out of range for current logic')
    tbl_hdr = [{'header': c} for c in outcols]
    bottom_num = len(df) + 1  # +1 for the header row
    right_letter = chr(65 - 1 + len(outcols))
    tbl_corner = right_letter + str(bottom_num)
    worksheet = writer.sheets[sheet_name]
    worksheet.add_table('A1:' + tbl_corner, {'columns': tbl_hdr})
df = pd.DataFrame({
    'city': ['New York', 'London', 'Prague'],
    'population': [19.5, 7.4, 1.3],
    'date_of_birth': ['1625', '43', 'early 8th century'],
    'status_of_magnetism': ['nice to visit', 'nice to visit', 'definitely MUST visit']
})
fn_out = 'blah.xlsx'
with pd.ExcelWriter(fn_out, mode='w', engine='xlsxwriter') as writer:
    sheet_name = 'xxx'
    df.to_excel(writer, sheet_name=sheet_name, index=False)
    format_tbl(writer, sheet_name, df)
