I am using Python's XlsxWriter package to format an Excel report that I am generating from a MySQL query.
The problem is that the SQL query returns its columns dynamically, so there is no way of knowing beforehand how many columns will be returned. I am trying to set a border on only the returned columns, but so far I have only been able to hard-code the number of columns (A:DC). Can anyone help me with this? I am using the following code:
worksheet = writer.sheets['Sheet1']
formater = workbook.add_format({'border':1})
worksheet.set_column('A:DC',15,formater)
writer.save()
Set the range dynamically, based on the length of the data you receive:
data = [...]  # one returned row; its length is the number of columns
worksheet.set_column(0, len(data) - 1, 15, formater)
See the set_column() docs for reference.
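For example, if the query result is already in a pandas DataFrame, the column count can be taken from the frame itself (a minimal sketch; the file name, sheet name, and the small demo DataFrame are only placeholders for your own query result):

import pandas as pd

# Stand-in for the dynamically-shaped query result
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

with pd.ExcelWriter('report.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    formater = workbook.add_format({'border': 1})
    # The last column index is len(df.columns) - 1, so only the columns
    # actually returned by the query get the width and the border.
    worksheet.set_column(0, len(df.columns) - 1, 15, formater)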
[ 10-07-2022 - For anyone stopping by with the same issue: after much searching, I have yet to find a way that isn't convoluted and complicated to accurately pull mixed-type data from Excel using Pandas/Python. My solution is to convert the files using unoconv on the command line, which preserves the formatting, then read into pandas from there. ]
I have to concatenate thousands of individual Excel workbooks, each with a single sheet, into one master sheet. I use a for loop to read them into a data frame, then concatenate that data frame to a master data frame. There is one column in each that could represent currency, percentages, or just contain notes. Sometimes it has been filled out with explicit indicators in the cell, e.g. '$'; other times, someone has used cell formatting to indicate currency while leaving just a decimal in the cell. I've been using a formatting routine to catch some of this but have run into some edge cases.
Consider a case like the following:
In the actual spreadsheet, you see: $0.96
When read_excel siphons this in, it will be represented as 0.96. Because of the mixed-type nature of the column, there is no sure way to know whether this is 96% or $0.96.
Is there a way to read excel files into a data frame for analysis and record what is visually represented in the cell, regardless of whether cell formatting was used or not?
I've tried using dtype="str", dtype="object" and have tried using both the default and openpyxl engines.
UPDATE
Taking the comments below into consideration, I'm rewriting with openpyxl.
import openpyxl
import pandas as pd
from pathlib import Path

def excel_concat(df_source):
    df_master = pd.DataFrame()
    for index, row in df_source.iterrows():
        excel_file = Path(row['Test Path']) / Path(row['Original Filename'])
        wb = openpyxl.load_workbook(filename=excel_file)
        ws = wb.active
        # ws.values yields the stored cell values, not what is displayed
        df_data = pd.DataFrame(ws.values)
        df_master = pd.concat([df_master, df_data], ignore_index=True)
    return df_master

df_master1 = excel_concat(df_excel_files)
This appears to be nothing more than a "longcut" to just calling the openpyxl engine with pandas. What am I missing in order to capture the visible values in the excel files?
Looking here, https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html, I noticed the following:
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32} Use object to preserve data as stored in Excel and not interpret dtype. **If converters are specified, they will be applied INSTEAD of dtype conversion.**
converters : dict, default None
Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the Excel cell content, and return the transformed content.
Do you think that might work for you?
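For what it's worth, a converter that keeps the raw cell value as text would look roughly like this (a minimal sketch; the file name and the 'Amount' column label are placeholders):

import pandas as pd

# Keep whatever value is stored in the 'Amount' cells as a plain string
# instead of letting pandas infer a numeric dtype. Note that the converter
# receives the stored value, so display-only formatting (such as a currency
# number format) is still not visible here.
df = pd.read_excel('workbook.xlsx', converters={'Amount': str})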
I have a series of SQL database queries that I am writing to Excel using XlsxWriter/Pandas.
I am using a simple global format for font type and size.
Each table is different, so the only thing I want to do is apply a standard font and size.
format = workbook.add_format()
format.set_font_size(9)
format.set_font_name('Calibri')   # must be called as a method, not assigned

for col_name in df1:
    # Size each column to fit its widest value (or its header, if longer)
    column_width = max(df1[col_name].astype(str).map(len).max(), len(col_name))
    col_idx = df1.columns.get_loc(col_name)
    if col_idx < 4:
        column_width = column_width + 1
    worksheet1.set_column(col_idx, col_idx, column_width, format)

writer.save()
This all works well until I encounter a DATE.
There may be multiple date fields, or no date field at all, in each Excel table.
All the fonts in the output are 9 pt except the date fields, which all show up as 11 pt, and I don't know how to resolve the issue.
Also, the dates themselves show up in Excel as date-times, not dates, even though they are defined in the database as a Date field. Converting them is also an issue; I can't seem to get rid of the time portion.
Any help would be greatly appreciated. I have spent way too much time on this.
Sounds to me like this problem.
If you have existing files, create a template Excel file in the correct format and use Python just to fill in the cells (this scenario is the accepted answer in that post). You can also define a certain style in Excel once and apply it to columns.
Apparently, you have a somewhat more complicated scenario. The second answer proposes adjusting the data in pandas before writing it to Excel.
However, my personal guess is that this is rather a formatting problem on the Excel side, so your approach seems reasonable. How about specifying the format explicitly: format = workbook.add_format({'num_format': 'yyyy-mm-dd'})? (That would rather align with this post.) Try specifying the font height if setting the global font size does not work: 'height': 9*20 (note that you need to scale the height by 20 to use "points" as the unit).
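As a rough sketch of that second suggestion (the demo DataFrame, file name, and column label are placeholders; the dates are written as text here because pandas gives real datetime values a cell-level format of its own, and in Excel cell formats override column formats, which is presumably also why the 11 pt font sneaks in):

import pandas as pd

# Tiny stand-in for one of the query results
df1 = pd.DataFrame({'order_date': pd.to_datetime(['2023-01-05 14:30:00']),
                    'amount': [42.5]})

# Strip the time portion and write the dates as text, so only the
# column-level format below applies.
df1['order_date'] = df1['order_date'].dt.strftime('%Y-%m-%d')

with pd.ExcelWriter('report.xlsx', engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet1 = writer.sheets['Sheet1']

    cell_format = workbook.add_format({'font_name': 'Calibri',
                                       'font_size': 9,
                                       'num_format': 'yyyy-mm-dd'})

    col_idx = df1.columns.get_loc('order_date')
    worksheet1.set_column(col_idx, col_idx, 12, cell_format)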
I have a working Excel sheet that does not contain any tables. Instead, it has multiple sections of data. I want to extract certain ranges of cells from this sheet and create a new data source that can be used to build Power BI reports.
Examples of the ranges are:
range1 = ws['A5':'N7']
range2 = ws['A12':'N13']
range3 = ws['A17':'N20']
range4 = ws['A33':'N35']
range5 = ws['A41':'N42']
When I print the values of these ranges using Python and openpyxl, I get a long list of values that I would like to transform into a new dataframe with custom column headers.
How do I transform that list into a table that I can then export either to Excel or into a SQL database?
Thank you
I'd use pandas.read_excel as documented here. Note the usecols, skiprows, and nrows parameters for selecting the ranges; simply call it multiple times to access different ranges.
Subsequently, I'd use DataFrame.to_excel (and the other to_... methods) to export the dataframe to the appropriate format.
Personally, I only use openpyxl directly when I need to optimize performance, or when I can't install pandas.
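A minimal sketch of that approach for the first range, A5:N7 (the file name, sheet name, and generated header names are placeholders):

import pandas as pd

# A5:N7 -> skip the 4 rows above the block, read 3 rows, columns A through N.
# header=None because the block has no header row of its own; names supplies
# the custom column headers.
range1 = pd.read_excel('workbook.xlsx', sheet_name='Sheet1',
                       usecols='A:N', skiprows=4, nrows=3,
                       header=None, names=[f'col_{i}' for i in range(1, 15)])

# Export the result, e.g. back to Excel or into a SQL database.
range1.to_excel('extracted.xlsx', index=False)
# range1.to_sql('my_table', engine)   # engine: a SQLAlchemy engine/connection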
Trying to do something that should be simple. XlsxWriter has a set_column() method that lets me format multiple columns at the same time:
worksheet.set_column('B:D', 30, align)
However, there is no equivalent for rows;
worksheet.set_row(5, None, percentage)
operates on only one row at a time.
I've tried doing the following to no avail:
worksheet.conditional_format('C27:W27', {'format': percentage})
How can I simply set cells C27:W27 to be percentage format?
Unfortunately, there is no method that applies a format to an arbitrary cell range the way set_column() does for whole columns. I ended up scrapping this project.
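For anyone landing here later: since set_row() and set_column() only cover whole rows or columns, one workaround is simply to write the cells of the range together with the format, because cell-level formats take precedence anyway (a sketch with a made-up file name and demo values):

import xlsxwriter

workbook = xlsxwriter.Workbook('percent_demo.xlsx')
worksheet = workbook.add_worksheet()
percentage = workbook.add_format({'num_format': '0.00%'})

# C27:W27 is row index 26, column indices 2..22 (zero-based)
values = [i / 100 for i in range(21)]
for offset, value in enumerate(values):
    worksheet.write(26, 2 + offset, value, percentage)

workbook.close()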
How do I go about selecting an entire column from Excel without the header?
For example, when I try the following code, it selects the entire column, including the header:
import xlwings as xw
wb = xw.Book.caller()
wb.sheets[0].range('A:A').options(ndim=1).value
How do I select the entire column A without including the header? I basically want to use xlwings to get the values of every cell in a column, from just below the header down to its last value.
Please advise.
Thank you
You can slice the Range object directly (you don't need to declare the dimension, as 1d arrays arrive by default as simple lists):
wb = xw.Book.caller()
wb.sheets[0].range('A:A')[1:].value
Alternatively, define an Excel Table Object (Insert > Table):
wb.sheets[0].range('Table1[[#Data]]').value
This will automatically exclude the headers, see e.g. here for the syntax.
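If reading the whole column is slow, a related option (a sketch, assuming the data starts in A2 and has no gaps) is to expand down from the first data cell:

import xlwings as xw

wb = xw.Book.caller()
# Start just below the header and expand down to the last contiguous value,
# so the header in A1 is never included.
values = wb.sheets[0].range('A2').expand('down').value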