xlwings: how to sort excel table with headers? - python

I have an Excel table that I would like to sort with xlwings. The table has a header row. I tried sorting like this:
wb = xw.Book(file)
ws = wb.sheets[sheet]
ws.range(table).api.Sort(ws.range(table).api,SortOrder.xlAscending,)
But that sorts the table such that data replaces the headers, and the header row ends up at the bottom of the table.
The following produce the same results:
#Setting the range to include only the table data
ws.range("Table1[#Data]").api.Sort(ws.range("Table1[#Data]").api,SortOrder.xlAscending)
#Specifying the range has a header
ws.range(table).api.Sort(Key1=ws.range(table).api,Order1=1,Header="xlYes")
#manually excluding the header row from the range
ws.range('c4:n380').api.Sort(ws.range('c4:n380').api,SortOrder.xlAscending)
I'm at my wits' end. The final table will be very large, so I'd rather not read the whole thing into a DataFrame, sort it there, and write it back to Excel.

Documentation on this topic is sketchy.
After 2 days of trying and searching, this seemed to work:
last_row = ws.range(1,1).end('down').row
first_col_range = ws.range("A2:A{row}".format(row=last_row))
data_range = ws.range("A2:N{row}".format(row=last_row))
# Order1=1 is xlAscending; Header=2 is xlNo, because data_range already excludes the header row
data_range.api.Sort(Key1=first_col_range.api, Order1=1, Header=2, Orientation=1)
I found https://learn.microsoft.com/en-us/office/vba/api/excel.range.sort to be of some help.
The solution I'm posting here refers to the example at:
https://www.dataquest.io/blog/python-excel-xlwings-tutorial/
(xlwings Tutorial: Make Excel Faster Using Python)
Good luck!

I got the same issue using these one-liners. The header row kept being sorted and moved to the last row. Here is how I solved it in Python (using VBA ListObjects).
table1 = ws.api.ListObjects("Table1")
sort_range= ws.range("Table1[#Data]").api
table1.Sort.SortFields.Add(Key=sort_range)
table1.Sort.Apply()

Related

Way to refer a column within a same name under difference merged cell?

I'm fairly new to pandas and stuck on how to refer to a column when the same column name appears under different merged header cells. Here is an example of the problem. I want to get the Worker data for company C, but if I load this Excel file as df and run
dfcompanyAworker=df[Worker]
it won't work.
Is there a specific way to select one of several identically named columns like this?
Here's the table:
https://i.stack.imgur.com/8Y6gp.png
Thanks!
First read the dataset and tell pandas how it is laid out. In this example the data is in Excel format:
dfcompanyAworker = pd.read_excel('Worker', skiprows=1, header=[1,2], index_col=0, skipfooter=7)
dfcompanyAworker
where:
skiprows=1 to ignore the title row in the data
header=[1, 2] is a list because we have multilevel columns, namely Category (Company) and other data
index_col=0 to make the Date column an index for easier processing and analysis
skipfooter=7 to ignore the footer at the end of the data line
You can try the same steps on your own file.
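Once a sheet is loaded with a two-level header, a column that shares its name with others can be selected by giving both levels. The sketch below is only an illustration: it builds a small stand-in frame in memory instead of reading the file, and the company names, the "Salary" field, and all values are assumptions, not taken from the screenshot.

```python
import pandas as pd

# Hypothetical stand-in for the sheet: two header levels (company, field).
columns = pd.MultiIndex.from_product(
    [["Company A", "Company B", "Company C"], ["Worker", "Salary"]]
)
df = pd.DataFrame(
    [[10, 100, 20, 200, 30, 300],
     [11, 110, 21, 210, 31, 310]],
    columns=columns,
    index=pd.Index(["2021-01", "2021-02"], name="Date"),
)

# A tuple (top level, second level) selects exactly one column.
company_c_worker = df[("Company C", "Worker")]

# xs() selects the same-named column under every company at once.
all_workers = df.xs("Worker", axis=1, level=1)
```

The tuple form is the direct replacement for df[Worker] in the question; xs is handy when the same field is wanted for every company.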

Python for Excel: While value in column A is the same, take remaining row data and copy paste to new worksheet

I am somewhat of a beginner to python and have encountered the following problem working with openpyxl. For example I have the sample worksheet below: Worksheet
What I am trying to do is loop through the Boat ID column and, while the cell values are the same, take the respective row data to the right, open a new worksheet/workbook, and copy-paste the rows in Col B:E.
So in theory, for every Boat ID = 1 we would take every row unique to ID 1 from Cols B:E open a new workbook and paste them accordingly. Next, for every Boat ID = 2 we would take the rows 5-8 in cols B:E, open a new workbook and paste accordingly. Similarly, we would repeat the process for every Boat ID = 3.
P.S. To keep it simple I have ordered the table by Boat ID in ascending order, but if someone wants bonus points they could opine on how it would be done if the table was not ordered.
I haven't worked with openpyxl, but this is relatively simple with pandas, I would definitely check it out.
Loading the sheet is as easy as:
import pandas as pd
df = pd.read_excel("path_to_file")
You can then filter the table by doing something like:
# unique() also works when the table is not sorted or the IDs are not consecutive
for i in df['Boat ID'].unique():
    temp_df = df.loc[df['Boat ID'] == i]
    temp_df.to_excel(f"path_to_save_location/sheet_{i}.xlsx")
This doesn't handle duplicate rows with the same boat ID, but it should get you most of the way!
I would also highly recommend the jupyter notebook extension for prototyping stuff in pandas
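For the unordered case raised in the question, the grouping step can also be done without pandas. This is a minimal sketch under assumptions: the sample rows are made up, and group_rows_by_id is a hypothetical helper. With openpyxl, row tuples of this shape could come from ws.iter_rows(min_row=2, values_only=True), and each group could then be written to a new workbook with ws.append(row).

```python
from collections import defaultdict


def group_rows_by_id(rows):
    """Group row tuples by their first value, keeping the remaining columns."""
    groups = defaultdict(list)
    for boat_id, *rest in rows:
        groups[boat_id].append(tuple(rest))
    return dict(groups)


# Hypothetical sample rows: (Boat ID, then the B:E values), deliberately unordered.
rows = [
    (2, "c", 3, 30, "z"),
    (1, "a", 1, 10, "x"),
    (1, "b", 2, 20, "y"),
    (3, "d", 4, 40, "w"),
]
grouped = group_rows_by_id(rows)
```

Because the rows are collected into a dictionary keyed by ID, the input order does not matter; the rows within each group keep the order in which they appeared.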

Using Styleframe to pull styles of individual cells from Excel

I'm trying to write a script that merges two Excel files together. One has been hand processed and has a bunch of custom formatting done to it, and the other is an auto-generated file. Doing the merge in pandas is simple enough, but preserving the formatting is proving troublesome. I found the styleframe library, which seems like it should simplify what I'm trying to do, as it can import style info in addition to the raw data. However, I'm having problems actually implementing the code.
My question is this: how can I pull style information from each individual cell in the Excel file and then apply that to my merged dataframe? Note that the data is not formatted consistently across columns or rows, so I don't think I can apply styles in this manner. Here's the relevant portion of my code:
#iterate through all cells of merged dataframe
for rownum, row in output_df.iterrows():
    for column, value in row.iteritems():
        filename = row['File Name']
        #pulls the style of relevant cell in the original excel document
        cur_style = orig_excel.loc[orig_excel['File Name'] == filename, column][0].style
        #style of the cell in the merged dataframe
        target_style = output_df.loc[output_df['File Name'] == filename, column][0].style
        #set style in current output_df cell to match original excel file style
        target_style = cur_style
This code runs (slowly) but it doesn't seem to actually apply any styling to the output styleframe
Looking through the documentation, I don't really see a method for applying styles at an individual styleframe container level--everything is geared towards doing it as a row or column. It also seems like you need to use a styler object to set the style.
Figured it out. I rejiggered my dataframe so that I could use .at instead of a .loc lookup. This, coupled with the apply_style_by_indexes method, got me where I needed to be:
for index, row in orig_excel.iterrows():
    # .items() replaces .iteritems(), which was removed in pandas 2.0
    for column, value in row.items():
        index_num = output_df.index.get_loc(index)
        #Pull style to copy to new df
        cur_style = orig_excel.at[index, column].style
        #Apply original style to new df
        output_df.apply_style_by_indexes(output_df.index[index_num],
                                         cur_style,
                                         cols_to_style=column)

Is this possible in python with pandas or xlwt

I have an existing excel. That looks like
and I have another excel that has around 40000 rows and around 300 columns. shortened version looks like
I would like to append values from the second Excel file to my existing one, but only for rows whose value in col4 matches a value in col4 of my existing file. So I would get something like this
Hope you guys get the picture of what I am trying to do.
Yes, this is possible in pandas, and it is typically much faster than doing it by hand in Excel:
df_result = pd.merge(FirstTable, SecondTable, how='left', on='col4')
This matches rows on the column "col4", so the column must have that name in both tables.
Also be aware that if the second table has several rows for a single value in the first table, the result will contain one row per match, so the output can have more rows than the first table.
To read the Excel file you can use:
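The row-multiplication caveat is easy to see on a tiny example. The sketch below uses made-up tables; only the key name "col4" and the how='left' merge come from the answer, everything else is an assumption.

```python
import pandas as pd

# Hypothetical tables; only the shared key name "col4" comes from the text above.
first = pd.DataFrame({"col1": ["a", "b", "c"], "col4": [1, 2, 3]})
second = pd.DataFrame({"col4": [1, 1, 3], "extra": ["x", "y", "z"]})

# Left merge: every row of `first` is kept; matches from `second` are appended.
result = pd.merge(first, second, how="left", on="col4")

# col4 == 1 appears twice in `second`, so row "a" is duplicated;
# col4 == 2 has no match, so its "extra" value is missing (NaN).
```

If duplicated rows are not wanted, one option is to drop duplicate keys from the second table before merging, for example with second.drop_duplicates("col4").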
import pandas as pd
xl=pd.ExcelFile('MyFile.xlsx')
FirstTable = pd.read_excel(xl, 'sheet_name_FIRST_TABLE')
For a more detailed description, see the documentation.

Changing date-format to text in xlsx using openpyxl

I have written a script that reads from excel workbooks and writes new workbooks.
Each row is a separate object, and one of the columns is a date.
I have written the date as a NamedStyle using datetime to get what I think is the correct format:
date_style = NamedStyle(name='datetime', number_format='YYYY-MM-DD')
for row in range(2,ws_kont.max_row+1):
ws_kont.cell(row = row, column = 4).style = date_style
The problem is that I need to import this Excel workbook into an ancient database that for some reason doesn't accept date formatting, only text like this: "yyyy-dd-mm".
I'm having trouble rewriting these cells as text.
I have tried using the =TEXT formula, but that won't work, since a cell can't reference itself to calculate the result unless I duplicate the column for referencing in the formula:
name = str(ws_teg.cell(row = row, column = 4).coordinate)
date_f = "yyyy-mm-dd"
ws_kont[name] = "=TEXT(%s,%s)" % (name, date_f)
I need to do this a bunch of places in a couple of scripts, so I'm wondering if there is a simpler way to do this?
PS. I'm just an archaeologist trying to automate some tasks in my workday by dabbling in some simple code, so please go easy on me if I seem a bit slow.
Found another article that worked out well with minimal code:
writer = pd.ExcelWriter('Sample_Master_Data_edited.xlsx', engine='xlsxwriter',
date_format='mm/dd/yyyy', datetime_format='mm/dd/yyyy')
Reference
Most likely, it won't be enough to change the format of your date - you'll have to store the date as a string instead of a datetime object.
Loop over the column and format the dates with datetime.strftime:
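for row in range(2, ws_kont.max_row + 1):  # start at 2 to skip the header row
    cell = ws_kont.cell(row=row, column=4)
    if hasattr(cell.value, 'strftime'):  # skip empty cells and cells that are already text
        cell.value = cell.value.strftime('%Y-%m-%d')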
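The question mentions both "yyyy-dd-mm" and "yyyy-mm-dd", so it may help to see how the strftime pattern controls the order. This is a minimal sketch using only the standard library; date_to_text is a hypothetical helper and the sample date is made up.

```python
from datetime import date, datetime


def date_to_text(value, fmt="%Y-%m-%d"):
    """Return value as formatted text if it is a date/datetime, else unchanged."""
    if isinstance(value, (date, datetime)):
        return value.strftime(fmt)
    return value


sample = datetime(2021, 3, 7)
iso_text = date_to_text(sample)                   # '2021-03-07' (year-month-day)
swapped_text = date_to_text(sample, "%Y-%d-%m")   # '2021-07-03' (year-day-month)
header_text = date_to_text("Date")                # non-date values pass through
```

Passing non-date values through unchanged means the helper can be applied to every cell in the column, including a header or an empty cell, without raising an error.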
