Pandas: Can't Insert Pivot Table information into a different Sheet

I have code that converts a txt file to xlsx, adds a column with formulas, and should then create a pivot table from that information in a different sheet. The code runs without errors, but it creates an empty sheet instead of a sheet with the information.
So the code looks like this:
import numpy as np
import openpyxl
import pandas as pd
#Transforming our txt to xlsx
path = r"C:\Users\roslber\Desktop\Codes\Python\Projects\Automated routes.xlsx"
rssdata = pd.read_csv("dwp.txt", sep="\t")
rssdata.to_excel(path, index=None, header=True)
#Writing the formula column
wb = openpyxl.load_workbook(filename=path)
ws1 = wb["Sheet1"]
ws1["AC1"] = "CF Weight"
row_count = ws1.max_row
actual_row = 2
while actual_row <= row_count: #writing the formula in every row
    r = str(actual_row)
    ws1["AC"+r] = "=(O"+r + "*P"+r +"*Q"+r +")/28316.8"
    actual_row = actual_row + 1
#Creating a new sheet with the pivot tables
df = pd.read_excel(path, 0, header=0) #defining pivot table dataframe
wb.create_sheet("Sheet2")
pv_pack = pd.pivot_table(df, values=["actual_service_time"],
                         index=["delivery_station_code"], columns=["cluster_prefix"], aggfunc=np.sum) #constructing the pivot table
print(pv_pack)
with pd.ExcelWriter(path, mode="a") as writer:
    pv_pack.to_excel(writer, sheet_name="Sheet2")
    writer.save() #inserting pivot table in sheet2
wb.save(path)
For data protection reasons I can't show you the information inside the pivot table, but when I print it I can see exactly what I want. The problem is that, although Sheet2 is created correctly, the information that I can see printed doesn't appear in Sheet2. Why is this happening?
I have checked these questions:
Trouble writing pivot table to excel file
How to save a new sheet in an existing excel file, using Pandas?
Regarding the first one, apparently openpyxl can't create a pivot table, but I don't actually need the pivot table format. I just need the pv_pack information in Sheet2 as it's shown when I print it.
I tried to change my code to imitate what they did in the second question but it didn't work.
Thank you in advance
Edit, in reply to RJ Adriaansen:
The information in Sheet1 would look like this:
id  order   mtd    delivery_station_code  cluster_prefix  actual_service_time
xh  aabb1   one    1                      One_            231
xr  aabb2   two    2                      Two_            135
xd  aabb3   three  3                      One_            80
xh  aabb8   two    1                      Two_            205
xp  aabb9   three  2                      One_            1
xl  aabb10  one    3                      Two_            115
And the output printed in my editor looks like this:
delivery_station_code  One_  Two_
1                      231   205
2                      1     135
3                      80    115

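The with statement saves and closes the file automatically, so there is no need to save it manually. There is also no need to create the second sheet before writing to it. In the original code, the final wb.save(path) overwrites the file with the openpyxl workbook held in memory, whose Sheet2 is still empty, which is why the pivot data disappears. Removing writer.save() and wb.create_sheet("Sheet2"), and moving wb.save(path) up so that it runs before the ExcelWriter block, will make the code work.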
#Writing the formula column
wb = openpyxl.load_workbook(filename=path)
ws1 = wb["Sheet1"]
ws1["AC1"] = "CF Weight"
row_count = ws1.max_row
actual_row = 2
while actual_row <= row_count: #writing the formula in every row
    r = str(actual_row)
    ws1["AC"+r] = "=(O"+r + "*P"+r +"*Q"+r +")/28316.8"
    actual_row = actual_row + 1
wb.save(path)
#Creating a new sheet with the pivot tables
df = pd.read_excel(path, 0, header=0) #defining pivot table dataframe
pv_pack = pd.pivot_table(df, values=["actual_service_time"],
                         index=["delivery_station_code"], columns=["cluster_prefix"], aggfunc=np.sum) #constructing the pivot table
with pd.ExcelWriter(path, mode="a") as writer:
    pv_pack.to_excel(writer, sheet_name="Sheet2")
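For anyone who wants to try the save order without the original data, here is a minimal end-to-end sketch on a throwaway file. It reuses the sample rows from the question; the temporary file name, the "double_time" header and the =C*2 formula are made up for illustration and only stand in for the real CF Weight column.

```python
import os
import tempfile

import openpyxl
import pandas as pd

# Sample rows copied from the question's Sheet1 excerpt.
data = pd.DataFrame({
    "delivery_station_code": [1, 2, 3, 1, 2, 3],
    "cluster_prefix": ["One_", "Two_", "One_", "Two_", "One_", "Two_"],
    "actual_service_time": [231, 135, 80, 205, 1, 115],
})

# Hypothetical throwaway file; to_excel names the sheet "Sheet1" by default.
path = os.path.join(tempfile.mkdtemp(), "routes_demo.xlsx")
data.to_excel(path, index=False)

# Step 1: edit the workbook with openpyxl and save it BEFORE pandas reopens it.
wb = openpyxl.load_workbook(path)
ws1 = wb["Sheet1"]
ws1["D1"] = "double_time"  # illustrative stand-in for the formula column
for row in range(2, ws1.max_row + 1):
    ws1[f"D{row}"] = f"=C{row}*2"
wb.save(path)

# Step 2: build the pivot and append it as a new sheet; the with block saves.
df = pd.read_excel(path, sheet_name=0)
pv_pack = pd.pivot_table(
    df,
    values="actual_service_time",
    index="delivery_station_code",
    columns="cluster_prefix",
    aggfunc="sum",
)
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    pv_pack.to_excel(writer, sheet_name="Sheet2")

# Reopen the file to confirm both sheets survived.
result = openpyxl.load_workbook(path)
sheet2_rows = [list(r) for r in result["Sheet2"].iter_rows(values_only=True)]
print(result.sheetnames)
print(sheet2_rows)
```

The key point is that nothing calls wb.save(path) after the ExcelWriter block, so the stale in-memory workbook never overwrites the appended sheet.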

Related

Format and manipulate data across multiple Excel sheets in Python using openpyxl before converting to Dataframe

I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.
I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data_sheets.xlsx')
for sheet in wb.worksheets:
    sheet.merged_cells
    for merge in list(sheet.merged_cells):
        sheet.unmerge_cells(range_string=str(merge))
    sheet.insert_cols(3, 1)
    print(sheet)
wb.save('workbook_test.xlsx')
#concat once worksheets have been edited
df = pd.concat(pd.read_excel('workbook_test.xlsx', sheet_name=None), ignore_index=True)
Before stacking the data, however, I would like to make the following additional (sequential) changes to every sheet:
1. Extract the right 8 characters from row 1 (the Excel equivalent would be =RIGHT(A1, 8)). This pulls the unique code off each sheet, which will look like '(000000)'.
2. Populate column C, rows 6-282, with the unique code.
3. Delete rows 1-5.
The end result would make each sheet within the workbook look like this:
Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!

Here is a 100% openpyxl approach to achieve what you're looking for:
from openpyxl import load_workbook
wb = load_workbook("workbook_test.xlsx")
for ws in wb:
    ws.unmerge_cells("A1:O1") #unmerge first row up to column O
    ws_uid = ws.cell(row=1, column=1).value[-8:] #get the sheet's UID
    for num_row in range(6, 283): #rows 6 to 282 inclusive
        ws.cell(row=num_row, column=3).value = ws_uid #write the UID as a plain string so pandas can read it back
    ws.delete_rows(1, 5) #delete first 5 rows
wb.save("workbook_test.xlsx")
NB: This assumes there is already an empty column (C).
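To see the three steps in isolation, here is a small in-memory sketch that needs no file. The title text, the "(123456)" code and the ten-row sheet are invented for the example; a real sheet would use rows 6-282 as in the question.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Hypothetical sheet: a title in A1 ending with the 8-character code,
# followed by filler rows so that rows 6-10 play the role of the data rows.
ws["A1"] = "Quarterly report (123456)"
for row in range(2, 11):
    ws.cell(row=row, column=1, value=f"row {row}")

# 1. Take the right 8 characters of A1, like =RIGHT(A1, 8) in Excel.
uid = ws["A1"].value[-8:]

# 2. Write the code into column C for every data row.
for row in range(6, 11):
    ws.cell(row=row, column=3, value=uid)

# 3. Drop the five header rows; the data rows shift up to start at row 1.
ws.delete_rows(1, 5)

print(ws["A1"].value, ws["C1"].value)
```

Extracting the code has to happen before delete_rows, because the title cell is gone afterwards.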

Append data frame and text to existing excel file sheets in a for loop using python

I'm trying to write a functional code that will read data from multiple excel sheets, carry out some calculations, and then append a summary to the bottom of data in the excel sheet where the data was read from.
An example excel sheet data or data frame:
ID APP
1 20
2 50
3 79
4 34
5 7
6 5
7 3
8 78
Required output: summary output starts 2 rows below the original data as below
ID APP
1 20
2 50
3 79
4 34
5 7
6 5
7 3
8 78
Summary:
Total=276
My attempt:
import pandas as pd
from excel import append_df_to_excel
path = 'data.xlsx'
Exls = pd.ExcelFile(path, engine='openpyxl')
for sheet in Exls.sheet_names:
    try:
        df = pd.read_excel(Exls,sheet_name=sheet)
        res=df.groupby['App'].sum
        writer = pd.ExcelWriter('data.xlsx', engine='xlsxwriter')
        df.to_excel(writer, res, sheet_name='Sheet1', startrow = 1, index=False)
        workbook = writer.book
        worksheet = writer.sheets[data']
        text = 'summary'
        worksheet.write(0, 0, text)
        writer.save()
    except Exception:
        continue
This code does not append any result to the excel file. Has anyone got better ideas?

The try/except is the reason your sheet isn't updating: you have a few bugs in your code that are being silently swallowed (the casing of the APP column does not match the spreadsheet, the groupby statement is malformed, you're missing an opening quote on 'data', and you're trying to write the summary to the top of the sheet instead of the bottom). Those are easy enough to patch up, but your biggest issue is that you're trying to open an already-open file (in fact, you open and close it once for each sheet).
So what you're going to want to do is move the file operations outside the for loop, so that the file is opened once and saved once. That also lets you be a lot more concise:
import pandas as pd
path = 'data.xlsx'
xl = pd.ExcelFile(path)
writer = pd.ExcelWriter(path, engine='openpyxl', mode='a')
workbook = writer.book
for worksheet in workbook.worksheets:
    sheet_name = worksheet.title
    df = xl.parse(sheet_name=sheet_name)
    sheet_end = len(df) + 2 #first empty row below the header and data rows
    total = df.APP.sum()
    worksheet[f'A{sheet_end}'] = 'Summary:'
    worksheet[f'A{sheet_end+1}'] = f'total={total}'
writer.close() #saves the workbook; writer.save() was removed in pandas 2.0
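The same idea can be tried with openpyxl alone, without touching a file. This is only a sketch: the workbook is built in memory from the example data in the question, and leaving one blank row before the summary is my reading of "2 rows below".

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Example data from the question: a header row plus eight data rows.
ws.append(["ID", "APP"])
for i, app in enumerate([20, 50, 79, 34, 7, 5, 3, 78], start=1):
    ws.append([i, app])

# Remember where the data ends BEFORE writing anything below it.
last_data_row = ws.max_row
total = sum(ws.cell(row=r, column=2).value for r in range(2, last_data_row + 1))

# Leave one blank row, then write the two summary lines.
summary_row = last_data_row + 2
ws.cell(row=summary_row, column=1, value="Summary:")
ws.cell(row=summary_row + 1, column=1, value=f"Total={total}")

print(summary_row, ws.cell(row=summary_row + 1, column=1).value)
```

With a real file you would replace Workbook() with load_workbook(path), loop over wb.worksheets, and call wb.save(path) once at the end.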

Use:
import pandas as pd
xl = pd.ExcelFile('01.xlsx')
sheets = xl.sheet_names
with pd.ExcelWriter('output1.xlsx') as writer: #saved and closed automatically
    for sheet in sheets:
        df = pd.read_excel('01.xlsx', sheet_name=sheet)
        sum_ = pd.DataFrame({'ID': ['Summary:'], 'App': [df['App'].sum()]})
        #DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, sum_], ignore_index=True)
        df.to_excel(writer, sheet_name=sheet, index=False)
Note that 01.xlsx is a file with several sheets laid out like the example in the question, and output1.xlsx is the output file.
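The summary-row step can be checked on its own without any Excel file. A minimal sketch using the example data from the question (with the column named APP as in that example):

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "APP": [20, 50, 79, 34, 7, 5, 3, 78],
})

# One-row frame holding the label and the column total.
summary = pd.DataFrame({"ID": ["Summary:"], "APP": [df["APP"].sum()]})

# Stack the summary underneath the data and renumber the index.
out = pd.concat([df, summary], ignore_index=True)
print(out.tail(2))
```

Note that the ID column becomes an object column once the text label is added, which is fine for writing to Excel but worth knowing if you keep working with the frame.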

Copy column of cell values from one workbook to another with openpyxl

I am extracting data from one workbook's column and need to copy the data to another existing workbook.
This is how I extract the data (works fine):
from openpyxl import load_workbook
wb2 = load_workbook('C:\\folder\\AllSitesOpen2.xlsx')
ws2 = wb2['report1570826222449']
#Extract column A from Open Sites
DateColumnA = []
for row in ws2.iter_rows(min_row=16, max_row=None, min_col=1, max_col=1):
    for cell in row:
        DateColumnA.append(cell.value)
DateColumnA
The above code successfully outputs the cell values in each row of the first column to DateColumnA
I'd like to paste the values stored in DateColumnA to this existing destination workbook:
#file to be pasted into
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
But I am missing a piece conceptually here. I can't connect the dots. Can someone advise how I can get this data from my source workbook to the new destination workbook?

Let's say you want to copy the column starting in cell 'A1' of 'Sheet1' in wb3:
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
for counter in range(len(DateColumnA)):
    cell_id = 'A' + str(counter + 1)
    ws3[cell_id] = DateColumnA[counter]
wb3.save('C:\\folder\\output.xlsx')

I ended up getting this to write the list to another pre-existing spreadsheet:
for x, rows in enumerate(DateColumnA):
    ws3.cell(row=x+1, column=1).value = rows
    #print(rows)
wb3.save('C:\\folder\\output.xlsx')
Works great but now I need to determine how to write the data to output.xlsx starting at row 16 instead of row 1 so I don't overwrite the first 16 existing header rows in output.xlsx. Any ideas appreciated.

I figured out a more concise way to write the source data to a different starting row on destination sheet in a different workbook. I do not need to dump the values in to a list as I did above. iter_rows does all the work and openpyxl nicely passes it to a different workbook and worksheet:
row_offset=5
for rows in ws2.iter_rows(min_row=2, max_row=None, min_col=1, max_col=1):
    for cell in rows:
        ws3.cell(row=cell.row + row_offset, column=1, value=cell.value)
wb3.save('C:\\folder\\DestFile.xlsx')
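If it helps, here is a self-contained sketch of copying a column to a chosen start row, using two in-memory workbooks instead of the real files. The date strings, the "existing header" cell and the start row of 17 are placeholders for illustration.

```python
from openpyxl import Workbook

# Source workbook with a few values in column A (made-up data).
src_wb = Workbook()
src = src_wb.active
for value in ["2019-10-01", "2019-10-02", "2019-10-03"]:
    src.append([value])

# Destination workbook with header content that must not be overwritten.
dst_wb = Workbook()
dst = dst_wb.active
dst["A1"] = "existing header"

# First destination row to write to, i.e. just below 16 header rows.
start_row = 17

# values_only=True yields one tuple per row; max_col=1 limits it to column A.
for offset, row in enumerate(src.iter_rows(min_row=1, max_col=1, values_only=True)):
    dst.cell(row=start_row + offset, column=1, value=row[0])

print([dst.cell(row=r, column=1).value for r in range(17, 20)])
```

Using enumerate for the offset keeps the destination rows contiguous even when the source range does not start at row 1.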

Read Excel with multiple headers and unnamed column

I receive some Excel files like this:
      USA          UK
      plane  cars  plane  cars
2016  2      7     1      3     # a comment after the last country
2017  3      1     8      4
There is an unknown number of countries, and there can be a comment after the last column.
When I read the Excel file like this...
df = pd.read_excel(
    sourceFilePath,
    sheet_name = 'Sheet1',
    index_col = [0],
    header = [0, 1]
)
... I get a ValueError:
ValueError: Length of new names must be 1, got 2
The problem is that I cannot use the usecols parameter, because I don't know how many countries there are before reading my file.
How can I read such a file?

It's possible Pandas won't be able to fix your special use case, but you can write a program that fixes the spreadsheet using openpyxl. It has really clear documentation, but here's an overview of how to use it:
import openpyxl as xl
wb = xl.load_workbook("ExampleSheet.xlsx")
for sheet in wb.worksheets:
    print("Sheet Title => {}".format(sheet.title))
    print("Dimensions => {}".format(sheet.dimensions)) # just returns a string
    print("Columns: {} <-> {}".format(sheet.min_column, sheet.max_column))
    print("Rows: {} <-> {}".format(sheet.min_row, sheet.max_row))
    for r in range(sheet.min_row, sheet.max_row + 1):
        for c in range(sheet.min_column, sheet.max_column + 1):
            if sheet.cell(r,c).value is not None:
                print("Cell {}:{} has value {}".format(r,c,sheet.cell(r,c).value))

What about just using pd.read_csv?
Once loaded, you can determine how many columns you have with df.columns.

Trouble writing pivot table to excel file

I am using pandas/openpyxl to process an excel file and then create a pivot table to add to a new worksheet in the current workbook. When I execute my code, the new sheet gets created but the pivot table does not get added to the sheet.
Here is my code:
worksheet2 = workbook.create_sheet()
worksheet2.title = 'Sheet1'
workbook.save(filename)
excel = pd.ExcelFile(filename)
df = excel.parse(sheet_name=0)
df1 = df[['Product Description', 'Supervisor']]
table1 = pd.pivot_table(df1, index = ['Supervisor'],
                        columns = ['Product Description'],
                        values = ['Product Description'],
                        aggfunc = [lambda x: len(x)], fill_value = 0)
print(table1)
writer = pd.ExcelWriter(filename)
table1.to_excel(writer, 'Sheet1')
writer.save()
workbook.save(filename)
When I print out my table I get this:
                               <lambda>                          \
Product Description EXPRESS 10:30 (doc)  EXPRESS 10:30 (nondoc)
Supervisor
Building                              0                       1
Gordon                                1                       0
Pete                                  0                       0
Vinny A                               0                       1
Vinny P                               0                       1

                                                                 \
Product Description EXPRESS 12:00 (doc)  EXPRESS 12:00 (nondoc)
Supervisor
Building                              0                       4
Gordon                                1                       2
Pete                                  1                       0
Vinny A                               1                       1
Vinny P                               0                       1

Product Description MEDICAL EXPRESS (nondoc)
Supervisor
Building                                   0
Gordon                                     1
Pete                                       0
Vinny A                                    0
Vinny P                                    0
I would like the pivot table to look like this: (if my pivot table code won't make it look like this could someone help me make it look like that? I'm not sure how to add the grand total column. It has something to do with the aggfunc portion of the pivot table right?)

You can't do this because openpyxl does not currently support pivot tables. See https://bitbucket.org/openpyxl/openpyxl/issues/295 for further information.

Since pd.pivot_table returns a dataframe, you can just write the dataframe into excel.
Here is how I write the output from a pandas dataframe to an Excel template.
Please note that if data is already present in the cells where you are trying to write the dataframe, it will not be overwritten and the dataframe will be written to a new sheet, which is why I have included a step to clear existing data from the template. I have not tried to write output to merged cells, so that might throw an error.
Setup
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
file_path='Template.xlsx'
book=load_workbook(file_path)
sheet_name="Template 1"
sheet=book[sheet_name]
Set the first row and first column in the Excel template where the output is to be pasted.
If the output is to be pasted starting in cell N2, row_start will be 2 and col_start will be 14.
row_start=2
col_start=14
Clear existing data in the Excel template (df is the dataframe to write, for example the result of pd.pivot_table)
for c_idx, col in enumerate(df.columns,col_start):
    for r_idx in range(row_start,10001):
        sheet.cell(row=r_idx, column=c_idx, value="")
Write the dataframe to the Excel template
rows=dataframe_to_rows(df,index=False)
for r_idx, row in enumerate(rows,row_start):
    for c_idx, col in enumerate(row,col_start):
        sheet.cell(row=r_idx, column=c_idx, value=col)
# Save through openpyxl directly; recent pandas versions no longer allow assigning writer.book
book.save(file_path)
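The pasting step can be tried in memory before pointing it at a real template. In this sketch the dataframe contents are made up (loosely based on the pivot output from the first question) and a blank Workbook() stands in for Template.xlsx.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Made-up dataframe standing in for a pivot table result.
df = pd.DataFrame({
    "station": [1, 2, 3],
    "One_": [231, 1, 80],
    "Two_": [205, 135, 115],
})

# A blank in-memory workbook stands in for the template file.
wb = Workbook()
ws = wb.active

# Top-left target cell N2: row 2, column 14.
row_start, col_start = 2, 14

# dataframe_to_rows yields the header row first, then one list per data row.
for r_idx, row in enumerate(dataframe_to_rows(df, index=False, header=True), start=row_start):
    for c_idx, value in enumerate(row, start=col_start):
        ws.cell(row=r_idx, column=c_idx, value=value)

print(ws["N2"].value, ws["O3"].value, ws["P5"].value)
```

For a real pivot table you would probably want index=True, or call reset_index() first, so that the row labels are written as well.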
