Python pandas merge and save with existing sheets - python

I want to merge multiple Excel files (1.xlsm, 2.xlsm, ...) into a macro-enabled file, A.xlsm, which has 3 sheets.
This is what I tried:
import glob
import pandas as pd
input_file = glob.glob('./*.xlsx')  # all .xlsx files in the current directory
all_data = pd.DataFrame()
for f in input_file:
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True, sort=False)
writer = pd.ExcelWriter('A.xlsm', engine='openpyxl')
all_data.to_excel(writer, 'Sheet1')
writer.save()
The code runs without errors, but the resulting file A.xlsm cannot be opened.
When I change the extension to A.xlsx, the file opens, but all the sheets and the macro are gone.
How can I merge multiple xlsx files into an xlsm file and keep the macro?

I believe that if you want to use macro-enabled workbooks you need to load them with keep_vba=True:
from openpyxl import load_workbook
XlMacroFile = load_workbook('A.xlsm',keep_vba=True)
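A minimal sketch of how this could fit the merge from the question: rather than letting pandas create a fresh workbook, the macro-enabled file is opened with openpyxl, the merged rows are appended to one of its sheets, and the workbook is saved again. The helper names `append_frame` and `merge_into_macro_workbook` and their parameters are made up for illustration; `load_workbook(..., keep_vba=True)` and `dataframe_to_rows` are openpyxl functions.

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def append_frame(ws, df, header=True):
    """Append the rows of `df` (optionally preceded by its column names) to worksheet `ws`."""
    for row in dataframe_to_rows(df, index=False, header=header):
        ws.append(row)


def merge_into_macro_workbook(template_path, input_paths, sheet_name, out_path):
    """Hypothetical helper: merge the first sheet of every input file into `sheet_name`."""
    merged = pd.concat(
        [pd.read_excel(p) for p in input_paths], ignore_index=True, sort=False
    )
    # keep_vba=True asks openpyxl to carry the VBA project over to the saved file
    wb = load_workbook(template_path, keep_vba=True)
    ws = wb[sheet_name] if sheet_name in wb.sheetnames else wb.create_sheet(sheet_name)
    append_frame(ws, merged)
    wb.save(out_path)  # out_path is assumed to end in .xlsm
```

Under these assumptions, a call such as `merge_into_macro_workbook('A.xlsm', glob.glob('./*.xlsx'), 'Sheet1', 'A_merged.xlsm')` would leave the other sheets of A.xlsm untouched, because the existing workbook is modified instead of being replaced.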

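To preserve separate sheets, you can do something like this:
import pandas as pd
df_list = []  # placeholder: fill with your DataFrames
filename = 'output.xlsx'  # placeholder: name of your output file
with pd.ExcelWriter(filename) as writer:
    for i, df in enumerate(df_list, start=1):
        # give every DataFrame its own sheet name so each gets a separate sheet
        df.to_excel(writer, sheet_name=f'Sheet{i}')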
This will write each dataframe in a separate sheet in your output excel file.
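When the sheet names come from file names or other labels, they may contain characters Excel rejects or exceed its 31-character limit. The following is a sketch of one way to handle that; `safe_sheet_name`, `write_frames`, and the sample labels are hypothetical, and an in-memory buffer stands in for a real output file.

```python
import io
import re

import pandas as pd


def safe_sheet_name(name, used):
    """Return a unique sheet name without []:*?/\\ characters and at most 31 chars long."""
    base = re.sub(r'[\[\]:*?/\\]', '_', str(name))[:31] or 'Sheet'
    candidate, n = base, 1
    while candidate in used:
        suffix = f'_{n}'
        candidate = base[:31 - len(suffix)] + suffix
        n += 1
    used.add(candidate)
    return candidate


def write_frames(frames, target):
    """Write each DataFrame in `frames` (label -> DataFrame) to its own sheet."""
    used = set()
    with pd.ExcelWriter(target, engine='openpyxl') as writer:
        for label, df in frames.items():
            df.to_excel(writer, sheet_name=safe_sheet_name(label, used), index=False)


# made-up sample data
frames = {
    'january/2017': pd.DataFrame({'x': [1, 2]}),
    'february': pd.DataFrame({'x': [3]}),
}
buffer = io.BytesIO()  # stands in for a real output path
write_frames(frames, buffer)
buffer.seek(0)
sheets = pd.read_excel(buffer, sheet_name=None)  # dict: sheet name -> DataFrame
```

Reading the buffer back with `sheet_name=None` is only there to check that every DataFrame ended up on its own sheet.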

Related

Pandas: export all csv files from multiple xlsx files

I have many xlsx files in a directory, each with multiple sheets. Without opening a file I don't know what its sheets are called.
I want to export all sheets to CSV files, one sheet = one CSV file.
I know that my code is very ugly and not well optimized.
The header.txt file will help me in the next step to create a dict to rename columns according to my pattern, so please ignore this part :)
import glob
import pandas as pd
xls_files = glob.glob('**/*.xlsx')
header = []
for xls_file in xls_files:
    print(f'{xls_file =}')
    file_name = xls_file.replace("xlsx\\","").replace(' ',"_").split('.')[0]
    print(f'{file_name =}')
    df = pd.read_excel(xls_file, sheet_name=None)
    # for sheet in df.keys():
    #     print(f'{sheet =}')
    #     sheet_name = sheet.replace(".","_").replace(' ',"_")
    #     csv_file_name = (f'{file_name}_{sheet_name}.csv')
    #     sheet.to_csv(csv_file_name ,index=False)
    #     print(sheet.head())
    for sheet in df.keys():
        print(f'{sheet =}')
        df_temp = pd.read_excel(xls_file, sheet_name=sheet)
        sheet_name = sheet.replace(".","_").replace(' ',"_")
        csv_file_name = (f'{file_name}_{sheet_name}.csv')
        print(f'{csv_file_name =}')
        for column in df_temp.columns:
            if column not in header:
                header.append(column)
                print(column)
        df_temp.to_csv(csv_file_name, index=False)
with open('header.txt', 'a+') as f:
    for ele in header:
        f.write(ele)
How can I improve my code so that it performs better and does not reread the same Excel file for every sheet?
This works for me but it is very slow.
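One possible way to avoid the repeated reads, sketched under the assumption that each workbook fits in memory: `pd.read_excel(..., sheet_name=None)` already returns a dict that maps every sheet name to its DataFrame, so the values can be written out directly. The function names `csv_name` and `export_sheets` and the `out_dir` parameter are made up for this example.

```python
from pathlib import Path

import pandas as pd


def csv_name(file_stem, sheet_name):
    """Build the CSV file name the same way the question does."""
    clean = str(sheet_name).replace('.', '_').replace(' ', '_')
    return f'{file_stem}_{clean}.csv'


def export_sheets(excel_source, file_stem, out_dir='.'):
    """Read a workbook once and write every sheet to its own CSV file."""
    sheets = pd.read_excel(excel_source, sheet_name=None)  # dict: sheet name -> DataFrame
    written, columns = [], []
    for sheet_name, frame in sheets.items():
        target = Path(out_dir) / csv_name(file_stem, sheet_name)
        frame.to_csv(target, index=False)
        written.append(target)
        # collect column names in first-seen order, as the header list in the question does
        for column in frame.columns:
            if column not in columns:
                columns.append(column)
    return written, columns
```

Each file is parsed once here instead of once per sheet plus once for the sheet names, which is where most of the time in the original loop is likely spent.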

How to concat excels with multiple sheets into one excel?

The folder contains at least 20 Excel files. Each file contains nine sheets. The files have the same types of sheets (same header but different data). I need to concatenate these 20 files, sheet by sheet, into one Excel file. The first two sheets in each file are instructions and can be skipped. How can I achieve this? Thanks!
Example: File A sheet 3, File B sheet 3, File A sheet 4, File B sheet 4
So eventually the combined file will be like:
I had to do something similar a while back.
This code should do the trick for you:
import pandas as pd
import os
collection = {}
for file in os.listdir():
    if file.endswith(".xlsx"):
        mysheets = pd.ExcelFile(file)
        mysheetnames = mysheets.sheet_names
        for i in mysheetnames[2:]: #change the 2 in [2:] to change how many sheets you skip
            mydata = mysheets.parse(i)  # reuse the already opened file
            combi = collection.get(i, [])
            collection[i] = combi + [mydata]
# the context manager saves and closes the workbook on exit
with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    for key in collection:
        myresult = pd.concat(collection.get(key), sort=False)
        myresult.to_excel(writer, sheet_name=key)

Python: How to copy Excel worksheet from multiple Excel files to one Excel file that contains all the worksheets from other Excel files

This is my first time using pandas. I have multiple Excel files that I want to combine into one Excel file using pandas.
I managed to merge the content of the first sheet of each Excel file into one sheet in a new Excel file, as shown in the figure below:
[Figure: combined sheets in one sheet]
I wrote this code to implement this:
import glob
import pandas as pd
path = "C:/folder"
file_identifier = "*.xls"
all_data = pd.DataFrame()
for f in glob.glob(path + "/*" + file_identifier):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
writer = pd.ExcelWriter('combined.xls', engine='xlsxwriter')
all_data.to_excel(writer, sheet_name='Summary Sheet')
writer.save()
file_df = pd.read_excel("C:/folder/combined.xls")
# Keep only FIRST record from set of duplicates
file_df_first_record = file_df.drop_duplicates(subset=["Test summary", "Unnamed: 1", "Unnamed: 2",
"Unnamed: 3"], keep="first")
file_df_first_record.to_excel("filtered.xls", index=False, sheet_name='Summary Sheet')
But I have two issues:
1. How do I remove the cells that contain "Unnamed", as shown in the previous figure?
2. How do I copy the other worksheets (the second worksheet in each Excel file, not the first) from all the Excel files into one Excel file with multiple worksheets, named after the different students, as shown in the picture?
[Figure: all worksheets in one excel file]
So I managed to combine worksheet 1 from all Excel files into one sheet, but now I want to copy the A, B, C, D, E worksheets into one Excel file that holds all the remaining worksheets from the other Excel files.
Each of my Excel files looks like this:
[Figure: single excel file]
If you want to have all data gathered together in one worksheet you can use the following script:
1. Put all Excel workbooks (i.e. Excel files) to be processed into a folder (see variable paths).
2. Get the paths of all workbooks in that folder using glob.glob.
3. Return all worksheets of each workbook with read_excel(path, sheet_name=None) and prepare them for merging.
4. Merge all worksheets with pd.concat.
5. Export the final output with to_excel.
import pandas as pd
import glob
paths = glob.glob(r"C:\excelfiles\*.xlsx")
path_save = r"finished.xlsx"
df_lst = [pd.read_excel(path, sheet_name=None).values() for path in paths]
df_lst = [y.transpose().reset_index().transpose() for x in df_lst for y in x]
df_result = pd.concat(df_lst, ignore_index=True)
df_result.to_excel(path_save, index=False, header=False)
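The "prepare them for merging" step in the script is the `transpose().reset_index().transpose()` chain, which moves the column labels of each sheet into its first data row so that sheets with different headers can be stacked without losing them. A small illustration on made-up data:

```python
import pandas as pd

# made-up sample sheet
sheet = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [90, 85]})

# transpose -> reset_index -> transpose turns the column labels into row 0
as_rows = sheet.transpose().reset_index().transpose()

first_row = as_rows.iloc[0].tolist()          # the former header
data_rows = as_rows.iloc[1:].values.tolist()  # the original data rows
```

This is also why the script exports with `header=False`: after the transformation the real headers live in the data, and the column labels are just positions.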

Python: Writing Images and dataframes to the same excel file

I'm creating an Excel dashboard and I want to generate an Excel workbook that has DataFrames on half of the sheets and .png files on the other half. I'm having difficulty writing them to the same file in one go. Here's what I currently have. It seems that once my for loop has run, I can't add any more worksheets, and I can't find anything that explains why. Any advice on how I might get my image files added to this workbook? Thanks!
dfs = dict()
dfs['AvgVisitsData'] = avgvisits
dfs['F2FCountsData'] = f2fcounts
writer = pd.ExcelWriter("MyData.xlsx", engine='xlsxwriter')
for name, df in dfs.items():
    df.to_excel(writer, sheet_name=name, index = False)
Then I want to add a couple sheets with some images to the same excel workbook. Something like this, but where I'm not creating a whole new workbook.
workbook = xlsxwriter.Workbook('MyData.xlsx')
worksheet = workbook.add_worksheet('image1')
worksheet.insert_image('A1', 'MemberCollateral.png')
Anyone have any tips to work around this?
Here is an example of how to get a handle to the underlying XlsxWriter workbook and worksheet objects and insert an image:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_image.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Insert an image.
worksheet.insert_image('D3', 'logo.png')
# Close the Pandas Excel writer and output the Excel file.
writer.close()
See also Working with Python Pandas and XlsxWriter in the XlsxWriter docs for more examples
Here's the solution I came up with. I still couldn't find a way to do this without re-importing the workbook with load_workbook, but this got the job done.
import numpy as np
import openpyxl.drawing.image
import pandas as pd
from openpyxl import load_workbook
# assign dataframes to dictionary and export them to excel
avgvisits = pd.DataFrame(pd.read_sql(avgvisits(), cnxn))
f2fcounts = pd.DataFrame(pd.read_sql(f2fcounts(), cnxn))
activityencounters = pd.DataFrame(pd.read_sql(ActivityEncounters(), cnxn))
activityencountersp = activityencounters.pivot_table(values='ActivityCount', index = ['Activity'], columns= ['QuarterYear'], aggfunc=np.max)
dfs = dict()
dfs['AvgVisitsData'] = avgvisits
dfs['F2FIndirect'] = f2fcounts
dfs['ActivityEncounters'] = activityencountersp
writer = pd.ExcelWriter("MyData.xlsx", engine='xlsxwriter')
for name, df in dfs.items():
    if name != 'ActivityEncounters':
        df.to_excel(writer, sheet_name=name, index=False)
    else:
        df.to_excel(writer, sheet_name=name, index=True)
# close() writes the file; a separate save() call is not needed
writer.close()
# re-import the excel book and add the graph image files
wb = load_workbook('MyData.xlsx')
png_loc = 'MemberCollateral.png'
wb.create_sheet('MemberCollateralGraph')
ws = wb['MemberCollateralGraph']
my_png = openpyxl.drawing.image.Image(png_loc)
ws.add_image(my_png, 'A1')
png_loc = 'DirectIndirect.png'
ws = wb['F2FIndirect']
my_png = openpyxl.drawing.image.Image(png_loc)
ws.add_image(my_png, 'A10')
png_loc = 'QuarterlyActivitySummary.png'
ws = wb['ActivityEncounters']
my_png = openpyxl.drawing.image.Image(png_loc)
ws.add_image(my_png, 'A10')
wb.save('MyData.xlsx')

Python - Multiple XLSX/XLSM to CSV

I have a folder with multiple *.xlsm-files for example "80-384sec -_november_2017.xlsm", "80-384sec -_december_2017.xlsm", ..... I can read a specific sheet from this file with python like this:
df_xlsmtocsv = pd.read_excel('80-384sec -_november_2017.xlsm', 'sheet3', index_col=None)
And my first solution is something like this:
for file in os.listdir():
    if file.endswith(".xlsm"):
        df_qctocsv = pd.read_excel(file, 'sheet3', index_col=None )
        print(df_qctocsv)
        with open('all.csv', 'a') as f:
            df_qctocsv.to_csv(f, index=True, header=None)
How can I read multiple xlsm files, append all new messages to a CSV file, and order it, for example, by the first column?
After converting, I want to copy all these rows from the CSV file to a new sheet in an existing file, "messages.xlsx".
There are many ways to join data frames. One possible way is this:
import os
import pandas as pd
frames = []
for file in os.listdir():
    if file.endswith(".xlsm"):
        df_tmp = pd.read_excel(file, 'Sheet1', index_col=None)
        frames.append(df_tmp)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat(frames)
df.to_csv('all.csv')
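The question also asks for the rows to be ordered by the first column. A small sketch of that step, assuming the frames share the same columns; `combine_sorted` is a hypothetical helper name and the sample frames in the usage note are made up.

```python
import pandas as pd


def combine_sorted(frames):
    """Stack the frames and order the rows by whatever the first column is."""
    combined = pd.concat(frames, ignore_index=True)
    first_column = combined.columns[0]
    # a stable sort keeps the original order of rows with equal keys
    return combined.sort_values(by=first_column, kind='stable').reset_index(drop=True)
```

For example, `combine_sorted([df_november, df_december]).to_csv('all.csv', index=False)` would write the ordered rows in one go instead of appending to the CSV file inside the loop.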
EDIT: If you want to add your dataframe to an existing xlsx file (adapted from here):
from openpyxl import load_workbook
book = load_workbook('<your-xlsx-file>')
wrt = pd.ExcelWriter('<your-output-file>', engine='openpyxl')
wrt.book = book
wrt.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(wrt, '<name-of-your-sheet>')
wrt.save()
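The snippet above relies on assigning to `wrt.book`, which may fail on recent pandas releases. A sketch of an alternative, assuming pandas 1.3 or later with openpyxl installed: open the existing file in append mode so that pandas loads the workbook itself. `add_sheet` is a hypothetical helper name.

```python
import pandas as pd


def add_sheet(path, df, sheet_name):
    """Add `df` as a sheet to the existing workbook at `path`, keeping the other sheets."""
    with pd.ExcelWriter(
        path,
        engine='openpyxl',
        mode='a',                  # append to the existing file instead of overwriting it
        if_sheet_exists='replace', # overwrite a sheet of the same name if there is one
    ) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

With this, `add_sheet('messages.xlsx', df, 'messages')` would modify the file in place; the file has to exist already, since append mode does not create it.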
