How to open, delete columns and save a xls file in python - python

I need to open an existing .xls file, delete some columns, and then save the file. This is what I have so far, but I get an error when I try to delete the columns. How do I use a DataFrame to delete columns and then save the result?
Read in excel file
Workbook = xlrd.open_workbook("C:/Python/Python37/Files/firstCopy.xls", on_demand=True)
worksheet = Workbook.sheet_by_name("Sheet1")
Delete a column
df.DataFrame.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)
Workbook.save('output.xls')

Without seeing your dataset and error it is hard to tell what is going on. See How to Ask and how to create a Minimal, Complete, and Verifiable example.
Here's what I would suggest:
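import pandas as pd
df = pd.read_excel('firstCopy.xls')
# drop() returns a new DataFrame, so assign the result back to df
df = df.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1)
# writer.save() was removed in pandas 2.0; the with block saves on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')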

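import pandas as pd
df = pd.read_excel('firstCopy.xls')
# inplace=True modifies df directly instead of returning a copy
df.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)
# the with block saves and closes the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')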
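A quick way to check that the columns are really gone before writing the file is to try the call on a small in-memory frame. This is only a sketch: the column names come from the question, but the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; only the column names come from the question.
df = pd.DataFrame({
    "EmployeeID": [1, 2],
    "Name": ["Ann", "Bob"],
    "StartDate": ["2020-01-01", "2021-06-15"],
    "EndDate": ["2020-12-31", "2022-06-14"],
})

to_remove = ["StartDate", "EndDate", "EmployeeID"]

# drop() returns a new DataFrame; df itself is untouched if the result is discarded.
df.drop(to_remove, axis=1)
columns_after_bare_call = list(df.columns)

# Keeping the result (or passing inplace=True) is what actually removes the columns.
trimmed = df.drop(columns=to_remove)
columns_after_assignment = list(trimmed.columns)

print(columns_after_bare_call)
print(columns_after_assignment)
```

The first list still contains all four columns, while the second contains only the column that was not dropped.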

Related

AttributeError: 'ExcelFile' object has no attribute 'dropna'

I was trying to remove the empty columns in an Excel file with pandas, using the dropna() method, but I ended up with the above error message. Please find my code below:
import pandas as pd
df = pd.ExcelFile("1.xlsx")
print(df.sheet_names)
#df.dropna(how='all', axis=1)
newdf = df.dropna()
Please provide more code and context, but this might help:
import pandas as pd
excel_file_name = 'insert excel file path'
excel_sheet_name = 'insert sheet name'
# create dataframe from desired excel file
df = pd.read_excel(
    excel_file_name,
    engine='openpyxl',
    sheet_name=excel_sheet_name
)
# drop every column that contains a NaN value, modifying df in place
# (without inplace=True this would have to be: df = df.dropna(axis=1))
df.dropna(axis=1, inplace=True)
# write that dataframe to excel file
with pd.ExcelWriter(
    excel_file_name, # file to write to
    engine='openpyxl', # which engine to use
    mode='a', # use mode append (has to be used for if_sheet_exists to work)
    if_sheet_exists='replace' # if that sheet exists, replace it
) as writer:
    df.to_excel(writer, sheet_name=excel_sheet_name)
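Since the question is about empty columns specifically, it may be worth noting that dropna(axis=1) with default settings removes any column with at least one missing value, while how='all' removes only columns that are entirely empty. A small sketch on made-up data (the column names full, partial and empty are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "empty" has no values at all, "partial" has a single gap.
df = pd.DataFrame({
    "full": [1.0, 2.0, 3.0],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# Default how="any": drops every column containing at least one NaN.
any_dropped = df.dropna(axis=1)

# how="all": drops only columns in which every value is NaN.
all_dropped = df.dropna(axis=1, how="all")

print(list(any_dropped.columns))
print(list(all_dropped.columns))
```

With how='all', the partial column survives and only the empty one is removed, which is probably closer to what the question asks for.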

Skip Columns with pandas

Problem
I first concatenate the data from all the available Excel files into a single dataframe and then write that dataframe to a new Excel file. However, I would like to do two simple things:
a. Leave two columns blank before each new dataframe that is appended.
b. Keep the headers and the bold formatting, which have disappeared after appending the dataframes. See a picture of how one Excel file initially looked: Original formatting
Attempt
This is my attempt: Two Separate DataFrames
data = []
for excel_file in excel_files:
    print(excel_file) # the name for the dataframe
    data.append(pd.read_excel(excel_file, engine="openpyxl"))
df1 = pd.DataFrame(columns=['DVT', 'Col2', 'Col3']) #blank df maybe?!this line is not imp!
#df1.style.set_properties(subset=['DVT'], {'font-weight:bold'}) !this line is not imp!
# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)
I don't have Excel available right now, so I can't test, but something like this might be a good approach.
# Open the excel document using a context manager in 'append' mode.
with pd.ExcelWriter(excelAutoNamed, mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    for excel_file in excel_files:
        print(excel_file)
        # Append Dataframe to Excel File.
        pd.read_excel(excel_file, engine="openpyxl").to_excel(writer, index=False)
        # Append Dataframe with two blank columns to File.
        pd.DataFrame([np.nan, np.nan]).T.to_excel(writer, index=False, header=False)
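Another option that might work is to control the horizontal position explicitly with the startcol argument of to_excel, so each frame is written to the same sheet but shifted right by the width of the previous frames plus the gap. This is a minimal sketch, assuming openpyxl is installed; the two small frames, the gap variable and the temporary output path are made up for illustration and stand in for the files read in the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the frames read from the individual Excel files.
frames = [
    pd.DataFrame({"DVT": [1, 2], "Col2": [3, 4]}),
    pd.DataFrame({"DVT": [5, 6], "Col2": [7, 8], "Col3": [9, 10]}),
]
gap = 2  # number of empty columns to leave between consecutive frames

out_path = os.path.join(tempfile.mkdtemp(), "combined.xlsx")  # placeholder path
start_columns = []

with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
    startcol = 0
    for frame in frames:
        # Each frame goes on the same sheet, shifted right past the previous one.
        frame.to_excel(writer, sheet_name="Sheet1", index=False, startcol=startcol)
        start_columns.append(startcol)
        startcol += frame.shape[1] + gap

print(start_columns)  # zero-based column offsets used for each frame
```

Because every frame is written by its own to_excel call, each one gets its own header row, which should also address the missing headers mentioned in the question.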

Xlsxwriter writer is writing its own sheets and deletes existing ones

I am writing dataframes to Excel. Maybe I am not doing it correctly.
When I use this code:
from datetime import datetime
import numpy as np
import pandas as pd
from openpyxl import load_workbook
start = datetime.now()
df = pd.read_excel(r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal "
r"Data\Historical Worksheet\data.xlsx", sheet_name='x1')
df['run_time'] = start
df1 = pd.read_csv(r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal "
r"Data\Pre-processed\oddsportal_upcoming_matches.csv")
df1['run_time'] = start
concat = [df, df1]
df_c = pd.concat(concat)
path = r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal Data\Historical Worksheet\data.xlsx"
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='x1')
df1.to_excel(writer, sheet_name='x2')
df_c.to_excel(writer, sheet_name='upcoming_archive')
writer.save()
writer.close()
print(df_c.head())
The dataframes are written to their respective sheets, but all the other existing sheets get deleted.
How can I write to only the respective sheets and not disturb the other existing ones?
xlsxwriter is not meant to alter an existing xlsx file. The only savior is openpyxl, which does the job but is hard to learn. I even wrote a simple Python script to fill the gap, to write a bunch of rows or columns to a sheet: openpyxl_writers.py
You just need to use the append mode and set if_sheet_exists to replace and use openpyxl as engine.
Replace:
writer = pd.ExcelWriter('test.xlsx')
By:
writer = pd.ExcelWriter('test.xlsx', mode='a', engine='openpyxl',
                        if_sheet_exists='replace') # <- HERE
From the documentation:
mode : {‘w’, ‘a’}, default ‘w’
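To see the effect of append mode on a throwaway file, here is a small sketch. It assumes pandas 1.3 or later (for if_sheet_exists) with openpyxl installed; the sheet name keep_me, the sample data and the temporary path are made up for illustration.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # placeholder file

# Create a workbook with two sheets to stand in for the existing file.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="x1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="keep_me", index=False)

# Reopen in append mode: only "x1" is rewritten, "keep_me" is left alone.
with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="replace") as writer:
    pd.DataFrame({"a": [10, 20, 30]}).to_excel(writer, sheet_name="x1", index=False)

# sheet_name=None reads every sheet into a dict keyed by sheet name.
sheets = pd.read_excel(path, sheet_name=None, engine="openpyxl")
print(sorted(sheets))
```

Using the with block here also avoids the separate writer.save() and writer.close() calls; writer.save() is no longer available in pandas 2.0 and later.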

Can I export a dataframe to excel as the very first sheet?

Running dataframe.to_excel() automatically saves the dataframe as the last sheet in the Excel file.
Is there a way to save a dataframe as the very first sheet, so that, when you open the spreadsheet, Excel shows it as the first on the left?
The only workaround I have found is to first export an empty dataframe to the tab with the name I want as first, then export the others, then export the real dataframe I want to the tab with the name I want. Example in the code below. Is there a more elegant way? More generically, is there a way to specifically choose the position of the sheet you are exporting to (first, third, etc)?
Of course this arises because the dataframe I want as first is the result of some calculations based on all the others, so I cannot export it first.
import pandas as pd
import numpy as np
writer = pd.ExcelWriter('My excel test.xlsx')
first_df = pd.DataFrame()
first_df['x'] = np.arange(0,100)
first_df['y'] = 2 * first_df['x']
other_df = pd.DataFrame()
other_df['z'] = np.arange(100,201)
pd.DataFrame().to_excel(writer,'this should be the 1st')
other_df.to_excel(writer,'other df')
first_df.to_excel(writer,'this should be the 1st')
writer.save()
writer.close()
It is possible to re-arrange the sheets after they have been created:
import pandas as pd
import numpy as np
writer = pd.ExcelWriter('My excel test.xlsx')
first_df = pd.DataFrame()
first_df['x'] = np.arange(0,100)
first_df['y'] = 2 * first_df['x']
other_df = pd.DataFrame()
other_df['z'] = np.arange(100,201)
other_df.to_excel(writer,'Sheet2')
first_df.to_excel(writer,'Sheet1')
writer.save()
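This writes the sheets in creation order, Sheet2 followed by Sheet1.
To sort them by name, add this before you save the workbook (worksheets_objs is an attribute of the xlsxwriter workbook, so this assumes the xlsxwriter engine):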
workbook = writer.book
workbook.worksheets_objs.sort(key=lambda x: x.name)
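If the openpyxl engine is used instead of xlsxwriter, a similar effect may be possible with the workbook's move_sheet method, which shifts a sheet by a relative offset. A minimal sketch, assuming openpyxl is installed; the temporary path is a placeholder, and the data and sheet names are taken from the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from openpyxl import load_workbook

other_df = pd.DataFrame({"z": np.arange(100, 201)})
first_df = pd.DataFrame({"x": np.arange(0, 100)})
first_df["y"] = 2 * first_df["x"]

path = os.path.join(tempfile.mkdtemp(), "My excel test.xlsx")  # placeholder path

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    other_df.to_excel(writer, sheet_name="other df")
    first_df.to_excel(writer, sheet_name="this should be the 1st")
    book = writer.book  # the underlying openpyxl Workbook
    # A negative offset moves the sheet left; this offset brings it to position 0.
    position = book.sheetnames.index("this should be the 1st")
    book.move_sheet("this should be the 1st", offset=-position)

sheet_order = load_workbook(path).sheetnames
print(sheet_order)
```

The same idea extends to any target position: the offset is the difference between the desired index and the current one.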

Python pandas merge and save with existed sheets

I want to merge multiple Excel files (1.xlsm, 2.xlsm....) into an [A.xlsm] file that has macros and 3 sheets,
so I tried to merge them:
# input_file = (./*.xlsx)
all_data = pd.DataFrame()
for f in (input_file):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True, sort=False)
writer = pd.ExcelWriter('A.xlsm', engine='openpyxl')
all_data.to_excel(writer,'Sheet1')
writer.save()
The code does not raise an error,
but the resulting file [A.xlsm] fails to open,
so I changed the extension to A.xlsx and opened it.
It opens fine, but all the sheets and the macro have disappeared.
How can I merge multiple xlsx files into an xlsm file with a macro?
I believe that if you want to use macro-enabled workbooks you need to load them with keep_vba=True:
from openpyxl import load_workbook
XlMacroFile = load_workbook('A.xlsm',keep_vba=True)
To preserve separate sheets, you can do something like
df_list = [...]  # list of your dataframes
filename = 'output.xlsx'  # placeholder: name of your output file
with pd.ExcelWriter(filename) as writer:
    for i, df in enumerate(df_list, start=1):
        # give every dataframe its own sheet name
        df.to_excel(writer, sheet_name=f'Sheet{i}')
This will write each dataframe to a separate sheet in your output Excel file, provided each one gets a distinct sheet name.
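As a side note on the merging step in the question: DataFrame.append was removed in pandas 2.0, so the loop there will fail on current versions. Collecting the frames in a list and calling pd.concat once should give the same result. A sketch with made-up frames standing in for the pd.read_excel(f) results:

```python
import pandas as pd

# Hypothetical frames standing in for pd.read_excel(f) on each input file.
frames = [
    pd.DataFrame({"id": [1, 2], "value": [10, 20]}),
    pd.DataFrame({"id": [3], "value": [30], "extra": ["x"]}),
]

# One concat call replaces the repeated all_data.append(df, ...) in the loop.
all_data = pd.concat(frames, ignore_index=True, sort=False)

print(all_data.shape)
```

Columns that are missing from some of the inputs are filled with NaN, as they were with append.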
