I have an Excel worksheet with some buttons and some macros, and I use xlwings to drive it. Is there a way to save the workbook through xlwings? I want to extract a specific sheet after running an operation, but the sheet I get is the version from before the operation, without the generated data.
The code I use to extract the sheet is the following:
' Resolve the command-line arguments
Set objFSO = CreateObject("Scripting.FileSystemObject")
src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))
sheet_name = Wscript.Arguments.Item(1)
dir_name = Wscript.Arguments.Item(2)
file_name = Wscript.Arguments.Item(3)
Dim objExcel
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = False
Dim objWorkbook
' Open the source workbook from disk (only data already saved to the file is visible here)
Set objWorkbook = objExcel.Workbooks.Open(src_file)
' Copying a sheet without a destination creates a new workbook, which becomes the active one
objWorkbook.Sheets(sheet_name).Copy
objExcel.DisplayAlerts = False
objExcel.ActiveWorkbook.SaveAs dir_name + file_name + ".xlsx", 51 ' 51 = xlOpenXMLWorkbook
objExcel.ActiveWorkbook.SaveAs dir_name + file_name + ".csv", 6 ' 6 = xlCSV
objWorkbook.Close False
objExcel.Quit
Book.save() has now been implemented: see the docs.
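As a rough sketch of what that could look like (assuming xlwings is installed and Excel is available on the machine; `save_workbook` and `saved_copy_path` are hypothetical names, not part of xlwings), you would open the workbook, run your operation, and call `Book.save()` before extracting the sheet, so that the generated data is actually written to the file:

```python
from pathlib import Path


def saved_copy_path(src_file, suffix="_saved"):
    """Hypothetical helper: build a sibling path such as book_saved.xlsx."""
    src = Path(src_file)
    return src.with_name(src.stem + suffix + src.suffix)


def save_workbook(src_file, dest_file=None):
    """Open src_file in Excel through xlwings, save it, then close Excel."""
    import xlwings as xw  # imported lazily; requires a local Excel installation

    app = xw.App(visible=False)
    try:
        wb = app.books.open(str(src_file))
        # ... run the macro or write the generated data here ...
        if dest_file is None:
            wb.save()                # save in place
        else:
            wb.save(str(dest_file))  # save under a new name
        wb.close()
    finally:
        app.quit()
```

Calling `wb.save()` with no argument saves in place; passing a path saves the workbook under that name, and the extraction script can then read the saved file.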
I get this error:
TypeError: 'Workbook' object is not subscriptable
when I run this code:
import os
import xlsxwriter
from openpyxl import load_workbook

in_folder = r'xxx'  # Input folder
out_folder = r'xxx'  # Output folder
if not os.path.exists(out_folder):
    os.makedirs(out_folder)
file_exist = False
dir_list = os.listdir(in_folder)
for xlfile in dir_list:
    if xlfile.endswith('.xlsx') or xlfile.endswith('.xls'):
        file_exist = True
        str_file = os.path.join(in_folder, xlfile)
        work_book = xlsxwriter.Workbook(filename=str_file)
        work_sheet = work_book['test1']  # error above is thrown here
        work_sheet.write_formula('C2', '=A2+B2')  # Add formula, but not sure how to apply it to the entire column.
        out_Path = os.path.join(out_folder, work_book)
Edit:
I managed to fix the error above using this code:
work_book = openpyxl.load_workbook(os.path.join(in_folder,xlfile))
work_sheet = work_book['test1']
However, the formula issue still exists in the new code below:
import os
import openpyxl

in_folder = r'xxx'  # Input folder
out_folder = r'xxx'  # Output folder
if not os.path.exists(out_folder):
    os.makedirs(out_folder)
file_exist = False
dir_list = os.listdir(in_folder)
for xlfile in dir_list:
    if xlfile.endswith('.xlsx') or xlfile.endswith('.xls'):
        str_file = xlfile
        work_book = openpyxl.load_workbook(os.path.join(in_folder, str_file))
        work_sheet = work_book['Sheet1']
        row_count = work_sheet.max_row
        for row in work_sheet.iter_rows(min_row=1, min_col=1, max_row=work_sheet.max_row):
            print(row_count)
        for i, cellObj in enumerate(work_sheet['U'], 2):
            cellObj.value = f'=Q{row_count}-T{row_count}'
        work_book.save(os.path.join(out_folder, xlfile))
Ideally, I would like to loop through a folder of .xlsx files, add a formula, and apply it to the entire column (U). I would then like to save the files (with the formula applied) in another folder (out_folder).
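The documentation for xlsxwriter.Workbook shows that a worksheet is retrieved with
work_book.get_worksheet_by_name('test1')
The ['test1'] subscript syntax belongs to openpyxl, not xlsxwriter. Note also that xlsxwriter can only create new files; it cannot open and modify an existing workbook.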
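For the formula part of the question, the loop in the edited code writes the same row number (`row_count`) into every cell of column U. A minimal sketch of one way around that, assuming the data starts in row 2 under a header row and the sheet is called 'Sheet1' (`column_formulas` and `add_formula_column` are hypothetical helper names), is to build one formula per row and write it to the matching cell with openpyxl:

```python
def column_formulas(first_row, last_row, template="=Q{row}-T{row}"):
    """Return {row_number: formula} with the row number filled into each formula."""
    return {row: template.format(row=row) for row in range(first_row, last_row + 1)}


def add_formula_column(in_path, out_path, sheet_name="Sheet1",
                       target_col="U", first_row=2):
    """Write one row-specific formula per data row into target_col, then save a copy."""
    from openpyxl import load_workbook  # imported lazily; requires openpyxl

    wb = load_workbook(in_path)
    ws = wb[sheet_name]
    for row, formula in column_formulas(first_row, ws.max_row).items():
        ws[f"{target_col}{row}"] = formula
    wb.save(out_path)
```

openpyxl reads .xlsx and .xlsm files but not the legacy .xls format, so the `.xls` branch of the `endswith` check would need a different reader.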
I am using Python to create a single file from each sheet in an Excel ('xlsx') file. The first part works, but when I try to iterate through the files after they have been created in order to delete the first 8 rows, I have trouble using openpyxl. After creating the files, how do I iterate through them and delete the first 8 rows?
import os
import xlrd
from xlutils.copy import copy
import xlwt
import openpyxl
import pandas as pd

path = r'C:\excelfiles'
targetdir = (path + "/New_Files/")  # where you want your new files
if not os.path.exists(targetdir):  # makes your new directory
    os.makedirs(targetdir)
for root, dir, files in os.walk(path, topdown=False):  # all the files you want to split
    xlsfiles = [f for f in files]  # can add selection condition here
    for f in xlsfiles:
        wb = xlrd.open_workbook(os.path.join(root, f), on_demand=True)
        for sheet in wb.sheets():  # cycles through each sheet in each workbook
            newwb = copy(wb)  # makes a temp copy of that book
            newwb._Workbook__worksheets = [worksheet for worksheet in newwb._Workbook__worksheets if worksheet.name == sheet.name]
            # brute force, but strips away all other sheets apart from the sheet being looked at
            namer = targetdir + f.strip(".xls") + sheet.name + ".xlsx"
            newwb.save(namer.replace(',', ''))
            # saves each sheet as the original file name plus the sheet name
path2 = 'C:/excelfiles/New_Files/'
for root, dir, files in os.walk(path2, topdown=False):
    xlsfiles2 = [f2 for f2 in files]
    for f2 in xlsfiles2:
        sheet = openpyxl.open(path2 + f2)
        sheet.delete_rows(7)
        book.save(f2.strip(".xlsx") + sheet.name + ".xlsx")
Found the answer. First I needed to convert the files to .xlsx, and then I could open them using openpyxl.
import os
import xlrd
import openpyxl
import pyexcel as p  # assumption: "p" in the original post is pyexcel
from openpyxl.styles import NamedStyle
from xlutils.copy import copy

path = r'C:\excelfiles'
targetdir = (path + "/New_Files/")  # where you want your new files
if not os.path.exists(targetdir):  # makes your new directory
    os.makedirs(targetdir)
for root, dir, files in os.walk(path, topdown=False):  # all the files you want to split
    xlsfiles = [f for f in files]  # can add selection condition here
    for f in xlsfiles:
        wb = xlrd.open_workbook(os.path.join(root, f), on_demand=True)
        for sheet in wb.sheets():  # cycles through each sheet in each workbook
            newwb = copy(wb)  # makes a temp copy of that book
            newwb._Workbook__worksheets = [worksheet for worksheet in newwb._Workbook__worksheets if worksheet.name == sheet.name]
            # brute force, but strips away all other sheets apart from the sheet being looked at
            namer = targetdir + f.strip(".xls") + sheet.name + ".xls"
            newwb.save(namer.replace(',', ''))
            # saves each sheet as the original file name plus the sheet name
path2 = 'C:/excelfiles/New_Files/'
# convert every .xls file to .xlsx so that openpyxl can open it
for root, dir, files in os.walk(path2, topdown=False):
    xlsfiles2 = [t for t in files]
    for p3 in xlsfiles2:
        wholename = getnamestringusingcityanddate(new_stringer, datefromname)  # helper and its arguments are not shown in the original post
        pathandfilename = path2 + p3
        pathandfilenamexls = pathandfilename.replace('.xls', '.xlsx')
        p.save_book_as(file_name=pathandfilename, dest_file_name=pathandfilenamexls)
        os.remove(pathandfilename)
# reopen each converted file with openpyxl and clean it up
for root, dir, files in os.walk(path2, topdown=False):
    xlsfiles3 = [d for d in files]
    for p4 in xlsfiles3:
        filepathcomplete = path2 + p4
        book = openpyxl.load_workbook(filepathcomplete)
        sheenames = book.sheetnames[0]
        sheet = book[sheenames]
        sheet.delete_rows(1, 8)  # delete 8 rows, starting at row 1
        sheet.delete_cols(11)
        sheet.delete_cols(5)
        date_style = NamedStyle(name='datetime', number_format='MM/DD/YYYY')
        for col in range(1, 2):
            for row in range(2, sheet.max_row + 1):
                sheet.cell(row=row, column=col).style = date_style
        for col in range(10, 11):
            for row in range(2, sheet.max_row + 1):
                sheet.cell(row=row, column=col).number_format = '0.00'
        book.save(filepathcomplete)
        book.close()
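One caveat about the code above: `str.strip(".xls")` removes any of the characters `.`, `x`, `l` and `s` from both ends of the name rather than removing the extension, so a file such as `sales.xls` ends up as `ale`. A small sketch of a safer way to build the output name, using `os.path.splitext` (`split_output_name` is a hypothetical helper, not part of the original answer):

```python
import os


def split_output_name(filename, sheet_name, ext=".xlsx"):
    """Build '<original stem><sheet name><ext>' without mangling the stem."""
    stem, _old_ext = os.path.splitext(filename)
    return stem + sheet_name + ext


# str.strip treats its argument as a set of characters, not as a suffix
print("sales.xls".strip(".xls"))             # ale
print(split_output_name("sales.xls", "Q1"))  # salesQ1.xlsx
```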
I'm new to pandas/Python and I've come up with the following code to extract data from a specific part of a worksheet.
import openpyxl as xl
import pandas as pd

rows_with_data = [34,37,38,39,44,45,46,47,48,49, 50,54,55,57,58,59,60,62,63,64,65,66,70,71,72,76,77, 78,79,80,81,82,83,84,88,89,90,91,92]
path = r'XXX'
xpath = input('XXX')
file = r'**.xlsm'
xfile = input('Change file name, current is ' + file + ' :')
sheetname = r'Summary'
wb = xl.load_workbook(filename=xpath + '\\' + file, data_only=True)
sheet = wb[sheetname]  # get_sheet_by_name() is deprecated in openpyxl
rows = len(rows_with_data)
# labels of the line items (column 13 of each selected row)
line_items = []
for i in range(rows):
    line_items.append(sheet.cell(row=rows_with_data[i], column=13).value)
# period headers (row 20, columns 17 to 34)
period = []
for col in range(17, 35):
    period.append(sheet.cell(row=20, column=col).value)
print(line_items)
# one list of values per selected row
vals = []
x = []
for i in range(rows):
    if i != 0:
        vals.append(x)
    x = []
    for col in range(17, 35):
        x.append(sheet.cell(row=rows_with_data[i], column=col).value)
vals.append(x)
all_values = {}
all_values['Period'] = period
for i in range(rows):
    print(line_items[i])
    all_values[line_items[i]] = vals[i]
print(all_values)
period_review = input('Enter a period (e.g. 2002): ')
item = input('Enter an item (e.g. XXX): ')
time = period.index(period_review)
display_item = str(all_values[item][time])
print(item + ' for ' + period_review + " is " + display_item)
Summary_Dataframe = pd.DataFrame(all_values)
writer = pd.ExcelWriter(xpath + '\\' + 'values.xlsx')
Summary_Dataframe.to_excel(writer, sheet_name='Sheet1')
writer.close()  # close() writes the file; a separate save() call is not needed
I have the same worksheet (summary results) across a library of 60 .xlsm files, and I'm having a hard time figuring out how to iterate this across the entire folder of files. I also want to change this from extracting specific rows to taking the entire "Summary" worksheet, pasting it into the new Excel file, and naming the pasted worksheet after its source filename (e.g. "Experiment_A"). Any advice?
I had a hard time reading your code and understanding what you ultimately want to do, so this is advice rather than a solution. You can iterate through all files in the folder using os, read each file into a DataFrame, concatenate them, and save the single large DataFrame to CSV. I usually avoid Excel, but I guess you need the Excel conversion. In the example below I read all .txt files from a directory, collect them in a list of DataFrames, and then store the combined DataFrame as JSON. You can also store it as Excel/CSV.
import os
import pandas as pd


def process_data():
    # input file path in 2 parts in case it is very long
    input_path_1 = r'\\path\to\the\folder'
    input_path_2 = r'\second\part\of\the\path'
    # joining the two parts into the full folder path
    file_path = input_path_1 + input_path_2
    # listing all files in the folder
    file_list = os.listdir(file_path)
    # selecting only the .txt files
    file_list = [file_name for file_name in file_list if file_name.endswith('.txt')]
    # selecting the fields we need (sent_date is transformed below, so it must be read too)
    field_names = ['country', 'ticket_id', 'sent_date']
    # list that collects one DataFrame per file
    pd_list = []
    inserted_files = []
    # looping over the .txt files and reading each one into a DataFrame
    for file_name in file_list:
        # creating the file path to read the file
        file_path_ = os.path.join(file_path, file_name)
        df_ = pd.read_csv(file_path_, sep='\t', usecols=field_names)
        # example of an internal transformation before writing:
        # truncate the datetime to the first day of its month
        df_['sent_date'] = pd.to_datetime(df_['sent_date'])
        df_['sent_date'] = df_['sent_date'].values.astype('datetime64[M]')
        # adding each DataFrame to the list
        pd_list.append(df_)
        # adding file name to the inserted list to print later
        inserted_files.append(file_name)
    print(inserted_files)
    # SQL-like UNION ALL: concatenate all DataFrames into a single data source
    df_ = pd.concat(pd_list)
    output_path_1 = r'\\path\to\output'
    output_path_2 = r'\path\to\output'
    output_path = output_path_1 + output_path_2
    # put the file name
    file_name = 'xyz.json'
    # adding the day the file was processed
    df_['etl_run_time'] = pd.to_datetime('today').strftime('%Y-%m-%d')
    # write file to json
    df_.to_json(os.path.join(output_path, file_name), orient='records')
    print('Data Stored as json successfully')


process_data()
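To make the folder iteration concrete for the .xlsm case in the question, here is a hedged sketch. It assumes every workbook has a sheet named 'Summary', that copying cell values only (no formatting or macros) is enough, and that pandas with openpyxl is installed; `find_workbooks`, `sheet_title` and `collect_summaries` are hypothetical names. Each file's Summary sheet is written into one output workbook, on a sheet named after the source file:

```python
from pathlib import Path


def find_workbooks(folder, pattern="*.xlsm"):
    """Return matching workbooks in sorted order, skipping Excel lock files (~$...)."""
    return sorted(p for p in Path(folder).glob(pattern)
                  if not p.name.startswith("~$"))


def sheet_title(path, max_len=31):
    """Use the file name without extension; Excel limits sheet names to 31 characters."""
    return Path(path).stem[:max_len]


def collect_summaries(folder, out_file, sheet_name="Summary"):
    """Copy the values of each workbook's Summary sheet into one output workbook."""
    import pandas as pd  # imported lazily; requires pandas and openpyxl

    with pd.ExcelWriter(out_file) as writer:  # the with block closes and writes the file
        for path in find_workbooks(folder):
            df = pd.read_excel(path, sheet_name=sheet_name, header=None)
            df.to_excel(writer, sheet_name=sheet_title(path),
                        index=False, header=False)
```

Reading with `header=None` and writing with `header=False` keeps the sheet layout as it is instead of treating the first row as column names. Names longer than 31 characters are truncated, so very similar long file names could collide.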
I have a for loop that, when executed, always creates one fewer Excel file than the list has entries. However, when the first branch of the if statement is used (xlwings, to modify the existing Excel files), it works fine. Thoughts?
names = list(df_ora['XCODE'].unique())
for prov in names:
    # for each matching agency code we create a df2
    df2 = df_ora[df_ora['CODE'].isin([prov, '00000'])]
    # create a filename to verify the Excel file exists
    filename = (dir_src + '\\' + str(prov) + '_' + 'Claims' + '.xlsx')
    if os.path.isfile(filename):
        wb = xw.Book(filename)
        ws = wb.sheets['DATA']
        ws.clear_contents()
        ws.range('A1').options(index=False).value = df2
        ws.autofit()
        wb = xw.Book(filename)
        wb.save()
        xw.apps[0].quit()
        counter = counter + 1
    else:
        writer = pd.ExcelWriter(filename, engine='xlsxwriter')
        df2.to_excel(writer, sheet_name='DATA', index=False)
        counter = counter + 1
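Try closing the writer with writer.close() right after df2.to_excel(writer, sheet_name='DATA',index=False). On older pandas versions writer.save() also works, but it was removed in pandas 2.0. The writer's content is probably not flushed to disk until it's either explicitly closed or goes out of scope, which would explain why one file (most likely the last one written in the loop) goes missing.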
I have a Python program that creates a new excel file based on some worksheets from a few other files. The following code I have copies the worksheets perfectly, but is unable to copy the image that is present in the worksheet. How do I copy images in an Excel worksheet to another Excel workbook using Python?
path1 = "/mnt/e/RecEasy-MVP-Python/FlaskApp/Uploaded_files/" + key
print(path1)
path2 = "/mnt/e/RecEasy-MVP-Python/FlaskApp/Compiled/" + current_acc_group + "_" + current_gl_account + ".xlsx"
print(path2)
path_to_key_sheet = "/mnt/e/RecEasy-MVP-Python/FlaskApp/Uploaded_files/" + key + "_key_sheet.txt"
print("Path to key sheet file:")
print(path_to_key_sheet)
wb1 = xl.load_workbook(filename=path1, read_only=True, data_only=True)
ws1 = wb1.worksheets[2]
counter = 0
# pick the worksheet whose title matches the key sheet file
for sheet in wb1:
    if (str(sheet.title) == str(content_of_key_sheet_file)):
        ws1 = wb1.worksheets[counter]
        print("Sheet selected")
        print(sheet.title)
    counter = counter + 1
ws2 = wb2.create_sheet(ws1.title)
print("Copying from the Excel file: " + path1)
# copy cell values only; images are not copied by this loop
for row in ws1:
    for cell in row:
        if (cell.value is not None):
            ws2[cell.coordinate].value = cell.value
wb2.save(path2)
Install Pillow (just pip install Pillow; you do not need to import it in your file), then:
from openpyxl.drawing.image import Image
# ... your existing code ...
img = Image('yourImg.png')
yourSheet.add_image(img, 'A2')
where A2 is the cell the image is anchored to.
I'd been struggling with this for a while, as most of the libraries I typically use to manipulate xlsx files don't seem to support it.
Fortunately, .xlsx is an OOXML format, which means the file is a zip archive. All you need to do is unzip the .xlsx and locate the pictures in xl/media/ under the directory you extracted your workbook to.
from zipfile import ZipFile
archive = ZipFile('yourWorkbook.xlsx')
archive.extractall()
archive.close()
Now you can insert them back into your new spreadsheet:
import openpyxl
from openpyxl.drawing.image import Image
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
img = Image('test.jpg')
ws.add_image(img, 'A1')  # anchor the image at cell A1
wb.save('out.xlsx')
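If you only want the pictures rather than the whole unpacked workbook, a small sketch using just the standard library could look like this (`extract_media` is a hypothetical helper; it assumes the images live under xl/media/ as described above):

```python
import zipfile
from pathlib import Path


def extract_media(xlsx_path, dest_dir):
    """Copy every file stored under xl/media/ in the workbook into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith("xl/media/"):
                continue
            # keep only the bare file name so nothing is written outside dest_dir
            target = dest / Path(info.filename).name
            target.write_bytes(archive.read(info))
            extracted.append(target)
    return extracted
```

The returned paths can then be passed one by one to openpyxl's `Image` and `add_image`. Note that this recovers the image files only, not the cells they were anchored to in the original sheet.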