I have a for loop that, when executed, always creates one fewer Excel file than the list has entries. However, when the first branch of the if statement runs (xlwings, modifying the existing Excel files), it works fine. Thoughts?
names = list(df_ora['XCODE'].unique())
for prov in names:
    # for each matching agency code we create a df2
    df2 = df_ora[df_ora['CODE'].isin([prov, '00000'])]
    # create a filename to verify the excel exists
    filename = (dir_src + '\\' + str(prov) + '_' + 'Claims' + '.xlsx')
    if os.path.isfile(filename):
        wb = xw.Book(filename)
        ws = wb.sheets['DATA']
        ws.clear_contents()
        ws.range('A1').options(index=False).value = df2
        ws.autofit()
        wb = xw.Book(filename)
        wb.save()
        xw.apps[0].quit()
        counter = counter + 1
    else:
        writer = pd.ExcelWriter(filename, engine='xlsxwriter')
        df2.to_excel(writer, sheet_name='DATA', index=False)
        counter = counter + 1
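Try closing the writer with writer.close() (or writer.save() on older pandas versions; save() was removed in pandas 2.0) right after df2.to_excel(writer, sheet_name='DATA', index=False). The writer's content is probably not flushed to disk until it is either explicitly closed or goes out of scope. That would also explain why exactly one file is missing: each earlier writer goes out of scope when writer is reassigned on the next iteration, but the last one is never closed.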
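A minimal sketch of the same idea, using pd.ExcelWriter as a context manager so every file is closed (and therefore written) before the loop moves on. It assumes pandas plus an Excel engine such as openpyxl or xlsxwriter is installed. The function name write_one_file_per_code is hypothetical, and the filter is simplified (it does not add the '00000' rows the original includes).

```python
import os
import tempfile

import pandas as pd


# Hypothetical helper: one workbook per unique code, each closed before the next.
def write_one_file_per_code(df, code_col, out_dir):
    """Write one workbook per unique value in code_col and return the paths written."""
    written = []
    for code in df[code_col].unique():
        subset = df[df[code_col] == code]
        filename = os.path.join(out_dir, f"{code}_Claims.xlsx")
        # Leaving the with block calls writer.close(), which writes the file to disk.
        with pd.ExcelWriter(filename) as writer:
            subset.to_excel(writer, sheet_name="DATA", index=False)
        written.append(filename)
    return written


if __name__ == "__main__":
    demo = pd.DataFrame({"CODE": ["A1", "A1", "B2", "C3"], "AMOUNT": [10, 20, 30, 40]})
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_one_file_per_code(demo, "CODE", tmp)
        print(len(paths), all(os.path.isfile(p) for p in paths))  # 3 True
```

Because the with block closes each writer explicitly, the result no longer depends on when Python happens to discard the previous writer object.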
This:
def add_to_excel(list_to_save, file_to_save_in):
    my_file = dir_path + '\\' + file_to_save_in
    with openpyxl.load_workbook(filename=my_file) as links_excel:
        sheet = links_excel['Sheet1']
        for i in list_to_save:
            sheet.append(i)
        links_excel.save(filename)
    return
returns this:
3 my_file = dir_path + '\\' + file_to_save_in
----> 4 with openpyxl.load_workbook(filename=my_file) as links_excel:
5 sheet = links_excel['Sheet1']
6 for i in list_to_save:
AttributeError: __enter__
Tried this:
You're not using a with statement and there's no close() call, so if this is not the first time you're running the code, it's likely that the file was never closed properly and is still held open in memory, which prevents access.
Edit:
Apparently closing the workbook fixes it, and the with statement is not needed:
links_excel.close()
import os
import openpyxl

def add_to_excel(list_to_save, file_to_save_in):
    my_file = os.path.join(dir_path, file_to_save_in)
    links_excel = openpyxl.load_workbook(filename=my_file)
    sheet = links_excel['Sheet1']
    for i in list_to_save:
        sheet.append(i)
    links_excel.save(my_file)
    links_excel.close()
From the openpyxl documentation:
Read an existing workbook:
from openpyxl import load_workbook
wb = load_workbook(filename = 'empty_book.xlsx')
sheet_ranges = wb['range names']
print(sheet_ranges['D18'].value)
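This is an example of how to use the load_workbook method: you don't need the with statement, just use assignment. The AttributeError: __enter__ occurs because the Workbook object returned by load_workbook does not implement the context-manager protocol (__enter__/__exit__), so it cannot be used directly in a with statement.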
def add_to_excel(list_to_save, file_to_save_in):
    my_file = dir_path + '\\' + file_to_save_in
    links_excel = openpyxl.load_workbook(filename=my_file)
    sheet = links_excel['Sheet1']
    for i in list_to_save:
        sheet.append(i)
    # save back to the path that was loaded (filename is not defined in this function)
    links_excel.save(my_file)
    links_excel.close()
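If you still want with-statement behaviour, one option is contextlib.closing from the standard library, which only needs the object to have a close() method (openpyxl's Workbook does). A minimal sketch, assuming the caller passes the full file path; this variant's signature and the sheet_name parameter are illustrative additions, not part of the original code.

```python
from contextlib import closing

import openpyxl


# Illustrative variant of add_to_excel: takes the full path; sheet_name is an added parameter.
def add_to_excel(list_to_save, my_file, sheet_name="Sheet1"):
    """Append each row in list_to_save to sheet_name and save the workbook in place."""
    # Workbook has no __enter__/__exit__, but it does have close(),
    # so contextlib.closing() supplies the with-statement behaviour.
    with closing(openpyxl.load_workbook(filename=my_file)) as links_excel:
        sheet = links_excel[sheet_name]
        for row in list_to_save:
            sheet.append(row)  # each row is a list/tuple of cell values
        links_excel.save(my_file)
```

closing() calls links_excel.close() when the block exits, even if an exception is raised while appending.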
So far, I have been able to access CSV and XLSX files in Python, but I am unsure how to use input() to add user-entered data to the spreadsheet.
I would also like this input to be enterable only once per day, but for different columns in my spreadsheet (this is a separate issue).
Here is my code so far, first for CSV, second for XLSX. I don't need both; either will do:
# writing to a CSV file
import csv

def main():
    filename = "EdProjDBeg.csv"
    header = ("Ans1", "Ans2", "Ans3")
    data = [(0, 0, 0)]
    writer(header, data, filename, "write")
    updater(filename)

def writer(header, data, filename, option):
    with open(filename, "w", newline = "") as csvfile:
        if option == "write":
            clidata = csv.writer(csvfile)
            clidata.writerow(header)
            for x in data:
                clidata.writerow(x)
        elif option == "update":
            writer = csv.DictWriter(csvfile, fieldnames = header)
            writer.writeheader()
            writer.writerows(data)
        else:
            print("Option is not known")

# Updating the CSV files with new data
def updater(filename):
    with open(filename, newline= "") as file:
        readData = [row for row in csv.DictReader(file)]
        readData[0]['Ans2'] = 0
        readHeader = readData[0].keys()
    writer(readHeader, readData, filename, "update")
# Reading and updating xlsx files
import openpyxl
theFile = openpyxl.load_workbook(r'C:\Users\joe_h\OneDrive\Documents\Data Analysis STUDYING\Excel\EdProjDBeg.xlsx')
print(theFile.sheetnames)
currentsheet = theFile['Customer1']
print(currentsheet['B3'].value)
wb = openpyxl.load_workbook(r'C:\Users\joe_h\OneDrive\Documents\Data Analysis STUDYING\Excel\EdProjDBeg.xlsx')
ws = wb.active
i = 0
cell_val = ''
# Finds which row is blank first
while cell_val != '':
    cell_val = ws['A' + i].value
    i += 1
# Modify Sheet, Starting With Row i
wb.save(r'C:\Users\joe_h\OneDrive\Documents\Data Analysis STUDYING\Excel\EdProjDBeg.xlsx')
x = input('Prompt: ')
This works for inputting data into an xlsx file.
Just use:
ws['A1'] = "data"
to input into cell A1
See the code below for an example using your original code:
import openpyxl

wb = openpyxl.load_workbook('sample.xlsx')
print(wb.sheetnames)
currentsheet = wb['Sheet']
ws = currentsheet
#ws = wb.active <-- defaults to first sheet
i = 0
cell_val = ''
# Finds which row is blank first
while cell_val is not None:
    i += 1
    cell_val = ws['A' + str(i)].value
    print(cell_val)
x = input('Prompt: ')
#sets A column of first blank row to be user input
ws['A' + str(i)] = x
#saves spreadsheet
wb.save("sample.xlsx")
I also made a few edits to your original while loop in the code above:
- When a cell is blank, None is returned (not an empty string), so the loop now stops on None.
- A1 is the first cell, not A0, so i += 1 now comes before reading the cell's value.
- i is converted to a string when building the cell reference ('A' + str(i)), since 'A' + i raises a TypeError.
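For writing answers into different columns, the same blank-row search can take the column letter as a parameter. A minimal sketch; the helper names first_blank_row and record_answer are hypothetical, and the "once per day" restriction is not handled here.

```python
import openpyxl


# Hypothetical helper: finds the first empty cell in one column.
def first_blank_row(ws, column="A"):
    """Return the 1-based number of the first row whose cell in `column` is empty."""
    row = 1
    while ws[f"{column}{row}"].value is not None:
        row += 1
    return row


# Hypothetical helper: writes a value into that first empty cell.
def record_answer(ws, value, column="A"):
    """Write `value` into the first empty cell of `column` and return the row used."""
    row = first_blank_row(ws, column)
    ws[f"{column}{row}"] = value
    return row


if __name__ == "__main__":
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["Ans1", "Ans2", "Ans3"])  # header row
    print(record_answer(ws, "first answer"))       # 2
    print(record_answer(ws, "other column", "B"))  # 2
```

With a real file you would call something like record_answer(ws, input('Prompt: '), 'B') and then wb.save(...) as in the code above. Note that reading ws['A5'] makes openpyxl create that cell, which can raise ws.max_row.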
See https://openpyxl.readthedocs.io/en/stable/ for the full documentation
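If you go the CSV route instead, the writer function in the question always opens the file with "w", which overwrites it. Opening it in append mode ("a") adds a row instead. A minimal sketch; append_answers is a hypothetical name, and the header-on-new-file behaviour is an assumption about what you want.

```python
import csv
import os


# Hypothetical helper: append one row, writing the header only for a new/empty file.
def append_answers(filename, header, answers):
    """Append one row of answers to a CSV file, writing the header first if the file is new."""
    is_new = not os.path.exists(filename) or os.path.getsize(filename) == 0
    # "a" (append) mode adds to the end of the file instead of overwriting it like "w"
    with open(filename, "a", newline="") as csvfile:
        out = csv.writer(csvfile)
        if is_new:
            out.writerow(header)
        out.writerow(answers)
```

Usage with user input: header = ("Ans1", "Ans2", "Ans3"), then append_answers("EdProjDBeg.csv", header, [input(f"{name}: ") for name in header]).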
I'm trying to rename five files so that each file name shows the first and last values of a column in that file, separated by a "_".
For example, if a column has the values 0001, 0002, 0003, 0004, I want to find the first and last values and rename the file to 0001_0004.
files=os.listdir(os.getcwd())
for file in files:
    if file.endswith('.xlsx'):
        print(file)
        try:
            wb=openpyxl.load_workbook(os.path.join(os.getcwd(),file))
            print('reading workbook'+file)
            ws=wb['Sheet1']
            for row in range(7, ws.max_row+1):
                cell = ws.cell(row = row, column = 7)
            #code to change name
            wb.save(os.path.join(os.getcwd(),file))
            print('file title changed')
        except Exception as e:
            print(e)
import os

files = os.listdir(os.getcwd())
all_xls = []
for f in files:
    if f.endswith('.xlsx'):
        all_xls.append(f.replace('.xlsx', ''))
all_xls.sort()
first_file = all_xls[0]
last_file = all_xls[-1]
# Do your renaming using os.rename('src', 'dest')
This is just a quick solution I could think of and it is not optimal; you can tweak it to get the desired results faster.
Hope this helps.
import os
import openpyxl

path = os.getcwd()
files = os.listdir(path)
for filename in files:
    if filename.endswith('.xlsx'):
        print(filename)
        wb = openpyxl.load_workbook(os.path.join(path, filename), data_only=True)
        ws = wb['Sheet1']
        final = ws.max_row
        original_file = str(filename)
        # first value (G7) and last value (last row) of column G; str() in case they are numbers
        filenamep1 = str(ws['G7'].value)
        filenamep2 = str(ws.cell(row=final, column=7).value)
        os.rename(os.path.join(path, original_file),
                  os.path.join(path, filenamep1 + '_' + filenamep2 + '_' + original_file))
print("All files renamed")
This is what I ended up using to rename the files.
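A minimal sketch of the same renaming with two small safeguards: it skips empty cells (ws.max_row can point at a blank, formatted row, which would put "None" in the name) and it reads the workbook before renaming. It assumes the values start in G7 on "Sheet1" as in the code above, and it names the file exactly first_last.xlsx as the question describes rather than keeping the original name as a suffix. Both function names are hypothetical.

```python
import os

import openpyxl


# Hypothetical helper: builds "<first>_<last>" from the non-empty cells of one column.
def first_last_name(path, sheet="Sheet1", column=7, start_row=7):
    """Return '<first>_<last>' from the first and last non-empty cells of `column`."""
    wb = openpyxl.load_workbook(path, data_only=True)
    ws = wb[sheet]
    values = [ws.cell(row=r, column=column).value for r in range(start_row, ws.max_row + 1)]
    wb.close()
    values = [str(v) for v in values if v is not None]  # drop blank cells
    if not values:
        raise ValueError(f"no values found in column {column} of {path}")
    return f"{values[0]}_{values[-1]}"


# Hypothetical helper: renames the file in place and returns the new path.
def rename_by_column(path, **kwargs):
    new_name = first_last_name(path, **kwargs) + ".xlsx"
    new_path = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, new_path)  # on Windows this fails if new_path already exists
    return new_path
```

Usage: for f in os.listdir('.'): if f.endswith('.xlsx'): rename_by_column(f). If two files share the same first and last values, the second rename will collide, so you may want to keep part of the original name as the code above does.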
I'm new to pandas/Python and I've come up with the following code to extract data from a specific part of a worksheet.
import openpyxl as xl
import pandas as pd
rows_with_data = [34,37,38,39,44,45,46,47,48,49, 50,54,55,57,58,59,60,62,63,64,65,66,70,71,72,76,77, 78,79,80,81,82,83,84,88,89,90,91,92]
path = r'XXX'
xpath = input('XXX')
file = r'**.xlsm'
xfile = input('Change file name, current is ' + file + ' :')
sheetname = r'Summary'
wb = xl.load_workbook(filename = xpath + '\\' +file, data_only = True)
sheet = wb.get_sheet_by_name(sheetname)
rows = len(rows_with_data)
line_items = []
for i in range(rows):
    line_items.append(sheet.cell(row = rows_with_data[i], column = 13).value)
period = []
for col in range(17,35):
    period.append(sheet.cell(row = 20, column = col).value)
print(line_items)
vals = []
x = []
for i in range(rows):
    if i != 0:
        vals.append(x)
        x = []
    for col in range(17,35):
        x.append(sheet.cell(row = rows_with_data[i], column = col).value)
vals.append(x)
all_values = {}
all_values['Period'] = period
for i in range(rows):
    print(line_items[i])
    all_values[line_items[i]] = vals[i]
print(all_values)
period_review = input('Enter a period (i.e. 2002): ')
item = input('Enter a period (i.e. XXX): ')
time = period.index(period_review)
display_item = str(all_values[item][time])
print(item + ' for ' + period_review + " is " + display_item)
Summary_Dataframe = pd.DataFrame(all_values)
writer = pd.ExcelWriter(xpath + '\\' + 'values.xlsx')
Summary_Dataframe.to_excel(writer,'Sheet1')
writer.save()
writer.close()
I have the same worksheet (Summary results) across a library of 60 xlsm files, and I'm having a hard time figuring out how to iterate this across the entire folder of files. I also want to change this from extracting specific rows to taking the entire "Summary" worksheet, pasting it into the new file, and naming the pasted worksheet after its source file (e.g. "Experiment_A"). Any advice?
I had a hard time reading your code to understand what you ultimately want to do, so this is advice rather than a solution. You can iterate through all files in the folder using os, read the files into one DataFrame, and then save that single big DataFrame to CSV. I usually avoid Excel, but I guess you need the Excel conversion. In the example below I read all .txt files from a directory into a list of DataFrames, then store the combined DataFrame as JSON. You can also store it as Excel or CSV.
import os
import pandas as pd

def process_data():
    # input file path in 2 parts in case it is very long
    input_path_1 = r'\\path\to\the\folder'
    input_path_2 = r'\second\part\of\the\path'
    # joining the two parts into the full folder path
    file_path = input_path_1 + input_path_2
    # listing all files in the folder
    file_list = os.listdir(file_path)
    # selecting only the .txt files into a list object
    file_list = [file_name for file_name in file_list if file_name.endswith('.txt')]
    # selecting the fields we need (sent_date is included because it is transformed below)
    field_names = ['country', 'ticket_id', 'sent_date']
    # defining a list to put all the dataframes in one list
    pd_list = []
    inserted_files = []
    # looping over the txt files and collecting them in the list
    for file_name in file_list:
        # creating the file path to read the file
        file_path_ = os.path.join(file_path, file_name)
        df_ = pd.read_csv(file_path_, sep='\t', usecols=field_names)
        # a few internal data transformation examples before writing:
        # truncate each datetime to the first day of its month
        df_['sent_date'] = pd.to_datetime(df_['sent_date'])
        df_['sent_date'] = df_['sent_date'].values.astype('datetime64[M]')
        # adding each dataframe to the list
        pd_list.append(df_)
        # adding the file name to the inserted list to print later
        inserted_files.append(file_name)
    print(inserted_files)
    # SQL-like UNION ALL of all dataframes into a single data source
    df_ = pd.concat(pd_list)
    output_path_1 = r'\\path\to\output'
    output_path_2 = r'\path\to\output'
    output_path = output_path_1 + output_path_2
    # put the file name
    file_name = 'xyz.json'
    # adding the day the file was processed
    df_['etl_run_time'] = pd.to_datetime('today').strftime('%Y-%m-%d')
    # write file to json
    df_.to_json(os.path.join(output_path, file_name), orient='records')
    print('Data Stored as json successfully')

process_data()
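For the specific request in the question (copy the whole "Summary" sheet of every workbook in a folder into one new workbook, naming each sheet after its source file), a minimal sketch with pandas. It assumes pandas with openpyxl is installed. It copies cell values only (no formatting, charts, or formulas; formulas come through as their last cached values), and the function name combine_summary_sheets is hypothetical.

```python
from pathlib import Path

import pandas as pd


# Hypothetical helper: one output sheet per input workbook, named after the file.
def combine_summary_sheets(folder, output, sheet="Summary", pattern="*.xlsm"):
    """Copy `sheet` from every workbook matching `pattern` in `folder` into `output`."""
    files = sorted(Path(folder).glob(pattern))
    with pd.ExcelWriter(output) as writer:
        for path in files:
            # header=None keeps every row, including the original header rows, as data
            df = pd.read_excel(path, sheet_name=sheet, header=None)
            # Excel limits sheet names to 31 characters
            df.to_excel(writer, sheet_name=path.stem[:31], index=False, header=False)
    return [p.stem for p in files]
```

Usage: combine_summary_sheets(r'XXX', r'XXX\all_summaries.xlsx'). Write the output outside the source folder (or give it a different extension) so a later run does not pick it up, and note that two long file names sharing the same first 31 characters would map to the same sheet name.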
I have an Excel worksheet with some buttons and some macros, and I use xlwings to make it work. Is there a way to save the workbook through xlwings? I want to extract a specific sheet after performing an operation, but the saved sheet is the extracted sheet from before the operation, without the generated data.
My code for extracting the sheet I need is the following:
Set objFSO = CreateObject("Scripting.FileSystemObject")
src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))
sheet_name = Wscript.Arguments.Item(1)
dir_name = Wscript.Arguments.Item(2)
file_name = Wscript.Arguments.Item(3)
Dim objExcel
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = False
Dim objWorkbook
Set objWorkbook = objExcel.Workbooks(src_file)
objWorkbook.Sheets(sheet_name).Copy
objExcel.DisplayAlerts = False
objExcel.ActiveWorkbook.SaveAs dir_name + file_name + ".xlsx", 51
objExcel.ActiveWorkbook.SaveAs dir_name + file_name + ".csv", 6
objWorkbook.Close False
objExcel.Quit
Book.save() has now been implemented: see the docs.
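A minimal sketch of running the macro and then saving, so the VBScript extraction reads a file that already contains the generated data. It relies on xlwings' Book.macro(name), which returns a callable, and Book.save(path=None), which saves in place when no path is given. The function name run_macro_and_save and the macro name used below are hypothetical, and the function only duck-types the book, so it does not import xlwings itself.

```python
# Hypothetical helper: run a VBA macro in an open xlwings Book, then save it.
def run_macro_and_save(book, macro_name, save_path=None):
    """Run a VBA macro in an open xlwings Book, then save the workbook.

    `book` is expected to be an xlwings Book, e.g. xw.Book(path).
    """
    run = book.macro(macro_name)  # Book.macro() returns a callable wrapper for the VBA macro
    run()
    # With save_path=None, Book.save() saves in place; with a path it acts like "Save As".
    book.save(save_path)
    return book
```

Usage on Windows with Excel installed: import xlwings as xw; wb = xw.Book(r'C:\path\to\workbook.xlsm'); run_macro_and_save(wb, 'GenerateData'), then run the VBScript on the saved file.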