Let's say you have the following df:
dfresult_secondlook = {'relfilepath': ['test.pdf', 'epic.pdf' ], 'col2': [3, 4]}
I want to move the files listed in this df to another folder with the following code:
#moving files from secondlook df to secondlook folder
sourceDir = 'C:\\Users\\Max12\\Desktop\\xml\\pdfminer\\UiPath\\attachments\\75090058\\Status\\PDFsend'
destDir = 'C:\\Users\\Max12\\Desktop\\xml\\pdfminer\\UiPath\\attachments\\75090058\\Status\\SecondLook'
files = os.listdir(sourceDir)
filesToMove = dfresult_secondlook
def move(file, sourceDir, destDir):
    sourceFile = os.path.join(sourceDir, file)
    if not os.path.exists(destDir):
        os.makedirs(destDir)
    try:
        shutil.move(sourceFile, destDir)
    except:
        pass
for i in range(len(filesToMove)):
    file = filesToMove['relfilepath'][i]
    move(file,sourceDir,destDir)
#writing files to excel for further examination
book = load_workbook(r"C:\Users\Max12\Desktop\xml\pdfminer\UiPath\attachments\75090058\secondlook.xlsx")
writer = pd.ExcelWriter(r"C:\Users\Max12\Desktop\xml\pdfminer\UiPath\attachments\75090058\secondlook.xlsx", engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
dfresult_secondlook.to_excel(writer, "Main", header = False, index = False, startrow = writer.sheets['Main'].max_row)
writer.save()
However, I'm getting a KeyError:
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-13-4043ec85df9c> in <module>
17
18 for i in range(len(filesToMove)):
---> 19 file = filesToMove['relfilepath'][i]
20 move(file,sourceDir,destDir)
21
I can't see what's going wrong, even after 2 hours.
Please help!
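filesToMove['relfilepath'][i] looks the value up like a dict, by index label rather than by position, and the index labels of your DataFrame do not always include 0 (for example, after it has been filtered). Thus, your for loop, starting from 0, fails.
You could try this instead: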
for i in range(len(filesToMove)):
    try:
        file = filesToMove['relfilepath'][i]
        move(file,sourceDir,destDir)
    except KeyError:
        continue
PS: you should modify the beginning of your post, where dfresult_secondlook shows 'relfilepath' as a list instead of a dict.
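A sketch of a sturdier loop, assuming dfresult_secondlook is a pandas DataFrame whose index no longer starts at 0 (the filtering step and the file names below are hypothetical). Note that the try/except loop above silently skips any row whose index label lies outside 0..len-1. Iterating over the column's values, or calling reset_index(drop=True), or using .iloc[i] for positional access, visits every row:

```python
import os
import shutil
import tempfile

import pandas as pd


def move(file, source_dir, dest_dir):
    """Move `file` from source_dir into dest_dir, creating dest_dir if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    shutil.move(os.path.join(source_dir, file), dest_dir)


# Hypothetical data: after filtering, the remaining index labels are 1 and 2 (no 0)
df = pd.DataFrame({'relfilepath': ['keep.pdf', 'test.pdf', 'epic.pdf'], 'col2': [2, 3, 4]})
files_to_move = df[df['col2'] > 2]
print(files_to_move.index.tolist())  # [1, 2]
# files_to_move['relfilepath'][0] would raise KeyError: 0 here

# Throwaway source/destination folders with dummy files
source_dir = tempfile.mkdtemp()
dest_dir = os.path.join(tempfile.mkdtemp(), 'SecondLook')
for name in df['relfilepath']:
    open(os.path.join(source_dir, name), 'w').close()

# Iterate over the values themselves: no label lookups, no skipped rows
for file in files_to_move['relfilepath']:
    move(file, source_dir, dest_dir)

print(sorted(os.listdir(dest_dir)))  # ['epic.pdf', 'test.pdf']
```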
This code used to open an existing .xlsx file and write into it, but after updating pandas from 1.1.5 to 1.5.1 I get zipfile.BadZipFile: File is not a zip file.
Then I read that since pandas 1.2.0, pd.ExcelWriter(report_path, engine='openpyxl') creates a new file, and because that file is completely empty, openpyxl cannot load it.
Knowing that, I changed the code to the following, but now I get AttributeError: property 'sheets' of 'OpenpyxlWriter' object has no setter. How should I handle this?
book = load_workbook('Resultados.xlsx')
writer = pd.ExcelWriter('Resultados.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
reader = pd.read_excel(r'Resultados.xlsx')
df = pd.DataFrame.from_dict(dict_)
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()
TLDR
Use .update to modify writer.sheets
Rearrange the order of your script to get it working
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# use `with` to avoid other exceptions
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
    # setting `book` is deprecated (pandas 1.5 emits a FutureWarning); newer pandas releases no longer allow it
    writer.book = book
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
Details
Recreating your problem with some fake data:
import numpy as np
from openpyxl import load_workbook
import pandas as pd
if __name__ == "__main__":
    # make some random data
    np.random.seed(0)
    df = pd.DataFrame(np.random.random(size=(5, 5)))
    # this makes an existing file
    with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
        df.to_excel(excel_writer=writer)
    # make new random data
    np.random.seed(1)
    df = pd.DataFrame(np.random.random(size=(5, 5)))
    # what you tried...
    book = load_workbook("Resultados.xlsx")
    writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    reader = pd.read_excel("Resultados.xlsx")
    # skipping this step as we defined `df` differently
    # df = pd.DataFrame.from_dict(dict_)
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
    writer.close()
We get the same error, plus a FutureWarning:
...\StackOverflow\answer.py:23: FutureWarning: Setting the `book` attribute is not part of the public API, usage can give unexpected or corrupted results and will be removed in a future version
writer.book = book
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 24, in <module>
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
AttributeError: can't set attribute 'sheets'
The AttributeError occurs because sheets is a property of the writer instance that defines a getter but no setter. If you're unfamiliar with properties, see the Python documentation for the built-in property().
In short, sheets cannot be reassigned the way you're trying. However, you can do this:
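A minimal sketch of the mechanism, using a hypothetical Writer class rather than pandas itself: assigning to a property that only has a getter raises AttributeError, while mutating the object the getter returns is allowed. Whether such a mutation sticks depends on whether the getter hands back a stored object or builds a new one on every access:

```python
class Writer:
    """Hypothetical stand-in for a class with a getter-only property."""

    def __init__(self):
        self._sheets = {}

    @property
    def sheets(self):
        # Only a getter is defined (no @sheets.setter), so assignment is rejected
        return self._sheets


w = Writer()
try:
    w.sheets = {'Main': 'ws'}
    assigned = True
except AttributeError:
    assigned = False
print(assigned)  # False

# Mutating the returned dict works; it persists here only because
# this getter returns the same stored dict on every access.
w.sheets.update({'Main': 'ws'})
print(w.sheets)  # {'Main': 'ws'}
```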
# use the `.update` method
writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
That will move us past the AttributeError, but we'll hit a ValueError a couple of lines later:
reader = pd.read_excel("Resultados.xlsx")
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 26, in <module>
reader = pd.read_excel("Resultados.xlsx")
...
File "...\lib\site-packages\pandas\io\excel\_base.py", line 1656, in __init__
raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Do what the error message says and pass an argument to the engine parameter:
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
And now we're back to your original zipfile.BadZipFile exception:
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 26, in <module>
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
...
File "...\Local\Programs\Python\Python310\lib\zipfile.py", line 1334, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
After some experimenting, I noticed that Resultados.xlsx could no longer be opened manually after running this line:
writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
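That matches the pandas 1.2+ behaviour mentioned in the question: creating the ExcelWriter (default mode='w') opens the file for writing immediately, which truncates it before anything is written, so any later read of the same path sees an empty file. A small sketch that shows this with a throwaway file in a temporary directory (the file contents are made up):

```python
import os
import tempfile
import zipfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'Resultados.xlsx')
pd.DataFrame({'a': [1, 2]}).to_excel(path, index=False)  # an existing, valid workbook
size_before = os.path.getsize(path)

writer = pd.ExcelWriter(path, engine='openpyxl')  # default mode='w'
size_after = os.path.getsize(path)  # the file has already been truncated
print(size_before > 0, size_after)  # True 0

# Reading the truncated file fails exactly like in the question
try:
    pd.read_excel(path, engine='openpyxl')
    bad_zip = False
except zipfile.BadZipFile:
    bad_zip = True
print(bad_zip)  # True

# Writing a sheet and closing the writer produces a valid (new) workbook again
pd.DataFrame({'b': [3]}).to_excel(writer, index=False)
writer.close()
print(pd.read_excel(path).columns.tolist())  # ['b']
```

This is why reading the old file has to happen before the writer is created, or why append mode is needed when the old contents should be kept.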
So I reordered some of the steps in your code:
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# the old way
# writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
    writer.book = book
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
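Try this instead: open the writer in append mode (mode='a'), so pandas loads the existing workbook instead of replacing it, and use if_sheet_exists='overlay' (pandas 1.4+) to write into the existing sheet: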
filepath = r'Resultados.xlsx'
with pd.ExcelWriter(
        filepath,
        engine='openpyxl',
        mode='a',
        if_sheet_exists='overlay') as writer:
    reader = pd.read_excel(filepath)
    df.to_excel(
        writer,
        startrow=reader.shape[0] + 1,
        index=False,
        header=False)
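A variant of the append-mode approach that avoids re-reading the file, assuming the data goes to a sheet named 'Sheet1' (the data and the temporary path are hypothetical). In append mode the writer loads the existing workbook, so the openpyxl worksheet's max_row (1-based, header included) equals the 0-based startrow of the first empty row; max_row can overcount if the sheet has formatted but empty rows:

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'Resultados.xlsx')

# An existing workbook: one header row plus 5 data rows on 'Sheet1'
first = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['a', 'b'])
first.to_excel(path, index=False)

# New rows to append (made-up values)
second = pd.DataFrame([[100, 101], [102, 103]], columns=['a', 'b'])

# mode='a' loads the existing file instead of replacing it;
# if_sheet_exists='overlay' writes into the existing sheet
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    start = writer.sheets['Sheet1'].max_row  # 6 here: header + 5 rows
    second.to_excel(writer, sheet_name='Sheet1', index=False, header=False, startrow=start)

result = pd.read_excel(path)
print(len(result))  # 7
```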
While looping through a lot of files, I get a MemoryError at some point. Is there a way to handle it? Am I handling the files the wrong way, or is there another way of doing this?
Here is the code I have tried so far:
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd
import os
opco=['msd','ped','med','suy_consol','sidy','giser','sdisa']
print("Start reading the template file\n")
wb = load_workbook(filename='template_patch_debt.xlsx',data_only =True)
ws = wb.create_sheet()
ws1=wb.create_sheet()
ws.title = 'vuln_export'
ws1.title='device_export'
print('###########Done Reading the Template File#######'+'\n')
print('#########Start Reading The CSV File#############')
filelist_device=[f for f in os.listdir() if f.endswith(".csv") and 'device' in f]
filelist_patch=[f for f in os.listdir() if f.endswith(".csv") and 'patch' in f]
for file_device in filelist_device:
    for file_patch in filelist_patch:
        for opco_name in opco:
            if opco_name in file_device and opco_name in file_patch:
                print(opco_name)
                print("Start Reading for the device file for:"+file_device+'\n')
                df_1 = pd.read_csv(file_device)
                for r in dataframe_to_rows(df_1, index=False, header=True):
                    ws1.append(r)
                print("Start reading patch_debt file :-"+file_patch+'\n')
                df = pd.read_csv(file_patch)
                for r in dataframe_to_rows(df, index=False, header=True):
                    ws.append(r)
                wb.save((file_patch.rsplit('_',1)[0] +'_debt_measure' '.xlsx'))
                wb.close()
                for row in ws:
                    for cell in row:
                        cell.value = None
                for row in ws1:
                    for cell in row:
                        cell.value = None
Here is the error I am getting:
MemoryError Traceback (most recent call last)
<ipython-input-1-a60a13cb6158> in <module>
41 for r in dataframe_to_rows(df, index=False, header=True):
42 ws.append(r)
---> 43 wb.save((file_patch.rsplit('_',1)[0] +'_debt_measure' '.xlsx'))
44 wb.close()
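One likely contributor to the MemoryError (an assumption, not confirmed in the question): setting cell.value = None blanks the cells but does not delete them, so ws and ws1 keep every row from earlier files in memory, and ws.append keeps writing below those blank rows. The sketch below uses made-up rows and plain openpyxl to show the effect, and one alternative: replacing the sheet with a fresh, empty one before each file. Reloading the template with load_workbook for each output file would be another option:

```python
from openpyxl import Workbook


def append_rows(ws, rows):
    """Append each row (a list of cell values) below the sheet's current last row."""
    for r in rows:
        ws.append(r)


wb = Workbook()
ws = wb.create_sheet('vuln_export')
append_rows(ws, [[1, 10], [2, 20], [3, 30]])

# What the loop in the question does between files: blank every value...
for row in ws:
    for cell in row:
        cell.value = None
append_rows(ws, [['next', 'file']])
rows_after_clear = ws.max_row
print(rows_after_clear)  # 4: the blanked cells remain, so the sheet keeps growing

# Alternative: swap in a brand-new sheet at the same position
idx = wb.sheetnames.index('vuln_export')
wb.remove(ws)
ws = wb.create_sheet('vuln_export', idx)
append_rows(ws, [['next', 'file']])
print(ws.max_row)  # 1
```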
I have a routine that reads an Excel file with a column of links to individual .xlsx/.xls files that people upload through a form.
My problem is that people do not always upload the correct file format, so I had to add exception handling. I save the links that raised an exception in a list, but I don't know which exception was raised. Here's my code:
erros = []
for i in links:
    try:
        name = i[50:]
        df = pd.read_excel(i, header = 1, usecols = col_names, encoding = 'utf-8') #usecols = names)
        file_name = r"%s\%s" %(pasta_sol,name)
        writer = pd.ExcelWriter(file_name , engine='xlsxwriter')
        df.to_excel(writer, header = True, index = True)
        writer.close()
    except (TypeError, IndexError, ValueError, XLRDError, BadZipFile, urllib.error.URLError) as e:
        erros.append(i)
Is there a way to record, for each file that raised an exception, which exception it was? It could be a list or a new df that looks like this:
erros = [['http://abs.company.pdf', 'TypeError'],['http://abs.company.xls','XLRDError']]
or a DataFrame.
*There are thousands of files to read per day.
Thanks
This isn't exactly what you wanted, but it's close. Hope it helps.
errors = []
for i in links:
    try:
        name = i[50:]
        df = pd.read_excel(i, header = 1, usecols = col_names, encoding = 'utf-8') #usecols = names)
        file_name = r"%s\%s" %(pasta_sol,name)
        writer = pd.ExcelWriter(file_name , engine='xlsxwriter')
        df.to_excel(writer, header = True, index = True)
        writer.close()
    except (TypeError, IndexError, ValueError, XLRDError, BadZipFile, urllib.error.URLError) as e:
        # record the link i: file_name is not set yet if read_excel itself failed
        errors.append([i, e.args[0]])
print(errors) # prints the error message (e.g. "division by zero"), not the exception's class name
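To get exactly the list asked for, record type(e).__name__, the exception's class name, next to the link. A sketch with a hypothetical process() function standing in for the read/write work, using only built-in exception types so it runs on its own (the real tuple with XLRDError, BadZipFile and URLError works the same way):

```python
import pandas as pd


def process(link):
    """Hypothetical stand-in for the per-link read_excel/to_excel work."""
    if link.endswith('.pdf'):
        raise ValueError('Excel file format cannot be determined')
    if link.endswith('.xls'):
        raise TypeError('unsupported')
    return link


links = ['http://abs.company.pdf', 'http://abs.company.xls', 'http://abs.company.xlsx']
errors = []
for link in links:
    try:
        process(link)
    except (TypeError, IndexError, ValueError) as e:
        # type(e).__name__ is the class name ('ValueError'); str(e) is the message
        errors.append([link, type(e).__name__, str(e)])

errors_df = pd.DataFrame(errors, columns=['link', 'error', 'message'])
print(errors_df)
```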
I tried to use the VBA code from this link, but in Python it doesn't work.
import win32com.client
Excel = win32com.client.Dispatch("Excel.Application")
wb = Excel.Workbooks.Open(r'C:/Users/Home/Desktop/expdata.xlsx')
wb.Worksheets("Report").Activate # select the active sheet
sheet = wb.ActiveSheet
obj1=wb.ActiveSheet.Pictures.Insert(r'C:\Users\Home\Desktop\picture.jpg')
obj1.ShapeRange
obj1.ShapeRange.LockAspectRatio = msoTrue
obj1.ShapeRange.Width = 75
obj1.ShapeRange.Height = 100
obj1.Left = xlApp.ActiveSheet.Cells(i, 20).Left
obj1.Top = xlApp.ActiveSheet.Cells(i, 20).Top
obj1.Placement = 1
obj1.PrintObject = True
wb.save
wb.Close
Excel.Quit()
AttributeError Traceback (most recent call last)
in ()
9 sheet.Cells(20, 20).Select
10 #obj1=sheet.Shapes.AddPicture (r'C:/Users/Home/Desktop/picture.jpg', False, True, 10, 3, 100, 100)
---> 11 obj1=wb.ActiveSheet.Pictures.Insert(r'C:/Users/Home/Desktop/picture.jpg')
12 obj1.ShapeRange
13 obj1.ShapeRange.LockAspectRatio = msoTrue
AttributeError: 'function' object has no attribute 'Insert'
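Unless you absolutely need to use VBA, this sort of thing can be done in plain Python with xlsxwriter: http://xlsxwriter.readthedocs.io/example_images.html. Note that xlsxwriter only creates new files; it cannot modify an existing workbook such as expdata.xlsx.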
import xlsxwriter
# Create a new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('images.xlsx')
worksheet = workbook.add_worksheet()
worksheet.insert_image('B2', 'python.png')
workbook.close()
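I found that the following code works:
import win32com.client
pic_path=r'file_path.png'
Excel = win32com.client.Dispatch("Excel.Application")
wb = Excel.Workbooks.Open(r'C:/Users/Home/Desktop/expdata.xlsx')
ws =wb.Worksheets("Report")
# required_row, required_column, required_width, required_height are placeholders for your own values
left=ws.Cells(1,required_column).Left
top=ws.Cells(required_row,1).Top
width=required_width
height=required_height
# arguments: Filename, LinkToFile, SaveWithDocument, Left, Top, Width, Height
# (Python does not allow positional arguments after keyword arguments, so pass them all positionally)
ws.Shapes.AddPicture(pic_path, False, True, left, top, width, height)
wb.Save()
wb.Close()
Excel.Quit()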
I am trying to read .xlsx documents from a folder and compile the data into a single worksheet.
I'm receiving an IOError even though the files exist.
The program is as follows:
#-------------- loop that pulls in files from folder--------------
import os
#create directory from which to pull the files
rootdir = r'C:\Users\username\Desktop\Mults'
for subdir, dir, files in os.walk(rootdir):
    for file in files:
        print os.path.join(subdir,file)
#----------------------merge work books-----------------------
import xlrd
import xlsxwriter
wb = xlsxwriter.Workbook('merged.xls')
ws = wb.add_worksheet()
for file in files:
    r = xlrd.open_workbook(file)
    head, tail = os.path.split(file)
    count = 0
    for sheet in r:
        if sheet.number_of_rows()>0:
            count += 1
    for sheet in r:
        if sheet.number_of_rosw()>0:
            if count == 1:
                sheet_name = tail
            else:
                sheet_name = "%s_%s" (tail, sheet.name)
            new_sheet = wb.create_sheet(sheet_name)
            new_sheet.write_reader(sheet)
            new_sheet.close()
wb.close()
It returns the following error:
doc1.xlsx
doc2.xlsx
doc3.xlsx
doc4.xlsx
Traceback (most recent call last):
File "C:\Users\username\Desktop\Work\Python\excel practice\xlsx - loops files - 09204.py", line 23, in <module>
r = xlrd.open_workbook(file)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 394, in open_workbook
f = open(filename, "rb")
IOError: [Errno 2] No such file or directory: 'doc1.xlsx'
Any suggestions or changes?
Also, any advice on whether I'm heading in the right direction?
I'm new to the python world, so any advice will be much appreciated!
Thank you!!
You are opening the plain filename without the path; you are ignoring the directory component.
Don't just print the os.path.join() result, actually use it:
filename = os.path.join(subdir, file)
r = xlrd.open_workbook(filename)
For the first problem...
Instead of:
r = xlrd.open_workbook(file)
Use:
r = xlrd.open_workbook(os.path.join(subdir,file))
For the TypeError:
Instead of:
for sheet in r:
    if sheet.number_of_rows()>0:
        count += 1
Use:
for nsheet in r.sheet_names(): #you need a list of sheet names to loop through
    sheet = r.sheet_by_name(nsheet) #then you create a sheet object with each name in the list
    if sheet.nrows>0: #use the property nrows of the sheet object to count the number of rows
        count += 1
Do the same for the second for loop.
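A Python 3 sketch of the whole task, swapping xlrd and xlsxwriter for pandas (which reads .xlsx through openpyxl; xlrd 2.0 and later only reads legacy .xls files). It builds each path with os.path.join, as suggested above. The merge_workbooks name, the source column, and the demo files are hypothetical, and the output is written outside rootdir so a second run does not pick it up:

```python
import os
import tempfile

import pandas as pd


def merge_workbooks(rootdir, out_path):
    """Stack every non-empty sheet of every .xlsx under rootdir into one sheet."""
    frames = []
    for subdir, _dirs, files in os.walk(rootdir):
        for file in sorted(files):
            if not file.endswith('.xlsx'):
                continue
            path = os.path.join(subdir, file)  # full path, not the bare file name
            # sheet_name=None returns a dict of {sheet name: DataFrame}
            for sheet_name, df in pd.read_excel(path, sheet_name=None).items():
                if df.empty:
                    continue  # skip sheets without data rows
                df.insert(0, 'source', f'{file}_{sheet_name}')
                frames.append(df)
    merged = pd.concat(frames, ignore_index=True)  # raises ValueError if nothing was found
    merged.to_excel(out_path, sheet_name='merged', index=False)
    return merged


# Demo with made-up files in a temporary folder tree
rootdir = tempfile.mkdtemp()
os.makedirs(os.path.join(rootdir, 'sub'))
pd.DataFrame({'x': [1, 2]}).to_excel(os.path.join(rootdir, 'doc1.xlsx'), index=False)
pd.DataFrame({'x': [3]}).to_excel(os.path.join(rootdir, 'sub', 'doc2.xlsx'), index=False)

out_path = os.path.join(tempfile.mkdtemp(), 'merged.xlsx')  # outside rootdir
merged = merge_workbooks(rootdir, out_path)
print(merged)
```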