Iterating over multiple files in a folder - Python

I'm trying to develop a program that iterates over different files in the same folder. The files all have the same format but different names. If there is only one file in the folder, the code runs without problems, but with several files I get this error:
Traceback (most recent call last):
File "D:/Downloads/FYP/Feedback draft.py", line 24, in <module>
wb = openpyxl.load_workbook(filename)
File "C:\Users\shomi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "C:\Users\shomi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\Users\shomi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Users\shomi\AppData\Local\Programs\Python\Python38-32\lib\zipfile.py", line 1251, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'tester2.xlsx'
The code I'm using is:
import os
import openpyxl
from docxtpl import DocxTemplate

directory = r'D:\Downloads\FYP\TEST'
for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        wb = openpyxl.load_workbook(filename)
        sh1=wb['test']
        # acceleration is defined in the part of the script that was left out
        doc = DocxTemplate('Assignment1feedback.docx')
        context = {
            'acc': acceleration
        }
        doc.render(context)
        doc.save('D:\\Downloads\\FYP\\TEST\\' + filename + '.docx')
This is incomplete code, as the full script is quite long, but overall I want to read each of these Excel files and create a corresponding .docx file.

os.listdir returns only the base names of the files in the directory, not their full paths. openpyxl.load_workbook then looks for each name relative to the current working directory, which fails unless the working directory is the same as directory. For example, if your working directory is D:\Downloads, .\tester2.xlsx does not exist there, but D:\Downloads\FYP\TEST\tester2.xlsx does.
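A small self-contained sketch of that behaviour: a temporary folder created with tempfile stands in for D:\Downloads\FYP\TEST (an assumption for the demo), and the helper list_full_paths is hypothetical. It shows that os.listdir yields bare names and that os.path.join turns them into paths that work no matter which directory the script is started from.

```python
import os
import tempfile

def list_full_paths(directory, suffix):
    """Return full paths of the files in directory whose names end with suffix."""
    paths = []
    for name in os.listdir(directory):  # name is only the base name, e.g. 'tester2.xlsx'
        if name.endswith(suffix):
            paths.append(os.path.join(directory, name))  # prepend the folder
    return sorted(paths)

# Demo: a temporary folder stands in for the real data folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("tester1.xlsx", "tester2.xlsx", "notes.txt"):
        open(os.path.join(folder, name), "w").close()

    print(sorted(os.listdir(folder)))  # bare names only, no folder part
    for path in list_full_paths(folder, ".xlsx"):
        print(path, os.path.isfile(path))  # full paths resolve from any cwd
```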
You need to pass the full path to the file, and there are two ways to get it. You could follow @IronMan's suggestion in their comment and build the file path from the directory path and the file's base name:
import os
import openpyxl

directory = r'D:\Downloads\FYP\TEST'
for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        wb = openpyxl.load_workbook(os.path.join(directory, filename))
This is simple and works; however, plain string paths are somewhat limited and can make later changes harder. The alternative is to use os.scandir (pathlib offers a similar Path.iterdir), whose entries expose the full path directly:
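import os
import openpyxl
directory = r'D:\Downloads\FYP\TEST'
with os.scandir(directory) as entries:
    for entry in entries:
        if entry.is_file() and entry.name.endswith(".xlsx"):
            # entry.path is the directory joined with the file name
            wb = openpyxl.load_workbook(entry.path)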
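The question's code also builds the output name with filename + '.docx', which produces names like tester2.xlsx.docx. A sketch using pathlib (Path.glob and Path.with_suffix) that pairs each workbook with a .docx path sharing its stem; the helper output_pairs is hypothetical, and the openpyxl/docxtpl calls are only shown in a comment so the snippet runs on its own:

```python
import tempfile
from pathlib import Path

def output_pairs(directory):
    """Yield (workbook_path, docx_path) for every .xlsx file in directory."""
    for xlsx_path in sorted(Path(directory).glob("*.xlsx")):
        # Paths returned by glob already include the directory, so they
        # do not depend on the current working directory.
        yield xlsx_path, xlsx_path.with_suffix(".docx")  # tester2.xlsx -> tester2.docx

# Demo: a temporary folder stands in for D:\Downloads\FYP\TEST.
with tempfile.TemporaryDirectory() as folder:
    for name in ("tester1.xlsx", "tester2.xlsx"):
        (Path(folder) / name).touch()
    for xlsx_path, docx_path in output_pairs(folder):
        # In the real script: wb = openpyxl.load_workbook(str(xlsx_path)) ... doc.save(str(docx_path))
        print(xlsx_path.name, "->", docx_path.name)
```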

Related

Decompressing .bz2 files in a directory in python

I would like to decompress a bunch of .bz2 files contained in a folder (where there are also .zst files). What I am doing is the following:
import os
import bz2

destination_folder = "/destination_folder_path/"
compressed_files_path="/compressedfiles_folder_path/"
dirListing = os.listdir(compressed_files_path)
for file in dirListing:
    if ".bz2" in file:
        unpackedfile = bz2.BZ2File(file)
        data = unpackedfile.read()
        open(destination_folder, 'wb').write(data)
But I keep on getting the following error message:
Traceback (most recent call last):
File "mycode.py", line 34, in <module>
unpackedfile = bz2.BZ2File(file)
File ".../miniconda3/lib/python3.9/bz2.py", line 85, in __init__
self._fp = _builtin_open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'filename.bz2'
Why do I receive this error?
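os.listdir returns only file names, so bz2.BZ2File(file) looks for each name in the current working directory instead of in compressed_files_path. Make sure every path you use exists, and pass the full path to the file being opened: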
import os
import bz2

# this path must exist
destination_folder = "/full_path_to/folder/"
compressed_files_path = "/full_path_to_other/folder/"

# get list with filenames (strings)
dirListing = os.listdir(compressed_files_path)
for file in dirListing:
    # ^ this is only filename.ext, not a full path
    if file.endswith(".bz2"):
        # concatenation of directory path and filename.bz2
        existing_file_path = os.path.join(compressed_files_path, file)
        # read and decompress the file
        with bz2.BZ2File(existing_file_path) as unpackedfile:
            data = unpackedfile.read()
        # drop the .bz2 extension for the decompressed output
        new_file_path = os.path.join(destination_folder, file[:-len(".bz2")])
        # plain open(): bz2.open(..., 'wb') would compress the data again
        with open(new_file_path, 'wb') as f:
            f.write(data)
You can also use the shutil module to copy or move files.
See the documentation for os.path.exists, os.path.join, shutil, and the bz2 examples.
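As an illustration of how shutil fits in, here is a sketch that streams each archive through shutil.copyfileobj instead of reading the whole file into memory; the helper decompress_bz2_folder is hypothetical, and temporary folders stand in for the real source and destination paths:

```python
import bz2
import os
import shutil
import tempfile

def decompress_bz2_folder(src_folder, dst_folder):
    """Decompress every *.bz2 file in src_folder into dst_folder; return output paths."""
    os.makedirs(dst_folder, exist_ok=True)  # make sure the destination exists
    outputs = []
    for name in os.listdir(src_folder):
        if not name.endswith(".bz2"):  # skip .zst and any other files
            continue
        src_path = os.path.join(src_folder, name)  # full path, not just the name
        dst_path = os.path.join(dst_folder, name[:-len(".bz2")])
        # Copy in chunks from the decompressing reader to a plain output file.
        with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(dst_path)
    return sorted(outputs)

# Demo with throwaway folders.
with tempfile.TemporaryDirectory() as root:
    src, dst = os.path.join(root, "compressed"), os.path.join(root, "out")
    os.mkdir(src)
    with bz2.open(os.path.join(src, "data.txt.bz2"), "wb") as f:
        f.write(b"hello bz2")
    open(os.path.join(src, "other.zst"), "wb").close()
    for path in decompress_bz2_folder(src, dst):
        with open(path, "rb") as f:
            print(os.path.basename(path), f.read())
```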

Python File Not Found Error even though file is in same directory

I'm running a Python script (images.py) that contains:
import gzip
f = gzip.open('i1.gz','r')
But it raises FileNotFoundError.
My folder containing images.py looks like this:
New Folder/
    images.py
    i1.gz
    (...some other files...)
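The problem is that you are not running the script from within New Folder: a relative path such as 'i1.gz' is resolved against the current working directory, not against the folder that contains the script.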
You can easily solve it by using the absolute path without hard-coding it:
import gzip
from os import path

file_path = path.abspath(__file__) # full path of your script
dir_path = path.dirname(file_path) # full path of the directory of your script
zip_file_path = path.join(dir_path,'i1.gz') # absolute zip file path
# and now you can open it
f = gzip.open(zip_file_path,'r')
Check the current working directory of the script by running:
import os
print(os.getcwd())
Then compare it with the absolute path of i1.gz; any mismatch shows where Python is actually looking for the file.
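To make that check concrete, here is a small sketch showing that a relative name such as 'i1.gz' is resolved against os.getcwd(). The helper where_would_it_look is hypothetical, and a temporary folder stands in for whatever directory the script is launched from:

```python
import os
import tempfile

def where_would_it_look(relative_name):
    """Return the absolute path Python resolves a relative file name to."""
    # Relative paths are resolved against the current working directory,
    # not against the folder that contains the running script.
    return os.path.abspath(relative_name)

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as folder:
    try:
        os.chdir(folder)  # pretend the script was launched from this folder
        print(os.getcwd())
        print(where_would_it_look("i1.gz"))  # inside folder, wherever the script lives
    finally:
        os.chdir(original_cwd)  # restore, so the temporary folder can be removed
```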
Are you running the script from New Folder?
If you are in the folder, it should work:
c:\Data\Python\Projekty\Random\gzip_example>python load_gzip.py
But if you run it from the parent folder, passing the folder name as part of the script path, it raises the error:
c:\Data\Python\Projekty\Random>python gzip_example\load_gzip.py
Traceback (most recent call last):
File "C:\Data\Python\Projekty\Random\gzip_example\load_gzip.py", line 2, in <module>
f = gzip.open('file.gz', 'r')
File "C:\Python\Python 3.8\lib\gzip.py", line 58, in open
binary_file = GzipFile(filename, gz_mode, compresslevel)
File "C:\Python\Python 3.8\lib\gzip.py", line 173, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'file.gz'
The way I usually locate files relative to the script is as follows:
import os
pwd_path= os.path.dirname(os.path.abspath(__file__))
myfile = os.path.join(pwd_path, 'i1.gz')
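The same idea written with pathlib, as a sketch: the helper path_next_to is hypothetical, and in images.py you would call it as path_next_to(__file__, "i1.gz"). The demo uses a temporary folder and a placeholder script file in place of New Folder so it runs anywhere:

```python
import gzip
import tempfile
from pathlib import Path

def path_next_to(script_file, name):
    """Return the absolute path of `name` inside the folder containing script_file."""
    # resolve() makes the path absolute; .parent is the script's folder.
    return Path(script_file).resolve().parent / name

# Demo: a temporary folder stands in for "New Folder".
with tempfile.TemporaryDirectory() as folder:
    fake_script = Path(folder) / "images.py"
    fake_script.touch()
    with gzip.open(Path(folder) / "i1.gz", "wb") as f:
        f.write(b"image bytes")
    with gzip.open(path_next_to(fake_script, "i1.gz"), "rb") as f:
        print(f.read())
```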

Python - Can't locate downloaded file to unzip

Using selenium, I was able to automate the download of a zip file and save it to a specified directory. When I try to unzip the file, however, I hit a snag where I can't seem to locate the recently downloaded file. If it helps, this is the block of code related to the downloading and unzipping process:
# Click on Map Link
driver.find_element_by_css_selector("input.linksubmit[value=\"▸ Map\"]").click()
# Download Data
driver.find_element_by_xpath('//*[@id="buttons"]/a[4]/img').click()
# Locate recently downloaded file
path = 'C:/.../Download'
list = os.listdir(path)
time_sorted_list = sorted(list, key=os.path.getmtime)
file_name = time_sorted_list[len(time_sorted_list)-1]
Specifically, this is my error:
Traceback (most recent call last):
File "C:\Users\...\AppData\Local\Continuum\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-89-3f1d00dac284>", line 3, in <module>
time_sorted_list = sorted(list, key=os.path.getmtime)
File "C:\Users\...\AppData\Local\Continuum\Anaconda3\lib\genericpath.py", line 55, in getmtime
return os.stat(filename).st_mtime
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'grid-m1b566d31a87cba1379e113bb93fdb61d5be5b128.zip'
I tried troubleshooting the code by deleting it and placing another file in the directory, and I was able to find the random file, but not the recently downloaded file. Can anyone tell me what's going on here?
First, do not use list as a variable name: it shadows the built-in list type, so list(...) is no longer available elsewhere in your program. Second, os.listdir does not return full paths, only the names of the entries in that directory, so os.path.getmtime looks for each name in the current working directory and fails. If you want the full path, there are two things you can do:
You can use os.path.join:
import os
import zipfile

path = 'C:/.../Download'
file_list = [os.path.join(path, f) for f in os.listdir(path)]
time_sorted_list = sorted(file_list, key=os.path.getmtime)
file_name = time_sorted_list[-1]
with zipfile.ZipFile(file_name) as myzip:
    for contained_file in myzip.namelist():
        if all(n in contained_file.lower() for n in ('corn', 'irrigation', 'high', 'brazil')):
            with myzip.open(contained_file) as f:
                data = f.read()
                # save data to a CSV file
You can also use the glob function from the glob module:
import os
import zipfile
from glob import glob

path = 'C:/.../Download'
file_list = glob(path+"/*")
time_sorted_list = sorted(file_list, key=os.path.getmtime)
file_name = time_sorted_list[-1]
with zipfile.ZipFile(file_name) as myzip:
    for contained_file in myzip.namelist():
        if all(n in contained_file.lower() for n in ('corn', 'irrigation', 'high', 'brazil')):
            with myzip.open(contained_file) as f:
                data = f.read()
                # save data in a CSV file
Either should work.
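If only the newest file is needed, max() with the same key avoids sorting the whole list. A sketch, assuming the download folder holds files only at the top level (subfolders are skipped); the helper newest_file is hypothetical and raises ValueError if the folder has no files:

```python
import os
import tempfile

def newest_file(directory):
    """Return the full path of the most recently modified file in directory."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]  # ignore subfolders
    # max() with a key finds the latest file without sorting the whole list.
    return max(files, key=os.path.getmtime)

# Demo: explicit modification times make the result predictable.
with tempfile.TemporaryDirectory() as folder:
    for i, name in enumerate(["old.zip", "middle.zip", "new.zip"]):
        path = os.path.join(folder, name)
        open(path, "w").close()
        os.utime(path, (1_000_000 + i, 1_000_000 + i))  # (atime, mtime)
    print(os.path.basename(newest_file(folder)))
```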

Why is zipfile trying to unzip .xlsx files?

I am trying to use the following code to unzip all the zip files in my root folder; I found this code in this thread:
Unzip zip files in folders and subfolders with python
import fnmatch
import os
import zipfile

rootPath = u"//rootdir/myfolder" # CHOOSE ROOT FOLDER HERE
pattern = '*.zip'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        zipfile.ZipFile(os.path.join(root, filename)).extractall(os.path.join(root, os.path.splitext(filename)[0]))
but I keep getting a FileNotFoundError saying the .xlsx file does not exist:
Traceback (most recent call last):
File "//rootdir/myfolder/Python code/unzip_helper.py", line 29, in <module>
zipfile.ZipFile(os.path.join(root, filename)).extractall(os.path.join(root, os.path.splitext(filename)[0]))
File "//rootdir/myfolder/Python\Python36-32\lib\zipfile.py", line 1491, in extractall
self.extract(zipinfo, path, pwd)
File "//myaccount/Local\Programs\Python\Python36-32\lib\zipfile.py", line 1479, in extract
return self._extract_member(member, path, pwd)
File "//myaccount/Local\Programs\Python\Python36-32\lib\zipfile.py", line 1542, in _extract_member
open(targetpath, "wb") as target:
FileNotFoundError: [Errno 2] No such file or directory: '\\rootdir\myfolder\._SGS Naked 3 01 WS Kappa Coated and a very long very long file name could this be a problem i dont think so.xlsx'
My question is: why would it try to unzip this Excel file at all?
And how can I get rid of the error?
I've also tried using r instead of u for rootPath:
rootPath = r"//rootdir/myfolder"
and I get the same error.
Any help is truly appreciated!
The .xlsx file is not being unzipped on its own: it is a member of one of the .zip archives, and extractall fails while writing it out. Some file and directory names may also contain extra dots (common on Unix), so look at how the last line builds the target directory:
zipfile.ZipFile(os.path.join(root, filename)).extractall(os.path.join(root, os.path.splitext(filename)[0]))
This is the line that fails. To see what it computes:
>>> filename = "my.arch.zip"
>>> root = "/my/path/to/mydir/"
>>> os.path.join(root, os.path.splitext(filename)[0])
'/my/path/to/mydir/my.arch'
The same happens with dots in the directory path; only the final extension of filename is removed:
>>> filename = "arch.zip"
>>> root = "/my/path.to/mydir/"
>>> os.path.join(root, os.path.splitext(filename)[0])
'/my/path.to/mydir/arch'
If the computed target path cannot be created, FileNotFoundError will be raised. I suggest that you be explicit in your paths and print the computed target before extracting.
ZipFile.extractall(path=None, members=None, pwd=None)
Extract all members from the archive to the current working directory. path specifies a different directory to extract to...
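extractall creates missing directories under path as needed, so when it still raises FileNotFoundError, it is open() failing on the member's full target path. On Windows, a path longer than the classic 260-character MAX_PATH limit, such as one ending in the very long .xlsx name from the question, is one possible cause.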
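A sketch of the same loop written with pathlib, which prints the computed target before extracting. The helper extract_all_zips is hypothetical; Path.stem drops only the final suffix, and the demo shows extractall writing an .xlsx member (creating any nested folders) under the target directory:

```python
import tempfile
import zipfile
from pathlib import Path

def extract_all_zips(root_path):
    """Extract every *.zip under root_path into a sibling folder named after the archive."""
    targets = []
    # sorted() collects the archives first, so newly extracted zips are not revisited.
    for archive in sorted(Path(root_path).rglob("*.zip")):
        target = archive.parent / archive.stem  # "my.arch.zip" -> "my.arch"
        print("extracting", archive, "->", target)  # inspect the computed path
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # members such as .xlsx files are written under target
        targets.append(target)
    return targets

# Demo: a temporary folder stands in for //rootdir/myfolder.
with tempfile.TemporaryDirectory() as root:
    with zipfile.ZipFile(Path(root) / "my.arch.zip", "w") as zf:
        zf.writestr("report.xlsx", b"not a real workbook")
    for target in extract_all_zips(root):
        print(sorted(p.name for p in target.iterdir()))
```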

How to move all .log and .txt files to a new folder

I'm having trouble figuring out how to move all .log and .txt files in a certain folder and its subdirectories to a new folder. I understand how to move one file with shutil, but my attempt to use a loop to move all of them was unsuccessful. Can someone help me with this? Thanks.
import os, os.path
import re
def print_tgzLogs (arg, dir, files):
    for file in files:
        path = os.path.join (dir, file)
        path = os.path.normcase (path)
        defaultFolder = "Log_Text_Files"
        if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted\Log_Text_Files'):
            os.mkdir('C:\\Extracted\\Log_Text_Files')
        if re.search(r".*\.txt$", path) or re.search(r".*\.log$", path):
            os.rename(path, 'C:\\Extracted\\Log_Text_Files')
            print path
os.path.walk('C:\\Extracted\\storage', print_tgzLogs, 0)
Below is the traceback:
Traceback (most recent call last):
File "C:\SQA_log\scan.py", line 20, in <module>
os.path.walk('C:\\Extracted\\storage', print_tgzLogs, 0)
File "C:\Python27\lib\ntpath.py", line 263, in walk
walk(name, func, arg)
File "C:\Python27\lib\ntpath.py", line 263, in walk
walk(name, func, arg)
File "C:\Python27\lib\ntpath.py", line 263, in walk
walk(name, func, arg)
File "C:\Python27\lib\ntpath.py", line 259, in walk
func(arg, top, names)
File "C:\SQA_log\scan.py", line 16, in print_tgzLogs
os.rename(path, 'C:\\Extracted\\Log_Text_Files')
WindowsError: [Error 183] Cannot create a file when that file already exists
According to the traceback, the destination already exists. The Python docs for os.rename say:
On Windows, if dst already exists, OSError will be raised [...].
Now you can either:
delete the files manually or
delete the files automatically using os.remove(path)
If you want the files to be deleted automatically, the code would look like this (note that the destination now includes the file name, and that I replaced your regular expressions with Python's str.endswith, as suggested by utdemir):
import os, os.path
def print_tgzLogs (arg, dir, files):
    for file in files:
        path = os.path.join (dir, file)
        path = os.path.normcase (path)
        defaultFolder = "Log_Text_Files"
        if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted\Log_Text_Files'):
            os.mkdir('C:\\Extracted\\Log_Text_Files')
        if path.endswith(".txt") or path.endswith(".log"):
            if os.path.exists('C:\\Extracted\\Log_Text_Files\\%s' % file):
                os.remove('C:\\Extracted\\Log_Text_Files\\%s' % file)
            os.rename(path, 'C:\\Extracted\\Log_Text_Files\\%s' % file)
            print path
os.path.walk('C:\\Extracted\\storage', print_tgzLogs, 0)
It looks like you are trying to use
os.rename(path, 'C:\\Extracted\\Log_Text_Files')
to move the file path into the directory C:\Extracted\Log_Text_Files, but rename doesn't work like this: it's going to try to make a new file named C:\Extracted\Log_Text_Files. You probably want something more like this:
os.rename(path, os.path.join('C:\\Extracted\\Log_Text_Files', os.path.basename(path)))
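The code above uses Python 2 (os.path.walk, the print statement). A Python 3 sketch of the same idea using os.walk and shutil.move, with the destination given as a full file path rather than the folder. The helper move_logs_and_texts is hypothetical; it assumes the destination folder lies outside the source tree, matches suffixes case-insensitively, and overwrites files with the same name (so same-named files from different subfolders replace each other):

```python
import os
import shutil
import tempfile

def move_logs_and_texts(src_root, dst_folder, suffixes=(".log", ".txt")):
    """Move every file under src_root whose name ends with one of suffixes into dst_folder."""
    os.makedirs(dst_folder, exist_ok=True)
    moved = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if name.lower().endswith(suffixes):  # str.endswith accepts a tuple
                src = os.path.join(dirpath, name)
                dst = os.path.join(dst_folder, name)  # a file path, not just the folder
                if os.path.exists(dst):
                    os.remove(dst)  # overwrite, as in the os.remove approach
                shutil.move(src, dst)
                moved.append(dst)
    return sorted(moved)

# Demo: temporary folders stand in for C:\Extracted\storage and Log_Text_Files.
with tempfile.TemporaryDirectory() as root:
    storage = os.path.join(root, "storage", "sub")
    os.makedirs(storage)
    for name in ("a.log", "b.txt", "c.csv"):
        open(os.path.join(storage, name), "w").close()
    moved = move_logs_and_texts(os.path.join(root, "storage"),
                                os.path.join(root, "Log_Text_Files"))
    print([os.path.basename(p) for p in moved])
```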
