Python zipfile: extract files from a directory inside a zip file

I need to extract some files that sit inside a directory in a zip file.
The main problem is that I want to extract only the contents of this directory, not the directory itself with all the files inside.
I've tried iterating over the entries with namelist() and tweaking it with zipfile.Path(), without success.
The code below works, but it extracts the directory along with the files (as extractall() does). zipfile.Path doesn't work because it raises a KeyError saying the item doesn't exist, even though it does.
for zip_file in zip_files:
    with zipfile.ZipFile(os.path.join(home_path, zip_file), 'r') as zip_ref:
        files = [n for n in zip_ref.namelist()]
        zip_ref.extractall(os.path.join(home_path, 'dir'), members=files)

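Written from my phone, so untested, but I expect it to work:
import shutil
from pathlib import Path
from zipfile import ZipFile

# target_dir is assumed to be a pathlib.Path pointing at an existing directory
with ZipFile(zipfile_path, "r") as zf:
    for f in zf.namelist():
        if f.endswith('/'):  # directory entries end with a slash; skip them
            continue
        # Path(f).name drops the directory part, so files land directly in target_dir
        source = zf.open(f)
        target = open(target_dir / Path(f).name, "wb")
        with source, target:
            shutil.copyfileobj(source, target)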
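The answer above flattens every file in the archive into one folder. If only one directory inside the zip is wanted, a minimal sketch along the same lines might look like this. The function name extract_subdir and its parameters are made up for illustration, and it assumes entry names use forward slashes, as zip archives normally do.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_subdir(zip_path, subdir, target_dir):
    """Copy everything under `subdir` in the archive into `target_dir`,
    dropping the `subdir` prefix itself. Returns the written paths."""
    prefix = subdir.strip("/") + "/"
    target_dir = Path(target_dir)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = info.filename
            # Skip directory entries and anything outside the wanted directory.
            if info.is_dir() or not name.startswith(prefix):
                continue
            relative = PurePosixPath(name[len(prefix):])
            # Skip entries that would land outside target_dir.
            if relative.is_absolute() or ".." in relative.parts:
                continue
            destination = target_dir.joinpath(*relative.parts)
            destination.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as source, open(destination, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(destination)
    return written
```

Unlike the flattening version, this keeps any subfolders that exist below the chosen directory, so two files with the same name in different subfolders do not overwrite each other.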

Related

BadZipFile: File is not a zip for damaged zip files

I am trying to run the following code on a folder that contains only zip files, and it does not work. I don't know how to solve it. Some references I found online said the cause may be that some of the zip files are damaged, but my Python skills are not good enough to work out a solution. Damaged archives do seem to be the cause, because I checked the extensions and every file in the folder is a zip.
import os, zipfile
dir_name = "C:/Users/Имя/termpaper/notifications_2020"
extension = ".zip"
os.chdir(dir_name)
for item in os.listdir(dir_name):  # loop through items in dir
    if item.endswith(extension):  # check for ".zip" extension
        file_name = os.path.abspath(item)  # get full path of files
        zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
        listOfFileNames = zip_ref.namelist()
        for fileName in listOfFileNames:
            if fileName.endswith('.xml'):  # I choose only files with the xml extension, because there are also sig files
                if fileName.startswith('fksNotificationEA44'):  # code for electronic auction notification
                    zip_ref.extract(fileName)
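One way to keep the loop going when an archive is damaged is to catch zipfile.BadZipFile and record the file instead of stopping. The sketch below assumes that skipping unreadable archives is acceptable; the function name extract_matching and its return value are hypothetical, while the prefix and suffix defaults are taken from the question.

```python
import zipfile
from pathlib import Path


def extract_matching(folder, prefix="fksNotificationEA44", suffix=".xml"):
    """Extract matching members from every readable .zip in `folder`.
    Returns (extracted_names, skipped_archives)."""
    folder = Path(folder)
    extracted, skipped = [], []
    for zip_path in sorted(folder.glob("*.zip")):
        try:
            with zipfile.ZipFile(zip_path) as zf:
                for name in zf.namelist():
                    if name.startswith(prefix) and name.endswith(suffix):
                        zf.extract(name, path=folder)
                        extracted.append(name)
        except zipfile.BadZipFile:
            # The file has a .zip extension but is not a valid archive.
            skipped.append(zip_path.name)
    return extracted, skipped
```

The second list tells you which files to inspect or download again.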

Python doesn't recognize zip files as zip files

I iterate through the directories and want to find all zip files and add them to download_all.zip
I am sure there are zip files, but Python doesn't recognize those zip files as zip files. Why is that?
my code:
os.chdir(boardpath)
# zf = zipfile.ZipFile('download_all.zip', mode='w')
z = zipfile.ZipFile('download_all.zip', 'w')  # creating the download_all.zip file
for path, dirs, files in os.walk(boardpath):
    for file in files:
        print(file)
        if file.endswith('.zip'):  # find all zip files
            print('adding', file)
            z.write(file)  # error here: file is a str object, not a zip file
z.close()
z = zipfile.ZipFile("download_all.zip")
z.printdir()
I tried:
file.printdir()
# I got the following error: AttributeError: 'str' object has no attribute 'printdir'
The name passed to zipfile.ZipFile.write(name) must be the full path to the file, not just the file name.
import os  # at the top

        if file.endswith('.zip'):  # find all zip files
            filepath = os.path.join(path, file)
            print('adding', filepath)
            z.write(filepath)  # no error
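ZipFile.write takes the path of a file on disk, and os.walk() yields bare file names that are only valid relative to path. The ZipFile.write documentation also notes that archive names should be relative to the archive root. So the following line: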
z.write(file)
Should be:
z.write(os.path.relpath(os.path.join(path, file)))
The files that os.walk() yields are lists of file names. These file names are just strings (which don't have a printdir() method).
Use a context manager (the with statement) when opening the zip archive, and write each file you find to it inside that block. In addition, since you're walking through a directory structure, you need to fully qualify each file's path.
import os
import zipfile

with zipfile.ZipFile('download_all.zip', 'w') as zf:
    for path, dirs, files in os.walk('/some_path'):
        for file in files:
            if file.endswith('.zip'):
                zf.write(os.path.join(path, file))
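The answers above can be combined into one sketch that passes the full path to write() and sets arcname so the names stored in the archive are relative to the folder being walked. The function name collect_zips is made up for illustration. The check that skips the output archive is an added assumption, for the case where download_all.zip is created inside the folder being walked, as in the question.

```python
import os
import zipfile


def collect_zips(root, output_path):
    """Add every .zip found under `root` to a new archive at `output_path`.
    Returns the archive names that were written."""
    output_abs = os.path.abspath(output_path)
    added = []
    with zipfile.ZipFile(output_path, "w") as zf:
        for path, dirs, files in os.walk(root):
            for file in files:
                if not file.endswith(".zip"):
                    continue
                full_path = os.path.join(path, file)
                if os.path.abspath(full_path) == output_abs:
                    continue  # do not add the output archive to itself
                # Store the entry under a name relative to root.
                arcname = os.path.relpath(full_path, root)
                zf.write(full_path, arcname=arcname)
                added.append(arcname)
    return added
```

Because full paths are used throughout, this does not depend on os.chdir() having been called first.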

Read .txt from multiple .zip in folder

I have a folder (not zipped) containing multiple zip files (no other file type within folder). Each zip has the same type of text files containing different data saved within.
I know how to read in each separately, but I am looking to loop the process without having to type in each zip name. The zipfile archive does not seem to allow wild cards, so I cannot loop using this method. Is it possible to loop the process using glob?
The goal is to get the agency names without extracting all the zipfiles.
Single file read
import os
os.listdir('C:\\NTM\\Test\\')
['00003_32_332.zip', '00011_273_569.zip', '00012_258_276.zip']
import glob
glob.glob('C:\\NTM\\Test\\*.zip')
['C:\\NTM\\Test\\00003_32_332.zip', 'C:\\NTM\\Test\\00011_273_569.zip', 'C:\\NTM\\Test\\00012_258_276.zip']
import zipfile
archive=zipfile.ZipFile('C:\\NTM\\Test\\00011_273_569.zip')
testagency=archive.open('agency.txt')
testagency.read()
'agency_id,agency_name,nVRT,ValleyRide'
Update:
Now that I can loop through the zip files and open the text file in each one, I still cannot print the agency_name from all of the zip files in the folder. My current code only prints the name of the last agency from the text file of the last zip file in the folder. Am I missing some compound statement structure?
def csv_dict_reader(file_obj):
    reader = csv.DictReader(file_obj, delimiter=',')
    for row in reader:
        print(row['agency_name'])

if __name__ == '__main__':
    with archive.open('agency.txt') as f_obj:
        csv_dict_reader(f_obj)
Whatcom Transportation Authority
Sample Code
import glob
import zipfile

dirName = '/backup/'
zipList = glob.glob(dirName + '*.zip')
for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
    archive.close()
Thanks Jean-Francois!
for archive_name in glob.glob('C:\\NTM\\Test\\*.zip'):
    archive = zipfile.ZipFile(archive_name)
    testagency = archive.open('agency.txt')
    testagency.read()
I could not comment on Fuji Komalan's answer, so here is a corrected version that also prints what it finds:
import glob
import zipfile

dirName = 'C:/test/'
zipList = glob.glob(dirName + '*.zip')
print(zipList)
for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
            print(fileName)
    archive.close()
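The answers above extract the text files to disk. Since the stated goal is to get the agency names without extracting, here is a minimal sketch that reads agency.txt straight from each archive and does the CSV parsing inside the loop, so every zip is processed rather than only the last one. The function name read_agency_names is hypothetical, and the utf-8-sig encoding is an assumption about the files.

```python
import csv
import glob
import io
import os
import zipfile


def read_agency_names(folder, member="agency.txt"):
    """Return {zip file name: [agency_name, ...]} for every zip in `folder`."""
    names = {}
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            if member not in archive.namelist():
                continue  # this archive has no agency file
            with archive.open(member) as raw:
                # archive.open() yields bytes; csv needs text, so wrap it.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.DictReader(text)
                names[os.path.basename(zip_path)] = [
                    row["agency_name"] for row in reader
                ]
    return names
```

Calling read_agency_names('C:\\NTM\\Test') would return one list of names per zip file, which can then be printed or processed further.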

WindowsError: [Error 267] The directory name is invalid - Spyder 2.7

I am trying to print the file names in my directory using the following code:
dir_path = 'D:/#/#/#/#.json'
for filename in os.listdir(dir_path):
    print(filename)
    f = open(os.path.join(dir_path, filename), 'r')
but this error is displayed when running the code:
for filename in os.listdir(dir_path):
WindowsError: [Error 267] The directory name is invalid: 'D:/#/#/#/#.json/*.*'
I am unsure what the json/*.* means in the error message. I am new to Python, so apologies if this question is vague.
Your question is unclear as to whether the directory D:/#/#/# contains any files in it other than JSON files, so I shall give two answers. Hopefully one of them will apply to you:
Directory contains only JSON files
In that case, simply remove the /#.json from the end of dir_path:
dir_path = 'D:/#/#/#'
for filename in os.listdir(dir_path):
    print(filename)
    f = open(os.path.join(dir_path, filename), 'r')
Directory contains JSON files and other files that you want to exclude
In this situation it's best to use the Python glob module.
The following should list all of the .json files in the folder D:/#/#/#:
import glob

dir_path = 'D:/#/#/#/*.json'
for filename in glob.glob(dir_path):
    print(filename)
    f = open(filename, 'r')
Note that filenames returned by glob.glob include the directory path, so we don't use os.path.join on them.
os.listdir lists all files in the directory you give it. However it seems you're not passing it the name of a directory, you're passing it the name of a file. How can it possibly list all the files in a directory if you don't give it a directory?
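To guard against this mistake, the path can be checked with os.path.isdir before listing. The following is only a sketch: the function name list_json_files is made up, and falling back to the parent directory when a file path is passed is one possible design choice, not the only one.

```python
import glob
import os


def list_json_files(path):
    """Return the .json files in `path`; if `path` is not a directory,
    search the directory part of the path instead."""
    directory = path if os.path.isdir(path) else os.path.dirname(path)
    return sorted(glob.glob(os.path.join(directory, "*.json")))
```

The returned names already include the directory, so they can be passed to open() directly.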

CRC test of a zipped directory says it's corrupted but I can open and read it

I'm building a zip archive from a filtered set of files in a directory. The files are removed from the directory after the archive is made. I've been asked to check the archive with a CRC algorithm, so I used this:
test=zf.testzip()
The test fails, and the "test" variable contains the name of the first file that the script filters and compresses, so I assume the others are all corrupted as well. The problem is that I can read the data inside the archive, and extracting it reproduces the files perfectly, so where is the problem?
The code to make the archive is the following:
import datetime
import os
import zipfile
[...]
if dozip==True:
    zf = zipfile.ZipFile(zipname, "w", comprez)
    for dirname, subdirs, files in os.walk(dir):
        for filename in files:
            fl=filename.split("-")
            fdate= datetime.datetime.strptime(fl[0], "%Y%m%d")
            if start <= fdate <= end:
                if fl[1] == client_name+".stat":
                    zf.write(os.path.join(dirname, filename))
                    if docancel==True:
                        os.remove(os.path.join(dirname, filename))
    test=zf.testzip()
    if test == None:
        zf.close()
    else:
        print(test)
        zf.close()
Where is my mistake? How can I solve this problem?
You need to close the zip file before checking its integrity, then reopen it in read mode:
zf = zipfile.ZipFile(zipname, "w", comprez)
[...]
zf.close()
zf2 = zipfile.ZipFile(zipname, "r")
test=zf2.testzip()
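The same write, close, reopen, test sequence can be sketched with context managers, so the archive is always closed before it is checked. The function name write_and_verify is hypothetical, and ZIP_DEFLATED stands in for the question's comprez value, which is not shown.

```python
import zipfile


def write_and_verify(zip_path, files):
    """Write `files` (paths on disk) to `zip_path`, then reopen the archive
    read-only and return testzip()'s result: None if every CRC matches."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file_path in files:
            zf.write(file_path)
    # Leaving the with-block closes the archive and writes its central directory.
    with zipfile.ZipFile(zip_path, "r") as zf:
        return zf.testzip()
```

If the original files are to be deleted afterwards, it is safer to remove them only once this returns None.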
