Python: inconsistent zip file extraction

I am trying to extract zip files using the zipfile module's extractall method.
My code snippet is
import zipfile
file_path = '/something/airway.zip'
dir_path = 'something/'
with zipfile.ZipFile(file_path, "r") as zip_ref:
    zip_ref.extractall(dir_path)
I have two zip files, test.zip (1.1 MB) and airway.zip (520 MB).
For test.zip the target folder contains all the files, but for airway.zip the extraction creates another folder named Airway inside my target folder and puts all the files there. Even after renaming airway.zip to an arbitrary name, the result was the same.
Is there a workaround to get only the files extracted into my target folder? This is critical for me because the extraction runs automatically from Django.
Python version: 3.9.6;
Django version: 2.2

I ran your code, and the problem seems to lie in the zip file itself. If you create a zip file by selecting only the individual files, you get the result you saw with test.zip. If you create it by selecting the folder that holds them, that folder is stored in the archive and is recreated on extraction, no matter what you name the zip file.
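If re-creating the archive is not an option, the folder can also be dropped at extraction time. The following is only a minimal sketch, assuming that every file in the archive sits under one shared top-level folder (such as Airway); the function name extract_flat is made up for illustration. It rewrites each member's filename before extracting, so the files land directly in the target folder.

```python
import os
import zipfile


def extract_flat(zip_path, target_dir):
    """Extract zip_path into target_dir, dropping a single shared top-level folder."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # Skip explicit directory entries; folders are recreated from the file paths.
        members = [m for m in zip_ref.infolist() if not m.is_dir()]
        # First path component of every nested file (zip paths always use "/").
        tops = {m.filename.split("/", 1)[0] for m in members if "/" in m.filename}
        all_nested = all("/" in m.filename for m in members)
        # Only strip when every file lives under the same single folder.
        strip = len(tops) == 1 and all_nested
        for m in members:
            if strip:
                m.filename = m.filename.split("/", 1)[1]
            zip_ref.extract(m, target_dir)
```

Archives like test.zip, where the files already sit at the top level, are extracted unchanged.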

I have two articles related to this:
https://www.kite.com/python/docs/zipfile.ZipFile.extractall
https://www.geeksforgeeks.org/working-zip-files-python/
If neither of these articles solves your problem, then I think that instead of zipping the files in the folder you zipped the folder itself, so try zipping the files inside the folder.

Related

namelist() method not listing directories in python

I want to extract all the folder names from the zip file so that I can extract them separately. My logic works with one zip file, but it does not work with another zip file that has the same structure.
from zipfile import ZipFile
root_dir = r'C:/Workspace/Neo4j/FileStore/RDAR/data.zip'
archive = ZipFile(root_dir, "r")
folder_paths = []
for file in archive.namelist():
    print(file)
    if file.endswith("/"):
        folder_paths.append(file)
The above code works with data10.zip, but it does not list the directories in data.zip.
[Image: folder structure of data.zip, which is not working]
[Image: folder structure of data10.zip]
I am able to extract the list of folders in data10.zip as shown above, but not in data.zip, even though both archives have the same structure.
Any clue what the reason might be?
Thanks
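One possible reason, which cannot be confirmed without the archives themselves: a zip file is not required to store separate entries for its directories. Some tools write only the file entries, so namelist() never returns a name ending in "/". A minimal sketch that derives the folders from the file paths instead (the function name list_folders is hypothetical):

```python
import posixpath
import zipfile


def list_folders(zip_path):
    """Return every folder path implied by the member names, sorted."""
    folders = set()
    with zipfile.ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            # An explicit directory entry ends with "/"; otherwise use the file's parent.
            parent = name.rstrip("/") if name.endswith("/") else posixpath.dirname(name)
            # Walk up so intermediate folders are included as well.
            while parent:
                folders.add(parent + "/")
                parent = posixpath.dirname(parent)
    return sorted(folders)
```

This gives the same answer whether or not the archive contains explicit directory entries.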

Python Unzipping files with different name to a different location

I am trying to extract the files from a zip archive and append "EI" to the name of each file inside it. I want these files to be extracted to a certain location. I'm new to Python, so I haven't been able to figure it out.
for i in zip_list:
    if ("Rally-EI" in i):
        zipdata = zipfile.ZipFile(i)
        zipinfos = zipdata.infolist()
        for zipinfo in zipinfos:
            zipinfo.filename = zipinfo.filename[:-4] + "_EI.txt"
            zipdata.extract(zipinfo)
This is the code I'm using to rename the files, and it works well. I just need to extract these files to a specific location.
Thanks
Try using os.chdir() to change the current directory temporarily for this extraction. It's not the most efficient way, but it will do the job.
Save your current working directory with os.getcwd() beforehand so that you can switch back to it after the extraction is done.
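As an alternative to changing the working directory, ZipFile.extract() takes an optional path argument that names the directory to extract into. A minimal sketch, assuming the renaming scheme from the question; the function name extract_renamed and the parameters target_dir and suffix are made up for illustration:

```python
import posixpath
import zipfile


def extract_renamed(zip_path, target_dir, suffix="_EI"):
    """Extract every file from zip_path into target_dir, adding suffix before the extension."""
    with zipfile.ZipFile(zip_path) as zipdata:
        for zipinfo in zipdata.infolist():
            if zipinfo.is_dir():
                continue  # folders are created automatically when files are extracted
            # Split "name.txt" into ("name", ".txt"); zip paths always use "/".
            root, ext = posixpath.splitext(zipinfo.filename)
            zipinfo.filename = root + suffix + ext
            # The second argument is the destination directory.
            zipdata.extract(zipinfo, path=target_dir)
```

The current working directory is never touched, so nothing has to be restored afterwards.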

Python NLTK Make corpus from zip files

I'm trying to create my own corpus in NLTK from about 200k text files, each stored in its own zip folder. It looks like the following:
Parent_dir
    text1.zip
        text1.txt
I'm using the following code to try to access all the text files from the parent directory:
from nltk.corpus.reader.plaintext import PlaintextCorpusReader
corpus_path="parent_dir"
corpus=PlaintextCorpusReader(corpus_path,".*")
file_ids=corpus.fileids()
print(file_ids)
But Python just returns an empty list because it probably can't access the text files due to the zipping. Is there an easy way to fix this? Unfortunately, I can't unzip the files because of the size of the files.
If all you're trying to do is get the file IDs, just use the 'glob' module, which doesn't care about file types.
Import the module (it is part of the standard library, so there is nothing to install):
from glob import glob
Point it at your directory, using * as a wildcard to match everything in it:
directory = glob('/path/to/your/corpus/*')
The glob() function returns a list of strings (which are file paths, in this case).
You can simply iterate over these to print the file names:
for file in directory:
    print(file)
This article looks like an answer to your question about reading the contents of a zipped file: How to read text files in a zipped folder in Python
I think a combination of these methods makes an answer to your problem.
Good luck!
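A rough sketch of that combination, assuming the layout from the question (one .txt file inside each .zip in the parent directory) and UTF-8 text; the function name iter_zipped_texts is hypothetical. It reads each member straight from the archive, so nothing is unzipped to disk. It does not plug into PlaintextCorpusReader by itself; it only yields the raw text.

```python
import glob
import os
import zipfile


def iter_zipped_texts(parent_dir, encoding="utf-8"):
    """Yield (zip_path, member_name, text) for every .txt member of every zip in parent_dir."""
    for zip_path in sorted(glob.glob(os.path.join(parent_dir, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                if name.endswith(".txt"):
                    # read() returns the member's bytes without extracting it to disk.
                    yield zip_path, name, archive.read(name).decode(encoding)
```

Because it is a generator, only one document is held in memory at a time, which matters with around 200k files.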

Python zipfile issue on Windows

I wrote a simple, rough program that automatically zips everything inside the current working directory. It works very well on Linux, but there is a huge problem when running it on Windows.
Here is my code:
import os, zipfile
zip = zipfile.ZipFile('zipped.zip', 'w') #Create a zip file
zip.close()
zip = zipfile.ZipFile('zipped.zip', 'a') #Make zip file append instead of overwriting
for dir, subdir, file in os.walk(os.path.relpath('.')): #Loop for walking thru the directory
    for subdirectory in subdir:
        subdirs = os.path.join(dir, subdirectory)
        zip.write(subdirs, compress_type=zipfile.ZIP_DEFLATED)
    for files in file:
        fil = os.path.join(dir, files)
        zip.write(fil, compress_type=zipfile.ZIP_DEFLATED)
zip.close()
When I run this on Windows, it never stops compressing; it keeps adding "zipped.zip" to the zip file itself, and after a few seconds it has generated a file of a few hundred MB. On Linux, the program stops after it has zipped all the files, excluding the newly created zipped.zip.
Screenshot: A "zipped.zip" inside the "zipped.zip"
Am I missing some code that would make this work properly on Windows?
I would zip the folder into a temporary zip file somewhere else, then move that temporary zip file into the folder.
That seems to be because you are saving the zip to the same folder that you are trying to compress, and that must be confusing os.walk() somehow.
One possible solution, as long as you don't have a giant directory to compress, is to use os.walk() to build a full list of what will be compressed and, once the list is complete, use it to populate the zip, instead of writing to the zip while os.walk() is still running.
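A minimal sketch of the list-first approach, which also skips the archive's own path explicitly. The function name zip_directory and its parameters are made up for illustration, and this has not been checked against the Windows behaviour described in the question.

```python
import os
import zipfile


def zip_directory(source_dir, archive_name="zipped.zip"):
    """Zip everything under source_dir into source_dir/archive_name, skipping the archive itself."""
    archive_path = os.path.abspath(os.path.join(source_dir, archive_name))
    # Collect the paths first, so the walk is finished before the archive is written.
    to_add = []
    for root, dirs, files in os.walk(source_dir):
        for name in dirs + files:
            full = os.path.join(root, name)
            if os.path.abspath(full) != archive_path:
                to_add.append(full)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for full in to_add:
            # Store paths relative to source_dir rather than absolute paths.
            zf.write(full, arcname=os.path.relpath(full, source_dir))
    return archive_path
```

Calling zip_directory('.') mirrors the original script, and a stale zipped.zip left over from an earlier run is excluded as well.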

python get list of folder names from zip folder

I have 80 zipped files. In each of them, there are about 20 folders (which I will call first-level folders). What is the Python code to get a list of all of the first-level folder names from each of the zipped files?
I need an Excel spreadsheet listing the names of the first-level folders from all 80 zipped files.
Tricky part: there are 2 types of zipped files among those 80. Some have a .zip extension, while others have a .7z extension.
The Python zipfile module documentation answers your question well.
ZipFile.namelist()
Return a list of archive members by name.
For 7zip, it may be necessary to use the subprocess module and run 7zip; not all 7zip files can be opened by the zipfile module.
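For the .zip files, a minimal sketch built on namelist() might look like this. It writes a CSV file, which Excel can open; the function names, the column headers, and the assumption that all archives sit in one directory are made up for illustration. The .7z files are not handled here.

```python
import csv
import glob
import os
import zipfile


def top_level_folders(zip_path):
    """Return the sorted first-level folder names in a .zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Only names containing "/" live inside a folder; keep the first component.
        return sorted({n.split("/", 1)[0] for n in archive.namelist() if "/" in n})


def write_folder_report(zip_dir, csv_path):
    """Write one (archive, folder) row per first-level folder of each .zip in zip_dir."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["archive", "folder"])
        for zip_path in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
            for folder in top_level_folders(zip_path):
                writer.writerow([os.path.basename(zip_path), folder])
```

Taking the first path component of every member name also covers archives that store no explicit directory entries.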
