Create file inside of zip archive in Python

I have an external file system and a way to download data from it. I want to download all of the data into a .zip archive.
What I can do is:
Create a file to write into
Download data from the device into this file
Write the file
Add the file to the zip archive with zipfile.write(file)
What I want to do is:
Create a zip archive
Download data from the device into a file created inside this archive, without creating it on my local drive
Here is some non-working code to give an idea:
def get_all_files(self):
    self.savedir()
    zipf = zipfile.ZipFile(self.dir_to_save + "/SD_contents.zip", 'w')
    for file in self.nsh.get_all_files("/fs/microsd"):
        # get_all_files() returns list of full file paths on the SD
        print(file)
        data = self.nsh.download_file("/fs/microsd" + file)
        zipf.write(data)  # fails: write() expects the path of a file on disk, not its data

If your goal is simply to avoid creating a temp file, StringIO (io.BytesIO in Python 3)
is your savior, along with ZipFile.writestr() from Ignacio's answer.
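A minimal Python 3 sketch of the question's loop rewritten around writestr(). The download function here is a hypothetical stand-in for self.nsh.download_file() and is assumed to return bytes; the root path "/fs/microsd" is taken from the question.

```python
import io
import zipfile


def download_file(path):
    # Hypothetical stand-in for self.nsh.download_file(); assumed to return bytes.
    return ("contents of " + path).encode("utf-8")


def zip_downloads(target, paths, root="/fs/microsd"):
    """Write each downloaded file straight into a zip entry, with no temp files.

    `target` may be a filename (only the zip itself lands on disk) or a
    file-like object such as io.BytesIO (nothing touches the disk at all).
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zipf:
        for path in paths:
            data = download_file(root + path)
            # writestr() takes an entry name plus str/bytes data.
            zipf.writestr(path.lstrip("/"), data)


# Usage: keep the whole archive in memory.
buffer = io.BytesIO()
zip_downloads(buffer, ["/log1.txt", "/data/log2.txt"])
```

In the question's setting, target would be self.dir_to_save + "/SD_contents.zip", so the archive is written to disk but the individual files never are.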

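ZipFile.writestr() lets you write the contents of an in-memory buffer (a str or bytes object) to a zip entry given by a filename or a ZipInfo instance. In Python 2's zipfile module there is no way to do this in a streaming manner; since Python 3.6, ZipFile.open(name, mode='w') returns a writable file-like object for a single entry.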
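For large downloads, a sketch of the Python 3.6+ streaming approach: each entry is filled chunk by chunk through ZipFile.open(name, mode="w"). The open_remote() function is hypothetical and assumed to return a readable binary stream; here it is faked with io.BytesIO.

```python
import io
import shutil
import zipfile


def open_remote(path):
    # Hypothetical: returns a readable binary stream for a file on the device.
    # Faked here with an in-memory buffer.
    return io.BytesIO(b"x" * 100_000)


def stream_into_zip(target, paths, chunk_size=64 * 1024):
    """Copy each remote file into the archive chunk by chunk (Python 3.6+)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zipf:
        for path in paths:
            # ZipFile.open(name, mode="w") returns a writable stream for one entry,
            # so the whole file never has to sit in memory at once.
            with open_remote(path) as src, zipf.open(path.lstrip("/"), mode="w") as dst:
                shutil.copyfileobj(src, dst, chunk_size)
```

If an entry's size is not known in advance and may exceed 2 GiB, pass force_zip64=True to zipf.open().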

Related

unzip file without creating temporary files

I download a zip file from AWS S3 and unzip it. Upon unzipping, all files are saved in the tmp/ folder.
import zipfile
import boto3

s3 = boto3.client('s3')
s3.download_file('testunzipping', 'DataPump_10000838.zip', '/tmp/DataPump_10000838.zip')
with zipfile.ZipFile('/tmp/DataPump_10000838.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/')
    lstNEW = zip_ref.namelist()
The output of lstNEW is something like this:
['DataPump_10000838/', '__MACOSX/._DataPump_10000838', 'DataPump_10000838/DockBooking', '__MACOSX/DataPump_10000838/._DockBooking', 'DataPump_10000838/LoadEquipment', '__MACOSX/DataPump_10000838/._LoadEquipment', ....]
LoadEquipment and DockBooking are real files, but the rest are not. Is it possible to unzip the file without creating those temporary files? Or is it possible to filter out the real files? Later I need to take the correct files and gzip them, naming each one in this format:
$item_$unixepochtimestamp.csv.gz
Do I use the compress function?
To only extract certain files, you can pass a list to extractall:
with zipfile.ZipFile('/tmp/DataPump_10000838.zip', 'r') as zip_ref:
    # keep every entry except macOS metadata under __MACOSX/
    lstNEW = [name for name in zip_ref.namelist() if not name.startswith("__MACOSX/")]
    zip_ref.extractall('/tmp/', members=lstNEW)
The __MACOSX/ entries are not temporary files; they are macOS's way of storing resource forks and extended attributes (as AppleDouble "._" files) in zip archives, which don't natively support them.
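For the follow-up about gzipping, a sketch that skips extraction entirely: it reads each real member into memory and compresses it with gzip.compress(). It assumes $item in the naming pattern is the member's base name (e.g. DockBooking) and that the timestamp is the current Unix time in seconds; both are guesses at the asker's intent.

```python
import gzip
import time
import zipfile


def gzip_real_members(zip_source, timestamp=None):
    """Return {output_name: gzipped_bytes} for the real files in a zip archive.

    Skips directory entries and macOS metadata (__MACOSX/ and ._ files),
    reads each remaining member into memory and gzips it.
    """
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch seconds
    results = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            base = info.filename.rsplit("/", 1)[-1]
            if info.is_dir() or info.filename.startswith("__MACOSX/") or base.startswith("._"):
                continue
            data = zf.read(info)  # member bytes, no extraction to /tmp
            # Assumption: $item is the member's base name, e.g. "DockBooking".
            results[f"{base}_{timestamp}.csv.gz"] = gzip.compress(data)
    return results
```

Each resulting bytes object could then be written to disk or uploaded, for example with s3.put_object(Bucket=..., Key=name, Body=payload).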

How to create a zip archive in Python without creating files on file system?

One easy way is to create a directory and populate it with files, then archive and compress that directory into a zip file called, say, file.zip. But this approach is wasteful: my files are already in memory, so saving them to disk first is unnecessary.
Is it possible to create the directory structure right in memory, without saving the unzipped files/directories, so that I end up saving only the final file.zip (without the intermediate stage of saving files/directories on the file system)?
You can use zipfile:
import json
from zipfile import ZipFile

with ZipFile("file.zip", "w") as zip_file:
    # `data` is your in-memory object; directories come from the "/" in entry names
    zip_file.writestr("root/file.json", json.dumps(data))
    zip_file.writestr("README.txt", "hello world")
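If even file.zip should not be written until later (for example, to send it over the network), the same writestr() calls work against an io.BytesIO buffer. A minimal sketch; the sample dictionary passed in is illustrative only.

```python
import io
import json
import zipfile


def build_zip_bytes(data):
    """Create a zip archive entirely in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zip_file:
        # The "directory" root/ exists only as a prefix of the entry name.
        zip_file.writestr("root/file.json", json.dumps(data))
        zip_file.writestr("README.txt", "hello world")
    return buffer.getvalue()


zip_bytes = build_zip_bytes({"answer": 42})
# Persist only the final archive when needed, e.g.:
# pathlib.Path("file.zip").write_bytes(zip_bytes)
```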

How to exclude ".DS_Store" path when compressing files in Python

I have a Python script that compresses specific files into a zip file. However, I have noticed that a file ".DS_Store" ends up inside this zip file. Is there a way I can remove it from the zip file, or avoid adding it in the first place in my Python script? From what I have found online, I think on a Windows machine this hidden file appears as a "macosx" file.
I've tested the zip file with and without the ".DS_Store" hidden file (I manually deleted it). When I remove it, the zip file is processed correctly; when I leave it in, errors are thrown.
This is how I create the zip file in my python script:
#Create zip file of all necessary files
zipf = zipfile.ZipFile(new_path+zip_file_name, 'w', zipfile.ZIP_DEFLATED)
create_zip(new_path,zipf)
zipf.close()
Any advice how to approach removing this hidden file would be appreciated.
Your code uses a function, create_zip, but you haven't shared the code of that function. Presumably, it loops through the contents of a directory and calls the .write method of the ZipFile instance in order to write each file into the archive. If this is the case, just add some logic to that function to exclude any files called .DS_Store.
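import os

def create_zip(path, zipf):
    for name in os.listdir(path):
        if name != '.DS_Store':
            # join with `path` so the file is found regardless of the working directory;
            # arcname stores just the file name inside the archive
            zipf.write(os.path.join(path, name), arcname=name)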
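The create_zip above only looks at the top level of path. If the directory has subfolders (Finder creates a .DS_Store in each one), a recursive variant with os.walk could look like this. It is a sketch; the function name and the EXCLUDED_NAMES set are illustrative, not taken from the asker's code.

```python
import os
import zipfile

# Illustrative exclusion list; extend it if other hidden files cause trouble.
EXCLUDED_NAMES = {".DS_Store"}


def create_zip_recursive(path, zipf):
    """Add every file under `path` to `zipf`, skipping names in EXCLUDED_NAMES."""
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            if name in EXCLUDED_NAMES:
                continue
            full_path = os.path.join(dirpath, name)
            # arcname keeps paths relative to `path` inside the archive.
            zipf.write(full_path, arcname=os.path.relpath(full_path, path))


# Usage, mirroring the question:
# with zipfile.ZipFile(os.path.join(new_path, zip_file_name), "w", zipfile.ZIP_DEFLATED) as zipf:
#     create_zip_recursive(new_path, zipf)
```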

Reading gzipped data in Python

I have a *.tar.gz compressed file that I would like to read in with Python 2.7. The file contains multiple h5 formatted files as well as a few text files. I'm a novice with Python. Here is the code I'm trying to adapt:
subset_path='c:\data\grant\files'
f=gzip.open(filename,'subset_full.tar.gz')
subset_data_path=os.path.join(subset_path,'f')
The first statement identifies the path to the folder with the data. The second statement tells Python to open a specific compressed file and the third statement (hopefully) executes a join of the prior two statements.
Several lines below this code I get an error when Python tries to use the 'subset_data_path' assignment.
What's going on?
The gzip module will only open a single compressed file, i.e. my_file.gz. You have a tar archive of multiple files that has then been gzip-compressed as a whole. It needs to be both decompressed and untarred.
Try using the tarfile module instead, see https://docs.python.org/2/library/tarfile.html#examples
edit: To add a bit more information on what happened: you successfully opened the gzipped tarball as a gzip file object, which works almost the same as a standard file object. For instance, you could call f.readlines() as if f were a normal file object, and it would return the uncompressed lines.
However, this did not actually unpack the archive into new files in the filesystem. You did not create a subdirectory 'c:\data\grant\files\f', and so when you try to use the path subset_data_path you are looking for a directory that does not exist.
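To make the two layers concrete, here is a small Python 3 sketch: gzip.open() yields the decompressed tar stream, and tarfile can parse that stream when it is passed as fileobj. The function name is illustrative.

```python
import gzip
import tarfile


def list_tar_gz_members(archive_path):
    """Show the two layers of a .tar.gz: gzip decompresses, tarfile unpacks."""
    # gzip.open() gives the decompressed *tar stream*, not the individual files.
    with gzip.open(archive_path, "rb") as gz_stream:
        # mode "r:" tells tarfile the stream is a plain, already-decompressed tar.
        with tarfile.open(fileobj=gz_stream, mode="r:") as tar:
            return tar.getnames()
```

In practice tarfile.open(path, "r:gz") does both steps at once; the sketch only separates them to show what the gzip file object actually contains.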
The following ought to work:
import os
import tarfile

subset_path = r'c:\data\grant\files'  # raw string: otherwise '\f' is read as a form feed
with tarfile.open("subset_full.tar.gz") as tar:
    tar.extractall(subset_path)
subset_data_path = os.path.join(subset_path, 'subset_full')
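If the goal is only to read the text files, they can also be read directly from the archive without extracting anything. A Python 3 sketch using tarfile's "r:gz" mode and TarFile.extractfile(); the suffix filter and UTF-8 encoding are assumptions about the asker's files. Binary members such as the .h5 files can be read the same way with .read(), just without decoding.

```python
import tarfile


def read_text_members(archive_path, suffix=".txt", encoding="utf-8"):
    """Read matching text files straight out of a .tar.gz without extracting."""
    contents = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                # extractfile() returns a readable file-like object for the member.
                with tar.extractfile(member) as fh:
                    contents[member.name] = fh.read().decode(encoding)
    return contents
```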

Python. Container file for different multimedia

This is my problem.
I need to combine text, a picture and a video (any codec) into one file.
I know about binary files, but how would I go about packaging the data into one file and reading it back?
For example, in one file I store the text, then the PNG and then the video.
In another Python file I extract them again and display them as I please.
Regards,
Renier Engelbrecht
You could use the zipfile module - it creates a single file from arbitrary components.
Sample usage (Python 3):
import zipfile

# Write zip file; ZIP_STORED skips compression, which suits already-compressed media
with zipfile.ZipFile("combined_file.zip", mode='w', compression=zipfile.ZIP_STORED) as archive:
    archive.write("file_1.ext")
    archive.write("file_2.ext")

# Extract contents later
with zipfile.ZipFile("combined_file.zip", mode='r') as archive:
    archive.extractall()
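The components can also be written from memory and read back without extracting to disk, which fits "extract the files again and display as I please". A sketch; the entry names text.txt, picture.png and video.mp4 are illustrative assumptions.

```python
import zipfile


def pack_media(archive_path, text, image_bytes, video_bytes):
    """Bundle text, an image and a video into one zip container."""
    # ZIP_STORED: PNG and most video codecs are already compressed.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("text.txt", text)  # str is stored as UTF-8
        archive.writestr("picture.png", image_bytes)
        archive.writestr("video.mp4", video_bytes)


def unpack_media(archive_path):
    """Read each component back into memory instead of extracting to disk."""
    with zipfile.ZipFile(archive_path, "r") as archive:
        text = archive.read("text.txt").decode("utf-8")
        return text, archive.read("picture.png"), archive.read("video.mp4")
```

The returned bytes can then be handed to whatever image or video library does the displaying.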
