This is my problem.
I need to combine text, picture and video (any codec) into one file.
I know there is binary files. How would I go about packaging and reading the file.
For example, In the one file I store the text, then the png and then the video.
In another Python file I extract the files again and display as I please.
Regards,
Renier Engelbrecht
You could use the zipfile module - it creates a single file from arbitrary components.
Sample usage (Python 3):
import zipfile
# Write zip file
with zipfile.ZipFile("combined_file.zip", mode='w', compression=zipfile.ZIP_STORED) as archive:
archive.write("file_1.ext")
archive.write("file_2.ext")
# Extract contents later
with zipfile.ZipFile("combined_file.zip", mode='r') as archive:
archive.extractall()
Related
I am working on a project that manipulates with a document's xml file. My approach is like the following. First convert the DOCX document into a zip archive, then extract the contents of that archive in order to have access to the document.xml file, and finally convert the XML to a txt in order to work with it.
So i did all the above on a document, and everything worked perfectly, but when i decided to use a different document, the Zipfile library doesnt extract the content of the new ZIP archive, however it somehow extracts the contents of the old document that i processed before, and converts the document.xml file into document.txt without even me even running that block of code that converts the XML into txt.
The worst part is the old document is not even in the directory anymore, so i have no idea how Zipfile is extracting the content of that particular document when its not even in the path.
This is the code I am using in Jupyter notebook.
import shutil
import zipfile
# Convert the DOCX to ZIP
shutil.copyfile('data/docx/input.docx', 'data/zip/document.zip')
# Extract the ZIP
with zipfile.ZipFile('zip/document.zip', 'r') as zip_ref:
zip_ref.extractall('data/extracted/')
# Convert "document.xml" to txt
os.rename('extracted/word/document.xml', 'extracted/word/document.txt')
# Read the txt file
with open('extracted/word/document.txt') as intxt:
data = intxt.read()
This is the directory tree for the extracted zip archive for the first document.
data -
1-docx
2-zip
3-extracted/-
1-customXml/
2-docProps/
3-_rels
4-[Content_Types].xml
5-word/-document.txt
The 2nd document's directory tree should be as following
data -
1-docx
2-zip
3-extracted/-
1-customXml/
2-docProps/
3-_rels
4-[Content_Types].xml
5-word/-document.xml
But Zipfile is extracting the contents of the first document even when the DOCX file is not in the directory.I am also using Ubuntu 20.04 so i am not sure if it has to do with my OS.
I suspect that you are having issues with relative paths, as unzipping any Word document will create the same file/directory structure. I'd suggest using absolute paths to avoid this. What you may also want to do is, after you are done manipulating and processing the extracted files and directories, delete them. That way you won't encounter any issues with lingering files.
I'm currently trying to open an .xlsx file with zipfile on Python, finding all files with namelist(), then using .count() to find all images in .png format within the archive.
My problem is currently, the list returned by namelist() function returns only 1680 elements.
After saving the xlsx file as an html, I am able to view all images contained in the excel spreadsheet and the total file count is 3,352 files.
I checked documentation for zipfile and exhausted the best Google searches I could muster. I appreciate any hints or advice!
Here's the snippet of code I'm using:
import zipfile as zf
xlsx = 'myfile.xlsx'
xlsx_file = zf.ZipFile(xlsx)
fileList = xlsx_file.namelist()
maybe convert it to a wheel file? wheel works good to me
I have a .torrent file that contains a .bz2 file. I am sure that such a file is actually in the .torrent because I extracted the .bz2 with utorrent.
How can I do the same thing in python instead of using utorrent?
I have seen a lot of libraries for dealing with .torrent files in python but apparently none does what I need. Among my unsuccessful attempts I can mention:
import torrent_parser as tp
file_cont = tp.parse_torrent_file('RC_2015-01.bz2.torrent')
file_cont is now a dictionary and file_cont['info']['name']='RC_2015-01.bz2' but if I try to open the file, i.e.
from bz2 import BZ2File
with BZ2File(file_cont['info']['name']) as f:
what_I_want = f.read()
then the content of the dictionary is (obviously, I'd say) interpreted as a path, and I get
No such file or directory: 'RC_2015-01.bz2'
Other attempts have been even more ruinous.
A .torrent file is just a metadata file, indicating where to get the data and the filename of the file. You can't get the file contents from that file.
Only once you have successfully downloaded this torrent file to disk (using torrent software) you can then use BZ2File to open it (if it is .bz2 format).
If you want to perform the actual download with Python, the only option I found was torrent-dl which hasn't been updated for 2 years.
I have a *.tar.gz compressed file that I would like to read in with Python 2.7. The file contains multiple h5 formatted files as well as a few text files. I'm a novice with Python. Here is the code I'm trying to adapt:
`subset_path='c:\data\grant\files'
f=gzip.open(filename,'subset_full.tar.gz')
subset_data_path=os.path.join(subset_path,'f')
The first statement identifies the path to the folder with the data. The second statement tells Python to open a specific compressed file and the third statement (hopefully) executes a join of the prior two statements.
Several lines below this code I get an error when Python tries to use the 'subset_data_path' assignment.
What's going on?
The gzip module will only open a single file that has been compressed, i.e. my_file.gz. You have a tar archive of multiple files that are also compressed. This needs to be both untarred and uncompressed.
Try using the tarfile module instead, see https://docs.python.org/2/library/tarfile.html#examples
edit: To add a bit more information on what has happened, you have successfully opened the zipped tarball into a gzip file object, which will work almost the same as a standard file object. For instance you could call f.readlines() as if f was a normal file object and it would return the uncompressed lines.
However, this did not actually unpack the archive into new files in the filesystem. You did not create a subdirectory 'c:\data\grant\files\f', and so when you try to use the path subset_data_path you are looking for a directory that does not exist.
The following ought to work:
import tarfile
subset_path='c:\data\grant\files'
tar = tarfile.open("subset_full.tar.gz")
tar.extractall(subset_path)
subset_data_path=os.path.join(subset_path,'subset_full')
I have an external file-system and a way to download data from there. I want to download all data into .zip archive.
What I can do is:
Create file to write into
Download data from device to this file
Write file
Add file to zip archive with zipfile.write(file)
What I want to do is:
Create zip archive
Download data from device to created file in this archive without creating it on my local drive
Here is not working code to get an Idea:
def get_all_files(self):
self.savedir()
zipf = zipfile.ZipFile(self.dir_to_save+"/SD_contents.zip", 'w');
for file in self.nsh.get_all_files("/fs/microsd"):
# get_all_files() returns list of full file paths on the SD
print file
data = self.nsh.download_file("/fs/microsd"+file)
zipf.write(data);
If your target is simply to not create temp file, StringIO
is your saver, along with ZipFile.writestr() from Ignacio's answer.
ZipFile.writestr() will allow you to write the contents of an in-memory buffer to a zip entry given by filename or ZipInfo instance. But there is no way to do it in a streaming manner due to the nature of zip files.