Reading a file from a concatenated (tar) file directly, without untarring it - Python

Hi, I have one XML file and some image files, and I build a single concatenated file (like a tar file) from all of them (I have my own scripts for tarring and untarring).
Before I describe exactly what I want, here is the current situation.
Right now I have to untar all the files into a directory before I can read the XML file that is part of the tar file. I then read the data from the XML file and draw the images it mentions (the image names are stored in XML attribute values) on the corresponding panels.
What I want is this: when someone clicks on my tar file, I should be able to read the XML file, then read all the other images (data), and draw them on the corresponding panels without explicitly extracting anything into a directory.
Is there any method for this? Any help would be much appreciated.
Thanks in advance.

The tarfile module gives you access to tarballs. A tar archive has no index, so it won't be true random access, but you can read out just the files you need and either put them in a temporary directory or keep them in memory as strings.
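As a rough sketch of the in-memory route: the code below assumes the bundle is a real tar archive that the standard tarfile module can open (a custom concatenation format would need its own reader). The member name `layout.xml`, the attribute name `src`, and the helper names are made up for illustration.

```python
import tarfile
import xml.etree.ElementTree as ET


def _read(archive, name):
    """Return the bytes of one member; nothing is written to disk."""
    file_obj = archive.extractfile(name)  # raises KeyError if the name is missing
    if file_obj is None:  # directories and other non-file members
        raise ValueError(f"{name!r} is not a regular file")
    return file_obj.read()


def load_layout_and_images(archive_path, xml_name="layout.xml", attr="src"):
    """Parse the XML member, then read every image it names into a dict.

    xml_name and attr are hypothetical; substitute the real member name
    and the attribute that holds the image file names.
    """
    with tarfile.open(archive_path) as archive:
        root = ET.fromstring(_read(archive, xml_name))
        images = {}
        for element in root.iter():
            name = element.get(attr)
            if name:
                images[name] = _read(archive, name)
    return root, images
```

The returned dict maps each image name to its raw bytes, which most GUI toolkits and image libraries can load from memory (for example through `io.BytesIO`).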

Related

Getting images from KMZ files

I'm trying to take an image out of a KMZ file. I currently have the local path of the kmz and the path of the file inside its respective kml file (relative, not global), and what I need is to get the path to load the file to a database. Is there a way to get it using basic string-type paths?

A KMZ is just a Zip archive (with a .kmz extension instead of .zip), so you should be able to unzip it and access all the files with "zipfile" or similar.
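A minimal sketch of that idea in Python, assuming the relative path taken from the KML is relative to the KML's own location inside the archive. The function name and the default KML name `doc.kml` are assumptions for illustration; pass the actual name if yours differs.

```python
import posixpath
import zipfile


def read_kmz_image(kmz_path, relative_href, kml_name="doc.kml"):
    """Return the bytes of a file referenced by a relative path in the KML."""
    # Zip member names always use forward slashes, so build the member
    # name with posixpath rather than os.path.
    member = posixpath.normpath(
        posixpath.join(posixpath.dirname(kml_name), relative_href)
    )
    with zipfile.ZipFile(kmz_path) as archive:
        return archive.read(member)  # raises KeyError if the member is missing
```

The bytes can then be written to the database directly, with no need for a filesystem path to the image.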

accessing one file at a time from a large (40GB) tar file in python

I'm trying to access a large tarball (tar.gz) in Python. The tarball contains multiple mp3 or wav files. I'd like to read each file individually and process it.
I did look at a few of the suggestions available here: this and this.
Both solutions offer reading the table of contents, but not accessing/reading one file at a time.
The other solutions I have seen involve extracting the entire tarball, and I do not have that much space left on my disk.
Any help in this regard will be appreciated.

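You can use TarFile.extractfile to get a buffered reader on each file in the archive without extracting anything to disk. (With a .tar.gz the stream is still decompressed sequentially as you iterate, but you only read the members you ask for.)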
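    import tarfile

    with tarfile.open("test.tar.gz") as archive:
        for member in archive:
            # extractfile() returns None for directories and other non-file members
            file_obj = archive.extractfile(member)
            if file_obj is not None:
                print(member.name, file_obj)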

How can I import the data from a dataset of images when I have been given a .gz file, and I have already extracted it?

I am working on a new assignment where I have been asked to build a CNN classification model. How do I handle data that is given as a .gz file and contains different folders of images?
Until now I have only worked with CSV files, and I do not know how to handle this type of data: only images, in different folders, all contained in a .gz file.

Before going further you need to know what .gz is. It's a compressed file format, similar in purpose to zip; a .tar.gz file is a tar archive compressed with gzip. To uncompress the file, you can run the following command on Linux:
tar -xvzf file.tar.gz
On Windows, use WinRAR or WinZip.
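The same extraction can also be done from Python itself. The sketch below is only one possible approach: it assumes the archive is a .tar.gz whose folder names are the class labels, and the function names and the list of image suffixes are made up for illustration.

```python
import tarfile
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_dataset(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive into dest_dir."""
    # Only extract archives you trust: member paths are chosen by the
    # archive's author.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)


def list_labelled_images(root_dir):
    """Return (path, label) pairs, using each image's parent folder as its label."""
    samples = []
    for path in sorted(Path(root_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            samples.append((path, path.parent.name))
    return samples
```

The resulting list of (path, label) pairs is the image equivalent of the rows in a CSV file, and can be fed to whatever image-loading code the CNN framework provides.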

How do I open/convert .pkz files?

A python package that I'm using has data stored under a single file with a .pkz extension. How would I unzip (?) this file to view the format of data within?

Looks like what you are referencing is just a one-off file format used in sample data in scikit-learn. The .pkz is just a compressed version of a Python pickle file which usually has the extension .pkl.
Specifically, you can see this in one of their sample files here, along with the fact that they are using the zlib_codec. To open it, you can reverse the process (decompress with zlib, then unpickle), or try uncompressing it from the command line.
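Going in reverse might look like the sketch below. It assumes the file really is a pickle compressed with zlib, as described above; if `zlib.decompress` raises an error, the file was written some other way and this does not apply. The function name `load_pkz` is made up for illustration.

```python
import pickle
import zlib


def load_pkz(path):
    """Load a file that is assumed to be a zlib-compressed pickle."""
    with open(path, "rb") as handle:
        compressed = handle.read()
    # Unpickling can run arbitrary code, so only open files you trust.
    return pickle.loads(zlib.decompress(compressed))
```

Once loaded, `type()` on the result is a quick way to see what kind of object the package stored.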

Before attempting to open a PKZ file, you'll need to determine what kind of file you are dealing with and whether it is even possible to open or view the file format.
Files which are given the .PKZ extension are known as Winoncd Images Mask files, however other file types may also use this extension. If you are aware of any additional file formats that use the PKZ extension, please let us know.
How to open a PKZ file:
The best way to open a PKZ file is to simply double-click it and let the default associated application open the file. If you are unable to open the file this way, it may be because you do not have the correct application associated with the extension to view or edit the PKZ file.
If that works, great: you have a program installed that can open it. Let's say that program is called pkzexecutor.exe; with Python, you just have to do:
import subprocess

# pkzexecutor.exe is a placeholder for whatever program opens the file
path_to_program = 'C:\\Windows\\System32\\pkzexecutor.exe'
path_to_file = 'C:\\Users\\Desktop\\yourfile.pkz'
subprocess.call([path_to_program, path_to_file])

From the source code for fetch_olivetti_faces, the file appears to be downloaded from http://cs.nyu.edu/~roweis/data/ and originally has a .mat file extension, meaning it is actually a MATLAB file. If you have access to MATLAB or another program which can read those files, try opening it from there with the original file extension and see what that gives you.
(If you want to try opening this file in Python itself, then perhaps give this question a look: Read .mat files in Python )

compressed archive with quick access to individual file

I need to come up with a file format for a new application I am writing.
This file will need to hold a bunch of other files, which are mostly text but can be other formats as well.
Naturally, a compressed tar file seems to fit the bill.
The problem is that I want to be able to retrieve some data from the file very quickly, and getting just a particular file from a tar.gz file seems to take longer than it should. I am assuming that this is because it has to decompress the entire file even though I just want one. When I have just a regular uncompressed tar file I can get that data really quickly.
Let's say the file I need quickly is called data.dat.
For example the command...
tar -x data.dat -zf myfile.tar.gz
... is what takes a lot longer than I'd like.
MP3 files have id3 data and jpeg files have exif data that can be read in quickly without opening the entire file.
I would like my data.dat file to be available in a similar way.
I was thinking that I could leave it uncompressed and separate from the rest of the files in myfile.tar.gz.
I could then create a tar file of data.dat and myfile.tar.gz, and then hopefully that data could be retrieved faster because it is at the head of the outer tar file and is uncompressed.
Does this sound right?... putting a compressed tar inside of a tar file?
Basically, my need is to have an archive type of file with quick access to one particular file.
Tar does this just fine, but I'd also like to have that data compressed and as soon as I do that, I no longer have quick access.
Are there other archive formats that will give me that quick access I need?
As a side note, this application will be written in Python. If the solution calls for a re-invention of the wheel with my own binary format, I am familiar with C and would have no problem writing the Python module in C. Ideally I'd just use tar, dd, cat, gzip, etc. though.
Thanks,
~Eric

ZIP seems to be appropriate for your situation. Files are compressed individually, which means you can access any one of them without streaming through everything that comes before it.
In Python, you can use zipfile.
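A small sketch of how that could look, with the quick-access file stored uncompressed and everything else deflated. The function names are made up for illustration, and `data.dat` is the file name from the question.

```python
import zipfile


def write_archive(archive_path, files, stored=("data.dat",)):
    """Write a ZIP from a {name: bytes} dict; names in `stored` stay uncompressed."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            kind = zipfile.ZIP_STORED if name in stored else zipfile.ZIP_DEFLATED
            archive.writestr(name, data, compress_type=kind)


def read_one(archive_path, member="data.dat"):
    """Read a single member; the other members are not decompressed."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.read(member)
```

Because the ZIP central directory records where each member starts, `read_one` can seek straight to `data.dat` regardless of how large the other members are.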
