Tar function in Python

I am trying to tar a folder using the following code:
make_tarfile('logs_' + str(datetime.datetime.now()), logFolder)
def make_tarfile(output_filename, source_dir):
    with closing(tarfile.open(output_filename, "w:gz")) as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
I don't see any tar file being created, though. Could anyone please correct my code?
Thanks in advance

Your code looks usable, and the archive should be created.
Be aware that the archive file name will be exactly the name you pass to tarfile.open. If you want the archive to have a .tar.gz extension, you have to include it in that name yourself.
The closing() wrapper is not necessary in Python 2.7+. You can write with tarfile.open(output_filename, "w:gz") as tar: directly, because the object returned by tarfile.open is itself a context manager.
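Putting both remarks together, a minimal sketch might look like the following. The timestamped_name helper and its timestamp format are assumptions added for illustration (they are not part of the question); the format avoids the spaces and colons that str(datetime.datetime.now()) would put into the file name.

```python
import datetime
import os
import tarfile


def make_tarfile(output_filename, source_dir):
    """Write source_dir into a gzip-compressed tar archive."""
    # tarfile.open returns a context manager, so closing() is not needed
    with tarfile.open(output_filename, "w:gz") as tar:
        # arcname stores the folder under its own name, not its full path
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    return output_filename


def timestamped_name(prefix="logs_"):
    """Hypothetical helper: build a name that ends in .tar.gz."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    return prefix + stamp + ".tar.gz"
```

With these definitions, make_tarfile(timestamped_name(), logFolder) would write an archive such as logs_<timestamp>.tar.gz into the current working directory.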

Related

Saving a .tar.gz file located on server to a FILE object

I'm currently working on a Python Flask API.
For demo purposes, I have a folder in the server containing .tar.gz files.
I'm wondering how to load one of these files into a file object, given its relative path (say, file.tar.gz). I need it in that form so that I can run the following code on it, where f is the tar file:
tar = tarfile.open(mode="r:gz", fileobj=f)
for member in tar.getnames():
    tf = tar.extractfile(member)
Thanks in advance!

I'm not very familiar with this, but shouldn't saving it normally with a .tar.gz extension work? If so, and if you already have the compressed data, then very simple code can do that:
compressed_data = b'your file'  # placeholder for the compressed bytes
with open('file.tar.gz', 'wb') as fo:  # binary write mode
    fo.write(compressed_data)
# leaving the with block flushes and closes the file
Will this do the job, or am I getting something wrong here?
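If the .tar.gz file already exists on the server, one way to get the file object the question asks for is simply to open the path in binary mode and pass it as fileobj. The sketch below assumes that situation; the function name read_members is made up for illustration.

```python
import tarfile


def read_members(path):
    """Return {member name: bytes} for the regular files in the .tar.gz at path."""
    contents = {}
    # "rb" yields a binary file object, which is what fileobj= expects
    with open(path, "rb") as f:
        with tarfile.open(mode="r:gz", fileobj=f) as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue  # skip directories, links and other special members
                contents[member.name] = tar.extractfile(member).read()
    return contents
```

Reading every member into memory is only reasonable for small archives; for large ones it is probably better to process each extracted file object inside the loop.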

How to get information of .jar file in python-magic

I have a folder full of jar, html, css, and exe files. How can I identify the type of each file?
I have already tried the "file" command on *NIX and python-magic, but the results all look like this:
test : Zip archive data, at least v1.0 to extract
How can I get a more specific result, such as test : jar, using only the magic number?

While not required, most JAR files have a META-INF/MANIFEST.MF file contained within them. You could check for the existence of this file, after checking if it's a zip file:
import zipfile
def zipFileContains(zipFileName, pathName):
    f = zipfile.ZipFile(zipFileName, "r")
    # true if any entry name starts with the requested path
    result = any(x.startswith(pathName.rstrip("/")) for x in f.namelist())
    f.close()
    return result
print(zipFileContains("test.jar", "META-INF/MANIFEST.MF"))
However, it might be better to just check if it's a zip file that ends in .jar.
Magic alone won't do it for you, since a JAR is literally just a zip file. Read more about the format here.
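The two checks mentioned above (zip archive plus .jar name, or zip archive plus manifest) could be combined roughly as follows. This is only a heuristic sketch, and looks_like_jar is a name invented here.

```python
import zipfile


def looks_like_jar(path):
    """Heuristic: a zip archive that is named *.jar or contains a JAR manifest."""
    if not zipfile.is_zipfile(path):
        return False  # not a zip archive at all, so it cannot be a JAR
    if path.lower().endswith(".jar"):
        return True
    with zipfile.ZipFile(path) as zf:
        return "META-INF/MANIFEST.MF" in zf.namelist()
```

Because the manifest is optional, a JAR without one and without the .jar extension would still be reported as a plain zip file.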

Is it possible to extract single file from tar bundle in python

I need to fetch a couple of files from a huge svn repo. Fetching the whole repo takes almost an hour. The files I am looking for are part of a tar bundle.
Is it possible to fetch only those two files from the tar bundle, without extracting the whole bundle, using Python code?
If so, can anybody let me know how I should go about it?

It sounds like you have two parts to your question:
1. Fetching a single tar bundle from the SVN repo, without the rest of the repo's files.
2. Using Python to extract two files from the retrieved bundle.
For the first part, I'll simply refer to this post on svn export and sparse checkouts.
For the second part, here is a solution for extracting the two files from the retrieved tarball:
import tarfile
files_i_want = ['path/to/file1','path/to/file2']
tar = tarfile.open("bundle.tar")
tar.extractall(members=[x for x in tar.getmembers() if x.name in files_i_want])

Here is one way to get a tar file from svn and extract one file from it all:
import tarfile
from subprocess import check_output
# Capture the tar file from subversion
tmp = '/home/me/tempfile.tar'
with open(tmp, 'wb') as f:
    f.write(check_output(["svn", "cat", "svn://url/some.tar"]))
# Extract the file we want, saving it under dir2
with tarfile.open(tmp) as t:
    t.extract('dir1/fname.ext', path='dir2')
where 'dir1/fname.ext' is the full path to the file that you want within the tar archive. It will be saved in 'dir2/dir1/fname.ext'. If you omit the path argument, it will be saved in 'dir1/fname.ext' under the current directory.
The above can be understood as follows. On a normal shell command line, svn cat url tells subversion to send the file defined by url to stdout (see svn help cat for more info). url can be any type of url that svn understands such as svn://..., svn+ssh://..., or file://.... We run this command under python control using the subprocess module. To do this the svn cat url command is broken up into a list: ["svn", "cat", "url"]. The output from this svn command is saved to a local file defined by the tmp variable. We then use the tarfile module to extract the file you want.
Alternatively, you could use the extractfile method to capture the file data to a python variable:
t = tarfile.open(tmp)
handle = t.extractfile('dir1/fname.ext')
print(handle.readlines())  # show file contents
According to the documentation, tarfile should accept a subprocess's stdout as a file handle. This would simplify the code and eliminate the need to save the tar file locally. However, due to a bug, Issue 10436, that will not work.
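One way to avoid the temporary file without reading directly from the subprocess pipe is to wrap the bytes returned by check_output in an in-memory buffer. The sketch below assumes the whole tar archive fits in memory; read_member_from_bytes is a name introduced here for illustration.

```python
import io
import tarfile


def read_member_from_bytes(tar_bytes, member_name):
    """Return the contents of member_name from a tar archive held in memory."""
    # BytesIO is seekable, so tarfile can treat it like a regular file
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        # extractfile raises KeyError if member_name is not in the archive
        return tar.extractfile(member_name).read()
```

With the names from the answer above, this could be called as read_member_from_bytes(check_output(["svn", "cat", "svn://url/some.tar"]), 'dir1/fname.ext').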

Perhaps you want something like this?
#!/usr/local/cpython-3.3/bin/python
import tarfile as tarfile_mod
def main():
    tarfile = tarfile_mod.TarFile('tar-archive.tar', 'r')
    if False:
        # read the member into memory instead of writing it to disk
        file_ = tarfile.extractfile('etc/protocols')
        print(file_.read())
    else:
        # write the member to ./etc/protocols
        tarfile.extract('etc/protocols')
    tarfile.close()
main()

How can I rename files with a python script?

On my Plone site I have hundreds of files (pdf, doc, ...) in file fields of Archetypes objects. Something went wrong during the import, and all the filenames are missing. The problem is that when somebody wants to open a file, the browser doesn't always offer to open it with a viewer, because the extension is missing.
The user has to save the file and add an extension to open it.
Can I write a Python script to rename all the files, adding an extension that depends on the file type?
Thank you.

http://plone.org/documentation/manual/plone-community-developer-documentation/content/rename
You have everything you need there :)
The important part is this: parent.manage_renameObject(id, id + "-old")
You can loop over the subobjects like this:
for i in context.objectIds():
    obj = context[i]
    context.manage_renameObject(i, i + ".pdf")
Here context is the folder where you put this script, that is, the folder that contains all your PDFs.

The standard library function os.rename(src, dst) will do the trick. That's all you need if you know what the extension should be (e.g. all the files are .pdf). If you have a mixed bag of .doc, .pdf, .jpg, .xls files with no extensions, you'll need to examine the file contents to determine the proper extension using something like python-magic.
import os
for fn in os.listdir(path='.'):
    os.rename(fn, fn + ".pdf")
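For the mixed-bag case, the idea of examining file contents can be sketched with the standard library alone. Note that this swaps python-magic for a small hand-written signature table, which is an assumption made here to keep the example self-contained; the table covers only three formats, and guess_extension and add_missing_extensions are illustrative names.

```python
import os

# A few well-known file signatures (leading "magic" bytes) and the
# extension each one implies. Deliberately incomplete.
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
]


def guess_extension(path):
    """Return an extension based on the file's leading bytes, or None if unknown."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    return None


def add_missing_extensions(folder):
    """Rename extensionless files in folder whose type can be recognised."""
    renamed = []
    for fn in os.listdir(folder):
        src = os.path.join(folder, fn)
        if not os.path.isfile(src) or os.path.splitext(fn)[1]:
            continue  # skip directories and files that already have an extension
        ext = guess_extension(src)
        if ext:
            os.rename(src, src + ext)
            renamed.append(fn + ext)
    return renamed
```

Formats such as .doc and .xls share the same container signature, so a leading-bytes check alone cannot tell them apart; that is where a fuller detection library like python-magic becomes useful.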

Delete file from zipfile with the ZipFile Module

The only way I came up with for deleting a file from a zipfile is to create a temporary zipfile without the file to be deleted and then rename it to the original filename.
In Python 2.4 the ZipInfo class had an attribute file_offset, so it was possible to create a second zip file and copy the data to the other file without decompressing and recompressing.
This file_offset is missing in Python 2.6. Is there any option other than creating another zipfile by uncompressing every file and then recompressing it again?
Is there maybe a direct way of deleting a file in the zipfile? I searched and didn't find anything.

The following snippet worked for me (deletes all *.exe files from a Zip archive):
import zipfile
zin = zipfile.ZipFile('archive.zip', 'r')
zout = zipfile.ZipFile('archive_new.zip', 'w')
for item in zin.infolist():
    buffer = zin.read(item.filename)
    # copy every entry except the .exe files
    if item.filename[-4:] != '.exe':
        zout.writestr(item, buffer)
zout.close()
zin.close()
If you read everything into memory, you can eliminate the need for a second file. However, this snippet recompresses everything.
After closer inspection the ZipInfo.header_offset is the offset from the file start. The name is misleading, but the main Zip header is actually stored at the end of the file. My hex editor confirms this.
So the problem you'll run into is the following: You need to delete the directory entry in the main header as well or it will point to a file that doesn't exist anymore. Leaving the main header intact might work if you keep the local header of the file you're deleting as well, but I'm not sure about that. How did you do it with the old module?
Without modifying the main header I get an error "missing X bytes in zipfile" when I open it. This might help you to find out how to modify the main header.
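The remark above about reading everything into memory to avoid a second file could be sketched like this. It still recompresses every entry, and it assumes the archive is small enough to hold in memory; remove_from_zip and its should_remove callback are names invented for this example.

```python
import io
import zipfile


def remove_from_zip(zip_path, should_remove):
    """Rewrite zip_path in place, dropping entries for which should_remove(name) is true."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(zip_path, "r") as zin, \
            zipfile.ZipFile(buffer, "w") as zout:
        for item in zin.infolist():
            if not should_remove(item.filename):
                # passing the ZipInfo keeps the entry's name, date and compression type
                zout.writestr(item, zin.read(item.filename))
    # overwrite the original only after the new archive is complete
    with open(zip_path, "wb") as f:
        f.write(buffer.getvalue())
```

For example, remove_from_zip('archive.zip', lambda name: name.endswith('.exe')) would drop the .exe entries without creating archive_new.zip on disk.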

Not very elegant but this is how I did it:
import subprocess
import zipfile
z = zipfile.ZipFile(zip_filename)
files_to_del = [f for f in z.namelist() if f.endswith('exe')]
z.close()
# let the external zip tool delete the entries in place
cmd = ['zip', '-d', zip_filename] + files_to_del
subprocess.check_call(cmd)
# reload the modified archive
z = zipfile.ZipFile(zip_filename)

The routine delete_from_zip_file from ruamel.std.zipfile¹ allows you to delete a file based on its full path within the ZIP, or based on (re) patterns. E.g. you can delete all of the .exe files from test.zip using
from ruamel.std.zipfile import delete_from_zip_file
delete_from_zip_file('test.zip', pattern='.*.exe')
(please note the dot before the *).
This works similarly to mdm's solution (including the need for recompression), but recreates the ZIP file in memory (using the class InMemZipFile()), overwriting the old file after it is fully read.
¹ Disclaimer: I am the author of that package.
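The note about the dot before the * reflects that the pattern is a regular expression rather than a shell glob. The sketch below uses only the standard re module to show the difference in pattern semantics; how the package applies the pattern internally is not stated above, so the use of re.match here is an assumption, and the file names are made up.

```python
import re

names = ["setup.exe", "docs/readme.txt", "bin/tool.exe", "notexe"]

# In a regular expression ".*" means "any run of characters". A bare leading
# "*" (as in the glob "*.exe") has nothing to repeat and is rejected by re.
loose = re.compile(r".*.exe")     # the unescaped "." before "exe" matches any character
strict = re.compile(r".*\.exe$")  # "\." is a literal dot, "$" anchors at the end

loose_hits = [n for n in names if loose.match(n)]
strict_hits = [n for n in names if strict.match(n)]
```

Under these assumptions the unescaped pattern also accepts a name like notexe, so escaping the dot is the safer choice when only real .exe names should match.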

Based on Elias Zamaria's comment on the question.
Having read through Python issue #51067, I want to give an update on it.
As of today, a solution already exists, though it has not been accepted into Python because the author has not signed the Contributor Agreement.
Nevertheless, you can take the code from https://github.com/python/cpython/blob/659eb048cc9cac73c46349eb29845bc5cd630f09/Lib/zipfile.py and save it as a separate file in your project. After that, just import it instead of the built-in library: import myproject.zipfile as zipfile.
Usage:
with zipfile.ZipFile("archive.zip", "a") as z:
    z.remove("firstfile.txt")
I believe it will be included in future Python versions. For my use case it works like a charm.
