Python tar.add files but omit parent directories - python

I am trying to create a tar file from a list of files stored in a text file. I have working code to create the tar, but I want the archive to start from a certain directory (app and all its subdirectories) and to drop the parent directories. This is because the software only opens the file from a certain directory.
package.list files are as below:
app\myFile
app\myDir\myFile
app\myDir\myFile2
If I omit the path in restore.add, the files cannot be found because my program runs from elsewhere. How do I tell the tar to start at a particular directory, or to add the files while keeping the directory structure from the text file, e.g. starting at app rather than at all the parent directories?
My objective is to do this tar cf restore.tar -T package.list but with Python on Windows.
I have tried basename, as suggested in "How to compress a tar file in a tar.gz without directory?", but this strips out ALL the directories.
I have also tried using arcname='app' in the .add method. However, this gives some weird results: it breaks the directory structure and renames loads of files to app.
path = foo + '\\' + bar
file = open(path + '\\package.list', 'r')
restore = tarfile.open(path + '\\restore.tar', 'w')
for line in file:
    restore.add(path + '\\' + line.strip())
restore.close()
file.close()
Using Python 2.7

You can use the second argument of TarFile.add (arcname), which specifies the name the file gets inside the archive.
So, assuming every path is sane, something like this would work:
import tarfile
prefix = "some_dir/"
archive_path = "inside_dir/file.txt"
with tarfile.open("test.tar", "w") as tar:
    tar.add(prefix+archive_path, archive_path)
Usage:
> cat some_dir/inside_dir/file.txt
test
> python2 test_tar.py
> tar --list -f ./test.tar
inside_dir/file.txt
In production, I'd advise using an appropriate module for path handling (such as os.path) to make sure every slash and backslash is in the right place.
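Applied to the package.list layout from the question, the same idea could look like the sketch below. The function name build_archive and its parameters are made up for illustration, and it assumes every line of the list is a path relative to the directory that holds the list.

```python
import os
import tarfile


def build_archive(base_dir, list_name="package.list", tar_name="restore.tar"):
    """Add each path listed in base_dir/list_name, stored relative to base_dir."""
    tar_path = os.path.join(base_dir, tar_name)
    with open(os.path.join(base_dir, list_name)) as listing, \
            tarfile.open(tar_path, "w") as tar:
        for line in listing:
            rel = line.strip()
            if not rel:
                continue  # skip blank lines
            # The list uses Windows separators; split it into components so
            # the on-disk path and the archive name can be built separately.
            parts = rel.replace("\\", "/").split("/")
            tar.add(os.path.join(base_dir, *parts), arcname="/".join(parts))
    return tar_path
```

The file is read from base_dir, but only the relative part is passed as arcname, so the archive starts at app.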

Related

Getting dir path for file in argument in Python

I'm using argparse to send .pcap files to a script that scrapes through them. I wanted to organize my work better, so I put the data I was using in one folder and my scripts in another. This interfered with the way I was saving IP lists/hostnames: the filename now just gets 'IP-list-' added before the path to the .pcap file I sent as an argument.
new_ips_filename = '/IP-list-' + self.pcap
new_ips_file = open(new_ips_filename[:-5], 'w')
for i in range(len(self.new_ips)):
    new_ips_file.write(self.new_ips[i] + ':' + self.new_hostnames[i] + '\n')
new_ips_file.close()
self.pcap is the path to the .pcap (which would just be ./file.pcap if it were in the same dir). Is there an easy way to pull the dir from the filename here?
Are you looking for the absolute path of the file?
You might try os.path.abspath("file.pcap")
Or, if you want only the directory name, use os.path.dirname(...).
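For the filename in the question, one possible way to combine these functions is sketched below. The helper name ip_list_path is hypothetical; it splits the path into directory and file name, drops the .pcap extension, and puts the prefix in front of the file name only.

```python
import os


def ip_list_path(pcap_path):
    """Return '<dir>/IP-list-<name>' for a capture file at '<dir>/<name>.pcap'."""
    directory, filename = os.path.split(pcap_path)
    stem = os.path.splitext(filename)[0]  # file name without the extension
    return os.path.join(directory, "IP-list-" + stem)
```

With this, the list is written next to the capture file regardless of where the script itself lives.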

Regarding file io in Python

I mistakenly typed the following code:
f = open('\TestFiles\'sample.txt', 'w')
f.write('I just wrote this line')
f.close()
I ran this code, and even though I typed it by mistake, it is valid code because the second backslash escapes the single quote. As far as I know, I should get a .txt file named "\TestFiles'sample" in my project folder. However, when I navigated to the project folder, I could not find the file there.
However, if I do the same thing with a different filename, for example:
f = open('sample1.txt', 'w')
f.write('test')
f.close()
I find the 'sample1.txt' file created in my folder. Is there a reason the file is not created in the first case, even though the code was valid as far as I know?
Also is there a way to mention a file relative to my project folder rather than mentioning the absolute path to a file? (For example I want to create a file called 'sample.txt' in a folder called 'TestFiles' inside my project folder. So without mentioning the absolute path to TestFiles folder, is there a way to mention the path to TestFiles folder relative to the project folder in Python when opening files?)
I am a beginner in Python and I hope someone could help me.
Thank you.
What you're looking for are relative paths. Long story short, if you want to create a file called 'sample1.txt' in a folder 'TestFiles' inside your project folder, you can do:
import os
f = open(os.path.join('TestFiles', 'sample1.txt'), 'w')
f.write('test')
f.close()
Or using the more recent pathlib module:
from pathlib import Path
f = open(Path('TestFiles', 'sample1.txt'), 'w')
f.write('test')
f.close()
But keep in mind that the result depends on where you started your Python interpreter (which is probably why you can't find "\TestFiles'sample" in your project folder: it was created elsewhere). To make sure everything works regardless of that, you can do something like this instead:
from pathlib import Path
sample_path = Path(Path(__file__).parent, 'TestFiles', 'sample1.txt')
with open(sample_path, "w") as f:
    f.write('test')
By using a context manager (https://book.pythontips.com/en/latest/context_managers.html) you can avoid calling f.close().
When you create a file you can specify either an absolute filename or a relative filename.
If you start the file path with '\' (on Windows) or '/', it is treated as starting at the root of the drive or filesystem, not at your current folder. So in your first case you specified such a rooted path, which is in fact:
from pathlib import Path
Path('\Testfile\'sample.txt').absolute()
WindowsPath("C:/Testfile'sample.txt")
Whenever you run code in Python, a relative path is resolved against your current folder, which is the folder from which you started the Python interpreter. You can check it with:
import os
os.getcwd()
The relative path you specify is appended to that folder, so if you specify:
Path('Testfiles\sample.txt').absolute()
WindowsPath('C:/Users/user/Testfiles/sample.txt')
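A small sketch can make this dependence on the current folder visible. The helper resolve_from is made up for this illustration; it temporarily switches the working directory and reports where a relative path would end up.

```python
import os
from pathlib import Path


def resolve_from(cwd, relative):
    """Return the absolute path `relative` maps to when `cwd` is the current folder."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        # Path.absolute() simply prepends the current working directory.
        return Path(relative).absolute()
    finally:
        os.chdir(previous)  # always restore the original folder
```

Calling it with two different folders and the same relative path gives two different absolute paths, which is why a file can appear to be "missing" from the project folder.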
In general I suggest you use pathlib to handle paths. That makes it safer and cross-platform. For example, let's say that your script is under:
project
    src
        script.py
    testfiles
and you want to store/read a file in project/testfiles. What you can do is get the path of script.py with __file__ and build the path to project/testfiles from it:
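from pathlib import Path
# Folder that contains script.py, i.e. project/src
src_path = Path(__file__).resolve().parent
# Go one level up to project, then down into testfiles
testfiles_path = src_path.parent / 'testfiles'
sample_fname = testfiles_path / 'sample.txt'
with sample_fname.open('w') as f:
    f.write('yo')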
As I am running the first code example in vscode, I'm getting a warning
Anomalous backslash in string: '\T'. String constant might be missing an r prefix.
And when I am running the file, it is also creating a file with the name \TestFiles'sample.txt. And it is being created in the same directory where the .py file is.
Now, if your working tree is like this:
project_folder
    testfiles
        sample.txt
    something.py
then you can just say: open("testfiles/sample.txt")
I hope you find it helpful.

Extracting inner file from zip file with python

I am able to extract the inner file, but it extracts the entire chain.
Suppose the following file structure
v a.zip
    v folder1
        v folder2
            > inner.txt
and suppose I want to extract inner.txt to some folder target.
Currently what happens when I try to do this is that I end up extracting folder1/folder2/inner.txt to target. Is it possible to extract the single file instead of the entire chain of directories? So that when target is opened, the only thing inside is inner.txt.
EDIT:
Using python zip module to unzip files and extract only the inner files to the desired location.
You should use unzip's -j option (junk paths: do not make directories); even the old v5.52 has it. The full list of options is in the unzip(1) Linux man page, or you can simply run (${PATH_TO}/)unzip in the terminal, and it will output the argument list.
Considering that you want to extract the file in a folder called target, use the command (you may need to specify the path to unzip):
"unzip" -j "a.zip" -d "target" "folder1/folder2/inner.txt"
Output (Win, but for Nix it's the same thing):
(py35x64_test) c:\Work\Dev\StackOverflow\q047439536>"unzip" -j "a.zip" -d "target" "folder1/folder2/inner.txt"
Archive: a.zip
inflating: target/inner.txt
Output (without -j):
(py35x64_test) c:\Work\Dev\StackOverflow\q047439536>"unzip" "a.zip" -d "target" "folder1/folder2/inner.txt"
Archive: a.zip
inflating: target/folder1/folder2/inner.txt
Or, since you mentioned Python,
code00.py:
import os
from zipfile import ZipFile

def extract_without_folder(arc_name, full_item_name, folder):
    with ZipFile(arc_name) as zf:
        file_data = zf.read(full_item_name)
    with open(os.path.join(folder, os.path.basename(full_item_name)), "wb") as fout:
        fout.write(file_data)

if __name__ == "__main__":
    extract_without_folder("a.zip", "folder1/folder2/inner.txt", "target")
A zip file doesn't have a folder structure in the same way as a filesystem: each file has a name that is its entire path.
You'll want to use a method that lets you read the file contents (such as ZipFile.open or ZipFile.read), extract the part of the filename you actually want to use, and save the file contents to that file yourself.
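As a rough sketch of that approach with ZipFile.open, the member can be streamed to its new location instead of being read into memory first. The function name extract_flat is hypothetical, and the code assumes Python 3 (for os.makedirs with exist_ok).

```python
import os
import shutil
from zipfile import ZipFile


def extract_flat(archive, member, target_dir):
    """Copy one archive member into target_dir under its base name only."""
    os.makedirs(target_dir, exist_ok=True)
    destination = os.path.join(target_dir, os.path.basename(member))
    with ZipFile(archive) as zf, zf.open(member) as src, \
            open(destination, "wb") as dst:
        # Stream the data in chunks rather than loading the whole file.
        shutil.copyfileobj(src, dst)
    return destination
```

For example, extract_flat("a.zip", "folder1/folder2/inner.txt", "target") would leave only inner.txt inside target.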

How to make SCons not include the base dir in zip files?

SCons provides a Zip builder to produce zip files from groups of files.
For example, suppose we have a folder foo that looks like this:
foo/
foo/foo.txt
and we create the zip file foo.zip from a folder foo:
env.Zip('foo.zip', 'foo/')
This produces a zip file:
$ unzip -l foo.zip
Archive: foo.zip
foo/
foo/foo.txt
However, suppose we are using a VariantDir of bar, which contains foo:
bar/
bar/foo/
bar/foo/foo.txt
Because we are in a VariantDir, we still use the same command to create the zip file, even though it has slightly different effects:
env.Zip('foo.zip', 'foo/')
This produces the zip file:
$ unzip -l bar/foo.zip
Archive: bar/foo.zip
bar/foo/
bar/foo/foo.txt
The problem is the extra bar/ prefix on each of the files within the zip. If this were not SCons, the simple solution would be to cd into bar and call zip from there, with something like cd bar; zip -r foo.zip foo/. However, this is weird/difficult with SCons, and at any rate seems very un-SCons-like. Is there a better solution?
You can create a SCons Builder which accomplishes this task. We can use the standard Python zipfile module to make the zip files. We take advantage of ZipFile.write, which lets us specify a file to add as well as what it should be called within the zip:
zf.write('foo/bar', 'bar') # save foo/bar as bar
To get the right names, we use os.path.relpath to compute each file's path relative to the directory that contains the source.
Finally, we use os.walk to walk through the contents of the directories we want to add, and combine the previous two functions to add the files to the final zip under the correct names.
import os.path
import zipfile

def zipbetter(target, source, env):
    # Open the zip file with appending, so multiple calls will add more files
    zf = zipfile.ZipFile(str(target[0]), 'a', zipfile.ZIP_DEFLATED)
    for s in source:
        # Find the path of the base file
        basedir = os.path.dirname(str(s))
        if s.isdir():
            # If the source is a directory, walk through its files
            for dirpath, dirnames, filenames in os.walk(str(s)):
                for fname in filenames:
                    path = os.path.join(dirpath, fname)
                    if os.path.isfile(path):
                        # If this is a file, write it with its relative path
                        zf.write(path, os.path.relpath(path, basedir))
        else:
            # Otherwise, just write it to the file
            flatname = os.path.basename(str(s))
            zf.write(str(s), flatname)
    zf.close()

# Make a builder using the zipbetter function, that takes SCons files
zipbetter_bld = Builder(action = zipbetter,
                        target_factory = SCons.Node.FS.default_fs.Entry,
                        source_factory = SCons.Node.FS.default_fs.Entry)
# Add the builder to the environment
env.Append(BUILDERS = {'ZipBetter' : zipbetter_bld})
Call it just like the normal SCons Zip:
env.ZipBetter('foo.zip', 'foo/')
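The naming logic can also be tried outside SCons. The sketch below is a plain-Python version of the directory branch only; the function name zip_dir_relative is made up here, and the trailing slash is normalised away first because os.path.dirname('bar/foo/') would otherwise return 'bar/foo' instead of 'bar'.

```python
import os
import zipfile


def zip_dir_relative(zip_path, source_dir):
    """Zip source_dir so that entry names start at its last path component."""
    source_dir = os.path.normpath(source_dir)  # drops a trailing slash
    base_dir = os.path.dirname(source_dir)     # parent of the zipped folder
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(source_dir):
            for fname in filenames:
                path = os.path.join(dirpath, fname)
                # Second argument is the name stored inside the archive.
                zf.write(path, os.path.relpath(path, base_dir))
```

Zipping bar/foo/ this way stores foo/foo.txt rather than bar/foo/foo.txt.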
Use the construction variable ZIPROOT.
Directories can indeed be a challenge with SCons. Assuming the files are 'in project', there are a couple of different ways to specify the directory of the files to include in the Zip() call:
1. Relative to the root project dir, by prepending the path with '#'. This option will include the complete directory, as you mentioned.
2. Relative to a particular SConscript file. Either specify files in the same directory, or specify a subdirectory relative to the SConscript.
It sounds like you want the second option. Do you have a SConscript file in the directory that you want to zip (foo in your case)? This should work even with a variant_dir.

How to write tag deleter script in python

I want to implement a script that reads files (in folders and subfolders), detects some tags, and deletes those tags from the files.
The files are .cpp, .h, .txt and .xml, and there are hundreds of them under the same folder.
I have no idea about python, but people told me that I can do it easily.
EXAMPLE:
My main folder is A: C:\A
Inside A, I have folders (B, C, D) and some files: A.cpp, A.h, A.txt and A.xml. In B I have folders B1, B2 and B3; some of them have more subfolders and .cpp, .xml and .h files.
The .xml files contain tags like <!-- $Mytag: some text$ -->
The .h and .cpp files contain another kind of tag, like //$TAG some text$
The .txt files have differently formatted tags: #$This is my tag$
A tag always starts and ends with the $ symbol, but it always has a comment character (//,
The idea is to run one script and delete all tags from all files, so the script must:
1. Read folders and subfolders
2. Open files and find tags
3. If tags are there, delete them and save the files with the changes
WHAT I HAVE:
import os
for root, dirs, files in os.walk(os.curdir):
    if files.endswith('.cpp'):
        %Find //$ and delete until next $
    if files.endswith('.h'):
        %Find //$ and delete until next $
    if files.endswith('.txt'):
        %Find #$ and delete until next $
    if files.endswith('.xml'):
        %Find <!-- $ and delete until next $ and -->
The general solution would be to:
1. Use the os.walk() function to traverse the directory tree.
2. Iterate over the filenames and use fn_name.endswith('.cpp') with if/elif to determine which kind of file you're working with.
3. Use the re module to create a regular expression you can use to determine whether a line contains your tag.
4. Open the target file and a temporary file (use the tempfile module). Iterate over the source file line by line and write the filtered lines to your temporary file.
5. If any lines were changed, use os.unlink() plus os.rename() to replace your original file with the temporary one.
It's a trivial exercise for a Python adept, but for someone new to the language it'll probably take a few hours to get working. You probably couldn't ask for a better task to get introduced to the language, though. Good luck!
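The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the names CPP_TAG, strip_tags_in_file and strip_tree are made up, the pattern removes the comment marker together with the tag, and os.replace (Python 3.3+) is used in place of the unlink-plus-rename pair.

```python
import os
import re
import tempfile

# Assumed pattern for the C++ style tags from the question: //$TAG some text$
CPP_TAG = re.compile(r"//\s*\$[^$]*\$")


def strip_tags_in_file(path, pattern):
    """Rewrite `path` without text matching `pattern`; return True if it changed."""
    changed = False
    directory = os.path.dirname(os.path.abspath(path))
    # Write the filtered copy next to the original so the final move
    # stays on the same filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as tmp:
        for line in src:
            cleaned = pattern.sub("", line)
            changed = changed or cleaned != line
            tmp.write(cleaned)
    if changed:
        os.replace(tmp.name, path)  # swap the filtered copy into place
    else:
        os.unlink(tmp.name)         # nothing to do, discard the copy
    return changed


def strip_tree(top, patterns_by_ext):
    """Walk `top` and filter every file whose extension has a pattern."""
    changed_files = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in patterns_by_ext:
                full_name = os.path.join(root, name)
                if strip_tags_in_file(full_name, patterns_by_ext[ext]):
                    changed_files.append(full_name)
    return changed_files
```

A call such as strip_tree('.', {'.cpp': CPP_TAG, '.h': CPP_TAG}) returns the list of files that were rewritten; it is worth trying it on a copy of the tree first.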
----- Update -----
The files attribute returned by os.walk is a list so you'll need to iterate over it as well. Also, the files attribute will only contain the base name of the file. You'll need to use the root value in conjunction with os.path.join() to convert this to a full path name. Try doing just this:
for root, d, files in os.walk('.'):
    for base_filename in files:
        full_name = os.path.join(root, base_filename)
        if full_name.endswith('.h'):
            print full_name, 'is a header!'
        elif full_name.endswith('.cpp'):
            print full_name, 'is a C++ source file!'
If you're using Python 3, the print statements will need to be function calls but the general idea remains the same.
Try something like this:
import os
import re

# Python's re module only allows fixed-width look-behind, so the comment
# marker is captured in group 1 and put back by the substitution instead.
CPP_TAG_RE = re.compile(r'(// *)\$[^$]+\$')
tag_REs = {
    '.h': CPP_TAG_RE,
    '.cpp': CPP_TAG_RE,
    '.xml': re.compile(r'(<!-- *)\$[^$]+\$(?= *-->)'),
    '.txt': re.compile(r'(# *)\$[^$]+\$'),
}

def process_file(filename, regex):
    # Set up.
    tempfilename = filename + '.tmp'
    infile = open(filename, 'r')
    outfile = open(tempfilename, 'w')
    # Filter the file: drop the $...$ tag but keep the comment marker.
    for line in infile:
        outfile.write(regex.sub(r'\1', line))
    # Clean up.
    infile.close()
    outfile.close()
    # Enable only one of the two following lines.
    os.rename(filename, filename + '.orig')
    #os.remove(filename)
    os.rename(tempfilename, filename)

def process_tree(starting_point=os.curdir):
    for root, d, files in os.walk(starting_point):
        for filename in files:
            # Get rid of `.lower()` in the following if case matters.
            ext = os.path.splitext(filename)[1].lower()
            if ext in tag_REs:
                process_file(os.path.join(root, filename), tag_REs[ext])
A nice thing about os.path.splitext is that it does the right thing for filenames that start with a dot (.).
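To illustrate that behaviour, here is a tiny sketch; the wrapper name extension is made up. A leading dot is treated as part of the name, so a file called just ".cpp" has no extension and would not be mistaken for a C++ source file.

```python
import os.path


def extension(filename):
    """Return the lower-cased extension, or '' if the name has none."""
    return os.path.splitext(filename)[1].lower()


# A leading dot belongs to the name, not to the extension.
assert extension(".cpp") == ""
assert extension("main.CPP") == ".cpp"
assert extension(".hidden.txt") == ".txt"
```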
