Use python tarfile add without normpath being applied to arcname

I'm using python's tarfile library to create a gzipped tar file.
The tar file needs to have absolute pathnames for reasons that I've got no control over. (I'm aware that this isn't normal practice.)
When I call
tarobject.add("/foo/xx1", "/bar/xx1")
the arcname argument "/bar/xx1" is run through os.path.normpath() and converted to "bar/xx1"
How do I avoid this and end up with "/bar/xx1" as I require?
I've read that I can replace normpath somewhere, but I'm fairly new to Python and I'm not sure how to do this or what the wider implications would be.
Edit:
After looking at this question I had a closer look at the tarinfo object, and this seems to work:
my_tarinfo = tarobject.gettarinfo("/foo/xx1")
my_tarinfo.name = "/bar/xx1"
# file() only exists in Python 2; in Python 3 open the file in binary mode
tarobject.addfile(my_tarinfo, open("/foo/xx1", "rb"))
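The gettarinfo/addfile approach from the edit can be wrapped up roughly as follows. This is a minimal sketch rather than the library's recommended usage: the helper name add_with_absolute_name and the example paths are made up for illustration, and it assumes Python 3.

```python
import tarfile

def add_with_absolute_name(tar, source_path, archive_name):
    """Add source_path to an open TarFile under archive_name, unchanged."""
    # gettarinfo() fills in size, mode, mtime, etc. from the file on disk.
    info = tar.gettarinfo(source_path)
    # Overwrite the stored member name; addfile() writes it as given,
    # so a leading "/" survives.
    info.name = archive_name
    with open(source_path, "rb") as fileobj:
        tar.addfile(info, fileobj)

# Example usage (hypothetical paths):
# with tarfile.open("out.tar.gz", "w:gz") as tar:
#     add_with_absolute_name(tar, "/foo/xx1", "/bar/xx1")
```

Keep in mind that some extraction tools, GNU tar among them, strip the leading slash again by default when unpacking.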

You really do strange things. You need to copy /usr/lib64/python/tarfile.py to a directory on your PYTHONPATH. Then you can modify that copy. But you will need to bundle your own tarfile module with your code.

Related

How to unzip .usdz file in python?

Is there a way to unzip a .usdz file in python? I was looking at the shutil.unpack_archive, but it looks like I can't use that without an existing function to unpack it. They use zip compression, just have a different file extension. Would just renaming them to have .zip extensions work? Is there a way to "tell" shutil that these are basically .zip files, something else I can use?
Running the Linux unzip command can unpack them, but due to my relative unfamiliarity with shell scripting and the file manipulation I'll need to do, I'd prefer to use python.

You can do this a couple of ways.
Use shutil.unpack_archive with the format="zip" argument, e.g.
import shutil
archive_path = "/path/to/archive.usdz"
shutil.unpack_archive(archive_path, format="zip")
# note you can also pass extract_dir keyword argument to
# set where the files are extracted to
You can also directly use the zipfile module:
import zipfile
archive_path = "/path/to/archive.usdz"
zf = zipfile.ZipFile(archive_path)
zf.extractall()
# note that this extracts to the working directory unless you specify the path argument
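On the question of "telling" shutil about the extension: shutil.register_unpack_format lets you associate a file extension with an unpacking function, after which unpack_archive no longer needs the format argument. The following is a sketch under that assumption; the names _unpack_zip_like and register_usdz are hypothetical helpers, not part of shutil.

```python
import shutil
import zipfile

def _unpack_zip_like(filename, extract_dir):
    """Unpack a zip-compatible archive, whatever its file extension."""
    with zipfile.ZipFile(filename) as zf:
        zf.extractall(extract_dir)

def register_usdz():
    """Register '.usdz' with shutil once; later calls are no-ops."""
    known = {name for name, _extensions, _description in shutil.get_unpack_formats()}
    if "usdz" not in known:
        shutil.register_unpack_format(
            "usdz", [".usdz"], _unpack_zip_like,
            description="USDZ archive (zip container)",
        )

# Example usage (hypothetical paths):
# register_usdz()
# shutil.unpack_archive("/path/to/archive.usdz", "/path/to/output")
```

The registration only lasts for the running process, so it has to be repeated in each script that relies on it.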

Save a CSV in same directory as python file, using 'to_csv' and 'os.path'?

I want this line to save the csv in my current directory alongside my python file:
df.to_csv("./test.csv")
My python file is in "C:\Users\Micheal\Desktop\VisualStudioCodes\Q1"
Unfortunately it saves it in "C:\Users\Micheal" instead.
I have tried importing os to use os.curdir, but I get nothing but errors with that.
Is there even a way to save the csv alongside the python file using os.curdir?
Or is there a simpler way to just do this in python without importing anything?

import os
directory_of_python_script = os.path.dirname(os.path.abspath(__file__))
df.to_csv(os.path.join(directory_of_python_script, "test.csv"))
And if you want to read same .csv file later,
pandas.read_csv(os.path.join(directory_of_python_script, "test.csv"))
Here, __file__ gives the path of the Python script being run, which may be relative. We get the absolute path with os.path.abspath() and then take its parent directory with os.path.dirname().
os.path.join() joins two paths using the operating system's path separator, for example '\' on Windows and '/' on Linux.
I haven't tested this, but the approach should work; let me know if it doesn't.
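The same idea can also be written with pathlib, which is part of the standard library. This is a sketch only; path_next_to is a hypothetical helper name, and df stands for the DataFrame from the question.

```python
from pathlib import Path

def path_next_to(script_file, filename):
    """Return an absolute path for filename in the same directory as script_file."""
    # resolve() makes the script path absolute; parent is its directory.
    return Path(script_file).resolve().parent / filename

# Inside a script you would typically call:
# csv_path = path_next_to(__file__, "test.csv")
# df.to_csv(csv_path)
```

Because the result no longer depends on the working directory, it should not matter where the terminal was opened.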

How to read multiple .tar files using python

So I'm trying to read a .tar file. It works fine, but sometimes the filename is a bit different.
The filename sometimes changes from filename_01.tar to filename_02.tar
I have tried using filename_*.tar but that doesn't seem to work.
I know it's a basic question but I can't figure it out.
My code: (using python 3.7+)
import tarfile
tar = tarfile.open('filename_01.tar')
tar.extractall('locationfolder')
tar.close

The * wildcard isn't expanded by tarfile.open. You can loop over glob.glob with the required pattern instead. Also, it's better to use the with syntax to open the file, so the archive is closed automatically and there's no risk of a typo like calling tar.close without parentheses, which does nothing.
import glob
import tarfile
# extract every archive whose name matches the pattern
for f in glob.glob('filename_*.tar'):
    with tarfile.open(f) as tar:
        tar.extractall('locationfolder')

Primer needed in python pathnames

I am a very novice coder, and Python is my first (and, practically speaking, only) language. I am charged as part of a research job with manipulating a collection of data analysis scripts, first by getting them to run on my computer. I was able to do this, essentially by removing all lines of coding identifying paths, and running the scripts through a Jupyter terminal opened in the directory where the relevant modules and CSV files live so the script knows where to look (I know that Python defaults to the location of the terminal).
Here are the particular blocks of code whose function I don't understand
import sys
sys.path.append('C:\Users\Ben\Documents\TRACMIP_Project\mymodules/')
import altdata as altdata
I have replaced the pathname in the original code with the path name leading to the directory where the module is; the file containing all the CSV files that end up being referenced here is also in mymodules.
This works depending on where I open the terminal, but the only way I can get it to work consistently is by opening the terminal in mymodules, which is fine for now but won't work when I need to work by accessing the server remotely. I need to understand better precisely what is being done here, and how it relates to the location of the terminal (all the documentation I've found is overly technical for my knowledge level).
Here is another segment I don't understand
import os.path
csvfile = 'csv/' + model +'_' + exp + '.csv'
if os.path.isfile(csvfile): # csv file exists
hcsvfile = open(csvfile )
I get here that it's looking for the CSV file, but I'm not sure how. I'm also not sure why then on some occasions depending on where I open the terminal it's able to find the module but not the CSV files.
I would love an explanation of what I've presented, but more generally I would like information (or a link to information) explaining paths and how they work in scripts in modules, as well as what are ways of manipulating them. Thanks.

sys.path
This is a simple list of directories where Python looks for modules and packages (.py files and directories with an __init__.py file; see the modules tutorial). Extending this list lets you load modules (custom libs, etc.) from non-default locations. Usually you change it at runtime; for static directories you can instead set the needed environment variables in a startup script.
os.path
This module implements some useful functions on pathnames.
... and allows you to find out if file exists, is it link, dir, etc.
Why did loading the *.csv files fail?
Because sys.path is responsible for module loading, and only for that. When you use a relative path:
csvfile = 'csv/' + model +'_' + exp + '.csv'
open() will look in the current working directory. From the documentation:
file is either a string or bytes object giving the pathname (absolute or relative to the current working directory)...
You need to use absolute paths, constructing them with the os.path module.
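One way to build such an absolute path is sketched below. It assumes the csv folder sits inside a known base directory; csv_path and base_dir are hypothetical names introduced for illustration, while model and exp are the variables from the question.

```python
import os

def csv_path(base_dir, model, exp):
    """Build an absolute path to <base_dir>/csv/<model>_<exp>.csv."""
    filename = model + "_" + exp + ".csv"
    return os.path.join(os.path.abspath(base_dir), "csv", filename)

# Inside a module, base_dir is often the module's own directory:
# base_dir = os.path.dirname(os.path.abspath(__file__))
# if os.path.isfile(csv_path(base_dir, model, exp)):
#     ...
```

With a path built this way, open() finds the file regardless of where the terminal was opened.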

I agree with cdarke's comment that you are probably running into an issue with backslashes. Replacing the line with:
sys.path.append(r'C:\Users\Ben\Documents\TRACMIP_Project\mymodules')
will likely solve your problem. Details below.
In general, Python treats paths as relative to the current directory (where your terminal is running). When you feed it an absolute path, which is a path that includes the root directory, like the C:\ in C:\Users\Ben\Documents\TRACMIP_Project\mymodules, Python doesn't care about the working directory anymore; it just looks where you tell it to look.
Backslashes are used to make special characters within strings, such as line breaks (\n) and tabs (\t). The snag you've hit is that Python paths are strings first, paths second, so a backslash followed by a letter can be read as an escape sequence rather than as a path separator. In Python 3, the \U in \Users starts a Unicode escape, which makes this particular literal a syntax error; \B, \D, \T and \m are not recognized escapes and are left as they are, but relying on that is fragile. If you prefix the string with 'r', Python ignores the special meaning of the backslash and interprets it as a literal backslash (what you want).
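A small illustration of the difference, using a made-up path (C:\temp\new is only an example, chosen because it contains \t and \n):

```python
# Two ways to write the same Windows-style path as a Python string.
raw = r"C:\temp\new"        # raw string: backslashes are kept literally
escaped = "C:\\temp\\new"   # each backslash escaped by hand

# Without the r prefix, \t and \n are turned into a tab and a newline.
mangled = "C:\temp\new"

print(raw == escaped)            # True
print("\t" in mangled)           # True
print(len(raw), len(mangled))    # 11 9
```

The mangled string is shorter because each two-character escape collapses into one control character, which is why the resulting path points nowhere.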
The reason it still works if you run the script from the mymodules directory is that Python automatically looks in the working directory for files when asked. sys.path.append(path) tells Python to include that directory when it looks for modules to import, so that you can use modules in that directory no matter where you're running the script. The faulty path will still get added, but it's meaningless. There is no directory where you point it, so there's nothing to find there.
As for path manipulation in general, the "safest" way is to use the functions in os.path, which are platform-independent and will give the correct output whether you're working in a Windows or a Unix environment (usually).
EDIT: Forgot to cover the second part. Since Python paths are strings, you can build them using string operations. That's what is happening with the line
csvfile = 'csv/' + model +'_' + exp + '.csv'
Presumably model and exp are strings that appear in the filenames in the csv/ folder. With model = "foo" and exp = "bar", you'd get csv/foo_bar.csv which is a relative path to a file (that is, relative to your working directory). The code makes sure a file actually exists at that path and then opens it. Assuming the csv/ folder is in the same path as you added in sys.path.append, this path should work regardless of where you run the file, but I'm not 100% certain on that. EDIT: outoftime pointed out that sys.path.append only works for modules, not opening files, so you'll need to either expand csv/ into an absolute path or always run in its parent directory.
Also, I think Python is smart enough to not care about the direction of slashes in paths, but you should probably not mix them: use all backslashes or all forward slashes. os.path.join inserts the right separator for you. I'd probably change the line to
csvfile = os.path.join('csv', model + '_' + exp + '.csv')
for consistency's sake.

How to extract a specific war file to a specific folder

I have the python code which will download the .war file and put it in a path which is specified by the variable path.
Now I wish to extract a specific file from that war to a specific folder.
But I got stuck here:
os.system(jar -xvf /*how to give the path varible here*/ js/pay.js)
I'm not sure how to pass on the variable path to os.system command.
I'm very new to python, kindly help me out.

If you really want to use os.system, the shell command line is passed as a string, and you can pass any string you want. So:
os.system('jar -xvf "' + pathvariable + '" js/pay.js')
Or you can use {} or %s formatting, etc.
However, you probably do not want to use os.system.
First, if you want to run other programs, it's almost always better to use the subprocess module. For example:
import subprocess
subprocess.check_call(['jar', '-xvf', pathvariable, 'js/pay.js'])
As you can see, you can pass a list of arguments instead of trying to work out how to put a string together (and deal with escaping and quoting and all that mess). And there are lots of other advantages, mostly described in the documentation itself.
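To see why the list form avoids quoting problems, here is a small sketch that does not need jar installed: it uses the running Python interpreter as a stand-in child program, and echo_first_argument is a hypothetical helper name. It assumes Python 3.7 or later for capture_output and text.

```python
import subprocess
import sys

def echo_first_argument(argument):
    """Run a child Python process that prints its first argument back."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", argument],
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,   # collect stdout/stderr instead of printing them
        text=True,             # decode the output to str
    )
    return result.stdout.rstrip("\r\n")

# A path with spaces arrives as one argument, with no manual quoting.
print(echo_first_argument("/tmp/my apps/site v2.war"))
```

Each list element reaches the child process as exactly one argument, so spaces or quotes inside the path need no special handling.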
However, you probably don't want to run the jar tool at all. As jimhark says, a WAR file is just a special kind of JAR file, which is just a special kind of ZIP file. For creating them, you generally want to use JAR/WAR-specific tools (you need to verify the layout, make sure the manifest is the first entry in the ZIP directory, take care of the package signature, etc.), but for expanding them, any ZIP tool will work. And Python has ZIP support built in. What you want to do is probably as simple as this:
import zipfile
with zipfile.ZipFile(pathvariable, 'r') as zf:
    zf.extract('js/pay.js', destinationpathvariable)
IIRC, you can only directly use ZipFile in a with statement in 2.7 and 3.2+, so if you're on, say, 2.6 or 3.1, you have to do it indirectly:
from contextlib import closing
import zipfile
with closing(zipfile.ZipFile(pathvariable, 'r')) as zf:
    zf.extract('js/pay.js', destinationpathvariable)
Or, if this is just a quick&dirty script that quits as soon as it's done, you can get away with:
import zipfile
zf = zipfile.ZipFile(pathvariable, 'r')
zf.extract('js/pay.js', destinationpathvariable)
But I try to always use with statements whenever possible, because it's a good habit to have.

Isn't a war file a type of zip file? Python has zipfile support in its standard library.

You can use os.environ, which holds all environment variables.
It behaves like a dict, so you can use it like this:
pypath = os.environ['PYTHONPATH']
Now, if you mean an ordinary Python variable, just use it like this:
var1 = 'pause'
os.system('#echo & %s' % var1)
