How to read multiple .tar files using python - python

I'm trying to read a .tar file. It works fine, but sometimes the filename is a bit different.
The filename sometimes changes from filename_01.tar to filename_02.tar.
I have tried using filename_*.tar, but that doesn't seem to work.
I know it's a basic question but I can't figure it out.
My code: (using python 3.7+)
import tarfile
tar = tarfile.open('filename_01.tar')
tar.extractall('locationfolder')
tar.close

The * isn't expanded by tarfile.open(); it treats the pattern as a literal filename. You can loop over glob.glob() with the required pattern instead. It is also better to open the file with a with statement, which closes it for you and avoids the typo of writing tar.close without parentheses, which does nothing.
import glob
import tarfile

for f in glob.glob('filename_*.tar'):
    with tarfile.open(f) as tar:
        tar.extractall('locationfolder')

Related

File not found from Python although file exists

I'm trying to load a simple text file with an array of numbers into Python. A MWE is
import numpy as np
BASE_FOLDER = 'C:\\path\\'
BASE_NAME = 'DATA.txt'
fname = BASE_FOLDER + BASE_NAME
data = np.loadtxt(fname)
However, this gives an error while running:
OSError: C:\path\DATA.txt not found.
I'm using VSCode, so in the debug window the link to the path is clickable. And, of course, if I click it the file opens normally, so this tells me that the path is correct.
Also, if I do print(fname), VSCode also gives me a valid path.
Is there anything I'm missing?
EDIT
As per your (very helpful for future reference) comments, I've changed my code using the os module and raw strings:
BASE_FOLDER = r'C:\path_to_folder'
BASE_NAME = r'filename_DATA.txt'
fname = os.path.join(BASE_FOLDER, BASE_NAME)
Still results in error.
Second EDIT
I've tried again with another file. Very basic path and filename
BASE_FOLDER = r'Z:\Data\Enzo\Waste_Code'
BASE_NAME = r'run3b.txt'
And again, I get the same error.
If I try an alternative approach,
os.chdir(BASE_FOLDER)
a = os.listdir()
then select the right file,
fname = a[1]
I still get the error when trying to import it. Even though I'm retrieving it directly from listdir.
>> os.path.isfile(a[1])
False
Using the module os you can check the existence of the file within python by running
import os
os.path.isfile(fname)
If it returns False, no file exists at the path stored in fname. If it returns True, np.loadtxt() should be able to read it.
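When isfile() returns False for a path that looks right, it can help to print exactly what Python sees. The sketch below is only an illustration: describe_path is a hypothetical helper, not part of os or NumPy, and the keys it returns are my own choice. Printing the repr() of the name makes stray whitespace or escape characters visible.

```python
import os


def describe_path(fname):
    """Collect basic facts about a path to help debug 'file not found' errors.

    Hypothetical helper for illustration; not part of os or NumPy.
    """
    absolute = os.path.abspath(fname)
    return {
        "repr": repr(fname),  # exposes stray whitespace or escape characters
        "absolute": absolute,  # where Python actually looks
        "exists": os.path.exists(fname),
        "is_file": os.path.isfile(fname),
        "folder_exists": os.path.isdir(os.path.dirname(absolute)),
    }


if __name__ == "__main__":
    # 'DATA.txt' is the example name from the question.
    for key, value in describe_path("DATA.txt").items():
        print(key, value)
```

If folder_exists is True but is_file is False, comparing the repr() against the entries of os.listdir() for that folder is a reasonable next step.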
Extra: good practice working with files and paths
When working with files, it is advisable to use the functionality built into the standard library, specifically the os module. os.path.join() takes care of joining path components correctly no matter which operating system you are using.
fname = os.path.join(BASE_FOLDER, BASE_NAME)
In addition, it is advisable to use raw strings by adding an r before the opening quote. This makes writing Windows paths less tedious, as it allows you to copy-paste from the navigation bar. It will be something like BASE_FOLDER = r'C:\path'. Note that you don't need to add the trailing '\', as os.path.join takes care of it.
You may not have permission to read the downloaded file. If you are using Ubuntu, run
sudo chmod -R a+rwx file_name.txt
in a terminal to give yourself permission to read it.
For me the problem was that I was using the Linux home symbol in the path (~/path/file). Replacing it with the absolute path /home/user/etc_path/file worked like a charm.
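The ~ shortcut is expanded by the shell, not by Python's open(), so a path starting with ~ has to be expanded explicitly. A minimal sketch using os.path.expanduser() from the standard library follows; expand_home is a hypothetical name chosen for this example.

```python
import os


def expand_home(path):
    """Return an absolute path with a leading '~' replaced by the home directory."""
    return os.path.abspath(os.path.expanduser(path))


if __name__ == "__main__":
    # '~/path/file' is the placeholder path from the answer above.
    print(expand_home("~/path/file"))
```

This keeps the code portable between users, which a hard-coded /home/user/... path is not.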

How to input multiple files from a directory

First and foremost, I am new to Unix. I have tried to find a solution to my question online, but I could not find one.
So I am running Python through my Unix terminal, and I have a program that parses xml files and inputs the results into a .dat file.
My program works, but I have to input every single xml file (which number over 50) individually.
For example:
clamshell: python3 my_parser2.py 'items-0.xml' 'items-1.xml' 'items-2.xml' 'items-3.xml' .....
So I was wondering whether my program can read all of the files from the directory that contains them, rather than me typing every xml file name individually and running the program that way.
Any help on this is greatly appreciated.
import glob
listOffiles = glob.glob('directory/*.xml')
The shell itself can expand wildcards so, if you don't care about the order of the input files, just use:
python3 my_parser2.py items-*.xml
If the numeric order is important (you want 0..9, 10..99 and so on, in that order), you may have to adjust the wildcard arguments slightly to guarantee this, such as with:
python3 my_parser2.py items-[0-9].xml items-[1-9][0-9].xml items-[1-9][0-9][0-9].xml
python3 my_parser2.py *.xml should work.
Other than the command line option, you could just use glob from within your script and bypass the need for command arguments:
import glob
filenames = glob.glob("*.xml")
This will return all .xml files (as filenames) in the directory from which you are running the script.
Then, if needed you can simply iterate through all the files with a basic loop:
for file in filenames:
    with open(file, 'r') as f:
        pass  # do stuff with f
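Note that glob.glob() makes no promise about ordering, and a plain sorted() would put items-10.xml before items-2.xml. If numeric order matters, one option is to sort on the number embedded in each name. This is a sketch under the assumption that the files follow the items-N.xml pattern from the question; numeric_key and sorted_numerically are hypothetical helper names.

```python
import glob
import os
import re


def numeric_key(filename):
    """Sort key: first run of digits in the base name as an int (-1 if none)."""
    match = re.search(r"\d+", os.path.basename(filename))
    return int(match.group()) if match else -1


def sorted_numerically(filenames):
    """Return the filenames ordered by their embedded number."""
    return sorted(filenames, key=numeric_key)


if __name__ == "__main__":
    for name in sorted_numerically(glob.glob("items-*.xml")):
        print(name)
```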

How to loop through the list of .tar.gz files using linux command in python

Using python 2.7
I have a list of *.tar.gz files on a Linux box. Using Python, I want to loop through the files and extract them to a different location, each under its own folder.
For example: if my file name is ~/TargetData/zip/1440198002317590001.tar.gz
then I want to untar and ungzip this file in a different location under its
respective folder name i.e. ~/TargetData/unzip/1440198002317590001.
I have written some code, but I am not able to loop through the files. On the command line I am able to untar using the $ tar -czf 1440198002317590001.tar.gz 1440198002317590001 command, but I want to be able to loop through the .tar.gz files. The code is below. I’m not able to loop over just the files or print only the files. Can you please help?
import os
inF = []
inF = str(os.system('ls ~/TargetData/zip/*.tar.gz'))
#print(inF)
if inF is not None:
    for files in inF[:-1]:
        print files
        """
        os.system('tar -czf files /unzip/files[:-7]')
        # This is what i am expecting here files = "1440198002317590001.tar.gz" and files[:-7]= "1440198002317590001"
        """
Have you ever worked on this type of use case? Your help is greatly appreciated!! Thank you!
I think you misunderstood what os.system() does. It runs the command, but its return value is not what you expected: it returns the exit status (0 on success), not the command's output, so you cannot assign the output to a variable this way. You could consider the subprocess module instead (see its documentation). However, I do NOT recommend listing files that way (the output comes back as a single string rather than a list; see the docs for the details).
The best way, I think, is the glob module (see its documentation). glob.glob(pattern) puts all files that match the pattern into a list, which you can then loop over easily.
Of course, if you are familiar with the os module, you can also use os.listdir(), os.path.join(), or even os.path.expanduser() to do this. (Unlike glob, os.listdir() returns bare filenames without the directory, so you need to rebuild the full path yourself.)
By the way, for your purpose here there is no need to declare an empty list first (i.e. inF = []).
For the extraction part, you can use os.system, but I recommend the subprocess module instead; you will find the reasons in the subprocess documentation.
Do NOT look at the following code until you really cannot solve this yourself.
import glob
import os
import subprocess

# glob does not expand '~', so expand it explicitly first
inF = glob.glob(os.path.expanduser('~/TargetData/zip/*.tar.gz'))
for files in inF:
    # .../zip/NAME.tar.gz -> .../unzip/NAME
    unzip_name = files.replace('zip', 'unzip')[:-7]
    # tar -C needs the target directory to exist, so create it if missing
    if not os.path.exists(unzip_name):
        os.makedirs(unzip_name)
    # use subprocess.call() instead of os.system; each argument is its own list item
    subprocess.call(['tar', '-xzf', files, '-C', unzip_name])
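As an alternative to calling the tar command at all, the same loop can be written with the standard tarfile module. This is only a sketch, written for Python 3 rather than the 2.7 mentioned in the question (os.makedirs(..., exist_ok=True) is Python 3 only); extract_archives is a hypothetical helper name, and the directory layout is assumed to match the question.

```python
import glob
import os
import tarfile


def extract_archives(zip_dir, unzip_dir):
    """Extract every *.tar.gz in zip_dir into unzip_dir/<name without .tar.gz>.

    Returns the list of target directories that were written to.
    """
    extracted = []
    pattern = os.path.join(os.path.expanduser(zip_dir), "*.tar.gz")
    for archive in sorted(glob.glob(pattern)):
        name = os.path.basename(archive)[:-len(".tar.gz")]
        target = os.path.join(os.path.expanduser(unzip_dir), name)
        os.makedirs(target, exist_ok=True)
        # Only extract archives you trust: member paths may point outside target.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        extracted.append(target)
    return extracted


if __name__ == "__main__":
    for folder in extract_archives("~/TargetData/zip", "~/TargetData/unzip"):
        print(folder)
```

Taking the folder name from os.path.basename() avoids the string replace on the whole path, which would also change any other "zip" that happens to appear in it.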

How to extract a specific war file to a specific folder

I have the python code which will download the .war file and put it in a path which is specified by the variable path.
Now I wish to extract a specific file from that war to a specific folder.
But I got stuck here:
os.system(jar -xvf /*how to give the path varible here*/ js/pay.js)
I'm not sure how to pass on the variable path to os.system command.
I'm very new to python, kindly help me out.
If you really want to use os.system, the shell command line is passed as a string, and you can pass any string you want. So:
os.system('jar -xvf "' + pathvariable + '" js/pay.js')
Or you can use {} or %s formatting, etc.
However, you probably do not want to use os.system.
First, if you want to run other programs, it's almost always better to use the subprocess module. For example:
subprocess.check_call(['jar', '-xvf', pathvariable, 'js/pay.js'])
As you can see, you can pass a list of arguments instead of trying to work out how to put a string together (and deal with escaping and quoting and all that mess). And there are lots of other advantages, mostly described in the documentation itself.
However, you probably don't want to run the jar tool at all. As jimhark says, a WAR file is just a special kind of JAR file, which is just a special kind of ZIP file. For creating them, you generally want to use JAR/WAR-specific tools (you need to verify the layout, make sure the manifest is the first entry in the ZIP directory, take care of the package signature, etc.), but for expanding them, any ZIP tool will work. And Python has ZIP support built in. What you want to do is probably as simple as this:
import zipfile
with zipfile.ZipFile(pathvariable, 'r') as zf:
    zf.extract('js/pay.js', destinationpathvariable)
IIRC, you can only directly use ZipFile in a with statement in 2.7 and 3.2+, so if you're on, say, 2.6 or 3.1, you have to do it indirectly:
from contextlib import closing
import zipfile
with closing(zipfile.ZipFile(pathvariable, 'r')) as zf:
    zf.extract('js/pay.js', destinationpathvariable)
Or, if this is just a quick&dirty script that quits as soon as it's done, you can get away with:
import zipfile
zf = zipfile.ZipFile(pathvariable, 'r')
zf.extract('js/pay.js', destinationpathvariable)
But I try to always use with statements whenever possible, because it's a good habit to have.
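If the member name might be wrong, it can be worth checking the archive's contents before extracting. The following is a small sketch along those lines; extract_member is a hypothetical wrapper of my own, while ZipFile.namelist() and ZipFile.extract() are the standard zipfile calls.

```python
import zipfile


def extract_member(archive_path, member, destination):
    """Extract one member from a ZIP/JAR/WAR archive and return the new path.

    Raises KeyError if the member is not in the archive.
    """
    with zipfile.ZipFile(archive_path, "r") as zf:
        if member not in zf.namelist():
            raise KeyError("%s not found in %s" % (member, archive_path))
        # extract() recreates the member's directories under destination
        return zf.extract(member, destination)
```

For example, extract_member(pathvariable, 'js/pay.js', destinationpathvariable) would write destinationpathvariable/js/pay.js, using the variable names from the answer above.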
Isn't a war file a type of zip file? Python has zipfile support in its standard library.
You can use os.environ, it holds all environment variables on it.
It is a dict, so you can just use it like:
pypath = os.environ['PYTHONPATH']
If you mean an ordinary Python variable, just use it like:
var1 = 'pause'
os.system('#echo & %s' % var1)

Use python tarfile add without normpath being applied to arcname

I'm using python's tarfile library to create a gzipped tar file.
The tar file needs to have absolute pathnames for reasons that I've got no control over. (I'm aware that this isn't normal practice.)
When I call
tarobject.add("/foo/xx1", "/bar/xx1")
the arcname argument "/bar/xx1" is run through os.path.normpath() and converted to "bar/xx1"
How do I avoid this and end up with "/bar/xx1" as I require?
I've read that I can replace normpath somewhere, but I'm fairly new to Python and I'm not sure how to do this or what the wider implications would be.
edit
After looking at this question I had a closer look at the tarinfo object, and this seems to work:
my_tarinfo = tarobject.gettarinfo("/foo/xx1")
my_tarinfo.name = "/bar/xx1"
tarobject.addfile(my_tarinfo, file("/foo/xx1"))
That is a really unusual thing to do. You would need to copy /usr/lib64/python/tarfile.py to a directory in your PYTHONPATH and modify that copy, but then you have to bundle your own tarfile module with your code.
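The approach from the question's edit can also be written for Python 3, where the file() builtin no longer exists and open(..., "rb") is used instead. This is a sketch, not a confirmed fix for every tarfile version: add_with_exact_name is a hypothetical helper, it only handles regular files, and it relies on addfile() writing the TarInfo name as given, which is my understanding of Python 3's behaviour.

```python
import tarfile


def add_with_exact_name(tar, source_path, archive_name):
    """Add a regular file to an open TarFile under archive_name, unchanged.

    Hypothetical helper for illustration; directories and links are not handled.
    """
    info = tar.gettarinfo(source_path)
    info.name = archive_name  # overrides the name gettarinfo() derived
    with open(source_path, "rb") as fh:
        tar.addfile(info, fh)


if __name__ == "__main__":
    # Paths are the placeholders from the question.
    with tarfile.open("out.tar.gz", "w:gz") as tarobject:
        add_with_exact_name(tarobject, "/foo/xx1", "/bar/xx1")
```

Listing the result with getnames() is a quick way to confirm the leading slash survived.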
