How to input multiple files from a directory

How to input multiple files from a directory - python

First and foremost, I am recently new to Unix and I have tried to find a solution to my question online, but I could not find a solution.
So I am running Python through my Unix terminal, and I have a program that parses xml files and inputs the results into a .dat file.
My program works, but I have to input every single xml file (which number over 50) individually.
For example:
clamshell: python3 my_parser2.py 'items-0.xml' 'items-1.xml' 'items-2.xml' 'items-3.xml' .....`
So I was wondering if it is possible to read from the directory, which contains all of my files into my program? Rather than typing all the xml file names individually and running the program that way.
Any help on this is greatly appreciated.

import glob
listOffiles = glob.glob('directory/*.xml')

The shell itself can expand wildcards so, if you don't care about the order of the input files, just use:
python3 my_parser2.py items-*.xml
If the numeric order is important (you want 0..9, 10-99 and so on in that order, you may have to adjust the wildcard arguments slightly to guarantee this, such as with:
python3 my_parser2.py items-[0-9].xml items-[1-9][0-9].xml items-[1-9][0-9][0-9].xml

python3 my_parser2.py *.xml should work.

Other than the command line option, you could just use glob from within your script and bypass the need for command arguments:
import glob
filenames = glob.glob("*.xml")
This will return all .xml files (as filenames) in the directory from which you are running the script.
Then, if needed you can simply iterate through all the files with a basic loop:
for file in filenames:
with open(file, 'r') as f:
# do stuff to f.

Related

Iterating over .wav files in subdirectories of parent directory

Cheers everybody,
I need help with something in python 3.6 exactly. So i have structure of data like this:
|main directory
| |subdirectory's(plural)
| | |.wav files
I'm currently working from a directory where main directory is placed so I don't need to specify paths before that. So firstly I wanna iterate over my main directory and find all subdirectorys. Then in each of them I wanna find the .wav files, and when done with processing them I wanna go to next subdirectory and so on until all of them are opened, and all .wav files are processed. Exactly what I wanna do with those .wav files is input them in my program, process them so i can convert them to numpy arrays, and then I convert that numpy array into some other object (working with tensorflow to be exact, and wanna convert to TF object). I wrote about the whole process if anybody has any fast advices on doing that too so why not.
I tried doing it with for loops like:
for subdirectorys in open(data_path, "r"):
for files in subdirectorys:
#doing some processing stuff with the file
The problem is that it always raises error 13, Permission denied showing on that data_path I gave him but when I go to properties there it seems okay and all permissions are fine.
I tried some other ways like with os.open or i replaced for loop with:
with open(data_path, "r") as data:
and it always raises permission denied error.
os.walk works in some way but it's not what I need, and when i tried to modify it id didn't give errors but it also didnt do anything.
Just to say I'm not any pro programmer in python so I may be missing an obvious thing but ehh, I'm here to ask and learn. I also saw a lot of similiar questions but they mainly focus on .txt files and not specificaly in my case so I need to ask it here.
Anyway thanks for help in advance.

Edit: If you want an example for glob (more sane), here it is:
from pathlib import Path
# The pattern "**" means all subdirectories recursively,
# with "*.wav" meaning all files with any name ending in ".wav".
for file in Path(data_path).glob("**/*.wav"):
if not file.is_file(): # Skip directories
continue
with open(file, "w") as f:
# do stuff
For more info see Path.glob() on the documentation. Glob patterns are a useful thing to know.
Previous answer:
Try using either glob or os.walk(). Here is an example for os.walk().
from os import walk, path
# Recursively walk the directory data_path
for root, _, files in walk(data_path):
# files is a list of files in the current root, so iterate them
for file in files:
# Skip the file if it is not *.wav
if not file.endswith(".wav"):
continue
# os.path.join() will create the path for the file
file = path.join(root, files)
# Do what you need with the file
# You can also use block context to open the files like this
with open(file, "w") as f: # "w" means permission to write. If reading, use "r"
# Do stuff
Note that you may be confused about what open() does. It opens a file for reading, writing, and appending. Directories are not files, and therefore cannot be opened.
I suggest that you Google for documentation and do more reading about the functions used. The documentation will help more than I can.
Another good answer explaining in more detail can be seen here.

import glob
import os
main = '/main_wavs'
wavs = [w for w in glob.glob(os.path.join(main, '*/*.wav')) if os.path.isfile(w)]
In terms of permissions on a path A/B/C... A, B and C must all be accessible. For files that means read permission. For directories, it means read and execute permissions (listing contents).

How to loop through the list of .tar.gz files using linux command in python

Using python 2.7
I have a list of *.tat.gz files on a linux box. Using python, I want to loop through the files and extract those files in a different location, under their respective folders.
For example: if my file name is ~/TargetData/zip/1440198002317590001.tar.gz
then I want to untar and ungzip this file in a different location under its
respective folder name i.e. ~/TargetData/unzip/1440198002317590001.
I have written some code but I am not able to loop through the files. In a command line I am able to untar using $ tar -czf 1440198002317590001.tar.gz 1440198002317590001 command. But I want to be able to loop through the .tar.gz files. The code is mentioned below. Here, I’m not able to loop just the files Or print only the files. Can you please help?
import os
inF = []
inF = str(os.system('ls ~/TargetData/zip/*.tar.gz'))
#print(inF)
if inF is not None:
for files in inF[:-1]:
print files
"""
os.system('tar -czf files /unzip/files[:-7]')
# This is what i am expecting here files = "1440198002317590001.tar.gz" and files[:-7]= "1440198002317590001"
"""
Have you ever worked on this type of use case? Your help is greatly appreciated!! Thank you!

I think you misunderstood the meaning of os.system(), that will do the job, but its return value was not expected by you, it returns 0 for successful done, you can not directly assign its output to a variable. You may consider the module [subprocess], see doc here. However, I DO NOT recommend that way to list files (actually, it returns string instead of list, see doc find the detail by yourself).
The best way I think would be glob module, see doc here. Use glob.glob(pattern), you can put all files match the pattern in a list, then you can loop it easily.
Of course, if you are familiar with os module, you also can use os.listdir(), os.path.join(), or even os.paht.expanduser() to do this. (Unlike glob, it only put filenames without fully path into a list, you need to reconstruct file path).
By the way, for you purpose here, there is no need to declare an empty list first (i.e. inF = [])
For unzip file part, you can do it by os.system, but I also recommend to use subprocess module instead of os.system, you will find the reason in the doc of subprocess.
DO NOT see the following code, ONLY see them after you really can not solve this by yourself.
import os
import glob
inF = glob.glob('~/TargetData/zip/*.tar.gz')
if inF:
for files in inF:
# consider subprocess.call() instead of os.system
unzip_name = files.replace('zip', 'unzip')[:-7]
# get directory name and make sure it exists, otherwise create it
unzip_dir = os.path.dirname(unzip_name)
if not os.path.exists(unzip_dir):
os.mkdir(unzip_dir)
subprocess.call(['tar -xzf', files, '-C', unzip_name])
# os.system('tar -czf files /unzip/files[:-7]')

Opening .out files in Python

Am I right in thinking Python cannot open and read from .out files?
My application currently spits out a bunch of .out files that would be read manually for logging purposes, I'm building a Python script to automate this.
When the script gets to the following
for file in os.listdir(DIR_NAME):
if (file.endswith('.out')):
open(file)
The script blows up with the following error "IOError : No such file or directory: 'Filename.out' "
I've a similar function with the above code and works fine, only it reads .err files. Printing out DIR_NAME before the above code also shows the correct directory is being pointed to.

os.listdir() returns only filenames, not full paths. Use os.path.join() to create a full path:
for file in os.listdir(DIR_NAME):
if (file.endswith('.out')):
open(os.path.join(DIR_NAME, file))

As an alternative that I find a bit easier and flexible to use:
import glob,os
for outfile in glob.glob( os.path.join(DIR_NAME, '*.out') ):
open(outfile)
Glob will also accept things like '*/*.out' or '*something*.out'. I also read files of certain types and have found this to be very handy.

Running bash scripts within newly created folders based on file names

I'm not sure even where to start.
I have a list of output files from a program, lets call them foo. They are numbered outputs like foo_1.out
I'd like to make a directory for each file, move the file to its directory, run a bash script within that directory, take the output from each script, copy it to the root directory as a concatenated single file.
I understand that this is not a forum for "hey, do my work for me", I'm honestly trying to learn. Any suggestions on where to look are sincerely appreciated!
Thanks!

You should probably look up the documentation for the python modules os - specifically os.path and a couple of others - and subprocess which can be found here and here respectively.
Without wanting to do it all for you as you stated - you'll be wanting to do something like:
for f in filelist:
[pth, ext] = os.path.splitext(f)
os.mkdir(pth)
out = subprocess.Popen(SCRIPTNAME, stdout=...)
# and so on...

To get a list of all files in a directory or make folders, check out the os module. Specifically, try os.listdir and os.mkdir
To copy files, you could either manually open each file, copy the contents to a string, and rewrite it to a different file. Alternatively, look at the shutil module
To run bash scripts, use the subprocess library.
All three of those should be a part of python's standard library.

UNIX shell script to call python

I have a python script that runs on three files in the following way
align.py *.wav *.txt *.TextGrid
However, I have a directory full of files that I want to loop through. The original author suggests creating a shell script to loop through the files.
The tricky part about the loop is that I need to match three files at a time with three different extensions for the script to run correctly.
Can anyone help me figure out how to create a shell script to loop through a directory of files, match three of them according to name (with three different extensions) and run the python script on each triplet?
Thanks!

Assuming you're using bash, here is a one-liner:
for f in *.wav; do align.py $f ${f%\.*}.txt ${f%\.*}.TextGrid; done

You could use glob.glob to list only the wav files, then construct the subprocess.Popen call like so:
import glob
import os
import subprocess
for wav_name in glob.glob('*.wav'):
basename,ext = os.path.splitext(wav_name)
txt_name=basename+'.txt'
grid_name=basename+'.TextGrid'
proc=subprocess.Popen(['align.py',wav_name,txt_name,grid_name])
proc.communicate()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.