Running bash scripts within newly created folders based on file names - python

I'm not sure even where to start.
I have a list of output files from a program; let's call them foo. They are numbered outputs like foo_1.out
I'd like to make a directory for each file, move the file to its directory, run a bash script within that directory, take the output from each script, copy it to the root directory as a concatenated single file.
I understand that this is not a forum for "hey, do my work for me", I'm honestly trying to learn. Any suggestions on where to look are sincerely appreciated!
Thanks!

You should probably look up the documentation for the python modules os - specifically os.path and a couple of others - and subprocess which can be found here and here respectively.
Without wanting to do it all for you as you stated - you'll be wanting to do something like:
for f in filelist:
    [pth, ext] = os.path.splitext(f)
    os.mkdir(pth)
    out = subprocess.Popen(SCRIPTNAME, stdout=...)
    # and so on...

To get a list of all files in a directory or make folders, check out the os module. Specifically, try os.listdir and os.mkdir
To copy files, you could either manually open each file, read the contents into a string, and write it to a different file, or look at the shutil module.
To run bash scripts, use the subprocess library.
All three of those should be a part of python's standard library.
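Putting the pieces from both answers together, here is a minimal sketch rather than a finished solution. It assumes the outputs match foo_*.out, that the bash script prints its result to stdout, and that Python 3.7+ is available; process_outputs and combined.txt are hypothetical names chosen for illustration.

```python
import glob
import os
import shutil
import subprocess


def process_outputs(root, script, combined_name="combined.txt"):
    """Move each foo_*.out into its own folder, run `script` there,
    and append everything the script prints to one file in `root`."""
    root = os.path.abspath(root)
    script = os.path.abspath(script)
    combined_path = os.path.join(root, combined_name)
    with open(combined_path, "w") as combined:
        for path in sorted(glob.glob(os.path.join(root, "foo_*.out"))):
            folder, _ext = os.path.splitext(path)  # e.g. <root>/foo_1
            os.makedirs(folder, exist_ok=True)
            shutil.move(path, folder)  # the file lands inside the folder
            # cwd=folder makes the script run inside the new directory
            result = subprocess.run(
                ["bash", script],
                cwd=folder,
                stdout=subprocess.PIPE,
                text=True,
                check=True,
            )
            combined.write(result.stdout)
    return combined_path
```

Note that sorted() orders the names as text, so foo_10.out comes before foo_2.out; if the order of the concatenated file matters, sort on the number instead.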

Related

How to delete files in a folder except for a few certain files

It's quite hard to explain, but basically I have a Python script that instantly creates thousands of copies of itself when opened. However, I want to add a sort of kill code that removes all the copies but not the original.
The main file is called "RSV.py" and its copies are called "RSV-" followed by a random hex code (with the ".py" extension on the end).
I apologise for not having the code present; it is saved on a different system from the one I'm writing this on.
All help is appreciated, and unless I didn't see it, this is not a duplicate.
rm RSV-*.py should do the trick on Mac or Linux
I'd be remiss if I didn't say that Python is a lame way to handle directory maintenance. That said, who am I to tell a man how to delete his files. Try this...
import glob
import os
for file in glob.glob('/tmp/RSV-*.py'):
    os.remove(file)
A-
P.S. Where was the code you tried to write? ...grin.
If you need to stop the Python script from creating copies that aren't needed, then something is wrong in the code that creates them in the first place. However, if you just want to remove all the excess copies of your main script, you can add a system call to the main script that removes them when it finishes executing.
Here's a simple guide on how to do that. You can replace the used ls -l command with what Nagavamsikrishna mentioned if you are running on Linux/Mac, else just replace rm with del to run on windows.
Calling an external command in Python
This will delete all the copies except for the original:
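import glob
import os
# 'DIRECTORY' is a placeholder for the folder holding the copies;
# the RSV-*.py pattern matches the copies but not RSV.py itself
for file in glob.glob(os.path.join('DIRECTORY', 'RSV-*.py')):
    os.remove(file)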
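For something a little more defensive, here is a small sketch that wraps the same glob idea in a function and explicitly refuses to touch the original. The function name remove_copies and its parameters are made up for illustration; the file names follow the question.

```python
import glob
import os


def remove_copies(directory, original="RSV.py", pattern="RSV-*.py"):
    """Delete every file in `directory` matching `pattern`, never `original`.
    Returns the sorted list of paths that were removed."""
    removed = []
    for path in glob.glob(os.path.join(directory, pattern)):
        if os.path.basename(path) == original:
            continue  # extra guard in case the pattern is changed later
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```

The pattern RSV-*.py already excludes RSV.py because it requires the hyphen, so the explicit check only matters if someone later loosens the pattern to something like RSV*.py.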

How to input multiple files from a directory

First and foremost, I am new to Unix. I have tried to find an answer to my question online, but could not find one.
So I am running Python through my Unix terminal, and I have a program that parses xml files and inputs the results into a .dat file.
My program works, but I have to input every single xml file (which number over 50) individually.
For example:
clamshell: python3 my_parser2.py 'items-0.xml' 'items-1.xml' 'items-2.xml' 'items-3.xml' .....
So I was wondering if it is possible to read from the directory, which contains all of my files into my program? Rather than typing all the xml file names individually and running the program that way.
Any help on this is greatly appreciated.
import glob
listOffiles = glob.glob('directory/*.xml')
The shell itself can expand wildcards so, if you don't care about the order of the input files, just use:
python3 my_parser2.py items-*.xml
If the numeric order is important (you want 0-9, 10-99 and so on, in that order), you may have to adjust the wildcard arguments slightly to guarantee this, such as with:
python3 my_parser2.py items-[0-9].xml items-[1-9][0-9].xml items-[1-9][0-9][0-9].xml
python3 my_parser2.py *.xml should work.
Other than the command line option, you could just use glob from within your script and bypass the need for command arguments:
import glob
filenames = glob.glob("*.xml")
This will return all .xml files (as filenames) in the directory from which you are running the script.
Then, if needed you can simply iterate through all the files with a basic loop:
for file in filenames:
    with open(file, 'r') as f:
        # do stuff to f.
        pass
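One caveat if the numeric order matters and you glob inside the script: glob.glob does not promise any particular order, and a plain sorted() puts items-10.xml before items-2.xml. A minimal sketch of a numeric sort, assuming the names look like items-N.xml as in the question (numeric_key is a hypothetical helper):

```python
import os
import re


def numeric_key(filename):
    """Sort key: the last run of digits in the file name, as an integer.
    Names without any digits sort first."""
    numbers = re.findall(r"\d+", os.path.basename(filename))
    return int(numbers[-1]) if numbers else -1


names = ["items-10.xml", "items-2.xml", "items-0.xml"]
print(sorted(names, key=numeric_key))
# prints ['items-0.xml', 'items-2.xml', 'items-10.xml']
```

In the parser itself this would be used as sorted(glob.glob("*.xml"), key=numeric_key).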

How to loop through the list of .tar.gz files using linux command in python

Using python 2.7
I have a list of *.tar.gz files on a Linux box. Using Python, I want to loop through the files and extract them to a different location, each under its respective folder.
For example: if my file name is ~/TargetData/zip/1440198002317590001.tar.gz
then I want to untar and ungzip this file in a different location under its
respective folder name i.e. ~/TargetData/unzip/1440198002317590001.
I have written some code but I am not able to loop through the files. On the command line I am able to untar using the $ tar -czf 1440198002317590001.tar.gz 1440198002317590001 command, but I want to be able to loop through the .tar.gz files. The code is below. Here, I'm not able to loop over just the files or print only the files. Can you please help?
import os
inF = []
inF = str(os.system('ls ~/TargetData/zip/*.tar.gz'))
#print(inF)
if inF is not None:
    for files in inF[:-1]:
        print files
        """
        os.system('tar -czf files /unzip/files[:-7]')
        # This is what i am expecting here files = "1440198002317590001.tar.gz" and files[:-7]= "1440198002317590001"
        """
Have you ever worked on this type of use case? Your help is greatly appreciated!! Thank you!
I think you misunderstood os.system(). It does run the command, but its return value is not what you expected: it returns a status code (0 on success), not the command's output, so you cannot assign the listing to a variable this way. You may consider the subprocess module instead; see the docs here. However, I do NOT recommend listing files that way (it returns a string rather than a list; see the docs for the details).
The best way, I think, is the glob module; see the docs here. glob.glob(pattern) puts all files matching the pattern in a list, which you can then loop over easily.
Of course, if you are familiar with the os module, you can also use os.listdir(), os.path.join(), or even os.path.expanduser() to do this. (Unlike glob, os.listdir() only returns file names without the full path, so you need to reconstruct each file path yourself.)
By the way, for your purpose here there is no need to declare an empty list first (i.e. inF = []).
For the unzip part, you can do it with os.system, but I recommend the subprocess module instead; you will find the reasons in the subprocess docs.
Do NOT look at the following code until you really cannot solve this by yourself.
import os
import glob
import subprocess
# glob does not expand '~', so expand it first
inF = glob.glob(os.path.expanduser('~/TargetData/zip/*.tar.gz'))
for files in inF:
    # e.g. .../zip/1440198002317590001.tar.gz -> .../unzip/1440198002317590001
    unzip_name = files.replace('zip', 'unzip')[:-7]
    # tar -C needs the target directory to exist, so create it if necessary
    if not os.path.exists(unzip_name):
        os.makedirs(unzip_name)
    # subprocess.call() instead of os.system; each argument is its own list item
    subprocess.call(['tar', '-xzf', files, '-C', unzip_name])
    # os.system('tar -czf files /unzip/files[:-7]')
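As an alternative to shelling out to tar, the standard library's tarfile module can do the extraction directly. This swaps in a different technique from the answer above, and it is only a sketch: it assumes Python 3 (the question uses 2.7, where os.makedirs has no exist_ok argument) and archives you trust; extract_all is a hypothetical name.

```python
import glob
import os
import tarfile


def extract_all(zip_dir, unzip_dir):
    """Extract every *.tar.gz in zip_dir into unzip_dir/<archive name>.
    Returns the list of target directories, in sorted archive order."""
    targets = []
    for archive in sorted(glob.glob(os.path.join(zip_dir, "*.tar.gz"))):
        # strip the ".tar.gz" suffix to get the folder name
        name = os.path.basename(archive)[:-len(".tar.gz")]
        target = os.path.join(unzip_dir, name)
        os.makedirs(target, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        targets.append(target)
    return targets
```

With the paths from the question, the call would look like extract_all(os.path.expanduser('~/TargetData/zip'), os.path.expanduser('~/TargetData/unzip')).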

Running python script with new directories

I have recently begun working on a new computer. All my python files and my data are in the dropbox folder, so having access to the data is not a problem. However, the "user" name on the file has changed. Thus, none of my os.chdir() operations work. Obviously, I can modify all of my scripts using a find and replace, but that won't help if I try using my old computer.
Currently, all the directories called look something like this:
"C:\Users\Old_Username\Dropbox\Path"
and the files I want to access on the new computer look like:
"C:\Users\New_Username\Dropbox\Path"
Is there some sort of try/except I can build into my script so it goes through the various path-name options if the first attempt doesn't work?
Thanks!
Any solution will involve editing your code, so if you are going to edit it anyway, it's best to make it generic enough that it works on all platforms.
In the answer to How can I get the Dropbox folder location programmatically in Python? there is a code snippet that you can use if this problem is limited to dropbox.
For a more generic solution, you can use environment variables to figure out the home directory of a user.
On Windows the home directory location is stored in %UserProfile%; on Linux and OSX it is in $HOME. Luckily Python will take care of all this for you with os.path.expanduser:
import os
home_dir = os.path.expanduser('~')
Using home_dir will ensure that the same path is resolved on all systems.
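To make that concrete for the paths in the question, here is a small sketch. It assumes the Dropbox folder sits directly under the user's home directory, as in the question's examples; dropbox_path is a hypothetical helper name.

```python
import os


def dropbox_path(*parts):
    """Build a path under <home>/Dropbox without hard-coding the user name."""
    return os.path.join(os.path.expanduser("~"), "Dropbox", *parts)


# Replaces a hard-coded os.chdir(r"C:\Users\Old_Username\Dropbox\Path")
target = dropbox_path("Path")
if os.path.isdir(target):
    os.chdir(target)
```

If Dropbox was installed somewhere else on one of the machines, this guess fails and the lookup from the linked answer is the safer route.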
Suppose the file sq.py contains the following (your old code):
C:/Users/Old_Username/Dropbox/Path
for x in range:
    #something
def Something():
    #something...
C:/Users/Old_Username/Dropbox/Path
Then a new .py file runs this code:
import re
with open("sq.py","r") as f:
    for x in f.readlines():
        y=x
        # a line holding exactly one occurrence of the old path is replaced by the new path
        if re.findall("C:/Users/Old_Username/Dropbox/Path",x) == ['C:/Users/Old_Username/Dropbox/Path']:
            x="C:/Users/New_Username/Dropbox/Path"
            y=y.replace(y,x)
        print (y)
Output is:
C:/Users/New_Username/Dropbox/Path
for x in range:
    #something
def Something():
    #something...
C:/Users/New_Username/Dropbox/Path
I hope this solves your problem, or at least gives you some ideas for dealing with it.
Knowing that eventually I will move or rename my projects or scripts, I always use this code right at the beginning:
import os, inspect
this_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
this_script = inspect.stack()[0][1]
# os.path.basename follows the platform's path separator rules, unlike split('/')
this_script_name = os.path.basename(this_script)
If you call your script with a relative path rather than the full one, this_script will not contain a full path either. this_dir, however, will always be the full path to the directory.

UNIX shell script to call python

I have a python script that runs on three files in the following way
align.py *.wav *.txt *.TextGrid
However, I have a directory full of files that I want to loop through. The original author suggests creating a shell script to loop through the files.
The tricky part about the loop is that I need to match three files at a time with three different extensions for the script to run correctly.
Can anyone help me figure out how to create a shell script to loop through a directory of files, match three of them according to name (with three different extensions) and run the python script on each triplet?
Thanks!
Assuming you're using bash, here is a one-liner:
for f in *.wav; do align.py "$f" "${f%.*}.txt" "${f%.*}.TextGrid"; done
You could use glob.glob to list only the wav files, then construct the subprocess.Popen call like so:
import glob
import os
import subprocess
for wav_name in glob.glob('*.wav'):
    basename,ext = os.path.splitext(wav_name)
    txt_name=basename+'.txt'
    grid_name=basename+'.TextGrid'
    proc=subprocess.Popen(['align.py',wav_name,txt_name,grid_name])
    proc.communicate()
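Both versions above assume every .wav has matching .txt and .TextGrid files. If some might be missing, one option is to collect only the complete triplets first. A minimal sketch, where find_triplets is a hypothetical helper:

```python
import glob
import os


def find_triplets(directory="."):
    """Return (wav, txt, TextGrid) path tuples for every .wav in `directory`
    whose two companion files both exist."""
    triplets = []
    for wav_name in sorted(glob.glob(os.path.join(directory, "*.wav"))):
        basename, _ext = os.path.splitext(wav_name)
        txt_name = basename + ".txt"
        grid_name = basename + ".TextGrid"
        if os.path.exists(txt_name) and os.path.exists(grid_name):
            triplets.append((wav_name, txt_name, grid_name))
    return triplets
```

Each tuple can then be passed on to the script, for example with subprocess.call(['align.py'] + list(triplet)).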
