I have written a function for data processing and need to apply it to a large number of files in a directory.
The function works when applied to individual files.
def getfwhm(x):
    import numpy as np
    st = np.std(x[:,7])
    fwhm = 2*np.sqrt(2*np.log(2))*st
    file = open('all_fwhm2.txt', 'at')
    file.write("fwhm = %.6f\n" % (fwhm))
    file.close()
    file = open('all_fwhm2.txt', 'rt')
    print file.read()
    file.close()
I now want to use this on a larger scale. So far I have written this code:
import os
import fwhmfunction
files=os.listdir(".")
fwhmfunction.getfwhm(files)
But I get the following error
File "fwhmfunction.py", line 11, in getfwhm
st=np.std(x[:,7])
TypeError: list indices must be integers, not tuple
I am writing in Python, using Spyder.
Thanks for your help!
In the spirit of Unix, you should separate the program in two:
the program which acts on the given file
the program which applies a given script to a given list of files (glob, or whatever)
So here's a sample of the first:
# name: script.py
import sys
File = sys.argv[1]
# do something here
print File
(It is better to use argparse to parse the args, but we used argv to keep it simple.)
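For reference, here is a minimal argparse sketch of the same script; the argument name infile is just an illustrative choice:

# name: script_argparse.py
import argparse

parser = argparse.ArgumentParser(description="Process a single file")
parser.add_argument("infile", help="path of the file to process")
args = parser.parse_args()

# do something here
print(args.infile)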
As to the second part, there is an excellent Unix tool for it already:
$ find . -maxdepth 1 -mindepth 1 -name "*.txt" | parallel python2.7 script.py {}
Here you get an extra bonus: parallel task execution.
If you are on Windows, you can write something simple (sequential) in Python:
# name: apply_script_to_glob.py
import sys, os
from glob import glob
Script = sys.argv[1]
Glob = sys.argv[2]
Files = glob(Glob)
for File in Files:
    os.system("python2.7 " + Script + " " + File)
(Again, we didn't use argparse or check anything, to keep it simple.) You'd call the script with
$ python2.7 apply_script_to_glob.py "script.py" "*.txt"
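Applied to the question, the per-file script could simply wrap getfwhm; this is only a sketch, and it assumes each data file is a plain whitespace-separated numeric table that np.loadtxt can read (adjust the loader to your real format):

# name: fwhm_script.py
import sys
import numpy as np
from fwhmfunction import getfwhm

fname = sys.argv[1]
data = np.loadtxt(fname)  # assumption: whitespace-separated numeric columns
getfwhm(data)             # getfwhm expects a 2-D array, not a filename

This also explains the original TypeError: os.listdir(".") returns a list of filename strings, and indexing that list with x[:,7] fails; getfwhm has to be called on the data loaded from each file, one file at a time.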
This is my Python script:
import json
import csv
import glob
import os
import shlex
import subprocess
os.chdir('C:/ck-master/target')
path='C:/Users/AQ42770/Desktop/congress-android'
for n in range(0, 100):
    path1 = path + "/" + str(n)
    cmd = 'java -jar ck-0.3.2-SNAPSHOT-jar-with-dependencies.jar "%s"'
    cmd = cmd % (path)
    args = shlex.split(cmd)
    p = subprocess.Popen(args)
It generates 4 CSV files for me in the directory C:\ck-master\target.
The problem is that each run overwrites those 4 CSV files, so only the last result survives. My expectation is that after each iteration the files should be moved to another folder and named from 0 to 99.
I will answer briefly, because the question is a duplicate.
The task can be done with Python, or with cmd directly.
1) Find the files in the directory, for example using the os.listdir() or glob.glob() methods.
2) Copy them using shutil (How do I copy a file in Python?) or write the copy yourself; it's simple: just open the file and write it to the new destination with a new name (see the sketch after this list),
or move them instead: How to move a file in Python.
3) Success
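A concrete sketch of steps 1 and 2 applied to the script above; it assumes the tool always writes its CSV output into C:/ck-master/target, waits for each run to finish, and then moves every CSV into a numbered subfolder (it also passes path1, the per-iteration folder, to the tool, which appears to be the intent):

import glob
import os
import shlex
import shutil
import subprocess

target = 'C:/ck-master/target'
path = 'C:/Users/AQ42770/Desktop/congress-android'
os.chdir(target)

for n in range(0, 100):
    path1 = path + "/" + str(n)
    cmd = 'java -jar ck-0.3.2-SNAPSHOT-jar-with-dependencies.jar "%s"' % path1
    p = subprocess.Popen(shlex.split(cmd))
    p.wait()                                # let this run finish before touching its output
    dest = os.path.join(target, str(n))     # e.g. C:/ck-master/target/0, /1, ...
    if not os.path.exists(dest):
        os.makedirs(dest)
    for csv_file in glob.glob(os.path.join(target, '*.csv')):
        shutil.move(csv_file, dest)         # move this run's CSVs out of the way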
1. Introduction
I have a bunch of files in netCDF format.
Each file contains the meteorological conditions of some location over a different period (hourly data).
I need to extract the first 12 hours of data from each file, so I chose NCO (netCDF Operators) for the job.
NCO works in a terminal environment. With ncks -d Time 0,11 input.nc output.nc, I can get one data file called output.nc which contains the first 12 hours of data from input.nc.
2. My attempt
I want to keep the whole process inside my IPython notebook, but I am stuck on two points:
How to execute a terminal command in a Python loop
How to pass a string from the Python environment into that terminal command
Here is my pseudo-code as an example:
files = os.listdir('.')
for file in files:
    filename, extname = os.path.splitext(file)
    if extname == '.nc':
        output = filename + "_0-12_" + extname
        ## The code below was my attempt
        !ncks -d Time 0,11 file output
3. Conclusion
Basically, my goal is to make the pseudo-code !ncks -d Time 0,11 file output actually work. That means:
executing the netCDF operator directly in a Python loop...
...using filename, which is a string in the Python environment.
Sorry for my unclear question. Any advice would be appreciated!
You can use the subprocess module (for example subprocess.call) to execute an external program:
import glob
import os
import subprocess

for fn in glob.iglob('*.nc'):
    filename, extname = os.path.splitext(fn)
    output_fn = filename + "_0-12_" + extname
    output = subprocess.call(['ncks', '-d', 'Time', '0,11', fn, output_fn])
    print(output)  # subprocess.call returns the exit code of ncks
NOTE: updated the code to use glob.iglob; you don't need to check the extension manually.
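If you want to capture what ncks prints (and get an exception when it fails) instead of just the exit code, a hedged variant using subprocess.check_output could look like this:

import glob
import os
import subprocess

for fn in glob.iglob('*.nc'):
    filename, extname = os.path.splitext(fn)
    output_fn = filename + "_0-12_" + extname
    try:
        out = subprocess.check_output(['ncks', '-d', 'Time', '0,11', fn, output_fn])
        print(out)  # whatever ncks wrote to stdout (usually empty on success)
    except subprocess.CalledProcessError as err:
        print("ncks failed on %s with exit code %d" % (fn, err.returncode))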
You may also check out pynco, which wraps NCO with subprocess calls, similar to @falsetru's answer. Your application might look something like
import glob
import os
from nco import Nco

nco = Nco()
for fn in glob.iglob('*.nc'):
    filename, extname = os.path.splitext(fn)
    output_fn = filename + "_0-12_" + extname
    nco.ncks(input=fn, output=output_fn, dimension='Time 0,11')
I'm trying to practice Python scripting by writing a simple script that takes a large series of files named A_B and writes them to the location B\A. The way I was passing the arguments into the script was
python script.py *
and my program looks like
from sys import argv
import os
import ntpath
import shutil
script, filename = argv
target = open(filename)
outfilename = target.name.split('_')
outpath=outfilename[1]
outpath+="/"
outpath+=outfilename[0]
if not os.path.exists(outfilename[1]):
    os.makedirs(outfilename[1])
shutil.copyfile(target.name, outpath)
target.close()
The problem is that this script, the way it's currently written, only accepts one file at a time. Originally I was hoping the wildcard would pass one file at a time to the script and execute the script each time.
My question covers both cases:
How could I instead pass the wildcard files one at a time to the script?
and
How do I modify this script to instead accept all the arguments? (I can handle list-ifying everything, but argv is what I'm having problems with, and I'm a bit unsure about how to create a list of files.)
You have two options, both of which involve a loop.
To pass the files one by one, use a shell loop:
for file in *; do python script.py "$file"; done
This will invoke your script once for every file matching the glob *.
To process multiple files in your script, use a loop there instead:
from sys import argv
for filename in argv[1:]:
    # rest of script
Then call your script from bash like python script.py * to pass all the files as arguments. argv[1:] is an array slice, which returns a list containing the elements from argv starting from position 1 to the end of the array.
I would suggest the latter approach as it means that you are only invoking one instance of your script.
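Putting the second option together with the original script, a minimal sketch might look like this; it assumes every filename contains exactly one underscore and otherwise keeps the same copy logic (the open() call is dropped, since shutil.copyfile only needs the path):

from sys import argv
import os
import shutil

for filename in argv[1:]:                   # every file matched by the shell glob
    name_a, name_b = os.path.basename(filename).split('_')
    outdir = name_b
    outpath = os.path.join(outdir, name_a)  # location B/A for a file named A_B
    if not os.path.exists(outdir):
        os.makedirs(outdir)
    shutil.copyfile(filename, outpath)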
I have a shell script that does
find /tmp/test/* -name "*.json" -exec python /python/path {} \;
It looks for all the JSON files in specific directories and executes the OTHER Python script that I have.
How can I do this in a Python script?
I'm not sure if I understood your question; if you are trying to execute a shell command from a Python script, you can use os.system():
import os
os.system('ls -l')
complete documentation
import glob
import os
import subprocess

for json_file in glob.glob("/home/tmp/*.json"):
    subprocess.Popen(["python", "/path/to/my.py", json_file], env=os.environ).communicate()
If you want to use Python instead of find, start with os.walk (official doc) to get the files. Once you have them (or as you get them), act on them however you like.
From that page:
import os

for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        # act on the file
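To mirror the original find ... -exec behaviour with os.walk, a sketch like the one below keeps only the .json files and hands each one to the other script; /python/path is the script path exactly as written in the question:

import os
import subprocess

for dirName, subdirList, fileList in os.walk('/tmp/test'):
    for fname in fileList:
        if fname.endswith('.json'):
            full_path = os.path.join(dirName, fname)
            subprocess.call(['python', '/python/path', full_path])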
I guess you want to adjust your other Python file /python/path/script.py, such that you only need this one file:
#!/usr/bin/env python
import sys
import glob
import os

#
# parse the command line arguments
#
for i, n in enumerate(sys.argv):
    # Debug output
    print "arg nr %02i: %s" % (i, n)
    # Store the args in variables
    # (0 is the filename of the script itself)
    if i == 1:
        path = sys.argv[1]     # the 1st arg is the path, like "/tmp/test"
    if i == 2:
        pattern = sys.argv[2]  # a pattern to match for, like "'*.json'"

#
# merge path and pattern
# os.path makes it win / linux compatible ( / vs \ ...) and other stuff
#
fqps = os.path.join(path, pattern)

#
# Do something with your files
#
for filename in glob.glob(fqps):
    print filename
    # Do your stuff here with one file
    with open(filename, 'r') as f:  # 'r' = only read from file ('w' = write)
        lines = f.readlines()
    # at this point, the file is closed again!
    for line in lines:
        print line
    # and so on ...
Then you can use the one script like this
/python/path/script.py /tmp/test/ '*.json'. (Without needing to write python in front, thanks to the very first line, called a shebang. But you need to make the file executable once, using chmod +x /python/path/script.py.)
Of course you can omit the 2nd arg and assign a default value to pattern, or use only one arg in the first place. I did it this way to demonstrate os.path.join() and the quoting of arguments that should not be expanded by bash (compare the effect of using '*.json' and *.json on the printed list of arguments at the start).
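For instance, assuming the current directory contains a.json and b.json, the quoted call passes the pattern through to the script, while the unquoted one is expanded by bash before the script ever runs; the debug output would show roughly:

$ /python/path/script.py /tmp/test '*.json'
arg nr 00: /python/path/script.py
arg nr 01: /tmp/test
arg nr 02: *.json

$ /python/path/script.py /tmp/test *.json
arg nr 00: /python/path/script.py
arg nr 01: /tmp/test
arg nr 02: a.json
arg nr 03: b.json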
Here is some information about better / more sophisticated ways of handling command line arguments.
And, as a bonus, using a main() function is advisable as well, to keep an overview when your script gets larger or is used by other Python scripts.
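A minimal sketch of that main() pattern, so the file can both be run directly and be imported by other Python scripts:

#!/usr/bin/env python
import sys

def main(args):
    # move the argument parsing and the file loop from above into here
    for i, n in enumerate(args):
        print "arg nr %02i: %s" % (i, n)

if __name__ == "__main__":
    main(sys.argv)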
What you need is
find /tmp/test/* -name "*.json" -exec sh -c "python /python/path {}" \;
I have Python code that performs some operations on some netCDF files. It has the names of the netCDF files in a list. I want to calculate the ensemble average of these files using the netCDF operator ncea (the netCDF ensemble averager). However, to call NCO, I need to pass all the list elements as arguments, as follows:
filelist = ['file1.ncf', 'file2.ncf', 'file3.ncf', ..., 'file50.ncf']
ncea file1.ncf file2.ncf ......file49.ncf file50.ncf output.cdf
Any idea how this can be achieved?
Any help is greatly appreciated.
import subprocess
import shlex

args = 'ncea file1.ncf file2.ncf ......file49.ncf file50.ncf output.cdf'
args = shlex.split(args)
p = subprocess.Popen(args, stdout=subprocess.PIPE)
print p.stdout.read()  # Print stdout if you need it.
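Since the file names are already in the Python list filelist, you can also build the argument list directly and skip the string splitting; a sketch, assuming output.cdf is the desired output name:

import subprocess

# filelist = ['file1.ncf', 'file2.ncf', ...]  # the existing list of input files
args = ['ncea'] + filelist + ['output.cdf']   # ncea <inputs...> <output>
p = subprocess.Popen(args, stdout=subprocess.PIPE)
print p.stdout.read()  # print stdout if you need it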
I usually do the following:
Build a string containing the ncea command, then use the os module to execute that command inside a Python script:
import os

out_file = './output.nc'
ncea_str = 'ncea '
for file in filelist:
    ncea_str += file + ' '
os.system(ncea_str + '-O ' + out_file)
EDIT:
import subprocess
out_file = './output.nc'
ncea_str = '{0} {1} -O {2}'.format('ncea', ' '.join(filelist), out_file)
subprocess.call(ncea_str, shell=True)