1. Introduction
I have a bunch of files in netCDF format.
Each file contains the meteorological conditions of a location over a different period (hourly data).
I need to extract the first 12 h of data from each file, so I chose NCO (netCDF Operators) for the job.
NCO works in a terminal environment. With >ncks -d Time 0,11 input.nc output.nc, I get a data file called output.nc which contains the first 12 h of data from input.nc.
2. My attempt
I want to keep the whole process inside my IPython notebook, but I am stuck on two aspects:
How to execute terminal commands in a Python loop.
How to pass a Python string into a terminal command.
Here is my pseudocode as an example.
files = os.listdir('.')
for file in files:
    filename, extname = os.path.splitext(file)
    if extname == '.nc':
        output = filename + "_0-12_" + extname
        ## The code below was my attempt
        !ncks -d Time 0,11 file output
3. Conclusion
Basically, my goal is to make the pseudocode !ncks -d Time 0,11 file output actually work. That means:
execute the netCDF operator directly in a Python loop...
...using a filename that is a string in the Python environment.
Sorry if my question is unclear. Any advice would be appreciated!
You can use subprocess.call to execute an external program:
import glob
import os
import subprocess

for fn in glob.iglob('*.nc'):
    filename, extname = os.path.splitext(fn)
    output_fn = filename + "_0-12_" + extname
    # subprocess.call returns the exit status of ncks (0 on success)
    output = subprocess.call(['ncks', '-d', 'Time', '0,11', fn, output_fn])
    print(output)
NOTE: updated the code to use glob.iglob; you don't need to check extension manually.
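If you'd rather have the loop stop as soon as ncks fails, something like the following might work. It is only a sketch: ncks_command and extract_all are hypothetical helper names, and the hyperslab arguments are copied unchanged from the answer above. NCO's own documentation writes hyperslabs as -d dim,min,max (for example -d Time,0,11), so it is worth checking which form your ncks accepts.

```python
import glob
import os
import subprocess


def ncks_command(fn):
    """Return the ncks argument list for one input file (hypothetical helper)."""
    filename, extname = os.path.splitext(fn)
    output_fn = filename + "_0-12_" + extname
    # Every list item reaches ncks as exactly one argument, so no shell quoting is needed.
    # The hyperslab arguments are copied unchanged from the answer above.
    return ["ncks", "-d", "Time", "0,11", fn, output_fn]


def extract_all(pattern="*.nc"):
    """Run ncks on every matching file, stopping at the first failure."""
    # list(...) freezes the file list before any output files are created.
    for fn in list(glob.iglob(pattern)):
        if "_0-12_" in fn:
            continue  # assumption: skip outputs left over from an earlier run
        # check=True raises subprocess.CalledProcessError on a non-zero exit status.
        subprocess.run(ncks_command(fn), check=True)
```

Calling extract_all() from a notebook cell would then process every .nc file in the current directory, and a failing ncks call surfaces as a normal Python exception.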
You may also check out pynco, which wraps NCO with subprocess calls, similar to @falsetru's answer. Your application may look something like:
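import glob
import os
from nco import Nco

nco = Nco()
for fn in glob.iglob('*.nc'):
    filename, extname = os.path.splitext(fn)
    output_fn = filename + "_0-12_" + extname
    # Pass the full input name (fn), not the extension-less filename
    nco.ncks(input=fn, output=output_fn, dimension='Time 0,11')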
Related
I'm trying to concatenate multiple files in a directory to a single file. So far I've been trying to use cat with subprocess with poor results.
My original code was:
source = ['folder01/*', 'folder02/*']
target = ['/output/File1', '/output/File2']
for f1, f2 in zip(source, target):
    subprocess.call(['cat', f1, '>>', f2])
I've tried handing it shell=True:
..., f2], shell=True)
And in conjunction with subprocess.Popen instead of call in a number of permutations, but with no joy.
As I've understood from other similar questions, with shell=True the command will need to be provided as a string. How can I go about calling cat on all items in my list whilst executing as a string?
You don't need subprocess here, and you should avoid subprocess whenever you can (that means: 99.99% of the time).
As Joel pointed out in the comments, maybe I should take a few minutes and a few bullet points to explain why:
Using subprocess (or similar) assumes your code will always run in exactly the same environment: same OS, version, shell, installed tools, etc. This is really not suited to production-grade code.
These kinds of libraries prevent you from writing "Pythonic" code: you have to handle errors by parsing strings instead of using try / except, etc.
Tim Peters wrote the Zen of Python and I encourage you to follow it. At least three points are relevant here: "Beautiful is better than ugly.", "Readability counts." and "Simple is better than complex.".
In other words: subprocess will make your code less robust, force you to handle non-Python issues, and force you into tricky workarounds where you could just write clean and powerful Python code.
There are many more good reasons not to use subprocess, but I think you get the point.
Just open the files with open. Here is a basic example you will need to adapt:
import os

for filename in os.listdir('./'):
    with open(filename, 'r') as fileh:
        with open('output.txt', 'a') as outputh:
            outputh.write(fileh.read())
Implementation example for your specific needs:
import os

sources = ['/tmp/folder01/', '/tmp/folder02/']
targets = ['/tmp/output/File1', '/tmp/output/File2']

# Loop in `sources`
for index, directory in enumerate(sources):
    # Match `sources` file with expected `targets` directory
    output_file = targets[index]
    # Loop in files within `directory`
    for filename in os.listdir(directory):
        # Compute absolute path of file
        filepath = os.path.join(directory, filename)
        # Open source file in read mode
        with open(filepath, 'r') as fileh:
            # Open output file in append mode
            with open(output_file, 'a') as outputh:
                # Write content into output
                outputh.write(fileh.read())
Be careful: I changed your source and target values (they now live under /tmp/).
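If the files are large or not plain text, a variant of the same idea that copies in chunks may be worth considering. This is a sketch under assumptions: concatenate_dir is a hypothetical helper name, files are appended in sorted name order (the original does not specify an order), and the files are opened in binary mode so no decoding takes place.

```python
import os
import shutil


def concatenate_dir(directory, output_file):
    """Append every regular file in `directory` to `output_file` (hypothetical helper)."""
    output_abs = os.path.abspath(output_file)
    with open(output_file, "ab") as out:
        # Assumption: sorted name order; os.listdir itself guarantees no order.
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            # Skip sub-directories and the output file itself.
            if not os.path.isfile(path) or os.path.abspath(path) == output_abs:
                continue
            with open(path, "rb") as src:
                # copyfileobj streams the data in chunks instead of reading it all at once.
                shutil.copyfileobj(src, out)
```

With the lists from the answer above, the call would look like: for d, t in zip(sources, targets): concatenate_dir(d, t).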
I have a non-Python program that I run from Python using os.system, and I put this call inside a function. The program is supposed to produce an output file; I need that output for processing, and I also need it to be actually written to the directory I am sending it to.
I wrote my function in the following general format:
def myFunction(infile):
    os.system('myProgram '+infile+' '+outfileName)
    outfile = numpy.loadtxt(outfileName)
    return outfile
However, the output of myProgram (outfileName) isn't being written to my directory and numpy can't therefore load it. Is there a way to store globally outputs of programs I run using os.system when it's inside a function?
Assuming myProgram is working correctly, this is likely happening because myProgram does not resolve relative paths against the directory you expect, so the file is simply being written somewhere else. Try using full paths and see if that works.
Assuming infile and outfileName are relative paths in your current working directory, you could do:
def myFunction(infile):
    # Build absolute paths so myProgram does not depend on its own working directory
    cmd = 'myProgram ' + os.path.join(os.getcwd(), infile)
    cmd += ' ' + os.path.join(os.getcwd(), outfileName)
    os.system(cmd)
    outfile = numpy.loadtxt(outfileName)
    return outfile
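To find out quickly whether the program failed or just wrote its output elsewhere, a small wrapper along these lines might help. It is a sketch, not a drop-in fix: run_and_locate is a hypothetical name, and it uses subprocess.run with an explicit working directory in place of os.system.

```python
import os
import subprocess


def run_and_locate(args, outfile, workdir=None):
    """Run a program in `workdir` and return the absolute path of its output file."""
    workdir = os.path.abspath(workdir or os.getcwd())
    # If `outfile` is already absolute, os.path.join returns it unchanged.
    outpath = os.path.join(workdir, outfile)
    # cwd pins the directory relative paths are resolved against;
    # check=True raises CalledProcessError if the program exits with an error.
    subprocess.run(args, cwd=workdir, check=True)
    if not os.path.exists(outpath):
        raise FileNotFoundError("program finished but did not write " + outpath)
    return outpath
```

The returned path can then be handed to numpy.loadtxt, for example numpy.loadtxt(run_and_locate(['myProgram', infile, outfileName], outfileName)).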
So I have been working on integrating the Geospatial Modeling Environment tools (formerly Hawth's) with ArcGIS 10.1 via Python. Below is the code I am using, which works great, to create a text file of code and then call up GME via Python to process the shapefiles I am using. As far as I can tell, I have been able to mimic verbatim what the creator states will work in Python (see his documentation here: http://www.spatialecology.com/gme/images/SpatialEcologyGME.pdf)
Code:
import arcpy, sys, os, subprocess
from arcpy import env
#Supply the following arguments:
#Workspace (full path)
#Catchment Polygons (full path)
#Raster Data (full path)
#Prefix for the output: 6 characters to denote the raster dataset.
#Thematic value: TRUE or FALSE
#An output txt file (full path -> eg. C:/Users/Alison/Desktop/file.txt)
########
#Each argument must be in double quotes, and they must be separated by a space.
#The polygon and raster datasets must be in same coordinate system.
env.workspace = sys.argv[1]
print env.workspace
inputPoly = sys.argv[2]
inputRast = sys.argv[3]
prefix = sys.argv[4]
thematic = sys.argv[5]
code = 'isectpolyrst(in="' + inputPoly + '", raster="' + inputRast + '", prefix="' + prefix + '", thematic="' + thematic +'");'
print code
newFile = sys.argv[6]
print newFile
newFileObj = open(newFile, 'w')
newFileObj.write(code)
newFileObj.close()
print newFile
os.system(r'C:\Program Files (x86)\SpatialEcology\GME\SEGME.exe')
print "subprocess.call(r'C:\Program Files (x86)\SpatialEcology\GME\SEGME.exe -c run(in=\\\"" + newFile + "\\\");');"
subprocess.call(r'C:\Program Files (x86)\SpatialEcology\GME\SEGME.exe -c run(in=\\\"" + newFile + "\\\");');
However, while this process runs without errors, I just end up hitting another wall: it opens GME, but it doesn't actually do anything. It doesn't seem to run the created text file. The isectpolyrst tool works like Tabulate Area, so in theory the values should all be appended to the polygon data, but through Python that doesn't seem to happen. (I am using GME because Tabulate Area cannot handle the size of my data files and crashes both in Arc and as a Python script.)
I am wondering if anyone has successfully been able to run GME through Python for use in what will be an ArcPy script, so that the task can be automated, rather than having to go through GME and then into Arc. My searching suggests that this is a common problem for those trying to automate the process, but for all I know I am just missing a colon somewhere, or some other piece of code.
Thanks for the feedback!
Figured it out!
GME can use a text file to read the desired code, so I wrote the inputs in Python exactly as they should appear in GME, and wrote these to a text file. These were then read into a subprocess call that brings up GME and runs the text file. Works like a charm.
Took some tinkering, but worth it!
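The answer does not show its final code, so the following is only a guess at what such a call could look like. The executable path, the isectpolyrst(...) line and the -c run(in="..."); form are all taken from the question and have not been checked against the GME documentation; gme_script_line, gme_command and run_gme are hypothetical names.

```python
import subprocess

# Path as given in the question; adjust to the local installation.
GME_EXE = r"C:\Program Files (x86)\SpatialEcology\GME\SEGME.exe"


def gme_script_line(in_poly, in_rast, prefix, thematic):
    """Build the GME command text exactly as the question writes it to the text file."""
    return 'isectpolyrst(in="{}", raster="{}", prefix="{}", thematic="{}");'.format(
        in_poly, in_rast, prefix, thematic)


def gme_command(script_path, exe=GME_EXE):
    """Build the argument list; the path is substituted here, not left inside a string literal."""
    # Assumption: GME accepts `-c run(in="...");` as used in the question.
    return [exe, "-c", 'run(in="{}");'.format(script_path)]


def run_gme(script_path):
    """Start GME on the script file and return its exit status."""
    return subprocess.call(gme_command(script_path))
```

Passing a list lets subprocess handle the quoting, which avoids hand-escaped quotes around the script path.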
I am currently able to correctly visualize in ParaView a .vtp file for each time step of a simulation, and to print a screenshot for each. I want to do that in batch, but I want to keep the same state for each one (view point, filters applied, etc). I have already saved the state into a .pvsm file, and I tried to write a Python script which, after being run by pvbatch, will (hopefully) print the screenshots.
Unfortunately, it is not working. I tried to change the filename in the state by processing the state file and doing a search and replace, but it still does not work. For instance, it keeps plotting the first data input only, even if the current file is different (although GetSources() shows an ever-growing list of sources). I use ParaView 3.14.0 on Snow Leopard.
I am sure this is easy, but I am overwhelmed by the large amount of information about Python and ParaView with no reference to this particular issue. Any advice is greatly welcome, and I am sorry if this has been answered previously (I looked at Google, the ParaView mailing list, and here). Below is my script, which can also be found at http://pastebin.com/4xiLNrS0 . Furthermore, you can find some example files and state in http://goo.gl/XjPpE .
#!/bin/python
import glob, string, os, commands
from paraview.simple import *
#help(servermanager)

# vtp files are inside the local subdir DISPLAY
files = (commands.getoutput("ls DISPLAY/data-*.vtp | grep -v contacts")).split()

# process each file
for filename in files:
    fullfn = commands.getoutput("ls $PWD/" + filename)
    fn = filename.replace('DISPLAY/', '')
    #os.system("cp ../dem_git/addons/paraview_state.pvsm tmp.pvsm")
    os.system("cp ~/Desktop/state.pvsm tmp.pvsm")
    os.system("sed -i.bck 's/DATA.vtp/" + fullfn.replace('/','\/') + "/1' tmp.pvsm") # replace first instance with full path
    os.system("sed -i.bck 's/DATA.vtp/" + fullfn.replace('/','\/') + "/1' tmp.pvsm") # replace second instance with full path
    os.system("sed -i.bck 's/DATA.vtp/" + fn + "/1' tmp.pvsm") # replace third with just the filename path
    servermanager.LoadState("tmp.pvsm")
    pm = servermanager.ProxyManager()
    reader = pm.GetProxy('sources', fullfn)
    reader.FileName = fullfn
    reader.FileNameChanged()
    reader.UpdatePipeline()
    view = servermanager.GetRenderView()
    SetActiveView(view)
    view.StillRender()
    WriteImage(filename.replace(".vtp", ".png"))
    os.system("rm -f tmp.pvsm")
    os.system("rm -f tmp.pvsm.bck")
    Delete(reader)
I realise this is an old question, but I had exactly the same problem recently and couldn't find any answers either. All you need to do is add Delete(view) after Delete(reader) for your script to work.
I am using python to read 2 files from my linux os. One contains a single entry/number 'DATE':
20111125
the other file contains many entries, 'TIME':
042844UTC
044601UTC
...
044601UTC
I am able to read the files to assign to proper variables. I would like to then use the variables to create folder paths, move files etc... such as:
$PATH/20111125/042844UTC
$PATH/20111125/044601UTC
$PATH/20111125/044601UTC
and so on.
Somehow this doesn't work with multiple variables passed at once:
import subprocess, sys, os, os.path

DATEFILE = open('/Astronomy/Sorted/2-Scratch/MAPninox-DATE.txt', "r")
TIMEFILE = open('/Astronomy/Sorted/2-Scratch/MAPninox-TIME.txt', "r")

for DATE in DATEFILE:
    print DATE,
    for TIME in TIMEFILE:
        os.popen('mkdir -p /Astronomy/' + DATE + '/' TIME) # this line works for DATE only
        os.popen('mkdir -p /Astronomy/20111126/' + TIME) # this line works for TIME only
        subprocess.call(['mkdir', '-p', '/Astronomy/', DATE]), #THIS LINE DOESN'T WORK
Thanks!
I would suggest using os.makedirs (which does the same thing as mkdir -p) instead of subprocess or popen:
import sys
import os

DATEFILE = open(os.path.join(r'/Astronomy', 'Sorted', '2-Scratch', 'MAPninox-DATE.txt'), "r")
TIMEFILE = open(os.path.join(r'/Astronomy', 'Sorted', '2-Scratch', 'MAPninox-TIME.txt'), "r")

for DATE in DATEFILE:
    print DATE,
    for TIME in TIMEFILE:
        os.makedirs(os.path.join(r'/Astronomy', DATE, TIME))
        astrDir = os.path.join(r'/Astronomy', '20111126', TIME)
        try:
            os.makedirs(astrDir)
        except os.error:
            print "Dir %s already exists, moving on..." % astrDir
        # etc...
Then use shutil for any cp/mv/etc operations.
From the os Docs:
os.makedirs(path[, mode])
Recursive directory creation function. Like mkdir(), but makes all
intermediate-level directories needed to contain the leaf directory.
Raises an error exception if the leaf directory already exists or
cannot be created. The default mode is 0777 (octal). On some systems,
mode is ignored. Where it is used, the current umask value is first
masked out.
I see a couple of errors in your code.
os.popen('mkdir -p /Astronomy/' + DATE + '/' TIME) # this line works for DATE only
This is a syntax error. I think you meant to have '/' + TIME, not '/' TIME. I'm not sure what you mean by "this line works for DATE only"?
subprocess.call(['mkdir', '-p', '/Astronomy/', DATE]), #THIS LINE DOESN'T WORK
What command do you expect to call? I'm guessing from the rest of your code that you're trying to execute mkdir -p /Astronomy/<<DATE>>. That isn't what you've coded though. Each item in the list you pass to subprocess.call is a separate argument, so what you've written comes out as mkdir -p /Astronomy <<DATE>>. This will attempt to create two directories, a root-level directory /Astronomy, and another one in the current working directory named whatever DATE is.
If I'm correct about what you wanted to do, the corrected line would be:
subprocess.call(['mkdir', '-p', '/Astronomy/' + DATE])
chown's answer using os.makedirs (and using os.path.join to splice paths, rather than string +) is a better general approach, in my opinion. But this is why your current code isn't working, as far as I can tell.
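One more thing that may be contributing to the problem: lines read from a file keep their trailing newline, so DATE and TIME probably end in "\n" when they are spliced into a path. A rough sketch of the os.makedirs approach with that handled, written for Python 3 (exist_ok needs 3.2 or later); make_date_time_dirs is a hypothetical name and the root directory is passed in as a parameter.

```python
import os


def make_date_time_dirs(root, date_lines, time_lines):
    """Create root/DATE/TIME for every date and time entry; return the paths."""
    # Strip newlines and drop blank lines. Building a list also lets the
    # time entries be reused for every date, which a file object would not.
    times = [t.strip() for t in time_lines if t.strip()]
    created = []
    for date in date_lines:
        date = date.strip()
        if not date:
            continue
        for time in times:
            path = os.path.join(root, date, time)
            # exist_ok=True makes repeated entries (like 044601UTC) harmless.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

With the files from the question this would be called as make_date_time_dirs('/Astronomy', DATEFILE, TIMEFILE).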