Pipe output from subprocess to file, and then read it back - python

I have a Python script that runs a subprocess to get some data and then processes it. What I'm trying to achieve is to have the data written to a file, and then use the data from the file to do the processing (the reason is that the subprocess is slow, but can change based on the date, time, and parameters I use, and I need to run the script frequently).
I've tried various methods, including opening the file as w+ and trying to seek to the beginning after the write is done, but nothing seems to work - the file is written, but when I try to read back from it (using file.readline()) I get EOF back.
This is what I'm essentially trying to accomplish:
myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
myFile.flush() # force the file to disk
os.fsync(myFile) # ..
myFile.close()
myFile = open(fileName, "r")
while myFile.readline():
    pass # do stuff
myFile.close()
But even though the file is correctly written (after the script runs, I can see the contents of the file), readline never returns a valid line. Like I said, I also tried using the same file object and doing seek(0) on it, with no luck. That only worked when opening the file as r+, which fails when the file doesn't already exist.
Any help would be appreciated. Also, if there's a cleaner way to do this, I'm open to it :)
PS: I realize I could Popen with stdout going to a pipe, read from the pipe, and write the data to the file line by line as I go, but I'm trying to separate the creation of the data file from the reading.

The subprocess almost certainly isn't finishing before you try to read from the file. In fact, it's likely that the subprocess isn't even writing anything before you try to read from the file. For true separation you're going to have to have the subprocess write to a temporary file then replace the file you read from, so that you either read the previous version or the new version but never get to see the partially-written file from the new version.
You can do this in a number of ways; the easiest would be to change the subprocess, but I don't know if that's an option for you here. Alternatively, you can wrap it in your own separate script to manage the files. You probably don't want to call the subprocess in the script that analyses the output file either; you'll want a cronjob or something to regenerate periodically.
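A minimal sketch of that wrapper approach (the command, arguments, and file names below are placeholders, not anything from the question): write the subprocess output to a temporary file next to the real one, then swap it into place with os.rename, which is atomic on POSIX when both paths are on the same filesystem, so readers only ever see a complete file.

import os
import subprocess

# placeholder command and file names -- substitute your own
args = ["some_slow_command", "--option", "value"]
final_path = "data.txt"
tmp_path = final_path + ".tmp"

tmp_file = open(tmp_path, "w")
p = subprocess.Popen(args, stdout=tmp_file)
p.wait()          # let the command finish writing before swapping files
tmp_file.close()

os.rename(tmp_path, final_path)  # atomic replace; readers see the old or new file, never a partial one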

This should work as is, provided the subprocess finishes in time (see James's answer).
If you want to wait for it to finish, add p.wait() after the Popen invocation.
What is your actual while loop, though? while myFile.readline() makes it seem as if you're not actually saving the line for anything. Try this:
myFile = open(fileName, "r")
print myFile.readlines()
myFile.close()
Or, if you want to interactively examine the state of your program:
myFile = open(fileName, "r")
import pdb; pdb.set_trace()
myFile.close()
Then you can do things like print myFile.readlines() after it stops.

@James Aylett pointed me to the right path: it appears that my problem was that subprocess.Popen wasn't finished running when I called .flush().
The solution is to call p.wait() right after the subprocess.Popen call, to allow the underlying command to finish. After doing that, .flush() does the right thing (since all the data is there), and I can proceed to read from the file.
So the above code becomes:
myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
p.wait() # <-- Missing line
myFile.flush() # force the file to disk
os.fsync(myFile) # ..
myFile.close()
myFile = open(fileName, "r")
while myFile.readline():
    pass # do stuff
myFile.close()
And then it all works!
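As a side note (not part of the original answer): subprocess.call does the Popen plus wait in one step, so the writing half could be shortened to a sketch like this, where fileName and args are the same placeholders used above.

import subprocess

# fileName and args are the same placeholders as in the question
with open(fileName, "w") as myFile:
    subprocess.call(args, stdout=myFile)  # returns only after the command has exited

with open(fileName, "r") as myFile:
    for line in myFile:
        pass # do stuff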

Related

Read from pipe, write to file

I want to read from a pipe, inspect each line one at a time and modify or ignore it, and then write it out to a file on disk.
My latest try was:
import os
outfile = open("outputfile",'w')
with os.popen('./ps1 postgres getSoxPrivs.sql') as mypipe:
    for line in mypipe:
        outfile.write(line)
outfile.close()
This never exits; I have to \q\q to get it to stop. But it does appear to write out the file. However, it adds a bunch of line feeds that aren't in the original data.
I saw other people using subprocess, but I could never get their examples to work for my case. It seems like there are multiple ways to get this done, but I can't quite find the one that works properly for me.
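For what it's worth, here is a sketch of the subprocess-based version the question alludes to, assuming ./ps1 postgres getSoxPrivs.sql can run non-interactively; universal_newlines=True may also get rid of the extra line feeds if they are really \r\n pairs.

import subprocess

# sketch only: assumes the command runs non-interactively and exits on its own
p = subprocess.Popen(['./ps1', 'postgres', 'getSoxPrivs.sql'],
                     stdout=subprocess.PIPE, universal_newlines=True)
outfile = open("outputfile", 'w')
for line in p.stdout:
    # inspect, modify, or skip the line here before writing it
    outfile.write(line)
outfile.close()
p.wait()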

Can I rely on a temporary file to remain unchanged after I close it?

I'm using a temporary file to exchange data between two processes:
1. I create a temporary file and write some data into it
2. I start a subprocess that reads and modifies the file
3. I read the result from the file
For demonstration purposes, here's a piece of code that uses a subprocess to increment a number:
import subprocess
import sys
import tempfile
# create the file and write the data into it
with tempfile.NamedTemporaryFile('w', delete=False) as file_:
    file_.write('5') # input: 5
    path = file_.name

# start the subprocess
code = r"""with open(r'{path}', 'r+') as f:
    num = int(f.read())
    f.seek(0)
    f.write(str(num + 1))""".format(path=path)
proc = subprocess.Popen([sys.executable, '-c', code])
proc.wait()

# read the result from the file
with open(path) as file_:
    print(file_.read()) # output: 6
As you can see above, I've used tempfile.NamedTemporaryFile(delete=False) to create the file, then closed it and later reopened it.
My question is:
Is this reliable, or is it possible that the operating system deletes the temporary file after I close it? Or perhaps it's possible that the file is reused for another process that needs a temporary file? Is there any chance that something will destroy my data?
The documentation does not say. The operating system might automatically delete the file after some time, depending on any number of things about how it was set up and what directory is used. If you want persistence, code for persistence: use a regular file, not a temporary one.
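Following that advice, the demonstration above could be rewritten against an ordinary file (a sketch only; the path here is made up):

import subprocess
import sys

path = 'exchange_data.txt' # an ordinary file under your own control, not a temp file

with open(path, 'w') as file_:
    file_.write('5') # input: 5

code = r"""with open(r'{path}', 'r+') as f:
    num = int(f.read())
    f.seek(0)
    f.write(str(num + 1))""".format(path=path)
subprocess.check_call([sys.executable, '-c', code])

with open(path) as file_:
    print(file_.read()) # output: 6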

Python not reading from file

I am trying to loop over the lines of a text file which is verifiably non-empty and I am running into problems with my script. In my attempt to debug what I wrote, I figured I would make sure my script is properly reading from the file, so I am currently trying to print every line in it.
At first I tried using the usual way of doing this in Python i.e.:
with open('file.txt') as fo:
    for line in fo:
        print line
but my script is not printing anything. I then tried storing all of the lines in a list like so:
with open('file.txt') as fo:
    flines = fo.readlines()
    print flines
and yet my program still outputs an empty list (i.e. []). I have also tried making sure that my file pointer is pointing to the beginning of the file using fo.seek(0) before attempting to read from it, yet that also does not work.
I have spent some time reading solutions to similar questions posted on here, but so far nothing I have tried has worked. I do not know how such an elementary I/O operation is giving me so much trouble, but I must be missing something basic and would thus really appreciate any help/suggestions.
EDIT: Here is the part of my script which is causing the problem:
import subprocess as sbp
with open('conf_15000.xyz','w') as fo:
    p1 = sbp.Popen(['head','-n', '300000','nnp-pos-1.xyz'], stdout=sbp.PIPE)
    p2 = sbp.Popen(['tail','-n', '198'], stdin=p1.stdout, stdout=fo)

with open('conf_15000.xyz','r') as fp:
    fp.seek(0)
    flines = fp.readlines()
    print flines
And here is an excerpt from the nnp-pos-1.xyz file (all lines have the same format and there are 370642 of them in total):
Ti 32.9136715924 28.5387609200 24.6554922872
O 39.9997000300 35.1489480846 22.8396092714
O 33.7314699265 30.3398473499 23.8866085372
Ti 27.7756767925 31.3455930970 25.9779887743
O 31.1520937719 29.0752315770 25.4786577758
O 26.1870965535 32.4876155555 26.3346205619
Ti 38.4478275543 25.5734609650 22.0654953429
O 24.1328940232 31.3858060129 28.8575469919
O 38.6506317714 27.3779871011 22.6552032123
Ti 40.5617501289 27.5095900385 22.8436684314
O 38.2400600469 29.1828342919 20.7853056680
O 38.8481088254 27.2704154737 26.9590081202
When running the script, the file being read from (conf_15000.xyz) gets written to properly; however, I cannot seem to read from it at runtime.
EDIT-2: Following sudonym's recommendation I am using the absolute file path and am checking whether or not the file is empty before reading from it by adding the following unindented lines between the two with statements I wrote in my previous edit:
print os.path.isfile(r'full/path/to/file')
print (os.stat(r'full/path/to/file').st_size != 0)
The first boolean evaluates to True (meaning the file exists) while the second evaluates to False (meaning the file is empty). This is very strange, because both of these lines come after I close the file pointer fo that writes to the file, and because the file being written to (and subsequently read from with fp) is not empty after the script finishes (in fact, it contains all the lines it is supposed to).
EDIT-3: It turns out the reason my script saw the file it needed to read as empty was that it did not wait for the subprocess (p2 in the example above) that writes to it to stop executing, meaning it would execute the lines after my first with statement before the file pointer was actually closed (i.e. before the file was done being written to). The fix was therefore to add the statement p2.wait() at the end of the first with statement, like so:
import subprocess as sbp
with open('conf_15000.xyz','w') as fo:
    p1 = sbp.Popen(['head','-n', '300000','nnp-pos-1.xyz'], stdout=sbp.PIPE)
    p2 = sbp.Popen(['tail','-n', '198'], stdin=p1.stdout, stdout=fo)
    p2.wait()

with open('conf_15000.xyz','r') as fp:
    fp.seek(0)
    flines = fp.readlines()
    print flines
Now everything works the way it is supposed to.
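One detail worth adding, taken from the subprocess documentation on replacing shell pipelines rather than from the original post: closing p1.stdout in the parent lets head receive SIGPIPE if tail exits first, so the pipeline part could look like this:

import subprocess as sbp

with open('conf_15000.xyz','w') as fo:
    p1 = sbp.Popen(['head','-n', '300000','nnp-pos-1.xyz'], stdout=sbp.PIPE)
    p2 = sbp.Popen(['tail','-n', '198'], stdin=p1.stdout, stdout=fo)
    p1.stdout.close() # allow head to receive SIGPIPE if tail exits first
    p2.wait()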
You probably need to flush() the buffers first (and maybe call os.fsync() too) - after writing and before reading.
See file.flush() and this post.
First, include the absolute path.
Second, check whether the file actually exists and is not empty:
import os
FILEPATH = r'path\to\file.txt' # full path as raw string
if os.path.isfile(FILEPATH) and (os.stat(FILEPATH).st_size != 0):
    with open(FILEPATH) as fo:
        flines = fo.readlines()
        print flines
else:
    print FILEPATH, "doesn't exist or is empty"

Save to Text File from Infinite While Loop

I am currently writing data from an infinite while loop to an SD card on a Raspberry Pi.
file = open("file.txt", "w")
while True:
    file.write( DATA )
It seems that file.txt doesn't always get saved if the program isn't closed through either a command or a keyboard interrupt. Is there a way to save periodically and make sure the data is actually being written? I was considering using
open("file.txt", "a")
to append to the file, periodically closing the txt file and opening it up again. Would there be a better way to safely store data while running through an infinite while loop?
A file's write() method doesn't necessarily write the data to disk. You have to call the flush() method to ensure this happens...
file = open("file.txt", "w")
while True:
    file.write( DATA )
    file.flush()
Don't worry about the reference to os.fsync() - the OS will pretend the data has been written to disk even if it actually hasn't.
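That said, if you do need the data to survive a power failure or the SD card being pulled (plausible on a Raspberry Pi), a sketch with os.fsync added would be (DATA below is just a stand-in placeholder, as in the question):

import os

DATA = "reading\n" # placeholder for whatever you are actually logging

f = open("file.txt", "w")
while True:
    f.write(DATA)
    f.flush()            # hand the buffered data to the OS
    os.fsync(f.fileno()) # push it all the way to the storage device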
Use a with statement -- it will make sure that the file automatically closes!
with open("file.txt", "w") as myFile:
    myFile.write(DATA)
Essentially, what the with statement will do in this case is this:
try:
    myFile = open("file.txt", "w")
    do_stuff()
finally:
    myFile.close()
assuring you that the file will be closed, and that the information written to the file will be saved.
More information about the with statement can be found here: PEP 343
If you're exiting the program abnormally, then you should expect that sometimes the file won't be closed properly.
Opening and closing the file after each write won't do it, since there's still a chance that you'll interrupt the program while the file is open.
The equivalent of the CTRL-C method of exiting the program is low-level. It's like, "Get out now, there's a fire, save yourself" and the program leaves itself hanging.
If you want a clean close to your file, then put the interrupt statement in your code. That way you can handle the close gracefully.
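For example (a sketch, with DATA standing in for whatever you are logging), catching KeyboardInterrupt gives you a spot where the file is closed cleanly:

DATA = "placeholder line\n" # stands in for whatever you are actually logging

with open("file.txt", "a") as f:
    try:
        while True:
            f.write(DATA)
            f.flush() # keep the on-disk copy current between writes
    except KeyboardInterrupt:
        pass # Ctrl-C lands here; leaving the with block closes the file cleanly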
Close the file, then open it again the next time you write to it.
And try choosing a+ mode.

Reading command Line Args

I am running a script in python like this from the prompt:
python gp.py /home/cdn/test.in..........
Inside the script I need to take the path of the input file test.in, and the script should read and print the file's content. This is the code which was working fine, but the file path is hard-coded in the script. Now I want to pass the path as a command line argument.
Working Script
#!/usr/bin/python
import sys
inputfile='home/cdn/test.in'
f = open (inputfile,"r")
data = f.read()
print data
f.close()
Script Not Working
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
data = argv[1:].read()
print data
f.close()
What change do I need to make in this ?
While Brandon's answer is a useful solution, the reason your code is not working also deserves explanation.
In short, a list of strings is not a file object. In your first script, you open a file and operate on that object (which is a file object.). But writing ['foo','bar'].read() does not make any kind of sense -- lists aren't read()able, nor are strings -- 'foo'.read() is clearly nonsense. It would be similar to just writing inputfile.read() in your first script.
To make things explicit, here is an example of getting all of the content from all of the files specified on the commandline. This does not use fileinput, so you can see exactly what actually happens.
import sys

# iterate over the filenames passed on the commandline
for filename in sys.argv[1:]:
    # open the file, assigning the file-object to the variable 'f'
    with open(filename, 'r') as f:
        # print the content of this file.
        print f.read()
# Done.
Check out the fileinput module: it interprets command line arguments as filenames and hands you the resulting data in a single step!
http://docs.python.org/2/library/fileinput.html
For example:
import fileinput
for line in fileinput.input():
    print line
In the script that isn't working for you, you are simply not opening the file before reading it. So change it to
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
f = open(sys.argv[1], "r")
data = f.read()
print data
f.close()
Also, f.close() would error out because f had not been defined; the changes above take care of that too.
BTW, you should use variable names that are at least 3 characters long, according to the coding standards.
