I have a weird problem reading from STDIN in a Python script.
Here is my use case. I have rsyslog configured with an output module so rsyslog can pipe log messages to my Python script.
My Python script is really trivial:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import sys
fd = open('/tmp/testrsyslogomoutput.txt', 'a')
fd.write("Receiving log message : \n%s\n" % ('-'.join(sys.stdin.readlines())))
fd.close()
If I run echo "foo" | mypythonscript.py I get "foo" in the target file /tmp/testrsyslogomoutput.txt. However, when I run it from rsyslog, messages seem to be written only when I stop/restart rsyslog (I believe some buffer is flushed at some point).
I first thought it was a problem with rsyslog, so I replaced my Python program with a shell one, without changing anything in the rsyslog configuration. The shell script works perfectly with rsyslog and, as you can see in the code below, it is just as trivial:
#! /bin/sh
cat /dev/stdin >> /tmp/testrsyslogomoutput.txt
Since my shell script works but my Python one does not, I believe I made a mistake somewhere in my Python code, but I cannot find where. If you could point me to my mistake(s), that would be great.
Thanks in advance :)
readlines will not return until it has finished reading the file. Since the pipe feeding stdin never finishes, readlines never finishes either. Stopping rsyslog closes the pipe and lets it finish.
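For example, a minimal sketch (mine, not part of the original question or answer) that handles each line as soon as it arrives instead of waiting for EOF:
#!/usr/bin/env python
import sys

out = open('/tmp/testrsyslogomoutput.txt', 'a')
# iter(readline, '') yields each line as it arrives instead of waiting for EOF.
for line in iter(sys.stdin.readline, ''):
    out.write("Receiving log message : \n%s\n" % line)
    out.flush()  # make the message visible in the file immediately
out.close()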
I'd also suspect the reason is that rsyslog does not terminate. readlines() should not return until it reaches a real EOF. But why would the shell script act differently? Perhaps the use of /dev/stdin is the reason. Try this version and see if it still runs without hanging:
#!/bin/sh
cat >> /tmp/testrsyslogomoutput.txt
If this makes a difference, you'll also have a fix: open and read /dev/stdin from python, instead of sys.stdin.
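Something along these lines (an untested sketch of that idea, reusing the paths from the question):
# Read /dev/stdin explicitly instead of sys.stdin.
with open('/dev/stdin') as pipe:
    messages = pipe.readlines()
with open('/tmp/testrsyslogomoutput.txt', 'a') as out:
    out.write("Receiving log message : \n%s\n" % '-'.join(messages))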
Edit: So cat somehow reads whatever is waiting at stdin and returns, but python blocks and waits until stdin is exhausted. Strange. You can also try replacing readlines() with a single read() followed by split("\n"), but at this point I doubt that will help.
So, forget the diagnosis and let's try a workaround: force stdin to do non-blocking I/O. The following is supposed to do that:
import fcntl, os, sys
# Add O_NONBLOCK to the stdin descriptor flags
flags = fcntl.fcntl(0, fcntl.F_GETFL)
fcntl.fcntl(0, fcntl.F_SETFL, flags | os.O_NONBLOCK)
message = sys.stdin.read().split("\n")  # Read what's waiting, in one go
fd = open('/tmp/testrsyslogomoutput.txt', 'a')
fd.write("Receiving log message : \n%s\n" % ('-'.join(message)))
fd.close()
You probably want to use that in combination with python -u. Hope it works!
If you use readline() instead, it will return on \n, though this will only write one line and then quit.
If you want to keep writing lines for as long as they arrive, you can use a simple for loop:
import sys
fd = open('/tmp/testrsyslogomoutput.txt', 'a')
for line in sys.stdin:
    fd.write("Receiving log message : \n%s\n" % line)
fd.close()
I have a script that uses a really simple file-based IPC to communicate with another program. I write a tmp file with the new content and mv it onto the IPC file to keep things atomic (the other program listens for rename events).
But here is the catch: this works two or three times, and then the exchange gets stuck.
time.sleep(10)
# check lsof => target file not opened
subprocess.run(
    "mv /tmp/tempfile /tmp/target",
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
    shell=True,
)
# check lsof => target file STILL open
time.sleep(10)
/tmp/tempfile is prepared anew before every write.
The first run results in:
$ lsof /tmp/target
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 1714 <user> 3u REG 0,18 302 10058 /tmp/target
which leaves it open until I terminate the main Python program. Consecutive runs change the content, the inode and the file descriptor as expected, but the file is still open, which I would not expect after a mv.
The file finally gets closed when the Python program containing the lines above exits.
EDIT:
Found the bug: mishandling of tempfile.mkstemp(). See: https://docs.python.org/3/library/tempfile.html#tempfile.mkstemp
I created the tempfile like so:
_fd, temp_file_path = tempfile.mkstemp()
where I discarded the file descriptor _fd, which mkstemp returns already open. I never closed it, so it stayed open even after the move. This resulted in an open target, and since I was only running lsof on the target, I did not see that the tempfile was already open. This would be the corrected version:
fd, temp_file_path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:  # wrap the raw descriptor so it actually gets closed
    f.write(content)
# ... mv/rename via shell execution/shutil/pathlib
Thank you all very much for your help and your suggestions!
I wasn't able to reproduce this behavior. I created a file /tmp/tempfile and ran a Python script with the subprocess.run call you give, followed by a long sleep. /tmp/target was not in use, nor did I see any unexpected open files in lsof -p <pid>.
(edit) I'm not surprised at this, because there's no way that your subprocess command is opening the file: mv does not open its arguments (you can check this with ltrace) and subprocess.run does not parse its argument or do anything with it besides pass it along to be exec-ed.
However, when I added some lines to open a file and write to it and then move that file, I see the same behavior you describe. This is the code:
import subprocess
out=open('/tmp/tempfile', 'w')
out.write('hello')
subprocess.run(
    "mv /tmp/tempfile /tmp/target",
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
    shell=True,
)
import time
time.sleep(5000)
In this case, the file is still open because it was never closed, and even though it's been renamed the original file handle still exists. My bet would be that you have something similar in your code that's creating this file and leaving open a handle to it.
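For comparison, here is a sketch of the same demonstration with the handle closed before the move; lsof then no longer reports /tmp/target as open:
import subprocess

# The with block closes the file before the rename, so no handle survives the mv.
with open('/tmp/tempfile', 'w') as out:
    out.write('hello')
subprocess.run("mv /tmp/tempfile /tmp/target", shell=True)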
Is there any reason why you don't use shutil.move? Otherwise it may be necessary to wait for the mv command to finish moving (or read its output) and then terminate it, running something like:
p = subprocess.Popen(...)
# wait for it to finish moving / read its output
p.terminate()
Of course terminate would be a bit harsh.
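For reference, a minimal sketch of the shutil.move route (same /tmp paths as in the question):
import os
import shutil

# Moves/renames in-process; the call only returns once the move is done,
# so there is nothing to wait for or terminate.
shutil.move('/tmp/tempfile', '/tmp/target')

# On the same filesystem, os.replace gives an atomic rename:
# os.replace('/tmp/tempfile', '/tmp/target')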
Edit: depending on your use case, rsync (which is not part of Python) may be an elegant solution to keep your data synced over the network without writing a single line of code.
You say it is still open by "mv", but your lsof result shows it open by python. As mv runs in a subprocess, check whether that PID is the same as your main Python process; maybe it is another Python process.
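For example, a quick check (my sketch): print the main program's PID and compare it with the PID column from lsof:
import os

# Compare this with the PID column of `lsof /tmp/target`.
print("main python pid:", os.getpid())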
I'm using Python 3 on Komodo, and I want there to be a time delay between the execution of commands. However, with the code below, all of the print output appears at once, although the timestamp printed after all the commands have run is two seconds later than the one printed before. Is there a way to have the first line printed, wait a second, have the second line printed, wait a second, and then have the third and fourth lines printed?
import time
from time import sleep
t = time.asctime(time.localtime(time.time()))
print(t)
time.sleep(1)
print('Good Night')
time.sleep(1)
print('I"m back')
t = time.asctime(time.localtime(time.time()))
print(t)
By default, print prints to sys.stdout, which is line-buffered when writing to an interactive terminal,¹ but block-buffered when writing to a file.
So, when you run your code with python myscript.py from your Terminal or Command Prompt, you will see each line appear as it's printed, as desired.
But if you run it with, say, python myscript.py >outfile, nothing will get written until the buffer fills up (or until the script exits, if that never happens). Normally, that's fine. But apparently, however you're running your script in Komodo, it looks like a regular file, not an interactive terminal, to Python.
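A quick way to see which case you are in (a one-line check, nothing Komodo-specific) is to ask the stream itself:
import sys

# True for an interactive terminal (line-buffered stdout),
# False for a pipe or regular file (block-buffered stdout).
print(sys.stdout.isatty())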
It's possible that you can fix that just by using or configuring Komodo differently.
I don't know much about Komodo, but I do see that there's an addon for embedding a terminal; maybe if you use that instead of sending output to the builtin JavaScript (?) console, things will work better, but I really have no idea.
Alternatively, you can make sure that the output buffer is flushed after each line by doing it manually, e.g., by passing the flush argument to print:
print(t, flush=True)
If you really want to, you can even replace print in your module with a function that always does this:
import builtins
import functools
print = functools.partial(builtins.print, flush=True)
… but you probably don't want to do that.
Alternatively, you can replace sys.stdout with a line-buffered file object over the raw stdout, just by calling open on its underlying raw file or file descriptor:
sys.stdout = open(sys.stdout.fileno(), 'w', buffering=1)
If you search around Stack Overflow or the web, you'll find a lot of suggestions to disable buffering. And you can force Python to use unbuffered output with the -u flag or the PYTHONUNBUFFERED environment variable. But that may not do any good in Python 3.²
1. As sys.stdout explains, it's just a regular text file, like those returned by open. As explained in open, this distinction is made by calling isatty.
2. Python 2's stdout is just a thin wrapper around the C stdio object, so if you open it unbuffered, there's no buffering. Python 3's stdout is a hefty wrapper around the raw file descriptor that does its own buffering and decoding (see the io docs for details), so -u will make sys.stdout.buffer.raw unbuffered, but sys.stdout itself will still be buffered, as explained in the -u docs.
I've got a problem: I have one program running in a shell that does some calculations based on user input. I can launch this program in an interactive way so it keeps asking for input, and it outputs its calculations after the user presses Enter. So it stays open inside the shell until the user types the exit word.
What I want to do is create an interface so that the user types input somewhere outside the shell and, using a pipe, FIFO and so on, the input is carried to that program, and its output comes back to this interface.
In a few words: I have a long-running process and I need to attach my interface, when needed, to its stdin and stdout.
For this kind of problem I was thinking of using a FIFO file made with the mkfifo command (we are on Unix, macOS in particular) and redirecting the program's stdin and stdout to this file:
my_program < fifofile > fifofile
But I ran into difficulties reading from and writing to this FIFO file, so I decided to use two FIFO files, one for input and one for output. So:
exec my_program < fifofile_in > fifofile_out
(don't know why I use exec for redirection, but it works... and I'm okay with exec ;) )
If I launch this command in one shell, and in another one I run:
echo -n "date()" > fifofile_in
The echoing process is successful, and if I do:
cat fifofile_out
I can see my_program's output. OK! But I don't want to deal with the shell; instead I want to use a program written by me, like this Python script:
import os, time

text = ""
OUT = os.open("sample_out", os.O_RDONLY | os.O_NONBLOCK)
out = os.fdopen(OUT)

while 1:
    #IN = open("sample_in", 'w')
    IN = os.open("sample_in", os.O_WRONLY)
    #OUT = os.fdopen(os.open("sample_out", os.O_RDONLY | os.O_NONBLOCK | os.O_APPEND))
    #OUT = open("sample_out", "r")
    print "Write your mess:"
    text = raw_input()
    if text == "exit":
        break
    os.write(IN, text)
    os.close(IN)
    #os.fsync(IN)
    time.sleep(0.05)
    try:
        while True:
            #c = os.read(OUT, 1)
            c = out.readline()
            print "Read: ", c  #, " -- ", ord(c)
            if not c:
                print "End of file"
                quit()
                #break
    except OSError as e:
        continue
        #print "OSError"
    except IOError as e:
        continue
        #print "IOError"
Where:
sample_in, sample_out are respectively fifo files used for redirections to stdin and stdout (so I write to stdin in order to give input to my_program and I read from stdout in order to get my_program output)
out is the file object returned by os.fdopen, used to read whole lines with out.readline() instead of reading char by char with os.read(OUT, 1)
time.sleep(0.05) delays a little before reading my_program's output (the calculation needs some time, otherwise there is nothing to read yet).
With this script and my_program running in the background from the shell, I'm able to write to stdin and read from stdout correctly, but getting to this code wasn't easy: after reading all the posts about FIFOs and reading/writing from/to FIFO files, I came up with this solution of closing the IN fd before reading from OUT, even though the FIFO files are different! From what I read around the internet and in Stack Overflow articles, I thought that procedure was only needed when dealing with a single FIFO file, but here I am dealing with two different ones. I think it is related to how I write into sample_in: I tried flushing to mimic the echo -n command, but it seems useless.
So I would like to ask whether this behaviour is normal, and how I can achieve the same thing as echo -n "...." > sample_in in one shell and cat sample_out in another. In particular, cat outputs data continuously as soon as I echo input into sample_in, whereas my way of reading gets it in blocks.
Thanks so much, I hope everything is clear enough!
I have the following simplified code in Python:
proc_args = "gzip --force file; echo this_still_prints > out"
post_proc = subprocess.Popen(proc_args, shell=True)
while True:
time.sleep(1)
Assume file is big enough to take several seconds to process. If I close the Python process while gzip is still running, gzip will end, but the line after the gzip command will still be executed. I'd like to know why this happens, and whether there's a way to make it not continue executing the following commands.
Thank you!
A process exiting does not automatically cause all its child processes to be killed. See this question and its related questions for much discussion of this.
gzip exits because the pipe containing its standard input gets closed when the parent exits; it reads EOF and exits. However, the shell that's running the two commands is not reading from stdin, so it doesn't notice this. So it just continues on and executes the echo command (which also doesn't read stdin).
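If you want the whole command (the shell, gzip and echo) to go away when your script is interrupted, one common approach (not something your code currently does) is to run it in its own process group and signal the group on the way out. A rough sketch, assuming Python 3:
import os
import signal
import subprocess

proc_args = "gzip --force file; echo this_still_prints > out"
# start_new_session=True runs the shell (and its children) in a new process group.
post_proc = subprocess.Popen(proc_args, shell=True, start_new_session=True)
try:
    post_proc.wait()
finally:
    if post_proc.poll() is None:  # still running, e.g. we were interrupted
        os.killpg(os.getpgid(post_proc.pid), signal.SIGTERM)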
post_proc.kill() is, I believe, what you are looking for ... but AFAIK you must call it explicitly;
see: http://docs.python.org/library/subprocess.html#subprocess.Popen.kill
I use try/finally in such cases (unfortunately you cannot use a with statement here the way you can with open() for files):
proc_args = "gzip --force file; echo this_still_prints > out"
post_proc = subprocess.Popen(proc_args, shell=True)
try:
while True:
time.sleep(1)
finally:
post_proc.kill()
I have a long-running Python script that I run from the command line. The script writes progress messages and results to the standard output. I want to capture everything the script writes to the standard output in a file, but also see it on the command line. Alternatively, I want the output to go to the file immediately, so I can use tail to view the progress. I have tried this:
python MyLongRunngingScript.py | tee log.txt
But it does not produce any output (just running the script produces output as expected). Can anyone propose a simple solution? I am using Mac OS X 10.6.4.
Edit: I am using print for output in my script.
You are on the right path, but the problem is Python buffering the output.
Fortunately there is a way to tell it not to buffer output:
python -u MyLongRunngingScript.py | tee log.txt
The fact that you don't see anything is probably due to that buffering: you only get output every 4 KB of text or so.
Instead, try something like this:
class OutputSplitter(object):
    def __init__(self, real_output, *open_files):
        self.__stdout = real_output
        self.__fds = open_files
        self.encoding = real_output.encoding

    def write(self, string):
        self.__stdout.write(string)  # don't catch exceptions on that one.
        self.__stdout.flush()
        for fd in self.__fds:
            try:
                fd.write(string)
                fd.flush()
            except IOError:
                pass  # do what you want here.

    def flush(self):
        pass  # already flushed
Then decorate sys.stdout with that class, with some code like this:
stdout_saved = sys.stdout
logfile = open("log.txt","a") # check exception on that one.
sys.stdout = OutputSplitter(stdout_saved, logfile)
That way, every output (print included) is flushed to the standard output and to the specified file. It might require some tweaking because I haven't tested that implementation.
Of course, expect to see a (small most of the time) performance penalty when printing messages.
Another simple solution could also be
python script.py > output.log
You could try doing sys.stdout.flush() occasionally in your script, and running with tee again. When stdout is redirected through to tee, it might get buffered for longer than if it's going straight to a terminal.
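For instance, a tiny sketch of that suggestion:
import sys
import time

for i in range(10):
    print("progress %d" % i)
    sys.stdout.flush()  # push the buffered text through the pipe to tee right away
    time.sleep(1)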