Read a file for ten seconds - python

I'm working with log files right now. I want to read a file line by line for a specified period of time, say 10 seconds. Is there a way to accomplish this in Python?

Run tail or tac using Popen and iterate over its output until you reach a line at which you want to stop. Here is an example snippet.
import csv
import datetime as dt
import sys
from subprocess import Popen, PIPE

filename = '/var/log/nginx/access.log'
# Command to read the file from the end
cmd = ['tail', '-r', filename] if sys.platform == 'darwin' else ['tac', filename]
# But if you want to read it from the beginning, use the following
#cmd = ['cat', filename]
proc = Popen(cmd, close_fds=True, stdout=PIPE, stderr=PIPE)
output = proc.stdout

FORMAT = [
    # 'foo',
    # 'bar',
]

def extract_log_data(line):
    '''Extract data in your log format, normalize it.
    '''
    return dict(zip(FORMAT, line))

csv.register_dialect('nginx', delimiter=' ', quoting=csv.QUOTE_MINIMAL)
lines = csv.reader(output, dialect='nginx')

started_at = dt.datetime.utcnow()
for line in lines:
    data = extract_log_data(line)
    print data
    if (dt.datetime.utcnow() - started_at) >= dt.timedelta(seconds=10):
        break

output.close()
proc.terminate()
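If you do not need tail or tac at all, a plain deadline loop over the file achieves the same thing in pure Python. This is a minimal sketch (the path, the 10-second limit and the process() call are placeholders, not part of the answer above); note that the time check only happens between lines, so a single slow read can overshoot the limit, which is what the process-based answers below avoid:

import time

def read_for(path, seconds=10):
    # Read a file line by line, stopping once `seconds` have elapsed.
    deadline = time.monotonic() + seconds
    with open(path) as f:
        for line in f:
            process(line)  # placeholder for whatever you do with each line
            if time.monotonic() >= deadline:
                break

read_for('/var/log/nginx/access.log', 10)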

Code
from multiprocessing import Process
import time

def read_file(path):
    try:
        # open the file for reading
        f = open(path, "r")
        try:
            for line in f:
                # do something
                pass
        # always close the file when leaving the try block
        finally:
            f.close()
    except IOError:
        print "Failed to open/read from file '%s'" % (path)

def read_file_limited_time(path, max_seconds):
    # init Process
    p = Process(target=read_file, args=(path,))
    # start process
    p.start()
    # for max_seconds
    for i in range(max_seconds):
        # sleep for 1 second (you may change the sleep time to suit your needs)
        time.sleep(1)
        # if the process is not alive, we can break the loop
        if not p.is_alive():
            break
    # if the process is still alive after max_seconds, kill it!
    if p.is_alive():
        p.terminate()

def main():
    path = "f1.txt"
    read_file_limited_time(path, 10)

if __name__ == "__main__":
    main()
Notes
The reason we "wake up" every second and check whether the process we started is still alive is to avoid sleeping needlessly after the process has finished: it would be a waste of time to sleep for another 9 seconds if the process ended after 1 second.
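If the one-second polling granularity is not important, Process.join() accepts a timeout and does the same job more directly. Here is a minimal sketch reusing the read_file function from above:

from multiprocessing import Process

def read_file_limited_time(path, max_seconds):
    p = Process(target=read_file, args=(path,))
    p.start()
    # join() returns as soon as read_file finishes, or after max_seconds at the latest
    p.join(max_seconds)
    # if the process is still alive after max_seconds, kill it
    if p.is_alive():
        p.terminate()
        p.join()  # reap the terminated process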

Related

How to terminate the script started with subprocess.Popen() based on specific condition

I am starting a Python script called test.py from the main script called main.py.
In the test.py I am tracking some machine learning metrics. When these metrics reach a certain threshold, I want to terminate the subprocess in which the test.py was started.
Is there a possibility to achieve this in Python if I have started this script by using:
proc = subprocess.Popen("python test.py", shell=True)
I haven't found anything in the documentation which would allow me to trigger this event on my own.
UPDATED
The easiest way to perform this is to pass the termination condition as a parameter to test.py.
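For example, a minimal sketch of that approach (the argument handling and the compute_next_metric() helper are assumptions for illustration; the original test.py takes no arguments):

# test.py (hypothetical variant): read the stop condition from the command line
import sys

threshold = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
metric = 0
while metric <= threshold:
    metric = compute_next_metric()  # placeholder for the real metric/training step
    print(metric, flush=True)

# The parent then simply waits for test.py to finish on its own:
#     proc = subprocess.Popen(["python", "test.py", "1000"])
#     proc.wait()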
Otherwise, you can use printing and reading from stdout and stdin. If you want to preserve the output and still use Popen, see below. As an example, consider a simple test.py that calculates (in a very inefficient way) some primes:
test.py
import time

primes = [2, 3]

if __name__ == "__main__":
    for p in primes:
        print(p, flush=True)
    i = 5
    while True:
        for p in primes:
            if i % p == 0:
                break
        if i % p:
            primes.append(i)
            print(i, flush=True)
        i += 2
        time.sleep(.005)
You can read the output and choose to terminate the process when you achieve the desired output. As an example, I want to get primes up to 1000.
import subprocess

proc = subprocess.Popen("python test.py",
                        stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                        bufsize=1, universal_newlines=True,
                        shell=True)

primes = []
while proc.poll() is None:
    line = proc.stdout.readline()
    if line:
        new_prime = int(line)
        primes.append(new_prime)
        if new_prime > 1000:
            print("Threshold achieved", line)
            proc.terminate()
        else:
            print("new prime:", new_prime)
print(primes)
Please notice that, since there is a delay in processing and communication, you might get one or two more primes than desired. If you want to avoid that, you would need bi-directional communication, and test.py would be more complicated. If you want to see the output of test.py on screen, you can print it and then parse it to check whether the condition is fulfilled. Another option is os.mkfifo (Linux only, not very difficult), which provides an easy communication path between two processes:
os.mkfifo version
test.py
import time
import sys

primes = [2, 3]

if __name__ == "__main__":
    outfile = sys.stdout
    if len(sys.argv) > 1:
        try:
            outfile = open(sys.argv[1], "w")
        except OSError:
            print("Could not open file")
    for p in primes:
        print(p, file=outfile, flush=True)
    i = 5
    while True:
        for p in primes:
            if i % p == 0:
                break
        if i % p:
            primes.append(i)
            print("This will be printed to screen:", i, flush=True)
            print(i, file=outfile, flush=True)  # this will go to the main process
        i += 2
        time.sleep(.005)
main file
import subprocess
import os
import tempfile

tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, 'fifo')  # Temporary filename
os.mkfifo(filename)                      # Create FIFO

proc = subprocess.Popen(["python3", "test.py", filename], shell=False)
with open(filename, 'rt', 1) as fifo:
    primes = []
    while proc.poll() is None:
        line = fifo.readline()
        if line:
            new_prime = int(line)
            primes.append(new_prime)
            if new_prime > 1000:
                print("Threshold achieved", line)
                proc.terminate()
            else:
                print("new prime:", new_prime)
    print(primes)
os.remove(filename)
os.rmdir(tmpdir)

How to execute commands on the Linux platform with different threads in Python?

I want to execute two separate commands on the command prompt and read each command's output in Python. My approach is to execute these commands at a certain time interval, i.e. every x seconds.
I have two commands, say command1 and command2. command1 takes at most 30 seconds to print its output to the console; command2 takes at most 10 seconds.
I want to execute command1 and command2 in a new thread every time interval, i.e. every x seconds.
program code -
import os, sys
import thread, threading
import time

def read_abc_data():
    with open("abc.txt", "a") as myfile:
        output = os.popen('command1').read()
        myfile.write(output + "\n\n")

def abc(threadName):
    while True:
        threading.Thread(target=read_abc_data).start()
        time.sleep(10)

def read_pqr_data():
    with open("pqr.txt", "a") as myfile:
        output = os.popen('command2').read()
        myfile.write(output + "\n\n")

def pqr(threadName):
    while True:
        threading.Thread(target=read_pqr_data).start()
        time.sleep(10)

if __name__ == "__main__":
    try:
        thread.start_new_thread(abc, ("Thread-1",))
        thread.start_new_thread(pqr, ("Thread-2",))
    except:
        print "Error: unable to start thread"
    while 1:
        pass
Currently I have given a 10-second sleep (delay) between executions of the read_abc_data() and read_pqr_data() functions. After running this program for 1 minute, the abc.txt file is still empty. I think the reason is that command1 didn't produce its complete output within 10 seconds. Right?
I want the abc.txt and pqr.txt files to contain the commands' output. Am I missing something?
Since the commands produce output while they run, you are better off reading their output as a stream. This can be done with subprocess and readline:
import subprocess

def read_abc_data():
    with open("abc.txt", "a") as myfile:
        process = subprocess.Popen('command1', stdout=subprocess.PIPE)
        for line in iter(process.stdout.readline, ''):
            myfile.write(line)
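A self-contained usage sketch, keeping the question's 10-second scheduling (shell=True and the wait() call are additions I am assuming here, since 'command1' stands for a shell command line in the question):

import subprocess
import threading
import time

def read_abc_data():
    with open("abc.txt", "a") as myfile:
        process = subprocess.Popen('command1', stdout=subprocess.PIPE, shell=True)
        for line in iter(process.stdout.readline, ''):
            myfile.write(line)
        process.wait()  # reap the child once its output is exhausted

def abc():
    # launch a streaming reader every 10 seconds, as in the question's loop
    while True:
        threading.Thread(target=read_abc_data).start()
        time.sleep(10)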
Try this code. It uses locks to lock the file resource during updates, and uses a single function that gets executed as part of thread execution.
import time
import subprocess
import threading
from thread import start_new_thread

command1 = "ls"
command2 = "date"
file_name1 = "/tmp/one"
file_name2 = "/tmp/two"

def my_function(command, file_name, lock):
    process_obj = subprocess.Popen(command, stdout=subprocess.PIPE)
    command_output, command_error = process_obj.communicate()
    print command_output
    lock.acquire()
    with open(file_name, 'a+') as f:
        f.write(command_output)
    print 'Writing'
    lock.release()

if __name__ == '__main__':
    keep_running = True
    lock1 = threading.Lock()
    lock2 = threading.Lock()
    while keep_running:
        try:
            start_new_thread(my_function, (command1, file_name1, lock1))
            start_new_thread(my_function, (command2, file_name2, lock2))
            time.sleep(10)
        except KeyboardInterrupt, e:
            keep_running = False

python threading.timer set time limit when program runs out of time

I have some questions related to setting the maximum running time of a function in Python. In fact, I would like to use pdfminer to convert the .pdf files to .txt.
The problem is that very often some files cannot be decoded and take an extremely long time, so I want to use threading.Timer() to limit the conversion time for each file to 5 seconds. In addition, I run under Windows, so I cannot use the signal module for this.
I succeeded in running the conversion code with pdfminer.convert_pdf_to_txt() (imported as "c" in my code), but I am not sure that threading.Timer() works in the following code. (I don't think it properly constrains the time for each conversion.)
In summary, I want to:
Convert the pdf to txt
The time limit for each conversion is 5 seconds; if it runs out of time, throw an exception and save an empty file
Save all the txt files under the same folder
If there are any exceptions/errors, still save the file but with empty content.
Here is the current code:
import converter as c
import os
import timeit
import time
import threading
import thread

yourpath = 'D:/hh/'

def iftimesout():
    print("no")
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write("")

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            timer = threading.Timer(5.0, iftimesout)
            timer.start()
            t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g = str(a.split("\\")[1])
            with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
            print("yes")
            timer.cancel()
        except KeyboardInterrupt:
            raise
        except:
            for name in files:
                t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g = str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")
I finally figured it out!
First of all, define a function to call another function with a limited timeout:
import multiprocessing

def call_timeout(timeout, func, args=(), kwargs={}):
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
    elif not callable(func):
        print("{} is not callable!".format(type(func)))
    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
        p.start()
        p.join(timeout)
        if p.is_alive():
            p.terminate()
            return False
        else:
            return True
What does the function do?
Check that the timeout and the function are valid
Start the given function in a new process, which has some advantages over threads
Block the program for up to x seconds (p.join(timeout)) and allow the function to execute in this time
When join() returns, check whether the function is still running
Yes: Terminate it and return False
No: Fine, no timeout! Return True
We can test it with time.sleep():
import time

finished = call_timeout(2, time.sleep, args=(1, ))

if finished:
    print("No timeout")
else:
    print("Timeout")
We run a function which needs one second to finish, with the timeout set to two seconds:
No timeout
If we run time.sleep(10) and set the timeout to two seconds:
finished = call_timeout(2, time.sleep, args=(10, ))
Result:
Timeout
Notice the program stops after two seconds without finishing the called function.
Your final code will look like this:
import converter as c
import os
import timeit
import time
import multiprocessing

yourpath = 'D:/hh/'

def call_timeout(timeout, func, args=(), kwargs={}):
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
    elif not callable(func):
        print("{} is not callable!".format(type(func)))
    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
        p.start()
        p.join(timeout)
        if p.is_alive():
            p.terminate()
            return False
        else:
            return True

def convert(root, name, g, t):
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g = str(a.split("\\")[1])
            finished = call_timeout(5, convert, args=(root, name, g, t))
            if finished:
                print("yes")
            else:
                print("no")
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")
        except KeyboardInterrupt:
            raise
        except:
            for name in files:
                t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g = str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")
The code should be easy to understand; if not, feel free to ask.
I really hope this helps (it took some time for us to get it right ;))!
Check the following code and let me know in case of any issues. Also let me know whether you still want to use the forced-termination feature (KeyboardInterrupt).
import os
import time
from multiprocessing import Process
from converter import convert_pdf_to_txt  # the asker's pdfminer-based conversion helper

path_to_pdf = "C:\\Path\\To\\Main\\PDFs"    # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\" # There is "\\" at the end of path!
TIMEOUT = 5        # seconds
TIME_TO_CHECK = 1  # seconds

# Save PDF content into a text file, or leave an empty file in case of conversion timeout
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(path_to + '\\' + my_pdf))
        except:
            print "Error. %s file wasn't converted" % my_pdf

# Convert file_name.pdf from the PDF folder to file_name.txt in the Text folder
def text_file_name(pdf_file):
    return path_to_text + (pdf_file.split('.')[0] + ".txt")

if __name__ == "__main__":
    # for each pdf file in the PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that the text file is created
            while not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if not run out of $TIMEOUT and the file is still empty: wait for $TIME_TO_CHECK,
                # else: terminate the worker and start a new iteration
                if count < TIMEOUT and os.stat(text_file_name(my_file)).st_size == 0:
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break

strace a python function

Is it possible to strace a Python function for opened files, and differentiate whether they were opened by Python itself or by a subprocess?
read_python, read_external = [], []

@strace_read(read_python, read_external)
def test():
    file = open("foo.txt", "r")
    subprocess.call(["cat", "bar.txt"])

for file in read_python:
    print("python: ", file)
for file in read_external:
    print("external: ", file)
So the output is as:
>>> python: foo.txt
>>> external: bar.txt
I'm most interested in using a decorator. Differentiating isn't a priority.
Conceptually, my best guess is to replace instances of load_function(open) with wrappers ... actually, I have no idea, there are too many ways to access open.
I'd solve it in a much simpler way but with similar result. Instead of figuring out how to enable strace on a single function:
Create a decorator like this (untested):
def strace_mark(f):
    def wrapper(*args, **kwargs):
        try:
            open('function-%s-start' % f.__name__, 'r')
        except IOError:
            pass
        ret = f(*args, **kwargs)
        try:
            open('function-%s-end' % f.__name__, 'r')
        except IOError:
            pass
        return ret
    return wrapper
Run the whole app under strace -e file.
Get only the parts between calls open(function-something-start) and open(function-something-end).
If you do strace -f, you get the python/external separation for free. Just look at what pid calls the function.
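For example, a rough post-processing sketch (the trace file name, the -o option and the marker pattern are assumptions for illustration, not part of the answer): run the program as strace -f -e trace=open,openat -o trace.log python app.py, then filter the log for opens made between the marker opens.

import re

def opens_between_markers(trace_path, func_name):
    # Yield (pid, path) for open()/openat() calls made between the start/end marker opens.
    start = 'function-%s-start' % func_name
    end = 'function-%s-end' % func_name
    inside = False
    with open(trace_path) as trace:
        for line in trace:
            match = re.match(r'(\d+)\s+open(?:at)?\(.*?"([^"]+)"', line)
            if not match:
                continue
            pid, path = match.groups()
            if start in path:
                inside = True
            elif end in path:
                inside = False
            elif inside:
                yield pid, path

for pid, path in opens_between_markers("trace.log", "test"):
    print(pid, path)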
This is the solution I used:
#!/usr/bin/env python3

import multiprocessing
import selectors
import os
import array
import fcntl
import termios
import signal
import subprocess
import decorator
import locale
import io
import codecs
import re
import collections

def strace(function):
    StraceReturn = collections.namedtuple("StraceReturn", ["return_data", "pid", "strace_data"])

    def strace_filter(stracefile, pid, exclude_system=False):
        system = ( "/bin"
                 , "/boot"
                 , "/dev"
                 , "/etc"
                 , "/lib"
                 , "/proc"
                 , "/root"
                 , "/run"
                 , "/sbin"
                 , "/srv"
                 , "/sys"
                 , "/tmp"
                 , "/usr"
                 , "/var"
                 )
        encoding = locale.getpreferredencoding(False)
        for line in stracefile:
            match = re.search(r'^(?:\[pid\s+(\d+)\]\s+)?open\(\"((?:\\x[0-9a-f]{2})+)\",', line, re.IGNORECASE)
            if match:
                p, f = match.groups(pid)
                f = codecs.escape_decode(f.encode("ascii"))[0].decode(encoding)
                if exclude_system and f.startswith(system):
                    continue
                yield (p, f)

    def strace_reader(conn_parent, conn_child, barrier, pid):
        conn_parent.close()
        encoding = locale.getpreferredencoding(False)
        strace_args = ["strace", "-e", "open", "-f", "-s", "512", "-xx", "-p", str(pid)]
        process_data = io.StringIO()
        process = subprocess.Popen\
            ( strace_args
            , stdout = subprocess.DEVNULL
            , stderr = subprocess.PIPE
            , universal_newlines = True
            )
        selector = selectors.DefaultSelector()
        selector.register(process.stderr, selectors.EVENT_READ)
        selector.select()
        barrier.wait()
        selector.register(conn_child, selectors.EVENT_READ)
        while len(selector.get_map()):
            events = selector.select()
            for key, mask in events:
                if key.fd == conn_child.fileno():
                    conn_child.recv()
                    selector.unregister(key.fd)
                    process.terminate()
                    try:
                        process.wait(5)
                    except subprocess.TimeoutExpired:
                        process.kill()
                        process.wait()
                else:
                    ioctl_buffer = array.array("i", [0])
                    try:
                        fcntl.ioctl(key.fd, termios.FIONREAD, ioctl_buffer)
                    except OSError:
                        read_bytes = 1024
                    else:
                        read_bytes = max(1024, ioctl_buffer[0])
                    data = os.read(key.fd, read_bytes)
                    if data:
                        # store all data, simpler but not as memory-efficient
                        # as:
                        #     result, leftover_line = strace_filter\
                        #         ( leftover_line + data.decode(encoding)
                        #         , pid
                        #         )
                        #     process_data.append(result)
                        # with, after this loop, a final:
                        #     result = strace_filter(leftover_line + "\n", pid)
                        #     process_data.append(result)
                        process_data.write(data.decode(encoding))
                    else:
                        selector.unregister(key.fd)
        selector.close()
        process_data.seek(0, io.SEEK_SET)
        for pidfile in strace_filter(process_data, pid):
            conn_child.send(pidfile)
        conn_child.close()

    def strace_wrapper(function, *args, **kw):
        strace_data = list()
        barrier = multiprocessing.Barrier(2)
        conn_parent, conn_child = multiprocessing.Pipe(duplex = True)
        process = multiprocessing.Process\
            ( target=strace_reader
            , args=(conn_parent, conn_child, barrier, os.getpid())
            )
        process.start()
        conn_child.close()
        barrier.wait()
        function_return = function()
        conn_parent.send(None)
        while True:
            try:
                strace_data.append(conn_parent.recv())
            except EOFError:
                break
        process.join(5)
        if process.is_alive():
            process.terminate()
            process.join(5)
            if process.is_alive():
                os.kill(process.pid, signal.SIGKILL)
                process.join()
        conn_parent.close()
        return StraceReturn(function_return, os.getpid(), strace_data)

    return decorator.decorator(strace_wrapper, function)

@strace
def test():
    print("Entering test()")
    process = subprocess.Popen("cat +μυρτιὲς.txt", shell=True)
    f = open("test\"test", "r")
    f.close()
    process.wait()
    print("Exiting test()")
    return 5

print(test())
Note that any information strace generates after the termination event will be collected. To avoid that, use a while not signaled loop, and terminate the subprocess after the loop (the FIONREAD ioctl is a holdover from this case; I didn't see any reason to remove it).
In hindsight, the decorator could have been greatly simplified had I used a temporary file, rather than multiprocessing/pipe.
A child process is forked to then fork strace - in other words, strace is tracing its grandparent. Some linux distributions only allow strace to trace its children. I'm not sure how to work around this restriction - having the main program continue executing in the child fork (while the parent execs strace) is probably a bad idea - the program will trade PIDs like a hot potato if the decorated functions are used too often.
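If you need to check whether that restriction applies on a given machine, one way (assuming the kernel uses the Yama security module, which is what enforces this on most such distributions) is to read /proc/sys/kernel/yama/ptrace_scope:

# 0 means a process may trace any other process running as the same user;
# 1 restricts tracing to direct children, which is the limitation described above.
try:
    with open("/proc/sys/kernel/yama/ptrace_scope") as f:
        print("ptrace_scope =", f.read().strip())
except FileNotFoundError:
    print("Yama ptrace_scope not available on this kernel")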

File Processor using multiprocessing

I am writing a file processor that can (hopefully) parse arbitrary files and perform arbitrary actions on the parsed contents. The file processor needs to run continuously. The basic idea that I am following is
Each file will have two associated processes (one for reading, the other for parsing and writing somewhere else)
The reader will read lines into a common buffer (say a Queue) till EOF or until the buffer is full, then wait (sleep)
The writer will read from the buffer, parse the contents, and write them to (say) a DB until the buffer is empty, then wait (sleep)
Interrupting the main program will cause the reader/writer to exit safely (the buffer can be discarded without writing)
The program runs fine, but sometimes the Writer initializes first and finds the buffer empty, so it goes to sleep. The Reader then fills the buffer and goes to sleep too, so for one sleep_interval my code does nothing. To get around that, I tried using a multiprocessing.Event() to signal to the writer that the buffer has some entries which it may process.
My code is
import multiprocessing
import time
import sys
import signal
import Queue

class FReader(multiprocessing.Process):
    """
    A basic file reader class
    It spawns a new process that shares a queue with the writer process
    """
    def __init__(self, queue, fp, sleep_interval, read_offset, event):
        self.queue = queue
        self.fp = fp
        self.sleep_interval = sleep_interval
        self.offset = read_offset
        self.fp.seek(self.offset)
        self.event = event
        self.event.clear()
        super(FReader, self).__init__()

    def myhandler(self, signum, frame):
        self.fp.close()
        print "Stopping Reader"
        sys.exit(0)

    def run(self):
        signal.signal(signal.SIGINT, self.myhandler)
        signal.signal(signal.SIGCLD, signal.SIG_DFL)
        signal.signal(signal.SIGILL, self.myhandler)
        while True:
            sleep_now = False
            if not self.queue.full():
                print "READER:Reading"
                m = self.fp.readline()
                if not self.event.is_set():
                    self.event.set()
                if m:
                    self.queue.put((m, self.fp.tell()), block=False)
                else:
                    sleep_now = True
            else:
                print "Queue Full"
                sleep_now = True
            if sleep_now:
                print "Reader sleeping for %d seconds" % self.sleep_interval
                time.sleep(self.sleep_interval)

class FWriter(multiprocessing.Process):
    """
    A basic file writer class
    It spawns a new process that shares a queue with the reader process
    """
    def __init__(self, queue, session, sleep_interval, fp, event):
        self.queue = queue
        self.session = session
        self.sleep_interval = sleep_interval
        self.offset = 0
        self.queue_offset = 0
        self.fp = fp
        self.dbqueue = Queue.Queue(50)
        self.event = event
        self.event.clear()
        super(FWriter, self).__init__()

    def myhandler(self, signum, frame):
        #self.session.commit()
        self.session.close()
        self.fp.truncate()
        self.fp.write(str(self.offset))
        self.fp.close()
        print "Stopping Writer"
        sys.exit(0)

    def process_line(self, line):
        #Do not process comments
        if line[0] == '#':
            return None
        my_list = []
        split_line = line.split(',')
        my_list = split_line
        return my_list

    def run(self):
        signal.signal(signal.SIGINT, self.myhandler)
        signal.signal(signal.SIGCLD, signal.SIG_DFL)
        signal.signal(signal.SIGILL, self.myhandler)
        while True:
            sleep_now = False
            if not self.queue.empty():
                print "WRITER:Getting"
                line, offset = self.queue.get(False)
                #Process the line just read
                proc_line = self.process_line(line)
                if proc_line:
                    #Must write it to DB. Put it into DB Queue
                    if self.dbqueue.full():
                        #DB Queue is full, put data into DB before putting more data
                        self.empty_dbqueue()
                    self.dbqueue.put(proc_line)
                    #Keep a track of the maximum offset in the queue
                    self.queue_offset = offset if offset > self.queue_offset else self.queue_offset
            else:
                #Looks like writing queue is empty. Just check if DB Queue is empty too
                print "WRITER: Empty Read Queue"
                self.empty_dbqueue()
                sleep_now = True
            if sleep_now:
                self.event.clear()
                print "WRITER: Sleeping for %d seconds" % self.sleep_interval
                #time.sleep(self.sleep_interval)
                self.event.wait(5)

    def empty_dbqueue(self):
        #The DB Queue has many objects waiting to be written to the DB. Lets write them
        print "WRITER:Emptying DB QUEUE"
        while True:
            try:
                new_line = self.dbqueue.get(False)
            except Queue.Empty:
                #Write the new offset to file
                self.offset = self.queue_offset
                break
            print new_line[0]

def main():
    write_file = '/home/xyz/stats.offset'
    wp = open(write_file, 'r')
    read_offset = wp.read()
    try:
        read_offset = int(read_offset)
    except ValueError:
        read_offset = 0
    wp.close()
    print read_offset

    read_file = '/var/log/somefile'
    file_q = multiprocessing.Queue(100)
    ev = multiprocessing.Event()

    new_reader = FReader(file_q, open(read_file, 'r'), 30, read_offset, ev)
    new_writer = FWriter(file_q, open('/dev/null'), 30, open(write_file, 'w'), ev)

    new_reader.start()
    new_writer.start()

    try:
        new_reader.join()
        new_writer.join()
    except KeyboardInterrupt:
        print "Closing Master"
        new_reader.join()
        new_writer.join()

if __name__ == '__main__':
    main()
The dbqueue in the Writer is for batching database writes, and for each line I keep that line's offset. The maximum offset written to the DB is stored in the offset file on exit so that I can pick up where I left off on the next run. The DB object (session) is just '/dev/null' for the demo.
Previously rather than do
self.event.wait(5)
I was doing
time.sleep(self.sleep_interval)
which (as I have said) worked well but introduced a little delay; with it, the processes exited cleanly.
Now, on pressing Ctrl-C in the main process, the reader exits but the writer throws an OSError:
^CStopping Reader
Closing Master
Stopping Writer
Process FWriter-2:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "FileParse.py", line 113, in run
self.event.wait(5)
File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 303, in wait
self._cond.wait(timeout)
File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 212, in wait
self._wait_semaphore.acquire(True, timeout)
OSError: [Errno 0] Error
I know event.wait() somehow blocks the code, but I can't figure out how to overcome this. I tried wrapping self.event.wait(5) and sys.exit() in a try: except OSError: block, but that only makes the program hang forever.
I am using Python 2.6.
I think it would be better to use the queue's blocking timeout for the Writer class, i.e. Queue.get(True, 5); then, if something is put into the queue during that interval, the Writer wakes up immediately. The Writer loop would then look something like this:
while True:
    sleep_now = False
    try:
        print "WRITER:Getting"
        line, offset = self.queue.get(True, 5)
        #Process the line just read
        proc_line = self.process_line(line)
        if proc_line:
            #Must write it to DB. Put it into DB Queue
            if self.dbqueue.full():
                #DB Queue is full, put data into DB before putting more data
                self.empty_dbqueue()
            self.dbqueue.put(proc_line)
            #Keep a track of the maximum offset in the queue
            self.queue_offset = offset if offset > self.queue_offset else self.queue_offset
    except Queue.Empty:
        #Looks like the read queue is empty. Just check if DB Queue is empty too
        print "WRITER: Empty Read Queue"
        self.empty_dbqueue()
