Python memory leak using Yocto

I'm running a Python script on a Raspberry Pi that constantly checks a Yocto button; when the button is pressed, the script stores data from a different sensor in a database.
A snippet of the code that runs constantly:
# when all is set and done, run the program
Active = True
while Active:
    if ResponseType == "b":
        while Active:
            try:
                if GetButtonPressed(ResponseValue):
                    DoAllSensors()
                    time.sleep(5)
                else:
                    time.sleep(0.5)
            except KeyboardInterrupt:
                Active = False
            except Exception, e:
                print str(e)
                print "exception raised, continuing after 10 seconds"
                time.sleep(10)
The GetButtonPressed(ResponseValue) function looks like the following:
def GetButtonPressed(number):
    global buttons
    if ModuleCheck():
        if buttons[number - 1].get_calibratedValue() < 300:
            return True
    else:
        print "module not online"
    return False
def ModuleCheck():
    global moduleb
    return moduleb.isOnline()
I'm not quite sure what might be going wrong, but it takes about an hour before the RPi runs out of memory.
Memory usage grows constantly, even though the button is only pressed once every 15 minutes or so.
That alone tells me the problem must be in the code displayed above.

The problem is that the yocto_api.YAPI object will continue to accumulate _Event objects in its _DataEvents list (a class-wide attribute) until you call YAPI.HandleEvents. If you're not using the API's callbacks, it's easy to think (I did, for hours) that you never need to call this. The API docs aren't at all clear on the point:
If your program includes significant loops, you may want to include a call to this function to make sure that the library takes care of the information pushed by the modules on the communication channels. This is not strictly necessary, but it may improve the reactivity of the library for the following commands.
I did some playing around with API-level callbacks before I decided to periodically poll the sensors in my own code, and it's possible that some setting got left enabled in them that is causing these events to accumulate. If that's not the case, I can't imagine why they would say calling YHandleEvents is "not strictly necessary," unless they make ARM devices with unlimited RAM in Switzerland.
Here's the magic static method that thou shalt call periodically, no matter what. I'm doing so once every five seconds and that is taking care of the problem without loading down the system at all. API code that would accumulate unwanted events still smells to me, but it's time to move on.
#noinspection PyUnresolvedReferences
@staticmethod
def HandleEvents(errmsgRef=None):
    """
    Maintains the device-to-library communication channel.
    If your program includes significant loops, you may want to include
    a call to this function to make sure that the library takes care of
    the information pushed by the modules on the communication channels.
    This is not strictly necessary, but it may improve the reactivity
    of the library for the following commands.
    This function may signal an error in case there is a communication problem
    while contacting a module.

    @param errmsgRef : a string passed by reference to receive any error message.

    @return YAPI.SUCCESS when the call succeeds.

    On failure, throws an exception or returns a negative error code.
    """
    errBuffer = ctypes.create_string_buffer(YAPI.YOCTO_ERRMSG_LEN)
    #noinspection PyUnresolvedReferences
    res = YAPI._yapiHandleEvents(errBuffer)
    if YAPI.YISERR(res):
        if errmsgRef is not None:
            #noinspection PyAttributeOutsideInit
            errmsgRef.value = YByte2String(errBuffer.value)
        return res
    while len(YAPI._DataEvents) > 0:
        YAPI.yapiLockFunctionCallBack(errmsgRef)
        if not len(YAPI._DataEvents):
            YAPI.yapiUnlockFunctionCallBack(errmsgRef)
            break
        ev = YAPI._DataEvents.pop(0)
        YAPI.yapiUnlockFunctionCallBack(errmsgRef)
        ev.invokeData()
    return YAPI.SUCCESS
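For reference, here's roughly how to slot that call into a polling loop like the question's. This is only a sketch: Active, GetButtonPressed, ResponseValue, and DoAllSensors come from the code above, and YRefParam is the API's by-reference string holder.

import time
from yocto_api import YAPI, YRefParam

errmsg = YRefParam()
lastHandled = time.time()
while Active:
    try:
        if GetButtonPressed(ResponseValue):
            DoAllSensors()
            time.sleep(5)
        else:
            time.sleep(0.5)
        # Drain the pending _DataEvents at most once every 5 seconds.
        if time.time() - lastHandled >= 5:
            YAPI.HandleEvents(errmsg)
            lastHandled = time.time()
    except KeyboardInterrupt:
        Active = False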

Related

Killing a cv2 read() call after N seconds

I'm desperate.
My code reads every nth frame of a video; sometimes the code just stops for no reason, with no error.
So I decided to somehow raise an error.
The thing is, the code does raise an error, but then it ignores it for some reason and just carries on as normal.
I've provided a code block below in which exactly the same method works.
handler:
def handler(signum, frame):
    print("error")  # This is printed
    raise Exception('time out')  # I guess this is getting raised
The part of the code I want to wrap:
for i in range(0, int(frame_count), nframe):  # basically loads every nth frame from the video
    try:
        frame = video.set(1, i)
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(1)  # At this point the 'handler' did raise the error, but it did not kill this 'try' block.
        _n, frame = video.read()  # This line sometimes blocks for an infinite amount of time, and I want to wrap it
    except Exception as e:
        print('test')  # Code does not get here, yet the 'handler' does raise an exception
        raise e
# Here I need to return False, or raise an error, but the code just does not get here.
An example where exactly the same method will work:
import signal
import time

def handler(signum, frame):
    raise Exception('time out')

def function():
    try:
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(5)  # 5 seconds till raise
        time.sleep(10)  # does not get here; an Exception is raised after 5 seconds
    except Exception as e:
        raise e  # This will indeed work
My guess is that the read() call is blocked somewhere inside C code. The signal handler runs and schedules the exception, but the exception can't be delivered until the Python interpreter regains control. This is a limitation documented in the signal module:
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.
One possible workaround is to read frames on a separate process using the multiprocessing module, and return them to the main process using a multiprocessing.Queue (from which you can get with a timeout). However, there will be extra overhead in sending the frames between processes.
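A rough sketch of that approach (the function and variable names here are illustrative, not from the question; the per-frame timeout and the child process are the important parts):

import multiprocessing as mp
import queue

import cv2 as cv

def frame_reader(path, nframe, q):
    # Runs in a child process, so a hang inside read() only blocks this process.
    video = cv.VideoCapture(path)
    frame_count = int(video.get(cv.CAP_PROP_FRAME_COUNT))
    for i in range(0, frame_count, nframe):
        video.set(cv.CAP_PROP_POS_FRAMES, i)
        ok, frame = video.read()
        if not ok:
            break
        q.put(frame)
    q.put(None)  # sentinel: end of video

def read_frames(path, nframe, timeout=5):
    q = mp.Queue(maxsize=4)
    p = mp.Process(target=frame_reader, args=(path, nframe, q))
    p.start()
    try:
        while True:
            try:
                frame = q.get(timeout=timeout)
            except queue.Empty:
                raise RuntimeError('frame read timed out')
            if frame is None:
                break
            yield frame
    finally:
        p.terminate()  # kills the reader even if it's stuck in C code

Note the overhead: every frame is pickled and sent through a pipe between the two processes.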
Another approach might be to try and avoid the root of the problem. OpenCV has different video backends (V4L, GStreamer, ffmpeg, ...); one of them might work where another doesn't. Using the second argument to the VideoCapture constructor, you can indicate a preference for which backend to use:
cv.VideoCapture(..., cv.CAP_FFMPEG)
See the documentation for the full list of backends. Depending on your platform and OpenCV build, not all of them will be available.

Segmentation fault when initializing array

I am getting a segmentation fault when initializing an array.
I have a callback function that runs when an RFID tag gets read:
IDS = []

def readTag(e):
    epc = str(e.epc, 'utf-8')
    if epc not in IDS:
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
    for x in vals:
        IDS.append([vals[0], vals[1], vals[2]])
    for x in IDS:
        print(x[0])
    r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
    r.set_region("NA")
    r.start_reading(readTag, on_time=1500)
    input("press any key to stop reading: ")
    r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know this because when I replace it with a print call, the program runs just fine. I've tried using different types for the array elements (integers), creating an array of the same objects outside the append call, etc. For some reason, just creating a list inside the readTag function causes the segmentation fault, e.g. row = [1, 2, 3].
Does anyone know what causes this error and how I can fix it? To be a little more specific: the readTag function works fine for the first two calls (only ever two), and then it crashes. The Reader object that provides start_reading() is from the mercury-api.
This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory, so a segfault occurs when it invokes your callback function readTag(e). I don't think the behavior that you want is supported by that library.
To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general pure-Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter, or some extension that you're using. That's not to say pure-Python won't break, it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reading() method you're using is "asynchronous", meaning it invokes a new thread or process and returns immediately, and the background thread then continues to call your callback each time something is scanned.
I don't really know enough about the nitty-gritty of CPython to say exactly what's going on, but you've declared IDS = [] as a global variable, and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the behalf of the developer. If you really need asynchronous reads it's worth sending them an issue report.
Otherwise, considering you're just waiting for input you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead with something like:
try:
    while True:
        readTags(r.read(timeout=10))
except KeyboardInterrupt:  # break loop on SIGINT (Ctrl-C)
    pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly (a sketch follows). And if you're writing more than just a quick script, you probably want to use threads to interrupt the loop properly, as relying on SIGINT is pretty hacky.
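For instance, a minimal sketch of the adapted callback, assuming each tag object returned by r.read() exposes the same .epc attribute used in the question:

import datetime

def readTags(tags):
    # Same bookkeeping as readTag, applied to each tag in the list.
    for e in tags:
        epc = str(e.epc, 'utf-8')
        if epc not in (row[0] for row in IDS):  # compare against the stored EPC column
            now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
            IDS.append([epc, now, "name.instrument"])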

Python: PySerial disconnecting from device randomly

I have a process that runs data acquisition using PySerial. It's working fine now, but there's a weird thing I had to do to make it work continuously, and I'm not sure this is normal, so I'm asking this question.
What happens: the connection seems to drop now and then, around once every 30-60 minutes, with big error bars (it could go for hours and be OK, but sometimes it fails often).
My question: Is this standard?
My temporary solution: I wrote a simple "reopen" function that looks like this:
def ReopenDevice(devObject):
    try:
        devObject.close()
        devObject.open()
    except Exception as e:
        print("Error while trying to connect to device " + devObject.port + ". The error says: " + str(e))
        time.sleep(2)
And what I do is that if data pulling fails for 2 minutes, I reopen the device with this function, and it continues working well with no problems.
My program model: It's a GUI program, where the user clicks something like "Start", and that button does some preparations and runs a function through multiprocessing.Process() that starts with:
devObj = serial.Serial()
#... other params
devObj.open()
and that function then runs a while loop that keeps polling data with something like:
bytesToRead = devObj.inWaiting()
if bytesToRead != 0:
    buffer = decodeString(devObj.read(bytesToRead))
    # process buffer and push it to a list...
The way I know the problem has happened is that devObj.inWaiting() keeps returning zero, no matter how much data is waiting on the device!
Is this behavior expected, and should it always be accounted for, whether or not it actually happens?
The problem was reduced a lot after I stopped calling inWaiting() so frequently. I kept the reconnect logic anyway, to ensure that my program never fails. Thanks to "Kobi K" for suggesting the likely cause of the problem.
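For what it's worth, a minimal sketch of that watchdog logic (the port name and the 2-minute threshold are illustrative; in PySerial 3.x, inWaiting() is spelled in_waiting):

import time
import serial

RECONNECT_AFTER = 120  # seconds without data before reopening the port

devObj = serial.Serial('/dev/ttyUSB0', 9600, timeout=0)
lastData = time.time()

while True:
    bytesToRead = devObj.in_waiting
    if bytesToRead:
        buffer = devObj.read(bytesToRead)
        # ... decode buffer and push it to a list ...
        lastData = time.time()
    elif time.time() - lastData > RECONNECT_AFTER:
        ReopenDevice(devObj)  # the reopen helper from above
        lastData = time.time()
    else:
        time.sleep(0.05)  # polling delay; also avoids hammering in_waiting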

Extra characters showing up after peeking at an MSMQ message

I am in the process of upgrading an older legacy system that is using Biztalk, MSMQs, Java, and python.
Currently, I am trying to upgrade a particular piece of the project which, when complete, will allow me to begin an in-place replacement of many of the legacy systems.
What I have done so far is recreate the legacy system in a newer version of Biztalk (2010) and on a machine that isn't on its last legs.
Anyway, the problem I am having is that there is a piece of Python code that picks up a message from an MSMQ and places it on another server. This code has been in place on our legacy system since 2004 and has worked ever since; as far as I know, it has never been changed.
Now that I have rebuilt this, I am getting errors on the remote server and, after checking a few things out and eliminating many possible problems, I have established that the error occurs somewhere around the time the Python code picks the message up from the MSMQ.
The error can be reproduced with just two messages. Please note that I am using sample XMLs here, as the actual ones are pretty long.
Message one:
<xml>
    <field1>Text 1</field1>
    <field2>Text 2</field2>
</xml>
Message two:
<xml>
    <field1>Text 1</field1>
</xml>
Now if I submit message one followed by message two to the MSMQ, they both appear correctly on the queue. If I then call the Python script, message one is returned correctly but message two gains extra characters.
Post-Python message two:
<xml>
    <field1>Text 1</field1>
</xml>1>Te
I thought at first that there might be scoping problems within the Python code, but I have gone through it as well as I can and found none. I must admit, though, that this project is the first time I've looked seriously at Python code.
The Python code first peeks at a message and then receives it. The message already shows the same extra characters at peek time as it does when received.
Also, this error only shows up when going from a longer message to a shorter one.
I would welcome any suggestions of things that might be wrong, or things I could do to identify the problem.
I have googled and searched and gone a little crazy. This is holding an entire project up, as we can't begin replacing the older systems until this piece is in place to act as the new bridge.
Thanks for taking the time to read through my problem.
Edit: Here's the relevant Python code:
import sys
import pythoncom
from win32com.client import gencache

msmq = gencache.EnsureModule('{D7D6E071-DCCD-11D0-AA4B-0060970DEBAE}', 0, 1, 0)

def Peek(queue):
    qi = msmq.MSMQQueueInfo()
    qi.PathName = queue
    myq = qi.Open(msmq.constants.MQ_PEEK_ACCESS, 0)
    if myq.IsOpen:
        # Don't lose this pythoncom.Empty thing (it took a while)
        tmp = myq.Peek(pythoncom.Empty, pythoncom.Empty, 1)
        myq.Close()
        return tmp
The function calls this piece of code. I don't have access to the code that calls it until Monday, but the call is basically:
msg = MSMQ.Peek(queue)
2nd Edit.
I am attaching the first half of the script. It basically loops forever:
import base64, xmlrpclib, time
import MSMQ, Config, Logger
import XmlRpcExt, os, whrandom

QueueDetails = Config.InQueueDetails
sleeptime = Config.SleepTime
XMLRPCServer = Config.XMLRPCServer
usingBase64 = Config.base64ing
version = Config.version
verbose = Config.verbose
LogO = Logger.Logger()

def MSMQToIAMS():
    # moved svr cons out of daemon loop
    LogO.LogP(version)
    svr = xmlrpclib.Server(XMLRPCServer, XmlRpcExt.getXmlRpcTransport())
    while 1:
        GotOne = 0
        for qd in QueueDetails:
            queue, agency, messagetype = qd
            #LogO.LogD('['+version+"] Searching queue %s for messages"%queue)
            try:
                msg = MSMQ.Peek(queue)
            except Exception, e:
                LogO.LogE("Peeking at \"%s\" : %s" % (queue, e))
                continue
            if msg:
                try:
                    msg = msg.__call__().encode('utf-8')
                except:
                    LogO.LogE("Could not convert message on \"%s\" to a string, leaving it on queue" % queue)
                    continue
                if verbose:
                    print "++++++++++++++++++++++++++++++++++++++++"
                    print msg
                    print "++++++++++++++++++++++++++++++++++++++++"
                LogO.LogP("Found Message on \"%s\" : \"%s...\"" % (queue, msg[:40]))
                try:
                    rv = svr.accept(msg, agency, messagetype)
                    if rv[0] != "OK":
                        raise Exception, rv[0]
                    LogO.LogP('Message has been sent successfully to IAMS from %s' % queue)
                    MSMQ.Receive(queue)
                    GotOne = 1
                    StoreMsg(msg)
                except Exception, e:
                    LogO.LogE("%s" % e)
        if GotOne == 0:
            time.sleep(sleeptime)
        else:
            GotOne = 0
This is the full code that calls MSMQ: it creates a little program that watches the MSMQ and, when a message arrives, picks it up and sends it off to another server.
This sounds really Python-specific (of which I know nothing) rather than MSMQ-specific. Isn't this just a case of a memory variable being used twice without being cleared in between? The second message is shorter than the first, so there are characters from the first that aren't being overwritten. What do the relevant parts of the Python code look like?
[[21st April]]
The code just shows you are populating the tmp variable with a message. What happens to tmp before the next message is accessed? I'm assuming it is not cleared.
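If that theory is right, one thing worth trying (an untested sketch; I'm assuming the MSMQMessage COM object's BodyLength property reports the byte length of the current body) is to truncate the peeked body explicitly before returning it:

def Peek(queue):
    qi = msmq.MSMQQueueInfo()
    qi.PathName = queue
    myq = qi.Open(msmq.constants.MQ_PEEK_ACCESS, 0)
    if myq.IsOpen:
        tmp = myq.Peek(pythoncom.Empty, pythoncom.Empty, 1)
        myq.Close()
        # Truncate to the reported length so stale bytes from a
        # previous, longer message can't leak into this one.
        # (Assumes one byte per character in the body.)
        return tmp.Body[:tmp.BodyLength]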

Overriding basic signals (SIGINT, SIGQUIT, SIGKILL??) in Python

I'm writing a program that adds normal UNIX accounts (i.e. modifying /etc/passwd, /etc/group, and /etc/shadow) according to our corp's policy. It also does some slightly fancy stuff like sending an email to the user.
I've got all the code working, but there are three pieces of code that are very critical: the ones that update the three files above. The code is already fairly robust, because it locks those files (e.g. /etc/passwd.lock), writes to a temporary file (e.g. /etc/passwd.tmp), and then overwrites the original file with the temporary one. I'm fairly pleased that it won't interfere with other running instances of my program or with the system useradd, usermod, passwd, etc. programs.
The thing that I'm most worried about is a stray ctrl+c, ctrl+d, or kill command in the middle of these sections. This has led me to the signal module, which seems to do precisely what I want: ignore certain signals during the "critical" region.
I'm using an older version of Python, which doesn't have signal.SIG_IGN, so I have an awesome "pass" function:
def passer(*a):
    pass
The problem that I'm seeing is that signal handlers don't work the way that I expect.
Given the following test code:
def passer(a=None, b=None):
    pass

def signalhander(enable):
    signallist = (signal.SIGINT, signal.SIGQUIT, signal.SIGABRT, signal.SIGPIPE, signal.SIGALRM, signal.SIGTERM, signal.SIGKILL)
    if enable:
        for i in signallist:
            signal.signal(i, passer)
    else:
        for i in signallist:
            signal.signal(i, abort)
    return

def abort(a=None, b=None):
    sys.exit('\nAccount was not created.\n')
    return

signalhander(True)
print('Enabled')
time.sleep(10)  # ^C during this sleep
The problem with this code is that a ^C (SIGINT) during the time.sleep(10) call causes that function to stop, and only then does my signal handler take over. That doesn't solve my "critical region" problem, because I can't tolerate the failure of whatever statement happens to encounter the signal.
I need some sort of signal handler that will just completely ignore SIGINT and SIGQUIT.
The Fedora/RH command "yum" is written in Python and does basically exactly what I want. If you ^C while it's installing anything, it prints a message like "Press ^C within two seconds to force kill"; otherwise, the ^C is ignored. I don't really care about the two-second warning, since my program completes in a fraction of a second.
Could someone help me implement a signal handler for CPython 2.3 that doesn't cause the current statement/function to be cancelled before the signal is ignored?
As always, thanks in advance.
Edit: After S.Lott's answer, I've decided to abandon the signal module.
I'm just going to go back to try: except: blocks. Looking at my code, there are two things that happen in each critical region that cannot be aborted: overwriting the file with file.tmp, and removing the lock once finished (otherwise other tools will be unable to modify the file until the lock is manually removed). I've put each of those in its own function inside a try: block, and the except: simply calls the function again. That way the function will just re-call itself on KeyboardInterrupt or EOFError until the critical code has completed.
I don't think I can get into too much trouble, since I'm only catching user-initiated exit commands, and even then only for two or three lines of code. Theoretically, if those exceptions could be raised fast enough, I suppose I could hit the "maximum recursion depth exceeded" error, but that seems far-fetched.
Any other concerns?
Pseudo-code:
def criticalRemoveLock(file):
    try:
        if os.path.isfile(file):
            os.remove(file)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalRemoveLock(file)

def criticalOverwrite(tmp, file):
    try:
        if os.path.isfile(tmp):
            shutil.copy2(tmp, file)
            os.remove(tmp)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalOverwrite(tmp, file)
There is no real way to make your script really safe. Of course you can ignore signals and catch a keyboard interrupt using try: except:, but it is up to your application to be idempotent against such interrupts, and it must be able to resume operations after dealing with an interrupt at some kind of savepoint.
The only thing that you can really do is work on temporary files (not the original files) and move them into the final destination after doing the work. I think such a move is supposed to be atomic from the filesystem perspective. Otherwise, in case of an interrupt, restart your processing from the start with clean data.
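A minimal sketch of that temp-file-then-move pattern, written to run on old Pythons like 2.3 (so no with statement); os.rename() is atomic on POSIX as long as the source and destination are on the same filesystem:

import os

def atomic_overwrite(path, data):
    tmp = path + '.tmp'
    f = open(tmp, 'w')
    try:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to disk before the rename
    finally:
        f.close()
    # Readers now see either the old file or the new one, never a partial write.
    os.rename(tmp, path)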
