Do I lock code in both threads? - python

I'm new to programming with threads and using locks.
So I have two threads: one reading values from a COM port, the other controlling a stepper motor.
In one thread I have:
if stepperb_value != 0:  # if stepperb non-zero
    if stepperb_value > 0:  # if positive value
        self.step_fine(10, 11, 12, 13, step_delay)  # step forward
    else:
        self.step_fine(13, 12, 11, 10, step_delay)  # step backwards
    if abs(stepperb_value) != 100:
        time.sleep(10*step_delay*((100/abs(stepperb_value))-1))
(I need to prevent a change to stepperb_value from causing a divide-by-zero error on the last line.)
In the other thread, the one that reads the values from the COM port, I have:
if 'stepperb' in dataraw:
    outputall_pos = dataraw.find('stepperb')
    sensor_value = dataraw[(1+outputall_pos+len('stepperb')):].split()
    print "stepperb", sensor_value[0]
    if isNumeric(sensor_value[0]):
        stepperb_value = int(max(-100, min(100, int(sensor_value[0]))))
Where do I need locks, and what sort? The first thread is time-sensitive, so it needs to have priority.
regards
Simon

If you're using only CPython, you have another possibility (for now).
As CPython has the Global Interpreter Lock, assignments are atomic. So you can pull stepperb_value into a local variable in your motor control thread before using it and use the local variable instead of the global one.
Note that you'd be using an implementation detail here. Your code possibly won't run safely on other python implementations such as Jython or IronPython. Even on CPython, the behaviour might change in the future, but I don't see the GIL go away before Python 5 or something.
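
For illustration, a minimal sketch of that idea using the names from the question (step_fine, step_delay, and stepperb_value are assumed to exist exactly as in the original code):

# Motor-control thread: take one snapshot of the shared value and use
# only the local copy for the rest of the block.
value = stepperb_value   # a single read; atomic under CPython's GIL
if value != 0:
    if value > 0:
        self.step_fine(10, 11, 12, 13, step_delay)   # step forward
    else:
        self.step_fine(13, 12, 11, 10, step_delay)   # step backwards
    if abs(value) != 100:
        time.sleep(10 * step_delay * ((100 / abs(value)) - 1))

Because value cannot change between the abs() test and the division, this also removes the divide-by-zero worry mentioned in the question.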

You're going to have to lock around any series of accesses that expects stepperb_value not to change in the series. In your case, that would be the entire motor control block you posted. Similarly, you should lock around the entire assignment to stepperb_value in the COM port thread, to ensure that the write is atomic, and does not happen while the first thread holds the lock (and expects the value not to change).
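
A rough sketch of what that could look like with a single shared threading.Lock (the lock object and its name are an addition, not part of the original code):

import threading

stepperb_lock = threading.Lock()

# Motor-control thread: hold the lock for the whole block that
# expects stepperb_value to stay constant.
with stepperb_lock:
    if stepperb_value != 0:
        if stepperb_value > 0:
            self.step_fine(10, 11, 12, 13, step_delay)
        else:
            self.step_fine(13, 12, 11, 10, step_delay)
        if abs(stepperb_value) != 100:
            time.sleep(10 * step_delay * ((100 / abs(stepperb_value)) - 1))

# COM-port thread: hold the same lock while writing the new value.
with stepperb_lock:
    stepperb_value = int(max(-100, min(100, int(sensor_value[0]))))

Note that holding the lock across the time.sleep call also blocks the COM-port thread for that long; copying the value into a local first and releasing the lock before sleeping (as in the previous answer) avoids that.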

Well, your critical variable is stepperb_value, so that's what should be "guarded".
I suggest you use a Queue, which takes care of all the synchronization issues (consumer/producer pattern). Events and conditions can also be a suitable solution:
http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/
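
For illustration, a minimal sketch of the Queue idea with the names from the question (the queue object itself, stepper_commands, is an invented name; Python 2 spelling to match the question's code):

from Queue import Queue, Empty

stepper_commands = Queue()

# COM-port thread (producer): push each validated value.
stepper_commands.put(int(max(-100, min(100, int(sensor_value[0])))))

# Motor-control thread (consumer): pick up the newest value if one has
# arrived, otherwise keep using the previous one.
try:
    stepperb_value = stepper_commands.get_nowait()
except Empty:
    pass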


How do I write signal-safe Python code?

I'll start with example code that "should never" fail:
counter1 = 0
counter2 = 0

def increment():
    global counter1, counter2
    counter1 += 1
    counter2 += 1

while True:
    try:
        increment()
    except:
        pass
    assert counter1 == counter2
The counters represent an internal structure that should keep its integrity no matter what. At a quick glance, it seems the assertion can never fail and the structure will stay intact.
However, one small signal that occurs in the middle of the function (for example SIGINT, raised as KeyboardInterrupt in Python) causes the internal structure to break. In a real-world scenario it could cause memory corruption, deadlocks, and all other kinds of mayhem.
Is there any way to make this function signal-safe? If any arbitrary code can run anywhere and cause it to malfunction, how can we write library code that is safe and secure?
Even if we attempt to guard it using try...finally..., we might still receive a signal inside the finally block and prevent it from running.
While the example is in Python, I believe this question applies to all programming languages as a whole.
EDIT:
Do keep in mind the counters are just an example. In my actual use case it's different Locks and threading.Events. There's no real way (that I know of) to make the operation atomic.
One solution to your example is to update the data atomically, if possible:
counters = [0, 0]

def increment():
    global counters
    # "complex" preparation, not atomic at all
    next0 = counters[0] + 1
    next1 = counters[1] + 1
    temp = [next0, next1]
    # actual atomic update
    counters = temp

try:
    while True:
        increment()
finally:
    assert counters[0] == counters[1]
No matter where the execution of the code stops, counters will always be in a consistent state, even without any cleanup.
Of course, this is a special case, which can't be always applied. A more general concept would be transactions where multiple actions are bundled and either all get executed or none.
Errors during rollback/abort of a transaction can be still problematic, though.
There is a transaction package, but I did not look into that.
The best way is to use the techniques described here: https://julien.danjou.info/atomic-lock-free-counters-in-python/
Unfortunately, the other answers have some complex concurrency issues.
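
If I read that article correctly, the core trick is to let a C-implemented iterator do the increment, since a single call into C cannot be split in half by a Python-level signal handler or thread switch. A minimal sketch of that idea (the counter name is illustrative):

import itertools

# The increment happens entirely inside itertools.count, which is
# implemented in C, so a signal cannot land between the "read" and
# the "write" halves of the operation.
_counter = itertools.count()

def increment():
    return next(_counter)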

Deleted objects in Python 3 not being released from memory? [duplicate]

I wrote a python script that backs up my files while I'm sleeping at night. The program is designed to run whenever the computer is on and to automatically shut down the computer after the backups are done. My code looks like this:
from datetime import datetime
from os import system
from backup import backup
while True:
    today = datetime.now()
    # Perform backups on Monday at 3:00am.
    if today.weekday() == 0 and today.hour == 3:
        print('Starting backups...')
        # Perform backups.
        backup("C:\\Users\\Jeff Moorhead\\Desktop", "E:\\")
        backup("C:\\Users\\Jeff Moorhead\\Desktop", "F:\\")
        backup("C:\\Users\\Jeff Moorhead\\OneDrive\\Documents", "E:\\")
        backup("C:\\Users\\Jeff Moorhead\\OneDrive\\Documents", "F:\\")
        # Shutdown computer after backups finish.
        system('shutdown /s /t 10')
        break
    else:
        del today
        continue
The backup function is from another file that I wrote to perform more customized backups on a case-by-case basis. This code all works perfectly fine, but I'm wondering if the del statement
del today
is really necessary. I put it in thinking that it would prevent my memory from getting filled up by thousands of datetime objects, but then I read that Python uses garbage collection, similar to Java. Further, does the today variable automatically get replaced with each pass through the while loop? I know that the program works as intended with the del statement, but if it is unnecessary, then I would like to get rid of it, if only for the sake of brevity! What are its actual effects on memory?
I put it in thinking that it would prevent my memory from getting filled up by thousands of datetime objects
The del statement is not necessary; you can simply remove it. Python will free that memory automatically.
... but then I read that Python uses garbage collection, similar to Java.
The above statement is misguided: this has nothing to do with the garbage collector, which exists to break up circular references. In CPython, the memory is released when the object reference count decreases to zero, and that would occur even if the garbage collector is disabled.
Further, does the today variable automatically get replaced with each pass through the while loop? I know that the program works as intended with the del statement, but if it is unnecessary, then I would like to get rid of it if only for the sake of brevity! What are its actual effects on memory?
A new datetime object is created on each iteration of the loop.
The today name in scope will be rebound to the newly created datetime instance. The old datetime instance will be deleted because no reference exists on it (since the only existing reference is lost once you rebound the name today to a different object). Once again, I stress that this is just ref-counting and has nothing to do with gc.
On an unrelated note, your program will busy-loop and consume an entire CPU with this while loop. You should consider adding a call to time.sleep into the loop so the process will remain mostly idle. Or, better yet, schedule the task to run periodically using cron.
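
For example, a minimal tweak along those lines, keeping the structure of the original script (the one-minute interval is an arbitrary choice):

import time
from datetime import datetime

while True:
    today = datetime.now()
    # Perform backups on Monday at 3:00am.
    if today.weekday() == 0 and today.hour == 3:
        # ... perform backups and shut down as before ...
        break
    # Check once a minute instead of spinning the CPU.
    time.sleep(60)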

Segmentation fault when initializing array

I am getting a segmentation fault when initializing an array.
I have a callback function that runs when an RFID tag gets read:
IDS = []

def readTag(e):
    epc = str(e.epc, 'utf-8')
    if not epc in IDS:
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
    for x in vals:
        IDS.append([vals[0], vals[1], vals[2]])
    for x in IDS:
        print(x[0])
    r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
    r.set_region("NA")
    r.start_reading(readTag, on_time=1500)
    input("press any key to stop reading: ")
    r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know this because when I replace it with a print call the program runs just fine. I've tried using different types for the list items (integers), creating a list of the same objects outside the callback, etc. For some reason, just creating a list inside the readTag function, e.g. row = [1, 2, 3], causes the segmentation fault.
Does anyone know what causes this error and how I can fix it? To be a little more specific: the readTag function works fine for the first two calls (only ever two), but then it crashes. The Reader object that provides start_reading() is from the mercury-api.
This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so when it invokes your callback function readTag(e) a segfault occurs. I don't think the behavior you want is supported by that library.
To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general, pure Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter or in some extension that you're using. That's not to say pure Python won't break; it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reading() method you're using is "asynchronous". Meaning it invokes a new thread or process and returns immediately, and the background thread then continues to call your callback each time something is scanned.
I don't really know enough about the nitty-gritty of CPython to say exactly what's going on, but you've declared IDS = [] as a global variable, and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is, and the apparent lack of a buffer, this might be an oversight on the part of the developer. If you really need asynchronous reads, it's worth sending them an issue report.
Otherwise, considering you're just waiting for input, you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead, with something like:
try:
    while True:
        readTags(r.read(timeout=10))
except KeyboardInterrupt:  # break loop on SIGINT (Ctrl-C)
    pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly. And if you're writing more than just a quick script, you probably want to use threads to interrupt the loop properly, as relying on SIGINT is pretty hacky.
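
For instance, the callback could be reshaped into something like this (a sketch only; it assumes r.read() returns a list of tag objects with an .epc attribute, just like the objects the original callback receives):

import datetime

IDS = []

def readTags(tags):
    # Handle the whole batch returned by one synchronous r.read() call.
    for e in tags:
        epc = str(e.epc, 'utf-8')
        # Compare against the first column, since IDS stores
        # [epc, time, name] rows.
        if epc not in (row[0] for row in IDS):
            now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
            IDS.append([epc, now, "name.instrument"])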

Python Memory leak using Yocto

I'm running a Python script on a Raspberry Pi that constantly checks a Yocto button, and when it gets pressed it puts data from a different sensor into a database.
A snippet of the code that constantly runs:
# when all set and done run the program
Active = True
while Active:
    if ResponseType == "b":
        while Active:
            try:
                if GetButtonPressed(ResponseValue):
                    DoAllSensors()
                    time.sleep(5)
                else:
                    time.sleep(0.5)
            except KeyboardInterrupt:
                Active = False
            except Exception, e:
                print str(e)
                print "exception raised, continuing after 10 seconds"
                time.sleep(10)
The GetButtonPressed(ResponseValue) function looks like the following:
def GetButtonPressed(number):
    global buttons
    if ModuleCheck():
        if buttons[number - 1].get_calibratedValue() < 300:
            return True
    else:
        print "module not online"
    return False

def ModuleCheck():
    global moduleb
    return moduleb.isOnline()
I'm not quite sure what might be going wrong, but it takes about an hour before the RPi runs out of memory.
Memory usage increases constantly, even though the button is only pressed once every 15 minutes or so.
That already tells me that the problem must be in the code displayed above.
The problem is that the yocto_api.YAPI object will continue to accumulate _Event objects in its _DataEvents dict (a class-wide attribute) until you call YAPI.YHandleEvents. If you're not using the API's callbacks, it's easy to think (I did, for hours) that you don't need to ever call this. The API docs aren't at all clear on the point:
If your program includes significant loops, you may want to include a call to this function to make sure that the library takes care of the information pushed by the modules on the communication channels. This is not strictly necessary, but it may improve the reactivity of the library for the following commands.
I did some playing around with API-level callbacks before I decided to periodically poll the sensors in my own code, and it's possible that some setting got left enabled in them that is causing these events to accumulate. If that's not the case, I can't imagine why they would say calling YHandleEvents is "not strictly necessary," unless they make ARM devices with unlimited RAM in Switzerland.
Here's the magic static method that thou shalt call periodically, no matter what. I'm doing so once every five seconds and that is taking care of the problem without loading down the system at all. API code that would accumulate unwanted events still smells to me, but it's time to move on.
# noinspection PyUnresolvedReferences
@staticmethod
def HandleEvents(errmsgRef=None):
    """
    Maintains the device-to-library communication channel.
    If your program includes significant loops, you may want to include
    a call to this function to make sure that the library takes care of
    the information pushed by the modules on the communication channels.
    This is not strictly necessary, but it may improve the reactivity
    of the library for the following commands.

    This function may signal an error in case there is a communication problem
    while contacting a module.

    @param errmsg : a string passed by reference to receive any error message.

    @return YAPI.SUCCESS when the call succeeds.

    On failure, throws an exception or returns a negative error code.
    """
    errBuffer = ctypes.create_string_buffer(YAPI.YOCTO_ERRMSG_LEN)
    # noinspection PyUnresolvedReferences
    res = YAPI._yapiHandleEvents(errBuffer)
    if YAPI.YISERR(res):
        if errmsgRef is not None:
            # noinspection PyAttributeOutsideInit
            errmsgRef.value = YByte2String(errBuffer.value)
        return res
    while len(YAPI._DataEvents) > 0:
        YAPI.yapiLockFunctionCallBack(errmsgRef)
        if not (len(YAPI._DataEvents)):
            YAPI.yapiUnlockFunctionCallBack(errmsgRef)
            break
        ev = YAPI._DataEvents.pop(0)
        YAPI.yapiUnlockFunctionCallBack(errmsgRef)
        ev.invokeData()
    return YAPI.SUCCESS
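
A sketch of the kind of periodic call the answer describes, reusing the question's helper names (GetButtonPressed, ResponseValue, DoAllSensors); the five-second interval matches what the answer says it uses, and the yocto_api import follows the module name mentioned above:

import time
from yocto_api import YAPI

last_flush = time.time()
while True:
    if GetButtonPressed(ResponseValue):
        DoAllSensors()
        time.sleep(5)
    else:
        time.sleep(0.5)
    # Drain the library's internal _DataEvents queue so it cannot
    # grow without bound between button presses.
    if time.time() - last_flush >= 5:
        YAPI.HandleEvents()
        last_flush = time.time()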

"Online" monkey patching of a function

Your program just paused on a pdb.set_trace().
Is there a way to monkey patch the function that is currently running, and "resume" execution?
Is this possible through call frame manipulation?
Some context:
Oftentimes, I will have a complex function that processes large quantities of data, without having a priori knowledge of what kind of data I'll find:
def process_a_lot(data_stream):
    # process a lot of stuff
    # ...
    data_unit = data_stream.next()
    if not can_process(data_unit):
        import pdb; pdb.set_trace()
    # continue processing
This convenient construction launches an interactive debugger when it encounters unknown data, so I can inspect it at will and change the process_a_lot code to handle it properly.
The problem here is that, when data_stream is big, you don't really want to chew through all the data again (let's assume next is slow, so you can't save what you already have and skip on the next run)
Of course, you can replace other functions at will once in the debugger. You can also replace the function itself, but it won't change the current execution context.
Edit:
Since some people are getting side-tracked:
I know there are a lot of ways of structuring your code such that your processing function is separate from process_a_lot. I'm not really asking about ways to structure the code as much as how to recover (in runtime) from the situation when the code is not prepared to handle the replacement.
First a (prototype) solution, then some important caveats.
# process.py
import sys
import pdb
import handlers

def process_unit(data_unit):
    global handlers
    while True:
        try:
            data_type = type(data_unit)
            handler = handlers.handler[data_type]
            handler(data_unit)
            return
        except KeyError:
            print "UNUSUAL DATA: {0!r}".format(data_unit)
            print "\n--- INVOKING DEBUGGER ---\n"
            pdb.set_trace()
            print
            print "--- RETURNING FROM DEBUGGER ---\n"
            del sys.modules['handlers']
            import handlers
            print "retrying"

process_unit("this")
process_unit(100)
process_unit(1.04)
process_unit(200)
process_unit(1.05)
process_unit(300)
process_unit(4+3j)
sys.exit(0)
And:
# handlers.py
def handle_default(x):
    print "handle_default: {0!r}".format(x)

handler = {
    int: handle_default,
    str: handle_default
}
In Python 2.7, this gives you a dictionary linking expected/known types to functions that handle each type. If no handler is available for a type, the user is dropped into the debugger, giving them a chance to amend the handlers.py file with appropriate handlers. In the above example, there is no handler for float or complex values. When those arrive, the user will have to add appropriate handlers. For example, one might add:
def handle_float(x):
    print "FIXED FLOAT {0!r}".format(x)

handler[float] = handle_float
And then:
def handle_complex(x):
    print "FIXED COMPLEX {0!r}".format(x)

handler[complex] = handle_complex
Here's what that run would look like:
$ python process.py
handle_default: 'this'
handle_default: 100
UNUSUAL DATA: 1.04
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED FLOAT 1.04
handle_default: 200
FIXED FLOAT 1.05
handle_default: 300
UNUSUAL DATA: (4+3j)
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED COMPLEX (4+3j)
Okay, so that basically works. You can improve and tweak that into a more production-ready form, making it compatible across Python 2 and 3, et cetera.
Please think long and hard before you do it that way.
This "modify the code in real-time" approach is an incredibly fragile pattern and error-prone approach. It encourages you to make real-time hot fixes in the nick of time. Those fixes will probably not have good or sufficient testing. Almost by definition, you have just this moment discovered you're dealing with a new type T. You don't yet know much about T, why it occurred, what its edge cases and failure modes might be, etc. And if your "fix" code or hot patches don't work, what then? Sure, you can put in some more exception handling, catch more classes of exceptions, and possibly continue.
Web frameworks like Flask have debug modes that work basically this way. But those are debug modes, and generally not suited for production. Moreover, what if you type the wrong command in the debugger? Accidentally type "quit" rather than "continue" and the whole program ends, and with it, your desire to keep the processing alive. If this is for use in debugging (exploring new kinds of data streams, maybe), have at.
If this is for production use, consider instead a strategy that sets aside unhandled-types for asynchronous, out-of-band examination and correction, rather than one that puts the developer / operator in the middle of a real-time processing flow.
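
A rough sketch of that out-of-band strategy, assuming the same handlers dictionary as above (the quarantine list and log file name are illustrative, not part of the answer's code):

# quarantine.py (illustrative)
import handlers

unhandled = []

def process_unit_safe(data_unit):
    handler = handlers.handler.get(type(data_unit))
    if handler is None:
        # Set the item aside for later, asynchronous inspection
        # instead of stopping the stream in a debugger.
        unhandled.append(data_unit)
        with open("unhandled.log", "a") as f:
            f.write("{0!r}\n".format(data_unit))
        return
    handler(data_unit)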
No.
You can't monkey-patch a currently running Python function and continue pressing on as though nothing had happened. At least not in any general or practical way.
In theory, it is possible--but only under limited circumstances, with much effort and wizardly skill. It cannot be done with any generality.
To make the attempt, you'd have to:
1. Find the relevant function source and edit it (straightforward)
2. Compile the changed function source to bytecode (straightforward)
3. Insert the new bytecode in place of the old (doable)
4. Alter the function housekeeping data to point at the "logically" "same point" in the program where it exited to pdb (iffy, under some conditions)
5. "Continue" from the debugger, falling back into the debugged code (iffy)
There are some circumstances where you might achieve 4 and 5, if you knew a lot about the function housekeeping and analogous debugger housekeeping variables. But consider:
The bytecode offset at which your pdb breakpoint is called (f_lasti in the frame object) might change. You'd probably have to narrow your goal to "alter only code further down in the function's source code than where the breakpoint occurred" to keep things reasonably simple--otherwise, you'd have to be able to compute where the breakpoint is in the newly compiled bytecode. That might be feasible, but again only under restrictions (such as "will only call pdb.set_trace() once", or similar "leave breadcrumbs for post-breakpoint analysis" stipulations).
You're going to have to be sharp at patching up function, frame, and code objects. Pay special attention to func_code in the function (__code__ if you're also supporting Python 3); f_lasti, f_lineno, and f_code in the frame; and co_code, co_lnotab, and co_stacksize in the code.
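To make those names concrete, here is a small illustrative look at where that housekeeping lives (Python 3 spelling; the sample function is invented):

import sys

def sample(x):
    y = x + 1
    frame = sys._getframe()                 # the currently executing frame
    print(frame.f_lasti, frame.f_lineno)    # bytecode offset and current source line
    return y

sample(1)
print(sample.__code__.co_code)              # the raw bytecode
print(sample.__code__.co_nlocals)           # number of local variables
print(sample.__code__.co_stacksize)         # required evaluation stack depth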
For the love of God, hopefully you do not intend to change the function's parameters, name, or other macro defining characteristics. That would at least treble the amount of housekeeping required.
More troubling, adding new local variables (a pretty common thing you'd want to do to alter program behavior) is very, very iffy. It would affect f_locals, co_nlocals, and co_stacksize--and quite possibly, completely rearrange the order and way bytecode accesses values. You might be able to minimize this by adding assignment statements like x = None to all your original locals. But depending on how the bytecodes change, it's possible you'll even have to hot-patch the Python stack, which cannot be done from Python per se. So C/Cython extensions could be required there.
Here's a very simple example showing that bytecode ordering and arguments can change significantly even for small alterations of very simple functions:
def a(x):
    y = x + 1
    return y

# disassembly of a:
#   LOAD_FAST      0 (x)
#   LOAD_CONST     1 (1)
#   BINARY_ADD
#   STORE_FAST     1 (y)
#   LOAD_FAST      1 (y)
#   RETURN_VALUE

def a2(x):
    inc = 2
    y = x + inc
    return y

# disassembly of a2:
#   LOAD_CONST     1 (2)
#   STORE_FAST     1 (inc)
#   LOAD_FAST      0 (x)
#   LOAD_FAST      1 (inc)
#   BINARY_ADD
#   STORE_FAST     2 (y)
#   LOAD_FAST      2 (y)
#   RETURN_VALUE
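If you want to reproduce that comparison yourself, the standard dis module prints the same information (exact opcodes vary between CPython versions):

import dis

def a(x):
    y = x + 1
    return y

def a2(x):
    inc = 2
    y = x + inc
    return y

dis.dis(a)    # disassembly of the original function
dis.dis(a2)   # disassembly after the small edit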
Be equally sharp at patching some of the pdb values that track where it's debugging, because when you type "continue", those are what dictate where control flow goes next.
Limit your patchable functions to those that have rather static state. They must, for example, never have objects that might be garbage-collected before the breakpoint is resumed, but accessed after it (e.g. in your new code). E.g.:
some = SomeObject()
# blah blah including last touch of `some`
# ...
pdb.set_trace()
# Look, Ma! I'm monkey-patching!
if some.some_property:
    # oops, `some` was GC'd - DIE DIE DIE
While "ensuring the execution environment for the patched function is same as it ever was" is potentially problematic for many values, it's guaranteed to crash and burn if any of them exit their normal dynamic scope and are garbage-collected before patching alters their dynamic scope/lifetime.
Assert you only ever want to run this on CPython, since PyPy, Jython, and other Python implementations don't even have standard Python bytecodes and do their function, code, and frame housekeeping differently.
I would love to say this super-dynamic patching is possible. And I'm sure you can, with a lot of housekeeping object twiddling, construct simple cases where it does work. But real code has objects that go out of scope. Real patches might want new variables allocated. Etc. Real world conditions vastly multiply the effort required to make the patching work--and in some cases, make that patching strictly impossible.
And at the end of the day, what have you achieved? A very brittle, fragile, unsafe way to extend your processing of a data stream. There is a reason most monkey-patching is done at function boundaries, and even then, reserved for a few very-high-value use cases. Production data streaming is better served adopting a strategy that sets aside unrecognized values for out-of-band examination and accommodation.
If I understand correctly:
- you don't want to repeat all the work that has already been done
- you need a way to replace the # continue processing part with the new code once you have figured out how to handle the new data
@user2357112 was on the right track: expected_types should be a dictionary of
data_type: (detect_function, handler_function)
and detect_type needs to go through that to find a match. If no match is found, pdb pops up; you can then figure out what's going on, write a new detect_function and handler_function, add them to expected_types, and continue from pdb.
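As an illustrative sketch of that structure (the detect and handler functions here are made-up examples, not code from the question):

expected_types = {
    'csv_row':   (lambda unit: ',' in unit, lambda unit: unit.split(',')),
    'key_value': (lambda unit: '=' in unit, lambda unit: unit.split('=', 1)),
}

def dispatch(data_unit):
    for data_type, (detect_function, handler_function) in expected_types.items():
        if detect_function(data_unit):
            return handler_function(data_unit)
    # No detector matched: drop into the debugger so a new
    # (detect_function, handler_function) pair can be added.
    import pdb; pdb.set_trace()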
What I wanted to know is if there's a way to monkey patch the function that is currently running (process_a_lot), and "resume" execution.
So you want to somehow, from within pdb, write a new process_a_lot function, and then transfer control to it at the location of the pdb call?
Or, do you want to rewrite the function outside pdb, and then somehow reload that function from the .py file and transfer control into the middle of the function at the location of the pdb call?
The only possibility I can think of is: from within pdb, import your newly written function, then replace the current process_a_lot byte-code with the byte-code from the new function (I think it's func.co_code or something). Make sure you change nothing in the new function (not even the pdb lines) before the pdb lines, and it might work.
But even if it does, I would imagine it is a very brittle solution.
