Threading in Python

I have two functions (methods) in Python that I'd like to run at the same exact time. Originally I tried forking, but since the child retained the parent's memory, it was writing multiple things I don't need into a file. So I switched to threading.
I have something similar to:
import threading

class test(threading.Thread):
    def __init__(self, numA, list):
        self.__numA = numA  # (random number)
        self.__list = list  # (list)

    def run(self):
        makelist(self)
        makelist2(self)
makelist() and makelist2() use numA and list. So in those definitions/methods instead of saying
print list
I say
print self.__list.
In the main() I made a new class object:
x = test()
x.start()
When I run my program I get an attribute error saying it cannot recognize the __list or __numA.
I've been stuck on this for a while. If there's another, better way to run two methods at the same time (the methods are not connected at all), please let me know and explain how.
Thank you.

The __list and __numA won't be visible from makelist and makelist2 if those are not also methods of the same class. The double leading underscore triggers name mangling, which will make things like this fail:
>>> class A(object):
...     def __init__(self):
...         self.__a = 2
...
>>> def f(x):
...     print x.__a
...
>>> a = A()
>>> f(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
AttributeError: 'A' object has no attribute '__a'
But naming __a something without two leading underscores would work. Is that what you are seeing?
You can read more about private variables in the python documentation.

Firstly, don't give your variables the same names as built-in types or functions, e.g. list.
Secondly, as well as the problems that others have pointed out (__ name mangling, initialising Thread, etc.), if your intention is to run makelist and makelist2 at the same time then you are doing it wrong, since your run method will still execute them one after the other. You need to run them in separate threads, not sequentially in the same thread, as in the sketch below.
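For example, a minimal sketch of that idea might look like this (the bodies of makelist and makelist2 are invented placeholders, since they aren't shown in the question):
import threading

def makelist(numA, data):
    # placeholder for the real makelist logic
    print("makelist got %d items" % len(data))

def makelist2(numA, data):
    # placeholder for the real makelist2 logic
    print("makelist2 got numA = %s" % numA)

numA = 42
data = [1, 2, 3]

# one thread per function, so the two functions actually run concurrently
t1 = threading.Thread(target=makelist, args=(numA, data))
t2 = threading.Thread(target=makelist2, args=(numA, data))
t1.start()
t2.start()
t1.join()
t2.join()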
Thirdly, how exact do you mean by "same exact time"? Using threads in (C)Python this is physically impossible, since execution will be interleaved at the bytecode level. Other versions of Python (Jython, IronPython, etc.) may run them at exactly the same time on a multi-core system, but even then you have no control over when the OS scheduler will start each one.
Finally, it is a bad idea to share mutable objects between threads: if both threads change the data at the same time then unpredictable things can (and will) happen. You need to protect against this either by using locks or by only passing around immutable data or copies of the data. Using locks can also cause its own problems if you are not careful, such as deadlocks.
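If you do end up sharing a mutable object, a rough sketch of lock-protected access might look like the following (all names here are illustrative, not taken from the question):
import threading

shared = []
lock = threading.Lock()

def worker(start):
    for i in range(start, start + 5):
        # hold the lock only while touching the shared list
        with lock:
            shared.append(i)

t1 = threading.Thread(target=worker, args=(0,))
t2 = threading.Thread(target=worker, args=(100,))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(shared))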

I'd like to run them at the same exact time.
You can't do this with threading: the Global Interpreter Lock in CPython ensures that only one thread executes Python bytecode at any time (threads are switched every sys.getcheckinterval() bytecodes). Use multiprocessing instead:
from multiprocessing import Process
import os

def info(title):
    print title
    print 'module name:', __name__
    print 'parent process:', os.getppid()
    print 'process id:', os.getpid()

def f(name):
    info('function f')
    print 'hello', name

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
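Applied to your case, a sketch along the same lines might look like this (the function bodies are placeholders, since makelist and makelist2 aren't shown in the question):
from multiprocessing import Process

def makelist(numA, data):
    # placeholder for the real makelist logic
    print("makelist running with numA = %s" % numA)

def makelist2(numA, data):
    # placeholder for the real makelist2 logic
    print("makelist2 running with %d items" % len(data))

if __name__ == '__main__':
    numA = 42
    data = [1, 2, 3]
    # each function gets its own process, so the GIL is not an issue
    p1 = Process(target=makelist, args=(numA, data))
    p2 = Process(target=makelist2, args=(numA, data))
    p1.start()
    p2.start()
    p1.join()
    p2.join()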

A couple of things:
A) When you override the __init__ method of a threading.Thread subclass, you need to initialise threading.Thread yourself, which can be accomplished by calling threading.Thread.__init__(self) inside your __init__ method (usually as its first statement).
B) As msw pointed out, those calls to makelist and makelist2 appear to be calls to global functions, which somewhat defeats the purpose of the threaded class. I recommend making them methods of test, as in the sketch below.
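Putting A) and B) together, one possible sketch of the corrected class might look like this (the method bodies are invented, and the list parameter is renamed to data to avoid shadowing the built-in):
import threading

class test(threading.Thread):
    def __init__(self, numA, data):
        threading.Thread.__init__(self)  # initialise the Thread base class
        self.numA = numA
        self.data = data

    def makelist(self):
        print("makelist sees %d items" % len(self.data))

    def makelist2(self):
        print("makelist2 sees numA = %s" % self.numA)

    def run(self):
        self.makelist()
        self.makelist2()

x = test(42, [1, 2, 3])
x.start()
x.join()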

Related

multiprocessing.Queue.put_nowait blocks when used in destructor

I'm trying to use a multiprocessing.Queue to communicate something when an object in another process which has this queue as an attribute is deleted. While doing so I noticed that Queue.put() with block=False (or equivalently Queue.put_nowait()) blocks anyway under certain circumstances and boiled it down to the following minimal reproducible example:
import multiprocessing as mp

class A:
    def __init__(self):
        self.q = mp.Queue()
        # uncommenting this makes it work fine:
        # self.q.put("test")

    def __del__(self):
        print("before putting CLOSE")
        self.q.put_nowait("CLOSE")
        print("after putting CLOSE")

a = A()
# uncommenting this also makes it work fine:
# del a
Running this code, the output gets as far as
before putting CLOSE
and then it freezes indefinitely.
I'm at a loss as to what's going on here and why this happens. Queue.put_nowait() seems to block if and only if it's called from an object's destructor (__del__()) and it hasn't had data put into it before and the destructor was called due to the object going out of scope rather than due to explicit del. The same exact thing happens if the queue is a global variable rather than an attribute of A, as long as put_nowait() is called within A's destructor.
Interestingly, aborting the script with Ctrl+C doesn't result in the usual exception output (KeyboardInterrupt etc.) but quits without any output.
Am I missing something obvious here? Why does this happen?

Use multiprocessing to run functions inside a while loop in a class method

I have a method which calculates a final result using multiple other methods. It has a while loop inside which continuously checks for new data, and if new data is received, it runs the other methods and calculates the results. This main method is the only one called by the user, and it stays active until the program is closed. The basic structure is as follows:
class sample:
    def __init__(self):
        self.results = []

    def main_calculation(self):
        while True:
            # code to get data
            if newdata != olddata:
                # insert code to prepare data for analysis
                res1 = self.calc1(prepped_data)
                res2 = self.calc2(prepped_data)
                final = res1 + res2
                self.results.append(final)
I want to run calc1 and calc2 in parallel, so that I can get the final result faster. However, I am unsure of how to implement multiprocessing in this way, since I'm not using a __main__ guard. Is there any way to run these processes in parallel?
This is likely not the best organization for this code, but it is what is easiest for the actual calculations I am running, since it is necessary that this code be imported and run from a different file. However, I can restructure the code if this is not a salvageable structure.
According to the documentation, the reason you need to use a __main__ guard is that when your program creates a multiprocessing.Process object, it starts up a whole new copy of the Python interpreter which will import a new copy of your program's modules. If importing your module calls multiprocessing.Process() itself, that will create yet another copy of the Python interpreter which interprets yet another copy of your code, and so on until your system crashes (or actually, until Python hits a non-reentrant piece of the multiprocessing code).
In the main module of your program, which usually calls some code at the top level, checking __name__ == '__main__' is the way you can tell whether the program is being run for the first time or is being run as a subprocess. But in a different module, there might not be any code at the top level (other than definitions), and in that case there's no need to use a guard because the module can be safely imported without starting a new process.
In other words, this is dangerous:
import multiprocessing as mp
def f():
...
p = mp.Process(target=f)
p.start()
p.join()
but this is safe:
import multiprocessing as mp

def f():
    ...

def g():
    p = mp.Process(target=f)
    p.start()
    p.join()
and this is also safe:
import multiprocessing as mp

def f():
    ...

class H:
    def g(self):
        p = mp.Process(target=f)
        p.start()
        p.join()
So in your example, you should be able to directly create Process objects in your function.
However, I'd suggest making it clear in the documentation for the class that that method creates a Process, because whoever uses it (maybe you) needs to know that it's not safe to call that method at the top level of a module. It would be like doing this, which also falls in the "dangerous" category:
import multiprocessing as mp

def f():
    ...

class H:
    def g(self):
        p = mp.Process(target=f)
        p.start()
        p.join()

H().g()  # this creates a Process at the top level
You could also consider an alternative approach where you make the caller do all the process creation. In this approach, either your sample class constructor or the main_calculation() method could accept, say, a Pool object, and it can use the processes from that pool to do its calculations. For example:
class sample:
    def main_calculation(self, pool):
        while True:
            if newdata != olddata:
                res1_async = pool.apply_async(self.calc1, [prepped_data])
                res2_async = pool.apply_async(self.calc2, [prepped_data])
                res1 = res1_async.get()
                res2 = res2_async.get()
                # and so on
This pattern may also allow your program to be more efficient in its use of resources, if there are many different calculations happening, because they can all use the same pool of processes.
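A rough sketch of what the calling script might then look like (the module name sample_module is just an assumption for illustration, and the pool size is arbitrary):
from multiprocessing import Pool
from sample_module import sample  # hypothetical module containing the class above

if __name__ == '__main__':
    pool = Pool(processes=2)  # the __main__ guard lives in the caller, not in the class
    s = sample()
    try:
        s.main_calculation(pool)  # the method uses the pool for calc1/calc2
    finally:
        pool.close()
        pool.join()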

Python multiprocessing pool get integer identifying call

import multiprocessing

def func():
    # Print call index
    pass

p = multiprocessing.Pool(4)
p.map(func, data)
In the above code, is it possible to print, in func(), an integer identifying the number of times it has been called (0, 1, ...)?
I want to do this because there is some code in func() which I only want to execute once, and it cannot be moved outside of func()
What you want to do involves sharing state between processes -- which can get to be very hairy in Python. If you don't think you can restructure your program to avoid it, multiprocessing has most of the useful synchronization constructs from the threading module: https://docs.python.org/2/library/multiprocessing.html#synchronization-between-processes. It sounds like maybe you need a lock and some shared memory.
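As a rough sketch of that idea, you could pass a shared counter (which carries its own lock) to every worker through the pool's initializer. Note that func takes a parameter here so it can be used with map, which is an assumption about your real code:
import multiprocessing

counter = None

def init(shared_counter):
    # runs once in each worker process; stash the shared counter in a module global
    global counter
    counter = shared_counter

def func(item):
    with counter.get_lock():
        call_index = counter.value
        counter.value += 1
    if call_index == 0:
        # the code you only want to execute once goes here
        print("one-off setup running in call %d" % call_index)
    return item * 2

if __name__ == '__main__':
    shared_counter = multiprocessing.Value('i', 0)
    pool = multiprocessing.Pool(4, initializer=init, initargs=(shared_counter,))
    print(pool.map(func, range(10)))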

What is being pickled when I call multiprocessing.Process?

I know that multiprocessing uses pickling in order to have the processes run on different CPUs, but I think I am a little confused as to what is being pickled. Let's look at this code:
from multiprocessing import Process

def f(I):
    print('hello world!', I)

if __name__ == '__main__':
    for I in range(1, 3):
        Process(target=f, args=(I,)).start()
I assume what is being pickled is the def f(I) and the argument going in. First, is this assumption correct?
Second, let's say f(I) has a function call within it, like:
def f(I):
    print('hello world!', I)
    randomfunction()
Does the randomfunction's definition get pickled as well, or is it only the function call?
Furthermore, if that function call was located in another file, would the process be able to call it?
In this particular example, what gets pickled is platform dependent. On systems that support os.fork, like Linux, nothing is pickled here. Both the target function and the args you're passing get inherited by the child process via fork.
On platforms that don't support fork, like Windows, the f function and args tuple will both be pickled and sent to the child process. The child process will re-import your __main__ module, and then unpickle the function and its arguments.
In either case, randomfunction is not actually pickled. When you pickle f, all you're really pickling is a reference the child process can use to re-build the f function object. This is usually little more than a string that tells the child how to re-import f:
>>> def f(I):
...     print('hello world!', I)
...     randomfunction()
...
>>> pickle.dumps(f)
'c__main__\nf\np0\n.'
The child process will just re-import f, and then call it. randomfunction will be accessible as long as it was properly imported into the original script to begin with.
Note that in Python 3.4+, you can get the Windows-style behavior on Linux by using contexts:
ctx = multiprocessing.get_context('spawn')
ctx.Process(target=f,args=(I,)).start() # even on Linux, this will use pickle
The descriptions of the contexts are also probably relevant here, since they apply to Python 2.x as well:
spawn
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver. Available on Unix and Windows. The default on Windows.
fork
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. Available on Unix only. The default on Unix.
forkserver
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited. Available on Unix platforms which support passing file descriptors over Unix pipes.
Note that forkserver is only available in Python 3.4, there's no way to get that behavior on 2.x, regardless of the platform you're on.
The function is pickled, but possibly not in the way you think of it:
You can look at what's actually in a pickle like this:
pickletools.dis(pickle.dumps(f))
I get:
0: c GLOBAL '__main__ f'
12: p PUT 0
15: . STOP
You'll note that there is nothing in there corresponding to the code of the function. Instead, it has a reference to __main__ f, which is the module and the name of the function. So when this is unpickled, it will always attempt to look up the f function in the __main__ module and use that. When you use the multiprocessing module, that ends up being a copy of the same function as it was in your original program.
This does mean that if you somehow modify which function is located at __main__.f, you'll end up unpickling a different function than the one you pickled.
Multiprocessing brings up a complete copy of your program, complete with all the functions you defined in it, so you can just call those functions. The entire function isn't copied over, just its name. The pickle module's assumption is that the function will be the same in both copies of your program, so it can simply look the function up by name.
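A small, self-contained demonstration of that name-based lookup (purely illustrative):
import pickle

def f():
    return "original"

payload = pickle.dumps(f)  # stores only the reference __main__.f, not the code

def g():
    return "replacement"

f = g  # rebind the name __main__.f to a different function

restored = pickle.loads(payload)
print(restored())  # prints "replacement", because unpickling looks f up by name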
Only the function arguments (I,) and a reference to the function f by its name are pickled. The actual definition of f has to be available when the module is loaded.
The easiest way to see this is through the code:
from multiprocessing import Process

if __name__ == '__main__':
    def f(I):
        print('hello world!', I)

    for I in [1, 2, 3]:
        Process(target=f, args=(I,)).start()
That returns:
AttributeError: 'module' object has no attribute 'f'

Python simplest form of multiprocessing

I've been trying to read up on threading and multiprocessing, but all the examples are too intricate and advanced for my level of Python/programming knowledge. I want to run a function, which consists of a while loop, and while that loop runs I want to continue with the program and eventually change the condition for the while loop and end that process. This is the code:
class Example():
    def __init__(self):
        self.condition = False

    def func1(self):
        self.condition = True
        while self.condition:
            print "Still looping"
            time.sleep(1)
        print "Finished loop"

    def end_loop(self):
        self.condition = False
Then I make the following function calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for func1 to run for 5 s before end_loop() is called, which changes the condition and ends the loop and thus also the function. I.e. I want one process to start and "go" into func1, and at the same time I want time.sleep(5) to be called, so the processes "split" when arriving at func1: one process entering the function while the other continues down the program and starts with the time.sleep(5) execution.
This must be the most basic example of multiprocessing, yet I've had trouble finding a simple way to do it!
Thank you
EDIT1: regarding do_something. In my real problem, do_something is replaced by some code that communicates with another program via a socket, receives packages with coordinates every 0.02 s, and stores them in member variables of the class. I want this constant updating of the coordinates to start and then be able to read the coordinates via other functions at the same time.
However that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print "Still looping"
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had, this is not really what I'm looking for either. What I want would be just to start the process p1, and then continue with time.sleep and ex.end_loop().
The first problem with your code is these two calls:
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
With ex.func1() you're calling the function and passing its return value as the target parameter. Since the function doesn't return anything, you're effectively calling
p1 = Process(target=None)
p2 = Process(target=None)
which makes, of course, no sense.
After fixing that, the next problem will be shared data: when using the multiprocessing package, you implement concurrency using multiple processes which, by default, cannot simply share data afaik. Have a look at Sharing state between processes in the package's documentation to read about this. Especially take the first sentence into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might also want to have a look at Exchanging objects between processes to read about how to send/receive data between two different processes. Instead of simply setting a flag to stop the loop, it might be better to send a message to signal that the loop should be terminated (see the sketch below).
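For instance, a rough sketch of the message-passing idea, using a Queue and a stop message instead of a shared flag (restructured from your class purely for illustration):
import time
from multiprocessing import Process, Queue

def loop(commands):
    while True:
        # empty() is only approximate, but good enough for a single stop message
        if not commands.empty() and commands.get() == "stop":
            print("Finished loop")
            break
        print("Still looping")
        time.sleep(1)

if __name__ == '__main__':
    commands = Queue()
    p = Process(target=loop, args=(commands,))
    p.start()
    time.sleep(5)
    commands.put("stop")  # the message that ends the loop
    p.join()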
Also note that processes are a heavyweight form of concurrency: they spawn multiple OS processes, which comes with a relatively big overhead. multiprocessing's main purpose is to avoid the problems imposed by Python's Global Interpreter Lock (google this to read more...). If your problem isn't much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and also allow access to the same data (although you really should read about synchronization when doing this...).
I'm afraid multiprocessing is an inherently complex subject, so I think you will need to advance your programming/Python skills to use it successfully. But I'm sure you'll manage; the Python documentation about it is comprehensive and there are a lot of other resources on the topic.
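For completeness, here is a minimal sketch of the threading-based alternative suggested above; it keeps your class essentially unchanged, because threads share memory and the plain condition flag keeps working:
import time
import threading

class Example():
    def __init__(self):
        self.condition = False

    def func1(self):
        self.condition = True
        while self.condition:
            print("Still looping")
            time.sleep(1)
        print("Finished loop")

    def end_loop(self):
        self.condition = False

ex = Example()
t = threading.Thread(target=ex.func1)  # note: no parentheses after func1
t.start()
time.sleep(5)
ex.end_loop()
t.join()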
To tackle your EDIT2 problem, you could try using the shared memory map Value.
import time
from multiprocessing import Process, Value

class Example():
    def func1(self, cond):
        while (cond.value == 1):
            print('do something')
            time.sleep(1)
        return

if __name__ == '__main__':
    ex = Example()
    cond = Value('i', 1)
    proc = Process(target=ex.func1, args=(cond,))
    proc.start()
    time.sleep(5)
    cond.value = 0
    proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.
