Python: running multiple independent pieces of code

I am facing a small issue in my code. I have a main function that, when a certain condition arises, has to launch one or more functions that deal with web scraping (specifically, they use Selenium). The problem is that I would like to launch this scraping "task", which is simply a Python function, without waiting for it to terminate: it should run independently of the rest of my code, so that I might have, say, 5 instances of the same function running at once without waiting for any of them to finish.
Some pseudo code:
while True:
    condition = SomeComputation()
    if condition:
        IndependentFunction(some_parameter)
Once IndependentFunction is called, I would like not to have to wait for it to end. I have looked at multiprocessing, but from what I understood I might not need that kind of parallelisation.
Thanks!

You would need multithreading in order to do that. The basic usage of the threading module with your independent function could look like this:
import threading

while True:
    condition = SomeComputation()
    if condition:
        newThread = threading.Thread(target=IndependentFunction, args=(some_parameter,), daemon=True)
        newThread.start()
The daemon=True argument means the thread runs independently in the background and the main program will not wait for it to finish before quitting; note that daemon threads are killed abruptly when the main program exits, so any scrape still in progress at that point is cut short.
Check this page for a more detailed tutorial.
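As a further sketch (not from the original answer), the standard concurrent.futures module offers the same fire-and-forget behaviour while also capping how many scrapers run at once. IndependentFunction and its parameter below are placeholders standing in for the question's real scraping function:
from concurrent.futures import ThreadPoolExecutor
import time

def IndependentFunction(some_parameter):
    # placeholder for the real Selenium scraping task
    time.sleep(1)
    print("finished scraping", some_parameter)

pool = ThreadPoolExecutor(max_workers=5)
for n in range(5):
    pool.submit(IndependentFunction, n)   # returns immediately; runs in a worker thread
print("the main code keeps running while the scrapers work")
pool.shutdown(wait=True)                  # optional: wait for the scrapers before exiting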

If you're not depending on the output of that scraping, then you could use threading.
It would look like this:
mytask = threading.Thread(target=myfunction, args=(arg1, arg2, argn))
mytask.start()
More detailed documentation: https://docs.python.org/3/library/threading.html

Related

Twisted callRemote

I have to make remote calls that can take quite a long time (over 60 seconds). Our entire code relies on processing the return value from callRemote, which is pretty bad since we're blocking on I/O the whole time despite using Twisted with 50 worker threads running.
We currently use something like
result = threads.blockingCallFromThread(reactor, callRemote, "method", args)
and get the result and carry on, but as its name says it blocks the event loop, so we cannot wait for several results at the same time.
There's no way I can refactor the whole code to make it asynchronous, so I think the only way is to defer the long I/O tasks to threads.
I'm trying to make the remote calls in threads, but I can't find a way to get the result of the blocking calls back. The remote calls are made and the result ends up somewhere, but I just can't get a hook on it.
What I'm trying to do currently looks like
reactor.callInThread(callRemote, name, *args, **kw)
which returns an empty Deferred (why?).
I'm trying to put the result in some sort of queue but it just won't work. How do I do that ?
AFAIK, blockingCallFromThread executes code in the reactor's thread; that's why it doesn't work the way you need.
If I understand you properly, you need to move some operation out of the reactor's thread and then get the result back into the reactor's thread.
I use the deferToThread approach for this case.
Example with deferreds:
import time
from twisted.internet import reactor, threads

def doLongCalculation():
    time.sleep(1)
    return 3

def printResult(x):
    print(x)

# run method in thread and get result as defer.Deferred
d = threads.deferToThread(doLongCalculation)
d.addCallback(printResult)
reactor.run()
Also, you might be interested in threads.deferToThreadPool.
Documentation about threading in Twisted.
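A minimal sketch of deferToThreadPool, using the reactor's own thread pool and the same doLongCalculation/printResult shape as the example above (this is an illustrative sketch, not code from the answer):
from twisted.internet import reactor, threads

def doLongCalculation():
    return 3

def printResult(x):
    print(x)

# same idea as deferToThread, but reusing the reactor's existing thread pool
d = threads.deferToThreadPool(reactor, reactor.getThreadPool(), doLongCalculation)
d.addCallback(printResult)
reactor.run()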

time.sleep that allows parent application to still evaluate?

I've run into situations lately, when writing scripts for both Maya and Houdini, where I need to wait for aspects of the GUI to update before I can call the rest of my Python code. I thought calling time.sleep in both situations would fix my problem, but it seems that time.sleep just holds up the parent application as well. This means my script evaluates exactly the same whether or not the sleep is in there; it just pauses part way through.
I have a thought to run my script in a separate thread in Python to see if that will free up the application to still run during the sleep, but I haven't had time to test this yet.
Thought I would ask in the meantime if anybody knows of some other solution to this scenario.
Maya - or more precisely Maya Python - is not really multithreaded (Python itself has a dodgy kind of multithreading because all threads fight for the dreaded global interpreter lock, but that's not your problem here). You can run threaded code just fine in Maya using the threading module; try:
import time
import threading

def test():
    for n in range(0, 10):
        print("hello")
        time.sleep(1)

t = threading.Thread(target=test)
t.start()
That will print 'hello' to your listener 10 times at one second intervals without shutting down interactivity.
Unfortunately, many parts of Maya - including, most notably, ALL user-created UI and most kinds of scene manipulation - can only be run from the "main" thread, the one that owns the Maya UI. So you could not use the technique above in a script that changes the contents of a text box in a window (to make it worse, you'll get misleading error messages: code that works when you run it from the listener will error when you call it from the thread, and it politely returns completely wrong error codes). You can do things like network communication, writing to a file, or long calculations in a separate thread with no problem, but UI work and many common scene tasks will fail if you try to do them from a thread.
Maya has a partial workaround for this in the maya.utils module. You can use the functions executeDeferred and executeInMainThreadWithResult. These will wait for an idle time to run (which means, for example, that they won't run if you're playing back an animation) and then fire as if you'd run them in the main thread. The example from the Maya docs gives the idea:
import maya.utils
import maya.cmds

def doSphere(radius):
    maya.cmds.sphere(radius=radius)

maya.utils.executeInMainThreadWithResult(doSphere, 5.0)
This gets you most of what you want, but you need to think carefully about how to break up your task into threading-friendly chunks. And, of course, running threaded programs is always harder than the single-threaded alternative: you need to design the code so that things won't break if another thread messes with a variable while you're working. Good parallel programming is a whole big kettle of fish, although it boils down to a couple of basic ideas:
1) establish exclusive control over objects (for short operations) using RLocks when needed
2) put shared data into safe containers, like the Queue in #dylan's example (see the sketch below)
3) be really clear about which objects are shareable (they should be few!) and which aren't
Here's a decent (long) overview.
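To make ideas 1) and 2) concrete, here is a minimal, Maya-independent sketch (plain Python, made-up names) of an RLock protecting shared state and a Queue carrying results back:
import threading
import queue

results = queue.Queue()          # thread-safe container for shared data
lock = threading.RLock()         # exclusive control for short updates
shared = {"count": 0}

def worker(n):
    with lock:                   # only one thread touches the dict at a time
        shared["count"] += 1
    results.put(n * n)           # hand data back through the queue

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["count"], [results.get() for _ in range(4)])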
As for Houdini, I don't know for sure, but this article makes it sound like similar issues arise there.
A better solution, rather than sleep, is a while loop. Set up a while loop to check a shared value (or even a thread-safe structure like a Queue). The parent processes that you're waiting on can do their work (or children, it's not important who spawns what), and when they finish, they send a true/false/0/1/whatever to the Queue/variable, letting the other processes know that they may continue, as sketched below.
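A minimal sketch of that pattern (plain Python with placeholder names, not Maya-specific code): the worker signals completion through a Queue and the waiting loop checks it without blocking indefinitely:
import threading
import queue
import time

done = queue.Queue()                 # thread-safe signal channel

def background_job():
    time.sleep(2)                    # stand-in for the real work
    done.put(True)                   # tell the waiting code we are finished

threading.Thread(target=background_job).start()

while True:
    try:
        done.get_nowait()            # non-blocking check of the signal
        break                        # the worker finished, carry on
    except queue.Empty:
        time.sleep(0.1)              # brief pause so the loop doesn't spin
print("continuing with the rest of the script")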

How to use python (maya) multithreading

I've been looking at examples from other people but I can't seem to get it to work properly.
It'll either use a single core, or basically freeze up maya if given too much to process, but I never seem to get more than one core working at once.
So for example, this is kind of what I'd like it to do, on a very basic level. Mainly just let each loop run simultaneously on a different processor with the different values (in this case, the two values would use two processors)
mylist = [50, 100, 23]
newvalue = [50, 51]
for j in range(0, len(newvalue)):
    exists = False
    for i in range(0, len(mylist)):
        # search the list
        if newvalue[j] == mylist[i]:
            exists = True
    # add to the list
    if exists == True:
        mylist.append(newvalue[j])
Would it be possible to pull this off? The actual code I want to use this on can take from a few seconds to around 10 minutes for each loop, but the loops could theoretically all run at once, so I thought multithreading would speed things up a lot.
Bear in mind I'm still relatively new to python so an example would be really appreciated
Cheers :)
There are really two different answers to this.
Maya scripts are really supposed to run in the main UI thread, and there are lots of ways they can trip you up if run from a separate thread. Maya includes a module called maya.utils which includes methods for deferred evaluation in the main thread. Here's a simple example:
import maya.cmds as cmds
import maya.utils as utils
import threading

def do_in_main():
    utils.executeDeferred(cmds.sphere)

for i in range(10):
    t = threading.Thread(target=do_in_main, args=())
    t.start()
That will allow you to do things with the Maya UI from a separate thread (there's another method in utils that will allow the calling thread to await a response, too). Here's a link to the Maya documentation on this module.
However, this doesn't get you around the second aspect of the question. Maya Python isn't going to split up the job among processors for you: threading will let you create separate threads, but they all share the same Python interpreter, and the global interpreter lock means that they end up waiting for it rather than running along independently.
You can't use the multiprocessing module, at least not AFAIK, since it spawns new Maya instances rather than pushing script execution out to other processors in the Maya session you are running within. Python aside, Maya is an old program and not very multi-core oriented in any case. Try XSI :)
Any threading stuff in Maya is tricky - if you touch the main application (basically, any function from the API or a maya.whatever module) without the deferred execution above, you'll probably crash Maya. Only use it if you have to.
And, BTW, you can't use executeDeferred, etc. in batch mode, since they are implemented using the main UI loop.
What theodox says is still true today, six years later. However, one may go another route by spawning a new process using the subprocess module. You'll have to communicate and share data via sockets or something similar, since the new process lives in a separate interpreter. The new interpreter runs on its own and doesn't know about Maya, but you can do any other work in it, benefiting from the multi-threaded environment your OS provides, before sending the results back to your Maya Python script.
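A minimal sketch of that approach (heavy_work.py and its argument are made up; inside Maya you would typically point the command at mayapy or another Python interpreter rather than relying on sys.executable):
import subprocess
import sys

# launch a hypothetical script in a separate interpreter and read its stdout
proc = subprocess.Popen([sys.executable, "heavy_work.py", "some_argument"],
                        stdout=subprocess.PIPE)
out, _ = proc.communicate()          # blocks; use proc.poll() in a loop to stay responsive
print("result from the external process:", out.decode().strip())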

Alternative python library for managing threads

I had some annoyances with spawning subprocesses, like getting correct output and so on. A wrapper library, envoy, solved all of those problems with an easy-to-use interface.
Using threading, I sometimes struggle with hanging threads that never end, external programs launched within threads that I can't reach anymore, and so on.
Is there any "threading for dummies" python library out there? Thanks
Is there any "threading for dummies" python library out there?
No, there is not. threading is pretty simple to use in simple cases. You use it to introduce concurrency in your program, i.e. whenever you want two or more actions to happen at the same time.
This is how you can let Peter build a house and let Igor drive to Moscow at the same time:
from threading import Thread
import time

def drive_bus():
    time.sleep(1)
    print("Igor: I'm Igor and I'm driving to... Moscow!")
    time.sleep(9)
    print("Igor: Yei, Moscow!")

def build_house():
    print("Peter: Let's start building a large house...")
    time.sleep(10.1)
    print("Peter: Urks, we have no tools :-(")

threads = [Thread(target=drive_bus), Thread(target=build_house)]
for t in threads:
    t.start()
for t in threads:
    t.join()
Isn't that simple? Define your function to be run in another thread and create a threading.Thread instance with that function as its target. Nothing happens until you invoke start(), which fires off the thread and immediately returns.
Before letting your main thread exit, you should wait for all the threads you have spawned to finish. This is what t.join() does: it blocks and waits for the thread t to finish, and only then does it return.
I would recommend reading more about the actual Python library - it is simple enough. Your problem with hanging threads, provided it prevents your application from exiting, may be solved by using daemon threads.
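For example, a minimal daemon-thread sketch (the task is a made-up stand-in): because the thread is marked as a daemon, it cannot keep the interpreter alive on its own once the main thread exits:
import threading
import time

def possibly_hanging_task():
    time.sleep(1000)                 # stand-in for a call that never returns

t = threading.Thread(target=possibly_hanging_task, daemon=True)
t.start()
print("main thread exits here; the daemon thread is discarded with it")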
What kind of task are you trying to achieve? If you are trying to run tasks in parallel without writing the threading code yourself, you may find the multiprocessing package useful. Furthermore, there is an interesting piece of information on the Python wiki about parallel processing.
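A minimal multiprocessing sketch (the work function is a placeholder) that spreads tasks over separate processes instead of threads:
from multiprocessing import Pool

def work(n):
    return n * n                     # stand-in for a CPU-heavy task

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(work, range(10)))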
Could you elaborate a bit more on the task please?

How to use a thread on just one line of Python, not for the whole script

I need to know if there is a way I can just use multithreading in a single line, rather than the full program.
That is, I want to run the program as a single thread, but in the middle of it I want one particular line to be multithreaded. I want to multithread something like this:
br.open("http://www.google.com.pk/search?q="+str(m))
So, basically I want to open all the links (m can assume 50 distinct values), and I don't want to open them one after the other. I want to open all 50 at once! If I open them one by one, it slows the process down and I want to avoid that.
You want to look at the threading.Thread class.
import threading

def worker():
    """thread worker function"""
    print('Worker')
    return

threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
If you are doing multiple HTTP requests, you may also want to look at queue.Queue: you would queue up the requests and have worker threads process them. Here is a nice example of that in action.
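A minimal sketch of that queue pattern (the URLs mirror the question and are placeholders; urllib.request stands in for the question's br.open):
import threading
import queue
import urllib.request

url_queue = queue.Queue()
for m in range(5):                               # placeholder values for m
    url_queue.put("http://www.google.com.pk/search?q=" + str(m))

def worker():
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return                               # queue drained, stop this worker
        try:
            urllib.request.urlopen(url, timeout=10)
        except Exception:
            pass                                 # ignore fetch errors in this sketch
        finally:
            url_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
url_queue.join()                                 # wait until every URL has been handled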
You are after asynchronous communications. I haven't tried it myself, but take a look at grequests. It used to be part of the requests library, but has been factored out. Usage (from their github page) seems very easy:
import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
grequests.map(rs)
If you want to keep within the Python standard library, you might take a look at the asyncore module.
Also, another alternative is Twisted. It might be overkill for your requirements, and the learning curve is notoriously steep.
Here I am actually trying to answer the original question asked by the OP. I don't recommend this code due to obvious readability issues, but, just like in Java, you can run a thread from a single line in Python.
We will be using the Thread class from Python:
from threading import Thread
Define the function which you want to call:
def func1():
    print("Hello World")
Start a thread from one single line
Thread(target=func1).start()
That is how you start a one-liner thread in Python. To pass arguments to the target function, use the args keyword, as in the sketch below.
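A minimal sketch (func2 and its arguments are made up for illustration):
from threading import Thread

def func2(name, count):
    print("Hello", name, count)

# args passes positional arguments to the target as a tuple
Thread(target=func2, args=("World", 3)).start()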
