Strange Python queue behavior: crashes if the queue isn't named "queue"

The title says it all. I'm writing this program in Python 2.7, and I'm trying to use threaded queues to make a large number of web requests. Here's the problem: I would like to have two different queues, one to handle the threaded requests and a separate one to handle the responses. If my program has a queue that isn't named "queue" (for example, if I want the initial queue to be named "input_q"), then the program crashes and simply refuses to work. This makes absolutely no sense to me. In the code below, all of the imported custom modules work just fine (at least, they did independently, they passed all unit tests, and I don't see any reason they could be the source of the problem).
Also, via diagnostic statements, I have determined that it crashes just before it spawns the thread pool.
Thanks in advance.
EDIT: "Crash" may be the wrong term here. It actually just stops. Even after waiting half an hour, when the original program ran in under thirty seconds, the program never finished. When I told it to print out toCheck, it would only make it part way through the list, stop in the middle of an entry, and do nothing.
EDIT2: Sorry for wasting everyone's time; I forgot about this post. Someone had changed one of my custom modules (threadCheck). It looks like it was initializing the module and then running along its merry way with the rest of the program. threadCheck was crashing after initialization, while the program was in the middle of its computations, and that crash was taking the whole thing down with it.
code:
from binMod import binExtract
from grabZip import grabZip
import random
import Queue
import time
import threading
import urllib2
from threadCheck import threadUrl
import datetime
queue = Queue.Queue()
#output_q = Queue.Queue()
#input_q = Queue.Queue()
#output = queue
p=90
qb = 22130167533
url = grabZip(qb)
logFile = "log.txt"
metaC = url.grabMetacell()
toCheck = []
print metaC[0]['images']
print "beginning random selection"
for i in range(4):
    if (len(metaC[i]['images'])>0):
        print metaC[i]['images'][0]
        for j in range(len(metaC[i]['images'])):
            chance = random.randint(0, 100)
            if chance <= p:
                toCheck.append(metaC[i]['images'][j]['resolution 7 url'])
print "Spawning threads..."
for i in range(20):
    t = threadUrl(queue)
    t.setDaemon(True)
    t.start()
print "initializing queue..."
for i in range(len(toCheck)):
    queue.put(toCheck[i])
queue.join()
#input_q.join()
output = open(logFile, 'a')
done = datetime.datetime.now()
results = "\n %s \t %s \t %s \t %s"%(done, qb, good, bad)
output.write(results)

What the names are is irrelevant to Python: Python doesn't care, and the objects themselves (for the most part) don't even know the names they have been assigned to. So the problem has to be somewhere else.
As has been suggested in the comments, carefully check that every reference to queue was actually renamed.
Also, try it without daemon mode.
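Since the name cannot matter, a setup with two queues called input_q and output_q should behave exactly like one called queue. Below is a minimal Python 3 sketch to check that (in Python 3 the module is spelled queue rather than Queue). fake_fetch is a hypothetical stand-in for the real request code in threadCheck, which isn't shown. The worker also calls task_done() in a finally block: if a worker thread dies with an exception before calling task_done(), the matching join() waits forever, which looks exactly like a program that "just stops".

```python
import queue
import threading

# Two queues with arbitrary names: Python does not care what they are called.
input_q = queue.Queue()
output_q = queue.Queue()

def fake_fetch(url):
    # Hypothetical stand-in for the real web request done by threadUrl.
    if url.endswith("bad"):
        raise ValueError("simulated failure for " + url)
    return len(url)

def worker():
    while True:
        url = input_q.get()
        try:
            output_q.put((url, fake_fetch(url)))
        except Exception as exc:
            # Record the failure instead of letting the thread die silently.
            output_q.put((url, exc))
        finally:
            # Always mark the item done, or input_q.join() below blocks forever.
            input_q.task_done()

for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

for url in ["http://example.com/a", "http://example.com/bad", "http://example.com/c"]:
    input_q.put(url)

input_q.join()  # returns once every item has been marked done

results = []
while not output_q.empty():
    results.append(output_q.get())
print(len(results), "results")
```

Removing the try/finally and letting fake_fetch raise is a quick way to reproduce the "it just stops" symptom: the failing thread exits, its item is never marked done, and input_q.join() never returns.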


Share variables across scripts

I have two separate scripts working with the same variables.
To be more precise, one script edits the variables and the other one uses them (it would be nice if it could edit them too, but that is not absolutely necessary).
This is what I am currently doing:
When script 1 edits a variable, it dumps it into a JSON file.
Script 2 repeatedly opens the JSON file to get the variables.
This method is really not elegant, and the while loop is really slow.
How can I share variables across scripts?
My first script gets data from a MIDI controller and sends web requests.
My second script is for LED strips (those run thanks to the same MIDI controller). Both scripts run in a "while True" loop.
I can't simply put them in the same script, since every web request would slow the LEDs down. I am currently just sharing the variables via a JSON file.
If enough people ask for it I will post the whole code, but I have been told not to do this.
Considering the information you provided, meaning...
Both script run in a "while true" loop.
I can't simply put them in the same script since every webrequest would slow the LEDs down.
As I see it, you have two choices:
Use a client/server model with two machines. One acts as the server and the other as the client. The server runs a script with an infinite loop that constantly updates the data, plus an API that simply reads and exposes the current state of your file/database to the client. The client, on the other machine, would simply request the current data and process it.
Make a single multiprocessing script. Each script would run in a separate process and manage its own memory. Since you also want to share variables between your two programs, you could pass as an argument an object that is shared between both of them. See this resource to help you.
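For the first option the answer has a web API in mind; as a lighter stand-in, the sketch below uses multiprocessing.connection from the standard library instead of HTTP. Treat it as a sketch under assumptions: the authkey value, the "get" request string and the state dictionary are made up for illustration, and to keep it self-contained the server runs in a thread of the same process. In practice serve() would live in the MIDI script and fetch() in the LED script, possibly on another machine (in that case bind to an address other than localhost).

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # hypothetical shared secret; both sides must use the same one

state = {"value": 0}  # the data only the server side updates
state_lock = threading.Lock()

def serve(listener, n_requests):
    # Answer n_requests "get" requests with a snapshot of the current state.
    for _ in range(n_requests):
        with listener.accept() as conn:
            if conn.recv() == "get":
                with state_lock:
                    conn.send(dict(state))

def fetch(address):
    # What the client (e.g. the LED script) would call to read the latest data.
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send("get")
        return conn.recv()

# Port 0 lets the OS pick a free port; listener.address reports the real one.
listener = Listener(("localhost", 0), authkey=AUTHKEY)
server = threading.Thread(target=serve, args=(listener, 2))
server.start()

with state_lock:
    state["value"] = 42  # e.g. the MIDI loop updating its state
first = fetch(listener.address)
with state_lock:
    state["value"] = 43
second = fetch(listener.address)

server.join()
listener.close()
print(first, second)
```

Each fetch() opens a fresh connection, which keeps the sketch short; a long-lived connection would avoid the reconnect cost if the LED loop polls very often.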
Note that there are more solutions. For instance, you're using a JSON file that you constantly open and close (that is probably what takes the most time in your program). You could use a real database that is opened only once and queried many times while still being updated.
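As one concrete example of a "real database" that is opened once and queried many times, here is a sketch using the standard library's sqlite3 module. The choice of SQLite, the table layout and the file location are assumptions for illustration. Both connections live in one script here so the example runs on its own; in your setup the writer half would sit in the MIDI script and the reader half in the LED script, each pointing at the same database file.

```python
import os
import sqlite3
import tempfile

# Hypothetical location of the database file both scripts would share.
db_path = os.path.join(tempfile.mkdtemp(), "shared_state.db")

# Writer side (the MIDI / web-request script): connect once, update many times.
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")
writer.commit()

def set_var(key, value):
    writer.execute("INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)", (key, value))
    writer.commit()  # committed changes become visible to other connections

# Reader side (the LED script): connect once, read as often as needed.
reader = sqlite3.connect(db_path)

def get_var(key):
    row = reader.execute("SELECT value FROM state WHERE key = ?", (key,)).fetchone()
    return None if row is None else row[0]

set_var("brightness", "128")
first = get_var("brightness")
set_var("brightness", "64")
second = get_var("brightness")
print(first, second)

writer.close()
reader.close()
```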
A Manager from multiprocessing makes this sort of thing pretty easy.
First, I simplify your "midi controller and sends web-request" code down to something that just sleeps for random amounts of time and updates a variable in a managed dictionary:
from time import sleep
from random import random
def slow_fn(d):
    i = 0
    while True:
        sleep(random() ** 2)
        i += 1
        d['value'] = i
Next, we simplify the "LED strip" control down to something that just prints to the screen:
from time import perf_counter
def fast_fn(d):
    last = perf_counter()
    while True:
        sleep(0.05)
        value = d.get('value')
        now = perf_counter()
        print(f'fast {value} {(now - last) * 1000:.2f}ms')
        last = now
You can then run these functions in separate processes:
import multiprocessing as mp
# the __main__ guard is required where the "spawn" start method is used (e.g. Windows)
if __name__ == '__main__':
    with mp.Manager() as manager:
        d = manager.dict()
        procs = []
        for fn in [slow_fn, fast_fn]:
            p = mp.Process(target=fn, args=[d])
            procs.append(p)
            p.start()
        for p in procs:
            p.join()
The "fast" output appears at regular intervals with no obvious visual pauses.

How to call method from different class using multiprocess pool python

How do I call a method of a class in a different module using a multiprocessing Pool in Python?
My aim is to start a process that keeps running until a task is provided and, once the task is completed, goes back to waiting.
Below is the code, which has three modules. The Reader class is my runtime task; I will hand the execution of its reader method to ProcessExecutor.
ProcessExecutor is a process pool; it keeps running its while loop until a task is provided to it.
The main module initiates everything.
Module 1
class Reader(object):
    def __init__(self, message):
        self.message = message
    def reader(self):
        print self.message
Module 2
class ProcessExecutor():
    def run(self, queue):
        print 'Before while loop'
        while True:
            print 'Reached Run'
            try:
                pair = queue.get()
                print 'Running process'
                print pair
                func = pair.get('target')
                arguments = pair.get('args', None)
                if arguments is None:
                    func()
                else:
                    func(arguments)
                queue.task_done()
            except Exception:
                print Exception.message
Main module
from process_helper import ProcessExecutor
from reader import Reader
import multiprocessing
import Queue
if __name__=='__main__':
    queue = Queue.Queue()
    myReader = Reader('Hi')
    ps = ProcessExecutor()
    pool = multiprocessing.Pool(2)
    pool.apply_async(ps.run, args=(queue, ))
    param = {'target': myReader.reader}
    queue.put(param)
The code runs without any error:
C:\Python27\python.exe C:/Users/PycharmProjects/untitled1/main/main.py
Process finished with exit code 0
The code executes, but it never reaches the run method. I am not sure whether it is possible to call a method of a different class using multiple processes.
I tried apply_async, map and apply, but none of them work.
All the examples I found online call the target method from the script where the main method is implemented.
I am using python 2.7
Please help.
Your first problem is that you just exit without waiting on anything. You have a Pool, a Queue, and an AsyncResult, but you just ignore all of them and exit as soon as you've created them. You should be able to get away with only waiting on the AsyncResult (after that, there's no more work to do, so who cares what you abandon), except for the fact that you're trying to use Queue.task_done, which doesn't make any sense without a Queue.join on the other side, so you need to wait on that as well.
Your second problem is that you're using the Queue from the Queue module, instead of the one from the multiprocessing module. The Queue module only works across threads in the same process.
Also, you can't call task_done on a plain multiprocessing.Queue; that method only exists on the JoinableQueue subclass.
Once you've gotten to the point where the pool actually tries to run a task, you will hit the problem that, in Python 2.7, bound methods can't be pickled unless you write a pickler for them. Doing that is a pain, even though it's the right way. The traditional workaround (hacky and cheesy, but everyone did it, and it works) is to wrap each method you want to call in a top-level function. The modern solution is to use the third-party dill or cloudpickle libraries, which know how to pickle bound methods and how to hook into multiprocessing. You should definitely look into them. But, to keep things simple, I'll show you the workaround.
Notice that, because you've created an extra queue to pass methods onto, in addition to the one built into the pool, you'll need the workaround for both targets.
With these problems fixed, your code looks like this:
from process_helper import ProcessExecutor
from reader import Reader
import multiprocessing
def call_run(ps):
    # relies on the global queue; pool workers only see it under the fork start method (Unix), not on Windows
    ps.run(queue)
def call_reader(reader):
    return reader.reader()
if __name__=='__main__':
    queue = multiprocessing.JoinableQueue()
    myReader = Reader('Hi')
    ps = ProcessExecutor()
    pool = multiprocessing.Pool(2)
    res = pool.apply_async(call_run, args=(ps,))
    param = {'target': call_reader, 'args': myReader}
    queue.put(param)
    print res.get()  # once run() has started it loops forever, so this call never returns
    queue.join()
You have additional bugs beyond this in your ProcessExecutor, but I'm not going to debug everything for you. This gets you past the initial hurdles and answers the specific question you asked. Also, I'm not sure what the point of all that code is. You seem to be trying to rebuild, on top of Pool, what Pool already does, only in a more complicated and less powerful way, but I'm not entirely sure.
Meanwhile, here's a program that does what I think you want, with no problems, by just throwing away that ProcessExecutor and everything that goes with it:
from reader import Reader
import multiprocessing
def call_reader(reader):
    return reader.reader()
if __name__=='__main__':
    myReader = Reader('Hi')
    pool = multiprocessing.Pool(2)
    res = pool.apply_async(call_reader, args=(myReader,))
    print res.get()
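Why the top-level wrapper helps can be checked without starting any processes: Pool pickles the callable and its arguments to send them to a worker, and a plain pickle round trip imitates that step. This is a Python 3 sketch; the Reader here returns its message instead of printing it, purely so the result is easy to check. On current Python 3 versions, bound methods of picklable objects can be pickled as well, so the wrapper is mainly needed on Python 2.7.

```python
import pickle

class Reader(object):
    # Same shape as the question's Reader, but returning the message
    # instead of printing it so the result is easy to check.
    def __init__(self, message):
        self.message = message

    def reader(self):
        return self.message

def call_reader(reader):
    # Top-level wrapper: pickled by reference (module name + function name),
    # which is the workaround that also works on Python 2.7.
    return reader.reader()

my_reader = Reader("Hi")

# Imitate what Pool does: pickle the callable and its argument, then unpickle.
fn = pickle.loads(pickle.dumps(call_reader))
arg = pickle.loads(pickle.dumps(my_reader))
print(fn(arg))

# On Python 3 the bound method itself survives a pickle round trip too.
method = pickle.loads(pickle.dumps(my_reader.reader))
print(method())
```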

Autodesk's Fbx Python and threading

I'm trying to use the FBX Python module from Autodesk, but it seems I can't thread any operation. This seems to be because the GIL is not released. Has anyone run into the same issue, or am I doing something wrong? When I say it doesn't work, I mean the code doesn't release the thread, and I'm not able to do anything else while the FBX code is running.
There isn't much code to post; I just want to know whether this has happened to anyone else.
Update:
Here is the example code. Please note that each FBX file is something like 2 GB:
import os
import fbx
import threading
file_dir = r'../fbxfiles'
def parse_fbx(filepath):
    print '-' * (len(filepath) + 9)
    print 'parsing:', filepath
    manager = fbx.FbxManager.Create()
    importer = fbx.FbxImporter.Create(manager, '')
    status = importer.Initialize(filepath)
    if not status:
        raise IOError()
    scene = fbx.FbxScene.Create(manager, '')
    importer.Import(scene)
    # freeup memory
    rootNode = scene.GetRootNode()
    def traverse(node):
        print node.GetName()
        for i in range(0, node.GetChildCount()):
            child = node.GetChild(i)
            traverse(child)
    # RUN
    traverse(rootNode)
    importer.Destroy()
    manager.Destroy()
files = os.listdir(file_dir)
tt = []
for file_ in files:
    filepath = os.path.join(file_dir, file_)
    t = threading.Thread(target=parse_fbx, args=(filepath,))
    tt.append(t)
    t.start()
One problem I see is with your traverse() function: it potentially calls itself recursively a huge number of times. Another is having all the threads print at the same time. Doing that properly requires coordinating access to the shared output device (i.e., the screen). A simple way to do that is to create and use a global threading.Lock object.
First, create a global Lock to prevent threads from printing at the same time:
file_dir = '../fbxfiles'  # an "r" prefix is needed only when the path contains backslashes
print_lock = threading.Lock() # add this here
Then make a non-recursive version of traverse() that uses it:
def traverse(rootNode):
    # note: this visits only rootNode and its direct children
    with print_lock:
        print rootNode.GetName()
    for i in range(rootNode.GetChildCount()):
        child = rootNode.GetChild(i)
        with print_lock:
            print child.GetName()
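The version above only reaches the root's direct children. If you need every descendant without recursion, an explicit stack does the job. The Python 3 sketch below uses a hypothetical FakeNode class that mimics the three FbxNode methods used above (GetName, GetChildCount, GetChild), so it runs without the FBX SDK; with the real module you would pass scene.GetRootNode() instead.

```python
import threading

print_lock = threading.Lock()

class FakeNode(object):
    # Hypothetical stand-in exposing the three FbxNode methods used above.
    def __init__(self, name, children=()):
        self._name = name
        self._children = list(children)

    def GetName(self):
        return self._name

    def GetChildCount(self):
        return len(self._children)

    def GetChild(self, i):
        return self._children[i]

def traverse(root_node):
    # Depth-first walk with an explicit stack instead of recursion.
    names = []
    stack = [root_node]
    while stack:
        node = stack.pop()
        with print_lock:  # only one thread prints at a time
            print(node.GetName())
        names.append(node.GetName())
        # Push children in reverse so they are visited in their original order.
        for i in range(node.GetChildCount() - 1, -1, -1):
            stack.append(node.GetChild(i))
    return names

root = FakeNode("root", [FakeNode("a", [FakeNode("a1"), FakeNode("a2")]), FakeNode("b")])
order = traverse(root)
```

The stack grows with the width of the tree rather than its depth, so very deep hierarchies no longer risk hitting Python's recursion limit.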
It's not clear to me exactly where the reading of each FBX file takes place. If it all happens as a result of the importer.Import(scene) call, then that is the only time any other threads will be given a chance to run, unless some I/O is also done within the traverse() function.
Since printing is definitely a form of output, thread switching can also occur while it's being done. However, if all the function did was perform computations inside C code that never releases the GIL (as you suspect of the FBX module), no other thread would run during that call. Pure-Python computation, by contrast, is still interrupted periodically so that other threads get a turn.
Once you get the multithreading working, you may run into insufficient-memory issues if multiple 2 GB FBX files are being read into memory simultaneously by the different threads.
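One way to keep that memory use in check, sketched here as an assumption rather than something from the question, is to cap how many files are parsed at once with a threading.BoundedSemaphore. In this Python 3 sketch, time.sleep stands in for the FBX import and traversal, and MAX_CONCURRENT_PARSES is a made-up limit you would tune to the available RAM.

```python
import threading
import time

MAX_CONCURRENT_PARSES = 2  # hypothetical limit; tune to available RAM
parse_slots = threading.BoundedSemaphore(MAX_CONCURRENT_PARSES)

active = 0  # parses currently running
peak = 0    # highest number of simultaneous parses observed
counter_lock = threading.Lock()

def parse_file(filepath):
    global active, peak
    with parse_slots:  # blocks while MAX_CONCURRENT_PARSES parses are already running
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for importer.Import(scene) and traverse()
        with counter_lock:
            active -= 1

threads = [threading.Thread(target=parse_file, args=("file%d.fbx" % i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent parses:", peak)
```

All six threads are started at once, but at most two of them hold a file's worth of data at any moment; the others simply wait on the semaphore.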

Python simplest form of multiprocessing

I've been trying to read up on threading and multiprocessing, but all the examples are too intricate and advanced for my level of Python/programming knowledge. I want to run a function that consists of a while loop, and while that loop runs I want to continue with the program, eventually change the condition of the while loop, and thereby end that process. This is the code:
import time
class Example():
    def __init__(self):
        self.condition = False
    def func1(self):
        self.condition = True
        while self.condition:
            print "Still looping"
            time.sleep(1)
        print "Finished loop"
    def end_loop(self):
        self.condition = False
Then I make the following function calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for func1 to run for 5 s before end_loop() is called, which changes the condition and ends the loop and thus also the function. I.e., I want one process to start and "go" into func1 while time.sleep(5) is called at the same time: the processes should "split" when arriving at func1, one entering the function while the other continues down the program and starts executing time.sleep(5).
This must be the most basic example of multiprocessing, yet I've had trouble finding a simple way to do it!
Thank you
EDIT1: Regarding do_something: in my real problem, do_something is replaced by code that communicates with another program via a socket, receives packets with coordinates every 0.02 s, and stores them in member variables of the class. I want this constant updating of the coordinates to start, and then be able to read the coordinates via other functions at the same time.
However that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print "Still looping"
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had, this is not really what I'm looking for. What I want is just to start the process p1 and then continue with time.sleep and ex.end_loop().
The first problem with your code are the calls
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
With ex.func1() you're calling the function and passing its return value as the target parameter. Since the function doesn't return anything, you're effectively calling
p1 = Process(target=None)
p2 = Process(target=None)
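which, of course, makes no sense. Worse, because func1 loops until the condition changes, the call ex.func1() never returns in the first place, which is why your program never reached p2.start().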
After fixing that, the next problem will be shared data: when using the multiprocessing package, you implement concurrency using multiple processes which, by default, cannot simply share data. Have a look at "Sharing state between processes" in the package's documentation to read about this. Especially take its first sentence into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might want to also have a look at Exchanging objects between processes to read about how to send/receive data between two different processes. So, instead of simply setting a flag to stop the loop, it might be better to send a message to signal the loop should be terminated.
Also note that processes are a heavyweight form of concurrency: they spawn multiple OS processes, which comes with relatively big overhead. multiprocessing's main purpose is to avoid the problems imposed by Python's Global Interpreter Lock (search for it to read more). If your problem isn't much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and can also access the same data (although you really should read about synchronization when doing this).
I'm afraid multiprocessing is an inherently complex subject, so I think you will need to advance your programming/Python skills to use it successfully. But I'm sure you'll manage; the Python documentation on this is comprehensive, and there are many other resources about it.
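Following the threading suggestion, here is a minimal Python 3 sketch of the question's Example class with func1 running in a thread. It swaps the plain condition attribute for a threading.Event (a substitution, not the asker's code), which is safe to share between threads and lets end_loop() wake the loop immediately instead of waiting for the current one-second pause to finish. The iterations counter is added only so the behavior can be checked. Note target=ex.func1 without parentheses: the thread gets the method itself and calls it.

```python
import threading
import time

class Example(object):
    def __init__(self):
        self.stop_event = threading.Event()  # thread-safe stop flag
        self.iterations = 0

    def func1(self):
        while not self.stop_event.is_set():
            print("Still looping")
            self.iterations += 1
            # Like time.sleep(1), but returns early as soon as end_loop() is called.
            self.stop_event.wait(1)
        print("Finished loop")

    def end_loop(self):
        self.stop_event.set()

ex = Example()
worker = threading.Thread(target=ex.func1)  # pass the method, don't call it
worker.start()
time.sleep(5)  # the main thread keeps going while func1 loops
ex.end_loop()
worker.join()
```

Because the thread shares memory with the rest of the program, the same pattern covers EDIT1: the loop can store the received coordinates in attributes that other methods read (with a Lock around compound updates).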
To tackle your EDIT2 problem, you could try using a shared-memory Value:
import time
from multiprocessing import Process, Value
class Example():
    def func1(self, cond):
        while (cond.value == 1):
            print('do something')
            time.sleep(1)
        return
if __name__ == '__main__':
    ex = Example()
    cond = Value('i', 1)
    proc = Process(target=ex.func1, args=(cond,))
    proc.start()
    time.sleep(5)
    cond.value = 0
    proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.

Using the Queue class in Python 2.6

Let's assume I'm stuck using Python 2.6, and can't upgrade (even if that would help). I've written a program that uses the Queue class. My producer is a simple directory listing. My consumer threads pull a file from the queue, and do stuff with it. If the file has already been processed, I skip it. The processed list is generated before all of the threads are started, so it isn't empty.
Here's some pseudo-code.
import os, Queue, sys, threading
processed = []
def consumer():
    while True:
        file = dirlist.get(block=True)
        if file in processed:
            print "Ignoring %s" % file
        else:
            # do stuff here
            pass
        dirlist.task_done()
dirlist = Queue.Queue()
for f in os.listdir("/some/dir"):
    dirlist.put(f)
max_threads = 8
for i in range(max_threads):
    thr = threading.Thread(target=consumer)
    thr.start()
dirlist.join()
The strange behavior I'm getting is that if a thread encounters a file that's already been processed, the thread stalls out and waits until the entire program ends. I've done a little bit of testing, and the first 7 threads (assuming 8 is the max) stop, while the 8th thread keeps processing, one file at a time. But, by doing that, I'm losing the entire reason for threading the application.
Am I doing something wrong, or is this the expected behavior of the Queue/threading classes in Python 2.6?
I tried running your code, and did not see the behavior you describe. However, the program never exits. I recommend changing the .get() call as follows:
try:
    file = dirlist.get(True, 1)
except Queue.Empty:
    return
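Putting the timeout suggestion together with the question's consumer gives something like the Python 3 sketch below (the module is queue there and Queue in 2.6; get(True, 1) works the same way in both). The file names and the processed set are invented for illustration, and handled_lock guards the shared result list. task_done() sits in a finally block so every get() is matched even if the processing step raises.

```python
import queue
import threading

processed = {"a.txt", "c.txt"}  # hypothetical: files handled in an earlier run
handled = []
handled_lock = threading.Lock()

dirlist = queue.Queue()
for f in ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]:  # stands in for os.listdir()
    dirlist.put(f)

def consumer():
    while True:
        try:
            # Wait at most 1 s; an empty queue means the work is finished.
            filename = dirlist.get(True, 1)
        except queue.Empty:
            return
        try:
            if filename in processed:
                print("Ignoring %s" % filename)
            else:
                with handled_lock:
                    handled.append(filename)  # "do stuff here"
        finally:
            dirlist.task_done()  # always pair get() with task_done()

threads = [threading.Thread(target=consumer) for _ in range(8)]
for thr in threads:
    thr.start()
dirlist.join()
for thr in threads:
    thr.join()  # returns because each consumer exits on queue.Empty
print(sorted(handled))
```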
If you want to know which thread is currently executing, you can import the thread module and print thread.get_ident().
I added the following line after the .get():
print file, thread.get_ident()
and got the following output:
bin 7116328
cygdrive 7116328
cygwin.bat 7149424
cygwin.ico 7116328
dev etc7598568
7149424
fix 7331000
home 7116328lib
7598568sbin
7149424Thumbs.db
7331000
tmp 7107008
usr 7116328
var 7598568proc
7441800
The output is messy because the threads are writing to stdout at the same time. The variety of thread identifiers further confirms that all of the threads are running.
Perhaps something is wrong in the real code or your test methodology, but not in the code you posted?
Since this problem only manifests itself when finding a file that's already been processed, it seems like this is something to do with the processed list itself. Have you tried implementing a simple lock? For example:
processed = []
processed_lock = threading.Lock()
def consumer():
    while True:
        with processed_lock:  # use the lock itself; acquire() returns a bool, not a context manager
            fileInList = file in processed
        if fileInList:
            # ... et cetera
Threading tends to cause the strangest bugs, even if they seem like they "shouldn't" happen. Using locks on shared variables is the first step to make sure you don't end up with some kind of race condition that could cause threads to deadlock.
Of course, if what you're doing under # do stuff here is CPU-intensive, then Python will only run code from one thread at a time anyway, due to the Global Interpreter Lock. In that case, you may want to switch to the multiprocessing module - it's very similar to threading, though you will need to replace shared variables with another solution (see here for details).
