My classes have become complicated, and I think the way I am instantiating and not closing them may be a problem. Sometimes the program just hangs: code that took 0.2 seconds to run will take 15 seconds. It happens randomly, and rebooting seems to fix it; restarting the program without rebooting does not. I'm using Ctrl+C to stop the program, which runs on CentOS 7.
Here's my class layout:
def do_stuff(self):
pool_class = pools()
sql_pool_insert = pool_class.generate_sql()
print "pools done"
sql_HD_insert = hard_drives(pool_class).generate_sql()
print "hds done"
class pools
class devs
class hard_drives(object):
def __init__(self, pool_class = pools()):
self.drives_table=[]
self.assigned_drive_names =[]
self.unassigned_drives_table=[]
#for each pool, generate drives list actively combining
for pool in pool_class.pool_name:
self.drives_table = self.drives_table + pool_class.Vdevs(pool).get_drive_table()
i=0
Outside of that file, I have my main program that imports the above as one of many packages. The main file calls the functions in that file that sit above the classes (because they use the classes), and sometimes the main program uses discrete functions inside classes. The problem is that the pools and devs class __init__ sections are HUGE! It takes 0.7 seconds to run all that code in some cases, so I want to instantiate it correctly and close it effectively. Here is a sample from the main program:
if (new_pool==1):
result = mypackage.pools().create_pool(pool_to_create)
I really feel like I'm doing two or three things wrong. I think, perhaps, I should just create an instance of the pools class at the top of my main program's while loop, use that everywhere in the loop, and then close it at the end of the loop; the loop runs every 0.25 seconds. I want the pools class to be instantiated at the beginning of the while loop and closed at the end. Is this a good approach? The question applies equally to the MySQL InnoDB cursors I'm using, which may be another factor in the issue: how should these be handled so that instances of cursors and classes don't get out of hand? Heck, maybe I should be instantiating the entire package instead of one class in the package, i.e.:
with package as MyPackage:
package.pool()...#do whatever, all inits in the package have run once and you don't have to worry about them running again so long as you use the package object for everything???
Maybe I'm just confused. It comes down to this: how do I control when a class's __init__ function is run? It really only needs to run at the beginning of the loop, so that the loop has fresh data to work with, and then the instance gets used everywhere else. But you see how the pools class is used by the hard_drives class; how would I handle that?
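One thing to note: a default argument like pool_class=pools() is evaluated once, when the def statement runs (i.e., at import time), not on every call, so it will not give you a fresh instance each time. Below is a minimal sketch of the structure described above; the close() cleanup method is an assumption (the real cleanup depends on your classes and your MySQL library). The idea is to create the expensive pools instance once at the top of each loop iteration, pass it explicitly into hard_drives, and release everything at the end:

import time
import mypackage

while True:
    pool_class = mypackage.pools()  # heavy __init__ runs exactly once per iteration
    try:
        # reuse the same instance; hard_drives should not build its own
        hd = mypackage.hard_drives(pool_class)
        sql_pool_insert = pool_class.generate_sql()
        sql_HD_insert = hd.generate_sql()
        # ... rest of the loop body, always using pool_class and hd ...
    finally:
        pool_class.close()  # hypothetical cleanup; close MySQL cursors here the same way
    time.sleep(0.25)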
I am learning Python and its multiprocessing.
I created a project with a main() in main.py and an a_simulation() inside the module simulation.py under the package simulator/.
The symptom is that a test statement print("hello\n") placed in main.py before the definition of main() is executed multiple times when the program is run with python main.py, indicating that everything before the print, including the creation of the lists, is executed multiple times.
I do not think I understand the related issues of Python very well. May I know what the reason for the symptom is, and what the best practice is in Python when creating projects like this? I have included the code and the terminal output. Thank you!
Edit: Forgot to mention that I am running it with Anaconda Python on macOS, although I would hope my project will work just fine on any platform.
main.py:
from multiprocessing import Pool
from simulator.simulation import a_simulation
import random
num_trials = 10
iter_trials = list(range(num_trials))
arg_list = [random.random() for _ in range(num_trials)]
input = list(zip(iter_trials, arg_list))
print("hello\n")
def main():
with Pool(processes=4) as pool:
result = pool.starmap(a_simulation, input)
print(result)
if __name__ == "__main__":
main()
simulator/simulation.py:
import os
from time import sleep
def a_simulation(x, seed_):
print(f"Process {os.getpid()}: trial {x} received {seed_}\n" )
sleep(1)
return seed_
Results from the terminal:
hello
hello
hello
hello
hello
Process 71539: trial 0 received 0.4512600158461971
Process 71538: trial 1 received 0.8772526554425158
Process 71541: trial 2 received 0.6893833978242683
Process 71540: trial 3 received 0.29249994820563296
Process 71538: trial 4 received 0.5759647958461107
Process 71541: trial 5 received 0.08799525261308505
Process 71539: trial 6 received 0.3057644321667139
Process 71540: trial 7 received 0.5402091856171599
Process 71538: trial 8 received 0.1373456223147438
Process 71541: trial 9 received 0.24000943476017
[0.4512600158461971, 0.8772526554425158, 0.6893833978242683, 0.29249994820563296, 0.5759647958461107, 0.08799525261308505, 0.3057644321667139, 0.5402091856171599, 0.1373456223147438, 0.24000943476017]
(base)
The reason why this happens is that multiprocessing uses the start method spawn, by default, on Windows and macOS to start new processes. This means that whenever you want to start a new process, the child process is initially created without sharing any of the memory of the parent. However, this makes things messy when you want to start a function in the child process from the parent, because not only will the child not know the definition of the function itself, you might also run into some unexpected obstacles (what if the function depends on a variable defined in the parent process's module?). To stop these sorts of things from happening, multiprocessing automatically imports the parent process's module in the child process, which essentially copies almost the entire state of the parent at the time the child process was started.
This is where the if __name__ == "__main__" guard comes in. This statement basically translates to "if the current file is being run directly, then...": the code under this block will not run if the module is being imported. Therefore, the child processes will not run anything under this block when they are spawned. You can hence use this block to create, for example, variables which use up a lot of memory and are not required for the child processes to function, but are used by the parent. Basically, anything that the child processes won't need, put it under here.
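Applied to the main.py above, a minimal sketch of the fix is to move all the module-level setup under the guard (or into main()), so only the parent process runs it; input is renamed to inputs here to avoid shadowing the built-in:

from multiprocessing import Pool
from simulator.simulation import a_simulation
import random

def main():
    num_trials = 10
    iter_trials = list(range(num_trials))
    arg_list = [random.random() for _ in range(num_trials)]
    inputs = list(zip(iter_trials, arg_list))
    print("hello\n")  # now printed exactly once, by the parent only
    with Pool(processes=4) as pool:
        result = pool.starmap(a_simulation, inputs)
        print(result)

if __name__ == "__main__":
    main()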
Now coming to your comment about imports:
This must be a silly question, but should I leave the import statements as they are, or move them inside if __name__ == "__main__":, or somewhere else? Thanks
Like I said, anything that the child doesn't need can be put under this if block. The reason you don't often see imports under this block is perhaps due to sticking to convention ("imports should be done at the top") and because the modules being imported don't really affect performance much (even after being needlessly imported multiple times). Keep in mind, however, that if a child process requires a particular module to start its work, it will always be imported again within the child process, even if you have imported it under the if __name__... block. This is because when you attempt to spawn child processes to start a function in parallel, multiprocessing automatically serializes and sends the names of the function, and of the module that defines the function (actual code is not serialized, only the names), to the child processes, where they are imported once more (relevant question).
This only applies when the start method is spawn; you can read more about the differences between start methods here.
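For completeness, a small sketch of overriding the start method, assuming Python 3.4+ where these functions exist (fork is only available on POSIX systems, and spawn is the safer default on macOS):

import multiprocessing as mp

if __name__ == "__main__":
    # must be called at most once, before any pools or processes are created
    mp.set_start_method("spawn")   # or "fork" on POSIX systems
    print(mp.get_start_method())   # "spawn"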
OK, so I have multiple problems with the code below:
When the key chosen in the combo box is held down, it keeps printing "You Pressed It". Is there a way to avoid this?
When I press the set hotkey, the label changes, but the while loop in process() doesn't. It's supposed to do a series of tasks, but I simplified it to a print for this question.
run = False
def press():
global run
while True:
if keyboard.read_key(hotCombo.get()):
print("You Pressed It")
run = not run
keyboard.wait(hotCombo.get())
if run == True:
status["text"]="Working"
else:
status["text"]="Not Working"
def process():
while run == True:
print("runnning")
Having tinkered with it, I found more problems.
I ended up with this, but while it's printing "run" I can't seem to stop it:
def process():
global run
while True:
if keyboard.read_key(hotCombo.get()):
print("kijanbdsjokn")
run = not run
keyboard.wait(hotCombo.get())
if run == True:
status["text"]="Working"
else:
status["text"]="Not Working"
while run == True:
print("run")
time.sleep(1)
Can I ask why I can't just integrate tkinter into a working Python script using threading?
A Python script is generally linear. You do things in sequence and then you exit.
In a tkinter program, your code consists of three things.
Code to set up the window and widgets.
Initialization of global variables (doesn't really matter if you hide them in a class instance; they're still globals).
Most of it will be functions/methods that are called as callbacks from tkinter when it is in the mainloop.
So in a tkinter program, most of your code is a guest in the mainloop, where it is executed in small pieces in reaction to events. This is a completely different kind of program; it was called event-driven or message-based programming long before that became cool in web servers and frameworks.
So, can you integrate a script in a tkinter program? Yes, it is possible.
There are basically three ways you can do it;
Split the code up into small pieces that can be called via after timeouts (see the sketch after this list). This involves the most reorganization of your code. To keep the GUI responsive, event handlers (like timeouts) should not take too long; 50 ms seems to be a reasonable upper limit.
Run it in a different thread. We will cover that in more detail below.
Run it in a different process. Broadly similar to running in a thread (the API's of threading.Thread and multiprocessing.Process are almost the same by design). The largest difference is that communication between processes has to be done explicitly via e.g. Queue or Pipe.
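A minimal sketch of option 1, where the work is a stand-in iterator and the 10 ms delay is an arbitrary choice:

import tkinter as tk

root = tk.Tk()
work = iter(range(1000))  # stand-in for the original script, chopped into pieces

def do_one_piece():
    try:
        piece = next(work)            # one small (< 50 ms) chunk of work
        print(piece)
    except StopIteration:
        return                        # all done; stop rescheduling
    root.after(10, do_one_piece)      # hand control back to the mainloop

root.after(10, do_one_piece)
root.mainloop()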
There are some things that you have to take into account when using extra threads, especially in a tkinter program.
1) Python version
You need to use Python 3. This will not work well in Python 2 for reasons that are beyond the scope of this answer. Better preemption of threads in Python 3 is a big part of it.
2) Multithreaded tkinter build
The tkinter module (or rather the underlying Tcl interpreter) needs to be built with threading enabled. I gather that the official python.org builds for ms-windows are, but apart from that YMMV. On some UNIX-like systems such as Linux or *BSD, the packages/ports systems give you a choice in this.
3) Make your code into a function
You need to wrap up the core of your original script in a function so you can start it in a thread.
4) Make the function thread-friendly
You probably want to be able to interrupt that thread if it takes too long, so you have to adapt it to check regularly whether it should continue. Checking if a global named run is True is one method, as sketched below. Note that the threading API does not allow you to just terminate a thread.
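For example, a minimal sketch of a thread-friendly wrapped script; the loop body is a stand-in for pieces of the real work:

import threading
import time

run = True  # set to False from the GUI to stop the worker

def wrapped_script():
    for step in range(100):      # the original script's main loop
        if not run:              # check regularly whether to continue
            return
        time.sleep(0.1)          # stand-in for one piece of real work

worker = threading.Thread(target=wrapped_script)
worker.start()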
5) The normal perils of multithreading
You have to be careful with modifying widgets or globals from both threads at the same time.
At the time of writing, the Python GIL helps you here. Since it assures that only one thread at a time is executing Python bytecode, any change that can be done in a single bytecode is multithreading safe as a side effect.
For example, look at the modification of a global in the modify function:
In [1]: import dis
In [2]: data = []
In [3]: def modify():
   ...:     global data
   ...:     newdata = [1,2,3]
   ...:     data = newdata
   ...:
In [4]: dis.dis(modify)
  3           0 BUILD_LIST               0
              2 LOAD_CONST               1 ((1, 2, 3))
              4 LIST_EXTEND              1
              6 STORE_FAST               0 (newdata)

  4           8 LOAD_FAST                0 (newdata)
             10 STORE_GLOBAL             0 (data)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
See how the new list is built separately, and only when it is complete is it assigned to the global. (This was not by accident.)
It takes only a single bytecode instruction (STORE_GLOBAL) to set a global variable to a newly created list. So at no point can the value of data be ambiguous.
But a lot of things take more than one bytecode, so there is a chance that one thread is preempted in favor of another while it is modifying a variable or widget. How big that chance is depends on how often these situations happen and how long they take.
Currently, a thread can be preempted every 5 ms by default (sys.getswitchinterval() controls this).
So a change that takes longer than that is liable to be preempted, as is any task that drops the GIL for I/O.
So if you see weird things happening, make sure to use Locks to regulate access to shared resources.
It helps if e.g. a widget or variable is only modified from one thread, and only read from all the other threads.
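A minimal sketch of regulating access to a shared value with a Lock (the names here are hypothetical):

import threading

status_lock = threading.Lock()
status_text = "Not Working"   # shared between the GUI thread and the worker

def set_status(text):
    global status_text
    with status_lock:         # only one thread may modify it at a time
        status_text = text

def get_status():
    with status_lock:
        return status_text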
One way to handle your key is to turn it into a two-phase loop:
def press():
global run
while True:
while not keyboard.read_key(hotCombo.get()):
time.sleep(0.2)
run = True
status["text"]="Working"
while keyboard.read_key(hotCombo.get()):
print("running")
time.sleep(0.2)
run = False
status["text"]="Not Working"
I have written a class in Python 2.7 (under Linux) that uses multiple processes to manipulate a database asynchronously. I encountered a very strange blocking behaviour when using multiprocessing.Queue.put() and multiprocessing.Queue.get(), which I can't explain.
Here is a simplified version of what I do:
import time
from multiprocessing import Process, Queue
class MyDB(object):
def __init__(self):
self.inqueue = Queue()
p1 = Process(target = self._worker_process, kwargs={"inqueue": self.inqueue})
p1.daemon = True
started = False
while not started:
try:
p1.start()
started = True
except:
time.sleep(1)
#Sometimes I also start a second process the same way, but it makes no difference to my problem
p2 = Process(target = self._worker_process, kwargs={"inqueue": self.inqueue})
#blahblah... (same as above)
@staticmethod
def _worker_process(inqueue):
while True:
#--------------this blocks despite data having arrived------------
op = inqueue.get(block = True)
#do something with specified operation
#---------------problem area end--------------------
print "if this text gets printed, the problem was solved"
def delete_parallel(self, key, rawkey = False):
someid = ...blahblah
#--------------this section blocked when I was posting the question but for unknown reasons it's fine now
self.inqueue.put({"optype": "delete", "kwargs": {"key":key, "rawkey":rawkey}, "callid": someid}, block = True)
#--------------problem area end----------------
print "if you see this text, there was no blocking or block was released"
If I run the code above inside a test (in which I call delete_parallel on the MyDB object), then everything works, but if I run it in the context of my entire application (importing other stuff, including pygtk), strange things happen:
For some reason self.inqueue.get blocks and never releases, despite self.inqueue having the data in its buffer. When I instead call self.inqueue.get(block = False, timeout = 1), the call finishes by raising Queue.Empty, despite the queue containing data. qsize() returns 1 (suggesting that data is there) while empty() returns True (suggesting that there is no data).
Now clearly there must be something somewhere else in my application that renders self.inqueue unusable by causing the acquisition of some internal semaphore. However, I don't know what to look for. Eclipse debugging becomes useless once a blocking semaphore is reached.
Edit 8 (cleaning up and summarizing my previous edits) Last time I had a similar problem, it turned out that pygtk was hijacking the global interpreter lock, but I solved it by calling gobject.threads_init() before I called anything else. Could this issue be related?
When I introduce a print "successful reception" after the get() call and execute my application in a terminal, the same behaviour happens at first. When I then terminate by pressing CTRL+D, I suddenly get the string "successful reception" in between messages. This looks to me like some other process/thread is terminated and releases the lock that blocks the process that is stuck at get().
Since the process that was stuck terminates later, I still see the message. What kind of process could externally mess with a Queue like that? self.inqueue is only accessed inside my class.
Right now it seems to come down to this queue, which won't return anything despite the data being there:
the get() method seems to get stuck when it attempts to receive the actual data from some internal pipe. The last line before my debugger hangs is:
res = self._recv()
which is inside of multiprocessing.queues.get()
Tracking this internal python stuff further I find the assignments
self._recv = self._reader.recv and self._reader, self._writer = Pipe(duplex=False).
Edit 9
I'm currently trying to hunt down the import that causes it. My application is quite complex, with hundreds of classes and each class importing a lot of other classes, so it's a pretty painful process. I have found a first candidate class which uses 3 different MyDB instances when I track all its imports (but doesn't access MyDB.inqueue at any time, as far as I can tell). The strange thing is, it's basically just a wrapper, and the wrapped class works just fine when imported on its own; this also means that it uses MyDB without freezing. As soon as I import the wrapper (which imports that class), I have the blocking issue.
I started rewriting the wrapper by gradually reusing the old code. I'm testing each time I introduce a couple of new lines until I will hopefully see which line will cause the problem to return.
multiprocessing.Queue uses an internal feeder thread to maintain its state. If you are using GTK without initializing its threading support, it can break these threads, so you will need to call gobject.threads_init().
It should be noted that qsize() only returns an approximate size of the queue. The real size may be anywhere between 0 and the value returned by qsize().
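A minimal sketch of the initialization order implied above, assuming PyGTK 2.x (mydb is a hypothetical module holding the MyDB class from the question):

import gobject
gobject.threads_init()   # must run before any other GTK work and before the queues are used

import gtk
from mydb import MyDB    # hypothetical import of the class shown above

db = MyDB()
db.delete_parallel("some_key")
gtk.main()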
I've been working on a project that I was assigned to do. It is about a sort of parking lot where the cars that enter are generated automatically (done). I've put them into a waiting list (because I have to represent them with a GUI module later) so that they can later be assigned a spot in the parking lot, and then they must leave the parking lot (also at random).
The problem arises because I created a function that continually generates cars at random; now I can't call any other function because the first one is looping.
The question is: is there a way to call several looping functions at the same time?
Thanks
the question is, is there a way to call several looping functions at the same time?
This is a great question and there are several ways to do it.
Threading can let your functions run concurrently. The data flow between the threads should be managed using the Queue module:
import threading
from Queue import Queue   # the module is named "queue" in Python 3

# Inter-thread communication
wait_to_park = Queue()
wait_to_exit = Queue()
# (generate_cars, park_cars, and unpark_cars are assumed to be defined elsewhere)
# Start the simulation
tg = threading.Thread(target=generate_cars)
tp = threading.Thread(target=park_cars)
tu = threading.Thread(target=unpark_cars)
tg.start(); tp.start(); tu.start()
# Wait for simulation to finish
tg.join()
wait_to_park.join()
tp.join()
wait_to_exit.join()
tu.join()
Alternatively, you can use an event loop such as the sched module to coordinate the events (see the sketch below). Generators may help with this; they work like functions that can be suspended and restarted.
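A minimal sketch of the sched approach; the handler names and delays are hypothetical, and each handler reschedules itself, which gives you several "looping" functions interleaved in a single thread:

import sched
import time
import random

scheduler = sched.scheduler(time.time, time.sleep)

def car_arrives():
    print("car arrives")
    # schedule the next arrival after a random delay
    scheduler.enter(random.uniform(0.5, 2.0), 1, car_arrives, ())

def car_departs():
    print("car departs")
    scheduler.enter(random.uniform(1.0, 4.0), 1, car_departs, ())

scheduler.enter(0, 1, car_arrives, ())
scheduler.enter(3, 1, car_departs, ())
scheduler.run()   # runs until interrupted, since the events keep rescheduling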
Maybe import random and then set up ranges within which you want certain events to happen?
import random

def mainLoop():
    while True:
        x = random.randrange(1, 100)
        if 0 < x <= 10:
            doSomething()
        elif 10 < x <= 60:
            doSomethingMoreFrequently()
        elif 60 < x <= 61:
            doSomethingRarely()
        # etcetera
If you LITERALLY want to call several looping functions at the same time, be prepared to learn about threading. Threading is difficult, and I never do it unless 100% necessary.
But this should be simple enough to achieve without it.
Don't have both loop infinitely; have them each do work if needed and return (or possibly yield). Then have your main event loop call both. Something like this:
from time import sleep

def car_arrival():
if need_to_generate_car:
# do car generation stuff
return
def car_departure():
if time_for_car_to_leave:
# do car leaving stuff
return
def event_loop():
while sim_running:
car_arrival()
car_departure()
sleep(0.5)
I'm writing a threaded program in Python. This program is interrupted very frequently, by user (Ctrl+C) interaction and by other programs sending various signals, all of which should stop thread operation in various ways. The thread does a bunch of units of work (I call them "atoms") in sequence.
Each atom can be stopped quickly and safely, so making the thread itself stop is fairly trivial, but my question is: what is the "right", or canonical way to implement a stoppable thread, given stoppable, pseudo-atomic pieces of work to be done?
Should I poll a stop_at_next_check flag before each atom (example below)? Should I decorate each atom with something that does the flag-checking (basically the same as the example, but hidden in a decorator)? Or should I use some other technique I haven't thought of?
Example (simple stopped-flag checking):
from threading import Thread

class stoppable(Thread):
stop_at_next_check = False
current_atom = None
def __init__(self):
Thread.__init__(self)
def do_atom(self, atom):
if self.stop_at_next_check:
return False
self.current_atom = atom
self.current_atom.do_work()
return True
def run(self):
#get "work to be done" objects atom1, atom2, etc. from somewhere
if not self.do_atom(atom1):
return
if not self.do_atom(atom2):
return
#...etc
def die(self):
self.stop_at_next_check = True
self.current_atom.stop()
Flag checking seems right, but you missed an opportunity to simplify it by using a list of atoms. If you put the atoms in a list, you can use a single for loop without needing a do_atom() method, and the problem of where to do the check solves itself.
def run(self):
atoms = []  # get the "work to be done" objects from somewhere
for atom in atoms:
if self.stop_at_next_check:
break
self.current_atom = atom
atom.do_work()
Create a "thread x should continue processing" flag, and when you're done with the thread, set the flag to false.
Killing a thread directly is considered bad form, because you might get a fractional chunk of work completed.
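A minimal sketch of that flag using threading.Event, the thread-safe idiom for a boolean shared between threads (worker and its loop body are placeholders):

import threading

should_run = threading.Event()
should_run.set()   # thread x should continue processing

def worker():
    while should_run.is_set():
        pass  # do one stoppable atom of work here

t = threading.Thread(target=worker)
t.start()
# ... later, when you're done with the thread:
should_run.clear()
t.join()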
A tad late, but I have created a small library, ants, that solves this problem. In your example, an atomic unit is represented by a worker.
Example
from ants import worker

@worker
def hello():
    print("hello world")

t = hello.start()
...
t.stop()
In the above example, hello() will run in a separate thread, called in a while True: loop, thus spitting out "hello world" as fast as possible.
You can also have triggering events; e.g., replace hello.start() above with hello.start(lambda: time.sleep(5)) and it will trigger every 5 seconds.
The library is very new, and work is ongoing on GitHub: https://github.com/fa1k3n/ants.git
Future work includes adding a colony for having several workers working on different parts of the same data, and a planned queen for worker communication and control, such as synchronization.