Multiple Threads and Python SQLite3 - python

I use Python 2.7 and a SQLite3 database. I want to run update queries on the database that can take some time, but I don't want the user to have to wait.
Therefore I want to start a new thread to do the database updating.
Python throws an error. Is there an effective way to tell the database to do the update in its own thread without having to wait for the thread to finish?
line 39, in execute
ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 3648 and this is thread id 6444
As far as examples go, I'm trying to write an Anki add-on. The add-on code that produces the error is:
from anki.sched import Scheduler
import threading

def burying(self, card):
    buryingThread = threading.Thread(target=self._burySiblings, args=(card,))
    buryingThread.start()

def newGetCard(self):
    "Pop the next card from the queue. None if finished."
    self._checkDay()
    if not self._haveQueues:
        self.reset()
    card = self._getCard()
    if card:
        burying(self, card)
        self.reps += 1
        card.startTimer()
        return card

__oldFunc = Scheduler.getCard
Scheduler.getCard = newGetCard

Check out the Celery project; if you're using Django it will be even more straightforward: www.celeryproject.org
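If you want to stay with plain sqlite3 rather than a task queue, the usual workaround is to open the connection inside the worker thread, because by default a sqlite3 connection may only be used in the thread that created it. A minimal sketch (the table, query, and file name are made up for illustration, not Anki's actual schema):

import sqlite3
import threading

def slow_update(db_path, card_id):
    # Each worker thread opens its own connection; SQLite objects must stay in
    # the thread that created them, so nothing DB-related is shared across threads.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("UPDATE cards SET queue = -2 WHERE id = ?", (card_id,))  # illustrative query
        conn.commit()
    finally:
        conn.close()

updateThread = threading.Thread(target=slow_update, args=("collection.db", 42))
updateThread.start()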

Related

Run an object method in a daemon thread in python

I am trying to simulate an environment with VMs and run an object method in a background thread. My code looks like the following.
hyper_v.py file:
import random
from threading import Thread
from virtual_machine import VirtualMachine

class HyperV(object):
    def __init__(self, hyperv_name):
        self.hyperv_name = hyperv_name
        self.vms_created = {}

    def create_vm(self, vm_name):
        if vm_name not in self.vms_created:
            vm1 = VirtualMachine({'vm_name': vm_name})
            self.vms_created[vm_name] = vm1
            vm1.boot()
        else:
            print('VM:', vm_name, 'already exists')

    def get_vm_stats(self, vm_name):
        print('vm stats of ', vm_name)
        print(self.vms_created[vm_name].get_values())

if __name__ == '__main__':
    hv = HyperV('temp')
    vm_name = 'test-vm'
    hv.create_vm(vm_name)
    print('getting vm stats')
    th2 = Thread(name='vm1_stats', target=hv.get_vm_stats(vm_name))
    th2.start()
virtual_machine.py file in the same directory:
import random, time, uuid, json
from threading import Thread

class VirtualMachine(object):
    def __init__(self, interval=2, *args, **kwargs):
        self.vm_id = str(uuid.uuid4())
        #self.vm_name = kwargs['vm_name']
        self.cpu_percentage = 0
        self.ram_percentage = 0
        self.disk_percentage = 0
        self.interval = interval

    def boot(self):
        print('Bootingup', self.vm_id)
        th = Thread(name='vm1', target=self.update())
        th.daemon = True  # Setting the thread as daemon thread to run in background
        print(th.isDaemon())  # This prints True
        th.start()

    def update(self):
        # This method needs to run in the background, simulating an actual vm with changing values.
        i = 0
        while i < 5:  # Added counter for debugging, ideally this would be while(True)
            i += 1
            time.sleep(self.interval)
            print('updating', self.vm_id)
            self.cpu_percentage = round(random.uniform(0, 100), 2)
            self.ram_percentage = round(random.uniform(0, 100), 2)
            self.disk_percentage = round(random.uniform(0, 100), 2)

    def get_values(self):
        return_json = {'cpu_percentage': self.cpu_percentage,
                       'ram_percentage': self.ram_percentage,
                       'disk_percentage': self.disk_percentage}
        return json.dumps(return_json)
The idea is to create a thread that keeps updating the values, and on request we read the values of the vm object by calling vm_obj.get_values(). We would be creating multiple vm objects to simulate multiple VMs running in parallel, and we need to get the information from a particular VM on request.
The problem I am facing is that the update() function of the VM does not run in the background (even though the thread is set as a daemon thread).
The method call hv.get_vm_stats(vm_name) waits until vm_object.update() (which is called by vm_object.boot()) completes, and only then prints the stats. I would like to get the stats of the VM on request while keeping vm_object.update() running in the background forever.
Please share your thoughts if I am overlooking anything basic. I tried looking into issues related to the Python threading library but could not come to any conclusion. Any help is greatly appreciated. The next step would be to have a REST API that calls these functions to get the data of any VM, but I am stuck on this problem.
Thanks in advance,
As pointed out by Klaus D in the comments, my mistake was using parentheses when specifying the target function in the thread definition, which resulted in the function being called right away.
target=self.update() will call the method right away. Remove the () to
hand the method over to the thread without calling it.
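Applied to the code above, the fix is just to drop the parentheses so the bound method itself is handed to the thread (the same applies to the th2 = Thread(...) line in hyper_v.py):

def boot(self):
    print('Bootingup', self.vm_id)
    # Pass the method object itself; the Thread calls it when started.
    th = Thread(name='vm1', target=self.update)
    th.daemon = True  # daemon thread keeps updating in the background
    th.start()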

Job Scheduling in Django

I need to implement a scheduled task in our Django app. DBader's schedule seems to be a good candidate for the job; however, when I run it as part of a Django project, it doesn't seem to produce the desired effect.
Specifically, this works fine as an independent program:
import schedule
import time
import logging

log = logging.getLogger(__name__)

def handleAnnotationsWithoutRequests(settings):
    '''
    From settings passed in, grab job-ids list
    For each job-id in that list, perform annotation group/set logic [for details, refer to handleAnnotationsWithRequests(requests, username)
    sans requests, those are obtained from db based on job-id ]
    '''
    print('Received settings: {}'.format(str(settings)))

def job():
    print("I'm working...")

#schedule.every(3).seconds.do(job)
#schedule.every(2).seconds.do(handleAnnotationsWithoutRequests, settings={'a': 'b'})
invoc_time = "10:33"
schedule.every().day.at(invoc_time).do(handleAnnotationsWithoutRequests, settings={'a': 'b'})

while True:
    schedule.run_pending()
    time.sleep(1)
But this (equivalent) code, run in a Django context, doesn't result in an invocation.
def handleAnnotationsWithoutRequests(settings):
    '''
    From settings passed in, grab job-ids list
    For each job-id in that list, perform annotation group/set logic [for details, refer to handleAnnotationsWithRequests(requests, username)
    sans requests, those are obtained from db based on job-id ]
    '''
    log.info('Received settings: {}'.format(str(settings)))

def doSchedule(settings):
    '''
    with scheduler library
    Based on time specified in settings, invoke .handleAnnotationsWithoutRequests(settings)
    '''
    #settings will need to be reconstituted from the DB first
    #settings = {}
    invocationTime = settings['running_at']
    import re
    invocationTime = re.sub(r'([AaPp][Mm])', "", invocationTime)
    log.info("Invocation time to be used: {}".format(invocationTime))
    schedule.every().day.at(invocationTime).do(handleAnnotationsWithoutRequests, settings=settings)
    while True:
        schedule.run_pending()
        time.sleep(1)
So the log from handleAnnotationsWithoutRequests() doesn't appear on the console.
Is this scheduling library compatible with Django? Are there any usage samples that one could refer me to?
I suspect some threading issues are at work here. Perhaps there are better alternatives? Suggestions are welcome.
Thank you in advance.
For web servers, you probably don't want something that runs in-process:
An in-process scheduler for periodic jobs [...]
https://github.com/Tivix/django-cron has proven to be a working solution.
There's also the heavyweight champion Celery with Celery Beat.
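For reference, a daily schedule like the one above would look roughly like this under Celery Beat (the module and task names are made up for illustration, not taken from the question):

from celery import Celery
from celery.schedules import crontab

app = Celery('myproject')  # hypothetical project name

app.conf.beat_schedule = {
    'handle-annotations-daily': {
        # hypothetical dotted path to a task wrapping handleAnnotationsWithoutRequests
        'task': 'annotations.tasks.handle_annotations_without_requests',
        'schedule': crontab(hour=10, minute=33),
        'kwargs': {'settings': {'a': 'b'}},
    },
}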
I do this a lot with Django Commands
The pattern I use is to set up a new Django management command in my app and then make it a long-running process inside a never-ending while loop.
The loop iterates continuously with a custom-defined sleep() timer.
The short version is here, with a bit of pseudo-code thrown in. You can see a working version of this pattern in my Django Reference Implementation.
from time import sleep

from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'My long-running job'

    def handle(self, *args, **options):
        self.stdout.write(self.style.SUCCESS('Starting long-running job.'))
        while True:
            if conditions_met_for_job:  # pseudo-code: replace with your own check
                self.job()
            sleep(5)

    def job(self):
        self.stdout.write(self.style.SUCCESS('Running the job...'))
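Saved as, say, yourapp/management/commands/long_running_job.py (the command name here is just an example), it is started with python manage.py long_running_job and kept alive by whatever process manager you already use for your Django services.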

Win32com events not raising inside thread?

I am new to both COM and Python, so I'm not very familiar with the exact terminology; apologies for using inexact terms.
I am trying to connect to a desktop application via a proprietary COM interface using pywin32.
I created a PoC and it runs fine. The COM function call is processed and I get the expected event.
import time
import win32com.client

class MyEvents:
    def __init__(self):
        print("Callback class initialized")

    def OnMyEvent(self, data):
        print('MyEvent raised')

class ComUser:
    comObj = None

    def __init__(self):
        comObj = win32com.client.DispatchWithEvents("ProprietaryInterface.InterfaceClass",
                                                    MyEvents)
        comObj.Register()
        comObj.DoSomething(data)
        time.sleep(120)

userObj = ComUser()
So far so good. I get the event on the screen
Callback class initialized
MyEvent raised
Next I tried to put it into my application where I have multiple threads. To explain it in simple terms:
Main creates an object of Class X which initializes an XMLRPC Server thread.
The XMLRPC handler simply takes incoming info and puts it into a queue
The queue is from multiprocessing lib.
Another thread waits on this queue for an incoming message
def __startPollingThread(self):
    pythoncom.CoInitialize()
    pollingThread = Thread(target=self.__checkQueue)
    pollingThread.start()
    pythoncom.CoUninitialize()
This is the polling thread method:
def __checkQueue(self):
    try:
        pythoncom.CoInitialize()
        while True:
            currMessage = self.__messageQueue.get()
            self.__processMessage(currMessage)
    except:
        pass  # Log message
    finally:
        pythoncom.CoUninitialize()
__processMessage passes the message through multiple classes (something like a strategy pattern + state pattern) before it hits the class that handles the COM interface.
In the ComUser class, I have a method which registers with the client application's COM interface:
def initSystem(self):
    import pythoncom
    try:
        pythoncom.CoInitialize()
        self.ComConnector = win32com.client.DispatchWithEvents("ProprietaryInterface.InterfaceClass",
                                                               MyEvents)
        self.ComConnector.Register()
    except:
        pass  # error handling elided
    finally:
        pythoncom.CoUninitialize()
Another method handles the specific requests as they arrive and makes the corresponding COM calls.
def handleMessage(self, message):
    #if message = this then
    comObj.DoSomething(data)
Both methods are called from the __processMessage method. All my classes reside in separate .py files, except ComUser and MyEvents, which are in the same module.
I can call the COM interface and see the application reacting to the COM method calls, but I can't see any events being raised. I have tried a whole lot of combinations of CoInitialize and CoUninitialize and "import pythoncom" statements to ensure that it is not a problem with the threading. I also tried setting sys.coinit_flags = 0 and checked; it seems to make no difference. I just don't see any events.
Is it a problem that I call DispatchWithEvents in a child thread instead of the main thread (the calls themselves seem to work fine)? Or is it that the main thread (i.e. the main method of the program) dies out? I tried putting a long sleep there too. I even tried a separate thread with a PumpWaitingMessages loop, but it made no difference. I can't think of any solutions.
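For reference, the PumpWaitingMessages loop mentioned above is usually structured so that the thread that creates the COM object is also the thread that pumps messages; a minimal sketch (reusing the MyEvents class from the question, everything else illustrative):

import time
import pythoncom
import win32com.client

def com_worker():
    pythoncom.CoInitialize()
    try:
        comObj = win32com.client.DispatchWithEvents("ProprietaryInterface.InterfaceClass",
                                                    MyEvents)
        comObj.Register()
        # Events are only delivered while this same thread keeps pumping messages.
        while True:
            pythoncom.PumpWaitingMessages()
            time.sleep(0.1)
    finally:
        pythoncom.CoUninitialize()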

Python SQLAlchemy Update Postgres Record

I'm trying to update a database row asynchronously using the multiprocessing module. My code has a simple function create_member that inserts some data into a table and then creates a process that may change this data. The problem is that the session passed to async_create_member is closing the database connection, and on the next request I get psycopg's error:
(Interface Error) connection already closed
Here's the code:
def create_member(self, data):
    member = self.entity(**data)
    self.session.add(member)
    for name in data:
        setattr(member, name, data[name])
    self.session.commit()
    self.session.close()
    if self.index.is_indexable:
        Process(target=self.async_create_member,
                args=(data, self.session)).start()
    return member

def async_create_member(self, data, session):
    ok, data = self.index.create(data)
    if ok:
        datacopy = data.copy()
        data.clear()
        data['document'] = datacopy['document']
        data['dt_idx'] = datacopy['dt_idx']
        stmt = update(self.entity.__table__).where(
            self.entity.__table__.c.id_doc == datacopy['id_doc'])\
            .values(**data)
        session.begin()
        session.execute(stmt)
        session.commit()
        session.close()
I could possibly solve this by creating a new connection in async_create_member, but this was leaving too many idle transactions on Postgres:
engine = create_new_engine()
conn = engine.connect()
conn.execute(stmt)
conn.close()
What should I do now? Is there a way to fix the first snippet? Or should I keep creating new connections with the create_new_engine function? Should I use threads or processes?
You can't reuse sessions across threads or processes. Sessions aren't thread safe, and the connectivity that underlies a Session isn't inherited cleanly across processes. The error message you are getting is accurate, if uninformative: the DB connection is indeed closed if you try to use it after inheriting it across a process boundary.
In most cases, yes, you should create a session for each process in a multiprocessing setting.
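A minimal sketch of that, applied to the code above: keep async_create_member a method, but let it build its own session in the child process instead of receiving one (create_new_session() is an assumed helper that creates a fresh engine/sessionmaker; the statement mirrors the original update):

from multiprocessing import Process
from sqlalchemy import update

def async_create_member(self, data):
    # Runs in the child process; it creates, commits and closes its own session.
    session = create_new_session()  # assumed helper, not from the original code
    try:
        stmt = (update(self.entity.__table__)
                .where(self.entity.__table__.c.id_doc == data['id_doc'])
                .values(**data))
        session.execute(stmt)
        session.commit()
    finally:
        session.close()

# In create_member, only plain data crosses the process boundary:
# Process(target=self.async_create_member, args=(data,)).start()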
If your problem meets the following conditions:
you are doing a lot of CPU-intensive processing for each object
database writes are relatively lightweight in comparison
you want to use a lot of processes (I do this on 8+ core machines)
It might be worth your while to create a single writer process that owns a session, and pass the objects to that process. Here's how it usually works for me (Note: not meant to be runnable code):
import multiprocessing
from your_database_layer import create_new_session, WhateverType

work = multiprocessing.JoinableQueue()

def writer(commit_every=50):
    global work
    session = create_new_session()
    counter = 0
    while True:
        item = work.get()
        if item is None:
            break
        session.add(item)
        counter += 1
        if counter % commit_every == 0:
            session.commit()
        work.task_done()
    # Last DB writes
    session.commit()
    # Mark the final None in the queue as complete
    work.task_done()
    return

def very_expensive_object_creation(data):
    global work
    very_expensive_object = WhateverType(**data)
    # Perform lots of computation
    work.put(very_expensive_object)
    return

def main():
    writer_process = multiprocessing.Process(target=writer)
    writer_process.start()

    # Create your pool that will feed the queue here, i.e.
    workers = multiprocessing.Pool()

    # Dispatch lots of work to very_expensive_object_creation in parallel here
    workers.map(very_expensive_object_creation, some_iterable_source_here)
    # --or-- in whatever other way floats your boat, such as
    workers.apply_async(very_expensive_object_creation, args=(some_data_1,))
    workers.apply_async(very_expensive_object_creation, args=(some_data_2,))
    # etc.

    # Signal that we won't dispatch any more work
    workers.close()
    # Wait for the creation work to be done
    workers.join()
    # Trigger the exit condition for the writer
    work.put(None)
    # Wait for the queue to be emptied
    work.join()
    return
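A note on the design of writer(): commits are batched every commit_every items to amortize transaction overhead, the trailing session.commit() flushes whatever remains from the last partial batch, and the None sentinel plus the final work.task_done() let work.join() in main() return cleanly once everything has been written.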

Run a script with web server and maintain data to be used later

Note: I want to implement this without using any framework.
I have to create a web application using Python. The application should maintain a running average of the CPU usage of each process over the past 60 seconds. It should act as a web server, and when it gets a request it should return the current average for each process. Following are the scripts I've written. record_usage.py is a script that I want to run as soon as server.py is run, so that it maintains the CPU usage data, which I intend to read whenever I get an XHR request and send back to the client.
So my problem is: how do I wire this up? I tried running record_usage.py using subprocess.Popen after starting the server, and record_usage.py starts running in the background as well. But when I try accessing the data created by it, the class object I create is not the one it uses but a new one. How do I complete this link?
Please ask about anything I could not make clear.
Latest changes in server.py
if __name__ == '__main__':
    RU_OBJ = RU(settings.SAMPLING_FREQ, settings.AVG_INTERVAL)
    RU_LOCK = RLock()
    # Record CPU usage in a thread.
    ru_thread = Thread(target=RU_OBJ.record, args=(RU_LOCK,))
    ru_thread.daemon = True
    ru_thread.start()
    # Run server.
    run()
Latest change in record_usage.py
def record(self, lock):
    while True:
        with lock:
            self.add_processes()
        time.sleep(self.sampling_freq)
Is this a proper way of applying locks? A similar lock is applied when I am reading the process information. Would it work?
Added the functions:
def add_processes(self):
    for _process in psutil.process_iter():
        try:
            new_proc = _process.as_dict(attrs=['cpu_times', 'name', 'pid',
                                               'status'])
        except psutil.NoSuchProcess:
            continue
        pid, (user, _sys) = new_proc['pid'], new_proc.pop('cpu_times')
        # Get or create details object for the process.
        existing = self.processes.setdefault(pid, new_proc)
        # Get or create queue object for the CPU times of the process.
        queue_dict = self.process_queue.setdefault(pid, dict())
        # User CPU time.
        user_q = queue_dict.setdefault('user_q', PekableQueue(self.avg_interval))
        user_q.enqueue(user)
        user_avg = get_avg(user_q)
        # System CPU time.
        sys_q = queue_dict.setdefault('sys_q', PekableQueue(self.avg_interval))
        sys_q.enqueue(_sys)
        sys_avg = get_avg(sys_q)
        # Update the details object for the process.
        existing.update(user_avg=user_avg, sys_avg=sys_avg, **new_proc)

def get_curr_processes(self):
    return [self.processes[pid] for pid in psutil.get_pid_list()
            if pid in self.processes]
To collect statistics in another thread:
if __name__ == '__main__':
    from threading import Thread, Lock
    import record_usage

    lock = Lock()
    t = Thread(target=record_usage.record, args=[lock])
    t.daemon = True
    t.start()
    run(lock)
If you change some shared data in one thread and read it in another then you could protect the places where you access/change the value with a lock:
#...
with self.lock:
    existing = self.processes.setdefault(pid, new_proc)
#...
with self.lock:
    existing.update(user_avg=user_avg, sys_avg=sys_avg, **new_proc)
#...

def get_curr_processes(self):
    with self.lock:
        return [self.processes[pid] for pid in psutil.get_pid_list()
                if pid in self.processes]
It is essential that self.lock is the same object in all threads. If self.processes is a dict, then you don't need a lock at all in CPython: the dict methods are implemented in C and the interpreter doesn't release the GIL (global interpreter lock) while calling them, i.e., only one thread at a time accesses the dict.
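For completeness, a minimal sketch of wiring such a lock into an RU-like class (attribute names follow the question; the rest of the class body is elided):

from threading import RLock

class RU(object):
    def __init__(self, sampling_freq, avg_interval):
        self.sampling_freq = sampling_freq
        self.avg_interval = avg_interval
        self.processes = {}
        self.process_queue = {}
        # One lock per instance, shared by record() and get_curr_processes(),
        # so every thread using this object synchronizes on the same lock.
        self.lock = RLock()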
