Flask unit tests not closing port after each test - python

I'm doing some unit testing for a Flask application. Part of this involves restarting the Flask application for each test. To do this, I'm creating my Flask application in the setUp() function of my unittest.TestCase, so that I get the application in a fresh state for each run. Also, I'm starting the application in a separate thread so the tests can run without the Flask application blocking.
Example below:
import json
import requests
import unittest
from threading import Thread

# (`app` is the Flask application under test and `foo_bar` is the headers dict; both are defined elsewhere in the project)

class MyTest(unittest.TestCase):

    def setUp(self):
        test_port = 8000
        self.test_url = f"http://0.0.0.0:{test_port}"
        self.app_thread = Thread(
            target=app.run,
            kwargs={"host": "0.0.0.0", "port": test_port, "debug": False},
        )
        self.app_thread.start()

    def test_a_test_that_contacts_the_server(self):
        response = requests.post(
            f"{self.test_url}/dosomething",
            json={"foo": "bar"},
            headers=foo_bar,
        )
        is_successful = json.loads(response.text)["isSuccessful"]
        self.assertTrue(is_successful, msg=json.loads(response.text)["message"])

    def tearDown(self):
        # what should I do here???
        pass
This becomes problematic: every test after the first one finds port 8000 still in use and raises OSError: [Errno 98] Address already in use.
(For now, I've built a workaround where I generate a list of high-numbered ports, plus a list of ports already used by earlier tests, so that I never select a port a previous test has used. The workaround works, but I'd really like to know the proper way to shut down this Flask application, ultimately closing the connection and releasing/freeing that port.)
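(An aside: instead of hand-maintaining port lists, one alternative I've seen, sketched here as an assumption rather than something I've battle-tested, is to ask the OS for a free port by binding a socket to port 0 and handing that port to the test server.)
import socket

def get_free_port():
    # Bind to port 0 so the OS picks an unused port, then release it for the server to use.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("0.0.0.0", 0))
    port = s.getsockname()[1]
    s.close()
    return port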
I'm hopeful that there is a specific way to shut down this Flask application in the tearDown() function.
How should I go about shutting down the flask application in my tearDown() method?

I found the solution to my own question while writing it, and since answering your own question is encouraged on Stack Overflow, I'd still like to share it for anyone else with the same issue.
The solution to this problem is to treat the Flask application as another process instead of a thread. This is accomplished by using Process from the multiprocessing module in lieu of Thread from the threading module.
I came to this conclusion after reading this Stack Overflow answer about stopping Flask without using CTRL + C. Reading that answer then led me to read about the differences between multiprocessing and threading in this Stack Overflow answer. Of course, after that, I moved on to the official documentation on the multiprocessing module, found here. More specifically, this link will take you straight to the Process class.
I'm not able to fully articulate why the multiprocessing module serves this purpose better than threading, but I do feel that it makes more sense for this application. After all, the flask application is acting as its own API server that is separate from my test, and my test is testing the calls to it/responses it gets back. For this reason, I think it makes the most sense for my flask application to be its own process.
tl;dr
Use multiprocessing.Process in lieu of threading.Thread, and then call Process.terminate() to kill the process, followed by Process.join() to block until the process has terminated.
example:
import json
import requests
import unittest
from multiprocessing import Process

# (`app` is the Flask application under test and `foo_bar` is the headers dict; both are defined elsewhere in the project)

class MyTest(unittest.TestCase):

    def setUp(self):
        test_port = 8000
        self.test_url = f"http://0.0.0.0:{test_port}"
        self.app_process = Process(
            target=app.run,
            kwargs={"host": "0.0.0.0", "port": test_port, "debug": False},
        )
        self.app_process.start()

    def test_a_test_that_contacts_the_server(self):
        response = requests.post(
            f"{self.test_url}/dosomething",
            json={"foo": "bar"},
            headers=foo_bar,
        )
        is_successful = json.loads(response.text)["isSuccessful"]
        self.assertTrue(is_successful, msg=json.loads(response.text)["message"])

    def tearDown(self):
        self.app_process.terminate()
        self.app_process.join()
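One caveat worth noting (this part is my own assumption, not something the solution above depends on): the child process may need a moment before app.run is actually accepting connections, so the first request can race the server start. A minimal sketch of a poll loop you could call at the end of setUp, assuming any HTTP response at all means the server is up:
import time

def wait_for_server(url, timeout=5.0):
    # Poll until the server accepts a connection, or give up after `timeout` seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            requests.get(url)
            return True
        except requests.exceptions.ConnectionError:
            time.sleep(0.05)
    return False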
Test early, and test often!

Related

python multiprocessing wrap tornado.ioloop

I am really new to Python's multiprocessing, and I only have some basic notions about async calls, yield, and so on. I came across this snippet, in which multiprocessing.Process wraps tornado.ioloop.IOLoop.instance().start:
import tornado.ioloop
from multiprocessing import Process

# Set up the tornado web app
app = make_app(predicted_model_queue)
app.listen(8080)

server_process = Process(target=tornado.ioloop.IOLoop.instance().start)
# Start up the server to expose the metrics.
server_process.start()
It intends to start a tornado server as a server_process, but the code does not work. I got the error,
OSError: [Errno 9] Bad file descriptor
I have no experience with either library, and have no idea how to fix it. Can anyone please help me?
This is an unusual pattern - if you're writing a new app, I wouldn't recommend copying it.
If you're just trying to run an app that does this (looks like it came from here), the problem is that IOLoops cannot safely cross process boundaries (on some platforms it can sometimes work, but not always). To rewrite this code to correctly create the app and IOLoop in the child process, you could do this:
def run_server():
    app = make_app(predicted_model_queue)
    app.listen(8080)
    tornado.ioloop.IOLoop.current().start()

server_process = Process(target=run_server)
server_process.start()
This way only the predicted_model_queue is shared between the two processes.
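(If you also need to shut the server down from the parent process, for example between tests as in the Flask question above, the same terminate/join pattern applies; a minimal sketch, assuming server_process was started as shown:)
# Later, when the server is no longer needed:
server_process.terminate()  # ask the OS to kill the child process
server_process.join()       # block until it has actually exited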

Gspread - Change Listener?

I currently run a daemon thread that grabs all cell values, calculates whether there's a change, and then writes out dependent cells in a loop, i.e.:
from threading import Thread, Event

event = Event()  # set this to stop the polling loop

def f():
    while not event.is_set():
        update()
        event.wait(15)

Thread(target=f).start()
This works, but the looped get-all calls are significant I/O.
Rather than doing this, it would be much cleaner if the thread was notified of changes by Google Sheets. Is there a way to do this?
I rephrased my comment on gspread GitHub's Issues:
Getting a change notification from Google Sheets is possible with help of installable triggers in Apps Script. You set up a custom function in the Scripts editor and assign a trigger event for this function. In this function you can fetch an external url with UrlFetchApp.fetch.
On the listening end (your web server) you'll have a handler for this url. This handler will do the job. Depending on the server configuration (many threads or processes) make sure to avoid possible race condition.
Also, I haven't tested non-browser-triggered updates. If Sheets triggers the same event for this type of update, there could be a risk of infinite loops.
I was able to get this working by triggering an HTTP request whenever Google Sheets detected a change.
On Google Sheets:
function onEdit(e) {
  UrlFetchApp.fetch("http://myaddress.com");
}
Python-side (w/ Tornado)
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        on_edit()
        self.write('Updating.')

def on_edit():
    # Code here
    pass

app = tornado.web.Application([(r'/', MainHandler)])
app.listen(#port here)
tornado.ioloop.IOLoop.current().start()
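One practical note, not part of the original answer: if on_edit ends up doing significant work, running it inline blocks Tornado's single-threaded IOLoop, and subsequent onEdit pings will queue up behind it. A minimal sketch of handing the work to a thread pool instead, assuming the same handler layout as above:
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # Hand the work to a background thread so the IOLoop stays responsive.
        executor.submit(on_edit)
        self.write('Updating.')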
I don't think this sort of functionality should be within the scope of gspread, but I hope the documentation helps others.

Multithread call inside Twisted _delayedRender of request

I have a simple Twisted web server application serving my math requests. Everything works fine (I've hidden the big code pieces that aren't related to my question):
# import section ...

class PlsPage(Resource):
    isLeaf = True

    def render_POST(self, request):
        reactor.callLater(0, self._delayedRender, request)
        return NOT_DONE_YET

    def _delayedRender(self, request):
        # some actions before
        crossval_scores = cross_validation.cross_val_score(
            pls1, X, y=numpy.asarray(Y), scoring=my_custom_scorer,
            cv=KFold(700, n_folds=700))
        # some actions after
        request.finish()

reactor.listenTCP(12000, server.Site(PlsPage()))
reactor.run()
When I try to speed up the cross_validation calculation by setting n_jobs to, for example, 3:
crossval_scores = cross_validation.cross_val_score(pls1, X, y=numpy.asarray(Y), scoring=my_custom_scorer, cv=KFold(700, n_folds=700), n_jobs=3)
I get exactly 3 of these exceptions:
twisted.internet.error.CannotListenError: Couldn't listen on any:12000: [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted.
For some reason I can't call cross_val_score with n_jobs > 1 inside _delayedRender.
Here is the traceback of the exception; for some reason reactor.listenTCP is apparently trying to start 3 times as well.
Any ideas how to get this to work?
UPD1. I created a file PLS.py and moved all the code there, except the last 2 lines:
from twisted.web import server
from twisted.internet import reactor, threads
import PLS
reactor.listenTCP(12000, server.Site(PLS.PlsPage()))
reactor.run()
But the problem still persists. I also found that it occurs only on Windows; my Linux machine runs these scripts fine.
scikit_learn apparently uses the multiprocessing module in order to achieve concurrency. The multiprocessing module transmits data between processes using pickle, which, among other... idiosyncratic problems that it causes, will cause some of the modules imported in your parent process to be imported in your worker processes.
Your PLS_web.py "module", however, is not actually a module, it's a script; since you have put reactor.listenTCP and reactor.run at the bottom of it, it actually does stuff when you import it rather than just loading its code.
This particular error occurs because your web server is being run 4 times (once for the controller process and once for each of the three jobs); each of the 3 runs beyond the first hits an error because the first server is already listening on port 12000.
You should move the reactor.run/reactor.listenTCP lines elsewhere, into a top-level script. A good rule of thumb is that these lines should never appear in the same file as a class or def statement; define your code in one place and start it up in another. Once you've moved them to a file that doesn't get imported (and you might even want to put them in a file whose name isn't a legal module identifier, like run-my-server.py), then multiprocessing might be able to import all the code it needs and do its job.
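A minimal sketch of such a top-level script (this exact layout is my assumption, not taken from your code), assuming the resource class lives in PLS.py as in your UPD1; the if __name__ == "__main__" guard also matters on Windows, where multiprocessing re-imports the main module in each worker:
# run-my-server.py
from twisted.web import server
from twisted.internet import reactor

import PLS  # only defines PlsPage; no reactor calls happen at import time

if __name__ == "__main__":
    # The guard keeps multiprocessing's re-import of this module on Windows
    # from trying to start a second server on port 12000.
    reactor.listenTCP(12000, server.Site(PLS.PlsPage()))
    reactor.run()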
Better yet, don't write those lines at all, write a twisted application plugin and run your program with twistd. If you don't have to put the reactor.run statement in any place, you can't put it in the wrong place :).
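For reference, a minimal sketch using a .tac file rather than a full twistd plugin (a simpler way to let twistd own reactor startup), again assuming PLS.py exposes PlsPage; run it with twistd -y pls.tac:
# pls.tac
from twisted.application import service, internet
from twisted.web import server

import PLS

application = service.Application("pls")
internet.TCPServer(12000, server.Site(PLS.PlsPage())).setServiceParent(application)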

Unable to use Python's threading.Thread in Django app

I am trying to create a web application as a front end to another Python app. I have the user enter data into a form, and upon submitting, the idea is for the data to be saved in a database, and for the data to be passed to a thread object class. The thread is something that is strictly kicked-off based on a user action. My problem is that I can import threading, but cannot access threading.Thread. When the thread ends, it will update the server, so when the user views the job information, they'll see the results.
View:
@login_required(login_url='/login')
def createNetworkView(request):
    if request.method == "POST":
        # grab my variables from POST
        job = models.MyJob()
        # load my variables into MyJob object
        job.save()
        t = ProcessJobThread(job.id, my, various, POST, inputs, here)
        t.start()
        return HttpResponseRedirect("/viewJob?jobID=" + str(job.id))
    else:
        return HttpResponseRedirect("/")
My thread class:
import threading  # this works

print "About to make thread object"  # This works, I see this in the log

class CreateNetworkThread(threading.Thread):  # failure here
    def __init__(self, jobid, blah1, blah2, blah3):
        threading.Thread.__init__(self)

    def run(self):
        doCoolStuff()
        updateDB()
I get:
Exception Type: ImportError
Exception Value: cannot import name Thread
However, if I run python on the command line, I can import threading and also do from threading import Thread. What's the deal?
I have seen other things, like How to use thread in Django and Celery but that seemed overkill, and I don't see how that example could import threading and use threading.Thread, when I can't.
Thank you.
Edit: I'm using Django 1.4.1, Python 2.7.3, Ubuntu 12.10, SQLite for the DB, and I'm running the web application with ./manage.py runserver.
This was a silly issue on my end. I had made a file called "threading.py", and someone suggested I delete it, which I did (or thought I did). The problem was that, because I was using Eclipse, the PyDev (Python) plugin only deleted the threading.py file I had created and hid the *.pyc file. A stale threading.pyc was still lingering around, even though PyDev has an option (which I had enabled) to delete orphaned .pyc files.
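A quick way to spot this kind of shadowing (my own addition, not part of the original fix): print where Python actually loads the module from; if the path points into your project instead of the standard library, a stray threading.py or threading.pyc is the culprit.
import threading
print threading.__file__
# Expected: something like /usr/lib/python2.7/threading.pyc
# A path inside your project means a local file is shadowing the stdlib module.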

Python-Twisted Reactor Starting too Early

I have an application that uses PyQt4 and python-twisted to maintain a connection to another program. I am using "qt4reactor.py" as found here. This is all packaged up using py2exe. The application works wonderfully for 99% of users, but one user has reported that networking is failing completely on his Windows system. No other users report the issue, and I cannot replicate it on my own Windows VM. The user reports no abnormal configuration.
The debugging logs show that the reactor.connectTCP() call is executing immediately, even though the reactor hasn't been started yet! There's no mistaking run order because this is a single-threaded process with 60 sec of computation and multiple log messages between this line and when the reactor is supposed to start.
There's a lot of code, so I am only putting in pseudo-code, hoping that there is a general solution for this issue. I will link to the actual code below it.
import qt4reactor
qt4reactor.install()
# Start setting up main window
# ...
from twisted.internet import reactor
# Separate listener for detecting/processing multiple instances
self.InstanceListener = ListenerFactory(...)
reactor.listenTCP(LISTEN_PORT, self.InstanceListener)
# The active/main connection
self.NetworkingFactory = ClientFactory(...)
reactor.connectTCP(ACTIVE_IP, ACTIVE_PORT, self.NetworkingFactory)
# Finish setting up main window
# ...
from twisted.internet import reactor
reactor.runReturn()
The code is nested throughout the Armory project files. ArmoryQt.py (containing the above code) and armoryengine.py (containing the ReconnectingClientFactory subclass used for this connection).
So, the reactor.connectTCP() call executes immediately. The client code executes the send command and then immediately connectionLost() gets called. It does not appear to try to reconnect. It also doesn't throw any errors other than connectionLost(). Even more mysteriously, it receives messages from the remote node later on, and this app even processes them! But it believes it's not connected (and handshake never finished, so the remote node shouldn't be sending messages, but might be a bug/oversight in that program).
What on earth is going on!? How could the reactor get started before I tell it to start? I searched the code and found no other code that (I believe) could start the reactor.
The API that you're looking for is twisted.internet.reactor.callWhenRunning.
However, it wouldn't hurt to have less than 60 seconds of computation at startup, either :). Perhaps you should spread that out, or delegate it to a thread, if it's relatively independent?
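A minimal sketch of what that might look like against the pseudo-code above, keeping the question's factory names and constants and treating everything else as an assumption (as in the pseudo-code, self refers to the main-window object this code lives in): wrap the listen/connect calls in a function and let the reactor invoke it once it is actually running.
from twisted.internet import reactor

def start_networking():
    # Runs only after the reactor has started, so connectTCP can't fire early.
    reactor.listenTCP(LISTEN_PORT, self.InstanceListener)
    reactor.connectTCP(ACTIVE_IP, ACTIVE_PORT, self.NetworkingFactory)

reactor.callWhenRunning(start_networking)

# Finish setting up main window
# ...
reactor.runReturn()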
