Keep code from running during syncdb - python

I have some code that causes syncdb to throw an error (because it tries to access the model before the tables are created).
Is there a way to keep the code from running on syncdb? Something like:
if not syncdb:
    run_some_code()
Thanks :)
edit: PS - I thought about using the post_init signal... for the code that accesses the db, is that a good idea?
More info
Here is some more info as requested :)
I've run into this a couple times, for instance... I was hacking on django-cron and determined it necessary to make sure there are not existing jobs when you load django (because it searches all the installed apps for jobs and adds them on load anyway).
So I added the following code to the top of the __init__.py file:
import sqlite3
from django_cron import models  # the app whose Job table is being cleared

try:
    # Delete all the old jobs from the database so they don't
    # interfere with this instance of django
    oldJobs = models.Job.objects.all()
    for oldJob in oldJobs:
        oldJob.delete()
except sqlite3.OperationalError:
    # When you do syncdb for the first time, the table isn't
    # there yet and throws a nasty error... until now
    pass
For obvious reasons this is crap: it's tied to sqlite, and I'm sure there are better places to put this code (this is just how I happened upon the issue), but it works.
As you can see, the error you get is an OperationalError (in sqlite) and the stack trace says something along the lines of "table django_cron_job not found".
Solution
In the end, the goal was to run some code before any pages were loaded.
This can be accomplished by executing it in the urls.py file, since it has to be imported before a page can be served (obviously).
And I was able to remove that ugly try/except block :) Thank god (and S. Lott)

"edit: PS - I thought about using the post_init signal... for the code that accesses the db, is that a good idea?"
Never.
If you have code that's accessing the model before the tables are created, you have big, big problems. You're probably doing something seriously wrong.
Normally, you run syncdb approximately once. The database is created. And your web application uses the database.
Sometimes, you made a design change, drop and recreate the database. And then your web application uses that database for a long time.
You (generally) don't need code in an __init__.py module. You should (almost) never have executable code that does real work in an __init__.py module. It's very, very rare, and inappropriate for Django.
I'm not sure why you're messing with __init__.py when Django Cron says that you make your scheduling arrangements in urls.py.
Edit
Clearing records is one thing.
Messing around with __init__.py and Django-cron's base.py are clearly completely wrong ways to do this. If it's that complicated, you're doing it wrong.
It's impossible to tell what you're trying to do, but it should be trivial.
Your urls.py can only run after syncdb and after all of the ORM material has been configured and bound correctly.
Your urls.py could, for example, delete some rows and then add some rows to a table. At this point, all syncdb issues are out of the way.
Why don't you have your logic in urls.py?

Code that tries to access the models before they're created can pretty much exist only at the module level; it would have to be executable code run when the module is imported, as your example indicates. This is, as you've guessed, the reason why syncdb fails. It tries to import the module, but the act of importing the module causes application-level code to execute; a "side-effect" if you will.
The desire to avoid module imports that cause side-effects is so strong in Python that the if __name__ == '__main__': convention for executable python scripts has become commonplace. When just loading a code library causes an application to start executing, headaches ensue :-)
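As a quick illustration, the guard keeps a plain import side-effect free while still letting the file do real work when run directly (the function name here is a hypothetical placeholder):

```python
def clear_stale_jobs():
    """Hypothetical placeholder for the real cleanup logic."""
    return "cleaned"

if __name__ == "__main__":
    # Reached only when run directly (e.g. `python cleanup.py`);
    # a plain `import cleanup` -- by syncdb or anything else --
    # executes nothing and stays side-effect free.
    print(clear_stale_jobs())
```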
For Django apps, this becomes more than a headache. Consider the effect of having oldJob.delete() executed every time the module is imported. It may seem like it's executing only once when you run with the Django development server, but in a production environment it will get executed quite often. If you use Apache, for example, Apache will frequently fire up several child processes waiting around to handle requests. As a long-running server progresses, your Django app will get bootstrapped every time a handler is forked for your web server, meaning that the module will be imported and delete() will be called several times, often unpredictably. A signal won't help, unfortunately, as the signal could be fired every time an Apache process is initialized as well.
It isn't, btw, just a webserver that could cause your code to execute inadvertently. If you use tools like epydoc, for example, they will import your code to generate API documentation. This in turn would cause your application logic to start executing, which is obviously an undesired side-effect of just running a documentation parser.
For this reason, cleanup code like this is best handled by a cron job that looks for stale jobs on a periodic basis and cleans up the DB. Such a script can also be run manually, or by any process (for example during a deployment, or as part of your unit test setUp() function to ensure a clean test run). No matter how you do it, the important point is that code like this should always be executed explicitly, rather than implicitly as a result of opening the source file.
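A minimal sketch of such an explicit cleanup script, using the stdlib sqlite3 module directly (the table name is taken from the error message above; the function name is hypothetical):

```python
import sqlite3

def clear_old_jobs(db_path):
    """Delete every row from the job table so stale jobs can't
    interfere with the next run. Meant to be invoked explicitly
    (cron, a deploy step, a test setUp), never as an import side-effect."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DELETE FROM django_cron_job")
        conn.commit()
    finally:
        conn.close()
```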
I hope that helps. I know it doesn't provide a way to determine if syncdb is running, but the syncdb issue will magically vanish if you design your Django app with production deployment in mind.

Related

Two flask apps using one database

Hello, I don't think this is the right place for this question, but I don't know where else to ask it. I want to make a website and an API for that website using the same SQLAlchemy database. Would just running them at the same time independently be safe, or would this cause corruption from two writes happening at the same time?
SQLAlchemy is a Python wrapper around SQL databases; it is not its own database. If you're running your website (perhaps Flask?) and managing your API from the same script, you can simply use the same reference to your SQLAlchemy instance. When you connect to a database with SQLAlchemy and save the result to a variable, you are really saving the connection to that variable and referencing it repeatedly, as opposed to the more inefficient approach of creating a new connection every time. So when you say
using the same SQLAlchemy database
I believe you are actually referring to the actual underlying database itself, not the SQLA wrapper/connection to it.
If your website and API are not running in the same script (or even if they are, depending on how your API handles simultaneous requests), you may encounter a race condition, which, according to Wikipedia, is defined as:
the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of the possible behaviors is undesirable.
This may be what you are referring to when you mentioned
would this cause corruption from two write happening at the same time.
To avoid such situations, a file-backed engine like SQLite places a lock on the database file while a write is in progress: another connection that tries to write at the same time is refused (or made to wait) until the lock is released. This locking behavior, rather than silent corruption, is the practical issue you are most likely to hit when using two simultaneous connections to a SQLite db.
However, if you are using something like MySQL, where you connect to a SQL server process, and NOT a file, since there is no direct access to a file, there will be no lock on the database, and you may run in to that nasty race condition in the following made up scenario:
Stack Overflow queries the reputation of an account to see if it should be banned due to negative reputation.
AT THE EXACT SAME TIME, someone upvotes an answer made by that account, bringing it back to one point above the ban threshold.
The outcome is now determined by the speed of execution of these 2 tasks.
If the upvoter has, say, a slow computer, and the "upvote" does not get processed by Stack Overflow before the reputation query completes, the account will be banned. However, if there is some lag on Stack Overflow's end, and the upvote processes before the account query finishes, the account will not get banned.
The key concept behind this example is that all of these steps can occur within fractions of a second, and the outcome depends on the speed of execution on both ends.
To address the issue of data corruption, most databases have a system in place that properly order database read and writes, however, there are still semantic issues that may arise, such as the example given above.
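The difference between the racy check-then-act pattern and letting the database evaluate the condition atomically can be sketched with stdlib sqlite3 (the accounts table is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, reputation INTEGER, banned INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', -1, 0)")

# Racy check-then-act: another writer could change the row
# between the SELECT and the UPDATE.
rep = conn.execute(
    "SELECT reputation FROM accounts WHERE name = 'alice'").fetchone()[0]
if rep < 0:
    conn.execute("UPDATE accounts SET banned = 1 WHERE name = 'alice'")

# Atomic alternative: the database evaluates the condition and performs
# the write in a single statement, so no other write can interleave
# with the decision.
conn.execute("UPDATE accounts SET banned = 1 WHERE reputation < 0")
conn.commit()
```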
Two applications can use the same database as the DB is a separate application that will be accessed by each flask app.
What you are asking can be done and is the methodology used by many large web applications, especially when the API is written in a different framework than the main application.
Since SQL databases are ACID compliant, they have a system in place to queue the multiple read/write requests put to it and perform them in the correct order while ensuring data reliability.
One question to ask though is whether it is useful to write two separate applications. For most flask-only projects the best approach would be to separate the project using blueprints, having a “main” blueprint and a “api” blueprint.

Temporary object-pool for unit tests?

I am running a large unit test repository for a complex project.
This project has some things that don't play well with large test amounts:
caches (memoization) that cause objects not to be freed between tests
complex objects at module level that are singletons and might gather data when being used
I am interested in each test (or at least each test suite) having its own "python-object-pool" and being able to free it after.
Sort of a python-garbage-collector-problem workaround.
I imagine a self-contained, temporary, discardable Python interpreter that can run certain code for me, after which I can call "interpreter.free()" and be assured it doesn't leak.
One blunt solution I found is to use Nose, or to implement this via subprocess each time I need an expendable interpreter to run a test. Each test becomes "fork_and_run(conditions)" and leaks no memory in the original process.
I also saw Nose's one-process-per-test mode that runs the tests sequentially - though people mentioned it sometimes freezes midway - less fun..
Is there a simpler solution?
P.S.
I am not interested in going through vast amounts of other peoples code and trying to make all their caches/objects/projects be perfectly memory-managed objects that can be cleaned.
P.P.S
Our PROD code also creates a new process for each job, which is very comfortable since we don't have to mess around with "surviving forever" and other scary stories.
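For what it's worth, the "fork_and_run(conditions)" idea from above can be sketched with the stdlib subprocess module; the helper name and the snippet passed to it are hypothetical:

```python
import subprocess
import sys

def fork_and_run(test_source):
    """Run a snippet of test code in a brand-new interpreter.
    Returns the child's exit code (0 means every assertion passed).
    All memory, caches and module-level singletons die with the child."""
    return subprocess.run([sys.executable, "-c", test_source]).returncode

# Each call gets a pristine interpreter, so a cache leaked by one
# test can never bleed into the next.
status = fork_and_run("x = [0] * 1_000_000\nassert sum(x) == 0")
```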
TL;DR
The module reload trick I tried worked locally, but broke when used on a machine with a different python version... (?!)
I ended up taking any and all caches I wrote in code and adding them to a global cache list - then clearing them between tests.
Sadly this will break if anyone uses a cache/manual cache mechanism and misses this, tests will start growing in memory again...
For starters I wrote a loop that goes over the sys.modules dict and reloads (looping twice) all modules of my code. This worked amazingly - all references were freed properly - but it seems it cannot be used in production/serious code for multiple reasons:
old python versions break when reloading, as classes built with metaclasses get redefined (I still don't get how this breaks).
unit tests survive the reload and sometimes hold stale references to old classes - especially if one class uses an instance of another. Think super(class_name, self) where self is an instance of the previously defined class, while class_name is now the redefined same-name class.

Run a Python script on Heroku on schedule (as a separate app)

OK so I'm working on an app that has 2 Heroku apps - one is the writer that writes to my DB after scraping a site, and one is the reader that consumes the said DB.
The former is just a Python script with a kind of while 1 loop - it's actually a Twitter stream. I want this to run every x minutes, independently of what the reader is doing.
Now, running the script locally works fine, but I'm not sure how getting this to work on Heroku would work. I've tried looking it up, but could not find a solid answer. I read about background tasks, Redis queue, One-off dynos etc, but I'm not sure what to really use for my purpose. Some of my requirements are:
have the Python script keep logs of whatever I want.
in the future, I might want to add an admin panel for the writer, that will just show me stats of the script (and the logs). So hooking up this admin panel (flask) should be easy-ish and not break the script itself.
I would love any suggestions or pointers here.
I suggest writing the consumer as a server that waits around, then processes the stream on the timed interval. That is, you start it once and it runs forever, doing some processing every 10 minutes or so.
See: sched Python module, which handles scheduling events at certain times and running them.
Simpler: use Heroku's scheduler service.
This technique is simpler -- it's just straight-through code -- but can lead to problems if you have two of the same consumer running at the same time.
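A minimal sketch of the start-once, run-forever pattern with the stdlib sched module (the interval and function names are made up):

```python
import sched
import time

INTERVAL = 600  # seconds between rounds; made up for the example

def process_stream(scheduler):
    """Do one round of work, then put ourselves back on the queue:
    start it once and it keeps firing forever."""
    print("processing at", time.strftime("%H:%M:%S"))
    scheduler.enter(INTERVAL, 1, process_stream, (scheduler,))

s = sched.scheduler(time.time, time.sleep)
s.enter(0, 1, process_stream, (s,))
# s.run()  # blocks forever, firing process_stream every INTERVAL seconds
```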

Saltstack Manage and Query a Tally/Threshold via events and salt-call?

I have over 100 web server instances running a PHP application using APC, and we occasionally (on the order of once per week across the entire fleet) see corruption in one of the caches which results in a distinctive error log message.
Once this occurs, the application is dead on that node and any transactions routed to it will fail.
I've written a simple wrapper around tail -F which can spot the pattern any time it appears in the log file and evaluate a shell command (using bash eval) to react. I have this using the salt-call command from salt-stack to trigger processing a custom module which shuts down the nginx server, warms (refreshes) the cache, and, of course, restarts the web server. (Actually I have two forms of this wrapper, bash and Python.)
This is fine, and the frequency of events is such that it's unlikely to be an issue. However my boss is, quite reasonably, concerned about a common mode failure pattern ... that the regular expression might appear in too many of these logs at once and take down the entire site.
My first thought would be to wrap my salt-call in a redis check (we already have a Redis infrastructure used for caching and certain other data structures). That would be implemented as an integer, with an expiration. The check would call INCR, check the result, and sleep if more than N returned (or if the Redis server were unreachable). If the result were below the threshold then salt-call would be dispatched and a decrement would be called after the server is back up and running. (Expiration of the Redis key would kill off any stale increments after perhaps a day or even a few hours ... our alerting system will already have notified us of down servers and our response time is more than adequate for such time frames).
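The INCR/threshold gate described above might be sketched as follows; a tiny in-memory stand-in replaces the real redis-py client so the control flow is visible, and all names are hypothetical:

```python
class FakeRedis:
    """In-memory stand-in for redis-py so the gate logic is testable;
    production code would use redis.StrictRedis plus a key expiration
    to kill off stale increments."""
    def __init__(self):
        self.store = {}
    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]
    def decr(self, key):
        self.store[key] = self.store.get(key, 0) - 1
        return self.store[key]

MAX_CONCURRENT_RESTARTS = 3  # the threshold N from the text

def guarded_restart(r, do_restart, key="apc:restarts"):
    """Dispatch the restart only if fewer than N nodes are already
    restarting; otherwise back off and let the caller sleep/retry."""
    try:
        if r.incr(key) > MAX_CONCURRENT_RESTARTS:
            r.decr(key)   # leave the counter as we found it
            return False  # too many restarts in flight right now
        try:
            do_restart()  # e.g. shell out to salt-call here
            return True
        finally:
            r.decr(key)   # server is back up: release our slot
    except ConnectionError:  # real redis-py raises redis.exceptions.ConnectionError
        return False         # Redis unreachable: fail safe
```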
However, I was reading about the Saltstack event handling features and wondering if it would be better to use that instead. (Advantage, the nodes don't have redis-cli command tool nor the Python Redis libraries, but, obviously, salt-call is already there with its requisite support). So using something in Salt would minimize the need to add additional packages and dependencies to these systems. (Alternatively I could just write all the Redis handling as a separate PHP command line utility and just have my shell script call that).
Is there a HOWTO for writing simple Saltstack modules? The docs seem to plunge deeply into reference details without any orientation. Even some suggestions about which terms to search on would be helpful (because their use of terms like pillars, grains, minions, and so on seems somewhat opaque).
The main doc for writing a Salt module is here: http://docs.saltstack.com/en/latest/ref/modules/index.html
There are many modules shipped with Salt that might be helpful for inspiration. You can find them here: https://github.com/saltstack/salt/tree/develop/salt/modules
One thing to keep in mind is that the Salt Minion doesn't do anything unless you tell it to do something. So you could create a module that checks for the error pattern you mention, but you'd need to add it to the Salt Scheduler or cron to make sure it gets run frequently.
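For orientation, a Salt execution module is just a Python file of plain functions, synced to minions from the master's _modules directory; a hypothetical minimal module for this use case might look like:

```python
# _modules/apc_guard.py -- a hypothetical minimal execution module.
# After `salt '*' saltutil.sync_modules` it can be invoked as:
#   salt 'web*' apc_guard.check /var/log/php/error.log
import os
import re

PATTERN = re.compile(r"apc.*corrupt", re.IGNORECASE)  # assumed error signature

def check(logfile):
    """Return True if the corruption pattern appears in the log file."""
    if not os.path.exists(logfile):
        return False
    with open(logfile) as fh:
        return any(PATTERN.search(line) for line in fh)
```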
If you need more help you'll find helpful people on IRC in #salt on freenode.

Is there a way to unit test Gtk/GLib code written in Python?

I'm in the process of writing a small/medium sized GUI application with PyGObject (the new introspection based bindings for Gtk). I started out with a reasonable test suite based on nose that was able to test most of the functions of my app by simply importing the modules and calling various functions and checking the results.
However, recently I've begun to take advantage of some Gtk features like GLib.timeout_add_seconds, which is a fairly simple callback mechanism that simply calls the specified callback after a timer expires. The problem I'm naturally facing now is that my code seems to work when I use the app, but the test suite is poorly encapsulated, so when one test checks that it's starting with clean state, it finds that its state has been trampled all over by a callback that was registered by a different test. Specifically, the test successfully checks that no files are loaded, then it loads some files, then checks that the files haven't been modified since loading, and the test fails!
It took me a while to figure out what was going on, but essentially one test would modify some files (which initiates a timer) then close them without saving, then another test would reopen the unmodified files and find that they're modified, because the callback altered the files after the timer was up.
I've read about Python's reload() builtin for reloading modules in the hopes that I could make it unload and reload my app to get a fresh start, but it just doesn't seem to be working.
I'm afraid that I might have to resort to launching the app as a subprocess, tinkering with it, then ending the subprocess and relaunching it when I need to guarantee fresh state. Are there any test frameworks out there that would make this easy, particularly for pygobject code?
Would a mocking framework help you isolate the callbacks? That way, you should be able to get back to the same state as when you started. Note that a setUp() and tearDown() pattern may help you there as well -- but I am kind of assuming that you already are using that.
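A sketch of how a mock plus setUp() could isolate the timer: here the scheduler (GLib.timeout_add_seconds in the real app) is injected as a plain callable so tests can swap in a Mock; the Editor class and its methods are hypothetical stand-ins:

```python
import unittest
from unittest import mock

class Editor:
    """Hypothetical stand-in for the real app object. The scheduler
    is injected so that tests can replace it with a Mock."""
    def __init__(self, schedule):
        self.schedule = schedule
        self.dirty = False
    def modify(self):
        self.dirty = True
        self.schedule(5, self.autosave)  # autosave 5 seconds from now
    def autosave(self):
        self.dirty = False

class EditorTest(unittest.TestCase):
    def setUp(self):
        self.schedule = mock.Mock()  # no real timer ever fires
        self.editor = Editor(self.schedule)
    def test_modify_schedules_autosave_without_running_it(self):
        self.editor.modify()
        self.schedule.assert_called_once_with(5, self.editor.autosave)
        # The callback never ran behind our back:
        self.assertTrue(self.editor.dirty)
```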
