I realize there's a similar question here, but this one takes a different approach: I have a Django app that runs queries over data indexed with djapian. I'd like to write unit tests for this app's search component, and since I'd obviously need the Django settings module and an active database connection, the test runner that Django provides seems ideal. However, the Django testing framework creates a dummy database, and I'd hate to dump all my data to a fixture and then index it (the tests would take forever!).
My data isn't at risk because the tests would only read from the database, so how could this be achieved? I'm new to this whole unit testing thing, so the solution of writing a new test runner that I read about in that similar question doesn't enlighten me much, at least not without some details.
Reading the test cases for djapian, I found something really interesting: they use the setUp method of the TestCase class to create an object and then call the indexer's update method, so they effectively have a document to search for and a way to write controlled query tests!
For the curious, the method looks something like this:
def setUp(self):
    p = Person.objects.create(name="Alex")
    for i in range(self.num_entries):
        Entry.objects.create(author=p, title="Entry with number %s" % i, text="foobar " * i)
    Entry.indexer.update()
I think this would do, but we have to remember I'm testing a little search engine here, so this solution might be the easy way out; I can't come up with an objection, though, so if you have an answer that'll help define a strategy for testing this kind of web app in Python in general, it's more than welcome!
I think I'll settle for something like this for now (I also wanted to test query latency against a fully populated database, but I think I could do that later with bench tests in Funkload).
EDIT: OK, to record the solution for anyone interested: I ran into another issue, the Xapian index (as stated in the comment). To solve it, I created a custom test runner that swaps the production Xapian index for a test index (a smaller one, created with a management script). The runner is fairly simple:
# Assumes the function-based test runner API of older Django versions,
# where the default run_tests lived in django.test.simple.
from django.conf import settings
from django.test.simple import run_tests

def custom_run_tests(test_labels, verbosity=1, interactive=True, extra_tests=[]):
    """Set the test indices."""
    settings.CATEGORY_CLASSIFIER_DATA = settings.TEST_CLASSIFIER_DATA
    return run_tests(test_labels, verbosity, interactive, extra_tests)
And, to use it, I simply added a setting:
TEST_RUNNER = 'search.tests.custom_run_tests'
I dropped the aforementioned approach (creating the documents in setUp) for performance and readability reasons: to test the database I needed a decent number of documents with some text (a paragraph or two), so I ended up creating a fixture for that (I used a management command that created the documents in the real database, serialized them by writing them to a file, and then deleted them).
So, in the end, I didn't read from the live DB at all; instead I used test fixtures created with a somewhat hacky script and a custom runner, and it wasn't that hard :)
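For anyone who wants a starting point, here's a minimal sketch of what such a fixture-generating management command might look like; the app name, model names, document count, and output path are assumptions, not the original script:

import os

from django.core import serializers
from django.core.management.base import BaseCommand

from myapp.models import Entry, Person  # hypothetical app and models

class Command(BaseCommand):
    help = "Create test documents, dump them to a fixture, then delete them."

    def handle(self, *args, **options):
        author = Person.objects.create(name="Fixture Author")
        entries = [
            Entry.objects.create(author=author,
                                 title="Entry %d" % i,
                                 text="a paragraph or two of text " * 10)
            for i in range(50)
        ]
        # Serialize everything to a regular Django fixture file.
        path = os.path.join("search", "fixtures", "search_test_data.json")
        with open(path, "w") as out:
            out.write(serializers.serialize("json", [author] + entries))
        # Clean up: the fixture, not the live rows, feeds the tests.
        for entry in entries:
            entry.delete()
        author.delete()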
I've read quite a few answers on here about testing helper methods (not necessarily private) in my unit tests and I'm still not quite sure what the best approach should be for my current situation.
I currently have a block of logic that runs as a scheduled job. It does a number of mostly related things like update local repositories, convert file types, commit these to other repos, clean up old repos, etc. I need all of this code to run in a specific order, so rather than setting a bunch of scheduled jobs, I took a lot of these small methods and put them into one large method that would enforce the order in which the code is run:
def mainJob():
    sync_repos()
    convert_files()
    commit_changes()
and so on. Now I'm not sure how to write my tests for this thing. It's frustrating to test the entire mainJob() function because it does so many things, and it's really more of a reliability feature anyway. I see a lot of people saying I should only test the public interface, but I worry that there will potentially be code that isn't directly verified.
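One way to get that verification without exercising every helper through mainJob() is to test the orchestration itself: stub out the steps and assert they run in the expected order. A minimal sketch with unittest.mock, assuming the functions live in a module named jobs (that module name is hypothetical):

import unittest
from unittest import mock

import jobs  # hypothetical module holding mainJob and its helper functions

class MainJobTest(unittest.TestCase):
    def test_runs_steps_in_order(self):
        manager = mock.Mock()
        # Attach every patched helper to one manager so their relative
        # call order is recorded in manager.mock_calls.
        with mock.patch.object(jobs, "sync_repos", manager.sync_repos), \
             mock.patch.object(jobs, "convert_files", manager.convert_files), \
             mock.patch.object(jobs, "commit_changes", manager.commit_changes):
            jobs.mainJob()
        self.assertEqual(manager.mock_calls,
                         [mock.call.sync_repos(),
                          mock.call.convert_files(),
                          mock.call.commit_changes()])

The helpers themselves can then be covered by their own small, focused tests.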
I use fixtures in my Django project to populate my database. This works well but has a serious limitation: you can't create lots of stuff.
In theory you can put in as many elements as you want, but since you need to write them one by one, it's impractical to have 20,000 items in your DB.
I need a tool that would fill in the primary keys itself and would be able to generate random typed data for the fixtures (e.g. emails, integers in a range, dates in a range, phone numbers). Another nice piece of functionality would be the ability to set functional rules for the data generation.
Does anyone know a way (a library, ...) to do this in a Django project?
I took a look at https://github.com/joke2k/faker - the tool itself seems good, but there's no integration with Django.
Otherwise, I guess I could write it myself using Faker (since writing a fixture file just consists of JSON generation), but I don't like reinventing the wheel :)
Thanks.
Factory Boy: https://factoryboy.readthedocs.org
It's a fixtures replacement that works really well for unit testing or otherwise creating fixture data. You write classes that hook into your models and generate populated model instances, and you can construct them with or without saving to the database.
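For example, a minimal sketch of a factory for a hypothetical Author model (the model and its fields are assumptions):

import factory

from myapp.models import Author  # hypothetical model

class AuthorFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Author

    name = factory.Sequence(lambda n: "Author %d" % n)
    email = factory.Faker("email")  # wraps the Faker library mentioned above

# AuthorFactory() saves to the database; AuthorFactory.build() doesn't.
authors = AuthorFactory.create_batch(20000)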
I'm very new to Couch, but I'm trying to use it on a new Python project, and I'd like to write the design documents (views) in Python as well. I've already configured Couch to use the couchpy view server, and I can confirm this works by entering some simple map/reduce functions into Futon.
Are there any official recommendations on how to load/synchronize design documents when using Python's couchdb module?
I understand that I can post design documents to "install" them into Couch, but my question is really around best practices. I need some kind of strategy for deploying, both in development environments and in production environments. My intuition is to create a directory and store all of my design documents there, then write some kind of sync script that will upload each one into couch (probably just blindly overwriting what's already there). Is this a good idea?
The documentation for "Writing views in Python" is five sentences long and really just explains how to install couchpy. On the project's Google Code site, there is mention of a couchdb.design module that sounds like it might help, but there's no documentation (that I can find). The source code for that module indicates that it does most of what I'm interested in, but it stops short of actually loading files. I think I need some kind of module discovery, but I've heard that's non-Pythonic. Advice?
Edit:
In particular, the idea of storing my map/reduce functions inside string literals seems completely hacky. I'd like to write real python code, in a real module, in a real package, with real unit tests. Periodically, I'd like to synchronize my "couch views" package with a couchdb instance.
Here's an approach that seems reasonable. First, I subclass couchdb.design.ViewDefinition. (Comments and docstrings removed for brevity.)
import couchdb.design
import inflection

DESIGN_NAME = "version"

class CurrentVersion(couchdb.design.ViewDefinition):
    def __init__(self):
        map_fun = self.__class__.map
        if hasattr(self.__class__, "reduce"):
            reduce_fun = self.__class__.reduce
        else:
            reduce_fun = None
        super_args = (DESIGN_NAME,
                      inflection.underscore(self.__class__.__name__),
                      map_fun,
                      reduce_fun,
                      'python')
        super(CurrentVersion, self).__init__(*super_args)

    @staticmethod
    def map(doc):
        if 'version_key' in doc and 'created_ts' in doc:
            yield (doc['version_key'], [doc['_id'], doc['created_ts']])

    @staticmethod
    def reduce(keys, values, rereduce):
        max_index = 0
        for index, value in enumerate(values):
            if value[1] > values[max_index][1]:
                max_index = index
        return values[max_index]
Now, if I want to synchronize:
import couchdb.design
from couchview.version import CurrentVersion
db = get_couch_db() # omitted for brevity
couchdb.design.ViewDefinition.sync_many(db, [CurrentVersion()], remove_missing=True)
The benefits of this approach are:
Organization. All designs/views exist as modules/classes (respectively) located in a single package.
Real code. My text editor will highlight syntax. I can write unit tests against my map/reduce functions.
The ViewDefinition subclass can also be used for querying.
current_version_view = couchview.version.CurrentVersion()
result = current_version_view(self.db, key=version_key)
It's still not ready for production, but I think this is a big step closer compared to storing map/reduce functions inside string literals.
Edit: I eventually wrote a couple blog posts on this topic, since I couldn't find any other sources of advice:
http://markhaase.com/2012/06/23/couchdb-views-in-python/
http://markhaase.com/2012/07/01/unit-tests-for-python-couchdb-views/
Pyramid uses gettext *.po files for translations, a very good and stable way to internationalize an application. Its one disadvantage is that the translations cannot be changed from within the app itself. I need some way to give a normal user the ability to change the translations on his own. Django allows changing the file directly and restarts the whole app after the change; I don't have that freedom, because the changes will be quite frequent.
Since I could not find any package to help me with this task, I decided to override the Localizer. My idea is based on using a translation domain, as Zope projects do: make the Localizer search for a registered domain and, if none is found, fall back to the default translation strategy.
The problem is that I could not find a good way to plug a custom translation solution into the Localizer itself. All I could think of was to reimplement the get_localizer method and rewrite the whole Localizer, but then several things would need to be copy-pasted, such as the interpolation of mappings and other tweaks related to translation strings.
I don't know how much you have in there, but I did something similar a while ago (and will have to do it again). The implementation is pretty simple...
If you can be sure that all calls are handled through _() or something similar, you can provide your own function. It would look something like this:
def _(msgid):
    # Look the string up in the database first (a pymongo-style query is assumed here).
    record = db.translations.find_one({'key': msgid, 'locale': request.locale_name})
    if record:
        return record['value']
    return real_gettext(msgid)
This is pretty simple... then you need something that will dump the database into the file...
But I guess overriding the Localizer makes more sense. I did it a long time ago, and overriding the function was easier than searching through the code.
The plus side of the Localizer is that it works everywhere. A monkey patch is pretty cool, but it's also pretty ugly. If I had to do it again, I'd provide my own Localizer that loads from a database first and then falls back to its own value. The reason for the database is that if someone shuts down the server before the file has been updated, you won't see the changes.
If the DB is more than you need, then the Localizer alone is good enough and you can update the file on every change. If the server gets restarted, it will load the new files... You'll have to compile the catalog first.
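A minimal sketch of that database-first Localizer idea (lookup_translation is a hypothetical helper that queries whatever store holds the user-edited translations):

from pyramid.i18n import get_localizer

def db_first_localizer(request):
    # Wrap the default Localizer so database entries win over *.po files.
    localizer = get_localizer(request)
    default_translate = localizer.translate

    def translate(tstring, domain=None, mapping=None):
        hit = lookup_translation(tstring, localizer.locale_name)  # hypothetical
        if hit is not None:
            return hit
        return default_translate(tstring, domain=domain, mapping=mapping)

    localizer.translate = translate
    return localizer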
I'm about to embark on some large Python-based App Engine projects, and I think I should check with Stack Overflow's "wisdom of crowds" before committing to a unit-testing strategy. I have an existing unit-testing framework (based on unittest with custom runners and extensions) that I want to use, so anything "heavy-weight"/"intrusive" such as nose, webtest, or gaeunit doesn't seem appropriate. The crucial unit tests in my worldview are extremely lightweight and fast ones, ones that run in an extremely short time, so I can keep running them over and over all the time without breaking my development rhythm (e.g., for a different project, I get 97% or so coverage for a 20K-lines project with several dozens of super-fast tests that take 5-7 seconds, elapsed time, for a typical run, overall -- that's what I consider a decent suite of small, fast unit-tests). I'll have richer/heavier tests as well of course, all the way to integration tests with selenium or windmill, that's not what I'm asking about;-) -- my focus in this question (and in most of my development endeavors;-) is on the small, lightweight unit-tests that lightly and super-rapidly cover my code, not on the deeper ones.
So I think what I need is essentially a set of small, very lightweight simulations of the various key App Engine subsystems -- data store, memcache, request/response objects and calls to webapp handlers, user handling, mail, &c, roughly in this order of priority. I haven't found exactly what I'm looking for, so it seems to me that I should either rely on mox, as I've done often in the past, which basically means mocking each subsystem used in a given test and setting up all expectations &c (strong, but lots of work each time, and very sensitive to the tested-code's internals, i.e. very "white-box"y), or roll my own simulation of each subsystem (and do asserts on the simulated subsystems' states as part of the unit tests). The latter seems feasible, given GAE's Python-side strong "stubs" architecture... but I can't believe I need to roll my own, i.e., that nobody's already written such simple-minded simulators!-) E.g., for the datastore, it looks like what I need is more or less the "datastore on file" stub that's already part of the SDK, plus a way to mark it read-only and easy-to-use accessors for assertions about the datastore's state; and so forth, subsystem by subsystem -- each seems to need "just a bit more" than what's already in the SDK, "perched on top" of the existing "stubs" architecture.
So, before diving in and spending a day or two of precious development time "rolling my own" simulations of GAE subsystems for unit testing purposes, I thought I'd double check with the SO crowd and see what y'all think of this... or, if there's already some existing open source set of such simulators that I can simply reuse (or minimally tweak!-), and which I've just failed to spot in my searching!-)
Edit: to clarify, if I do roll my own, I do plan to leverage the SDK-supplied stubs where feasible; but for example there's no stub for a datastore that gets initially read in from a file but then not saved at the end, so I need to subclass and tweak the existing one (which also doesn't offer particularly convenient ways to do asserts on its state -- same for the mail service stub, etc). That's what I mean by "rolling my own" -- not "rewriting from scratch"!-)
Edit: "why not GAEUnit" -- GAEUnit is nice for its own use cases, but running dev_appserver and seeing results in my browser (or even via urllib.urlopen) is definitely not what I'm after -- I want to use a fully automated setup, suitable for running within an existing test-running framework which is based on extending unittest, and no HTTP in the way (said framework defines a "fast" test as one that among other thing does no sockets and minimal disk I/O -- we simulate or mock these -- so via gaeunit I could do no better than "medium" tests) + no convenient way to prepopulate datastore for each test (and no OO structure to help customize things).
You don't need to write your own stubs - the SDK includes them, since they're what it uses to emulate the production APIs. Not all of them are suitable for use in unit tests, but most are. Check out this code for an example of the setup/teardown code you need to make use of the built-in stubs.
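The linked example isn't reproduced here, but the commonly used pattern looks roughly like this (the app ID is an assumption; passing None instead of a file path keeps the DatastoreFileStub entirely in memory):

import os
import unittest

from google.appengine.api import apiproxy_stub_map, datastore_file_stub

class DatastoreTestCase(unittest.TestCase):
    def setUp(self):
        # The stubs read the application ID from the environment.
        os.environ['APPLICATION_ID'] = 'test-app'
        # Start from a fresh stub map so each test sees a clean datastore.
        apiproxy_stub_map.apiproxy = apiproxy_stub_map.APIProxyStubMap()
        stub = datastore_file_stub.DatastoreFileStub('test-app', None, None)
        apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', stub)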
NoseGAE is a nose plugin that supports unit tests by automatically setting up the development environment and a test datastore for you. Very useful when developing on dev_appserver.
I use GAEUnit for my Google App Engine app and I am quite happy with the speed of the tests. The thing I like about GAEUnit (and I am sure WebTest does this too) is that it creates its own stub versions of everything for testing, leaving your "live" versions alone.
So the datastore that you may be using for development will be left as-is when you run your GAE tests.
I might also add that Fixture has been very useful in my unit tests. It lets you create models in a declarative syntax, and it converts them into stored entities that you can load in your tests. This way you have the same data set at the beginning of every test case, which saves you from having to create data by hand at the start of every test. Here is an example from the Fixture documentation:
Given this model:
from google.appengine.ext import db

class Entry(db.Model):
    title = db.StringProperty()
    body = db.TextProperty()
    added_on = db.DateTimeProperty(auto_now_add=True)
Your fixture would look like this:
from fixture import DataSet

class EntryData(DataSet):
    class great_monday:
        title = "Monday Was Great"
        body = """\
Monday was the best day ever.
"""
Note, however, that I ran into the following issues:
1. This bug, though the included patch does remedy it.
2. The datastore is not reset between test cases by default, so I use this to force a reset for each test case:
import os
import unittest

from google.appengine.api import apiproxy_stub_map

import datafixture  # project-local fixture helper (name kept from the original code)
import dset  # project-local module listing the DataSets (import path assumed)

class TycoonTest(unittest.TestCase):
    def setUp(self):
        # Clear out the datastore before starting the test.
        apiproxy_stub_map.apiproxy._APIProxyStubMap__stub_map['datastore_v3'].Clear()

        self.data = self.load_data()
        self.data.setup()
        os.environ['SERVER_NAME'] = "dev_appserver"
        self.after_setUp()

    def load_data(self):
        return datafixture.data(*dset.__all__)

    def after_setUp(self):
        """Hook for subclasses that need extra setup."""
        pass

    def tearDown(self):
        # Tear down the fixture data.
        try:
            self.data.teardown()
        except Exception:
            pass
The SDK 1.4.3 Testbed API provides easy configuration of stub libraries for local integration tests.
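For example, a minimal Testbed-based test case might look like this (which stubs you initialize depends on what the code under test touches):

import unittest

from google.appengine.ext import testbed

class TestbedTestCase(unittest.TestCase):
    def setUp(self):
        # Activate the testbed, which swaps in fresh stubs for this test.
        self.testbed = testbed.Testbed()
        self.testbed.activate()
        self.testbed.init_datastore_v3_stub()
        self.testbed.init_memcache_stub()

    def tearDown(self):
        self.testbed.deactivate()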
As of version 1.3.1 of the SDK, there is a built-in unit test framework.
It is Java-only right now, but I feel that:
it is much the same as what you describe in your question (and more - running tests in the cloud, for example)
it is quite possible to port/implement the same thing in Python using the SDK
So does the author of this framework, Max Ross - he explicitly tells us about it in his I/O presentation "Testing techniques for Google App Engine".
Does anyone have any updates on this topic?