Advice on database access approach in a custom environment using Python Pyramid

I'm new to Pyramid and have used Python for a few months. I've created a Python application on Linux that maintains an Oracle database from weekly vendor data feeds. As part of that, I wrote a customized database wrapper class on top of the cx_Oracle package, because I had specific requirements for maintaining history in the DB. All Oracle access goes through the methods in this wrapper. I now want to use Pyramid to create a simple browser-based reporting interface to the Oracle DB. For the greatest flexibility, I'd like to reach the data through the wrapper I already have instead of SQLAlchemy (or possibly alongside it, I'm not sure).
In my Pyramid app, I've considered importing my wrapper in my views.py init method, but that seems to get executed with every browser submit.
Can anyone suggest how I might create a persistent connection to Oracle that my reporting application can reuse through my wrapper class? I'm finding Pyramid a bit opaque. I'm never sure what's happening behind the scenes, but I'm willing to operate on trust until I get the swing of it. I need the benefit of the automatic authorization/authentication and login.
What I’m really looking for is a good approach from experienced Pyramid users before going down the wrong track.
Many thanks.

You should definitely use SQLAlchemy, as it handles connection pooling and the like. In your case SQLAlchemy would use cx_Oracle underneath, but it would let you concentrate on writing actual code instead of maintaining and pooling connections yourself.
You should follow the patterns in the Wiki2 tutorial to set up basic SQLAlchemy.

So, basically, the question boils down to "how do I use my existing API with Pyramid", right? This is quite easy as Pyramid is database-agnostic and very transparent in this area despite what you say :) Basically, you import it, call its methods, retrieve data and send it to the template:
from pyramid.httpexceptions import HTTPFound
from pyramid.view import view_config
import pokemon_api as api
@view_config(route_name='list_pokemon', renderer='pokemon_list.mako')
def list_pokemon(request):
    # To illustrate how to get data from the request to send to your API.
    # Query-string values arrive as strings, so convert them before doing arithmetic.
    batch_start = int(request.GET.get('batch_start', 0))
    batch_end = batch_start + int(request.GET.get('batch_size', 0))
    sort_order = request.GET.get('sort_by', 'name')
    # Here we call our API - we don't actually care where it gets the data from, it's a black box.
    # "from" is a reserved word in Python, so the keyword arguments are named start/end here.
    pokemon = api.retrieve_pokemon(start=batch_start, end=batch_end, sort=sort_order)
    # send the data to the renderer/template
    return {
        'pokemon': pokemon
    }
@view_config(route_name='add_pokemon', request_method='POST')
def add_pokemon(request):
    """
    Add a new Pokemon
    """
    # This view only accepts POST, so read every field from request.POST
    name = request.POST.get('name', '')
    weight = request.POST.get('weight', 0)
    height = request.POST.get('height', 0)
    api.create(name=name, weight=weight, height=height)
    # go back to the listing
    return HTTPFound(location='/pokemon_list')
and if your API needs some initialization, you can do it at startup time
from pyramid.config import Configurator
import pokemon_api as api
def main(global_config, **settings):
    """ This function returns a Pyramid WSGI application.
    """
    config = Configurator(settings=settings)
    ...
    # Runs once per process at startup, not once per request
    api.init("MAGIC_CONNECTION_STRING")
    return config.make_wsgi_app()
Of course, this assumes your API already handles transactions, connections, pooling and other boring stuff :)
One last point to mention: in Python you generally don't import things inside methods; you import them at the top of the file, at module-level scope. There are exceptions to that rule, but I don't see why you would need one in this case. Also, importing a module should be free of side effects (which yours might have, since "importing my wrapper in my views.py __init__ method" seems to be causing trouble).
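As a rough illustration of the last two points, here is a minimal sketch of a wrapper module that does nothing at import time and opens its connection only when init() is called at startup. The function names are made up for this example, and sqlite3 stands in for cx_Oracle purely so the sketch runs anywhere; the real wrapper's API is unknown.

```python
import sqlite3

# Hypothetical stand-in for the existing wrapper module. sqlite3 replaces
# cx_Oracle only so that this sketch is runnable without an Oracle server.
_connection = None  # importing this module opens nothing


def init(connection_string):
    """Open the shared connection once, at application startup."""
    global _connection
    if _connection is None:
        # check_same_thread=False lets other threads reuse the connection;
        # real multi-threaded code would still need locking or a pool.
        _connection = sqlite3.connect(connection_string, check_same_thread=False)
    return _connection


def get_connection():
    """Return the shared connection, failing loudly if init() was skipped."""
    if _connection is None:
        raise RuntimeError("call init() once at startup first")
    return _connection


def fetch_report(sql, params=()):
    """Run a read-only query through the shared connection."""
    return get_connection().execute(sql, params).fetchall()


# Startup: this is what main() would do once.
init(":memory:")
get_connection().execute("CREATE TABLE feed (week INTEGER, row_count INTEGER)")
get_connection().execute("INSERT INTO feed VALUES (1, 250)")

# Per request: views only call the API; no new connection is opened.
first = get_connection()
rows = fetch_report("SELECT week, row_count FROM feed WHERE week = ?", (1,))
second = get_connection()
```

A single shared connection is the simplest thing that works; under a multi-threaded server you would probably want a pool or one connection per thread instead.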

Related

How to use active record as DTO to my business objects

I have struggled with this problem for a long time. I searched the whole internet for a solution, but nothing I found was acceptable to me.
In short, in response to Daniel's comment:
I would like to use Django's Active Record ORM objects, but only as DTOs with the ability to save the data back to the database, including complex relationships.
This way my business objects would be independent of their data sources and would contain only behavior.
Long version:
We started a project whose required business logic looked very simple, so I picked Django as the framework to make things easier. The problem has since grown complex, and the implementation may be used in a much larger project, so I want to decouple my business objects from Active Record and use Django only as a DB backend and as a consumer of my objects. The UI has been decoupled from the start; it only calls a REST API provided by the Django backend.
My problems, with an example:
I have a Request model which is connected to many other models. The request contains requirement specifications related to networks. These networks are associated with a Cloud model. The networks under a cloud will be connected to the reservation that is generated from the request (currently they are connected to the request's network descriptors to determine the configurations during queries).
class Request(Model):
    ... # bunch_of_stuff_here
class NetworkDescriptor(Model):
    request = ForeignKey(Request)
    configA = ...
    configB = ...
class Cloud(Model):
    ...
class Network(Model):
    cloud = ForeignKey(Cloud)
    used_by = ForeignKey(NetworkDescriptor)
Solutions I considered:
1) Embed the models in the BOs and delegate to them.
This leads to the following problem: when I try to access the networks of a cloud and simply delegate, for example with
def get_networks(self):
    return self.orm_dto.networks
I get back AR objects, which is bad: I don't want to leak the AR details here. I would have to write a mapper from ORM networks to BO networks that tracks changes to the networks (network.used_by = request). That is by definition an object mapper, which I already have (namely the ORM), and I don't want to write another. :)
2) Embed the models but allow only high-level interaction. This sounds much more object-oriented, but I still don't know how to do it:
class Cloud:
    def serve_request(self, bo_request):
        ???
The result should be a bunch of networks whose used_by field is set to the ORM Request object behind the BO request parameter. How should I get this information? If I don't want to leak the ORM details, then again I have to write something which tracks the BO objects and can map them to AR objects, for example:
class Cloud:
    def serve_request(self, bo_request):
        for net in self._find_matching_networks(bo_request):
            net.used_by = repo.get_ar_from_bo(bo_request)
which is, again, not the best solution because I have to write the mapper, but in this case it seems much easier because I don't have to take care of the related fields.
3) Use the Template Method pattern and make the AR objects only a data source.
class Resource(object):
    # Business behavior lives here; subclasses only supply the data
    def __le__(self, other: "Resource"):
        return self.get_cpu() <= other.get_cpu()
class Request(Model, Resource):
    def get_cpu(self):
        return self._cpu
This is again a solution I don't like, because in this case the AR is still the BO. I can't unit test it effectively without reaching the DB, but at least at the code level the business part is separated from the data access part.
4) Switch completely to SQLAlchemy. The problem is that administering AR objects through Django is much easier than it would be with SQLAlchemy, but maybe this is the ultimate solution.
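To make option 2 more concrete, here is a minimal, ORM-free sketch of the bookkeeping it implies: a small repository that remembers which record sits behind each business object. RequestRecord and NetworkRecord are hypothetical plain-Python stand-ins for the Django models, and the matching rule (take the first free networks) is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:
    """Stand-in for the ORM Request row (hypothetical)."""
    pk: int


@dataclass
class NetworkRecord:
    """Stand-in for the ORM Network row (hypothetical)."""
    name: str
    used_by: Optional[RequestRecord] = None


@dataclass(eq=False)  # identity-based hashing, so it can be a dict key
class BORequest:
    """Business object: behavior only, knows nothing about the ORM."""
    min_networks: int


class Repository:
    """Remembers which record sits behind each business object."""

    def __init__(self):
        self._records = {}  # business object -> record

    def register(self, bo, record):
        self._records[bo] = record

    def get_ar_from_bo(self, bo):
        return self._records[bo]


class Cloud:
    def __init__(self, networks, repo):
        self._networks = networks
        self._repo = repo

    def _find_matching_networks(self, bo_request):
        # Invented rule for the sketch: the first N unused networks match.
        free = [net for net in self._networks if net.used_by is None]
        return free[:bo_request.min_networks]

    def serve_request(self, bo_request):
        chosen = self._find_matching_networks(bo_request)
        for net in chosen:
            net.used_by = self._repo.get_ar_from_bo(bo_request)
        return chosen


repo = Repository()
record = RequestRecord(pk=42)
request = BORequest(min_networks=2)
repo.register(request, record)
busy = NetworkRecord("b", used_by=RequestRecord(pk=7))
cloud = Cloud([NetworkRecord("a"), busy, NetworkRecord("c")], repo)
served = cloud.serve_request(request)
```

The caller of serve_request only ever handles BORequest objects; the record lookup stays inside the repository.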

Transferring data to a REST API without a database in Django

Basically, I have a program which scrapes some data from a website. I need to output it either to a Django template or through a REST API, without using a database. How do I do this?

Your best bet is to
a.) Perform the scraping in views themselves, and pass the info in a context dict to the template
or
b.) Write to a file and have your view pull info from the file.
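A minimal sketch of option b, assuming the scraper and the web process can both reach the same file. The file location, the function names, and the sample item are all made up for illustration; a real view would pass the returned dict to its template or serialize it for the API response.

```python
import json
import os
import tempfile

# Hypothetical location; any path both the scraper and the web process can reach.
DATA_FILE = os.path.join(tempfile.gettempdir(), "scraped_items.json")


def save_scraped(items, path=DATA_FILE):
    """Scraper side: write to a temp file, then rename it into place,
    so a reader never sees a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(items, handle)
    os.replace(tmp_path, path)


def load_scraped(path=DATA_FILE):
    """View side: return the latest items, or an empty list before the first scrape."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return []


def build_context(path=DATA_FILE):
    """What a view would hand to its template, or serialize for a REST response."""
    return {"items": load_scraped(path)}


save_scraped([{"title": "example", "price": 9.5}])
context = build_context()
```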

Django can be run without a database, but it depends on what applications you enable. Some of the default functionality (auth, sites, contenttypes) requires a database. So you'd need to disable those. If you need to use them, you're SOL.
Other functionality (like sessions) usually uses a database, but you can configure it to use a cache or file or something else.
I've taken two approaches in the past:
1) Disable the database completely and disable the applications that require the database:
DATABASES = {}
2) Use a dummy SQLite database just so everything works out of the box with the default apps without too much tweaking, but don't really use it for anything. I find this method faster and good for setting up quick testing/prototyping.
And to actually get the data from the scraper into your view, you can take a number of approaches. Store the data in a cache, or just write it directly to your context variables, etc.

Couchdb/Mongodb Application/Logic layer, like Oracle DB

At my work, we use Oracle for our database, which works great. I am not the main DB admin, but I do work with it. One thing I like is that the DB has a built-in logic layer using PL/SQL, which can handle logic related to saving and retrieving data. I really like this because it allows our MVC application (PHP/Zend Framework) to be lighter, and makes it easier to tie another platform, such as desktop or mobile, into the data.
However, I have a personal project where I want to use CouchDB or MongoDB, and I want to accomplish a similar goal. Outside of the MVC framework, I want to have an API layer that the main applications talk to; they don't talk directly to the database. They specify the design document (CouchDB), or something similar for MongoDB, to get the results. That API layer will validate the incoming data and make sure the data itself is saved and updated properly. For example, to save a new user, the framework only needs to send a JSON object with the keys/values to be saved, and the API layer saves the data in the proper places.
This API would probably have a UI, but only for administrative purposes and to make my life easier. In general it will always reply with JSON strings, or with pre-rendered/cached HTML in some cases, since each API layer would be specific to its application anyway.
I was wondering if anyone has done anything like this, or has any tips on methods I could use to accomplish it. I am currently looking to write my application in Python, and the front end will likely be something like AngularJS, although I am also looking at Node.js for the back end.

We do this exact thing at my current job. We have MongoDB on the back end, a RESTful API on top of it and then PHP/Zend on the front end.
Most of our data is read only, so we import that data into MongoDB and then the RESTful API (in Java) just serves it up.
Some things to think about with this approach:
Write generic sorting/paging logic in your API. You'll need this for lists of data. The user can pass in things like http://yourapi.com/entity/1?pageSize=10&page=3.
Make sure to create appropriate indexes in Mongo to match what people will query on. Imagine you are storing users. Make an index in Mongo on the user id field, or just use the _id field that is already indexed in all your calls.
Make sure to include all relevant data in a given document. Mongo doesn't do joins like you're used to in Oracle. Just keep in mind modeling data is very different with a document database.
You seem to want to write a layer (the middle tier API) that is database agnostic. That's a good goal. Just be careful not to let Mongo specific terminology creep into your exposed API. Mongo has specific operators/concepts that you'll need to mask with more generic terms. For example, they have a $set operator. Don't expose that directly.
Finally, after having a decent amount of experience with both CouchDB and Mongo, I'd definitely go with Mongo.
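The paging and operator-masking points can be sketched in plain Python. This is only an illustration: it assumes pages are numbered from 1, the default and maximum page sizes are arbitrary, and the function names are invented. With a real Mongo driver the computed skip and limit would be passed to the query rather than applied to a list.

```python
def parse_paging(params, default_size=20, max_size=100):
    """Turn query-string values such as pageSize=10&page=3 into (skip, limit).

    Assumes pages are numbered from 1; out-of-range values are clamped.
    """
    page = max(int(params.get("page", 1)), 1)
    size = min(max(int(params.get("pageSize", default_size)), 1), max_size)
    return (page - 1) * size, size


def paginate(items, params, sort_key=None):
    """Generic sort-then-slice, here applied to an in-memory list."""
    skip, limit = parse_paging(params)
    ordered = sorted(items, key=sort_key) if sort_key else list(items)
    return ordered[skip:skip + limit]


def to_update_document(changes):
    """Translate a generic {"field": value} change set into Mongo's $set form,
    so callers of the API never have to write the operator themselves."""
    return {"$set": dict(changes)}


users = [{"id": n} for n in range(1, 36)]
page3 = paginate(users, {"pageSize": "10", "page": "3"}, sort_key=lambda u: u["id"])
update = to_update_document({"name": "Ann"})
```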

Alternative to singleton?

I'm a Python & App Engine (and server-side!) newbie, and I'm trying to create a very simple CMS. Each deployment of the application would have one, and only one, company object, instantiated from something like:
from google.appengine.ext import db
class Company(db.Model):
    name = db.StringProperty()
    profile = db.TextProperty()
    addr = db.TextProperty()
I'm trying to provide the facility to update the company profile and other details.
My first thought was to have a Company entity singleton. But having looked at (although far from totally grasped) this thread I get the impression that it's difficult, and inadvisable, to do this.
So then I thought that perhaps for each deployment of the CMS I could, as a one-off, run a script (triggered by a totally obscure URL) which simply instantiates Company. From then on, I would get this instance with theCompany = Company.all()[0]
Is this advisable?
Then I remembered that someone in that thread suggested simply using a module. So I just created a Company.py file and stuck a few variables in it. I've tried this in the SDK and it seems to work. To my surprise, modified variable values "survived" between requests.
Forgive my ignorance, but I assume these values are only held in memory rather than on disk, unlike Datastore stuff? Is this a robust solution? (And would the module variables be in scope for all invocations of my application's scripts?)

Global variables are "app-cached." This means that each particular instance of your app will remember these variables' values between requests. However, when an instance is shutdown these values will be lost. Thus I do not think you really want to store these values in module-level variables (unless they are constants which do not need to be updated).
I think your original solution will work fine. You could even create the original entity using the remote API tool so that you don't need an obscure page to instantiate the one and only Company object.
You can also make the retrieval of the singleton Company entity a bit faster if you retrieve it by key.
If you need to retrieve this entity frequently, you can avoid round-trips to the datastore by using a caching technique. The fastest would be to app-cache the Company entity after you've retrieved it from the datastore. To keep the entity from becoming too out of date, you can also app-cache the time you last retrieved it, and if that time is more than N seconds old, re-fetch it from the datastore. For more details on this option and how it compares to alternatives, check out Nick Johnson's article Storage options on App Engine.
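A minimal sketch of that time-bounded app-cache, using only module-level variables. The datastore read is replaced by a hypothetical stand-in function so the example runs on its own, and the 60-second limit is an arbitrary choice for N.

```python
import time

MAX_AGE_SECONDS = 60  # the "N seconds" from the text; the value is arbitrary

# Module-level ("app-cached") state: survives between requests on one instance.
_company = None
_fetched_at = None
fetch_count = 0  # only here so the example can show how often we hit the datastore


def fetch_company_from_datastore():
    """Hypothetical stand-in for the real datastore read."""
    global fetch_count
    fetch_count += 1
    return {"name": "Example Ltd", "profile": "profile text"}


def get_company(now=time.time):
    """Return the cached entity, re-fetching it once it is older than MAX_AGE_SECONDS."""
    global _company, _fetched_at
    current = now()
    if _company is None or current - _fetched_at > MAX_AGE_SECONDS:
        _company = fetch_company_from_datastore()
        _fetched_at = current
    return _company


get_company(now=lambda: 1000.0)  # first call: fetches
get_company(now=lambda: 1030.0)  # 30 s later: served from memory
get_company(now=lambda: 1061.0)  # 61 s after the fetch: re-fetched
```

Each instance keeps its own copy, so after an update different instances may serve the old value for up to N seconds.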

It sounds like you are trying to provide a way for your app to be configurable on a per-application basis.
Why not use the datastore to store your company entity with a key_name? Then you will always know how to fetch the company entity, and you'll be able to edit the company without redeploying.
company = Company(key_name='c')
# set stuff on company....
company.put()
# later in code...
company = Company.get_by_key_name('c')
Use memcache to store the details of the company and avoid repeated datastore calls.
In addition to memcache, you can use module variables to cache the values. They are cached, as you have seen, between requests.

I think the approach you read about is the simplest:
Use module variables, initialized to None.
Provide accessors (getters/setters) for these variables.
When a variable is accessed, if its value is None, fetch it from the database. Otherwise, just use it.
This way, you'll have app-wide variables provided by the module (which won't be instantiated again and again), they will be shared and you won't lose them.

Design pattern to organize non-trivial ORM queries?

I am developing a web API with 10 tables or so in the backend, with several one-to-many and many-to-many associations. The API essentially is a database wrapper that performs validated updates and conditional queries. It's written in Python, and I use SQLAlchemy for ORM and CherryPy for HTTP handling.
So far I have separated the 30-some queries the API performs into functions of their own, which look like this:
# in module "services.inventory"
def find_inventories(session, user_id, *inventory_ids, **kwargs):
    query = session.query(Inventory, Product)
    query = query.filter_by(user_id=user_id, deleted=False)
    ...
    return query.all()
def find_inventories_by(session, app_id, user_id, by_app_id, by_type, limit, page):
    ...
# in another service module
def remove_old_goodie(session, app_id, user_id):
    try:
        old = _current_goodie(session, app_id, user_id)
        services.inventory._remove(session, app_id, user_id, [old.id])
    except ServiceException as e:
        # log it and do stuff
        ...
The CherryPy request handler calls the query methods, which are scattered across several service modules, as needed. The rationale behind this solution is, since they need to access multiple model classes, they don't belong to individual models, and also these database queries should be separated out from direct handling of API accesses.
I realize that the above code might be called Foreign Methods in the realm of refactoring. I could well live with this way of organizing for a while, but as things are starting to look a little messy, I'm looking for a way to refactor this code.
Since the queries are tied directly to the API and its business logic, they are hard to generalize like getters and setters.
Repeating the session argument like that is a code smell, but the current implementation of the API creates a new CherryPy handler instance, and therefore a new session object, for each API call, so there is no global way of getting at the current session.
Is there a well-established pattern to organize such queries? Should I stick with the Foreign Methods and just try to unify the function signature (argument ordering, naming conventions etc.)? What would you suggest?

The standard way to have global access to the current session in a threaded environment is SQLAlchemy's scoped_session. There are some important aspects to get right when integrating it with your framework, mainly transaction control and clearing out sessions between requests. A common pattern is to have an autocommit=False (the default) scoped session in a module and wrap any business logic execution in a try/except block that rolls back if an exception is raised and commits if the method succeeded, then finally calls Session.remove(). The business logic would then import the Session object into global scope and use it like a regular session.
There seems to be an existing CherryPy-SQLAlchemy integration module, but as I'm not too familiar with CherryPy, I can't comment on its quality.
Having queries encapsulated as functions is just fine. Not everything needs to be in a class. If they get too numerous just split into separate modules by topic.
What I have found useful is to factor out common criteria fragments. They usually fit rather well as classmethods on model classes. Aside from increasing readability and reducing duplication, they work as implementation-hiding abstractions to some extent, making it less painful to refactor the database. (Example: instead of (Foo.valid_from <= func.current_timestamp()) & (Foo.valid_until > func.current_timestamp()) you'd have Foo.is_valid())

SQLAlchemy strongly suggests that the session maker be part of some global configuration.
"It is intended that the sessionmaker() function be called within the global scope of an application, and the returned class be made available to the rest of the application as the single class used to instantiate sessions."
Having queries in separate modules isn't an interesting problem. The Django ORM works this way. A web site usually consists of multiple Django "applications", which sounds like your site with its many "service modules".
Knitting together multiple services is the point of an application. There aren't a lot of alternatives that are better.
