Temporary views are not supported in CouchDB - python

I'm building a web app using Django and couchdb 2.0.
The new version of couchdb doesn't support temporary views. They recommend using a Mongo query but I couldn't find any useful documentation.
What is the best approach or library to use couchdb 2.0 with Django?

Temporary views were indeed abandoned in CouchDB 2.0. With mango, you could emulate them using a Hack, but that's just as bad (read: performance-wise). The recommendation is to actually use persistent views. As only the delta of new or updated documents need indexing, this will likely need significantly less resources.
As opposed to relational DBs, the created view (which is a persisted index by keys), is meant to be queried many times with different parameters (there is no such a thing as a query optimizer taking your temp view definition or something). So, when you're built heavily on temporary views, you might consider changing the way you query in the first place. One place to start is thinking about which attribute will collapse the result set most quickly to what you're looking for and build a view for that. Then, go query this view with keys and post-filter for the rest.
The closest thing you can do to a temporary view (when you really, really need it) is creating a design doc (e.g. _design/temp<uuid>) and use it for the one query execution.
Just to add a link (not new - but timeless) on the details: http://guide.couchdb.org/draft/views.html

Related

Transfering data to REST API without a database django

Basically I have a program which scraps some data from a website, I need to either print it out to a django template or to REST API without using a database. How do I do this without a database?
Your best bet is to
a.) Perform the scraping in views themselves, and pass the info in a context dict to the template
or
b.) Write to a file and have your view pull info from the file.
Django can be run without a database, but it depends on what applications you enable. Some of the default functionality (auth, sites, contenttypes) requires a database. So you'd need to disable those. If you need to use them, you're SOL.
Other functionality (like sessions) usually uses a database, but you can configure it to use a cache or file or something else.
I've taken two approaches in the past:
1) Disable the database completely and disable the applications that require the database:
DATABASES = {}
2) Use a dummy sqlite database just so it works out of box with the default apps without too much tweaking, but don't really use it for anything. I find this method faster and good for setting up quick testing/prototyping.
And to actually get the data from the scraper into your view, you can take a number of approaches. Store the data in a cache, or just write it directly to your context variables, etc.

When should I use objects.raw() in Django?

I am quite new to Python and I have seen that both
Entries.objects.raw("SELECT * FROM register_entries")
and
Entries.objects.filter()
do the same query.
In which cases is better to use one or the other?
It depends on the database backend that you are using. The first assume that you have a SQL-based database engine. That is not always true. At the opposite, the second one will work in any case (if the backend is designed for). There was for instance few years ago a LDAP backend which was designed so, but LDAP queries do not use SQL language at all.
In all cases, I advice you to use the second one. It is the better way to go if you want to make reusable and long-term code.
There are also other ideas to prefer the second one to the first one
avoiding possible SQL injections ;
no need to know about database design (table's name, fields' name) ;
generic code is better than specific one ;
and moreover, it is shorter...
But you sometimes will have to use the first one when you do specific operations (calling specific backend's functions), but avoid them as much as possible.
In a nutshell, use the second one!!!
From django documentation:
When the model query APIs don’t go far enough, you can fall back to writing raw SQL
For all aspects, django queryset api offers you many options to customize your queries. But in some cases, you need to use very specific queries where django api become insufficient. But before you go for raw SQL, it is best to read all Queryset Api docs and learn everything about django queryset api.

Couchdb/Mongodb Application/Logic layer, like Oracle DB

At my work, we use Oracle for our database. Which works great. I am not the main db admin, but I do work with it. One thing I like is that the DB has a built in logic layer using PL/SQL which ca handle logic related to saving the data and retrieve it. I really like this because it allows our MVC application (PHP/Zend Framework) to be lighter, and makes it easier to tie in another platform into the data, such as desktop or mobile.
Although, I have a personal project where I want to use couchdb or mongodb, and I want to try and accomplish a similar goal. outside of the mvc/framework, I want to have an API layer that the main applications talk to. they dont actually talk directly to the database. They specify the design document (couchdb) or something similar for mongo, to get the results. And that API layer will validate the incoming data and make sure that data itself is saved and updated properly. Such as saving a new user, in the framework I only need to send a json obejct with the keys/values that need to be saved and the api layer saves the data in the proper places where needed.
This API would probably have a UI, but only for administrative purposes and to make my life easier. In general it will always reply with json strings, or pre-rendered/cached html in some cases. Since each api layer would be specific to the application anyways.
I was wondering if anyone has done anything like this, or had any tips on nethods I could accomplish this. I am currently looking to write my application in python, and the front end will likely be something like Angularjs. Although I am also looking at node.js for a back end.
We do this exact thing at my current job. We have MongoDB on the back end, a RESTful API on top of it and then PHP/Zend on the front end.
Most of our data is read only, so we import that data into MongoDB and then the RESTful API (in Java) just serves it up.
Some things to think about with this approach:
Write generic sorting/paging logic in your API. You'll need this for lists of data. The user can pass in things like http://yourapi.com/entity/1?pageSize=10&page=3.
Make sure to create appropriate indexes in Mongo to match what people will query on. Imagine you are storing users. Make an index in Mongo on the user id field, or just use the _id field that is already indexed in all your calls.
Make sure to include all relevant data in a given document. Mongo doesn't do joins like you're used to in Oracle. Just keep in mind modeling data is very different with a document database.
You seem to want to write a layer (the middle tier API) that is database agnostic. That's a good goal. Just be careful not to let Mongo specific terminology creep into your exposed API. Mongo has specific operators/concepts that you'll need to mask with more generic terms. For example, they have a $set operator. Don't expose that directly.
Finally after having a decent amount of experience with CouchDB and Mongo, I'd definitely go with Mongo.

Good way to make a SQL based activity feed faster

Need a way to improve performance on my website's SQL based Activity Feed. We are using Django on Heroku.
Right now we are using actstream, which is a Django App that implements an activity feed using Generic Foreign Keys in the Django ORM. Basically, every action has generic foreign keys to its actor and to any objects that it might be acting on, like this:
Action:
(Clay - actor) wrote a (comment - action object) on (Andrew's review of Starbucks - target)
As we've scaled, its become way too slow, which is understandable because it relies on big, expensive SQL joins.
I see at least two options:
Put a Redis layer on top of the SQL database and get activity feeds from there.
Try to circumvent the Django ORM and do all the queries in raw SQL, which I understand can improve performance.
Any one have thoughts on either of these two, or other ideas, I'd love to hear them.
You might want to look at Materialized Views. Since you're on Heroku, and that uses PostgreSQL generally, you could look at Materialized View Support for PostgreSQL. It is not as mature as for other database servers, but as far as I understand, it can be made to work. To work with the Django ORM, you would probably have to create a new "entity" (not familiar with Django here so modify as needed) for the feed, and then do queries over it as if it was a table. Manual management of the view is a consideration, so look into it carefully before you commit to it.
Hope this helps!
You said redis? Everything is better with redis.
Caching is one of the best ideas in software development, no mather if you use Materialized Views you should also consider trying to cache those, believe me your users will notice the difference.
Went with an approach that sort of combined the two suggestions.
We created a master list of every action in the database, which included all the information we needed about the actions, and stuck it in Redis. Given an action ID, we can now do a Redis look up on it and get a dictionary object that is ready to be returned to the front end.
We also created action id lists that correspond to all the different types of activity streams that are available to a user. So given a user id, we have his friends' activity, his own activity, favorite places activity, etc, available for look up. (These I guess correspond somewhat to materialized views, although they are in Redis, not in PSQL.)
So we get a user's feed as a list of action ids. Then we get the details of those actions by look ups on the ids in the master action list. Then we return the feed to the front end.
Thanks for the suggestions, guys.

Django Admin using RESTful API v.s. Database

This is a bit of a strange question, I know, but bear with me. We've developed a RESTful platform using Python for one of our iPhone apps. The webapp version has been built using Django, which makes use of this API as well. We were thinking it would be a great idea to use Django's built-in control panel capabilities to help manage the data.
This itself isn't the issue. The problem is that everyone has decided it would be best of the admin center was essentially a client that sits on top of the RESTful platform.
So, my question is, is there a way to manipulate the model layer of Django to access our API directly, rather than communicated directly with the database? The model layer would act as the client passing requests and responses to and from the admin center.
I'm sure this is possible, but I'm not so sure as to where I would start. Any input?
I remember I once thought about doing such thing. At the time, I created a custom Manager using a custom QuerySet. And I overrode some methods such as _filter_or_exclude(), count(), exists(), select_related(), ... and added some properties. It took less than a week to become a total mess that had probably no chance to work one day. So I immediately stopped everything and found a more suitable solution.
If I had to do it once again, I would take a long time to consider alternatives. And if it really sounds like the best thing to do, I'd probably create a custom database backend. This backend would, rather than converting Django ORM queries to SQL queries, convert them to HTTP requests.
To do so, I think the best starting point would be to get familiar with django source code concerning database backends.
I also think there are some important things to consider before starting such development:
Is the API able to handle any Django ORM request? Put another way: Will any Django ORM query be translatable to an API request?
If not, may "untranslatable" queries be safely ignored? For instance, an ORDER BY clause might be safe to ignore. While a GROUP BY clause is very unlikely to be safely dismissed.
If some queries can't be neither translated nor ignored, may them be reasonably emulated. For instance, if your API does not support a COUNT() operation, you could emulate it by getting the whole data and count it in python with len(), but is this reasonable?
If they are still some queries that you won't be able to handle (which is more than likely): Are all "common" queries (in this case, all queries potentially used by Django Admin) covered and will it be possible to upgrade the API if an uncovered case is discovered lately or is introduced in a future version of Django?
According to the use case, there are probably tons of other considerations to take, such as:
the integrity of the data
support of transactions
the timing of a query which will be probably much higher than just querying a local (or even remote) database.

Categories