CouchDB/MongoDB application/logic layer, like Oracle DB - Python

At my work, we use Oracle for our database, which works great. I am not the main DB admin, but I do work with it. One thing I like is that the DB has a built-in logic layer using PL/SQL which can handle the logic for saving and retrieving data. I really like this because it lets our MVC application (PHP/Zend Framework) stay lighter, and makes it easier to tie another platform into the data, such as desktop or mobile.
However, I have a personal project where I want to use CouchDB or MongoDB, and I want to try to accomplish a similar goal: outside of the MVC framework, I want an API layer that the main applications talk to. They don't talk directly to the database; they specify the design document (CouchDB), or something similar for Mongo, to get the results. That API layer validates the incoming data and makes sure the data is saved and updated properly. For example, to save a new user, the framework only needs to send a JSON object with the keys/values that need to be saved, and the API layer saves the data in the proper places.
This API would probably have a UI, but only for administrative purposes and to make my life easier. In general it would always reply with JSON strings, or pre-rendered/cached HTML in some cases, since each API layer would be specific to its application anyway.
I was wondering if anyone has done anything like this, or has any tips on methods I could use to accomplish it. I am currently looking to write my application in Python, and the front end will likely be something like AngularJS, although I am also looking at Node.js for the back end.

We do this exact thing at my current job. We have MongoDB on the back end, a RESTful API on top of it, and then PHP/Zend on the front end.
Most of our data is read-only, so we import that data into MongoDB and then the RESTful API (in Java) just serves it up.
Some things to think about with this approach:
Write generic sorting/paging logic in your API. You'll need this for lists of data. The caller can pass in things like http://yourapi.com/entity/1?pageSize=10&page=3 (see the sketch after this list).
Make sure to create appropriate indexes in Mongo to match what people will query on. Imagine you are storing users: make an index in Mongo on the user id field, or just use the already-indexed _id field in all your calls.
Make sure to include all relevant data in a given document. Mongo doesn't do joins like you're used to in Oracle. Just keep in mind modeling data is very different with a document database.
You seem to want to write a layer (the middle-tier API) that is database-agnostic. That's a good goal. Just be careful not to let Mongo-specific terminology creep into your exposed API. Mongo has specific operators/concepts that you'll need to mask with more generic terms. For example, it has a $set operator. Don't expose that directly.
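As a rough illustration of the paging and indexing points above, here is a minimal sketch using Flask and PyMongo; the users collection, the user_id field, and the exact URL shape are assumptions for the example, not a prescription.

    # Sketch of generic paging plus an index, using Flask and PyMongo.
    # The "users" collection and the user_id field are illustrative.
    from flask import Flask, jsonify, request
    from pymongo import ASCENDING, MongoClient

    app = Flask(__name__)
    db = MongoClient("mongodb://localhost:27017")["myapp"]

    # Index the field callers will actually query on.
    db.users.create_index([("user_id", ASCENDING)], unique=True)

    @app.route("/users")
    def list_users():
        # Generic paging: /users?pageSize=10&page=3
        page = max(request.args.get("page", default=1, type=int), 1)
        page_size = min(request.args.get("pageSize", default=10, type=int), 100)
        cursor = (db.users.find({}, {"_id": 0})
                  .sort("user_id", ASCENDING)
                  .skip((page - 1) * page_size)
                  .limit(page_size))
        return jsonify(list(cursor))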
Finally, after a decent amount of experience with both CouchDB and Mongo, I'd definitely go with Mongo.

Related

How to generate complex ORM code and REST API with swagger-codegen?

swagger-editor has a codegen tool built in, so I can generate server-side code with it. I tried the pet store example, and it works. My next step is to think about data persistence. I googled around; I mostly use Python and Java, so I'm focusing on these two languages.
First I would like to try Python.
The swagger-codegen tool depends on flask/connexion/flask-swagg and so on, so I think connexion is a good starting point for me. However, when I read the generated code, I found that swagger-codegen generates models that inherit from its own Swagger data Model, and that I can use connexion with SQLAlchemy to implement the DB ORM; see, for example, the Connexion-Example. I also found SAFRS a good place to go: with it I can easily expose all the DB models as a REST API, so I don't need to write duplicate ORM code myself. So I'm looking for suggestions that can help me out, because I am stuck.
I want to create an API specification file (YAML or JSON in swagger-editor) and also the DB model, then generate the server-side code, including the DB model with SQLAlchemy, with all the DB CRUD operations exposed as a REST API. My first thought is to modify the swagger templates in swagger-codegen to use SAFRS or connexion, but I would like to discuss it here and get some advice on how you manage this kind of work.
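For reference, this is roughly the kind of generated handler I have in mind - just a sketch, with a made-up Pet model and get_pet operation, where connexion maps the operationId in the YAML to a plain Python function backed by SQLAlchemy:

    # Sketch only: a connexion operation handler backed by a SQLAlchemy model.
    # The Pet model and the get_pet operationId are made-up examples.
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()
    engine = create_engine("sqlite:///pets.db")
    Session = sessionmaker(bind=engine)

    class Pet(Base):
        __tablename__ = "pets"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)

    Base.metadata.create_all(engine)

    # The YAML's "operationId: api.get_pet" resolves to this function.
    def get_pet(pet_id):
        with Session() as session:
            pet = session.get(Pet, pet_id)
            if pet is None:
                return {"detail": "Not found"}, 404
            return {"id": pet.id, "name": pet.name}, 200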
Thanks.
Andes

Flask website backend structure guidance assistance?

I have a basic personal project website that I am looking to use to learn some web dev fundamentals and database (SQL) fundamentals as well (if SQL is even the right technology to use?).
I have the basic skeleton up and running but as I am new to this, I want to make sure I am doing it in the most efficient and "correct" way possible.
Currently the site has a main index (landing) page, and from there the user can select one of a few subpages. For the sake of understanding, each of these subpages represents a different surf break and displays relevant info about that particular break, such as wave height, wind, and tide.
As I have already been able to successfully scrape this data, my main questions revolve around how I would go about inserting this data into a database for future use (historical graphs, trends). How would I ensure data is added to this database on a continuous basis (once a day)? How would I reuse data scraped earlier, say at noon, when the page is viewed at 12:05 PM, rather than scraping it again?
Any other tips, guidance, or resources you can point me to are much appreciated.
This kind of data is called time series. There are specialized database engines for time series, but with a non-extreme volume of observations - (timestamp, wave height, wind, tide, which break it is) tuples - a SQL database will be perfectly fine.
Try to model your data as a table in Postgres or MySQL. Start by making a table and manually inserting some fake data in a GUI client for your database. When it looks right, you have your schema. The corresponding CREATE TABLE statement is your DDL. You should be able to write SELECT queries against your table that yield the data you want to show on your webapp. If these queries are awkward, it's a sign that your schema needs revision. Save your DDL. It's (sort of) part of your source code. I imagine two tables: a listing of surf breaks, and a listing of observations. Each row in the listing of observations would reference the listing of surf breaks. If you're on a Mac, Sequel Pro is a decent tool for playing around with a MySQL database, and playing around is probably the best way to learn to use one.
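To make the two-table idea concrete, here is one possible DDL sketch; the column names are guesses, and SQLite is used only so the snippet runs without a server - the same schema (with minor type tweaks) works in Postgres or MySQL:

    # Sketch of the two-table schema: surf breaks, and observations referencing them.
    import sqlite3

    DDL = """
    CREATE TABLE IF NOT EXISTS surf_breaks (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE IF NOT EXISTS observations (
        id            INTEGER PRIMARY KEY,
        surf_break_id INTEGER NOT NULL REFERENCES surf_breaks(id),
        observed_at   TIMESTAMP NOT NULL,
        wave_height_m REAL,
        wind_speed_kt REAL,
        tide_m        REAL,
        UNIQUE (surf_break_id, observed_at)
    );
    """

    with sqlite3.connect("surf.db") as conn:
        conn.executescript(DDL)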
Next, try to insert data into the table from a Python script. Starting with fake data is fine, but mold your Python script to read from your upstream source (the result of scraping) and insert into the table. What does your scraping code output? Is it a function you can call? A CSV you can read? That'll dictate how this script works.
It'll help if this import script is idempotent: you can run it multiple times and it won't make a mess by inserting duplicate rows. It'll also help if it is incremental: once your dataset grows large, it will be very expensive to recompute the whole thing, so try to import one specific interval at a time. A command-line tool is fine. You can specify the interval as a command-line argument, or figure it out from the current time.
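A sketch of such an import script, assuming the schema above; scrape_observations() stands in for whatever your real scraping code exposes, and the UNIQUE constraint plus INSERT OR IGNORE is what makes re-runs harmless:

    # Idempotent, incremental import sketch (scrape_observations() is a placeholder).
    import sqlite3
    import sys
    from datetime import datetime, timedelta

    def scrape_observations(since):
        """Your real scraping code goes here; it should return
        (surf_break_id, observed_at, wave_height_m, wind_speed_kt, tide_m) tuples."""
        raise NotImplementedError

    def import_interval(hours_back):
        since = datetime.utcnow() - timedelta(hours=hours_back)
        rows = scrape_observations(since)
        with sqlite3.connect("surf.db") as conn:
            # Re-running for the same interval just ignores the duplicate rows.
            conn.executemany(
                "INSERT OR IGNORE INTO observations "
                "(surf_break_id, observed_at, wave_height_m, wind_speed_kt, tide_m) "
                "VALUES (?, ?, ?, ?, ?)",
                rows,
            )

    if __name__ == "__main__":
        # e.g. run `python import_observations.py 24` once a day from cron
        import_interval(int(sys.argv[1]) if len(sys.argv) > 1 else 24)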
The general problem here, loading data from one system into another on a regular schedule, is called ETL. You have a very simple case of it, and can use very simple tools, but if you want to read about it, that's what it's called. If instead you could get a continuous stream of observations - say, straight from the sensors - you would have a streaming ingestion problem.
You can use cron, the standard Linux job scheduler, to run this script on a schedule. You'll want to know whether it ran successfully - this opens a whole other can of worms about monitoring and alerting. There are various open-source systems that will let you emit metrics from your programs, basically a "hey, this happened" tick, see those metrics plotted on graphs, and ask to be emailed/texted/paged if something is happening too frequently or too infrequently. (These systems are, incidentally, one of the main applications of time-series databases.) Don't get bogged down with this upfront, but keep it in mind. Statsd, Grafana, and Prometheus are some names to get you started Googling in this direction. You could also simply have your script send an email on success or failure, but people tend to start ignoring such emails.
You'll have written some functions to interact with your database engine. Extract these into a Python module. This forms the basis of your Data Access Layer. Reuse it in your Flask application. This will be easiest if you keep all this stuff in the same Git repository. You can use your chosen database engine's Python client directly, or you can use an abstraction layer like SQLAlchemy. This decision is controversial and people will have opinions, but just pick one. Whatever database API you pick, please learn what a SQL injection attack is and how to use user-supplied data in queries without opening yourself up to SQL injection. Your database API's documentation should cover the latter.
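For instance, a tiny Data Access Layer over the schema sketched earlier might look like this; the parameterized ? placeholder, rather than string formatting, is what keeps user-supplied values out of the SQL text:

    # dal.py - sketch of a small Data Access Layer over the schema above.
    import sqlite3

    DB_PATH = "surf.db"

    def _connect():
        conn = sqlite3.connect(DB_PATH)
        conn.row_factory = sqlite3.Row  # rows behave like dicts
        return conn

    def list_surf_breaks():
        with _connect() as conn:
            return conn.execute(
                "SELECT id, name FROM surf_breaks ORDER BY name"
            ).fetchall()

    def observations_for_break(surf_break_id):
        with _connect() as conn:
            # Never interpolate surf_break_id into the string; let the driver bind it.
            return conn.execute(
                "SELECT observed_at, wave_height_m, wind_speed_kt, tide_m "
                "FROM observations WHERE surf_break_id = ? ORDER BY observed_at",
                (surf_break_id,),
            ).fetchall()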
The / page of your Flask application will be based on a SQL query like SELECT * FROM surf_breaks. Render a link to the break-specific page for each one.
You'll have another page like /breaks/n where n identifies a surf break (an integer that increments as you insert surf break rows is customary). This page will be based on a query like SELECT * FROM observations WHERE surf_break_id = n. In each case, you'll call functions in your Data Access Layer for a list of rows, and then in a template, iterate through those rows and render some HTML. There are various Javascript and Python graphing libraries you can feed this list of rows into and get graphs out of (client side or server side). If you're interested in something like a week-over-week change, you should be able to express that in one SQL query and get that dataset directly from the database engine.
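Put together, those two pages might look roughly like this, reusing the dal module sketched above (templates omitted):

    # app.py - sketch of the two Flask pages, reusing the dal module from above.
    from flask import Flask, abort, render_template

    import dal

    app = Flask(__name__)

    @app.route("/")
    def index():
        # SELECT * FROM surf_breaks, rendered as links to /breaks/<id>
        return render_template("index.html", breaks=dal.list_surf_breaks())

    @app.route("/breaks/<int:break_id>")
    def break_detail(break_id):
        rows = dal.observations_for_break(break_id)
        if not rows:
            abort(404)
        # break.html loops over these rows and renders the tables/graphs
        return render_template("break.html", observations=rows)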
For performance, try not to get into a situation where more than one SQL query happens during a page load. By default, you'll be doing some unnecessary work by going back to the database and recomputing the page every time someone requests it. If this becomes a problem, you can add a reverse proxy cache in front of your Flask app. In your case this is easy, since nothing users do in the app causes its content to change. Simply invalidate the cache when you import new data.

Use Django to expose Python functions on the web

I have not worked with Django seriously and my only experience is the tutorials on their site.
I am trying to write my own application now, and what I want is to have some sort of API. My idea is that I will later be able to use it with a client written in any other language.
I have the simplest of all apps, a model that has a name and surname field.
So the idea is that I can now write an app, let's say in C++, that will send two strings to my Django app so they can be saved in the database as name and surname respectively.
What I know how to do so far is create a form so a user can enter that information, pass the information in the URL, and of course add the records myself from the admin menu.
What I want, though, is some better way, maybe creating a packet that contains that data; my client would then send this data to my Django app, which would extract the info and save it as needed. But I do not know how to do this.
If my suggested method is a good idea, then I would like an example of how this is done. If not, I would like suggestions for other approaches I could try.
Typically, as stated by @DanielRoseman, you certainly want to:
Create a REST API to receive data from the other application
Get the data, typically as JSON or XML, containing all the required fields (name and surname)
In the REST controller, convert this data to the model and save the model to the database
Send a response.
More information here: http://www.django-rest-framework.org/
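A minimal sketch of that flow with Django REST Framework; the Person model name, field sizes, and URL are illustrative choices, not the only way to do it:

    # models.py
    from django.db import models

    class Person(models.Model):
        name = models.CharField(max_length=100)
        surname = models.CharField(max_length=100)

    # serializers.py
    from rest_framework import serializers

    class PersonSerializer(serializers.ModelSerializer):
        class Meta:
            model = Person
            fields = ["id", "name", "surname"]

    # views.py
    from rest_framework import generics

    class PersonListCreateView(generics.ListCreateAPIView):
        queryset = Person.objects.all()
        serializer_class = PersonSerializer

    # urls.py
    from django.urls import path

    urlpatterns = [path("people/", PersonListCreateView.as_view())]

Any client (C++ or otherwise) can then POST a JSON body like {"name": "Ada", "surname": "Lovelace"} to /people/, and the serializer validates it and saves the model.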

Good way to make a SQL based activity feed faster

I need a way to improve the performance of my website's SQL-based activity feed. We are using Django on Heroku.
Right now we are using actstream, a Django app that implements an activity feed using generic foreign keys in the Django ORM. Basically, every action has generic foreign keys to its actor and to any objects that it might be acting on, like this:
Action:
(Clay - actor) wrote a (comment - action object) on (Andrew's review of Starbucks - target)
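In code, recording one of those actions looks roughly like this (clay, comment and review stand in for real model instances):

    from actstream import action

    # actor, action_object and target each become a generic foreign key on the Action row
    action.send(clay, verb="wrote", action_object=comment, target=review)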
As we've scaled, it's become way too slow, which is understandable because it relies on big, expensive SQL joins.
I see at least two options:
Put a Redis layer on top of the SQL database and get activity feeds from there.
Try to circumvent the Django ORM and do all the queries in raw SQL, which I understand can improve performance.
If anyone has thoughts on either of these two, or other ideas, I'd love to hear them.
You might want to look at Materialized Views. Since you're on Heroku, and that uses PostgreSQL generally, you could look at Materialized View Support for PostgreSQL. It is not as mature as for other database servers, but as far as I understand, it can be made to work. To work with the Django ORM, you would probably have to create a new "entity" (not familiar with Django here so modify as needed) for the feed, and then do queries over it as if it was a table. Manual management of the view is a consideration, so look into it carefully before you commit to it.
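Sketching the idea (view name, columns and refresh policy are illustrative - adapt them to the real actstream tables): you define the materialized view in SQL, refresh it periodically or after writes, and point an unmanaged Django model at it so the ORM can query it like a table.

    # models.py - sketch of an unmanaged model over a Postgres materialized view.
    # The view itself would be created in a migration (e.g. with RunSQL), roughly:
    #   CREATE MATERIALIZED VIEW feed_entries AS
    #   SELECT id, actor_object_id, verb, target_object_id, timestamp
    #   FROM actstream_action;
    # and refreshed after writes with: REFRESH MATERIALIZED VIEW feed_entries;
    from django.db import models

    class FeedEntry(models.Model):
        actor_object_id = models.CharField(max_length=255)
        verb = models.CharField(max_length=255)
        target_object_id = models.CharField(max_length=255, null=True)
        timestamp = models.DateTimeField()

        class Meta:
            managed = False          # Django never creates or migrates this "table"
            db_table = "feed_entries"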
Hope this helps!
You said Redis? Everything is better with Redis.
Caching is one of the best ideas in software development; even if you use materialized views, you should also consider caching them. Believe me, your users will notice the difference.
Went with an approach that sort of combined the two suggestions.
We created a master list of every action in the database, which included all the information we needed about the actions, and stuck it in Redis. Given an action ID, we can now do a Redis look up on it and get a dictionary object that is ready to be returned to the front end.
We also created action id lists that correspond to all the different types of activity streams that are available to a user. So given a user id, we have his friends' activity, his own activity, favorite-places activity, etc., available for lookup. (These I guess correspond somewhat to materialized views, although they are in Redis, not in PSQL.)
So we get a user's feed as a list of action ids, then get the details of those actions by looking up the ids in the master action list, and return the feed to the front end.
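For anyone curious, the shape of it in Redis is roughly this (key names and fields are illustrative, not our exact code):

    # Rough sketch of the layout: a master hash of actions plus per-stream id lists.
    import json
    import redis

    r = redis.Redis()

    def record_action(action_id, action_dict, stream_keys):
        # Master list: one ready-to-return dict per action, keyed by id.
        r.hset("actions", action_id, json.dumps(action_dict))
        # Push the id onto every stream it belongs to (friends, own activity, ...).
        for key in stream_keys:
            r.lpush(key, action_id)

    def get_feed(stream_key, limit=50):
        action_ids = r.lrange(stream_key, 0, limit - 1)
        blobs = r.hmget("actions", action_ids) if action_ids else []
        return [json.loads(b) for b in blobs if b]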
Thanks for the suggestions, guys.

Django Admin using RESTful API vs. Database

This is a bit of a strange question, I know, but bear with me. We've developed a RESTful platform using Python for one of our iPhone apps. The webapp version has been built using Django, which makes use of this API as well. We were thinking it would be a great idea to use Django's built-in control panel capabilities to help manage the data.
This itself isn't the issue. The problem is that everyone has decided it would be best if the admin center were essentially a client that sits on top of the RESTful platform.
So, my question is: is there a way to manipulate the model layer of Django to access our API directly, rather than communicating directly with the database? The model layer would act as the client, passing requests and responses to and from the admin center.
I'm sure this is possible, but I'm not so sure as to where I would start. Any input?
I remember I once thought about doing such a thing. At the time, I created a custom Manager using a custom QuerySet, and I overrode some methods such as _filter_or_exclude(), count(), exists(), select_related(), ... and added some properties. It took less than a week to become a total mess that probably had no chance of ever working. So I immediately stopped and found a more suitable solution.
If I had to do it once again, I would take a long time to consider alternatives. And if it really sounds like the best thing to do, I'd probably create a custom database backend. This backend would, rather than converting Django ORM queries to SQL queries, convert them to HTTP requests.
To do so, I think the best starting point would be to get familiar with the Django source code concerning database backends.
I also think there are some important things to consider before starting such development:
Is the API able to handle any Django ORM request? Put another way: Will any Django ORM query be translatable to an API request?
If not, may "untranslatable" queries be safely ignored? For instance, an ORDER BY clause might be safe to ignore. While a GROUP BY clause is very unlikely to be safely dismissed.
If some queries can be neither translated nor ignored, can they be reasonably emulated? For instance, if your API does not support a COUNT() operation, you could emulate it by fetching all the data and counting it in Python with len(), but is that reasonable?
If there are still some queries that you won't be able to handle (which is more than likely): are all "common" queries (in this case, all queries potentially used by Django Admin) covered, and will it be possible to upgrade the API if an uncovered case is discovered later or is introduced in a future version of Django?
According to the use case, there are probably tons of other considerations to take, such as:
the integrity of the data
support of transactions
the latency of a query, which will probably be much higher than when querying a local (or even remote) database.
