I want to build a single-page web app (SPA) that sends the schema (not necessarily the raw DB schema, but a representation of the data, potentially in JSON) to the view, so that in the view we can:
Generate grids based on that schema instead of hand-wiring columns
Handle additional information about these fields, such as whether they are editable, and the like.
This web app will allow users to see tabular information in a grid and potentially do CRUD operations.
I see a lot of benefits in using the schema: we can implement validators based on it, form generation should be very simple, and best of all, the impact of adding a new field to the web app should be easy to handle.
My question is: is this a good strategy? Could you help me identify some drawbacks of this approach? (The stack I am using is not very important, but for the sake of clarity: Bottle (Python) on the backend and React on the frontend.)
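To make the idea concrete, here is a rough sketch of what the Bottle side might return (the entity and field names are made up for illustration):

import json
from bottle import Bottle, response

app = Bottle()

# Hypothetical schema for a 'customer' grid; the view would generate
# columns, editability and validators from this description.
CUSTOMER_SCHEMA = {
    "entity": "customer",
    "fields": [
        {"name": "id",    "type": "integer", "editable": False},
        {"name": "email", "type": "string",  "editable": True,
         "validators": {"required": True, "format": "email"}},
    ],
}

@app.route('/schema/customer')
def customer_schema():
    response.content_type = 'application/json'
    return json.dumps(CUSTOMER_SCHEMA)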
One drawback I see is the time spent building and maintaining the machinery you mention: schema generation, sending it over, and interpreting it in the view. But that is for you to decide: whether this overhead is compensated by the advantages you mentioned. If it is, then go for it.
One other thing I would mention: you want to do validation based on this schema. How many of the validations in your application can be done this way? Are there many cases in which validation will not fit this pattern? The same question applies to grid generation, form generation, etc. If there are a lot, then maybe it is not worth it. I have more than once been excited by an automatic solution like this, only to find later that the exceptions to the pattern were many and overall I did not gain much :).
Overall, you decide. One last thing: try to think long term. 90% of the lifetime of an application is spent in maintenance. Try to imagine what happens after you release the application, when bugs and small feature requests start to come in.
So I'm moving from PostgreSQL to Redis as the main database in my project to get the best performance. Now my code contains a lot of HSET, HMSET, HGET, SADD, SREM, SINTER, SORT and other Redis calls spread throughout the code. We don't have any ORM for Redis, nor any data-model description. While I have the structure of the data in my head, it's not a big problem for me to maintain this code. But once another dev needs to work with it, or even me after just a year, it will become a huge pain to piece it all together and be sure nothing is missed.
Thus my question. What are the best practices for organizing the data-storage code without ORMs and proper data models? I see a lot of articles on how to call Redis here and there to store data in it, but no recommendations whatsoever on how to do it without making your code a mess.
Ended up using rom: https://github.com/josiahcarlson/rom
It allowed me to describe models clearly and work with them in an active-record style. It takes care of indexes and the like under the hood, and is quite simple and pleasant to use.
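For comparison, even without a library like rom, one way to keep this from becoming a mess is to confine every Redis call to a small repository class per record type, so the key layout is documented in exactly one place. A rough sketch (key names are made up):

import redis

class UserStore(object):
    """All Redis access for 'user' records lives here."""

    def __init__(self, client=None):
        self.r = client or redis.Redis()

    def save(self, user_id, fields):
        self.r.hset('user:%s' % user_id, mapping=fields)  # one hash per user
        self.r.sadd('users', user_id)                     # index of all ids

    def get(self, user_id):
        return self.r.hgetall('user:%s' % user_id)

    def delete(self, user_id):
        self.r.delete('user:%s' % user_id)
        self.r.srem('users', user_id)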
I am building my first Django web application and I need a bit of advice on code layout.
I need to integrate it with several other applications that are exposed through RESTful APIs, as well as with Django's internal data. I need to develop a component that will pull data from these various sources (including Django's DB), format it consistently for output, and return it for rendering by the template.
I am thinking of the best way to write this component. There are a couple of ways to proceed and I wanted to solicit some feedback from more experienced web developers on things I may have missed.
Way #1
Develop completely standalone objects for interacting with the other applications via their APIs, etc. This module would have nothing to do with Django and would be tested independently. Then, in Django, import this module in the views that need it and run its methods to get the required data.
If I need to access any of this functionality via a GET request (e.g. through JavaScript), I can have a dedicated view that imports the module and returns JSON.
Way #2
Develop this entirely as Django view(s), exposed as a series of GET/POST endpoints that all call each other to get the work done. This would be directly tied into the application.
Way #3
Start with Way #1, but instead of creating a view, package it as a Django app. Develop unit tests for the app as well as for the individual objects.
I think Way #1 or #3 would be well encapsulated and controlled.
Way #2 would be more complicated, but would facilitate more component re-use.
What is better from a performance standpoint? If I go with Way #1 or #3, would an instance of the object be instantiated for each request?
If so, this approach may be a bit too heavy. If I proceed with it, could these objects be singletons?
Anyway, I hope this makes sense.
Thanks in advance.
Definitely go for Way #1. Keeping an independent layer for your service(s) API will help a lot later, when you have to enhance that layer for reliability or extend the API.
Stay away from singletons; they're just global variables with a new name. Use an appropriate life cycle for your interface objects. The obvious "instantiate for each request" is not the worst idea, and it's easier to optimize that if needed (by caching and/or memoization) than it is to unroll global variables everywhere.
Keep in mind that web applications are supposed to run as several processes in a "shared nothing" design; the only shared resources must be external to the app: database, queue managers, cache storage.
Finally, try to avoid using this API directly from the view functions/CBVs. Either use it from your models, or write a layer conceptually similar to the way models are used from the views. There is no need for an ORM-like API, but keep any 'business process' away from the views, which should be concerned only with presentation and high-level operations. Think "thick models, thin views", with your APIs as a new kind of model.
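To illustrate that layering, a minimal sketch (the client class, URL and endpoint are hypothetical): the service module knows nothing about Django, and the view only adapts it to HTTP.

# services.py -- standalone, no Django imports, testable on its own
import requests

class InventoryClient(object):
    """Hypothetical client for one of the external REST APIs."""

    def __init__(self, base_url):
        self.base_url = base_url

    def list_items(self):
        resp = requests.get(self.base_url + '/items', timeout=5)
        resp.raise_for_status()
        return resp.json()

# views.py -- thin view, presentation only
from django.http import JsonResponse

def items(request):
    client = InventoryClient('https://inventory.example.com')  # per-request instance
    return JsonResponse({'items': client.list_items()})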
I'm trying to pull similar data in from several third party APIs, all of which have slightly varying schemas, and convert them all into a unified schema to store in a DB and expose through a unified API. This is actually a re-write of a system that already does this, minus storing in the DB, but which is hard to test and not very elegant. I figured I'd turn to the community for some wisdom.
Here are some thoughts/what I'd like to achieve.
An easy way to specify schema mappings from the external APIs' schemas to the internal schema. I realize that some nuances in the data might be lost by converting to a unified schema, but that's life. Judging from the academic papers I've found on the matter, this schema mapping might not be easy to do, and a general solution is perhaps overkill.
An alternative solution would be to allow third parties to develop the interfaces to the external APIs. The code quality of these third parties may or may not be known, but could be established via thorough tests.
Therefore the system should be easy to test; I'm thinking of mocking the external API calls to get reproducible data and to ensure that parsing and conversion are done correctly.
One of the external API interfaces crashing should not bring down the rest of them.
Some sort of schema validation, or a way to detect whether the external API schemas have changed without warning.
This will end up being integrated into a Django project, so it could be written as a Django app, which would likely make unit and integration testing easier. On the other hand, I would like to keep it as decoupled as possible from Django. Although the API interfaces would have to know what format to convert to, could this be specified at runtime?
Am I missing anything in the wishlist? Unrealistic? Headed down the wrong path? Would love to get some feedback.
I'm not sure if there are libraries/open-source projects which already do some of this. The fewer wheels I have to reinvent, the better. Would any part of this be valuable as an open-source project?
In the previous version I spawned a bunch of threads that would handle individual requests. Although I've never used it, I've been told I should look at gevent as a way to handle this.
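To illustrate the kind of mapping specification I have in mind (the provider and field names are made up):

# One declarative mapping per external API: unified field -> how to
# extract it from that provider's raw payload.
PROVIDER_A_MAPPING = {
    'title':  lambda raw: raw['name'],
    'price':  lambda raw: float(raw['cost']['amount']),
    'seller': lambda raw: raw.get('vendor', ''),
}

def to_unified(raw, mapping):
    # A KeyError here doubles as cheap change detection: if a provider
    # silently changes its schema, conversion fails loudly instead of
    # storing garbage.
    return dict((field, extract(raw)) for field, extract in mapping.items())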
For your second bullet point you should check out Temboo. Temboo normalizes access to over 100 APIs, meaning that you can talk to them all using a common syntax in the language of your choice. In this case you would use the Temboo Python SDK - available here.
(Full disclosure: I work at Temboo)
I am thinking about creating an open source data management web application for various types of data.
A privileged user must be able to
add new entity types (for example a 'user' or a 'family')
add new properties to entity types (for example 'gender' to 'user')
remove/modify entities and properties
These will be common tasks for the privileged user. He will do this through the web interface of the application. In the end, all data must be searchable and sortable by all types of users of the application. Two questions trouble me:
a) How should the data be stored in the database? Should I dynamically add/remove database tables and/or columns during runtime?
I am no database expert. I am stuck with the mental image that, in terms of relational databases, the application would have to dynamically add/remove tables (entities) and/or columns (properties) at runtime. And I don't like this idea. Likewise, I am wondering whether such dynamic data should be handled in a NoSQL database instead.
Anyway, I believe that this kind of problem has an intelligent canonical solution which I just have not found or thought of so far. What is the best approach for this kind of dynamic data management?
b) How to implement this in Python using an ORM or NoSQL?
If you recommend using a relational database model, then I would like to use SQLAlchemy. However, I don't see how to dynamically create tables/columns with an ORM at runtime. This is one of the reasons why I hope there is a much better approach than creating tables and columns at runtime. Is the recommended database model efficiently implementable with SQLAlchemy?
If you recommend using a NoSQL database, which one? I like using Redis -- can you imagine an efficient implementation based on Redis?
Thanks for your suggestions!
Edit in response to some comments:
The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.
Basically, users will search the data through a simple form on a website. They query for e.g. all instances of an entity E with property P having a value V higher than T. The result can be sorted by the value of any property.
The datasets won't become too large, so I think even the most naive approach would still lead to a working system. However, I am an enthusiast: I'd like to apply modern, appropriate technology, and I'd like to be aware of theoretical bottlenecks. I want to use this project to gather experience in designing a "Pythonic", state-of-the-art, scalable, and reliable web application.
I see that the first comments tend to recommend a NoSQL approach. Although I really like Redis, it looks like it would be foolish not to take advantage of the document/collection model of Mongo/Couch. I've been looking into MongoDB and mongoengine for Python. Am I heading in the right direction?
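For instance, I imagine a mongoengine model could look roughly like this (an untested sketch; DynamicDocument accepts fields that are not declared up front, which seems to match my dynamic-properties requirement):

import mongoengine

mongoengine.connect('datamanager')  # hypothetical database name

class House(mongoengine.DynamicDocument):
    # only 'name' is declared; 'floors', 'age', etc. can be set freely
    name = mongoengine.StringField()

House(name='house-1', floors=6, age=20).save()

# all houses with at least 5 floors, sorted by a dynamic property
House.objects(floors__gte=5).order_by('age')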
Edit 2 in response to some answers/comments:
From most of your answers, I conclude that the dynamic creation/deletion of tables and columns in the relational picture is not the way to go. This already is valuable information. Also, one opinion is that the whole idea of the dynamic modification of entities and properties could be bad design.
Since exactly this dynamic nature is meant to be the main purpose/feature of the application, I am not giving up on it. From a theoretical point of view, I accept that performing operations on a dynamic data model must necessarily be slower than performing operations on a static data model. This is totally fine.
Expressed in an abstract way, the application needs to manage
the data layout, i.e. a "dynamic list" of valid entity types and a "dynamic list" of properties for each valid entity type
the data itself
I am looking for an intelligent and efficient way to implement this. From your answers, it looks like NoSQL is the way to go here, which is another important conclusion.
The SQL-or-NoSQL choice is not your problem. You need to read a little more about database design in general. As you said, you're not a database expert (and you don't need to be), but you absolutely must study the RDBMS paradigm a little more. It's a common mistake for amateur enthusiasts to choose a NoSQL solution. Sometimes NoSQL is a good solution; most of the time it is not.
Take for example MongoDB, which you mentioned (and which is one of the good NoSQL solutions I've tried). Schema-less, right? Err... not exactly. You see, when something is schema-less it means no constraints, no validation, etc. But your application's models/entities can't stand on thin air! Surely there will be some constraints and validation logic, which you will implement in your software layer. So I give you MongoKit! I will just quote this tiny bit from the project's description:
MongoKit brings structured schema and validation layer on top of the great pymongo driver
Hmmm... unstructured became structured.
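For illustration, a MongoKit model looks roughly like this (a sketch from memory of the project's README; MongoKit targets Python 2, hence basestring):

from mongokit import Document, Connection

connection = Connection()

@connection.register
class User(Document):
    __collection__ = 'users'
    __database__ = 'mydb'        # hypothetical names
    structure = {
        'name': basestring,
        'gender': basestring,
    }
    required_fields = ['name']   # so much for "schema-less"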
At least we don't have SQL, right? Yeah, we don't. We have a different query language, which is of course inferior to SQL. At least you don't need to resort to map/reduce for basic queries (see CouchDB).
Don't get me wrong, NoSQL (and especially MongoDB) has its purpose, but most of the time these technologies are used for the wrong reasons.
Also, if you care about serious persistence and data integrity, forget about NoSQL solutions.
All these technologies are too experimental to keep your serious data. By researching a bit who (except Google/Amazon) uses NoSQL solutions and for what exactly, you will find that almost no one uses them for their important data. They are mostly used for logging, messages, and real-time data; basically anything to off-load some burden from the SQL storage.
Redis, in my opinion, is probably the only project that is going to survive the NoSQL explosion unscathed. Maybe because it doesn't advertise itself as NoSQL but as a key-value store, which is exactly what it is, and a pretty damn good one! They also seem serious about persistence. It is a Swiss Army knife, but not a good solution to replace your RDBMS entirely.
I am sorry, I said too much :)
So here is my suggestion:
1) Study the RDBMS model a bit.
2) Django is a good framework if most of your project is going to use an RDBMS.
3) Postgresql rocks! Also keep in mind that version 9.2 will bring native JSON support. You could dump all your 'dynamic' properties in there and you could use a secondary storage/engine to perform queries(map/reduce) on said properties. Have your cake and eat it too!
4) For serious search capabilities consider specialized engines like solr.
EDIT: 6 April 2013
5) django-ext-hstore gives you access to the PostgreSQL hstore type. It's similar to a Python dictionary and you can perform queries on it, with the limitations that you can't have nested dictionaries as values and that the value of a key can only be a string.
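For illustration, the hstore packages of that era typically expose a dictionary-like model field; a rough sketch in the style of django-hstore (exact names may differ in django-ext-hstore):

from django.db import models
from django_hstore import hstore

class Entity(models.Model):
    name = models.CharField(max_length=100)
    data = hstore.DictionaryField()   # the 'dynamic' properties live here
    objects = hstore.HStoreManager()

# store arbitrary string keys/values, then query on them
Entity.objects.create(name='house-1', data={'floors': '5', 'age': '20'})
Entity.objects.filter(data__contains={'floors': '5'})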
Have fun
Update in response to OP's comment
0) Consider the application 'contains data' and has already been used for a while
I am not sure whether you mean that it contains data in a legacy DBMS, or you are just trying to say "imagine that the DB is not empty and consider the following points...". In the former case, this seems like a migration issue (a completely different question); in the latter, well, OK.
1) Admin deletes entity "family" and all related data
Why would someone completely eliminate an entity (table)? Either your application has to do with families, houses, etc., or it doesn't. Deleting instances (rows) of families is understandable, of course.
2) Admin creates entity "house"
Same as #1. If you introduce a brand-new entity into your app, then most probably it will encapsulate semantics and business logic for which new code must be written. This happens to all applications as they evolve over time, and of course it warrants the creation of a new table, or maybe ALTERing an existing one. But this process is not part of the functionality of your application; i.e., it happens rarely, and it is a migration/refactoring issue.
3) Admin adds properties "floors", "age", ..
Why? Don't we know beforehand that a house has floors? That a user has a gender?
Adding and removing this type of attribute dynamically is not a feature but a design flaw. It is part of the analysis/design phase to identify your entities and their respective properties.
4) Privileged user adds some houses.
Yes, he is adding an instance (row) to the existing entity (table) House.
5) User searches for all houses with at least five floors cheaper than $100
A perfectly valid query, which can be achieved with either a SQL or a NoSQL solution.
In Django it would be something along these lines:
House.objects.filter(floors__gte=5, price__lt=100)
provided that House has the attributes floors and price. But if you need to do text-based queries, then neither SQL nor NoSQL will be satisfying enough, because you don't want to implement faceting or stemming on your own! You would use one of the already-discussed solutions (Solr, ElasticSearch, etc.).
Some more general notes:
The examples you gave about houses, users, and their properties do not warrant any dynamic schema. Maybe you simplified your example just to make your point, but you talk about adding/removing entities (tables) as if they were rows in a DB. Entities are supposed to be a big deal in an application: they define its purpose and functionality. As such, they can't change every minute.
Also you said:
The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.
This seems like a common case where an attribute has null=True.
And as a final note, I would suggest that you try both approaches (SQL and NoSQL), since it doesn't seem like your career depends on this project. It will be a beneficial experience, as you will understand first-hand the pros and cons of each approach, or even how to "blend" these approaches together.
What you're asking about is a common requirement in many systems -- how to extend a core data model to handle user-defined data. That's a popular requirement for packaged software (where it is typically handled one way) and open-source software (where it is handled another way).
The earlier advice to learn more about RDBMS design generally can't hurt. What I will add to that is, don't fall into the trap of re-implementing a relational database in your own application-specific data model! I have seen this done many times, usually in packaged software. Not wanting to expose the core data model (or permission to alter it) to end users, the developer creates a generic data structure and an app interface that allows the end user to define entities, fields etc. but not using the RDBMS facilities. That's usually a mistake because it's hard to be nearly as thorough or bug-free as what a seasoned RDBMS can just do for you, and it can take a lot of time. It's tempting but IMHO not a good idea.
Assuming the data model changes are global (shared by all users once admin has made them), the way I would approach this problem would be to create an app interface to sit between the admin user and the RDBMS, and apply whatever rules you need to apply to the data model changes, but then pass the final changes to the RDBMS. So for example, you may have rules that say entity names need to follow a certain format, new entities are allowed to have foreign keys to existing tables but must always use the DELETE CASCADE rule, fields can only be of certain data types, all fields must have default values etc. You could have a very simple screen asking the user to provide entity name, field names & defaults etc. and then generate the SQL code (inclusive of all your rules) to make these changes to your database.
Some common rules & how you would address them would be things like:
-- if a field is NOT NULL and has a default value, and there are already existing records in the table before that field was added by the admin, update the existing records to have the default value while creating the field (multiple steps: add the field allowing null; update all existing records; alter the table to enforce NOT NULL with the default). Otherwise you wouldn't be able to use a field-level integrity rule.
-- new tables must have a distinct naming pattern so you can continue to distinguish your core data model from the user-extended data model, i.e. core and user-defined tables have different RDBMS owners (dbo. vs. user.) or prefixes (none for core, __ for user-defined) or some such.
-- it is OK to add fields to tables that are in the core data model (as long as they tolerate nulls or have a default), and it is OK for admin to delete fields that admin added to core data model tables, but admin cannot delete fields that were defined as part of the core data model.
In other words -- use the power of the RDBMS to define the tables and manage the data, but in order to ensure whatever conventions or rules you need will always be applied, do this by building an app-to-DB admin function, instead of giving the admin user direct DB access.
If you really wanted to do this via the DB layer only, you could probably achieve the same by creating a bunch of stored procedures and triggers that would implement the same logic (and who knows, maybe you would do that anyway for your app). That's probably more of a question of how comfortable are your admin users working in the DB tier vs. via an intermediary app.
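As a sketch of such an app-to-DB admin function (the u_ prefix, the type whitelist, and the rule set are example conventions, not a finished design; written against a psycopg2-style cursor):

import re

ALLOWED_TYPES = {'integer', 'text', 'numeric', 'boolean'}

def add_user_field(cursor, table, field, pg_type, default):
    # identifiers are whitelisted rather than parameterized,
    # because DDL statements cannot take bound parameters
    for name in (table, field):
        if not re.match(r'^[a-z][a-z0-9_]*$', name):
            raise ValueError('bad identifier: %r' % name)
    if pg_type not in ALLOWED_TYPES:
        raise ValueError('bad type: %r' % pg_type)
    col = 'u_%s' % field  # prefix marks user-defined columns
    # multi-step add, so existing rows end up satisfying NOT NULL:
    cursor.execute('ALTER TABLE %s ADD COLUMN %s %s' % (table, col, pg_type))
    cursor.execute('UPDATE %s SET %s = %%s' % (table, col), (default,))
    cursor.execute('ALTER TABLE %s ALTER COLUMN %s SET NOT NULL' % (table, col))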
So to answer your questions directly:
(1) Yes, add tables and columns at run time, but think about the rules you will need to have to ensure your app can work even once user-defined data is added, and choose a way to enforce those rules (via app or via DB / stored procs or whatever) when you process the table & field changes.
(2) This issue isn't strongly affected by your choice of SQL vs. NoSQL engine. In every case, you have a core data model and an extended data model. If you can design your app to respond to a dynamic data model (e.g. add new fields to screens when fields are added to a DB table or whatever) then your app will respond nicely to changes in both the core and user-defined data model. That's an interesting challenge but not much affected by choice of DB implementation style.
Good luck!
Maybe the persistence engine for your model objects (RDBMS, NoSQL, etc.) doesn't matter. The technology you're looking for is an index with which to search for and find your objects.
I think you need to find your objects by way of their schema. So, if the schema is defined dynamically and persisted in a database, you can build dynamic search forms, etc. Some kind of reference from the entities and attributes to the real objects is needed.
Take a look at the Entity-Attribute-Value (EAV) pattern. It can be implemented over SQLAlchemy, using an RDBMS as the means to store your schema and data "vertically" and relate them.
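A minimal EAV sketch over SQLAlchemy might look like this (table and column names are illustrative): the schema itself becomes rows, and instance data is stored "vertically" as (entity, attribute, value) triples.

from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class EntityType(Base):                 # e.g. 'user', 'family'
    __tablename__ = 'entity_type'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)

class Attribute(Base):                  # e.g. 'gender' for 'user'
    __tablename__ = 'attribute'
    id = Column(Integer, primary_key=True)
    entity_type_id = Column(Integer, ForeignKey('entity_type.id'))
    name = Column(String)

class Entity(Base):                     # one row per instance
    __tablename__ = 'entity'
    id = Column(Integer, primary_key=True)
    entity_type_id = Column(Integer, ForeignKey('entity_type.id'))

class Value(Base):                      # one row per (instance, attribute)
    __tablename__ = 'value'
    id = Column(Integer, primary_key=True)
    entity_id = Column(Integer, ForeignKey('entity.id'))
    attribute_id = Column(Integer, ForeignKey('attribute.id'))
    value = Column(String)              # stored as text; cast on read

Adding a new "column" is then just an INSERT into attribute, with no DDL at runtime; the trade-off is that queries like "all E where P > T" need joins over value and casts.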
You're entering the field of Semantic Web programming; maybe you should read at least the first chapter of this book:
Programming The Semantic Web
It tells the whole story of your problem: from rigid schemas to dynamic schemas, implemented first as a key-value store and later improved into graph persistence over a relational model.
My opinion is that the best implementations of this can nowadays be achieved with graph databases; a very good example among current implementations is Berkeley DB (some LDAP implementations use Berkeley DB as the technical solution to exactly this indexing problem).
Once in a graph model, you can perform some kinds of "inference" on the graph, which makes the DB appear to have some kind of "intelligence". An example of this is given in the book.
So, if you conceptualize your entities as "documents", then this whole problem maps onto a NoSQL solution pretty well. As commented, you'll need some kind of model layer that sits on top of your document store and performs tasks like validation, and perhaps enforces (or encourages) some kind of schema, because there's no implicit backend requirement that entities in the same collection (the parallel to a table) share a schema.
Allowing privileged users to change your schema concept (as opposed to just adding fields to individual documents - that's easy to support) will pose a little bit of a challenge - you'll have to handle migrating the existing data to match the new schema automatically.
Reading your edits, Mongo supports the kind of searching/ordering you're looking for, and will give you the support for "empty cells" (documents lacking a particular key) that you need.
If I were you (and I happen to be working on a similar, but simpler, product at the moment), I'd stick with Mongo and look into a lightweight web framework like Flask to provide the front-end. You'll be on your own to provide the model, but you won't be fighting against a framework's implicit modeling choices.
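For illustration, a minimal pymongo sketch of the searching and "empty cell" behavior described above (database and collection names are made up):

from pymongo import MongoClient

db = MongoClient().datamanager

# documents in one collection need not share keys
db.houses.insert_one({'floors': 6, 'price': 90})
db.houses.insert_one({'floors': 2})   # no price: the "empty cell" case

# all houses with at least five floors, cheaper than 100, sorted by price
for house in db.houses.find({'floors': {'$gte': 5},
                             'price': {'$lt': 100}}).sort('price'):
    print(house)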
A few weeks ago I asked the question "Is a PHP, Python, PostgreSQL design suitable for a non-web business application?"
A lot of the answers recommended skipping the PHP piece and using Django to build the application. As I've explored Django, I've started to question one specific aspect of my goals and how Django comes into play for a non-web business application.
Based on my understanding, Django would manage both the view and controller pieces and PostgreSQL or MySQL would handle the data. But my goal was to clearly separate the layers so that the database, domain logic, and presentation could each be changed without significantly affecting the others. It seems like I'm only separating the M from the VC layers with the Django solution.
So, is it counterproductive for me to build the domain layer in Python with an SQLAlchemy/Elixir ORM, PostgreSQL for the database layer, and then still use Django or PHP for the presentation layer? Is this possible, or pure insanity?
Basically, I'd be looking at an architecture of Django/PHP > Python/SQLAlchemy > PostgreSQL/MySQL.
Edit: Before the fanboys get mad at me for asking a question about Django, just realize: It's a question, not an accusation. If I knew the answer or had my own opinion, I wouldn't have asked!
You seem to be saying that choosing Django would prevent you from using a more heterogeneous solution later. This isn't the case. Django provides a number of interesting connections between the layers, and using Django for all the layers lets you take advantage of those connections. For example, using the Django ORM means that you get the great Django admin app almost for free.
You can choose to use a different ORM within Django, you just won't get the admin app (or generic views, for example) along with it. So a different ORM takes you a step backward from full Django top-to-bottom, but it isn't a step backward from other heterogeneous solutions, because those solutions didn't give you intra-layer goodness like the admin app in the first place.
Django shouldn't be criticized for not providing a flexible architecture: it's as flexible as any other solution, you just forgo some of the Django benefits if you choose to swap out a layer.
If you choose to start with Django, you can use the Django ORM now and then later, if you need to switch, change over to SQLAlchemy. That will be no more difficult than starting with SQLAlchemy now and later moving to some other ORM solution.
You haven't said why you anticipate needing to swap out layers. It will be a painful process no matter what, because there is necessarily much code that relies on the behavior of whichever toolset and library you're currently using.
Django will happily let you use whatever libraries you want for whatever you want to use them for -- you want a different ORM, use it, you want a different template engine, use it, and so on -- but is designed to provide a common default stack used by many interoperable applications. In other words, if you swap out an ORM or a template system, you'll lose compatibility with a lot of applications, but the ability to take advantage of a large base of applications typically outweighs this.
In broader terms, however, I'd advise you to spend a bit more time reading up on architectural patterns for web applications, since you seem to have some major conceptual confusion going on. One might just as easily say that, for example, Rails doesn't have a "view" layer since you could use different file systems as the storage location for the view code (in other words: being able to change where and how the data is stored by your model layer doesn't mean you don't have a model layer).
(and it goes without saying that it's also important to know why "strict" or "pure" MVC is an absolutely horrid fit for web applications; MVC in its pure form is useful for applications with many independent ways to initiate interaction, like a word processor with lots of toolbars and input panes, but its benefits quickly start to disappear when you move to the web and have only one way -- an HTTP request -- to interact with the application. This is why there are no "true" MVC web frameworks; they all borrow certain ideas about separation of concerns, but none of them implement the pattern strictly)
You seem to be confusing "separate layers" with "separate languages/technologies." There is no reason you can't separate your concerns appropriately within a single programming language, or within an appropriately modular framework (such as Django). Needlessly multiplying programming languages / frameworks is just needlessly multiplying complexity, which is likely to slow down your initial efforts so much that your project will never reach the point where it needs a technology switch.
You've effectively got a three-layer architecture whether you use Django's ORM or SQLAlchemy, though you forgo some of Django's benefits if you choose the latter.
Based on my understanding, Django would manage both the view and controller pieces and PostgreSQL or MySQL would handle the data.
Not really, Django has its own ORM, so it does separate data from view/controller.
Here's an entry from the official FAQ about MVC:
Where does the “controller” fit in, then? In Django’s case, it’s probably the framework itself: the machinery that sends a request to the appropriate view, according to the Django URL configuration.
If you’re hungry for acronyms, you might say that Django is a “MTV” framework – that is, “model”, “template”, and “view.” That breakdown makes much more sense.
At the end of the day, of course, it comes down to getting stuff done. And, regardless of how things are named, Django gets stuff done in a way that’s most logical to us.
There's change and there's change. Django utterly separates the domain model, business rules, and presentation. You can change (almost) anything with near-zero breakage. And by change I am talking about meaningful, end-user-focused business change.
The technology mix-n-match in this (and the previous) question isn't really very helpful.
Who, specifically, benefits from replacing Django templates with PHP? It's not more productive; it's more complex for no benefit. Who benefits from changing ORMs? The users won't and can't tell.