I was hoping someone with more in-depth knowledge of MongoDB could provide some feedback on the implementation for my database requirements.
I am currently implementing a web application using Flask and MongoDB for a client whose business has multiple physical locations. The client, as Managing Director, wants to be able to access information for each of the locations individually, but wants to restrict the managers of each location to just their own location's information.
I can envisage two different solutions to this but am not familiar enough with the underlying technology/design of MongoDB to fully appreciate the pros and cons of each implementation.
For each solution I have a user model that contains which location(s) a user has access to.
Solution 1 would involve each document having a field to record which location the object belonged to, and then including that as a filter in every query. This seems the simplest to implement, but it would result in a single large database. Could there be scaling issues?
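For illustration, Solution 1 might look something like this with PyMongo (the collection and field names here are hypothetical):

    # A minimal sketch of Solution 1: every document carries a
    # location_id, and every query filters on it. Names are examples.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["business"]

    # An index on the tenant field keeps the filtered queries fast as
    # the single database grows.
    db.orders.create_index("location_id")

    def orders_for_user(user):
        # user["locations"] comes from the user model described above.
        return db.orders.find({"location_id": {"$in": user["locations"]}})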
Solution 2 involves having a separate user database plus an individual database for each location. When a request is made, the application would first establish from the user database which location the user is accessing, and then connect to that location's database. This at first appears more complex to implement, but it would result in smaller, isolated databases, which may make scaling easier but management harder.
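And a sketch of Solution 2 (again, the database and collection names are hypothetical):

    # A minimal sketch of Solution 2: look the user up in a shared
    # "users" database, then route to a per-location database.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    def db_for_request(username):
        user = client["users"].accounts.find_one({"username": username})
        # One MongoDB database per physical location, e.g. "location_3".
        return client[f"location_{user['location_id']}"]

    # Queries then need no tenant filter, since the data is isolated:
    #     db_for_request("jane").orders.find({})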
I would greatly appreciate any feedback on my solutions or any other solutions I have not considered.
Many thanks in advance.
First question so please excuse any formalities I miss.
I am developing a system - the backend in Python, the frontend in PHP. The system will leverage a MongoDB database for a number of reasons. I plan to use an Object Document Mapper (ODM) to make database lookups, and the relationships between pieces of data, relatively trivial.
Adding this information as requested:
The plan I envision is that the PHP interface would allow users to view the data and navigate through to connected information, leveraging the relationships defined in the database. For example:
Organisation > IP Ranges > IPs > TCP ports open on these IPs
The Python side is the system collecting this information, entering it into the database, and defining/establishing the relationships between the data (i.e. this IP range belongs to this organisation).
I can see that there are many options for both PHP and Python independently - but there don't appear to be any solutions that support both PHP and Python at the same time. Is there a way I can utilise the same ODM across both languages to ensure that data can be referenced/'looked up' in the same manner - or is there a better solution for this that I am unaware of?
Perhaps the simplified version of the question is: is there a language-agnostic way I can/should be querying data, with defined relationships in that data, with MongoDB?
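To illustrate the kind of lookup I mean, here is the manual-reference pattern with PyMongo (collection names are just examples); since the stored _id values are plain data, any driver, PHP included, can follow them the same way:

    # A sketch of manual references in MongoDB: relationships are
    # stored as plain _id values, readable from any language's driver.
    from pymongo import MongoClient

    db = MongoClient()["recon"]

    org_id = db.organisations.insert_one({"name": "Example Org"}).inserted_id
    range_id = db.ip_ranges.insert_one(
        {"cidr": "192.0.2.0/24", "organisation_id": org_id}
    ).inserted_id

    # Following the relationship is just a second query in any language:
    ip_range = db.ip_ranges.find_one({"_id": range_id})
    owner = db.organisations.find_one({"_id": ip_range["organisation_id"]})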
Thanks! :-)
I am working on a project which requires me to create a table for every user who registers on the website, named with that user's username. The columns in the table are the same for every user.
While researching I found this: Django dynamic model fields. I am not sure how to use django-mutant to accomplish this. Also, is there any way I could do this without using any external apps?
PS: The backend that I am using is MySQL.
An interesting question, which might be of wider interest.
Creating one table per user is a maintenance nightmare. You should instead define a single table to hold all users' data, and then use the database's capabilities to retrieve only those rows pertaining to the user of interest (after checking permissions if necessary, since it is not a good idea to give any user unrestricted access to another user's data without specific permissions having been set).
Adopting your proposed solution requires that you construct SQL statements containing the relevant user's table name. Successive queries to the database will mostly be different, and this will slow the work down, because every SQL statement has to be "prepared" (the syntax has to be checked, the names of tables and columns have to be verified, the requesting user's permission to access the named resources has to be checked, and so on).
By using a single table (model) the same queries can be used repeatedly, with parameters used to vary specific data values (in this case the name of the user whose data is being sought). Your database work will move along faster, you will only need a single model to describe all users' data, and database management will not be a nightmare.
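To make this concrete, the single shared model might look like the following (a minimal sketch; the field names are hypothetical):

    # One model holds every user's rows; queries filter by user.
    from django.conf import settings
    from django.db import models

    class Record(models.Model):
        owner = models.ForeignKey(settings.AUTH_USER_MODEL,
                                  on_delete=models.CASCADE)
        payload = models.TextField()
        created = models.DateTimeField(auto_now_add=True)

    # In a view, the same parameterised query serves every user:
    #     Record.objects.filter(owner=request.user)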
A further advantage is that Django (which you appear to be using) has an extensive user-based permission model, and can easily be used to authenticate user login (once you know how). These advantages are so compelling I hope you will recant from your heresy and decide you can get away with a single table (and, if you are planning to use standard Django logins, a relationship with the User model that comes as a central part of any Django project).
Please feel free to ask more questions as you proceed. It seems you are new to database work, so I have tried to present an appropriate level of detail. There are many pitfalls like this when you cannot access knowledgeable advice. People on SO will help you.
This page shows how to create a model and install its table to the database on the fly. So, you could use type('table_with_username', (models.Model,), attrs) to create a model and use django.core.management to install it to the database.
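A sketch of that idea; note that on recent Django versions the table is created with the schema editor rather than the old syncdb-style management command, and the field set here is hypothetical:

    # Build a per-user model class at runtime and create its table,
    # bypassing migrations. (See the answer above for why a single
    # shared table is usually the better design.)
    from django.db import connection, models

    def create_table_for(username):
        attrs = {
            "__module__": "myapp.models",  # hypothetical app module
            "subject": models.CharField(max_length=100),
            "grade": models.IntegerField(),
        }
        user_model = type(f"table_{username}", (models.Model,), attrs)
        with connection.schema_editor() as editor:
            editor.create_model(user_model)
        return user_model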
I need a way to improve performance of my website's SQL-based activity feed. We are using Django on Heroku.
Right now we are using actstream, which is a Django App that implements an activity feed using Generic Foreign Keys in the Django ORM. Basically, every action has generic foreign keys to its actor and to any objects that it might be acting on, like this:
Action:
(Clay - actor) wrote a (comment - action object) on (Andrew's review of Starbucks - target)
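Under the hood, the action model is roughly this shape (a simplified sketch, not actstream's exact code):

    # A simplified generic-foreign-key action model, in the spirit of
    # django-activity-stream.
    from django.contrib.contenttypes.fields import GenericForeignKey
    from django.contrib.contenttypes.models import ContentType
    from django.db import models

    class Action(models.Model):
        actor_type = models.ForeignKey(ContentType, on_delete=models.CASCADE,
                                       related_name="actions_as_actor")
        actor_id = models.PositiveIntegerField()
        actor = GenericForeignKey("actor_type", "actor_id")

        verb = models.CharField(max_length=255)

        target_type = models.ForeignKey(ContentType, on_delete=models.CASCADE,
                                        related_name="actions_as_target",
                                        null=True)
        target_id = models.PositiveIntegerField(null=True)
        target = GenericForeignKey("target_type", "target_id")

        timestamp = models.DateTimeField(auto_now_add=True)

Resolving actor and target for every row means joining through the content-types table, which is where the expensive queries come from.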
As we've scaled, it's become way too slow, which is understandable because it relies on big, expensive SQL joins.
I see at least two options:
Put a Redis layer on top of the SQL database and get activity feeds from there.
Try to circumvent the Django ORM and do all the queries in raw SQL, which I understand can improve performance.
If anyone has thoughts on either of these two, or other ideas, I'd love to hear them.
You might want to look at Materialized Views. Since you're on Heroku, and that uses PostgreSQL generally, you could look at Materialized View Support for PostgreSQL. It is not as mature as for other database servers, but as far as I understand, it can be made to work. To work with the Django ORM, you would probably have to create a new "entity" (not familiar with Django here so modify as needed) for the feed, and then do queries over it as if it was a table. Manual management of the view is a consideration, so look into it carefully before you commit to it.
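As a rough sketch of what that could look like, driven from Django with raw SQL (the view name and the SELECT are hypothetical; adjust them to the real actstream schema):

    # A denormalised feed as a PostgreSQL materialized view, managed
    # from Django with raw SQL.
    from django.db import connection

    def create_feed_view():
        with connection.cursor() as cursor:
            cursor.execute("""
                CREATE MATERIALIZED VIEW feed_actions AS
                SELECT id, actor_object_id, verb, timestamp
                FROM actstream_action
                ORDER BY timestamp DESC
            """)

    def refresh_feed_view():
        # "Manual management": the view goes stale until refreshed.
        with connection.cursor() as cursor:
            cursor.execute("REFRESH MATERIALIZED VIEW feed_actions")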
Hope this helps!
You said Redis? Everything is better with Redis.
Caching is one of the best ideas in software development. Whether or not you use materialized views, you should also consider caching them; believe me, your users will notice the difference.
Went with an approach that sort of combined the two suggestions.
We created a master list of every action in the database, which included all the information we needed about the actions, and stuck it in Redis. Given an action ID, we can now do a Redis lookup on it and get a dictionary object that is ready to be returned to the front end.
We also created action ID lists that correspond to all the different types of activity streams that are available to a user. So given a user ID, we have their friends' activity, their own activity, favorite-places activity, etc., available for lookup. (These correspond somewhat to materialized views, I guess, although they live in Redis, not in PSQL.)
So we get a user's feed as a list of action IDs. Then we get the details of those actions by looking up the IDs in the master action list. Then we return the feed to the front end.
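In case it helps anyone, the shape of it with redis-py is roughly this (the key names are hypothetical):

    # A rough sketch of the approach described above.
    import json

    import redis

    r = redis.Redis()

    def store_action(action_id, action_dict):
        # The master list: every action's details, keyed by action ID.
        r.hset("actions", action_id, json.dumps(action_dict))

    def push_to_feed(feed_key, action_id):
        # Per-feed ID lists, newest first, e.g. "feed:user:42:friends".
        r.lpush(feed_key, action_id)

    def get_feed(feed_key, limit=20):
        ids = r.lrange(feed_key, 0, limit - 1)
        if not ids:
            return []
        # Resolve IDs against the master list, ready for the front end.
        return [json.loads(raw) for raw in r.hmget("actions", ids) if raw]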
Thanks for the suggestions, guys.
I'm starting a Django project and need to shard multiple tables that are all likely to grow to too many rows. I've looked through threads here and elsewhere, and followed the Django multi-db documentation, but am still not sure how it all stitches together. My models have relationships that would be broken by sharding, so it seems the options are to either drop the foreign keys or forgo sharding the respective models.
For argument's sake, consider the classic Author, Publisher and Book scenario, but throw in book copies and users that can own them. Say books and users had to be sharded. How would you approach that? A user may own a copy of a book that's not in the same database.
In general, what are the best practices you have used for routing and the sharding itself? Did you use Django database routers, manually select a database based on your sharding logic, or override some parts of the ORM to achieve it?
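To make the question concrete, the router idea I'm weighing looks something like this sketch (the shard names and the shard-picking rule are hypothetical; settings.py would list "shard_0"/"shard_1" under DATABASES and this class under DATABASE_ROUTERS):

    class ShardRouter:
        SHARDED_MODELS = {"user", "book"}
        SHARDS = ["shard_0", "shard_1"]

        def _pick_shard(self, hints):
            instance = hints.get("instance")
            if instance is not None and instance.pk is not None:
                # Naive rule for illustration; real logic must also
                # cover unsaved instances and co-locate related rows.
                return self.SHARDS[instance.pk % len(self.SHARDS)]
            return None

        def db_for_read(self, model, **hints):
            if model._meta.model_name in self.SHARDED_MODELS:
                return self._pick_shard(hints)
            return None  # unsharded models fall through to "default"

        db_for_write = db_for_read

        def allow_relation(self, obj1, obj2, **hints):
            # Cross-shard foreign keys are exactly what breaks, so only
            # allow relations within one database.
            return obj1._state.db == obj2._state.db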
I'm using PostgreSQL on Ubuntu, if it matters.
Many thanks.
In the past I've done something similar using Postgresql Table Partitioning, however this merely splits a table up in the same DB. This is helpful in reducing table search time. This is also nice because you don't need to modify your django code much. (Make sure you perform queries with the fields you're using for constraints).
But it's not sharding.
If you haven't seen it yet, you should check out Sharding Postgres with Instagram.
I agree with @DanielRoseman. Also, how many is too many rows? If you are careful with indexing, you can handle a lot of rows with no performance problems. Keep your indexed values small (ints). I've got tables in excess of 400 million rows that produce sub-second responses even when joining with other many-million-row tables.
It might make more sense to break the user model up into multiple tables, so that the user object has a core of commonly used things and the "profile" info lives elsewhere (the standard Django setup). Copies would be a small table referencing books, which has the bulk of the data. Considering how much RAM you can put into a DB server these days, sharding before you have to seems wrong.
I am currently in the planning stages of building a simple Django web app (for learning purposes). Basically, teachers can log in, input student grades, and then the students can log in and view their grades. Each class has a group number.
Here is a template of a user:
Username: Johny098
Password: Y98uj*?877!(
Name: John Doe
Gender: Male
Group: 32
Secondary: 5
Taking into consideration that I am starting with Django and web development in general, I find the number of database systems available to me confusing: MySQL, CouchDB, MongoDB, SQLite, etc. I am having a hard time deciding which database system to use for my purpose (I have no prior experience with databases).
After some research I found CouchDB (and SQLite), which seem fairly simple to pick up and fun to use, but that's just me, and that's why I need help. I know there are numerous debates on SQL vs NoSQL, but I don't really know if this will have an impact on my use of the databases. Ideally, the database system should integrate well with Django and be easy enough to pick up in a couple of days.
So, coming back to the question: What database system should I use for my web app?
Any resources would also be appreciated.
Thanks,
-Raphael
For learning purposes I'd recommend SQLite: no setup, no background daemons running, everything is bundled with Python, it's SQL (so it maps well to the Django ORM), and it's a very simple DBMS. In fact, some people use SQLite for prototyping and then switch to MySQL/PostgreSQL in production.
As for NoSQL, I would not recommend using it unless you know exactly why and for what purpose you need it.
Oh, and one last thing: store password hashes, not the raw passwords (and use a salted, slow hash rather than plain md5 or sha1; Django's auth framework does this for you). It's not strictly necessary in your case, but in real-world apps it's mandatory.
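A minimal sketch with Django's built-in helpers (the User model calls these for you via set_password()):

    # Password hashing with Django's auth helpers.
    from django.contrib.auth.hashers import check_password, make_password

    hashed = make_password("Y98uj*?877!(")   # salted PBKDF2 by default
    assert check_password("Y98uj*?877!(", hashed)
    assert not check_password("wrong-guess", hashed)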
The Django website has a page that lists the database backends supported. However, Django's database access layer is designed to work more or less the same no matter which database backend you're using. There are some differences described on the page I linked, but they shouldn't come up in normal usage if you're writing just a basic web app. So as far as effects on how you write your web app, among the choices listed it really doesn't matter.
Note that all the database backends Django supports are SQL-based, I believe. Accessing the database through Django does eliminate some of the security issues that I believe prompted the NoSQL movement... in any case, NoSQL is something you can pretty much ignore for now.
In your case, I would suggest picking SQLite, simply because it's easier to set up, and you don't want to spend time worrying about how to configure the database when you should be worrying about how to build your web app. The difference between SQLite and most other DBMSs (database management systems) is that SQLite stores each database in a regular file, and the SQLite client works directly with that file. Other DBMSs (like MySQL, PostgreSQL, Oracle, etc.) have a central location for the databases, a server to manage them, and a client that connects to the server and handles all the database access. A server-based DBMS works well for a busy web app, because it has features to handle many simultaneous requests to the database, but since you're just using this as a learning project, you don't need those features.
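For reference, pointing Django at SQLite is just the default settings entry (essentially what django-admin startproject generates):

    # settings.py: the entire database configuration needed for SQLite.
    # No server, no credentials, just a file path.
    from pathlib import Path

    BASE_DIR = Path(__file__).resolve().parent.parent

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": BASE_DIR / "db.sqlite3",
        }
    }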
I would go for MySQL... why? Because among the ones you named, it is the most popular and, better than that, it will give you more credit when looking for a job (for instance, you can go to monster.com and find how many jobs require you to know MySQL vs how many require SQLite).
Also, MySQL is simple and easy to learn, there are many GUI clients (some of which are Open Source) and it has pretty good documentation.