Database Design for tracking user flow

Database Design for tracking user flow - python

I want to design a user tracking system between different models. There is a Module table which is related to sections and sections are related to sub-sections. Each sub-section can relate to different independent tables. The independent tables store different items (slides, videos, texts) which the user can go through in a linear flow just like a Coursera course. The flow will be predetermined by us but can be changed by us. So, the linear flow between the sub-sections can't be hardcoded. I also need to keep track of the user's progress through these sub-sections.
For Example:
A sub-section can point to game table where the user_info, game_score, game_completion_date_time will be stored or there might be case where the sub-section is pointing to slides table storing slide_text, slide_url and user_info.
I want to keep track of changes in those table so what should be my approach. Below i have posted an image in which i am approaching to the problem.

You might make a 'courseflows' table with the order being a sequence of comma-separated integers for ids in the proper order, then parse through that sequence when presenting sections to the user. Add whatever other information you need, and a timestamp so that any new users are assigned the most recent flow. By linking this, you can also preserve any user's flow in case you change it while users are halfway through.
I would also consider using different secondary tables to mark user completion of each module, section, and subsection instead of just one.

Related

Persistent object with Django?

So I have a site that on a per-user basis, and it is expected to query a very large database, and flip through the results. Due to the size of the number of entries returned, I run the query once (which takes some time...), store the result in a global, and let folks iterate through the results (or download them) as they want.
Of course, this isn't scalable, as the globals are shared across sessions. What is the correct way to do this in Django? I looked at session management, but I always ran into the "xyz is not serializeable on json" issue. Do I look into how I do this correctly using sessions, or is there another preferred way to do this?

If the user is flipping through the results, you probably don't want to pull back and render any more than you have to. Most SQL dialects have TOP and LIMIT clauses that will let you pull back a limited range of results, as long as your data is ordered consistently. Django's Pagination classes are a nice abstraction of this on top of Django Model classes: https://docs.djangoproject.com/en/dev/topics/pagination/
I would be careful of storing large amounts of data in user sessions, as it won't scale as your number of users grows, and user sessions can stay around for a while after the user has left the site. If you're set on this option, make sure you read about clearing the expired sessions. Django doesn't do it for you:
https://docs.djangoproject.com/en/1.7/topics/http/sessions/#clearing-the-session-store

Solr & User data

Let's assume I am developing a service that provides a user with articles. Users can favourite articles and I am using Solr to store these articles for search purposes.
However, when the user adds an article to their favourites list, I would like to be able to figure out out which articles the user has added to favourites so that I can highlight the favourite button.
I am thinking of two approaches:
Fetch articles from Solr and then loop through each article to fetch the "favourite-status" of this article for this specific user from MySQL.
Whenever a user favourites an article, add this user's ID to a multi-valued column in Solr and check whether the ID of the current user is in this column or not.
I don't know the capacity of the multivalued column... and I also don't think the second approach would be a "good practice" (saving user-related data in index).
What other options do I have, if any? Is approach 2 a correct approach?

I'd go with a modified version of the first one - it'll keep user specific data that's not going to be used for search out of the index (although if you foresee a case where you want to search for favourite'd articles, it would probably be an interesting field to have in the index) for now. For just display purposes like in this case, I'd take all the id's returned from Solr, fetch them in one SQL statement from the database and then set the UI values depending on that. It's a fast and easy solution.
If you foresee that "search only in my fav'd articles" as a use case, I would try to get that information into the index as well (or other filter applications against whether a specific user has added the field as a favourite). I'd try to avoid indexing anything more than the user id that fav'd the article in that case.
Both solutions would however work, although the latter would require more code - and the required response from Solr could grow large if a large number of users fav's an article, so I'd try to avoid having to return a set of userid's if that's the case (many fav's for a single article).

Dynamic database tables in django

I am working on a project which requires me to create a table of every user who registers on the website using the username of that user. The columns in the table are same for every user.
While researching I found this Django dynamic model fields. I am not sure how to use django-mutant to accomplish this. Also, is there any way I could do this without using any external apps?
PS : The backend that I am using is Mysql

An interesting question, which might be of wider interest.
Creating one table per user is a maintenance nightmare. You should instead define a single table to hold all users' data, and then use the database's capabilities to retrieve only those rows pertaining to the user of interest (after checking permissions if necessary, since it is not a good idea to give any user unrestricted access to another user's data without specific permissions having been set).
Adopting your proposed solution requires that you construct SQL statements containing the relevant user's table name. Successive queries to the database will mostly be different, and this will slow the work down because every SQL statement has to be “prepared” (the syntax has to be checked, the names of table and columns has to be verified, the requesting user's permission to access the named resources has to be authorized, and so on).
By using a single table (model) the same queries can be used repeatedly, with parameters used to vary specific data values (in this case the name of the user whose data is being sought). Your database work will move along faster, you will only need a single model to describe all users' data, and database management will not be a nightmare.
A further advantage is that Django (which you appear to be using) has an extensive user-based permission model, and can easily be used to authenticate user login (once you know how). These advantages are so compelling I hope you will recant from your heresy and decide you can get away with a single table (and, if you planning to use standard Django logins, a relationship with the User model that comes as a central part of any Django project).
Please feel free to ask more questions as you proceed. It seems you are new to database work, and so I have tried to present an appropriate level of detail. There are many pitfalls such as this if you cannot access knowledgable advice. People on SO will help you.

This page shows how to create a model and install table to database on the fly. So, you could use type('table_with_username', (models.Model,), attrs) to create a model and use django.core.management to install it to the database.

How to setup a 3-tier web application project

EDIT:
I have added [MVC] and [design-patterns] tags to expand the audience for this question as it is more of a generic programming question than something that has direclty to do with Python or SQLalchemy. It applies to all applications with business logic and an ORM.
The basic question is if it is better to keep business logic in separate modules, or to add it to the classes that our ORM provides:
We have a flask/sqlalchemy project for which we have to setup a structure to work in. There are two valid opinions on how to set things up, and before the project really starts taking off we would like to make our minds up on one of them.
If any of you could give us some insights on which of the two would make more sense and why, and what the advantages/disadvantages would be, it would be greatly appreciated.
My example is an HTML letter that needs to be sent in bulk and/or displayed to a single user. The letter can have sections that display an invoice and/or a list of articles for the user it is addressed to.
Method 1:
Split the code into 3 tiers - 1st tier: web interface, 2nd tier: processing of the letter, 3rd tier: the models from the ORM (sqlalchemy).
The website will call a server side method in a class in the 2nd tier, the 2nd tier will loop through the users that need to get this letter and it will have internal methods that generate the HTML and replace some generic fields in the letter, with information for the current user. It also has internal methods to generate an invoice or a list of articles to be placed in the letter.
In this method, the 3rd tier is only used for fetching data from the database and perhaps some database related logic like generating a full name from a users' first name and last name. The 2nd tier performs most of the work.
Method 2:
Split the code into the same three tiers, but only perform the loop through the collection of users in the 2nd tier.
The methods for generating HTML, invoices and lists of articles are all added as methods to the model definitions in tier 3 that the ORM provides. The 2nd tier performs the loop, but the actual functionality is enclosed in the model classes in the 3rd tier.
We concluded that both methods could work, and both have pros and cons:
Method 1:
separates business logic completely from database access
prevents that importing an ORM model also imports a lot of methods/functionality that we might not need, also keeps the code for the model classes more compact.
might be easier to use when mocking out ORM models for testing
Method 2:
seems to be in line with the way Django does things in Python
allows simple access to methods: when a model instance is present, any function it
performs can be immediately called. (in my example: when I have a letter-instance available, I can directly call a method on it that generates the HTML for that letter)
you can pass instances around, having all appropriate methods at hand.

Normally, you use the MVC pattern for this kind of stuff, but most web frameworks in python have dropped the "Controller" part for since they believe that it is an unnecessary component. In my development I have realized, that this is somewhat true: I can live without it. That would leave you with two layers: The view and the model.
The question is where to put business logic now. In a practical sense, there are two ways of doing this, at least two ways in which I am confrontet with where to put logic:
Create special internal view methods that handle logic, that might be needed in more than one view, e.g. _process_list_data
Create functions that are related to a model, but not directly tied to a single instance inside a corresponding model module, e.g. check_login.
To elaborate: I use the first one for strictly display-related methods, i.e. they are somehow concerned with processing data for displaying purposes. My above example, _process_list_data lives inside a view class (which groups methods by purpose), but could also be a normal function in a module. It recieves some parameters, e.g. the data list and somehow formats it (for example it may add additional view parameters so the template can have less logic). It then returns the data set to the original view function which can either pass it along or process it further.
The second one is used for most other logic which I like to keep out of my direct view code for easier testing. My example of check_login does this: It is a function that is not directly tied to display output as its purpose is to check the users login credentials and decide to either return a user or report a login failure (by throwing an exception, return False or returning None). However, this functionality is not directly tied to a model either, so it cannot live inside an ORM class (well it could be a staticmethod for the User object). Instead it is just a function inside a module (remember, this is Python, you should use the simplest approach available, and functions are there for something)
To sum this up: Display logic in the view, all the other stuff in the model, since most logic is somehow tied to specific models. And if it is not, create a new module or package just for logic of this kind. This could be a separate module or even a package. For example, I often create a util module/package for helper functions, that are not directly tied for any view, model or else, for example a function to format dates that is called from the template but contains so much python could it would be ugly being defined inside a template.
Now we bring this logic to your task: Processing/Creation of letters. Since I don't know exactly what processing needs to be done, I can only give general recommendations based on my assumptions.
Let's say you have some data and want to bring it into a letter. So for example you have a list of articles and a costumer who bought these articles. In that case, you already have the data. The only thing that may need to be done before passing it to the template is reformatting it in such a way that the template can easily use it. For example it may be desired to order the purchased articles, for example by the amount, the price or the article number. This is something that is independent of the model, the order is now only display related (you could have specified the order already in your database query, but let's assume you didn't). In this case, this is an operation your view would do, so your template has the data ready formatted to be displayed.
Now let's say you want to get the data to create a specifc letter, for example a list of articles the user bough over time, together with the date when they were bought and other details. This would be the model's job, e.g. create a query, fetch the data and make sure it is has all the properties required for this specifc task.
Let's say in both cases you with to retrieve a price for the product and that price is determined by a base value and some percentages based on other properties: This would make sense as a model method, as it operates on a single product or order instance. You would then pass the model to the template and call the price method inside it. But you might as well reformat it in such a way, that the call is made already in the view and the template only gets tuples or dictionaries. This would make it easier to pass the same data out as an API (see below) but it might not necessarily be the easiest/best way.
A good rule for this decision is to ask yourself If I were to provide a JSON API additionally to my standard view, how would I need to modify my code to be as DRY as possible?. If theoretical is not enough at the start, build some APIs for the templates and see where you need to change things to the API makes sense next to the views themselves. You may never use this API and so it does not need to be perfect, but it can help you figure out how to structure your code. However, as you saw above, this doesn't necessarily mean that you should do preprocessing of the data in such a way that you only return things that can be turned into JSON, instead you might want to make some JSON specifc formatting for the API view.
So I went on a little longer than I intended, but I wanted to provide some examples to you because that is what I missed when I started and found out those things via trial and error.

How to implement searching for a specified user's documents?

In my current project, users can like songs, and now I'm going to add a song search so that a user can search for some song she has liked before.
I have implemented search engines using xapian before, which involves building indexes of documents periodically.
In my case, do I have to build indexes for every user's songs independently?
If I want the search results to be more real-time, does this mean that I need to build indexes incrementally every short period of time?

To take your questions separately.
Do I have to build indexes for every user's songs independently?
No; a common technique for this kind of situation is to index each like separately with both the information about the song and additionally the identifier of the user. Then when you search, you want to filter the results of the user's natural text search by the user identifier who's actually logged in.
In Xapian you'd do this by adding a term representing the user (with a suitable prefix, so you might have XU175 for a user with id 175, perhaps), and then using OP_FILTER to restrict the search to just likes by the logged-in user.
Do I need to build indexes incrementally every short period of time [to support real-time indexing]?
This is entirely dependent on the search system you're using. With Xapian you can either do that and periodically 'compact' the databases generated into one base one; or you can index live into the database -- although since Xapian is single-writer, you'd want to find a way of serialising this, such as by putting new likes onto a queue and having a single process that pops them off and indexes into the database. One largely off-the-shelf solution to this would be to use Restpose, written by one of the Xapian developers, which fills the same sort of role as Solr does for Lucene.
You can also get fancier by indexing into one database, then replicating that to another and searching the replicated version, which also gives you options to scale horizontally in future. There's a discussion of replication in the Xapian documentation.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.