I have been reading up more on CouchDB and really like it (master:master replication is one of the primary reasons I want to use it).
However, I have a question for you. I come from PHP and used the Drupal CMS fairly often. One of my favorite modules (and probably a favorite of the Drupal community as a whole) was 'Views', written by MerlinOfChaos. The idea is that an admin can use the Views UI to create a dynamic stream of content from the database. This content could be of any content type (blog posts, articles, users, images, etc.) and could be filtered, ordered, arranged in grids, and so on. One simple example is creating a source of content for an animated slider, where the admin could go in at any time and change what is shown there. Typically, though, I would set it up as the 5 most recent items of content type X.
So with something like Mongo, I can kind of see how I could do this: a fairly advanced parser that converts what the admin wants into a DB query. Since Mongo is built around dynamic querying, it is very doable. However, I want to use Couch.
I have seen that I can create a view that takes a parameter and returns results based on it (such as a parameter of the 5 article IDs you want displayed). But what if I want to be able to build something more advanced from the UI? Would I just add more parameters? For example, say the created view selects all documents where 'contentType' = 'post' and the argument is the ID/page title. What if I want the end user to also be able to choose the content type that the view queries against? Or to get the 5 most recent articles as long as the content type is one of 3 different values?
Another thing this makes me think of, is once a view like this is created and saved to the db, and called for the first time, it spends the time to build the results. Would you do this on a production/live system?
Part of the idea is that I want an end user to be able to create a custom feed of content on their profile page based on articles and posts on the site, and to be able to filter them, make their own categories, so to speak, and label them, such as their 'tech' feed and their 'food' feed.
I am still new to Couch and still have reading to do, but this is something that was bugging me and I am trying to wrap my head around it, since the product I have in mind is going to be heavily dynamic, based on the end users' input.
The application itself will be written in Python.
In a nutshell, you would need to emit something like this in the view:
emit([doc.contentType, doc.addDate], doc); // emit the entire doc,
// add date is timestamp (assuming)
or
emit([doc.contentType, doc.addDate], null); // use with include_docs=true
Then, when you need to fetch the listing:
startkey=["post",9999999999]&endkey=["post",0]&limit=5&descending=true
Explanation:
startkey = ["post",9999999999] = contentType is post, and addDate <= 9999999999
endkey = ["post",0] = contentType is post, and addDate >= 0
limit = 5 = return at most five posts
descending = true = sort by addDate descending; because the view is then read in reverse order, startkey must be the high end of the range and endkey the low end
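As a rough sketch in Python (the language the application will be written in), the query parameters above can be built with the standard library alone. With descending=true CouchDB walks the index backwards, so startkey has to be the upper bound. The helper names recent_docs_query and merge_newest are made up for this illustration, and 9999999999 is just the placeholder timestamp used above. For the "one of 3 content types" case, one option (an assumption, not the only way) is to run the query once per type and merge the rows in the application:

```python
import json
from urllib.parse import urlencode

MAX_TIMESTAMP = 9999999999  # placeholder upper bound, as in the example above


def recent_docs_query(content_type, limit=5):
    """Build the query string for the newest `limit` docs of one content type."""
    return urlencode({
        # descending=true reads the index backwards: start high, end low
        "startkey": json.dumps([content_type, MAX_TIMESTAMP]),
        "endkey": json.dumps([content_type, 0]),
        "descending": "true",
        "limit": limit,
        "include_docs": "true",
    })


def merge_newest(rows_per_type, limit=5):
    """Merge the rows of several per-type queries and keep the newest `limit`."""
    all_rows = [row for rows in rows_per_type for row in rows]
    # each row key is [contentType, addDate]; sort on addDate, newest first
    all_rows.sort(key=lambda row: row["key"][1], reverse=True)
    return all_rows[:limit]


# Rows shaped like the "rows" array of a CouchDB view response (sample data)
posts = [{"id": "p2", "key": ["post", 300]}, {"id": "p1", "key": ["post", 100]}]
articles = [{"id": "a1", "key": ["article", 200]}]
newest = merge_newest([posts, articles], limit=2)
```

The query string would be appended to the view URL; fetching it (for example with an HTTP client of your choice) is left out so the sketch stays free of network calls.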
To avoid the drawback of rebuilding a view on a live database, you can create a new design (view) document instead of modifying the existing one. That way your existing code and view are not affected. Only after the new view has been built do you deploy the latest code that switches to it, and then you can retire the older view.
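One way to organise this, sketched in Python under the assumption that the version number is simply part of the design document name: the names "feeds", "by_type_and_date" and "site" below are hypothetical, and the helpers only build the JSON body and URL path, they do not talk to CouchDB.

```python
import json

MAP_FUNCTION = """
function (doc) {
  if (doc.contentType && doc.addDate) {
    emit([doc.contentType, doc.addDate], null);
  }
}
"""


def versioned_design_doc(name, version, map_source):
    """Return a design document whose _id carries a version suffix."""
    return {
        "_id": f"_design/{name}_v{version}",
        "language": "javascript",
        "views": {"by_type_and_date": {"map": map_source.strip()}},
    }


def view_path(db, name, version, view="by_type_and_date"):
    """URL path of the view; the application switches by changing `version`."""
    return f"/{db}/_design/{name}_v{version}/_view/{view}"


new_doc = versioned_design_doc("feeds", 2, MAP_FUNCTION)
body = json.dumps(new_doc)  # this JSON would be PUT to /<db>/_design/feeds_v2
```

Querying the new view once before switching the application over triggers the index build, so live requests against the old view keep being served in the meantime.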
I've been trying to build a Tutorial system that we usually see on websites. Like the ones we click next -> next -> previous etc to read.
All Posts are stored in a table(model) called Post. Basically like a pool of post objects.
Post.objects.all() will return all the posts.
Now there's another table (model) called Tutorial that will store the following:
class Tutorial(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    tutorial_heading = models.CharField(max_length=100)
    tutorial_summary = models.CharField(max_length=300)
    series = models.CharField(max_length=40)  # <---- Here [10,11,12]
    ...
Here entries in this series field are post_ids stored as a string representation of a list.
example: series will have [10,11,12] where 10, 11 and 12 are post_id that correspond to their respective entries in the Post table.
So my table entry for Tutorial model looks like this.
id    heading              summary                     series
"5"   "Series 3 Tutorial"  "lorem on ullt consequat."  "[12, 13, 14]"
So I just read the series field and get all the Posts with the ids in this list then display them using pagination in Django.
Now, I've read from several stackoverflow posts that having multiple entries in a single field is a bad idea. And having this relationship to span over multiple tables as a mapping is a better option.
What I want to have is the ability to insert new posts into this series anywhere I want. Maybe in the front or middle. This can be easily accomplished by treating this series as a list and inserting as I please. Altering "[14,12,13]" will reorder the posts that are being displayed.
My question is: is this way of storing multiple values in one field okay for my use case, or will it take a performance hit, or is it generally a bad idea? If it is a bad idea, is there a way to preserve or alter the order by spanning the relationship across another table, or is there an entirely better way to accomplish this in Django or MySQL?
Here entries in this series field are post_ids stored as a string representation of a list.
(...)
So I just read the series field and get all the Posts with the ids in this list then display them using pagination in Django.
DON'T DO THIS !!!
You are working with a relational database. There is one proper way to model relationships between entities in a relational database, which is to use foreign keys. In your case, depending on whether a post can belong to only a single tutorial ("one to many" relationship) or to many tutorials at the same time ("many to many" relationship), you'll want either to add a foreign key on tutorial to post, or to use an intermediate "post_tutorials" table with foreign keys on both post and tutorial.
Your solution doesn't allow the database to do its job properly. It cannot enforce integrity constraints (what if you delete a post that's referenced by a tutorial?), it cannot optimize read access (with a proper schema the database can retrieve a tutorial and all its posts in a single query), it cannot follow reverse relationships (given a post, access the tutorial(s) it belongs to), etc. And it requires an external program (Python code) to interact with your data, while with proper modeling you just need standard SQL.
Finally - but this is django-specific - using proper schema works better with the admin features, and with django rest framework if you intend to build a rest API.
Regarding the ordering problem, it's a long-known (and solved) issue: you just need to add an "order" field (a small int should be enough). There are a couple of third-party Django apps that add support for this to both your models and the admin, so it's almost plug and play.
IOW, there is absolutely no good reason to denormalize your schema this way and only good reasons to use proper relational modeling. FWIW I once had to work on a project based on some obscure (and hopefully long dead) PHP CMS that had the brilliant idea to use your "serialized lists" anti-pattern, and I can tell you it was both a disaster wrt/ performance and a complete nightmare to maintain. So do yourself and the world a favour: don't try to be creative, follow well-known and established best practices instead, and your life will be much happier. My 2 cents...
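To make the intermediate-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, so that it runs without a server. The table and column names (post_tutorials, position) and the sample titles are assumptions for illustration, and spacing the positions by 10 is just one possible way to leave room for inserting a post in the middle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tutorial (id INTEGER PRIMARY KEY, heading TEXT NOT NULL);
    CREATE TABLE post_tutorials (
        tutorial_id INTEGER NOT NULL REFERENCES tutorial(id) ON DELETE CASCADE,
        post_id     INTEGER NOT NULL REFERENCES post(id) ON DELETE CASCADE,
        position    INTEGER NOT NULL,  -- the "order" field
        PRIMARY KEY (tutorial_id, post_id)
    );
""")
conn.executemany("INSERT INTO post VALUES (?, ?)",
                 [(12, "Intro"), (13, "Setup"), (14, "Deploy"), (15, "Extras")])
conn.execute("INSERT INTO tutorial VALUES (5, 'Series 3 Tutorial')")
# Positions are spaced by 10 so a post can later be slotted in between
conn.executemany("INSERT INTO post_tutorials VALUES (5, ?, ?)",
                 [(12, 10), (13, 20), (14, 30)])


def posts_in_order(conn, tutorial_id):
    """Return the post ids of a tutorial, ordered by position, in one query."""
    rows = conn.execute(
        "SELECT p.id FROM post AS p "
        "JOIN post_tutorials AS pt ON pt.post_id = p.id "
        "WHERE pt.tutorial_id = ? ORDER BY pt.position",
        (tutorial_id,),
    )
    return [row[0] for row in rows]


# Insert post 15 between posts 12 and 13 without touching the other rows
conn.execute("INSERT INTO post_tutorials VALUES (5, 15, 15)")
series = posts_in_order(conn, 5)

# Deleting a post also removes its mapping row, so no dangling ids remain
conn.execute("DELETE FROM post WHERE id = 13")
after_delete = posts_in_order(conn, 5)
```

In Django the same shape would typically be expressed as a many-to-many relationship with an explicit intermediate model carrying the order field.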
I can think of two approaches:
Approach One: Linked List
One way is using linked list like this:
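class Tutorial(models.Model):
    ...
    # on_delete is required since Django 2.0; SET_NULL is one possible choice here
    previous = models.OneToOneField('self', null=True, blank=True, on_delete=models.SET_NULL, related_name="next")
In this approach, you can walk each series from its first tutorial like this:
for tutorial in Tutorial.objects.filter(previous__isnull=True):
    print(tutorial)
    # a missing reverse one-to-one raises on access, so test with hasattr
    while hasattr(tutorial, 'next'):
        print(tutorial.next)
        tutorial = tutorial.next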
This is a somewhat complicated approach. For example, whenever you want to add a new tutorial in the middle of a linked list, you need to change it in two places, like this:
post = Tutorial.objects.first()
next_post = getattr(post, 'next', None)  # None if post is the last one
new = Tutorial.objects.create(...)
if next_post is not None:
    # re-point the old successor first, so two rows never share the same previous
    next_post.previous = new
    next_post.save()
new.previous = post
new.save()
But there is a huge benefit to this approach: you don't have to create a new table for the series. Also, it is possible that the order of the tutorials will not be modified frequently, which means the extra hassle will be rare.
Approach Two: Create a new Model
You can simply create a new model and add a foreign key to it from Tutorial, like this:
class Series(models.Model):
    name = models.CharField(max_length=255)

class Tutorial(models.Model):
    ...
    # on_delete is required since Django 2.0; SET_NULL is one possible choice here
    series = models.ForeignKey(Series, null=True, blank=True, on_delete=models.SET_NULL, related_name='tutorials')
    order = models.IntegerField(default=0)

    class Meta:
        unique_together = ('series', 'order')  # ensures the same order value is not used twice within a series
Then you can access the tutorials in a series by:
series = Series.objects.first()
series.tutorials.all().order_by('order')
The advantage of this approach is that it's much more flexible to access tutorials through a series, but an extra table will be created for this, along with one extra field to maintain the order.
Let's assume I am developing a service that provides a user with articles. Users can favourite articles and I am using Solr to store these articles for search purposes.
However, when the user adds an article to their favourites list, I would like to be able to figure out which articles the user has added to favourites so that I can highlight the favourite button.
I am thinking of two approaches:
Fetch articles from Solr and then loop through each article to fetch the "favourite-status" of this article for this specific user from MySQL.
Whenever a user favourites an article, add this user's ID to a multi-valued column in Solr and check whether the ID of the current user is in this column or not.
I don't know the capacity of the multivalued column... and I also don't think the second approach would be a "good practice" (saving user-related data in index).
What other options do I have, if any? Is approach 2 a correct approach?
I'd go with a modified version of the first one for now: it keeps user-specific data that's not going to be used for search out of the index (although if you foresee a case where you want to search for favourited articles, it would probably be an interesting field to have in the index). For display purposes only, as in this case, I'd take all the IDs returned from Solr, fetch their favourite status from the database in one SQL statement, and then set the UI values depending on that. It's a fast and easy solution.
If you foresee "search only in my fav'd articles" as a use case (or other filters based on whether a specific user has added the article as a favourite), I would try to get that information into the index as well. I'd try to avoid indexing anything more than the ID of the user who fav'd the article in that case.
Both solutions would work, however, although the latter would require more code, and the response from Solr could grow large if a large number of users fav an article, so I'd try to avoid having to return a set of user IDs in that case (many favs for a single article).
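A minimal sketch of the "one SQL statement" variant, using Python's built-in sqlite3 module in place of MySQL so it runs standalone. The favourites table, its columns and the shape of the Solr results are assumptions made for the example.

```python
import sqlite3


def favourite_ids(conn, user_id, article_ids):
    """Return the subset of article_ids that user_id has favourited, in one query."""
    if not article_ids:
        return set()
    # one "?" per id keeps the statement parameterised
    placeholders = ",".join("?" for _ in article_ids)
    sql = ("SELECT article_id FROM favourites "
           f"WHERE user_id = ? AND article_id IN ({placeholders})")
    rows = conn.execute(sql, [user_id, *article_ids])
    return {row[0] for row in rows}


def mark_favourites(articles, fav_ids):
    """Copy each search result and add the flag the UI needs."""
    return [dict(article, is_favourite=article["id"] in fav_ids)
            for article in articles]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favourites (user_id INTEGER, article_id INTEGER)")
conn.executemany("INSERT INTO favourites VALUES (?, ?)",
                 [(1, 10), (1, 30), (2, 20)])

# Stand-in for the documents returned by a Solr search
solr_results = [{"id": 10, "title": "A"}, {"id": 20, "title": "B"},
                {"id": 30, "title": "C"}]
fav = favourite_ids(conn, 1, [a["id"] for a in solr_results])
page = mark_favourites(solr_results, fav)
```

Since a result page only holds a handful of articles, the IN list stays short regardless of how many favourites the user has in total.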
EDIT:
I have added [MVC] and [design-patterns] tags to expand the audience for this question as it is more of a generic programming question than something that has directly to do with Python or SQLAlchemy. It applies to all applications with business logic and an ORM.
The basic question is if it is better to keep business logic in separate modules, or to add it to the classes that our ORM provides:
We have a Flask/SQLAlchemy project for which we have to set up a structure to work in. There are two valid opinions on how to set things up, and before the project really starts taking off we would like to make up our minds on one of them.
If any of you could give us some insights on which of the two would make more sense and why, and what the advantages/disadvantages would be, it would be greatly appreciated.
My example is an HTML letter that needs to be sent in bulk and/or displayed to a single user. The letter can have sections that display an invoice and/or a list of articles for the user it is addressed to.
Method 1:
Split the code into 3 tiers - 1st tier: web interface, 2nd tier: processing of the letter, 3rd tier: the models from the ORM (sqlalchemy).
The website will call a server side method in a class in the 2nd tier, the 2nd tier will loop through the users that need to get this letter and it will have internal methods that generate the HTML and replace some generic fields in the letter, with information for the current user. It also has internal methods to generate an invoice or a list of articles to be placed in the letter.
In this method, the 3rd tier is only used for fetching data from the database and perhaps some database-related logic like generating a full name from a user's first name and last name. The 2nd tier performs most of the work.
Method 2:
Split the code into the same three tiers, but only perform the loop through the collection of users in the 2nd tier.
The methods for generating HTML, invoices and lists of articles are all added as methods to the model definitions in tier 3 that the ORM provides. The 2nd tier performs the loop, but the actual functionality is enclosed in the model classes in the 3rd tier.
We concluded that both methods could work, and both have pros and cons:
Method 1:
separates business logic completely from database access
avoids pulling in a lot of methods/functionality that we might not need whenever an ORM model is imported; also keeps the code for the model classes more compact.
might be easier to use when mocking out ORM models for testing
Method 2:
seems to be in line with the way Django does things in Python
allows simple access to methods: when a model instance is present, any function it performs can be immediately called (in my example: when I have a letter instance available, I can directly call a method on it that generates the HTML for that letter).
you can pass instances around, having all appropriate methods at hand.
Normally, you use the MVC pattern for this kind of stuff, but most web frameworks in Python have dropped the "Controller" part since they believe that it is an unnecessary component. In my development I have realized that this is somewhat true: I can live without it. That would leave you with two layers: the view and the model.
The question now is where to put business logic. In a practical sense, there are at least two places where I find myself putting logic:
Create special internal view methods that handle logic, that might be needed in more than one view, e.g. _process_list_data
Create functions that are related to a model, but not directly tied to a single instance inside a corresponding model module, e.g. check_login.
To elaborate: I use the first one for strictly display-related methods, i.e. they are somehow concerned with processing data for display purposes. My example above, _process_list_data, lives inside a view class (which groups methods by purpose), but could also be a normal function in a module. It receives some parameters, e.g. the data list, and somehow formats it (for example, it may add additional view parameters so the template can have less logic). It then returns the data set to the original view function, which can either pass it along or process it further.
The second one is used for most other logic, which I like to keep out of my direct view code for easier testing. My example of check_login does this: it is a function that is not directly tied to display output, as its purpose is to check the user's login credentials and decide to either return a user or report a login failure (by throwing an exception, returning False or returning None). However, this functionality is not directly tied to a model either, so it need not live inside an ORM class (well, it could be a staticmethod of the User object). Instead it is just a function inside a module (remember, this is Python, you should use the simplest approach available, and functions are there for a reason).
To sum this up: display logic goes in the view, all the other stuff in the model, since most logic is somehow tied to specific models. And if it is not, create a new module or package just for logic of this kind. For example, I often create a util module/package for helper functions that are not directly tied to any view, model or anything else, for example a function to format dates that is called from the template but contains so much Python code that it would be ugly to define it inside a template.
Now we bring this logic to your task: Processing/Creation of letters. Since I don't know exactly what processing needs to be done, I can only give general recommendations based on my assumptions.
Let's say you have some data and want to bring it into a letter. So for example you have a list of articles and a customer who bought these articles. In that case, you already have the data. The only thing that may need to be done before passing it to the template is reformatting it in such a way that the template can easily use it. For example, it may be desired to order the purchased articles by the amount, the price or the article number. This is something that is independent of the model; the order is now only display related (you could have specified the order already in your database query, but let's assume you didn't). In this case, this is an operation your view would do, so your template has the data ready formatted to be displayed.
Now let's say you want to get the data to create a specific letter, for example a list of articles the user bought over time, together with the date when they were bought and other details. This would be the model's job, e.g. create a query, fetch the data and make sure it has all the properties required for this specific task.
Let's say in both cases you wish to retrieve a price for the product and that price is determined by a base value and some percentages based on other properties: this would make sense as a model method, as it operates on a single product or order instance. You would then pass the model to the template and call the price method inside it. But you might as well reformat it in such a way that the call is already made in the view and the template only gets tuples or dictionaries. This would make it easier to pass the same data out as an API (see below), but it might not necessarily be the easiest/best way.
A good rule for this decision is to ask yourself: If I were to provide a JSON API in addition to my standard view, how would I need to modify my code to be as DRY as possible? If thinking about it in theory is not enough at the start, build some APIs alongside the templates and see where you need to change things so that the API makes sense next to the views themselves. You may never use this API and so it does not need to be perfect, but it can help you figure out how to structure your code. However, as you saw above, this doesn't necessarily mean that you should preprocess the data in such a way that you only return things that can be turned into JSON; instead you might want to do some JSON-specific formatting for the API view.
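To illustrate the split, here is a small framework-free sketch. Product is a plain dataclass standing in for an ORM model, and all names in it (Product, surcharge_percentages, letter_rows) are hypothetical. The price calculation sits on the model because it works on a single instance, while ordering and flattening for display sit in a view-level function whose output can feed both a template and a JSON API.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Product:
    """Stand-in for an ORM model class."""
    article_number: int
    name: str
    base_price: Decimal
    surcharge_percentages: list = field(default_factory=list)

    def price(self):
        """Model logic: works on one instance, so it lives on the model."""
        total_percent = sum(self.surcharge_percentages, Decimal(0))
        return self.base_price * (Decimal(1) + total_percent / Decimal(100))


def letter_rows(products):
    """View logic: order and flatten the data so a template or an API can use it."""
    ordered = sorted(products, key=lambda p: p.article_number)
    return [
        {"article_number": p.article_number, "name": p.name, "price": str(p.price())}
        for p in ordered
    ]


products = [
    Product(7, "Lamp", Decimal("100"), [Decimal("10"), Decimal("5")]),
    Product(3, "Desk", Decimal("250")),
]
rows = letter_rows(products)   # what the HTML template would receive
api_body = json.dumps(rows)    # the same rows reused for a JSON API
```

Because letter_rows returns only plain dictionaries and strings, the same function serves both outputs, which is the DRY test described above.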
So I went on a little longer than I intended, but I wanted to provide some examples to you because that is what I missed when I started and found out those things via trial and error.
I want to copy data from one database to another in Postgres. I wrote a script in Django and was able to grab the data from one specific table, but how can I add that data to the other database? The new database has the same table and column names; I want to save the old database's data to the new database.
This might be easy for some of you guys, but I really couldn't figure it out.
I'm not familiar with either API, but if the rows/columns have the same dimensions you could do something like this (partially pseudocode):
for x in range(height):
    for y in range(width):
        data = call_data_from_database_A(x, y)
        new_entry = enter_data_into_database_B(x, y, data)
Here call_data_from_database_A gets the data from that specific row/column, and enter_data_into_database_B enters the data into that specific row/column. I'm not familiar with either API, but once you find those two calls I'm sure you could figure it out rather quickly.
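In practice the copy is usually done row by row rather than cell by cell. The sketch below shows that idea with Python's built-in sqlite3 module standing in for the two Postgres databases, so it runs without a server; copy_table and the customer table are made up for the example. With a Postgres driver such as psycopg2 you would go through a cursor and use %s placeholders instead of ?.

```python
import sqlite3


def copy_table(source, target, table, columns):
    """Copy every row of `table` from one connection to another."""
    # table and column names are interpolated, so they must come from
    # trusted code, never from user input
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {column_list} FROM {table}").fetchall()
    target.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", rows
    )
    target.commit()
    return len(rows)


# Two in-memory databases with the same table definition (sample data)
schema = "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"
old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")
old_db.execute(schema)
new_db.execute(schema)
old_db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

copied = copy_table(old_db, new_db, "customer", ["id", "name"])
```

For large tables, fetching everything at once with fetchall() may use too much memory; reading in batches would be the obvious refinement.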
Instead of writing your own import and export code, why not use the native capabilities of Postgres and dump the table from your old database then import it into your new one:
http://www.postgresql.org/docs/current/static/sql-copy.html
The simplest way to do this with Django (move one Django database to another one defined with different Django models) is to write two Django views and one jQuery HTML page.
The first view will be in the original Django app. It will essentially create a json object model of the database and push it out on a get request. This is custom to your Django's models.
The second view will be in the new Django app. This will take in json data and format it to match your current Django database (fields might not match up exactly, hence the reason for doing this migration). You then just add elements into the new database just as you were creating a new Django model entry(example).
I personally use a one off jquery html page that gets the json data from the first view and posts it to the second one. You could exclude this piece and just write it all in python in the second view, but I find doing it this way to be much cleaner.
One feature I would like to add to my django app is the ability for users to create some content (without signing up / creating an account), and then generating a content-specific link that the users can share with others. Clicking on the link would take the user back to the content they created.
Basically, I'd like the behavior to be similar to sites like pastebin - where users get a pastebin link they can share with other people (example: http://pastebin.com/XjEJvSJp)
I'm not sure what the best way is to generate these types of links - does anyone have any ideas?
Thanks!
You can create these links in any way you want, as long as each link is unique. For example, take the MD5 of the content and use the first 8 characters of the hex digest.
A simple model for that could be:
class Permalink(models.Model):
    key = models.CharField(primary_key=True, max_length=8)
    # on_delete is required since Django 2.0
    refersTo = models.ForeignKey(MyContentModel, unique=True, on_delete=models.CASCADE)
You could also make refersTo a property that automatically assigns a unique key (as described above).
And you need a matching URL:
url("^permalink/(?P<key>[a-f0-9]{8})$",
"view.that.redirects.to.permalink.refersTo"),
You get the idea...
Usually all that is made up of is a (possibly random, possibly sequential) token, plus the content, stored in a DB and then served up on demand.
If you don't mind that your URLs will get a bit longer you can have a look at the uuid module. This should guarantee unique IDs.
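The key-generation options mentioned in these answers can be sketched with the standard library alone. The function names are made up for this example, and existing_keys stands in for whatever uniqueness check your database provides.

```python
import hashlib
import secrets
import uuid


def content_key(content, length=8):
    """Deterministic key: the first `length` hex characters of the MD5 digest."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()[:length]


def random_key(existing_keys, nbytes=6):
    """Random URL-safe token, regenerated until it is not already taken."""
    while True:
        key = secrets.token_urlsafe(nbytes)
        if key not in existing_keys:
            return key


def uuid_key():
    """Longer key (32 hex characters) with a negligible collision risk."""
    return uuid.uuid4().hex
```

Note that a content-derived key is the same for identical content, and that 8 hex characters only cover 32 bits, so with either of the short variants it is worth checking for an existing key before saving.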
Basically you just need a view that stores data and a view that shows it.
e.g. Store with:
server.com/objects/save
And then, after storing the new model, it could be reached with
server.com/objects/[id]
Where [id] is the id of the model you created when you saved.
This doesn't require users to sign in - it could work for anonymous users as well.