One feature I would like to add to my django app is the ability for users to create some content (without signing up / creating an account), and then generating a content-specific link that the users can share with others. Clicking on the link would take the user back to the content they created.
Basically, I'd like the behavior to be similar to sites like pastebin - where users get a pastebin link they can share with other people (example: http://pastebin.com/XjEJvSJp)
I'm not sure what the best way is to generate these types of links - does anyone have any ideas?
Thanks!
You can create these links in any way you want, as long as each link is unique. For example, take the MD5 of the content and use the first 8 characters of the hex digest.
A simple model for that could be:
class Permalink(models.Model):
    key = models.CharField(primary_key=True, max_length=8)
    refersTo = models.ForeignKey(MyContentModel, unique=True)
You could also make refersTo a property that automatically assigns a unique key (as described above).
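For example, a minimal sketch of how the key could be assigned automatically (the body field on MyContentModel is an assumption - use whatever field actually holds the content):

import hashlib

def make_permalink(content_obj):
    # First 8 hex characters of the MD5 of the content (assumed to live in
    # content_obj.body); adjust the field name to your model.
    key = hashlib.md5(content_obj.body.encode("utf-8")).hexdigest()[:8]
    permalink, _created = Permalink.objects.get_or_create(
        key=key, defaults={"refersTo": content_obj})
    return permalink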
And you need a matching URL:
url("^permalink/(?P<key>[a-f0-9]{8})$",
"view.that.redirects.to.permalink.refersTo"),
You get the idea...
Usually all this amounts to is a (possibly random, possibly sequential) token plus the content, stored in a DB and then served up on demand.
If you don't mind that your URLs will get a bit longer, you can have a look at the uuid module. This should guarantee unique IDs.
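For instance, a quick sketch:

import uuid

key = uuid.uuid4().hex        # 32 random hex characters, safe for URLs
short_key = key[:8]           # or truncate, if a small collision risk is acceptable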
Basically you just need a view that stores data and a view that shows it.
e.g. Store with:
server.com/objects/save
And then, after storing the new model, it could be reached with
server.com/objects/[id]
Where [id] is the id of the model you created when you saved.
This doesn't require users to sign in - it could work for anonymous users as well.
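A rough Django sketch of those two views, assuming an anonymous-friendly ContentObject model with a body field (both names are made up):

from django.http import HttpResponseRedirect
from django.shortcuts import get_object_or_404, render

from .models import ContentObject  # hypothetical model holding the submitted content

def save_object(request):
    # mapped to /objects/save - no login required
    obj = ContentObject.objects.create(body=request.POST["body"])
    return HttpResponseRedirect("/objects/%d" % obj.id)

def show_object(request, object_id):
    # mapped to /objects/<object_id>
    obj = get_object_or_404(ContentObject, pk=object_id)
    return render(request, "objects/show.html", {"object": obj})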
I'm getting acquainted with musicbrainzngs and have run into a snag. All of the track-lists which are returned from the following are empty. Are there additional parameters I need to provide or is this a bug?
releases = musicbrainzngs.search_releases(
query='arid:' + musicbrainz_arid
)
This is expected. You have three ways of retrieving data from the MusicBrainz web service (using musicbrainzngs or directly):
lookup/get info for one entity by id: lots of info for that id
browse a list of entities: possibility to get long list, medium amount of information
search for entities: powerful to find things, but not much data given
When you know an entity by id you can look it up directly. You can even add includes to get very detailed information.
When you want not just one entity but a list (like a list of releases for one artist), you can browse. You can add includes for these as well.
Only when you don't know the id of the entity (or of an attached entity), or when you want to narrow down the list of entities, should you search.
In your case you know the artist id and want to get the list of releases. In that case you should use browse_releases and set an include for recordings:
releases = musicbrainzngs.browse_releases(artist=musicbrainz_arid,
inc=["recordings"])
Let's assume I am developing a service that provides a user with articles. Users can favourite articles and I am using Solr to store these articles for search purposes.
However, when the user adds an article to their favourites list, I would like to be able to figure out which articles the user has added to favourites so that I can highlight the favourite button.
I am thinking of two approaches:
Fetch articles from Solr and then loop through each article to fetch the "favourite-status" of this article for this specific user from MySQL.
Whenever a user favourites an article, add this user's ID to a multi-valued column in Solr and check whether the ID of the current user is in this column or not.
I don't know the capacity of the multivalued column... and I also don't think the second approach would be a "good practice" (saving user-related data in index).
What other options do I have, if any? Is approach 2 a correct approach?
I'd go with a modified version of the first approach: for now it keeps user-specific data that's not going to be used for search out of the index (although if you foresee a case where you want to search for favourited articles, it would probably be an interesting field to have in the index). For pure display purposes like this, I'd take all the ids returned from Solr, fetch them in one SQL statement from the database, and then set the UI values depending on that. It's a fast and easy solution.
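A rough sketch of that flow (the favourites table and its columns are assumptions - adjust to your schema):

def mark_favourites(cursor, user_id, solr_docs):
    # solr_docs: the documents returned by Solr for this page, each with an "id"
    article_ids = [doc["id"] for doc in solr_docs]
    if not article_ids:
        return solr_docs
    placeholders = ",".join(["%s"] * len(article_ids))
    cursor.execute(
        "SELECT article_id FROM favourites "
        "WHERE user_id = %s AND article_id IN (" + placeholders + ")",
        [user_id] + article_ids)
    favourited = set(row[0] for row in cursor.fetchall())
    for doc in solr_docs:
        doc["is_favourite"] = doc["id"] in favourited
    return solr_docs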
If you foresee "search only in my fav'd articles" as a use case, I would try to get that information into the index as well (or any other filtering on whether a specific user has added an article as a favourite). Even then, I'd try to avoid indexing anything more than the id of the user who fav'd the article.
Both solutions would work, although the latter would require more code - and the response from Solr could grow large if a large number of users fav an article, so I'd try to avoid having to return a set of user ids in that case (many favs for a single article).
I have been reading up more on CouchDB and really like it (master:master replication is one of the primary reasons I want to use it).
However, I have a question for you guys... I came from PHP, and used the Drupal CMS fairly often. One of my favorites (probably of the Drupal community as a whole) was the 'Views' plugin written by MerlinOfChaos. The idea is that an admin can use the Views UI system to create a dynamic stream of content from the database. This content could be from any content type (blog posts, articles, users, images, etc.) and could be filtered, ordered, arranged in grids, and so on. One simple example is creating a source of content for an animating slider, where the admin could go in at any time and change what is shown there - though typically I would set it up as the 5 most recent items of content type X.
So with something like Mongo, I can kind of see how I could do this: a fairly advanced parser that would convert what the admin wants into a DB query. Since Mongo is all based on dynamic querying, it is very doable. However, I want to use Couch.
I have seen that I can create a view that takes a parameter and will return results based on that (such as a parameter of the 5 article id's you want displayed). But what if I want to be able to build something more advanced from the UI? would I just add more parameters? For example, say the created view selects all documents with the value 'contentType' = 'post' and the argument is the id/page title. But what if I want the end user to also be able to choose the content type that the view queries against. Or the 5 most recent articles as long as the content type is one of 3 different values?
Another thing this makes me think of, is once a view like this is created and saved to the db, and called for the first time, it spends the time to build the results. Would you do this on a production/live system?
Part of the idea is that I want an end user to be able to create a custom feed of content on their profile page based on articles and posts on the site, and to be able to filter them and make their own categories, so to speak, and label them - such as their 'tech' feed and their 'food' feed.
I am still new to Couch and still have reading to do, but this is something that was bugging me and I am trying to wrap my head around it, since the product I have in mind is going to be heavily dynamic based on the end user's input.
The application itself will be written in Python.
In a nutshell, you would need to emit something like this in the view:
emit([doc.contentType, doc.addDate], doc); // emit the entire doc,
// add date is timestamp (assuming)
or
emit([doc.contentType, doc.addDate], null); // use with include_docs=true
Then, when you need to fetch the listing:
startkey=["post",0]&endkey=["post",999999999]&limit=5&descending=true
Explain:
startkey = ["post",0] = contentType is post, and addDate >= 0
endkey = ["post",9999999999] = contentType is post, and addDate <= 9999999999
limit = 5, limit to five posts
descending = true = sort descending, which is sort by adddDate descending
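Since the app is in Python, roughly the same query through the couchdb package might look like this (the database, design document, view and field names are assumptions):

import couchdb

couch = couchdb.Server("http://localhost:5984/")
db = couch["content"]                       # database name: assumption

rows = db.view(
    "content/by_type_and_date",             # design doc / view name: assumption
    startkey=["post", 9999999999],          # swapped because descending=True
    endkey=["post", 0],
    descending=True,
    limit=5,
    include_docs=True)
for row in rows:
    print(row.doc["title"])                 # "title" field: assumption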
To overcome the drawback of updating views on a live db, you can also create a new design (view) doc, so at least your existing code and view won't be affected. Only after the new view has been built do you deploy the latest code to switch to it, and then you can retire the older view.
I cant find "best" solution for very simple problem(or not very)
Have classical set of data: posts that attached to users, comments that attached to post and to user.
Now i can't decide how to build scheme/classes
On way is to store user_id inside comments and inside.
But what happens when i have 200 comments on page?
Or when i have N posts on page?
I mean it should be 200 additional requests to database to display user info(such as name,avatar)
Another solution is to embed user data into each comment and each post.
But first, it is a huge overhead; second, the model system gets corrupted (I'm using mongoalchemy); third, a user can change his info (like his avatar). And what then? As I understand it, an update operation on huge collections of comments or posts is not a simple operation...
What would you suggest? Are 200 requests per page to MongoDB OK (I must aim for performance)?
Or maybe I am just missing something...
You can avoid the N+1-problem of hundreds of requests using $in-queries. Consider this:
Post {
PosterId: ObjectId
Text: string
Comments: [ObjectId, ObjectId, ...] // option 1
}
Comment {
PostId: ObjectId // option 2 (better)
Created: dateTime,
AuthorName: string,
AuthorId: ObjectId,
Text: string
}
Now you can find a post's comments with an $in query, and you can also easily find all comments made by a specific author.
Of course, you could also store the comments as an embedded array in post, and perform an $in query on the user information when you fetch the comments. That way, you don't need to de-normalize user names and still don't need hundreds of queries.
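A rough pymongo sketch of that second variant, fetching a post's comments plus their authors in two queries (the users collection and the database name are assumptions):

from pymongo import MongoClient

db = MongoClient().blog                     # database name: assumption

def comments_with_authors(post_id):
    # one query for the comments, one $in query for all their authors
    comments = list(db.comments.find({"PostId": post_id}).sort("Created", 1))
    author_ids = list(set(c["AuthorId"] for c in comments))
    authors = dict((u["_id"], u) for u in db.users.find({"_id": {"$in": author_ids}}))
    for c in comments:
        c["author"] = authors.get(c["AuthorId"])
    return comments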
If you choose to denormalize the user names, you will have to update all comments ever made by a user when that user changes e.g. his name. On the other hand, if such operations don't occur very often, it shouldn't be a big deal. Or maybe it's even better to store the name the user had when he made the comment, depending on your requirements.
A general problem with embedding is that different writers will write to the same object, so you will have to use the atomic modifiers (such as $push). This is sometimes harder to use with mappers (I don't know mongoalchemy though), and generally less flexible.
What I would do with mongodb would be to embed the user id into the comments (which are part of the structure of the "post" document).
Three simple hints for better performance:
1) Make sure there is an index on user_id
2) Use a comment pagination method to avoid querying the database 200 times
3) Caching is your friend
You could cache your user objects so you don't have to query the database each time.
I like the idea of embedding user data into each post, but then you have to think about what happens when a user's profile is updated. You have to make sure that no post is missed.
I would recommend starting out just by skimming how mongo recommends you handle schemas.
Generally, for "contains" relationships between entities,
embedding should be be chosen. Use linking when not using linking would result in
duplication of data.
http://www.mongodb.org/display/DOCS/Schema+Design
There's a pretty good use case from the MongoDB docs: http://docs.mongodb.org/manual/use-cases/storing-comments/
Conveniently it's also written in Python :-)
I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a list of Projects in a table that belong to a given Account, and I want to create a link to each project's details page. I am generating a link that converts the Project's key to a string and includes that in the link to make it easy to look up the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.
There are a few examples in the GAE docs that use the same approach, and Keys use characters that are safe for inclusion in URLs. So there is probably no problem.
BTW, I prefer to use the numeric ID (obj_key.id()) when my model uses a number as the identifier, just because it doesn't look as ugly.
Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name and cause you to fetch a different entity kind from the one you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check that the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.
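Concretely, with the old db API that might look like this (the Project model and the URL layout are just for illustration):

from google.appengine.ext import db

class Project(db.Model):
    name = db.StringProperty()

project = Project(name="My Project Name")
project.put()

# Put only the numeric id in the URL...
url = "/project/%d" % project.key().id()

# ...and fetch by id when the request comes back; the kind is fixed in code,
# so a tampered key of a different kind can't slip through.
project_id = project.key().id()             # in a real handler, parsed from the URL
fetched = Project.get_by_id(project_id)

# Equivalently, rebuild the full key and fetch it:
fetched = db.get(db.Key.from_path("Project", project_id))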
After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned web page or visit URLs that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key ID in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
I know this is an old post, but I want to clarify one thing. Sometimes you NEED to work with KEYs.
When you have an entity with a #Parent relationship, you can't get it by its ID; you need to use the whole KEY to get it back from the Datastore. In these cases you need to work with the KEY all the time if you want to retrieve your entity.
They aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.