Getting text from db as django dictionary - python

I recently joined a company which is using django to build their product. I'm currently responsible for one of the apps, which was already developed a little bit before I was here.
One of the entities in the app has a json dictionary attribute, which has been kept in the db as a text field. Also, this attribute is marked in the model as a text field. So, as you can imagine it's not being handled correctly.
I wanted to change this and set it as a json field using https://github.com/bradjasper/django-jsonfield , which works really well.
However, I've run into a peculiar problem. Previous data stored in the db was not correctly handled and since it was unicode data, the text field in the db looks like:
{u'key': u'value'}
Now when the entity manager tries to load those values using the json field, it of course breaks since it's no longer a valid json string.
I've done some research on how to overcome this, but haven't found nothing.
My question:
Do you have any suggestion on how to overcome this? It can be any type of solution.
Something that I can run over night altering that field to transform it to a valid json string.
Some changes to the json-field code, which enables it to correctly handle these values.
Additional info
We use postgres with psycopg2 as django's db backend.
Thank you very much.

You're probably just going to need to iterate over the whole table, load the field, convert it into a real Python dict, and dump it back out with json.dumps. ast.literal_eval is a good choice for the conversion stage because it works like the built-in eval but is more restricted, so less risky to your system.
for obj in MyModel.objects.all():
value = ast.literal_eval(obj.dict_value)
obj.dict_value = json.dumps(value)
value.save()

Related

Mongoengine change document structure

I'm trying for the first time to use mongo, and I choose mongoengine.
After defining the Document structure if I try to change it (adding a field, removing a field, renaming ecc..) the reading operations still works, but any other operation on previously stored document fail since they're note compliant anymore with the document structure.
Is there any way to manage this situation? should I only user Dynamic documents with Dictionaries instead of EmbeddedDocuments?
Using DynamicDocument or setting meta = {'strict': False} on your Document may help in some cases but the only proper solution to this is running a migration script.
I'd recommend doing this using pymongo but you could also do that from the mongo shell. Every time your model change in a way that is not compatible, you should run a migration on the existing data so that it fits the new model. Otherwise mongoengine will complain at some point (mongoengine contributor here)

Storing unstructured data with ramses to be searched with Ramses-API?

I would like to give my users the possibility to store unstructured data in JSON-Format, alongside the structured data, via an API generated with Ramses.
Since the data is made available via Elasticsearch, I try to achieve that this data is indexed and searchable, too.
I can't find any mentioning in the docs or searching.
Would this be possible and how would one do it?
Cheers /Carsten
I put an answer here because needed to give a several docs links and this is a new SO account limited to a couple: https://gitter.im/ramses-tech/ramses?at=56bc0c7a4dfe1fa71ffc0b61
This is Chrisses answer, copied from gitter.im:
You can use the dict field type for "unstructured data", as it takes arbitrary json. If the db engine is postgres, it uses jsonfield under the hood, and if the db engine is mongo, it's converted to a bson document as usual. Either way it should index automatically as expected in ES and will be queryable through the Ramses API.
The following ES queries are supported on documents/fields: nefertari-readthedocs-org/en/stable/making_requests.html#query-syntax-for-elasticsearch
See the docs for field types here, start at the high level (ramses) and it should "just work", but you can see what the code is mapped to at each level below down to the db if desired:
ramses: ramses-readthedocs-org/en/stable/fields.html
nefertari (underlying web framework): nefertari-readthedocs-org/en/stable/models.html#wrapper-api
nefertari-sqla (postgres-specific engine): nefertari-sqla-readthedocs-org/en/stable/fields.html
nefertari-mongodb (mongo-specific engine): nefertari-mongodb-readthedocs-org/en/stable/fields.html
Let us know how that works out, sounds like it could be a useful thing. So far we've just used that field type to hold data like user settings that the frontend wants to persist but for which the API isn't concerned.

Django: storing/querying a dictionary-like data set?

I apologize if this has been asked already, or if this is answered somewhere else.
Anyways, I'm working on a project that, in short, stores image metadata and then allows the user to search said metadata (which resembles a long list of key-value pairs). This wouldn't be too big of an issue if the metadata was standardized. However, the problem is that for any given image in the database, there is any number of key/values in its metadata. Also there is no standard list of what keys there are.
Basically, I need to find a way to store a dictionary for each model, but with arbitrary key/value pairs. And I need to be able to query them. And the organization I'm working for is planning on uploading thousands of images to this program, so it has to query reasonably fast.
I have one model in my database, an image model, with a filefield.
So, I'm in between two options, and I could really use some help from people with more experience on choosing the best one (or any other solutions that would work better)
Using a traditional relational database like MySql, and creating a separate model with a foreignkey to the image model, a key field, and a value field. Then, when I need to query the data, I'll ask for every instance of this separate table that relates to an image, and then query those rows for the key/value combination I need.
Using something like MongoDB, with django-toolbox and its DictField to store the metadata. Then, when I need to query, I'll access the dict and search it for the key/value combination I need.
While I feel like 1 would be much better in terms of query time, each image may have up to 40 key/values of metadata, and that makes me worry about that separate "dictionary" table growing far too large if there's thousands of images.
Any advice would be much appreciated. Thanks!
What's the type of metadata? Both key and value are string? I assume it's the case.
The scale of your dataset matters. If you will have up to thousands images and each image has up to 40 key-value pairs, then in option 1, the separate table would have at most 400k records. That's no problem for modern database, as long as you have not bad machine and correct DB settings. One issue to take care is to composite index fields in the table. In Django ORM, it would be something like:
class ImageMeta(models.Model):
image = models.ForeignKey('Image')
key = models.CharField(max_length=XXXX)
value = models.CharField(max_length=XXXX)
class Meta:
index_together = [ ["image", "key", "value"], ] # Django 1.5 and above
In a Django project you've got 4 alternatives for this kind of problem, in no particular order:
using PostgreSQL, you can use the hstore field type, that's basically a pickled python dictionary. It's not very helpful in terms of querying it, but does its job saving your data.
using Django-NoRel with mongodb you get the ListField field type that does the same thing and can be queried just like anything in mongo. (option 2)
using Django-eav to create an entity attribute value store with your data. Elegant solution but painfully slow queries. (option 1)
storing your data as a json string in a long enough TextField and creating your own functions to serializing and deserializing the data, without thinking on being able to make a query over it.
In my own experience, if you by any chance need to query over the data, your option two is by far the best choice. EAV in Django, without composite keys, is painful.

Django models - assign id instead of object

I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
user_banned = models.ForeignKey(auth.models.User,related_name="user_banned")
user_banning = models.ForeignKey(auth.models.User,related_name="user_banning")
Now, when i try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get a following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if i replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if yes, is it possible to avoid it, just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) 3 times, 2 to retrieve the two User objects and a third one to commit your desired information. This will cause an absolutelly unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
These is the default name pattern for the FK fields generated by Django ORM. This way you can set the information directly and avoid the queries.
If you wanted to query for the already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access properties in Django querysets. with a double underscore. Then you can compare to the value of the attribute.
Though very similar, they work completely different. The first one sets an atribute directly while the second one is parsed by django, that splits it at the '__', and query the database the right way, being the second part the name of an attribute.
You can always compare user_banned and user_banning with the actual User objects, instead of their ids. But there is no use for this if you don't already have those objects with you.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the db...
To avoid it, you would have to write the raw sql to do the update using method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route keep in mind you are responsible for protecting yourself from sql injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw sql will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.

Is it safe to pass Google App Engine Entity Keys into web pages to maintain context?

I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a list of Projects in a table that belong to a given Account and I want to create a link to each project's details page. I am generating a link that converts the Project's key to string and includes that in the link to make it easy to lookup the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.
There is few examples, in GAE docs, that uses same approach, and also Key are using characters safe for including in URLs. So, probably, there is no problem.
BTW, I prefer to use numeric ID (obj_key.id()), when my model uses number as identifier, just because it's looks not so ugly.
Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name, and cause you to fetch a different entity kind to that which you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.
After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned webpage or visit URL that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key Id in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
I know this is an old post, but i want to clarify one thing. Sometimes you NEED to work with KEYs.
When you have an entity with a #Parent relationship, you cant get it by its ID, you need to use the whole KEY to get it back form the Datastore. In these cases you need to work with the KEY all the time if you want to retrieve your entity.
They aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.

Categories