django orm - check if dictionary key exists in a model field? - python

I have a series of dictionaries that I want to save if a unique id for each dictionary doesn't already exist in the database. If it does exist it would be good to check the values for each key are the same as their corresponding value in the database and update if not.
What's the best way to do this in Django?
I was thinking something along the lines of:
if not Thing.objects.filter(unique_id=dict1['unique_key']).exists():
    thing = Thing()
    thing.unique_id = dict1['unique_key']
    thing.property = dict1['other_key']
    thing.save()
I'm not sure how the else block should work?
(Side note: can the primary key AutoField id be non-sequential, so that I can store the unique id from the dict in the default model id field without needing an additional unique id column?)

The best way to achieve this is to use the update_or_create method, in a similar way to Javier's response:
thing, created = Thing.objects.update_or_create(
    unique_id=dict1['unique_key'],
    defaults={'property': dict1['other_key']}
)

Your best bet is to do this:
from django.forms.models import modelform_factory
# Do NOT build the form class in your view on every request;
# define it once in <app>/forms.py and import it (this is just an example).
ThingForm = modelform_factory(Thing, exclude=[])
# end
# In your view:
thing, created = Thing.objects.get_or_create(
    unique_id=dict1['unique_key'],
    defaults={'property': dict1['other_key']}
)
# The object already existed
if not created:
    # Form data must be keyed by model field names, not by the dict's own keys
    data = {'unique_id': dict1['unique_key'], 'property': dict1['other_key']}
    form = ThingForm(data, instance=thing)
    # You can also see what changed using `form.changed_data`
    if form.is_valid() and form.has_changed():
        form.save()
Internally it will do exactly what you're trying to do, but it's easier on the eyes.
https://github.com/django/django/blob/master/django/db/models/query.py#L454
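If a form feels heavy for this, the comparison the question asks about can also be sketched in plain Python. The helper below is hypothetical (`apply_changes` and `field_map` are illustrative names, not Django APIs), and `SimpleNamespace` merely stands in for a `Thing` instance so the snippet runs without a database:

```python
from types import SimpleNamespace


def apply_changes(obj, data, field_map):
    """Copy differing values from data onto obj; return the changed attribute names.

    field_map maps attribute names on obj to keys in data.
    """
    changed = []
    for attr, key in field_map.items():
        new_value = data[key]
        if getattr(obj, attr) != new_value:
            setattr(obj, attr, new_value)
            changed.append(attr)
    return changed


# SimpleNamespace is only a stand-in for a model instance fetched from the database
thing = SimpleNamespace(unique_id="abc", property="old")
dict1 = {"unique_key": "abc", "other_key": "new"}
changed = apply_changes(thing, dict1, {"property": "other_key"})
```

With a real model instance, one would presumably call `thing.save(update_fields=changed)` only when `changed` is non-empty, so unchanged rows are never written.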

can the primary key autofield id be non sequential so I can store the unique id from the dict in the default Model id field without needing an additional unique id column?
Yes, yes, it can. I override the hash builtin to return unique identifiers for records. A simple hash function for a dictionary follows:
def __hash__(self):
    hash_value = sum([hash(v) for v in self.values()])
    return hash_value
I hope that helps.
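One caveat if the value is meant to be stored: Python 3 randomizes the built-in hash() of strings per process by default, and summing the hashes of the values ignores which key each value belongs to. A minimal sketch of a deterministic alternative follows, assuming the dictionary values are JSON-serializable; `stable_id` is a hypothetical helper name, not part of Django:

```python
import hashlib
import json


def stable_id(record):
    """Return a hex digest that is identical in every process for equal dicts."""
    # sort_keys makes the serialized text independent of key insertion order
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = stable_id({"a": 1, "b": 2})
same = stable_id({"b": 2, "a": 1})      # same content, different key order
swapped = stable_id({"a": 2, "b": 1})   # same values attached to different keys
```

The digest is a 64-character string, so it would fit a CharField used as the primary key rather than an AutoField.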

Related

Assigning identifiable unique IDs to rows when importing data from XML

I'm designing a database into which I'll import a large amount of data from XML daily, creating new rows or updating existing ones.
Item data spans dozens of tables, all related to the item_id in the main item table.
For every item in the XML file, I need to check whether it already exists in the database, then update it, or create it if it's not there.
Every XML file belongs to a source_id, and every item in the XML contains a unique alphanumeric ID of up to 50 characters. Those IDs are not unique across all XML files, so only source_id:xml_item_id is unique.
What I need is a way of finding out whether the item already exists in the database. Ideally, I will search by pk and use the same pk to join other tables.
Attempt 1
I've tried encoding source_id:xml_item_id into a bigint for the pk, and decoding the bigint back to the original source_id:xml_item_id, but most of the time this overflows.
So this is not going to work.
Attempt 2
Use a UUID for the pk and source_id:xml_item_id as a unique_id (string) to search by, but join related tables on the UUID.
While I don't see anything wrong here (IMO), JOINs might be affected, and I would prefer a numeric pk for use in URLs.
Attempt 3
Use source_id:xml_item_id as pk (string)
Same worries as with Attempt 2
The reason I've avoided auto-increment (AI) PKs in all attempts is that there is a high possibility this data will be sharded in the future, and I'd like that to have a relatively low impact on how PKs are generated when it happens.
What would be the best approach to handle this? The goals are:
To identify whether items already exist in the database
To have a user-friendly pk for URLs
To avoid impacting JOIN performance too much
You can use unique_together:
class Data(models.Model):
    source_id = models.CharField(max_length=50)  # assumed length
    xml_item_id = models.CharField(max_length=50)  # item IDs are up to 50 chars
    # ... other fields
    class Meta:
        unique_together = ("source_id", "xml_item_id")
Then in your import function just:
scid = your_xml_source_id
xmlid = your_xml_id
obj, created = Data.objects.get_or_create(source_id=scid, xml_item_id=xmlid)
if created:
    # it's a new object: populate obj with the rest of the data
    obj.other_field = your_xml_other_field
else:
    # it's an existing object: update it with the new value
    obj.other_field = new_value
obj.save()
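To make the idea concrete at the database level, here is a rough sketch using Python's built-in sqlite3 module instead of Django. The table name `item`, the `title` column, and the `import_item` helper are all assumptions made for illustration. It shows a numeric surrogate key used for JOINs and URLs, alongside a composite unique constraint used to find existing rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,          -- numeric surrogate key for JOINs and URLs
        source_id TEXT NOT NULL,
        xml_item_id TEXT NOT NULL,
        title TEXT,
        UNIQUE (source_id, xml_item_id)  -- the same rule unique_together expresses
    )
    """
)


def import_item(source_id, xml_item_id, title):
    """Create the row if the pair is new, otherwise update it; return its pk."""
    row = conn.execute(
        "SELECT id FROM item WHERE source_id = ? AND xml_item_id = ?",
        (source_id, xml_item_id),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO item (source_id, xml_item_id, title) VALUES (?, ?, ?)",
            (source_id, xml_item_id, title),
        )
        return cur.lastrowid
    conn.execute("UPDATE item SET title = ? WHERE id = ?", (title, row[0]))
    return row[0]


first_pk = import_item("7", "ABC-1", "first import")
second_pk = import_item("7", "ABC-1", "second import")   # same pair: row is updated
other_pk = import_item("8", "ABC-1", "other source")     # same item id, other source
```

The unique constraint also gives the database an index on the pair, so the lookup by (source_id, xml_item_id) does not have to scan the table.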

Using the Django ORM, How can you create a unique hash for all possible combinations

I want to maintain a Django model with a unique id for every combination of choices within the model. I would then like to be able to update the model with a new field without the previous unique ids changing.
The ids can be a hash, an integer, or anything else.
What's the best way to achieve this?
class MyModel(models.Model):
    WINDOW_MIN = 5
    WINDOW_MAX = 7
    # three choices (5, 6, 7), matching the 3 * 2 == 6 combinations described below
    WINDOW_CHOICES = [(i, i) for i in range(WINDOW_MIN, WINDOW_MAX + 1)]
    window = models.PositiveIntegerField('Window', editable=True, default=WINDOW_MIN, choices=WINDOW_CHOICES)
    is_live = models.BooleanField('Live', editable=True, default=False)
    unique_id = ....
Given the above example there will be 3 * 2 == 6 unique ids. If I add another editable boolean field, I don't want the previous unique ids to change, but I do want new unique ids to be generated for the new boolean field.
The thought process behind this is that the parameters in MyModel define the inputs to a function whose results are stored in another Django model, MyResultModel, keyed by unique_id and the name of the model. The reasoning is that there are multiple variants of MyModel, each with its own set of unique combinations that get updated regularly, but the result set in MyResultModel is the same across MyModel1 to MyModelN. Ideally I would like the unique_ids to be autogenerated. In other words, the key for the result set stored in MyResultModel is the model_name (MyModel) and a unique_id. I want to sanely manage this many (MyModel1, ..., MyModelN) to one (MyResultModel) relationship.
class MyResultModel(models.Model):
    unique_id = ...
    model_name = models.CharField(max_length=200, default='', blank=False)  # Points to a Django model, e.g. MyModel
    result1 = ...
    result2 = ...
Given that all your options are boolean, categorical or small numbers, a common approach is to pack them into a bigger bit field (https://en.wikipedia.org/wiki/Bit_field). Whenever you add a new option, make sure to put it in the most significant part of the bit field, and avoid having a value of 0 (that is, simply add 1 to whatever you get). Not only would every unique_id then represent a different configuration, you could even do without the model and just use bit operations to extract all your values from the bit field.
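A minimal sketch of that packing, assuming the window takes the three values 5 to 7 (as the 3 * 2 == 6 count implies) and that a later, hypothetical boolean called `new_flag` is added. The function names `pack`, `pack_v2` and `unpack` are illustrative only:

```python
WINDOW_MIN = 5
WINDOW_BITS = 2  # two bits are enough for the three window values (stored as 0..2)


def pack(window, is_live):
    """Pack the original two options into one integer id."""
    value = window - WINDOW_MIN            # bits 0-1
    value |= int(is_live) << WINDOW_BITS   # bit 2
    return value + 1                       # add 1 so that no configuration maps to 0


def pack_v2(window, is_live, new_flag=False):
    """Same layout, with the later field placed in the next more significant bit."""
    value = pack(window, is_live) - 1
    value |= int(new_flag) << (WINDOW_BITS + 1)  # bit 3
    return value + 1


def unpack(unique_id):
    """Recover (window, is_live) from an id using bit operations only."""
    value = unique_id - 1
    window = (value & ((1 << WINDOW_BITS) - 1)) + WINDOW_MIN
    is_live = bool((value >> WINDOW_BITS) & 1)
    return window, is_live


all_ids = {pack(w, live) for w in (5, 6, 7) for live in (False, True)}
```

Because the new field occupies a higher bit, every id produced before the field existed equals the new-style id with `new_flag=False`, which is what keeps the old ids stable.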
I have two points.
First, if you just want to create a unique hash for all the combinations:
# Just concatenate the values, e.g.
uniq = cb1 + cb2 + cb3 ... (a string key is fine)
# If the string gets very long, consider hashing it to shorten it (and to get the same size for every value).
Second, regarding your question above:
I don't understand why you have made the models so complicated, but here is how I would approach it.
As I read your question, you have many variants of the model and want to gather their results into one result model?
If so, it is better to use a foreign key:
map = models.ForeignKey(Model_name, null=True)
# but if you have too many models, that won't help...
So I recommend creating just one model, because with too many models you can't even query them all to get the data (as you do now). Note that when you add more fields, the unique id should change.
One good way to get a special unique_id is to use the first solution I described.
Alternatively, create a new table just to store your combinations:
The new table has all the fields, and you set a value for each (or write a script to create the rows). Each combination then has an id, which serves as the unique id.
You can either create all possible combinations up front, or check and add a new combination when it shows up.
Another possible solution is to store the parameters in a JSONField:
https://bitbucket.org/schinckel/django-jsonfield

django query for sorting the data on a field present as foreign key

class abc(xyz):
    user = models.ForeignKey(User, max_length=100)
What would be the Django query for sorting the data based on a foreign key field?
I tried these queries:
abc.objects.filter(User__user=user)
abc.objects.filter(Q(user__icontains=search_query))
I have written these two queries, but I don't know how to combine them and make them work.
I don't know how to proceed. Can someone please lend a helping hand?
The first query does not work with your model. Change it to
qs = abc.objects.filter(user=user)
Now, sorting this queryset by user (or user_id or any other user's property which would work)
qs = qs.order_by('user_id')
wouldn't make much sense as all elements in your queryset have the same user.
The second query does not work because icontains is a lookup for strings, whereas user is a model instance. The following might work:
abc.objects.filter(user__username__icontains=search_query) # .order_by('user__username')
Generally, you can order by properties that have a (natural) order, like ints, floats, strings, dates, etc. Thus, you can order by something like user_id, user__username, user__email, or user__date_joined:
abc.objects.all().order_by('user_id') # for instance
I believe you want to sort by user's username. You can do it by:
abc.objects.all().order_by('user__username')
Look at the docs for details.
Note the double underscore '__' here. When referring to a field on the related model, you must separate the foreign key's name from that field's name with a double underscore.

Returning Multiple QuerySets in Json Dictionaries via Ajax

I am returning my model as a dictionary from my views.py via the following code
data = serializers.serialize('json', response_dict)
return HttpResponse(data, mimetype='application/javascript')
The thing is that I have a foreign key in the object and I want the actual value of the object that the foreign key points to, but I just get the ID. I want to return the corresponding objects for the ID. One way I have tried is to return a separate list with the corresponding objects for each foreign key, using the following code, but it does not work.
# Original dictionary that returns id values for foreign keys
data1 = serializers.serialize('json', response_dict)
# The corresponding objects from the foreign key table, stored in a parallel list of equal length to response_dict
data2 = serializers.serialize('json', other_list)
data = simplejson.dumps([data1, data2])
# Return the JSON dump to the template via Ajax
return HttpResponse(data, mimetype='application/javascript')
How would I go about returning both the initial dictionary and the list with the corresponding values for the foreign key? I am also open to a better method that gets me the actual object values for each foreign key.
I think what you are looking for is a way to serialize relations.
Have a look at:
https://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers#Relations
Also note that django.utils.simplejson is deprecated as of Django 1.5:
https://code.djangoproject.com/ticket/18023#comment:10
I created a version of the Wad of stuff serializer that works with Django 1.5 a while ago:
https://github.com/kolben/wadofstuff
Eventually in my Django serializer I set use_natural_keys=True and defined the natural keys for the tables whose values I wanted to see. See the tutorial in the Django docs here:
https://docs.djangoproject.com/en/dev/topics/serialization/#natural-keys
This solution works for my specific purposes, but is limited in general. I marked sunn0's answer as the accepted answer because the 'wadofstuff' serializer in that link seems to give a more extensive and general solution.

Mongoengine integer id, and User creating

MongoDB uses a string (hash) _id field instead of an integer, so how do I get a classic integer id primary key? Should I increment some variable each time I create an instance of my class?
class Post(Document):
    authors_id = ListField(IntField(required=True), required=True)
    content = StringField(max_length=100000, required=True)
    id = IntField(required=True, primary_key=True)
    def __init__(self):
        # what next?
Trying to create new user raises exception:
mongoengine.queryset.OperationError: Tried to save duplicate unique keys
(E11000 duplicate key error index: test.user.$_types_1_username_1
dup key: { : "User", : "admin" })
Code:
user = User.create_user(username='admin', email='example#mail.com',
                        password='pass')
user.is_superuser = True
user.save()
Why?
There is the SequenceField, which you could use to provide this. But, as stated, incrementing ids don't scale well, and are they really needed? Can't you use an ObjectId or a slug instead?
If you want to use an incrementing integer ID, the method to do it is described here:
http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field
This won't scale for a very large DB/app, but it works well for small or moderate applications.
1) If you really want to do it, you have to override the mongoengine method that saves your documents, so that it looks up the document with the highest id value and saves the new document with that id + 1. This creates overhead (one additional read for every write), so I discourage you from following this path. You could also end up with duplicate IDs: if you save two records at exactly the same time, both reads return the same last id (say 1) and both writes use 1 + 1 = 2, which is really bad. To avoid this issue you'd need to lock the entire collection on every insert, at a cost in performance.
2) You simply can't save more than one user with the same username (as the error message is telling you), and you already have a user called "admin".
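To illustrate the race described in point 1: the usual remedy is to hand out ids through a single atomic increment rather than reading the current maximum and adding one. The sketch below is an in-memory stand-in only. `SequenceCounter` is a hypothetical class, and the lock plays the role that an atomic increment on a counter document would play in the database:

```python
import threading


class SequenceCounter:
    """Hands out increasing integer ids, one named sequence per collection."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def next_id(self, name):
        # Read and increment happen under one lock, so two callers
        # can never observe the same "last id".
        with self._lock:
            value = self._values.get(name, 0) + 1
            self._values[name] = value
            return value


counter = SequenceCounter()
first = counter.next_id("post")
second = counter.next_id("post")
```

Only the counter is serialized here, not the whole collection, which is why this pattern costs less than locking every insert.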
