Django Selective Dumpdata


Is it possible to selectively filter which records Django's dumpdata management command outputs? I have a few models, each with millions of rows, and I only want to dump the records in one model that match specific criteria, along with all records linked by foreign key to any of those records.
Consider this use-case. Say I had a production database where my User model has millions of records. I have several other models (Log, Transaction, Purchase, Bookmarks, etc) all referencing the User model. I want to do development on my Django app, and I want to test using realistic data. However, my production database is so enormous, I can't realistically take a snapshot of the entire thing and load it locally. So ideally, I'd want to use dumpdata to dump 50 random User records, and all related records to JSON, and use that to populate a development database.
Is there an easy way to accomplish this?

I think django-fixture-magic might be worth a look.
You'll find some additional background info in Scrubbing your Django database.

This snippet might be helpful for you (it follows relationships and serializes them):
http://djangosnippets.org/snippets/918/
You could also use that management command and override the default managers for whichever models should return custom querysets.

This isn't a simple answer to my question, but I found some interesting docs on Django's built-in natural keys feature, which would allow representing serialized records without the primary key. Unfortunately, it doesn't look like this is fully integrated into dumpdata, and there's an old outstanding ticket to fully rely on natural keys.
It also seems the serializers.serialize() function allows serialization of an arbitrary list of specific model instances.
Presumably, if I implemented a natural_key() method on all my models, and then called serializers.serialize("json", User.objects.filter(criteria)), it should come close to accomplishing what I want. I might have to write a function to crawl all the FK references, and include those in the list of objects passed to serialize().
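The crawl idea can be sketched without Django at all. This is only an illustration under stated assumptions: the tables are plain dictionaries, the labels app.user and app.purchase and the FOREIGN_KEYS mapping are made up, and the crawl goes a single level deep. The output follows the model/pk/fields layout that Django's JSON fixtures use.

```python
import json

# Hypothetical in-memory "tables": each row is a plain dict keyed by column name.
TABLES = {
    "app.user": [
        {"id": 1, "name": "ann"},
        {"id": 2, "name": "bob"},
        {"id": 3, "name": "cy"},
    ],
    "app.purchase": [
        {"id": 10, "user": 1, "item": "book"},
        {"id": 11, "user": 3, "item": "pen"},
        {"id": 12, "user": 1, "item": "lamp"},
    ],
}
# Which column of which table holds the user's primary key (assumed for illustration).
FOREIGN_KEYS = {"app.purchase": "user"}


def to_fixture(label, rows):
    """Convert rows to the model/pk/fields layout used by Django JSON fixtures."""
    return [
        {
            "model": label,
            "pk": row["id"],
            "fields": {key: value for key, value in row.items() if key != "id"},
        }
        for row in rows
    ]


def selective_dump(user_filter):
    """Dump the users accepted by user_filter plus every row referencing them."""
    users = [row for row in TABLES["app.user"] if user_filter(row)]
    wanted = {row["id"] for row in users}
    fixture = to_fixture("app.user", users)  # parents first, children after
    for label, fk_column in FOREIGN_KEYS.items():
        children = [row for row in TABLES[label] if row[fk_column] in wanted]
        fixture.extend(to_fixture(label, children))
    return fixture


dump = selective_dump(lambda row: row["name"] in {"ann", "bob"})
print(json.dumps(dump, indent=2))
```

In a real project the same shape would come from querysets rather than dictionaries, and a deeper crawl would repeat the child step for each newly collected set of primary keys.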

This is a very old question, but I recently wrote a custom management command to do just that. It looks very similar to the existing dumpdata command except that it takes some extra arguments to define how I want to filter the querysets and it overrides the get_objects function to perform the actual filtering:
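from itertools import chain

def get_objects(dump_attributes, dump_values):
    # `options` is assumed to be the parsed-arguments dict of the enclosing handle() method.
    qs_1 = ModelClass1.objects.filter(**options["filter_options_for_model_class_1"])
    qs_2 = ModelClass2.objects.filter(**options["filter_options_for_model_class_2"])
    # ...repeat for as many different model classes as you want to dump...
    yield from chain(qs_1, qs_2)  # pass any further querysets to chain() as well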

I had the same problem, but I didn't want to add another package, the snippet still didn't let me filter my data, and I only wanted a temporary solution.
So I thought to myself: why not override the default manager, apply my filter there, take the dump, and then revert my code? This is of course hacky and dangerous, but in my case it made sense.
Yes, I had to edit code with vim on the live server, but you don't need to reload the server: a command run through manage.py uses the code currently on disk, so from the end user's perspective the server remained untouched.
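from django.db import models
from django.db.models import Manager

class DahlBookManager(Manager):
    def get_queryset(self):
        # Temporary filter: only unedited rows are visible through this manager.
        return super().get_queryset().filter(is_edited=False)

class FriendshipQuestion(models.Model):
    # ...existing fields stay as they are...
    objects = DahlBookManager()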
Running the dumpdata command then did exactly what I needed, which in my case was returning all the unedited questions.
Then I ran git checkout mymodelfile.py to revert the file to the original.
This is by no means a good solution, but it will get somebody either fired or unstuck.

You can use dumpdata to dump a specific app and/or model (the examples here follow the Django 3.2 docs linked below; older versions support this too). For example, for an app named customer:
python manage.py dumpdata customer
or, to dump a model named shoppingcart within the customer app:
python manage.py dumpdata customer.shoppingcart
There are many options with dumpdata, including writing to several output file formats and handling custom managers on models. For example:
python manage.py dumpdata customer --all --indent 4 --output my_fixtures.json
The options:
--all: dumps the records even if you use a custom manager on the model
--indent: number of spaces to indent the output
--output: write the output to a file instead of stdout; the default format is JSON
See the docs at:
https://docs.djangoproject.com/en/3.2/ref/django-admin/#dumpdata
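For the selective case in the question, dumpdata also has a --pks option (not mentioned above) that limits the dump to a comma-separated list of primary keys and only works when a single model is named. Here is a minimal sketch of building such a command line from Python; the helper name dumpdata_argv and the file name carts.json are made up for illustration.

```python
import subprocess
import sys


def dumpdata_argv(target, pks=None, indent=4, output=None):
    """Build the argument list for `manage.py dumpdata` (hypothetical helper)."""
    argv = [sys.executable, "manage.py", "dumpdata", target, "--indent", str(indent)]
    if pks:
        # --pks takes a comma-separated list and requires exactly one model in `target`.
        argv += ["--pks", ",".join(str(pk) for pk in pks)]
    if output:
        argv += ["--output", output]
    return argv


argv = dumpdata_argv("customer.shoppingcart", pks=[3, 7, 42], output="carts.json")
print(" ".join(argv[1:]))

# To actually run it from the project directory:
# subprocess.run(argv, check=True)
```

The primary keys themselves would typically come from a filtered queryset, for example via values_list("pk", flat=True).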

Related

Django not using databases from settings.py

I'm a few weeks into Python/Django and encountering an annoying problem. I have some existing databases set up in settings.py, everything looks good, and I've even accessed the databases using connections[].cursor().
But the databases (and their data) are not making their way into the models I want to use, despite running the makemigrations and migrate commands. I was able to run py manage.py inspectdb --database=dbname and copied that class information manually into my models.py, but that didn't work either (running py manage.py inspectdb on its own does not pull up these databases; I could only see them with the --database option). So I'm stumped: it seems I'm doing all the right steps, but I'm not able to use these existing databases in Django.
Any other hints and steps I can take are welcome!

(Almost) all the tutorials, examples, and third-party apps you'll find on the internet, and most of the Django documentation, assume you use one database for your app. That's because using multiple databases in one app is fairly tricky and unusual.
But it's not impossible to use multiple databases and the documentation contains instructions on how to do this and what changes you'll need to make to make it work.
IMO, these are the pre-conditions to use multiple databases in one project:
The databases contain explicitly unrelated information, i.e. you won't have SQL relationships between tables in different databases. One database may contain a table with a column that maps to a column in a table in another database, but they aren't explicit (no ForeignKey or ManyToManyField in your models).
You don't need to mix databases in one query. This basically follows from the previous condition: if you need objects from one database that depend on rows from another database, you establish the relationship in Python, e.g. by fetching a list of names from one database and using that list to filter a queryset on the other database.
For example, suppose you have an existing database of Strava routes (regularly updated via some external mechanism), and your app is a broader one that helps users get to know their neighbourhood by recommending locations and things to do. Offering a list of routes with a nearby starting point might be something you'd want to show.
Now that you know this, the way to go is described in the doc linked above:
Create a database router so that queries for certain models are automatically routed to the correct database. E.g. Route.objects.filter(start_city=city) would automatically fetch routes from your Strava routes database.
If you need to save information about a route in your app, save it in a model in the default database and use a unique identifier of the route that maps to the Strava database. Use separate queries (no relationships) to fetch information about a specific route.
That being said, if the Strava database is not regularly updated through external channels and its only purpose is to pre-populate your default database, then export the data from the Strava database as JSON and import it into your Django database using manage.py loaddata or a migration file, the latter being more flexible about the structure of the JSON file.
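The router step described above can be sketched roughly as follows. A router is a plain Python class, so this runs without Django; the app label routes and the database alias strava are assumptions, and the SimpleNamespace objects at the end are only stand-ins for real model classes.

```python
from types import SimpleNamespace


class StravaRouter:
    """Send every model of one app to a second database (names are assumptions)."""

    app_label = "routes"  # hypothetical app that holds the Route model
    db_alias = "strava"   # hypothetical key in settings.DATABASES

    def db_for_read(self, model, **hints):
        return self.db_alias if model._meta.app_label == self.app_label else None

    def db_for_write(self, model, **hints):
        return self.db_alias if model._meta.app_label == self.app_label else None

    def allow_relation(self, obj1, obj2, **hints):
        # No opinion if neither object is ours; otherwise both must be ours.
        ours = {obj._meta.app_label == self.app_label for obj in (obj1, obj2)}
        return None if ours == {False} else len(ours) == 1

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # The external schema is managed elsewhere, so never migrate it from here.
        if app_label == self.app_label or db == self.db_alias:
            return False
        return None


# Stand-ins for model classes, just to exercise the routing decisions.
route = SimpleNamespace(_meta=SimpleNamespace(app_label="routes"))
place = SimpleNamespace(_meta=SimpleNamespace(app_label="places"))
router = StravaRouter()
print(router.db_for_read(route), router.db_for_read(place))
```

To activate a router like this, its dotted path goes into the DATABASE_ROUTERS list in settings.py. Returning None means "no opinion", which lets Django fall back to the default database.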

how to do migrations dynamically in django?

Is there any way to create an input field dynamically in a form without the manual work of first creating the field in the model, then running the makemigrations command, and then running the migrate command?
I have tried using a formset, but that is not what I am looking for.
refer to vtiger demo
username - admin
password - admin
When you open this link there is an option, ADD CUSTOM FIELD. I want to do the same with my Django app. I hope I was able to explain what I want to do. I have been searching for this for 3 days and have not been able to implement it.

You DO NOT (I repeat: "you DO NOT") want to "dynamically add fields" to a model (that is, to your database schema). You want your database schema to be stable, known, and totally under version control. If you don't get why, just ask yourself how your code could use a field that it's not even aware of (and that's only one of the oh so many reasons not to do such a thing).
"Features" like the one you mention are built using a fixed schema that is used to describe a "meta schema", where each "custom field" is actually a record in a "custom_fields" table, and then you usually have yet another table to store the matching values. This doesn't come without a lot of code complexity and a huge impact on performance at both the code AND the database level.
If this is a project requirement, you now at least have a first idea of how this is to be done. But if your point is just to avoid having to write code and run migrations, then well, you really want to think twice about it...
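To make the "meta schema" idea concrete, here is a minimal sketch using the standard library's sqlite3 instead of Django models. Apart from custom_fields, every table and column name (contact, custom_field_values, shoe_size) is made up for illustration, and all values are stored as text and converted on the way out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per user-defined field: this is the "meta schema".
    CREATE TABLE custom_fields (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE,
        field_type TEXT NOT NULL
    );
    -- One row per (record, custom field) pair, with the value stored as text.
    CREATE TABLE custom_field_values (
        contact_id INTEGER NOT NULL REFERENCES contact(id),
        field_id INTEGER NOT NULL REFERENCES custom_fields(id),
        value TEXT,
        PRIMARY KEY (contact_id, field_id)
    );
""")

conn.execute("INSERT INTO contact (id, name) VALUES (1, 'Ann')")
# "Adding a custom field" is an INSERT, not an ALTER TABLE or a migration.
conn.execute(
    "INSERT INTO custom_fields (id, label, field_type) VALUES (1, 'shoe_size', 'int')"
)
conn.execute("INSERT INTO custom_field_values VALUES (1, 1, '38')")


def custom_values(contact_id):
    """Return {label: converted value} for one contact; note the join this needs."""
    rows = conn.execute(
        "SELECT f.label, f.field_type, v.value FROM custom_field_values v "
        "JOIN custom_fields f ON f.id = v.field_id WHERE v.contact_id = ?",
        (contact_id,),
    )
    casts = {"int": int, "text": str}
    return {label: casts[field_type](value) for label, field_type, value in rows}


print(custom_values(1))
```

Even this tiny version shows the cost mentioned above: every read needs a join and a type conversion, and the database can no longer enforce types or constraints on the custom values.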

Django models with external DBs

I have a typical Django project with one primary database where I keep all the data I need.
Suppose there is another DB somewhere with some additional information. That DB isn't directly related to my Django project, so let's assume I don't even have control over it.
The problem is that I don't know whether I need to create and maintain a model for this external DB so I can use Django's ORM. Or maybe the best solution is to use raw SQL to fetch data from the external DB and then use this info to filter data from the primary DB using the ORM, or directly in views.
The solution of creating a model seems quite OK, but since the DB isn't part of my project, I won't be aware of possible schema changes, so it looks like bad practice.
So in the end if I have some external resources like DBs that are not related to but needed for my project should I:
Try to create django models for them
Use raw SQL to get info from external DB and then use it for filtering data from the primary DB with ORM as well as using data directly in views if needed
Use raw SQL both for a primary and an external DB where they intersect in app's logic

An alternative is to use SQLAlchemy for the external database. It can use reflection to generate the SQLAlchemy-equivalent of django models during runtime.
It still won't be without issues. If your code depends on a certain column, it would still break if that column is removed or changed in an incompatible way. However, it will add a bit more flexibility to your database interactions, e.g. a Django model would definitely break if an int column is changed to a varchar column, but using database reflection, it will only break if your code depends on the fact that it is an int. If you simply display the data or something, it will remain fully functional. However, there is always a chance that a change doesn't break the system, but causes unexpected behaviour.
If, like Benjamin said, the external system has an API, that would be the preferred choice.
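The point about reflection can be illustrated with a small sketch. Note that this is not SQLAlchemy: it uses the standard library's sqlite3 and reads column names from the cursor at runtime, as a stand-in for the same idea. The product table and its columns are made up.

```python
import sqlite3

# Stand-in for the external database, whose schema we do not control.
external = sqlite3.connect(":memory:")
external.execute("CREATE TABLE product (id INTEGER, code INTEGER, title TEXT)")
external.execute("INSERT INTO product VALUES (1, 1001, 'lamp')")


def fetch_rows(conn, table):
    """Read rows as dicts using whatever columns exist right now."""
    # NOTE: `table` is interpolated into the SQL, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


before = fetch_rows(external, "product")

# The owner later changes `code` from int to text and adds a column.
external.executescript("""
    ALTER TABLE product RENAME TO product_old;
    CREATE TABLE product (id INTEGER, code TEXT, title TEXT, colour TEXT);
    INSERT INTO product SELECT id, 'P-' || code, title, 'red' FROM product_old;
""")
after = fetch_rows(external, "product")
print(before)
print(after)
```

Code that merely displays the rows keeps working after the change, while code that does arithmetic on code would now fail, which is the trade-off described above.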

I suggest you read about inspectdb and database routers. It's possible to use the Django ORM to manipulate an external DB.
https://docs.djangoproject.com/en/1.7/ref/django-admin/#inspectdb

I would create minimal Django models for the external database, i.e. only those that interact with your code.
This has several consequences:
If parts of the database you're not interested in change, it won't have an impact on your app.
If the external models you're using change, you probably want to be aware of that as quickly as possible (your app is likely to break in that case too).
All the relational database queries in your code are handled by the same ORM.

Django Reusing models

I have a Django application with 30+ models. I want to write an application that can take a snapshot of the data in some of the models. I want to write the models once and reuse them in each application, so that I maintain them in one place; the only difference would be that when I call python manage.py syncdb, the same tables are created with different table prefixes.
Is there any way to do this?

This is exactly where the reusable app principle comes into play.
(as explained at the django website)

How to use a python class with data objects in mysql

I am beginning to learn Python and Django. If I have a simple "player" class with some properties, like name, points, and inventory, how would I make the class also write the values to the database when they change? My thinking is that I create Django data models and then call the .save method within my classes. Is this correct?

You are correct that you call the save() method to save models to your db, but you don't have to define the save method within your model classes if you don't want to. It would be extremely helpful to go through the Django tutorial, which explains all of this.
https://docs.djangoproject.com/en/dev/intro/tutorial01/
https://docs.djangoproject.com/en/dev/topics/db/models/
The second link explains Django models.
Django uses its own ORM (object-relational mapping).
This does exactly what it sounds like: it maps your Django/Python objects (models) to your backend.
It provides a sleek, intuitive, Pythonic, very easy-to-use interface for creating models (tables in your RDBMS), adding data, and retrieving data.
First you would define your model:
from django.db import models

class Player(models.Model):
    points = models.IntegerField()
    name = models.CharField(max_length=255)
Django provides commands for turning this Python object into a table:
python manage.py syncdb
You could also use python manage.py sql <appname> to show the actual SQL that Django generates to turn this object into a table. (On Django 1.7 and later, makemigrations, migrate, and sqlmigrate replace these two commands; syncdb and sql were removed in Django 1.9.)
Once you have storage for this object, you can create new instances the same way you would create any Python object:
new_player = Player(points=100, name='me')
new_player.save()
Calling save() actually writes the object to your backend.

You're spot on...
Start at https://docs.djangoproject.com/en/dev/intro/tutorial01/
Make sure you have the python bindings for MySQL and work your way through it... Then if you have specific problems, ask again...
