Use alembic migration or docker volumes to populate docker postgres database? - python

I believe this question already shows that I am new to docker and alembic. I am building a flask+sqlalchemy app using docker and postgres. So far I am not using alembic, but I am about to plug it in and some questions came up. I will have to create a pg_trgm extension and also populate one of the tables with data I already have. Until now I have only created brand new databases using sqlalchemy for the tests. So here is what I am thinking/doing:
1. To create the extension I could simply add a volume to the postgres docker service like: ./pg_dump.sql:/docker-entrypoint-initdb.d/pg_dump.sql. The extension does not depend on any specific db, so a simple "CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public;" would do it, right?
2. If I use the same strategy to populate the tables, I need a pg_dump.sql that creates the complete db and tables. To accomplish that I first created the brand new database with sqlalchemy, then used a script to populate the tables with data I have in a json file. I then generated the complete pg_dump.sql, and now I can place that complete .sql file in the docker service volume; when I run docker-compose, the postgres container will have the database ready to go.
3. Now that I am starting with alembic, I am thinking I could keep the pg_dump.sql just to create the extension, and have an alembic migration script populate the empty tables (dropping item 2 above).
Which is the better way: 2, 3, or neither? Thanks.

Create the extension in a /docker-entrypoint-initdb.d script (1). Load the data using your application's migration system (3).
Mechanically, one good reason to do this is that the database init scripts only run the very first time you create a database container on a given storage. If you add a column to a table and need to run migrations, the init-script sequence requires you to completely throw away and recreate the database.
Philosophically, I'd give you the same answer whether you were using Docker or something else. You could imagine running a database on a dedicated server, or using a cloud-hosted database. You'd have to ask your database administrator to install the extension for you, but they'd generally expect to give you credentials to an empty database and have you load the data yourself; or in a cloud setup you could imagine checking a "install this extension" checkbox in their console but there wouldn't be a way to load the data without connecting to the database remotely.
So, a migration system will work anywhere you have access to the database, and will allow incremental changes to the schema. The init script setup is Docker-specific and requires deleting the database to make any change.
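To make the migration route concrete, here is a minimal sketch of an Alembic data-seeding revision. The items table, its columns, and the revision identifiers are hypothetical placeholders; it also shows that the extension could be created here instead of the init script, provided the connecting role has sufficient privileges.

from alembic import op
import sqlalchemy as sa

# Hypothetical revision identifiers; alembic generates these for you.
revision = '0002_seed_items'
down_revision = '0001_create_tables'

def upgrade():
    # Optional: the extension can live in a migration rather than an
    # init script, as long as the database user may run CREATE EXTENSION.
    op.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public")

    # Lightweight table construct used only for the insert; in practice
    # you would read the rows from your existing json file.
    items = sa.table(
        'items',
        sa.column('id', sa.Integer),
        sa.column('name', sa.String),
    )
    op.bulk_insert(items, [
        {'id': 1, 'name': 'first'},
        {'id': 2, 'name': 'second'},
    ])

def downgrade():
    op.execute("DELETE FROM items")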

Related

Should an embedded SQLite DB used by CLI app be uploaded to version-control (Git)?

I'm working on a Python CLI app that has to manage some data in a sqlite db (creating, updating and deleting records). I want the users to be able to install the app and use it right away. So my question is: can I just upload an empty sqlite db to GitHub? Or should I upload only a schema file and build the db during installation? I suppose that if I go the second way, users would need sqlite pre-installed or the installation will fail. What I want is for them to just install the app, without worrying about dependencies and such.
When it comes to SQLite, my understanding is that it is generally used as an embedded DB, so users wouldn't need to have SQLite preinstalled. (It can be used as a standalone DB server, but it's mainly known for its ease of embedding: it simply just runs.) Without any effort, in the embedded form, the client itself will create the db.
Using SQLite is just a one-liner (with the standard-library sqlite3 module imported):
conn = sqlite3.connect('my.db')
or
conn = sqlite3.connect('/path/to/my.db')
Or even in-memory (as cache)
conn = sqlite3.connect(':memory:')
When this line runs, it creates a connection by either opening the file (if it exists) or creating the file (as an empty DB) if it is not present. In short, the SQLite library will always read the existing file or create it if it doesn't exist, so you always have a running DB out of the box. (The only times I can see it failing are if the db file is corrupt for some reason, or if the SQLite library cannot create the file in a given location due to permission issues.)
From a user perspective (or a developer perspective, for that matter), there is nothing that needs to be done to install SQLite. There are no external dependencies for an embedded DB and nothing to be preinstalled. It simply works. If other applications share this database, they just need to open the particular db file and that's it.
Therefore, coming back to your main question: the general best practice is that the application instantiates the database (whatever the DB is) on its first run by importing the schema and initial data from a file (SQL, CSV, JSON, XML, from code, etc.). The schema file can be maintained along with the application source in GitHub (or whatever VCS you use), or packaged with the binary in the distribution format (zip, tar, etc.). So in your case, the second approach you thought of is probably better. This is also good from a code-maintenance and review perspective.
It is best not to upload the "database" as a binary; rather, instantiate it on the first run and populate it with data.
If your sqlite db has some pre-built tables and records, you should upload it to version control so it can be used by your users. But if you need a clean db for each instance of your project, I suggest creating the db during the initialization process of your app.
Also, if your app needs some pre-existing data inside the db, one of the best practices is to put the data into a file like predata.json and, during initialization, create the db and import the data into it, as sketched below.
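A minimal sketch of that first-run pattern, assuming a schema.sql and a predata.json shipped alongside the app (the file names and the items table are illustrative):

import json
import os
import sqlite3

DB_PATH = 'my.db'              # illustrative paths; adjust to your app
SCHEMA_PATH = 'schema.sql'
PREDATA_PATH = 'predata.json'

def init_db():
    first_run = not os.path.exists(DB_PATH)
    conn = sqlite3.connect(DB_PATH)        # creates the file if missing
    if first_run:
        with open(SCHEMA_PATH) as f:
            conn.executescript(f.read())   # build the tables
        with open(PREDATA_PATH) as f:
            rows = json.load(f)            # e.g. [{"id": 1, "name": "x"}, ...]
        conn.executemany(
            'INSERT INTO items (id, name) VALUES (:id, :name)', rows)
        conn.commit()
    return conn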

Dockerized Django app not applying unique constraint

I have two issues, both of which are interrelated.
Issue #1
My app has an online Postgres Database that it is using to store data. Because this is a Dockerized app, migrations that I create no longer appear on my local host but are instead stored in the Docker container.
None of the questions I've seen so far seem to run into issues with making migrations and adding a unique constraint to one of the fields within the table.
I have written shell code to run a python script that prints the contents of the migrations file in the command prompt window. I was able to obtain the migrations file that was to be applied, and I added a row to the django_migrations table to record it. I then ran makemigrations and migrate, but it said there were no changes to apply (which leads me to believe the row I added should only have been created automatically by django after it detected the migrations on its own, rather than by me specifying the migrations file and asking it to make the changes). The issue is that the new migrations still detect the following change:
Migrations for 'mdp':
db4mdp/mdp/migrations/0012_testing.py
- Alter field mdp_name on languages
Despite detecting this apparent 'change', I get the following error,
return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "mdp_mdp_mdp_fullname_281e4228_uniq" already exists
I have already checked on my postgres server using pgadmin4 whether the constraint has actually been applied, and it has, under the name shown in the relation above. So why does Django still detect this as a change to be made? The thing is, if I now remove the new migrations file I created in my python directory, it will probably run (since the changes have 'apparently' been made in the database), but then I won't have the migrations file to keep track of the changes. I don't know if I need to keep the migrations around now that I'm using an online database, though. I will not be rolling back any changes I make, nor will I be making changes too often. This is just a one- or two-time thing, but I want to resolve the error.
Issue #2
The reason I used 'apparently' in the issue above is that even though the constraints section in my public schema shows the constraint has been applied, when I try to create a new entry in my table with a non-unique string in the field I've defined as unique, the creation succeeds anyway.
You never add anything manually to the django_migrations table. Let django do it. If it is not doing it, no matter what, your code is not production ready.
I understand that you are doing your development inside docker. When you do that, you mount a docker volume to a local volume. Since you have not mounted one, your migrations will not show up locally.
Refer to Volumes. It should resolve your issues.
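For instance, a bind mount of the app's migrations directory (the paths and image name here are illustrative) keeps generated migration files on the host:

docker run -v "$(pwd)/db4mdp/mdp/migrations:/app/db4mdp/mdp/migrations" <your-image>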
For anyone trying to find an alternate solution to this problem other than mounting volumes due to time constraints, this answer might help, but deosha's is still the correct way to go about it. I fixed the problem by deleting all my tables and the rows corresponding to migrations for my specific app (it isn't necessary to delete the auth tables etc., because you won't be deleting the rows corresponding to those in the django_migrations table). Following this, I used the following within the shell script called by my Dockerfile:
python manage.py makemigrations --name testing
python testing_migrations.py
It needs to be named for the next step. After this line of code I ran the python script testing_migrations which contains the following code:
import os

# Locate the migration file relative to the project root and print it,
# so its contents show up in the container output.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
migrations_file = os.path.join(BASE_DIR, '<path_to_new_migration_file>')
with open(migrations_file, 'r') as file:
    print(file.read())
Typically the first migration created for the app will be /0001_testing.py (which is why naming it was necessary earlier). Since the contents of this file are visible to the container while it is up and running, you will be able to print them. Then run the migrate command. This adds a row to the django_migrations table that makes it appear to django that the migration has been applied. However, on your local machine this migrations file doesn't exist. So copy the contents of the file from the print statement above, put them into a .py file with the same name, and save it in the app's migrations folder on your local device.
You can follow this method for all successive migrations by repeating the process and incrementing the number in the testing_migrations file as required.
Squashing migrations once you're done making the table will help. If you're doing this all in development and have no requirement to roll back the changes to the database schema, then just put this into production after deleting all the migrations files and the rows in your django_migrations table corresponding to your app, as was done initially. Drop your tables and let your first new migrations file re-create them, then import your data once again.
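For the squash step, Django ships a built-in command (the app label and target migration here follow the names used above):

python manage.py squashmigrations mdp 0012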
This is not the recommended method. Use deosha's if you're not on a time crunch.

Where Does Django Store Its Test Database

When using an SQLite database where does Django store the database that it uses when running tests?
Is there a way to define this path?
I would like to be able to manually look at the contents of the test database following each test.
The command python manage.py test --keepdb is supposed to keep the database, which it does, but I cannot seem to find where this database is stored.
The dev database is stored in the root of the project but the test database is not found there.
from the docs:
When using SQLite, the tests will use an in-memory database by default (i.e., the database will be created in memory, bypassing the filesystem entirely!). The TEST dictionary in DATABASES offers a number of settings to configure your test database. For example, if you want to use a different database name, specify NAME in the TEST dictionary for any given database in DATABASES.
for more info see:
https://docs.djangoproject.com/en/2.1/topics/testing/overview/#the-test-database
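So if you want the test database written to disk where you can inspect it after --keepdb, a minimal settings.py sketch looks like this (the file names are illustrative):

import os
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        'TEST': {
            # A real file path makes the test DB land on disk instead of
            # in memory, so --keepdb leaves a file you can open afterwards.
            'NAME': os.path.join(BASE_DIR, 'test_db.sqlite3'),
        },
    }
}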

How will django datadump overwrite existing records in database?

My production database currently contains 4k MyModels (loaded from the dev database last year). I started working on this project again. I now have 270k MyModels (including the original 4k MyModels). I want to export this new datadump to my production database. What will happen to the 4k MyModels already there (doing a simple dumpdata/loaddata)? How will the records be overwritten?
After you dump your data into a file, cd into the folder where you keep your dump file and do:
mysql -u root -p your_database_name < DumpDevDatabase.sql
NOTE:
Bear in mind that you would be creating a new database in production every time you want to dump your data into it, and that is a bad thing.
You should not do this; it should work the other way around. The production database needs to be isolated from things like this: you should dump data from production to your development database so you can work with the data.
In that case, when you dump data from production to development, you again need to create a new database to load the data into.
You can use tools like mysql workbench, or pgadmin if you use postgresql; they make it easier to work with your database.
I'm still not sure why you want to do it, but I strongly advise you not to overwrite your production database.
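For what it's worth, if you stay within Django's own tooling the overwrite behaviour is well defined: loaddata saves each serialized object under its original primary key, so existing rows with matching pks are overwritten in place and the rest are inserted. A sketch of the round trip (the app and file names are illustrative):

python manage.py dumpdata myapp.MyModel --output mymodels.json
python manage.py loaddata mymodels.json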

Flask-SQLAlchemy - When are the tables/databases created and destroyed?

I am a little confused with the topic alluded to in the title.
So, when a Flask app is started, does SQLAlchemy search the SQLALCHEMY_DATABASE_URI for the correct (in my case MySQL) database? Then, does it create the tables if they do not already exist?
What if the database that is configured in the SQLALCHEMY_DATABASE_URI variable in the config.py file does not exist?
What if that database exists, and only a few of the tables exist (there are more tables coded in the SQLAlchemy models than exist in the actual MySQL database)? Does it erase those tables and then create new tables with the current specs?
And what if those tables do all exist? Do they get erased and re-created?
I am trying to understand how the entire process works so that I (1) Don't lose database information when changes are made to the schema, and (2) can write the necessary code to completely manage how and when the SQLAlchemy talks to the actual Database.
Tables are not created automatically; you need to call the SQLAlchemy.create_all() method explicitly to have it create the tables for you:
db = SQLAlchemy(app)
db.create_all()
You can do this with a command-line utility, for example, or, if you deploy to a PaaS such as Google App Engine, from a dedicated admin-only view.
The same applies for database table destruction; use the SQLAlchemy.drop_all() method.
See the Creating and Dropping tables chapter of the documentation, or take a look at the database chapter of the Flask Mega-Tutorial.
You can also delegate this task to Flask-Migrate or a similar schema versioning tool. These help you record and edit schema creation and migration steps; the database schema of real-life projects is never static, and you will want to be able to move existing data between versions of the schema. Creating the initial schema is then just the first step.
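A rough sketch of wiring up Flask-Migrate (the database URI is illustrative):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql://user:password@localhost/mydb'

db = SQLAlchemy(app)
migrate = Migrate(app, db)  # registers the "flask db" CLI commands

After that, flask db init, flask db migrate and flask db upgrade generate and apply incremental schema changes instead of all-or-nothing create_all()/drop_all() calls.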
