SQLite, MySQLdb to open .db file conceptual Python process

I have a db file that I would like to open, process the data within and re-save as another db file to be inserted into a MySQLdb database.
I have read that the only way to open a db file is with SQLite. I'm working in Ubuntu 11.04.
I need to write the process code in Python.
What is the correct conceptual procedure to do this?

I would recommend SQLAlchemy for this type of problem. You can use it to (see the sketch after the notes below):
1. Open your SQLite3 DB and figure out the schema
2. Save that schema as a SQLAlchemy model
3. (Optional) Do any processing you like
4. Using the same SQLAlchemy model from 1, open a MySQL connection, create the tables and load the data
Note I - you can do all this with the django ORM too - but the SQLAlchemy route will allow you to have less redundant code and more future flexibility.
Note II - sqlautocode can help you with 1.
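For illustration, here is a minimal sketch of steps 1, 2 and 4 using SQLAlchemy's schema reflection (assuming SQLAlchemy 1.4+ and the mysqlclient driver; the file name, MySQL credentials and target database name are placeholders, not anything from the question):

from sqlalchemy import create_engine, MetaData

src_engine = create_engine('sqlite:///source.db')
dst_engine = create_engine('mysql+mysqldb://user:password@localhost/target_db')

# Steps 1-2: reflect the existing SQLite schema into a MetaData object.
metadata = MetaData()
metadata.reflect(bind=src_engine)

# Step 4: create the same tables on MySQL and copy the rows across.
metadata.create_all(bind=dst_engine)
with src_engine.connect() as src, dst_engine.begin() as dst:
    for table in metadata.sorted_tables:
        rows = [dict(row._mapping) for row in src.execute(table.select())]
        if rows:
            dst.execute(table.insert(), rows)

Any processing (step 3) can happen on the row dictionaries before they are inserted.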

Related

How do I use sqlalchemy to dump to a .sql file?

I've got some Python code that runs ~15 select queries using SQLAlchemy and gets the data. I'd like to dump all of these into a single .sql file so that I can later load this into a MySQL database. I can't find any docs on how to do this.

Should an embedded SQLite DB used by CLI app be uploaded to version-control (Git)?

I'm working on a Python CLI app that has to manage some data on a sqlite db (creating, updating and deleting records). I want the users to be able to install the app and use it right away. So my question is, can I just upload an empty sqlite db to GitHub? Or should I just upload a schema file and during installation build the db in a build step? I suppose if going the second way, users should have sqlite pre-installed or else the installation will fail. What I want is for them to just install the app, without worrying about dependencies and such.
When it comes to SQLite, my understanding is that SQLite is generally used as an embedded DB, so users wouldn't need to have SQLite preinstalled. (Of course, it can be used as a standalone DB server, but it's mainly known for its ease of embedding: it simply just runs.) In the embedded form, the client itself creates the db without any extra effort.
Using SQLite from Python is just a one-liner (the sqlite3 module ships with the standard library):
import sqlite3
conn = sqlite3.connect('my.db')
or
conn = sqlite3.connect('/path/to/my.db')
Or even in-memory (as cache)
conn = sqlite3.connect(':memory:')
When this line runs, it creates a connection by either opening the file (if it exists) or creating it as an empty DB if it is not present. In short, the SQLite library will always read the existing file or create it if it doesn't exist, so you will always have a running DB out of the box. (The only time I can see it failing is if the db file is corrupt for some reason, or the SQLite library cannot create the file in a location due to permission issues.)
From a user perspective (or developer perspective for that matter), there is nothing that needs to be done to install SQLite. There are no external dependencies for the embedded DB and nothing to preinstall. It simply works. If there are other applications that share this database, they just need to open the particular db file and that's it.
Therefore, coming back to your main question, the general best practice is that the application instantiates the database (whatever the DB is) on its first run by importing the SQL/schema (and initial data) file (SQL file, CSV, JSON, XML, from code, etc.). The SQL/schema file can be maintained along with the application source in GitHub (or whatever VCS) or packaged with the binary in the distribution format (zip, tar, etc.). So in your case, the second approach that you have thought of might be better. This is also good from a code maintenance and review perspective.
It is best not to upload the "database" as a binary, rather instantiate it on the first run and populate it with data.
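A minimal sketch of that first-run approach, assuming the schema ships with the app as a schema.sql file (the file names and the helper function are hypothetical):

import os
import sqlite3

DB_PATH = 'my.db'
SCHEMA_PATH = 'schema.sql'

def get_connection():
    first_run = not os.path.exists(DB_PATH)
    conn = sqlite3.connect(DB_PATH)  # creates the file if it is missing
    if first_run:
        with open(SCHEMA_PATH) as f:
            conn.executescript(f.read())  # create tables (and any seed data)
        conn.commit()
    return conn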
If your sqlite db has some pre-existing tables and records, you should upload it to version control so it can be used by the users. But if you need a clean db for each instance of your project, I suggest creating the db during the initialization process of your app.
Also, if your app needs some pre-loaded data inside the db, one of the best practices is to put the data into a file like predata.json and, during initialization, create the db and import the data into it.
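As a rough sketch of that predata.json idea (the file name and its layout, {"table": [{column: value, ...}, ...]}, are assumptions, not an established format):

import json
import sqlite3

def seed(conn, path='predata.json'):
    with open(path) as f:
        data = json.load(f)
    for table, rows in data.items():
        for row in rows:
            # table and column names come from your own seed file,
            # so building the statement by string formatting is acceptable here
            cols = ', '.join(row)
            placeholders = ', '.join('?' for _ in row)
            conn.execute(f'INSERT INTO {table} ({cols}) VALUES ({placeholders})',
                         list(row.values()))
    conn.commit()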

Importing data from multiple related tables in mySQL to SQLite3 or postgreSQL

I'm updating from an ancient language to Django. I want to keep the data from the old project in the new one.
But the old project is MySQL, and I'm currently using SQLite3 in dev mode. I've read that PostgreSQL is the most capable, though. So the first question is: is it better to set up PostgreSQL while in development, or is it an easy transition from SQLite3 to PostgreSQL?
And for the data in the old project: I am reworking the table structure from the old MySQL structure, since it has many related tables, and these relations are handled internally with ForeignKey and ManyToMany in SQLite3 (same in PostgreSQL I guess).
So I'm thinking about how to transfer the data. It's not really much data, maybe 3,000-5,000 rows.
The problem is that I don't want to keep the same table structure, so a direct import would be a terrible idea. I want the sweet functionality provided by SQLite3/PostgreSQL.
One idea I had was to join all the data and create a nested JSON object for each post, and then define which table each piece goes into so the relations are kept.
But this is just my guessing. So I'm asking you if there is a proper way to do this?
Thanks!
Better to create the Postgres database first, then write a Python script that takes the data from the MySQL database and imports it into the Postgres database.
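A rough sketch of that kind of script, run from a Django shell or management command so the ORM is configured (the MySQL credentials, the old table/column names and the Author/Post models are hypothetical placeholders for your own schema):

import MySQLdb
import MySQLdb.cursors
from myapp.models import Author, Post  # hypothetical Django models

def migrate():
    old = MySQLdb.connect(host='localhost', user='olduser', passwd='secret',
                          db='old_project', cursorclass=MySQLdb.cursors.DictCursor)
    cur = old.cursor()

    # Parents first, keeping a map from old primary keys to new ORM objects
    # so the relations can be re-linked in the new structure.
    cur.execute('SELECT id, name FROM authors')
    authors = {row['id']: Author.objects.create(name=row['name'])
               for row in cur.fetchall()}

    cur.execute('SELECT title, body, author_id FROM posts')
    for row in cur.fetchall():
        Post.objects.create(title=row['title'], body=row['body'],
                            author=authors[row['author_id']])

    old.close()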

Python ORM - save or read sql data from/to files

I'm completely new to managing data using databases so I hope my question is not too stupid but I did not find anything related using the title keywords...
I want to setup a SQL database to store computation results; these are performed using a python library. My idea was to use a python ORM like SQLAlchemy or peewee to store the results to a database.
However, the computations are done by several people on many different machines, including some that are not directly connected to the internet: it is therefore impossible to simply use one common database.
What would be useful to me would be a way of saving the data in the ORM's format to be able to read it again directly once I transfer the data to a machine where the main database can be accessed.
To summarize, I want to do:
On the 1st machine: Python data -> ORM object -> ORM.fileformat
After transfer on a connected machine: ORM.fileformat -> ORM object -> SQL database
Would anyone know if existing ORMs offer that kind of feature?
Is there a reason why some of the machines cannot be connected to the internet?
If you really can't connect them, what I would do is set up a database and the Python app on each machine where data is collected/generated. Have each machine use the app to store into its own local database, and then later you can create a dump of each database from each machine and import those results into one database.
Not the ideal solution, but it will work.
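A minimal sketch of that dump-and-merge flow using only the standard library (the file names are made up, and a real merge would still need to deal with duplicate CREATE TABLE statements and colliding primary keys):

import sqlite3

# On each collection machine: dump the local database to a portable .sql file.
src = sqlite3.connect('local.db')
with open('machine_a_dump.sql', 'w') as f:
    for line in src.iterdump():
        f.write(line + '\n')
src.close()

# Later, on the connected machine: replay the dump into the main database.
dst = sqlite3.connect('merged.db')
with open('machine_a_dump.sql') as f:
    dst.executescript(f.read())
dst.close()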
OK, thanks to MAhsan's and Padraic's answers I was able to figure out how this can be done: the CSV format is indeed easy to use for import/export from a database.
Here are examples for SQLAlchemy (import 1, import 2, and export) and peewee.
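As a rough sketch of that CSV round trip with SQLAlchemy 1.4+ (the table name 'results' is a placeholder, and values read back from CSV are strings, so some type conversion may still be needed on import):

import csv
from sqlalchemy import MetaData, Table, create_engine

def export_to_csv(db_url, path, table_name='results'):
    engine = create_engine(db_url)
    table = Table(table_name, MetaData(), autoload_with=engine)
    with engine.connect() as conn, open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([c.name for c in table.columns])  # header row
        for row in conn.execute(table.select()):
            writer.writerow(row)

def import_from_csv(db_url, path, table_name='results'):
    engine = create_engine(db_url)
    table = Table(table_name, MetaData(), autoload_with=engine)
    with engine.begin() as conn, open(path, newline='') as f:
        conn.execute(table.insert(), list(csv.DictReader(f)))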

Using odo to migrate data to SQL

I have a large 3 GB CSV file, and I'd like to use Blaze to investigate the data and select down to the data I'm interested in analyzing, with the eventual goal of migrating that data into a suitable computational backend such as SQLite, PostgreSQL, etc. I can get that data into Blaze and work on it fine, but this is the part I'm having trouble with:
db = odo(bdata, 'sqlite:///report.db::report')
I'm not sure how to properly create a db file to open with sqlite.
You can go directly from CSV to sqlite using the directions listed here.
http://odo.pydata.org/en/latest/perf.html?highlight=sqlite#csv-sqlite3-57m-31s
I think you are missing the column names as warned about here: http://odo.pydata.org/en/latest/sql.html?highlight=sqlite
from odo import odo, discover, resource

dshape = discover(resource('report_2015.csv'))
t = odo('report_2015.csv', 'sqlite:///report.db::report', dshape=dshape)
