Relational DB - separate joined tables - python

Is there any way to join tables from a relational database and then separate them again?
I'm working on a project that involves modifying the data after it has been joined. Unfortunately, modifying the data before the join is not an option. I would then want to separate the data again according to the original schema.
I'm stuck at the separating part. I have metadata (python dictionary) with the information on the tables (primary keys, foreign keys, fields, etc.).
I'm working with Python, so a Python solution would be greatly appreciated, but an SQL solution would also help.
Edit: Perhaps the question was unclear. To summarize: I would like to create a new database with a schema identical to the old one, without making any modifications to the original database. The data that makes up the new database must first pass through a single table (the result of joining the old tables), because the operations that need to be run only produce the desired outcome when run on that single joined table, not on the individual tables.
I would like to know if this is possible and, if so, how can I achieve this?
Thanks!
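One possible approach in Python, sketched here with pandas (the table names, the metadata layout, and the join below are invented for illustration; adapt them to your actual schema), is to load the join into a single DataFrame, run the modifications on it, and then project each original table's columns back out, de-duplicating on that table's primary key:

import sqlite3
import pandas as pd

# Hypothetical metadata layout: table name -> primary key + field list.
metadata = {
    "customers": {"pk": ["customer_id"], "fields": ["customer_id", "name"]},
    "orders": {"pk": ["order_id"], "fields": ["order_id", "customer_id", "total"]},
}

src = sqlite3.connect("original.db")

# Join everything into one frame (here: orders joined to customers).
joined = pd.read_sql_query(
    "SELECT * FROM orders JOIN customers USING (customer_id)", src)

# ... run the single-table operations on `joined` here ...

# Separate again: take each table's columns, de-duplicate on its primary key,
# and write the result into a new database with the original schema.
dst = sqlite3.connect("new.db")
for table, meta in metadata.items():
    part = joined[meta["fields"]].drop_duplicates(subset=meta["pk"])
    part.to_sql(table, dst, index=False)
dst.close()
src.close()

This only round-trips cleanly if the modifications keep each table's rows internally consistent, i.e. two joined rows that share a primary key still agree on that table's other columns.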

Related

How can I have a database with thousands of tables with varying number of columns that are all of the same class in Django / SQLAlchemy ORM?

I have financial statement data on thousands of different companies. Some of the companies have data only for 2019, but for some I have decade-long data. Each company's financial statement has its own table, structured as follows (column headers in the first row):
lineitem  2019  2018  2017
2         1000   800   600
3206       700   300  -200
56          50   100   100
200       1200    90   700
This structure is preferred over a flatter lineitem-year-amount layout, since one query gives me the correct output structure for a financial statement table. lineitem is a foreign key linking to the primary key of a mapping table with over 10,000 records; 3206 can, for example, mean "Debt to credit institutions". I also have a companyIndex table which holds the company ID, company name, and table name.

I am able to get the data into the database and make queries using sqlite3 in Python, but advanced queries are somewhat of a challenge at times, not to mention that they can take a lot of time and not be very readable. I like the potential of using an ORM in Django or SQLAlchemy. However, the SQLAlchemy ORM seems to want me to know the name of the table I am about to create and how many columns it will have, but I don't know that in advance, since I have a script that parses a CSV data dump containing the company ID and financial statement data for however many years the company has operated. Also, one year later I will have to update each table with one additional year of data.
I have been watching and reading tutorials on Django and SQLAlchemy, but have not been able to try much out in practice due to this initial problem, which is a prerequisite for succeeding in my project. I have googled and googled, and checked Stack Overflow for a solution, but not found any solved questions (which is really surprising, since I always find the solution on here).
So how can I insert the data using Django/SQLAlchemy, given the structure I plan to have it fit into? How can I have the selected table(s) (based on company ID or company name) be objects in the ORM just like any other object, allowing me to select the data I want at the granularity level I want?
Ideally there is a solution to this in Django, but since I haven't found anything I suspect there is not or that how I have structured the database is insanity.
You cannot find a solution because there is none.
You are mixing the input data format with the table schema.
You establish an initial database table schema and then add data as rows to the tables.
You never touch the database table columns again, unless you decide that the schema has to be altered to support different, usually additional, functionality in the application; for example, because at a certain point in the application's lifetime new attributes become required for the data. You do not alter it because there is more data, which simply translates to new data rows in one or more tables.
So first you decide about a proper schema for database tables, based on the data records you will be reading or importing from somewhere.
Then you make sure the database is normalized to at least third normal form.
You really have to understand this. (I haven't read that particular reference closely, just skimmed it, but I assume it is correct.) This is fundamental database knowledge you cannot escape. After learning it properly, and with practice, it becomes second nature and you will apply the rules without even noticing.
Then your problems will vanish, and you can do what you want with whatever relational database or ORM you want to use.
The only remaining problem is that input data needs validation, and sometimes it is not given to us in the proper form. So the program, or an initial import procedure, or further data import operations, may need to give data some massaging before writing the proper data rows into the existing tables.
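To make that concrete, here is a minimal SQLAlchemy sketch of a normalized schema for the data in the question (table and column names are invented for illustration). Instead of one table per company with one column per year, there are three fixed tables, and both a new company and a new year of data are just new rows:

from sqlalchemy import create_engine, Column, Integer, String, Numeric, ForeignKey
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Company(Base):
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

class LineItem(Base):
    __tablename__ = "line_item"
    id = Column(Integer, primary_key=True)        # e.g. 3206
    description = Column(String, nullable=False)  # e.g. "Debt to credit institutions"

class StatementFact(Base):
    __tablename__ = "statement_fact"
    id = Column(Integer, primary_key=True)
    company_id = Column(Integer, ForeignKey("company.id"), nullable=False)
    line_item_id = Column(Integer, ForeignKey("line_item.id"), nullable=False)
    year = Column(Integer, nullable=False)
    amount = Column(Numeric)

    company = relationship(Company)
    line_item = relationship(LineItem)

engine = create_engine("sqlite:///statements.db")
Base.metadata.create_all(engine)

The wide lineitem-by-year view from the question can then be recovered with an ordinary pivot query whenever a financial statement needs to be displayed.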

Trying to merge identically structured SQLite databases - each contains 3 tables, 1 being a joining table for many-to-many relationships between the other 2

I am new to both python and SQLite.
I have used python to extract data from xlsx workbooks. Each workbook is one series of several sheets and is its own database, but I would also like a merged database of every series together. The structure is the same for all.
The structure of my database is:
* Table A: autoincrement primary key id, a logical variable, and 1 other variable.
* Table B: autoincrement primary key id, a logical variable, and 4 other variables.
* Table C: a joining table keyed by table A's id and table B's id (together the composite primary key), plus 4 other variables specific to each intersection of table A and table B.
I tried using the answer at
Sqlite merging databases into one, with unique values, preserving foregin key relation
along with various other ATTACH solutions, but each time I got an error message ("cannot ATTACH database within transaction").
Can anyone suggest why I can't get ATTACH to work?
I also tried a ToMerge like the one at How can I merge many SQLite databases?
and it couldn't attach the ToMerge database inside the transaction either.
I also initially tried connecting to the existing SQLite db, making dictionaries from the existing tables in python, then adding the information in the dictionaries into a new 'merged' db, but this actually seemed to be far slower than the original process of extracting everything from the xlsx files.
I realize I can easily just run my xlsx to SQL python script again and again for each series directing it all into the one big SQL database and that is my backup plan, but I want to learn how to do it the best, fastest way.
So, what is the best way for me to merge identically structured SQLite databases into one while maintaining my foreign keys?
TIA for any suggestions
:-)
You cannot execute the ATTACH statement from inside a transaction.
You did not start a transaction, but Python tried to be clever, got the type of your statement wrong, and automatically started a transaction for you.
Set connection.isolation_level = None.
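With that fix in place, a minimal sketch of the merge looks like this (file and table names are assumptions for illustration; isolation_level = None puts the connection in autocommit mode, so no implicit transaction wraps the ATTACH):

import sqlite3

conn = sqlite3.connect("merged.db")
conn.isolation_level = None  # autocommit: sqlite3 will not start a transaction for us

conn.execute("ATTACH DATABASE 'series1.db' AS src")

# Copy the two parent tables first, then the joining table, so that the
# foreign keys in table C can resolve.
conn.execute("INSERT INTO table_a SELECT * FROM src.table_a")
conn.execute("INSERT INTO table_b SELECT * FROM src.table_b")
conn.execute("INSERT INTO table_c SELECT * FROM src.table_c")

conn.execute("DETACH DATABASE src")
conn.close()

Note that if every source database restarts its autoincrement ids at 1, the ids (and the matching foreign keys in the joining table) must be offset before inserting, as the linked questions discuss.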

Python or SQL: Populating an excel form (multiple times and saving outputs) from another table

Problem: A customer has requested we fill out a form (Excel) for each item we provide them. Since we provide them a large number of parts, I would like to figure out a way to automate it as much as possible.
Idea: Create a table ('Data') with each part number and relevant information in the columns. Use Python to read 'Data' table, open blank customer form, populate blank customer form, and then save customer form.
Questions:
Can SQL accomplish this task as well? In relation to this task, I've only really created flat table outputs with SQL. Not really sure how this would work.
Recommended Python packages / documentation?
Similar example with code available? It just helps me learn when I can walk through something.
Any other ideas? Maybe I am tackling this issue the wrong way.
I am just unsure of my best path of action.
You could create a simple table on your SQL system (PostgreSQL, MySQL) so you can easily add and modify your items.
Then you can export your table to a CSV file (which Excel can open), for example with PostgreSQL's COPY:
Copy (Select * From foo) To '/tmp/test.csv' With CSV DELIMITER ',';
You can also do it with Python, but I think it's more complicated to update items with Python; with a SQL system you could create an HTML/PHP front-end page, making it more customizable.
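If the customer's form has to remain an .xlsx file with its own layout, a Python sketch with openpyxl (file names, cell addresses, and the query are placeholders for illustration) could fill and save one copy of the form per part:

import sqlite3
import openpyxl

# Read the 'Data' table; any DB-API connection would do here.
conn = sqlite3.connect("parts.db")
rows = conn.execute("SELECT part_number, description, weight FROM data").fetchall()
conn.close()

for part_number, description, weight in rows:
    # Open a fresh copy of the blank customer form for each part.
    wb = openpyxl.load_workbook("customer_form_template.xlsx")
    ws = wb.active

    # Cell addresses are placeholders; map them to the real form layout.
    ws["B2"] = part_number
    ws["B3"] = description
    ws["B4"] = weight

    wb.save(f"customer_form_{part_number}.xlsx")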

Is it possible to let users create and perform database migrations from a form?

Can you take form data and change database schema? Is it a good idea? Is there a downside to many migrations from a 'default' database?
I want users to be able to add / remove tables, columns, and rows. Making schema changes requires migrations, so adding that functionality would require writing a view that takes form data and feeds it into a function that then uses Flask-Migrate.
If I manage to build this, won't migrations generate a separate script, and everything that goes along with that, each time something is added or removed? Is that practical for something like this, where 10 or 20 new tables might be added to the starting database?
If I allow users to add columns to a table, the table's model class will have to be modified at runtime. Is that possible, or even a safe idea? If not, I'd appreciate it if someone could help me out and at least get me pointed in the right direction.
In a typical web application, the deployed database does not change its schema at runtime. The schema is only changed during an upgrade, and only the developers make these changes. Operations that users perform on the application can add, remove or modify rows, but not modify the tables or columns themselves.
If you need to offer your users a way to add flexible data structures, then you should design your database schema in a way that this is possible. For example, if you wanted your users to add custom key/value pairs, you could have a table with columns user_id, key_name and value. You may also want to investigate if a schema-less database fits your needs better.
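A minimal Flask-SQLAlchemy sketch of that key/value idea (the column names follow the answer's example; everything else is an assumption for illustration). Users "add a column" by adding a row, so the schema itself never migrates at runtime:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///app.db"
db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    attributes = db.relationship("UserAttribute", backref="user")

class UserAttribute(db.Model):
    # One custom key/value pair owned by a user; the schema never changes.
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey("user.id"), nullable=False)
    key_name = db.Column(db.String(100), nullable=False)
    value = db.Column(db.Text)

with app.app_context():
    db.create_all()
    # A user "adds a column" by adding a row, not by running a migration.
    u = User(attributes=[UserAttribute(key_name="favorite_color", value="green")])
    db.session.add(u)
    db.session.commit()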

Bigquery: how to preserve nested data in derived tables?

I have a few large hourly upload tables with RECORD fieldtypes. I want to pull select records out of those tables and put them in daily per-customer tables. The trouble I'm running into is that using QUERY to do this seems to flatten the data out.
Is there some way to preserve the nested RECORDs, or do I need to rethink my approach?
If it helps, I'm using the Python API.
It is now possible to preserve nested field structure in query results; use the flatten_results flag in the bq utility:
--[no]flatten_results: Whether to flatten nested and repeated fields in the result schema. If not set, the default behavior is to flatten.
API documentation:
https://developers.google.com/bigquery/docs/reference/v2/jobs#configuration.query.flattenResults
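A usage sketch with the bq command-line tool (dataset and table names are placeholders): to keep nested RECORDs, write the query results to a destination table with flattening disabled, which also requires allowing large results:

bq query \
  --destination_table=mydataset.daily_customer_table \
  --noflatten_results \
  --allow_large_results \
  'SELECT * FROM [mydataset.hourly_uploads] WHERE customer = "acme"'

The same setting is available through the Python API as the flattenResults field of the query job configuration documented at the link above.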
An earlier answer, from before flattenResults was available: unfortunately, there wasn't a way to do this at the time, since, as you realized, all results were flattened.
