Store array in MySQL - Python

I have a table in MySQL with restaurants, and I want to store an array of the categories each restaurant falls under. How should I do this, since MySQL doesn't have an array type? What I want is something like this:
id | name     | categories
---+----------+---------------------
 1 | chipotle | [mexican, fast_food]
How can I do this?

OK, here is a normalized layout with sample tables and foreign keys (FKs):
create table food
(
    id int auto_increment primary key,
    name varchar(100) not null
);
create table category
(
    id int auto_increment primary key,
    name varchar(100) not null
);
create table fc_junction
(   -- Food/Category junction table
    -- if a row exists here, then the food and category intersect
    id int auto_increment primary key,
    foodId int not null,
    catId int not null,
    -- the unique key below makes certain there are no duplicates for the combo
    -- duplicates = trash
    unique key uk_blahblah (foodId, catId),
    -- Below are the foreign key (FK) constraints, part of Referential Integrity (RI).
    -- A row cannot be inserted or updated here with a foodId or catId that does not
    -- exist in the parent tables, and the parent rows (in food and category) cannot
    -- be deleted while children (rows here in fc_junction) still reference them.
    CONSTRAINT fc_food FOREIGN KEY (foodId) REFERENCES food(id),
    CONSTRAINT fc_cat FOREIGN KEY (catId) REFERENCES category(id)
);
So you are free to add foods and categories and hook them up later via the junction table. You can create chipotle, burritos, hotdogs, lemonade, etc. And in this model (the generally accepted way; don't do it any other way), you do not need to know which categories the foods are in until whenever you feel like it.
In the original comma-separated way (a.k.a. the wrong way), you have zero RI and you can bet there will be no use of fast indexes. Plus getting to your data, modifying it, deleting a category, adding one: all of that is a kludge, and there is much snarling and gnashing of teeth.
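To round this out from the Python side, here is a minimal sketch of inserting a food with its categories and reading the categories back as a list through the junction table. It assumes the mysql-connector-python package and the three tables above; the connection credentials are placeholders.
import mysql.connector

conn = mysql.connector.connect(user="app", password="secret", database="test")  # placeholder credentials
cur = conn.cursor()

# Insert a food and its categories, then hook them up in the junction table.
cur.execute("INSERT INTO food (name) VALUES (%s)", ("chipotle",))
food_id = cur.lastrowid
for cat in ("mexican", "fast_food"):
    cur.execute("INSERT INTO category (name) VALUES (%s)", (cat,))
    cur.execute("INSERT INTO fc_junction (foodId, catId) VALUES (%s, %s)",
                (food_id, cur.lastrowid))
conn.commit()

# Read the categories back as a Python list for one food.
cur.execute("""
    SELECT c.name
    FROM fc_junction j
    JOIN category c ON c.id = j.catId
    WHERE j.foodId = %s
""", (food_id,))
print([row[0] for row in cur.fetchall()])  # ['mexican', 'fast_food']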

See How to store arrays in MySQL?; you need to create a separate table and use a join.
Or you can use Postgres, which has native array types: http://www.postgresql.org/docs/9.4/static/arrays.html.
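If you do go the PostgreSQL route, psycopg2 adapts Python lists to array columns out of the box. A small sketch, assuming a local PostgreSQL server and the psycopg2 package; the table and column names are only illustrative.
import psycopg2

conn = psycopg2.connect(dbname="test")  # assumed local connection; adjust to your setup
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS restaurant (
        id serial PRIMARY KEY,
        name text NOT NULL,
        categories text[] NOT NULL
    )
""")
# psycopg2 converts the Python list into a PostgreSQL text[] value.
cur.execute("INSERT INTO restaurant (name, categories) VALUES (%s, %s)",
            ("chipotle", ["mexican", "fast_food"]))
cur.execute("SELECT name, categories FROM restaurant WHERE %s = ANY(categories)",
            ("mexican",))
print(cur.fetchall())  # [('chipotle', ['mexican', 'fast_food'])]
conn.commit()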

Related

Can we do auto-increment strings in SQLite3?

Can we do auto-increment strings in SQLite3? If not, how can we do that?
Example:
RY001
RY002
...
With Python, I can do it easily with print("RY" + str(rowid + 1)), but what about its performance?
Thank you
If your version of SQLite is 3.31.0+ you can have a generated column, stored or virtual:
CREATE TABLE tablename(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    str_id TEXT GENERATED ALWAYS AS (printf('RY%03d', id)),
    <other columns>
);
The column id is declared as the primary key of the table, and AUTOINCREMENT makes sure that an id value freed by a deletion is never reused.
The column str_id is generated for each new row as the concatenation of 'RY' and the value of id, left-padded with zeros.
As it is, str_id will be VIRTUAL, meaning it is computed every time you query the table.
If you add STORED to its definition:
str_id TEXT GENERATED ALWAYS AS (printf('RY%03d', id)) STORED
it will be stored in the table.
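A quick way to try this from Python (a sketch; it assumes your Python build ships SQLite 3.31.0 or newer, and the table name is only illustrative):
import sqlite3

print(sqlite3.sqlite_version)  # needs to be 3.31.0+ for generated columns
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        str_id TEXT GENERATED ALWAYS AS (printf('RY%03d', id))
    )
""")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("first",), ("second",)])
print(conn.execute("SELECT str_id, name FROM item").fetchall())
# [('RY001', 'first'), ('RY002', 'second')]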
Something like this?
select printf("RY%03d", rowid) as "id", *
from myTable

How to insert Python/Pandas data into a normalized database

Say I have a Pandas data frame with records such as:
Time   Action      User     Company   User2
--------------------------------------------
00:02  buy share   msmith   ACME      tjones
00:03  sell share  tjones   Alpha     msmith
...
and I have a database with tables:
ActionType (ID INT IDENTITY(1,1), Name VARCHAR)
Users (ID INT IDENTITY(1,1), Username VARCHAR, CompanyID INT FOREIGN KEY)
Companies (ID INT IDENTITY(1,1), CompanyName VARCHAR)
Events (ID INT IDENTITY(1,1), ActionID INT FOREIGN KEY, UserID INT FOREIGN KEY, CompanyID INT FOREIGN KEY, User2ID INT FOREIGN KEY)
I want to insert the data frame into the events table, but I want it to store the associated ID for each column, rather than the raw text. Is there a way to easily do that through SQLAlchemy (or other RDBMS or ORM packages), or do I need to go row by row and set variables such as
userid = session.query(Users).filter(Users.Username == df.User)
Alternatively, is the best way to handle this through the database? I could accomplish it by inserting the raw pandas data directly into a "staging" table, and then splitting the data points out into their respective tables using SQL.
That seems doable; I'm just looking to see whether there is a more efficient solution in Python.
Bonus (possibly separate) question: how would I go about entering a new value into the lookup tables when it is encountered (i.e. df.User is not in the Users table, so I want to INSERT INTO Users VALUES ...)?
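One common pattern that avoids the row-by-row lookups (a sketch only; the engine URL is a placeholder, df is the data frame from the question, and table/column names follow the schema above) is to read the lookup tables into pandas, map the text columns to their IDs, and bulk-insert the result:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@dsn")  # placeholder connection URL

# Build Series that map names to their surrogate keys.
users = pd.read_sql("SELECT ID, Username FROM Users", engine).set_index("Username")["ID"]
actions = pd.read_sql("SELECT ID, Name FROM ActionType", engine).set_index("Name")["ID"]
companies = pd.read_sql("SELECT ID, CompanyName FROM Companies", engine).set_index("CompanyName")["ID"]

# Translate the raw text columns into foreign-key values and append to Events.
events = pd.DataFrame({
    "ActionID": df["Action"].map(actions),
    "UserID": df["User"].map(users),
    "CompanyID": df["Company"].map(companies),
    "User2ID": df["User2"].map(users),
})
events.to_sql("Events", engine, if_exists="append", index=False)
Names that are not yet in the lookup tables map to NaN, which is also a convenient way to find the values that need to be inserted first (the bonus question).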

Replace integer id field with uuid

I need to replace the default integer id in my model with a UUID. The problem is that it's being used in another model (as a foreign key).
Any idea on how to perform this operation without losing data?
class A(Base):
    __tablename__ = 'a'
    b_id = Column(
        GUID(), ForeignKey('b.id'), nullable=False,
        server_default=text("uuid_generate_v4()")
    )

class B(Base):
    __tablename__ = 'b'
    id = Column(
        GUID(), primary_key=True,
        server_default=text("uuid_generate_v4()")
    )
Unfortunately it doesn't work, and I'm also afraid I'll break the relation.
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) default for column "id" cannot be cast automatically to type uuid
The Alembic migration I've tried looks similar to:
op.execute('ALTER TABLE a ALTER COLUMN b_id SET DATA TYPE UUID USING (uuid_generate_v4())')
Add an id_tmp column to b with autogenerated UUID values, and a b_id_tmp column to a. Update a joining b on the foreign key to fill a.b_id_tmp with the corresponding UUIDs. Then drop a.b_id and b.id, rename the added columns, and reestablish the primary key and foreign key.
CREATE TABLE a(id int PRIMARY KEY, b_id int);
CREATE TABLE b(id int PRIMARY KEY);
ALTER TABLE a ADD CONSTRAINT a_b_id_fkey FOREIGN KEY(b_id) REFERENCES b(id);
INSERT INTO b VALUES (1), (2), (3);
INSERT INTO a VALUES (1, 1), (2, 2), (3, 2);
ALTER TABLE b ADD COLUMN id_tmp UUID NOT NULL DEFAULT uuid_generate_v1mc();
ALTER TABLE a ADD COLUMN b_id_tmp UUID;
UPDATE a SET b_id_tmp = b.id_tmp FROM b WHERE b.id = a.b_id;
ALTER TABLE a DROP COLUMN b_id;
ALTER TABLE a RENAME COLUMN b_id_tmp TO b_id;
ALTER TABLE b DROP COLUMN id;
ALTER TABLE b RENAME COLUMN id_tmp TO id;
ALTER TABLE b ADD PRIMARY KEY (id);
ALTER TABLE a ADD CONSTRAINT b_id_fkey FOREIGN KEY(b_id) REFERENCES b(id);
Just as an aside, it's more efficient to index v1 UUIDs than v4 since they contain some reproducible information, which you'll notice if you generate several in a row. That's a minor savings unless you need the higher randomness for external security reasons.
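Since the question mentions Alembic, the same steps can be expressed in a migration with op.execute() calls; this is just the SQL above wrapped in an upgrade(), not a complete tested migration:
from alembic import op

def upgrade():
    op.execute("ALTER TABLE b ADD COLUMN id_tmp UUID NOT NULL DEFAULT uuid_generate_v1mc()")
    op.execute("ALTER TABLE a ADD COLUMN b_id_tmp UUID")
    op.execute("UPDATE a SET b_id_tmp = b.id_tmp FROM b WHERE b.id = a.b_id")
    op.execute("ALTER TABLE a DROP COLUMN b_id")
    op.execute("ALTER TABLE a RENAME COLUMN b_id_tmp TO b_id")
    op.execute("ALTER TABLE b DROP COLUMN id")
    op.execute("ALTER TABLE b RENAME COLUMN id_tmp TO id")
    op.execute("ALTER TABLE b ADD PRIMARY KEY (id)")
    op.execute("ALTER TABLE a ADD CONSTRAINT b_id_fkey FOREIGN KEY(b_id) REFERENCES b(id)")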

select all ids and insert missing data

I have a database where I store some values with an auto-generated index key. I also have an n:m mapping table like this:
create table data(id int not null identity(1,1), col1 int not null, col2 varchar(256) not null);
create table otherdata(id int not null identity(1,1), value varchar(256) not null);
create table data_map(dataid int not null, otherdataid int not null);
Every day the data table needs to be updated with a list of new values; a lot of them are already present but still need to be inserted into data_map (the key in otherdata is generated at that point, so in that table the data is always new).
One way of doing it would be to first try to insert all values, then select the generated ids, then insert into data_map:
mydata = [] # list of tuples
cursor.executemany("if not exists (select * from data where col1 = %d and col2 = %d) insert into data (col1, col2) values (%d, %d)", mydata);
# now select the id's
# [...]
But that is obviously quite bad, because I need to do the selects and the existence checks without using the key, so I need the data indexed first; otherwise everything is very slow.
My next approach was to use a hash function (like MD5 or CRC64) to generate my own hash over col1 and col2, so that I can insert all values without a select and use the indexed key when inserting the missing values.
Can this be optimized, or is this the best I can do?
The number of rows is >500k per change, and maybe ~20-50% of them will already be in the database.
Timing-wise, it looks like calculating the hashes is much faster than inserting the data into the database.
As far as I can tell, you are using mysql.connector. If so, when you run cursor.execute() you should not use %d placeholders. Everything should be just %s, and the connector will take care of the type conversions.
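For example, the executemany() call from the question rewritten with %s placeholders for every parameter (a sketch reusing the cursor from the question; the sample tuples are made up, and INSERT ... SELECT ... WHERE NOT EXISTS is used as a MySQL-friendly replacement for the IF NOT EXISTS form):
mydata = [(1, "abc", 1, "abc"), (2, "def", 2, "def")]  # (col1, col2, col1, col2) per row
cursor.executemany(
    "INSERT INTO data (col1, col2) "
    "SELECT %s, %s FROM DUAL "
    "WHERE NOT EXISTS (SELECT 1 FROM data WHERE col1 = %s AND col2 = %s)",
    mydata,
)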

sqlalchemy create a foreign key?

I have a composite PK in table Strings (integer id, varchar(2) lang)
I want to create a FK to ONLY the id half of the PK from other tables. This means I'd potentially have many rows in the Strings table (translations) matching the FK. I just need to store the id and have referential integrity maintained by the DB.
Is this possible? If so, how?
This is from Wikipedia:
The columns in the referencing table must be the primary key or other candidate key in the referenced table. The values in one row of the referencing columns must occur in a single row in the referenced table.
Let's say you have this:
id | var
1 | 10
1 | 11
2 | 10
The foreign key must reference exactly one row from the referenced table. This is why usually it references the primary key.
In your case you need to make another table, Table1(id), where you store the ids, and make that column unique or the primary key. The id column in your current table is not unique, so you can't reference it directly in your situation. So you make Table1 with id as its primary key and make the id in your current table a foreign key to Table1. Now other tables can create foreign keys to id in Table1, and the composite primary key in your current table stays as it is.
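In SQLAlchemy, that layout could look roughly like this (a sketch; the class and table names are illustrative, and declarative_base is imported from sqlalchemy.orm as in SQLAlchemy 1.4+):
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class StringId(Base):
    # The "Table1(id)" from the answer: one row per string id, usable as an FK target.
    __tablename__ = 'string_id'
    id = Column(Integer, primary_key=True)

class Strings(Base):
    # Composite PK (id, lang); id also references string_id so RI is preserved.
    __tablename__ = 'strings'
    id = Column(Integer, ForeignKey('string_id.id'), primary_key=True)
    lang = Column(String(2), primary_key=True)
    text = Column(String(256))

class Other(Base):
    # Any other table can now reference the id half alone.
    __tablename__ = 'other'
    id = Column(Integer, primary_key=True)
    string_id = Column(Integer, ForeignKey('string_id.id'), nullable=False)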
