SQLAlchemy many-to-one array response - python

I'm working with SQLAlchemy and Flask. I have a content table like this:
-----------------------------------------
| id | title         | description      |
-----------------------------------------
| 1  | example       | my content       |
| 2  | another piece | my other content |
-----------------------------------------
And a status table like this:
--------------------------------------------
| id | content_id | status type | date     |
--------------------------------------------
| 1  | 1          | written     | 1/5/2020 |
| 2  | 1          | edited      | 1/7/2020 |
--------------------------------------------
I want to query the database and get each content item with all of its statuses in a single row, instead of getting the content row repeated once per status. For example, I want:
-----------------------------------------
| id | title   | description | statuses |
-----------------------------------------
| 1  | example | my content  | [1,2]    |
-----------------------------------------
Is there a way to do this with sqlalchemy?

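In MySQL, you can collapse the status IDs of each content row into a single column with GROUP_CONCAT:
SELECT b.*,
       GROUP_CONCAT(a.id) AS statuses
FROM content_table b
JOIN status_table a ON a.content_id = b.id
GROUP BY b.id;
Note that GROUP_CONCAT returns a comma-separated string such as "1,2" rather than an array, and that MySQL by default rejects a space between the function name and the opening parenthesis.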
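Since the question asks about SQLAlchemy, here is a minimal sketch of the same aggregation through SQLAlchemy's func object, which renders any SQL function by name (func.group_concat becomes group_concat(...)). It assumes SQLAlchemy 1.4 or later and uses an in-memory SQLite database, which also provides group_concat; the Content and Status models and the helper names are hypothetical reconstructions of the two tables. The string returned by the database is split into a Python list afterwards.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


# Hypothetical models mirroring the two tables in the question.
class Content(Base):
    __tablename__ = "content_table"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    description = Column(String)


class Status(Base):
    __tablename__ = "status_table"
    id = Column(Integer, primary_key=True)
    content_id = Column(Integer, ForeignKey("content_table.id"))
    status_type = Column(String)
    date = Column(String)


def contents_with_statuses(session):
    """One dict per content row, with the ids of its statuses collected in a list."""
    stmt = (
        select(
            Content.id,
            Content.title,
            Content.description,
            # renders as group_concat(status_table.id); MySQL and SQLite both provide it
            func.group_concat(Status.id).label("statuses"),
        )
        .join(Status, Status.content_id == Content.id)
        .group_by(Content.id, Content.title, Content.description)
        .order_by(Content.id)
    )
    return [
        {
            "id": content_id,
            "title": title,
            "description": description,
            # the database returns a string such as "1,2"; turn it into a sorted list of ints
            "statuses": sorted(int(s) for s in statuses.split(",")),
        }
        for content_id, title, description, statuses in session.execute(stmt)
    ]


def build_demo():
    """Load the sample rows from the question into an in-memory SQLite database."""
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = Session(engine)
    session.add_all([
        Content(id=1, title="example", description="my content"),
        Content(id=2, title="another piece", description="my other content"),
        Status(id=1, content_id=1, status_type="written", date="1/5/2020"),
        Status(id=2, content_id=1, status_type="edited", date="1/7/2020"),
    ])
    session.commit()
    return session


if __name__ == "__main__":
    print(contents_with_statuses(build_demo()))
```

On PostgreSQL, func.array_agg(Status.id) would return an actual array instead of a comma-separated string.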

Related

Reference a Many-To-Many row

I am designing a database in Flask connected to PostgreSQL. I have two tables, Reservation and Device, which are related many-to-many through the association table ReservationItem:
| Reservation | | Device | | ReservationItem |
| ----------- | | ------ | | --------------- |
| id_res      | | id_dev | | res_id (FK/PK)  |
| etc...      | | etc... | | dev_id (FK/PK)  |
|             | |        | | created_at      |
|             | |        | | status          |
Here dev_id and res_id are foreign keys and together form the composite primary key of the table. The columns created_at and status were originally meant to track the history of each Reservation-Device status.
Example
Someone reserves 3 devices (with id_dev 1, 2 and 3) on 1 January 2021, so I create 1 Reservation entry (id_res 1) and 3 ReservationItem entries with status "booked".
ReservationItem
| --------------------------------------|
| res_id | dev_id | created_at | status |
| ------------------------------------- |
| 1 | 1 | 2021-01-01 | booked |
| 1 | 2 | 2021-01-01 | booked |
| 1 | 3 | 2021-01-01 | booked |
On 2 January the client returns the device with id_dev = 1, so I would create a fourth entry in the ReservationItem table in which only created_at and status change, so that I can track where the devices are.
| --------------------------------------- |
| res_id | dev_id | created_at | status |
| --------------------------------------- |
| 1 | 1 | 2021-01-01 | booked |
| ... | ... | ... | ... |
| 1 | 1 | 2021-01-02 | returned |
This violates the uniqueness of the composite primary key (res_id, dev_id).
So I thought: should I create another table, say History, to track these updates?
These would be my new models:
| ReservationItem | | History |
| --------------- | | ------------- |
| id_assoc (PK) | | id_hist (PK) |
| res_id (FK) | | assoc_id (FK) |
| dev_id (FK) | | created_at |
| | | status |
I would change the ReservationItem table so that res_id and dev_id are no longer primary keys. I would move created_at and status into the History table and add the column id_assoc as the primary key, so that History can reference it.
I've been looking around, and it seems that using a single column as the primary key of a many-to-many association table is not ideal.
How would you design the relationships otherwise?
Is there any tool that Flask offers?
EDIT
After reading this post, which suggests auditing database tables and writing logs to track changed entries (or operations on the database), I found this article on implementing audit logs in Flask. But why wouldn't my solution work (or, let's say, why isn't it ideal)?
Thank you!
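To make the proposed design concrete, here is a minimal sketch of it in SQL, run through Python's built-in sqlite3 for portability (the question targets PostgreSQL, where the DDL would be similar). Table and column names follow the question; the UNIQUE (res_id, dev_id) constraint is an assumption, added to show that a surrogate primary key does not have to give up the uniqueness of the pair. The latest_status and build_demo helpers are hypothetical.

```python
import sqlite3

# Proposed design: surrogate key on the association table, history in its own table.
# The UNIQUE constraint is an assumption, not part of the original question.
SCHEMA = """
CREATE TABLE reservation (id_res INTEGER PRIMARY KEY);
CREATE TABLE device (id_dev INTEGER PRIMARY KEY);
CREATE TABLE reservation_item (
    id_assoc INTEGER PRIMARY KEY,
    res_id   INTEGER NOT NULL REFERENCES reservation (id_res),
    dev_id   INTEGER NOT NULL REFERENCES device (id_dev),
    UNIQUE (res_id, dev_id)  -- the pair stays unique even though it is no longer the PK
);
CREATE TABLE history (
    id_hist    INTEGER PRIMARY KEY,
    assoc_id   INTEGER NOT NULL REFERENCES reservation_item (id_assoc),
    created_at TEXT NOT NULL,
    status     TEXT NOT NULL
);
"""


def latest_status(conn, res_id):
    """Return {dev_id: most recent status} for every device in one reservation."""
    rows = conn.execute(
        """
        SELECT ri.dev_id, h.status
        FROM reservation_item AS ri
        JOIN history AS h ON h.assoc_id = ri.id_assoc
        WHERE ri.res_id = ?
          AND h.id_hist = (
              -- newest history row for this item; id_hist breaks same-day ties
              SELECT h2.id_hist FROM history AS h2
              WHERE h2.assoc_id = ri.id_assoc
              ORDER BY h2.created_at DESC, h2.id_hist DESC
              LIMIT 1)
        """,
        (res_id,),
    ).fetchall()
    return dict(rows)


def build_demo():
    """Load the example from the question: 3 devices booked, then device 1 returned."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO reservation (id_res) VALUES (1)")
    conn.executemany("INSERT INTO device (id_dev) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO reservation_item (id_assoc, res_id, dev_id) VALUES (?, 1, ?)",
        [(1, 1), (2, 2), (3, 3)],
    )
    conn.executemany(
        "INSERT INTO history (assoc_id, created_at, status) VALUES (?, ?, ?)",
        [
            (1, "2021-01-01", "booked"),
            (2, "2021-01-01", "booked"),
            (3, "2021-01-01", "booked"),
            (1, "2021-01-02", "returned"),  # a new history row, not a new item row
        ],
    )
    return conn


if __name__ == "__main__":
    # device 1 -> 'returned', devices 2 and 3 -> 'booked'
    print(latest_status(build_demo(), 1))
```

With this layout, returning device 1 adds a History row instead of a second ReservationItem row, so the pair (res_id, dev_id) is never duplicated.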

Extracting Queryset of latest records based on filtered columns in Django

EDIT
I searched Stack Overflow and the Django documentation for a solution to my problem, but had no luck. My models:
from django.db import models

class User(models.Model):
    name = models.CharField(max_length=10)
    type = models.CharField(max_length=10)

class Userlog(models.Model):
    # on_delete is required from Django 2.0 onward
    user = models.ForeignKey('User', related_name='userlog', on_delete=models.CASCADE)
    pc_type = models.CharField(max_length=10)
    login_date = models.DateField()
The tables below are a very simplified version of my SQLite tables in the Django app:
Table User:
+----+------+---------+
| ID | User | Type |
+----+------+---------+
| 1 | A | Admin |
| 2 | B | Admin |
| 3 | C | User |
+----+------+---------+
Table Userlog:
+----+---------+------------+------------+
| ID | user | pc_type | login_date |
+----+---------+------------+------------+
| 1 | A | Desktop | 2017/01/01 |
| 2 | A | Server | 2017/01/05 |
| 3 | B | Desktop | 2017/01/11 |
| 4 | A | Server | 2017/02/03 |
| 5 | C | Desktop | 2017/02/09 |
| 6 | B | Server | 2017/02/21 |
| 7 | A | Desktop | 2017/03/18 |
| 8 | C | Desktop | 2017/03/31 |
+----+---------+------------+------------+
I tried different approaches, such as:
q = Userlog.objects.values('login_date').annotate(
    last_date=Max('login_date')
).filter(pc_type='Desktop', user='Admin', login_date=F('last_date'))
but I can't extract the latest dates for the filtered columns.
I need Django QuerySet expression to get the result below:
(Latest login dates of Admins logged using Desktop)
+----+---------+------------+------------+
| ID | User_ID | pc_type | login_date |
+----+---------+------------+------------+
| 3 | B | Desktop | 2017/01/11 |
| 7 | A | Desktop | 2017/03/18 |
+----+---------+------------+------------+
I found a similar question, but I want to use Django expressions.
You can read more about annotate() in the Django documentation. In your case, try:
from django.db.models import Max

User.objects \
    .filter(type='Admin', user_log__pc_type='Desktop') \
    .annotate(last_login_date=Max('user_log__login_date'))
With .filter() you keep only the records you need (admins with Desktop logins).
With .annotate() you get the maximum login date per user; because the filter comes first, the maximum is computed over the filtered Desktop logins only.
To run this code exactly as written, you need related_name='user_log' on the relation between User and UserLog:
class UserLog(models.Model):
    user = models.ForeignKey(User, related_name='user_log', on_delete=models.CASCADE)
    ...
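Try:
Userlog.objects \
    .filter(pc_type='Desktop', user__type='Admin') \
    .order_by('user', '-login_date') \
    .distinct('user')
Passing field names to distinct() (DISTINCT ON) is supported only on PostgreSQL, and the order_by() fields must start with the fields given to distinct(), in the same order.
You can access the user through userlog.user.
Please paste your models so I can give you an exact query.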
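Because distinct() with field names is not available on SQLite (which the question uses), it may help to see the underlying "latest row per group" logic in plain SQL: keep a log row only if its date equals the latest Desktop login date of the same user. A minimal sketch using Python's built-in sqlite3 with the sample data; the users/userlog table names, the user_id column and the helper names are simplifications, not the tables Django would generate. In the Django ORM, the same correlated subquery can be expressed with Subquery and OuterRef (available since Django 1.11).

```python
import sqlite3


def latest_desktop_logins_for_admins(conn):
    """Each admin's most recent Desktop login as (log id, user name, pc_type, login_date)."""
    return conn.execute(
        """
        SELECT l.id, u.name, l.pc_type, l.login_date
        FROM userlog AS l
        JOIN users AS u ON u.id = l.user_id
        WHERE u.type = 'Admin'
          AND l.pc_type = 'Desktop'
          AND l.login_date = (
              -- latest Desktop login of the same user
              SELECT MAX(l2.login_date) FROM userlog AS l2
              WHERE l2.user_id = l.user_id AND l2.pc_type = 'Desktop')
        ORDER BY l.id
        """
    ).fetchall()


def build_demo():
    """Sample data from the question; 'YYYY/MM/DD' strings sort in date order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
        CREATE TABLE userlog (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users (id),
            pc_type TEXT,
            login_date TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "A", "Admin"), (2, "B", "Admin"), (3, "C", "User")],
    )
    conn.executemany(
        "INSERT INTO userlog VALUES (?, ?, ?, ?)",
        [
            (1, 1, "Desktop", "2017/01/01"),
            (2, 1, "Server", "2017/01/05"),
            (3, 2, "Desktop", "2017/01/11"),
            (4, 1, "Server", "2017/02/03"),
            (5, 3, "Desktop", "2017/02/09"),
            (6, 2, "Server", "2017/02/21"),
            (7, 1, "Desktop", "2017/03/18"),
            (8, 3, "Desktop", "2017/03/31"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in latest_desktop_logins_for_admins(build_demo()):
        print(row)
```

If an admin has two Desktop logins on the same latest date, both rows are returned.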

Last Touch Attribution in MySQL

Conversions
| user_id | tag    | timestamp           |
|---------|--------|---------------------|
| 1       | click1 | 2016-11-01 01:20:39 |
| 2       | click2 | 2016-11-01 09:48:10 |
| 3       | click1 | 2016-11-04 14:27:22 |
| 4       | click4 | 2016-11-05 17:50:14 |
User Sessions
| user_id | utm_campaign | session_start       |
|---------|--------------|---------------------|
| 1       | outbrain_2   | 2016-11-01 00:15:34 |
| 1       | email        | 2016-11-01 01:00:29 |
| 2       | google_1     | 2016-11-01 08:24:39 |
| 3       | google_4     | 2016-11-04 14:25:06 |
| 4       | google_1     | 2016-11-05 17:43:02 |
Given the 2 tables above, I want to map each conversion event to the most recent campaign that brought a particular user to a site (aka last touch/last click attribution).
The desired output is a table of the format:
| user_id | tag    | timestamp           | campaign |
|---------|--------|---------------------|----------|
| 1       | click1 | 2016-11-01 01:20:39 | email    |
| 2       | click2 | 2016-11-01 09:48:10 | google_1 |
| 3       | click1 | 2016-11-04 14:27:22 | google_4 |
| 4       | click4 | 2016-11-05 17:50:14 | google_1 |
Note how user 1 visited the site via the outbrain_2 campaign and then came back to the site via the email campaign. Sometime during the user's second visit, they converted, thus the conversion should be attributed to email and not outbrain_2.
Is there a way to do this in MySQL or Python?
You can do this in Python with pandas. Assuming you have loaded the MySQL tables into the DataFrames conversions and sessions, first concatenate both tables:
import numpy as np
import pandas as pd

combined = pd.concat([conversions, sessions], ignore_index=True)
Some cells in the new frame will be NaN. Create a new column that collects the timestamps from both tables:
combined["ts"] = np.where(combined["session_start"].isnull(),
                          combined["timestamp"],
                          combined["session_start"])
Sort by this column, forward-fill the missing values, group by the user ID, and select the last (most recent) row of each group:
groups = combined.sort_values("ts").ffill().groupby("user_id", as_index=False).last()
Select the right columns:
result = groups[["user_id", "tag", "timestamp", "utm_campaign"]]
I tried this code with your sample data and got the right answer.
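The concat-and-ffill approach works on the sample, but ffill() runs over the whole sorted frame rather than per user, so values can carry over between users, and only one conversion per user is kept. A different technique, pandas.merge_asof, is built for this kind of "latest earlier match" join. A minimal sketch, assuming the two tables are already loaded into DataFrames with the timestamps parsed as datetimes; last_touch is a hypothetical helper name.

```python
import pandas as pd


def last_touch(conversions, sessions):
    """Attach to each conversion the campaign of the user's latest session at or before it."""
    # merge_asof requires both frames to be sorted by their time keys
    conv = conversions.sort_values("timestamp")
    sess = sessions.sort_values("session_start")
    merged = pd.merge_asof(
        conv,
        sess,
        left_on="timestamp",
        right_on="session_start",
        by="user_id",          # only match sessions of the same user
        direction="backward",  # latest session_start <= conversion timestamp
    )
    merged = merged.rename(columns={"utm_campaign": "campaign"})
    result = merged[["user_id", "tag", "timestamp", "campaign"]]
    return result.sort_values("user_id").reset_index(drop=True)


conversions = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "tag": ["click1", "click2", "click1", "click4"],
    "timestamp": pd.to_datetime([
        "2016-11-01 01:20:39", "2016-11-01 09:48:10",
        "2016-11-04 14:27:22", "2016-11-05 17:50:14",
    ]),
})
sessions = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 4],
    "utm_campaign": ["outbrain_2", "email", "google_1", "google_4", "google_1"],
    "session_start": pd.to_datetime([
        "2016-11-01 00:15:34", "2016-11-01 01:00:29", "2016-11-01 08:24:39",
        "2016-11-04 14:25:06", "2016-11-05 17:43:02",
    ]),
})

print(last_touch(conversions, sessions))
```

Because the match is done per conversion, a user with several conversions gets one output row for each, each attributed to the session that preceded it.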

sqlalchemy orm - change column in a table depending on another table

I have 3 tables:
table 1
| id | name |
|:---:|:----:|
| 1 | name |
table 2
| id | name | status |
|:---:|:----:|:------:|
| 1 | name | True |
table 3
| id_table1 | id_table2 | datetime | status_table2 |
|:----------:|----------:|:--------:|:-------------:|
| 1 | 1 |01/11/2011| True |
How can I change the status in table 2 when I create a link in table 3, using the SQLAlchemy ORM in Python? The status must change when a link in table 3 is created and also when that link is deleted. Does anyone have a simple, clean idea?
I solved the problem by using ORM events.
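The answer mentions ORM events without showing them. Here is a minimal sketch of one way to do it with SQLAlchemy mapper events (after_insert and after_delete), assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The model names are hypothetical, the status_table2 column is left out, and it assumes table 2's status should be True while a link exists and False once the link is deleted. The handlers emit a Core UPDATE on the connection passed to the event, because SQLAlchemy's documentation limits mapper-level flush events to emitting SQL on that connection rather than changing other objects in the session.

```python
from sqlalchemy import Boolean, Column, DateTime, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Table1(Base):
    __tablename__ = "table1"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Table2(Base):
    __tablename__ = "table2"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    status = Column(Boolean, nullable=False, default=False)


class Link(Base):
    """Table 3: links a row of table1 to a row of table2."""
    __tablename__ = "table3"
    id_table1 = Column(Integer, ForeignKey("table1.id"), primary_key=True)
    id_table2 = Column(Integer, ForeignKey("table2.id"), primary_key=True)
    created_at = Column("datetime", DateTime)


def _set_table2_status(connection, table2_id, value):
    # Mapper flush events may emit SQL on the given connection,
    # but should not add or modify other objects in the session.
    t2 = Table2.__table__
    connection.execute(t2.update().where(t2.c.id == table2_id).values(status=value))


@event.listens_for(Link, "after_insert")
def _link_created(mapper, connection, target):
    _set_table2_status(connection, target.id_table2, True)


@event.listens_for(Link, "after_delete")
def _link_deleted(mapper, connection, target):
    _set_table2_status(connection, target.id_table2, False)


def demo():
    """Create a link, then delete it; return table2's status after each step."""
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add_all([Table1(id=1, name="name"), Table2(id=1, name="name")])
        session.commit()
        session.add(Link(id_table1=1, id_table2=1))
        session.commit()  # after_insert fires during this flush
        after_insert = session.get(Table2, 1).status
        session.delete(session.get(Link, (1, 1)))
        session.commit()  # after_delete fires during this flush
        after_delete = session.get(Table2, 1).status
    return after_insert, after_delete


if __name__ == "__main__":
    print(demo())
```

Since the UPDATE bypasses the session, any Table2 object already loaded is only refreshed after it is expired, which commit() does by default.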

Why is Django order_by so slow in a ManyToMany query?

I have a ManyToMany field, like this:
class Tag(models.Model):
    books = models.ManyToManyField('book.Book', related_name='vtags', through=TagBook)

class Book(models.Model):
    nump = models.IntegerField(default=0, db_index=True)
I have around 450,000 books, and some tags are related to around 60,000 books. When I run a query like:
tag.books.order_by('nump')[1:11]
it gets extremely slow: 3-4 minutes.
But if I remove order_by, the query runs as fast as usual.
The raw sql for the order_by version looks like this:
'SELECT `book_book`.`id`, ... `book_book`.`price`, `book_book`.`nump`,
FROM `book_book` INNER JOIN `book_tagbook` ON (`book_book`.`id` =
`book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1 ORDER BY
`book_book`.`nump` ASC LIMIT 11 OFFSET 1'
Any idea what is causing this? How could I fix it? Thanks.
---EDIT---
I ran the previous raw query directly in MySQL, as @bouke suggested:
SELECT `book_book`.`id`, `book_book`.`title`, ... `book_book`.`nump`,
`book_book`.`raw_data` FROM `book_book` INNER JOIN `book_tagbook` ON
(`book_book`.`id` = `book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1
ORDER BY `book_book`.`nump` ASC LIMIT 11 OFFSET 1;
11 rows in set (4 min 2.79 sec)
Then I used EXPLAIN to find out why:
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| 1 | SIMPLE | book_tagbook | ref | book_tagbook_3747b463,book_tagbook_752eb95b | book_tagbook_3747b463 | 4 | const | 116394 | Using temporary; Using filesort |
| 1 | SIMPLE | book_book | eq_ref | PRIMARY | PRIMARY | 4 | legend.book_tagbook.book_id | 1 | |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
2 rows in set (0.10 sec)
And for the table book_book:
mysql> explain book_book;
+----------------+----------------+------+-----+-----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------+------+-----+-----------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(200) | YES | | NULL | |
| href | varchar(200) | NO | UNI | NULL | |
..... skip some part.............
| nump | int(11) | NO | MUL | 0 | |
| raw_data | varchar(10000) | YES | | NULL | |
+----------------+----------------+------+-----+-----------+----------------+
24 rows in set (0.00 sec)
