Why django order_by is so slow in a manytomany query? - python

I have a ManyToMany field. Like this:
class Tag(models.Model):
    books = models.ManyToManyField('book.Book', related_name='vtags', through=TagBook)

class Book(models.Model):
    nump = models.IntegerField(default=0, db_index=True)
I have around 450,000 books, and some tags are related to around 60,000 books each. When I run a query like:
tag.books.order_by('nump')[1:11]
it gets extremely slow, taking 3-4 minutes.
But if I remove order_by, the query runs at normal speed.
The raw SQL for the order_by version looks like this:
'SELECT `book_book`.`id`, ... `book_book`.`price`, `book_book`.`nump`,
FROM `book_book` INNER JOIN `book_tagbook` ON (`book_book`.`id` =
`book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1 ORDER BY
`book_book`.`nump` ASC LIMIT 11 OFFSET 1'
Do you have any idea on this? How could I fix it? Thanks.
---EDIT---
I ran the previous raw query in MySQL, as @bouke suggested:
SELECT `book_book`.`id`, `book_book`.`title`, ... `book_book`.`nump`,
`book_book`.`raw_data` FROM `book_book` INNER JOIN `book_tagbook` ON
(`book_book`.`id` = `book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1
ORDER BY `book_book`.`nump` ASC LIMIT 11 OFFSET 1;
11 rows in set (4 min 2.79 sec)
Then I used EXPLAIN to find out why:
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| 1 | SIMPLE | book_tagbook | ref | book_tagbook_3747b463,book_tagbook_752eb95b | book_tagbook_3747b463 | 4 | const | 116394 | Using temporary; Using filesort |
| 1 | SIMPLE | book_book | eq_ref | PRIMARY | PRIMARY | 4 | legend.book_tagbook.book_id | 1 | |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
2 rows in set (0.10 sec)
And for the table book_book:
mysql> explain book_book;
+----------------+----------------+------+-----+-----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------+------+-----+-----------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(200) | YES | | NULL | |
| href | varchar(200) | NO | UNI | NULL | |
..... skip some part.............
| nump | int(11) | NO | MUL | 0 | |
| raw_data | varchar(10000) | YES | | NULL | |
+----------------+----------------+------+-----+-----------+----------------+
24 rows in set (0.00 sec)

Related

Flask SqlAlchemy MySql Boolean Type Always Returns True

I have a Flask application connected to a MySQL DB using SQLAlchemy. The table has three boolean (bit) fields, as shown below:
+------------------------+---------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------+------+-----+-------------------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| clientID | int(11) | YES | | NULL | |
| accountType | varchar(2) | YES | | NULL | |
| systemType | varchar(1) | YES | | NULL | |
| clientName | varchar(400) | YES | | NULL | |
| clientURL | varchar(5000) | YES | | NULL | |
| clientTelephone | varchar(300) | YES | | NULL | |
| clientAddressLine1 | varchar(500) | YES | | NULL | |
| clientAddressLine2 | varchar(500) | YES | | NULL | |
| clientAddressLine3 | varchar(500) | YES | | NULL | |
| clientPostcode | varchar(50) | YES | | NULL | |
| clientCountry | varchar(100) | YES | | NULL | |
| accessBenchmarking | bit(1) | YES | | NULL | |
| accessTechnicalSupport | bit(1) | YES | | NULL | |
| accountLive | bit(1) | YES | | NULL | |
| clientTown | varchar(100) | YES | | NULL | |
| clientCounty | varchar(100) | YES | | NULL | |
| dateTimeStamp | timestamp | YES | | CURRENT_TIMESTAMP | |
+------------------------+---------------+------+-----+-------------------+----------------+
Each of the bit fields has a value set to 0.
The SqlAlchemy Model for this is:
class ClientAccounts(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    clientID = db.Column(db.Integer)
    accountType = db.Column(db.Text(2))
    systemType = db.Column(db.Text(1))
    clientName = db.Column(db.Text(400))
    clientURL = db.Column(db.Text(5000))
    clientTelephone = db.Column(db.Text(300))
    clientAddressLine1 = db.Column(db.Text(500))
    clientAddressLine2 = db.Column(db.Text(500))
    clientAddressLine3 = db.Column(db.Text(500))
    clientPostcode = db.Column(db.Text(50))
    clientCountry = db.Column(db.Text(100))
    accessBenchmarking = db.Column(db.Boolean)
    accessTechnicalSupport = db.Column(db.Boolean)
    accountLive = db.Column(db.Boolean)
    clientTown = db.Column(db.Text(100))
    clientCounty = db.Column(db.Text(100))
The code to retrieve the values is here:
# Check for a valid and live user account
CheckAccount = ClientAccounts.query.filter_by(
    clientID=accountNo,
).first()
if CheckAccount is not None:
    accessBenchmarking = CheckAccount.accessBenchmarking
    accessTechnicalSupport = CheckAccount.accessTechnicalSupport
    accountLive = CheckAccount.accountLive
    print('db return ...')
    print(accessBenchmarking)
    print(accessTechnicalSupport)
    print(accountLive)
The values are always returned as True even though they are set to False in the DB. The returned values can be seen here:
INFO:sqlalchemy.engine.base.Engine:('11111111', 1)
db return ...
True
True
True
Does anybody have any idea what's causing this?
I figured out a fix for this. Changing the field data type from bit to tinyint for each boolean field did the trick. I'm still none the wiser as to why bit doesn't work with SqlAlchemy. Maybe it's the version of MySql Python I'm using?
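One plausible explanation, offered as an assumption rather than something confirmed in this thread: some MySQL drivers return a bit(1) column as a one-byte string such as '\x00' instead of an integer, and any non-empty string is truthy in Python. The sketch below illustrates that effect; bit_to_bool is a hypothetical helper that decodes the raw byte explicitly.

```python
# Assumption: the driver hands back BIT(1) values as one-byte strings
# (b'\x00' or b'\x01') rather than integers. bit_to_bool is a hypothetical
# helper, not part of SQLAlchemy or any MySQL driver.

def bit_to_bool(raw):
    """Convert a raw BIT(1) value (bytes, int, or None) to True/False/None."""
    if raw is None:
        return None
    if isinstance(raw, (bytes, bytearray)):
        # Interpret the bytes as a big-endian integer: b'\x00' -> 0, b'\x01' -> 1.
        return int.from_bytes(raw, "big") != 0
    return bool(raw)

raw_zero = b"\x00"
print(bool(raw_zero))         # True: a non-empty bytes object is truthy
print(bit_to_bool(raw_zero))  # False: the decoded bit is zero
```

If this is what is happening, it would also be consistent with tinyint working, since a tinyint column comes back as an ordinary integer.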
For anyone who comes across this thread without finding a solid solution:
I fixed this issue by switching the MySQL connector from pymysql to mysql-connector.
pip3 install mysql-connector
'mysql+mysqlconnector://username:password!!@127.0.0.1:3306/'
I was stuck on this for a long time; I didn't know the connector would be the issue.

Pandas + MySQL On Duplicate Key Update Broken

My ON DUPLICATE KEY UPDATE clause stopped updating and I'm not sure why.
Below is my code to create a temporary table via Pandas:
#connect to mysql database
engine = sqlalchemy.create_engine('mysql://username:@localhost/db?charset=utf8')
conn = engine.connect()
#Create df and write to temp table
df = pd.DataFrame(item_bank,columns=['email','id', 'mbid','artist','track','plays','track_count'])
df.to_sql(con=conn, name='temp', if_exists='replace',index=False)
It successfully creates a MySQL table with all of the data types as 'Text' except for user_tracks which writes as a bigint(20).
I then run this, but the table does not update. It is especially strange to me because I have many scripts that use a similar method, and the only thing I remember changing was to stop updating the other static columns.
mysql_statement = """
INSERT INTO pickaresk.permanent
(email, id, mbid, artist, track, plays, track_count)
SELECT * FROM temp
ON DUPLICATE KEY UPDATE
plays=temp.plays,
track_count=temp.track_count,
lastfm_last_update=NOW()
;
"""
conn.execute(mysql_statement)
conn.close()
The schema of the permanent table's columns that are being updated is shown below. The multi-column unique key constraint is the combination of id and email. I also confirmed that there are duplicate keys in both tables.
+-------------------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------------+--------------+------+-----+---------+-------+
| id | varchar(255) | NO | | NULL | |
| email | varchar(120) | NO | MUL | NULL | |
| mbid | varchar(120) | YES | | NULL | |
| artist | varchar(250) | YES | | NULL | |
| track | varchar(250) | YES | | NULL | |
| plays | float | YES | | NULL | |
| track_count | int(11) | YES | | NULL | |
| lastfm_last_update | datetime | YES | | NULL | |
+-------------------------------+--------------+------+-----+---------+-------+

Django join two tables while keeping ORM

Using Django, I am trying to fetch this specific result view from the database:
select * from CO2_Low_Adj a JOIN CO2_Low_Metrics b on a.gene_id_B = b.gene_id where a.gene_id_A='Traes_1AL_00A8A2030'
I know I can do it using connections, cursor, fetchall and get back a list of dictionaries. However, I am wondering if there is a way to do this in Django while keeping the ORM.
The tables look like this:
class Co2LowMetrics(models.Model):
    gene_id = models.CharField(primary_key=True, max_length=24)
    modular_k = models.FloatField()
    modular_k_rank = models.IntegerField()
    modular_mean_exp_rank = models.IntegerField()
    module = models.IntegerField()
    k = models.FloatField()
    k_rank = models.IntegerField()
    mean_exp = models.FloatField()
    mean_exp_rank = models.IntegerField()
    gene_gene = models.ForeignKey(Co2LowGene, db_column='Gene_gene_id')  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'CO2_Low_Metrics'

class Co2LowGene(models.Model):
    gene_id = models.CharField(primary_key=True, max_length=24)
    entry = models.IntegerField(unique=True)
    gene_gene_id = models.CharField(db_column='Gene_gene_id', max_length=24)  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'CO2_Low_Gene'

class Co2LowAdj(models.Model):
    gene_id_a = models.CharField(db_column='gene_id_A', max_length=24)  # Field name made lowercase.
    edge_number = models.IntegerField(unique=True)
    gene_id_b = models.CharField(db_column='gene_id_B', max_length=24)  # Field name made lowercase.
    value = models.FloatField()
    gene_gene_id_a = models.ForeignKey('Co2LowGene', db_column='Gene_gene_id_A')  # Field name made lowercase.
    gene_gene_id_b = models.ForeignKey('Co2LowGene', db_column='Gene_gene_id_B')  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'CO2_Low_Adj'
The database table descriptions are:
mysql> describe CO2_Low_Metrics;
+-----------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+-------+
| gene_id | varchar(24) | NO | PRI | NULL | |
| modular_k | double | NO | | NULL | |
| modular_k_rank | int(8) | NO | | NULL | |
| modular_mean_exp_rank | int(8) | NO | | NULL | |
| module | int(8) | NO | | NULL | |
| k | double | NO | | NULL | |
| k_rank | int(8) | NO | | NULL | |
| mean_exp | double | NO | | NULL | |
| mean_exp_rank | int(8) | NO | | NULL | |
| Gene_gene_id | varchar(24) | NO | MUL | NULL | |
+-----------------------+-------------+------+-----+---------+-------+
mysql> describe CO2_Low_Gene;
+--------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+----------------+
| gene_id | varchar(24) | NO | PRI | NULL | |
| entry | int(8) | NO | UNI | NULL | auto_increment |
| Gene_gene_id | varchar(24) | NO | | NULL | |
+--------------+-------------+------+-----+---------+----------------+
mysql> describe CO2_Low_Adj;
+----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+----------------+
| gene_id_A | varchar(24) | NO | MUL | NULL | |
| edge_number | int(9) | NO | PRI | NULL | auto_increment |
| gene_id_B | varchar(24) | NO | MUL | NULL | |
| value | double | NO | | NULL | |
| Gene_gene_id_A | varchar(24) | NO | MUL | NULL | |
| Gene_gene_id_B | varchar(24) | NO | MUL | NULL | |
+----------------+-------------+------+-----+---------+----------------+
Assume that I do not have the ability to change the underlying database schema. That may change and if a suggestion can help in making it easier to use Django's ORM then I can attempt to get it changed.
However, I have been trying to use prefetch_related and select_related but I'm doing something wrong and am not getting everything back right.
With my SQL query I essentially get the columns of the described tables in order, CO2_Low_Adj then CO2_Low_Metrics, where gene_id_A is the same as Gene_gene_id_A ('Traes_1AL_00A8A2030') and gene_id_B is the same as Gene_gene_id_B. CO2_Low_Gene does not seem to be used at all by the SQL query.
Thanks.
Django does not have a way to perform JOIN queries without foreign keys. This is why prefetch_related and select_related will not work - they work on foreign keys.
I am not sure what you are trying to achieve. Since your gene_id is unique, there will be only one Co2LowMetrics instance and a list of Co2LowAdj instances:
adj = Co2LowAdj.objects.filter(gene_id_a='Traes_1AL_00A8A2030')
metrics = Co2LowMetrics.objects.get(pk='Traes_1AL_00A8A2030')
and then work on a separate list.
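A minimal sketch of what working on separate lists could look like, assuming the goal is to pair each adjacency row with the metrics row for its gene_id_b, as the SQL join in the question does. The namedtuples are stand-ins for model instances so the snippet runs without a database, and join_adj_with_metrics and the gene_x / gene_y ids are hypothetical. With the real models, adj_rows could come from a filter on Co2LowAdj and metrics_by_id from Co2LowMetrics.objects.in_bulk(...), which returns a dict keyed by primary key.

```python
from collections import namedtuple

# Stand-ins for Co2LowAdj and Co2LowMetrics instances (only a few fields shown).
Adj = namedtuple("Adj", ["gene_id_a", "gene_id_b", "value"])
Metrics = namedtuple("Metrics", ["gene_id", "k"])


def join_adj_with_metrics(adj_rows, metrics_by_id):
    """Inner-join adjacency rows to metrics on adj.gene_id_b == metrics.gene_id."""
    return [
        (adj, metrics_by_id[adj.gene_id_b])
        for adj in adj_rows
        if adj.gene_id_b in metrics_by_id  # rows without a match are dropped
    ]


# Made-up sample data; gene_x and gene_y are placeholder ids.
adj_rows = [
    Adj("Traes_1AL_00A8A2030", "gene_x", 0.5),
    Adj("Traes_1AL_00A8A2030", "gene_y", 0.7),
]
metrics_by_id = {"gene_x": Metrics("gene_x", 12.0)}

pairs = join_adj_with_metrics(adj_rows, metrics_by_id)
for adj, metrics in pairs:
    print(adj.gene_id_b, adj.value, metrics.k)
```

This costs two queries instead of one JOIN, but every object stays a regular model instance.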

combine strings from mysql tables without extra spaces

I have been working on this django app. We pull a big set of tables from a California state agency, process the data and re-publish it. I have been trying to do something simple but the simple implementation is really slow and I may be thinking myself into a hole. Here is a bit of one of the tables. There are a lot of tables like this.
mysql> desc EXPN_CD;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| AGENT_NAMF | varchar(45) | NO | | NULL | |
| AGENT_NAML | varchar(200) | NO | | NULL | |
| AGENT_NAMS | varchar(10) | NO | | NULL | |
| AGENT_NAMT | varchar(10) | NO | | NULL | |
| AMEND_ID | int(11) | NO | MUL | NULL | |
| AMOUNT | decimal(14,2) | NO | | NULL | |
| BAKREF_TID | varchar(20) | NO | | NULL | |
| BAL_JURIS | varchar(40) | NO | | NULL | |
| BAL_NAME | varchar(200) | NO | | NULL | |
| BAL_NUM | varchar(7) | NO | | NULL | |
| CAND_NAMF | varchar(45) | NO | | NULL | |
| CAND_NAML | varchar(200) | NO | | NULL | |
| CAND_NAMS | varchar(10) | NO | | NULL | |
| CAND_NAMT | varchar(10) | NO | | NULL | |
| CMTE_ID | varchar(9) | NO | | NULL | |
| CUM_OTH | decimal(14,2) | YES | | NULL | |
| CUM_YTD | decimal(14,2) | YES | | NULL | |
| DIST_NO | varchar(3) | NO | | NULL | |
| ENTITY_CD | varchar(3) | NO | | NULL | |
| EXPN_CHKNO | varchar(20) | NO | | NULL | |
| EXPN_CODE | varchar(3) | NO | | NULL | |
| EXPN_DATE | date | YES | | NULL | |
| EXPN_DSCR | varchar(400) | NO | | NULL | |
| FILING_ID | int(11) | NO | MUL | NULL | |
...
I am going through all of these tables. I pull out each name, the "CAND" (candidate), "AGENT", and so on, and put each reference into a row:
mysql> desc calaccess_campaign_browser_name;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ext_pk | int(11) | NO | MUL | NULL | |
| ext_table | varchar(255) | NO | | NULL | |
| ext_prefix | varchar(255) | NO | | NULL | |
| naml | varchar(255) | YES | | NULL | |
| namf | varchar(255) | YES | | NULL | |
| nams | varchar(255) | YES | | NULL | |
| namt | varchar(255) | YES | | NULL | |
| name | varchar(1023) | YES | | NULL | |
+-------------+---------------+------+-----+---------+----------------+
The values are never null but many, sometimes the vast majority, are empty strings.
I am building the name column. The obvious way to do this is:
concat(namt, ' ', namf, ' ', naml, ' ', nams)
But when 2 or 3 of those are blank that gives me a lot of double-spaces and space padding at the beginning or end of the string.
Things I have done:
1) Use Python regexes to find and remove the extra spaces. This works if I have a month or so for it to run.
2) put the name together as above and use SQL to find and replace the extra spaces. Again, takes a really long time.
One of the problems is that the MySQL library for python has a cursor especially set up for dealing with large result sets. There is nothing similar for large query operations. Or perhaps I am looking at this wrong.
% pip freeze
...
MySQL-python==1.2.5c
...
3) Pull the names out into a tab-separated text file and do the fixing there, and then load the file into the new table. Blech. Lots of dumb scripting. Use sed or awk? What?
4) I can do the concat() operations in 15 different queries and I do the proper concat for each so that I do not have extra spaces in the name. I have:
namt = null and namf = null and naml = null and nams != null (case 0001)
namt = null and namf = null and naml != null and nams = null (case 0010)
namt = null and namf = null and naml != null and nams != null (case 0011)
etc, etc.
This is actually what I went with. It takes less than a day to run. Woohoo!
But I am doing similar things for other reasons too and how the heck many times do I want to write this kind of code? Ick!
There must be a smarter way to do this that I am not seeing. I am doing this in about 2 dozen tables, with 2 - 5 names in each table, with sometimes around 15,000 rows and sometimes 20,000,000 rows. Most tables are in the 300,000 to 750,000 range. And, jeez, am I tired....
In MySQL, I think you are looking for concat_ws():
concat_ws(' ', nullif(namt, ''), nullif(namf, ''), nullif(naml, ''), nullif(nams, ''))
The nullif() turns the value to NULL if it is blank. concat_ws() ignores NULL values, so you won't get duplicated spaces.
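If the names are ever assembled on the Python side instead, the same idea carries over. A minimal sketch (join_name is a hypothetical helper, not part of any library): drop the empty or None parts, then join what is left with single spaces.

```python
def join_name(*parts):
    """Join non-empty name parts with single spaces, mirroring concat_ws + nullif."""
    # Empty strings and None are both falsy, so they are skipped here.
    return " ".join(p for p in parts if p)


print(join_name("", "Jane", "Doe", ""))      # Jane Doe
print(join_name("Dr.", "", "Smith", "Jr."))  # Dr. Smith Jr.
```

Like the SQL version, this yields an empty string when every part is blank.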

How to insert hex/binary data into a MySQL database?

I am trying to insert some hex values into a field of a MySQL database.
This is the kind of value I need to insert:
['D\x93\xb4s\xa5\x9eM\\\x14\xf3*\x95\xf9\x83\x1d*%P\xdb\xa2', 'D\xbf\xef\xb0\xc8\xff\x17\xc6Y6\xc6\xb4,p\xaa\xb1\xf2V\xdaa', 'D\xd7~~\x02\xd3|}\xfcN\xc1\x03\x97\x07\xb5<U\x16Y\x9e', '\xf3\xb6\xc2,Y/[i\x98\x93\x9d\xb2R\x93\x84\x12W\x1a3\x19', '\xf3\xb7\xce\x1f-n\x89\xb6\x87K\x9dsf\xcb=w\xab\x1a\xa0<', '\xf3\xbf7\x04d\xe6\xdf\xf8"9\x1d\x05\x01\xe4\xd4\xb0\xad\x80\xc0\xf5']
This is my table:
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| consensus | char(40) | NO | | NULL | |
| identityb32 | char(40) | NO | | NULL | |
| pubdate | char(40) | NO | | NULL | |
| dirport | char(6) | NO | | NULL | |
| ip | char(40) | NO | | NULL | |
| orport | char(40) | NO | | NULL | |
| identityhash | char(40) | NO | | NULL | |
| nick | char(40) | NO | | NULL | |
| version | char(40) | NO | | NULL | |
| flags | varchar(500) | NO | | NULL | |
| identity | char(40) | NO | | NULL | |
| digest | char(40) | NO | | NULL | |
| pubtime | char(40) | NO | | NULL | |
+--------------+--------------+------+-----+---------+-------+
13 rows in set (0.00 sec)
Currently I am adding the hex data as I would a normal string, but this results in unreadable input being added:
D??s??M?*???*%P?
How can the hex data be added?
Check CHARSET
CREATE TABLE `t` (
`id` varchar(32) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Maybe the right way really is the HEX() and UNHEX() functions. This post may also help: Inserting hex value mysql
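The same hex-encoding idea can also be done on the Python side before inserting. This is a sketch under an assumption about intent, since the thread does not show the insert code: the first value from the question is 20 raw bytes, and hex-encoding it yields exactly 40 ASCII characters, which fits a char(40) column.

```python
import binascii

# First value from the question: 20 raw bytes.
raw = b"D\x93\xb4s\xa5\x9eM\\\x14\xf3*\x95\xf9\x83\x1d*%P\xdb\xa2"

# Encode to a 40-character ASCII hex string that is safe to store as text.
hex_text = binascii.hexlify(raw).decode("ascii")
print(len(raw), len(hex_text))  # 20 40
print(hex_text)                 # 4493b473a59e4d5c14f32a95f9831d2a2550dba2

# Decoding restores the original bytes when the value is read back.
restored = binascii.unhexlify(hex_text)
print(restored == raw)          # True
```

The hex string can then be passed to the database as an ordinary string parameter.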
Those aren't hex values.
I may be way off on this, but the only way I found to insert your values, was like this:
$string = 'D\x93\xb4s\xa5\x9eM\\\x14\xf3*\x95\xf9\x83\x1d*%P\xdb\xa2';
$pattern = '#\\\#';
$replacement = '\\\\\\';
$insert_value = preg_replace($pattern, $replacement, $string);
After which proceed to insert. You could use VARCHAR(256) for the column.
I'm positive there are a lot of better ways to go about this.
Apparently I'm on the python part of SO. I thought it was PHP. This is what I get for subscribing to multiple tags.
