Combine strings from MySQL tables without extra spaces - Python

I have been working on this django app. We pull a big set of tables from a California state agency, process the data and re-publish it. I have been trying to do something simple but the simple implementation is really slow and I may be thinking myself into a hole. Here is a bit of one of the tables. There are a lot of tables like this.
mysql> desc EXPN_CD;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| AGENT_NAMF | varchar(45) | NO | | NULL | |
| AGENT_NAML | varchar(200) | NO | | NULL | |
| AGENT_NAMS | varchar(10) | NO | | NULL | |
| AGENT_NAMT | varchar(10) | NO | | NULL | |
| AMEND_ID | int(11) | NO | MUL | NULL | |
| AMOUNT | decimal(14,2) | NO | | NULL | |
| BAKREF_TID | varchar(20) | NO | | NULL | |
| BAL_JURIS | varchar(40) | NO | | NULL | |
| BAL_NAME | varchar(200) | NO | | NULL | |
| BAL_NUM | varchar(7) | NO | | NULL | |
| CAND_NAMF | varchar(45) | NO | | NULL | |
| CAND_NAML | varchar(200) | NO | | NULL | |
| CAND_NAMS | varchar(10) | NO | | NULL | |
| CAND_NAMT | varchar(10) | NO | | NULL | |
| CMTE_ID | varchar(9) | NO | | NULL | |
| CUM_OTH | decimal(14,2) | YES | | NULL | |
| CUM_YTD | decimal(14,2) | YES | | NULL | |
| DIST_NO | varchar(3) | NO | | NULL | |
| ENTITY_CD | varchar(3) | NO | | NULL | |
| EXPN_CHKNO | varchar(20) | NO | | NULL | |
| EXPN_CODE | varchar(3) | NO | | NULL | |
| EXPN_DATE | date | YES | | NULL | |
| EXPN_DSCR | varchar(400) | NO | | NULL | |
| FILING_ID | int(11) | NO | MUL | NULL | |
...
I am going through all of these tables. I pull out each name, the "CAND" (candidate), "AGENT", and so on, and put each reference into a row:
mysql> desc calaccess_campaign_browser_name;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ext_pk | int(11) | NO | MUL | NULL | |
| ext_table | varchar(255) | NO | | NULL | |
| ext_prefix | varchar(255) | NO | | NULL | |
| naml | varchar(255) | YES | | NULL | |
| namf | varchar(255) | YES | | NULL | |
| nams | varchar(255) | YES | | NULL | |
| namt | varchar(255) | YES | | NULL | |
| name | varchar(1023) | YES | | NULL | |
+-------------+---------------+------+-----+---------+----------------+
The values are never null but many, sometimes the vast majority, are empty strings.
I am building the name column. The obvious way to do this is:
concat(namt, ' ', namf, ' ', naml, ' ', nams)
But when 2 or 3 of those are blank that gives me a lot of double-spaces and space padding at the beginning or end of the string.
Things I have done:
1) Use Python regexes to find and remove the extra spaces. This works if I have a month or so for it to run.
2) Put the name together as above and use SQL to find and replace the extra spaces. Again, this takes a really long time.
One of the problems is that the MySQL library for Python has a cursor especially set up for dealing with large result sets, but there is nothing similar for large query operations. Or perhaps I am looking at this wrong. (See the sketch after this list.)
% pip freeze
...
MySQL-python==1.2.5c
...
3) Pull the names out into a tab-separated text file, fix them there, and then load the file into the new table. Blech. Lots of dumb scripting. Use sed or awk? What?
4) I can do the concat() operations in 15 different queries, one per combination of empty columns, so that I never introduce extra spaces in the name. I have:
namt = null and namf = null and naml = null and nams != null (case 0001)
namt = null and namf = null and naml != null and nams = null (case 0010)
namt = null and namf = null and naml != null and nams != null (case 0011)
etc, etc.
This is actually what I went with. It takes less than a day to run. Woohoo!
But I am doing similar things for other reasons too and how the heck many times do I want to write this kind of code? Ick!
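For what it's worth, the server-side cursor mentioned in item 2 above is MySQLdb.cursors.SSCursor. There is no equivalent magic for bulk writes, but streaming the rows and writing them back in executemany() batches avoids holding everything in memory. A rough sketch, where the connection settings and batch size are placeholders and the table/column names are the ones from the question:

```python
import MySQLdb
import MySQLdb.cursors

# Placeholder connection settings -- adjust to the real database.
DB = dict(host="localhost", user="user", passwd="secret", db="calaccess")

# One connection streams rows (SSCursor); a second one writes, because MySQL
# cannot run new statements on a connection while an unbuffered result set is open.
read_conn = MySQLdb.connect(cursorclass=MySQLdb.cursors.SSCursor, **DB)
write_conn = MySQLdb.connect(**DB)
read, write = read_conn.cursor(), write_conn.cursor()

read.execute("SELECT id, namt, namf, naml, nams FROM calaccess_campaign_browser_name")

batch, BATCH_SIZE = [], 10000
for row_id, namt, namf, naml, nams in read:
    # Join only the non-empty parts: no doubled, leading or trailing spaces.
    full_name = " ".join(part for part in (namt, namf, naml, nams) if part)
    batch.append((full_name, row_id))
    if len(batch) >= BATCH_SIZE:
        write.executemany(
            "UPDATE calaccess_campaign_browser_name SET name = %s WHERE id = %s",
            batch)
        write_conn.commit()
        batch = []

if batch:
    write.executemany(
        "UPDATE calaccess_campaign_browser_name SET name = %s WHERE id = %s",
        batch)
    write_conn.commit()
```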
There must be a smarter way to do this that I am not seeing. I am doing this in about 2 dozen tables, with 2 - 5 names in each table, with sometimes around 15,000 rows and sometimes 20,000,000 rows. Most tables are in the 300,000 to 750,000 range. And, jeez, am I tired....

In MySQL, I think you are looking for concat_ws():
concat_ws(' ', nullif(namt, ''), nullif(namf, ''), nullif(naml, ''), nullif(nams, ''))
The nullif() turns the value to NULL if it is blank. concat_ws() ignores NULL values, so you won't get duplicated spaces.
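Applied to the name table from the question, the whole clean-up becomes a single server-side pass; a sketch, assuming you want to rebuild the existing name column in place:

```sql
UPDATE calaccess_campaign_browser_name
   SET name = concat_ws(' ',
                        nullif(namt, ''),
                        nullif(namf, ''),
                        nullif(naml, ''),
                        nullif(nams, ''));
```

The same expression also works inside whatever INSERT ... SELECT populates the table in the first place.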

Related

Storing Imagehash in mysql database

I am trying to save a hash value in a MySQL database using Python. I obtained the hash value with hash = imagehash.dhash(Image.open('temp_face.jpg')), but when I execute the insert query cursor.execute("INSERT INTO image(hash,name,photo) VALUES(%d,%s,%s )", (hash, name, binary_image)) it gives me the error "Python 'imagehash' cannot be converted to a MySQL type".
+---------+-------------+------+-----+-------------------+-------------------+
| Field   | Type        | Null | Key | Default           | Extra             |
+---------+-------------+------+-----+-------------------+-------------------+
| hash    | binary(32)  | NO   | PRI | NULL              |                   |
| name    | varchar(25) | NO   |     | NULL              |                   |
| photo   | blob        | NO   |     | NULL              |                   |
| arrival | datetime    | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
+---------+-------------+------+-----+-------------------+-------------------+
So what can be done to store the value or is there any other way to do the same task?
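One way around that error is to convert the ImageHash to a plain Python value before binding it, since the connector only understands basic types. A minimal sketch, assuming the hex form of the hash is what the binary(32) column is meant to hold (str() of an ImageHash gives its hexadecimal representation) and that %s placeholders are used for every column:

```python
import binascii

import imagehash
from PIL import Image

hash_value = imagehash.dhash(Image.open('temp_face.jpg'))

# str() of an ImageHash is its hexadecimal representation, e.g. '8f373714acfcf4d0'.
# unhexlify() turns that into raw bytes; a 64-bit dhash gives 8 bytes, which the
# binary(32) column will zero-pad (an assumption about what the schema intends).
hash_bytes = binascii.unhexlify(str(hash_value))

name = 'temp_face'                                 # placeholder values for the
binary_image = open('temp_face.jpg', 'rb').read()  # other columns from the question

# cursor: an open MySQL cursor, as in the question.
cursor.execute(
    "INSERT INTO image (hash, name, photo) VALUES (%s, %s, %s)",
    (hash_bytes, name, binary_image))  # %s placeholders for all parameters, not %d
```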

Flask SqlAlchemy MySql Boolean Type Always Returns True

I have a Flask application connected to a MySql DB using SqlAlchemy. The table has 3 x boolean (bit) fields as shown below:
+------------------------+---------------+------+-----+-------------------+----------------+
| Field                  | Type          | Null | Key | Default           | Extra          |
+------------------------+---------------+------+-----+-------------------+----------------+
| ID                     | int(11)       | NO   | PRI | NULL              | auto_increment |
| clientID               | int(11)       | YES  |     | NULL              |                |
| accountType            | varchar(2)    | YES  |     | NULL              |                |
| systemType             | varchar(1)    | YES  |     | NULL              |                |
| clientName             | varchar(400)  | YES  |     | NULL              |                |
| clientURL              | varchar(5000) | YES  |     | NULL              |                |
| clientTelephone        | varchar(300)  | YES  |     | NULL              |                |
| clientAddressLine1     | varchar(500)  | YES  |     | NULL              |                |
| clientAddressLine2     | varchar(500)  | YES  |     | NULL              |                |
| clientAddressLine3     | varchar(500)  | YES  |     | NULL              |                |
| clientPostcode         | varchar(50)   | YES  |     | NULL              |                |
| clientCountry          | varchar(100)  | YES  |     | NULL              |                |
| accessBenchmarking     | bit(1)        | YES  |     | NULL              |                |
| accessTechnicalSupport | bit(1)        | YES  |     | NULL              |                |
| accountLive            | bit(1)        | YES  |     | NULL              |                |
| clientTown             | varchar(100)  | YES  |     | NULL              |                |
| clientCounty           | varchar(100)  | YES  |     | NULL              |                |
| dateTimeStamp          | timestamp     | YES  |     | CURRENT_TIMESTAMP |                |
+------------------------+---------------+------+-----+-------------------+----------------+
Each of the bit fields has a value set to 0.
The SqlAlchemy Model for this is:
class ClientAccounts(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    clientID = db.Column(db.Integer)
    accountType = db.Column(db.Text(2))
    systemType = db.Column(db.Text(1))
    clientName = db.Column(db.Text(400))
    clientURL = db.Column(db.Text(5000))
    clientTelephone = db.Column(db.Text(300))
    clientAddressLine1 = db.Column(db.Text(500))
    clientAddressLine2 = db.Column(db.Text(500))
    clientAddressLine3 = db.Column(db.Text(500))
    clientPostcode = db.Column(db.Text(50))
    clientCountry = db.Column(db.Text(100))
    accessBenchmarking = db.Column(db.Boolean)
    accessTechnicalSupport = db.Column(db.Boolean)
    accountLive = db.Column(db.Boolean)
    clientTown = db.Column(db.Text(100))
    clientCounty = db.Column(db.Text(100))
The code to retrieve the values is here:
# check for valid and live user account
CheckAccount = ClientAccounts.query.filter_by(
    clientID=accountNo,
).first()
if not CheckAccount is None:
    accessBenchmarking = CheckAccount.accessBenchmarking
    accessTechnicalSupport = CheckAccount.accessTechnicalSupport
    accountLive = CheckAccount.accountLive
    print 'db return ...'
    print accessBenchmarking
    print accessTechnicalSupport
    print accountLive
The values are always returned as True even though they are set to False in the DB. The returned values can be seen here:
INFO:sqlalchemy.engine.base.Engine:('11111111', 1)
db return ...
True
True
True
Does anybody have any idea what's causing this?
I figured out a fix for this. Changing the field data type from bit to tinyint for each boolean field did the trick. I'm still none the wiser as to why bit doesn't work with SqlAlchemy. Maybe it's the version of MySql Python I'm using?
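The schema change described above is one statement per table; a sketch, assuming the table is called client_accounts (the real table name is not shown in the question):

```sql
ALTER TABLE client_accounts
    MODIFY accessBenchmarking     TINYINT(1),
    MODIFY accessTechnicalSupport TINYINT(1),
    MODIFY accountLive            TINYINT(1);
```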
For those who come across this thread without finding a solid solution: I fixed this issue by switching the MySQL connector from pymysql to mysql-connector.
pip3 install mysql-connector
'mysql+mysqlconnector://username:password!!#127.0.0.1:3306/'
I was lost for a long time trying to make this work. I didn't know the connector would be the issue.
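For reference, the same connection URL plugs into SQLAlchemy's create_engine (or Flask-SQLAlchemy's SQLALCHEMY_DATABASE_URI); a sketch with placeholder credentials and database name:

```python
from sqlalchemy import create_engine

# Placeholder credentials and database name.
engine = create_engine("mysql+mysqlconnector://username:password@127.0.0.1:3306/mydb")
```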

How to insert hex/binary values into a MySQL database?

I am trying to insert some hex values into a field of a MySQL DB.
This is the kind of value I need to insert:
['D\x93\xb4s\xa5\x9eM\\\x14\xf3*\x95\xf9\x83\x1d*%P\xdb\xa2', 'D\xbf\xef\xb0\xc8\xff\x17\xc6Y6\xc6\xb4,p\xaa\xb1\xf2V\xdaa', 'D\xd7~~\x02\xd3|}\xfcN\xc1\x03\x97\x07\xb5<U\x16Y\x9e', '\xf3\xb6\xc2,Y/[i\x98\x93\x9d\xb2R\x93\x84\x12W\x1a3\x19', '\xf3\xb7\xce\x1f-n\x89\xb6\x87K\x9dsf\xcb=w\xab\x1a\xa0<', '\xf3\xbf7\x04d\xe6\xdf\xf8"9\x1d\x05\x01\xe4\xd4\xb0\xad\x80\xc0\xf5']
this is my table
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| consensus | char(40) | NO | | NULL | |
| identityb32 | char(40) | NO | | NULL | |
| pubdate | char(40) | NO | | NULL | |
| dirport | char(6) | NO | | NULL | |
| ip | char(40) | NO | | NULL | |
| orport | char(40) | NO | | NULL | |
| identityhash | char(40) | NO | | NULL | |
| nick | char(40) | NO | | NULL | |
| version | char(40) | NO | | NULL | |
| flags | varchar(500) | NO | | NULL | |
| identity | char(40) | NO | | NULL | |
| digest | char(40) | NO | | NULL | |
| pubtime | char(40) | NO | | NULL | |
+--------------+--------------+------+-----+---------+-------+
13 rows in set (0.00 sec)
Currently I am adding the hex data as I would a normal string, but this results in non-readable input being added:
D??s??M?*???*%P?
How can the hex data be added?
Check CHARSET
CREATE TABLE `t` (
  `id` varchar(32) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Maybe the right way really is the HEX() and UNHEX() functions. But this post may also help: Inserting hex value mysql
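On the Python side, one approach in that spirit is to hex-encode the 20-byte digests before inserting; 20 raw bytes become 40 hex characters, which happens to fit the char(40) columns. A minimal sketch, where the connection settings, table name and target column are assumptions:

```python
import binascii
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="tordb")
cursor = conn.cursor()

# Raw 20-byte digests, as shown in the question.
digests = ['D\x93\xb4s\xa5\x9eM\\\x14\xf3*\x95\xf9\x83\x1d*%P\xdb\xa2']

for raw in digests:
    hex_value = binascii.hexlify(raw)  # 20 bytes -> 40 readable hex characters
    cursor.execute("INSERT INTO relays (digest) VALUES (%s)", (hex_value,))

conn.commit()
```

If the raw bytes are ever needed again, SELECT unhex(digest) reverses the encoding.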
Those aren't hex values.
I may be way off on this, but the only way I found to insert your values was like this:
$string = 'D\x93\xb4s\xa5\x9eM\\\x14\xf3*\x95\xf9\x83\x1d*%P\xdb\xa2';
$pattern = '#\\\#';
$replacement = '\\\\\\';
$insert_value = preg_replace($pattern, $replacement, $string);
After which proceed to insert. You could use VARCHAR(256) for the column.
I'm positive there are a lot of better ways to go about this.
Apparently I'm on the python part of SO. I thought it was PHP. This is what I get for subscribing to multiple tags.

Why is django order_by so slow in a ManyToMany query?

I have a ManyToMany field. Like this:
class Tag(models.Model):
    books = models.ManyToManyField('book.Book', related_name='vtags', through=TagBook)

class Book(models.Model):
    nump = models.IntegerField(default=0, db_index=True)
I have around 450,000 books, and some tags are related to around 60,000 books. When I did a query like:
tag.books.order_by('nump')[1:11]
It gets extremely slow, like 3-4 minutes.
But if I remove order_by, the query runs as normal.
The raw sql for the order_by version looks like this:
'SELECT `book_book`.`id`, ... `book_book`.`price`, `book_book`.`nump`,
FROM `book_book` INNER JOIN `book_tagbook` ON (`book_book`.`id` =
`book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1 ORDER BY
`book_book`.`nump` ASC LIMIT 11 OFFSET 1'
Do you have any idea on this? How could I fix it? Thanks.
---EDIT---
Checked the previous raw query in mysql as #bouke suggested:
SELECT `book_book`.`id`, `book_book`.`title`, ... `book_book`.`nump`,
`book_book`.`raw_data` FROM `book_book` INNER JOIN `book_tagbook` ON
(`book_book`.`id` = `book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1
ORDER BY `book_book`.`nump` ASC LIMIT 11 OFFSET 1;
11 rows in set (4 min 2.79 sec)
Then use explain to find out why:
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| 1 | SIMPLE | book_tagbook | ref | book_tagbook_3747b463,book_tagbook_752eb95b | book_tagbook_3747b463 | 4 | const | 116394 | Using temporary; Using filesort |
| 1 | SIMPLE | book_book | eq_ref | PRIMARY | PRIMARY | 4 | legend.book_tagbook.book_id | 1 | |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
2 rows in set (0.10 sec)
And for the table book_book:
mysql> explain book_book;
+----------------+----------------+------+-----+-----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------+------+-----+-----------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(200) | YES | | NULL | |
| href | varchar(200) | NO | UNI | NULL | |
..... skip some part.............
| nump | int(11) | NO | MUL | 0 | |
| raw_data | varchar(10000) | YES | | NULL | |
+----------------+----------------+------+-----+-----------+----------------+
24 rows in set (0.00 sec)
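As an aside, the raw SQL quoted in the edit can be pulled straight from the queryset and handed to EXPLAIN without leaving Python. str(queryset.query) is only an approximation (parameters are not properly quoted), but for integer filters like this one it is usually runnable; a sketch, with tag assumed to be the Tag instance from the question:

```python
from django.db import connection

qs = tag.books.order_by('nump')[1:11]
sql = str(qs.query)   # roughly the SQL Django will run, as quoted above
print(sql)

cursor = connection.cursor()
cursor.execute("EXPLAIN " + sql)
for row in cursor.fetchall():
    print(row)
```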

Trying to read a CSV file using Python and store the values in MySQL, but getting some errors

#!/usr/bin/python
import MySQLdb
import sys
import csv

csvfile = '/home/arun/Documents/smartearn/Linux/SA/xbrlparser/UGT-2012-Concepts.csv'

def csvparsing():
    spamReader = csv.reader(open(csvfile, 'rb'), delimiter=',', quotechar='"')
    header = spamReader.next()
    conn = MySQLdb.connect(host="localhost",
                           user="arunkamaly",
                           db="se")
    cursor = conn.cursor()
    for clist in spamReader:
        name = clist[1] + clist[2]
        cursor.execute("""
            INSERT INTO se_xbrl_concepts( xbrl_id,xbrl_name,xbrl_type,Xbrl_enumerations,Xbrl_substitutiongroup,Xbrl_balance, Xbrl_periodtype, Xbrl_IsAbstract,xbrl_label,Xbrl_desc)
            values(clist[0],%s,%s,%s,%s,%s,%s,%s,clist[9],%s)""")(name,clist[3],clist[4],clist[5],clist[6],clist[7],clist[8],clist[10])
        # print name
    conn.commit()
    cursor.close()

if (__name__ == '__main__'):
    csvparsing()
The MySQL table is:
+------------------------+--------------+------+-----+---------+----------------+
| Field                  | Type         | Null | Key | Default | Extra          |
+------------------------+--------------+------+-----+---------+----------------+
| xbrl_id                | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| xbrl_name              | varchar(255) | NO   |     | NULL    |                |
| xbrl_type              | varchar(10)  | NO   |     | NULL    |                |
| Xbrl_enumerations      | varchar(25)  | NO   |     | NULL    |                |
| Xbrl_substitutiongroup | varchar(255) | YES  |     | NULL    |                |
| Xbrl_balance           | varchar(50)  | YES  |     | NULL    |                |
| Xbrl_periodtype        | varchar(25)  | YES  |     | NULL    |                |
| Xbrl_IsAbstract        | bit(1)       | YES  |     | NULL    |                |
| xbrl_label             | varchar(255) | YES  |     | NULL    |                |
| Xbrl_desc              | varchar(45)  | YES  |     | NULL    |                |
+------------------------+--------------+------+-----+---------+----------------+
Without the error trace, it's hard to say if this is the only error, but this is certainly one error:
cursor.execute("""
INSERT INTO se_xbrl_concepts( xbrl_id,xbrl_name,xbrl_type,Xbrl_enumerations,Xbrl_substitutiongroup,Xbrl_balance, Xbrl_periodtype, Xbrl_IsAbstract,xbrl_label,Xbrl_desc)
values(clist[0],%s,%s,%s,%s,%s,%s,%s,clist[9],%s)""")(name,clist[3],clist[4],clist[5],clist[6],clist[7],clist[8],clist[10])
You need a comma to separate your argument list from the string, and you need to move the parenthesis to the outside of that pair:
cursor.execute("""
INSERT INTO se_xbrl_concepts( xbrl_id,xbrl_name,xbrl_type,Xbrl_enumerations,Xbrl_substitutiongroup,Xbrl_balance, Xbrl_periodtype, Xbrl_IsAbstract,xbrl_label,Xbrl_desc)
values(clist[0],%s,%s,%s,%s,%s,%s,%s,clist[9],%s)""",(name,clist[3],clist[4],clist[5],clist[6],clist[7],clist[8],clist[10]))
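Note that even with that fix, clist[0] and clist[9] are still embedded literally inside the SQL string, where MySQL cannot interpret them; they need to become placeholders as well (and xbrl_id can probably be omitted entirely, since it is auto_increment). A sketch of a fully parameterized version, under those assumptions:

```python
cursor.execute("""
    INSERT INTO se_xbrl_concepts
        (xbrl_name, xbrl_type, Xbrl_enumerations, Xbrl_substitutiongroup, Xbrl_balance,
         Xbrl_periodtype, Xbrl_IsAbstract, xbrl_label, Xbrl_desc)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)""",
    (name, clist[3], clist[4], clist[5], clist[6], clist[7], clist[8], clist[9], clist[10]))
```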
