postgresql: out of shared memory? - python

I'm running a bunch of queries using Python and psycopg2. I create one large temporary table w/ about 2 million rows, then I get 1000 rows at a time from it by using cur.fetchmany(1000) and run more extensive queries involving those rows. The extensive queries are self-sufficient, though - once they are done, I don't need their results anymore when I move on to the next 1000.
However, about 1000000 rows in, I got an exception from psycopg2:
psycopg2.OperationalError: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Funnily enough, this happened when I was executing a query to drop some temporary tables that the more extensive queries created.
Why might this happen? Is there any way to avoid it? It was annoying that this happened halfway through, meaning I have to run it all again. What might max_locks_per_transaction have to do with anything?
NOTE: I'm not doing any .commit()s, but I'm deleting all the temporary tables I create, and I'm only touching the same 5 tables anyway for each "extensive" transaction, so I don't see how running out of table locks could be the problem...

When you create a table, you get an exclusive lock on it that lasts until the end of the transaction, even if you then go ahead and drop it.
So if I start a tx and create a temp table:
steve@steve@[local] *=# create temp table foo(foo_id int);
CREATE TABLE
steve@steve@[local] *=# select * from pg_locks where pid = pg_backend_pid();
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
---------------+----------+-----------+------+-------+------------+---------------+---------+-----------+----------+--------------------+-------+---------------------+---------
virtualxid | | | | | 2/105315 | | | | | 2/105315 | 19098 | ExclusiveLock | t
transactionid | | | | | | 291788 | | | | 2/105315 | 19098 | ExclusiveLock | t
relation | 17631 | 10985 | | | | | | | | 2/105315 | 19098 | AccessShareLock | t
relation | 17631 | 214780901 | | | | | | | | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 2615 | 124616403 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 0 | | | | | | 1260 | 16384 | 0 | 2/105315 | 19098 | AccessShareLock | t
(6 rows)
These 'relation' locks aren't dropped when I drop the table:
steve@steve@[local] *=# drop table foo;
DROP TABLE
steve@steve@[local] *=# select * from pg_locks where pid = pg_backend_pid();
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
---------------+----------+-----------+------+-------+------------+---------------+---------+-----------+----------+--------------------+-------+---------------------+---------
virtualxid | | | | | 2/105315 | | | | | 2/105315 | 19098 | ExclusiveLock | t
object | 17631 | | | | | | 1247 | 214780902 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
transactionid | | | | | | 291788 | | | | 2/105315 | 19098 | ExclusiveLock | t
relation | 17631 | 10985 | | | | | | | | 2/105315 | 19098 | AccessShareLock | t
relation | 17631 | 214780901 | | | | | | | | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 2615 | 124616403 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 1247 | 214780903 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 0 | | | | | | 1260 | 16384 | 0 | 2/105315 | 19098 | AccessShareLock | t
(8 rows)
In fact, it added two more locks... It seems if I continually create/drop that temp table, it adds 3 locks each time.
So I guess one answer is that you will need enough locks to cope with all these tables being added and dropped throughout the transaction. Alternatively, you could try to reuse the temp tables between queries and simply truncate them to remove all the temp data.
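For instance, a rough psycopg2 sketch of the reuse-and-truncate approach (the scratch table's columns, big_cursor over the 2-million-row table, and run_extensive_queries are just stand-ins for your actual schema and per-batch work):

work = conn.cursor()
work.execute("CREATE TEMP TABLE scratch (id bigint, payload text)")  # created once, up front

while True:
    rows = big_cursor.fetchmany(1000)      # the cursor over the big temporary table
    if not rows:
        break
    work.execute("TRUNCATE scratch")       # reuse the same relation instead of DROP + CREATE
    work.executemany("INSERT INTO scratch VALUES (%s, %s)", rows)
    run_extensive_queries(work)            # stand-in for the per-batch "extensive" queries

This way the transaction only ever holds locks on one scratch relation, no matter how many batches you run.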

Did you create multiple savepoints with the same name without releasing them?
I followed these instructions, repeatedly executing
SAVEPOINT savepoint_name without ever executing any corresponding RELEASE SAVEPOINT savepoint_name statements. PostgreSQL was just masking the old savepoints, never freeing them: it kept track of each one until it ran out of memory for locks. I think my PostgreSQL memory limits were much lower; it only took ~10,000 savepoints for me to hit max_locks_per_transaction.
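A minimal sketch of the fix, assuming psycopg2 inside one long-running transaction (chunks and do_extensive_queries are hypothetical stand-ins for whatever each savepoint protects):

cur = conn.cursor()
for chunk in chunks:
    cur.execute("SAVEPOINT batch_sp")
    try:
        do_extensive_queries(cur, chunk)                  # the work the savepoint protects
    except Exception:
        cur.execute("ROLLBACK TO SAVEPOINT batch_sp")     # undo just this chunk's work
    finally:
        cur.execute("RELEASE SAVEPOINT batch_sp")         # destroy the savepoint either way

Releasing each savepoint as soon as its block finishes keeps old savepoints from accumulating for the life of the transaction.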

Well, are you running the entire create + queries inside a single transaction? That would perhaps explain the issue. Just because it happened when you were dropping tables would not necessarily mean anything; that may just happen to be the point at which it ran out of free locks.
Using a view might be an alternative to a temporary table, and it would definitely be my first pick if you're creating this thing and then immediately removing it.
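A small sketch of the view variant with psycopg2 (the view body here is made up; the point is that querying a view does not create a new relation per batch that then has to be locked and dropped):

cur = conn.cursor()
cur.execute("""
    CREATE OR REPLACE VIEW expensive_subset AS
    SELECT id, payload
    FROM   source_rows
    WHERE  payload IS NOT NULL
""")
cur.execute("SELECT count(*) FROM expensive_subset")
print(cur.fetchone())

And if the temp tables really are needed, committing between batches ends the transaction and releases the locks accumulated so far.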

Related

Storing Imagehash in mysql database

I am trying to save a hash value in a MySQL database using Python. I have obtained the hash value with hash = imagehash.dhash(Image.open('temp_face.jpg')), but after executing the insert query cursor.execute("INSERT INTO image(hash,name,photo) VALUES(%d,%s,%s)", (hash, name, binary_image)) it gives me the error "Python 'imagehash' cannot be converted to a MySQL type".
+---------+-------------+------+-----+-------------------+-------------------+
| Field   | Type        | Null | Key | Default           | Extra             |
+---------+-------------+------+-----+-------------------+-------------------+
| hash    | binary(32)  | NO   | PRI | NULL              |                   |
| name    | varchar(25) | NO   |     | NULL              |                   |
| photo   | blob        | NO   |     | NULL              |                   |
| arrival | datetime    | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
+---------+-------------+------+-----+-------------------+-------------------+
So what can be done to store the value or is there any other way to do the same task?
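One approach that is likely to work, sketched below: convert the ImageHash object to a plain value before binding it, since the MySQL driver has no converter for the imagehash type (note also that MySQL placeholders are always %s, never %d). cursor, connection, name and binary_image are assumed to be the same objects as in the question:

import imagehash
from PIL import Image

h = imagehash.dhash(Image.open('temp_face.jpg'))
hash_bytes = str(h).encode('ascii')   # hex string, e.g. b'9f172786e71f1e00'; fits the BINARY(32) column

cursor.execute(
    "INSERT INTO image (hash, name, photo) VALUES (%s, %s, %s)",
    (hash_bytes, name, binary_image),
)
connection.commit()

With the default hash size, a CHAR(16) or VARCHAR column would hold the hex string just as well.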

pivot multiple views into single result table/view

I have 2 views as below:
experiments:
select * from experiments;
+--------+--------------------+-----------------+
| exp_id | exp_properties | value |
+--------+--------------------+-----------------+
| 1 | indicator:chemical | phenolphthalein |
| 1 | base | NaOH |
| 1 | acid | HCl |
| 1 | exp_type | titration |
| 1 | indicator:color | faint_pink |
+--------+--------------------+-----------------+
calculations:
select * from calculations;
+--------+------------------------+--------------+
| exp_id | exp_report | value |
+--------+------------------------+--------------+
| 1 | molarity:base | 0.500000000 |
| 1 | volume:acid:in_ML | 23.120000000 |
| 1 | volume:base:in_ML | 5.430000000 |
| 1 | moles:H | 0.012500000 |
| 1 | moles:OH | 0.012500000 |
| 1 | molarity:acid | 0.250000000 |
+--------+------------------------+--------------+
I managed to pivot each of these views individually as below:
experiments_pivot:
+-------+--------------------+------+------+-----------+----------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|
+-------+--------------------+------+------+-----------+----------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink |
+-------+--------------------+------+------+-----------+----------------+
calculations_pivot:
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
|exp_id | molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
| 1 | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
My question is: how do I get these two pivot results as a single row? The desired result is below:
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
Database used: MySQL
Important note: each of these views can have an increasing number of rows, hence I considered "dynamic pivoting" for each view individually.
For reference, below is the prepared statement I used to pivot experiments in MySQL (a similar statement pivots the other view):
SET @sql = NULL;

SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(exp_properties = ''',
      exp_properties,
      ''', value, NULL)) AS ',
      CONCAT('`', exp_properties, '`')
    )
  ) INTO @sql
FROM experiments;

SET @sql = CONCAT(
  'SELECT exp_id, ',
  @sql,
  ' FROM experiments GROUP BY exp_id'
);

PREPARE stmt FROM @sql;
EXECUTE stmt;
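One hedged approach, sketched below assuming a Python DB-API connection conn (e.g. mysqlclient): build both dynamic column lists in Python, pivot each view in a derived table, and join the two derived tables on exp_id. The keys come out of the views themselves, so the plain string interpolation is illustrative rather than injection-safe:

def pivot_select(cur, view, key_col):
    # Build "SELECT exp_id, MAX(IF(key_col = '...', value, NULL)) AS `...`, ... GROUP BY exp_id"
    cur.execute("SELECT DISTINCT `{k}` FROM `{v}`".format(k=key_col, v=view))
    cols = ", ".join(
        "MAX(IF(`{k}` = '{key}', value, NULL)) AS `{key}`".format(k=key_col, key=row[0])
        for row in cur.fetchall()
    )
    return "SELECT exp_id, {cols} FROM `{v}` GROUP BY exp_id".format(cols=cols, v=view)

cur = conn.cursor()
combined = "SELECT * FROM ({e}) AS ep JOIN ({c}) AS cp USING (exp_id)".format(
    e=pivot_select(cur, "experiments", "exp_properties"),
    c=pivot_select(cur, "calculations", "exp_report"),
)
cur.execute(combined)
print(cur.fetchall())

The same combined string could equally be fed to PREPARE/EXECUTE on the server; joining the two pivots on exp_id is the missing piece either way.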

Python - SQLAlchemy getting 'Table' object is not callable error

I have defined an existing DB table in my Python script, and whenever I try to insert a row into the table, I receive an error message stating "'Table' object is not callable".
Below you can find the code and error message I receive. Any support will be appreciated:
engine = create_engine('postgresql://user:pwd@localhost:5432/dbname',
                       client_encoding='utf8')
metadata = MetaData()
MyTable = Table('target_table', metadata, autoload=True, autoload_with=engine)

Session = sessionmaker()
Session.configure(bind=engine)
session = Session()
:
:
:
def recod_to_db(db_hash):
    db_instance = MyTable(**db_hash)
    session.add(db_instance)
    session.commit()
    return
Error Message:
File "myprog.py", line 319, in recod_to_db
db_instance = MyTable(**db_hash)
TypeError: 'Table' object is not callable
This is what the table looks like:
Table "public.target_table"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+-----------------------------+--------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('target_table_id_seq'::regclass) | plain | |
carid | integer | | plain | |
triplecode | character varying | | extended | |
lookup | integer | | plain | |
type | character varying | | extended | |
make | character varying | | extended | |
series | character varying | | extended | |
model | character varying | | extended | |
year | integer | | plain | |
fuel | character varying | | extended | |
transmission | character varying | | extended | |
mileage | integer | | plain | |
hp | integer | | plain | |
color | character varying | | extended | |
door | integer | | plain | |
location | character varying | | extended | |
url | character varying | | extended | |
register_date | date | | plain | |
auction_end_time | timestamp without time zone | | plain | |
body_damage | integer | | plain | |
mechanical_damage | integer | | plain | |
target_buy | integer | | plain | |
price | integer | | plain | |
currency | character varying | | extended | |
auctionid | integer | | plain | |
seller | character varying | | extended | |
auction_type | character varying | | extended | |
created_at | timestamp without time zone | not null | plain | |
updated_at | timestamp without time zone | not null | plain | |
estimated_value | integer | | plain | |
Indexes:
"target_table_pkey" PRIMARY KEY, btree (id)
Another way of inserting, without automap, is to use the table's insert() method. Documentation is here:
insert(dml, values=None, inline=False, **kwargs)
Generate an insert() construct against this TableClause.
E.g.:
table.insert().values(name='foo')
In code it would look like this:
def record_to_db(MyTable):
    insert_stmnt = MyTable.insert().values(column_name=value_you_want_to_insert)
    session.execute(insert_stmnt)
    session.commit()
    return
Ideally, you'd have your table defined in a separate module rather than in your app.py. You can also have a utility function that yields the session, commits it, and rolls back if an exception is raised. Something like this:
from contextlib import contextmanager

@contextmanager
def get_db_session_scope(sql_db_session):
    session = sql_db_session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()
Then your function would look like this:
def record_to_db(MyTable):
    with get_db_session_scope(db.session) as db_session:  # db.session is the session factory
        insert_stmnt = MyTable.insert().values(column_name=value_you_want_to_insert)
        db_session.execute(insert_stmnt)
    return
You can get db from your app.py through
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy(app)

combine strings from mysql tables without extra spaces

I have been working on this django app. We pull a big set of tables from a California state agency, process the data and re-publish it. I have been trying to do something simple but the simple implementation is really slow and I may be thinking myself into a hole. Here is a bit of one of the tables. There are a lot of tables like this.
mysql> desc EXPN_CD;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| AGENT_NAMF | varchar(45) | NO | | NULL | |
| AGENT_NAML | varchar(200) | NO | | NULL | |
| AGENT_NAMS | varchar(10) | NO | | NULL | |
| AGENT_NAMT | varchar(10) | NO | | NULL | |
| AMEND_ID | int(11) | NO | MUL | NULL | |
| AMOUNT | decimal(14,2) | NO | | NULL | |
| BAKREF_TID | varchar(20) | NO | | NULL | |
| BAL_JURIS | varchar(40) | NO | | NULL | |
| BAL_NAME | varchar(200) | NO | | NULL | |
| BAL_NUM | varchar(7) | NO | | NULL | |
| CAND_NAMF | varchar(45) | NO | | NULL | |
| CAND_NAML | varchar(200) | NO | | NULL | |
| CAND_NAMS | varchar(10) | NO | | NULL | |
| CAND_NAMT | varchar(10) | NO | | NULL | |
| CMTE_ID | varchar(9) | NO | | NULL | |
| CUM_OTH | decimal(14,2) | YES | | NULL | |
| CUM_YTD | decimal(14,2) | YES | | NULL | |
| DIST_NO | varchar(3) | NO | | NULL | |
| ENTITY_CD | varchar(3) | NO | | NULL | |
| EXPN_CHKNO | varchar(20) | NO | | NULL | |
| EXPN_CODE | varchar(3) | NO | | NULL | |
| EXPN_DATE | date | YES | | NULL | |
| EXPN_DSCR | varchar(400) | NO | | NULL | |
| FILING_ID | int(11) | NO | MUL | NULL | |
...
I am going through all of these tables. I pull out each name, the "CAND" (candidate), "AGENT", and so on, and put each reference into a row:
mysql> desc calaccess_campaign_browser_name;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ext_pk | int(11) | NO | MUL | NULL | |
| ext_table | varchar(255) | NO | | NULL | |
| ext_prefix | varchar(255) | NO | | NULL | |
| naml | varchar(255) | YES | | NULL | |
| namf | varchar(255) | YES | | NULL | |
| nams | varchar(255) | YES | | NULL | |
| namt | varchar(255) | YES | | NULL | |
| name | varchar(1023) | YES | | NULL | |
+-------------+---------------+------+-----+---------+----------------+
The values are never null but many, sometimes the vast majority, are empty strings.
I am building the name column. The obvious way to do this is:
concat(namt, ' ', namf, ' ', naml, ' ', nams)
But when 2 or 3 of those are blank that gives me a lot of double-spaces and space padding at the beginning or end of the string.
Things I have done:
1) Use Python regexes to find and remove the extra spaces. This works if I have a month or so for it to run.
2) put the name together as above and use SQL to find and replace the extra spaces. Again, takes a really long time.
One of the problems is that the MySQL library for python has a cursor especially set up for dealing with large result sets. There is nothing similar for large query operations. Or perhaps I am looking at this wrong.
% pip freeze
...
MySQL-python==1.2.5c
...
3) Pull the names out into a tab-separated text file, do the fixing there, and then load the file into the new table. Blech. Lots of dumb scripting. Use sed or awk? What?
4) I can do the concat() operations in 15 different queries and I do the proper concat for each so that I do not have extra spaces in the name. I have:
namt = null and namf = null and naml = null and nams != null (case 0001)
namt = null and namf = null and naml != null and nams = null (case 0010)
namt = null and namf = null and naml != null and nams != null (case 0011)
etc, etc.
This is actually what I went with. It takes less than a day to run. Woohoo!
But I am doing similar things for other reasons too and how the heck many times do I want to write this kind of code? Ick!
There must be a smarter way to do this that I am not seeing. I am doing this in about 2 dozen tables, with 2 - 5 names in each table, with sometimes around 15,000 rows and sometimes 20,000,000 rows. Most tables are in the 300,000 to 750,000 range. And, jeez, am I tired....
In MySQL, I think you are looking for concat_ws():
concat_ws(' ', nullif(namt, ''), nullif(namf, ''), nullif(naml, ''), nullif(nams, ''))
The nullif() turns the value to NULL if it is blank. concat_ws() ignores NULL values, so you won't get duplicated spaces.
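If the goal is to backfill the existing name column, that expression can be applied as one set-based UPDATE rather than row by row from Python. A sketch, run from the Django side (table name taken from the question):

from django.db import connection

cursor = connection.cursor()
cursor.execute("""
    UPDATE calaccess_campaign_browser_name
    SET name = concat_ws(' ',
                         nullif(namt, ''), nullif(namf, ''),
                         nullif(naml, ''), nullif(nams, ''))
""")

One statement covers all of the blank/non-blank combinations, so none of the per-case queries are needed.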

Why django order_by is so slow in a manytomany query?

I have a ManyToMany field. Like this:
class Tag(models.Model):
    books = models.ManyToManyField('book.Book', related_name='vtags', through=TagBook)

class Book(models.Model):
    nump = models.IntegerField(default=0, db_index=True)
I have around 450,000 books, and some tags relate to around 60,000 books each. When I do a query like:
tag.books.order_by('nump')[1:11]
It gets extremely slow, like 3-4 minutes.
But if I remove the order_by, the query runs as normal.
The raw sql for the order_by version looks like this:
'SELECT `book_book`.`id`, ... `book_book`.`price`, `book_book`.`nump`,
FROM `book_book` INNER JOIN `book_tagbook` ON (`book_book`.`id` =
`book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1 ORDER BY
`book_book`.`nump` ASC LIMIT 11 OFFSET 1'
Do you have any idea on this? How could I fix it? Thanks.
---EDIT---
Checked the previous raw query in MySQL as @bouke suggested:
SELECT `book_book`.`id`, `book_book`.`title`, ... `book_book`.`nump`,
`book_book`.`raw_data` FROM `book_book` INNER JOIN `book_tagbook` ON
(`book_book`.`id` = `book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1
ORDER BY `book_book`.`nump` ASC LIMIT 11 OFFSET 1;
11 rows in set (4 min 2.79 sec)
Then use explain to find out why:
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| 1 | SIMPLE | book_tagbook | ref | book_tagbook_3747b463,book_tagbook_752eb95b | book_tagbook_3747b463 | 4 | const | 116394 | Using temporary; Using filesort |
| 1 | SIMPLE | book_book | eq_ref | PRIMARY | PRIMARY | 4 | legend.book_tagbook.book_id | 1 | |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
2 rows in set (0.10 sec)
And for the table book_book:
mysql> explain book_book;
+----------------+----------------+------+-----+-----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------+------+-----+-----------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(200) | YES | | NULL | |
| href | varchar(200) | NO | UNI | NULL | |
..... skip some part.............
| nump | int(11) | NO | MUL | 0 | |
| raw_data | varchar(10000) | YES | | NULL | |
+----------------+----------------+------+-----+-----------+----------------+
24 rows in set (0.00 sec)
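The EXPLAIN output shows where the time goes: roughly 116,000 joined rows hit "Using temporary; Using filesort", and the sort can end up dragging along wide columns such as raw_data varchar(10000). A hedged sketch of two things worth trying, both of which shrink the rows MySQL has to sort (the Book import path is assumed from the 'book.Book' reference):

from book.models import Book   # assuming the app label implied by 'book.Book'

# 1) Keep the huge text column out of the query that gets sorted.
books = tag.books.defer('raw_data').order_by('nump')[1:11]

# 2) Or sort only the ids, then fetch the ten full rows separately.
ids = list(tag.books.order_by('nump').values_list('id', flat=True)[1:11])
page = Book.objects.in_bulk(ids)   # dict keyed by id; reorder with [page[i] for i in ids]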
