I have a pretty straightforward Python script. It kicks off a pool of 10 processes that each:
Make an external API request for 1,000 records
Parse the XML response
Insert each record into a MySQL database
There's nothing particularly tricky here, but at around 90,000 records the script hangs.
mysql> show processlist;
+----+------+-----------------+---------------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------+---------------+---------+------+-------+------------------+
| 44 | root | localhost:48130 | my_database | Sleep | 57 | | NULL |
| 45 | root | localhost:48131 | NULL | Sleep | 6 | | NULL |
| 59 | root | localhost | my_database | Sleep | 506 | | NULL |
| 60 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+----+------+-----------------+---------------+---------+------+-------+------------------+
I have roughly a million records to import this way, so I have a long, long way to go.
What can I do to prevent this hang and keep my script moving?
Python 2.7.6
MySQL-python 1.2.5
Not exactly what I wanted to do, but I have found that opening and closing the connection as required seems to move things along.
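A minimal sketch of that open/close-per-batch pattern, assuming MySQL-python (MySQLdb) as in the question; the table name, column names, and config dict are illustrative, not taken from the original script:

```python
# Sketch: open a fresh connection per batch instead of holding one per worker.
# MySQLdb and the INSERT statement here are assumptions, not the asker's code.

def chunked(records, size):
    """Split a list of records into insert batches of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def import_batch(records, db_config):
    """Open a connection, insert one batch, and close the connection.

    The import is deferred so this module loads even without the driver.
    """
    import MySQLdb
    conn = MySQLdb.connect(**db_config)
    try:
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO records (id, payload) VALUES (%s, %s)", records)
        conn.commit()
    finally:
        conn.close()
```

One common cause of this kind of hang is pooled workers idling past the server's wait_timeout and then blocking on a dead connection; keeping each connection short-lived sidesteps that.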
I have an SQL table with the data below:
select day, counter from table1 where day LIKE '%201601%';
+----------+---------+
| day | counter |
+----------+---------+
| 20160101 | 125777 |
| 20160102 | 31720 |
| 20160105 | 24981 |
| 20160106 | 240366 |
| 20160107 | 270560 |
| 20160108 | 268788 |
| 20160109 | 254286 |
| 20160110 | 218154 |
| 20160111 | 250186 |
| 20160112 | 94532 |
| 20160113 | 71437 |
| 20160114 | 71121 |
| 20160115 | 71135 |
| 20160116 | 71325 |
| 20160117 | 209762 |
| 20160118 | 210305 |
| 20160119 | 257627 |
| 20160120 | 306353 |
| 20160121 | 214687 |
| 20160122 | 214680 |
| 20160123 | 149844 |
| 20160124 | 133741 |
| 20160125 | 82404 |
| 20160126 | 71403 |
| 20160127 | 71437 |
| 20160128 | 72005 |
| 20160129 | 71417 |
| 20160130 | 0 |
| 20160131 | 69937 |
+----------+---------+
I have a Python script that I run with:
python myapp.py January
The January variable contains the query below:
January = """select day, counter from table1 where day LIKE '%201601%';"""
What I would like to do is run the script with different flags and have it calculate the sum for all of the days of the:
last month,
this month,
last week,
last two weeks
specific month.
At the moment I have a different variable for each month of 2016; this way the script will become huge. I am sure there is an easier way to do this.
cursor = db.cursor()
cursor.execute(eval(sys.argv[1]))
display = cursor.fetchall()
MonthlyLogs = sum(int(row[1]) for row in display)
The point of this is that I want to see discrepancies in the data. My overall aim is to display this data in PHP at a later date, but for now I would like it to be written to a file, which it currently is.
What is the best way to achieve this?
For the first part, you can get the current month and year using datetime. Note that the month needs zero-padding; otherwise January gives 20161 rather than 201601:
from datetime import datetime

parameter = datetime.now().strftime('%Y%m')
Passing this parameter in your query should work.
For the second, you can define a function with a flag: when it is set, use the parameter above; otherwise take the parameter's value from sys.argv[1].
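To cover all the flags without one variable per month, and without passing sys.argv[1] to eval() (which executes arbitrary input), one option is a single query template plus a helper that maps each flag to a date range. The flag names and the BETWEEN rewrite of the LIKE query are illustrative:

```python
# Map each command-line flag to a (start, end) date range, then feed that
# range into one parameterized query instead of eval()-ing a variable name.
from datetime import date, timedelta

QUERY_TEMPLATE = "select day, counter from table1 where day between %s and %s"

def date_range(flag, today=None):
    """Return (start, end) as YYYYMMDD strings for a given flag."""
    today = today or date.today()
    if flag == "this-month":
        start, end = today.replace(day=1), today
    elif flag == "last-month":
        end = today.replace(day=1) - timedelta(days=1)
        start = end.replace(day=1)
    elif flag == "last-week":
        start, end = today - timedelta(days=7), today
    elif flag == "last-two-weeks":
        start, end = today - timedelta(days=14), today
    else:  # a specific month such as "201601"
        start = date(int(flag[:4]), int(flag[4:6]), 1)
        end = (start + timedelta(days=32)).replace(day=1) - timedelta(days=1)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")
```

Then cursor.execute(QUERY_TEMPLATE, date_range(sys.argv[1])) replaces the eval() call, and the sum is computed exactly as before.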
I have a database that has the following structure:
mysql> show tables;
+--------------------+
| Tables_in_my_PROD |
+--------------------+
| table_A |
| Table_B |
| table_C |
| view_A |
| table_D |
| table_E |
| ... |
+--------------------+
I use a script to make a gzip dump file of my entire database and then I upload that file to Amazon S3. The python code to create the dump file is below:
dump_cmd = ('mysqldump ' +
            '--user={mysql_user} '.format(mysql_user=cfg.DB_USER) +
            '--password={db_pw} '.format(db_pw=cfg.DB_PW) +
            '--host={db_host} '.format(db_host=cfg.DB_HOST) +
            '{db_name} '.format(db_name=cfg.DB_NAME) +
            '| gzip > {filepath}'.format(filepath=self.filename))
# shell=True expects a single command string, not a list
dc = subprocess.Popen(dump_cmd, shell=True)
dc.wait()
This creates the zip file. Next, I upload it to Amazon S3 using python's boto library.
When I go to restore a database from that zip file, I only get tables A, B and C restored. Tables D and E are nowhere to be found.
Tables D and E appear after the view in the table listing.
Is there something about that view that is causing problems? I do not know whether the tables are even getting dumped to the file, because I do not know how to look inside it (table_B has 8 million rows, and any attempt to inspect the file crashes everything).
I am using this MariaDB version:
+-------------------------+------------------+
| Variable_name | Value |
+-------------------------+------------------+
| innodb_version | 5.6.23-72.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 10.0.19-MariaDB |
| version_comment | MariaDB Server |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
| version_malloc_library | bundled jemalloc |
+-------------------------+------------------+
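One way to answer "are tables D and E in the file at all" without decompressing or loading the whole dump is to stream the gzip line by line and report only the object definitions. The path is a placeholder, and the regex assumes standard mysqldump output:

```python
# Stream the gzipped dump and report which objects it defines, without ever
# holding the whole file (or table_B's 8 million rows) in memory.
import gzip
import re

def dumped_objects(path):
    """Yield the names from CREATE TABLE / CREATE ... VIEW lines in a gzipped SQL dump."""
    pattern = re.compile(r"^CREATE (?:TABLE|.*VIEW) .*?`(\w+)`")
    # "rt" is Python 3 text mode; on Python 2, open in "rb" and decode each line.
    with gzip.open(path, "rt") as fh:
        for line in fh:
            m = pattern.match(line)
            if m:
                yield m.group(1)
```

Note that mysqldump typically writes each view twice (a placeholder table first, then the CREATE VIEW), so the view showing up here is normal; what matters is whether table_D and table_E appear at all.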
I have a table with more than a million records with the following structure:
mysql> SELECT * FROM Measurement;
+----------------+---------+-----------------+------+------+
| Time_stamp | Channel | SSID | CQI | SNR |
+----------------+---------+-----------------+------+------+
| 03_14_14_30_14 | 7 | open | 40 | -70 |
| 03_14_14_30_14 | 7 | roam | 31 | -79 |
| 03_14_14_30_14 | 8 | open2 | 28 | -82 |
| 03_14_14_30_15 | 8 | roam2 | 29 | -81 |....
I am reading data from this table into Python for plotting. The problem is that the MySQL reads are too slow, and it is taking me hours to get the plots even after using MySQLdb.cursors.SSCursor (as suggested by a few on this forum) to speed up the task.
con = mdb.connect('localhost', 'testuser', 'conti', 'My_Freqs', cursorclass=MySQLdb.cursors.SSCursor)
cursor = con.cursor()
cursor.execute("SELECT Time_stamp FROM Measurement")
for row in cursor:
    # ... do processing ...
Will normalizing the table help me speed up the task? If so, how should I normalize it?
P.S.: Here is the result of EXPLAIN:
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| Time_stamp | varchar(128) | YES | | NULL | |
| Channel | int(11) | YES | | NULL | |
| SSID | varchar(128) | YES | | NULL | |
| CQI | int(11) | YES | | NULL | |
| SNR | float | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
The problem is probably that you are looping over the cursor instead of dumping out all the data at once and then processing it. You should be able to dump out a couple million rows in a few seconds. Try something like:
cursor.execute("SELECT Time_stamp FROM Measurement")
data = cursor.fetchall()
for row in data:
    # do some stuff...
Well, since you're saying the whole table has to be read, I guess you can't do much about it. It has more than 1 million records... you're not going to optimize much on the database side.
How much time does it take you to process just one record? Maybe you could try optimizing that part. But even if you got down to 1 millisecond per record, it would still take you about half an hour to process the full table. You're dealing with a lot of data.
Maybe run multiple plotting jobs in parallel? With the same numbers as above, dividing your data into 6 equal-sized jobs would (theoretically) give you the plots in 5 minutes.
Do your plots have to be fine-grained? You could look for ways to ignore certain values in the data, and generate a complete plot only when the user needs it (wild speculation here, I really have no idea what your plots look like)
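Whichever fetch strategy you use, the varchar timestamps still have to be parsed before plotting. A small helper, assuming the "MM_DD_HH_MM_SS" layout seen in the sample rows (the year is not stored in the column, so it has to be supplied):

```python
# Parse the Measurement.Time_stamp strings for plotting. The format is
# inferred from the sample rows ("03_14_14_30_14"); adjust if it differs.
from datetime import datetime

def parse_stamp(stamp, year=2014):
    """Convert 'MM_DD_HH_MM_SS' into a datetime; the year must be supplied."""
    return datetime.strptime("%d_%s" % (year, stamp), "%Y_%m_%d_%H_%M_%S")
```

Storing the column as a real DATETIME would also make range queries and sorting cheaper than parsing varchars on every read, which is the most useful "normalization" step here.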
Using MySQL, I cannot import a file using LOAD DATA LOCAL INFILE. My server is on AWS RDS. This works on Ubuntu 10.04. I installed the client using apt-get install mysql-client. I get the same error whether I use MySQLdb or mysql.connector in Python.
File "/usr/lib/pymodules/python2.7/mysql/connector/protocol.py", line 479, in cmd_query
return self.handle_cmd_result(self.conn.recv())
File "/usr/lib/pymodules/python2.7/mysql/connector/connection.py", line 179, in recv_plain
errors.raise_error(buf)
File "/usr/lib/pymodules/python2.7/mysql/connector/errors.py", line 82, in raise_error
raise get_mysql_exception(errno,errmsg)
mysql.connector.errors.NotSupportedError: 1148: The used command is not allowed with this MySQL version
I have a lot of data to upload... I can't believe 12.04 is not supported and I have to use 10.04.
Not really a Python question... but the long and short of the matter is that mysql, as compiled and distributed by Ubuntu 12.04 and later, does not support using LOAD DATA LOCAL INFILE directly from the mysql client as is.
If you search the MySQL Reference Documentation for Error 1148, you will find, further down the page, in the comments:
Posted by Aaron Peterson on November 9 2005 4:35pm
With a default installation from FreeBSD ports, I had to use the command line
mysql -u user -p --local-infile menagerie
to start the mysql monitor, else the LOAD DATA LOCAL command failed with an error like
the following:
ERROR 1148 (42000): The used command is not allowed with this MySQL version
... which does work.
monte@oobun2:~$ mysql -h localhost -u monte -p monte --local-infile
Enter password:
...
mysql> LOAD DATA LOCAL INFILE 'pet.txt' INTO TABLE pet;
Query OK, 8 rows affected (0.04 sec)
Records: 8 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT * FROM pet;
+----------+--------+---------+------+------------+------------+
| name | owner | species | sex | birth | death |
+----------+--------+---------+------+------------+------------+
| Fluffy | Harold | cat | f | 1993-02-04 | NULL |
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
| Fang | Benny | dog | m | 1990-08-27 | NULL |
| Bowser | Diane | dog | m | 1979-08-31 | 1995-07-29 |
| Chirpy | Gwen | bird | f | 1998-09-11 | NULL |
| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |
| Slim | Benny | snake | m | 1996-04-29 | NULL |
| Puffball | Diane | hamster | f | 1999-03-30 | NULL |
+----------+--------+---------+------+------------+------------+
9 rows in set (0.00 sec)
mysql>
I generally don't need to load data via code, so that suffices for my needs. If you do, and have the ability/permissions to edit your mysql config file, then the local-infile=1 line in the appropriate section(s) may be simpler.
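For reference, that config change would look like the fragment below; the section names follow the stock my.cnf layout, so check where your install keeps its option groups. On RDS, the server-side switch lives in the instance's parameter group rather than a file:

```ini
# Enable LOAD DATA LOCAL INFILE for both the client and the server
[mysql]
local-infile=1

[mysqld]
local-infile=1
```

Some Python drivers expose the same switch at connect time as well; MySQLdb, for instance, accepts a local_infile connection argument.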
I'm running a bunch of queries using Python and psycopg2. I create one large temporary table w/ about 2 million rows, then I get 1000 rows at a time from it by using cur.fetchmany(1000) and run more extensive queries involving those rows. The extensive queries are self-sufficient, though - once they are done, I don't need their results anymore when I move on to the next 1000.
However, about 1000000 rows in, I got an exception from psycopg2:
psycopg2.OperationalError: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Funnily enough, this happened when I was executing a query to drop some temporary tables that the more extensive queries created.
Why might this happen? Is there any way to avoid it? It was annoying that this happened halfway through, meaning I have to run it all again. What might max_locks_per_transaction have to do with anything?
NOTE: I'm not doing any .commit()s, but I'm deleting all the temporary tables I create, and I'm only touching the same 5 tables anyway for each "extensive" transaction, so I don't see how running out of table locks could be the problem...
When you create a table, you get an exclusive lock on it that lasts to the end of the transaction, even if you then go ahead and drop it.
So if I start a tx and create a temp table:
steve@steve@[local] *=# create temp table foo(foo_id int);
CREATE TABLE
steve@steve@[local] *=# select * from pg_locks where pid = pg_backend_pid();
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
---------------+----------+-----------+------+-------+------------+---------------+---------+-----------+----------+--------------------+-------+---------------------+---------
virtualxid | | | | | 2/105315 | | | | | 2/105315 | 19098 | ExclusiveLock | t
transactionid | | | | | | 291788 | | | | 2/105315 | 19098 | ExclusiveLock | t
relation | 17631 | 10985 | | | | | | | | 2/105315 | 19098 | AccessShareLock | t
relation | 17631 | 214780901 | | | | | | | | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 2615 | 124616403 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 0 | | | | | | 1260 | 16384 | 0 | 2/105315 | 19098 | AccessShareLock | t
(6 rows)
These 'relation' locks aren't dropped when I drop the table:
steve@steve@[local] *=# drop table foo;
DROP TABLE
steve@steve@[local] *=# select * from pg_locks where pid = pg_backend_pid();
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
---------------+----------+-----------+------+-------+------------+---------------+---------+-----------+----------+--------------------+-------+---------------------+---------
virtualxid | | | | | 2/105315 | | | | | 2/105315 | 19098 | ExclusiveLock | t
object | 17631 | | | | | | 1247 | 214780902 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
transactionid | | | | | | 291788 | | | | 2/105315 | 19098 | ExclusiveLock | t
relation | 17631 | 10985 | | | | | | | | 2/105315 | 19098 | AccessShareLock | t
relation | 17631 | 214780901 | | | | | | | | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 2615 | 124616403 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 1247 | 214780903 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 0 | | | | | | 1260 | 16384 | 0 | 2/105315 | 19098 | AccessShareLock | t
(8 rows)
In fact, it added two more locks... It seems if I continually create/drop that temp table, it adds 3 locks each time.
So I guess one answer is: you will need enough locks to cope with all these tables being added and dropped throughout the transaction. Alternatively, you could try to reuse the temp tables between queries; simply truncate them to remove all the temp data.
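A sketch of that reuse-and-truncate idea, assuming psycopg2; the table name and SQL are illustrative. Because TRUNCATE touches the same relation each time, the lock count stays flat instead of growing by roughly three locks per create/drop cycle:

```python
# Reuse one temp table for every batch in the transaction instead of
# creating and dropping a fresh one each time. psycopg2 cursor API assumed;
# the scratch table and per-batch SQL are placeholders.
SETUP_SQL = "CREATE TEMP TABLE batch_scratch (id int) ON COMMIT DROP"
RESET_SQL = "TRUNCATE batch_scratch"

def process_batches(conn, batches):
    """Run every batch through a single reusable temp table."""
    cur = conn.cursor()
    cur.execute(SETUP_SQL)          # lock acquired once
    for batch in batches:
        cur.execute(RESET_SQL)      # same relation each time, no lock growth
        cur.executemany(
            "INSERT INTO batch_scratch (id) VALUES (%s)",
            [(i,) for i in batch])
        # ... run the extensive per-batch queries against batch_scratch ...
```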
Did you create multiple savepoints with the same name without releasing them?
I followed these instructions, repeatedly executing
SAVEPOINT savepoint_name but without ever executing any corresponding RELEASE SAVEPOINT savepoint_name statements. PostgreSQL was just masking the old savepoints, never freeing them. It kept track of each until it ran out of memory for locks. I think my postgresql memory limits were much lower, it only took ~10,000 savepoints for me to hit max_locks_per_transaction.
Well, are you running the entire create + queries inside a single transaction? This would perhaps explain the issue. Just because it happened when you were dropping tables would not necessarily mean anything, that may just happen to be the point when it ran out of free locks.
Using a view might be an alternative to a temporary table, and it would definitely be my first pick if you're creating this thing and then immediately removing it.