Given this piece of code:
record = session.query(Foo).filter(Foo.id == 1).first()
session.delete(record)
session.flush()
has_record = session.query(Foo).filter(Foo.id == 1).first()
I think 'has_record' should be None here, but it turns out to be the same row as record.
Did I miss something needed to get the expected result? Or is there a way to make the delete take effect without committing?
MySQL behaves differently in a similar flow:
start transaction;
select * from Foo where id = 1; # Hit one record
delete from Foo where id = 1; # Nothing goes to the disk
select * from Foo where id = 1; # Empty set
commit; # Everything goes to the disk
I made a stupid mistake here. The session I'm using is a routing session, which has a master session and a slave session behind it. What must be happening is that the delete is flushed to the master while the query still goes to the slave, so of course I can query the record again.
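For reference, with a single plain Session bound to one engine, a flushed delete is visible to later queries in the same transaction before any commit. A minimal sketch (the model definition and SQLite URL are illustrative, not from the original code):
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)

engine = create_engine('sqlite://')  # single database, no master/slave split
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Foo(id=1))
session.flush()

record = session.query(Foo).filter(Foo.id == 1).first()
session.delete(record)
session.flush()  # DELETE goes to the database, still uncommitted

# Same session, same connection: the flushed delete is visible
has_record = session.query(Foo).filter(Foo.id == 1).first()
assert has_record is None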
I have a question regarding MySQL and transactions. I work with MySQL 5.7.18, python 3 and the Oracle mysql connector v2.1.4
I do not understand the difference between
a) having a transaction and – in case of error – rolling back, and
b) not having a transaction and – in case of error – simply not committing the changes.
Both seem to leave me with exactly the same results (i.e. no entries in table, see code example below). Does this have to do with using InnoDB – would the results differ otherwise?
What is the advantage of using a transaction if
1) I cannot rollback commited changes and
2) I could just as well not commit changes (until I am done with my task or sure that some query didn’t raise any exceptions)?
I have tried to find the answers to those questions in https://downloads.mysql.com/docs/connector-python-en.a4.pdf but failed to find the essential difference.
Somebody asked an almost identical question and received some replies, but I don't think those actually contain an answer: Mysql transaction : commit and rollback. The replies focused on having multiple connections open and the visibility of changes. Is that all there is to it?
import mysql.connector
# Connect to MySQL-Server
conn = mysql.connector.connect(user='test', password='blub',
                               host='127.0.0.1', db='my_test')
cursor = conn.cursor(buffered=True)
# This is anyway the default in mysql.connector
# cursor.autocommit = False
sql = """CREATE TABLE IF NOT EXISTS `my_test`.`employees` (
`emp_no` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(14) NOT NULL,
PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8"""
try:
    cursor.execute(sql)
    conn.commit()
except mysql.connector.Error as err:
    print("error: %s" % err)
# Arguments on default values
# conn.start_transaction(consistent_snapshot=False,
# isolation_level=None, readonly=False)
sql = """INSERT INTO `my_test`.`employees`
(`first_name`)
VALUES
(%s);"""
employees = {}
employees["1"] = ["Peter"]
employees["2"] = ["Bruce"]
for employee, value in employees.items():
    cursor.execute(sql, (value[0],))
print(conn.in_transaction)
# If I do not commit the changes, table is left empty (whether I write
# start_transaction or not)
# If I rollback the changes (without commit first), table is left empty
# (whether I write start_transaction or not)
# If I commit and then rollback, the rollback has no effect (i.e. there are
# values in the table, whether I write start_transaction or not)
conn.commit()
conn.rollback()
Thank you very much for your help in advance! I appreciate it.
I think that having neither committed nor rolled back leaves the transaction in a running state, in which it may still hold resources such as locks.
It doesn't matter which database you are using: when you run a transaction, it holds locks on the data it touches until the transaction is committed or rolled back. For example, if I write a transaction to insert something into a table test, the affected rows in test stay locked until the transaction completes, and this can lead to deadlocks since others may need that table. You can try it yourself: open two instances of your MySQL client, run a transaction without committing in the first, and then try to modify the same rows from the second. It will clear your doubt.
Transactions prevent other queries from modifying the data while your query is running. Furthermore, a transaction scope can contain multiple queries - so you can rollback ALL of them in the event of an error, whereas that is not the case if some of them run successfully and only one query results in error, in which case you may end up with partially committed results, like JLH said.
Your decision to have a transaction should take into account the numerous reasons for having one, including having multiple statements, each of which writes to the database.
In your example I don't think it makes a difference, but in more complicated scenarios you need a transaction to ensure ACID.
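To make the multi-statement point concrete, here is a hedged sketch reusing the conn and cursor from the question's example: the two INSERTs succeed or fail as one unit, so an error in the second also undoes the first.
names = ["Peter", "Bruce"]
try:
    conn.start_transaction()
    for name in names:
        cursor.execute(
            "INSERT INTO `my_test`.`employees` (`first_name`) VALUES (%s);",
            (name,))
    conn.commit()    # both rows become visible together
except mysql.connector.Error:
    conn.rollback()  # neither row is kept if any statement failed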
Newbie here (and it seems like it might be a newbie question).
Using Ubuntu 14.04 with a fresh install of Cassandra 2.1.1, CQL 3.2.0 (it says).
Writing a back-end database for a CherryPy site, initially as a session database.
I've come up with a scheme for a kind of 'row locking' as a session lock, but it doesn't seem to be hanging together, so I've reduced it to a simple test program running against a local Cassandra instance. To run this test, I open two terminal windows to run two python instances of it at the same time, each with different instance numbers ('1' and '2').
import time, sys, os, cassandra
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
instance = sys.argv[1]
cluster = Cluster( auth_provider=PlainTextAuthProvider( username='cassandra', password='cassandra'))
cdb = cluster.connect()
cdb.execute("CREATE KEYSPACE IF NOT EXISTS test WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1}")
cdb.execute("CREATE TABLE IF NOT EXISTS test.test ( id text primary key, val int, lock text )")
cdb.execute("INSERT INTO test.test (id, val, lock) VALUES ('session_id1', 0, '') ")
raw_input( '<Enter> to start ... ')
i = 0
while i < 10000:
    i += 1
    # set lock
    while True:
        r = cdb.execute("UPDATE test.test SET lock = '%s' WHERE id = 'session_id1' IF lock = '' " % instance)
        if r[0].applied == True:
            break
    # check lock and increment val
    s0 = cdb.execute("SELECT val,lock FROM test.test WHERE id = 'session_id1' ")[0]
    if s0.lock != instance:
        print 'error: instance [%s] %s %s' % (instance, s0, r[0])
    cdb.execute("UPDATE test.test SET val = %s WHERE id = 'session_id1'", (s0.val + 1,))
    # clear lock
    cdb.execute("UPDATE test.test SET lock = '' WHERE id = 'session_id1' ")
    time.sleep(.01)
So if I understand correctly, the UPDATE..IF should be 'applied' (and the break taken) only if the existing value of lock is '' (an empty string), so this should give an effective exclusive lock on the row.
The problem is that the 's0.lock != instance' test quite frequently fires, showing that despite the UPDATE reporting 'applied', the value of lock afterwards is variously still '' or that of the other instance...
I know that when I roll out to a cluster I'm going to have to manage consistency issues, but this is against a single local Cass instance - surely consistency shouldn't be a problem here?
I can't imagine this CQL form is broken (tm), so it must be me. What am I doing wrong, or what is it I don't understand? TIA.
UPDATE: Ok, I googled a lot on this before I posted here, and now have spent the day since posting doing the same.
In particular, the Stack Overflow posting Cassandra Optimistic Locking addresses a similar issue (for a different reason), and the solution there was:
"update table1 set version_num = 5 where name = 'abc' if version_num = 4"
which the poster says works for him - yet it is exactly what I am doing, and it isn't working for me.
So I believe my approach to be sound, but clearly I have a problem.
Are there any environmental issues that could be affecting me? (installation, pythonic, whatever...)
Unsatisfactory work-around found
After trying a lot of variations of the test code (above), I have come to the view that, around 5% of the time, the statement:
"UPDATE test.test SET lock = '%s' WHERE id = 'session_id1' IF lock = '' "
finds that lock is '' (an empty string) yet fails to write the new value to lock, while nevertheless returning 'applied=True'.
By way of further testing, I modified that test code as follows:
# set lock
while True:
    r = cdb.execute("UPDATE test.test SET lock = '%s' WHERE id = 'session_id1' IF lock = '' " % instance)
    if r[0].applied == True:
        s = cdb.execute("SELECT lock FROM test.test WHERE id = 'session_id1' ")
        if s[0].lock == instance:
            break
# check lock and increment val
(etc)
... so this code now confirms that the lock had been applied, and if not, it goes back to try again...
So this is:
1) Horrible
2) Kludgy
3) Inefficient
4) Totally reliable (the only thing that really matters to me)
I've tested this on the single local Cassandra instance, and the main point is that the incrementing of the 'val' column that the lock is supposed to be protecting does reach the proper terminating value (20000 with the code as above).
I've also tested it on a 2-node cluster with a replication factor of 2, with one instance of the test code running on each node, and that works too (although the "UPDATE ... IF" statement, now with a consistency of QUORUM, occasionally returns:
exception - code=1100 [Coordinator node timed out waiting for replica nodes' responses]\
message="Operation timed out - received only 1 responses." \
info={'received_responses': 1, 'required_responses': 2, 'write_type': 5, 'consistency': 8}
... that needs careful handling, as it appears that the lock has always been set despite not all of the replies having been received... and it cannot simply be retried, as the operation isn't idempotent...)
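One hedged way to cope with that non-idempotent timeout (a sketch, assuming the DataStax driver's cassandra.WriteTimeout exception): on timeout, read the lock back and treat it as acquired only if it now holds our instance id.
from cassandra import WriteTimeout

def lock_confirmed(cdb, instance):
    # One attempt at the lock; True only if a re-read confirms we hold it.
    try:
        cdb.execute(
            "UPDATE test.test SET lock = '%s' WHERE id = 'session_id1' IF lock = '' "
            % instance)
    except WriteTimeout:
        # The Paxos write may still have been applied; fall through and check.
        pass
    s = cdb.execute("SELECT lock FROM test.test WHERE id = 'session_id1' ")
    return s[0].lock == instance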
So I clearly haven't fixed the underlying problem, and although I have fixed the symptom, I would still appreciate a more thorough insight into what is happening...
I'd appreciate any feedback (but at least I can make progress again). TIA
So I've had some communication with Tyler Hobbs (Datastax), and in a nutshell:
"The correct functioning of the mechanism that provides the atomic test-and-set facility (via LightWeight Transactions) depends upon using the same mechanism to clear the lock."
... so I need to use a similar 'IF' construct to clear it, even though I already know the contents...
# clear lock
cdb.execute( "UPDATE test.test SET lock = '' WHERE id = 'session_id1' IF lock = '%s'" % instance)
... and that works.
I'm new to Python - coming from PHP - and have been bouncing back and forth between the official Python documentation and SQLAlchemy (which I'm trying to use as easily as Laravel's DB class).
I have this bit of code:
from sqlalchemy import *
from time import sleep

engine = create_engine('mysql://root:pass@aws.com/db')
db_connection = engine.connect()
meta = MetaData()
video_processing = Table('video_processing', meta, autoload=True, autoload_with=engine)
while True:
    sleep(1)
    stmt = select([video_processing]).where(video_processing.c.finished_processing == 0).where(video_processing.c.in_progress == 0)
    result = db_connection.execute(stmt)
    rows = result.fetchall()
    print len(rows)
    stmt = None
    result = None
    rows = None
When I execute my statement, I let it run and print out the number of rows that it fetches.
While that's going, I go in and delete rows from my db.
The problem is that even though I'm resetting pretty much everything I can think of that is related to the query, it's still printing out the same number of fetched rows in every iteration of my loop, even though I'm changing the underlying data.
Any ideas?
The tricky part is that, when using engine.connect(), the connection needs to be closed with db_connection.close(); otherwise you might not see new data changes.
I ended up bypassing the connection and executing my statement directly on the engine, which makes more sense logically anyway:
result = engine.execute(stmt)
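For readers who want to keep using a Connection, a hedged sketch of the loop with an explicitly scoped connection per iteration (engine and table objects as in the question): returning the connection to the pool rolls back its implicit transaction, so each read starts from a fresh snapshot.
from time import sleep

while True:
    sleep(1)
    stmt = select([video_processing]).where(
        video_processing.c.finished_processing == 0).where(
        video_processing.c.in_progress == 0)
    conn = engine.connect()
    try:
        rows = conn.execute(stmt).fetchall()
    finally:
        conn.close()  # back to the pool; ends the connection's transaction
    print len(rows)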
We still have a rare case of duplicate entries when this POST method is called.
I had previously asked for advice on Stack Overflow and was given a solution: utilising the parent/child methodology to retain strongly consistent queries.
I have migrated all data into that form and let it run for another 3 months.
However the problem was never solved.
The problem is right here, with this conditional: if recordsdb.count() == 1:
It should be true in order to update the entry, but instead the HRD might not always find the latest entry, and so a new entry is created instead.
As you can see, we are writing/reading from the Record via Parent/Child methodology as recommended:
new_record = FeelTrackerRecord(parent=user.key,...)
And yet, upon retrieval, the HRD still doesn't always fetch the latest entry:
recordsdb = FeelTrackerRecord.query(ancestor = user.key).filter(FeelTrackerRecord.record_date == ... )
So we are quite stuck on this and don't know how to solve it.
@requires_auth
def post(self, ios_sync_timestamp):
    user = User.query(User.email == request.authorization.username).get()
    if user:
        json_records = request.json['records']
        for json_record in json_records:
            recordsdb = FeelTrackerRecord.query(ancestor=user.key).filter(
                FeelTrackerRecord.record_date == date_parser.parse(json_record['record_date']))
            if recordsdb.count() == 1:
                rec = recordsdb.fetch(1)[0]
                if 'timestamp' in json_record:
                    if rec.timestamp < json_record['timestamp']:
                        rec.rating = json_record['rating']
                        rec.notes = json_record['notes']
                        rec.timestamp = json_record['timestamp']
                        rec.is_deleted = json_record['is_deleted']
                        rec.put()
            elif recordsdb.count() == 0:
                new_record = FeelTrackerRecord(parent=user.key,
                                               user=user.key,
                                               record_date=date_parser.parse(json_record['record_date']),
                                               rating=json_record['rating'],
                                               notes=json_record['notes'],
                                               timestamp=json_record['timestamp'])
                new_record.put()
            else:
                raise Exception('Got more than one record for the same record date - during REST post')
        user.last_sync_timestamp = create_timestamp(datetime.datetime.today())
        user.put()
        return '', 201
    else:
        return '', 401
Possible Solution:
The very last idea I have to solve this would be stepping away from the parent/child strategy and using the user.key PLUS a date string as part of the key.
Saving:
new_record = FeelTrackerRecord(id=str(user.key) + json_record['record_date'], ...)
new_record.put()
Loading:
key = ndb.Key(FeelTrackerRecord, str(user.key) + json_record['record_date'])
record = key.get()
Now, if record is None, I create a new entry; otherwise I update it. And hopefully the HRD has no reason not to find the record anymore.
What do you think, is this a guaranteed solution?
The Possible Solution appears to have the same problem as the original code. Imagine the race condition if two servers execute the same instructions practically simultaneously. With Google's overprovisioning, that is sure to happen once in a while.
A more robust solution would use transactions, with a rollback for when concurrency causes a consistency violation. The User entity should be the root of its own entity group. Within a transaction, increment a records counter field in the User entity, and create the new FeelTrackerRecord only if the transaction completes successfully. For this to work, the FeelTrackerRecord entities must have a User as parent.
Edit: In the case of your code, the following lines (a Java-style sketch of the datastore API; the concept carries over to Python) would go before user = User.query(... :
Transaction txn = datastore.beginTransaction();
try {
and the following lines would go after user.put() :
    txn.commit();
} finally {
    if (txn.isActive()) {
        txn.rollback();
    }
}
That may overlook some flow-control nesting detail; it is the concept that this answer is trying to describe.
With an active transaction, if multiple processes try this simultaneously (for example, multiple servers executing the same POST concurrently because of overprovisioning), the first process will succeed with its put and commit, while the second will throw the documented ConcurrentModificationException.
Edit 2: The transaction that increments the counter (and may throw an exception) must also create the new record. That way if the exception is thrown, the new record is not created.
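The snippets above are Java-flavoured; here is a hedged Python sketch of the same idea with ndb, doing the lookup and put inside one transaction on the user's entity group (model and field names taken from the question, error handling elided):
from google.appengine.ext import ndb

@ndb.transactional
def upsert_record(user_key, json_record, record_date):
    # An ancestor query inside a transaction is strongly consistent.
    rec = FeelTrackerRecord.query(
        FeelTrackerRecord.record_date == record_date,
        ancestor=user_key).get()
    if rec is None:
        rec = FeelTrackerRecord(parent=user_key, user=user_key,
                                record_date=record_date)
    if rec.timestamp is None or rec.timestamp < json_record['timestamp']:
        rec.rating = json_record['rating']
        rec.notes = json_record['notes']
        rec.timestamp = json_record['timestamp']
        rec.put()
    # Concurrent commits on the same entity group raise
    # TransactionFailedError, so one of two racing requests fails
    # instead of silently writing a duplicate record.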
I have a script that waits until some row in a db is updated:
import time
import MySQLdb
import MySQLdb.cursors

con = MySQLdb.connect(server, user, pwd, db)
When the script starts, the row's value is "running", and it waits for the value to become "finished":
while True:
    sql = '''select value from table where some_condition'''
    cur = con.cursor(MySQLdb.cursors.DictCursor)  # dict rows, so r['value'] works
    cur.execute(sql)
    r = cur.fetchone()
    cur.close()
    res = r['value']
    if res == 'finished':
        break
    print res
    time.sleep(5)
When I run this script it hangs forever. Even though I can see that the row's value has changed to "finished" when I query the table directly, the script still prints "running".
Is there some setting I didn't set?
EDIT: The Python script only queries the table. The update to the table is carried out by a Tomcat webapp, using JDBC, that is set to autocommit.
This is an InnoDB table, right? InnoDB is a transactional storage engine. Setting autocommit to true will probably fix this behavior for you.
conn.autocommit(True)
Alternatively, you could change the transaction isolation level. You can read more about this here:
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
The reason for this behavior is that within a single transaction, reads need to be consistent: all consistent reads within the same transaction read the snapshot established by the first read. Even if your script only reads the table, this is considered a transaction too. This is the default behavior in InnoDB (REPEATABLE READ isolation), and you need to change it or run conn.commit() after each read.
This page explains this in more details: http://dev.mysql.com/doc/refman/5.0/en/innodb-consistent-read.html
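A hedged sketch of the polling loop with that fix applied (placeholders as in the question): committing after each read ends the snapshot, so the next SELECT sees fresh data.
import time
import MySQLdb
import MySQLdb.cursors

con = MySQLdb.connect(server, user, pwd, db,
                      cursorclass=MySQLdb.cursors.DictCursor)
while True:
    cur = con.cursor()
    cur.execute('''select value from table where some_condition''')
    r = cur.fetchone()
    cur.close()
    con.commit()  # end the read snapshot; the next read starts a new one
    if r['value'] == 'finished':
        break
    time.sleep(5)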
I worked around this by running
c.execute("""set session transaction isolation level READ COMMITTED""")
early on in my reading session. Updates from other threads do come through now.
In my case I was keeping connections open for a long time (inside mod_python), and so updates by other processes weren't being seen at all.