Python database implementation

I am trying to implement a simple database program in Python. I have gotten to the point where I can add elements to the db, change values, etc.
class db:
    def __init__(self):
        self.database = {}

    def dbset(self, name, value):
        self.database[name] = value

    def dbunset(self, name):
        self.dbset(name, 'NULL')

    def dbnumequalto(self, value):
        mylist = [v for k, v in self.database.items() if v == value]
        return mylist
def main():
    mydb = db()
    cmd = raw_input().rstrip().split(" ")
    while cmd[0] != 'end':
        if cmd[0] == 'set':
            mydb.dbset(cmd[1], cmd[2])
        elif cmd[0] == 'unset':
            mydb.dbunset(cmd[1])
        elif cmd[0] == 'numequalto':
            print len(mydb.dbnumequalto(cmd[1]))
        elif cmd[0] == 'list':
            print mydb.database
        cmd = raw_input().rstrip().split(" ")

if __name__ == '__main__':
    main()
Now, as a next step, I want to be able to do nested transactions within this Python code. I begin a set of commands with a BEGIN command and then commit them with a COMMIT statement. A COMMIT should commit all the transactions that have begun. However, a ROLLBACK should revert the changes back to the most recent BEGIN. I am not able to come up with a suitable solution for this.

A simple approach is to keep a "transaction" list containing all the information you need to be able to roll back pending changes:
def __init__(self):
    self.database = {}
    self.transaction = []  # undo log for the pending transaction

def dbset(self, name, value):
    self.transaction.append((name, self.database.get(name)))
    self.database[name] = value

def rollback(self):
    # undo all changes
    while self.transaction:
        name, old_value = self.transaction.pop()
        self.database[name] = old_value

def commit(self):
    # everything went fine, drop undo information
    self.transaction = []
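Since the question asks for nested transactions, here is a minimal sketch extending that idea: keep a stack of undo logs, where BEGIN pushes a new log, ROLLBACK undoes and drops the top one, and COMMIT drops them all. The method names here are illustrative, chosen to match the question's command style:

class db:
    def __init__(self):
        self.database = {}
        self.transactions = []  # stack of undo logs, one per open BEGIN

    def dbset(self, name, value):
        if self.transactions:
            # record the old value so the innermost BEGIN can be undone
            self.transactions[-1].append((name, self.database.get(name)))
        self.database[name] = value

    def begin(self):
        self.transactions.append([])  # open a new nested transaction

    def rollback(self):
        # revert changes back to the most recent BEGIN
        for name, old_value in reversed(self.transactions.pop()):
            if old_value is None:
                del self.database[name]  # the key did not exist before
            else:
                self.database[name] = old_value

    def commit(self):
        # commit everything since the first BEGIN: drop all undo logs
        self.transactions = []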

If you are doing this as an academic exercise, you might want to check out the Rudimentary Database Engine recipe from the Python Cookbook. It includes quite a few classes to facilitate what you might expect from a SQL engine.
Database is used to create database instances without transaction support.
Database2 inherits from Database and provides for table transactions.
Table implements database tables along with various possible interactions.
Several other classes act as utilities to support some database actions that would normally be supported.
Like and NotLike implement the LIKE operator found in other engines.
date and datetime are special data types usable for database columns.
DatePart, MID, and FORMAT allow information selection in some cases.
In addition to the classes, there are functions for JOIN operations along with tests / demonstrations.

This is all available for free in the built-in sqlite3 module. The commits and rollbacks for sqlite3 are discussed in its documentation in more detail than I can manage here.
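For instance, a minimal sqlite3 sketch of commit and rollback (the table and values here are made up for illustration):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE kv (name TEXT PRIMARY KEY, value TEXT)')

try:
    conn.execute("INSERT INTO kv VALUES ('a', '10')")
    conn.execute("INSERT INTO kv VALUES ('a', '20')")  # violates the PRIMARY KEY
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # reverts the first insert as well

print(conn.execute('SELECT * FROM kv').fetchall())  # [] -- nothing was committed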

Related

Can an association class be implemented in Python?

I have just started learning software development and I am modelling my system in a UML Class diagram. I am unsure how I would implement this in code.
To keep things simple, let's assume the following example:
There is a Room class and a Guest class with a many-to-many association Room (0..*) to Guest (0..*), and an association class RoomBooking, which contains the booking details. How would I model this in Python if my system wants to see all room bookings made by a particular guest?
Most Python applications developed from a UML design are backed by a relational database, usually via an ORM. In which case your design is pretty trivial: your RoomBooking is a table in the database, and the way you look up all RoomBooking objects for a given Guest is just an ORM query. Keeping it vague rather than using a particular ORM syntax, something like this:
bookings = RoomBooking.select(Guest=guest)
With an RDBMS but no ORM, it's not much different. Something like this:
sql = 'SELECT Room, Guest, Charge, Paid FROM RoomBooking WHERE Guest = ?'
cur = db.execute(sql, (guest.id,))
bookings = [RoomBooking(*row) for row in cur]
And this points to what you'd do if you're not using an RDBMS: any relation that would be stored as a table with a foreign key is instead stored as some kind of dict in memory.
For example, you might have a dict mapping guests to sets of room bookings:
bookings = guest_booking[guest]
Or, alternatively, if you don't have a huge number of hotels, you might have this mapping implicit, with each hotel having a 1-to-1 mapping of guests to bookings:
bookings = [hotel.bookings[guest] for hotel in hotels]
Since you're starting off with UML, you're probably thinking in strict OO terms, so you'll want to encapsulate this dict in some class, behind some mutator and accessor methods, so you can ensure that you don't accidentally break any invariants.
There are a few obvious places to put it—a BookingManager object makes sense for the guest-to-set-of-bookings mapping, and the Hotel itself is such an obvious place for the per-hotel-guest-to-booking that I used it without thinking above.
But another place to put it, which is closer to the ORM design, is in a class attribute on the RoomBooking type, accessed by classmethods. This also allows you to extend things if you later need to, e.g., look things up by hotel—you'd then put two dicts as class attributes, and ensure that a single method always updates both of them, so you know they're always consistent.
So, let's look at that:
import collections

class RoomBooking:
    guest_mapping = collections.defaultdict(set)
    hotel_mapping = collections.defaultdict(set)

    def __init__(self, guest, room):
        self.guest, self.room = guest, room

    @classmethod
    def find_by_guest(cls, guest):
        return cls.guest_mapping[guest]

    @classmethod
    def find_by_hotel(cls, hotel):
        return cls.hotel_mapping[hotel]

    @classmethod
    def add_booking(cls, guest, room):
        booking = cls(guest, room)
        cls.guest_mapping[guest].add(booking)
        cls.hotel_mapping[room.hotel].add(booking)
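A quick usage sketch (Hotel, Guest, and Room here are hypothetical stand-ins with just the attributes the mapping needs):

# Hypothetical minimal stand-ins for the classes assumed above.
class Hotel:
    pass

class Guest:
    pass

class Room:
    def __init__(self, hotel):
        self.hotel = hotel

hotel = Hotel()
alice = Guest()
RoomBooking.add_booking(alice, Room(hotel))
print(RoomBooking.find_by_guest(alice))  # set containing the new booking
print(RoomBooking.find_by_hotel(hotel))  # the same booking, via the other index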
Of course your Hotel instance probably needs to add the booking as well, so it can raise an exception if two different bookings cover the same room on overlapping dates, whether that happens in RoomBooking.add_booking, or in some higher-level function that calls both Hotel.add_booking and RoomBooking.add_booking.
And if this is multi-threaded (which seems like a good possibility, given that you're heading this far down the Java-inspired design path), you'll need a big lock, or a series of fine-grained locks, around the whole transaction.
For persistence, you probably want to store these mappings along with the public objects. But for a small enough data set, or for a server that rarely restarts, it might be simpler to just persist the public objects, and rebuild the mappings at load time by doing a bunch of add_booking calls as part of the load process.
If you want to make it even more ORM-style, you can have a single find method that takes keyword arguments and manually executes a "query plan" in a trivial way:
@classmethod
def find(cls, guest=None, hotel=None):
    if guest is None and hotel is None:
        return {booking for bookings in cls.guest_mapping.values()
                for booking in bookings}
    elif hotel is None:
        return cls.guest_mapping[guest]
    elif guest is None:
        return cls.hotel_mapping[hotel]
    else:
        return {booking for booking in cls.guest_mapping[guest]
                if booking.room.hotel == hotel}
But this is already pushing things to the point where you might want to go back and ask whether you were right to not use an ORM in the first place. If that sounds ridiculously heavy duty for your simple toy app, take a look at sqlite3 for the database (which comes with Python, and which takes less work to use than coming up with a way to pickle or json all your data for persistence) and SQLAlchemy for the ORM. There's not much of a learning curve, and not much runtime overhead or coding-time boilerplate.
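To give a feel for that route, here is a minimal, illustrative SQLAlchemy sketch of the association class (assuming SQLAlchemy 1.4 or later; the table and column names are assumptions, not the original schema):

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Guest(Base):
    __tablename__ = 'guest'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Room(Base):
    __tablename__ = 'room'
    id = Column(Integer, primary_key=True)
    number = Column(Integer)

class RoomBooking(Base):  # the association class
    __tablename__ = 'room_booking'
    id = Column(Integer, primary_key=True)
    guest_id = Column(Integer, ForeignKey('guest.id'))
    room_id = Column(Integer, ForeignKey('room.id'))
    guest = relationship(Guest, backref='bookings')
    room = relationship(Room)

engine = create_engine('sqlite://')  # in-memory database for the sketch
Base.metadata.create_all(engine)
with Session(engine) as session:
    alice = Guest(name='Alice')
    session.add(RoomBooking(guest=alice, room=Room(number=101)))
    session.commit()
    print([b.room.number for b in alice.bookings])  # all bookings for a guest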
Sure, you can implement it in Python, but there is no single way. Quite often you have a database layer where the association class is represented by a table with two foreign keys (in your case, to the primary keys of Room and Guest). To search, you would just write the corresponding SQL and send it off. If you want to cache this table in memory, you could code it like this (or similarly) with an associative array:
from collections import defaultdict

class Room:
    def __init__(self, num):
        self.room_number = num

    def key(self):
        return str(self.room_number)

class Guest:
    def __init__(self, name):
        self.name = name

    def key(self):
        return self.name

def nested_dict(n, type):
    if n == 1:
        return defaultdict(type)
    else:
        return defaultdict(lambda: nested_dict(n - 1, type))

room_booking = nested_dict(2, str)

class Room_Booking:
    def __init__(self, date):
        self.date = date

room1 = Room(1)
guest1 = Guest("Joe")
room_booking[room1.key()][guest1.key()] = Room_Booking("some date")
print(room_booking[room1.key()][guest1.key()])

With SQLAlchemy, how can I convert a row to a "real" Python object?

I've been using SQLAlchemy with Alembic to simplify the database access I use, and any data structure changes I make to the tables. This has been working out really well, up until I started to notice more and more issues with SQLAlchemy "expiring" fields, from my point of view, nearly at random.
A case in point would be this snippet,
class HRDecimal(Model):
    dec_id = Column(String(50), index=True)

    @staticmethod
    def qfilter(*filters):
        """
        :rtype: list[HRDecimal]
        """
        return list(HRDecimal.query.filter(*filters))


class Meta(Model):
    dec_id = Column(String(50), index=True)

    @staticmethod
    def qfilter(*filters):
        """
        :rtype: list[Meta]
        """
        return list(Meta.query.filter(*filters))
Code:
ids = ['1', '2', '3']  # obviously fake list of ids
decs = HRDecimal.qfilter(
    HRDecimal.dec_id.in_(ids))
metas = Meta.qfilter(
    Meta.dec_id.in_(ids))
combined = []
for ident in ids:
    combined.append((
        ident,
        [dec for dec in decs if dec.dec_id == ident],
        [hm for hm in metas if hm.dec_id == ident],
    ))
The above wasn't a problem, but when processing a list that may contain a few thousand ids, this started taking a huge amount of time, and if done from a web request in Flask, the thread would often be killed.
When I started poking around with why this was happening, the key area was
[dec for dec in decs if dec.dec_id == ident],
[hm for hm in metas if hm.dec_id == ident]
At some point while combining these (what I thought were plain) Python objects, calling dec.dec_id and hm.dec_id ends up in the SQLAlchemy code. In the best case, we go into:
def __get__(self, instance, owner):
    if instance is None:
        return self

    dict_ = instance_dict(instance)
    if self._supports_population and self.key in dict_:
        return dict_[self.key]
    else:
        return self.impl.get(instance_state(instance), dict_)
of InstrumentedAttribute in sqlalchemy/orm/attributes.py, which seems to be very slow. But even worse than this, I've observed times when fields had expired, and then we enter:
def get(self, state, dict_, passive=PASSIVE_OFF):
    """Retrieve a value from the given object.

    If a callable is assembled on this object's attribute, and
    passive is False, the callable will be executed and the
    resulting value will be set as the new value for this attribute.
    """
    if self.key in dict_:
        return dict_[self.key]
    else:
        # if history present, don't load
        key = self.key
        if key not in state.committed_state or \
                state.committed_state[key] is NEVER_SET:
            if not passive & CALLABLES_OK:
                return PASSIVE_NO_RESULT

            if key in state.expired_attributes:
                value = state._load_expired(state, passive)
of AttributeImpl in the same file. The horrible issue here is that state._load_expired re-runs the SQL query completely. So in a situation like this, with a big list of idents, we end up running thousands of "small" SQL queries against the database, where I think we should only have been running two "large" ones at the top.
Now, I've gotten around the expiration issue through the way I initialise the database for Flask with session options, changing
app = Flask(__name__)
CsrfProtect(app)
db = SQLAlchemy(app)
to
app = Flask(__name__)
CsrfProtect(app)
db = SQLAlchemy(
    app,
    session_options=dict(autoflush=False, autocommit=False, expire_on_commit=False))
This has definitely improved the situation above, where a row's fields just seemed to expire (from my observations) at random, but the "normal" slowness of attribute access through SQLAlchemy is still an issue for what we're currently running.
Is there any way with SQLAlchemy, to get a "real" Python object returned from a query, instead of a proxied one like it is now, so it isn't being affected by this?
Your randomness is probably related to either explicitly committing or rolling back at an inconvenient time, or due to auto-commit of some kind. In its default configuration SQLAlchemy session expires all ORM-managed state when a transaction ends. This is usually a good thing, since when a transaction ends you've no idea what the current state of the DB is. This can be disabled, as you've done with expire_on_commit=False.
The ORM is also ill-suited for extremely large bulk operations in general, as explained here. It is very well suited for handling complex object graphs and persisting those to a relational database with much less effort on your part, as it organizes the required inserts etc. for you. An important part of that is tracking changes to instance attributes. The SQLAlchemy Core is better suited for bulk operations.
It looks like you're performing 2 queries that produce a potentially large number of results and then doing a manual "group by" on the data, but in a rather inefficient way: for each id you have, you scan the entire list of results, which is O(nm), where n is the number of ids and m the number of results. Instead, you should group the results into lists of objects by id first and then perform the "join". On some other database systems you could handle the grouping in SQL directly, but alas, MySQL has no notion of arrays other than JSON.
A possibly more performant version of your grouping could be for example:
from itertools import groupby
from operator import attrgetter

ids = ['1', '2', '3']  # obviously fake list of ids

# Order the results by `dec_id` for Python itertools.groupby. Cannot
# use your `qfilter()` method as it produces lists, not queries.
decs = HRDecimal.query.\
    filter(HRDecimal.dec_id.in_(ids)).\
    order_by(HRDecimal.dec_id).\
    all()
metas = Meta.query.\
    filter(Meta.dec_id.in_(ids)).\
    order_by(Meta.dec_id).\
    all()

key = attrgetter('dec_id')
decs_lookup = {dec_id: list(g) for dec_id, g in groupby(decs, key)}
metas_lookup = {dec_id: list(g) for dec_id, g in groupby(metas, key)}

combined = [(ident,
             decs_lookup.get(ident, []),
             metas_lookup.get(ident, []))
            for ident in ids]
Note that since in this version we iterate over the queries only once, all() is not strictly necessary, but it should not hurt much either. The grouping could also be done without sorting in SQL with defaultdict(list):
from collections import defaultdict

decs = HRDecimal.query.filter(HRDecimal.dec_id.in_(ids)).all()
metas = Meta.query.filter(Meta.dec_id.in_(ids)).all()

decs_lookup = defaultdict(list)
metas_lookup = defaultdict(list)

for d in decs:
    decs_lookup[d.dec_id].append(d)

for m in metas:
    metas_lookup[m.dec_id].append(m)

combined = [(ident, decs_lookup[ident], metas_lookup[ident])
            for ident in ids]
And finally to answer your question, you can fetch "real" Python objects by querying for the Core table instead of the ORM entity:
decs = HRDecimal.query.\
    filter(HRDecimal.dec_id.in_(ids)).\
    with_entities(HRDecimal.__table__).\
    all()
which will result in a list of namedtuple-like objects that can easily be converted to dicts with _asdict().
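For example, a quick sketch of that conversion, continuing from the query above:

plain = [row._asdict() for row in decs]  # list of plain dicts
print(plain[0]['dec_id'])  # ordinary dict access, no ORM instrumentation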

Why use MongoAlchemy when you could subclass a Python Dict?

A friend recently showed me that you can create a subclass of dict in Python, and then use instances of that class to save, update, etc. It seems like you have more control, and it looks easier as well.
class Marker(dict):
    def __init__(self, username, email=None):
        self.username = username
        if email:
            self.email = email

    @property
    def username(self):
        return self.get('username')

    @username.setter
    def username(self, val):
        self['username'] = val

    def save(self):
        db.collection.save(self)
Author here. The general reason you'd want to use it (or one of the many similar libraries) is for safety. When you assign a value to a MongoAlchemy Document, it does a check to make sure all of the constraints you specified are satisfied (e.g. type, lengths of strings, numeric bounds).
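As a rough sketch of what that buys you (the field names here are illustrative; check the MongoAlchemy docs for the exact field types and options):

from mongoalchemy.document import Document
from mongoalchemy.fields import IntField, StringField

class BloodDonor(Document):
    first_name = StringField()
    age = IntField()

donor = BloodDonor()
donor.first_name = 'Jeff'  # fine: matches the declared type
donor.age = 'not a number'  # raises a validation error at assignment time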
It also has a query DSL that can be more pleasant to use than the JSON-like built-in syntax. Here's an example from the docs:
>>> query = session.query(BloodDonor)
>>> for donor in query.filter(BloodDonor.first_name == 'Jeff', BloodDonor.age < 30):
>>> print donor
Jeff Jenkins (male; Age: 28; Type: O+)
The MongoAlchemy Session object also allows you to simulate transactions:
with session:
    do_stuff()
    session.insert(doc1)
    do_more_stuff()
    session.insert(doc2)
    do_even_more_stuff()
    session.insert(doc3)
    # note that at this point nothing has been inserted

# now things are inserted
This doesn't mean that these inserts are one atomic operation, or even that all of the writes will succeed, but it does mean that if your application has errors in the "do_stuff" functions, you won't have done half of the inserts. So it prevents a specific and reasonably common type of error.

icontains and SQL Security

I have a web app that allows users to enter a search query which will then retrieve models that match the search criteria. Here are my methods:
@staticmethod
def searchBody(query):
    '''
    Return all entries whose body text contains the query.
    '''
    return Entry.objects.get(text__icontains=query)

@staticmethod
def searchTitle(query):
    '''
    Return all entries whose title text contains the query.
    '''
    return Entry.objects.get(title__icontains=query)

@staticmethod
def searchAuthor(query):
    '''
    Return all entries whose author text contains the query.
    '''
    return Entry.objects.get(author.icontains=query)
My question is simply: is this secure as it stands? In other words, does icontains perform the necessary string escaping operations so a person can't inject SQL or Python code into the query to launch an attack?
Yes, the Django ORM protects you against SQL injection.
Of course you can never be entirely sure that there is no security vulnerability in an application. Nevertheless, the ORM is the component responsible for protecting you against SQL injection, so you should assume it's safe and keep your Django install up to date!
On an unrelated note, there is a typo in Entry.objects.get(author.icontains=query).
Also, using .get is going to throw a lot of errors here (whenever the object doesn't exist, or more than one exist). It doesn't do what your docstring says either.
You probably want to be using .filter instead.
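For example, one of the methods rewritten with .filter, which returns a (possibly empty) QuerySet instead of raising when there are zero or multiple matches:

@staticmethod
def searchTitle(query):
    '''
    Return all entries whose title text contains the query.
    '''
    return Entry.objects.filter(title__icontains=query)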

Basic logic on Python OOP

I have studied Python and understand how the OOP concepts work. What I'm unsure about is how to implement them in an application that interacts with SQL.
Assume I have an employee table in SQL which contains Emp name, Emp address, and Emp salary. Now I need to give a salary raise to all the employees, for which I create a class with a method.
My database logic:
import MySQLdb

db = MySQLdb.connect("localhost", "testuser", "test123", "TESTDB")
cursor = db.cursor()
sql = "SELECT * FROM Emp"
cursor.execute(sql)
results = cursor.fetchall()
for row in results:
    name = row[0]
    address = row[1]
    salary = row[2]
db.close()
Class definition:
Class Emp():
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def giveraise(self):
        self.salary = self.salary * 1.4
How do I create the objects with the details fetched from the Employee table? I know you don't need to create a class to perform this small operation, but I'm thinking along the lines of a practical implementation. I need to process the records one by one.
It sounds like you are looking for an Object-Relational Mapper (ORM). This is a way of automatically integrating an SQL database with an object-oriented representation of the data, and sounds perfect for your use case. A list of ORMs is at http://wiki.python.org/moin/HigherLevelDatabaseProgramming. Once you have the ORM set up, you can just call the giveraise method on all the Emp objects. (Incidentally, the giveraise function you have defined is not valid Python.)
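For completeness, if you stick with the raw-cursor approach from the question rather than an ORM, the mapping step is just a comprehension over the fetched rows (this sketch assumes the Emp class from the question with its syntax errors fixed, and the name/salary column positions shown in the question's loop):

employees = [Emp(row[0], row[2]) for row in results]  # name and salary columns
for emp in employees:
    emp.giveraise()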
As Abe has mentioned, there are libraries that do this sort of mapping for you already (such as SQLAlchemy or Django's included ORM).
One thing to keep in mind though is that a "practical implementation" is subjective. For a small application and a developer with no experience with ORMs, coming up to speed on the general idea of an ORM can add time to the project, and subtle errors to the code. Too many times someone will be confused about why something like this takes forever with an ORM:
for employee in database.query(Employee).all():
    employee.give_raise(0.5)
Assuming a table with 100 employees, this could make anywhere between a few and 200+ individual calls to the database, depending on how you've set up your ORM.
It may be completely justifiable to not use OOP, or at least not in the way you've described. This is a completely valid object:
class Workforce(object):
    def __init__(self, sql_connection):
        self.sql_connection = sql_connection

    def give_raise(self, employee_id, raise_amount):
        sql = 'UPDATE Employee SET Salary = Salary + ? WHERE ID = ?'
        self.sql_connection.execute(sql, (raise_amount, employee_id))
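A quick usage sketch with sqlite3, whose DB-API module uses the same ? placeholder style (the database file and id here are made up):

import sqlite3

conn = sqlite3.connect('company.db')  # hypothetical database file
workforce = Workforce(conn)
workforce.give_raise(employee_id=42, raise_amount=500)
conn.commit()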
By no means am I trying to knock ORMs or dissuade you from using them. Especially if you're just experimenting for the heck of it, learning how to use them properly can make them valuable tools. Just keep in mind that they CAN be a hassle, especially if you've never used one before, and consider that there are simpler solutions if your application is small enough.
