How to reduce the number of requests to the Datastore - Python

When running the code below with 200 Documents and 1 DocUser, the script takes approx. 5000 ms according to AppStats. The culprit is that there is a separate request to the datastore for each lookup of lastEditedBy (datastore_v3.Get), taking 6-51 ms each.
What I'm trying to do is display many entities with several properties, some of which are derived from other entities. There will never be a large number of entities (<5000), and since this is more of an admin interface there will never be many simultaneous users.
I have tried to optimize by caching the DocUser entities, but I am not able to get the DocUser key from the query above without making a new request to the datastore.
1) Does this make sense - is the latency I am experiencing normal?
2) Is there a way to make this work without the additional requests to the datastore?
models.py
class Document(db.Expando):
    title = db.StringProperty()
    lastEditedBy = db.ReferenceProperty(DocUser, collection_name='documentLastEditedBy')
    ...

class DocUser(db.Model):
    user = db.UserProperty()
    name = db.StringProperty()
    hasWriteAccess = db.BooleanProperty(default=False)
    isAdmin = db.BooleanProperty(default=False)
    accessGroups = db.ListProperty(db.Key)
    ...
main.py
out = '<table>'
documents = Document.all()
for d in documents:
    # each access to d.lastEditedBy.name triggers a separate datastore Get
    out += '<tr><td>%s</td><td>%s</td></tr>' % (d.title, d.lastEditedBy.name)
out += '</table>'

This is a typical anti-pattern. You can work around it in two ways:
Prefetch all of the references. See Nick's blog entry for details.
Use ndb. This module doesn't have ReferenceProperty. It has various goodies such as two automatic caching layers, an asynchronous mechanism called tasklets, etc. For more details, see the ndb documentation.
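A minimal sketch of the prefetching recipe, as I recall it from Nick's blog post (verify against the post itself; it assumes db from google.appengine.ext is imported as in models.py):
def prefetch_refprops(entities, *props):
    # Collect the raw reference keys without dereferencing the properties,
    # batch-fetch them in one db.get() call, then patch them back on.
    fields = [(entity, prop) for entity in entities for prop in props]
    ref_keys = [prop.get_value_for_datastore(x) for x, prop in fields]
    ref_entities = dict((x.key(), x) for x in db.get(set(ref_keys)))
    for (entity, prop), ref_key in zip(fields, ref_keys):
        prop.__set__(entity, ref_entities[ref_key])
    return entities

# Usage: one extra round trip instead of one per document.
documents = prefetch_refprops(Document.all().fetch(200), Document.lastEditedBy)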

One way to do it is to prefetch all the docusers to make a lookup dictionary, with the keys being docuser.key() and values being docuser.name.
docusers = DocUser.all().fetch(1000)
docuser_dict = dict([(u.key(), u.name) for u in docusers])
Then in your code, you can get the names from the docuser_dict by using get_value_for_datastore to get the docuser.key() without pulling the object from the datastore.
documents = Document.all().fetch(1000)
for d in documents:
    # get_value_for_datastore returns the raw key without fetching the entity
    docuser_key = Document.lastEditedBy.get_value_for_datastore(d)
    last_editedby_name = docuser_dict.get(docuser_key)
    out += '<tr><td>%s</td><td>%s</td></tr>' % (d.title, last_editedby_name)

If you want to cut instance-time, you can break a single synchronous query into multiple asynchronous queries, which can prefetch results while you do other work. Instead of using Document.all().fetch(), use Document.all().run(). You may have to block on the first query you iterate on, but by the time it is done, all other queries will have finished loading results. If you want to get 200 entities, try using 5 queries at once.
q1 = Document.all().run(prefetch_size=20, batch_size=20, limit=20, offset=0)
q2 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=20)
q3 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=65)
q4 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=110)
q5 = Document.all().run(prefetch_size=45, batch_size=45, limit=45, offset=155)
for q in (q1, q2, q3, q4, q5):
    for d in q:
        out += '<tr><td>%s</td><td>%s</td></tr>' % (d.title, d.lastEditedBy.name)
I apologize for my crummy Python, but the idea is simple: set prefetch_size = batch_size = limit, and start all your queries at once. q1 has a smaller size because we block on it first, and blocking is what wastes time. By the time q1 is done, q2 will be done or almost done, and on q3-q5 you will pay zero latency.
See https://developers.google.com/appengine/docs/python/datastore/async#Async_Queries for details.

Related

Looking for a better strategy for an SQLAlchemy bulk upsert

I have a Flask application with a RESTful API. One of the API calls is a 'mass upsert' call with a JSON payload. I am struggling with performance.
The first thing I tried was to use merge_result on a Query object, because...
This is an optimized method which will merge all mapped instances, preserving the structure of the result rows and unmapped columns with less method overhead than that of calling Session.merge() explicitly for each value.
This was the initial code:
class AdminApiUpdateTasks(Resource):
    """Bulk task creation / update endpoint"""
    def put(self, slug):
        taskdata = json.loads(request.data)
        existing = db.session.query(Task).filter_by(challenge_slug=slug)
        existing.merge_result(
            [task_from_json(slug, **task) for task in taskdata])
        db.session.commit()
        return {}, 200
A request to that endpoint with ~5000 records, all of them already existing in the database, takes more than 11m to return:
real 11m36.459s
user 0m3.660s
sys 0m0.391s
As this would be a fairly typical use case, I started looking into alternatives to improve performance. Against my better judgement, I tried to merge the session for each individual record:
class AdminApiUpdateTasks(Resource):
    """Bulk task creation / update endpoint"""
    def put(self, slug):
        # Get the posted data
        taskdata = json.loads(request.data)
        for task in taskdata:
            db.session.merge(task_from_json(slug, **task))
        db.session.commit()
        return {}, 200
To my surprise, this turned out to be more than twice as fast:
real 4m33.945s
user 0m3.608s
sys 0m0.258s
I have two questions:
Why is the second strategy using merge faster than the supposedly optimized first one that uses merge_result?
What other strategies should I pursue to optimize this more, if any?
This is an old question, but I hope this answer can still help people.
I used the same idea as this example set from SQLAlchemy, but I added benchmarking for UPSERT operations (update the record if it exists, otherwise insert it). The results, on a PostgreSQL 11 database, are below:
Tests to run: test_customer_individual_orm_select, test_customer_batched_orm_select, test_customer_batched_orm_select_add_all, test_customer_batched_orm_merge_result
test_customer_individual_orm_select : UPSERT statements via individual checks on whether objects exist and add new objects individually (10000 iterations); total time 9.359603 sec
test_customer_batched_orm_select : UPSERT statements via batched checks on whether objects exist and add new objects individually (10000 iterations); total time 1.553555 sec
test_customer_batched_orm_select_add_all : UPSERT statements via batched checks on whether objects exist and add new objects in bulk (10000 iterations); total time 1.358680 sec
test_customer_batched_orm_merge_result : UPSERT statements using batched merge_results (10000 iterations); total time 7.191284 sec
As you can see, merge_result is far from the most efficient option. I'd suggest checking in batches whether the records exist and updating them accordingly. Hope this helps!
"""
This series of tests illustrates different ways to UPSERT
or INSERT ON CONFLICT UPDATE a large number of rows in bulk.
"""
from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from profiler import Profiler
Base = declarative_base()
engine = None
class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    description = Column(String(255))
Profiler.init("bulk_upserts", num=100000)
@Profiler.setup
def setup_database(dburl, echo, num):
    global engine
    engine = create_engine(dburl, echo=echo)
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)
    s = Session(engine)
    for chunk in range(0, num, 10000):
        # Insert half of the customers we want to merge
        s.bulk_insert_mappings(
            Customer,
            [
                {
                    "id": i,
                    "name": "customer name %d" % i,
                    "description": "customer description %d" % i,
                }
                for i in range(chunk, chunk + 10000, 2)
            ],
        )
    s.commit()
@Profiler.profile
def test_customer_individual_orm_select(n):
    """
    UPSERT statements via individual checks on whether objects exist
    and add new objects individually
    """
    session = Session(bind=engine)
    for i in range(0, n):
        customer = session.query(Customer).get(i)
        if customer:
            customer.description += "updated"
        else:
            session.add(Customer(
                id=i,
                name=f"customer name {i}",
                description=f"customer description {i} new"
            ))
    session.flush()
    session.commit()
@Profiler.profile
def test_customer_batched_orm_select(n):
    """
    UPSERT statements via batched checks on whether objects exist
    and add new objects individually
    """
    session = Session(bind=engine)
    for chunk in range(0, n, 1000):
        customers = {
            c.id: c for c in
            session.query(Customer)
            .filter(Customer.id.between(chunk, chunk + 1000))
        }
        for i in range(chunk, chunk + 1000):
            if i in customers:
                customers[i].description += "updated"
            else:
                session.add(Customer(
                    id=i,
                    name=f"customer name {i}",
                    description=f"customer description {i} new"
                ))
        session.flush()
    session.commit()
@Profiler.profile
def test_customer_batched_orm_select_add_all(n):
    """
    UPSERT statements via batched checks on whether objects exist
    and add new objects in bulk
    """
    session = Session(bind=engine)
    for chunk in range(0, n, 1000):
        customers = {
            c.id: c for c in
            session.query(Customer)
            .filter(Customer.id.between(chunk, chunk + 1000))
        }
        to_add = []
        for i in range(chunk, chunk + 1000):
            if i in customers:
                customers[i].description += "updated"
            else:
                to_add.append({
                    "id": i,
                    "name": "customer name %d" % i,
                    "description": "customer description %d new" % i,
                })
        if to_add:
            session.bulk_insert_mappings(Customer, to_add)
            to_add = []
        session.flush()
    session.commit()
@Profiler.profile
def test_customer_batched_orm_merge_result(n):
    "UPSERT statements using batched merge_results"
    session = Session(bind=engine)
    for chunk in range(0, n, 1000):
        customers = session.query(Customer)\
            .filter(Customer.id.between(chunk, chunk + 1000))
        customers.merge_result(
            Customer(
                id=i,
                name=f"customer name {i}",
                description=f"customer description {i} new"
            ) for i in range(chunk, chunk + 1000)
        )
        session.flush()
    session.commit()
I think this query was part of what caused your slowness in the first version:
existing = db.session.query(Task).filter_by(challenge_slug=slug)
Also you should probably change this:
existing.merge_result(
    [task_from_json(slug, **task) for task in taskdata])
To:
existing.merge_result(
    (task_from_json(slug, **task) for task in taskdata))
That should save you some memory and time, since the full list won't be built in memory before being passed to the merge_result method.

Google App Engine (Python) handle large number of writing tasks

I am reading a large amount of data from an API provider. Once I get the response, I need to scan through the data, repackage it, and put it into the App Engine datastore. A particularly big account will contain ~50k entries.
Every time I get some entries from the API, I store 500 entries as a batch in a temp table and send a processing task to a queue. In case too many tasks get jammed inside one queue, I use 6 queues in total:
count = 0
worker_number = 0  # start at 0: queues are named worker0 .. worker5
for folder, property in entries:
    data[count] = {
        # repackaging data here
    }
    count = (count + 1) % 500
    if count == 0:
        cache = ClientCache(parent=user_key, data=json.dumps(data))
        cache.put()
        params = {
            'access_token': access_token,
            'client_key': client.key.urlsafe(),
            'user_key': user_key.urlsafe(),
            'cache_key': cache.key.urlsafe(),
        }
        taskqueue.add(
            url=task_url,
            params=params,
            target='dbworker',
            queue_name='worker%d' % worker_number)
        worker_number = (worker_number + 1) % 6
And the task_url will lead to the following:
logging.info('--------------------- Process File ---------------------')
user_key = ndb.Key(urlsafe=self.request.get('user_key'))
client_key = ndb.Key(urlsafe=self.request.get('client_key'))
cache_key = ndb.Key(urlsafe=self.request.get('cache_key'))
cache = cache_key.get()
data = json.loads(cache.data)
for property in data.values():
    logging.info(property)
    try:
        key_name = '%s%s' % (property['key1'], property['key2'])
        metadata = Metadata.get_or_insert(
            key_name,
            parent=user_key,
            client_key=client_key,
            # ... other info
        )
        metadata.put()
    except StandardError, e:
        logging.error(e.message)
All the tasks are running in the backend.
With this structure it's working fine... well, most of the time. But sometimes I get this error:
2013-09-19 15:10:07.788
suspended generator transaction(context.py:938) raised TransactionFailedError(The transaction could not be committed. Please try again.)
W 2013-09-19 15:10:07.788
suspended generator internal_tasklet(model.py:3321) raised TransactionFailedError(The transaction could not be committed. Please try again.)
E 2013-09-19 15:10:07.789
The transaction could not be committed. Please try again.
Is the problem that I'm writing to the datastore too frequently? I want to find out how I can balance the pace and let the workers run smoothly...
Also is there any other way I can improve the performance further? My queue configuration is something like this:
- name: worker0
  rate: 120/s
  bucket_size: 100
  retry_parameters:
    task_retry_limit: 3
You are writing single entities at a time.
How about modifying your code to write in batches using ndb.put_multi? That will reduce the round-trip time for each transaction.
And why are you using get_or_insert when you are overwriting the record each time? You might as well just write it. Both of these changes will reduce the workload a lot; a sketch follows.
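A minimal sketch of both changes, reusing the names from the task handler above (treat it as a sketch, not a drop-in replacement):
# Build all entities first, then write them in one batched RPC.
entities = []
for property in data.values():
    key_name = '%s%s' % (property['key1'], property['key2'])
    entities.append(Metadata(
        id=key_name,  # same key get_or_insert would have used
        parent=user_key,
        client_key=client_key,
        # ... other info
    ))
ndb.put_multi(entities)  # one round trip instead of ~500 get/put pairs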

Using yield with multiple ndb.get_multi_async

I am trying to improve the efficiency of my current queries against the App Engine datastore. Currently, I am using a synchronous method:
class Hospital(ndb.Model):
    name = ndb.StringProperty()
    buildings = ndb.KeyProperty(kind=Building, repeated=True)

class Building(ndb.Model):
    name = ndb.StringProperty()
    rooms = ndb.KeyProperty(kind=Room, repeated=True)

class Room(ndb.Model):
    name = ndb.StringProperty()
    beds = ndb.KeyProperty(kind=Bed, repeated=True)

class Bed(ndb.Model):
    name = ndb.StringProperty()
    .....
Currently I go through stupidly:
currhosp = ndb.Key(urlsafe=valid_hosp_key).get()
nbuilds = ndb.get_multi(currhosp.buildings)
for b in nbuilds:
    rms = ndb.get_multi(b.rooms)
    for r in rms:
        bds = ndb.get_multi(r.beds)
        for bed in bds:  # renamed from b, which shadowed the building loop variable
            pass  # do something with the bed object
I would like to transform this into a much faster set of queries using get_multi_async.
My difficulty is in knowing how to do this.
Any ideas?
Best
Jon
Using the structures given above, it is possible (and confirmed) to solve this with a set of tasklets. It is a SIGNIFICANT speed-up over the iterative method.
@ndb.tasklet
def get_bed_info(bed_key):
    bed_info = {}
    bed = yield bed_key.get_async()
    # format and store bed information into bed_info
    raise ndb.Return(bed_info)

@ndb.tasklet
def get_room_info(room_key):
    room_info = {}
    room = yield room_key.get_async()
    beds = yield map(get_bed_info, room.beds)
    # store room info in room_info
    room_info["beds"] = beds
    raise ndb.Return(room_info)

@ndb.tasklet
def get_building_info(build_key):
    build_info = {}
    building = yield build_key.get_async()
    rooms = yield map(get_room_info, building.rooms)
    # store building info in build_info
    build_info["rooms"] = rooms
    raise ndb.Return(build_info)

@ndb.toplevel
def get_hospital_buildings(hospital_object):
    buildings = yield map(get_building_info, hospital_object.buildings)
    raise ndb.Return(buildings)
Now comes the main call from the hospital function, where you have the hospital object (hospital_obj):
hosp_info = {}
buildings = get_hospital_buildings(hospital_obj)
# store hospital info in hosp_info
hosp_info["buildings"] = buildings
return hosp_info
There you go! It is incredibly efficient and lets the scheduler complete all the work in the fastest possible manner within the GAE backbone.
You can do something with query.map(). See https://developers.google.com/appengine/docs/python/ndb/async#tasklets and https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map
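A hedged sketch of that approach, reusing the tasklets above (Query.map() runs the callback, which may itself be a tasklet, concurrently for each result; the callback name is illustrative):
@ndb.tasklet
def hospital_callback(hospital):
    # Fan out to the building tasklets for each hospital the query returns.
    buildings = yield map(get_building_info, hospital.buildings)
    raise ndb.Return({"name": hospital.name, "buildings": buildings})

results = Hospital.query().map(hospital_callback)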
It's impossible.
Your 2nd query (ndb.get_multi(b.rooms)) depends on the result of your first query.
So pulling it asynchronously doesn't work, since at that point the (first) result of the first query has to be available anyway.
NDB does something like this in the background: it already buffers the next items of ndb.get_multi(currhosp.buildings) while you process the first result.
However, you could use denormalization, i.e. keeping a big table with one entry per Building-Room-Bed pair, and pull your results from that table.
If you have more reads than writes to this table, this will get you a massive speed improvement (1 DB read, instead of 3).
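A minimal sketch of that denormalized layout (the HospitalBed model is an assumption, not from the question):
class HospitalBed(ndb.Model):
    # One entity per Building-Room-Bed combination.
    hospital = ndb.KeyProperty(kind=Hospital)
    building_name = ndb.StringProperty()
    room_name = ndb.StringProperty()
    bed_name = ndb.StringProperty()

# One query instead of three levels of get_multi:
beds = HospitalBed.query(HospitalBed.hospital == hosp_key).fetch()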

Do I need to use transactions in google appengine

update 0
My def post() code has changed dramatically. It was originally based on a digital form that included both checkboxes and text entry fields; the current design uses only text entry fields, to be more paper-like. As a result I have other problems, which may be solved by one of the proposed solutions, but I cannot exactly follow that solution, so let me explain the new design and the problems.
The smaller problem is the inefficiency of my implementation: in def post() I create a distinct name for each input timeslot, a long string <courtname><timeslotstarthour><timeslotstartminute>. In my code this name is read in a nested for loop with the following snippet [very inefficient, I imagine].
tempreservation = courtname + str(time[0]) + str(time[1])
name = self.request.get(tempreservation, None)
The more serious immediate problem is that my def post() code is never reached, and I cannot figure out why (maybe it wasn't being reached before either, but I had not tested that far). I wonder if the problem is that, for now, I want both the post and the get to "finish" the same way. The first line below is from the post() and the second is from the get().
return webapp2.redirect("/read/%s" % location_id)
self.render_template('read.html', {'courts': courts,'location': location, ... etc ...}
My new post() is as follows. Notice I have left the logging.info calls in the code to see if I ever get there.
class MainPageCourt(BaseHandler):
    def post(self, location_id):
        logging.info("in MainPageCourt post ")
        startTime = self.request.get('startTime')
        endTime = self.request.get('endTime')
        day = self.request.get('day')
        weekday = self.request.get('weekday')
        nowweekday = self.request.get('nowweekday')
        year = self.request.get('year')
        month = self.request.get('month')
        nowmonth = self.request.get('nowmonth')
        courtnames = self.request.get_all('court')
        for c in courtnames:
            logging.info("courtname: %s " % c)
        times = intervals(startTime, endTime)
        for courtname in courtnames:
            for time in times:
                tempreservation = courtname + str(time[0]) + str(time[1])
                name = self.request.get(tempreservation, None)
                if name:
                    iden = courtname
                    court = db.Key.from_path('Locations', location_id, 'Courts', iden)
                    reservation = Reservations(parent=court)
                    reservation.name = name
                    reservation.starttime = time
                    reservation.year = year
                    reservation.nowmonth = int(nowmonth)
                    reservation.day = int(day)
                    reservation.nowweekday = int(nowweekday)
                    reservation.put()
        return webapp2.redirect("/read/%s" % location_id)
Eventually I want to add checking/validation to the above post() code by comparing the existing Reservations data in the datastore with the implied new reservations, and kick out an alert which tells the user about any potential problems she can address.
I would also appreciate any comments on these two problems.
end of update 0
My app is for a community tennis court. I want to replace the paper sign-up sheet with an online digital sheet that mimics a paper one. As unlikely as it seems, there may be "transactional" conflicts where two tennis appointments collide. So how do I give the second appointment maker a heads-up about the conflict, while also giving the successful party the opportunity to alter her appointment like she would on paper (with an eraser)?
Each half hour is a time slot on the form. People normally sign up for multiple half hours at one time before "submitting".
So in my code within a loop I do a get_all. If any get succeeds I want to give the user control over whether to accept the put() or not. I am still thinking the put() would be an all or nothing, not selective.
So my question is, do I need to make part of the code use an explicit "transaction"?
class MainPageCourt(BaseHandler):
    def post(self, location_id):
        reservations = self.request.get_all('reservations')
        day = self.request.get('day')
        weekday = self.request.get('weekday')
        nowweekday = self.request.get('nowweekday')
        year = self.request.get('year')
        month = self.request.get('month')
        nowmonth = self.request.get('nowmonth')
        if reservations:  # ('if not reservations' would make the loop below dead code)
            for r in reservations:
                r = r.split()
                iden = r[0]
                temp = iden + ' ' + r[1] + ' ' + r[2]
                court = db.Key.from_path('Locations', location_id, 'Courts', iden)
                reservation = Reservations(parent=court)
                reservation.starttime = [int(r[1]), int(r[2])]
                reservation.year = int(r[3])
                reservation.nowmonth = int(r[4])
                reservation.day = int(r[5])
                reservation.nowweekday = int(nowweekday)
                reservation.name = self.request.get(temp)
                reservation.put()
            return webapp2.redirect("/read/%s" % location_id)
        else:
            # ... this important code is not written, pending ...
            return webapp2.redirect("/adjust/%s" % location_id)
Have a look at optimistic concurrency control:
http://en.wikipedia.org/wiki/Optimistic_concurrency_control
You can check for the availability of the time slots in a given Court, and write the corresponding Reservations child entities only if their start_time values don't conflict.
Here is how you would do it for a single reservation, using an ancestor query:
@ndb.transactional
def make_reservation(court_id, start_time):
    court = Court(id=court_id)
    existing = Reservation.query(Reservation.start_time == start_time,
                                 ancestor=court.key).fetch(2, keys_only=True)
    if len(existing):
        return False, existing[0]
    return True, Reservation(start_time=start_time, parent=court.key).put()
Alternatively, if you make the slot part of the Reservation id, you can skip the query and construct the Reservation entity keys directly to check whether they already exist:
@ndb.transactional
def make_reservations(court_id, slots):
    court = Court(id=court_id)
    rs = [Reservation(id=s, parent=court.key) for s in slots]
    existing = ndb.get_multi(r.key for r in rs)
    if any(existing):
        return False, existing
    return True, ndb.put_multi(rs)
I think you should always use transactions, but I don't think your concerns are best addressed by transactions.
I think you should implement a two-stage reservation system, which is what you see at most shopping-cart and ticketing sites:
Posting the form creates a "reservation request", which blocks out the time(s) as "in someone else's cart" for 5-15 minutes.
Users must submit again on an approval screen to confirm the times. You can give them the ability to resolve conflicts on that screen too, and reset the reservation lock on the timeslots for as long as possible.
A cron job - or a faked one triggered by a request coming in within a certain window - clears out expired reservation locks and returns the times to the pool of available slots. A sketch of such a lock follows.
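A minimal sketch of such a reservation lock (the SlotLock model and helper are assumptions, not from the question; it uses ndb like the answer above):
import datetime

class SlotLock(ndb.Model):
    """Holds a timeslot while the user confirms (stage one)."""
    expires = ndb.DateTimeProperty()

@ndb.transactional
def lock_slot(court_key, slot_id, hold_minutes=10):
    now = datetime.datetime.utcnow()
    lock = SlotLock.get_by_id(slot_id, parent=court_key)
    if lock and lock.expires > now:
        return False  # slot is in someone else's "cart"
    SlotLock(id=slot_id, parent=court_key,
             expires=now + datetime.timedelta(minutes=hold_minutes)).put()
    return True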

Buying many products at once from a webshop

It's quite simple to program the sale of a single product via my payment system (api.payson.se), but buying many products at the same time in various amounts posed trouble for me, since it was not implemented and I didn't have a good idea how to do it. Now I have a solution that I just put together. It works, but the modelling and control flow are quick and dirty, and I wonder whether this is even acceptable or needs a rewrite. The system now behaves so that I can enter the shop (step 1) and enter the amounts for the products I want to buy.
Then, if I press Buy ("Köp"), my Python code calculates the sum correctly, whatever combination of amounts and products I choose, and reports the total. This page could also list the order specification, but that is not implemented yet.
The total sum in Swedish currency is correct, and an order has been written to my datastore with status "unpaid", recording which products were ordered and the amount of each.
The user can then either cancel the purchase or go on and actually pay through the payment system api.payson.se.
So all I need to do is listen for the response from Payson and update the status of the orders that get paid. But my solution does not look very clean, and I wonder if I can go on with code like this. The data model is two string lists, one with the amounts and one with the products (item IDs), since that was the easiest way I could solve it, but the data is then only accessible through the lists. Is there a better data model I can use?
The code that does the handling is slightly messy and could use a better data model and a better algorithm than just strings and lists:
class ShopHandler(NewBaseHandler):
    @user_required
    def get(self):
        user = auth_models.User.get_by_id(
            long(self.auth.get_user_by_session()['user_id']))
        self.render_jinja('shop.htm', items=Item.recent(), user=user)
        return ''

    @user_required
    def post(self, command):
        user = auth_models.User.get_by_id(
            long(self.auth.get_user_by_session()['user_id']))
        logging.info('in shophandler http post item id' + self.request.get('item'))
        items = [self.request.get('items[1]'), self.request.get('items[2]'),
                 self.request.get('items[3]'), self.request.get('items[4]'),
                 self.request.get('items[5]'), self.request.get('items[6]'),
                 self.request.get('items[7]'), self.request.get('items[8]')]
        amounts = [self.request.get('amounts[1]'), self.request.get('amounts[2]'),
                   self.request.get('amounts[3]'), self.request.get('amounts[4]'),
                   self.request.get('amounts[5]'), self.request.get('amounts[6]'),
                   self.request.get('amounts[7]'), self.request.get('amounts[8]')]
        total = 0
        total = int(self.request.get('amounts[1]')) * long(Item.get_by_id(long(self.request.get('items[1]'))).price_fraction()) if self.request.get('amounts[1]') else total
        total = total + int(self.request.get('amounts[2]')) * long(Item.get_by_id(long(self.request.get('items[2]'))).price_fraction()) if self.request.get('amounts[2]') else total
        total = total + int(self.request.get('amounts[3]')) * long(Item.get_by_id(long(self.request.get('items[3]'))).price_fraction()) if self.request.get('amounts[3]') else total
        total = total + int(self.request.get('amounts[4]')) * long(Item.get_by_id(long(self.request.get('items[4]'))).price_fraction()) if self.request.get('amounts[4]') else total
        total = total + int(self.request.get('amounts[5]')) * long(Item.get_by_id(long(self.request.get('items[5]'))).price_fraction()) if self.request.get('amounts[5]') else total
        total = total + int(self.request.get('amounts[6]')) * long(Item.get_by_id(long(self.request.get('items[6]'))).price_fraction()) if self.request.get('amounts[6]') else total
        total = total + int(self.request.get('amounts[7]')) * long(Item.get_by_id(long(self.request.get('items[7]'))).price_fraction()) if self.request.get('amounts[7]') else total
        total = total + int(self.request.get('amounts[8]')) * long(Item.get_by_id(long(self.request.get('items[8]'))).price_fraction()) if self.request.get('amounts[8]') else total
        logging.info('total:' + str(total))
        trimmed = str(total) + ',00'
        order = model.Order(status='UNPAID')
        order.items = items
        order.amounts = amounts
        order.put()
        logging.info('order was written')
        ExtraCost = 0
        GuaranteeOffered = 2
        OkUrl = 'http://' + self.request.host + r'/paysonreceive/'
        Key = '3110fb33-6122-4032-b25a-329b430de6b6'
        text = 'niklasro@gmail.com' + ':' + str(trimmed) + ':' + str(ExtraCost) \
            + ':' + OkUrl + ':' + str(GuaranteeOffered) + Key
        m = hashlib.md5()
        BuyerEmail = user.email
        AgentID = 11366
        self.render_jinja('order.htm', order=order, user=user, total=total,
                          Generated_MD5_Hash_Value=hashlib.md5(text).hexdigest(),
                          BuyerEmail=user.email, Description='Bnano Webshop',
                          trimmed=trimmed, OkUrl=OkUrl,
                          BuyerFirstName=user.firstname, BuyerLastName=user.lastname)
My model for the order, where not all fields are used, is
class Order(db.Model):
    '''a transaction'''
    item = db.ReferenceProperty(Item)
    items = db.StringListProperty()
    amounts = db.StringListProperty()
    owner = db.UserProperty()
    purchaser = db.UserProperty()
    created = db.DateTimeProperty(auto_now_add=True)
    status = db.StringProperty(choices=('NEW', 'CREATED', 'ERROR', 'CANCELLED',
                                        'RETURNED', 'COMPLETED', 'UNPAID', 'PAID'))
    status_detail = db.StringProperty()
    reference = db.StringProperty()
    secret = db.StringProperty()  # to verify return_url
    debug_request = db.TextProperty()
    debug_response = db.TextProperty()
    paykey = db.StringProperty()
    shipping = db.TextProperty()
And the model for a product ie an item is
class Item(db.Model):
    '''an item for sale'''
    owner = db.UserProperty()  # optional
    created = db.DateTimeProperty(auto_now_add=True)
    title = db.StringProperty(required=True)
    price = db.IntegerProperty()  # cents / fractions; use price_decimal to get the price in dollars / wholes
    image = db.BlobProperty()
    enabled = db.BooleanProperty(default=True)
    silver = db.IntegerProperty()  # number of silver

    def price_dollars(self):
        return self.price / 100.0

    def price_fraction(self):
        return self.price / 100.0

    def price_silver(self):  # number of silvers an item "is worth"
        return self.silver / 1000.000

    def price_decimal(self):
        return decimal.Decimal(str(self.price / 100.0))

    def price_display(self):
        return str(self.price_fraction()).replace('.', ',')

    @staticmethod
    def recent():
        return Item.all().filter("enabled =", True).order('-created').fetch(10)
I think you now have an idea of what's going on, and that this kind of works for the user, but the code is not looking good. Do you think I can leave the code like this and keep this "solution", or must I rewrite it to make it more proper? There are only 8 products in the store, and with this solution it becomes difficult to add a new Item for sale, since I must then reprogram the script, which is not ideal.
Could you comment or answer? I'd be very glad to get some feedback about this quick and dirty solution to my use case.
Thank you
Update
I did a rewrite to allow for adding new products and the following seems better than the previous:
class ShopHandler(NewBaseHandler):
    @user_required
    def get(self):
        user = auth_models.User.get_by_id(
            long(self.auth.get_user_by_session()['user_id']))
        self.render_jinja('shop.htm', items=Item.recent(), user=user)
        return ''

    @user_required
    def post(self, command):
        user = auth_models.User.get_by_id(
            long(self.auth.get_user_by_session()['user_id']))
        logging.info('in shophandler http post')
        total = 0
        order = model.Order(status='UNPAID')
        for item in self.request.POST:
            amount = self.request.POST[item]
            logging.info('item:' + str(item))
            purchase = Item.get_by_id(long(item))
            order.items.append(purchase.key())
            order.amounts.append(int(amount))
            order.put()
            price = purchase.price_fraction()
            logging.info('amount:' + str(amount))
            logging.info('product price:' + str(price))
            total = total + price * int(amount)
        logging.info('total:' + str(total))
        order.total = str(total)
        order.put()
        trimmed = str(total).replace('.', ',') + '0'
        ExtraCost = 0
        GuaranteeOffered = 2
        OkUrl = 'http://' + self.request.host + r'/paysonreceive/'
        Key = '6230fb54-7842-3456-b43a-349b340de3b8'
        text = 'niklasro@gmail.com' + ':' + str(trimmed) + ':' \
            + str(ExtraCost) + ':' + OkUrl + ':' \
            + str(GuaranteeOffered) + Key
        m = hashlib.md5()
        BuyerEmail = user.email  # if user.email else user.auth_id[0]
        AgentID = 11366
        self.render_jinja(
            'order.htm',
            order=order,
            user=user,
            total=total,
            Generated_MD5_Hash_Value=hashlib.md5(text).hexdigest(),
            BuyerEmail=user.email,
            Description='Bnano Webshop',
            trimmed=trimmed,
            OkUrl=OkUrl,
            BuyerFirstName=user.firstname,
            BuyerLastName=user.lastname,
        )
Man, this is really strange code. If you ever want to add new items to your shop, you will have to rewrite the shop's script.
First, unlink your items from the interface: send a POST request to the controller with your item IDs and quantities. I don't know exactly how the GAE request object works, but it should be something like this: from your order page, make a POST request with a dict containing only the items actually needed, {"item_id": "qnt"}.
Then in the controller you can process all the items, for example:
for item_id, qnt in self.request.POST.items():
    # do something with each item, for example accumulate the total
and so on.
Don't link controllers to your interfaces directly. You need to write more abstract code if you want to make a really flexible app.
I'm going to try to focus on one very obvious problem with your code, but there are lots of problems with it that I'm not going to get into. My advice is to stop right now. You're implementing a web-based payment system, and you really should leave that to people with more skills and experience. "Web-based" is a pretty difficult thing to get right whilst ensuring security, and an online payment system is the sort of thing that consultants with decades of experience are well paid for, and they still manage to get it wrong pretty often. You're opening yourself up to a lot of legal liability.
If you're still dead set on it, please read The Python Tutorial cover to cover, possibly several times. Python is a very different language from whatever classical OOP language you're mentally cramming it into. After that, at least leaf through the rest of the documentation. If you're having trouble with these, pick up an O'Reilly book on Python; approaching it from another angle should help. After you've done all this (and maybe at the same time), write as much code as you can that is not going to get you sued into oblivion if you do it wrong. Then maybe you can write an order/payment system.
I'm sorry if this sounds harsh, but the world doesn't need any more shoddy web stores; 1999 took care of that for us.
Anyway, on to your code :D When you write something repetitive and copy-pasted like this:
items = [ self.request.get('items[1]'),self.request.get('items[2]'),self.request.get('items[3]'),self.request.get('items[4]'),self.request.get('items[5]'),self.request.get('items[6]'),self.request.get('items[7]'),self.request.get('items[8]') ]
You should be thinking to yourself, "Wait a second! Repetitive tasks are exactly what computers are designed to do." You could get your text editor to do it (see Vim macros), but concise (though not too concise) code is always better than long code: it is faster to maintain, less prone to programmer error, and easier to debug, not to mention the time you save by not copying and pasting. So let's improve the code.
Here's how I would revise this in Python (advanced programmers do this in their heads, or just skip to the end):
# 1. with a for loop
MAX_ITEMS = 8
items = []
for i in range(MAX_ITEMS):
    items.append(self.request.get('items[{}]'.format(i + 1)))

# 2. with a list comprehension
MAX_ITEMS = 8
items = [self.request.get('items[{}]'.format(i + 1)) for i in range(MAX_ITEMS)]
Actually, having a limit to the number of items is rather amateurish and will only frustrate your users. You can fix it like this:
items = []
i = 1
while True:
    item = self.request.get('items[{}]'.format(i), None)  # attempt to get the next item
    if item is None:  # but if there is none...
        break  # we must be past the last one
    items.append(item)
    i += 1
I think this is the way you should leave it for now because it's clear but not repetitive. However, you could shorten it even further using functions from the itertools module.
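For instance, a hedged sketch with itertools.count in place of the manual index bookkeeping:
import itertools

items = []
for i in itertools.count(1):  # yields 1, 2, 3, ... until we run out of items
    item = self.request.get('items[{}]'.format(i), None)
    if item is None:
        break
    items.append(item)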
A few quick tips:
Avoid string concatenation, especially where user-supplied strings are concerned, and doubly so when those strings come in over the web. Use str.format or "%d" % (5,) modulus string formatting; see the sketch after this list. BONUS: You don't have to convert everything to strings!
Get those constants (e.g., ExtraCost = 2) out of the middle and put them somewhere safe (at the top of the module, or in a special file in the package)
You trust the user way too much: At for item in self.request.POST:, you're assuming everything in the request is going to be an item, and you do zero validation.
Please, please, please. Never turn off autocomplete. I really don't know why that attribute exists, except to annoy.
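As promised above, a hedged sketch of the str.format tip applied to the hash-input concatenation from the question (names reused from that snippet):
# No manual str() conversions, and the layout of the string is visible at a glance.
text = '{}:{}:{}:{}:{}{}'.format(
    'niklasro@gmail.com', trimmed, ExtraCost, OkUrl, GuaranteeOffered, Key)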
