I'm developing a REST API in Python 3.6 using Flask-Rebar and PostgreSQL and am having trouble executing two queries in sequence using psycopg2.
More specifically, I execute a query and need the id value it returns for use in the next query. The first query successfully returns the expected value; however, the function call that should run the subsequent query never executes.
Here is the function responsible for calling the query function:
from time import strftime

psql = PostgresHandler()
user_ids = [1, 5, 9]
horse = {"name": "Adam", "age": 400}

def createHorseQuery(user_ids, horse):
    time_created = strftime("%Y-%m-%dT%H:%M:%SZ")
    fields, values = list(), list()
    for key, val in horse.items():
        fields.append(key)
        values.append(val)
    fields.append('time_created')
    values.append(time_created)
    fields = str(fields).replace('[', '(').replace(']', ')').replace("'", "")
    values = str(values).replace('[', '(').replace(']', ')')
    create_horse_query = f"INSERT INTO horse {fields} VALUES {values} RETURNING horse_id;"
    horse_id = None
    for h_id in psql.queryDatabase(create_horse_query, returnInsert=True):
        horse_id = h_id
    link_user_query = ''
    for u_id in user_ids:
        link_user_query += f"INSERT INTO user_to_horse (user_id, horse_id) VALUES ({u_id}, {horse_id['horse_id']});"
    psql.queryDatabase(link_user_query)
    return horse_id, 201
Here is the PostgresHandler() class that contains the function queryDatabase:
import psycopg2
from psycopg2.extras import RealDictCursor

class PostgresHandler(object):

    def __init__(self):
        self.connectToDatabase()

    def connectToDatabase(self):
        self.connection = psycopg2.connect(
            host='...',
            user='...',
            password='...',
            database='...'
        )

    def queryDatabase(self, query, returnInsert=False):
        cursor = self.connection.cursor(cursor_factory=RealDictCursor)
        cursor.execute(query)
        if "SELECT" in query.upper():
            for result in cursor.fetchall():
                yield result
        elif "INSERT" in query.upper():
            if returnInsert:
                for result in cursor.fetchall():
                    yield result
            self.connection.commit()
        cursor.close()
I can verify that the psql.queryDatabase(create_horse_query, returnInsert=True) operation is successful by querying the database manually and comparing against the return value, h_id.
I can verify that link_user_query is created and contains the user_ids and horse_id as expected by printing. I know the query that's generated is okay as I have tested this manually in the database.
It appears that the call on the line psql.queryDatabase(link_user_query) never actually runs, as a print statement at the very top of the queryDatabase function is not executed.
I've tried adding delays between the two query function calls, initialising a new connection with each function call, and many other things, all to no avail, and I am absolutely stumped. Any insight is greatly appreciated.
EDIT: FYI, the createHorseQuery function returns successfully and displays the two returned values as expected.
queryDatabase in your code is a generator because it contains a yield statement. The generator only actually does things when you iterate over it (i.e. cause __next__() to be called). Consider the following:
def gen():
    print("Gen is running!")
    yield "Gen yielded: hello"
    print("Gen did: commit")

print("***Doing stuff with b***")
b = gen()
for a in b:
    print(a)

print("***Doing stuff with c***")
c = gen()

print("***Done***")
Output is:
***Doing stuff with b***
Gen is running!
Gen yielded: hello
Gen did: commit
***Doing stuff with c***
***Done***
When we called gen() to create c, we didn't actually run it; we just instantiated it as a generator.
We could force it to run by calling __next__() on it a bunch of times:
c.__next__()
try:
    c.__next__()
except StopIteration:
    print("Iteration is over!")
outputs:
Gen is running!
Gen did: commit
Iteration is over!
But really, you should probably not use a generator like this when you never intend to yield from it. You could consider adding a separate function that is not a generator, called insertSilently (or similar), for statements where you don't need any rows back.
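A minimal sketch of what such a method might look like on the PostgresHandler class from the question (the body is an assumption, not something tested against your setup):

def insertSilently(self, query):
    # Plain method, not a generator: the body runs as soon as it is called.
    cursor = self.connection.cursor()
    try:
        cursor.execute(query)
        self.connection.commit()
    finally:
        cursor.close()

Calling psql.insertSilently(link_user_query) would then execute the linked-user inserts immediately, instead of handing back a generator that is never iterated.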
I am currently trying to append the id of each query result to the output list in my code. I can get it to append one of the ids, but it overrides the first one. How can I change my code so that the loop can append any number of ids via output.append(q.id)?
Here is the code:
@app.route('/new-mealplan', methods=['POST'])
def create_mealplan():
    data = request.get_json()
    recipes = data['recipes']
    output = []
    for recipe in recipes:
        try:
            query = Recipes.query.filter(func.lower(Recipes.recipe_name) == func.lower(recipe)).all()
            # print(recipe)
            if query:
                query = Recipes.query.filter(func.lower(Recipes.recipe_name) == func.lower(recipe)).all()
                for q in query:
                    output.append(q.id)
        finally:
            return jsonify({"data" : output})
To fix this I removed the try and finally blocks and returned only after the for-loop had completed.
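For reference, a minimal sketch of the corrected view along those lines, reusing the imports and models from the question:

@app.route('/new-mealplan', methods=['POST'])
def create_mealplan():
    data = request.get_json()
    recipes = data['recipes']
    output = []
    for recipe in recipes:
        # Case-insensitive match on the recipe name.
        query = Recipes.query.filter(func.lower(Recipes.recipe_name) == func.lower(recipe)).all()
        for q in query:
            output.append(q.id)
    # Return once, after every recipe has been processed.
    return jsonify({"data": output})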
I'm trying to write unittests for my own Elasticsearch client. It uses the client from elasticsearch-py.
Most of my tests are fine, but when running a test on my own search() function (which uses the search() function from the Elasticsearch client) I get very random behaviour. This is the way my test is implemented:
def setUp(self) -> None:
    self.es = ESClient(host="localhost")
    self.es_acc = ESClient()
    self.connection_res = (False, {})
    self.t = self.es_acc.get_connection_status(self._callback)
    self.t.join()

    # Create test index and index some documents
    self.es.create_index(self.TEST_INDEX)
    names = ["Gregor", "Alice", "Per Svensson", "Mats Hermelin", "Mamma Mia",
             "Eva Dahlgren", "Per Morberg", "Maja Larsson", "Ola Salo", "Magrecievic Holagrostokovic"]
    self.num_docs = len(names)
    self.payload = []
    random.seed(123)
    for i, name in enumerate(names):
        n = name.split(" ")
        fname = n[0]
        lname = n[1] if len(n) > 1 else n[0]
        self.payload.append({"name": {"first": fname, "last": lname}, "age": random.randint(-100, 100),
                             "timestamp": datetime.utcnow() - timedelta(days=1 * i)})
    self.es.upload(self.TEST_INDEX, self.payload, ids=list(range(len(names))))
def test_search(self):
    # Test getting docs based on ids
    ids = ["1", "4", "9"]
    status, hits = self.es.search(self.TEST_INDEX, ids=ids)  # Breakpoint
    docs = hits["hits"]["hits"]
    self.assertTrue(status, "Status not correct for search!")
    returned_ids = [d["_id"] for d in docs]
    names = [d["_source"]["name"] for d in docs]
    self.assertListEqual(sorted(returned_ids), ids, "Returned ids from search not correct!")
    self.assertListEqual(names, [self.payload[i]["name"] for i in [1, 4, 9]], "Returned source from search not correct!")
In setUp() I'm just uploading a few documents to test on, so there should always be 10 documents available. Below is an excerpt from my search() function:
if ids:
    try:
        q = Query().ids(ids).compile_and_get()
        res = self.es.search(index=index, body=q)
        print(res)
        return True, res
    except exceptions.ElasticsearchException as e:
        self._handle_elastic_exceptions("search", e, index=index)
        return False, {}
I've implemented Query myself. Anyway, when I just run the test, I ALMOST always get 0 hits. But if I debug the application with a breakpoint in test_search() on the line where I call search() and then step, everything works fine. If I put the breakpoint just one line below, I get 0 hits again. What is going on? Why is it not blocking correctly?
It seems like I found my solution!
I did not understand that setUp was called on every test method. This was actually not the problem however.
The problem was that for some tests, uploading the documents (which is done in setUp) simply took too much time, so when the test started the documents did not exist yet! Solution: add sleep(1) to the end of setUp.
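For illustration, a sketch of that change (the one-second value is arbitrary; everything before it is the setUp from the question, abbreviated):

from time import sleep

def setUp(self) -> None:
    # ... create the test index and call self.es.upload(...) as in the original setUp ...
    # Newly indexed documents only become searchable after Elasticsearch refreshes
    # the index, so give it a moment before the tests start querying.
    sleep(1)

If your ESClient exposes the underlying elasticsearch-py client, explicitly refreshing the test index after the upload is a less arbitrary way to wait than a fixed sleep.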
In the code below, the worker function checks whether the data passed in is valid. If it is valid, it returns a dictionary which will be used in a bulk SQLAlchemy Core insert. If it's invalid, I want the None value not to be added to receiving_list, because if it is, the bulk insert will fail since a single None value cannot be mapped to the table structure.
from datetime import datetime
from sqlalchemy import Table
from sqlalchemy.exc import IntegrityError
import multiprocessing

CONN = Engine.connect()  # Engine is imported from another module
NUM_CONSUMERS = multiprocessing.cpu_count()
p = multiprocessing.Pool(NUM_CONSUMERS)

def process_data(data):
    # Long process to validate data
    if is_valid_data(data) == True:
        returned_dict = {}
        returned_dict['created_at'] = datetime.now()
        returned_dict['col1'] = data[0]
        returned_dict['colN'] = data[N]
        return returned_dict
    else:
        return None

def spawn_some_processes(data):
    table_to_insert = Table('postgresql_database_table', meta, autoload=True, autoload_with=Engine)
    while True:
        # Get some data here and pass it on to the worker
        receiving_list = p.map(process_data, data_to_process)
        try:
            if len(receiving_list) > 0:
                trans = CONN.begin()
                CONN.execute(table_to_insert.insert(), receiving_list)
                trans.commit()
        except IntegrityError:
            trans.rollback()
        except:
            trans.rollback()
To rephrase the question: how can I stop a spawned process from adding to receiving_list when it returns None?
A workaround is incorporating a queue with queue.put() and queue.get() that will put only valid data. The disadvantage with this is that after the processes are over, I have to then unpack the queue which adds overhead. My ideal solution would be one where a clean list of dictionaries is returned which SQLAlchemy can use to do the bulk insert
You can just remove the None entries from the list:
receiving_list = filter(None, p.map(process_data, data_to_process))
This is pretty quick even for really huge lists:
>>> timeit.timeit('l = filter(None, l)', 'l = range(0,10000000)', number=1)
0.47683095932006836
Note that using filter will remove anything where bool(val) is False, like empty strings, empty lists, etc. This should be fine for your use-case, though.
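If this ever moves to Python 3, note that filter returns a lazy iterator there rather than a list; a list comprehension that drops only None values is an equivalent, version-independent way to get the clean list of dictionaries for the bulk insert (names taken from the question):

receiving_list = [row for row in p.map(process_data, data_to_process)
                  if row is not None]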
I'm trying to fetch results in a python2.7 appengine app using cursors, but each time I use with_cursor() it fetches the same result set.
query = Model.all().filter("profile =", p_key).order('-created')

if r.get('cursor'):
    query = query.with_cursor(start_cursor=r.get('cursor'))

cursor = query.cursor()
objs = query.fetch(limit=10)
count = len(objs)

for obj in objs:
    ...
Each time through I'm getting the same 10 results. I'm thinking it has to do with using end_cursor, but how do I get that value if query.cursor() is returning the start_cursor? I've looked through the docs, but this is poorly documented.
Your formatting is a bit screwy, by the way. Looking at your code (which is incomplete and may therefore be leaving something out), I have to assume you have forgotten to store the cursor after fetching results (or to return it to the user; I am assuming r is a request?).
So after you have fetched some data you need to call cursor() on the query. For example, this function counts all entities using a cursor:
def count_entities(kind):
    c = None
    count = 0
    q = kind.all(keys_only=True)
    while True:
        if c:
            q.with_cursor(c)
        i = q.fetch(1000)
        count = count + len(i)
        if not i:
            break
        c = q.cursor()
    return count
See how, after fetch() has been called, c = q.cursor() grabs the new cursor, and that value is used as the start cursor the next time through the loop.
Here's what finally worked:
query = Model.all().filter("profile =", p_key).order('-created')

if request.get('cursor'):
    query = query.with_cursor(request.get('cursor'))

objs = query.fetch(limit=10)
cursor = query.cursor()

for obj in objs:
    ...
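To keep paging on later requests, the freshly fetched cursor has to make it back to the client so it can be echoed in the next request's cursor parameter. A rough sketch of that last step (the response shape here is only an assumption):

import json

objs = query.fetch(limit=10)
cursor = query.cursor()  # the cursor *after* the fetch, i.e. where the next page starts

# Hand the cursor back to the client (however your handler writes responses);
# the client sends it as the 'cursor' parameter on the next request, and
# query.with_cursor(...) then resumes from that point.
payload = json.dumps({"cursor": cursor, "count": len(objs)})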
I am trying to update an atomic counter with Python Boto 2.3.0, but I can find no documentation for the operation.
It seems there is no direct interface, so I tried to go to "raw" updates using the layer1 interface, but I was unable to complete even a simple update.
I tried the following variations, all with no luck:
dynoConn.update_item(INFLUENCER_DATA_TABLE,
                     {'HashKeyElement': "9f08b4f5-d25a-4950-a948-0381c34aed1c"},
                     {'new': {'Value': {'N': "1"}, 'Action': "ADD"}})

dynoConn.update_item('influencer_data',
                     {'HashKeyElement': "9f08b4f5-d25a-4950-a948-0381c34aed1c"},
                     {'new': {'S': 'hello'}})

dynoConn.update_item("influencer_data",
                     {"HashKeyElement": "9f08b4f5-d25a-4950-a948-0381c34aed1c"},
                     {"AttributesToPut": {"new": {"S": "hello"}}})
They all produce the same error:
File "/usr/local/lib/python2.6/dist-packages/boto-2.3.0-py2.6.egg/boto/dynamodb/layer1.py", line 164, in _retry_handler
data)
boto.exception.DynamoDBResponseError: DynamoDBResponseError: 400 Bad Request
{u'Message': u'Expected null', u'__type': u'com.amazon.coral.service#SerializationException'}
I also investigated the API docs here but they were pretty spartan.
I have done a lot of searching and fiddling, and the only thing I have left is to use the PHP API and dive into the code to find where it "formats" the JSON body, but that is a bit of a pain. Please save me from that pain!
Sorry, I misunderstood what you were looking for. You can accomplish this via layer2 although there is a small bug that needs to be addressed. Here's some Layer2 code:
>>> import boto
>>> c = boto.connect_dynamodb()
>>> t = c.get_table('counter')
>>> item = t.get_item('counter')
>>> item
{u'id': 'counter', u'n': 1}
>>> item.add_attribute('n', 20)
>>> item.save()
{u'ConsumedCapacityUnits': 1.0}
>>> item # Here's the bug, local Item is not updated
{u'id': 'counter', u'n': 1}
>>> item = t.get_item('counter') # Refetch item just to verify change occurred
>>> item
{u'id': 'counter', u'n': 21}
This results in the same over-the-wire request as you are performing in your Layer1 code, as shown by the following debug output.
2012-04-27 04:17:59,170 foo [DEBUG]:StringToSign:
POST
/
host:dynamodb.us-east-1.amazonaws.com
x-amz-date:Fri, 27 Apr 2012 11:17:59 GMT
x-amz-security-token:<removed>==
x-amz-target:DynamoDB_20111205.UpdateItem
{"AttributeUpdates": {"n": {"Action": "ADD", "Value": {"N": "20"}}}, "TableName": "counter", "Key": {"HashKeyElement": {"S": "counter"}}}
If you want to avoid the initial GetItem call, you could do this instead:
>>> import boto
>>> c = boto.connect_dynamodb()
>>> t = c.get_table('counter')
>>> item = t.new_item('counter')
>>> item.add_attribute('n', 20)
>>> item.save()
{u'ConsumedCapacityUnits': 1.0}
Which will update the item if it already exists or create it if it doesn't yet exist.
For those looking for the answer, I have found it.
First, an IMPORTANT NOTE: I am currently unaware of exactly what is going on, BUT for the moment, to get a layer1 instance I have had to do the following:
import boto
AWS_ACCESS_KEY=XXXXX
AWS_SECRET_KEY=YYYYY
dynoConn = boto.connect_dynamodb(AWS_ACCESS_KEY, AWS_SECRET_KEY)
dynoConnLayer1 = boto.dynamodb.layer1.Layer1(AWS_ACCESS_KEY, AWS_SECRET_KEY)
Essentially, I instantiate a layer2 connection FIRST and THEN a layer1.
Maybe I'm doing something stupid, but at this point I'm just happy to have it working...
I'll sort the details later. THEN, to actually do the atomic update call:
dynoConnLayer1.update_item("influencer_data",
                           {"HashKeyElement": {"S": "9f08b4f5-d25a-4950-a948-0381c34aed1c"}},
                           {"direct_influence":
                               {"Action": "ADD", "Value": {"N": "20"}}
                           })
Note in the example above that Dynamo will ADD 20 to whatever the current value is, and this operation is atomic, meaning other operations happening at the "same time" will be correctly "scheduled" to happen either after the new value has been established as +20 or before this operation is executed. Either way, the desired effect is accomplished.
Be certain to do this on the layer1 connection instance, as layer2 will throw errors since it expects a different set of parameter types.
That's all there is to it!!!! Just so folks know, I figured this out using the PHP SDK. It takes a very short time to install and set up, and then, when you make a call, the debug data will actually show you the format of the HTTP request body, so you can model your layer1 parameters after the example. Here is the code I used to do the atomic update in PHP:
<?php
// Instantiate the class
$dynamodb = new AmazonDynamoDB();

$update_response = $dynamodb->update_item(array(
    'TableName' => 'influencer_data',
    'Key' => array(
        'HashKeyElement' => array(
            AmazonDynamoDB::TYPE_STRING => '9f08b4f5-d25a-4950-a948-0381c34aed1c'
        )
    ),
    'AttributeUpdates' => array(
        'direct_influence' => array(
            'Action' => AmazonDynamoDB::ACTION_ADD,
            'Value' => array(
                AmazonDynamoDB::TYPE_NUMBER => '20'
            )
        )
    )
));

// status code 200 indicates success
print_r($update_response);
?>
Hopefully this will help others until the Boto layer2 interface catches up... or someone simply figures out how to do it in layer2 :-)
I'm not sure this is truly an atomic counter, since when you increment the value by 1, another call could also increment the number by 1, so that when you "get" the value, it is not the value that you would expect.
For instance, taking the code by garnaat, which is marked as the accepted answer, I can see that when you put it in a thread, it does not work:
class ThreadClass(threading.Thread):
    def run(self):
        conn = boto.dynamodb.connect_to_region(aws_access_key_id=os.environ['AWS_ACCESS_KEY'], aws_secret_access_key=os.environ['AWS_SECRET_KEY'], region_name='us-east-1')
        t = conn.get_table('zoo_keeper_ids')
        item = t.new_item('counter')
        item.add_attribute('n', 1)
        r = item.save()  # - Item has been atomically updated!

        # Uh-Oh! The value may have changed by the time "get_item" is called!
        item = t.get_item('counter')
        self.counter = item['n']
        logging.critical('Thread has counter: ' + str(self.counter))

tcount = 3
threads = []
for i in range(tcount):
    threads.append(ThreadClass())

# Start running the threads:
for t in threads:
    t.start()

# Wait for all threads to complete:
for t in threads:
    t.join()

# - Now verify all threads have unique numbers:
results = set()
for t in threads:
    results.add(t.counter)

print len(results)
print tcount
if len(results) != tcount:
    print '***Error: All threads do not have unique values!'
else:
    print 'Success! All threads have unique values!'
Note: If you want this to truly work, change the code to this:
def run(self):
    conn = boto.dynamodb.connect_to_region(aws_access_key_id=os.environ['AWS_ACCESS_KEY'], aws_secret_access_key=os.environ['AWS_SECRET_KEY'], region_name='us-east-1')
    t = conn.get_table('zoo_keeper_ids')
    item = t.new_item('counter')
    item.add_attribute('n', 1)
    r = item.save(return_values='ALL_NEW')  # - Item has been atomically updated, and you have the correct value without having to do a "get"!
    self.counter = str(r['Attributes']['n'])
    logging.critical('Thread has counter: ' + str(self.counter))
Hope this helps!
There is no high-level function in DynamoDB for atomic counters. However, you can implement an atomic counter using the conditional write feature. For example, let's say you have a table with a string hash key, created like this:
>>> import boto
>>> c = boto.connect_dynamodb()
>>> schema = c.create_schema('id', 's')
>>> counter_table = c.create_table('counter', schema, 5, 5)
You now write an item to that table that includes an attribute called 'n' whose value is zero.
>>> n = 0
>>> item = counter_table.new_item('counter', {'n': n})
>>> item.put()
Now, if I want to update the value of my counter, I would perform a conditional write operation that will bump the value of 'n' to 1 if and only if its current value agrees with my idea of its current value.
>>> n += 1
>>> item['n'] = n
>>> item.put(expected_value={'n': n-1})
This will set the value of 'n' in the item to 1, but only if the current value in DynamoDB is zero. If the value was already incremented by someone else, the write would fail and I would then need to increment my local counter and try again.
This is kind of complicated but all of this could be wrapped up in some code to make it much simpler to use. I did a similar thing for SimpleDB that you can find here:
http://www.elastician.com/2010/02/stupid-boto-tricks-2-reliable-counters.html
I should probably try to update that example to use DynamoDB.
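For what it's worth, a rough sketch of that retry loop using the same layer2 calls as above; treating any DynamoDBResponseError as a failed conditional check is an assumption here, so adjust the exception handling to your boto version:

from boto.exception import DynamoDBResponseError

def increment_counter(counter_table, max_retries=10):
    # Read the current value, then attempt a conditional write that only succeeds
    # if nobody else changed 'n' in the meantime; otherwise refetch and retry.
    for _ in range(max_retries):
        item = counter_table.get_item('counter')
        current = item['n']
        item['n'] = current + 1
        try:
            item.put(expected_value={'n': current})
            return item['n']
        except DynamoDBResponseError:
            # Assumed: a failed conditional write surfaces as a response error
            # (ConditionalCheckFailedException); loop around and try again.
            continue
    raise RuntimeError('could not increment counter after %d retries' % max_retries)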
If you want to increment a value in DynamoDB, you can achieve it with an update expression using boto3:
import boto3
import json
import decimal

class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

ddb = boto3.resource('dynamodb')

def get_counter():
    table = ddb.Table(TableName)
    try:
        response = table.update_item(
            Key={
                'haskey': 'counterName'
            },
            UpdateExpression="set currentValue = currentValue + :val",
            ExpressionAttributeValues={
                ':val': decimal.Decimal(1)
            },
            ReturnValues="UPDATED_NEW"
        )
        print("UpdateItem succeeded:")
    except Exception as e:
        raise e
    print(response["Attributes"]["currentValue"])
This implementation needs an extra counter table that just keeps the last used value for you.
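As a sketch, seeding that counter table once before the first increment could look like this; the table name, key attribute and starting value simply mirror the snippet above and are assumptions about your schema:

import decimal
import boto3

ddb = boto3.resource('dynamodb')
table = ddb.Table('counters')  # assumed table name

# Create the counter item once with a starting value; after that, each call to
# get_counter() atomically adds 1 to currentValue on the server side.
table.put_item(
    Item={
        'haskey': 'counterName',  # same key attribute and value as in get_counter()
        'currentValue': decimal.Decimal(0),
    },
    ConditionExpression='attribute_not_exists(haskey)',  # don't reset an existing counter
)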