I have an SQS queue that I need to constantly monitor for incoming messages. Once a message arrives, I do some processing and continue to wait for the next message. I achieve this by setting up an infinite loop with a 2 second pause at the end of the loop. This works, however I can't help but feel this isn't a very efficient way of solving the need to constantly poll the queue.
Code example:
while True:
    response = sqs.receive_message(
        QueueUrl=queue_url,
        AttributeNames=[
            'SentTimestamp'
        ],
        MaxNumberOfMessages=1,
        MessageAttributeNames=[
            'All'
        ],
        VisibilityTimeout=1,
        WaitTimeSeconds=1
    )
    try:
        message = response['Messages'][0]
        receipt_handle = message['ReceiptHandle']
        # Delete received message from queue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle
        )
        msg = message['Body']
        msg_json = eval(msg)
        value1 = msg_json['value1']
        value2 = msg_json['value2']
        process(value1, value2)
    except:
        pass
        # print('Queue empty')
    time.sleep(2)
In order to exit the script cleanly (which should run constantly), I catch the KeyboardInterrupt which gets triggered on Ctrl+C and do some clean-up routines to exit gracefully.
if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        logout()
Is there a better way to achieve the constant polling of the SQS queue, and is the 2 second delay necessary? I'm trying not to hammer the SQS service, but perhaps it doesn't matter?
This is ultimately the way that SQS works - it requires something to poll it to get the messages. But some suggestions:
Don't get just a single message each time. Do something more like:
messages = sqs.receive_messages(
    MessageAttributeNames=['All'],
    MaxNumberOfMessages=10,
    WaitTimeSeconds=10
)
for msg in messages:
    logger.info("Received message: %s: %s", msg.message_id, msg.body)
This changes things a bit for you. The first thing is that you're willing to get up to 10 messages (this is the maximum number for SQS in one call). The second is that you will wait up to 10 seconds to get the messages. From the SQS docs:
The duration (in seconds) for which the call waits for a message to arrive in the queue before returning. If a message is available, the call returns sooner than WaitTimeSeconds. If no messages are available and the wait time expires, the call returns successfully with an empty list of messages.
So you don't need your own sleep call - if there are no messages, the call will wait until the timeout expires. Conversely, if there is a backlog of messages you'll receive them as fast as possible, since there is no sleep in the loop.
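Putting that together, a minimal sketch of the whole loop could look like the following. This assumes queue is a boto3 SQS Queue resource built from the queue_url in the question, and process() stands in for your own handling of the message body:
import boto3

sqs = boto3.resource('sqs')
queue = sqs.Queue(queue_url)  # queue_url as in the question

while True:
    # Long polling: this call blocks for up to 10 seconds, so no time.sleep() is needed.
    messages = queue.receive_messages(
        MessageAttributeNames=['All'],
        MaxNumberOfMessages=10,
        WaitTimeSeconds=10
    )
    for msg in messages:
        process(msg.body)  # your own processing
        msg.delete()       # delete only after successful processing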
Adding on to stdunbar's answer:
You will find that, as stated by the docs, MaxNumberOfMessages might return fewer messages than the provided integer, which was the case for me.
MaxNumberOfMessages (integer) -- The maximum number of messages to return. Amazon SQS never returns more messages than this value (however, fewer messages might be returned). Valid values: 1 to 10. Default: 1.
As a result, I made this solution to read from an SQS dead-letter queue:
import os
import boto3

def read_dead_letter_queue():
    """ This function is responsible for reading query execution IDs related to the insertion that happens on the Athena query engine
    and that we weren't able to deal with in the source queue.
    Args:
        None
    Returns:
        Dictionary: consists of execution_ids_list, mssg_receipt_handle_list and queue_url related to messages in a dead-letter queue that's related to the insertion operation into the Athena query engine.
    """
    try:
        sqs_client = boto3.client('sqs')
        queue_url = os.environ['DEAD_LETTER_QUEUE_URL']
        execution_ids_list = list()
        mssg_receipt_handle_list = list()
        final_dict = {}

        # You can change the range stop to any number larger than the number of messages
        # that may be in the queue (e.g. 1,000 or 1,000,000); the loop breaks out as soon
        # as there aren't any messages left in the queue, before reaching the end of the range.
        for mssg_counter in range(1, 20, 1):
            sqs_response = sqs_client.receive_message(
                QueueUrl=queue_url,
                MaxNumberOfMessages=10,
                WaitTimeSeconds=10
            )
            print(f"This is the dead-letter-queue response --> {sqs_response}")
            try:
                for mssg in sqs_response['Messages']:
                    print(f"This is the message body --> {mssg['Body']}")
                    print(f"This is the message ID --> {mssg['MessageId']}")
                    execution_ids_list.append(mssg['Body'])
                    mssg_receipt_handle_list.append(mssg['ReceiptHandle'])
            except:
                print("Breaking out of the loop, as there isn't any message left in the queue.")
                break

        print(f"This is the execution_ids_list contents --> {execution_ids_list}")
        print(f"This is the mssg_receipt_handle_list contents --> {mssg_receipt_handle_list}")

        # We return the ReceiptHandle to be able to delete the message after we read it in another function that's responsible for deletion.
        # We return a dictionary of the form --> {execution_ids_list: ['query_exec_id'], mssg_receipt_handle_list: ['ReceiptHandle']}
        final_dict['execution_ids_list'] = execution_ids_list
        final_dict['mssg_receipt_handle_list'] = mssg_receipt_handle_list
        final_dict['queue_url'] = queue_url
        return final_dict
        # TODO: We need to delete the message after we finish reading, in another function that will delete messages for both the DLQ and the source queue.
    except Exception as ex:
        print(f"read_dead_letter_queue Function Exception: {ex}")
Related
I have a computationally heavy process that takes several minutes to complete on the server. So I want to send the results of every iteration to the client via websockets.
The overall application works but my problem is that all the messages are arriving at the client in one big chunk after the entire simulation finishes. I must be missing something here as I expect the await websocket.send_json() to send the message during the process and not all of them at the end.
Server python (FastAPI)
# A very simplified abstraction of the actual app.
def simulate_intervals(data):
    for t in range(data.n_intervals):
        state = interval(data)  # returns a JAX NumPy array
        yield state

def simulate(data):
    for key in range(data.n_trials):
        trial = simulate_intervals(data)
        yield trial

@app.websocket("/ws")
async def socket(websocket: WebSocket):
    await websocket.accept()
    while True:
        # Get model inputs from client
        data = await websocket.receive_text()

        # Minimal computation
        nodes = distributions(data)
        nodosJson = json.dumps(nodes, cls=NumpyEncoder)
        # I expect this message to be sent early on,
        # but the client gets it at the end with all the other messages.
        await websocket.send_json({"tipo": "nodos", "datos": json.loads(nodosJson)})

        # Heavy computation
        trials = simulate(data)
        for trialI, trial in enumerate(trials):
            for stateI, state in enumerate(trial):
                stateString = json.dumps(state, cls=NumpyEncoder)
                await websocket.send_json(
                    {
                        "tipo": "estado",
                        "datos": json.loads(stateString),
                        "trialI": trialI,
                        "stateI": stateI,
                    }
                )

        await websocket.send_json({"tipo": "estado", "msg": "fin"})
For completeness, here is the basic client code.
Client
const ws = new WebSocket('ws://localhost:8000/ws');

ws.onopen = () => {
  console.log('Conexión exitosa');
};

ws.onmessage = (e) => {
  const mensaje = JSON.parse(e.data);
  console.log(mensaje);
};

botonEnviarDatos.onclick = () => {
  ws.send(JSON.stringify({...}));
};
I was not able to make it work as posted in my question; I'm still interested in hearing from anyone who understands why it is not possible to send multiple async messages without them getting blocked.
For anyone interested, here is my current solution:
Ping-pong messages between client and server
I changed the logic so the server and client are constantly sending each other messages and not trying to stream the data in a single request from the client.
This actually works much better than my original attempt because I can detect when a socket gets disconnected and stop processing on the server. Basically, if the client disconnects, no new requests for data are sent from that client and the server never continues the heavy computation.
Server
# A very simplified abstraction of the actual app.
def simulate_intervals(data):
    for t in range(data.n_intervals):
        state = interval(data)  # returns a JAX NumPy array
        yield state

def simulate(data):
    for key in range(data.n_trials):
        trial = simulate_intervals(data)
        yield trial

@app.websocket("/ws")
async def socket(websocket: WebSocket):
    await websocket.accept()
    while True:
        # Get messages from client
        data = await websocket.receive_text()
        # "tipo" is basically the type of data being sent from client or server to the other one.
        # In this case, "tipo": "inicio" is the client sending inputs and requesting certain data in response.
        if data["tipo"] == "inicio":
            # Minimal computation
            nodes = distributions(data)
            nodosJson = json.dumps(nodes, cls=NumpyEncoder)
            # In this first interaction, the client gets the first message without delay.
            await websocket.send_json({"tipo": "nodos", "datos": json.loads(nodosJson)})
            # Since this is a generator (the def yields) it does not actually
            # trigger the computationally heavy process.
            trials = simulate(data)
            # Define some initial variables to count the iterations.
            trialI = 0
            stateI = 0
            trialsLen = args.number_trials
            statesLen = 600
            # Load the first trial (also a generator).
            # Without the for loop used before, the counters and next()
            # allow us to do the same as was being done in the for loop.
            trial = next(trials)
            # With the use of generators and next() it is possible to keep
            # this first message light on the server and send the first response
            # as quickly as possible.
        # This type of message asks for the next instance of the simulation
        # without processing the entire model.
        elif data["tipo"] == "sim":
            # Check if we are within the limits (before, this was a nested for loop).
            if trialI < trialsLen and stateI < statesLen:
                # Trigger the next instance of the simulation
                state = next(trial)
                # update counter
                stateI = stateI + 1
                # Send the message with one instance of the simulation.
                stateString = json.dumps(state, cls=NumpyEncoder)
                await websocket.send_json(
                    {
                        "tipo": "estado",
                        "datos": json.loads(stateString),
                        "trialI": trialI,
                        "stateI": stateI,
                    }
                )
                # Check if the second loop is done
                if stateI == statesLen:
                    # update counter of first loop
                    trialI = trialI + 1
                    # update counter of second loop
                    stateI = 0
                    # Check if there are more pending trials,
                    # otherwise stop and notify the client we are done.
                    try:
                        trial = next(trials)
                    except StopIteration:
                        await websocket.send_json({"tipo": "fin"})
Client
Just the part that actually changed:
ws.onmessage = (e) => {
  const mensaje = JSON.parse(e.data);
  // Simply check the type of incoming message so it can be processed
  if (mensaje.tipo === 'fin') {
    viz.calcularResultados();
  } else if (mensaje.tipo === 'nodos') {
    viz.pintarNodos(mensaje.datos);
  } else if (mensaje.tipo === 'estado') {
    viz.sumarEstado(mensaje.datos);
  }
  // After receiving a message, ping the server for the next one
  ws.send(
    JSON.stringify({
      tipo: 'sim',
    })
  );
};
This seems like a reasonable solution to keep the server and client working together. I am able to show the progress of a long simulation in the client, and the user experience is much better than having to wait a long time for the server to respond. Hope it helps others with a similar problem.
I ran into a similar issue and was able to resolve it by adding a small await asyncio.sleep(0.1) after sending JSON messages. I have not dived into asyncio's internals yet, but my guess is that websocket.send schedules a message to be sent, but since the async function continues to run it never gets a chance to actually send it in the background. Sleeping the async function lets asyncio pick up other tasks while it is waiting.
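Applied to the loop from the question above, that workaround would look roughly like this (a sketch only; the surrounding names all come from the question):
import asyncio

for trialI, trial in enumerate(trials):
    for stateI, state in enumerate(trial):
        stateString = json.dumps(state, cls=NumpyEncoder)
        await websocket.send_json({"tipo": "estado", "datos": json.loads(stateString),
                                   "trialI": trialI, "stateI": stateI})
        # Yield control back to the event loop so the frame can actually be flushed.
        await asyncio.sleep(0.1)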
Recently I have been working to integrate Google Directory, Calendar and Classroom to work seamlessly with the existing services that we have.
I need to loop through 1500 objects and make requests to Google to check something. Responses from Google take a while, hence I want to wait on that request to complete but at the same time run other checks.
def __get_students_of_course(self, course_id, index_in_course_list, page=None):
    print("getting students from gclass ", course_id, "page ", page)
    # self.__check_request_count(10)
    try:
        response = self.class_service.courses().students().list(courseId=course_id,
                                                                 pageToken=page).execute()
        # the response must come back before proceeding to the next checks
        course_to_add_to = self.course_list_gsuite[index_in_course_list]
        current_students = course_to_add_to["students"]
        for student in response["students"]:
            current_students.append(student["profile"]["emailAddress"])
        self.course_list_gsuite[index_in_course_list] = course_to_add_to
        try:
            if "nextPageToken" in response:
                self.__get_students_of_course(
                    course_id, index_in_course_list, page=response["nextPageToken"])
            else:
                return
        except Exception as e:
            print(e)
            return
    except Exception as e:
        print(e)
And I run that function from another function
def __check_course_state(self, course):
    course_to_create = {...}
    try:
        g_course = next(
            (g_course for g_course in self.course_list_gsuite if g_course["name"] == course_to_create["name"]), None)
        if g_course != None:
            index_2 = None
            for index_1, class_name in enumerate(self.course_list_gsuite):
                if class_name["name"] == course_to_create["name"]:
                    index_2 = index_1
            self.__get_students_of_course(
                g_course["id"], index_2)  # need to wait here
            students_enrolled_in_g_class = self.course_list_gsuite[index_2]["students"]
            request = requests.post()  # need to wait here
            students_in_iras = request.json()
            students_to_add_in_g_class = []
            for student in students["data"]:
                try:
                    pass
                except Exception as e:
                    print(e)
                    students_to_add_in_g_class.append(
                        student["studentId"])
            if len(students_to_add_in_g_class) != 0:
                pass
            else:
                pass
        else:
            pass
    except Exception as e:
        print(e)
I need to do these tasks for 1500 objects. Since they are not related to each other, I want to move to the next object in the loop while waiting for the other results to come back and finish.
Here is how I tried this with threads:
def create_courses(self):
    # pool = []
    counter = 0
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(
            self.__check_course_state, self.courses[0:5])
The problem is that when I run it like this I get multiple SSL errors and other errors. As far as I understand, while the threads are running, the requests never wait to finish before moving to the next line, hence I have nothing in the request object and it throws errors.
Any ideas on how to approach this?
The SSL error occurs here because I was reusing the http instance from the Google API client library: self.class_service is being used to send a request while waiting on another request. The best way to handle this is to create an instance of the service for every request.
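A hedged sketch of that fix, assuming the google-api-python-client library and already-obtained credentials; get_students_page and creds are illustrative names that are not in the original code:
from googleapiclient.discovery import build

def get_students_page(course_id, creds, page=None):
    # Build a fresh service (and therefore a fresh http instance) per request/thread
    # instead of sharing self.class_service between threads.
    service = build('classroom', 'v1', credentials=creds)
    return service.courses().students().list(
        courseId=course_id, pageToken=page).execute()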
Not able to write messages into a Kafka topic (producer) when calling the Kafka produce call in a loop.
I'm very new to Python and Kafka. I'm trying to write a Python program that writes messages into a Kafka topic (produces them) so a Kafka consumer can subscribe to that topic and read the messages.
I'm not sure what is missing in my program that prevents it from writing the message to the topic.
Point to note: I'm reading a JSON file and using a for loop to read the key/value pairs, assigning them to a variable, and referring to that variable in the Kafka produce call as the message argument.
Attached is the Kafka producer program.
Input: Json_smpl.json
File Content:
{
    "transaction": {
        "Accnttype": "Saving",
        "Branch": "West",
        "id": "WS"
    }
}
Program:
from confluent_kafka import Producer
import json

def acked(err, msg):
    if err is not None:
        print("Failed to deliver message: {0}: {1}"
              .format(msg.value(), err.str()))
    else:
        print("Message produced: {0}".format(msg.value()))

p = Producer({'bootstrap.servers': 'localhost:9092'})

try:
    with open('json_smpl.json') as read_j:
        data = json.load(read_j)
        get_data = data.get("transactions")
        print(get_data)
        for i in get_data:
            a = list(get_data.items()[0])
            p.produce(topic='mytopic12', 'myvalue #{0}'.format(a), callback=acked)
except KeyboardInterrupt:
    pass

p.flush(1)
Expected result: a message (JSON key & value) is written to the Kafka topic on every iteration of the loop.
Actual result: no messages in the topic, so the consumer is not receiving any messages.
Your file has no transactions key (only transaction), so get_data is None and there is no loop to go over; your JSON isn't being parsed the way you expect, and you are not catching the error that results.
Start with this
p = Producer({'bootstrap.servers': 'localhost:9092'})

try:
    with open('json_smpl.json') as read_j:
        data = json.load(read_j).get("transaction")
        tosend = json.dumps(data)
        print("Ready to send : {}".format(tosend))
        p.produce('mytopic12', tosend, callback=acked)
except:
    print("There was some error")
I have a RabbitMQ consumer that reads from a particular queue and does some operation.
Now I want to perform that operation in a batch. I am unable to figure out a way for the consumer to just save the whole context, i.e. (ch, method, props, body), in a list, and then, once the size is greater than 'n' (the required minimum batch size), perform the batch operation and do something like:
if len(no_of_messages) > 10:
    responses = batch_operation(messages)
    for response in responses:
        ch, method, props, body = response
        ch.basic_publish(exchange='',
                         routing_key=props.reply_to,
                         properties=pika.BasicProperties(
                             correlation_id=props.correlation_id),
                         body=str(body))
Any idea on how to go about it? I am starting the consumer like this:
channel.basic_qos(prefetch_count=1)
channel.basic_consume(some_function_that_does_above_stuffs, queue='rpc_queue')
print(" [x] Awaiting RPC requests")
channel.start_consuming()
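A hedged sketch of the batching idea described above: BATCH_SIZE is an assumed constant, batch_operation() comes from the question, and the newer pika on_message_callback keyword is used instead of the older positional form shown above. Note that prefetch_count needs to be at least the batch size so enough unacknowledged messages can be buffered:
import pika

BATCH_SIZE = 10
pending = []  # saved contexts: (ch, method, props, body)

def on_request(ch, method, props, body):
    pending.append((ch, method, props, body))
    if len(pending) >= BATCH_SIZE:
        responses = batch_operation([b for (_, _, _, b) in pending])
        for (ch_i, method_i, props_i, _), result in zip(pending, responses):
            ch_i.basic_publish(exchange='',
                               routing_key=props_i.reply_to,
                               properties=pika.BasicProperties(
                                   correlation_id=props_i.correlation_id),
                               body=str(result))
            # Acknowledge only after the batch has been processed and answered.
            ch_i.basic_ack(delivery_tag=method_i.delivery_tag)
        del pending[:]

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.basic_qos(prefetch_count=BATCH_SIZE)  # allow a full batch of unacked messages
channel.basic_consume(queue='rpc_queue', on_message_callback=on_request)
print(" [x] Awaiting RPC requests")
channel.start_consuming()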
I'm trying to take a list of items and check for their status change based on certain processing by the API. The list will be manually populated and can vary in size up to several thousand items.
I'm trying to write a script that makes multiple simultaneous connections to the API to keep checking for the status change. For each item, once the status changes, the attempts to check must stop. Based on reading other posts on Stackoverflow (Specifically, What is the fastest way to send 100,000 HTTP requests in Python? ), I've come up with the following code. But the script always stops after processing the list once. What am I doing wrong?
One additional issue I'm facing is that the KeyboardInterrupt handler never fires (I'm trying with Ctrl+C but it does not kill the script).
from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

requestURLBase = "https://example.com/api"
apiKey = "123456"

concurrent = 200
keepTrying = 1

def doWork():
    while keepTrying == 1:
        url = q.get()
        status, body, url = checkStatus(url)
        checkResult(status, body, url)
        q.task_done()

def checkStatus(ourl):
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(requestURLBase)
        conn.request("GET", url.path)
        res = conn.getresponse()
        respBody = res.read()
        conn.close()
        return res.status, respBody, ourl  # Status can be 210 for error or 300 for successful API response
    except:
        print "ErrorBlock"
        print res.read()
        conn.close()
        return "error", "error", ourl

def checkResult(status, body, url):
    if "unavailable" not in body:
        print status, body, url
        keepTrying = 1
    else:
        keepTrying = 0

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()

try:
    for value in open('valuelist.txt'):
        fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
        print fullUrl
        q.put(fullUrl)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
I'm new to Python so there could be syntax errors as well... I'm definitely not familiar with multi-threading so perhaps I'm doing something else wrong as well.
In the code, the list is only read once. Should be something like
try:
    while True:
        for value in open('valuelist.txt'):
            fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
            print fullUrl
            q.put(fullUrl)
        q.join()
For the interrupt thing, remove the bare except line in checkStatus or make it except Exception. A bare except catches all exceptions, including SystemExit, which is what sys.exit raises, and so it stops the Python process from terminating.
If I may make a couple of general comments though:
Threading is not a good implementation for such large concurrencies
Creating a new connection every time is not efficient
What I would suggest is
Use gevent for asynchronous network I/O
Pre-allocate a queue of connections, the same size as the concurrency number, and have checkStatus grab a connection object when it needs to make a call. That way the connections stay alive and get reused, there is no overhead in creating and destroying them, and none of the increased memory use that goes with it. A rough sketch of this follows below.
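Here is a rough sketch of both suggestions together, assuming gevent is installed and keeping the Python 2 flavour of the question; the host, apiKey and valuelist.txt names are taken from the question and are not a drop-in replacement for the script above:
import gevent
from gevent import monkey
monkey.patch_all()  # make httplib/socket cooperative before importing httplib

import httplib
from gevent.queue import Queue

concurrent = 200
host = "example.com"  # HTTPConnection expects a host, not a full URL
apiKey = "123456"

# Pre-allocate one connection per greenlet so connections stay alive and get reused.
conn_pool = Queue()
for _ in range(concurrent):
    conn_pool.put(httplib.HTTPSConnection(host))

def check(path):
    conn = conn_pool.get()  # borrow a live connection from the pool
    try:
        conn.request("GET", path)
        res = conn.getresponse()
        body = res.read()  # read fully before the connection can be reused
        return res.status, body
    finally:
        conn_pool.put(conn)  # return it to the pool

paths = ["/api?key=" + apiKey + "&value=" + v.strip() + "&years="
         for v in open('valuelist.txt')]
jobs = [gevent.spawn(check, p) for p in paths]
gevent.joinall(jobs)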