Successful use of pidbox.Mailbox? - python

Has anyone used pidbox.Mailbox?
I am attempting to do something similar to that example, but the documentation is way out of date. I have managed to get something that publishes messages to a django transport, but from there the messages are never successfully received.
I was hoping that someone knows how to use this, and could show me an example of how to successfully call/cast.
Here is what I have (the dummy node really does nothing, it just prints or lists):
# node/server
from kombu import BrokerConnection, pidbox

mailbox = pidbox.Mailbox("test", type="direct")
connection = BrokerConnection(transport="django")
bound = mailbox(connection)
state = {"node": DummyNode(),
         "connection": connection}
node = bound.Node(state=state)

@node.handler
def list(state, **kwargs):
    print 'list called'
    return state["node"].list()

@node.handler
def connection_info(state, **kwargs):
    return {"connection": state["connection"].info()}

@node.handler
def print_msg(state, **kwargs):
    print 'Node handler!'
    state["node"].print_msg(kwargs)

consumer = node.listen(channel=connection.channel())
try:
    while not self.killed:  # `killed` is a flag on the surrounding worker object
        print 'Consumer Waiting'
        connection.drain_events()
finally:
    consumer.cancel()
And a simple client.
# client
mailbox = pidbox.Mailbox("test", type="direct")
connection = BrokerConnection(transport="django")
bound = mailbox(connection)
bound.cast(["localhost"], "print_msg", {'msg': 'Message for you'})
info = bound.call(["test_application"], "list", callback=callback)

The answer to this is apparently no. If you come across this post, I highly recommend writing your own. There is too little documentation for pidbox, and the documentation that does exist is out of date.
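For anyone who lands here and does roll their own, a minimal sketch of a hand-built command channel using plain kombu primitives instead of pidbox (the exchange, queue and payload names below are made up for illustration, and it assumes a reasonably recent kombu):

# Hypothetical hand-rolled command channel using kombu directly (not pidbox).
from kombu import Connection, Exchange, Queue, Producer, Consumer

exchange = Exchange("commands", type="direct")
queue = Queue("commands.localhost", exchange, routing_key="localhost")

def handle(body, message):
    # body is the command payload published by the client
    print("got command: %r" % (body,))
    message.ack()

with Connection("memory://") as conn:  # swap in your real broker URL
    channel = conn.channel()
    Producer(channel).publish(
        {"method": "print_msg", "args": {"msg": "Message for you"}},
        exchange=exchange, routing_key="localhost", declare=[queue])
    with Consumer(channel, queues=[queue], callbacks=[handle]):
        conn.drain_events(timeout=1)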

Related

"TypeError: Cannot pickle 'SSL Object' " When using concurrent.futuresProcessPoolExecutor() with IMAP

I am using Python 3.9 with imaplib in order to retrieve emails and scrape links from them. It works fine, but it can become quite slow for large amounts of emails (I'm doing ~40,000). In order to speed it up, I'd like to use some concurrency so I can get all the emails at once.
To do this I get the IDs of all the emails beforehand, then assign each ID to a task in my pool. I close the previously used imaplib connection before scrape_link_mp() is called. I have tried to use a lock and a manager lock but I still get the same error.
Am I missing something fundamental here? Let me know if anything else needs to be explained, thanks.
My code looks like this:
def scrape_link_mp(self):
    self.file_counter = 0
    self.login_session.logout()
    self.Manager = multiprocessing.Manager()
    self.lock = self.Manager.Lock()
    futures = []
    with concurrent.futures.ProcessPoolExecutor() as Executor:
        for self.num_message in self.arr_of_emails[self.start_index:]:
            task_params = self.current_user, self.current_password, self.counter, self.imap_url, self.num_message, self.substring_filter, self.link_regex, self.lock
            futures.append(
                Executor.submit(
                    self.scrape_link_from_email_single,
                    *task_params
                )
            )
        for future in concurrent.futures.as_completed(futures):
            self.counter += 1
            self.timestamp = time.strftime('%H:%M:%S')
            print(f'[{self.timestamp}] DONE: {self.counter}/{len(self.num_mails)}')
            print(future.result())

def scrape_link_from_email_single(self, current_user, current_password, counter, imap_url, num_message, substring_filter, link_regex, lock):
    login_session_mp.logout()
    current_user_mp = self.current_user
    current_password_mp = self.current_password
    self.lock.acquire()
    login_session_mp = imaplib.IMAP4_SSL(self.imap_url, 993)
    login_session_mp.login(current_user_mp, current_password_mp)
    self.search_mail_status, self.amount_matching_criteria = login_session_mp.search(Mail.CHARSET, search_criteria)
    _, individual_response_data = login_session_mp.fetch(self.num_message, '(RFC822)')
    self.lock.release()
    raw = email.message_from_bytes(individual_response_data[0][1])
    scraped_email_value = str(email.message_from_bytes(Mail.scrape_email(raw)))
    print(scraped_email_value)
    returned_links = str(link_regex.findall(scraped_email_value))
    for i in returned_links:
        if substring_filter:
            self.lock.acquire()
            with open('out.txt', 'a+') as link_file:
                link_file.write(i + '\n')
            self.lock.release()
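One likely source of the error is that Executor.submit(self.scrape_link_from_email_single, ...) has to pickle the bound method, and therefore self, which still holds the live IMAP4_SSL session, the Manager and other unpicklable state. A minimal sketch of the usual workaround, passing only plain strings to a module-level worker and opening the IMAP connection inside the child process (the host, credentials and message IDs below are placeholders, not values from the question):

import concurrent.futures
import email
import imaplib

def fetch_one(imap_url, user, password, msg_id):
    # Runs in the child process: only picklable strings cross the process
    # boundary, and the SSL-backed IMAP connection is created here, never pickled.
    session = imaplib.IMAP4_SSL(imap_url, 993)
    try:
        session.login(user, password)
        session.select("INBOX")
        _, data = session.fetch(msg_id, "(RFC822)")
        return email.message_from_bytes(data[0][1]).get("Subject", "")
    finally:
        session.logout()

if __name__ == "__main__":
    creds = ("imap.example.com", "user@example.com", "secret")  # placeholders
    message_ids = ["1", "2", "3"]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(fetch_one, *creds, mid) for mid in message_ids]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())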

Python/Flask/SQLAlchemy/Marshmallow - How to process a request with duplicate values without failing the request?

This is only my second task (a bug I need to fix) in a Python/Flask/SQLAlchemy/Marshmallow system I need to work on, so please go easy on me :)
In short: I'd like to approve an apparently invalid request.
In details:
I need to handle a case in which a user sends a request whose JSON accidentally includes a duplicate value in a list.
For example:
{
    "ciphers": [
        "TLS_AES_256_GCM_SHA384",
        "AES256-SHA256"
    ],
    "is_default": true,
    "tls_versions": [
        "tls10",
        "tls10",
        "tls11"
    ]
}
What I need to do is eliminate one of the duplicated tls10 values, still treat the request as valid, update the db with the correct, distinct TLS versions, and return the deduplicated JSON in the response body.
Current code segments are as follows:
tls Controller:
...
@client_side_tls_bp.route('/<string:tls_profile_id>', methods=['PUT'])
def update_tls_profile_by_id(tls_profile_id):
    return update_entity_by_id(TlsProfileOperator, entity_name, tls_profile_id)
...
general entity controller:
...
def update_entity_by_id(operator, entity_name, entity_id):
    """flask route for updating a resource"""
    try:
        entity_body = request.get_json()
    except Exception:
        return make_custom_response("Bad Request", HTTPStatus.BAD_REQUEST)
    entity_obj = operator.get(g.tenant, entity_id, g.correlation)
    if not entity_obj:
        response = make_custom_response(http_not_found_message(entity_name, entity_id), HTTPStatus.NOT_FOUND)
    else:
        updated = operator.update(g.tenant, entity_id, entity_body, g.correlation)
        if updated == "accepted":
            response = make_custom_response("Accepted", HTTPStatus.ACCEPTED)
        else:
            response = make_custom_response(updated, HTTPStatus.OK)
    return response
...
tls operator:
...
@staticmethod
def get(tenant, name, correlation_id=None):
    try:
        tls_profile = TlsProfile.get_by_name(tenant, name)
        return schema.dump(tls_profile)
    except NoResultFound:
        return None
    except Exception:
        apm_logger.error(f"Failed to get {name} TLS profile", tenant=tenant,
                         consumer=LogConsumer.customer, correlation=correlation_id)
        raise

@staticmethod
def update(tenant, name, json_data, correlation_id=None):
    schema.load(json_data)
    try:
        dependant_vs_names = VirtualServiceOperator.get_dependant_vs_names_locked_by_client_side_tls(tenant, name)
        # locks virtual services and tls profile table simultaneously
        to_update = TlsProfile.get_by_name(tenant, name)
        to_update.update(json_data, commit=False)
        db.session.flush()  # TODO - need to change when 2 phase commit will be implemented
        snapshots = VirtualServiceOperator.get_snapshots_dict(tenant, dependant_vs_names)
        # update QWE
        # TODO handle QWE update atomically!
        for snapshot in snapshots:
            QWEController.update_abc_services(tenant, correlation_id, snapshot)
        db.session.commit()
        apm_logger.info(f"Update successfully {len(dependant_vs_names)} virtual services", tenant=tenant,
                        correlation=correlation_id)
        return schema.dump(to_update)
    except Exception:
        db.session.rollback()
        apm_logger.error(f"Failed to update {name} TLS profile", tenant=tenant,
                         consumer=LogConsumer.customer, correlation=correlation_id)
        raise
...
and in the api schema class:
...
@validates('_tls_versions')
def validate_client_side_tls_versions(self, value):
    noDuplicatatesList = list(set(value))  # dedup of the incoming `value`, as described in the question
    if len(noDuplicatatesList) < 1:
        raise ValidationError("At least a single TLS version must be provided")
    for tls_version in noDuplicatatesList:
        if tls_version not in TlsProfile.allowed_tls_version_values:
            raise ValidationError("Not a valid TLS version")
...
I would prefer to solve the problem at the schema level, so it won't accept the duplication.
So, as easy as it is to remove the duplicates from the "value" parameter, how can I propagate the deduplicated list back so it is used both for the db update and for the response?
Thanks.
I didn't test it, but I think mutating value in the validation function would work.
However, this is not really guaranteed by marshmallow's API.
The proper way to do it would be to add a post_load method to de-duplicate:
@post_load
def deduplicate_tls(self, data, **kwargs):
    if "tls_versions" in data:
        data["tls_versions"] = list(set(data["tls_versions"]))
    return data
This won't maintain the order, so if the order matters, or for issues related to deduplication itself, see https://stackoverflow.com/a/7961390/4653485.
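If order does matter, one option (a sketch, not part of the original answer) is to deduplicate with dict.fromkeys, which preserves insertion order on Python 3.7+:

from marshmallow import post_load

@post_load
def deduplicate_tls(self, data, **kwargs):
    if "tls_versions" in data:
        # dict.fromkeys keeps the first occurrence of each version, in order
        data["tls_versions"] = list(dict.fromkeys(data["tls_versions"]))
    return data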

Python Hanging Threads

I have the following code:
final = []
with futures.ThreadPoolExecutor(max_workers=self.number_threads) as executor:
    _futures = [executor.submit(self.get_attribute, listing,
                                self.proxies[listings.index(listing) % len(self.proxies)])
                for listing in listings]
    for result in futures.as_completed(_futures):
        try:
            listing = result.result()
            final.append(listing)
        except Exception as e:
            print traceback.format_exc()
return final
The self.get_attribute function that's submitted to the executor takes a dictionary and proxy as input and makes either one or two http requests to get some data and return with an edited dictionary. The problem is that the workers/threads hang towards the end of completing all the submitted tasks. If I submit 400 dictionaries, it will complete ~380 tasks, and then hang. If I submit 600, it will complete ~570-580. However if I submit 25, it will complete all of them. I'm not sure what the threshold is at which it will go from finishing to not finishing.
I have also tried using a queue and threading system like this:
def _get_attribute_thread(self):
    while self.q.not_empty:
        job = self.q.get()
        listing = job['listing']
        proxy = job['proxy']
        self.threaded_results.put(self.get_attribute(listing, proxy))
        self.q.task_done()

def _get_attributes_threaded_with_proxies(self, listings):
    for listing in listings:
        self.q.put({'listing': listing, 'proxy': self.proxies[listings.index(listing) % len(self.proxies)]})
    for _ in xrange(self.number_threads):
        thread = threading.Thread(target=self._get_attribute_thread)
        thread.daemon = True
        thread.start()
    self.q.join()
    final = []
    while self.threaded_results.not_empty:
        final.append(self.threaded_results.get())
    return final
However the result is the same. What can I do to fix/debug the problem? Thanks in advance.
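For the debugging part, one low-tech approach (a sketch, not from the original post) is to dump every thread's stack once the pool appears stuck; hung workers usually show up blocked in a socket read, and a common fix is giving the HTTP calls inside self.get_attribute an explicit timeout:

import sys
import traceback

def dump_all_stacks():
    # Print the current stack of every live thread; call this from the main
    # thread once the pool looks hung to see exactly which call is blocking.
    for thread_id, frame in sys._current_frames().items():
        print('Thread %s:' % thread_id)
        traceback.print_stack(frame)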

Pickling error: cannot pickle Request object

I know that it is not possible to pickle a Pyramid request object, but I can't seem to find where I am sending the Request object.
Consider the following:
@task
def do_consignment_task(store, agent):
    print "GOTHERE IN TASK"
    s = sqlahelper.get_session()
    consign = store.gen_consignment()
    ca = Agents.by_id(store.consignment_agents_id)
    consign.consignment_agents_id = ca.id
    consign.consignment_teamleader_id = ca.ou[0].lead_agents_id
    consign.consignment_timestamp = func.now()
    consign.created_by_agent_id = agent.id
    consign.complete_stamp = func.now()
    consign.sims = store.sims
    consign.status = "SUCCESS"
    print "GOT BEFORE LOOP "
    for sim in store.sims:
        if sim in consign.sims:
            continue
        else:
            consign.sims.append(sim)
    s.add(consign)
    transaction.savepoint()
    print "GOT AFTER SAVEPOINT"
    for sim in consign.sims:
        is_reconsign = sim.consignment_agent or sim.consignment_teamlead
        if is_reconsign:
            if not sim.consignment_history:
                sim.consignment_history = []
            sim.consignment_history.append(dict(
                stamp=sim.consignment_timestamp,
                consignment_agent_id=sim.consignment_agents_id,
                consignment_teamleader_id=sim.consignment_teamleader_id,
                by_agent_id=agent.id
            ))
        s.query(
            Sims
        ).filter(
            Sims.iccid == sim.iccid
        ).update(
            {
                "consignment_agents_id": consign.consignment_agents_id,
                "consignment_history": sim.consignment_history,
                "consignment_teamleader_id": ca.ou[0].lead_agents_id,
                "consignment_timestamp": func.now(),
                "modify_stamp": func.now(),
                "consignments_id": consign.id
            },
            synchronize_session=False
        )
    print "GOT BEFORE COMMIT"
    transaction.savepoint()
    print "THIS IS THE ID ID ID ID ID ID : ", consign.id
I call this function like:
if self.store.finalise:
    try:
        store = self.store
        agent = self.agent
        do_consignment_task.delay(store, agent)
        transaction.commit()
        self.check_and_purge()
        return "Consignmnet is being processed"
    except Exception, exc:
        self.check_and_purge()
        self.log.exception(exc)
        exc_error = "CONSIGNERR:", exc.message
        raise USSDFailure(exc_error)
else:
    self.store.status = "CANCELLED"
    if "fullconfirm" in self.session:
        del self.session["fullconfirm"]
    self.check_and_purge()
    return "CONSIGNMENT Cancelled"
When I run this code I get the following error:
EncodeError: Can't pickle <class 'pyramid.util.Request'>: attribute lookup pyramid.util.Request failed
I am not sending self or request objects - at least not that I can see.
How can I solve this problem? Am I sending a request object somewhere that I can not see?
The traceback can be seen here
EDIT:
Okay, so I have tried to change the data I send to the function: I am no longer passing a SQLAlchemy object, and I am making a copy of the store object. That changes my code to:
@task
def do_consignment_task(agent_id, **store):
    print "GOTHERE IN TASK"
    s = sqlahelper.get_session()
    cObj = USSDConsignmentsObject()
    consign = cObj.gen_consignment()
    ca = Agents.by_id(store["consignment_agents_id"])  # store is now a plain dict
    consign.consignment_agents_id = ca.id
    consign.consignment_teamleader_id = ca.ou[0].lead_agents_id
    consign.consignment_timestamp = func.now()
    consign.created_by_agent_id = agent_id
    # etc
and:
if self.store.finalise:
    try:
        # del self.service
        store = self.store.__dict__.copy()
        agent_id = self.agent.id
        print store
        print agent_id
        # print help(store)
        do_consignment_task.delay(agent_id, **store)
        transaction.commit()
        # etc
This however still gives me the same error :|
Try not to serialise a Pyramid request object. When you interact with a celery task you should think of it as an independent process.
Provide it all the information it needs to do its work, and be aware that you need to serialise that information.
So self.store probably contains attribute references that cannot realistically be serialised.
Perhaps create a method on the store object that returns a clean dictionary object.
def serialize(self):
    data = {}
    data["element1"] = self.element1
    data["element2"] = self.element2
    data["element3"] = self.element3
    return data
Then, when you call the delay method, make sure to pass store.serialize() instead of store or the raw dict.
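For illustration, a rough sketch of how the call site and the task signature could then look (the field names just mirror the serialize() example above and are placeholders):

# Call site inside the view: pass only plain, picklable data to the task.
payload = store.serialize()  # e.g. {"element1": 1, "element2": 2, "element3": 3}
do_consignment_task.delay(self.agent.id, payload)

# Task: receives the dict, never the store, the session or the request.
@task
def do_consignment_task(agent_id, store_data):
    element1 = store_data["element1"]
    # ...rebuild or query whatever ORM objects are needed here...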

Testing of async tornado RequestHandler method in a complex environment

I am trying to write unit tests for a subclass of tornado.web.RequestHandler that runs an aggregate query against the database. I have already wasted several days trying to get the tests to work.
The tests use pytest and factory_boy. A lot of the important Tornado classes have factories for the tests.
This is the class that is being tested:
class AggregateRequestHandler(StreamlyneRequestHandler):
    '''
    '''
    SUPPORTED_METHODS = (
        "GET", "POST", "OPTIONS")

    def get(self):
        self.aggregate()

    @auth.hmac_auth
    # @tornado.web.asynchronous
    @tornado.web.removeslash
    @tornado.gen.coroutine
    def aggregate(self):
        '''
        '''
        self.logger.info('api aggregate')
        data = self.data
        print("Data: {0}".format(data))
        pipeline = data['pipeline']
        self.logger.debug('pipeline : {0}'.format(pipeline))
        self.logger.debug('utc tz : {0}'.format(tz_util.utc))
        # execute pipeline query
        print(self.collection)
        try:
            cursor_future = self.collection.aggregate(pipeline, cursor={})
            print(cursor_future)
            cursor = yield cursor_future
            print("Cursor: {0}".format(cursor))
        except Exception as e:
            print(e)
        documents = yield cursor.to_list(length=None)
        self.logger.debug('results : {0}'.format(documents))
        # process MongoDB JSON extended
        results = json.loads(json_util.dumps(documents))
        pipeline = json.loads(json_util.dumps(pipeline))
        response_data = {
            'pipeline': pipeline,
            'results': results
        }
        self.respond(response_data)
The method used to test it is here:
# @tornado.testing.gen_test
def test_time_inside(self):
    current_time = gen_time()
    past_time = gen_time() - datetime.timedelta(minutes=20)
    test_query = copy.deepcopy(QUERY)
    oid = ObjectId("53a72de12fb05c0788545ed6")
    test_query[0]['$match']['attribute'] = oid
    test_query[0]['$match']['date_created']['$gte'] = past_time
    test_query[0]['$match']['date_created']['$lte'] = current_time
    request = produce.HTTPRequest(
        method="GET",
        headers=produce.HTTPHeaders(
            kwargs={
                "Content-Type": "application/json",
                "Accept": "application/json",
                "X-Sl-Organization": "test",
                "Hmac": "83275edec557e2a339e0ec624201db604645e1e1",
                "X-Sl-Username": "test@test.co",
                "X-Sl-Expires": 1602011725
            }
        ),
        uri="/api/v1/attribute-data/aggregate?{0}".format(json_util.dumps({
            "pipeline": test_query
        }))
    )
    self.ARH = produce.AggregateRequestHandler(request=request)
    # io_loop = tornado.ioloop.IOLoop.instance()
    self.io_loop.run_sync(self.ARH.get)
    # def stop_test():
    #     self.stop()
    # self.ARH.test_get(stop_test)
    # self.wait()
    output = self.ARH.get_written_output()
    assert output == ""
This is the way I set up the factory for the Request Handler:
class OutputTestAggregateRequestHandler(slapi.rest.AggregateRequestHandler, tornado.testing.AsyncTestCase):
    '''
    '''
    _written_output = []

    def write(self, chunk):
        print("Previously written: {0}".format(self._written_output))
        print("Len: {0}".format(len(self._written_output)))
        if self._finished:
            raise RuntimeError("Cannot write() after finish(). May be caused "
                               "by using async operations without the "
                               "@asynchronous decorator.")
        if isinstance(chunk, dict):
            print("Going to encode a chunk")
            chunk = escape.json_encode(chunk)
            self.set_header("Content-Type", "application/json; charset=UTF-8")
        chunk = escape.utf8(chunk)
        print("Writing")
        self._written_output = []
        self._written_output.append(chunk)
        print(chunk)

    def flush(self, include_footers=False, callback=None):
        pass

    def get_written_output(self):
        for_return = self._written_output
        self._written_output = []
        return for_return

class AggregateRequestHandler(StreamlyneRequestHandler):
    '''
    '''
    class Meta:
        model = OutputTestAggregateRequestHandler
        model = slapi.model.AttributeDatum
When running the tests, the test simply stops in def aggregate(self): somewhere between print(cursor_future) and print("Cursor: {0}".format(cursor)).
Then in the stdout you see:
MotorCollection(Collection(Database(MongoClient([]), u'test'), u'attribute_datum'))
<tornado.concurrent.Future object at 0x7fbc737993d0>
and nothing else comes out of the test with it failing on
> assert output == ""
E AssertionError: assert [] == ''
After a lot of time looking at documentation and examples and stack overflow I managed to get a functioning test by adding the following code to OutputTestAggregateRequestHandler:
def set_io_loop(self):
    self.io_loop = tornado.ioloop.IOLoop.instance()

def ioloop(f):
    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        print(args)
        self.set_io_loop()
        return f(self, *args, **kwargs)
    return wrapper

def runTest(self):
    pass
Then copying all of the code from AggregateRequestHandler.aggregate into OutputTestAggregateRequestHandler but with different decorators:
@ioloop
@tornado.testing.gen_test
def _aggregate(self):
    ......
I then received the output:
assert output == ""
E AssertionError: assert ['{\n "pipeline": [\n {\n "$match": {\n "attribute": {\n "$oid"... "$oid": "53cec0e72dc9832c4c4185f2"\n }, \n "quality": 9001\n }\n ]\n}'] == ''
which is actually a success, but I was just triggering an assertion error on purpose to see the output.
The big problem I have is how to achieve the desired outcome (the output I only got by adding the extra code and copying the aggregate method) without that copying.
Obviously, once the code is copied out of the aggregate method, the test stops being useful as soon as I change the actual method. How can I get the actual aggregate method to run properly in the tests instead of stopping, seemingly as soon as it hits asynchronous code?
Thanks for any help,
Cheers!
-Liam
In general, the intended way to test RequestHandlers is with AsyncHTTPTestCase, not AsyncTestCase. This will set up the HTTP client and server for you and everything will go through the HTTP plumbing. Using RequestHandlers outside of an Application and HTTP server is not fully supported, although in Tornado 4.0 it might be feasible to use a dummy HTTPConnection to avoid the full server stack. This might be faster, although it's kind of uncharted territory at this point.
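For reference, a minimal AsyncHTTPTestCase sketch of that approach (the application routing and the query string are placeholders, not taken from the project above):

import json

import slapi.rest  # the project module from the question
import tornado.testing
import tornado.web

class AggregateHandlerTest(tornado.testing.AsyncHTTPTestCase):
    def get_app(self):
        # Build a real Application so the handler runs behind the normal
        # HTTP plumbing instead of being instantiated by hand.
        return tornado.web.Application([
            (r"/api/v1/attribute-data/aggregate", slapi.rest.AggregateRequestHandler),
        ])

    def test_aggregate(self):
        response = self.fetch("/api/v1/attribute-data/aggregate?pipeline=[]")
        self.assertEqual(response.code, 200)
        body = json.loads(response.body)
        self.assertIn("results", body)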
