Access global queue object in Python

I am building a messaging app in Python that interfaces with Twitter. I want to create a global FIFO queue that the TwitterConnector object can access to insert new messages.
This is the main section of the app:
#!/usr/bin/python
from twitterConnector import TwitterConnector
from collections import deque  # deque lives in collections, not the Queue module
#instantiate the request queue
requestQueue = deque()
#instantiate the twitter connector
twitterInterface = TwitterConnector()
#Get all new mentions
twitterInterface.GetMentions()
Specifically, I want to be able to call the .GetMentions() method of the TwitterConnector class, process any new mentions, and put those messages into the queue to be handled separately.
This is the twitterConnector class so far:
class TwitterConnector:
    def __init__(self):
        self.connection = twitter.Api(consumer_key=self.consumer_key,
                                      consumer_secret=self.consumer_secret,
                                      access_token_key=self.access_token_key,
                                      access_token_secret=self.access_token_secret)
        self.mentions_since_id = None

    def VerifyCredentials(self):
        print self.connection.VerifyCredentials()

    def GetMentions(self):
        mentions = self.connection.GetMentions(since_id=self.mentions_since_id)
        for mention in mentions:
            print mention.text
            global requestQueue.add(mention) # <- error
Any assistance would be greatly appreciated.
Update: let me see if I can clarify the use case a bit. My TwitterConnector is intended to retrieve messages, and it will eventually transform the Twitter status object into a simplified object containing the information needed for downstream processing. In the same method it will perform some other calculations to try to determine other needed information. It is actually these transformed objects that I want to put into the queue. Hopefully that makes sense. As a simplified example, this method would map the Twitter status object to a new object with properties such as: origin network (twitter), username, message, and location if available. This new object would then be put into the queue.
Eventually there will be other connectors for other messaging systems that perform similar actions: they will receive a message, populate the new message object, and place it into the same queue for further processing. Once a message has been completely processed, a response would be formulated and added to an appropriate queue for transmittal. Using the new object described above, once the actual processing of the tweet content has taken place, a response might be returned to the originator via Twitter based upon the origin network and username.
I had thought of passing a reference to the queue in the constructor (or as an argument to the method). I had also thought of modifying the method to return a list of the new objects and iterating over them to put them into the queue, but I wanted to make sure there wasn't a better or more efficient way to handle this. I also wish to be able to do the same thing with a logging object and a database handler. Thoughts?

The problem is that "global" should appear on a line of its own:
global requestQueue
requestQueue.append(mention)
Moreover, requestQueue must be defined in the module that defines the class.
If you are importing the requestQueue symbol from another module, you don't need the global at all. (Note also that deque has no add method; use append.)
from some_module import requestQueue
class A(object):
    def foo(self, o):
        requestQueue.append(o)  # Should work

It is generally a good idea to avoid globals; a better design often exists. For details on the issues with globals see, for instance, ([1], [2], [3], [4]).
If by using globals you wish to share a single requestQueue between multiple TwitterConnector instances, you could also pass the queue to the connector in its constructor:
class TwitterConnector:
    def __init__(self, requestQueue):
        self.requestQueue = requestQueue
        ...

    def GetMentions(self):
        mentions = self.connection.GetMentions(since_id=self.mentions_since_id)
        for mention in mentions:
            print mention.text
            self.requestQueue.append(mention)
Correspondingly, you need to provide your requestQueue to the constructor as:
twitterInterface = TwitterConnector(requestQueue)
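To tie this to the update in the question: the same constructor injection can carry the simplified message object, a logger, and a database handler. This is a minimal sketch, not a drop-in implementation; SimpleMessage, db_handler, and the mention.user.screen_name attribute are assumptions (the last based on the python-twitter Status model):
import logging
from collections import deque

class SimpleMessage(object):
    """Illustrative container for the transformed status described in the update."""
    def __init__(self, origin_network, username, message, location=None):
        self.origin_network = origin_network
        self.username = username
        self.message = message
        self.location = location

class TwitterConnector(object):
    # The queue, logger, and db handler are all injected the same way,
    # so no globals are needed anywhere.
    def __init__(self, requestQueue, logger=None, db_handler=None):
        self.requestQueue = requestQueue
        self.logger = logger or logging.getLogger(__name__)
        self.db_handler = db_handler  # placeholder for whatever handler you use
        self.mentions_since_id = None
        # ... twitter.Api connection setup as in the original __init__ ...

    def GetMentions(self):
        mentions = self.connection.GetMentions(since_id=self.mentions_since_id)
        for mention in mentions:
            msg = SimpleMessage(origin_network='twitter',
                                username=mention.user.screen_name,
                                message=mention.text)
            self.requestQueue.append(msg)
            self.logger.debug("queued mention from %s", msg.username)

# Wiring it together:
requestQueue = deque()
twitterInterface = TwitterConnector(requestQueue)
twitterInterface.GetMentions()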

Related

Are there drawbacks to executing code in an SQLAlchemy managed session, and if so, why?

I have seen different "patterns" for handling this case, so I am wondering whether one has any drawbacks compared to the other.
So let's assume that we wish to create a new object of class MyClass and add it to the database. We can do the following:
class MyClass:
    pass

def builder_method_for_myclass():
    # A lot of code here..
    return MyClass()

my_object = builder_method_for_myclass()

with db.managed_session() as s:
    s.add(my_object)
which, it seems, only keeps the session open while adding the new object. But I have also seen cases where the entire builder method is called and executed within the managed session, like so:
class MyClass:
    pass

def builder_method_for_myclass():
    # A lot of code here..
    return MyClass()

with db.managed_session() as s:
    my_object = builder_method_for_myclass()
Are there any downsides to either of these approaches, and if so, what are they? I can't find anything specific about this in the documentation.
When you build objects that depend on objects fetched from a session, you have to be in a session. So a factory function can only execute outside a session in the simplest cases. Usually you have to pass the session around or make it available on a thread local.
For example, in this case, to build a product I need to fetch the product category from the database into the session, so my product factory function depends on the session instance. The new product is created and added to the same session that the category is in. An implicit commit should also occur when the session ends, i.e. when the context manager completes.
def build_product(session, category_name):
    category = session.query(ProductCategory).where(
        ProductCategory.name == category_name).first()
    return Product(category=category)

with db.managed_session() as s:
    my_product = build_product(s, "clothing")
    s.add(my_product)
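As one way of "making the session available on a thread local", as mentioned above, SQLAlchemy's scoped_session gives each thread its own session, so factory functions don't need an explicit session parameter. A minimal sketch, assuming an engine URL and the Product/ProductCategory models from the example (the managed_session context manager is replaced here by explicit commit/remove):
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# One registry per application; each thread gets its own Session instance.
engine = create_engine("sqlite:///example.db")  # placeholder URL
Session = scoped_session(sessionmaker(bind=engine))

def build_product(category_name):
    # No session argument: the thread-local session is looked up here.
    session = Session()
    category = (session.query(ProductCategory)
                .filter(ProductCategory.name == category_name)
                .first())
    return Product(category=category)

# The caller still controls the transaction boundary.
session = Session()
try:
    my_product = build_product("clothing")
    session.add(my_product)
    session.commit()
finally:
    Session.remove()  # release the thread-local session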

Celery pickling not playing nice with Cassandra driver, can't figure out the root cause

I'm experiencing some behavior that I can't quite figure out. I'm using Cassandra to store message objects, and I'm using Celery for async pulls and pushes to the database. Everything is working fine, except for a single Celery task; the other tasks that use the same code/classes work. Here's a rough breakdown of the code logic:
db_manager = DBManager()

class User(object):
    def __init__(self, user_id):
        ... normal init stuff ...
        self.loader()

    @run_async
    def loader(self):
        ... loads from database if found, otherwise pulls from API ...

    # THIS WORKS
    @celery.task(name='user-to-db', filter=task_method)
    def to_db(self):
        # db_manager is a custom backend that handles relevant db reads, writes, etc.
        db_manager.add('users', self.user_payload)

    # THIS WORKS
    @celery.task(name='load-friends', filter=task_method)
    def load_friends(self):
        # Checks secondary redis index for friends of user
        friends = redis.srandmember('users:the-users-id:friends', self.id, 20)
        if not friends:
            profiles = load_friends_from_api(user_id=self.id)
        else:
            query = "SELECT * FROM keyspace.users WHERE id IN ({friends})".format(friends=friends)
        # Init a User object for every friend
        loaded_friends = [User(friend) for friend in profiles]
        # Returns a class container with all the instances of User(friend), accessible through a class property
        return FriendContainer(self.id, loaded_friends)

    # THIS DOES NOT WORK
    @celery.task(name='get-user-messages', filter=task_method)
    def get_user_messages(self):
        # THIS IS WHERE IT FAILS #
        messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
        # THAT LINE ABOVE #
        # Init a message class object for every message payload in database
        msgs = [Message(m, user=self) for m in messages]
        # Returns a message container class holding all the message objects, accessible through a class property
        return MessageContainer(msgs)
This last class method throws an error:
  File "/usr/local/lib/python2.7/dist-packages/kombu/serialization.py", line 356, in pickle_dumps
    return dumper(obj, protocol=pickle_protocol)
EncodeError: Can't pickle <class 'cassandra.io.eventletreactor.message'>: attribute lookup cassandra.io.eventletreactor.message failed
cassandra.io.eventletreactor.message points to a user-defined type in Cassandra that I use as a container for message objects per user. The line that throws this error is:
messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
This is the method from DBManager():
class DBManager(object):
    ... stuff ...

    def get(self, query):
        # I do some stuff to prepare the query, namely substituting `WHERE this = that`
        # with `WHERE this = ?` to create a Cassandra prepared statement.
        statement = cassandra.prepare(query_prepared)
        # I want these messages as a dict, not the default namedtuple
        cassandra.row_factory = dict_factory
        # User id is parsed out of query
        results = cassandra.execute(statement, (user_id,))
        rows = results.current_rows
        # rows is a list of dicts, no weird class references or anything in there
        return rows
I've read that making Celery tasks out of class methods is/was somewhat experimental, but I can't figure out why all the other methods-as-tasks that use the same instance of DBManager are working.
The problem seems to be localized to some issue with the user-defined type message that's not playing nice within the Cassandra driver; however, if I run the get method from DBManager within the Celery task itself, it works. That is, if I copy/paste the code that is throwing the error from DBManager.get into User.get_user_messages, it works fine. If I try to call DBManager.get from within User.get_user_messages, it breaks.
I just can't figure out where the problem is. I can do all the following just fine:
- Run the get_user_messages method without Celery, and it works.
- Run the get_user_messages method WITH Celery if I run the get method code right in the Celery task method itself.
- Run other methods registered as Celery tasks that point to other methods in DBManager that use the Cassandra driver, even ones that insert the same message user-defined type into the database.
I've tried pickling ALL THE THINGS all the way down myself, and in various combinations, and can't reproduce the error.
What I have not tried:
- Change the serializer to json or yaml. There are a few convenience items in the db payload that won't serialize with either of those two.
- Use dill instead of pickle. It seems like this should work without having to switch serializers, given that I can get various parts working separately.
I could just say screw it and run the query directly through the Cassandra driver instead of my DBManager class, but I feel like this should be solvable and I'm just missing something really, really obvious, so obvious that I'm not seeing it. Any suggestions on where to look would be greatly appreciated.
In case of relevance: Cassandra 3.3, CQL 3.4, DataStax python driver 3.1
Meh, I found the problem, and it WAS really obvious. I guess I didn't actually try pickling all the things, just most of the things, and I didn't catch this in my 4am debugging stupor.
At any rate, cassandra.row_factory = dict_factory, when applied to a row containing a user-defined type, doesn't actually return everything as a dict. It gives a dict of {'label': message(x='this', y='that')}, where message is a namedtuple. The Cassandra driver dynamically creates that namedtuple inside a class instance, so pickle couldn't find it.
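A common workaround, sketched here under the assumption that the UDT value behaves like a namedtuple (as described above, so it supports _asdict()), is to flatten rows into plain dicts inside DBManager.get before they are returned to the Celery task:
def _plainify(value):
    # Recursively convert driver-generated namedtuples (UDT values) into
    # plain dicts/lists so the result contains nothing pickle can't locate.
    if hasattr(value, '_asdict'):  # namedtuple-like UDT instance
        return dict((k, _plainify(v)) for k, v in value._asdict().items())
    if isinstance(value, (list, tuple)):
        return [_plainify(v) for v in value]
    if isinstance(value, dict):
        return dict((k, _plainify(v)) for k, v in value.items())
    return value

# Inside DBManager.get, after fetching:
#     rows = [_plainify(row) for row in results.current_rows]
#     return rows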

Python Flask error logging - possible to pickle objects which threw error?

I am learning about Flask's error logging and I am wondering if it is possible to pickle the object / objects that caused the error to make it easier to debug. The Flask documentation suggests that more info on errors is better, but why can't you just pickle the objects to make for easy debugging later?
I wouldn't go as far as recommending that you log all the objects because, depending on your scenario, you could end up logging sensitive data in a file or some other medium that could be the target of some data thief. If this is just for testing then my comment is irrelevant.
To do what you want, you could create a logging.Filter-derived class and add it to the logger:
app.logger.addFilter(MyCustomFilter())
Your filter, MyCustomFilter, could look for objects in the thread-local data and pickle each one into the record received in its filter method. These objects would have been put there by you, in the thread-local data.
import threading

# A single shared thread-local object (e.g. at module level); creating a new
# threading.local() in each place would not share any data between them.
local_data = threading.local()

# somewhere in your code, when you decide an object needs to be logged in case of error
if getattr(local_data, 'objects_to_log', None) is None:
    local_data.objects_to_log = []
local_data.objects_to_log.append(my_object_to_be_logged)

# in MyCustomFilter:
def filter(self, record):
    if getattr(local_data, 'objects_to_log', None) is not None:
        for obj in local_data.objects_to_log:
            # todo: pickle each object and add to record
            pass
I realize this is not a complete and ready to use solution, but it could give you something to work with.
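For completeness, here is a rough sketch of what that todo could look like: pickling each stashed object onto the LogRecord. MyCustomFilter and local_data are the names from the snippet above; the pickled_objects attribute name is made up for this example:
import logging
import pickle

class MyCustomFilter(logging.Filter):
    def filter(self, record):
        objs = getattr(local_data, 'objects_to_log', None)
        if objs:
            pickled = []
            for obj in objs:
                try:
                    pickled.append(pickle.dumps(obj))
                except Exception:
                    # Some objects (sockets, locks, ...) cannot be pickled.
                    pickled.append(None)
            record.pickled_objects = pickled  # hypothetical attribute name
            # Clear the stash so the same objects are not logged twice.
            local_data.objects_to_log = []
        return True  # always let the record through

app.logger.addFilter(MyCustomFilter())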

Initialize a module in Python

I have a module which wraps a JSON API for querying song cover/remix data, with limits on the number of requests per hour/minute. I'd like to keep an optional cache of JSON responses without forcing users to adjust a cache/context parameter every time. What is a good way of initializing a library/module in Python? Or would you recommend I just do the explicit thing and use a cache named parameter in every call that eventually requests JSON data?
I was thinking of doing:
_cache = None

class LFU(object):
    ...

NO_CACHE, LFU = "NO_CACHE", "LFU"

def set_cache_strategy(strategy):
    if _cache == NO_CACHE:
        _cache = None
    else:
        _cache = LFU()
import second_hand_songs_wrapper as s
s.set_cache_strategy(s.LFU)
l1 = s.ShsLabel.get_from_resource_id(123)
l2 = s.ShsLabel.get_from_resource_id(123, use_cache=False)
edit:
I'm probably only planning on having two strategies: one with and one without a cache.
Other possible initialization schemes I can think of off the top of my head include using environment variables, initializing _cache by hand in the user code to None/LFU(), and using an explicit cache everywhere (possibly defaulting to having a cache).
Note: the reason I don't set the cache on an instance of the class is that I currently use a never-instantiated class (class functions + class state as a singleton) to abstract downloading the JSON data, along with some convenience methods to download certain URLs. I could instantiate the downloader class, but then I'd have to pass the instance explicitly to each function, or use another global variable for a convenience version of the class. The downloader class also keeps track of the number of requests (the website has a limit per minute/hour), so having multiple downloader objects would cause more trouble.
There's nothing wrong with setting a default, even if that default is None. I would note, though, that having the pseudo-constants as well as a conditional (provided that's all you use the values for) is redundant. Try:
caching_strategies = {'NO_CACHE': lambda: None,
                      'LFU': LFU}
_cache = caching_strategies['NO_CACHE']()

def set_cache_strategy(strategy):
    global _cache
    _cache = caching_strategies[strategy]()
If you want to provide a convenience method for the available strategies, just wrap caching_strategies.keys(). Really though, as far as your strategies go, you should probably have all of them inherit from some base strategy, and just create a no_cache strategy class that inherits from it and stubs all the methods of your standardized caching interface.
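A minimal sketch of that last suggestion, assuming a made-up get/put caching interface (your real interface will differ):
class BaseCache(object):
    """Standardized caching interface; concrete strategies override these."""
    def get(self, key):
        raise NotImplementedError

    def put(self, key, value):
        raise NotImplementedError

class NoCache(BaseCache):
    """Stub strategy: behaves like a cache that never stores anything."""
    def get(self, key):
        return None

    def put(self, key, value):
        pass

class LFU(BaseCache):
    """Least-frequently-used cache; real eviction logic omitted for brevity."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

caching_strategies = {'NO_CACHE': NoCache, 'LFU': LFU}
_cache = caching_strategies['NO_CACHE']()
With a NoCache strategy in place, the calling code never needs to check whether _cache is None.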

How to implement a FIFO queue that supports namespaces

I'm using the following approach to handle a FIFO queue based on Google App Engine db.Model (see this question).
from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp import run_wsgi_app
class QueueItem(db.Model):
    created = db.DateTimeProperty(required=True, auto_now_add=True)
    data = db.BlobProperty(required=True)

    @staticmethod
    def push(data):
        """Add a new queue item."""
        return QueueItem(data=data).put()

    @staticmethod
    def pop():
        """Pop the oldest item off the queue."""
        def _tx_pop(candidate_key):
            # Try and grab the candidate key for ourselves. This will fail if
            # another task beat us to it.
            task = QueueItem.get(candidate_key)
            if task:
                task.delete()
            return task
        # Grab some tasks and try getting them until we find one that hasn't been
        # taken by someone else ahead of us
        while True:
            candidate_keys = QueueItem.all(keys_only=True).order('created').fetch(10)
            if not candidate_keys:
                # No tasks in queue
                return None
            for candidate_key in candidate_keys:
                task = db.run_in_transaction(_tx_pop, candidate_key)
                if task:
                    return task
This queue works as expected (very good).
Right now my code has a method, invoked as a deferred task, that accesses this FIFO queue:
def deferred_worker():
    data = QueueItem.pop()
    do_something_with(data)
I would like to enhance this method and the queue data structure by adding a client_ID parameter representing a specific client that needs to access its own queue.
Something like:
def deferred_worker(client_ID):
    data = QueueItem_of_this_client_ID.pop()  # I need to implement this
    do_something_with(data)
How could I code the Queue to be client_ID aware?
Constraints:
- The number of clients is dynamic and not predefined
- Taskqueue is not an option (1. there is a maximum of ten queues; 2. I would like to have full control over my queue)
Do you know how I could add this behaviour using the new Namespaces API? (Remember that I'm not calling the db.Model from a webapp.RequestHandler.)
Another option: I could add a client_ID db.StringProperty to QueueItem and use it as a filter when popping:
QueueItem.all(keys_only=True).filter('client_ID =', an_ID).order('created').fetch(10)
Any better idea?
Assuming your "client class" is really a request handler the client calls, you could do something like this:
from google.appengine.api import users
from google.appengine.api.namespace_manager import set_namespace

class ClientClass(webapp.RequestHandler):
    def get(self):
        # For this example let's assume the user_id is your unique id.
        # You could just as easily use a parameter you are passed.
        user = users.get_current_user()
        if user:
            # If there is a user, use their queue. Otherwise the global queue.
            set_namespace(user.user_id())
        item = QueueItem.pop()
        self.response.out.write(str(item))
        QueueItem.push('The next task.')
Alternatively, you could also set the namespace app-wide.
By setting the default namespace, all calls to the datastore will be "within" that namespace unless you explicitly specify otherwise. Just note that to fetch and run tasks you'll have to know the namespace, so you probably want to maintain a list of namespaces in the default namespace for cleanup purposes.
As I said in response to your query on my original answer, you don't need to do anything to make this work with namespaces: the datastore, on which the queue is built, already supports namespaces. Just set the namespace as desired, as described in the docs.
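As a rough illustration of "maintaining a list of namespaces in the default namespace" and of a namespace-aware worker: the ClientNamespace model and register_client helper below are illustrative names, not an existing API, and the worker simply switches namespaces around the existing QueueItem.pop():
from google.appengine.api import namespace_manager
from google.appengine.ext import db

class ClientNamespace(db.Model):
    """Lives in the default namespace; records every client namespace in use."""
    name = db.StringProperty(required=True)

def register_client(client_ID):
    # Called once per client while the default namespace is active.
    ClientNamespace.get_or_insert(client_ID, name=client_ID)

def deferred_worker(client_ID):
    previous = namespace_manager.get_namespace()
    namespace_manager.set_namespace(client_ID)
    try:
        data = QueueItem.pop()  # pops from this client's queue only
    finally:
        namespace_manager.set_namespace(previous)
    do_something_with(data)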
