I'm getting confused about class variable access across threads. I'm in a Django application, and I initialize a class at server startup to inject some configuration. Then, on user requests, I call a method on that class, but the config is None.
class SharedService:
    _queue_name = None

    @staticmethod
    def config(queue_name=None):
        if queue_name:
            SharedService._queue_name = queue_name
        print(SharedService._queue_name)  # this prints "alert"

    @staticmethod
    def send_message(data):
        print(SharedService._queue_name)  # this prints None, should print "alert"
If useful: in the Django settings module, loaded at startup, I do:
SharedService.config(queue_name="alert")
and in the user endpoint I just call:
SharedService.send_message("blabla")
I'm sure it worked previously, but we recently updated to Python 3.10 and Django 3.2, which might be related (or not!).
OK, that was completely out of left field!
The shared service comes from an external library and can actually be imported from two paths because of errors in our Python path.
In the settings file, I was doing
from lib.services import SharedService
And from the request handling code
from services import SharedService
This obviously creates two different class objects, and thus the second one is never initialized.
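To illustrate (a contrived snippet; the import paths mirror the ones above):

# The same file imported under two module paths yields two distinct
# module objects, and therefore two distinct class objects.
from lib.services import SharedService as ServiceA
from services import SharedService as ServiceB

print(ServiceA is ServiceB)   # False: two separate classes
ServiceA.config(queue_name="alert")
print(ServiceA._queue_name)   # "alert"
print(ServiceB._queue_name)   # None, this class was never configured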
Related
I have a Django application that loads a huge array into memory after Django has started, for further processing.
What I do now is: after loading a specific view, I execute the code that loads the array, as follows:
try:
    load_model = load_model_func()
except:
    # Some exception
The code is now very bad. I want to create a class that loads this model once after Django starts, and I want to be able to access the model from all other methods of the application.
So, is there a good practice for loading data into memory after Django starts?
@Ken4scholar's answer is a good response in terms of "good practice for loading data AFTER Django starts". It doesn't really address storing/accessing it.
If your data doesn't expire/change unless you reload the application, then using a cache as suggested in the comments is redundant, and depending on the cache storage you use, you might still end up pulling from some DB, creating overhead.
You can simply store it in-memory, like you are already doing. Adding an extra class to the mix won't really add anything, a simple global variable could do.
Consider something like this:
# my_app/heavy_model.py
data = None

def load_model():
    global data
    data = load_your_expensive_model()

def get_data():
    if data is None:
        raise Exception("Expensive model not loaded")
    return data
and in your config (you can learn more about applications and configs here):
# my_app/app_config.py
from django.apps import AppConfig
from my_app.heavy_model import load_model

class MyAppConfig(AppConfig):
    # ...
    def ready(self):
        load_model()
then you can simply call my_app.heavy_model.get_data directly in your views.
⚠️ You can also access the global data variable directly, but a wrapper around it may be nicer to have (also in case you want to create more abstractions around it later).
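For illustration, a view could then consume the loaded data like this (a minimal sketch; the view name and response shape are made up):

# my_app/views.py
from django.http import JsonResponse
from my_app.heavy_model import get_data

def model_info(request):
    model = get_data()  # raises if load_model() never ran
    return JsonResponse({"size": len(model)})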
You can run a startup script when apps are initialized. To do this, call the script in the ready method of your Config class in apps.py like this:
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        # <import and run script>
        pass
I'm writing a CLI to interact with Elasticsearch using the elasticsearch-py library. I'm trying to mock elasticsearch-py functions in order to test my functions without calling my real cluster.
I read this question and this one but I still don't understand.
main.py
Escli inherits from cliff's App class
import elasticsearch5
from cliff.app import App

class Escli(App):
    _es = elasticsearch5.Elasticsearch()
settings.py
from escli.main import Escli

class Settings:
    def get(self, sections):
        raise NotImplementedError()

class ClusterSettings(Settings):
    def get(self, setting, persistency='transient'):
        settings = Escli._es.cluster \
            .get_settings(include_defaults=True, flat_settings=True) \
            .get(persistency) \
            .get(setting)
        return settings
settings_test.py
from unittest import TestCase
from unittest.mock import patch

import escli.settings

class TestClusterSettings(TestCase):
    def setUp(self):
        self.patcher = patch('elasticsearch5.Elasticsearch')
        self.MockClass = self.patcher.start()

    def test_get(self):
        # Note this is an empty dict to show my point;
        # it will contain child dicts to allow my .get(persistency).get(setting)
        self.MockClass.return_value.cluster.get_settings.return_value = {}
        cluster_settings = escli.settings.ClusterSettings()
        ret = cluster_settings.get('cluster.routing.allocation.node_concurrent_recoveries', persistency='transient')
        # ret should contain a subset of my dict defined above
I want Escli._es.cluster.get_settings() to return what I want (a dict object) so that the real HTTP call is never made, but it keeps making it.
What I know:
In order to mock an instance method I have to do something like
MagicMockObject.return_value.InstanceMethodName.return_value = ...
I cannot patch Escli._es.cluster.get_settings directly, because Python tries to import Escli as a module, which cannot work. So I'm patching the whole lib.
I desperately tried putting return_value everywhere, but I can't understand why I can't mock that thing properly.
You should be mocking with respect to where you are testing. Based on the example provided, this means that the Escli class you are using in the settings.py module needs to be mocked with respect to settings.py. So, more practically, your patch call would look like this inside setUp instead:
self.patcher = patch('escli.settings.Escli')
With this, you are now mocking what you want in the right place based on how your tests are running.
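Put together, a sketch of the adjusted test might look like this (assuming the module layout from the question; the sample setting value is made up):

from unittest import TestCase
from unittest.mock import patch

import escli.settings

class TestClusterSettings(TestCase):
    def setUp(self):
        # Patch Escli where settings.py looks it up, not where it is defined
        self.patcher = patch('escli.settings.Escli')
        self.MockEscli = self.patcher.start()
        self.addCleanup(self.patcher.stop)

    def test_get(self):
        # Escli is used via its class attribute, so configure _es directly
        self.MockEscli._es.cluster.get_settings.return_value = {
            'transient': {'cluster.routing.allocation.node_concurrent_recoveries': '2'}
        }
        ret = escli.settings.ClusterSettings().get(
            'cluster.routing.allocation.node_concurrent_recoveries')
        self.assertEqual(ret, '2')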
Furthermore, to add more robustness to your testing, you might want to consider speccing for the Elasticsearch instance you are creating in order to validate that you are in fact calling valid methods that correlate to Elasticsearch. With that in mind, you can do something like this, instead:
self.patcher = patch('escli.settings.Escli', Mock(Elasticsearch))
To read a bit more about what exactly is meant by spec, check the patch section in the documentation.
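As a quick illustration of what speccing buys you (a toy class here, not the real Elasticsearch API):

from unittest.mock import Mock

class Api:
    def ping(self):
        ...

m = Mock(spec=Api)
m.ping()   # fine, matches the spec
m.pong()   # AttributeError: Mock object has no attribute 'pong'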
As a final note, if you are interested in exploring the great world of pytest, there is a pytest-elasticsearch plugin created to assist with this.
I am using cherrypy as a web server, and I want to check a user's logged-in status before returning the page. This works on methods in the main Application class (in site.py) but gives an error when I call the same decorated function on a method in a class that is one layer deeper in the webpage tree (in a separate file).
validate_user() is the function used as a decorator. It either passes a user to the page or sends them to a 401 restricted page. It's registered as a cherrypy.Tool, like this:
import cherrypy
from user import validate_user

cherrypy.tools.validate_user = cherrypy.Tool('before_handler', validate_user)
I attach different sections of the site to the main site.py file's Application class by assigning instances of the sub-classes as variables accordingly:
from user import UserAuthentication

class Root:
    user = UserAuthentication()  # maps user/login, user/register, user/logout, etc.
    admin = Admin()
    api = Api()

    @cherrypy.expose
    @cherrypy.tools.validate_user()
    def how_to(self, **kw):
        from other_stuff import how_to_page
        return how_to_page(kw)
This, however, does not work when I try to use the validate_user() inside the Admin or Api or Analysis sections. These are in separate files.
import cherrypy

class Analyze:
    @cherrypy.expose
    @cherrypy.tools.validate_user()  #### THIS LINE GIVES ERROR ####
    def explore(self, *args, **kw):  # @addkw(fetch=['uid'])
        import explore
        kw['uid'] = cherrypy.session.get('uid', -1)
        return explore.explorer(args, kw)
The error is that cherrypy.tools doesn't have a validate_user function or method. But other things I assign in site.py do appear in cherrypy here. Why can't I use this tool in a separate file that is part of my overall site map?
If this is relevant, the validate_user() function simply looks at the cherrypy.request.cookie, finds the 'session_token' value, and compares it to our database and passes it along if the ID matches.
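Roughly, a simplified sketch of that logic (lookup_user_by_token stands in for our DB comparison):

import cherrypy

def validate_user():
    cookie = cherrypy.request.cookie.get('session_token')
    user = lookup_user_by_token(cookie.value) if cookie else None  # our DB check
    if user is None:
        raise cherrypy.HTTPError(401)
    cherrypy.request.user = user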
Sorry I don't know if the Analyze() and Api() and User() pages are subclasses, or nested classes, or extended methods, or what. So I can't give this a precise title. Do I need to pass in the parent class to them somehow?
The issue here is that Python processes everything except function/method bodies at import time. So in site.py, when you import user (or from user import <anything>), all of the user module is processed before the Python interpreter has reached the definition of the validate_user tool, including the decorator, which attempts to access that tool by value (rather than by reference).
CherryPy has another mechanism for decorating functions with config that will enable tools on those handlers. Instead of @cherrypy.tools.validate_user, use:
@cherrypy.config(**{"tools.validate_user.on": True})
This decorator works because instead of needing to access validate_user from cherrypy.tools to install itself on the handler, it instead configures CherryPy to install that tool on the handler later, when the handler is invoked.
If that tool is needed for all methods on that class, you can use that config decorator on the class itself.
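For example, a sketch of the Analyze class from the question with the tool enabled class-wide (assuming the tool was registered as above):

import cherrypy

@cherrypy.config(**{"tools.validate_user.on": True})  # applies to every handler below
class Analyze:
    @cherrypy.expose
    def explore(self, *args, **kw):
        kw['uid'] = cherrypy.session.get('uid', -1)
        ...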
Alternatively, you could enable that tool for given endpoints in the server config, as mentioned in the other question.
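That config-based route might look like this when mounting the app (a sketch; the mount point is made up):

import cherrypy

cherrypy.tree.mount(Analyze(), '/analyze', config={
    '/': {'tools.validate_user.on': True}
})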
I'm trying to use the Django LocMemCache for storing a few simple values, but I'm not sure how to initialize the cache when Django starts. Django 1.7 comes with Applications, allowing you to run some code in AppConfig.ready(); that place would be perfect to initialize the cache, but according to the Django 1.7 documentation:
" ... Although you can access model classes as described above, avoid
interacting with the database in your ready() implementation."
So, let's say I want to store the results of some DB queries on my model:
x = MyModel.objects.count()
y = MyModel.(a really expensive query)
How and when should I init the cache? Is there a recommended "best practice" for doing that?
Currently, I have just added the following cache.py to my application, but I'm not sure whether my code hits the database only once (i.e., on the first request) and then uses the cached value afterwards (on the following requests).
# cache.py
from django.core.cache import caches
from .models import MyModel

class Cache(object):
    def __init__(self):
        self.__count = MyModel.objects.count()
        self.cache = caches['cache-storage']

    @property
    def total_count(self):
        return self.cache.get('total_count', self.__count)
Then I use the cached values in this way:
# view.py
from .cache import Cache

cache = Cache()

# ... (some view)
counter = cache.total_count
That tip about using databases in ready() is generally good advice, but it isn't always going to work out. Some applications need to access the database on a schedule instead of through a view. The only way I've seen to implement that is from within ready().
This does create an issue, because all manage.py commands run the code in ready(). This means the application would be accessing the database with your runtime code while running migrations or creating superusers, which isn't ideal.
I avoid this issue by adding a sys.argv check in ready().
import sys
from django.apps import AppConfig

class MyApp(AppConfig):
    name = 'MyApp'

    def ready(self):
        if 'runserver' in sys.argv:
            pass  # your code here
How do I access the Scrapy settings (as defined in settings.py) from an item pipeline? The documentation mentions they can be accessed through the crawler in extensions, but I don't see how to access the crawler in pipelines.
UPDATE (2021-05-04)
Please note that this answer is now ~7 years old, so its validity can no longer be ensured. In addition, it uses Python 2.
The way to access your Scrapy settings (as defined in settings.py) from within your_spider.py is simple. All the other answers are way too complicated. The reason for this is the very poor maintenance of the Scrapy documentation, combined with many recent updates and changes. Neither the "Settings" documentation ("How to access settings") nor the "Settings API" gives any workable example. Here's an example of how to get your current USER_AGENT string.
Just add the following lines to your_spider.py:
# To get your settings from settings.py:
from scrapy.utils.project import get_project_settings
...

class YourSpider(BaseSpider):
    ...
    def parse(self, response):
        ...
        settings = get_project_settings()
        print "Your USER_AGENT is:\n%s" % (settings.get('USER_AGENT'))
        ...
As you can see, there's no need to use @classmethod or to re-define the from_crawler() or __init__() functions. Hope this helps.
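(Given the update note above, a sketch of the same call under Python 3 would simply use the print function; the API itself is unchanged:)

from scrapy.utils.project import get_project_settings

settings = get_project_settings()
print("Your USER_AGENT is:\n%s" % settings.get('USER_AGENT'))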
PS. I'm still not sure why using from scrapy.settings import Settings doesn't work the same way, since it would be the more obvious import choice.
OK, so the documentation at http://doc.scrapy.org/en/latest/topics/extensions.html says that:
"The main entry point for a Scrapy extension (this also includes middlewares and pipelines) is the from_crawler class method which receives a Crawler instance which is the main object controlling the Scrapy crawler. Through that object you can access settings, signals, stats, and also control the crawler behaviour, if your extension needs to such thing."
So then you can have a function to get the settings.
@classmethod
def from_crawler(cls, crawler):
    settings = crawler.settings
    my_setting = settings.get("MY_SETTING")
    return cls(my_setting)
The crawler engine then calls the pipeline's __init__ function with my_setting, like so:
def __init__(self, my_setting):
    self.my_setting = my_setting
And other functions can access it with self.my_setting, as expected.
Alternatively, in the from_crawler() function you can pass the crawler.settings object to __init__(), and then access settings from the pipeline as needed instead of pulling them all out in the constructor.
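Putting those pieces together, a minimal pipeline sketch (MY_SETTING is a placeholder name):

class MyPipeline:
    def __init__(self, my_setting):
        self.my_setting = my_setting

    @classmethod
    def from_crawler(cls, crawler):
        # called by Scrapy with the running crawler, which exposes the settings
        return cls(my_setting=crawler.settings.get("MY_SETTING"))

    def process_item(self, item, spider):
        # self.my_setting is now available in every method
        return item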
The correct answer is: it depends where in the pipeline you wish to access the settings.
avaleske answered as if you wanted access to the settings outside of your pipeline's process_item method, but it's very likely that this is where you'll want the setting, and in that case there is a much easier way, as the Spider instance itself gets passed in as an argument.
class PipelineX(object):
    def process_item(self, item, spider):
        wanted_setting = spider.settings.get('WANTED_SETTING')
        return item
The project structure is quite flat, so why not:
# pipeline.py
from myproject import settings
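A usage sketch of that direct import (note this reads settings.py as a plain module and bypasses any per-crawler overrides; WANTED_SETTING is a made-up name):

# pipeline.py
from myproject import settings

class PipelineX:
    def process_item(self, item, spider):
        wanted = settings.WANTED_SETTING  # plain module attribute, not a Settings object
        return item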