I'm using Zappa to deploy a Python/Django WSGI app to AWS API Gateway and Lambda.
I have all of these in my environment:
NEW_RELIC_CONFIG_FILE: /var/task/newrelic.ini
NEW_RELIC_LICENSE_KEY: redacted
NEW_RELIC_ENVIRONMENT: dev-zappa
NEW_RELIC_STARTUP_DEBUG: "on"
NEW_RELIC_ENABLED: "on"
I'm doing "manual agent start" in my wsgi.py as documented:
import logging

import newrelic.agent

# Will collect NEW_RELIC_CONFIG_FILE and NEW_RELIC_ENVIRONMENT from the environment
# Dear god why??!?!
# NB: Looks like this IS what makes it go
newrelic.agent.global_settings().enabled = True
newrelic.agent.initialize('/var/task/newrelic.ini', 'dev-zappa', log_file='stderr', log_level=logging.DEBUG)
I'm not using @newrelic.agent.wsgi_application since Django should be auto-magically detected.
I've added a middleware to shut down the agent before the Lambda gets frozen, but the logging suggests that only the first request is being sent to New Relic. Without the shutdown, I get no logging from the New Relic agent, and there are no events in APM.
import newrelic.agent
from django.utils.deprecation import MiddlewareMixin

class NewRelicShutdownMiddleware(MiddlewareMixin):
    """Simple middleware that shuts down the NR agent at the end of a request"""

    def process_request(self, request):
        pass
        # really wait for the agent to register with collector
        # Enabling this causes more log messages about starting data samplers, but only on the first request
        # newrelic.agent.register_application(timeout=10)

    def process_response(self, request, response):
        newrelic.agent.shutdown_agent(timeout=2.5)
        return response

    def process_exception(self, request, exception):
        newrelic.agent.shutdown_agent(timeout=2.5)
In my newrelic.ini I have the following, but when I log newrelic.agent.global_settings() it contains the default app name (which did get created in APM) and enabled = False, which led to some of the hacks above (the environment variables, and editing newrelic.agent.global_settings() before initialize):
[newrelic:dev-zappa]
app_name = DEV APP zappa
monitor_mode = true
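One thing worth double-checking (my assumption, not something the thread confirms): the Python agent expects a base [newrelic] section in the config file, and environment sections like [newrelic:dev-zappa] only override it. If the base section is missing, settings may fall back to the defaults observed above (default app name, enabled = False):
[newrelic]
app_name = DEV APP zappa
monitor_mode = true

[newrelic:dev-zappa]
app_name = DEV APP zappa
monitor_mode = true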
TL;DR - Two questions:
How do I get New Relic to read its ini file when it doesn't want to?
How do I get New Relic to record data for all requests in AWS Lambda?
Zappa does not use your wsgi.py file (currently), so the hooks there aren't happening. Take a look at this PR which allows for it: https://github.com/Miserlou/Zappa/pull/1251
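Until that lands, one workaround (a sketch of my own, not a confirmed fix) is to move the manual agent start into a module Zappa does import on cold start, such as the bottom of your Django settings.py:
# settings.py -- sketch only: initialize the agent from a module Zappa imports,
# since wsgi.py is currently bypassed; path and environment name are from the question
import logging

import newrelic.agent

newrelic.agent.initialize('/var/task/newrelic.ini', 'dev-zappa',
                          log_file='stderr', log_level=logging.DEBUG)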
I am currently working on a Django app that uses a module I created called "stream". This module starts a thread to open a camera with OpenCV and yield frames.
Now that I am trying to run it with nginx and uWSGI, I have realized that the stream module gets initialized once per worker. This causes a problem, as each worker attempts to open a new connection to the same camera.
Is there a way to make this stream module globally accessible between workers, instead of having it initialized by every worker?
(below is a snippet of my views.py)
from django.contrib.auth.decorators import login_required
from django.http import StreamingHttpResponse
from django.views.decorators import gzip
from django.views.decorators.clickjacking import xframe_options_exempt

from . import stream

@xframe_options_exempt
@login_required
def stream_log(request) -> StreamingHttpResponse:
    """
    This endpoint requires a logged-in user to
    view the logged data for the current stream session. It is
    fed through an iframe into the `view` page
    :param request: http request
    :return: StreamingHttpResponse
    """
    try:
        return StreamingHttpResponse(stream.log_feed())
    except:
        pass

@login_required
@gzip.gzip_page
def camera_stream(request) -> StreamingHttpResponse:
    """
    This endpoint requires a logged-in user to
    view the stream feed from the camera. It is
    fed through an iframe into the `view` page
    :param request: http request
    :return: StreamingHttpResponse
    """
    try:
        return StreamingHttpResponse(stream.video_camera.feed(),
                                     content_type="multipart/x-mixed-replace;boundary=frame")
    except:
        pass
EDIT:
The bot said I wasn't clear enough, so I'll add some more details. The streaming works when running just one Django worker or when using the runserver command, unless more than one user connects.
I added a print("ifname") to the module __init__.py to see if it was being called more than once. In the uWSGI log file we can see that it is.
I have a Flask server running within gunicorn.
In my Flask application I want to handle large file uploads (>20GB), so I plan on letting a Celery task do the handling of the large file.
The problem is that retrieving the file from request.files already takes quite long, and in the meantime gunicorn terminates the worker handling that request. I could increase the timeout, but the maximum file size is currently unknown, so I don't know how much time I would need.
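As an aside (mine, not from the thread): gunicorn's worker timeout is configurable on the command line, though picking a value is guesswork when the maximum file size is unknown (myapp:app is a placeholder):
gunicorn --timeout 300 myapp:app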
My plan was to make the request context available to the Celery task, as described here: http://xion.io/post/code/celery-include-flask-request-context.html, but I cannot make it work.
Q1 Is the signature right?
I set the signature with
celery.signature(handle_large_file, args={}, kwargs={})
and nothing is complaining. I get the arguments I pass from the flask request handler to the celery task, but that's it. Should I somehow get a handle to the context here?
Q2 How do I use the context?
I would have thought that if the Flask request context were available I could just use request.files in my code, but then I get a warning that I am working outside of the request context.
Using celery 4.4.0
Code:
# in celery.py:
from flask import request
from celery import Celery

celery = Celery('celery_worker',
                backend=Config.CELERY_RESULT_BACKEND,
                broker=Config.CELERY_BROKER_URL)

@celery.task(bind=True)
def handle_large_file(task_object, data):
    # do something with the large file...
    # what I'd like to do:
    files = request.files['upfile']
    ...

celery.signature(handle_large_file, args={}, kwargs={})
# in main.py
def create_app():
    app = Flask(__name__.split('.')[0])
    ...
    celery_worker.conf.update(app.config)

# copy from the blog
class RequestContextTask(Task):
    ...

celery_worker.Task = RequestContextTask
# in Controller.py
@FILE.route("", methods=['POST'])
def upload():
    data = dict()
    ...
    handle_large_file.delay(data)
What am I missing?
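An aside that is not from the original thread: the Flask request context lives in the web process and generally cannot be carried into a Celery worker process, which is why a commonly used workaround is to persist the upload in the request handler and pass only a file path to the task. A rough sketch reusing the question's names (FILE, upfile, handle_large_file); UPLOAD_DIR is hypothetical:
import os
import uuid

from flask import request

UPLOAD_DIR = "/tmp/uploads"  # hypothetical; use storage shared with the workers

@FILE.route("", methods=['POST'])
def upload():
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    f = request.files['upfile']                    # werkzeug FileStorage
    path = os.path.join(UPLOAD_DIR, str(uuid.uuid4()))
    f.save(path)                                   # streams the upload to disk
    handle_large_file.delay(path)                  # the task opens the file itself
    return "accepted", 202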
I use Tornado 4.5.2 with the routing implementation.
My server has two versions of the API; call them base and fancy. A client is able to use both of them:
GET /base/foo
GET /base/baz
GET /fancy/foo
GET /fancy/baz
However, some fancy handlers may not be implemented; in this case the base one should be used.
For example:
application = web.Application([
    (r"/base/foo", handlers.BaseFooHandler, {"some": "settings"}),
    (r"/base/baz", handlers.BaseBazHandler, {"some": "settings"}),
    (r"/fancy/foo", handlers.FancyFooHandler, {"some": "settings"}),
])
When the client requests GET /fancy/baz, BaseBazHandler should do the job.
How can I achieve that with Tornado routing?
Since you're registering your routes using a decorator, you can create a custom router that will respond to all the unmatched/unregistered /fancy/.* routes. For this to work correctly, you'll have to register your router at the end.
That way your custom router will be matched only if there isn't already a /fancy/... route registered. So, that means the custom router class will need to do these things:
Check if a fallback BaseBazHandler exists or not.
If exists, forward the request to it.
Else, return a 404 error.
Before proceeding any further, you'll have to create a custom class to handle 404 requests. This is necessary because if no handler is found, this is the easiest way to return a 404 error.
class Handle404(RequestHandler):
    def get(self):
        self.set_status(404)
        self.write('404 Not Found')
Okay, now let's write the custom router:
from tornado.routing import Router

class MyRouter(Router):
    def __init__(self, app):
        self.app = app

    def find_handler(self, request, **kwargs):
        endpoint = request.path.split('/')[2]  # last part of the path
        fallback_handler = 'Base%sHandler' % endpoint.title()
        # fallback_handler will look like this - 'BaseBazHandler'
        # now check if the handler exists in the current file
        try:
            handler = globals()[fallback_handler]
        except KeyError:
            handler = Handle404
        return self.app.get_handler_delegate(request, handler)
Finally, after you've added all other routes, you can register your custom router:
from tornado.routing import PathMatches

application.add_handlers(r'.*',  # listen for all hosts
    [
        (PathMatches(r"/fancy/.*"), MyRouter(application)),
    ]
)
I should point out that MyRouter.find_handler only checks for handlers in the current module (file). Modify the code to search for handlers in different modules if you want, as sketched below.
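For instance, a sketch of that modification (assuming the fallback classes live in a module named handlers, as in the question):
import handlers  # module assumed to define BaseFooHandler, BaseBazHandler, ...

def find_handler(self, request, **kwargs):
    endpoint = request.path.split('/')[2]
    fallback_handler = 'Base%sHandler' % endpoint.title()
    # getattr with a default replaces the try/except around globals()
    handler = getattr(handlers, fallback_handler, Handle404)
    return self.app.get_handler_delegate(request, handler)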
Trying to get authentication working with Django Channels with a very simple WebSockets app that echoes back whatever the user sends over, with the prefix "You said: ".
My processes:
web: gunicorn myproject.wsgi --log-file=- --pythonpath ./myproject
realtime: daphne myproject.asgi:channel_layer --port 9090 --bind 0.0.0.0 -v 2
reatime_worker: python manage.py runworker -v 2
I run all processes when testing locally with heroku local -e .env -p 8080, but you could also run them all separately.
Note I have WSGI on localhost:8080 and ASGI on localhost:9090.
Routing and consumers:
### routing.py ###
from . import consumers

channel_routing = {
    'websocket.connect': consumers.ws_connect,
    'websocket.receive': consumers.ws_receive,
    'websocket.disconnect': consumers.ws_disconnect,
}
and
### consumers.py ###
import traceback
from django.contrib.auth.models import User  # used in ws_receive below
from django.http import HttpResponse
from channels.handler import AsgiHandler
from channels import Group
from channels.sessions import channel_session
from channels.auth import channel_session_user, channel_session_user_from_http
from myproject import CustomLogger

logger = CustomLogger(__name__)

@channel_session_user_from_http
def ws_connect(message):
    logger.info("ws_connect: %s" % message.user.email)
    message.reply_channel.send({"accept": True})
    message.channel_session['prefix'] = "You said"
    # message.channel_session['django_user'] = message.user  # tried doing this but it doesn't work...

@channel_session_user_from_http
def ws_receive(message, http_user=True):
    try:
        logger.info("1) User: %s" % message.user)
        logger.info("2) Channel session fields: %s" % message.channel_session.__dict__)
        logger.info("3) Anything at 'django_user' key? => %s" % (
            'django_user' in message.channel_session,))
        user = User.objects.get(pk=message.channel_session['_auth_user_id'])
        logger.info(None, "4) ws_receive: %s" % user.email)
        prefix = message.channel_session['prefix']
        message.reply_channel.send({
            'text': "%s: %s" % (prefix, message['text']),
        })
    except Exception:
        logger.info("ERROR: %s" % traceback.format_exc())

@channel_session_user_from_http
def ws_disconnect(message):
    logger.info("ws_disconnect: %s" % message.__dict__)
    message.reply_channel.send({
        'text': "%s" % "Sad to see you go :(",
    })
And then to test, I go into the JavaScript console on the same domain as my HTTP site, and type in:
> var socket = new WebSocket('ws://localhost:9090/')
> socket.onmessage = function(e) {console.log(e.data);}
> socket.send("Testing testing 123")
VM481:2 You said: Testing testing 123
And my local server log shows:
ws_connect: test@test.com
1) User: AnonymousUser
2) Channel session fields: {'_SessionBase__session_key': 'chnb79d91b43c6c9e1ca9a29856e00ab', 'modified': False, '_session_cache': {u'prefix': u'You said', u'_auth_user_hash': u'ca4cf77d8158689b2b6febf569244198b70d5531', u'_auth_user_backend': u'django.contrib.auth.backends.ModelBackend', u'_auth_user_id': u'1'}, 'accessed': True, 'model': <class 'django.contrib.sessions.models.Session'>, 'serializer': <class 'django.core.signing.JSONSerializer'>}
3) Anything at 'django_user' key? => False
4) ws_receive: test@test.com
Which, of course, makes no sense. A few questions:
Why would Django see message.user as an AnonymousUser but have the actual user id _auth_user_id=1 (this is my correct user ID) in the session?
I am running my local server (WSGI) on 8080 and daphne (ASGI) on 9090 (different ports). And I didn't include session_key=xxxx in my WebSocket connection - yet Django was able to read my browser's cookie for the correct user, test@test.com? According to the Channels docs, this shouldn't be possible.
Under my setup, what is the best / simplest way to carry out authentication with Django channels?
Note: This answer is explicit to channels 1.x, channels 2.x uses a different auth mechanism.
I had a hard time with Django Channels too; I had to dig into the source code to better understand the docs...
Question 1:
The docs mention a kind of long trail of decorators relying on each other (http_session, http_session_user, ...) that you can use to wrap your message consumers. In the middle of that trail it states this:
Now, one thing to note is that you only get the detailed HTTP information during the connect message of a WebSocket connection (you can read more about that in the ASGI spec) - this means we’re not wasting bandwidth sending the same information over the wire needlessly.
This also means we’ll have to grab the user in the connection handler and then store it in the session;....
It's easy to get lost in all that; at least we both did...
You just have to remember what happens when you use channel_session_user_from_http:
1. It calls http_session_user
a. calls http_session, which will parse the message and give us a message.http_session attribute.
b. Upon returning from the call, it initializes message.user based on the information it got in message.http_session (this will bite you later)
2. It calls channel_session, which will initiate a dummy session in message.channel_session and tie it to the message reply channel.
3. Now it calls transfer_user, which will move the http_session into the channel_session
This happens during the connection handling of a websocket, so on subsequent messages you won't have access to detailed HTTP information. What's happening after the connect is that you're calling channel_session_user_from_http again, which in this situation (post-connect messages) calls http_session_user, which will attempt to read the HTTP information but fails, resulting in message.http_session being set to None and message.user being overridden to AnonymousUser.
That's why you need to use channel_session_user in this case.
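In other words, a minimal sketch of the corrected decorators, reusing the question's consumer names:
from channels.auth import channel_session_user, channel_session_user_from_http

@channel_session_user_from_http   # connect: detailed HTTP information is available here
def ws_connect(message):
    ...

@channel_session_user             # post-connect messages: read the channel session instead
def ws_receive(message):
    ...

@channel_session_user
def ws_disconnect(message):
    ...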
Question 2:
Channels can use Django sessions either from cookies (if you’re running your websocket server on the same port as your main site, using something like Daphne), or from a session_key GET parameter, which works if you want to keep running your HTTP requests through a WSGI server and offload WebSockets to a second server process on another port.
Remember http_session, the decorator that gets us the message.http_session data? It appears that if it doesn't find a session_key GET parameter it falls back to settings.SESSION_COOKIE_NAME, which is the regular sessionid cookie. So whether you provide session_key or not, you'll still get connected if you're logged in; of course, that happens only when your ASGI and WSGI servers are on the same domain (127.0.0.1 in this case). The port difference doesn't matter.
I think the difference the docs are trying to communicate, but didn't expand on, is that you need to set the session_key GET parameter when your ASGI and WSGI servers are on different domains, since cookies are restricted by domain, not port.
Due to that lack of explanation, I had to test running ASGI and WSGI on the same port and on different ports, and the result was the same: I was still getting authenticated. I changed one server's domain to 127.0.0.2 instead of 127.0.0.1 and the authentication was gone; I set the session_key GET parameter and the authentication was back again.
Update: a rectification of that docs paragraph was just pushed to the channels repo; it was meant to say domain instead of port, as I mentioned.
Question 3:
My answer is the same as turbotux's but longer: you should use @channel_session_user_from_http on ws_connect and @channel_session_user on ws_receive and ws_disconnect. Nothing in what you showed suggests it won't work if you make that change. Maybe try removing http_user=True from your receive consumer? Even though I suspect it has no effect, since it's undocumented and intended only to be used by generic consumers...
Hope this helps!
To answer your first question you need to use the:
channel_session_user
decorator in the receive and disconnect calls.
channel_session_user_from_http
calls transfer_user during the connect method to transfer the http session to the channel session. This way all future calls can access the channel session to retrieve user information.
To your second question: I believe what you are seeing is that the default WebSocket library passes the browser cookies over the connection.
Third, I think your setup will work quite well once you have changed the decorators.
I ran into this problem and found it was due to a couple of issues that might be the cause. I'm not suggesting this will solve your issue, but it might give you some insight. Keep in mind I am using rest framework. First, I was overriding the User model. Second, when I defined the application variable in my root routing.py I didn't use my own AuthMiddleware; I was using the docs-suggested AuthMiddlewareStack. So, per the Channels docs, I defined my own custom authentication middleware, which takes my JWT value from the cookies, authenticates it, and assigns it to scope["user"] like so:
routing.py
from channels.routing import ProtocolTypeRouter, URLRouter
import app.routing
from .middleware import JsonTokenAuthMiddleware

application = ProtocolTypeRouter(
    {
        "websocket": JsonTokenAuthMiddleware(
            URLRouter(app.routing.websocket_urlpatterns)
        )
    }
)
middleware.py
from http import cookies
from django.contrib.auth.models import AnonymousUser
from django.db import close_old_connections
from rest_framework.authtoken.models import Token
from rest_framework_jwt.authentication import BaseJSONWebTokenAuthentication

class JsonWebTokenAuthenticationFromScope(BaseJSONWebTokenAuthentication):
    def get_jwt_value(self, scope):
        try:
            cookie = next(x for x in scope["headers"]
                          if x[0].decode("utf-8") == "cookie")[1].decode("utf-8")
            return cookies.SimpleCookie(cookie)["JWT"].value
        except:
            return None

class JsonTokenAuthMiddleware(BaseJSONWebTokenAuthentication):
    def __init__(self, inner):
        self.inner = inner

    def __call__(self, scope):
        try:
            close_old_connections()
            user, jwt_value = JsonWebTokenAuthenticationFromScope().authenticate(scope)
            scope["user"] = user
        except:
            scope["user"] = AnonymousUser()
        return self.inner(scope)
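For completeness (my sketch, not part of the answer): with the middleware in place, a channels 2.x consumer can read the authenticated user from the scope. EchoConsumer is a made-up name:
from channels.generic.websocket import WebsocketConsumer

class EchoConsumer(WebsocketConsumer):
    def connect(self):
        user = self.scope["user"]  # set by JsonTokenAuthMiddleware above
        if user.is_anonymous:
            self.close()
        else:
            self.accept()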
Hope this helps!
When I run py.test --with-gae, I get the following error (I have the pytest_gae plugin installed):
def get_current_session():
"""Returns the session associated with the current request."""
> return _tls.current_session
E AttributeError: 'thread._local' object has no attribute 'current_session'
gaesessions/__init__.py:50: AttributeError
I'm using pytest to test my Google App Engine application. The application runs fine in the localhost SDK and when deployed to GAE servers. I just can't figure out how to make pytest work with gaesessions.
My code is below:
test_handlers.py
from webtest import TestApp
import appengine_config

def pytest_funcarg__anon_user(request):
    from main import app
    app = appengine_config.webapp_add_wsgi_middleware(app)
    return TestApp(app)

def test_session(anon_user):
    from gaesessions import get_current_session
    assert get_current_session()
appengine_config.py
from gaesessions import SessionMiddleware

def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recording
    app = recording.appstats_wsgi_middleware(app)
    app = SessionMiddleware(app, cookie_key="replaced-with-this-boring-text")
    return app
Relevant code from gaesessions:
# ... more code not shown here ...

_tls = threading.local()

def get_current_session():
    """Returns the session associated with the current request."""
    return _tls.current_session

# ... more code not shown here ...

class SessionMiddleware(object):
    """WSGI middleware that adds session support.

    ``cookie_key`` - A key used to secure cookies so users cannot modify their
    content. Keys should be at least 32 bytes (RFC2104). Tip: generate your
    key using ``os.urandom(64)`` but do this OFFLINE and copy/paste the output
    into a string which you pass in as ``cookie_key``. If you use ``os.urandom()``
    to dynamically generate your key at runtime then any existing sessions will
    become junk every time your app starts up!

    ``lifetime`` - ``datetime.timedelta`` that specifies how long a session may last. Defaults to 7 days.

    ``no_datastore`` - By default all writes also go to the datastore in case
    memcache is lost. Set to True to never use the datastore. This improves
    write performance but sessions may be occasionally lost.

    ``cookie_only_threshold`` - A size in bytes. If session data is less than this
    threshold, then session data is kept only in a secure cookie. This avoids
    memcache/datastore latency which is critical for small sessions. Larger
    sessions are kept in memcache+datastore instead. Defaults to 10KB.
    """
    def __init__(self, app, cookie_key, lifetime=DEFAULT_LIFETIME, no_datastore=False, cookie_only_threshold=DEFAULT_COOKIE_ONLY_THRESH):
        self.app = app
        self.lifetime = lifetime
        self.no_datastore = no_datastore
        self.cookie_only_thresh = cookie_only_threshold
        self.cookie_key = cookie_key
        if not self.cookie_key:
            raise ValueError("cookie_key MUST be specified")
        if len(self.cookie_key) < 32:
            raise ValueError("RFC2104 recommends you use at least a 32 character key. Try os.urandom(64) to make a key.")

    def __call__(self, environ, start_response):
        # initialize a session for the current user
        _tls.current_session = Session(lifetime=self.lifetime, no_datastore=self.no_datastore, cookie_only_threshold=self.cookie_only_thresh, cookie_key=self.cookie_key)

        # create a hook for us to insert a cookie into the response headers
        def my_start_response(status, headers, exc_info=None):
            _tls.current_session.save()  # store the session if it was changed
            for ch in _tls.current_session.make_cookie_headers():
                headers.append(('Set-Cookie', ch))
            return start_response(status, headers, exc_info)

        # let the app do its thing
        return self.app(environ, my_start_response)
The problem is that the gaesessions middleware is not invoked until the app itself is called, and the app is only called when you make a request to it. Try issuing a request before you check for the session value. Check out the revised test_handlers.py code below.
def test_session(anon_user):
    anon_user.get("/")  # issue any request to the app so a session gets created
    from gaesessions import get_current_session
    assert get_current_session()