I'm trying to convert videos uploaded by users in a Django app. The problem is that the conversion eats up resources, the site becomes unavailable while it runs, and a new instance has to be spun up to scale it (I'm using Elastic Beanstalk). I did some research and decided to use SQS with a worker environment.
I set up Celery and added the necessary config to the settings.py file:
BROKER_TRANSPORT = 'sqs'
BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',
    'polling_interval': 3,
    'visibility_timeout': 3600,
}
BROKER_USER = AWS_ACCESS_KEY_ID
BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
CELERY_DEFAULT_QUEUE = 'celery-convert-video'
CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}
I set the POST URL to /celery-convert-video/.
I also do:
video = TestVideo.objects.create(uploaded_video=videoFile)
then
ConvertVideo.delay(video_id=video.id)
to send the task to SQS. The task grabs the uploaded file using a URL and converts it. Locally it works, but the problem comes in the cloud.
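For reference, the task itself is roughly this shape (a simplified sketch; the real conversion command and output handling are longer):

# tasks.py (simplified; the ffmpeg invocation here is illustrative)
from celery import shared_task
import subprocess

from .models import TestVideo

@shared_task
def ConvertVideo(video_id):
    video = TestVideo.objects.get(id=video_id)
    # fetch the uploaded file by its URL and convert it
    subprocess.call(['ffmpeg', '-i', video.uploaded_video.url, '/tmp/converted.mp4'])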
I seem to be having problems with setting up the worker environment, because its health always ends up as "Severe", and every time I check the cause it says 100% of requests are returning a 4xx error. The logs say it's a 403.
The tasks show up fine in SQS (though the message bodies are indecipherable, just random letters; I'm guessing they're encoded?) and get "in flight", so I'm assuming the problem is the worker. I have no idea how to set it up properly.
So a few questions:
Do I have to use the same deployment files for the worker and the main environment?
Do I have to edit the deployment files and add a /celery-convert-video/ view in my worker environment and then send it to the equivalent ConvertVideo function in the worker?
How do I connect the worker to the RDS database used by the main environment?
How do I get rid of the 403 errors and get the worker health back to green?
If anyone has a step-by-step tutorial or something similar they can point me to, that would be a big help!
P.S. I'm not an expert in cloud computing; this is actually my first stab at it, so forgive my ignorance.
As the post title says, I'm making an app for which I have a server/REST API written with the Django framework. When I try to call the API using Volley, I fail to get any response, and I don't see the request show up on my server's dashboard.
The server is running on my local machine, and the app is running on an emulator, since I'm using Android Studio.
I'd like to send a request and display the response in a TextView in the app; for now, that's all I need to continue on to the next part.
What ends up happening instead is that the request seems to not even hit my server: the app displays the text I set for when the request fails, and nothing shows up on my server's dashboard.
Here's basically all the code in my mobile app:
val textView = findViewById<TextView>(R.id.text)
val queue = Volley.newRequestQueue(this)
// I also tried 192.168.1.2 as the host IP; the port is always 8000
val url = "http://127.0.0.1:8000/index"
val stringRequest = StringRequest(Request.Method.GET, url,
    Response.Listener<String> { response ->
        // Display the first 500 characters of the response string.
        textView.text = "Response is: ${response.substring(0, 500)}"
    },
    Response.ErrorListener { textView.text = "That didn't work!" })
queue.add(stringRequest)
Please check the logs. They should show you a security error, because you are using cleartext HTTP traffic.
With the default settings (since Android 9), apps aren't allowed to transmit cleartext HTTP traffic.
You must either add an SSL certificate to your backend or explicitly allow HTTP traffic.
Check out this StackOverflow thread to see how to enable cleartext web requests.
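For example, a minimal sketch of the manifest approach (treat this as a development-only shortcut; the linked thread also covers the finer-grained network security config):

<!-- AndroidManifest.xml: allow cleartext HTTP app-wide (development only) -->
<application
    android:usesCleartextTraffic="true"
    ... >
</application>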
I'm trying to get Flask-Mail set up in a Flexible environment on Google App Engine. Flask-Mail works on my localhost using the credentials for the domain I am trying to use to send the mail. However, when using it on GAE through my API, it returns a 502 error and shows no error messages in the logs or console. The documentation for GAE Flexible doesn't mention anything about NOT being able to use it, but it doesn't show how one would set up Flask-Mail either.
I have this:
mail = Mail()
print('1')  # We get here
msg = Message("Hello",
              sender="me@mydomain.com",
              recipients=["me@mydomain.com"])
print('2')  # We get here
msg.body = 'Testing'
print('3')  # We get here
mail.send(msg)
print('4')  # This never gets called because I time out with a 502 before this
I can tell I am not getting any fatal errors, because the app stays up; it just fails with the 502. I have tried adding my email to the list of authorized senders, but it doesn't seem to have helped.
I would appreciate any feedback. If I'm forced to use a third-party service to send mail, it may cause me to move the project off of GAE.
As Ivan posted in his comment, to send email from a GAE app you need to use a mail service. Right now there are three options for apps in a flexible environment: Mailgun, MailJet and SendGrid. Choose the one that fits your app best.
After setting up an account on the mail service you have chosen, you have to prepare your code by integrating the parts related to that service; a rough example follows the tutorial list below.
These tutorials should help you set up the mail service for your app:
Mailgun
MailJet
SendGrid
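For instance, with SendGrid's Python helper library the send step looks roughly like this (a minimal sketch; the API key and addresses are placeholders):

# Sketch of a SendGrid send; key and addresses are placeholders
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

message = Mail(from_email='me@mydomain.com',
               to_emails='me@mydomain.com',
               subject='Hello',
               plain_text_content='Testing')
response = SendGridAPIClient('SG.your-api-key').send(message)
print(response.status_code)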
I've had the same error, but on a virtual machine on the internet (the Linode service), and it turned out that it had something to do with rDNS and some domain name configuration that you have to set up for your IP address to get things working correctly. Check this:
https://www.linode.com/community/questions/19082/i-just-created-my-first-linode-and-i-cant-send-emails-why
I'm working on a software feature in which I have to delete files periodically using Django + cron + AWS. The problem is I can't make it work. What's the best way to make it work? Am I missing some AWS configuration? I've configured one web server and one worker environment, and deployed the same application version on both. The task is a view mapped to a URL (accessing the URL executes the function). There's a confirmation message in the worker environment:
Successfully loaded 1 scheduled tasks from cron.yaml.
But there's also a 403 error in the worker access_log:
"POST /networks_app/delete_expired_files HTTP/1.1" 403 2629 "-" "aws-sqsd/2.0"
cron.yaml:
version: 1
cron:
  - name: "delete_expired_files"
    url: "/networks_app/delete_expired_files"
    schedule: "10 * * * *"
URL mapping in urls.py:
urlpatterns = [
    url(r'^delete_expired_files', views.delete_expired_files, name='delete_expired_files'),
]
The function that deletes files in views.py:
from datetime import timedelta
from django.utils import timezone

def delete_expired_files(request):
    users = DemoUser.objects.all()
    for user in users:
        documents = Document.objects.filter(owner=user.id)
        if documents:
            for doc in documents:
                now = timezone.now()
                if now >= doc.date_published + timedelta(days=doc.owner.group.valid_time):
                    doc.delete()
My IAM roles are:
AmazonSQSFullAccess
AmazonS3FullAccess
AWSElasticBeanstalkFullAccess
AmazonDynamoDBFullAccess
If I access the URL via browser, the task is executed (the expired files are deleted). However, the worker environment is supposed to access the URL and execute the task automatically, not only when I access the URL via browser. How can I make it work?
I had a similar issue. In my case, I needed to modify two things to get it to work:
Ensure the view is set up to accept a POST action from AWS. Previously I had mine set up as GET only, and it doesn't seem that AWS supports GET cron requests.
Once it supports POST, make it CSRF-exempt, so that Django isn't afraid that there's a CSRF threat taking place when AWS makes POST requests lacking a CSRF token. You can use the @csrf_exempt decorator described at this SO answer; in my case it was slightly more complicated still, because I was using a class-based view, and I found this other SO answer on how to include the @csrf_exempt decorator on a class-based view. A sketch of both changes follows.
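For a function-based view like the one in the question, a minimal sketch of the two changes (both decorators are standard Django; the deletion logic is elided):

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

@csrf_exempt   # aws-sqsd sends no CSRF token
@require_POST  # the cron daemon POSTs, so don't restrict the view to GET
def delete_expired_files(request):
    # ... same deletion logic as above ...
    return HttpResponse(status=200)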
When setting up Django to use Memcached for caching (in my case, I want to use session caching), in settings.py we set:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}
I will be running the project on App Engine, so my question is: what do I use for the LOCATION entry?
As it happens, I have been porting a Django (1.6.5) application to GAE over the last few days (GAE Development SDK 1.9.6). I don't have a big need for caching right now, but it's good to know it's available if I need it.
So I just tried using django.core.cache.backends.memcached.MemcachedCache as my cache backend (set up as you describe in your question, with python-memcached in my libs folder) and
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
to manage my sessions, and GAE gave me the error:
RuntimeError: Unable to create a new session key. It is likely that the cache is unavailable.
Anyway...
...even if you could get this to work, it's surely better to use Google's API lib and borrow from the Django Memcached implementation, especially as the Google lib has been designed to be compatible with python-memcached; otherwise your app could break at any time with an SDK update. Create a Python module such as my_project/backends.py:
import pickle

from django.core.cache.backends.memcached import BaseMemcachedCache

class GaeMemcachedCache(BaseMemcachedCache):
    """A cache binding using Google's App Engine memcache lib
    (compatible with python-memcached)."""

    def __init__(self, server, params):
        from google.appengine.api import memcache
        super(GaeMemcachedCache, self).__init__(server, params,
                                                library=memcache,
                                                value_not_found_exception=ValueError)

    @property
    def _cache(self):
        if getattr(self, '_client', None) is None:
            self._client = self._lib.Client(self._servers, pickleProtocol=pickle.HIGHEST_PROTOCOL)
        return self._client
Then your cache setting becomes:
CACHES = {
    'default': {
        'BACKEND': 'my_project.backends.GaeMemcachedCache',
    }
}
That's it! This seems to work fine, but I should be clear that it has not been rigorously tested!
Aside
Have a poke around in google.appengine.api.memcache.__init__.py in your GAE SDK folder and you will find:
def __init__(self, servers=None, debug=0,
             pickleProtocol=cPickle.HIGHEST_PROTOCOL,
             pickler=cPickle.Pickler,
             unpickler=cPickle.Unpickler,
             pload=None,
             pid=None,
             make_sync_call=None,
             _app_id=None):
    """Create a new Client object.

    No parameters are required.

    Arguments:
      servers: Ignored; only for compatibility.
    ...
i.e. even if you could find a LOCATION for your memcache instance in the cloud, Google's own library would ignore it.
LOCATION should be set to the IP and port where your memcached daemon is running. Check the official Django documentation:
Set LOCATION to ip:port values, where ip is the IP address of the Memcached daemon and port is the port on which Memcached is running, or to a unix:path value, where path is the path to a Memcached Unix socket file.
https://docs.djangoproject.com/en/dev/topics/cache/
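For example (a sketch with placeholder values; use whichever LOCATION form matches your setup):

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '203.0.113.5:11211',  # ip:port form
        # 'LOCATION': 'unix:/tmp/memcached.sock',  # or a Unix socket path
    }
}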
If you are following this documentation
http://www.allbuttonspressed.com/projects/djangoappengine
And cloning this (as asked in the above link)
https://github.com/django-nonrel/djangoappengine/blob/master/djangoappengine/settings_base.py
I don't think you need to define a LOCATION at all. Is it throwing an error when you don't define it?
My Setup:
I have an existing Python script that uses Tweepy to access the Twitter Streaming API. I also have a website that shows aggregate real-time information from other sources, via various back-ends.
My Ideal Scenario:
I want to publish real-time tweets as well as real-time updates of my other information to my connected users using Socket.IO.
It would be really nice if I could do something as simple as an HTTP POST (from any back-end) to broadcast information to all the connected clients.
My Problem:
The Socket.IO client implementation is super straightforward; I can handle that. But I can't figure out whether the functionality I'm asking for already exists, and if not, what the best way to build it would be.
[UPDATE]
My Solution: I created a project called Pega.IO that does what I was looking for. Basically, it lets you use Socket.IO (0.8+) as usual, but you can use HTTP POST to send messages to connected users.
It uses the Express web server with a Redis back-end. Theoretically this should be pretty simple to scale; I will continue contributing to this project going forward.
Pega.IO - github
To install on Ubuntu, you just run this command:
curl http://cloud.github.com/downloads/Gootch/pega.io/install.sh | sh
This will create a Pega.IO server that is listening on port 8888.
Once you are up and running, just:
HTTP POST http://your-server:8888/send
with data that looks like this:
channel=whatever&secretkey=mysecret&message=hello+everyone
That's all there is to it. HTTP POST from any back-end to your Pega.IO server.
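For example, from a Python back-end such as the Tweepy script, that POST might look like this (a sketch using the requests library; the values mirror the example above):

import requests

# mirrors the channel/secretkey/message form shown above
requests.post('http://your-server:8888/send',
              data={'channel': 'whatever',
                    'secretkey': 'mysecret',
                    'message': 'hello everyone'})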
The best way I've found for this sort of thing is using a message broker. Personally, I've used RabbitMQ for this, which seems to meet the requirements mentioned in your comment on the other answer (socket.io 0.7 and scalable). If you use RabbitMQ, I'd recommend the amqp module for node, available through npm, and the Pika module for Python.
Here is an example connector for Python using Pika; it accepts a single JSON-serialized argument:
import pika

# `settings`, `exchange_name` and `NODE_CHANNEL` are module-level config (not shown)

def amqp_transmit(message):
    connection = pika.AsyncoreConnection(pika.ConnectionParameters(host=settings.AMQP_SETTINGS['host'],
                                                                   port=settings.AMQP_SETTINGS['port'],
                                                                   credentials=pika.PlainCredentials(settings.AMQP_SETTINGS['username'],
                                                                                                     settings.AMQP_SETTINGS['pass'])))
    channel = connection.channel()
    channel.exchange_declare(exchange=exchange_name, type='fanout')
    channel.queue_declare(queue=NODE_CHANNEL, auto_delete=True, durable=False, exclusive=False)
    channel.basic_publish(exchange=exchange_name,
                          routing_key='',
                          body=message,
                          properties=pika.BasicProperties(content_type='application/json'))
    print ' [%s] Sent %r' % (exchange_name, message)
    connection.close()
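Calling it is then just a matter of JSON-serializing whatever you want to broadcast, e.g. (the payload shape here is illustrative):

import json

amqp_transmit(json.dumps({'channel': 'tweets', 'message': 'hello everyone'}))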
Very basic connection code on the Node end might look like this:
var amqp = require('amqp');

var connection = amqp.createConnection(
    {host: amqpHost,
     port: amqpPort,
     password: amqpPass});

function setupAmqpListeners() {
    connection.addListener('ready', amqpReady);
    connection.addListener('close', function() {
        console.log('Uh oh! AMQP connection failed!');
    });
    connection.addListener('error', function(e) { throw e; });
}
function amqpReady() {
    console.log('Amqp Connection Ready');
    var q, exc;
    q = connection.queue(queueName,
        {autoDelete: true, durable: false, exclusive: false},
        function() {
            console.log('Amqp Connection Established.');
            console.log('Attempting to get an exchange named: ' + exchangeName);
            exc = connection.exchange(exchangeName,
                {type: 'fanout', autoDelete: false},
                function(exchange) {
                    console.log('Amqp Exchange Found. [' + exchange.name + ']');
                    q.bind(exc, '#');
                    console.log('Amqp now totally ready.');
                    q.subscribe(routeAmqp);
                }
            );
        }
    );
}

routeAmqp = function(msg) {
    console.log(msg);
    doStuff(msg);
};
Edit: The example above uses a fan-out exchange that does not persist messages. A fan-out exchange is likely your best option, since scalability is a concern (i.e. you are running more than one box running Node that clients can be connected to).
Why not write your Node app so that there are two parts:
The Socket.IO portion, which communicates directly with the clients, and
An HTTP API of some sort, which receives POST requests and then broadcasts appropriate messages with Socket.IO.
In this way, your application becomes a "bridge" between your non-Node apps and your users' browsers. The key here is to use Socket.IO for what it was made for (real-time communication to browsers) and rely on other Node technologies for the other parts of your application.
[Update]
I'm not in a development environment at the moment, so I can't get you a working example, but some pseudocode would look something like this:
var http = require('http');
var io = require('socket.io');

var server = http.createServer(function(request, response) {
    // Parse the HTTP request to get the data you want
    socket_server.sockets.emit("data", whatever); // broadcast the data to Socket.IO clients
    response.end();
});
server.listen(8080);

var socket_server = io.listen(server);
With this, you'd have a web server on port 8080 that you can use to listen for web requests (you could use a framework such as Express, or one of many others, to parse the body of the POST request and extract the data you need).