Can Flask or Django handle concurrent tasks? - python

What I'm trying to accomplish:
I have a sensor that is constantly reading in data. I need to print this data to a UI whenever data appears. While the aforementioned task is taking place, the user should be able to write data to the sensor. Ideally, both these tasks would / could happen at the same time. Currently, I have the program written using flask; but if django would be better suited (or a third party) I would be willing to make the switch. Note: this website will never be deployed so no need to worry about that. Only user will be me, running program from my laptop.
I have spent a lot of time researching flask async functions and coroutines; however I have not seen any clear indications if something like this would be possible.
Not looking for a line by line solution. Rather, a way (async, threading etc) to set up the code such that the aforementioned tasks are possible. All help is appreciated, thanks.

I'm a Django guy, so I'll throw out what I think could be possible
Django has a decorator #start_new_thread which can be put on any function and it will run in a thread.
You could make a view, POST to it with Javascript/Ajax and start a thread for communication with the sensor using the data POSTed.
You could also make a threading function that will read from the sensor
Could be a management command or a 'start' btn that POSTs to a view that then starts the thread
Note: You need to do Locks or some other logic so the two threads don't conflict when reading/writing
Maybe it's a single thread that reads/writes to the sensor and each loop it checks if there's anything to write (existence + contents of a file? Maybe db entry?
Per the UI, lets say a webpage. You're best best would be Websockets, but because you're the only one that will ever use it you could just write up some Javascript/Ajax that would Ping a view every x seconds and display the new data on the webpage
Note: that's essentially what websockets do, ping every x seconds
Now the common thread is Javascript/Ajax, this is so the page doesn't need to refresh and you can constantly see the data coming in without the page being refreshed.
You can probably do all of this in Flask if you find a similar threading ability and just add some javascript to the frontend
Hopefully you find some of this useful, and idk why stackoverflow hates these types of questions... They're literally fine

Related

Is it a bad practice to use sleep() in a web server in production?

I'm working with Django1.8 and Python2.7.
In a certain part of the project, I open a socket and send some data through it. Due to the way the other end works, I need to leave some time (let's say 10 miliseconds) between each data that I send:
while True:
send(data)
sleep(0.01)
So my question is: is it considered a bad practive to simply use sleep() to create that pause? Is there maybe any other more efficient approach?
UPDATED:
The reason why I need to create that pause is because the other end of the socket is an external service that takes some time to process the chunks of data I send. I should also point out that it doesnt return anything after having received or let alone processed the data. Leaving that brief pause ensures that each chunk of data that I send gets properly processed by the receiver.
EDIT: changed the sleep to 0.01.
Yes, this is bad practice and an anti-pattern. You will tie up the "worker" which is processing this request for an unknown period of time, which will make it unavailable to serve other requests. The classic pattern for web applications is to service a request as-fast-as-possible, as there is generally a fixed or max number of concurrent workers. While this worker is continually sleeping, it's effectively out of the pool. If multiple requests hit this endpoint, multiple workers are tied up, so the rest of your application will experience a bottleneck. Beyond that, you also have potential issues with database locks or race conditions.
The standard approach to handling your situation is to use a task queue like Celery. Your web-application would tell Celery to initiate the task and then quickly finish with the request logic. Celery would then handle communicating with the 3rd party server. Django works with Celery exceptionally well, and there are many tutorials to help you with this.
If you need to provide information to the end-user, then you can generate a unique ID for the task and poll the result backend for an update by having the client refresh the URL every so often. (I think Celery will automatically generate a guid, but I usually specify one.)
Like most things, short answer: it depends.
Slightly longer answer:
If you're running it in an environment where you have many (50+ for example) connections to the webserver, all of which are triggering the sleep code, you're really not going to like the behavior. I would strongly recommend looking at using something like celery/rabbitmq so Django can dump the time delayed part onto something else and then quickly respond with a "task started" message.
If this is production, but you're the only person hitting the webserver, it still isn't great design, but if it works, it's going to be hard to justify the extra complexity of the task queue approach mentioned above.

Python flask how to avoid multi file access at the same time (mutual exclusion on files)?

I am currently developing a data processing web server(linux) using python flask.
The general work flow is:
Get an input file from the user (handled by python flask)
Flask passes this input file to a java program
Java program processes this input file, saves the outputs (multiple files) on the server.
Flask calls another python script which will process these outputs to get the final result and return the result back to the client.
The problem is: between step 3 and step 4, there exist some intermediate files, this would not have been a problem at all if this is a local program. but as a server program, When more than one clients access this program, they could get unexpected result generated by input that is provided by another user who is using the web program at the same time.
From the point I see it, this is kind of a mutual exclusion problem on file access. I have had problems with mutual exclusion problems on threads before, I solved some of them using thread locks such as like synchronization in java and lock in pythons, but I am not sure what to do when it comes to files instead of threads.
It occurred to me that maybe I canspawn different copies of files based on different clients. But as I understand, the HTTP is stateless so you can't really know who is accessing the server. I don't want to add a login system and a user database to achieve this purpose as I sense there is a much simpler and better way to resolve this problem.
I have been looking for a good solution these days but haven't found an ideal one so I am looking for some advice here. Any suggestions will be highly appreciated. If you can suggest a viable solution, please feel free to provide me with your name so I can add you to the thank list of digital and paper publications about this tool when it's published.
As a system kind of person I suggest you something like this
https://docs.python.org/3/library/fcntl.html#fcntl.lockf
This is how I would solve it there is so many way to solve this problem and it is up to debate of course it is come hard with the best solution
Assume the output file is where the conflict happen
so you lock the file and you keep polling until the resource is release (the user need to wait) so you force one user to access the file at a time (polling here time.sleep) for like 2-3 seconds (add a try except) here thread lock on the output file only when the resource is release the next user process will pass through normally.
Another easy way is to dump the data in a rds like mysql or postgres it will handle all the file access nightmare occurred from concurrent request (put the output file in a db).

Multithreading or how to avoid blocking in a Python-application

I'm developing a Python-application that "talks" to the user, and performs tasks based on what the user says(e.g. User:"Do I have any new facebook-messages?", answer:"Yes, you have 2 new messages. Would you like to see them?"). Functionality like integration with facebook or twitter is provided by plugins. Based on predefined parsing rules, my application calls the plugin with the parsed arguments, and uses it's response. The application needs to be able to answer multiple query's from different users at the same time(or practically the same time).
Currently, I need to call a function, "Respond", with the user input as argument. This has some disadvantages, however:
i)The application can only "speak when it is spoken to". It can't decide to query facebook for new messages, and tell the user whether it does, without being told to do that.
ii)Having a conversation with multiple users at a time is very hard, because the application can only do one thing at a time: if Alice asks the application to check her Facebook for new messages, Bob can't communicate with the application.
iii)I can't develop(and use) plugins that take a lot of time to complete, e.g. download a movie, because the application isn't able to do anything whilesame the previous task isn't completed.
Multithreading seems like the obvious way to go, here, but I'm worried that creating and using 500 threads at a time dramatically impacts performance, so using one thread per query(a query is a statement from the user) doesn' seem like the right option.
What would be the right way to do this? I've read a bit about Twisted, and the "reactor" approach seems quite elegant. However, I'm not sure how to implement something like that in my application.
i didn't really understand what sort of application its going to be, but i tried to anwser your questions
create a thread that query's, and then sleeps for a while
create a thread for each user, and close it when the user is gone
create a thread that download's and stops
after all, there ain't going to be 500 threads.

Run a repeating task for a web app

This seems like a simple question, but I am having trouble finding the answer.
I am making a web app which would require the constant running of a task.
I'll use sites like Pingdom or Twitterfeed as an analogy. As you may know, Pingdom checks uptime, so is constantly checking websites to see if they are up and Twitterfeed checks RSS feeds to see if they;ve changed and then tweet that. I too need to run a simple script to cycle through URLs in a database and perform an action on them.
My question is: how should I implement this? I am familiar with cron, currently using it to do my server backups. Would this be the way to go?
I know how to make a Python script which runs indefinitely, starting back at the beginning with the next URL in the database when I'm done. Should I just run that on the server? How will I know it is always running and doesn't crash or something?
I hope this question makes sense and I hope I am not repeating someone else or anything.
Thank you,
Sam
Edit: To be clear, I need the task to run constantly. As in, check URL 1 in the database, check URl 2 in the database, check URL 3 and, when it reaches the last one, go right back to the beginning. Thanks!
If you need a repeatable running of the task which can be run from command line - that's what the cron is ideal for.
I don't see any demerits of this approach.
Update:
Okay, I saw the issue somewhat different. Now I see several solutions:
run the cron task at set intervals, let it process the data once per run, next time it will process the data on another run; use PIDs/Database/semaphores to avoid parallel processes;
update the processes that insert/update data in the database; let the information be processed when it is inserted/updated; c)
write a demon process which will reside in memory and check the data in real time.
cron would definitely be a way to go with this, as well as any other task scheduler you may prefer.
The main point is found in the title to your question:
Run a repeating task for a web app
The background task and the web application should be kept separate. They can share code, they can share access to a database, but they should be separate and discrete application contexts. (Consider them as separate UIs accessing the same back-end logic.)
The main reason for this is because web applications and background processes are architecturally very different and aren't meant to be mixed. Consider the structure of a web application being held within a web server (Apache, IIS, etc.). When is the application "running"? When it is "on"? It's not really a running task. It's a service waiting for input (requests) to handle and generate output (responses) and then go back to waiting.
Web applications are for responding to requests. Scheduled tasks or daemon jobs are for running repeated processes in the background. Keeping the two separate will make your management of the two a lot easier.

suggestions for a daemon that accepts zip files for processing

im looking to write a daemon that:
reads a message from a queue (sqs, rabbit-mq, whatever ...) containing a path to a zip file
updates a record in the database saying something like "this job is processing"
reads the aforementioned archive's contents and inserts a row into a database w/ information culled from file meta data for each file found
duplicates each file to s3
deletes the zip file
marks the job as "complete"
read next message in queue, repeat
this should be running as a service, and initiated by a message queued when someone uploads a file via the web frontend. the uploader doesn't need to immediately see the results, but the upload be processed in the background fairly expediently.
im fluent with python, so the very first thing that comes to mind is writing a simple server with twisted to handle each request and carry out the process mentioned above. but, ive never written anything like this that would run in a multi-user context. its not going to service hundreds of uploads per minute or hour, but it'd be nice if it could handle several at a time, reasonable. i also am not terribly familiar with writing multi-threaded applications and dealing with issues like blocking.
how have people solved this in the past? what are some other approaches i could take?
thanks in advance for any help and discussion!
I've used Beanstalkd as a queueing daemon to very good effect (some near-time processing and image resizing - over 2 million so far in the last few weeks). Throw a message into the queue with the zip filename (maybe from a specific directory) [I serialise a command and parameters in JSON], and when you reserve the message in your worker-client, no one else can get it, unless you allow it to time out (when it goes back to the queue to be picked up).
The rest is the unzipping and uploading to S3, for which there are other libraries.
If you want to handle several zip files at once, run as many worker processes as you want.
I would avoid doing anything multi-threaded and instead use the queue and the database to synchronize as many worker processes as you care to start up.
For this application I think twisted or any framework for creating server applications is going to be overkill.
Keep it simple. Python script starts up, checks the queue, does some work, checks the queue again. If you want a proper background daemon you might want to just make sure you detach from the terminal as described here: How do you create a daemon in Python?
Add some logging, maybe a try/except block to email out failures to you.
i opted to use a combination of celery (http://ask.github.com/celery/introduction.html), rabbitmq, and a simple django view to handle uploads. the workflow looks like this:
django view accepts, stores upload
a celery Task is dispatched to process the upload. all work is done inside the Task.

Categories