I have a project built with Flask and Jinja2, and the web UI is just a table like the one in this image.
All the routes trigger backend programs that scrape websites, and the app works fine, but I wonder how to make it more flexible.
I thought of adding an auto-run that triggers all the routes with one button, with each route showing its own status.
That column would be Status, with values like running, done, stopped, not running, etc., but I can't figure out the logic.
I have already built the auto-run and it works fine; my question is just how to know whether each job is running, done, stopped, or not running in the background.
Any ideas are really appreciated. This is my own project, so I'm very excited to make it work.
The simplest way to do this is to log each stage of the scraping process. For example:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://www.google.com/')
print("Loaded Google.com")
some_task = driver.find_element(By.XPATH, '//button[text()="Some text"]')
print("Got some task")
(Locating elements as per: https://selenium-python.readthedocs.io/locating-elements.html)
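The print() calls above only reach the console. A small step up is one named logger per route, so each stage is recorded with a timestamp somewhere the web app can read it. This is a minimal sketch using the standard logging module; the logger names (scraper.<route>) and the status words are assumptions, and the Selenium calls are left as comments.

```python
import logging


def get_route_logger(route_name: str) -> logging.Logger:
    """Return the logger for one route; where records go is set by logging config."""
    # Hypothetical naming scheme: one child logger per scraping route.
    return logging.getLogger(f"scraper.{route_name}")


def run_scraper(route_name: str) -> None:
    """Stand-in for the Selenium steps above, logging each stage instead of printing."""
    log = get_route_logger(route_name)
    log.info("RUNNING")
    # driver = webdriver.Chrome(); driver.get("https://www.google.com/")
    log.info("Loaded Google.com")
    # some_task = driver.find_element(By.XPATH, '//button[text()="Some text"]')
    log.info("Got some task")
    log.info("DONE")
```

Configure the destination once at startup, for example logging.basicConfig(filename="scraper.log", level=logging.INFO, format="%(asctime)s %(name)s %(message)s"); the last line logged for a route is then its current status.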
However, for real-time task status and better efficiency, you can use Celery.
Celery works well for web scraping because it lets you asynchronously offload work from your Python app to workers through task queues.
You can then query the state of each task from your Flask app. See: https://docs.celeryq.dev/en/stable/reference/celery.states.html
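Celery itself needs a broker (such as Redis or RabbitMQ) and a separate worker process, so as a dependency-free illustration of the same idea, here is a minimal sketch built on concurrent.futures that tracks a per-task state using names borrowed from celery.states. This is not Celery's API: submit() and state() are hypothetical helpers. With real Celery you would call your_task.delay(...) and read celery_app.AsyncResult(task_id).state, and note that Celery only reports STARTED when task_track_started is enabled.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

# State names borrowed from celery.states; this is a stdlib stand-in, not Celery.
PENDING, STARTED, SUCCESS, FAILURE = "PENDING", "STARTED", "SUCCESS", "FAILURE"

_executor = ThreadPoolExecutor(max_workers=4)  # plays the role of the worker pool
_states: dict[str, str] = {}
_lock = threading.Lock()


def _set_state(task_id: str, new_state: str) -> None:
    with _lock:
        _states[task_id] = new_state


def submit(func, *args) -> str:
    """Queue func(*args) in the background and return an id the UI can poll."""
    task_id = uuid.uuid4().hex
    _set_state(task_id, PENDING)

    def wrapper():
        _set_state(task_id, STARTED)
        try:
            result = func(*args)
        except Exception:
            _set_state(task_id, FAILURE)
            raise  # re-raise so the Future still records the exception
        _set_state(task_id, SUCCESS)
        return result

    _executor.submit(wrapper)
    return task_id


def state(task_id: str) -> str:
    """Unknown ids report PENDING, which is also what Celery does."""
    with _lock:
        return _states.get(task_id, PENDING)
```

The "Run all" button would call submit() once per route and keep the returned ids; the Status column then shows state(id) for each row.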
An easy and efficient approach is to use AJAX to periodically check a log file for the status of each process and update the DOM element...
I would suggest having a separate log file where the backend Flask processes record the status they are currently in. For example, initially everything has the status "SLEEP", and once a process is triggered, it changes its corresponding log status to "RUNNING"... Once the process ends, it changes the log status to "DONE".
Use AJAX from the front end to fetch the log file (through an endpoint that reads it) every N seconds, and update the DOM status element based on the status parsed from it.
P.S. You can also add animation effects, like a spinner, to the DOM element of a running process.
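A minimal sketch of the status-file idea, assuming one small text file per route under a hypothetical status/ directory; one file per route avoids two scrapers overwriting each other's entries in a shared file. Each write goes to a temporary file that is then swapped in with os.replace, so the endpoint polled by AJAX never reads a half-written status. The "SLEEP" default follows the wording of the answer.

```python
import os
import tempfile
from pathlib import Path

STATUS_DIR = Path("status")  # hypothetical location: one <route>.status file per route


def set_status(route: str, status: str, status_dir: Path = STATUS_DIR) -> None:
    """Record a route's status ("RUNNING", "DONE", ...) atomically."""
    status_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the old one.
    fd, tmp_path = tempfile.mkstemp(dir=str(status_dir), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(status)
    os.replace(tmp_path, status_dir / f"{route}.status")


def read_statuses(routes, status_dir: Path = STATUS_DIR) -> dict:
    """Return {route: status}, defaulting to "SLEEP" for routes never started."""
    statuses = {}
    for route in routes:
        try:
            statuses[route] = (status_dir / f"{route}.status").read_text()
        except FileNotFoundError:
            statuses[route] = "SLEEP"
    return statuses
```

Each scraper would call set_status(name, "RUNNING") when it starts and set_status(name, "DONE") when it ends, and a Flask route could return jsonify(read_statuses(ROUTES)) for the page to fetch every N seconds (ROUTES being your own list of route names).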
Related
I have a Flask server and I'm trying to run a script in the background to enable a pairing process on a Raspberry Pi. I have a button to enable and disable it, which works fine.
I use process = subprocess.Popen(["python3","bt.py"]) to run the process and then process.kill() when I need to stop it.
But once the task stops, I need to update the page with the new device information, and I'm having trouble detecting via Flask when the pairing script stops. I know I can call process.poll() to check whether the subprocess is still running, but I can't think of any way to implement this in Flask, as it would need to run in a loop, which would stop the client from receiving the page.
The only thing I think could work would be to edit a file from the bt.py script and have the JS part of my Flask app detect the change in the file and cause a redirect. However, this seems clunky and feels like bad practice. Any suggestions would be great.
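One way around the blocking loop is to move the loop into the browser: Flask only answers a quick, non-blocking status request, and the page's JavaScript polls it every few seconds and reloads when the status changes. A minimal sketch of the server side, assuming a single module-level process handle; start_pairing, stop_pairing and pairing_status are hypothetical names, and sys.executable stands in for "python3".

```python
import subprocess
import sys

_process = None  # one pairing process at a time, as in the question


def start_pairing(cmd=None) -> None:
    global _process
    # Defaults to running bt.py with the current interpreter.
    _process = subprocess.Popen(cmd or [sys.executable, "bt.py"])


def stop_pairing() -> None:
    if _process is not None and _process.poll() is None:
        _process.kill()
        _process.wait()  # reap the child so it does not linger as a zombie


def pairing_status() -> str:
    """Never blocks: poll() returns None while the child is still running."""
    if _process is None:
        return "not running"
    returncode = _process.poll()
    if returncode is None:
        return "running"
    return "done" if returncode == 0 else "stopped"
```

A route such as /pairing/status could simply return {"status": pairing_status()} (Flask 1.1+ serializes a returned dict to JSON); when the JavaScript sees "done", it reloads the page to show the new device information.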
So I am currently writing a script that will let me wait on a website that has a queue page before I can access its contents.
Essentially, the queue page is where they let people in randomly. To increase my chance of getting in faster, I am writing a multithreaded script and having each thread wait in line.
The first thing that came to mind: would session.get() work in this case?
If I send a session GET request every 10 seconds, would I hold my position in the queue? Or would I end up at the back?
Some info about the website: they let people in randomly. I am not sure whether refreshing the page resets your chance or not, but the best thing would be to leave the page open and let it do its thing.
I could use PhantomJS, but I would rather not have over 100 headless browsers open, slowing down my program and computer.
You don't need to keep sending the session; as long as you keep the Python application running, you should be good.
I have an anchor tag that hits a route which generates a report in a new tab. I am lazy-loading the report specs because I don't want copies of my data both in the original place and on the report object. But collecting that data takes 10-20 seconds.
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/report/')
@app.route('/report/<id>')
def report(id=None):
    # Blocks this request for 10-20 seconds while the data is collected
    report_specs = function_that_takes_20_seconds(id)
    return render_template('report.html', report_specs=report_specs)
I'm wondering what I can do so that the server responds immediately with a spinner and then, when function_that_takes_20_seconds is done, loads the report.
You are right: an HTTP view is not the place for long-running tasks.
You need to think about your system architecture: what you can prepare outside the view, and what computations must happen in real time in a view.
The usual solutions involve processing your data asynchronously in a separate process. People often use schedulers like Celery for this.
For example:
- Prepare the data in a scheduled process that runs, e.g., every 10 minutes.
- Cache the results by storing them in a database.
- Have the HTTP view always return the last cached version.
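The scheduled-refresh pattern in the list above can be sketched in a single process with a background thread. This is a simplified stand-in: it keeps the cached value in memory instead of a database, and ReportCache is a hypothetical class, not part of Flask or Celery.

```python
import logging
import threading
import time


class ReportCache:
    """Recompute a slow result on a timer; readers always get the last finished copy."""

    def __init__(self, compute, interval_seconds=600.0):
        self._compute = compute            # e.g. a wrapper around function_that_takes_20_seconds
        self._interval = interval_seconds  # 600 s matches "every 10 minutes"
        self._lock = threading.Lock()
        self._value = None                 # stays None until the first refresh finishes
        self._updated_at = None

    def refresh(self):
        value = self._compute()            # the slow part runs outside the lock
        with self._lock:
            self._value, self._updated_at = value, time.time()

    def start(self):
        """Refresh forever in a daemon thread so requests never wait for it."""
        def loop():
            while True:
                try:
                    self.refresh()
                except Exception:
                    logging.exception("report refresh failed; keeping previous value")
                time.sleep(self._interval)
        threading.Thread(target=loop, daemon=True).start()

    def get(self):
        """Return (value, timestamp) right away, without waiting for a refresh."""
        with self._lock:
            return self._value, self._updated_at
```

The view would call cache.get() and render a "report not ready yet" page while the value is still None. Under uwsgi with several worker processes, each process would hold its own copy, which is one reason the answer suggests storing the results in a database.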
Alternatively, make your view do an AJAX call via JavaScript ("the spinner approach"). This requires some basic JavaScript skills. However, this doesn't make the results appear to the end user any faster; it's just user-experience smoke and mirrors.
I am building a GUI/dashboard that takes input from the user. When the user presses Submit, I gather the response and fire a job in the back end.
What I am hoping to achieve:
When the user presses Submit and the long job is being processed, I show something like "Your job has been submitted successfully".
When it has finished, I refresh the page to take the user to the new page.
Here is what my route.py snippet looks like:
from flask import flash, render_template, request

@app.route('/', methods=['POST'])
def get_data():
    data = request.form
    for data_tuple in data:
        requests_to_gb[data_tuple] = data[data_tuple]
    flag = execute_request(requests_to_gb)  # <--- fires a job (blocks until it finishes)
    if flag:
        flash("Your request has been submitted")
    else:
        flash("Request could not be completed!")
    return render_template('request_submitted.html')
But the issue is that the line where I call execute_request() takes a long time to process, and everything is halted until it finishes.
How do I handle this?
And how do I automatically refresh to the new page as well?
Use Celery, which is a distributed task queue. The quickstart guide should get you going.
In a nutshell, it lets you offload tasks to workers that run in the background so that you don't block your main user interface (which is what you are trying to prevent).
The good news is that it is easy to integrate with Flask (or Django, or anything else, really) since it's written in Python.
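For the "automatically refresh" part: once the job runs in a Celery worker, the submit route can return at once and redirect to a status page, and that page can reload itself with an HTML meta refresh until the job is done. A minimal sketch of the page body, assuming the state string comes from something like celery_app.AsyncResult(job_id).state; status_page and the /status/<job_id> and results URLs are hypothetical.

```python
from html import escape


def status_page(job_state: str, result_url: str, poll_seconds: int = 5) -> str:
    """Body for a hypothetical /status/<job_id> page, chosen by the job's state."""
    url = escape(result_url)
    if job_state == "SUCCESS":
        # Finished: a zero-second refresh sends the browser on to the results page.
        return (f'<meta http-equiv="refresh" content="0; url={url}">'
                f'<p>Done. <a href="{url}">Go to your results</a>.</p>')
    if job_state == "FAILURE":
        # Stop refreshing so the user is not stuck in a reload loop.
        return "<p>Request could not be completed!</p>"
    # PENDING, STARTED, or anything else: reload this same page every poll_seconds.
    return (f'<meta http-equiv="refresh" content="{poll_seconds}">'
            f"<p>Your job has been submitted successfully (state: {escape(job_state)}).</p>")
```

The meta refresh needs no JavaScript; an AJAX poll that swaps in the result is the smoother alternative if you want to avoid full-page reloads.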
I have an application made up of the following parts:
client -> nginx -> uwsgi (python)
Some Python scripts can run for a long time (2-6 minutes). After the script finishes, I should return content to the client, but the connection breaks with the error "504 Gateway Timeout". What can I use to avoid this error?
So is your goal to reduce the run time of the scripts, or to keep them from timing out? Browsers are going to give up on a 6-minute request no matter what you try.
Perhaps try doing the work on the server, and then polling for progress with AJAX requests?
Or, if possible, try optimizing the scripts. For example, if you have some horribly slow SQL stuff going on, try cleaning that up.
Otherwise, without more information, a more specific answer is hard to give.
I once set up a system where the "main page" contained an iframe that showed the output of the long-running program as text/plain. I think the handler for the iframe content was a Python CGI script that emitted all headers and then the program output line by line, under an Apache server.
I don't know whether this would work under your configuration.
This heavily depends on your server setup (i.e. how easy it is to push data back to the client), but is it possible, while your lengthy application runs, to periodically send some "null" content (e.g. plain newlines, assuming your output is HTML) so that the browser thinks this is just a slow connection and not a stalled one?
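A sketch of the "send null content" idea as a generator, which Flask can stream if the view returns Response(stream_with_heartbeat(run_script), mimetype="text/plain"), where run_script is a placeholder for your long job. It runs the work in a thread and yields a newline every heartbeat_seconds until the work finishes. With nginx in front, the stream may be buffered unless uwsgi_buffering is off (or the app sends an "X-Accel-Buffering: no" header), and the 504 comes from uwsgi_read_timeout (60 s by default), which measures the gap between two successive reads, so heartbeats more frequent than that should keep the connection alive. Whether browsers and any other proxies cooperate is not guaranteed; the AJAX polling approach suggested earlier is usually the sturdier option.

```python
import threading


def stream_with_heartbeat(work, heartbeat_seconds=10.0, heartbeat="\n"):
    """Run work() in a thread; yield heartbeats until it ends, then its string result."""
    result = {}

    def target():
        try:
            result["value"] = work()
        except Exception as exc:  # report the failure in the stream instead of hanging
            result["error"] = exc

    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    while True:
        thread.join(timeout=heartbeat_seconds)  # wait up to one heartbeat interval
        if not thread.is_alive():
            break
        yield heartbeat  # "null" content that keeps the connection busy
    if "error" in result:
        yield f"Error: {result['error']}\n"
    else:
        yield result["value"]
```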