Python/Django Prevent a script from being run twice - python

I got a big script that retrieves a lot of data via an API (Magento Invoices). This script is launched by a cron every 20 minutes. But sometimes, we need to refresh manually for getting the last invoiced orders. I have a dedicated page for this.
I would like to prevent from manual launching of the script by testing if it's already running because both API and script take a lot of ressources and time.
I tried to add a "process" model with is_active = True/False that would be tested and avoid re-launching if script is already active. At the start of the script, I switch the process status to TRUE and set it to FALSE when the script has finished.
But it seems that the 2nd instance of the script waits for the first to be finished before starting. At the end, both scripts are run because process.is_active always = False
I also tried with request.session variable, but same issue.
Spent lotsa time on this but didn't find a way to achieve my goal.
Has anyone already faced such a case ?

Related

TaskManager: Process is Terminated At Exactly 2 Hours

I'm running a python script remotely from a task machine and it creates a process that is supposed to be running for 3 hours. However, it seems to be terminating prematurely at exactly 2 hours. I don't believe it is a problem with the code because after the while loop ends, I am logging to a log file. The log file doesn't show that it exits out of that while loop successfully. Is there a specific setting on the machine that I need to look into that's interrupting my python process?
Is this perhaps a Scheduled Task? If so, have you checked the task's properties?
On my Windows 7 machine under the "Settings" tab is a checkbox for "Stop the task if it runs longer than:" with a box where you can specify the duration.
One of the suggested durations on my machine is "2 hours."

Python Script won't run as a scheduled task on Windows 2008 R2

I have multiple recurring tasks scheduled to run several times per day to keep some different data stores in sync with one another. The settings for the 'Actions' tab are as follows:
Action: Start a Program
Program/script: C:\<path to script>.py
Add arguments:
Start in: C:\<directory of script>
I can run the python files just fine if I use the command line and navigate to the file location and use python or even just using python without navigating.
For some reason, the scripts just won't run with a scheduled task. I've checked all over and tried various things like making sure the user profile is set correctly and has all of the necessary privileges, which holds true. These scripts have been working for several weeks now with no problems, so something has changed that we aren't able to identify at this time.
Any suggestions?
Have you tried using:
Action: Start a Program
Program/script: C:\<path to python.exe>\python.exe
Add arguments: C:\\<path to script>\\script.py

Persist Completed Pipeline in Luigi Visualiser

I'm starting to port a nightly data pipeline from a visual ETL tool to Luigi, and I really enjoy that there is a visualiser to see the status of jobs. However, I've noticed that a few minutes after the last job (named MasterEnd) completes, all of the nodes disappear from the graph except for MasterEnd. This is a little inconvenient, as I'd like to see that everything is complete for the day/past days.
Further, if in the visualiser I go directly to the last job's URL, it can't find any history that it ran: Couldn't find task MasterEnd(date=2015-09-17, base_url=http://aws.east.com/, log_dir=/home/ubuntu/logs/). I have verified that it ran successfully this morning.
One thing to note is that I have a cron that runs this pipeline every 15 minutes to check for a file on S3. If it exists, it runs, otherwise it stops. I'm not sure if that is causing the removal of tasks from the visualiser or not. I've noticed it generates a new PID every run, but I couldn't find a way to persist one PID/day in the docs.
So, my questions: Is it possible to persist the completed graph for the current day in the visualiser? And is there a way to see what has happened in the past?
Appreciate all the help
I'm not 100% positive if this is correct, but this is what I would try first. When you call luigi.run, pass it --scheduler-remove-delay. I'm guessing this is how long the scheduler waits before forgetting a task after all of its dependents have completed. If you look through luigi's source, the default is 600 seconds. For example:
luigi.run(["--workers", "8", "--scheduler-remove-delay","86400")], main_task_cls=task_name)
If you configure the remove_delay setting in your luigi.cfg then it will keep the tasks around for longer.
[scheduler]
record_task_history = True
state_path = /x/s/hadoop/luigi/var/luigi-state.pickle
remove_delay = 86400
Note, there is a typo in the documentation ("remove-delay" instead of remove_delay") which is being fixed under https://github.com/spotify/luigi/issues/2133

running 2 python scripts without them effecting each other

I have 2 python scripts I'm trying to run side by side. However, each of them have to open and close and reopen independently from each other. Also, one of the scripts is running inside a shell script.
Flaskserver.py & ./pyinit.sh
Flaskserver.py is just a flask server that needs to be restarted everynow and again to load a new page. (cant define all pages as the html is interchangeable). the pyinit is runs as xinit ./pyinit.sh (its selenium-webdriver pythoncode)
So when the Flaskserver changes and restarts the ./pyinit needs to wait about 20 seconds then restart as well.
Either one of these can create errors so I need to be able to check if Flaskserver has an error before restarting ./pyinit if ./pyinit errors i need to set the Flaskserver to a default value and then relaunch both of them.
I know a little about subprocess but I'm unsure on how it can deal with errors and stop-start code.
Rather than using sub-process I would recommend you to create a different thread for your processes using multithread.
Multithreading will not solve the problem if global variables are colliding, but by running them in different scripts, while you might solve this, you might collide in something else like a log file.
Now, if you keep both processes running from a single process that takes care of keeping them separated and assigning different global variables where necessary, you should be able to keep a better control. Using things like join and lock from the multithreading library, will also ensure that they don't collide and it should be easy to put a process to sleep while the other is running (as per waiting 20 secs).
You can keep a thread list as a global variable, as well as your lock. I have done this successfully with CherryPy's server for example. Any more details about multithreading look into the question I linked above, it's very well explained.

Python re establishment after Server shuts down

I have a python script running on my server which accessed a database, executes a fetch query and runs a learning algorithm to classify and updates certain values and means depending on the query.
I want to know if for some reason my server shuts down in between then my python script would shut down and my query lost.
How do i get to know where to continue from once I re-run the script and i want to carry on the updated means from the previous queries that have happened.
First of all: the question is not really related to Python at all. It's a general problem.
And the answer is simple: keep track of what your script does (in a file or directly in db). If it crashes continue from the last step.

Categories