I have a bunch of .py scripts as part of a project. Some of them i want to start and have running in the background whilst the others run through what they need to do.
For example, I have a script which takes a Screenshot every 10 seconds until the script is closed and i wish to have this running in the background whilst the other scripts get called and run through till finish.
Another example is a script which calculates the hash of every file in a designated folder. This has the potential to run for a fair amount of time so it would be good if the rest of the scripts could be kicked off at the same time so they do not have to wait for the Hash script to finish what it is doing before they are invoked.
Is Multiprocessor the right method for this kind of processing, or is there another way to achieve these results which would be better such as this answer: Run multiple python scripts concurrently
You could also use something like Celery to run the tasks async and you'll be able to call tasks from within your python code instead of through the shell.
It depends. With multiprocessing you can create a process manager, so it can spawn the processes the way you want, but there are more flexible ways to do it without coding. Multiprocessing is usually hard.
Check out circus, it's a process manager written in Python that you can use as a library, standalone or via remote API. You can define hooks to model dependencies between processes, see docs.
A simple configuration could be:
[watcher:one-shot-script]
cmd = python script.py
numprocesses = 1
warmup_delay = 30
[watcher:snapshots]
cmd = python snapshots.py
numprocesses = 1
warmup_delay = 30
[watcher:hash]
cmd = python hashing.py
numprocesses = 1
Related
I was writing the below code but it is running endless in airflow, but in my system it take 5 min to run
gc=pygsheets.authorize(service_account_file='file.json')
sh3 = gc.open("city")
wks3 = sh3.worksheet_by_title("test")
df = wks3.get_as_df()
df2 = demo_r
wks3.clear()
wks3.set_dataframe(df2,(1,1))
Answering just the question in the title because we can't do anything about your code without more details (stack trace/full code sample/infra setup/etc).
Airflow is a Python framework and will run any code you give it. So there is no difference between a Python script run via an Airflow task or just on your laptop -- the same lines of code will be executed. However, do note that Airflow runs Python code in a separate process, and possibly on different machines, depending on your chosen executor. Airflow registers metadata in a database and manages logfiles from your tasks, so there's more happening around your task when you execute it in Airflow.
I am using a commercial application called Abaqus/CAE1 with a built-in Python 2.6 interpreter and API. I've developed a long-running script that I'm attempting to split into simultaneous, independent tasks using Python's multiprocessing module. However, once spawned the processes just hang.
The script itself uses various objects/methods available only through Abaqus's proprietary cae module, which can only be loaded by starting up the Python bundled with Abaqus/CAE first, which then executes my script with Python's execfile.
To try to get multiprocessing working, I've attempted to run a script that avoids accessing any Abaqus objects, and instead just performs a calculation and prints the result to file2. This way, I can run the same script from the regular system Python installation as well as from the Python bundled with Abaqus.
The example code below works as expected when run from the command line using either of the following:
C:\some\path>python multi.py # <-- Using system Python
C:\some\path>abaqus python multi.py # <-- Using Python bundled with Abaqus
This spawns the new processes, and each runs the function and writes the result to file as expected. However, when called from the Abaqus/CAE Python environment using:
abaqus cae noGUI=multi.py
Abaqus will then start up, automatically import its own proprietary modules, and then executes my file using:
execfile("multi.py", __main__.__dict__)
where the global namespace arg __main__.__dict__ is setup by Abaqus. Abaqus then checks out licenses for each process successfully, spawns the new processes, and ... and that's it. The processes are created, but they all hang and do nothing. There are no error messages.
What might be causing the hang-up, and how can I fix it? Is there an environment variable that must be set? Are there other commercial systems that use a similar procedure that I can learn from/emulate?
Note that any solution must be available in the Python 2.6 standard library.
System details: Windows 10 64-bit, Python 2.6, Abaqus/CAE 6.12 or 6.14
Example Test Script:
# multi.py
import multiprocessing
import time
def fib(n):
a,b = 0,1
for i in range(n):
a, b = a+b, a
return a
def workerfunc(num):
fname = ''.join(('worker_', str(num), '.txt'))
with open(fname, 'w') as f:
f.write('Starting Worker {0}\n'.format(num))
count = 0
while count < 1000: # <-- Repeat a bunch of times.
count += 1
a=fib(20)
line = ''.join((str(a), '\n'))
f.write(line)
f.write('End Worker {0}\n'.format(num))
if __name__ == '__main__':
jobs = []
for i in range(2): # <-- Setting the number of processes manually
p = multiprocessing.Process(target=workerfunc, args=(i,))
jobs.append(p)
print 'starting', p
p.start()
print 'done starting', p
for j in jobs:
print 'joining', j
j.join()
print 'done joining', j
1A widely known finite element analysis package
2The script is a blend of a fairly standard Python function for fib(), and examples from PyMOTW
I have to write an answer as I cannot comment yet.
What I can imagine as a reason is that python multiprocessing spawns a whole new process with it's own non-shared memory. So if you create an object in your script, the start a new process, that new process contains a copy of the memory and you have two objects that can go into different directions. When something of abaqus is present in the original python process (which I suspect) that gets copied too and this copy could create such a behaviour.
As a solution I think you could extend python with C (which is capable to use multiple cores in a single process) and use threads there.
Just wanted to say that I have run into this exact issue. My solution at the current time is to compartmentalize my scripting. This may work for you if you're trying to run parameter sweeps over a given model, or run geometric variations on the same model, etc.
I first generate scripts to accomplish each portion of my modelling process:
Generate input file using CAE/Python.
Extract data that I want and put it in a text file.
With these created, I use text replacement to quickly generate N python scripts of each type, one for each discrete parameter set I'm interested in.
I then wrote a parallel processing tool in Python to call multiple Abaqus instances as subprocesses. This does the following:
Call CAE through subprocess.call for each model generation script. The script allows you to choose how many instances to run at once to keep you from taking every license on the server.
Execute the Abaqus solver for the generated models using the same, with parameters for cores per job and total number of cores used.
Extract data using the same process as 1.
There is some overhead in repeatedly checking out licenses for CAE when generating the models, but in my testing it is far outweighed by the benefit of being able to generate 10+ input files simultaneously.
I can put some of the scripts up on Github if you think the process outlined above would be helpful for your application.
Cheers,
Nathan
I want to execute a testrun via bash, if the test needs too much time. So far, I found some good solutions here. But since the command kill does not work properly (when I use it correctly it says it is not used correctly), I decided to solve this problem using python. This is the Execution call I want to monitor:
EXE="C:/program.exe"
FILE="file.tpt"
HOME_DIR="C:/Home"
"$EXE" -vm-Xmx4096M --run build "$HOME_DIR/test/$FILE" "Auslieferung (ML) Execute"
(The opened *.exe starts a testrun which includes some simulink simulation runs - sometimes there are simulink errors - in this case, the execution time of the tests need too long and I want to restart the entire process).
First, I came up with the idea, calling a shell script containing these lines within a subprocess from python:
import subprocess
import time
process = subprocess.Popen('subprocess.sh', shell = True)
time.sleep(10)
process.terminate()
But when I use this, *.terminate() or *.kill() does not close the program I started with the subprocess call.
That´s why I am now trying to implement the entire call in python language. I got the following so far:
import subprocess
file = "somePath/file.tpt"
p = subprocess.Popen(["C:/program.exe", file])
Now I need to know, how to implement the second call "Auslieferung (ML) Execute" of the bash function. This call starts an intern testrun named "Auslieferung (ML) Execute". Any ideas? Or is it better to choose one of the other ways? Or can I get the "kill" option for bash somewhere, somehow?
I'm running a python script manually that fetches data in JSON format.How do I automate this script to run automatically on an hourly basis?
I'm working on Windows7.Can I use tools like Task scheduler?If I can use it,what do I need to put in the batch file?
Can I use tools like Task scheduler?
Yes. Any tool that can run arbitrary programs can run your Python script. Pick the one you like best.
If I can use it,what do I need to put in the batch file?
What batch file? Task Scheduler takes anything that can be run, with arguments—a C program, a .NET program, even a document with a default app associated with it. So, there's no reason you need a batch file. Use C:\Python33\python.exe (or whatever the appropriate path is) as your executable, and your script's path (and its arguments, if any) as the arguments. Just as you do when running the script from the command line.
See Using the Task Scheduler in MSDN for some simple examples, and Task Scheduler Schema Elements or Task Scheduler Scripting Objects for reference (depending on whether you want to create the schedule in XML, or via the scripting interface).
You want to create an ExecAction with Path set to "C:\Python33\python.exe" and Arguments set to "C:\MyStuff\myscript.py", and a RepetitionPattern with Interval set to "PT1H". You should be able to figure out the rest from there.
As sr2222 points out in the comments, often you end up scheduling tasks frequently, and needing to programmatically control their scheduling. If you need this, you can control Task Scheduler's scripting interface from Python, or build something on top of Task Scheduler, or use a different tool that's a bit easier to get at from Python and has more helpful examples online, etc.—but when you get to that point, take a step back and look at whether you're over-using OS task scheduling. (If you start adding delays or tweaking times to make sure the daily foo1.py job never runs until 5 minutes after the most recent hourly foo0.py has finished its job, you're over-using OS task scheduling—but it's not always that obvious.)
May I suggest WinAutomation or AutoMate. These two do the exact same thing, except the UI is a little different. I prefer WinAutomation, because the scripts are a little easier to build.
Yes, you can use the Task Scheduler to run the script on an hourly bases.
To execute a python script via a Batch File, use the following code:
start path_to_python_exe path_to_python_file
Example:
start C:\Users\harshgoyal\AppData\Local\Continuum\Anaconda3\python.exe %UserProfile%\Documents\test_script.py
If python is set as Window’s Environment Window then you can reduce the syntax to:
start python %UserProfile%\Documents\test_script.py
What I generally do is run the batch file once via Task Scheduler and within the python script I call a thread/timer every hour.
class threading.Timer(interval, function, args=None, kwargs=None)
I am trying to constantly monitor a process which is basically a Python program. If the program stops, then I have to start the program again. I am using another Python program to do so.
For example, say I have to constantly run a process called run_constantly.py. I initially run this program manually, which writes its process ID to the file "PID" (in the location out/PROCESSID/PID).
Now I run another program which has the following code to monitor the program run_constantly.py from a Linux environment:
def Monitor_Periodic_Process():
TIMER_RUNIN = 1800
foo = imp.load_source("Run_Module","run_constantly.py")
PROGRAM_TO_MONITOR = ['run_constantly.py','out/PROCESSID/PID']
while(1):
# call the function checkPID to see if the program is running or not
res = checkPID(PROGRAM_TO_MONITOR)
# if res is 0 then program is not running so schedule it
if (res == 0):
date_time = datetime.now()
scheduler.add_cron_job(foo.Run_Module, year=date_time.year, day=date_time.day, month=date_time.month, hour=date_time.hour, minute=date_time.minute+2)
scheduler.start()
scheduler.get_jobs()
time.sleep(TIMER_NOT_RUNIN)
continue
else:
#the process is running sleep and then monitor again
time.sleep(TIMER_RUNIN)
continue
I have not included the checkPID() function here. checkPID() basically checks if the process ID still exists (i.e. if the program is still running) and if it does not exist, it returns 0. In the above program, I check if res == 0, and if so, then I use Python's scheduler to schedule the program. However, the major problem that I am currently facing is that the process ID of this program and the run_constantly.py program turns to be same once I schedule the run_constantly.py using the scheduler.add_cron_job() function. So if the program run_constantly.py crashes, the following program still thinks that the run_constantly.py is running (since both process IDs are same), and therefore continues to go into the else loop to sleep and monitor again.
Can someone tell me how to solve this issue? Is there a simple way to constantly monitor a program and reschedule it when it has crashed?
There are many programs that can do this.
On Ubuntu there is upstart (installed by default)
Lots of people like http://supervisord.org/
monit as mentioned by #nathan
If you are looking for a python alternative there is a library that has just been released called circus which looks interesting.
And pretty much every linux distro probably has one of these built in.
The choice is really just down to which one you like better, but you would be far better off using one of these than writing it yourself.
Hope that helps
If you are willing to control the monitored program directly from python instead of using cron, have a look at the subprocess module :
The subprocess module allows you to spawn new processes,
connect to their input/output/error pipes, and obtain their return codes.
Check examples like track process status with python on SO for examples and references.
You could just use monit
http://mmonit.com/monit/
It monitors processes and restarts them (and other things.)
I thought I'd add a more versatile solution, which is one that I personally use all the time as well.
It's name is Immortal (source is at https://github.com/immortal/immortal)
To have it monitor and instantly restart a program if it stops, simply run the following command:
immortal <command>
So in your case I would run run_constantly.py like so:
immortal python run_constantly.py
The command ps aux | grep run_constantly.py should return 2 process IDs, one for the Immortal command, and one for the separate command Immortal started (just the regular command. As long as the Immortal process is running, run_constantly.py will stay running.