I have a Linux script that I'm looking to automate through subprocess. Each subprocess invocation should run the script in one subdirectory of a parent directory, and each of these subprocesses should run in a separate thread.
The way my directory is organized is as follows:
/parent/p1
/parent/p2....and so on till
/parent/p[n]
The first part of my code aims to run the process across all the subdirectories (p1, p2, p3, etc.). It works fine for a fast process. However, many of my jobs need to run in the background, for which I usually use nohup and manually run them on a separate node, so that every node in my terminal runs the same job, each on its own directory (p1, p2, p3, etc.). The latter part of my code (using threading) aims to achieve this, but what ends up happening is that every thread runs the same process (p1, p1, p1, etc.): basically my entire 'jobs' function is being passed through runSimThreads when I want the jobs separated out over the threads. Would someone know how I could change the threading function to place a different job on each node?
import os
import sys
import subprocess
import os.path
import threading

#takes the argument: python FOLDER_NAME #ofThreads
#Example: python /parent 8
directory = sys.argv[1]  #in my case input is /parent
threads = int(sys.argv[2])  #input is 8
category_name = directory.split('/')[-1]  #splits parent as a word
folder_list = next(os.walk(directory))[1]  #makes a list of subdirectories [p1, p2, p3, ...]

def jobs(cmd):
    for i in folder_list:
        f = open("/vol01/bin/dir/nohup.out", "w")
        cmd = subprocess.call(['nohup', 'python', 'np.py', '{0}/{1}'.format(directory, i)],
                              cwd='/vol01/bin/dir', stdout=f)
    return cmd

def runSimThreads(numThreads):
    threads = []
    for i in range(numThreads):
        t = threading.Thread(target=jobs, args=(i,))
        threads.append(t)
        t.start()
    #Wait for all threads to complete
    main_thread = threading.currentThread()
    for t in threads:
        if t is main_thread:
            continue
        t.join()

runSimThreads(threads)
That can't be your code.
import os
import sys
import subprocess
import os.path
import threading
#takes the argument: python FOLDER_NAME #ofThreads
#Example: python /parent 8
threads = 8 #input is 8
...
...
for t in threads:
    print("hello")
--output:--
TypeError: 'int' object is not iterable
You are using the same variable names everywhere, and that is confusing you (or me?).
You also do this:
def jobs(cmd):
    for i in folder_list:
        f = open("/vol01/bin/dir/nohup.out", "w")
        cmd = "something"
You are overwriting your cmd parameter variable, which means that jobs() shouldn't have a parameter variable.
Response to comment1:
import threading as thr
import time
def greet():
    print("hello world")
t = thr.Thread(target=greet)
t.start()
t.join()
--output:--
hello world
import threading as thr
import time
def greet(greeting):
    print(greeting)
t = thr.Thread(target=greet, args=("Hello, Newman.",) )
t.start()
t.join()
--output:--
Hello, Newman.
Below is the equivalent of what you are doing:
import threading as thr
import time
def greet(greeting):
    greeting = "Hello, Jerry."
    print(greeting)
t = thr.Thread(target=greet, args=("Hello, Newman.",) )
t.start()
t.join()
--output:--
Hello, Jerry.
And anyone reading that code would ask, "Why are you passing an argument to the greet() function when you don't use it?"
I'm relatively new to python
Well, your code does this:
threads = 8
#Other irrelevant stuff here
for t in threads:
    print("hello")
and that will produce the error:
TypeError: 'int' object is not iterable
Do you know why?
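For completeness, here is a minimal sketch of one way to get what the asker describes: put the subdirectories on a queue and let each of the N threads pull its own jobs, so no two threads run the same folder. This is not the original answer's code; np.py and /vol01/bin/dir are taken from the question, and everything else (including one nohup output file per job, to stop the threads clobbering a shared file) is an assumption.
import os
import queue
import subprocess
import sys
import threading

directory = sys.argv[1]          # e.g. /parent
num_threads = int(sys.argv[2])   # e.g. 8

# One job per subdirectory; each worker pulls the next unclaimed one.
job_queue = queue.Queue()
for folder in next(os.walk(directory))[1]:
    job_queue.put(folder)

def worker():
    while True:
        try:
            folder = job_queue.get_nowait()
        except queue.Empty:
            return  # no jobs left for this thread
        # call() blocks only this thread until its job finishes
        with open('nohup_{0}.out'.format(folder), 'w') as f:
            subprocess.call(['nohup', 'python', 'np.py',
                             '{0}/{1}'.format(directory, folder)],
                            cwd='/vol01/bin/dir', stdout=f)

workers = [threading.Thread(target=worker) for _ in range(num_threads)]
for t in workers:
    t.start()
for t in workers:
    t.join()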
Related
I have a main python script which calls several subscripts --> main.py
My first subscript, subscript1.py, runs a few lines of code and then, at the end, opens an external program (putty) using subprocess. The program being opened is a data monitor which I want to keep open the whole time.
I want to return to main.py so that subscript2.py can run.
Problem: the python code doesn't resume until the external program opened by subprocess is closed. How can I keep the subprocess open but carry on with my python code?
main.py:
import subprocess
subprocess.call(['python', 'subscript1.py'])
subprocess.call(['python', 'subscript2.py'])
subscript1.py:
import subprocess

prog_path = 'C:/Programs/PUTTY.exe'
load_config = 'config_to_load'

... lines of code to check for a condition

if outcome_value == 1:
    subprocess.run([prog_path, 'load', load_config])
else:
    print("error message")
If I were you, I would use multiprocessing.Pool:
from multiprocessing import Pool
import time
import subprocess

def do_wait(_):
    subprocess.call(['python', '-c', "import time;time.sleep(1)"])

st = time.time()
with Pool(5) as p:
    print(p.map(do_wait, [1, 2, 3]))
diff = time.time() - st
print(f"total : {diff} sec")
I have two python scripts, script1.py and script2.py. One is a counter which increments an int x independently, and script2.py is to fetch the value of x every 5 seconds. I have tried doing this with multiprocessing, taken verbatim from the following post:
Passing data between separately running Python scripts
and I applied a while True loop in script1. Here is my attempt, but I don't think I understand the general idea, and I am getting various errors, since I am new to Python and I am missing some details.
script1.py:
from multiprocessing import Process, Pipe

x = 0

def function(child_conn):
    global x
    while True:
        x += 1
        print(x)
        child_conn.send(x)
        child_conn.close()
script2.py:
from multiprocessing import Process, Queue, Pipe
from script1 import function
from time import sleep

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=function, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    sleep(5)
Thanks in advance!
You have a loop in the child process but no loop in the parent process. With no loop, the parent can only receive a single message; the child then throws an error because it closed the connection after the first send.
Try this code. Run script2.py to start the process.
script1.py
from multiprocessing import Process, Pipe
from time import sleep

x = 0

def function(child_conn):
    global x
    while True:
        x += 1
        print(x)
        child_conn.send(x)
        #child_conn.close()
        sleep(1)
script2.py
from multiprocessing import Process, Queue, Pipe
from script1 import function
from time import sleep

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=function, args=(child_conn,))
    p.start()
    while True:
        print(parent_conn.recv())
        sleep(1)
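As a side note, both scripts import Queue without using it. A multiprocessing.Queue would work just as well here and reads a little simpler; a rough sketch of the same producer/consumer loop (my variant, not part of the answer above):
from multiprocessing import Process, Queue
from time import sleep

def function(q):
    x = 0
    while True:
        x += 1
        q.put(x)        # producer: push each new value onto the queue
        sleep(1)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=function, args=(q,))
    p.start()
    while True:
        print(q.get())  # consumer: blocks until the next value arrives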
I have main_script.py which imports scripts that get data from webpages. I want to do this using multithreading. I came up with this solution, but it does not work:
main_script:
import threading
import script1

temp_path = ''
thread1 = threading.Thread(target=script1.Main,
                           name='Script1',
                           args=(temp_path, ))
thread1.start()
thread1.join()
script1:
class Main:
    def __init__()
    def some_func()
    def some_func2()
    def __main__():
        some_func()
        some_func2()
        return callback
Right now the only way I know to get the value of callback from script1 into main_script is:
main_script:
import script1

temp_path = ''
# make an instance of the class with temp_path
inst_script1 = script1.Main(temp_path)
print("instance1:")
print(inst_script1.callback)
It works, but then I run the script instances one by one, not concurrently.
Does anybody have any idea how to handle that? :)
First off, if you are using threading in Python, make sure you read: https://docs.python.org/2/glossary.html#term-global-interpreter-lock. Unless you are using C modules or a lot of I/O, you won't see the scripts run concurrently. Generally speaking, multiprocessing.Pool is a better approach.
If you are certain you want threads rather than processes, you can use a mutable variable to store the results. For example, a dictionary which keeps track of the result of each thread.
import threading

result = {}

def test(val, name, target):
    # store this thread's result under its own key
    target[name] = val * 4

temp_path = 'ASD'
thread1 = threading.Thread(target=test,
                           name='Script1',
                           args=(temp_path, 'A', result))
thread1.start()
thread1.join()
print(result)
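If the mutable-container trick feels clunky, concurrent.futures (Python 3.2+, backported as the futures package) hands a thread's return value back directly. A minimal sketch with a hypothetical test function of the same shape:
from concurrent.futures import ThreadPoolExecutor

def test(val):
    return val * 4

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(test, 'ASD')  # runs test('ASD') on a worker thread
    print(future.result())             # blocks until done; prints ASDASDASDASD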
Thanks for the response. Yes, I read about the GIL, but it hasn't caused me any problems yet. I've generally solved my problem, because I found a solution on another website. The code looks like this now:
Main_script:
import queue
import threading
import script1
import script2

queue_callbacks = queue.Queue()
threads_list = list()
callbacks = list()

temp_path1 = ''
thread1 = threading.Thread(target=lambda q, arg1: q.put(script1.Main(arg1)),
                           name='Script1',
                           args=(queue_callbacks, temp_path1, ))
thread1.start()
threads_list.append(thread1)

temp_path2 = ''
thread2 = threading.Thread(target=lambda q, arg1: q.put(script2.Main(arg1)),
                           name='Script2',
                           args=(queue_callbacks, temp_path2, ))
thread2.start()
threads_list.append(thread2)

for t in threads_list:
    t.join()

while not queue_callbacks.empty():
    result = queue_callbacks.get()
    callbacks.append({"service": result.service, "callback": result.callback, "error": result.error})
And this works fine. Now I have another problem: I want this to work at a bigger scale, where I have hundreds of scripts and handle them with e.g. 5 threads.
In general, is there any limit to the number of threads running at any one time?
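There is no fixed limit in Python itself (the OS caps thread counts eventually, and each thread costs memory), but for hundreds of scripts you rarely want one thread per script anyway. A sketch of capping the pool at 5 workers with concurrent.futures, assuming each script exposes a Main(path) callable as above:
from concurrent.futures import ThreadPoolExecutor, as_completed

import script1
import script2

# Stand-in for the hundreds of (callable, path) jobs.
jobs = [(script1.Main, ''), (script2.Main, '')]

callbacks = []
with ThreadPoolExecutor(max_workers=5) as pool:  # at most 5 run at once
    futures = [pool.submit(main, path) for main, path in jobs]
    for future in as_completed(futures):
        callbacks.append(future.result())
print(callbacks)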
I have two scripts, new.py and test.py.
Test.py
import time

while True:
    x = "hello"
    time.sleep(1)
    x = "world"
    time.sleep(1)
new.py
import time

while True:
    import test
    x = test.x
    print(x)
    time.sleep(1)
Now, from my understanding, this should print "hello" and, a second later, "world", over and over while new.py is executing.
It does not print anything. How can I fix that?
Thanks
I think the code below captures what you are asking. Here I simulate two scripts running independently (by using threads), then show how you can use shelve to communicate between them. Note that there are likely much better ways to get to what you are after, but if you absolutely must run the scripts independently, this will work for you.
Incidentally, any persistent store would do (such as a database).
import shelve
import time
import threading

def script1():
    while True:
        with shelve.open('my_store') as holder3:
            if holder3['flag'] is not None: break
        print('waiting')
        time.sleep(1)
    print("Done")

def script2():
    print("writing")
    with shelve.open('my_store') as holder2:
        holder2['flag'] = 1

if __name__ == "__main__":
    with shelve.open('my_store') as holder1:
        holder1['flag'] = None

    t = threading.Thread(target=script1)
    t.start()

    time.sleep(5)
    script2()
    t.join()
Yields:
waiting
waiting
waiting
waiting
waiting
writing
Done
Your import never returns because test.py runs its infinite loop at import time, which is why nothing prints. Put the work in a function and call that from new.py instead:
Test.py
import time

def hello():
    callList = ['hello', 'world']
    for item in callList:
        print(item)
        time.sleep(1)

hello()
new.py
from test import hello

while True:
    hello()
I would expect the code below to execute simultaneously, so that all filenames from the os.walk iteration that got 0 as their random timeout would get into the result list right away. And all threads that have some timeout would go into daemon mode and be killed as soon as the script reaches its end. However, the script respects all timeouts for each thread.
Why is this happening? Shouldn't it put all threads in the background and kill them if they do not finish and return a result before the end of the script's execution? Thank you.
import threading
import os
import time
import random

def check_file(file_name, timeout):
    time.sleep(timeout)
    print file_name
    result.append(file_name)

result = []
for home, dirs, files in os.walk("."):
    for ifile in files:
        filename = '/'.join([home, ifile])
        t = threading.Thread(target=check_file(filename, random.randint(0,5)))
        t.setDaemon(True)
        t.start()

print result
Solution: I found my mistake:
t = threading.Thread(target=check_file(filename, random.randint(0,5)))
has to be
t = threading.Thread(target=check_file, args=(filename, random.randint(0,5)))
In this case, threading will spawn a thread with the function as the target and give it the arguments. In my initial example, the function call with its args was evaluated BEFORE the thread spawned, and that is fair.
However, the example above works for me on 2.7.3, but on 2.7.2 I cannot make it work. I was getting an exception that the function check_file accepts exactly 1 argument (34 given).
Solution: on 2.7.2 I had to put a trailing comma in the args tuple, given that I have only one variable. God knows why this does not affect the 2.7.3 version. It was
t = threading.Thread(target=check_file, args=(filename))
and it started to work with
t = threading.Thread(target=check_file, args=(filename,))
I understand what you were trying to do, but you're not using the right format for threading. I fixed your example; look up the Queue class for how to do this properly.
Secondly, never ever do string manipulation on file paths. Use the os.path module; there's a lot more to joining paths than sticking separators between strings, and os.path handles the cases you and I don't think about most of the time.
Good luck!
import threading
import os
import time
import random
import Queue

def check_file():
    while True:
        item = q.get()
        time.sleep(item[1])
        print item
        result.append(item[0])  # record the checked filename
        q.task_done()

q = Queue.Queue()
result = []

for home, dirs, files in os.walk("."):
    for ifile in files:
        filename = os.path.join(home, ifile)
        q.put((filename, random.randint(0, 5)))

number_of_threads = 25
for i in range(number_of_threads):
    t = threading.Thread(target=check_file)
    t.daemon = True
    t.start()

q.join()
print result