Automation of program execution on linux - python

I have to perform several experiments in order to analyse certain results for a class assignment. For each result, I have to run a line of code from the terminal, varying the parameters. I was wondering if there is a way to automate this and save me the trouble of running the program and changing the values each time. Here is the line of code that I have to execute each time:
teaa/examples/scripts/run-python.sh teaa/examples/python/rf_mnist.py \
--numTrees 10 --maxDepth 7 --pcaComponents 40
I would like this command to be executed n times automatically, with the parameters numTrees, maxDepth and pcaComponents varying over a set range of values on each execution.
I have not really tried any solutions yet. I have never programmed in the terminal and I have no idea where to start.

For more involved process operations you should use the subprocess module in the Python standard library, but for a one-off run of a command line like this you can use the plain old os.system function. The following code:
import os

for nt in range(1, 21):
    for md in range(1, 11):
        for pc in range(20, 100, 10):
            os.system(f'teaa/examples/scripts/run-python.sh teaa/examples/python/rf_mnist.py --numTrees {nt} --maxDepth {md} --pcaComponents {pc}')
will run your program for numTrees = 1, 2, ..., 20, maxDepth = 1, ..., 10 and pcaComponents = 20, 30, ..., 90.
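If you later want to check exit codes or capture each run's output, the subprocess module mentioned above can drive the same sweep. A minimal sketch, assuming the same script paths and parameter ranges as before (itertools.product generates the same grid as the nested loops):

```python
import itertools
import subprocess

script = 'teaa/examples/scripts/run-python.sh'
prog = 'teaa/examples/python/rf_mnist.py'

# Build the full parameter grid up front: 20 * 10 * 8 = 1600 combinations.
grid = list(itertools.product(range(1, 21), range(1, 11), range(20, 100, 10)))

for nt, md, pc in grid:
    cmd = [script, prog,
           '--numTrees', str(nt), '--maxDepth', str(md), '--pcaComponents', str(pc)]
    try:
        # check=True raises on a non-zero exit status, unlike os.system
        subprocess.run(cmd, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError) as e:
        print('run failed:', nt, md, pc, e)
        break  # stop the sweep on the first failure
```

Passing the command as a list also avoids any shell-quoting issues with the parameter values.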


Selenium multiprocess python3

I have a list of 42000 elements that I need to pass one by one to a function that fetches data from a site, and it is taking too long for even 100 elements. Is there any way I can make it faster? The main problem is that the driver opens and returns the data for every element, so with my code it has to open and close 42000 times to get all the data.
here's the list :
zip_codes = ["00520", "43224", ..42000 zip codes]
here's my function which is getting the data :
import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

articles = []

def zipread(zipcode):
    driver = webdriver.Firefox(executable_path=r'C:\Program Files (x86)\geckodriver.exe')
    driver.get("https://apps.pnc.com/locator/search")
    search = driver.find_element_by_id("extTxtSearchText")
    search.send_keys(zipcode)
    search.click()
    search.send_keys(Keys.RETURN)
    time.sleep(7)
    a = driver.find_element_by_class_name("branch-address").text
    articles.append(a)

for code in zip_codes:
    zipread(code)
This is the work that happens for each zip code. What should I do to make it quicker?
So, you're entering the world of multiprocessing and multithreading! It's not that hard, you just need to get used to it. I see two solutions for your problem; I don't know which one is best, but I have my idea. The easiest one:
Try bash!
Your Python program will take one zip code as an argument, and you will call the program as many times as you want, e.g. python3 your_program.py [zip_code], but from a bash script.
Bash is a scripting language, so you can write something like this:
#!/bin/bash
zip_codes=("00520" "10000" ... "42000")
for i in "${!zip_codes[@]}"
do
    python3 your_program.py "${zip_codes[$i]}" &
done
The "&" runs each command as a background process, so each time your program is called the script won't wait for the previous run to finish, and so on!
I'm not 100 percent sure about the code, but you know what I mean!
The other option is to use multiprocessing in Python; look at the multiprocessing module in the standard library. You can apply the same logic as the bash approach: you spawn a process for each run of your program!
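That multiprocessing route could look roughly like this. Here zipread is a hypothetical stand-in for the real Selenium function; in practice you would create one headless driver per worker process rather than one per zip code:

```python
import multiprocessing

# Hypothetical stand-in for the real zipread(): the actual version would
# drive a headless browser and return the scraped branch address.
def zipread(zipcode):
    return zipcode + ": fetched"

if __name__ == "__main__":
    zip_codes = ["00520", "43224", "10001", "60601"]
    # four worker processes split the list between them
    with multiprocessing.Pool(processes=4) as pool:
        articles = pool.map(zipread, zip_codes)
    print(articles)
```

pool.map preserves the input order, so the results line up with the original zip_codes list.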
HUGE TIP: run Selenium in headless mode, it's really important in terms of performance.
GL

Nuke Python run the same script / node multiple times without scriptOpen and scriptClose?

I've got a python script for (Foundry) Nuke that listens to commands and for every command received executes a Write node. I've noticed that if I don't do nuke.scriptOpen(my_renderScript) before nuke.execute(writeNode,1,1) and then after do nuke.scriptClose(my_renderScript), then the write command seems to execute but nothing is written to file, despite me changing knob values before I call execute again.
The reason I want to not use scriptOpen and scriptClose every time I execute -the same node- is for performance. I'm new to nuke, so correct me if I'm wrong, but it's inefficient to unload and reload a script every time you want to run a node inside it, right?
[EDIT] Here's a simple test script. Waits for command line input and runs the function, then repeats. If I move the script open and script close outside the looping / recursive function, then it will only write to file once, the first time. On subsequent commands it will "run", and nuke will output "Total render time: " in the console (render time will be 10x faster since it's not writing / doing anything) and pretend it succeeded.
# Nuke12.2.exe -nukex -i -t my_nukePython.py render.nk
# Then it asks for user input. The input should be:
# "0,1,0,1", "1024x1024", "C:/0000.exr", "C:/Output/", "myOutput####.png", 1, 1
# then just keep spamming it and see.
import nuke
import os
import sys
import colorsys

renderScript = sys.argv[1]
nuke.scriptOpen(renderScript)
readNode = nuke.toNode("Read1")
gradeNode = nuke.toNode("CustomGroup1")
writeNode = nuke.toNode("Write1")

def runRenderCommand():
    cmdArgs = input("enter render command: ")
    print cmdArgs
    if len(cmdArgs) != 7:
        print "Computer says no. Try again."
        runRenderCommand()
    nuke.scriptOpen(renderScript)
    colorArr = cmdArgs[0].split(",")
    imageProcessingRGB = [float(colorArr[0]), float(colorArr[1]), float(colorArr[2]), float(colorArr[3])]
    previewImageSize = cmdArgs[1]
    inputFileLocation = cmdArgs[2]
    outputFileLocation = cmdArgs[3]
    outputFileName = cmdArgs[4]
    startFrameToExecute = cmdArgs[5]
    endFrameToExecute = cmdArgs[6]
    readNode.knob("file").setValue(inputFileLocation)
    writeNode.knob("file").setValue(outputFileLocation + outputFileName)
    gradeNode.knob("white").setValue(imageProcessingRGB)
    print gradeNode.knob("white").getValue()
    nuke.execute(writeNode.name(), 20, 20, 1)
    runRenderCommand()
    nuke.scriptClose(renderScript)

runRenderCommand()
The problem was between the chair and the screen. It turns out my example works. My actual code, which I didn't include in the example, was a bit more complex and involved websockets.
But anyway, it turns out I didn't know how Python scoping syntax works ^__^
I was making exactly this error in understanding how the global keyword should be used:
referenced before assignment error in python
So now it indeed works without opening and closing the Nuke file every time. Funny how the local scope declaration in my Python code made it look like there were no errors at all... This is why nothing's sacred in scripting languages :)
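For anyone hitting the same thing, the linked error boils down to this (plain Python, nothing Nuke-specific):

```python
count = 0

def bump():
    # Without this "global" line, count += 1 raises
    # UnboundLocalError: local variable 'count' referenced before assignment,
    # because assignment anywhere in the function makes the name local.
    global count
    count += 1

bump()
bump()
print(count)  # 2
```

Reading a global is fine without the declaration; it's assigning to it that needs the global keyword.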
Is there a way to delete this question on grounds that the problem turns out was completely unrelated to the question?
Well, that took an unexpected turn. So yes, I had the global problem. BUT ALSO I WAS RIGHT in my original question! It turns out that, depending on the nodes you're running, Nuke can think that nothing has changed (probably the internal hash doesn't change) and therefore decide it doesn't need to execute the write command. In my case I was giving it new parameters, but the parameters were the same (telling it to render the same frame again).
If I add this global counter to the write node frame count (even though the source image only has 1 frame), then it works.
nuke.execute(m_writeNode.name(),startFrameToExecute+m_count,endFrameToExecute+m_count, continueOnError = False)
m_count+=1
So I gotta figure out how to make it render the write node without changing frames, as later on I might want to use actual frames not just bogus count increments.

How to transfer a value from a function in one script to another script without re-running the function (python)?

I'm really new to programming in general and very inexperienced, and I'm learning python as I think it's more simple than other languages. Anyway, I'm trying to use Flask-Ask with ngrok to program an Alexa skill to check data online (which changes a couple of times per hour). The script takes four different numbers (from a different URL) and organizes it into a dictionary, and uses Selenium and phantomjs to access the data.
Obviously, this exceeds the 8-10 second maximum runtime for an intent before Alexa decides that it's taken too long and returns an error message (I know it's timing out, since ngrok and the Python log would show any actual error, and it invariably happens after 8-10 seconds even though at that point it should be in the middle of the script). I've read that I could just reprompt it, but I don't know how, and that would only buy me 8-10 more seconds, while the script usually takes about 25 seconds just to get the data from the internet (and then maybe a second to turn it into a dictionary).
I tried putting the getData function right after the intent that runs when the Alexa skill is first invoked, but it only runs when I initialize my local server and just holds the data for every new Alexa session. Because the data changes frequently, I want it to perform the function every time I start a new session for the skill with Alexa.
So, I decided just to outsource the function that actually gets the data to another script, and make that other script run constantly in a loop. Here's the code I used.
import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

def getData():
    username = '' #username hidden for anonymity
    password = '' #password hidden for anonymity
    browser = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
    browser.get("https://gradebook.com") #actual website name changed
    browser.find_element_by_name("username").clear()
    browser.find_element_by_name("username").send_keys(username)
    browser.find_element_by_name("password").clear()
    browser.find_element_by_name("password").send_keys(password)
    browser.find_element_by_name("password").send_keys(Keys.RETURN)
    global currentgrades
    currentgrades = []
    gradeids = ['2018202', '2018185', '2018223', '2018626', '2018473', '2018871', '2018886']
    for x in range(0, len(gradeids)):
        try:
            gradeurl = "https://www.gradebook.com/grades/"
            browser.get(gradeurl)
            grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:3]
            if grade[2] != "%":
                grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:4]
            if grade[1] == "%":
                grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:1]
            currentgrades.append(grade)
        except Exception:
            currentgrades.append('No assignments found')
            continue
    dictionary = {"class1": currentgrades[0], "class2": currentgrades[1], "class3": currentgrades[2], "class4": currentgrades[3], "class5": currentgrades[4], "class6": currentgrades[5], "class7": currentgrades[6]}
    return dictionary

def run():
    dictionary = getData()
    time.sleep(60)
That script runs constantly and does what I want, but then in my other script, I don't know how to just call the dictionary variable. When I use
from getdata import dictionary
in the Flask-ask script it just runs the loop and constantly gets the data. I just want the Flask-ask script to take the variable defined in the "run" function and then use it without running any of the actual scripts defined in the getdata script, which have already run and gotten the correct data. If it matters, both scripts are running in Terminal on a MacBook.
Is there any way to do what I'm asking about, or are there any easier workarounds? Any and all help is appreciated!
It sounds like you want to import the function so you can run it, rather than importing the dictionary.
try deleting the run function and then in your other script
from getdata import getData
Then each time you write getData() it will run your code and get a new up-to-date dictionary.
Is this what you were asking about?
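One general Python detail that matters here (a sketch, not your actual getdata.py): everything at a module's top level runs once at import time, so keep any run-forever loop behind a __main__ guard:

```python
# getdata.py (sketch): importing this module only runs the def statement;
# the guarded block at the bottom runs only when the file is executed directly.

def getData():
    # hypothetical placeholder for the real scraping code
    return {"class1": "95%", "class2": "88%"}

if __name__ == "__main__":
    # run once here for demonstration; a standalone script could loop instead
    print(getData())
```

With that guard in place, the Flask-Ask script can do `from getdata import getData` and call `getData()` whenever it needs fresh data, without triggering any loop at import time.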
This issue has been resolved.
As for the original question, I didn't figure out how to make it just import the dictionary instead of first running the function to generate the dictionary. Furthermore, I realized there had to be a more practical solution than constantly running a script like that, and even then not getting brand new data.
My solution was to make the script that gets the data start running at the same time as the launch function. Here was the final script for the first intent (the rest of it remained the same):
@ask.intent("start_skill")
def start_skill():
    welcome_message = 'What is the password?'
    thread = threading.Thread(target=getData, args=())
    thread.daemon = True
    thread.start()
    return question(welcome_message)

def getData():
    #script to get data here

#other intents and rest of script here
By design, the skill requested a numeric passcode to make sure I was the one using it before it was willing to read the data (which was probably pointless, but this skill is at least as much for my own educational reasons as for practical reasons, so, for the extra practice, I wanted this to have as many features as I could possibly justify). So, by the time you would actually be able to ask for the data, the script to get the data will have finished running (I have tested this and it seems to work without fail).

linux python threading limit

I'm having a problem starting more than 8 threads using threading in Python 2.7 (running on Raspbian Jessie).
If I run
import threading
my_func_1()
my_func_2()
my_func_3()
my_func_4()
my_func_5()
threading.Thread(target=my_func_6).start()
my_func_7()
my_func_8()
my_func_9()#this is never reached
my_func_10()#nor this
where functions 1-5,7-10 have the form:
def my_func_1():
    global some_global_vars
    #do stuff here, involving global and local variables...
    threading.Timer(0.02, my_func_1).start()
and function 6 has the form
def my_func_6():
    while 1:
        #do stuff
then only the first 8 functions get executed. I feel this might be related to thread-per-process limits in linux but I couldn't quite get my head around it. Individually the functions run fine, and it's always the first 8 that execute (regardless of the order they're launched in).
How would I check if it's a thread-per-process issue? Or is it something else altogether?
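One way to check whether an OS-level limit is the culprit is to spawn plain threads until Thread.start() fails. A generic sketch (the count of 10 is arbitrary; raise it on the Pi to probe the real ceiling):

```python
import threading
import time

def worker():
    time.sleep(2)  # keep the thread alive while we count

spawned = 0
try:
    for _ in range(10):  # raise this number to find where it breaks
        t = threading.Thread(target=worker)
        t.daemon = True   # don't block interpreter exit
        t.start()
        spawned += 1
except RuntimeError as e:
    # CPython raises "can't start new thread" when the OS refuses
    print("hit the limit after", spawned, "threads:", e)

print("started", spawned, "threads; active now:", threading.active_count())
```

If this stops well short of the requested count, look at per-process limits (`ulimit -u`, available memory for thread stacks) rather than at your own code.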

Plotting time against size of input for Longest Common Subsequence Problem

I wish to plot time against input size for the Longest Common Subsequence problem, for both the recursive and the dynamic programming approaches. So far I've written programs that evaluate the LCS functions both ways, a simple random string generator (with help from here), and a program to plot the graph.
Now I need to connect all of these: the two programs for calculating the LCS should run about 10 times, with the output of the random string generator passed to them as command-line arguments.
The time taken for execution of these programs are calculated and this along with the length of strings used is stored in a file like
l=15, r=0.003, c=0.001
This is parsed by the python program to populate the following lists
sequence_lengths = []
recursive_times = []
dynamic_times = []
and then the graph is plotted. I have the following questions about the above.
1) How do I pass the output of one C program to another C program as command line arguments?
2) Is there any function to evaluate the time taken to execute a function in microseconds? Presently the only option I have is the time utility in Unix, and being a command-line tool it is harder to work with.
Any help would be much appreciated.
If the data being passed from program to program is small and can be converted to character format, you can pass it as one or more command-line arguments. If not, you can write it to a file and pass its name as an argument.
For Python programs, many people use the timeit module's Timer class to measure code execution speed. You can also roll your own using the clock() or time() functions in the time module. The resolution depends on the platform you're running on.
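For the Python side, a minimal timeit sketch; lcs_stub here is a hypothetical stand-in for your real LCS call:

```python
import timeit

def lcs_stub(n):
    # hypothetical placeholder for the real recursive/dynamic LCS function
    return sum(range(n))

# total wall time for 100 calls, reported as microseconds per call
total = timeit.timeit(lambda: lcs_stub(1000), number=100)
print("avg per call: %.1f us" % (total / 100 * 1e6))
```

Dividing the total by the call count and converting to microseconds gives directly comparable numbers for the two approaches at each input length.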
1) There are many ways. The simplest is to use system with a string constructed from the output (or popen to open it as a pipe if you need to read back its output); or, if you wish to leave the current program, you can use one of the exec functions, placing the output in the arguments.
In an sh shell you can also do this with command2 $(command1 args_to_command_1)
2) For timing in C, see clock and getrusage.
