I am a fairly beginner programmer, without much experience in Python or in general, and currently I'm trying to parallelize a heavily CPU-bound process in my code. I'm using Anaconda to create environments and Visual Studio Code to debug.
A summary of the code is as follows:
from tkinter import filedialog
import concurrent.futures
import myfuncs as mf

file_path = filedialog.askopenfilename(title='Ask for a file containing data')
# import data from file_path
a = input('Ask the user for input')
Next, calculations are made from these inputs, and I reach a stage where I need to iterate over a list of lists. Each inner list may contain up to two values, and calls are made to a function in a separate file.
For example, the inputs are:
sub_data1 = [test1]
sub_data2 = [test1, test2]
dataset = [sub_data1, sub_data2]
This is the stage where I use a concurrent.futures.ProcessPoolExecutor() instance and its .map() method:
with concurrent.futures.ProcessPoolExecutor() as executor:
    sm_res = executor.map(mf.process_distr, dataset)
Meanwhile, inside myfuncs.py, the mf.process_distr() function works like this:
def process_distr(tests):
    sm_reg = []
    for i in range(len(tests)):
        if i == 0:
            # do stuff
            sm_reg.append(result1)
        else:
            # do stuff
            sm_reg.append(result2)
    return sm_reg
The problem is that when I try to execute this code from main.py, it seems that main.py starts running multiple times: I'm asked for user input and the file dialog pops up multiple times (as many times as the machine's core count).
How can I resolve this matter?
Edit: After reading more into it, guarding the whole main.py code with:
if __name__ == '__main__':
did the trick. Thank you to anyone who gave time to help with my rookie problem.
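For anyone who hits the same behavior: on Windows, ProcessPoolExecutor spawns worker processes that re-import main.py, so any unguarded top-level code (the input() call, the file dialog) runs once per worker. A minimal sketch of the guarded layout, with the dataset literal standing in for the real data loading:
from tkinter import filedialog
import concurrent.futures
import myfuncs as mf

def main():
    # Interactive setup runs only once, in the parent process.
    file_path = filedialog.askopenfilename(title='Ask for a file containing data')
    a = input('Ask the user for input')
    dataset = [['test1'], ['test1', 'test2']]  # placeholder for data built from file_path and a

    with concurrent.futures.ProcessPoolExecutor() as executor:
        sm_res = list(executor.map(mf.process_distr, dataset))
    print(sm_res)

if __name__ == '__main__':
    # Workers re-import this module; the guard keeps them from
    # re-running the interactive setup in main().
    main()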
My Python script passes changing inputs to a program called "Dymola", which in turn performs a simulation to generate outputs. Those outputs are stored as NumPy arrays ("out1.npy").
for i in range(0, 100):
    # code to initiate simulation
    print(startValues, 'ParameterSet:', ParameterSet, 'time:', stoptime)
    np.save('out1.npy', output_data)
Unfortunately, Dymola crashes very often, which makes it necessary to rerun the loop from the time displayed in the console at the crash (e.g. 50) and to increase the number in the output file name by 1, so the data from the first run isn't overwritten.
for i in range(50, 100):
    # code to initiate simulation
    print(startValues, 'ParameterSet:', ParameterSet, 'time:', stoptime)
    np.save('out2.npy', output_data)
Is there any way to read the 'stoptime' value (e.g. 50) from the console after Dymola has crashed?
I'm assuming Dymola is a third-party entity that you cannot change.
One possibility is to use the subprocess module to start Dymola and read its output from your program, either line by line as it runs, or all at once after the created process exits. You also have access to Dymola's exit status.
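A minimal sketch of the line-by-line variant, assuming Dymola can be launched from the command line and that it echoes the 'time:' values printed by the loop above (both are assumptions, not documented Dymola behavior):
import subprocess

# Hypothetical launch command; replace with however Dymola is actually started.
proc = subprocess.Popen(['dymola.exe', 'myModel.mo'],
                        stdout=subprocess.PIPE, text=True)

last_time = None
for line in proc.stdout:             # read output line by line while it runs
    print(line, end='')
    if 'time:' in line:              # relies on the print format from the loop above
        last_time = line.rsplit('time:', 1)[1].strip()

proc.wait()
print('exit status:', proc.returncode, 'last reported time:', last_time)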
If it's a Windows-y thing that doesn't do stream output but manipulates a window GUI-style, and if it doesn't produce a useful exit status code, your best bet might be to look at what files it has created while running or after it has gone; sorted(glob.glob("somepath/*.out")) may be useful.
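For the file-based fallback, a rough sketch (the directory and pattern are placeholders):
import glob
import os

# Sort candidate output files by modification time, newest last.
outputs = sorted(glob.glob('somepath/*.npy'), key=os.path.getmtime)
if outputs:
    print('most recent output file:', outputs[-1])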
I assume you're using the Dymola Python interface to simulate your model. If so, why don't you use the return value of the dymola.simulate() function and check it for errors?
E.g.:
from dymola.dymola_interface import DymolaInterface

dymola = DymolaInterface()
crash_counter = 1

for i in range(0, 100):
    res = dymola.simulate("myModel")
    if not res:
        crash_counter += 1
    print(startValues, 'ParameterSet:', ParameterSet, 'time:', stoptime)
    np.save('out%d.npy' % crash_counter, output_data)
As it is sometimes difficult to install the DymolaInterface on your machine, here is a useful link.
Taken from there:
The Dymola Python Interface comes in the form of a few modules at \Dymola 2018\Modelica\Library\python_interface. The modules are bundled within the dymola.egg file.
To install:
The recommended way to use the package is to append the \Dymola 2018\Modelica\Library\python_interface\dymola.egg file to your PYTHONPATH environment variable. You can do so from the Windows command line via set PYTHONPATH=%PYTHONPATH%;D:\Program Files (x86)\Dymola 2018\Modelica\Library\python_interface\dymola.egg.
If this does not work, append the following code before instantiating the interface:
import os
import sys
sys.path.insert(0, os.path.join('PATHTODYMOLA',
                                'Modelica',
                                'Library',
                                'python_interface',
                                'dymola.egg'))
I'm quite new to programming, and even more so when it comes to object-oriented programming. I'm trying to connect through Python to an external piece of software (SAP2000, a structural engineering program). This program comes with an API for connecting, and there is an example in its help (http://docs.csiamerica.com/help-files/common-api(from-sap-and-csibridge)/Example_Code/Example_7_(Python).htm).
This works pretty well, but I would like to divide the code so that I can create one function for opening the program, several functions to work with it, and another one to close it. This would give me the flexibility to make different calculations as desired and close the program afterwards.
Here is the code I have so far, where enableloadcases() is a function that operates once the instance is created.
import os
import sys
import comtypes.client
import pandas as pd

def openSAP2000(path, filename):
    ProgramPath = r"C:\Program Files (x86)\Computers and Structures\SAP2000 20\SAP2000.exe"
    APIPath = path
    ModelPath = APIPath + os.sep + filename
    mySapObject = comtypes.client.GetActiveObject("CSI.SAP2000.API.SapObject")
    # start SAP2000 application
    mySapObject.ApplicationStart()
    # create SapModel object
    SapModel = mySapObject.SapModel
    # initialize model
    SapModel.InitializeNewModel()
    ret = SapModel.File.OpenFile(ModelPath)
    # run model (this will create the analysis model)
    ret = SapModel.Analyze.RunAnalysis()

def closeSAP2000():
    # ret = mySapObject.ApplicationExit(False)
    SapModel = None
    mySapObject = None

def enableloadcases(case_id):
    '''
    The function activates LoadCases for output
    '''
    ret = SapModel.Results.Setup.SetCaseSelectedForOutput(case_id)
From another module, I call the function openSAP2000() and it works fine, but when I call the function enableloadcases() an error says AttributeError: type object 'SapModel' has no attribute 'Results'.
I believe this must be done by creating a class and then calling the functions inside it, but I honestly don't know how to do it.
Could you please help me?
Thank you very much.
Thank you very much for the help. I managed to solve the problem. It was as simple and stupid as marking the SapModel variable as global.
Now it works fine.
Thank you anyway.
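For later readers, a rough sketch of the class-based version the asker describes, reusing only the API calls from the code above (the class name and method split are my own, not from SAP2000's documentation):
import os
import comtypes.client

class SAP2000Session:
    def open(self, path, filename):
        # Attach to SAP2000 and keep the handles as instance state
        # instead of globals.
        self.mySapObject = comtypes.client.GetActiveObject("CSI.SAP2000.API.SapObject")
        self.mySapObject.ApplicationStart()
        self.SapModel = self.mySapObject.SapModel
        self.SapModel.InitializeNewModel()
        self.SapModel.File.OpenFile(os.path.join(path, filename))
        self.SapModel.Analyze.RunAnalysis()

    def enableloadcases(self, case_id):
        # self.SapModel is shared between methods, so no global is needed.
        return self.SapModel.Results.Setup.SetCaseSelectedForOutput(case_id)

    def close(self):
        self.SapModel = None
        self.mySapObject = None
A caller would then do session = SAP2000Session(), session.open(path, filename), session.enableloadcases(case_id), and finally session.close().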
Can I get some advice on writing a unit test for the following piece of code?
%python
import sys
import json

sys.argv = []
sys.argv.append('{"product1":{"brand":"x","type":"y"}}')
sys.argv.append('{"product1":{"brand":"z","type":"a"}}')

products = sys.argv
my_products = []
for n, i in enumerate(products[:]):
    xx = json.loads(i)
    for j in xx.keys():
        yy = {}  # fresh dict per product, so earlier entries aren't overwritten
        yy["brand"] = xx[j]['brand']
        yy["type"] = xx[j]["type"]
        my_products.append(yy)
print(my_products)
As it stands there aren't any units to test!!!
A test might consist of:
- packaging your program in a script
- invoking your program from a Python unit test as a subprocess
- piping the output of your command process to a buffer
- asserting the buffer is what you expect it to be (see the sketch after this list)
While the above would technically allow you to have an automated test on your code, it comes with a lot of burden:
- multiprocessing
- weak assertions, by not having types
- coarse interaction (you have to invoke a script; you can't just assert on the brand/type logic)
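A minimal sketch of such an end-to-end test, assuming the code above has been saved as a script named products_script.py (a hypothetical name):
import subprocess
import sys
import unittest

class TestProductsScript(unittest.TestCase):
    def test_prints_both_products(self):
        # run the script as a child process and capture its stdout
        completed = subprocess.run(
            [sys.executable, "products_script.py"],
            capture_output=True, text=True, check=True)
        # weak, string-level assertions on the printed output
        self.assertIn("'brand': 'x'", completed.stdout)
        self.assertIn("'brand': 'z'", completed.stdout)

if __name__ == "__main__":
    unittest.main()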
One way to address those issues could be to package your code into smaller units, i.e. create a method to encapsulate:
for j in xx.keys():
    yy["brand"] = xx[j]['brand']
    yy["type"] = xx[j]["type"]
    my_products.append(yy)
Import it, exercise it, and assert on its output. Then there might be something to map the loading and the application of the xx.keys() loop to an array (which you could also encapsulate as a function).
And then there could be the highest level: taking in args and composing the product mapper/loader/transformer. And since your code will be thoroughly unit tested at this point, you may get away with not having a test for your top-level script.
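To make that concrete, here is a rough sketch of the extracted unit and a test for it; extract_products is a hypothetical name, not something from your code:
import json
import unittest

def extract_products(raw_args):
    """Parse a list of JSON strings into a list of brand/type dicts."""
    my_products = []
    for raw in raw_args:
        parsed = json.loads(raw)
        for value in parsed.values():
            my_products.append({"brand": value["brand"], "type": value["type"]})
    return my_products

class TestExtractProducts(unittest.TestCase):
    def test_two_products(self):
        args = ['{"product1":{"brand":"x","type":"y"}}',
                '{"product1":{"brand":"z","type":"a"}}']
        self.assertEqual(extract_products(args),
                         [{"brand": "x", "type": "y"},
                          {"brand": "z", "type": "a"}])

if __name__ == "__main__":
    unittest.main()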
I've looked around the forum and can't seem to find anything that helps. I'm trying to parallelize some processes but cannot seem to get it to work.
from multiprocessing import *
from numpy import *

x = array([1, 2, 3, 4, 5])

def SATSolver(args):
    # many_calculations
    return result

def main(arg):
    new_args = append(x, arg)
    return SATSolver(new_args)

y = array([8, 9, 10, 11])

if __name__ == '__main__':
    pool = Pool()
    results = pool.map(main, y)
    print(results)
The SATSolver function is where the bulk of the work happens. Basically, I have an array x and a second array y. I want to append each value in y to x individually, and then run each new set through my SATSolver function. I would like to use the multiprocessing module so these runs can happen in parallel.
Whenever I try and run this, I don't get an error, but a new interactive window pops up and says "Could not load the file from filepath\-c. Do you want to create a new file?"
Everything works perfectly when it's run without multiprocessing.
Any ideas on how to make this work?
Thanks!
Sorry for this question, because there are several examples on Stack Overflow already. I am writing to clear up some of my doubts, because I am quite new to the Python language.
I wrote a function:
def clipmyfile(inFile, poly, outFile):
    ...  # do something with inFile and poly, then return outFile
Normally I do this:
clipmyfile(inFile="File1.txt",poly="poly1.shp",outFile="res1.txt")
clipmyfile(inFile="File2.txt",poly="poly2.shp",outFile="res2.txt")
clipmyfile(inFile="File3.txt",poly="poly3.shp",outFile="res3.txt")
......
clipmyfile(inFile="File21.txt",poly="poly21.shp",outFile="res21.txt")
I read in this example, Run several python programs at the same time, that I can use (but I am probably wrong):
from multiprocessing import Pool
p = Pool(21) # like in your example, running 21 separate processes
to run the function several times at once and speed up my analysis.
To be completely honest, I didn't understand the next step.
Thanks in advance for any help and suggestions.
Gianni
The map that is used in the example you provided only works for functions that receive one argument. You can see a solution to this here: Python multiprocessing pool.map for multiple arguments
In your case, what you would do is (assuming you have three lists: files, polis, outis):
from multiprocessing import Pool

def expand_args(f_p_o):
    # unpack the (inFile, poly, outFile) tuple into separate arguments
    return clipmyfile(*f_p_o)

files = ["file1.txt", "file2.txt"]
polis = ["poly1.shp", "poly2.shp"]
outis = ["out1.txt", "out2.txt"]

if __name__ == '__main__':
    p = Pool()
    p.map(expand_args, zip(files, polis, outis))
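As a side note, on Python 3 Pool.starmap unpacks the argument tuples for you, so the expand_args wrapper isn't strictly needed; a minimal sketch:
from multiprocessing import Pool

if __name__ == '__main__':
    with Pool() as p:
        # starmap calls clipmyfile(inFile, poly, outFile) for each tuple
        p.starmap(clipmyfile, zip(files, polis, outis))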