The following (simplified) code applies an interpolation function using the multiprocessing module:
from multiprocessing import Pool
from scipy.interpolate import LinearNDInterpolator
import numpy as np
...
if __name__ == "__main__":
    p = Pool(4)
    lndi = LinearNDInterpolator(points, valuesA)
    valuesB = list(np.split(valuesA, 4))
    ret = p.map(lndi.__call__, valuesB)
When I run the .py file, Python freezes. If the last line is run separately, everything works fine and I get the speed-up that I hoped for.
Does anyone know how to fix the code so that it also works when the whole script is run?
Thanks in advance.
edit: github issue was opened -> https://github.com/spyder-ide/spyder/issues/3367
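For completeness, here is a self-contained sketch of the same pattern that runs as a plain python script.py invocation. The synthetic points, valuesA and queries arrays are placeholders of my own (the original post elides how they are built), and note that the chunks handed to map here are query locations, which is what LinearNDInterpolator.__call__ expects:

from multiprocessing import Pool
from scipy.interpolate import LinearNDInterpolator
import numpy as np

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.random((1000, 2))    # sample locations in 2-D (placeholder data)
    valuesA = rng.random(1000)        # one value per sample location
    queries = rng.random((400, 2))    # locations to interpolate at

    lndi = LinearNDInterpolator(points, valuesA)

    # split the query points into 4 chunks and evaluate them in parallel
    chunks = np.array_split(queries, 4)
    with Pool(4) as p:
        ret = p.map(lndi.__call__, chunks)

    print(np.concatenate(ret).shape)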
I'm trying to understand why the following simple example doesn't successfully complete execution and seems to get stuck on the first line of really_simple_func (on Ubuntu machines, but not Windows). The code is:
import torch as t
import numpy as np
import multiprocessing as mp # I've tried both multiprocessing
# import torch.multiprocessing as mp # and torch.multiprocessing

def really_simple_func():
    temp_val_2 = t.tensor(np.zeros(425447)[0:400000]) # this is the line that blocks.
    return 4.3

if __name__ == "__main__":
    print("Run brief starting")
    some_zeros = np.zeros(425447)
    temp_val = t.tensor(some_zeros[0:400000]) # DELETE THIS LINE TO MAKE IT WORK
    pool = mp.Pool(processes=1)
    job = pool.apply_async(really_simple_func)
    print("just before job.get()")
    result = job.get()
    print("Run brief completed. Reward = {}".format(result))
I have torch 1.11.0 installed, numpy 1.22.3 and have tried both CPU and GPU versions of Torch. When I run this code on two different Ubuntu machines, I get the following output:
Run brief starting
just before job.get()
However, the code never successfully completes (doesn't print the "Run brief completed" line). (It does complete on a third Windows box).
On the Ubuntu machines, if I delete the line with the comment "#DELETE THIS LINE TO MAKE IT WORK" the execution DOES complete, printing the final line as expected. Similarly, if I leave the line defining temp_val in but delete the line with the comment "This is the line that blocks" it will also complete. Moreover, if I reduce the size of the temp_val tensor (say from 400000 to 4000) it will also complete successfully. Finally, it is worth noting that while I can reproduce this behaviour on two different Ubuntu machines, this code does actually complete on my Windows machine - though, as far as I can tell, the versions of key packages, such as torch, are the same.
I don't understand this behaviour. I suspect it is something to do with the way torch allocates memory or stores information. I've tried calling del temp_val to free up memory, but that doesn't seem to fix things. It seems to me that the async call to t.tensor within really_simple_func is stopped from completing if there has already been a call to t.tensor in the main code block, creating a sufficiently large tensor.
I don't understand why this is happening, or even if that is the correct explanation. In any case, what would be best practice if I do need to do some tensor processing within apply_async as well as in the main thread? More generally, what is Torch waiting on when I make a call to t.tensor?
(Obviously, this is just the simplest version of the real code I'm trying to get to work that reproduced this issue. I realise that calling mp.Pool with only one process doesn't really make sense...nor, indeed, does using apply_async to call a function that returns a constant!)
Unfortunately, I cannot provide any answers to your questions.
I can, however, share experiences with seemingly the same issue. I use a Linux machine with torch 1.8.1 and numpy 1.19.2.
When I run the following code on my machine:
with Pool(max_pool) as p:
    pool_outputs = list(
        tqdm(
            p.imap(lambda f: get_model_results_per_query_file(get_preds, tokenizer, f), query_files),
            total=len(query_files)
        )
    )
Here the function get_model_results_per_query_file contains operations similar to the following:
feats = features.unsqueeze(0).repeat(batch_size, 1, 1).to(device)
(features is a torch tensor)
The first round of jobs fails immediately, and new ones are started right away (which, for some reason, do not fail). The whole process never completes, though, since the main process still seems to be waiting for the first failed jobs.
If I remove the lines in my code involving the repeat function, no jobs fail.
I managed to solve my issue and preserve the same results by adapting a solution similar to yours:
feats = torch.as_tensor(np.tile(features, (batch_size, 1, 1))).to(device)
I believe as_tensor works in a similar fashion to from_numpy in this case.
I only managed to find this solution thanks to your post and your proposed workaround, so thank you!
After some further exploration, here is a brief answer to my own question.
While I still don't fully understand the blocking behaviour (and would welcome any further explanation), I have just seen that the way I'm generating torch tensors from a numpy array is not correct.
In particular, instead of using torch.tensor(temp_val) where temp_val is a numpy array, I should be using torch.from_numpy(temp_val). Doing this fixes the problem.
Alternatively, I can convert temp_val into a list and then create the tensor via torch.tensor(temp_val_as_list) - which also avoids the issue.
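To make the fix concrete, here is a minimal sketch of the two working conversions described above (the array size is taken from the question; everything else is standard PyTorch/NumPy API):

import numpy as np
import torch

temp_val = np.zeros(425447)[0:400000]

# Preferred: from_numpy shares memory with the numpy array (no copy)
t1 = torch.from_numpy(temp_val)

# Alternative: going through a Python list forces an element-wise copy
t2 = torch.tensor(temp_val.tolist())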
I am using joblib to parallelize a for loop over my own function.
from joblib import Parallel, delayed
from my_function import my_case_study
result = Parallel(n_jobs=4)(delayed(my_case_study)(i) for i in range(100))
So my_case_study is the only function in the my_function.py file, and it takes i as the hyperparameter. my_case_study calls a bunch of different model-fitting algorithms contained in other Python files, which are imported at the top of my_function.py. my_function.py basically looks like:
from anotherfile import fun1
from anotherfile import fun2

def my_case_study(i):
    mse1 = fun1(i)
    mse2 = fun2(i)
    return (mse1, mse2)
But then I get the error message: "A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable."
How to solve this? Thanks!!
I found the solution in the following link helpful in my case:
https://github.com/joblib/joblib/issues/810
I don't know if there is a better solution. The comments at the end of that issue mention that there might be some problems with it (I didn't fully understand them).
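For anyone who wants something concrete to try: a workaround that sometimes helps with un-serialization errors is to switch Parallel's backend away from the default loky. Whether it applies to this exact case is an assumption on my part; the sketch assumes my_case_study is importable from a real module, as in the question:

from joblib import Parallel, delayed, parallel_backend
from my_function import my_case_study

# The 'multiprocessing' backend pickles differently from the default
# 'loky' backend, which can change what does or doesn't serialize.
with parallel_backend('multiprocessing'):
    result = Parallel(n_jobs=4)(delayed(my_case_study)(i) for i in range(100))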
I have a script that seemed to run slowly, so I profiled it using cProfile (and the visualisation tool KCacheGrind).
It seems that what is taking almost 90% of the runtime is the import sequence, and especially the running of the __init__.py files...
Here is a screenshot of the KCacheGrind output (sorry for attaching an image...)
I am not very familiar with how the import sequence works in Python, so maybe I got something confused... I also placed __init__.py files in every one of my custom-made packages; I'm not sure if that was what I should have done.
Anyway, if anyone has any hint, greatly appreciated!
EDIT: an additional picture, with functions sorted by self time:
EDIT2:
here is the code, for more clarity for the answerers:
from strategy.strategies.gradient_stop_and_target import make_one_trade
from datetime import timedelta, datetime
import pandas as pd
from data.db import get_df, mongo_read_only, save_one, mongo_read_write, save_many
from data.get import get_symbols
from strategy.trades import make_trade, make_mae, get_prices, get_signals, \
    get_prices_subset
#from profilehooks import profile

mongo = mongo_read_only()

dollar_stop = 200
dollar_target = 400
period_change = 3

signal = get_df(mongo.signals.signals, strategy = {'$regex' : '^indicators_group'}).iloc[0]
symbol = get_symbols(mongo, description = signal['symbol'])[0]

prices = get_prices(
    signal['datetime'],
    signal['datetime'].replace(hour = 23, minute = 59),
    symbol,
    mongo)

make_one_trade(
    signal,
    prices,
    symbol,
    dollar_stop,
    dollar_target,
    period_change)
The function get_prices simply gets data from a MongoDB database, and make_one_trade does simple calculations with pandas. This never causes problems anywhere else in my project.
EDIT3:
Here is the KCacheGrind screen when I select the 'detect cycles' option in the View tab:
Could that actually mean that there are indeed circular imports in my self-created packages that take all that time to resolve?
No. You are conflating cumulative time with time spent in the top-level code of the __init__.py file itself. The top-level code calls other methods, and those together take a lot of time.
Look at the self column instead to find where all that time is being spent. Also see What is the difference between tottime and cumtime in a python script profiled with cProfile? — the incl. column is the cumulative time, self is the total time.
I'd just filter out all the <frozen importlib.*> entries; the Python project has already made sure those paths are optimised.
However, your second screenshot does show that in your profiling run, all that your Python code busied itself with was loading bytecode for modules to import (the marshal module provides the Python bytecode serialisation implementation). Either the Python program did nothing but import modules and no other work was done, or it is using some form of dynamic import that is loading a large number of modules or is otherwise ignoring the normal module caches and reloading the same module(s) repeatedly.
You can profile import times using Python 3.7's new -X importtime command-line switch, or you could use a dedicated import-profiler to find out why imports take such a long time.
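For example, a minimal invocation (the script name is a placeholder); the per-module import-time tree is written to stderr, so it can be redirected to a file:

python -X importtime myscript.py 2> imports.log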
I've looked around the forum and can't seem to find anything that helps. I'm trying to parallelize some processes but cannot seem to get it to work.
from multiprocessing import *
from numpy import *

x = array([1, 2, 3, 4, 5])

def SATSolver(args):
    #many_calculations
    return result

def main(arg):
    new_args = append(x, arg)
    return SATSolver(new_args)

y = array([8, 9, 10, 11])

if __name__ == '__main__':
    pool = Pool()
    results = pool.map(main, y)
    print(results)
The SATSolver function is where the bulk of the work happens. Basically, I have an array x and a second array y. I want to append each value in y to x individually, and then run each new set through my SATSolver function. I would like to use the multiprocessing module so this can run in parallel.
Whenever I try and run this, I don't get an error, but a new interactive window pops up and says "Could not load the file from filepath\-c. Do you want to create a new file?"
Everything works perfectly when it's run without multiprocessing.
Any ideas on how to make this work?
Thanks!
I'm currently writing a python script which plots a numpy matrix containing some data (which I'm not having any difficulty computing). For complicated reasons having to do with how I'm creating that data, I have to go through terminal. I've done problems like this a million times in Spyder using imshow(). So, I thought I'd try to do the same in terminal. Here's my code:
from numpy import *
from matplotlib import *

def make_picture():
    f = open("DATA2.txt")
    arr = zeros((200, 200))
    l = f.readlines()
    for i in l:
        j = i[:-1]
        k = j.split(" ")
        arr[int(k[0])][int(k[1])] = float(k[2])
    f.close()
    imshow(arr)

make_picture()
Suffice it to say, the array stuff works just fine. I've tested it, and it extracts the data perfectly well. So, I've got this 200 by 200 array of numbers floating around my RAM and I'd like to display it. When I run this code in Spyder, I get exactly what I expected. However, when I run this code in Terminal, I get an error message:
Traceback (most recent call last):
File "DATAmine.py", line 15, in <module>
make_picture()
File "DATAmine.py", line 13, in make_picture
imshow(arr)
NameError: global name 'imshow' is not defined
(My program's called DATAmine.py) What's the deal here? Is there something else I should be importing? I know I had to configure my Spyder paths, so I wonder if I don't have access to those paths or something. Any suggestions would be greatly appreciated. Thanks!
P.S. Perhaps I should mention I'm using Ubuntu. Don't know if that's relevant.
To make your life easier you can use
from pylab import *
This will import the full pylab namespace, which includes matplotlib's plotting functions (imshow among them) and numpy.
Cheers
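One caveat worth adding (my note, not part of the original answer): outside an interactive environment like Spyder, the figure usually only appears once show() is called, so the terminal version of the script likely needs something like:

from pylab import zeros, imshow, show

arr = zeros((200, 200))   # stand-in for the array built in make_picture()
imshow(arr)               # draw the array as an image
show()                    # block and display the window in a plain script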