I have a piece of code which computes the Helmholtz-Hodge Decomposition.
I've been running on my Mac OS Yosemite and it was working just fine. A month ago, however, my Mac got pretty slow (it was really old), and I opted to buy a new notebook (Windows 8.1, Dell).
After installing all Python libs and so on, I continued my work running this same code (versioned in Git). And then the result was pretty weird, completely different from the one obtained in the old notebook.
For instance, what I do is to construct to matrices a and b(really long calculus) and then I call the solver:
s = numpy.linalg.solve(a, b)
This was returning a (wrong, and different of the result obtained in my Mac, which was right).
Then, I tried to use:
s = scipy.linalg.solve(a, b)
And the program exits with code 0 but at the middle of it.
Then, I just made a simple test of:
print 'here1'
s = scipy.linalg.solve(a, b)
print 'here2'
And here2 is never printed.
I tried:
print 'here1'
x, info = numpy.linalg.cg(a, b)
print 'here2'
And the same happens.
I also tried to check the solution after using numpy.linalg.solve:
print numpy.allclose(numpy.dot(a, s), b)
And I got a False (?!).
I don't know what is happening, how to find a solution, I just know that the same code runs in my Mac, but it would be very good if I could run it in other platforms. Now I'm stucked in this problem (don't have a Mac anymore) and with no clue about the cause.
The weirdest thing is that I don't receive any error on runtime warning, no feedback at all.
Thank you for any help.
EDIT:
Numpy Suit Test Results:
Scipy Suit Test Results:
Download Anaconda package manager
http://continuum.io/downloads
When you download this it will already have all the dependencies for numpy worked out for you. It installs locally and will work on most platforms.
This is not really an answer, but this blog discusses in length the problems of having a numpy ecosystem that evolves fast, at the expense of reproducibility.
By the way, which version of numpy are you using? The documentation for the latest 1.9 does not report any method called cg as the one you use...
I suggest the use of this example so that you (and others) can check the results.
>>> import numpy as np
>>> import scipy.linalg
>>> np.random.seed(123)
>>> a = np.random.random(size=(10000, 10000))
>>> b = np.random.random(size=(10000,))
>>> s_np = np.linalg.solve(a, b)
>>> s_sc = scipy.linalg.solve(a, b)
>>> np.allclose(s_np,s_sc)
>>> s_np
array([-15.59186559, 7.08345804, 4.48174646, ..., -16.43310046,
-8.81301553, -10.77509242])
I hope you can find the answer - one option in the future is to create a virtual machine for each of your projects, using Docker. This allows easy portability.
See a great article here discussing Docker for research.
Related
I'm trying to understand why the following simple example doesn't successfully complete execution and seems to get stuck on the first line of really_simple_func (on Ubuntu machines, but not Windows). The code is:
import torch as t
import numpy as np
import multiprocessing as mp # I've tried both multiprocessing
# import torch.multiprocessing as mp # and torch.multiprocessing
def really_simple_func():
temp_val_2 = t.tensor(np.zeros(425447)[0:400000]) # this is the line that blocks.
return 4.3
if __name__ == "__main__":
print("Run brief starting")
some_zeros = np.zeros(425447)
temp_val = t.tensor(some_zeros[0:400000]) # DELETE THIS LINE TO MAKE IT WORK
pool = mp.Pool(processes=1)
job = pool.apply_async(really_simple_func)
print("just before job.get()")
result = job.get()
print("Run brief completed. Reward = {}".format(result))
I have torch 1.11.0 installed, numpy 1.22.3 and have tried both CPU and GPU versions of Torch. When I run this code on two different Ubuntu machines, I get the following output:
Run brief starting
just before job.get()
However, the code never successfully completes (doesn't print the "Run brief completed" line). (It does complete on a third Windows box).
On the Ubuntu machines, if I delete the line with the comment "#DELETE THIS LINE TO MAKE IT WORK" the execution DOES complete, printing the final line as expected. Similarly, if I leave the line defining temp_val in but delete the line with the comment "This is the line that blocks" it will also complete. Moreover, if I reduce the size of the temp_val tensor (say from 400000 to 4000) it will also complete successfully. Finally, it is worth noting that while I can reproduce this behaviour on two different Ubuntu machines, this code does actually complete on my Windows machine - though, as far as I can tell, the versions of key packages, such as torch, are the same.
I don't understand this behaviour. I suspect it is something to do with the way torch allocates memory or stores information. I've tried calling del temp_val to free up memory, but that doesn't seem to fix things. It seems to me that the async call to t.tensor within really_simple_func is stopped from completing if there has already been a call to t.tensor in the main code block, creating a sufficiently large tensor.
I don't understand why this is happening, or even if that is the correct explanation. In any case, what would be best practice if I do need to do some tensor processing within apply_async as well as in the main thread? More generally, what is Torch waiting on when I make a call to t.tensor?
(Obviously, this is just the simplest version of the real code I'm trying to get to work that reproduced this issue. I realise that calling mp.Pool with only one process doesn't really make sense...nor, indeed, does using apply_async to call a function that returns a constant!)
Unfortunately, I cannot provide any answers to your questions.
I can, however, share experiences with seemingly the same issue. I use a Linux machine with torch 1.8.1 and numpy 1.19.2.
When I run the following code on my machine:
with Pool(max_pool) as p:
pool_outputs = list(
tqdm(
p.imap(lambda f: get_model_results_per_query_file(get_preds, tokenizer, f), query_files),
total=len(query_files)
)
)
For which the function get_model_results_per_query_file contains operations similar to the following:
feats = features.unsqueeze(0).repeat(batch_size, 1, 1).to(device)
(features is a torch tensor)
The first round of jobs automatically fail, and new ones are immediately started (that do not fail for some reason). The whole process never completes though, since the main process still seems to be waiting for the first failed jobs.
If I remove the lines in my code involving the repeat function, no jobs fail.
I managed to solve my issue and preserve the same results by adapting a similar solution to yours:
feats = torch.as_tensor(np.tile(features, (batch_size, 1, 1))).to(device)
I believe as_tensor works in a similar fashion to from_numpy in this case.
I only managed to find this solution thanks to your post and your proposed workaround, so thank you!
After some further exploration, here is a brief answer to my own question.
While I still don't fully understand the blocking behaviour (and would welcome any further explanation), I have just seen that the way I'm generating torch tensors from a numpy array is not correct.
In particular, instead of using torch.tensor(temp_val) where temp_val is a numpy array, I should be using torch.from_numpy(temp_val). Doing this fixes the problem.
Alternatively, I can convert temp_val into a list and then create the tensor via torch.tensor(temp_val_as_list) - which also avoids the issue.
This is an old problem as is demonstrated as in https://community.intel.com/t5/Analyzers/Unable-to-view-source-code-when-analyzing-results/td-p/1153210. I have tried all the listed methods, none of them works, and I cannot find any more solutions on the internet. Basically vtune cannot find the custom python source file no matter what is tried. I am using the most recently version as of speaking. Please let me whether there is a solution.
For example, if you run the following program.
def myfunc(*args):
# Do a lot of things.
if __name__ = '__main__':
# Do something and call myfunc
Call this script main.py. Now use the newest vtune version (I have using Ubuntu 18.04), run the vtune-gui and basic hotspot analysis. You will not found any information on this file. However, a huge pile of information on Python and its other codes are found (related to your python environment). In theory, you should be able to find the source of main.py as well as cost on each line in that script. However, that is simply not happening.
Desired behavior: I would really like to find the source file and function in the top-down manual (or any really). Any advice is welcome.
VTune offer full support for profiling python code and the tool should be able to display the source code in your python file as you expected. Could you please check if the function you are expecting to see in the VTune results, ran long enough?
Just to confirm that everything is working fine, I wrote a matrix multiplication code as shown below (don't worry about the accuracy of the code itself):
def matrix_mul(X, Y):
result_matrix = [ [ 1 for i in range(len(X)) ] for j in range(len(Y[0])) ]
# iterate through rows of X
for i in range(len(X)):
# iterate through columns of Y
for j in range(len(Y[0])):
# iterate through rows of Y
for k in range(len(Y)):
result_matrix[i][j] += X[i][k] * Y[k][j]
return result_matrix
Then I called this function (matrix_mul) on my Ubuntu machine with large enough matrices so that the overall execution time was in the order of few seconds.
I used the below command to start profiling (you can also see the VTune version I used):
/opt/intel/oneapi/vtune/2021.1.1/bin64/vtune -collect hotspots -knob enable-stack-collection=true -data-limit=500 -ring-buffer=10 -app-working-dir /usr/bin -- python3 /home/johnypau/MyIntel/temp/Python_matrix_mul/mat_mul_method.py
Now open the VTune results in the GUI and under the bottom-up tab, order by "Module / Function / Call-stack" (or whatever preferred grouping is).
You should be able to see the the module (mat_mul_method.py in my case) and the function "matrix_mul". If you double click, VTune should be able to load the sources too.
While searching for some numpy stuff, I came across a question discussing the rounding accuracy of numpy.dot():
Numpy: Difference between dot(a,b) and (a*b).sum()
Since I happen to have two (different) Computers with Haswell-CPUs sitting on my desk, that should provide FMAand everything, I thought I'd test the example given by Ophion in the first answer, and I got a result that somewhat surprised me:
After updating/installing/fixing lapack/blas/atlas/numpy, I get the following on both machines:
>>> a = np.ones(1000, dtype=np.float128)+1e-14
>>> (a*a).sum()
1000.0000000000199999
>>> np.dot(a,a)
1000.0000000000199948
>>> a = np.ones(1000, dtype=np.float64)+1e-14
>>> (a*a).sum()
1000.0000000000198
>>> np.dot(a,a)
1000.0000000000176
So the standard multiplication + sum() is more precise than np.dot(). timeit however confirmed that the .dot() version is faster (but not much) for both float64 and float128.
Can anyone provide an explanation for this?
edit: I accidentally deleted the info on numpy versions: same results for 1.9.0 and 1.9.3 with python 3.4.0 and 3.4.1.
It looks like they recently added a special Pairwise Summation to ndarray.sum for improved numerical stability.
From PR 3685, this affects:
all add.reduce calls that go over float_add with IS_BINARY_REDUCE true
so this also improves mean/std/var and anything else that uses sum.
See here for code changes.
Edit: Really appreciate help in finding bug - but since it might prove hard to find/reproduce, any general debug help would be greatly appreciated too! Help me help myself! =)
Edit 2: Narrowing it down, commenting out code.
Edit 3: Seems lxml might not be the culprit, thanks! The full script is here. I need to go over it looking for references. What do they look like?
Edit 4: Actually, the scripts stops (goes 100%) in this, the parse_og part of it. So edit 3 is false - it must be lxml somehow.
Edit 5 MAJOR EDIT: As suggested by David Robinson and TankorSmash below, I've found a type of data content that will send lxml.etree.HTML( data ) in a wild loop. (I carelessly disregarded it, but find my sins redeemed as I've paid a price to the tune of an extra two days of debug! ;) A working crashing script is here. (Also opened a new question.)
Edit 6: Turns out this is a bug with lxml version 2.7.8 and below (at
least). Updated to lxml 2.9.0, and bug is gone. Thanks also to the fine folks over at this follow-up question.
I don't know how to debug this weird problem I'm having.
The below code runs fine for about five minutes, when the RAM is suddenly completely filled up (from 200MB to 1700MB during the 100% period - then when memory is full, it goes into blue wait state).
It's due to the code below, specifically the first two lines. That's for sure. But what is going on? What could possibly explain this behaviour?
def parse_og(self, data):
""" lxml parsing to the bone! """
try:
tree = etree.HTML( data ) # << break occurs on this line >>
m = tree.xpath("//meta[#property]")
#for i in m:
# y = i.attrib['property']
# x = i.attrib['content']
# # self.rj[y] = x # commented out in this example because code fails anyway
tree = ''
m = ''
x = ''
y = ''
i = ''
del tree
del m
del x
del y
del i
except Exception:
print 'lxml error: ', sys.exc_info()[1:3]
print len(data)
pass
You can try Low-level Python debugging with GDB. Probably there is a bug in Python interpreter or in lxml library and it is hard to find it without extra tools.
You can interrupt your script running under gdb when CPU usage goes to 100% and look at stack trace. It will probably help to understand what's going on inside script.
it must be due to some references which keep the documents alive. one must always be careful with string results from xpath evaluation. I see you have assigned None to tree and m but not to y,x and i .
Can you also assign None to y,x and i .
Tools are also helpful when trying to track down memory problems. I've found guppy to be a very useful Python memory profiling and exploration tool.
It is not the easiest to get started with due to a lack of good tutorials / documentation, but once you get to grips with it you will find it very useful. Features I make use of:
Remote memory profiling (via sockets)
Basic GUI for charting usage, optionally showing live data
Powerful, and consistent, interfaces for exploring data usage in a Python shell
Can't get code-completion to work for e.g. SciPy, Numpy or Matplotlib in Eclipse/PyDev under Ubuntu 12.4 or 11.4. Tried with Eclipse Helios and Juno, PyDev in latest version (2.6).
Code completion does work for e.g. internal project references or builtins.
Have added path to "Preferences->Pydev->Interpreter - Python->Libraries" and added scipy, numpy and matplotlib to the "Forced Builtins". Under "Preferences->PyDev->Editor->Code Completion" "Minimum Number of chars..." is set to 1, "Preferences->PyDev->Editor->Code Completion (ctx insensitive and tokens)" "Number of chars for..." are both set to 2.
Importing and code completion works within ipython shell, so I think it must be something in PyDev...
Example code:
import numpy as np
myArr = np.array([1,2,3])
myArr.set#<hit CTRL-SPACE for completion>
Code-completion does not suggest any of the array methods here (setasflat, setfield, setflags).
Thanks for any suggestions... :)
Regards,
Carsten
I think this happens because pydev couldn't figure out what type is returned by the np.array method. If your code is long and you want code completion many times, perhaps you could "tell" pydev what is myArr's type. Try using assert:
import numpy as np
myArr = np.array([1,2,3])
assert isinstance(myArr, np.ndarray)
myArr.set#<hit CTRL-SPACE for completion>
After that code completion will always work for the myArr variable. Later you can delete or comment the assert line or use the "-O" flag with the python interpreter. Look at this page.
Just to note, in the latest PyDev version you can now let PyDev know about the type through documentation (without needing the assert isinstance).
See: http://pydev.org/manual_adv_type_hints.html for details.