Memory error while slicing an array - python

I have object d connected to h5 dataset:
>>> data = d[:, :, 0].astype(np.float32)
>>> data.shape
(17201, 10801)
>>> data[data==-32768] = data[data>0].min()
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
MemoryError
Can I do some other slicing trick to avoid this error?

OK, I'm writing answer myself, as there is acceptable solution, gained after #mgilson questioned data type.
If data allows, memory error can be avoided by using simpler data type while operating on array. Considering initial question this worked for me:
>>> data = d[:, :, 0].astype(np.short)
>>> data[data==-32768] = data[data>0].min()
>>> data = data.astype(np.float32)

Related

TorchServe: How to convert bytes output to tensors

I have a model that is served using TorchServe. I'm communicating with the TorchServe server using gRPC. The final postprocess method of the custom handler defined returns a list which is converted into bytes for transfer over the network.
The post process method
def postprocess(self, data):
# data type - torch.Tensor
# data shape - [1, 17, 80, 64] and data dtype - torch.float32
return data.tolist()
The main issue is at the client where converting the received bytes from TorchServe to a torch Tensor is inefficiently done via ast.literal_eval
# This takes 0.3 seconds
response = self.inference_stub.Predictions(
inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
# This takes 0.84 seconds
predictions = torch.as_tensor(literal_eval(
response.prediction.decode('utf-8')))
Using numpy.frombuffer or torch.frombuffer return the following error.
import numpy as np
np.frombuffer(response.prediction)
Traceback (most recent call last):
File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size
np.frombuffer(response.prediction, dtype=np.float32)
Traceback (most recent call last):
File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size
Using torch
import torch
torch.frombuffer(response.prediction, dtype = torch.float32)
Traceback (most recent call last):
File "<string>", line 1, in <module>
ValueError: buffer length (2601542 bytes) after offset (0 bytes) must be a multiple of element size (4)
Is there an alternative, more efficient solution of converting the received bytes into torch.Tensor?
One hack I've found that has significantly increased the performance while sending large tensors is to return a list of json.
In your handler's postprocess function:
def postprocess(self, data):
output_data = {}
output_data['data'] = data.tolist()
return [output_data]
At the clients side when you receive the grpc response, decode it using json.loads
response = self.inference_stub.Predictions(
inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
decoded_output = response.prediction.decode('utf-8')
preds = torch.as_tensor(json.loads(decoded_output))
preds should have the output tensor
Update:
There's an even faster method and should completely solve the bottleneck. Use tf.io.serialize_tensor from tensorflow to serialize your tensor inside postprocess
def postprocess(self, data):
return [tf.io.serialize_tensor(data.cpu()).numpy()]
Decode it using tf.io.parse_tensor
response = self.inference_stub.Predictions(
inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
prediction = response.prediction
torch.as_tensor(tf.io.parse_tensor(prediction, out_type=tf.float32).numpy())

Convert a sympy poly with imaginary powers to an mpmath mpc

I have a sympy poly that looks like:
Poly(0.764635937801645*I**4 + 7.14650839258644*I**3 - 0.667712176660315*I**2 - 2.81663805543677*I - 0.623299856233272, I, domain='RR')
I'm converting to mpc using the following code:
a = val.subs('I',1.0j)
b = sy.re(a)
c = sy.im(a)
d = mpmath.mpc(b,c)
Two questions.
Assuming my mpc and sympy type have equal precision (of eg 100 dps) is there a precision loss using this conversion from a to d?
Is there a better way to convert?
Aside: sympy seems to treat I just like a symbol here. How do I get sympy to simplify this polynomial?
Edit: Ive also noticed that the following works in place of a above:
a = val.args[0]
Strings and expressions
Root cause of the issue is seen in val.subs('I', 1.0j) -- you appear to pass strings as arguments to SymPy functions. There are some valid uses for this (such as creation of high-precision floats), but when symbols are concerned, using strings is a recipe for confusion. The string 'I' gets implicitly converted to SymPy expression Symbol('I'), which is different from SymPy expression I. So the answer to
How do I get sympy to simplify this polynomial?
is to revisit the process of creation of that polynomial, and fix that. If you really need to create it from a string, then use locals parameter:
>>> S('3.3*I**2 + 2*I', locals={'I': I})
-3.3 + 2*I
Polynomials and expressions
If the Poly structure is not needed, use the method as_expr() of Poly to get an expression from it.
Conversion to mpmath and precision loss
is there a precision loss using this conversion from a to d?
Yes, splitting into real and imaginary and then recombining can lead to precision loss. Pass a SymPy object directly to mpc if you know it's a complex number. Or to mpmathify if you want mpmath to decide what type it should have. An example:
>>> val = S('1.111111111111111111111111111111111111111111111111')*I**3 - 2
>>> val
-2 - 1.111111111111111111111111111111111111111111111111*I
>>> import mpmath
>>> mpmath.mp.dps = 40
>>> mpmath.mpc(val)
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111111')
>>> mpmath.mpmathify(val)
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111111')
>>> mpmath.mpc(re(val), im(val))
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111114')
Observations:
When I is actual imaginary unit, I**3 evaluates fo -I, you don't have to do anything for it to happen.
A string representation of high-precision decimal is used to create such a float in SymPy. Here S stands for sympify. One can also be more direct and use Float('1.1111111111111111111111111')
Direct conversion of a SymPy complex number to an mpmath complex number is preferable to splitting in real/complex and recombining.
Conclusion
Most of the above is just talking around an XY problem. Your expression with I was not what you think it was, so you tried to do strange things that were not needed, and my answer is mostly a waste of time.
I'm adding my own answer here, as FTP's answer, although relevant and very helpful, did not (directly) resolve my issue (which wasn't that clear from the question tbh). When I ran the code in his example I got the following:
>>> from sympy import *
>>> import mpmath
>>> val = S('1.111111111111111111111111111111111111111111111111')*I**3 - 2
>>> val
-2 - 1.111111111111111111111111111111111111111111111111*I
>>> mpmath.mp.dps = 40
>>> mpmath.mpc(val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 373, in __new__
real = cls.context.mpf(real)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 77, in __new__
v._mpf_ = mpf_pos(cls.mpf_convert_arg(val, prec, rounding), prec, rounding)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 96, in mpf_convert_arg
raise TypeError("cannot create mpf from " + repr(x))
TypeError: cannot create mpf from -2 - 1.111111111111111111111111111111111111111111111111*I
>>> mpmath.mpmathify(val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 662, in convert
return ctx._convert_fallback(x, strings)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp.py", line 614, in _convert_fallback
raise TypeError("cannot create mpf from " + repr(x))
TypeError: cannot create mpf from -2 - 1.111111111111111111111111111111111111111111111111*I
>>> mpmath.mpc(re(val), im(val))
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111114')
>>> mpmath.mpmathify(val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 662, in convert
return ctx._convert_fallback(x, strings)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp.py", line 614, in _convert_fallback
raise TypeError("cannot create mpf from " + repr(x))
TypeError: cannot create mpf from -2 - 1.111111111111111111111111111111111111111111111111*I
Updating my sympy (1.0->1.1.1) and mpmath (0.19->1.0.0) fixed the exceptions. I did not test which of these upgrades actually resolved the issue.

FFT Runtime Error in Running Galsim

I keep receiving the following error when running a script to save an animation:
RuntimeError: SB Error: fourierDraw() requires an FFT that is too large, 6144
If you can handle the large FFT, you may update gsparams.maximum_fft_size.
So I went into /Galsim/include/galsim/GSparams.h
and I changed the following
maximum_fft_size(16384) from maximum_fft_size(4096)
or 2^14 from 2^12.
I still get the same error as before. Should I restart my machine or something?
That is not where to change the maximum_fft_size parameter. See demo7 for an example of how to use the GSParams object and to update parameters. There is also an example in the doc string for GSObject:
>>> gal = galsim.Sersic(n=4, half_light_radius=4.3)
>>> psf = galsim.Moffat(beta=3, fwhm=2.85)
>>> conv = galsim.Convolve([gal,psf])
>>> im = galsim.Image(1000,1000, scale=0.05) # Note the very small pixel scale!
>>> im = conv.drawImage(image=im) # This uses the default GSParams.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "galsim/base.py", line 1236, in drawImage
image.added_flux = prof.SBProfile.draw(imview.image, gain, wmult)
RuntimeError: SB Error: fourierDraw() requires an FFT that is too large, 6144
If you can handle the large FFT, you may update gsparams.maximum_fft_size.
>>> big_fft_params = galsim.GSParams(maximum_fft_size=10240)
>>> conv = galsim.Convolve([gal,psf],gsparams=big_fft_params)
>>> im = conv.drawImage(image=im) # Now it works (but is slow!)
>>> im.write('high_res_sersic.fits')

How to work around the ValueError: array is too big error?

I've got a scipy sparse matrix (csr:Compressed Sparse Row matrix). I'd like to use Orange's feature selection methods (Orange.feature.scoring.score_all (InfoGain/MDL)). However, from my understanding I'll have to create a Table which only accepts a numpy array as an arguments. Therefore, whenever I tried to convert the csr matrix to an array, using (.toarray()), I get the following error (because the size of the matrix):
Traceback (most recent call last):
File "C:\Users\NMS\Desktop\PyExp\experiments_acl2013.py", line 249, in <module>
print(X_train.toarray())
File "C:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 561, in toarray
return self.tocoo(copy=False).toarray(order=order, out=out)
File "C:\Python27\lib\site-packages\scipy\sparse\coo.py", line 238, in toarray
B = self._process_toarray_args(order, out)
File "C:\Python27\lib\site-packages\scipy\sparse\base.py", line 635, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
ValueError: array is too big.
Is there another approach that can allow me to pass a sparse matrix to create a table?
OR
Is there a way to apply InfoGain or MDL, in Orange, without creating a table using my sparse matrix directly?
when passing memmap to Table I get the following error:
>>> t2 = Table(d2, mm)
Traceback (most recent call last):
File "<pyshell#125>", line 1, in <module>
t2 = Table(d2, mm)
TypeError: invalid arguments
When passing the memmap with out the domain I get the following:
>>> mm
memmap([[0, 1, 2, 4],
[9, 8, 6, 3]])
>>> t2 = Table(mm)
Traceback (most recent call last):
File "<pyshell#128>", line 1, in <module>
t2 = Table(mm)
TypeError: invalid arguments for constructor (domain or examples or both expected)
Here it goes a workaround. For a given coo_matrix called m (obtained with m.tocoo()):
1) create a numpy.memmap array for writing:
mm = np.memmap('test.memmap', mode='w+', dtype=m.dtype, shape=m.shape)
2) copy the data to the memmap array, which should work:
for i,j,v in zip(m.row, m.col, m.data):
mm[i,j] = v
3) You can access the memmap as detailed in the documentation...

Infinity generated in python code

I'm looking over some complex Python 2.6 code which is occasionally resulting in an infinity being generated (at least an Infinity being serialized by the json library -- which checks w/ math.isinf).
What is especially baffling is that Python (as far as I can tell) shouldn't be able to ever produce computation results set to infinity. Am I wrong with this assumption? I was aware you can only get infinities from constants:
k = float('inf')
k = 1e900
Somewhere between 1e308 and 1e309 the floats run out of precision, so if you are computing results above that range you will see inf
>>> 1e308
1e+308
>>> 1e309
inf
>>> json.dumps(1e308,allow_nan=False)
'1e+308'
>>> json.dumps(1e309,allow_nan=False)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/json/__init__.py", line 237, in dumps
**kw).encode(obj)
File "/usr/lib/python2.6/json/encoder.py", line 367, in encode
chunks = list(self.iterencode(o))
File "/usr/lib/python2.6/json/encoder.py", line 304, in _iterencode
yield floatstr(o, self.allow_nan)
File "/usr/lib/python2.6/json/encoder.py", line 47, in floatstr
raise ValueError(msg)
ValueError: Out of range float values are not JSON compliant: inf
>>>
Decimal can handle larger numbers, but obviously there is a performance penalty (and it can't be serialised with json)
>>> from decimal import Decimal
>>> Decimal('1e900')/10
Decimal("1E+899")
Here is an example of an addition that doesn't raise overflow exception
>>> a=1e308
>>> a+a
inf

Categories