MemoryError when using np.median()

MemoryError when using np.median() - python

I have two files that I am reading data from, doing calculations, and plotting a graph with. One file is quite small ~50 KB and raises no problem with the script. The other file is ~ 702, 900 KB (this is the file that causes the problem). I am able to read in the data perfectly fine, though when I calculate the row-by-row medians for this particular file, the script fails and gives me a MemoryError. It looks like the following:
RMSDataS1 = [y01S1, y02S1, y03S1, y04S1, y05S1, y06S1, y07S1, y08S1, y09S1,
y010S1, y011S1, y012S1, y013S1, y014S1, y015S1, y016S1, y017S1,
y018S1, y019S1, y020S1, y021S1, y022S1, y023S1, y024S1, y025S1,
y026S1, y027S1, y028S1, y029S1, y030S1, y031S1, y032S1, y033S1,
y034S1, y035S1, y036S1, y037S1, y038S1, y039S1, y040S1, y041S1,
y042S1, y043S1, y044S1, y045S1, y046S1, y047S1, y048S1, y049S1,
y050S1, y051S1, y052S1, y053S1, y054S1, y055S1, y056S1, y057S1,
y058S1, y059S1, y060S1, y061S1, y062S1, y063S1, y064S1, y065S1,
y066S1, y067S1, y068S1, y069S1, y070S1, y071S1, y072S1, y073S1,
y074S1, y075S1, y076S1, y077S1, y078S1, y079S1, y080S1, y081S1,
y082S1, y083S1, y084S1, y085S1, y086S1, y087S1, y088S1, y089S1,
y090S1, y091S1, y092S1, y093S1, y094S1, y095S1, y096S1, y097S1,
y098S1, y099S1, y0100S1, y0101S1, y0102S1, y0103S1, y0104S1, y0105S1,
y0106S1, y0107S1, y0108S1]
MediansS1 = []
MediansS1 = np.median(RMSDataS1, axis = 0)
Is there any convenient way to get around this? I believe the script is failing when trying to sort the data when calculating the medians.
The error:
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\xy\RMSTrialOriginal-Aera.py", line 511, in <module>
MediansS1 = np.average(RMSDataS1, axis = 0)
File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 486, in average
a = np.asarray(a)
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 235, in asarray
return array(a, dtype, copy=False, order=order)
MemoryError
Any help would be greatly appreciated!

Related

MetPy suface_based_cape_cin returning error with units

At my site, we're having an issue with MetPy returning a units error when trying to call surface_based_cape_cin
I am seeing the following error:
Traceback (most recent call last):
File "Advanced_Sounding_3Dnetcdf2.py", line 202, in <module>
sbcape, sbcin = mpcalc.surface_based_cape_cin(p1, T1, Td1)
File "/gpfs/group/kal6112/default/sw/anaconda3/lib/python3.6/site-packages/metpy/xarray.py", line 677, in wrapper
return func(*args, **kwargs)
File "/gpfs/group/kal6112/default/sw/anaconda3/lib/python3.6/site-packages/metpy/units.py", line 320, in wrapper
return func(*args, **kwargs)
File "/gpfs/group/kal6112/default/sw/anaconda3/lib/python3.6/site-packages/metpy/calc/thermo.py", line 1851, in surface_based_cape_cin
return cape_cin(p, t, td, profile)
File "/gpfs/group/kal6112/default/sw/anaconda3/lib/python3.6/site-packages/metpy/xarray.py", line 677, in wrapper
return func(*args, **kwargs)
File "/gpfs/group/kal6112/default/sw/anaconda3/lib/python3.6/site-packages/metpy/units.py", line 319, in wrapper
raise ValueError(msg)
ValueError: `cape_cin` given arguments with incorrect units: `temperature` requires "[temperature]" but given "none", `dewpt` requires "[temperature]" but given "none".
When I check the incoming values p1, T1, and Td1 they all have the correct units (hectopascal, degree_Celcius).
Just to be sure I added the following and checked the results prior to the call to surface_based_cape_cin:
p1 = units.hPa * phPa
T1 = units.degC * TdegC
Td1 = units.degC * TddegC
I'm running the following version of MetPy
# Name Version Build Channel
metpy 0.12.2 py_0 conda-forge
I don't recall having this prior to updating to this version but I can't be certain the problem I'm seeing arose after the update or not.
Thanks for any help you can provide.

This is definitely a bug in MetPy, likely due to more challenges with masked arrays and preserving units. I've opened a new issue. In the meanwhile as a work-around, it's probably best to just eliminate masked arrays with something like:
p1 = p1.compressed() * p1.units
T1 = T1.compressed() * T1.units
Td1 = Td1.compressed() * Td1.units
This will work so long as the data have no actual masked values or if all 3 arrays are masked in the same spot. If not, you'll need to do some more work to remove any of the levels where one of the values is masked.

Keyword error when implementing Scatterv in mpi4py

I'm trying to use Scatterv to distribute parts of an array to each of my processors, but the line where I run the Scatterv call fails, and I get this error:
Traceback (most recent call last):
File "<ipython-input-16-e1f960b94347>", line 1, in <module>
comm.Scatterv([init_data, (sendcount,split)], init_data_local, root=0)
File "mpi4py/MPI/Comm.pyx", line 626, in mpi4py.MPI.Comm.Scatterv
File "mpi4py/MPI/msgbuffer.pxi", line 538, in mpi4py.MPI._p_msg_cco.for_scatter
File "mpi4py/MPI/msgbuffer.pxi", line 440, in mpi4py.MPI._p_msg_cco.for_cco_send
File "mpi4py/MPI/msgbuffer.pxi", line 266, in mpi4py.MPI.message_vector
File "mpi4py/MPI/msgbuffer.pxi", line 100, in mpi4py.MPI.message_basic
KeyError: '38w'
I have no idea what I'm doing wrong or how to fix this error. Any help would be appreciated!
EDIT: Here is a reproducible example of the code. Changing the data type of the init_data array changes the number following KeyError, but still gives the same error. My choice of '<U38' as the dtype is because that is what np.loadtxt uses when loading the array within my actual code.
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
if rank==0:
init_data=np.ones((5187,3), dtype='<U38')
length=len(init_data[:,0])
else:
length=None
init_data=None
length=comm.bcast(length, root=0)
sendcount=[]
split=[]
for r in range(size):
split.append(r*length//size)
if r<size-1:
sendcount.append(length//size)
else:
sendcount.append(length-(r*length//size))
sendcount=tuple(sendcount)
split=tuple(split)
init_data_local=np.empty((sendcount[rank], 3),dtype=str)
comm.Scatterv([init_data, (sendcount,split)], init_data_local, root=0)

How to use numba.SmartArrays for vector addition?

I have written this code for vector addition using numba.SmartArrays. I am using this numba.SmartArrays for the first time. I am not sure how to use that.
This code is not working and it is throwing errors.
import numpy as np
from numba import SmartArray,cuda, jit, uint32
li1=np.uint32([1,2,3,4])
li=np.uint32([1,2,3,4])
b=SmartArray(li,where="host",copy=True)
a=SmartArray(li1,where="host",copy=True)
c=np.uint32([1,1,1,1])
print type(li)
print type(a)
#cuda.jit('void(uint32[:],uint32[:],uint32[:])',type="gpu")
def additionG(c,a,b):
idx=cuda.threadIdx.x+cuda.blockDim.x*cuda.blockIdx.x
if idx< len(a):
a[idx]=c[idx]+b[idx]
dA=cuda.to_device(a)
dB=cuda.to_device(b)
dC=cuda.to_device(c)
additionG[1, 128](c,a,b)
print a.__array__()
Errors:
<type 'numpy.ndarray'>
<class 'numba.smartarray.SmartArray'>
Traceback (most recent call last):
File "C:\Users\hp-pc\My Documents\LiClipse Workspace\cuda\blowfishgpu_smart_arrays.py", line 20, in <module>
dA=cuda.to_device(a)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devices.py", line 257, in _require_cuda_context
return fn(*args, **kws)
File "C:\Anaconda\lib\site-packages\numba\cuda\api.py", line 55, in to_device
to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 403, in auto_device
devobj.copy_to_device(obj, stream=stream)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 148, in copy_to_device
sz = min(_driver.host_memory_size(ary), self.alloc_size)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1348, in host_memory_size
s, e = host_memory_extents(obj)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1333, in host_memory_extents
return mviewbuf.memoryview_get_extents(obj)
TypeError: expected a readable buffer object

Its been a while since I posted this question. Still posting the answer so that someone may find it helpful in future.
import numpy as np
from numba import SmartArray,cuda, jit, uint32,autojit
li1=np.uint32([6,7,8,9])
li=np.uint32([1,2,3,4])
a=SmartArray(li1,where='host',copy=True)
b=SmartArray(li,where="host",copy=True)
c=np.uint32([1,1,1,1])
def additionG(a,c):
idx=cuda.threadIdx.x+cuda.blockDim.x*cuda.blockIdx.x
if idx < len(c):
a[idx]=a[idx]+c[idx]
cuda.syncthreads()
bpg=1
tpb=128
dC=cuda.to_device(c)
cfunc = cuda.jit()(additionG)
cfunc[bpg, tpb](a,dC)
print a.__array__()

It looks to me like cuda.to_device doesn't handle smart arrays, which would sort of make sense, because smart arrays are supposed to do away with explicit copy management.
If my reading of the documentation is correct (I have never tried SmartArray before), you should just be able to change this
dA=cuda.to_device(a)
dB=cuda.to_device(b)
dC=cuda.to_device(c)
additionG[1, 128](c,a,b)
to just
dC=cuda.to_device(c)
additionG[1, 128](dC,a.gpu(),b.gpu())
The .gpu() method should return a GPU resident object that the kernel can understand and access.

How to fix "ValueError: need at least one array to concatenate" error

Following up from here: Calculating percentage of Bounding box overlap, for image detector evaluation, I'm getting an error at this line:
poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]
This is the error:
File "C:\work_asaaki\code\detection.py", line 32, in clip_boxes
poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]
File "C:\Anaconda\lib\site-packages\matplotlib\path.py", line 909, in clip_to_bbox
return self.make_compound_path(*paths)
File "C:\Anaconda\lib\site-packages\matplotlib\path.py", line 328, in make_compound_path
vertices = np.vstack([x.vertices for x in args])
File "C:\Anaconda\lib\site-packages\numpy\core\shape_base.py", line 228, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate
This doesn't always happen, it happens based on specific sets of polygons... What I'm trying to understand is when exactly does it not work? And how can I solve the issue?

Remove the numba.lowering.LoweringError: Internal error

I'm using numba to speed up my code which is working fine without numba. But after using #jit, it crashes with this error:
Traceback (most recent call last):
File "C:\work_asaaki\code\gbc_classifier_train_7.py", line 54, in <module>
gentlebooster.train(X_train, y_train, boosting_rounds)
File "C:\work_asaaki\code\gentleboost_c_class_jit_v7_nolimit.py", line 298, in train
self.g_per_round, self.g = train_function(X, y, H)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 152, in _compile_for_args
return self.jit(sig)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 143, in jit
return self.compile(sig, **kws)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 250, in compile
locals=self.locals)
File "C:\Anaconda\lib\site-packages\numba\compiler.py", line 183, in compile_bytecode
flags.no_compile)
File "C:\Anaconda\lib\site-packages\numba\compiler.py", line 323, in native_lowering_stage
lower.lower()
File "C:\Anaconda\lib\site-packages\numba\lowering.py", line 219, in lower
self.lower_block(block)
File "C:\Anaconda\lib\site-packages\numba\lowering.py", line 254, in lower_block
raise LoweringError(msg, inst.loc)
numba.lowering.LoweringError: Internal error:
NotImplementedError: ('cast', <llvm.core.Instruction object at 0x000000001801D320>, slice3_type, int64)
File "gentleboost_c_class_jit_v7_nolimit.py", line 103
Line 103 is below, in a loop:
weights = np.empty([n,m])
for curr_n in range(n):
weights[curr_n,:] = 1.0/(n) # this is line 103
where n is a constant already defined somewhere above in my code.
How can I remove the error? What "lowering" is going on? I'm using Anaconda 2.0.1 with Numba 0.13.x and Numpy 1.8.x on a 64-bit machine.

Based on this: https://gist.github.com/cc7768/bc5b8b7b9052708f0c0a,
I figured out what to do to avoid the issue. Instead of using the colon : to refer to any row/column, I just opened up the loop into two loops to explicitly refer to the indices in each dimension of the array:
weights = np.empty([n,m])
for curr_n in range(n):
for curr_m in range (m):
weights[curr_n,curr_m] = 1.0/(n)
There were other instances in my code after this where I used the colon, but they didn't cause errors further down, not sure why.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

MemoryError when using np.median() - python

Related

MetPy suface_based_cape_cin returning error with units

Keyword error when implementing Scatterv in mpi4py

How to use numba.SmartArrays for vector addition?

How to fix "ValueError: need at least one array to concatenate" error

Remove the numba.lowering.LoweringError: Internal error

Categories

Resources