Using Python/Numpy, I'm trying to import a file; however, the script returns an error that I believe is a memory error:
In [1]: import numpy as np
In [2]: npzfile = np.load('cuda400x400x2000.npz')
In [3]: U = npzfile['U']
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)
<ipython-input-3-0539104595dc> in <module>()
----> 1 U = npzfile['U']
/usr/lib/pymodules/python2.7/numpy/lib/npyio.pyc in __getitem__(self, key)
232 if bytes.startswith(format.MAGIC_PREFIX):
233 value = BytesIO(bytes)
--> 234 return format.read_array(value)
235 else:
236 return bytes
/usr/lib/pymodules/python2.7/numpy/lib/format.pyc in read_array(fp)
456 # way.
457 # XXX: we can probably chunk this to avoid the memory hit.
--> 458 data = fp.read(int(count * dtype.itemsize))
459 array = numpy.fromstring(data, dtype=dtype, count=count)
460
SystemError: error return without exception set
If properly loaded, U will contain 400*400*2000 doubles, so that's about 2.5 GB. It seems the system has enough memory available:
bogeholm#bananabot ~/Desktop $ free -m
total used free shared buffers cached
Mem: 7956 3375 4581 0 35 1511
-/+ buffers/cache: 1827 6128
Swap: 16383 0 16383
Is this a memory issue? Can it be fixed in any way other than buying more RAM? The box is Linux Mint DE with Python 2.7.3rc2 and Numpy 1.6.2.
Cheers,
\T
Related
I recently reinstalled OSMnx using pip, and now I get a strange OS error I honestly have never seen before; Google did not help either...
Here is a code excerpt with the error. I tried restarting the environment and also tried another virtual environment, but the same error appears. It does not actually seem to be a bug in OSMnx itself; the traceback points into geopandas or shapely.
And no, I have never encountered anything similar. I am using Anaconda and am an administrator (Win10) on this machine with full rights.
OSError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_18096\802543517.py in <module>
6
7 # download the street network
----> 8 G = ox.graph_from_place('Berlin', network_type = 'drive', simplify=False)
~\anaconda3\lib\site-packages\osmnx\graph.py in graph_from_place(query, network_type, simplify, retain_all, truncate_by_edge, which_result, buffer_dist, clean_periphery, custom_filter)
345
346 # extract the geometry from the GeoDataFrame to use in API query
--> 347 polygon = gdf_place["geometry"].unary_union
348 utils.log("Constructed place geometry polygon(s) to query API")
349
~\anaconda3\lib\site-packages\geopandas\base.py in unary_union(self)
726 POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))
727
--> 728 return self.geometry.values.unary_union()
729
730 #
~\anaconda3\lib\site-packages\geopandas\array.py in unary_union(self)
636
637 def unary_union(self):
--> 638 return vectorized.unary_union(self.data)
639
640 #
~\anaconda3\lib\site-packages\geopandas\_vectorized.py in unary_union(data)
909 data = [g for g in data if g is not None]
910 if data:
--> 911 return shapely.ops.unary_union(data)
912 else:
913 return None
~\anaconda3\lib\site-packages\shapely\ops.py in unary_union(self, geoms)
159 subs[i] = g._geom
160 collection = lgeos.GEOSGeom_createCollection(6, subs, L)
--> 161 return geom_factory(lgeos.methods['unary_union'](collection))
162
163 operator = CollectionOperator()
OSError: exception: access violation writing 0x0000000000001101
I recently reinstalled OSMNX using pip
You did not follow the OSMnx installation instructions, so this error is to be expected. Follow the documented instructions and it will resolve itself.
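If it helps, here is a minimal sanity check to run after reinstalling the way the documentation describes (as far as I know, the docs recommend installing OSMnx from conda-forge into a fresh conda environment rather than with pip); it just confirms that the geospatial stack imports cleanly and shows which versions are being picked up:
import osmnx as ox
import geopandas as gpd
import shapely

# Print the versions actually imported in this environment
print('osmnx:', ox.__version__)
print('geopandas:', gpd.__version__)
print('shapely:', shapely.__version__)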
I installed esmpy using conda install -c conda-forge esmpy but am unable to get it to create a mesh from an existing NetCDF file using something like this:
mesh = ESMF.Mesh(filename='myfile.nc', filetype=ESMF.FileFormat.ESMFMESH)
My input file is the output from the CAM-SE global atmospheric model at ne120 resolution. This model returns unstructured output. I get an error message saying that Netcdf should be built as a shared library. I know the Netcdf libraries exist because I use xarray all the time to read and process them. But how does one link those libraries during the installation step for esmpy using conda? Do I need to build esmpy from source to be able to do this?
Update: I have added the full traceback below. I installed NetCDF from conda-forge within the conda environment I was using, and it appears that was not the source of the error, as the error message remains unchanged. I am now wondering whether my command for generating the mesh can work at all when fed a CAM-SE output file directly. The file does not really have any information about the number of elements, etc. Also, should I rename the dimensions to some common form expected by ESMF? How will ESMF know which dimension represents the number of nodes, etc.? Here is the list of dimensions from one of the output files (followed by the traceback):
dimensions:
time = UNLIMITED ; // (292 currently)
lev = 30 ;
ilev = 31 ;
ncol = 777602 ;
nbnd = 2 ;
chars = 8 ;
string1 = 1 ;
----------Traceback------------------------
ArgumentError Traceback (most recent call last)
Input In [13], in <cell line: 5>()
----> 5 grid = ESMF.Mesh(filename='B1850.ne120_t12.cam.h2.0338-01-01-21600.nc',
filetype=ESMF.FileFormat.ESMFMESH)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:81, in initialize.<locals>.new_func(*args, **kwargs)
78 from ESMF.api import esmpymanager
80 esmp = esmpymanager.Manager(debug = False)
---> 81 return func(*args, **kwargs)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/api/mesh.py:198, in Mesh.__init__(self, parametric_dim, spatial_dim, coord_sys, filename, filetype, convert_to_dual, add_user_area, meshname, mask_flag, varname)
195 self._coord_sys = coord_sys
196 else:
197 # call into ctypes layer
--> 198 self._struct = ESMP_MeshCreateFromFile(filename, filetype,
199 convert_to_dual,
200 add_user_area, meshname,
201 mask_flag, varname)
202 # get the sizes
203 self._size[node] = ESMP_MeshGetLocalNodeCount(self)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:93, in netcdf.<locals>.new_func(*args, **kwargs)
90 from ESMF.api.constants import _ESMF_NETCDF
92 if _ESMF_NETCDF:
---> 93 return func(*args, **kwargs)
94 else:
95 raise NetCDFMissing("This function requires ESMF to have been built with NetCDF.")
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/interface/cbindings.py:1218, in ESMP_MeshCreateFromFile(filename, fileTypeFlag, convertToDual, addUserArea, meshname, maskFlag, varname)
1197 """
1198 Preconditions: ESMP has been initialized.\n
1199 Postconditions: An ESMP_Mesh has been created.\n
(...)
1215 string (optional) :: varname\n
1216 """
1217 lrc = ct.c_int(0)
-> 1218 mesh = _ESMF.ESMC_MeshCreateFromFile(filename, fileTypeFlag,
1219 convertToDual, addUserArea,
1220 meshname, maskFlag, varname,
1221 ct.byref(lrc))
1222 rc = lrc.value
1223 if rc != constants._ESMP_SUCCESS:
ArgumentError: argument 1: <class 'AttributeError'>: 'list' object has no attribute 'encode'
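Not a full answer, but as I understand ESMF's unstructured mesh file format, filetype=ESMF.FileFormat.ESMFMESH expects the file to already describe a mesh explicitly (dimensions like nodeCount and elementCount, variables like nodeCoords and elementConn), which a raw CAM-SE history file with only an ncol dimension does not. A small sketch for checking a file against those names; the names reflect my reading of the ESMF format documentation and may need adjusting, and it assumes the netCDF4 package is available:
import netCDF4

# Names expected (as I understand it) in an ESMF unstructured mesh file
REQUIRED_DIMS = ['nodeCount', 'elementCount']
REQUIRED_VARS = ['nodeCoords', 'elementConn']

def looks_like_esmfmesh(path):
    with netCDF4.Dataset(path) as ds:
        missing_dims = [d for d in REQUIRED_DIMS if d not in ds.dimensions]
        missing_vars = [v for v in REQUIRED_VARS if v not in ds.variables]
    if missing_dims or missing_vars:
        print('missing dimensions:', missing_dims)
        print('missing variables:', missing_vars)
        return False
    return True

looks_like_esmfmesh('B1850.ne120_t12.cam.h2.0338-01-01-21600.nc')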
I'm using the following code to load my files in NiFTI format in Python.
import nibabel as nib

img_arr = []
for i in range(len(datadir)):  # datadir holds the paths to the NIfTI files
    img = nib.load(datadir[i])
    img_data = img.get_fdata()
    img_arr.append(img_data)
    img.uncache()
A small number of images works fine, but if I want to load more images, I get the following error:
OSError Traceback (most recent call last)
<ipython-input-55-f982811019c9> in <module>()
10 #img = nilearn.image.smooth_img(datadir[i],fwhm = 3) #Smoothing filter for preprocessing (necessary?)
11 img = nib.load(datadir[i])
---> 12 img_data = img.get_fdata()
13 img_arr.append(img_data)
14 img.uncache()
~\AppData\Roaming\Python\Python36\site-packages\nibabel\dataobj_images.py in get_fdata(self, caching, dtype)
346 if self._fdata_cache.dtype.type == dtype.type:
347 return self._fdata_cache
--> 348 data = np.asanyarray(self._dataobj).astype(dtype, copy=False)
349 if caching == 'fill':
350 self._fdata_cache = data
~\AppData\Roaming\Python\Python36\site-packages\numpy\core\_asarray.py in asanyarray(a, dtype, order)
136
137 """
--> 138 return array(a, dtype, copy=False, order=order, subok=True)
139
140
~\AppData\Roaming\Python\Python36\site-packages\nibabel\arrayproxy.py in __array__(self)
353 def __array__(self):
354 # Read array and scale
--> 355 raw_data = self.get_unscaled()
356 return apply_read_scaling(raw_data, self._slope, self._inter)
357
~\AppData\Roaming\Python\Python36\site-packages\nibabel\arrayproxy.py in get_unscaled(self)
348 offset=self._offset,
349 order=self.order,
--> 350 mmap=self._mmap)
351 return raw_data
352
~\AppData\Roaming\Python\Python36\site-packages\nibabel\volumeutils.py in array_from_file(shape, in_dtype, infile, offset, order, mmap)
507 shape=shape,
508 order=order,
--> 509 offset=offset)
510 # The error raised by memmap, for different file types, has
511 # changed in different incarnations of the numpy routine
~\AppData\Roaming\Python\Python36\site-packages\numpy\core\memmap.py in __new__(subtype, filename, dtype, mode, offset, shape, order)
262 bytes -= start
263 array_offset = offset - start
--> 264 mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
265
266 self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
OSError: [WinError 8] Not enough storage is available to process this command
I thought that img.uncache() would delete the image from memory so it wouldn't take up too much storage while still letting me work with the image array, but adding it to the code didn't change anything.
Does anyone know how I can fix this? The computer I'm working on has a 24-core 2.6 GHz CPU, more than 52 GB of memory, and the working directory has over 1.7 TB of free storage. I'm trying to load around 1500 MRI images from the ADNI database.
Any suggestions are much appreciated.
This error is not caused by the 1.7 TB hard drive filling up; it's caused by running out of memory, i.e. RAM. It's important to understand how those two things differ.
uncache() does not remove an item from memory completely, as documented here, but that link also contains more memory-saving tips.
If you want to remove an object from memory completely, you can use the Garbage Collector interface, like so:
import nibabel as nib
import gc

img_arr = []
for i in range(len(datadir)):
    img = nib.load(datadir[i])
    img_data = img.get_fdata()
    img_arr.append(img_data)
    img.uncache()
    # Delete the img object and free the memory
    del img
    gc.collect()
That should help reduce the amount of memory you are using.
How to fix "Not enough storage is available to process this command"?
Try these steps:
1. Press the Windows + R keys at the same time, then type Regedit.exe in the Run window and click OK.
2. Unfold HKEY_LOCAL_MACHINE, then SYSTEM, then CurrentControlSet, then services, then LanmanServer, then Parameters.
3. Locate IRPStackSize (if found, skip to step 5). If it does not exist, right-click in the right pane and choose New > DWORD (32-bit) Value.
4. Type IRPStackSize as the name, then hit Enter.
5. Right-click IRPStackSize and click Modify, then set any value higher than 15 but lower than 50 and click OK.
6. Restart your system and repeat the action you were performing when the error occurred.
Or:
Set the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache to the value "1".
Set the registry key HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Size to the value "3".
Another way to save memory in nibabel:
There are other ways to save memory besides the uncache() method; you can use (a short sketch of both follows the list):
The array proxy instead of get_fdata()
The caching keyword to get_fdata()
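A rough sketch of those two options, reusing datadir from the question and otherwise sticking to nibabel's documented API:
import nibabel as nib

img = nib.load(datadir[0])

# Option 1: slice the array proxy; only the requested part is read from disk,
# so no full-size array is ever materialised in memory
one_slice = img.dataobj[:, :, 0]

# Option 2: read the data but tell get_fdata() not to keep a cached copy
# on the image object (the array is still created, just not retained)
data = img.get_fdata(caching='unchanged')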
I use the pyLDAvis library to visualize my LDA topics. It worked fine before, but I get this error when I download the saved model files from SageMaker to my local computer. I don't know why this happens. Does it relate to SageMaker?
If I run from my local machine and save the model locally, and then run pyLDAvis, it works fine.
KeyError Traceback (most recent call last)
in ()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
116 See pyLDAvis.prepare for **kwargs.
117 """
--> 118 opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
119 return vis_prepare(**opts)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in _extract_data(topic_model, corpus, dictionary, doc_topic_dists)
46 gamma = topic_model.inference(corpus)
47 else:
---> 48 gamma, _ = topic_model.inference(corpus)
49 doc_topic_dists = gamma / gamma.sum(axis=1)[:, None]
50 else:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\ldamodel.py in inference(self, chunk, collect_sstats)
665 # phinorm is the normalizer.
666 # TODO treat zeros explicitly, instead of adding epsilon?
--> 667 eps = DTYPE_TO_EPS[self.dtype]
668 phinorm = np.dot(expElogthetad, expElogbetad) + eps
669
KeyError: dtype('float32')
I know this is late, but I just fixed a similar problem by updating my gensim library from 3.4 to the current version, which for me is 3.8.
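If it helps, a quick way to confirm which gensim version the pyLDAvis environment is actually importing (just a sanity check; the upgrade itself is done with pip or conda as usual):
import gensim

# The answer above fixed a similar KeyError by moving from gensim 3.4 to 3.8;
# print what this environment is really using
print(gensim.__version__)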
I saved some arrays using numpy.savez_compressed(). One of the arrays is gigantic; it has the shape (120000, 7680) and type float32.
Trying to load the array gave me the error below (message caught using IPython).
It seems like this is a NumPy limitation:
Numpy: apparent memory error
What are other ways to save such a huge array? (I had problems with cPickle as well)
In [5]: t=numpy.load('humongous.npz')
In [6]: humg = (t['arr_0.npy'])
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.pyc in __getitem__(self, key)
229 if bytes.startswith(format.MAGIC_PREFIX):
230 value = BytesIO(bytes)
--> 231 return format.read_array(value)
232 else:
233 return bytes
/usr/lib/python2.7/dist-packages/numpy/lib/format.pyc in read_array(fp)
456 # way.
457 # XXX: we can probably chunk this to avoid the memory hit.
--> 458 data = fp.read(int(count * dtype.itemsize))
459 array = numpy.fromstring(data, dtype=dtype, count=count)
460
SystemError: error return without exception set
System: Ubuntu 12.04 64 bit, Python 2.7, numpy 1.6.1
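One alternative, at the cost of disk space: skip the compression and save the big array on its own with numpy.save, then memory-map it at load time (as far as I know, mmap_mode has no effect on members of a compressed .npz). A minimal sketch, with the file name and a stand-in array made up:
import numpy as np

big_array = np.zeros((120000, 7680), dtype=np.float32)  # stand-in for the real data

# Save uncompressed: a plain .npy file can be memory-mapped later
np.save('humongous_uncompressed.npy', big_array)

# Memory-map on load: pages are read from disk as they are accessed,
# instead of the whole ~3.5 GB going through one giant fp.read()
humg = np.load('humongous_uncompressed.npy', mmap_mode='r')
print(humg.shape, humg.dtype)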