PyFITS: File already exists - python

I'm really close to completing a large piece of code, but the final segment seems to be failing and I don't know why. What I'm trying to do here is take an image array, compare it to a different image array, and wherever the initial image array equals 1, mask that portion out in the second image array. However, I'm getting a strange error:
Code:
maskimg='omask'+str(inimgs)[5:16]+'.fits'
newmaskimg=pf.getdata(maskimg)
oimg=pf.getdata(inimgs)
for i in range(newmaskimg.shape[0]):
    for j in range(newmaskimg.shape[1]):
        if newmaskimg[i,j]==1:
            oimg[i,j]=0
pf.writeto('newestmask'+str(inimgs)[5:16]+'.fits',newmaskimg)
Error:
/home/vidur/se_files/fetch_swarp10.py in objmask(inimgs, inwhts, thresh1, thresh2, tfdel, xceng, yceng, outdir, tmpdir)
122 if newmaskimg[i,j]==1:
123 oimg[i,j]=0
--> 124 pf.writeto('newestmask'+str(inimgs)[5:16]+'.fits',newmaskimg)
125
126
/usr/local/lib/python2.7/dist-packages/pyfits/convenience.pyc in writeto(filename, data, header, output_verify, clobber, checksum)
396 hdu = PrimaryHDU(data, header=header)
397 hdu.writeto(filename, clobber=clobber, output_verify=output_verify,
--> 398 checksum=checksum)
399
400
/usr/local/lib/python2.7/dist-packages/pyfits/hdu/base.pyc in writeto(self, name, output_verify, clobber, checksum)
348 hdulist = HDUList([self])
349 hdulist.writeto(name, output_verify, clobber=clobber,
--> 350 checksum=checksum)
351
352 def _get_raw_data(self, shape, code, offset):
/usr/local/lib/python2.7/dist-packages/pyfits/hdu/hdulist.pyc in writeto(self, fileobj, output_verify, clobber, checksum)
651 os.remove(filename)
652 else:
--> 653 raise IOError("File '%s' already exists." % filename)
654 elif (hasattr(fileobj, 'len') and fileobj.len > 0):
655 if clobber:
IOError: File 'newestmaskPHOTOf105w0.fits' already exists.

If you don't care about overwriting the existing file, pyfits.writeto accepts a clobber argument to automatically overwrite existing files (it will still output a warning):
pyfits.writeto(..., clobber=True)
As an aside, let me be very emphatic that the code you posted above is very much not the right way to use Numpy. The loop in your code can be written in one line and will be orders of magnitude faster. For example, one of many possibilities is to write it like this:
oimg[newmaskimg == 1] = 0
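
Putting the two points together, a rough sketch of the whole masking step might look like this (assuming, as in the question, that pf is the pyfits module and inimgs is the input filename, and that the masked image oimg is what you actually want to write out):
import pyfits as pf

maskimg = 'omask' + str(inimgs)[5:16] + '.fits'
newmaskimg = pf.getdata(maskimg)   # mask array
oimg = pf.getdata(inimgs)          # image to be masked

# Vectorized masking: zero every pixel where the mask equals 1
oimg[newmaskimg == 1] = 0

# clobber=True overwrites an existing output file instead of raising IOError
pf.writeto('newestmask' + str(inimgs)[5:16] + '.fits', oimg, clobber=True)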

Yes, add clobber=True. I've used this in my code before and it works just fine. Or simply sudo rm path/to/file to get rid of the existing files so you can run it again.

I had the same issue, and as it turns out the clobber argument still works but won't be supported in future versions of AstroPy.
The overwrite argument does the same thing and doesn't emit the deprecation warning.
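For example, a minimal sketch of the same write with the newer keyword (assuming pf is astropy.io.fits, or a PyFITS/Astropy version recent enough to accept overwrite):
pf.writeto('newestmask' + str(inimgs)[5:16] + '.fits', oimg, overwrite=True)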

File load error: not enough storage available with 1.7TB storage free

I'm using the following code to load my files in NiFTI format in Python.
import nibabel as nib
img_arr = []
for i in range(len(datadir)):
    img = nib.load(datadir[i])
    img_data = img.get_fdata()
    img_arr.append(img_data)
    img.uncache()
A small amount of images works fine, but if I want to load more images, I get the following error:
OSError Traceback (most recent call last)
<ipython-input-55-f982811019c9> in <module>()
10 #img = nilearn.image.smooth_img(datadir[i],fwhm = 3) #Smoothing filter for preprocessing (necessary?)
11 img = nib.load(datadir[i])
---> 12 img_data = img.get_fdata()
13 img_arr.append(img_data)
14 img.uncache()
~\AppData\Roaming\Python\Python36\site-packages\nibabel\dataobj_images.py in get_fdata(self, caching, dtype)
346 if self._fdata_cache.dtype.type == dtype.type:
347 return self._fdata_cache
--> 348 data = np.asanyarray(self._dataobj).astype(dtype, copy=False)
349 if caching == 'fill':
350 self._fdata_cache = data
~\AppData\Roaming\Python\Python36\site-packages\numpy\core\_asarray.py in asanyarray(a, dtype, order)
136
137 """
--> 138 return array(a, dtype, copy=False, order=order, subok=True)
139
140
~\AppData\Roaming\Python\Python36\site-packages\nibabel\arrayproxy.py in __array__(self)
353 def __array__(self):
354 # Read array and scale
--> 355 raw_data = self.get_unscaled()
356 return apply_read_scaling(raw_data, self._slope, self._inter)
357
~\AppData\Roaming\Python\Python36\site-packages\nibabel\arrayproxy.py in get_unscaled(self)
348 offset=self._offset,
349 order=self.order,
--> 350 mmap=self._mmap)
351 return raw_data
352
~\AppData\Roaming\Python\Python36\site-packages\nibabel\volumeutils.py in array_from_file(shape, in_dtype, infile, offset, order, mmap)
507 shape=shape,
508 order=order,
--> 509 offset=offset)
510 # The error raised by memmap, for different file types, has
511 # changed in different incarnations of the numpy routine
~\AppData\Roaming\Python\Python36\site-packages\numpy\core\memmap.py in __new__(subtype, filename, dtype, mode, offset, shape, order)
262 bytes -= start
263 array_offset = offset - start
--> 264 mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
265
266 self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
OSError: [WinError 8] Not enough storage is available to process this command
I thought that img.uncache() would delete the image from memory so it wouldn't take up too much storage while still letting me work with the image array, but adding this to the code didn't change anything.
Does anyone know how I can fix this? The computer I'm working on has a 24-core 2.6 GHz CPU, more than 52 GB of memory, and the working directory has over 1.7 TB of free storage. I'm trying to load around 1500 MRI images from the ADNI database.
Any suggestions are much appreciated.
This error is not being caused because the 1.7TB hard drive is filling up, it's because you're running out of memory, aka RAM. It's going to be important to understand how those two things differ.
uncache() does not remove an item from memory completely, as documented here, but that link also contains more memory saving tips.
If you want to remove an object from memory completely, you can use the Garbage Collector interface, like so:
import nibabel as nib
import gc
img_arr = []
for i in range(len(datadir)):
    img = nib.load(datadir[i])
    img_data = img.get_fdata()
    img_arr.append(img_data)
    img.uncache()
    # Delete the img object and free the memory
    del img
    gc.collect()
That should help reduce the amount of memory you are using.
How to fix "not enough storage available.."?
Try these steps:
1. Press the Windows + R keys at the same time, type Regedit.exe in the Run window, and click OK.
2. Unfold HKEY_LOCAL_MACHINE, then SYSTEM, then CurrentControlSet, then services, then LanmanServer, then Parameters.
3. Locate IRPStackSize (if found, skip to step 5). If it does not exist, right-click in the right-hand pane and choose New > DWORD (32-bit) Value.
4. Type IRPStackSize as the name, then hit Enter.
5. Right-click IRPStackSize, click Modify, set any value higher than 15 but lower than 50, and click OK.
6. Restart your system and repeat the action that triggered the error.
Or:
Set the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache to the value "1".
Set the registry key HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Size to the value "3".
Another way to save memory with nibabel:
Besides the uncache() method, there are other ways to save memory (sketched below); you can use:
The array proxy instead of get_fdata()
The caching keyword to get_fdata()
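For example, a small sketch of both options (datadir is the list of file paths from the question; the slicing index is just an illustration):
import nibabel as nib

img = nib.load(datadir[0])

# Option 1: the array proxy -- slice only the part you need instead of
# loading (and caching) the whole volume as float64
middle_slice = img.dataobj[..., img.shape[-1] // 2]

# Option 2: get_fdata with caching='unchanged' -- returns the data without
# storing an extra copy on the image object
data = img.get_fdata(caching='unchanged')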

Why does numba not work with this nested function?

I reported the bug described below here: https://github.com/numba/numba/issues/3095, in case anyone is interested in the solution.
I am trying to precompile a minimization running on 3D time series data with numba. As a first step, I wanted to define a cost function, but this already fails. Here is my code:
import numpy as np
from numba import jit

@jit
def tester(axis, data):
    def lineCost(pars):
        A = pars[0]
        B = pars[1]
        return np.sum((A*axis + B - data)**2)
    return lineCost([axis, data])

tester(1, 2)
This yields a "Not implemented" error:
~/.local/lib/python3.5/site-packages/numba/lowering.py in lower(self)
171 if self.generator_info is None:
172 self.genlower = None
--> 173 self.lower_normal_function(self.fndesc)
174 else:
175 self.genlower = self.GeneratorLower(self)
~/.local/lib/python3.5/site-packages/numba/lowering.py in lower_normal_function(self, fndesc)
212 # Init argument values
213 self.extract_function_arguments()
--> 214 entry_block_tail = self.lower_function_body()
215
216 # Close tail of entry block
~/.local/lib/python3.5/site-packages/numba/lowering.py in lower_function_body(self)
237 bb = self.blkmap[offset]
238 self.builder.position_at_end(bb)
--> 239 self.lower_block(block)
240
241 self.post_lower()
~/.local/lib/python3.5/site-packages/numba/lowering.py in lower_block(self, block)
252 with new_error_context('lowering "{inst}" at {loc}', inst=inst,
253 loc=self.loc, errcls_=defaulterrcls):
--> 254 self.lower_inst(inst)
255
256 def create_cpython_wrapper(self, release_gil=False):
/usr/lib/python3.5/contextlib.py in __exit__(self, type, value, traceback)
75 value = type()
76 try:
---> 77 self.gen.throw(type, value, traceback)
78 raise RuntimeError("generator didn't stop after throw()")
79 except StopIteration as exc:
~/.local/lib/python3.5/site-packages/numba/errors.py in new_error_context(fmt_, *args, **kwargs)
583 from numba import config
584 tb = sys.exc_info()[2] if config.FULL_TRACEBACKS else None
--> 585 six.reraise(type(newerr), newerr, tb)
586
587
~/.local/lib/python3.5/site-packages/numba/six.py in reraise(tp, value, tb)
657 if value.__traceback__ is not tb:
658 raise value.with_traceback(tb)
--> 659 raise value
660
661 else:
LoweringError: Failed at object (object mode backend)
make_function(closure=$0.3, defaults=None, name=$const0.5, code=<code object lineCost at 0x7fd7ada3b810, file "<ipython-input-59-ef6835d3b147>", line 3>)
File "<ipython-input-59-ef6835d3b147>", line 3:
def tester(axis,data):
def lineCost(pars):
^
[1] During: lowering "$0.6 = make_function(closure=$0.3, defaults=None, name=$const0.5, code=<code object lineCost at 0x7fd7ada3b810, file "<ipython-input-59-ef6835d3b147>", line 3>)" at <ipython-input-59-ef6835d3b147> (3)
-------------------------------------------------------------------------------
This should not have happened, a problem has occurred in Numba's internals.
Please report the error message and traceback, along with a minimal reproducer
at: https://github.com/numba/numba/issues/new
If more help is needed please feel free to speak to the Numba core developers
directly at: https://gitter.im/numba/numba
Thanks in advance for your help in improving Numba!
Could you help me understand which part of the code causes problems for numba? That would be a very big help. Thank you!
Best,
Malte
Avoid global variables (data and axis are global inside lineCost), avoid functions defined inside other functions, and avoid lists ([axis, data]).
Working example
import numpy as np
from numba import jit

@jit
def lineCost(axis, data):
    return np.sum((axis*axis + data - data)**2)

@jit
def tester(axis, data):
    return lineCost(axis, data)

tester(1, 2)
Most of these things should work in the latest release, but relying on the newest features, which often contain bugs or unsupported corner cases, isn't recommendable.
Even if this did work, it wouldn't surprise me much if the performance turned out to be less than expected.
Actually, it seems this was a bug that was removed in the newest release :)
https://github.com/numba/numba/issues/3095

How to handle BigTable Scan InvalidChunk exceptions?

I am trying to scan BigTable data where some rows are 'dirty', but depending on the scan this fails with (serialization?) InvalidChunk exceptions.
The code is as follows:
from google.cloud import bigtable
from google.cloud import happybase
client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
connection = happybase.Connection(instance=instance)
table = connection.table(table_name)
for key, row in table.scan(limit=5000):  # BOOM!
    pass
Leaving out some columns, limiting the rows to fewer, or specifying the start and stop keys allows the scan to succeed.
I cannot tell from the stacktrace which values are problematic - it varies across columns - the scan just fails. This makes it hard to clean the data at the source.
When I leverage the python debugger, I see that the chunk (which is of type google.bigtable.v2.bigtable_pb2.CellChunk) has no value (it is NULL / undefined):
ipdb> pp chunk.value
b''
ipdb> chunk.value_size
0
I can confirm this with the HBase shell, using the row key I got from self._row.row_key.
So the question becomes: how can a BigTable scan filter out columns which have an undefined / empty / null value?
I get the same problem from both google cloud APIs that return generators which internally stream data as chunks over gRPC:
google.cloud.happybase.table.Table# scan()
google.cloud.bigtable.table.Table# read_rows().consume_all()
the abbreviated stacktrace is as follows:
---------------------------------------------------------------------------
InvalidChunk Traceback (most recent call last)
<ipython-input-48-922c8127f43b> in <module>()
1 row_gen = table.scan(limit=n)
2 rows = []
----> 3 for kvp in row_gen:
4 pass
.../site-packages/google/cloud/happybase/table.py in scan(self, row_start, row_stop, row_prefix, columns, timestamp, include_timestamp, limit, **kwargs)
391 while True:
392 try:
--> 393 partial_rows_data.consume_next()
394 for row_key in sorted(rows_dict):
395 curr_row_data = rows_dict.pop(row_key)
.../site-packages/google/cloud/bigtable/row_data.py in consume_next(self)
273 for chunk in response.chunks:
274
--> 275 self._validate_chunk(chunk)
276
277 if chunk.reset_row:
.../site-packages/google/cloud/bigtable/row_data.py in _validate_chunk(self, chunk)
388 self._validate_chunk_new_row(chunk)
389 if self.state == self.ROW_IN_PROGRESS:
--> 390 self._validate_chunk_row_in_progress(chunk)
391 if self.state == self.CELL_IN_PROGRESS:
392 self._validate_chunk_cell_in_progress(chunk)
.../site-packages/google/cloud/bigtable/row_data.py in _validate_chunk_row_in_progress(self, chunk)
368 self._validate_chunk_status(chunk)
369 if not chunk.HasField('commit_row') and not chunk.reset_row:
--> 370 _raise_if(not chunk.timestamp_micros or not chunk.value)
371 _raise_if(chunk.row_key and
372 chunk.row_key != self._row.row_key)
.../site-packages/google/cloud/bigtable/row_data.py in _raise_if(predicate, *args)
439 """Helper for validation methods."""
440 if predicate:
--> 441 raise InvalidChunk(*args)
InvalidChunk:
Can you show me how to scan BigTable from Python, ignoring / logging dirty rows that raise InvalidChunk?
(try ... except won't work around the generator, which is inside the google cloud API row_data PartialRowsData class.)
Also, can you show me how to stream a table scan in chunks in BigTable?
HappyBase batch_size and scan_batching don't seem to be supported.
This was likely due to this bug: https://github.com/googleapis/google-cloud-python/issues/2980
The bug has been fixed, so this should no longer be an issue.
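If upgrading isn't an option right away, one possible workaround (a sketch only; it builds on the scan() signature shown in the traceback above, and the byte-string row-key handling is an assumption) is to scan in bounded chunks keyed by row_start, so a failing range can be logged and skipped instead of killing the whole scan:
def scan_in_chunks(table, chunk_size=1000):
    """Yield (key, row) pairs, logging and skipping ranges that raise InvalidChunk."""
    row_start = None
    while True:
        last_key = None
        try:
            for key, row in table.scan(row_start=row_start, limit=chunk_size):
                last_key = key
                yield key, row
        except Exception as err:  # e.g. InvalidChunk
            print('skipping rest of chunk starting at %r: %s' % (row_start, err))
        if last_key is None:
            break  # no more rows, or the chunk failed on its very first row
        row_start = last_key + b'\x00'  # resume just after the last good key
Each pass opens a fresh scan, so the generator that raised is simply abandoned and the next chunk resumes after the last key that was read successfully.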

what reliable method to save huge numpy arrays

I saved some arrays using numpy.savez_compressed(). One of the arrays is gigantic, it has the shape (120000,7680), type float32.
Trying to load the array gave me the error below (message caught using IPython).
It seems like this is a Numpy limitation:
Numpy: apparent memory error
What are other ways to save such a huge array? (I had problems with cPickle as well)
In [5]: t=numpy.load('humongous.npz')
In [6]: humg = (t['arr_0.npy'])
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.pyc in __getitem__(self, key)
229 if bytes.startswith(format.MAGIC_PREFIX):
230 value = BytesIO(bytes)
--> 231 return format.read_array(value)
232 else:
233 return bytes
/usr/lib/python2.7/dist-packages/numpy/lib/format.pyc in read_array(fp)
456 # way.
457 # XXX: we can probably chunk this to avoid the memory hit.
--> 458 data = fp.read(int(count * dtype.itemsize))
459 array = numpy.fromstring(data, dtype=dtype, count=count)
460
SystemError: error return without exception set
System: Ubuntu 12.04 64 bit, Python 2.7, numpy 1.6.1
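For what it's worth, two common alternatives for arrays of this size (a sketch only, assuming the array fits on disk; this does not address the numpy 1.6.1 bug itself):
import numpy as np

big = np.zeros((120000, 7680), dtype=np.float32)  # ~3.5 GB stand-in for the real array

# Plain .npy avoids the compressed zip read path and can be memory-mapped
# on load instead of being read into RAM in one piece
np.save('humongous.npy', big)
loaded = np.load('humongous.npy', mmap_mode='r')

# Raw memory-mapped file: shape and dtype have to be tracked separately
mm = np.memmap('humongous.dat', dtype=np.float32, mode='w+', shape=(120000, 7680))
mm[:] = big
mm.flush()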

Why the source code of conjugate in Numpy cannot be found by using the inspect module?

I want to see the implementation of the conjugate function used in Numpy, so I tried the following:
import numpy as np
import inspect
inspect.getsource(np.conjugate)
However, I received the following error message stating that <ufunc 'conjugate'> is not a module, class, method, function, traceback, frame, or code object. Can someone explain why?
In [8]: inspect.getsource(np.conjugate)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-821ecfb71e08> in <module>()
----> 1 inspect.getsource(np.conj)
/Users/duanlx/anaconda/python.app/Contents/lib/python2.7/inspect.pyc in getsource(object)
699 or code object. The source code is returned as a single string. An
700 IOError is raised if the source code cannot be retrieved."""
--> 701 lines, lnum = getsourcelines(object)
702 return string.join(lines, '')
703
/Users/duanlx/anaconda/python.app/Contents/lib/python2.7/inspect.pyc in getsourcelines(object)
688 original source file the first line of code was found. An IOError is
689 raised if the source code cannot be retrieved."""
--> 690 lines, lnum = findsource(object)
691
692 if ismodule(object): return lines, 0
/Users/duanlx/anaconda/lib/python2.7/site-packages/IPython/core/ultratb.pyc in findsource(object)
149 FIXED version with which we monkeypatch the stdlib to work around a bug."""
150
--> 151 file = getsourcefile(object) or getfile(object)
152 # If the object is a frame, then trying to get the globals dict from its
153 # module won't work. Instead, the frame object itself has the globals
/Users/duanlx/anaconda/python.app/Contents/lib/python2.7/inspect.pyc in getsourcefile(object)
442 Return None if no way can be identified to get the source.
443 """
--> 444 filename = getfile(object)
445 if string.lower(filename[-4:]) in ('.pyc', '.pyo'):
446 filename = filename[:-4] + '.py'
/Users/duanlx/anaconda/python.app/Contents/lib/python2.7/inspect.pyc in getfile(object)
418 return object.co_filename
419 raise TypeError('{!r} is not a module, class, method, '
--> 420 'function, traceback, frame, or code object'.format(object))
421
422 ModuleInfo = namedtuple('ModuleInfo', 'name suffix mode module_type')
TypeError: <ufunc 'conjugate'> is not a module, class, method, function, traceback, frame, or code object
Thanks!
Numpy is written in C, for speed. You can only see the source of Python functions.
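A quick illustration of the distinction (a small sketch; os.path.join is just an arbitrary Python-level function used for contrast):
import inspect
import os.path
import numpy as np

print(type(np.conjugate))   # <class 'numpy.ufunc'> -- compiled C code, no Python source to show

# getsource works for functions actually written in Python
print(inspect.getsource(os.path.join))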
