File is not showing when applying rasterio.open() - python

Here is my code
refPath = '/Users/admin/Downloads/Landsat8/'
ext = '_NDWI.tif'
for file in sorted(os.listdir(refPath)):
    if file.endswith(ext):
        print(file)
        ndwiopen = rs.open(file)
        ndwiread = ndwiopen.read(1)
Here is the error
2014_NDWI.tif
---------------------------------------------------------------------------
CPLE_OpenFailedError Traceback (most recent call last)
File rasterio/_base.pyx:302, in rasterio._base.DatasetBase.__init__()
File rasterio/_base.pyx:213, in rasterio._base.open_dataset()
File rasterio/_err.pyx:217, in rasterio._err.exc_wrap_pointer()
CPLE_OpenFailedError: 2014_NDWI.tif: No such file or directory
During handling of the above exception, another exception occurred:
RasterioIOError Traceback (most recent call last)
Input In [104], in <cell line: 33>()
34 if file.endswith(ext):
35 print(file)
---> 36 ndwiopen = rs.open(file)
38 ndwiread = ndwiopen.read(1)
39 plt.figure(figsize = (20, 15))
File /Applications/anaconda3/lib/python3.9/site-packages/rasterio/env.py:442, in ensure_env_with_credentials.<locals>.wrapper(*args, **kwds)
439 session = DummySession()
441 with env_ctor(session=session):
--> 442 return f(*args, **kwds)
File /Applications/anaconda3/lib/python3.9/site-packages/rasterio/__init__.py:277, in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
274 path = _parse_path(raw_dataset_path)
276 if mode == "r":
--> 277 dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
278 elif mode == "r+":
279 dataset = get_writer_for_path(path, driver=driver)(
280 path, mode, driver=driver, sharing=sharing, **kwargs
281 )
File rasterio/_base.pyx:304, in rasterio._base.DatasetBase.__init__()
RasterioIOError: 2014_NDWI.tif: No such file or directory
As shown, the file name is printed in the output, but rasterio (imported as rs) cannot open it.
I can't understand what is missing in the script.

Unsure if this is your exact problem, but I rammed my head against this same error for 5-10 hours before I realized that the '.tif' file I was trying to read had an all-caps extension, as in '.TIF'. This is apparently the default for the Landsat 8 image bands I was working with.
I was doing similar concatenation, but my string would produce 'filename.tif' instead of the correct 'filename.TIF', so rasterio was unable to read it. It was really frustrating, so I figured I would share how I solved it since you have not yet received any replies, even though I cannot know whether this was your issue. When I searched this error, this post was one of the first and closest matches but was unanswered, so I am posting in case anyone with my issue stumbles across it as well (or for myself, when I inevitably forget in a few months how I solved this).
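If your situation is similar, a minimal sketch along these lines (reusing the paths from the question and assuming rasterio is imported as rs) compares extensions case-insensitively and joins the directory onto each file name, so rasterio receives a full path rather than a name relative to the current working directory:
import os
import rasterio as rs

refPath = '/Users/admin/Downloads/Landsat8/'
ext = '_ndwi.tif'

for file in sorted(os.listdir(refPath)):
    # Lowercase before comparing so both '.tif' and '.TIF' match.
    if file.lower().endswith(ext):
        # Join the directory so rasterio gets a full path.
        full_path = os.path.join(refPath, file)
        with rs.open(full_path) as ndwiopen:
            ndwiread = ndwiopen.read(1)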

Related

How to link netcdf as shared libraries during conda installation of esmpy?

I installed esmpy using conda install -c conda-forge esmpy but am unable to get it to create a mesh from an existing Netcdf file using something like this:
mesh = ESMF.Mesh(filename='myfile.nc', filetype=ESMF.FileFormat.ESMFMESH)
My input file is output from the CAM-SE global atmospheric model at ne120 resolution; this model produces unstructured output. I get an error message saying that NetCDF should be built as a shared library. I know the NetCDF libraries exist because I use xarray all the time to read and process these files. But how does one link those libraries during the conda installation step for esmpy? Do I need to build esmpy from source to be able to do this?
Update: I have added the full traceback below. I installed netcdf using conda-forge within the conda environment I was using, and it appears that was not the source of the error, as the error message remains unchanged. I am now wondering whether my command for generating the mesh will just work when fed a CAM-SE output file directly. The file does not really contain any information about the number of elements, etc. Also, should I rename the dimensions to some common form expected by ESMF? How will ESMF know which dimension represents the number of nodes, etc.? Here is the list of dimensions from one of the output files (followed by the traceback):
dimensions:
time = UNLIMITED ; // (292 currently)
lev = 30 ;
ilev = 31 ;
ncol = 777602 ;
nbnd = 2 ;
chars = 8 ;
string1 = 1 ;
----------Traceback------------------------
ArgumentError Traceback (most recent call last)
Input In [13], in <cell line: 5>()
----> 5 grid = ESMF.Mesh(filename='B1850.ne120_t12.cam.h2.0338-01-01-21600.nc',
filetype=ESMF.FileFormat.ESMFMESH)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:81, in initialize.<locals>.new_func(*args, **kwargs)
78 from ESMF.api import esmpymanager
80 esmp = esmpymanager.Manager(debug = False)
---> 81 return func(*args, **kwargs)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/api/mesh.py:198, in Mesh.__init__(self, parametric_dim, spatial_dim, coord_sys, filename, filetype, convert_to_dual, add_user_area, meshname, mask_flag, varname)
195 self._coord_sys = coord_sys
196 else:
197 # call into ctypes layer
--> 198 self._struct = ESMP_MeshCreateFromFile(filename, filetype,
199 convert_to_dual,
200 add_user_area, meshname,
201 mask_flag, varname)
202 # get the sizes
203 self._size[node] = ESMP_MeshGetLocalNodeCount(self)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:93, in netcdf.<locals>.new_func(*args, **kwargs)
90 from ESMF.api.constants import _ESMF_NETCDF
92 if _ESMF_NETCDF:
---> 93 return func(*args, **kwargs)
94 else:
95 raise NetCDFMissing("This function requires ESMF to have been built with NetCDF.")
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/interface/cbindings.py:1218, in ESMP_MeshCreateFromFile(filename, fileTypeFlag, convertToDual, addUserArea, meshname, maskFlag, varname)
1197 """
1198 Preconditions: ESMP has been initialized.\n
1199 Postconditions: An ESMP_Mesh has been created.\n
(...)
1215 string (optional) :: varname\n
1216 """
1217 lrc = ct.c_int(0)
-> 1218 mesh = _ESMF.ESMC_MeshCreateFromFile(filename, fileTypeFlag,
1219 convertToDual, addUserArea,
1220 meshname, maskFlag, varname,
1221 ct.byref(lrc))
1222 rc = lrc.value
1223 if rc != constants._ESMP_SUCCESS:
ArgumentError: argument 1: <class 'AttributeError'>: 'list' object has no attribute 'encode'
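As a first diagnostic, note that the decorator frame in the traceback gates on a private flag imported from ESMF.api.constants. A quick (unsupported, since the flag is private) check of whether the installed ESMF was built with NetCDF support would be:
# Sketch: inspect the private flag that the decorator in
# ESMF/util/decorators.py (visible in the traceback above) checks.
from ESMF.api.constants import _ESMF_NETCDF
print("ESMF built with NetCDF:", _ESMF_NETCDF)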

How to write a proper dataset_fn in tff.simulation.FilePerUserClientData?

I'm currently implementing federated learning using tff.
Because the dataset is very large, we split it into many npy files, and I'm putting it together using tff.simulation.FilePerUserClientData.
This is what I'm trying to do:
client_ids_to_files = dict()
for i in range(len(train_filepaths)):
    client_ids_to_files[str(i)] = train_filepaths[i]

def dataset_fn(filepath):
    print(filepath)
    dataSample = np.load(filepath)
    label = filepath[:-4].strip().split('_')[-1]
    return tf.data.Dataset.from_tensor_slices((dataSample, label))

train_filePerClient = tff.simulation.FilePerUserClientData(client_ids_to_files, dataset_fn)
However, it doesn't seem to work well: the filepath in the callback function is a tensor with dtype string. The value of filepath is: Tensor("hash_table_Lookup/LookupTableFindV2:0", shape=(), dtype=string)
Instead of containing a path from client_ids_to_files, the tensor seems to contain something else entirely. Am I doing something wrong? How can I write a proper dataset_fn for tff.simulation.FilePerUserClientData using npy files?
EDIT:
Here is the error log. The error itself is not really related to the question I'm asking, but you can find the called functions:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-46-e61ddbe06cdb> in <module>
22 return tf.data.Dataset.from_tensor_slices(filepath)
23
---> 24 train_filePerClient = tff.simulation.FilePerUserClientData(client_ids_to_files,dataset_fn)
25
~/fasttext-venv/lib/python3.6/site-packages/tensorflow_federated/python/simulation/file_per_user_client_data.py in __init__(self, client_ids_to_files, dataset_fn)
52 return dataset_fn(client_ids_to_files[client_id])
53
---> 54 @computations.tf_computation(tf.string)
55 def dataset_computation(client_id):
56 client_ids_to_path = tf.lookup.StaticHashTable(
~/fasttext-venv/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/wrappers/computation_wrapper.py in __call__(self, tff_internal_types, *args)
405 parameter_type)
406 args, kwargs = unpack_arguments_fn(next(wrapped_fn_generator))
--> 407 result = fn_to_wrap(*args, **kwargs)
408 if result is None:
409 raise ComputationReturnedNoneError(fn_to_wrap)
~/fasttext-venv/lib/python3.6/site-packages/tensorflow_federated/python/simulation/file_per_user_client_data.py in dataset_computation(client_id)
59 list(client_ids_to_files.values())), '')
60 client_path = client_ids_to_path.lookup(client_id)
---> 61 return dataset_fn(client_path)
62
63 self._create_tf_dataset_fn = create_dataset_for_filename_fn
<ipython-input-46-e61ddbe06cdb> in dataset_fn(filepath)
17 filepath = tf.print(filepath)
18 print(filepath)
---> 19 dataSample = np.load(filepath)
20 print(dataSample)
21 label = filepath[:-4].strip().split('_')[-1]
~/fasttext-venv/lib/python3.6/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
426 own_fid = False
427 else:
--> 428 fid = open(os_fspath(file), "rb")
429 own_fid = True
430
TypeError: expected str, bytes or os.PathLike object, not Operation
The problem is that dataset_fn must be serializable as a tf.Graph. This is required because TFF uses TensorFlow graphs to execute logic on remote machines.
In this case, np.load is not serializable to a graph operation. It looks like numpy is used to load from disk into memory, and then tf.data.Dataset.from_tensor_slices creates a dataset from the in-memory object. It may be possible to save the file in a different format and use a native tf.data.Dataset operation to load from disk, rather than using Python. Some options could be tf.data.TFRecordDataset, tf.data.TextLineDataset, or tf.data.experimental.SqlDataset.
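For example, here is a sketch of the TFRecord route (the helper name, the feature key 'x', and the offline conversion step are assumptions for illustration, not part of TFF's API). Each .npy file is converted once, offline, in ordinary Python; dataset_fn then uses only ops that TFF can serialize into a graph:
import numpy as np
import tensorflow as tf

# One-time, offline conversion in plain Python (runs eagerly, outside any graph).
def npy_to_tfrecord(npy_path, tfrecord_path):
    data = np.load(npy_path)
    with tf.io.TFRecordWriter(tfrecord_path) as writer:
        for row in data:
            example = tf.train.Example(features=tf.train.Features(feature={
                'x': tf.train.Feature(
                    float_list=tf.train.FloatList(value=row.ravel()))}))
            writer.write(example.SerializeToString())

# dataset_fn now contains only graph-serializable ops.
def dataset_fn(filepath):
    def parse(serialized):
        features = tf.io.parse_single_example(
            serialized, {'x': tf.io.VarLenFeature(tf.float32)})
        return tf.sparse.to_dense(features['x'])
    return tf.data.TFRecordDataset(filepath).map(parse)
Note that the label-extracted-from-filename step would likewise need a graph-side equivalent (e.g. tf.strings operations) or, simpler, the label could be written into each record during the offline conversion.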

How to fix 'KeyError: dtype('float32')' in LDAvis

I use the pyLDAvis library to visualize my LDA topics. It worked fine before, but I get this error when I download the saved model files from SageMaker to my local computer. I don't know why this happens. Is it related to SageMaker?
If I run locally and save the model locally, and then run pyLDAvis, it works fine.
KeyError Traceback (most recent call last)
in ()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
116 See pyLDAvis.prepare for **kwargs.
117 """
--> 118 opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
119 return vis_prepare(**opts)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in _extract_data(topic_model, corpus, dictionary, doc_topic_dists)
46 gamma = topic_model.inference(corpus)
47 else:
---> 48 gamma, _ = topic_model.inference(corpus)
49 doc_topic_dists = gamma / gamma.sum(axis=1)[:, None]
50 else:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\ldamodel.py in inference(self, chunk, collect_sstats)
665 # phinorm is the normalizer.
666 # TODO treat zeros explicitly, instead of adding epsilon?
--> 667 eps = DTYPE_TO_EPS[self.dtype]
668 phinorm = np.dot(expElogthetad, expElogbetad) + eps
669
KeyError: dtype('float32')
I know this is late, but I just fixed a similar problem by updating my gensim library from 3.4 to the current version, which for me is 3.8.
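As a quick sanity check (a small sketch; the version numbers come from the answer above), you can confirm which gensim you are actually running before and after the upgrade:
# The KeyError above came from a model/library version mismatch;
# upgrading (e.g. pip install --upgrade gensim) resolved it for me.
import gensim
print(gensim.__version__)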

Error in reading and writing files in Python

I am trying to convert files from one format to another in Python. The current format is DAQ (data acquisition format), which is read in first. Then I use the undaqTools module to write the files to hdf5 format.
import glob
ctnames = glob.glob('*.daq')
Following are a few of the filenames (there are 100 in total):
ctnames
['Cars_20160601_01.daq',
'Cars_20160601_02.daq',
'Cars_20160601_03.daq',
'Cars_20160601_04.daq',
'Cars_20160601_05.daq',
'Cars_20160601_06.daq',
'Cars_20160601_07.daq',
.
.
.
## Importing undaq tools:
from undaqTools import Daq
Reading the DAQ files and writing to hdf5:
for n in ctnames:
    x = daq.read(n)
    daq.write_hd5(x)
Following is the error I got:
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.py:405: RuntimeWarning: Failed loading file on frame 46970. (stopped reading file)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-6fe7a8c9496d> in <module>()
1 for n in ctnames:
----> 2 x = daq.read(n)
3 daq.write_hd5(x)
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.pyc in read_daq(self, filename, elemlist, loaddata, process_dynobjs, interpolate_missing_frames)
272
273 if loaddata:
--> 274 self._loaddata()
275 self._unwrap_lane_deviation()
276
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.pyc in _loaddata(self)
449 assert tmpdata[name].shape[0] == frame.frame.shape[0]
450 else:
--> 451 assert tmpdata[name].shape[1] == frame.frame.shape[0]
452
453 # cast as Element objects
AssertionError:
Questions
I have 2 questions:
1. How do I know which of the 100 files is throwing the error?
2. How do I skip the files that throw the error?
Wrap the read() call in a try/except block. If you get an exception, print the current filename and skip to the next one.
for n in ctnames:
    try:
        x = daq.read(n)
    except AssertionError:
        print 'Could not process file %s. Skipping.' % n
        continue
    daq.write_hd5(x)

MemoryError when using np.median()

I have two files that I am reading data from, doing calculations on, and plotting a graph with. One file is quite small, ~50 KB, and raises no problem with the script. The other is ~702,900 KB (this is the file that causes the problem). I am able to read in the data perfectly fine, but when I calculate the row-by-row medians for this particular file, the script fails with a MemoryError. It looks like the following:
RMSDataS1 = [y01S1, y02S1, y03S1, y04S1, y05S1, y06S1, y07S1, y08S1, y09S1,
y010S1, y011S1, y012S1, y013S1, y014S1, y015S1, y016S1, y017S1,
y018S1, y019S1, y020S1, y021S1, y022S1, y023S1, y024S1, y025S1,
y026S1, y027S1, y028S1, y029S1, y030S1, y031S1, y032S1, y033S1,
y034S1, y035S1, y036S1, y037S1, y038S1, y039S1, y040S1, y041S1,
y042S1, y043S1, y044S1, y045S1, y046S1, y047S1, y048S1, y049S1,
y050S1, y051S1, y052S1, y053S1, y054S1, y055S1, y056S1, y057S1,
y058S1, y059S1, y060S1, y061S1, y062S1, y063S1, y064S1, y065S1,
y066S1, y067S1, y068S1, y069S1, y070S1, y071S1, y072S1, y073S1,
y074S1, y075S1, y076S1, y077S1, y078S1, y079S1, y080S1, y081S1,
y082S1, y083S1, y084S1, y085S1, y086S1, y087S1, y088S1, y089S1,
y090S1, y091S1, y092S1, y093S1, y094S1, y095S1, y096S1, y097S1,
y098S1, y099S1, y0100S1, y0101S1, y0102S1, y0103S1, y0104S1, y0105S1,
y0106S1, y0107S1, y0108S1]
MediansS1 = []
MediansS1 = np.median(RMSDataS1, axis = 0)
Is there any convenient way to get around this? I believe the script is failing when trying to sort the data while calculating the medians.
The error:
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\xy\RMSTrialOriginal-Aera.py", line 511, in <module>
MediansS1 = np.average(RMSDataS1, axis = 0)
File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 486, in average
a = np.asarray(a)
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 235, in asarray
return array(a, dtype, copy=False, order=order)
MemoryError
Any help would be greatly appreciated!
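One possible workaround (a sketch, untested against this data; the chunk size is an arbitrary assumption) is to build the array once as float32, which halves the memory of numpy's default float64 conversion, and then take the median over column chunks so the sort buffer never has to cover the whole array at once:
import numpy as np

def chunked_median(rows, chunk_cols=100000):
    # float32 halves the memory of numpy's default float64 conversion.
    arr = np.asarray(rows, dtype=np.float32)
    out = np.empty(arr.shape[1], dtype=np.float32)
    # np.median sorts a copy of its input; chunking keeps that copy small.
    for start in range(0, arr.shape[1], chunk_cols):
        stop = min(start + chunk_cols, arr.shape[1])
        out[start:stop] = np.median(arr[:, start:stop], axis=0)
    return out

MediansS1 = chunked_median(RMSDataS1)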
