I'm currently implementing federated learning using TensorFlow Federated (TFF).
Because the dataset is very large, we split it into many npy files, and I'm currently assembling the dataset using tff.simulation.FilePerUserClientData.
This is what I'm trying to do:
client_ids_to_files = dict()
for i in range(len(train_filepaths)):
    client_ids_to_files[str(i)] = train_filepaths[i]

def dataset_fn(filepath):
    print(filepath)
    dataSample = np.load(filepath)
    label = filepath[:-4].strip().split('_')[-1]
    return tf.data.Dataset.from_tensor_slices((dataSample, label))

train_filePerClient = tff.simulation.FilePerUserClientData(client_ids_to_files, dataset_fn)
However, it doesn't work: the filepath passed to the callback function is a tensor with dtype string. The value of filepath is: Tensor("hash_table_Lookup/LookupTableFindV2:0", shape=(), dtype=string)
Instead of containing a path from client_ids_to_files, the tensor seems to contain error messages? Am I doing something wrong? How can I write a proper dataset_fn for tff.simulation.FilePerUserClientData using npy files?
EDIT:
Here is the error log. The error itself is not really related to the question I'm asking, but you can find the called functions:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-46-e61ddbe06cdb> in <module>
22 return tf.data.Dataset.from_tensor_slices(filepath)
23
---> 24 train_filePerClient = tff.simulation.FilePerUserClientData(client_ids_to_files,dataset_fn)
25
~/fasttext-venv/lib/python3.6/site-packages/tensorflow_federated/python/simulation/file_per_user_client_data.py in __init__(self, client_ids_to_files, dataset_fn)
52 return dataset_fn(client_ids_to_files[client_id])
53
---> 54 #computations.tf_computation(tf.string)
55 def dataset_computation(client_id):
56 client_ids_to_path = tf.lookup.StaticHashTable(
~/fasttext-venv/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/wrappers/computation_wrapper.py in __call__(self, tff_internal_types, *args)
405 parameter_type)
406 args, kwargs = unpack_arguments_fn(next(wrapped_fn_generator))
--> 407 result = fn_to_wrap(*args, **kwargs)
408 if result is None:
409 raise ComputationReturnedNoneError(fn_to_wrap)
~/fasttext-venv/lib/python3.6/site-packages/tensorflow_federated/python/simulation/file_per_user_client_data.py in dataset_computation(client_id)
59 list(client_ids_to_files.values())), '')
60 client_path = client_ids_to_path.lookup(client_id)
---> 61 return dataset_fn(client_path)
62
63 self._create_tf_dataset_fn = create_dataset_for_filename_fn
<ipython-input-46-e61ddbe06cdb> in dataset_fn(filepath)
17 filepath = tf.print(filepath)
18 print(filepath)
---> 19 dataSample = np.load(filepath)
20 print(dataSample)
21 label = filepath[:-4].strip().split('_')[-1]
~/fasttext-venv/lib/python3.6/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
426 own_fid = False
427 else:
--> 428 fid = open(os_fspath(file), "rb")
429 own_fid = True
430
TypeError: expected str, bytes or os.PathLike object, not Operation
The problem is that dataset_fn must be serializable as a tf.Graph. This is required because TFF uses TensorFlow graphs to execute logic on remote machines.
In this case, np.load is not serializable to a graph operation. It looks like numpy is used to load from disk into memory, and then tf.data.Dataset.from_tensor_slices is used to create a dataset from an in-memory object. It may be possible to save the file in a different format and use a native tf.data.Dataset operation to load from disk, rather than using Python. Some options could be tf.data.TFRecordDataset, tf.data.TextLineDataset, or tf.data.experimental.SqlDataset.
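To illustrate why np.load fails here, below is a plain-Python analogy of graph tracing. It is only a toy sketch, not TensorFlow or TFF code: SymbolicPath, read_eagerly, read_symbolically and trace are all hypothetical names made up for the illustration. The point is that the function is called once with a placeholder instead of a real path, so anything that needs the actual string immediately (like open or np.load) breaks, while code that merely records an operation succeeds.

```python
class SymbolicPath:
    """Toy stand-in for a graph tensor: it names a future value but holds none."""

    def __init__(self, name):
        self.name = name


def read_eagerly(path):
    # Plain Python I/O needs a real path right now, much like np.load does.
    with open(path, "rb") as handle:
        return handle.read()


def read_symbolically(path):
    # A graph-friendly function only records what should happen later.
    return ("read_file", path.name)


def trace(fn):
    """Call fn once with a placeholder, roughly as a graph builder would."""
    placeholder = SymbolicPath("client_path")
    try:
        return ("ok", fn(placeholder))
    except TypeError as error:
        return ("failed", str(error))


print(trace(read_eagerly))       # fails: open() cannot use the placeholder
print(trace(read_symbolically))  # succeeds: only an operation is recorded
```

The failure in the first call is the same kind of TypeError as in the traceback above, where np.load received a graph object instead of a string.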
I installed esmpy using conda install -c conda-forge esmpy but am unable to get it to create a mesh from an existing Netcdf file using something like this:
mesh = ESMF.Mesh(filename='myfile.nc', filetype=ESMF.FileFormat.ESMFMESH)
My input file is the output from the CAM-SE global atmospheric model at ne120 resolution. This model returns unstructured output. I get an error message saying that Netcdf should be built as a shared library. I know the Netcdf libraries exist because I use xarray all the time to read and process them. But how does one link those libraries during the installation step for esmpy using conda? Do I need to build esmpy from source to be able to do this?
Update: Have added the full traceback below. I installed netcdf using conda-forge within the conda environment I was using and it appears that was not the source of the error as the error message remains unchanged. I am now wondering if my command for generating mesh is just going to work after directly feeding in a CAM-SE output file. The file does not really have any information about the number of elements, etc. Also, should I rename the dimensions to some common form expected by ESMF? How will ESMF know which dimension represents the number of nodes, etc.? Here is the list of dimensions from one of the output files (followed by the traceback):
dimensions:
time = UNLIMITED ; // (292 currently)
lev = 30 ;
ilev = 31 ;
ncol = 777602 ;
nbnd = 2 ;
chars = 8 ;
string1 = 1 ;
----------Traceback------------------------
ArgumentError Traceback (most recent call last)
Input In [13], in <cell line: 5>()
----> 5 grid = ESMF.Mesh(filename='B1850.ne120_t12.cam.h2.0338-01-01-21600.nc',
filetype=ESMF.FileFormat.ESMFMESH)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:81, in initialize.<locals>.new_func(*args, **kwargs)
78 from ESMF.api import esmpymanager
80 esmp = esmpymanager.Manager(debug = False)
---> 81 return func(*args, **kwargs)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/api/mesh.py:198, in Mesh.__init__(self, parametric_dim, spatial_dim, coord_sys, filename, filetype, convert_to_dual, add_user_area, meshname, mask_flag, varname)
195 self._coord_sys = coord_sys
196 else:
197 # call into ctypes layer
--> 198 self._struct = ESMP_MeshCreateFromFile(filename, filetype,
199 convert_to_dual,
200 add_user_area, meshname,
201 mask_flag, varname)
202 # get the sizes
203 self._size[node] = ESMP_MeshGetLocalNodeCount(self)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:93, in netcdf.<locals>.new_func(*args, **kwargs)
90 from ESMF.api.constants import _ESMF_NETCDF
92 if _ESMF_NETCDF:
---> 93 return func(*args, **kwargs)
94 else:
95 raise NetCDFMissing("This function requires ESMF to have been built with NetCDF.")
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/interface/cbindings.py:1218, in ESMP_MeshCreateFromFile(filename, fileTypeFlag, convertToDual, addUserArea, meshname, maskFlag, varname)
1197 """
1198 Preconditions: ESMP has been initialized.\n
1199 Postconditions: An ESMP_Mesh has been created.\n
(...)
1215 string (optional) :: varname\n
1216 """
1217 lrc = ct.c_int(0)
-> 1218 mesh = _ESMF.ESMC_MeshCreateFromFile(filename, fileTypeFlag,
1219 convertToDual, addUserArea,
1220 meshname, maskFlag, varname,
1221 ct.byref(lrc))
1222 rc = lrc.value
1223 if rc != constants._ESMP_SUCCESS:
ArgumentError: argument 1: <class 'AttributeError'>: 'list' object has no attribute 'encode'
I am trying to create embeddings to use for matching words, but I get the following error:
Traceback (most recent call last)
/var/folders/k1/jt1nfyks4cx689d50f5mtg0w0000gp/T/ipykernel_1349/3490519318.py in <module>
53 #Compute embedding for both lists
54
---> 55 embeddings1 = model.encode(fifteen_percent_list, convert_to_tensor=True)
56
57
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py in encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
185
186 if convert_to_tensor:
--> 187 all_embeddings = torch.stack(all_embeddings)
188 elif convert_to_numpy:
189 all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])
RuntimeError: stack expects a non-empty TensorList
I don't understand why this happens, since my second embedding (embeddings2) goes through without any errors.
Here is some of the code, if that helps:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
fifteen_percent_list = list(fiften_percent)
#Compute embedding for both lists
embeddings1 = model.encode(fifteen_percent_list, convert_to_tensor=True)
# try on a smaller set of 10k, as it takes too long to run on full set of queries
rest_of_queries_list = list(set(rest_of_queries))[:10000]
embeddings2 = model.encode(rest_of_queries_list, convert_to_tensor=True)
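The message "stack expects a non-empty TensorList" is what torch.stack raises when it is given an empty list, so one thing worth ruling out is that fifteen_percent_list is empty (or contains no usable strings) by the time encode is called. A minimal sketch of such a check follows; prepare_sentences is a hypothetical helper written for this illustration, not part of sentence_transformers.

```python
def prepare_sentences(items):
    """Keep only non-blank strings and fail early if nothing is left."""
    sentences = [s.strip() for s in items if isinstance(s, str)]
    sentences = [s for s in sentences if s]
    if not sentences:
        raise ValueError("no sentences to encode: the input collection is empty")
    return sentences


# Illustrative input only; in the question this would be fiften_percent.
print(prepare_sentences(["red shoes", "  ", None, "blue shirt "]))
```

Calling prepare_sentences(fiften_percent) before model.encode would turn the opaque torch error into an explicit message if the list turns out to be empty.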
I'm using the following code to load my files in NIfTI format in Python.
import nibabel as nib

img_arr = []
for i in range(len(datadir)):
    img = nib.load(datadir[i])
    img_data = img.get_fdata()
    img_arr.append(img_data)
    img.uncache()
A small number of images works fine, but if I want to load more images, I get the following error:
OSError Traceback (most recent call last)
<ipython-input-55-f982811019c9> in <module>()
10 #img = nilearn.image.smooth_img(datadir[i],fwhm = 3) #Smoothing filter for preprocessing (necessary?)
11 img = nib.load(datadir[i])
---> 12 img_data = img.get_fdata()
13 img_arr.append(img_data)
14 img.uncache()
~\AppData\Roaming\Python\Python36\site-packages\nibabel\dataobj_images.py in get_fdata(self, caching, dtype)
346 if self._fdata_cache.dtype.type == dtype.type:
347 return self._fdata_cache
--> 348 data = np.asanyarray(self._dataobj).astype(dtype, copy=False)
349 if caching == 'fill':
350 self._fdata_cache = data
~\AppData\Roaming\Python\Python36\site-packages\numpy\core\_asarray.py in asanyarray(a, dtype, order)
136
137 """
--> 138 return array(a, dtype, copy=False, order=order, subok=True)
139
140
~\AppData\Roaming\Python\Python36\site-packages\nibabel\arrayproxy.py in __array__(self)
353 def __array__(self):
354 # Read array and scale
--> 355 raw_data = self.get_unscaled()
356 return apply_read_scaling(raw_data, self._slope, self._inter)
357
~\AppData\Roaming\Python\Python36\site-packages\nibabel\arrayproxy.py in get_unscaled(self)
348 offset=self._offset,
349 order=self.order,
--> 350 mmap=self._mmap)
351 return raw_data
352
~\AppData\Roaming\Python\Python36\site-packages\nibabel\volumeutils.py in array_from_file(shape, in_dtype, infile, offset, order, mmap)
507 shape=shape,
508 order=order,
--> 509 offset=offset)
510 # The error raised by memmap, for different file types, has
511 # changed in different incarnations of the numpy routine
~\AppData\Roaming\Python\Python36\site-packages\numpy\core\memmap.py in __new__(subtype, filename, dtype, mode, offset, shape, order)
262 bytes -= start
263 array_offset = offset - start
--> 264 mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
265
266 self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
OSError: [WinError 8] Not enough storage is available to process this command
I thought that img.uncache() would delete the image from memory so that it wouldn't take up too much storage while still letting me work with the image array. Adding it to the code didn't change anything, though.
Does anyone know how I can fix this? The computer I'm working on has a 24-core 2.6 GHz CPU, more than 52 GB of memory, and the working directory has over 1.7 TB of free storage. I'm trying to load around 1500 MRI images from the ADNI database.
Any suggestions are much appreciated.
This error is not caused by the 1.7 TB hard drive filling up; it occurs because you're running out of memory (RAM). It's important to understand how those two differ.
uncache() does not remove an item from memory completely, as documented here, but that link also contains more memory-saving tips.
If you want to remove an object from memory completely, you can use the Garbage Collector interface, like so:
import nibabel as nib
import gc

img_arr = []
for i in range(len(datadir)):
    img = nib.load(datadir[i])
    img_data = img.get_fdata()
    img_arr.append(img_data)
    img.uncache()
    # Delete the img object and free the memory
    del img
    gc.collect()
That should help reduce the amount of memory you are using.
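As a rough check on why RAM rather than disk is the limit, the arithmetic can be sketched as below. The image shape is a hypothetical placeholder (the question does not state one), so substitute the real dimensions; the 8 bytes per voxel reflects that get_fdata() returns float64 by default, and estimate_bytes is a helper name invented for this sketch.

```python
import math


def estimate_bytes(n_images, shape, bytes_per_value):
    """Memory needed to hold n_images dense arrays of the given shape."""
    return n_images * math.prod(shape) * bytes_per_value


# Assumption: 1500 images (from the question) with a made-up voxel grid.
n_images = 1500
shape = (256, 256, 170)

for label, size in (("float64", 8), ("float32", 4)):
    gib = estimate_bytes(n_images, shape, size) / 2**30
    print(f"{label}: about {gib:.1f} GiB")
```

Under these assumed dimensions the float64 total is well above the 52 GB of RAM mentioned in the question, which is why keeping every array in img_arr still exhausts memory even after uncache().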
How to fix "not enough storage available.."?
Try these steps:
1. Press the Windows + R keys at the same time, then type Regedit.exe in the Run window and click OK.
2. Unfold HKEY_LOCAL_MACHINE, then SYSTEM, then CurrentControlSet, then services, then LanmanServer, then Parameters.
3. Locate IRPStackSize (if found, skip to step 5). If it does not exist, right-click in the right-hand pane and choose New > Dword Value (32).
4. Type IRPStackSize as the name, then hit Enter.
5. Right-click IRPStackSize and click Modify, then set any value higher than 15 but lower than 50 and click OK.
6. Restart your system and repeat the action you were performing when the error occurred.
Or:
Set the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache to value "1"
Set the registry key HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Size to value "3"
Other ways to save memory in nibabel:
Besides the uncache() method, you can use:
The array proxy instead of get_fdata()
The caching keyword to get_fdata()
I use the LDAvis library to visualize my LDA topics. It worked fine before, but I get this error when I download the saved model files from SageMaker to my local computer. Why does this happen? Is it related to SageMaker?
If I train and save the model locally and then run the LDAvis library, it works fine.
KeyError Traceback (most recent call last)
in ()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
116 See pyLDAvis.prepare for **kwargs.
117 """
--> 118 opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
119 return vis_prepare(**opts)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in _extract_data(topic_model, corpus, dictionary, doc_topic_dists)
46 gamma = topic_model.inference(corpus)
47 else:
---> 48 gamma, _ = topic_model.inference(corpus)
49 doc_topic_dists = gamma / gamma.sum(axis=1)[:, None]
50 else:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\ldamodel.py in inference(self, chunk, collect_sstats)
665 # phinorm is the normalizer.
666 # TODO treat zeros explicitly, instead of adding epsilon?
--> 667 eps = DTYPE_TO_EPS[self.dtype]
668 phinorm = np.dot(expElogthetad, expElogbetad) + eps
669
KeyError: dtype('float32')
I know this is late but I just fixed a similar problem by updating my gensim library from 3.4 to the current version which for me is 3.8.
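Since the fix above came down to the gensim version, a quick way to compare the installed version against a minimum is sketched below. It uses only the standard library; parse_version, installed_version and is_older_than are hypothetical helper names, and the "3.8.0" threshold is simply the version mentioned in this answer, not a documented requirement.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn '3.8.1' into (3, 8, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_older_than(package, minimum):
    """True or False when the package is installed, None when it is not."""
    found = installed_version(package)
    if found is None:
        return None
    return parse_version(found) < parse_version(minimum)


print("gensim installed:", installed_version("gensim"))
print("older than 3.8.0:", is_older_than("gensim", "3.8.0"))
```

Running the same check in both the SageMaker environment and the local one would show whether the model was saved and loaded under different gensim versions.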
I am studying the Google Machine Learning Crash Course.
I am having trouble with the chapter “Feature Cross”.
https://developers.google.com/machine-learning/crash-course/feature-crosses/programming-exercise
I tried to get the weights of a crossed feature from linear_regressor.
# here I change _ to linear_model
linear_model = train_model(
    learning_rate=1.0,
    steps=500,
    batch_size=100,
    feature_columns=construct_feature_columns(),
    training_examples=training_examples,
    training_targets=training_targets,
    validation_examples=validation_examples,
    validation_targets=validation_targets)
Weight_bucketized_longitude= linear_model.get_variable_value('linear/linear_model/bucketized_longitude/weights')
print(Weight_bucketized_longitude)
However, I got the error message below:
Error Message:
NotFoundError: Key linear/linear_model/bucketized_longitude/weights not found in checkpoint
It looks like the path is wrong.
The path works for numeric_column, but it doesn’t for bucketized_column.
Could you help me find the correct path?
Thanks.
#
I tried Geeocode's method.
However, I still got an error message.
Weight_bucketized_longitude= linear_model.get_variable_value('linear/linear_model/bucketized_longitude/weights')
AttributeError Traceback (most recent call last)
in ()
----> 1 Weight_bucketized_longitude= linear_model.get_variable_value(["linear", "linear_model", "bucketized_longitude", "weights"])
/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.pyc in get_variable_value(self, name)
252 _check_checkpoint_available(self.model_dir)
253 with context.graph_mode():
--> 254 return training.load_variable(self.model_dir, name)
255
256 def get_variable_names(self):
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.pyc in load_variable(ckpt_dir_or_file, name)
77 """
78 # TODO(b/29227106): Fix this in the right place and remove this.
---> 79 if name.endswith(":0"):
80 name = name[:-2]
81 reader = load_checkpoint(ckpt_dir_or_file)
AttributeError: 'list' object has no attribute 'endswith'
The problem is that linear_model.get_variable_value() should be passed a list of strings containing the variable names. From the documentation:
get_variable_value
get_variable_value(name)
Returns value of the variable given by name.
Args:
name: string or a list of string, name of the tensor.
Returns:
Numpy array - value of the tensor.
Raises:
ValueError: If the Estimator has not produced a checkpoint yet.
Thus your code should change as follows:
Weight_bucketized_longitude= linear_model.get_variable_value(["linear", "linear_model", "bucketized_longitude", "weights"])
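Because the traceback in the question shows that the Estimator also defines get_variable_names(), another way to find the right path is to list the names stored in the checkpoint and search them. The sketch below shows only the filtering step: matching_variable_names is a hypothetical helper, and sample_names is a made-up list standing in for the output of linear_model.get_variable_names(), so the real names may differ.

```python
def matching_variable_names(names, keyword):
    """Return the variable names that contain keyword, in sorted order."""
    return sorted(name for name in names if keyword in name)


# Hypothetical sample standing in for linear_model.get_variable_names();
# the real list depends on the feature columns and the TensorFlow version.
sample_names = [
    "global_step",
    "linear/linear_model/bias_weights",
    "linear/linear_model/households/weights",
    "linear/linear_model/longitude_bucketized/weights",
]

for name in matching_variable_names(sample_names, "longitude"):
    print(name)
```

With a trained estimator, passing linear_model.get_variable_names() as names would show the exact string to hand to get_variable_value. Bucketized columns are usually named after their source column with a _bucketized suffix, but treat that as an assumption to confirm against the real output.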