I am running a Python program using PyTorch. I use my own dataset, not torch.utils.data.Dataset. I load the data from a pickle file produced by a feature-extraction step. But the following error appears:
Traceback (most recent call last):
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 326, in <module>
fire.Fire(demo)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 468, in _Fire
target=component.__name__)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 304, in demo
train(model,train_set1, valid_set=valid_set, test_set=test1, save=save, n_epochs=n_epochs,batch_size=batch_size,seed=seed)
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 172, in train
n_epochs=n_epochs,
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 37, in train_epoch
loader=np.asarray(list(loader))
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataset.py", line 257, in __getitem__
return self.dataset[self.indices[idx]]
TypeError: 'DataLoader' object is not subscriptable
The code is:
train_set1 = Owndata()
train1, test1 = train_set1.get_splits()
# prepare data loaders
train_dl = torch.utils.data.DataLoader(train1, batch_size=32, shuffle=True)
test_dl = torch.utils.data.DataLoader(test1, batch_size=1024, shuffle=False)
test_set1 = Owndata()
'''print('test_set# ', test_set)'''
if valid_size:
    valid_set = Owndata()
    indices = torch.randperm(len(train_set1))
    train_indices = indices[:len(indices) - valid_size]
    valid_indices = indices[len(indices) - valid_size:]
    train_set1 = torch.utils.data.Subset(train_dl, train_indices)
    valid_set = torch.utils.data.Subset(valid_set, valid_indices)
else:
    valid_set = None
model = DenseNet(
    growth_rate=growth_rate,
    block_config=block_config,
    num_classes=10,
    small_inputs=True,
    efficient=efficient,
)
train(model, train_set1, valid_set=valid_set, test_set=test1, save=save,
      n_epochs=n_epochs, batch_size=batch_size, seed=seed)
Any help is appreciated! Thanks a lot in advance!!
It is not this line that gives you the error; it is raised inside the train function at the very end, which you are not showing.
You are confusing two things:
torch.utils.data.Dataset is indexable (dataset[5] works fine, for example). It is a simple object that defines how to fetch a single sample of data.
torch.utils.data.DataLoader is not indexable, only iterable; it returns batches of data from the above Dataset and can load them in parallel using num_workers. It is what you are trying to index, while you should be indexing the Dataset instead.
Please see the PyTorch documentation on data loading to get a better grasp of how these two work.
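For illustration, a minimal sketch of the fix for the snippet above (assuming Owndata and valid_size as defined in the question): build each Subset from the Dataset itself, and only then wrap the splits in DataLoaders:
import torch

train_set1 = Owndata()  # the Dataset itself

indices = torch.randperm(len(train_set1))
train_indices = indices[:len(indices) - valid_size]
valid_indices = indices[len(indices) - valid_size:]

# Subset must wrap the Dataset, not a DataLoader built from it
train_subset = torch.utils.data.Subset(train_set1, train_indices)
valid_subset = torch.utils.data.Subset(train_set1, valid_indices)

# The loaders are only iterated; nothing indexes a DataLoader anymore
train_dl = torch.utils.data.DataLoader(train_subset, batch_size=32, shuffle=True)
valid_dl = torch.utils.data.DataLoader(valid_subset, batch_size=1024, shuffle=False)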
I have the following code:
def generator_train(x_train_df, y_train_df, batch_size):
    for i in range(int(len(x_train_df) / batch_size)):
        x_train = x_train_df[i * batch_size:(i + 1) * batch_size]
        y_train = y_train_df[i * batch_size:(i + 1) * batch_size]
        yield np.array(x_train), np.array(y_train)

train_generator = generator_train(x_train_df, y_train_df, batch_size)

history = model.fit(train_generator,
                    epochs=epochs_no,
                    steps_per_epoch=number_of_rows_input / batch_size,
                    verbose=1,
                    max_queue_size=100,
                    validation_data=None,
                    workers=8,
                    use_multiprocessing=True)
Both x_train_df and y_train_df are pandas.DataFrames.
I keep getting the following error referring to pickle, even though fitting from a generator should have nothing to do with dumping/loading pickled data.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Exception in thread Thread-2:
Traceback (most recent call last):
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\threading.py", line 954, in _bootstrap_inner
self.run()
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "E:\Tut\pythonProject5_MachineLearning\venv\lib\site-packages\keras\utils\data_utils.py", line 868, in _run
with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
File "E:\Tut\pythonProject5_MachineLearning\venv\lib\site-packages\keras\utils\data_utils.py", line 858, in pool_fn
pool = get_pool_class(True)(
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 212, in __init__
self._repopulate_pool()
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
w.start()
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'generator' object
What am I missing?
One solution is to use MirroredStrategy() for the neural network and to preprocess the data with the functions from tf.data.Dataset:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = Sequential()
    model.add(Dense.....
    .....
    model.compile(loss='mae', optimizer='sgd')

def dataset_fn(dummy_argument):
    x = np.array(x_train_df).astype(np.float32)
    y = np.array(y_train_df).astype(np.float32)
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    return dataset.repeat().batch(batch_size=batch_size, drop_remainder=True)

dist_dataset = strategy.experimental_distribute_datasets_from_function(dataset_fn)

history = model.fit(
    dist_dataset,
    epochs=epochs,
    steps_per_epoch=number_of_batches_in_the_x_set,
    verbose=1,
    max_queue_size=max_queue_size,
    validation_data=None,
    workers=number_of_workers,
    use_multiprocessing=True
)
You are pickling because you are using multiprocessing: multiprocessing needs to pickle anything it runs in order to send it to the new Python processes. Since your train_generator is needed in each process, it will be sent, i.e. pickled; and generator objects cannot be pickled.
As the linked question notes, you can avoid this by not using a generator: trivially, evaluate it to a list before sending; or, more sensibly, rewrite your generator to return that list for you.
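Going one step beyond that answer, the standard picklable replacement (an alternative the answer does not spell out) is a keras.utils.Sequence; a minimal sketch assuming the same x_train_df, y_train_df and batch_size as in the question:
import numpy as np
from tensorflow import keras

class TrainSequence(keras.utils.Sequence):
    # Indexable replacement for generator_train; safe with use_multiprocessing,
    # because instances can be pickled and each worker fetches batches by index.
    def __init__(self, x_train_df, y_train_df, batch_size):
        self.x, self.y, self.batch_size = x_train_df, y_train_df, batch_size

    def __len__(self):
        return len(self.x) // self.batch_size  # batches per epoch

    def __getitem__(self, i):
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        return np.array(self.x[sl]), np.array(self.y[sl])

history = model.fit(TrainSequence(x_train_df, y_train_df, batch_size),
                    epochs=epochs_no, verbose=1, workers=8,
                    use_multiprocessing=True)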
I have loaded the CIFAR-10 dataset using the torchvision library:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transforms.ToTensor())
Then I try to select the first item in the dataset, which, as I understand it, invokes the dataset class's __getitem__ method:
trainset[0]
and I get
File "env\lib\site-packages\torchvision\transforms\functional.py", line 129, in to_tensor
np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True)
TypeError: __array__() takes 1 positional argument but 2 were given
Any ideas why I would get this error?
Python 3.7.9, torch==1.9.0, torchvision==0.10.0
I was hitting this error too:
def get_transformations():
    return transforms.Compose([transforms.ToTensor()])
...
self.transforms = get_transformations()
...
# Load the image + augment
img = Image.open(img_path).convert("RGB")
img = self.transforms(img)
...
Original Traceback (most recent call last):
File "c:\2021-mcm-master\src\PyTorch-RCNN\ui-prediction\env\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "c:\2021-mcm-master\src\PyTorch-RCNN\ui-prediction\env\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "c:\2021-mcm-master\src\PyTorch-RCNN\ui-prediction\env\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "c:\2021-mcm-master\src\PyTorch-RCNN\ui-prediction\src\screenshot_dataset.py", line 112, in __getitem__
img = self.transforms(img)
File "c:\2021-mcm-master\src\PyTorch-RCNN\ui-prediction\env\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
img = t(img)
File "c:\2021-mcm-master\src\PyTorch-RCNN\ui-prediction\env\lib\site-packages\torchvision\transforms\transforms.py", line 97, in __call__
return F.to_tensor(pic)
File "c:\2021-mcm-master\src\PyTorch-RCNN\ui-prediction\env\lib\site-packages\torchvision\transforms\functional.py", line 129, in to_tensor
np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True)
TypeError: __array__() takes 1 positional argument but 2 were given
As @Phil suggested, downgrading Pillow from 8.3.0 to 8.2.0 solved the issue; Pillow 8.3.0's Image.__array__ stopped accepting the dtype argument that torchvision's to_tensor passes through np.array, hence "takes 1 positional argument but 2 were given":
pip install pillow==8.2.0
I want to test my Keras model, but I've run into the following problem. I have an image for testing at the path below.
path = 'C:\\Users\\Администратор\\AppData\\Local\\Programs\\Python\\Python36-32\\577793008_ef4345205b.jpg'
model = keras.models.load_model('C:\\Users\\Администратор\\AppData\\Local\\Programs\\Python\\Python36-32\\model1.h5')
predictions = model.predict(path)
print (predictions[0])
Error.
Traceback (most recent call last):
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\load1.py", line 11, in <module>
predictions = model.predict(path)
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras\engine\training.py", line 1441, in predict
x, _, _ = self._standardize_user_data(x)
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
exception_prefix='input')
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras\engine\training_utils.py", line 99, in standardize_input_data
data = [standardize_single_array(x) for x in data]
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras\engine\training_utils.py", line 99, in <listcomp>
data = [standardize_single_array(x) for x in data]
File "C:\Users\Администратор\AppData\Local\Programs\Python\Python36-32\lib\site-packages\keras\engine\training_utils.py", line 34, in standardize_single_array
elif x.ndim == 1:
AttributeError: 'str' object has no attribute 'ndim'
The predict method can take several types of input, but not a string; it cannot read a file directly from a path.
You need to transform the file into something the model can consume: read the image and turn its contents into an array, for instance.
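A minimal sketch of that conversion, using the path and model from the snippet above and assuming the model expects a single RGB image (the 224x224 target size is a placeholder; use the size the model was trained on, and drop the scaling if it was trained on raw pixel values):
import numpy as np
from keras.preprocessing import image

img = image.load_img(path, target_size=(224, 224))  # read the file from disk
x = image.img_to_array(img)                         # (224, 224, 3) float array
x = np.expand_dims(x, axis=0)                       # batch of one: (1, 224, 224, 3)
x /= 255.0                                          # assumed [0, 1] scaling

predictions = model.predict(x)
print(predictions[0])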
I'm trying to do a simple text classification project with Transformers. I want to use the pipeline feature added in v2.3, but there is little to no documentation.
data = pd.read_csv("data.csv")
FLAUBERT_NAME = "flaubert-base-cased"
encoder = LabelEncoder()
target = encoder.fit_transform(data["category"])
y = target
X = data["text"]
model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
tokenizer = FlaubertTokenizer.from_pretrained(FLAUBERT_NAME)
pipe = TextClassificationPipeline(model, tokenizer, device=-1) # device=-1 -> Use only CPU
print("Test #1: pipe('Bonjour le monde')=", pipe(['Bonjour le monde']))
Traceback (most recent call last):
File "C:/Users/PLHT09191/Documents/work/dev/Classif_Annonces/src/classif_annonce.py", line 33, in <module>
model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_utils.py", line 463, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_flaubert.py", line 343, in __init__
super(FlaubertForSequenceClassification, self).__init__(config)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 733, in __init__
self.transformer = XLMModel(config)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 382, in __init__
self.ffns.append(TransformerFFN(self.dim, self.hidden_dim, self.dim, config=config))
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 203, in __init__
self.lin2 = nn.Linear(dim_hidden, out_dim)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\torch\nn\modules\linear.py", line 72, in __init__
self.weight = Parameter(torch.Tensor(out_features, in_features))
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 9437184 bytes. Buy new RAM!
Process finished with exit code 1
How can I use my pipeline with my X and y data?
I am using the tensorflow data API to try and do some rejection sampling for my unbalanced data set.
I have run the code on my personal computer and it seems to work as I expect it to, however, when I run the code on my University's cluster I get a type error that I can't seem to understand. I have tried recasting and I get the same error.
I am still learning how to use this API, and I'm not 100% clear on whether this is the best way to achieve what I want, so I also welcome any advice on how I implemented the rejection sampling (this could very well be the reason I get the error, since I don't fully understand it yet).
This is how I am loading in the data to the dataset:
data = np.loadtxt("my_data.dat")
features = data[:, 1:10]
labels = data[:, 0]
labels[labels == -1] = 0
assert features.shape[0] == labels.shape[0]
dataset_size = len(features)
dataset = tf.data.Dataset.from_tensor_slices((features.astype('float32'),
                                              labels.astype('int32')))
dataset = dataset.shuffle(buffer_size=dataset_size)
The error occurs when I run this part:
train_size = int((2 / 3.0) * dataset_size)
tr_dataset = dataset.take(train_size)
tr_dataset = (tr_dataset.apply(
    tf.contrib.data.rejection_resample(
        class_func=lambda _, c: c, target_dist=[0.5, 0.5],
        seed=42)).map(lambda a, b: b)).batch(100)
This is the error:
Traceback (most recent call last):
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1094, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 931, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype int64: 'Tensor("Sum:0", shape=(2,), dtype=int64)'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 185, in <module>
seed=42))).batch(100)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1109, in apply
dataset = transformation_func(self)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/resampling.py", line 74, in _apply_fn
target_dist_t, class_values_ds)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/resampling.py", line 183, in _estimate_initial_dist_ds
update_estimate_and_tile))
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1109, in apply
dataset = transformation_func(self)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/scan_ops.py", line 172, in _apply_fn
return _ScanDataset(dataset, initial_state, scan_func)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/scan_ops.py", line 74, in __init__
add_to_graph=False)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1459, in __init__
self._function._create_definition_if_needed() # pylint: disable=protected-access
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/framework/function.py", line 337, in _create_definition_if_needed
self._create_definition_if_needed_impl()
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/framework/function.py", line 346, in _create_definition_if_needed_impl
self._capture_by_value, self._caller_device)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/framework/function.py", line 863, in func_graph_from_py_func
outputs = func(*func_graph.inputs)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1392, in tf_data_structured_function_wrapper
ret = func(*nested_args)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/resampling.py", line 176, in update_estimate_and_tile
c, num_examples_per_class_seen)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/resampling.py", line 212, in _estimate_data_distribution
array_ops.one_hot(c, num_classes, dtype=dtypes.int64), 0))
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 297, in add
"Add", x=x, y=y, name=name)
File "/home/user/.conda/envs/tensorflowcpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Add' Op has type int64 that does not match type int32 of argument 'x'.