AttributeError: 'numpy.float64' object has no attribute 'split' - python

I'm following a YouTube tutorial and I believe I've copied the code exactly, but I keep getting this error:
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "C:\Users\Connor\Anaconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "C:\Users\Connor\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\Connor\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\Connor\OneDrive\Python\Kaggle\Facial Keypoint Detecetion\dataset.py", line 21, in __getitem__
image = np.array(self.data.iloc[index, 30].split()).astype(np.float32)
AttributeError: 'numpy.float64' object has no attribute 'split'
Below is the full code:
class FacialKeypointDataset(Dataset):
    def __init__(self, csv_file, train=True, transform=None):
        super().__init__()
        self.data = pd.read_csv(csv_file)
        self.category_names = ['left_eye_center_x', 'left_eye_center_y', 'right_eye_center_x', 'right_eye_center_y', 'left_eye_inner_corner_x', 'left_eye_inner_corner_y', 'left_eye_outer_corner_x', 'left_eye_outer_corner_y', 'right_eye_inner_corner_x', 'right_eye_inner_corner_y', 'right_eye_outer_corner_x', 'right_eye_outer_corner_y', 'left_eyebrow_inner_end_x', 'left_eyebrow_inner_end_y', 'left_eyebrow_outer_end_x', 'left_eyebrow_outer_end_y', 'right_eyebrow_inner_end_x', 'right_eyebrow_inner_end_y', 'right_eyebrow_outer_end_x', 'right_eyebrow_outer_end_y', 'nose_tip_x', 'nose_tip_y', 'mouth_left_corner_x', 'mouth_left_corner_y', 'mouth_right_corner_x', 'mouth_right_corner_y', 'mouth_center_top_lip_x', 'mouth_center_top_lip_y', 'mouth_center_bottom_lip_x', 'mouth_center_bottom_lip_y']
        self.transform = transform
        self.train = train

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, index):
        if self.train:
            image = np.array(self.data.iloc[index, 30].split()).astype(np.float32)
            labels = np.array(self.data.iloc[index, :30].tolist())
            labels[np.isnan(labels)] = -1
        else:
            image = np.array(self.data.iloc[index, 1].split()).astype(np.float32)
            labels = np.zeros(30)

        ignore_indices = labels == -1
        labels = labels.reshape(15, 2)

        if self.transform:
            image = np.repeat(image.reshape(96, 96, 1), 3, 2).astype(np.uint8)
            augmentations = self.transform(image=image, keypoints=labels)
            image = augmentations["image"]
            labels = augmentations["keypoints"]

        labels = np.array(labels).reshape(-1)
        labels[ignore_indices] = -1
        return image, labels.astype(np.float32)


if __name__ == "__main__":
    ds = FacialKeypointDataset(csv_file="data/train_4.csv", train=True, transform=config.train_transforms)
    loader = DataLoader(ds, batch_size=1, shuffle=True, num_workers=0)

    for idx, (x, y) in enumerate(loader):
        plt.imshow(x[0][0].detach().cpu().numpy(), cmap='gray')
        plt.plot(y[0][0::2].detach().cpu().numpy(), y[0][1::2].detach().cpu().numpy(), "go")
        plt.show()
In the tutorial it has the same lines of code, but no error. Here is the link to the GitHub repo:
https://github.com/aladdinpersson/Machine-Learning-Collection/tree/master/ML/Kaggles/Facial%20Keypoint%20Detection%20Competition
Any ideas what might be causing this?

The code expects the value returned by self.data.iloc[index, 30] to always be a string.
That might be fine for the project you are basing your code on, but if you pass a CSV file that has floats instead of strings, it will result in the error that you got.

Convert the data to float with .astype(np.float32) first. For example,
self.data = pd.read_csv(csv_file)
self.data = self.data.astype(np.float32)
Or
self.data = pd.read_csv(csv_file, dtype=np.float64)
If that conversion raises an error, it means your CSV file contains string data, and this program cannot be used on your input data.
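If you want the failure to be easier to diagnose, a minimal sketch (my own addition, not from the tutorial) is to check the cell's type before splitting; the column index 30 and the space-separated pixel format come from the question's code:
import numpy as np

def read_image_cell(cell):
    # The Kaggle facial-keypoints CSV stores each image as one long string of
    # space-separated pixel values, so .split() normally works. If the cell
    # arrives as a float instead, .split() raises the AttributeError above,
    # so fail here with a clearer message.
    if isinstance(cell, str):
        return np.array(cell.split(), dtype=np.float32)
    raise ValueError(
        "Expected a pixel string in the image column, got "
        f"{type(cell).__name__}; check that the CSV and column index match."
    )
Inside __getitem__ this would be used as image = read_image_cell(self.data.iloc[index, 30]).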

Related

TypeError: Caught TypeError in DataLoader worker process 0 ,TypeError: __call__() takes 2 positional arguments but 3 were given

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
data_loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2, collate_fn=collate_fn)
# For Training
images,targets = next(iter(data_loader))
images = list(image for image in images)
targets = [{k: v for k, v in t.items()} for t in targets]
output = model(images,targets) # Returns losses and detections
# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x) # Returns predictions
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "<ipython-input-14-8e3b7d164f81>", line 70, in __getitem__
img, target = self.transforms(img, target)
TypeError: __call__() takes 2 positional arguments but 3 were given
I wrote this code for instance segmentation and got this TypeError. Can you help me with this error?
Thanks!
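No answer was recorded here, but the traceback shows self.transforms(img, target) failing because the transform's __call__ accepts only one argument besides self. Plain torchvision.transforms callables take a single image, while the torchvision detection tutorial expects transforms that take (image, target). A minimal sketch of such a pair-style transform, assuming that setup (the class names below are illustrative, not from the original post):
import torchvision.transforms.functional as F

class ComposePair:
    # Chains transforms that each accept and return (image, target).
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

class ToTensorPair:
    # Unlike torchvision.transforms.ToTensor, this also takes the target
    # and passes it through unchanged, so calling it with (img, target) works.
    def __call__(self, image, target):
        return F.to_tensor(image), target

def get_transform(train):
    return ComposePair([ToTensorPair()])
With a transform built this way, the self.transforms(img, target) call in __getitem__ matches the __call__ signature.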

'tuple' object is not callable

I am trying to convert any .png images with a transparent background to a white background.
However, I am getting an error that says 'tuple' object is not callable.
I have tried this:
def transparent_to_white(img):
    color = (255, 255, 255)
    for x in range(img.size()):
        for y in range(img.size()):
            r, g, b, a = img.getpixel((x, y))
            if a == 0:
                img.putpixel((x, y), color)
    return img
but I get this error:
Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/gdrive/My Drive/All_Deep_Learning/PythonCustomLibraries/pix2pixdatasetlib.py", line 49, in __getitem__
y_label = self.resize(transparent_to_white(y_label))
File "/content/gdrive/My Drive/All_Deep_Learning/PythonCustomLibraries/pix2pixdatasetlib.py", line 33, in transparent_to_white
for x in range(img.size()):
TypeError: 'tuple' object is not callable
I call it in my dataset class:
class Pix2PixDataset(Dataset):
    def __init__(self, data_points, transforms=None):
        self.data_points = data_points
        self.transforms = transforms
        self.resize = T.Resize((512, 512))

    def __getitem__(self, index):
        image, y_label = process_images(self.data_points[index].reference_image, self.data_points[index].drawing)
        image = self.resize(image)
        y_label = self.resize(transparent_to_white(y_label))
        if self.transforms:
            image = self.transforms(image)
            y_label = self.transforms(y_label)
        return (image, y_label)

    def __len__(self):
        return len(self.data_points)
I tried removing the parentheses, but that did not help; I still get the same error:
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/gdrive/My Drive/All_Deep_Learning/PythonCustomLibraries/pix2pixdatasetlib.py", line 49, in __getitem__
y_label = self.resize(transparent_to_white(y_label))
File "/content/gdrive/My Drive/All_Deep_Learning/PythonCustomLibraries/pix2pixdatasetlib.py", line 33, in transparent_to_white
for x in range(img.size()):
TypeError: 'tuple' object is not callable
Disclaimer: I'm assuming img is an instance of the Image class from the PIL module or its fork, Pillow.
img.size is a tuple. For example, if you do:
print(img.size)
It prints a tuple with (width, height).
So, your code could be
def transparent_to_white(img):
    color = (255, 255, 255)
    width, height = img.size          # unpack width/height beforehand
    for x in range(width):            # use the unpacked values in range
        for y in range(height):       # same as above
            r, g, b, a = img.getpixel((x, y))
            if a == 0:
                img.putpixel((x, y), color)
    return img
Or, alternatively, you could store x and y into a tuple of coordinates, to simplify passing it around:
def transparent_to_white(img):
    color = (255, 255, 255)
    width, height = img.size          # unpack width/height beforehand
    for x in range(width):            # use the unpacked values in range
        for y in range(height):       # same as above
            coords = (x, y)           # tuple of coordinates
            r, g, b, a = img.getpixel(coords)  # used here
            if a == 0:
                img.putpixel(coords, color)    # and here
    return img
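A quick usage sketch of the corrected function on a synthetic RGBA image (the 4x4 size and fully transparent fill are just for illustration):
from PIL import Image

img = Image.new("RGBA", (4, 4), (0, 0, 0, 0))  # fully transparent black pixels
img = transparent_to_white(img)
print(img.getpixel((0, 0)))  # the previously transparent pixel is now white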

Why does iter(dataloader) get stuck and never stop? (AttributeError related to __main__)

I am revising some machine learning. This is a tutorial exercise I did before, and it had no problem at the time. However, now dataiter = iter(train_loader) takes forever to run; it just gets stuck and never stops. There is no problem on Google Colab, which runs Python 3.7.15, but I am on Python 3.8.11.
Here is the size of my dataset:
Length of Dataset is 1470
Full: 1470
Train: 940
Valid: 236
Test: 294
import multiprocessing as mp
bs = 32
# num_cpu = 2
num_cpu = mp.cpu_count()
train_loader = DataLoader(train, batch_size=bs, shuffle=True, num_workers=num_cpu, pin_memory=True)
valid_loader = DataLoader(valid, batch_size=bs, shuffle=False, num_workers=num_cpu, pin_memory=True)
test_loader = DataLoader(test, batch_size=bs, shuffle=False, num_workers=num_cpu, pin_memory=True)
After that I run dataiter = iter(train_loader)
Here is the error message:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/kelvin/opt/anaconda3/envs/torch-gpu/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Users/kelvin/opt/anaconda3/envs/torch-gpu/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'HrDataset' on <module '__main__' (built-in)>
Here is my HrDataset class:
class HrDataset(Dataset):
    def __init__(self, file_path):
        print('HrDataset is loading {}'.format(file_path))
        df = pd.read_csv(file_path)
        self.df = df
        self.df = self.preprocessing(df)
        print("Preprocessing is completed")
        print('Length of HrDataset is {}'.format(len(self.df)))

    def __getitem__(self, idx):
        X = np.array(self.df.iloc[idx, 1:]).astype(np.float32)
        y = self.df.iloc[idx, 0]
        return X, y

    def __len__(self):
        return len(df)

    def preprocessing(self, df):
        for col in df.columns:
            if df.dtypes[col] == 'object':
                df[col] = df[col].fillna('NA')
                df[col] = df[col].astype('category')
                if len(df[col].cat.categories) > 2:
                    df = pd.get_dummies(df, columns=[col])
                else:
                    df[col] = LabelEncoder().fit_transform(df[col])
            else:
                df[col] = df[col].fillna(0)
        return df
I finally found out what the problem was. I had defined HrDataset in an .ipynb file, and it seems Python's multiprocessing cannot find an object defined there when it spawns worker processes. I moved the class into a file HrDataset.py and imported it with from HrDataset import HrDataset to fix the problem.
Ref: Multiprocessing example giving AttributeError
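A minimal sketch of that fix, assuming the notebook and the new module live in the same directory (the CSV file name below is hypothetical):
# HrDataset.py -- move the HrDataset class definition from above into this file

# notebook / main script
from HrDataset import HrDataset
from torch.utils.data import DataLoader

train = HrDataset("hr_train.csv")   # hypothetical path
train_loader = DataLoader(train, batch_size=32, shuffle=True,
                          num_workers=2, pin_memory=True)
dataiter = iter(train_loader)       # spawned workers can now import HrDataset
On macOS and Windows the workers are started with the spawn method and re-import your code, which is why a class defined only inside the notebook's __main__ cannot be found; setting num_workers=0 also sidesteps the issue, at the cost of loading data in the main process.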

How to use a pre-trained language model correctly?

I'm trying to use the Hugging Face pretrained model "GPT2dialog" as an encoder for sentences, but the token indexer has me confused.
In detail, I can run a unit test for the dataset_reader with a pretrained indexer normally, but using the train command to train the model causes this bug:
File "/home/lee/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/common/lazy.py", line 54, in constructor_to_use
return constructor.from_params(Params({}), **kwargs) # type: ignore[union-attr]
File "/home/lee/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/common/from_params.py", line 604, in from_params
**extras,
File "/home/lee/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/common/from_params.py", line 634, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "/home/lee/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/data/vocabulary.py", line 310, in from_instances
instance.count_vocab_items(namespace_token_counts)
File "/home/lee/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/data/instance.py", line 60, in count_vocab_items
field.count_vocab_items(counter)
File "/home/lee/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/data/fields/text_field.py", line 78, in count_vocab_items
for indexer in self.token_indexers.values():
AttributeError: 'PretrainedTransformerIndexer' object has no attribute 'values'
Here is my dataset_reader code.
class MultiWozDatasetReader(DatasetReader):
    def __init__(self,
                 lazy: bool = False,
                 tokenizer: Tokenizer = None,
                 tokenindexer: Dict[str, TokenIndexer] = None
                 ) -> None:
        super().__init__(lazy)
        self._tokenizer = tokenizer or WhitespaceTokenizer()
        self._tokenindexer = PretrainedTransformerIndexer("microsoft/DialoGPT-small")

    #overrides
    def read(self, file_path: str):
        logger.warn("call read")
        with open(file_path, 'r') as data_file:
            dialogs = json.load(data_file)
            for dialog in dialogs:
                dialogue = dialog["dialogue"]
                for turn_num in range(len(dialogue)):
                    dia_single_turn = dialogue[turn_num]
                    sys_utt = dia_single_turn["system_transcript"]
                    user_utt = dia_single_turn["transcript"]
                    state_category = dia_single_turn["state_category"]
                    span_info = dia_single_turn["span"]
                    yield self.text_to_instance(sys_utt, user_utt, state_category, span_info)

    #overrides
    def text_to_instance(self, sys_utt, user_utt, state_catgory, span_info):
        tokenized_sys_utt = self._tokenizer.tokenize(sys_utt)
        tokenized_user_utt = self._tokenizer.tokenize(user_utt)
        tokenized_span_info = self._tokenizer.tokenize(span_info)
        tokenized_classifier_input = self._tokenizer.tokenize("[CLS] " + sys_utt + " [SEP] " + user_utt)
        sys_utt_field = TextField(tokenized_sys_utt, self._tokenindexer)
        user_utt_field = TextField(tokenized_user_utt, self._tokenindexer)
        classifier_filed = TextField(tokenized_classifier_input, self._tokenindexer)
        span_field = TextField(tokenized_span_info, self._tokenindexer)
        fields = {"sys_utt": sys_utt_field, "user_utt": user_utt_field, "classifier_input": classifier_filed, "span": span_field}
        fields['label'] = LabelField(state_catgory)
        return Instance(fields)
I have been searching the net for a long time, but to no avail. Please help, or give some ideas on how to achieve this.
The token_indexer needs to be a dictionary. It can be set as follows:
self._token_indexers = {"tokens": PretrainedTransformerIndexer("microsoft/DialoGPT-small")}
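In context, that dictionary replaces the bare indexer assigned in __init__, and the same dict is then passed to every TextField. A sketch based on the reader in the question (the "tokens" key is just the conventional namespace name, not something from the original post):
def __init__(self,
             lazy: bool = False,
             tokenizer: Tokenizer = None,
             token_indexers: Dict[str, TokenIndexer] = None) -> None:
    super().__init__(lazy)
    self._tokenizer = tokenizer or WhitespaceTokenizer()
    # TextField.count_vocab_items iterates over the indexers' .values(),
    # so this must be a dict of {namespace: indexer}, not a bare indexer.
    self._token_indexers = token_indexers or {
        "tokens": PretrainedTransformerIndexer("microsoft/DialoGPT-small")
    }
Each TextField then receives the dict, e.g. TextField(tokenized_sys_utt, self._token_indexers).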

Python TypeError: 'float' object cannot be interpreted as an index

The following code uses audio files to create a matrix of features in tensorflow:
import tensorflow as tf

directory = "audio_dataset/*.wav"
filenames = tf.train.match_filenames_once(directory)

init = (tf.global_variables_initializer(), tf.local_variables_initializer())
count_num_files = tf.size(filenames)
filename_queue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
filename, file_contents = reader.read(filename_queue)

with tf.Session() as sess:
    sess.run(init)
    num_files = sess.run(count_num_files)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(num_files):
        audio_file = sess.run(filename)
        print(audio_file)
This is a toolkit that converts audio from the time domain to the frequency domain:
from bregman.suite import *

chromo = tf.placeholder(tf.float32)
max_freqs = tf.argmax(chromo, 0)

def get_next_chromogram(sess):
    audio_file = sess.run(filename)
    F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
    return F.X

def extract_feature_vector(sess, chromo_data):
    num_features, num_samples = np.shape(chromo_data)
    freq_vals = sess.run(max_freqs, feed_dict={chromo: chromo_data})
    hist, bins = np.histogram(freq_vals, bins=range(num_features + 1))
    return hist.astype(float) / num_samples

def get_dataset(sess):
    num_files = sess.run(count_num_files)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    xs = []
    for _ in range(num_files):
        chromo_data = get_next_chromogram(sess)
        x = [extract_feature_vector(sess, chromo_data)]
        x = np.matrix(x)
        if len(xs) == 0:
            xs = x
        else:
            xs = np.vstack((xs, x))
    return xs
This clusters the data around two centroids:
k = 2
max_iterations = 100

def initial_cluster_centroids(X, k):
    return X[0:k, :]

def assign_cluster(X, centroids):
    expanded_vectors = tf.expand_dims(X, 0)
    expanded_centroids = tf.expand_dims(centroids, 1)
    distances = tf.reduce_sum(tf.square(tf.subtract(expanded_vectors, expanded_centroids)), 2)
    mins = tf.argmin(distances, 0)
    return mins

def recompute_centroids(X, Y):
    sums = tf.unsorted_segment_sum(X, Y, k)
    counts = tf.unsorted_segment_sum(tf.ones_like(X), Y, k)
    return sums / counts

with tf.Session() as sess:
    sess.run(init)
    X = get_dataset(sess)
    centroids = initial_cluster_centroids(X, k)
    i, converged = 0, False
    while not converged and i < max_iterations:
        i += 1
        Y = assign_cluster(X, centroids)
        centroids = sess.run(recompute_centroids(X, Y))
    print(centroids)
But I'm getting the following traceback:
Traceback (most recent call last):
File "components.py", line 776, in <module>
X = get_dataset(sess)
File "ccomponents.py", line 745, in get_dataset
chromo_data = get_next_chromogram(sess)
File "coffee_components.py", line 728, in get_next_chromogram
F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
File "/Volumes/Dados/Documents/Education/Programming/Machine Learning/Manning/book/BregmanToolkit-master/bregman/features.py", line 143, in __init__
Features.__init__(self, arg, feature_params)
File "/Volumes/Dados/Documents/Education/Programming/Machine Learning/Manning/book/BregmanToolkit-master/bregman/features_base.py", line 70, in __init__
self.extract()
File "/Volumes/Dados/Documents/Education/Programming/Machine Learning/Manning/book/BregmanToolkit-master/bregman/features_base.py", line 213, in extract
self.extract_funs.get(f, self._extract_error)()
File "/Volumes/Dados/Documents/Education/Programming/Machine Learning/Manning/book/BregmanToolkit-master/bregman/features_base.py", line 711, in _chroma
if not self._cqft():
File "/Volumes/Dados/Documents/Education/Programming/Machine Learning/Manning/book/BregmanToolkit-master/bregman/features_base.py", line 588, in _cqft
self._make_log_freq_map()
File "/Volumes/Dados/Documents/Education/Programming/Machine Learning/Manning/book/BregmanToolkit-master/bregman/features_base.py", line 353, in _make_log_freq_map
mxnorm = P.empty(self._cqtN) # Normalization coefficients
TypeError: 'float' object cannot be interpreted as an index
As far as I can tell, range takes an int and not a float.
Could someone please point out the error here?
The problem is that you're using Python 3, but the Bregman Toolkit was written in Python 2. The error comes from this line:
mxnorm = P.empty(self._cqtN)
self._cqtN is a float. In Python 2, the pylab library accepts floats as input:
pylab.empty(5.0)
__main__:1: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
array([ 0., 0., 0., 0., 0.])
However, in Python 3 you get the same error that you are seeing:
pylab.empty(5.0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'float' object cannot be interpreted as an integer
You should be able to fix this error by editing the line in the file I linked above and casting it to an int:
mxnorm = P.empty(int(self._cqtN))
However, I'd be surprised if there weren't any other errors due to the incompatible versions. You might want to try using Python 2 or look for an alternative to the Bregman Toolkit.
You need to cast self._cqtN to int on lines 353 and 357 of features_base.py. The changed lines are:
mxnorm = P.empty(int(self._cqtN))
and
for i in P.arange(int(self._cqtN))])
