Problem
In PyTorch, I am trying to write a dataset class that can return all of the data and all of the labels separately, using syntax like dataset.data and dataset.label. The code skeleton looks like:
class MyDataset(object):
    data = _get_data()
    label = _get_label()

    def __init__(self, dir, transforms):
        self.img_list = ...  # all image paths loaded from dir
        # do something

    def __getitem__(self, index):
        # do something
        return data, label

    def __len__(self):
        return len(self.img_list)

    def _get_data():
        # do something

    def _get_label():
        # do something
However, when I use dataset.data and dataset.label to access the corresponding variables, nothing is returned.
I am wondering why this is the case and how I can fix this.
Edit
Thank you for all of your attention.
I have solved this problem myself. The solution is straightforward: it simply uses class variables.
import torch
from PIL import Image

class FaceDataset(object):
    # class variables
    data = None
    label = None

    def __init__(self, root, transforms=None):
        # read img_list from root
        self.img_list = ...
        self.transforms = ...
        FaceDataset.data = FaceDataset._get_data(self.img_list, self.transforms)
        FaceDataset.label = FaceDataset._get_label(self.img_list)

    @classmethod
    def _get_data(cls, img_list, transforms):
        data_list = []
        for img_path in img_list:
            data_list.append(transforms(Image.open(img_path)))
        # stack the per-image tensors into a single (N, C, H, W) tensor
        return torch.stack(data_list, dim=0)

    @classmethod
    def _get_label(cls, img_list):
        label = torch.zeros(len(img_list))
        for i, img_path in enumerate(img_list):
            label[i] = ...
        return label

    def __getitem__(self, index):
        img_path = self.img_list[index]
        label = ...
        # read image from file
        data = Image.open(img_path)
        # apply transform defined in __init__
        data = self.transforms(data)
        return data, label

    def __len__(self):
        return len(self.img_list)
The "normal" way to create custom datasets in Python has already been answered here on SO. There happens to be an official PyTorch tutorial for this.
For a simple example, you can read the PyTorch MNIST dataset code here (this dataset is used in this PyTorch example code for further illustration). Finally, you can find other dataset implementations in this torchvision datasets list (click on the dataset name, then on the "source" button in the dataset documentation, to access the dataset's PyTorch implementation).
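For reference, a minimal sketch of that "normal" pattern (the class name ImageFolderDataset and the label placeholder are illustrative assumptions, not part of the original post):

import os
from PIL import Image
from torch.utils.data import Dataset

class ImageFolderDataset(Dataset):  # hypothetical name, for illustration only
    def __init__(self, root, transforms=None):
        # collect every image path under root
        self.img_list = [os.path.join(root, name) for name in sorted(os.listdir(root))]
        self.transforms = transforms

    def __getitem__(self, index):
        # load and transform one sample on demand
        img = Image.open(self.img_list[index])
        if self.transforms is not None:
            img = self.transforms(img)
        label = 0  # placeholder: derive the real label from the path
        return img, label

    def __len__(self):
        return len(self.img_list)

A torch.utils.data.DataLoader then batches samples on the fly, so the whole dataset never has to be materialized in memory the way the class-variable approach above does.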
Related
I've never really used classes before; I simply went the easy way (global variables), and now I would like to structure my code properly to avoid future complications.
This is my code:
from dearpygui.core import *

class Engine:
    def __init__(self, serial, type, profile):
        self.serial = serial
        self.type = type
        self.profile = profile

def apply_selected_file():
    html_name = "example.html"
    path = "C:/"
    # function that reads data from a file and saves selected data in a list
    res = html_imp(path + '/' + html_name)
    # I would like to remove the code below and use a class for each file instead
    set_value(sn1, res[0])       # shows a label with this value
    set_value(type1, res[1])     # shows a label with this value
    set_value(profile1, res[2])  # shows a label with this value
    return res

def button():
    # This was my initial idea but it doesn't seem to work.
    # res = apply_selected_file()
    # E = Engine(res[0], res[1], res[2])
    pass
I have in mind reading multiple HTML files, so using a class would be much easier than declaring variables for each file (a sketch follows below):
1- Use apply_selected_file to read a file and assign its values (s/n, type, profile) to a new class instance (E1, E2, E3, ..., E20, ...).
2- Use another function, button(), to access those stored instance values.
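One way to do this (a minimal sketch, assuming html_imp returns a list like [serial, type, profile]; the engines list name is illustrative) is to keep each Engine in a module-level list instead of numbered variables:

engines = []  # one Engine per file that has been read

def apply_selected_file(html_name, path="C:/"):
    # read the file and store its values in a new Engine instance
    res = html_imp(path + '/' + html_name)
    engines.append(Engine(res[0], res[1], res[2]))
    return res

def button():
    # the stored instances remain accessible here
    for engine in engines:
        print(engine.serial, engine.type, engine.profile)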
I'm creating a custom class to store information about CFD simulation results.
Right now the way it is set up is that it instantiates an empty class object, then uses a method called load_mesh, which calls an external function to read all the information about the mesh and returns it as a dictionary. The load_mesh method then assigns a bunch of attributes from the values in that dictionary.
The problem is that I am planning to store a lot more information than just the mesh, and I don't want my class object to have something like 1000 attributes. I want to store them in appropriate containers(?) that each have their own methods.
For example, my code looks like this currently (some stuff omitted that's unnecessary):
class CFD():
    def __init__(self, infile=None):
        self.file = infile

    def load_mesh(self):
        # calls outside function to load mesh info; uses self.file, returns dict
        mesh = load_cfd_mesh(self)
        self.proj = mesh['proj']
        self.static_items = mesh['static_items']
        self.nnodes = mesh['nnodes']
        self.node_coords = mesh['node_coords']
        self.node_codes = mesh['node_codes']
        self.nelements = mesh['nelements']
        self.element_types = mesh['element_types_str']
        self.node_connectivity = mesh['node_connectivity']
        self.element_node_ids = mesh['element_node_ids']
        self.element_coords = mesh['element_coords']
        self.element_elevs = mesh['element_elevs']
        self.horizontal_units = mesh['horizontal_units']
        self.vertical_units = mesh['vertical_units']

test = CFD('testfile.txt')  # instantiate
test.load_mesh()            # load mesh information to attributes
Now, I can access any of the mesh information by doing:
test.proj
test.nnodes
test.node_coords
etc...
But what I want to do is store all of this information in test.mesh, where test.mesh has all of these attributes but also has the method test.mesh.load().
I THINK I can do something like this:
class CFD():
    def __init__(self, infile=None):
        self.file = infile
        self.mesh = None

    def load_mesh(self):
        # calls outside function to load mesh info; uses self.file, returns dict
        mesh = load_cfd_mesh(self)
        setattr(self.mesh, 'proj', mesh['proj'])
        # etc....
then I'd be able to do:
test = CFD('testfile.txt')  # instantiate
test.load_mesh()            # load mesh information to attributes
test.mesh.proj
But I can't figure out how to attach the load method to self.mesh. How can I achieve the following:
test = CFD('testfile.txt')  # instantiate
test.mesh.load()            # load mesh information to attributes
test.mesh.proj
Do I have to define another class within the main class, like class mesh(self):?
Also, if my proposed way of adding attributes to self.mesh doesn't make sense... please help!
I think you might be looking for something like a property to lazily load the mesh when needed – I don't really see why there'd be an "empty" mesh object you explicitly have to .load():
class Mesh:
    def __init__(self, filename):
        mesh = load_cfd_mesh(filename)
        self.proj = mesh["proj"]
        self.static_items = mesh["static_items"]
        # ...

class CFD:
    def __init__(self, filename):
        self.filename = filename
        self._mesh = None

    @property
    def mesh(self):
        # build the Mesh on first access, then reuse it
        if self._mesh is None:
            self._mesh = Mesh(self.filename)
        return self._mesh

test = CFD("testfile.txt")
print(test.mesh.proj)
You can do that with an inner class (below is simplified code for demonstration):
class CFD:
    class Mesh:
        def __init__(self, file):
            self._file = file

        def load_mesh(self):
            # implement here your own code...
            print("loading from file", self._file)
            self.proj = "PROJ"

    def __init__(self, file):
        self.mesh = self.__class__.Mesh(file)
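With this layout the access pattern from the question works directly (a short usage sketch; note the method keeps the name load_mesh from the snippet above, not load):

test = CFD("testfile.txt")  # instantiate
test.mesh.load_mesh()       # prints: loading from file testfile.txt
print(test.mesh.proj)       # PROJ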
I have a recommender system that I need to train, and I included the entire training procedure inside a function:
def train_model(data):
    model = Recommender()
    Recommender.train(data)
    pred = Recommender.predict(data)
    return pred
something like this. Now if I want to train this inside a loop, for different datasets, like:
preds_list = []
data_list = [dataset1, dataset2, dataset3...]
for data_subset in data_list:
    preds = train_model(data_subset)
    preds_list += [preds]
How can I make sure that every time I call the train_model function, a brand new instance of a recommender is created, not an old one, trained on the previous dataset?
You are already creating a new instance every time you execute train_model. The thing is, you are not using that new instance.
You probably meant:
def train_model(data):
    model = Recommender()
    model.train(data)
    pred = model.predict(data)
    return pred
Use the instance you've instantiated, not the class:
class Recommender:
    def __init__(self):
        self.id = self

    def train(self, data):
        return data

    def predict(self, data):
        return data + str(self.id)

def train_model(data):
    model = Recommender()
    model.train(data)
    return model.predict(data)

data = 'a data '
x = {}
for i in range(3):
    x[i] = train_model(data)
    print(x[i])

# a data <__main__.Recommender object at 0x11cefcd10>
# a data <__main__.Recommender object at 0x11e0471d0>
# a data <__main__.Recommender object at 0x11a064d50>
Getting started with pyspark.ml and the pipelines API, I find myself writing custom transformers for typical preprocessing tasks in order to use them in a pipeline. Examples:
from pyspark.ml import Pipeline, Transformer

class CustomTransformer(Transformer):
    # lazy workaround - a transformer needs to have these attributes
    _defaultParamMap = dict()
    _paramMap = dict()
    _params = dict()
class ColumnSelector(CustomTransformer):
    """Transformer that selects a subset of columns
    - to be used as pipeline stage"""

    def __init__(self, columns):
        self.columns = columns

    def _transform(self, data):
        return data.select(self.columns)

class ColumnRenamer(CustomTransformer):
    """Transformer that renames one column"""

    def __init__(self, rename):
        self.rename = rename

    def _transform(self, data):
        (colNameBefore, colNameAfter) = self.rename
        return data.withColumnRenamed(colNameBefore, colNameAfter)

class NaDropper(CustomTransformer):
    """Drops rows with at least one not-a-number element"""

    def __init__(self, cols=None):
        self.cols = cols

    def _transform(self, data):
        return data.dropna(subset=self.cols)

class ColumnCaster(CustomTransformer):
    def __init__(self, col, toType):
        self.col = col
        self.toType = toType

    def _transform(self, data):
        return data.withColumn(self.col, data[self.col].cast(self.toType))
They work, but I was wondering if this is a pattern or antipattern - are such transformers a good way to work with the pipeline API? Was it necessary to implement them, or is equivalent functionality provided somewhere else?
I'd say it is primarily opinion-based, although it looks unnecessarily verbose, and Python Transformers don't integrate well with the rest of the Pipeline API.
It is also worth pointing out that everything you have here can be easily achieved with SQLTransformer. For example:
from pyspark.ml.feature import SQLTransformer

def column_selector(columns):
    return SQLTransformer(
        statement="SELECT {} FROM __THIS__".format(", ".join(columns))
    )
or
def na_dropper(columns):
    return SQLTransformer(
        statement="SELECT * FROM __THIS__ WHERE {}".format(
            " AND ".join(["{} IS NOT NULL".format(x) for x in columns])
        )
    )
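These factories then drop straight into a pipeline like any other stage (a small sketch; the DataFrame df and its columns "a" and "b" are assumed for illustration):

from pyspark.ml import Pipeline

# df is an assumed input DataFrame with columns "a" and "b"
pipeline = Pipeline(stages=[
    column_selector(["a", "b"]),
    na_dropper(["a", "b"]),
])
clean_df = pipeline.fit(df).transform(df)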
With a little bit of effort you can use SQLAlchemy with Hive dialect to avoid handwritten SQL.
I am trying to write the derivative function given as pseudocode in my MWE below. It is supposed to calculate the numerical derivative of the cost of my neural network's prediction with respect to a parameter of one of its layers.
My problem is I don't know how to pass and access an instance of NeuralNetwork and an instance of Layer from within the function (or method?) at the same time.
Looking into e.g. Passing a class to another class (Python) did not provide an answer for me.
import copy

class NeuralNetwork:
    def __init__(self):
        self.first_layer = Layer()
        self.second_layer = Layer()

    def cost(self):
        # not the actual cost but not of interest
        return self.first_layer.a + self.second_layer.a

class Layer:
    def __init__(self):
        self.a = 1
''' pseudocode
def derivative(NeuralNetwork, Layer):
    stepsize = 0.01
    cost_unchanged = NeuralNetwork.cost()
    NN_deviated = copy.deepcopy(NeuralNetwork)
    NN_deviated.Layer.a += stepsize
    cost_deviated = NN_deviated.cost()
    return (cost_deviated - cost_unchanged)/stepsize
'''
NN = NeuralNetwork()
''' pseudocode
derivative_first_layer = derivative(NN, first_layer)
derivative_second_layer = derivative(NN, second_layer)
'''
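One way to make this pseudocode concrete (a sketch, not necessarily the intended design) is to pass the layer by its attribute name and look it up on the deep copy with getattr, so both the network and the matching layer of its copy are reachable inside the function:

def derivative(network, layer_name, stepsize=0.01):
    cost_unchanged = network.cost()
    network_deviated = copy.deepcopy(network)
    # fetch the layer on the *copy* by name, then nudge its parameter
    getattr(network_deviated, layer_name).a += stepsize
    cost_deviated = network_deviated.cost()
    return (cost_deviated - cost_unchanged) / stepsize

derivative_first_layer = derivative(NN, "first_layer")    # ≈ 1.0
derivative_second_layer = derivative(NN, "second_layer")  # ≈ 1.0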