change() missing 1 required positional argument: 'X' while predicting future value - python

I am trying to predict a future value from three inputs, forecasting one hour ahead. Here g = temperature, p = humidity, c = wind, and I want to predict the temperature in the next hour from these inputs; that is why n_out is 1. I wrote the code as a function and then tried to take its return values as x and y, because I want to split them into train and test sets. But I got the error below. I plan to predict the future value using an LSTM, and after this I don't know how to feed these arrays as train and test data into the LSTM model. Can anyone help me solve this problem?
Here is my code and CSV file.
def change(train, X, n_out=1):
    data = train.reshape((train.shape[0]))
    x, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + X
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end < len(data):
            x_input = data[in_start:in_end, 0]
            x_input = x_input.reshape((len(x_input), 3))
            x.append(x_input)
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(x), array(y)
data = pd.DataFrame(data, columns=['g', 'p', 'c'])
data.columns = ['g', 'p', 'c']
pd.options.display.float_format = '{:,.0f}'.format
data = data.dropna()
cols = ['g', 'p', 'c']
X = data[cols]
x, y = change(data)
The error came as
my csv file:
After editing the code it gave me this error:

In your definition of the function you have three parameters:
train, X, and n_out=1
def change(train, X, n_out=1)
When you call your function you provide just one argument (data):
x, y = change(data)
Since n_out already has a default value of 1, you still need to provide X, or define your function as:
def change(train, n_out=1)
NOTE:
You need to provide X when you call your function, for example:
x, y = change(data, 1)
or define the function like:
def change(train, X=1, n_out=1)
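Beyond the missing argument, the reshape and 2D indexing inside change are also likely to fail on a three-column input once the call is fixed. Here is a minimal sketch (not the original code) of the sliding-window idea, assuming the goal is windows of X past hourly rows of [g, p, c] with the next hour's g (temperature) as the target; the window length X=24 is only an example:
import numpy as np

def change(train, X, n_out=1):
    # train: 2D array of shape (num_hours, 3) with columns [g, p, c]
    # X: number of past hours in each input window
    x, y = list(), list()
    for in_start in range(len(train)):
        in_end = in_start + X
        out_end = in_end + n_out
        # keep only windows that have enough data left for the target
        if out_end <= len(train):
            x.append(train[in_start:in_end, :])   # all three features for X hours
            y.append(train[in_end:out_end, 0])    # next hour's temperature (g)
    return np.array(x), np.array(y)

values = data[['g', 'p', 'c']].values
x, y = change(values, X=24)    # e.g. use the previous 24 hours per sample
print(x.shape, y.shape)        # (samples, 24, 3) and (samples, 1)
Arrays shaped (samples, timesteps, features) like this can then be split into train and test sets and passed to a Keras LSTM layer with input_shape=(24, 3).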

Related

tf.data.datasets set each batch (prefetch)

I am looking for help thinking through this.
I have a function (that is not a generator) that will give me any number of samples.
Let's say that all the data I want to train on (1000 samples) can't fit into memory.
So I want to call this function 10 times to get smaller groups of samples that do fit into memory.
This is a dummy example for simplicity.
import numpy as np

def get_samples(num_samples: int, random_seed=0):
    np.random.seed(random_seed)
    x = np.random.randint(0, 100, num_samples)
    y = np.random.randint(0, 2, num_samples)
    return np.array(list(zip(x, y)))
Again lets say get_samples(1000,0) won't fit into memory.
So in theory I am looking for something like this:
batch_size = 100
total_num_samples = 1000
batches = []
for i in range(total_num_samples // batch_size):
    batches.append(get_samples(batch_size, i))
But this still loads everything into memory.
Again this function is a dummy representation and the real one is already defined and not a generator.
In TF land, I was hoping that:
tf.data.Dataset.batch[0] would equal to the output of get_data(100,0)
tf.data.Dataset.batch[1] would equal to the output of get_data(100,1)
tf.data.Dataset.batch[2] would equal to the output of get_data(100,2)
...
tf.data.Dataset.batch[9] would equal to the output of get_data(100,9)
I understand that I can use tf.data.Dataset with a generator (and I think you can set a generator per batch). But the function I have gives more than a single sample, and the setup is too expensive to run for every single sample.
I wanted to use tf.data.Dataset.prefetch() to run the get_batch function on every batch, and of course it would call get_batch with the same parameters on every epoch.
Sorry if the explanation is convoluted. Trying my best to describe the problem.
Anyone have any ideas?
This is what I came up with:
import numpy as np
import tensorflow as tf

def simple_static_synthesizer(batch_size, seed=1, verbose=True):
    if verbose:
        print(f"Creating Synthetic Data with seed {seed}")
    rng = np.random.default_rng(seed)
    all_x = []
    all_y = []
    for i in range(batch_size):
        x = np.array(np.concatenate((rng.integers(0, 100, 1, dtype=int),
                                     rng.integers(0, 100, 1, dtype=int),
                                     rng.integers(0, 100, 1, dtype=int))))
        y = np.array(rng.integers(0, 2, 1, dtype=int))
        all_x.append(x)
        all_y.append(y)
    return all_x, all_y

def my_generator(total_size, batch_size, seed=0, verbose=True):
    counter = 0
    for i in range(total_size):
        # Regenerate data for every batch
        if counter % batch_size == 0:
            x, y = simple_static_synthesizer(batch_size, seed, verbose)
            seed += 1
        yield x[i % batch_size], y[i % batch_size]
        counter += 1
my_gen = my_generator(10, 2, seed=1)
# See values
for x, y in my_gen:
    print(x, y)

# Call again; this gives the same answer as above
my_gen = my_generator(10, 2, seed=1)
for x, y in my_gen:
    print(x, y)

# Dataset with small batches to see if it is doing it correctly
total_samples = 10
batch_size = 2
seed = 5
dataset = tf.data.Dataset.from_generator(
    my_generator,
    args=[total_samples, batch_size, seed],
    output_signature=(
        tf.TensorSpec(shape=(3,), dtype=tf.uint8),
        tf.TensorSpec(shape=(1,), dtype=tf.uint8),
    )
)
for i, (x, y) in enumerate(dataset):
    print(x.numpy(), y.numpy())
    if i == 4:
        break  # shows the first 3 synthesizer calls
Wish we could have notebook answers!
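If you also want batched tensors and to overlap data generation with training, the generator-backed dataset above can be batched and prefetched; this is a small follow-up sketch, not part of the original answer:
batched = (
    tf.data.Dataset.from_generator(
        my_generator,
        args=[total_samples, batch_size, seed],
        output_signature=(
            tf.TensorSpec(shape=(3,), dtype=tf.uint8),
            tf.TensorSpec(shape=(1,), dtype=tf.uint8),
        ),
    )
    .batch(batch_size)            # each batch lines up with one synthesizer call
    .prefetch(tf.data.AUTOTUNE)   # prepare the next batch while the model trains
)

for x, y in batched.take(2):
    print(x.numpy(), y.numpy())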

Creating masks for duplicate elements in Tensorflow/Keras

I am trying to write a custom loss function for a person-reidentification task which is trained in a multi-task learning setting along with object detection. The filtered label values are of the shape (batch_size, num_boxes). I would like to create a mask such that only the values which repeat in dim 1 are considered for further calculations. How do I do this in TF/Keras-backend?
Short Example:
Input labels = [[0,0,0,0,12,12,3,3,4], [0,0,10,10,10,12,3,3,4]]
Required output: [[0,0,0,0,1,1,1,1,0],[0,0,1,1,1,0,1,1,0]]
(Basically I want to filter out only duplicates and discard unique identities for the loss function).
I guess a combination of tf.unique and tf.scatter could be used but I do not know how.
This code works:
import tensorflow as tf

x = tf.constant([[0, 0, 0, 0, 12, 12, 3, 3, 4], [0, 0, 10, 10, 10, 12, 3, 3, 4]])

def mark_duplicates_1D(x):
    # unique values, index of each element into the unique list, and counts
    y, idx, count = tf.unique_with_counts(x)
    # 1 where a value occurs more than once, 0 otherwise
    comp = tf.math.greater(count, 1)
    comp = tf.cast(comp, tf.int32)
    # map the per-unique-value flag back onto every element
    res = tf.gather(comp, idx)
    # zero out label 0 (no identity), which should not count as a duplicate
    mult = tf.math.not_equal(x, 0)
    mult = tf.cast(mult, tf.int32)
    res *= mult
    return res

res = tf.map_fn(fn=mark_duplicates_1D, elems=x)
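For the example input this reproduces the required mask (a quick check, assuming eager execution in TF 2.x; not part of the original answer):
print(res.numpy())
# [[0 0 0 0 1 1 1 1 0]
#  [0 0 1 1 1 0 1 1 0]]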

Python: 'numpy.float64' object is not callable

The code below throws the error 'numpy.float64' object is not callable at last_mae = mae(val_scaled_price_client, cv_model). Nevertheless, the mae function works just fine with the same parameters when used outside the loop.
loss_history = [1000.00]
for i in range(10000):
    # train
    iterations = 10
    train_auto_encoder(train_latent_customers=train_latentvars,
                       train_product_customers=train_scaled_price_client,
                       auto_encoder=auto_model,
                       iters=iterations,
                       batch_size=128,
                       display_step=20)
    cv_model = auto_model.predict([val_latentvars, val_scaled_price_client_corrupted])
    last_mae = mae(val_scaled_price_client, cv_model)
    loss_history.append(last_mae)
    if loss_history[-1] < loss_history[-2]:
        iterations += 10
    else:
        break
I declared the mae function in previous cells as follows:
# define function to calculate MAE between true and reconstructed values
def mae(y_true, y_pred):
    # get non-zero positions
    cond = np.not_equal(y_true, 0)
    # get number of non-zero elements
    num_non_zero = np.sum(cond)
    # initialize zero matrix
    zero_matrix = np.zeros(shape=y_true.shape)
    # replace predictions at zero positions with zeros
    predictions_corrected = np.where(cond, y_pred, zero_matrix)
    # get mae
    mae = np.sum(np.abs(y_true - predictions_corrected)) / num_non_zero
    # return
    return(mae)
The problem is in your mae function: in the second-to-last line you reuse the name mae and bind it to a number. If you only ever call the function, everything is OK, but once the name mae has been rebound to a number (which easily happens when such an assignment ends up at the top level of a notebook cell), the next call, as in your loop, tries to call a number instead of a function, which raises exactly this error.
Just change
mae = np.sum(np.abs(y_true - predictions_corrected)) / num_non_zero
# return
return(mae)
to
return np.sum(np.abs(y_true - predictions_corrected)) / num_non_zero
Python is not BASIC or Fortran, where you assign the result to the function name to return it :).
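For illustration, here is a minimal hypothetical notebook session (not from the original post) that reproduces the error by rebinding the name mae to a number:
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# rebinding the name: mae is now a numpy.float64, not a function
mae = mae(np.array([1.0, 2.0]), np.array([1.5, 2.5]))

mae(np.array([1.0]), np.array([2.0]))
# TypeError: 'numpy.float64' object is not callable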

Odd Results on Entropy Calculation

I am trying to write a function that properly calculates the entropy of a given dataset. However, I am getting very weird entropy values.
I am following the understanding that all entropy calculations must fall between 0 and 1, yet I am consistently getting values above 2.
Note: I must use log base 2 for this
Can someone explain why I am getting these incorrect entropy results?
The dataset I am testing is the ecoli dataset from the UCI Machine Learning Repository
import numpy
import math

#################### DATA HANDLING LIBRARY ####################
def csv_to_array(file):
    # Open the file, and load it in delimiting on the ',' for a comma separated value file
    data = open(file, 'r')
    data = numpy.loadtxt(data, delimiter=',')
    # Loop through the data in the array
    for index in range(len(data)):
        # Utilize a try catch to try and convert to float, if it can't convert to float, converts to 0
        try:
            data[index] = [float(x) for x in data[index]]
        except Exception:
            data[index] = 0
        except ValueError:
            data[index] = 0
    # Return the now type-formatted data
    return data
# Function that utilizes the numpy library to randomize the dataset.
def randomize_data(csv):
    csv = numpy.random.shuffle(csv)
    return csv

# Function to split the data into test, training set, and validation sets
def split_data(csv):
    # Call the randomize data function
    randomize_data(csv)
    # Grab the number of rows and calculate where to split
    num_rows = csv.shape[0]
    validation_split = int(num_rows * 0.10)
    training_split = int(num_rows * 0.72)
    testing_split = int(num_rows * 0.18)
    # Validation set as the first 10% of the data
    validation_set = csv[:validation_split]
    # Training set as the next 72%
    training_set = csv[validation_split:training_split + validation_split]
    # Testing set as the last 18%
    testing_set = csv[training_split + validation_split:]
    # Split the data into classes vs actual data
    training_cols = training_set.shape[1]
    testing_cols = testing_set.shape[1]
    validation_cols = validation_set.shape[1]
    training_classes = training_set[:, training_cols - 1]
    testing_classes = testing_set[:, testing_cols - 1]
    validation_classes = validation_set[:, validation_cols - 1]
    # Take the sets and remove the last (classification) column
    training_set = training_set[:-1]
    testing_set = testing_set[:-1]
    validation_set = validation_set[:-1]
    # Return the datasets
    return testing_set, testing_classes, training_set, training_classes, validation_set, validation_classes
#################### DATA HANDLING LIBRARY ####################

# This function returns the list of classes, and their associated weights (i.e. distributions)
# for a given dataset
def class_distribution(dataset):
    # Ensure the dataset is a numpy array
    dataset = numpy.asarray(dataset)
    # Collect # of total rows and columns, using numpy
    num_total_rows = dataset.shape[0]
    num_columns = dataset.shape[1]
    # Create a numpy array of just the classes
    classes = dataset[:, num_columns - 1]
    # Use numpy.unique to remove duplicates
    classes = numpy.unique(classes)
    # Create an empty array for the class weights
    class_weights = []
    # Loop through the classes one by one
    for aclass in classes:
        # Create storage variables
        total = 0
        weight = 0
        # Now loop through the dataset
        for row in dataset:
            # If the class of the row is equal to the current class you are evaluating, increase the total
            if numpy.array_equal(aclass, row[-1]):
                total = total + 1
            # If not, continue
            else:
                continue
        # Divide the # of occurrences by total rows
        weight = float((total / num_total_rows))
        # Add that weight to the list of class weights
        class_weights.append(weight)
    # Turn the weights into a numpy array
    class_weights = numpy.asarray(class_weights)
    # Return the array
    return classes, class_weights

# This function returns the entropy for a given dataset
# Can be used across an entire csv, or just for a column of data (feature)
def get_entropy(dataset):
    # Set initial entropy
    entropy = 0.0
    # Determine the classes and their frequencies (weights) of the dataset
    classes, class_freq = class_distribution(dataset)
    # Utilize numpy's quicksort to test the most occurring class first
    numpy.sort(class_freq)
    # Determine the max entropy for the dataset
    max_entropy = math.log(len(classes), 2)
    print("MAX ENTROPY FOR THIS DATASET: ", max_entropy)
    # Loop through the frequencies and use the given formula to calculate entropy
    for freq in class_freq:
        entropy += float(-freq * math.log(freq, 2))
    # Return the entropy value
    return entropy

def main():
    ecol = csv_to_array('ecoli.csv')
    testing_set, testing_classes, training_set, training_classes, validation_set, validation_classes = split_data(ecol)
    entropy = get_entropy(ecol)
    print(entropy)

main()
The following function was used to calculate Entropy:
# Function to return Shannon's Entropy
def entropy(attributes, dataset, targetAttr):
    freq = {}
    entropy = 0.0
    index = 0
    # Find the column index of the target attribute
    for item in attributes:
        if (targetAttr == item):
            break
        else:
            index = index + 1
    index = index - 1
    # Count how often each value of the target attribute occurs
    for item in dataset:
        if ((item[index]) in freq):
            # Increase the count
            freq[item[index]] += 1.0
        else:
            # Initialize the count to 1
            freq[item[index]] = 1.0
    # Sum -p * log2(p) over the value frequencies
    for freq in freq.values():
        entropy = entropy + (-freq / len(dataset)) * math.log(freq / len(dataset), 2)
    return entropy
As @MattTimmermans had indicated, the entropy value is actually contingent on the number of classes. For strictly 2 classes it is contained in the 0 to 1 (inclusive) range; for more than 2 classes (which is what was being tested) the maximum is log2(number of classes), so values above 1 are expected. The calculation is converted to Pythonic code above. This post here explains the mathematics and calculations a bit more in detail.
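As a quick sanity check (not from the original post), a uniform distribution over n classes gives the maximum entropy log2(n), which exceeds 1 as soon as there are more than 2 classes:
import math

for n in (2, 4, 8):
    freqs = [1.0 / n] * n                            # uniform class distribution
    h = -sum(f * math.log(f, 2) for f in freqs)
    print(n, h)                                      # 2 -> 1.0, 4 -> 2.0, 8 -> 3.0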

Initialize a batch-dependent variable in Tensorflow

I have TensorFlow code that runs well and accurately but occupies a lot of memory. Specifically, I have a for-loop that looks something like this:
K = 10
myarray1 = tf.placeholder(tf.float32, shape=[None, 5, 5])  # shape = [None, 5, 5]
myarray2 = tf.Variable(np.zeros([K, 5, 5]), dtype=tf.float32)
vals = []
for k in range(0, K):
    tmp = tf.reduce_sum(myarray1 * myarray2[k], axis=(1, 2))
    vals.append(tmp)
result = tf.reduce_min(tf.stack(vals, axis=-1), axis=-1)
Unfortunately, that takes a lot of memory as K gets big in my application, so I want a better way of doing it. For example, in numpy/Python you would just keep track of the minimum value as you iterate through the loop and update it on each iteration. It seems like I could use tf.assign, as:
K = 10
myarray1 = tf.placeholder(tf.float32, shape=[None, 5, 5])  # shape = [None, 5, 5]
myarray2 = tf.Variable(np.zeros([K, 5, 5]), dtype=tf.float32)
min_value = tf.Variable(myarray1, validate_shape=False, trainable=False)
for k in range(0, K):
    tmp = myarray1 * myarray2[k]
    idx = tf.where(tmp < min_value)
    tf.scatter_nd_assign(min_value, idx, tmp[idx], use_locking=True)
result = min_value
While this code builds the graph (when validate_shape=False), it fails to run because it complains that min_value has not been initialized. The issue is, when I run the initializer as:
sess.run(tf.global_variables_initializer())
or
sess.run(tf.variables_initializer(tf.trainable_variables()))
it complains that I am not feeding in a placeholder. This actually makes sense because the definition of min_value depends on myarray1 in the graph.
What I would actually want to do is define a dummy variable that doesn't depend on myarray1's values, but does match its shape. I would like these values to be initialized as some number (in this case something large is fine), as I will manually ensure these are overwritten in the network.
Note: as far as I know, you currently cannot define a variable with an unknown shape (unless you feed in another variable of the desired shape and set validate_shape=False). Maybe there is another way?
Any help / suggestions appreciated.
Try this; if you don't know how to feed a placeholder, read the tutorial.
K = 10
myarray1 = tf.placeholder(tf.float32, shape=[None, 5, 5])  # shape = [None, 5, 5]
################### ADD THIS ####################
sess = tf.Session()
FOO = sess.run(myarray1, feed_dict={myarray1: YOURDATA})  # get myarray1's value
# replace all myarray1 below with FOO
##################################################
myarray2 = tf.Variable(np.zeros([K, 5, 5]), dtype=tf.float32)
min_value = tf.Variable(FOO, validate_shape=False, trainable=False)
for k in range(0, K):
    tmp = FOO * myarray2[k]
    idx = tf.where(tmp < min_value)
    tf.scatter_nd_assign(min_value, idx, tmp[idx], use_locking=True)
result = min_value
-------above new 15.April.2018------
Since I don't know your input data, I will try to outline the steps.
Step_1: make a placeholder for the input data
x = tf.placeholder(tf.float32, shape=[None, 2])
Step_2: get batches of data
batch_x = [[1, 2], [3, 4]]  # example
# since x = [None, 2], this batch contains 2 samples
Step_3: make a session
sess = tf.Session()
If you have variables, add the following code to initialize them before calculation:
init = tf.global_variables_initializer()
sess.run(init)
Step_4:
yourplaceholderdictionary = {x: batch_x}
sess.run(x, feed_dict=yourplaceholderdictionary)
Always feed your placeholder so it gets the value to calculate with. A consolidated sketch of these steps follows below.
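Put together, the steps look like this; a minimal TF 1.x sketch using the example names from the steps above (not from the original answer):
import tensorflow as tf  # TF 1.x style, matching the answer

x = tf.placeholder(tf.float32, shape=[None, 2])
batch_x = [[1, 2], [3, 4]]            # example batch of 2 samples

sess = tf.Session()
init = tf.global_variables_initializer()   # needed only if variables exist
sess.run(init)

yourplaceholderdictionary = {x: batch_x}
print(sess.run(x, feed_dict=yourplaceholderdictionary))  # echoes the fed batch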
There is a very helpful resource, "TensorFlow and Deep Learning without a PhD" (a PDF); you can also find it on YouTube under this title.
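Separately, the running-minimum idea described in the question can be expressed without any extra variable by folding tf.minimum over the loop. This is only a sketch of that alternative (not from the original answer), assuming the same myarray1/myarray2 definitions; whether it actually lowers peak memory depends on how the graph is executed:
# running minimum, updated tensor-to-tensor instead of via a mutable Variable
result = tf.reduce_sum(myarray1 * myarray2[0], axis=(1, 2))
for k in range(1, K):
    result = tf.minimum(result, tf.reduce_sum(myarray1 * myarray2[k], axis=(1, 2)))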
