I'm trying to create a 2-layer neural network, for that I first initialize weights and biases to random floats between 0 an 1 using numpy.random.rand. However, for some reason this process produces floats bigger than 1 for W1 (weight 1) whereas it works correctly for all other weights an biases. I can't understand why this happens, I thought maybe something affects the function from outside the function where I initialized the parameters, but I couldn't detect any part in the function that could be affected from outside the function.
import numpy as np
### CONSTANTS DEFINING THE MODEL ####
n_x = 12288 # num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)
def initialize_parameters_deep(layer_dims):
"""
Arguments:
layer_dims -- python array (list) containing the dimensions of each layer in our network
Returns:
parameters -- python dictionary containing your parameters "W1", "b1","W2", "b2":
"""
np.random.seed(1)
parameters = {}
parameters["W1"] = np.random.rand(n_h, n_x) #(7, 12288)
parameters["b1"] = np.random.rand(n_h, 1) #(7)
parameters["W2"] = np.random.rand(n_y, n_h) #(7, 1)
parameters["b2"] = np.random.rand(n_y, 1) #(1)
return parameters
parameters = initialize_parameters_deep(layers_dims)
print(parameters)
Output:
{'W1': array([[4.17022005e-01, 7.20324493e-01, 1.14374817e-04, ...,
3.37562919e-01, 1.12292153e-01, 5.37047221e-01],
[7.07934286e-01, 3.37726007e-01, 7.07954162e-01, ...,
4.22040811e-01, 7.78593215e-01, 3.49866021e-01],
[9.01338451e-01, 7.95132845e-03, 1.03777034e-01, ...,
2.78602449e-01, 5.05813021e-02, 8.26828833e-01],
...,
[5.62717083e-03, 6.58208224e-01, 3.88407263e-01, ...,
5.56312618e-01, 8.69650932e-01, 1.00112287e-01],
[4.16278934e-01, 4.56060621e-01, 9.33378848e-01, ...,
9.52798385e-01, 9.41894584e-01, 4.44342962e-01],
[8.89254832e-01, 6.42558949e-01, 2.29427262e-01, ...,
8.05884494e-01, 1.80676088e-01, 6.12694420e-01]]), 'b1': array([[0.11933315],
[0.50073416],
[0.21336813],
[0.14223935],
[0.60809243],
[0.41994954],
[0.43137737]]), 'W2': array([[0.81360697, 0.44638382, 0.41794085, 0.08649817, 0.29957473,
0.33706742, 0.24721952]]), 'b2': array([[0.92363097]])}
It's not generating floats bigger than 1, it's just representing them differently.
4.17022005e-01 is the same as 0.417022005, and 1.14374817e-04 is the same as 0.000114374817.
See here or here.
The e-01, e-02, e-03, etc at the end of the W1 numbers just mean that the numbers are written in exponential format. So if you have for example 2.786e-01 that is the same as if it was written like (2.786/10) and that is the same as 0.2786. Same thing goes for: 2.786e-03 == (2.786/1000) == 0.002786. e+2 is 10^2 and e-2 is 1/(10^2).
Pay attention to the final few characters printed when you print your weights parameter tensor, which gives e.g. e-01. This represents base-10 exponentiation, i.e. meaning that the value of a given weight is the number printed times 10 to the given power.
All of the powers are negative, meaning the weights have small but positive values in the range [0, 1].
For example, 4.17022005e-01 equals 0.417022005.
Related
I start with the optimization function from scipy.
I tried to create my code by copying the Find optimal vector that minimizes function solution
I have an array that contains series in columns. I need to multiply each of them by a weight so that the sum of last row of these columns multiplied by the weights gives a given number (constraint).
The sum of the series multiplied by the weights gives a new series where I extract the max-draw-down and I want to minimize this mdd.
I wrote my code as best as I can (2 months of Python and 3 hours of scipy) and can't solve the error message on the function used to solve the problem.
Here is my code and any help would be much appreciated:
import numpy as np
from scipy.optimize import fmin_slsqp
# based on: https://stackoverflow.com/questions/41145643/find-optimal-vector-that-minimizes-function
# the number of columns (and so of weights) can vary; it should be generic, regardless the number of columns
def mdd(serie): # finding the max-draw-down of a series (put aside not to create add'l problems)
min = np.nanargmax(np.fmax.accumulate(serie) - serie)
max = np.nanargmax((serie)[:min])
return serie[np.nanargmax((serie)[:min])] - serie[min] # max-draw-down
# defining the input data
# mat is an array of 5 columns containing series of independent data
mat = np.array([[1, 0, 0, 1, 1],[2, 0, 5, 3, 4],[3, 2, 4, 3, 7],[4, 1, 3, 3.1, -6],[5, 0, 2, 5, -7],[6, -1, 4, 1, -8]]).astype('float32')
w = np.ndarray(shape=(5)).astype('float32') # 1D vector for the weights to be used for the columns multiplication
w0 = np.array([1/5, 1/5, 1/5, 1/5, 1/5]).astype('float32') # initial weights (all similar as a starting point)
fixed_value = 4.32 # as a result of constraint nb 1
# testing the operations that are going to be used in the minimization
series = np.sum(mat * w0, axis=1)
# objective:
# minimize the mdd of the series by modifying the weights (w)
def test(w, mat):
series = np.sum(mat * w, axis=1)
return mdd(series)
# constraints:
def cons1(last, w, fixed_value): # fixed_value = 4.32
# the sum of the weigths multiplied by the last value of each column must be equal to this fixed_value
return np.sum(mat[-1, :] * w) - fixed_value
def cons2(w): # the sum of the weights must be equal to 1
return np.sum(w) - 1
# solution:
# looking for the optimal set of weights (w) values that minimize the mdd with the two contraints and bounds being respected
# all w values must be between 0 and 1
result = fmin_slsqp(test, w0, f_eqcons=[cons1, cons2], bounds=[(0.0, 1.0)]*len(w), args=(mat, fixed_value, w0), full_output=True)
weights, fW, its, imode, smode = result
print(weights)
You weren't that far off the mark. The biggest problem lies in the mdd function: In case there is no draw-down, your function spits out an empty list as an intermediate result, which then can no longer cope with the argmax function.
def mdd(serie): # finding the max-draw-down of a series (put aside not to create add'l problems)
i = np.argmax(np.maximum.accumulate(serie) - serie) # end of the period
start = serie[:i]
# check if there is dd at all
if not start.any():
return 0
j = np.argmax(start) # start of period
return serie[j] - serie[i] # max-draw-down
In addition, you must make sure that the parameter list is the same for all functions involved (cost function and constraints).
# objective:
# minimize the mdd of the series by modifying the weights (w)
def test(w, mat,fixed_value):
series = mat # w
return mdd(series)
# constraints:
def cons1(w, mat, fixed_value): # fixed_value = 4.32
# the sum of the weigths multiplied by the last value of each column must be equal to this fixed_value
return mat[-1, :] # w - fixed_value
def cons2(w, mat, fixed_value): # the sum of the weights must be equal to 1
return np.sum(w) - 1
# solution:
# looking for the optimal set of weights (w) values that minimize the mdd with the two contraints and bounds being respected
# all w values must be between 0 and 1
result = fmin_slsqp(test, w0, eqcons=[cons1, cons2], bounds=[(0.0, 1.0)]*len(w), args=(mat,fixed_value), full_output=True)
One more remark: You can make the matrix-vector multiplications much leaner with the #-operator.
I want to convert the R package Hmisc::wtd.quantile() into python.
Here is the example in R:
I took this as reference and it seems that the logics are different than R:
# First function
def weighted_quantile(values, quantiles, sample_weight = None,
values_sorted = False, old_style = False):
""" Very close to numpy.percentile, but supports weights.
NOTE: quantiles should be in [0, 1]!
:param values: numpy.array with data
:param quantiles: array-like with many quantiles needed
:param sample_weight: array-like of the same length as `array`
:return: numpy.array with computed quantiles.
"""
values = np.array(values)
quantiles = np.array(quantiles)
if sample_weight is None:
sample_weight = np.ones(len(values))
sample_weight = np.array(sample_weight)
assert np.all(quantiles >= 0) and np.all(quantiles <= 1), 'quantiles should be in [0, 1]'
if not values_sorted:
sorter = np.argsort(values)
values = values[sorter]
sample_weight = sample_weight[sorter]
# weighted_quantiles = np.cumsum(sample_weight)
# weighted_quantiles /= np.sum(sample_weight)
weighted_quantiles = np.cumsum(sample_weight)/np.sum(sample_weight)
return np.interp(quantiles, weighted_quantiles, values)
weighted_quantile(values = [0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319],
quantiles = np.arange(0, 1 + 1 / 5, 1 / 5),
sample_weight = [1,1,1,1,1])
>> array([0.2136325, 0.2136325, 0.4079128, 0.4890342, 0.5083345, 0.6197319])
# Second function
def weighted_percentile(data, weights, perc):
"""
perc : percentile in [0-1]!
"""
data = np.array(data)
weights = np.array(weights)
ix = np.argsort(data)
data = data[ix] # sort data
weights = weights[ix] # sort weights
cdf = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights) # 'like' a CDF function
return np.interp(perc, cdf, data)
weighted_percentile([0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319], [1,1,1,1,1], np.arange(0, 1 + 1 / 5, 1 / 5))
>> array([0.2136325 , 0.31077265, 0.4484735 , 0.49868435, 0.5640332 ,
0.6197319 ])
Both are different with R. Any idea?
I am Python-illiterate, but from what I see and after some quick checks I can tell you the following.
Here you use uniform (sampling) weights, so you could also directly use the quantile() function. Not surprisingly, it gives the same results as wtd.quantile() with uniform weights:
x <- c(0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319)
n <- length(x)
x <- sort(x)
quantile(x, probs = seq(0,1,0.2))
# 0% 20% 40% 60% 80% 100%
# 0.2136325 0.3690567 0.4565856 0.4967543 0.5306140 0.6197319
The R quantile() function get the quantiles in a 'textbook' way, i.e. by determining the index i of the obs to use with i = q(n+1).
In your case:
seq(0,1,0.2)*(n+1)
# 0.0 1.2 2.4 3.6 4.8 6.0
Of course since you have 5 values/obs and you want quintiles, the indices are not integers. But you know for example that the first quintile (i = 1.2) lies between obs 1 and obs 2. More precisely, it is a linear combination of the two observations (the 'weights' are derived from the value of the index):
0.2*x[1] + 0.8*x[2]
# 0.3690567
You can do the same for the all the quintiles, on the basis of the indices:
q <-
c(min(x), ## 0: actually, the first obs
0.2*x[1] + 0.8*x[2], ## 1.2: quintile lies between obs 1 and 2
0.4*x[2] + 0.6*x[3], ## 2.4: quintile lies between obs 2 and 3
0.6*x[3] + 0.4*x[4], ## 3.6: quintile lies between obs 3 and 4
0.8*x[4] + 0.2*x[5], ## 4.8: quintile lies between obs 4 and 5
max(x) ## 6: actually, the last obs
)
q
# 0.2136325 0.3690567 0.4565856 0.4967543 0.5306140 0.6197319
You can see that you get exactly the output of quantile() and wtd.quantile().
If instead of 0.2*x[1] + 0.8*x[2] we consider the following:
0.5*x[1] + 0.5*x[2]
# 0.3107726
We get the output of your second Python function. It appears that your second function considers uniform 'weights' (obviously I am not talking about the sampling weights here) when combining the two observations. The issue (at least for the second Python function) seems to come from this. I know these are just insights, but I hope they will help.
EDIT: note that the difference between the two is not necessary an 'issue' with the python code. There are different quantile estimators (and their weighted versions) and the python functions could simply rely on a different estimator than Hmisc::wtd.quantile(). I think that the latter uses the weighted version of the Harrell-Davis quantile estimator. If you really want to implement this one, you should check the source code of Hmisc::wtd.quantile() and try to 'directly' translate this into Python.
I am following a book which has the following code:
import numpy as np
np.random.seed(1)
streetlights = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])
walk_vs_stop = np.array([[1, 1, 0, 0]]).T
def relu(x):
return (x > 0) * x
def relu2deriv(output):
return output > 0
alpha = 0.2
hidden_layer_size = 4
# random weights from the first layer to the second
weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
# random weights from the second layer to the output
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1
for iteration in range(60):
layer_2_error = 0
for i in range(len(streetlights)):
layer_0 = streetlights[i : i + 1]
layer_1 = relu(np.dot(layer_0, weights_0_1))
layer_2 = relu(np.dot(layer_1, weights_1_2))
layer_2_error += np.sum((layer_2 - walk_vs_stop[i : i + 1])) ** 2
layer_2_delta = layer_2 - walk_vs_stop[i : i + 1]
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)
if iteration % 10 == 9:
print(f"Error: {layer_2_error}")
Which outputs:
# Error: 0.6342311598444467
# Error: 0.35838407676317513
# Error: 0.0830183113303298
# Error: 0.006467054957103705
# Error: 0.0003292669000750734
# Error: 1.5055622665134859e-05
I understand everything but this part is not explained and I am not sure why it is the way it is:
weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1
I don't understand:
Why there is 2* the whole matrix and why is there a -1
If I change 2 to 3 my error becomes greatly lower # Error: 5.616513576418916e-13
I tried changing the 2 to many other numbers along with the change of -1 to many other numbers I get # Error: 2.0 most of the time or the Error is much worst than combination of 3 and -1.
I can't seem to grasp the relationship and the purpose of multiplying the random weights by a number and subracting a number afterwards.
P.S. The idea of the network is to understand a streetlight pattern when people should go and when they should stop depending what combination of the lights in streetlight is on / off.
There is a lot of ways to initialize neural network, and it's a current research subject as it can have a great impact on performance and training time. Some rules of thumb :
avoid having only one value for all weights, as they would all update the same
avoid having too large weights that could make your gradient too high
avoid having too small weights that could make your gradient vanish
In your case, the goal is just to have something between [-1;1] :
np.random.random gives you a float in [0;1]
multiply by 2 gives you something in [0;2]
substract 1 gives you a number in [-1;1]
2*np.random.random((3, 4)) -1 is a way to generated 3*4=12 random number from uniform distribution of half-open interval [-1, +1) i.e including -1 but excluding +1.
This is equivalent to more readable code
np.random.uniform(-1, 1, (3, 4))
Does Tensorflow support runtime determine the shape of Tensor?
The problem is to build a Constant tensor in runtime based on the input vector length_q. The number of columns of the target tensor is the sum of length_q. The code snippet is shown as follows, the length of length_q is fixed to 64.
T = tf.reduce_sum(length_q, 0)[0]
N = np.shape(length_q)[0]
wm = np.zeros((N, T), dtype=np.float32)
# Something inreletive.
count = 0
for i in xrange(N):
ones = np.ones(length_q[i])
wm[i][count:count+length_q[i]] = ones
count += length_q[i]
return tf.Constant(wm)
Update
I want to create a dynamic Tensor according to the input length_q. length_q is some input vector (64*1). The new tensor's shape I want to create depends on the sum of length_q because in each batch the data in length_q changes. The current code snippet is as follows:
def some_matrix(length_q):
T = tf.reduce_sum(length_q, 0)[0]
N = np.shape(length_q)[0]
wm = np.zeros((N, T), dtype=np.float32)
count = 0
return wm
def network_inference(length_q):
wm = tf.constant(some_matrix(length_q));
...
And the problem occurs probably because length_q is the placeholder and doesn't have summation operation. Are there some ways to solve this problem?
It sounds like the tf.fill() op is what you need. This op allows you to specify a shape as a tf.Tensor (i.e. a runtime value) along with a value:
def some_matrix(length_q):
T = tf.reduce_sum(length_q, 0)[0]
N = tf.shape(length_q)[0]
wm = tf.fill([T, N], 0.0)
return wm
Not clear about what you are calculating. If you need to calculate N shape, you can generate ones like this
T = tf.constant(20.0,tf.float32) # tf variable which is reduced sum , 20.0 is example float value
T = tf.cast(T,tf.int32) # columns will be integer only
N = 10 # if numpy integer- assuming np.shape giving 10
# N = length_q.getshape()[0] # if its a tensor, 'lenght_q' replace by your tensor name
wm = tf.ones([N,T],dtype=tf.float32) # N rows and T columns
sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(wm)
On the numpy page they give the example of
s = np.random.dirichlet((10, 5, 3), 20)
which is all fine and great; but what if you want to generate random samples from a 2D array of alphas?
alphas = np.random.randint(10, size=(20, 3))
If you try np.random.dirichlet(alphas), np.random.dirichlet([x for x in alphas]), or np.random.dirichlet((x for x in alphas)), it results in a
ValueError: object too deep for desired array. The only thing that seems to work is:
y = np.empty(alphas.shape)
for i in xrange(np.alen(alphas)):
y[i] = np.random.dirichlet(alphas[i])
print y
...which is far from ideal for my code structure. Why is this the case, and can anyone think of a more "numpy-like" way of doing this?
Thanks in advance.
np.random.dirichlet is written to generate samples for a single Dirichlet distribution. That code is implemented in terms of the Gamma distribution, and that implementation can be used as the basis for a vectorized code to generate samples from different distributions. In the following, dirichlet_sample takes an array alphas with shape (n, k), where each row is an alpha vector for a Dirichlet distribution. It returns an array also with shape (n, k), each row being a sample of the corresponding distribution from alphas. When run as a script, it generates samples using dirichlet_sample and np.random.dirichlet to verify that they are generating the same samples (up to normal floating point differences).
import numpy as np
def dirichlet_sample(alphas):
"""
Generate samples from an array of alpha distributions.
"""
r = np.random.standard_gamma(alphas)
return r / r.sum(-1, keepdims=True)
if __name__ == "__main__":
alphas = 2 ** np.random.randint(0, 4, size=(6, 3))
np.random.seed(1234)
d1 = dirichlet_sample(alphas)
print "dirichlet_sample:"
print d1
np.random.seed(1234)
d2 = np.empty(alphas.shape)
for k in range(len(alphas)):
d2[k] = np.random.dirichlet(alphas[k])
print "np.random.dirichlet:"
print d2
# Compare d1 and d2:
err = np.abs(d1 - d2).max()
print "max difference:", err
Sample run:
dirichlet_sample:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
np.random.dirichlet:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
max difference: 5.55111512313e-17
I think you're looking for
y = np.array([np.random.dirichlet(x) for x in alphas])
for your list comprehension. Otherwise you're simply passing a python list or tuple. I imagine the reason numpy.random.dirichlet does not accept your list of alpha values is because it's not set up to - it already accepts an array, which it expects to have a dimension of k, as per the documentation.