Splitting an array into train and test sets with Python


I tried a method to split data into train and test sets, but it seems to fill train with zeros and leave all the data in test.
In theory, it works: when I apply the following function, which randomly selects some columns of each row of the given array, it works with the DataLens data as a NumPy matrix, but not with other data.
import numpy as np

def train_test_split(array):
    test = np.zeros(array.shape)
    train = array.copy()
    for user in range(array.shape[0]):
        # Pick 10 of this row's nonzero entries to hold out as test data
        test_ratings = np.random.choice(array[user, :].nonzero()[0],
                                        size=10,
                                        replace=False)
        train[user, test_ratings] = 0.
        test[user, test_ratings] = array[user, test_ratings]
    # Test and training are truly disjoint
    assert np.all((train * test) == 0)
    return train, test

train, test = train_test_split(ratings)
With simple data, however, it doesn't work. Given this input:
ratings:
[[ 1. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 1. 0. 0. 0. 0.]
[ 0. 0. 0. 1. 1.]]
the function fills train with zeros row by row, even though train starts out as a copy of ratings:
train:
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
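A likely cause, given the small ratings array: each row has only one or two nonzero entries, so when the number of entries drawn per row reaches the number of nonzero entries in that row, every rating moves to test and that row of train becomes all zeros. (With size=10 and replace=False, np.random.choice would instead raise a ValueError, because it cannot draw more samples than the row has nonzero entries.) The sketch below is one way to guard against this, assuming you want to hold out a fraction of each row rather than a fixed count; test_frac and seed are hypothetical parameters, not part of the original function.

```python
import numpy as np

def train_test_split(array, test_frac=0.2, seed=None):
    """Hold out a fraction of each row's nonzero entries as test data.

    test_frac and seed are hypothetical parameters for this sketch.
    Rows with fewer than two nonzero entries keep all their data in train.
    """
    rng = np.random.default_rng(seed)
    test = np.zeros(array.shape)
    train = array.copy()
    for user in range(array.shape[0]):
        nonzero = array[user, :].nonzero()[0]
        # Never move every rating of a row into test: keep at least one in train.
        n_test = min(round(test_frac * len(nonzero)), len(nonzero) - 1)
        if n_test <= 0:
            continue
        test_items = rng.choice(nonzero, size=n_test, replace=False)
        train[user, test_items] = 0.0
        test[user, test_items] = array[user, test_items]
    # Train and test are disjoint, and together they rebuild the input.
    assert np.all(train * test == 0)
    assert np.allclose(train + test, array)
    return train, test

ratings = np.array([[1., 1., 0., 0., 0.],
                    [1., 0., 0., 0., 0.],
                    [0., 0., 1., 0., 0.],
                    [1., 0., 0., 0., 0.],
                    [0., 0., 0., 1., 1.]])
train, test = train_test_split(ratings, test_frac=0.5, seed=0)
```

With test_frac=0.5, only the two rows that have two ratings give one rating each to test; rows with a single rating stay entirely in train. Note that Python's round uses round-half-to-even, so 0.5 of a one-rating row rounds to 0.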

Related

Interpolation issues at the edges with inpaint from OpenCV

I tried interpolating not-a-number (nan) pixels in a scan with inpaint from OpenCV. This worked fine in the bulk of the image, but nan pixels at the edges of the image remained nan pixels.
Here is a minimal python example to reproduce the problem:
import numpy as np
import cv2 as cv

if __name__ == '__main__':
    image = np.zeros((6, 6))
    image[1, 3] = np.nan
    image = np.float32(image)
    # nan != 0 is True, so the mask marks exactly the nan pixel
    mask = np.uint8(image != 0)
    inpaintRadius = 2
    inpaintAlgorithm = cv.INPAINT_NS
    output = cv.inpaint(image, mask, inpaintRadius, inpaintAlgorithm)
    print(output)
This gives the output:
[[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. nan 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]]
Since the nan pixel is interpolated with the Navier-Stokes method, the correct solution is the equilibrium state, so this is the output I would expect and want:
expected output:
[[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]]
If I relocate the nan from [1, 3] to [2, 3], then I obtain the expected output.
Does anyone know how OpenCV's inpaint handles the edges, and what is the appropriate way to interpolate pixels at the edges?

How can I find the gradient input?

How can I compute the gradient input (gradInput) of a log-likelihood layer if I have:
logP = [[-5.8971105e+00 -1.3536860e-01 -2.3225722e+00 -3.6559267e+00]
[-7.1035299e+00 -7.1037712e+00 -8.0828800e+00 -1.9549085e-03]]
oneHotTruth = [[0. 0. 0. 1.]
[0. 0. 0. 1.]]
gradInput should equal:
[[ 0. 0. 0. -0.5]
[ 0. 0. 0. -0.5]]
I need to implement this without using PyTorch or TensorFlow.
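A minimal NumPy sketch, assuming the layer is the negative log-likelihood averaged over the batch of N rows, L = -(1/N) Σ_i Σ_k oneHotTruth[i, k] · logP[i, k]. Because L is linear in logP, ∂L/∂logP[i, k] = -oneHotTruth[i, k] / N, which gives -0.5 at the true class for N = 2 and 0 everywhere else, matching the expected gradInput. The function names nll_forward and nll_backward are hypothetical.

```python
import numpy as np

def nll_forward(logP, one_hot):
    """Mean negative log-likelihood of the true classes over the batch."""
    return -np.sum(logP * one_hot) / logP.shape[0]

def nll_backward(logP, one_hot):
    """Gradient of the mean NLL with respect to logP (gradInput)."""
    # Each true-class entry has coefficient -1/N in the loss; all others have 0.
    return -one_hot / logP.shape[0]

logP = np.array([[-5.8971105e+00, -1.3536860e-01, -2.3225722e+00, -3.6559267e+00],
                 [-7.1035299e+00, -7.1037712e+00, -8.0828800e+00, -1.9549085e-03]])
oneHotTruth = np.array([[0., 0., 0., 1.],
                        [0., 0., 0., 1.]])
gradInput = nll_backward(logP, oneHotTruth)
print(gradInput)
```

The zero entries may print as -0. because they are computed as -0.0 / 2; they compare equal to 0.0.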

Randomizing Array Values

I have a 4x4 array.
Initially all the values are 0, but I want to set each of the innermost 2x2 values to its own random float.
a = np.zeros((4,4))
print(a)
a[1:3,1:3] = random.uniform(-1,1)
print(a)
This produces:
[[0. 0. 0. 0. ]
[0. 0.66529858 0.66529858 0. ]
[0. 0.66529858 0.66529858 0. ]
[0. 0. 0. 0. ]]
whereas the desired outcome is:
[[0. 0. 0. 0. ]
[0. random0 random1 0. ]
[0. random2 random3 0. ]
[0. 0. 0. 0. ]]

You need to use the size argument of NumPy's np.random.uniform to generate a 2 by 2 random matrix (Python's built-in random.uniform has no size argument):
a[1:3,1:3] = np.random.uniform(-1,1,size=(2,2))
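A slightly more general sketch, assuming NumPy's Generator API (np.random.default_rng, available since NumPy 1.17): it takes the size from the slice itself, so every interior cell of a 2-D array of any shape gets its own draw. randomize_interior is a hypothetical helper name.

```python
import numpy as np

def randomize_interior(a, low=-1.0, high=1.0, rng=None):
    """Fill every non-border cell of a 2-D array with its own uniform draw."""
    rng = np.random.default_rng() if rng is None else rng
    inner = a[1:-1, 1:-1]  # basic slicing returns a view, so writing to it modifies a
    inner[...] = rng.uniform(low, high, size=inner.shape)
    return a

a = randomize_interior(np.zeros((4, 4)), rng=np.random.default_rng(0))
print(a)
```

Passing a seeded Generator makes the draws reproducible, which helps when testing.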

Number of features of the model must match the input?

I'm trying to use a RandomForestClassifier on some data I have. The code is below:
from sklearn.ensemble import RandomForestClassifier

print(train_data[0, 0:20])
print(train_data[0, 21::])
print(test_data[0])
print('Training...')
forest = RandomForestClassifier(n_estimators=100)
forest = forest.fit(train_data[0::, 0::20], train_data[0::, 21::])
print('Predicting...')
output = forest.predict(test_data)
but this generates the following error:
ValueError: Number of features of the model must match the input.
Model n_features is 3 and input n_features is 21
The output from the first three print statements is:
[ 0. 0. 0. 0. 1. 0.
0. 0. 0. 0. 1. 0.
0. 0. 0. 37.7745986 -122.42589168
0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
1. 0.]
[ 0. 0. 0. 0. 0. 0.
0. 1. 0. 0. 1. 0.
0. 0. 0. 0. 37.73505101
-122.3995877 0. 0. 0. ]
I had assumed that the data was in the correct format for my fit/predict calls, but the predict call raises this error. Can anyone see what I am doing wrong here?

The input data used to train the model is train_data[0::,0::20], which I think is a mistake: the slice 0::20 starts at column 0 and takes every 20th column (columns 0, 20 and 40 here, which is why the model reports n_features 3). Based on the debug prints you did at the beginning, it should be train_data[0::,0:20] instead.
Also, it seems that the last column represents the labels in both train_data and test_data. When predicting, you might want to pass test_data[:, :20] instead of test_data when calling the predict function.
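A quick way to see what the two slices select, using a hypothetical all-zeros stand-in with 59 columns (inferred from the prints: 20 values from 0:20, column 20, and 38 values from 21::):

```python
import numpy as np

# Hypothetical stand-in for train_data: 5 rows, 59 columns.
train_data = np.zeros((5, 59))

step_slice = train_data[:, 0::20]  # start 0, step 20: columns 0, 20 and 40
range_slice = train_data[:, 0:20]  # columns 0 through 19
print(step_slice.shape, range_slice.shape)

selected = np.arange(train_data.shape[1])[0::20]  # which column indices 0::20 picks
print(selected.tolist())
```

Whatever slice is used for fit, predict must receive the same number of columns, so checking X_test.shape[1] == X_train.shape[1] before calling predict catches this kind of mismatch early.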

Python: Appending dimensions to a bidimensional array

Suppose you have an (m, m) array and want to turn it into an (n, n) array by padding it with zeros. For example, transforming a 2x2 matrix into a 6x6 one. So:
[[ 1. 2.]
[ 3. 4.]]
To:
[[ 1. 2. 0. 0. 0. 0.]
[ 3. 4. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]]
This is what I'm doing:
import numpy as np

def array_append(old_array, new_shape):
    dif = np.array(new_shape) - np.array(old_array.shape)
    # Append dif[0] rows of zeros, each as wide as the old array
    rows = []
    for i in range(dif[0]):
        rows.append(np.zeros(old_array.shape[1]).tolist())
    new_array = np.append(old_array, rows, axis=0)
    # Append dif[1] columns of zeros to every row of the taller array
    columns = []
    for i in range(len(new_array)):
        columns.append(np.zeros(dif[1]).tolist())
    return np.append(new_array, columns, axis=1)
Example use:
test1 = np.ones((2, 2))
test2 = np.zeros((6, 6))
print(array_append(test1, test2.shape))
Output:
[[ 1. 1. 0. 0. 0. 0.]
[ 1. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]]
Based on this answer. But that's a lot of code for an (imho) simple operation. Is there a more concise/pythonic way to do it?

Why not use array = numpy.zeros((6,6))? See the NumPy docs.
EDIT: oops, the question has been edited. I guess you are trying to put ones into a section of an array filled with zeros? Then:
array = numpy.zeros((6,6))
array[0:2,0:2] = 1
If the values in the small matrix are not all 1:
array[ystart:yend,xstart:xend] = smallermatrix

That would then be:
import numpy as np

# test1 = np.ones((2, 2))
test1 = np.random.randn(2, 2)
test2 = np.zeros((6, 6))
test2[0:2, 0:2] = test1
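For the general zero-padding case in the question, np.pad can do the whole job in one call. A minimal sketch, assuming zeros should be added only at the bottom and on the right as in the example; pad_to_shape is a hypothetical name.

```python
import numpy as np

def pad_to_shape(old_array, new_shape):
    """Zero-pad a 2-D array at the bottom and right up to new_shape."""
    pad_rows = new_shape[0] - old_array.shape[0]
    pad_cols = new_shape[1] - old_array.shape[1]
    if pad_rows < 0 or pad_cols < 0:
        raise ValueError("new_shape must not be smaller than old_array.shape")
    # One (before, after) pair per axis: nothing before, zeros after.
    return np.pad(old_array, ((0, pad_rows), (0, pad_cols)), mode="constant")

padded = pad_to_shape(np.ones((2, 2)), (6, 6))
print(padded)
```

Unlike the slice-assignment answers, this also works when the target shape is only known at run time and the array is not square.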
