Splitting a numpy ndarray using slices - python

Python 2.7.3
numpy 1.8.0
Hi all,
I have been using numpy for a few months and I need help with some basic stuff. The code below should work; the bit I need help with is highlighted (# <<<<<<<):
import numpy as np

rng = np.random.RandomState(12345)
samples = np.array(np.arange(400).reshape(50, 8))
nSamples = samples.shape[0]
FOLDS = 15
foldSize = nSamples / FOLDS
indices = np.arange(nSamples)
rng.shuffle(indices)

slices = [slice(i * foldSize, (i + 1) * foldSize, 1) for i in xrange(FOLDS + 1)]

for i in xrange(len(slices)):
    y = samples[indices[slices[i]]]
    x = np.array([x for x in samples if x not in samples[slices[i]]])  # <<<<<<<
    # do some processing with x and y
Basically, it randomly slices a 2D array row-wise, uses everything outside the slice for processing and the slice itself for testing, then repeats for another slice until everything is done (it's called a cross-validation experiment).
My question is: is there a better way to select all rows in an ndarray except a slice? Am I missing something? What is the advised way to write [x for x in samples if x not in samples[indices][0:3]]?
Thanks in advance.
ps: masked arrays do not solve my problem.
ps1: I know this is already implemented elsewhere; I just need to learn.

You can create a boolean array for the rows to select as follows:
indices_to_ignore = [1, 2, 3]
mask = np.ones(samples.shape[:1], dtype=np.bool)
mask[indices_to_ignore] = 0
samples[mask].shape
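For example, applied to the original loop, a minimal sketch that holds out one shuffled fold (the fold indices here are illustrative, not the poster's exact slicing):
import numpy as np

rng = np.random.RandomState(12345)
samples = np.arange(400).reshape(50, 8)
indices = np.arange(samples.shape[0])
rng.shuffle(indices)

fold = indices[0:3]                       # rows held out for testing (illustrative)
mask = np.ones(samples.shape[0], dtype=bool)
mask[fold] = False

y = samples[fold]                         # the held-out fold
x = samples[mask]                         # every row except the fold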

Related

Numpy: create a "superset" array view

I am trying to create a view x_ of a vector x that is augmented, but still references the same memory location as x. That way, I don't need to run an augment function every time I want the augmented vector, but can simply refer to x_.
Is there a way to re-write this so that the assertion below is true? I am looking to maximize efficiency.
import numpy as np
x = np.arange(10)
ones = np.ones(len(x), dtype=x.dtype)
x_ = np.stack([x, ones], axis=0)
x[0] = 11
assert x_[0, 0] == 11
Note: I have a feeling that this could be impossible or inefficient because it would break contiguous storage. I would appreciate an explanation if this is the case.
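For what it's worth, np.stack (like np.concatenate) always copies its inputs, so the assertion as written cannot hold: a single view cannot span two separate buffers. A sketch of one workaround, assuming you are free to change how x itself is created, is to allocate the augmented array first and make x a view into it:
import numpy as np

# allocate the augmented array up front, then let x view its first row
x_ = np.empty((2, 10), dtype=int)
x_[1] = 1                  # the row of ones
x = x_[0]                  # x is a view into x_, so writes to x show up in x_
x[:] = np.arange(10)
x[0] = 11
assert x_[0, 0] == 11      # holds, because x and x_ share memory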

Numpy: Is this the most efficient way to add a column of ones to a 1D vector?

In my Python script, I load a 1D vector from a .dat file. I want to use that vector as the first column in a matrix, where the second column is filled with 1s. This is how I did it:
x = np.loadtxt( 'x.dat' )
m = x.shape[0]
X = np.concatenate((x.reshape((m,1)), np.ones((m,1))), axis=1)
Is there a more efficient way to achieve this?
It looks perfectly reasonable to me. If you are really squeezing for performance, you can try the following:
X = np.ones((m, 2))
X[:,0] = x
or:
X = np.empty((m, 2))
X[:,0] = x
X[:,1] = 1
I wouldn't be surprised if they're slower though.
However, I expect that most of your performance is not lost on adding an extra column, but instead on np.loadtxt. Consider changing your data format to np.save/np.load, which is much faster.
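A minimal sketch of that change, with hypothetical file names:
import numpy as np

# one-time conversion of the text data to NumPy's binary format
x = np.loadtxt('x.dat')
np.save('x.npy', x)

# subsequent runs load the binary file, which is much faster than parsing text
x = np.load('x.npy')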

How to append numpy array to numpy array of different size?

I have 2 arrays to concatenate:
X_train's shape is (3072, 50000)
y_train's shape is (50000,)
I'd like to concatenate them so I can shuffle the indices all in one go. I have tried the following, but neither works:
np.concatenate([X_train, np.transpose(y_train)])
np.column_stack([X_train, np.transpose(y_train)])
How can I concatenate them?
To give a recommendation that targets the task rather than the literal problem: don't do this!
Assuming X are your samples / observations and y are your targets:
Just generate a random permutation and index both arrays with it in one vectorized step (advanced indexing returns copies, but the rows stay aligned and nothing needs to be concatenated), e.g. (untested):
import numpy as np

X = np.random.random(size=(50000, 3072))
y = np.random.random(size=50000)

perm = np.random.permutation(X.shape[0])  # assuming X.shape[0] == y.shape[0]
X_perm = X[perm]                          # same permutation applied to both
y_perm = y[perm]
Reminder: your starting shapes are not compatible with most Python-based ML tools, whose usual interpretation is:
first dim / rows: samples
second dim / cols: features
Since the number of samples must equal the number of target values in y, you can see that my example is consistent with this, while yours needs a transpose on X.
As DavidG said, I realized the answer is that y_train has shape (50000,), so I needed to reshape it before concatenating:
np.concatenate([X_train, np.reshape(y_train, (1, 50000))])
Still, this evaluated very slowly in Jupyter. If there's a faster answer, I'd be grateful to have it.
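For reference, a minimal sketch of the permutation approach from the answer above applied directly to the question's shapes (no concatenation needed; indexing both arrays with the same permutation keeps samples and targets aligned; the zero arrays are placeholders):
import numpy as np

# hypothetical placeholders with the question's shapes
X_train = np.zeros((3072, 50000))
y_train = np.zeros(50000)

perm = np.random.permutation(y_train.shape[0])
X_shuffled = X_train[:, perm]   # samples run along axis 1 in this layout
y_shuffled = y_train[perm]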

Newton method in python for multivariables (system of equations)

My code runs fine for the first iteration, but after that it outputs the following error:
ValueError: matrix must be 2-dimensional
To the best of my knowledge (which is not much in Python), my code is correct, but I don't know why it does not run correctly for all of the given iterations. Could anyone help me with this problem?
from __future__ import division
import numpy as np
import math
import matplotlib.pylab as plt
import sympy as sp
from numpy.linalg import inv

# initial guesses
x = -2
y = -2.5

i1 = 0
while i1 < 5:
    F = np.matrix([[(x**2) + (x*y**3) - 9], [(3*y*x**2) - (y**3) - 4]])
    theta = np.sum(F)
    J = np.matrix([[(2*x) + y**3, 3*x*y**2], [6*x*y, (3*x**2) - (3*y**2)]])
    Jinv = inv(J)
    xn = np.array([[x], [y]])
    xn_1 = xn - (Jinv*F)
    x = xn_1[0]
    y = xn_1[1]
    #~ print theta
    print xn
    i1 = i1 + 1
I believe xn_1 is a 2D matrix. Try printing it and you will see [[something], [something]].
Therefore, to get x and y you need to use multidimensional indexing. Here is what I did:
x = xn_1[0,0]
y = xn_1[1,0]
This works because within the 2D matrix xn_1 there are two single-element arrays, so we need to index a further 0 to get the single element.
Edit: To clarify, xn_1[1, 0] means to index 1 and then index 0 on that sub-array. And although it may seem from the SciPy docs that this should be functionally equivalent to xn_1[1][0], that only applies to the general np.array type and not the np.matrix type. Here is an excellent thread on SO that explains this.
So you should use the xn_1[1, 0] form to get the element you want.
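A small illustration of the difference (my own sketch, independent of the original code):
import numpy as np

m = np.matrix([[1.0], [2.0]])
print(m[1, 0])   # 2.0    -> a scalar
print(m[1][0])   # [[2.]] -> still a 1x1 matrix, because m[1] is itself a matrix

a = np.array([[1.0], [2.0]])
print(a[1][0])   # 2.0    -> plain arrays reduce a dimension at each indexing step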
xn_1 is a numpy matrix, so its elements are accessed with the item() method, not with []s like an array.
So just change
x = xn_1[0]
y = xn_1[1]
to
x = xn_1.item(0)
y = xn_1.item(1)
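For completeness, a minimal sketch of the same iteration using plain arrays and np.linalg.solve instead of np.matrix and an explicit inverse (my own rewrite under those assumptions, not the poster's code):
import numpy as np

def newton_2d(x, y, iterations=5):
    # Newton's method for the 2x2 system from the question
    for _ in range(iterations):
        F = np.array([x**2 + x*y**3 - 9,
                      3*y*x**2 - y**3 - 4])
        J = np.array([[2*x + y**3, 3*x*y**2],
                      [6*x*y,      3*x**2 - 3*y**2]])
        step = np.linalg.solve(J, F)   # avoids forming the inverse explicitly
        x, y = x - step[0], y - step[1]
    return x, y

print(newton_2d(-2.0, -2.5))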

Create random submatrix of large matrix in python

I have the following code to create a random subset (of size examples) of a large set:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    return tmp
The problem is that my input is a large matrix, so input_set.shape = (n, m). However, sampling(input_set) returns a list, while I want it to be a submatrix of shape (examples, m), not a list of length examples containing vectors of size m.
I modified my code to do this:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    sample = input_set[0:examples]
    for i in range(examples):
        sample[i] = tmp[i]
    return sample
This works, but is there a more elegant/better way to accomplish what I am trying to do?
Use numpy as follows to create an n x m matrix (assuming input_set is a list):
import numpy as np
input_matrix = np.array(input_set).reshape(n,m)
OK, if I understand the question correctly, you just want to drop the last (n - k) rows, so:
sample = input_matrix[:k - n]
should do the job for you.
Don't know if you are still interested, but maybe you can do something like this:
# select a random 6x6 matrix with items between -10 and 10
import numpy as np
mat = np.random.randint(-10, 10, (6, 6))
print(mat)
# select a random int between 0 and 5
startIdx = np.random.randint(0, 5)
print(startIdx)
# extract a submatrix (it will be smaller than 3x3 if the index runs out of bounds)
print(mat[startIdx:startIdx+3, startIdx:startIdx+3])
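For the task as originally stated (a random submatrix of shape (examples, m) drawn from the rows), a more direct option is np.random.choice on the row indices; a minimal sketch with made-up sizes:
import numpy as np

n, m, examples = 100, 5, 10          # made-up sizes
input_set = np.random.random((n, m))

row_idx = np.random.choice(n, size=examples, replace=False)  # sample rows without replacement
sample = input_set[row_idx]          # shape (examples, m)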
