Python 2.7.3
numpy 1.8.0
Hi all,
I have been using numpy for a few months and I need help with some basic stuff. The code below should work, and the bit I need help with is marked (# <<<<<<<):
import numpy as np
rng = np.random.RandomState(12345)
samples = np.array(np.arange(400).reshape(50, 8))
nSamples = samples.shape[0]
FOLDS = 15
foldSize = nSamples / FOLDS
indices = np.arange(nSamples)
rng.shuffle(indices)
slices = [slice(i * foldSize, (i + 1) * foldSize, 1) for i in xrange(FOLDS + 1)]
for i in xrange(len(slices)):
    y = samples[indices[slices[i]]]
    x = np.array([x for x in samples if x not in samples[slices[i]]])  # <<<<<<<
    # do some processing with x and y
Basically, I randomly slice a 2D array row-wise, use the rest of the array for processing and test on the sliced bit, then repeat with another slice until everything is done (this is called a cross-validation experiment).
My question is: is there a better way to select all rows of an ndarray except a slice? Am I missing something? What is the recommended way to write [x for x in samples if x not in samples[indices][0:3]]?
Thanks in advance.
PS: masked arrays do not solve my problem.
PS 2: I know this is already implemented elsewhere; I just need to learn.
You can create a boolean array for the rows to select as follows:
indices_to_ignore = [1, 2, 3]
mask = np.ones(samples.shape[:1], dtype=bool)  # the np.bool alias no longer exists in recent NumPy
mask[indices_to_ignore] = False
samples[mask].shape
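Applied to the cross-validation loop from the question, the mask idea might look like the sketch below. It is written for Python 3 and a recent NumPy. The fold count n_folds = 5 and the use of np.array_split are assumptions made for illustration, not part of the original code.

```python
import numpy as np

rng = np.random.RandomState(12345)
samples = np.arange(400).reshape(50, 8)
n_samples = samples.shape[0]
n_folds = 5  # hypothetical fold count, chosen so 50 rows split evenly

# Shuffle the row indices once, then cut them into folds.
indices = rng.permutation(n_samples)
folds = np.array_split(indices, n_folds)

splits = []
for test_idx in folds:
    mask = np.ones(n_samples, dtype=bool)
    mask[test_idx] = False      # switch off the held-out rows
    y = samples[test_idx]       # the held-out rows
    x = samples[mask]           # every other row
    splits.append((x, y))       # do some processing with x and y here
```

Each iteration builds one boolean mask of length n_samples, so the row-by-row Python comparison from the question is avoided.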
Related
I am trying to create a view x_ of a vector x that is augmented, but still references the same memory location as x. That way, I don't need to run an augment function every time I want the augmented vector, but can simply refer to x_.
Is there a way to re-write this so that the assertion below is true? I am looking to maximize efficiency.
import numpy as np
x = np.arange(10)
ones = np.ones(len(x), dtype=x.dtype)
x_ = np.stack([x, ones], axis=0)
x[0] = 11
assert x_[0, 0] == 11
Note: I have a feeling that this could be impossible or inefficient because it would break contiguous storage. I would appreciate an explanation if this is the case.
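As far as I know, the assertion cannot hold with np.stack, because it allocates a new array and copies x into it. An ndarray describes a single memory buffer, so it cannot be stitched together from two independent arrays. A possible workaround, sketched below under the assumption that you control how x is created, is to allocate the augmented array first and make x a view of its first row.

```python
import numpy as np

# Allocate the augmented array first: row 0 will hold the data, row 1 the ones.
x_ = np.ones((2, 10), dtype=np.int64)

# Basic slicing returns a view, so x shares memory with x_.
x = x_[0]
x[:] = np.arange(10)  # write in place; rebinding with "x = ..." would break the link

x[0] = 11
assert x_[0, 0] == 11
assert np.shares_memory(x, x_)
```

The price is that x must always be updated in place, since assigning a new array to the name x would detach it from x_.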
In my Python script, I load a 1D vector from a .dat file. I want to use that vector as the first column in a matrix, where the second column is filled with 1s. This is how I did it:
x = np.loadtxt( 'x.dat' )
m = x.shape[0]
X = np.concatenate((x.reshape((m,1)), np.ones((m,1))), axis=1)
Is there a more efficient way to achieve this?
It looks perfectly reasonable to me. If you are really squeezing for performance you can try the following:
X = np.ones((m, 2))
X[:,0] = x
or:
X = np.empty((m, 2))
X[:,0] = x
X[:,1] = 1
I wouldn't be surprised if they're slower though.
However, I expect that most of your time is not lost on adding the extra column, but on np.loadtxt. Consider storing your data with np.save and reading it back with np.load (binary .npy format), which is much faster.
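A small sketch to check that the variants above build the same matrix, and to show the np.save / np.load round trip. The short vector and the temporary file path are stand-ins chosen for illustration; the real x would come from x.dat.

```python
import os
import tempfile

import numpy as np

x = np.arange(5, dtype=float)  # stand-in for the vector loaded from x.dat
m = x.shape[0]

# Variant from the question: concatenate two column vectors.
X_concat = np.concatenate((x.reshape((m, 1)), np.ones((m, 1))), axis=1)

# Variant from the answer: allocate once, then fill both columns.
X_empty = np.empty((m, 2))
X_empty[:, 0] = x
X_empty[:, 1] = 1

# Store the vector in binary .npy format and read it back.
path = os.path.join(tempfile.mkdtemp(), "x.npy")
np.save(path, x)
x_loaded = np.load(path)
```

Which variant is faster depends on the array size, so it is worth timing them on your own data before changing anything.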
I have 2 arrays to concatenate:
X_train's shape is (3072, 50000)
y_train's shape is (50000,)
I'd like to concatenate them so I can shuffle the indices all in one go. I have tried the following, but neither works:
np.concatenate([X_train, np.transpose(y_train)])
np.column_stack([X_train, np.transpose(y_train)])
How can I concatenate them?
A recommendation aimed at the underlying task rather than the immediate problem: don't do this!
Assuming X holds your samples / observations and y your targets:
Just generate a random permutation and index both arrays with it. Fancy indexing returns shuffled copies and leaves the originals unmodified, and X and y stay aligned because the same permutation is applied to both, e.g. (untested):
import numpy as np
X = np.random.random(size=(50000, 3072))
y = np.random.random(size=50000)
perm = np.random.permutation(X.shape[0])  # assuming X.shape[0] == y.shape[0]
X_perm = X[perm]  # a shuffled copy (fancy indexing does not return a view)
y_perm = y[perm]
Note: your original shapes are not compatible with most Python-based ML tools, where the usual convention is:
first dim / rows: samples
second dim / cols: features
Since the number of samples must equal the number of target values in y, my example follows this convention, while yours needs X transposed.
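To make the alignment argument concrete, here is a tiny sketch that starts from the question's layout (features x samples), transposes it, and shuffles both arrays with one permutation. The small shapes and the values are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)

# Tiny stand-ins for the real data: 3 features x 4 samples instead of (3072, 50000).
X_train = np.arange(12).reshape(3, 4)
y_train = np.arange(4)          # y_train[j] belongs to column j of X_train

X = X_train.T                   # samples x features; transposing returns a view
perm = rng.permutation(X.shape[0])

X_perm = X[perm]                # shuffled copies; X and y_train are unchanged
y_perm = y_train[perm]

# Feature 0 equals the sample index in this toy data, so this checks the pairing.
still_aligned = np.array_equal(X_perm[:, 0], y_perm)
```

Because the same perm indexes both arrays, row k of X_perm always belongs to y_perm[k], and no concatenation is needed.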
As DavidG said, the issue is that y_train has shape (50000,), so I needed to reshape it before concatenating:
np.concatenate([X_train, np.reshape(y_train, (1, 50000))])
Still, this evaluated very slowly in Jupyter. If there's a faster answer, I'd be grateful to have it.
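The same reshape can be written with np.newaxis, and np.vstack does the promotion to a row by itself. The sketch below uses tiny made-up arrays in place of the real (3072, 50000) data.

```python
import numpy as np

X_train = np.zeros((3, 4))      # tiny stand-in for the (3072, 50000) array
y_train = np.arange(4.0)        # shape (4,)

# Add a leading axis so y_train becomes a (1, 4) row, then append it below X_train.
stacked = np.concatenate([X_train, y_train[np.newaxis, :]])

# np.vstack turns 1-D inputs into rows itself, so it gives the same result.
stacked_v = np.vstack([X_train, y_train])
```

Either call copies all of X_train into a new array, which is probably why it feels slow at the original size.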
My code runs fine for the first iteration, but after that it outputs the following error:
ValueError: matrix must be 2-dimensional
To the best of my knowledge (which is not much in Python), my code is correct, but I don't know why it does not run correctly for all iterations. Could anyone help me with this problem?
from __future__ import division
import numpy as np
import math
import matplotlib.pylab as plt
import sympy as sp
from numpy.linalg import inv
#initial guesses
x = -2
y = -2.5
i1 = 0
while i1<5:
    F= np.matrix([[(x**2)+(x*y**3)-9],[(3*y*x**2)-(y**3)-4]])
    theta = np.sum(F)
    J = np.matrix([[(2*x)+y**3, 3*x*y**2],[6*x*y, (3*x**2)-(3*y**2)]])
    Jinv = inv(J)
    xn = np.array([[x],[y]])
    xn_1 = xn - (Jinv*F)
    x = xn_1[0]
    y = xn_1[1]
    #~ print theta
    print xn
    i1 = i1+1
I believe xn_1 is a 2D matrix. Try printing it and you will see [[something], [something]].
Therefore, to get x and y you need to use multidimensional indexing. Here is what I did:
x = xn_1[0,0]
y = xn_1[1,0]
This works because the 2D matrix xn_1 contains two single-element rows, so we need to index further with 0 to get that single element.
Edit: To clarify, xn_1[1,0] selects row 1 and then column 0 of that row. Although according to Scipy it may seem that this should be functionally equivalent to xn_1[1][0], that only applies to the general np.array type and not to the np.matrix type. Here is an excellent thread on SO that explains this.
So you should use xn_1[1,0] to get the element you want.
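The difference between the two indexing styles can be checked directly. The values below are hypothetical; only the 2x1 shape matches the question.

```python
import numpy as np

xn_1 = np.matrix([[1.5], [-2.0]])   # hypothetical values, 2x1 like in the question

row = xn_1[1]          # still a matrix, with shape (1, 1)
nested = xn_1[1][0]    # indexing a (1, 1) matrix with 0 gives a (1, 1) matrix again
scalar = xn_1[1, 0]    # a NumPy scalar

print(row.shape, nested.shape, np.ndim(scalar))
```

Feeding a (1, 1) matrix back in as x or y is what makes the next F nested too deeply, which would explain the "matrix must be 2-dimensional" error on the second iteration.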
xn_1 is a numpy matrix, so indexing it with a single [] returns another matrix rather than a scalar. Its elements can be read as scalars with the item() method.
So just change
x = xn_1[0]
y = xn_1[1]
to
x = xn_1.item(0)
y = xn_1.item(1)
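An alternative that avoids the problem altogether is to use plain arrays instead of np.matrix. The sketch below is a substitution, not the original method: it is written for Python 3, keeps the equations and initial guesses from the question, and uses np.linalg.solve in place of the explicit inverse. The helper name newton_step is made up for illustration.

```python
import numpy as np

def newton_step(x, y):
    """One Newton update for the two equations from the question."""
    F = np.array([x**2 + x * y**3 - 9,
                  3 * y * x**2 - y**3 - 4])
    J = np.array([[2 * x + y**3, 3 * x * y**2],
                  [6 * x * y, 3 * x**2 - 3 * y**2]])
    # Solve J * delta = F instead of forming the inverse explicitly.
    delta = np.linalg.solve(J, F)
    return x - delta[0], y - delta[1]

x, y = -2.0, -2.5   # initial guesses from the question
for _ in range(5):
    x, y = newton_step(x, y)
    print(x, y)
```

With 1-D arrays, delta[0] and delta[1] are already scalars, so x and y keep the same type from one iteration to the next.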
I have the following code to create a random subset (of size examples) of a large set:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    return tmp
The problem is that my input is a large matrix, so input_set.shape = (n,m). However, sampling(input_set) is a list, while I want it to be a submatrix of size = (examples, m), not a list of length examples of vectors of size m.
I modified my code to do this:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    sample = input_set[0:examples]
    for i in range(examples):
        sample[i] = tmp[i]
    return sample
This works, but is there a more elegant/better way to accomplish what I am trying to do?
Use numpy as follows to create an n x m matrix (assuming input_set is a list):
import numpy as np
input_matrix = np.array(input_set).reshape(n,m)
OK, if I understand the question correctly, you just want to drop the last (n - k) rows, so:
sample = input_matrix[:k - n]
should do the job for you.
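If the goal is a random subset of rows rather than the first k rows, one option is to draw distinct row indices and index the array with them. This is a minimal sketch assuming input_set is a 2D NumPy array; the sizes and the seed are made up. Note also that, for a NumPy array, input_set[0:examples] in the question is a view, so assigning into it overwrites the first rows of input_set.

```python
import numpy as np

rng = np.random.RandomState(0)
input_set = np.arange(40).reshape(10, 4)   # stand-in for the (n, m) matrix
examples = 3                               # number of rows to draw

# Draw distinct row indices, then pick those rows in one indexing step.
rows = rng.choice(input_set.shape[0], size=examples, replace=False)
sample = input_set[rows]                   # shape (examples, m), a copy
```

The result is already a 2D array, so no loop or conversion from a list is needed.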
I don't know if you are still interested, but you could do something like this:
# create a random 6x6 matrix with integer entries in [-10, 10)
import numpy as np
mat = np.random.randint(-10,10,(6,6))
print(mat)
# pick a random start index between 0 and 4 (the upper bound of randint is exclusive)
startIdx = np.random.randint(0,5)
print(startIdx)
# extract the submatrix (it will be smaller than 3x3 if the slice runs past the edge, i.e. startIdx > 3)
print(mat[startIdx:startIdx+3,startIdx:startIdx+3])
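If a full-size block is always wanted, the start indices can be limited so the slice never runs past the edge. This is a sketch with made-up sizes and an assumed variable named size; it also draws the row and column offsets independently, which the snippet above does not do.

```python
import numpy as np

rng = np.random.RandomState(0)
mat = rng.randint(-10, 10, (6, 6))
size = 3  # hypothetical side length of the block to extract

# randint excludes the upper bound, so the largest possible start is 6 - 3 = 3.
row0 = rng.randint(0, mat.shape[0] - size + 1)
col0 = rng.randint(0, mat.shape[1] - size + 1)

sub = mat[row0:row0 + size, col0:col0 + size]  # always 3x3, and a view into mat
```

Because basic slicing returns a view, writing into sub would also change mat; call sub.copy() if an independent block is needed.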