I have the following code to create a random subset (of size examples) of a large set:
import random

def sampling(input_set):
    tmp = random.sample(input_set, examples)
    return tmp
The problem is that my input is a large matrix, so input_set.shape = (n, m). However, sampling(input_set) returns a list, while I want it to be a submatrix of shape (examples, m), not a list of examples vectors of length m.
I modified my code to do this:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    sample = input_set[0:examples]
    for i in range(examples):
        sample[i] = tmp[i]
    return sample
This works, but is there a more elegant/better way to accomplish what I am trying to do?
Use numpy as follows to create an n x m matrix (assuming input_set is a list):
import numpy as np
input_matrix = np.array(input_set).reshape(n,m)
OK, if I understand the question correctly, you just want to drop the last (n - k) rows, so:
sample = input_matrix[:k - n]
should do the job for you (k - n is negative, so this keeps just the first k rows).
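For what it's worth, a minimal sketch of a more direct route (my suggestion, assuming input_set is already a NumPy array of shape (n, m) and examples <= n): sample row indices rather than rows, then index once.
import numpy as np

def sampling(input_set):
    # pick `examples` distinct row indices, then slice them out in one step
    idx = np.random.choice(input_set.shape[0], size=examples, replace=False)
    return input_set[idx]  # shape (examples, m)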
Don't know if you're still interested, but maybe you can do something like this:
# select a random 6x6 matrix with items between -10 and 9 (the upper bound is exclusive)
import numpy as np
mat = np.random.randint(-10, 10, (6, 6))
print(mat)

# select a random int between 0 and 4 (randint excludes the upper bound)
startIdx = np.random.randint(0, 5)
print(startIdx)

# extract a submatrix (it will be smaller than 3x3 if the index runs out of bounds)
print(mat[startIdx:startIdx + 3, startIdx:startIdx + 3])
A is a k-dimensional numpy array of floats (k could be pretty big, e.g. up to 10).
I need to implement an update to A by incrementing each of the values (as described below). I'm wondering if there is a numpy-style way that would be fast.
Let L_i be the length of axis i
An update to this array is generated in two steps, as follows:
For each axis of A a corresponding vector G is generated.
For example, corresponding to axis i a vector G_i of length L_i is generated (from data).
Update A at all positions by calculating an increment from the G vectors for each position in A
To do this at any particular position, let p be an array of k indices, corresponding to a position in A. Then A at p is incremented by a value calculated as the product:
Product(G_i[p[i]], for i from 0 to k-1)
A full update to A involves doing this operation for all locations in A (i.e. all possible values of p)
This operation would be very slow doing positions one by one via loops.
Is there a numpy style way to do this that would be fast?
Edit
## This is for three dimensions: the final matrix at position i,j,k holds
## the product c[i]*b[j]*a[k]. For an arbitrary number of dimensions it
## needs a loop within a loop and will be slow.
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8, 9])

ab = []
for bval in b:
    ab.append(bval * a)
ab = np.stack(ab)

abc = []
for cval in c:
    abc.append(cval * ab)
abc = np.stack(abc)
As a function:
def loopfunc(arraylist):
    ndim = len(arraylist)
    m = arraylist[0]
    for i in range(1, ndim):
        ml = []
        for val in arraylist[i]:
            ml.append(val * m)
        m = np.stack(ml)
    return m
This is a wacky problem, but I like it.
If I understand what you need from your example, you can accomplish this with some reshaping trickery and NumPy's usual broadcasting rules. The idea is to reshape each array so it has the right number of dimensions, then just directly multiply.
Here's a function that implements this.
from functools import reduce
import operator
import numpy as np
import scipy.linalg

def wacky_outer_product(*arrays):
    assert len(arrays) >= 2
    assert all(arr.ndim == 1 for arr in arrays)
    ndim = len(arrays)
    # each row of `shapes` has -1 in one slot and 1s elsewhere, e.g.
    # (-1, 1, 1), (1, -1, 1), (1, 1, -1) for three input arrays
    shapes = scipy.linalg.toeplitz((-1,) + (1,) * (ndim - 1))
    reshaped = (arr.reshape(new_shape) for arr, new_shape in zip(arrays, shapes))
    return reduce(operator.mul, reshaped).T
Testing this on your example arrays, we have:
>>> foo = wacky_outer_product(a, b, c)
>>> np.array_equal(foo, abc)
True
Edit
OK, the above function is fun, but the one below is probably much better: no transposing, clearer, and much smaller.
from functools import reduce
import operator
import numpy as np

def wacky_outer_product(*arrays):
    return reduce(operator.mul, np.ix_(*reversed(arrays)))
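A quick check of my own against the loop result from the question, plus what the actual update to A looks like (assuming A has shape (4, 3, 2) to match abc):
>>> np.array_equal(wacky_outer_product(a, b, c), abc)
True
>>> A += wacky_outer_product(a, b, c)  # increment every position of A in place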
How can I transpose a 3D array in a similar fashion to a 2D array, except that the entries at the lowest level are arrays of three instead of scalar values?
This is what I mean:
M = [[[0,0,0], [1,1,1], [2,2,2]],
     [[0,0,0], [0,0,0], [3,3,3]],
     [[0,0,0], [0,0,0], [0,0,0]]]

N = some_operation(M)

N = [[[0,0,0], [0,0,0], [0,0,0]],
     [[1,1,1], [0,0,0], [0,0,0]],
     [[2,2,2], [3,3,3], [0,0,0]]]
I have an example in python code that shows what I mean as well:
import numpy as np
M = np.array([[[0,0,0],[1,1,1],[2,2,2]],
              [[0,0,0],[0,0,0],[3,3,3]],
              [[0,0,0],[0,0,0],[0,0,0]]])
N = np.array([[[0,0,0],[0,0,0],[0,0,0]],
              [[1,1,1],[0,0,0],[0,0,0]],
              [[2,2,2],[3,3,3],[0,0,0]]])
print(M)
print('\n\n')
print(N)
The np.transpose() function doesn't seem to be adaptable for my case.
Thanks in advance.
Simply permute axes with np.transpose -
N = M.transpose(1,0,2)
Or with np.moveaxis -
N = np.moveaxis(M,0,1)
With np.rollaxis -
N = np.rollaxis(M,1,0)
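A quick sanity check (my addition) that all three variants reproduce the N from the question:
>>> np.array_equal(M.transpose(1, 0, 2), N)
True
>>> np.array_equal(np.moveaxis(M, 0, 1), N)
True
>>> np.array_equal(np.rollaxis(M, 1, 0), N)
True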
I must calculate the mean over a specific part of a matrix that was generated with random numbers. My work so far:
import random as rd
import numpy as np
matriz = np.zeros([12, 12])
for i in range(0, 12):
    for j in range(0, 12):
        matriz[i, j] = rd.randint(0, 10)
Your problem is about fitting an algorithm. There's clearly a structure to the "marked" section of your matrix, so your problem is in seeing/identifying this structure in order to fit it.
What I see is a pattern: starting with row 0, you're taking columns 1 to n-2; in row 1 you're taking columns 2 to n-3, and so on. So basically, for each row you're summing the values at the columns in range(rowIndex+1, numColumns-(rowIndex+1)).
There may be some more elegant ways to achieve this, but I think this will work:
import random as rd
import numpy as np

l, w = 12, 12  # matrix has dimensions l=12, w=12
matriz = np.zeros([l, w])
for i in range(l):
    for j in range(w):
        matriz[i, j] = rd.randint(0, 10)

vals = []
for i in range(l // 2):  # iterate through rows 0 to 5
    for j in range(i + 1, w - (i + 1)):
        vals.append(matriz[i, j])

# get the mean:
print('mean is {}'.format(sum(vals) / len(vals)))
Note this probably doesn't work for a non-square matrix.
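For reference, a vectorized sketch of the same idea (my version, assuming the same triangular region as the loop above): build a boolean mask over the row/column grid and take the mean directly.
import numpy as np

rows, cols = np.indices(matriz.shape)
# same region as the loop: in row i, columns i+1 through w-(i+1)-1, for rows 0..l//2-1
region = (rows < l // 2) & (cols > rows) & (cols < w - rows - 1)
print('mean is {}'.format(matriz[region].mean()))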
I have many arrays of different length and what I want to do is to have for those arrays a fixed length, let's say 100 samples. These arrays contain time series and I do not want to lose the shape of those series while reducing the size of the array. What I think I need here is an undersampling algorithm. Is there an easy way to reduce the number of samples in an array doing like an average on some of those values?
Thanks
Here's a little script to do it without numpy. It maintains the shape even if the required length is larger than the length of the array.
from math import floor

def sample(input, count):
    output = []
    sample_size = float(len(input)) / count
    for i in range(count):
        output.append(input[int(floor(i * sample_size))])
    return output
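A quick usage example (mine, not from the answer): downsampling ten elements to four picks roughly evenly spaced entries.
>>> sample(list(range(10)), 4)
[0, 2, 5, 7]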
If you index with generated random indices, you can keep your original array (or only its shape, to reduce memory usage):
import numpy as np

input_data = somearray
shape = input_data.shape
n_samples = 100
inds = np.random.randint(0, shape[0], size=n_samples)
sub_samples = input_data[inds]
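Note that np.random.randint draws with replacement and in no particular order, which can distort a time series. A variant (my tweak) that keeps the samples unique and in temporal order:
inds = np.sort(np.random.choice(shape[0], size=n_samples, replace=False))
sub_samples = input_data[inds]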
Here's a shorter version of Nick Fellingham's answer.
from math import floor

def sample(input, count):
    ss = float(len(input)) / count
    return [input[int(floor(i * ss))] for i in range(count)]
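If you literally want "an average on some of those values" rather than picking single samples, here is a small sketch (my suggestion, assuming the array has at least count elements; np.array_split handles lengths that don't divide evenly):
import numpy as np

def downsample_mean(arr, count):
    # split into `count` nearly equal chunks and average each chunk
    chunks = np.array_split(np.asarray(arr, dtype=float), count)
    return np.array([chunk.mean() for chunk in chunks])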
Python 2.7.3
numpy 1.8.0
Hi all,
I have been using numpy for a few months and I need help with some basic stuff. The code below should work, and the bit I need help with is highlighted (# <<<<<<<):
import numpy as np
rng = np.random.RandomState(12345)
samples = np.array(np.arange(400).reshape(50, 8))
nSamples = samples.shape[0]
FOLDS = 15
foldSize = nSamples / FOLDS
indices = np.arange(nSamples)
rng.shuffle(indices)
slices = [slice(i * foldSize, (i + 1) * foldSize, 1) for i in xrange(FOLDS + 1)]

for i in xrange(len(slices)):
    y = samples[indices[slices[i]]]
    x = np.array([x for x in samples if x not in samples[slices[i]]])  # <<<<<<<
    # do some processing with x and y
Basically, this randomly slices a 2D array row-wise, uses the rest of the array for processing and tests on the sliced bit, then repeats for another slice until everything is done (it's called a cross-validation experiment).
My question is: is there a better way to select all rows of an ndarray except a slice? Am I missing something? What is the advised way to write [x for x in samples if x not in samples[indices][0:3]]?
Thanks in advance.
PS: masked arrays do not solve my problem.
PS1: I know it's already implemented elsewhere, I just need to learn.
You can create a boolean array for the rows to select as follows:
indices_to_ignore = [1, 2, 3]
mask = np.ones(samples.shape[:1], dtype=bool)
mask[indices_to_ignore] = 0
samples[mask].shape
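Plugged into your loop, this looks as follows (my integration sketch, reusing slices, indices and nSamples from the question):
for s in slices:
    mask = np.ones(nSamples, dtype=bool)
    mask[indices[s]] = False
    y = samples[indices[s]]  # the current fold
    x = samples[mask]        # all rows except the current fold
    # do some processing with x and y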