Null numpy array to be appended to - python

I'm writing feature selection code. Basically, I get the output of the featureselection function for each file and concatenate it onto the numpy array data:
data = np.zeros([1, 4114])  # put feature length here
for i in range(1, N):
    filename = splitpath + str(i) + '.tiff'
    feature = featureselection(filename)
    data = np.vstack((data, feature))
data = data[1:, :]  # remove the first zeros row
However, this is not a robust implementation as I need to know feature length (4114) beforehand.
Is there any null numpy array matrix, like in Python list we have []?

Appending to a numpy array in a loop is inefficient. There might be situations where it cannot be avoided, but this doesn't seem to be one of them. If you know the size of the array that you'll end up with, it's best to just pre-allocate it, something like this:
data = np.zeros([N - 1, 4114])  # range(1, N) yields N - 1 files
for i in range(1, N):
    filename = splitpath + str(i) + '.tiff'
    feature = featureselection(filename)
    data[i - 1] = feature
Sometimes you don't know the size of the final array. There are several ways to deal with this case, but the simplest is probably to use a temporary list, something like:
data = []
for i in range(1, N):
    filename = splitpath + str(i) + '.tiff'
    feature = featureselection(filename)
    data.append(feature)
data = np.array(data)
Just for completeness, you can also do data = np.zeros([0, 4114]), but I would recommend against that and suggest one of the methods above.

If you don't want to assume the size before creating the first array, you can use lazy initialization.
data = None
for i in range(1, N):
    filename = splitpath + str(i) + '.tiff'
    feature = featureselection(filename)
    if data is None:
        data = np.zeros((0, feature.size))
    data = np.vstack((data, feature))
if data is None:
    print('no features')
else:
    print(data.shape)

Related

How to concatenate 2D array with a chunk size

I have the following 2D array and I would like to find a way to generate another 2D array where the rows are concatenated with a given chunk size.
array_2d = [
    [0,0,0,0,0,0,0,1],
    [0,0,0,0,0,0,0,1],
    [0,0,0,0,0,1,1,1],
    [0,0,0,0,0,0,1,1],
    [0,0,0,0,0,1,1,1]
]
For example, with a chunk size of 2 the above 2D array will be changed to:
array_2d = [
    [0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1],
    [0,0,0,0,0,0,1,1,0,0,0,0,0,1,1,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1]
]
Note that the last row of the result has been zero-padded on the left.
Thanks for the help.
import numpy as np

array_2d = [
    [0,0,0,0,0,0,0,1],
    [0,0,0,0,0,0,0,1],
    [0,0,0,0,0,1,1,1],
    [0,0,0,0,0,0,1,1],
    [0,0,0,0,0,1,1,1]
]

def chunkconcat(chunk, init_data):
    data_arr = init_data
    # pad with zero rows until the row count is a multiple of chunk
    while (len(data_arr) % chunk) != 0:
        data_arr.append([0 for _ in range(len(data_arr[0]))])
    # division into smaller chunks of `chunk` rows each
    divide_data = [data_arr[i*chunk:(i+1)*chunk] for i in range(len(data_arr) // chunk)]
    print(divide_data)
    new_arr = []
    # take the rows of each chunk in reversed order and concatenate them
    for part in divide_data:
        tmp = []
        for ind in range(chunk):
            tmp.append(part[-1])
            part.pop()
        new_arr.append(np.concatenate(tmp))
    final_data = np.array(new_arr)
    print(final_data)

chunkconcat(2, array_2d)
I did it explicitly, so you can follow the solution and build your own variations on it.
Basically, you start by padding with rows of zeros until the row count is divisible by the chunk size.
Afterwards you divide the data set into smaller parts of the wanted chunk size. Taking the rows of each chunk in reversed order and collecting them in a temporary list gets the order right before concatenating them; the result is then appended to the final list ('matrix').
Converting it into a numpy array in a final step gives a real matrix.
Using predefined functions and list comprehensions you could solve this in a few lines, as sketched below, but that would not be very instructive.
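For reference, a minimal sketch of that few-line version (chunkconcat_np is a made-up name; it assumes the same pad-then-reverse behaviour as the explicit code above):
import numpy as np

def chunkconcat_np(chunk, data):
    a = np.asarray(data)
    pad = (-len(a)) % chunk  # zero rows needed to reach a multiple of chunk
    a = np.vstack([a, np.zeros((pad, a.shape[1]), dtype=a.dtype)])
    # group rows into chunks, reverse each chunk, flatten each group into one row
    return a.reshape(-1, chunk, a.shape[1])[:, ::-1, :].reshape(len(a) // chunk, -1)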
Take care.

How to preserve the data type with a mixed data type array?

I have an array that is initialized as an empty array. Inside the array, there will be some values that are integers and others that are floats. After a for loop, all the outputs are floats (scientific notation) regardless of whether the previous data was an integer. I need the integers to stay integers.
Here is how the for loop looks:
arr = np.empty(arr_size)
co_point = np.uint8(arr_size[1]/2)
for k in range(arr_size[0]):
    p1_idx = int(k % np.array(first_list).shape[0])
    p2_idx = int((k+1) % np.array(first_list).shape[0])
    arr[k, 0:co_point] = first_list[p1_idx][0:co_point]
    arr[k, co_point:] = first_list[p2_idx][co_point:]
first_list is a list that contains other lists with numbers.
For example, the output of this would be:
>>> arr
array([[1.0e+02, 1.0e-05, 1.0e-02, 3.0e-01, 3.2e+01]])
And the desired output is something like this:
>>> arr
array([[100, 1.0e-05, 0.001, 0.3, 32]])
How can I achieve this?
I did a workaround that does the trick, but I'm sure there are better or more optimal answers. I'll share it here for anyone who has a similar issue.
The original code is this:
arr = np.empty(arr_size)
co_point = np.uint8(arr_size[1]/2)
for k in range(arr_size[0]):
    p1_idx = int(k % np.array(first_list).shape[0])
    p2_idx = int((k+1) % np.array(first_list).shape[0])
    arr[k, 0:co_point] = first_list[p1_idx][0:co_point]
    arr[k, co_point:] = first_list[p2_idx][co_point:]
I added an additional array with the same dimensions as the arr array that holds only the data types of the first_list values.
dtypes = np.empty(arr_size, dtype=type)
for x in range(arr_size[0]):
    for y in range(arr_size[1]):
        dtypes[x, y] = type(first_list[x][y])
Then, when the loop is finished I need to recast the values of arr to the original types of first_list, but because arr forces the data type, I created a new list with an auxiliary list to append all the values.
r_arr = []
for x in range(arr_size[0]):
    aux = []
    for y in range(arr_size[1]):
        aux.append(dtypes[x, y](arr[x, y]))
    r_arr.append(aux)
The trick is that an array with dtype=type stores the type objects themselves, so dtypes[x,y] evaluates to a class that you can call: dtypes[x,y](arr[x,y]) casts the value just as float(1.13) would.
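A minimal, self-contained demonstration of the idea (the names here are made up; an object array holds the type objects, which are later called to recast the values):
import numpy as np

types = np.empty(3, dtype=object)  # object dtype can hold type objects
types[:] = [int, float, int]
vals = np.array([1.0, 2.5, 3.0])   # floats, as the loop above would produce
recast = [t(v) for t, v in zip(types, vals)]
print(recast)  # [1, 2.5, 3]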
Hope it's clear and you find it useful!

Write to multidimensional arrays by index

I need to write the processed data to a multidimensional array cache, but I don't know how to do this easily.
A simple example:
x = np.random.rand(5,2,5,3)
ind = np.array([True,True,False,True,False])
dat = np.random.rand(3,3,3)
The way I want it to be:
x[ind,-1][:,ind] = dat
But the indexing method produces a copy, and the data is not actually written in.
I'm looking for whether there's a simple and straightforward way to do that, thank you.
Use np.ix_ -
x[np.ix_(ind,[-1],ind)] = dat[:,None]
Another one in two lines again with np.ix_ -
r,c = np.ix_(ind,ind)
x[r,-1,c] = dat
Another one using the integer indices off the mask -
indx = np.flatnonzero(ind)
x[indx[:,None],-1,indx] = dat
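A quick sanity check with the question's shapes (a sketch, using random data):
import numpy as np

x = np.random.rand(5, 2, 5, 3)
ind = np.array([True, True, False, True, False])
dat = np.random.rand(3, 3, 3)
x[np.ix_(ind, [-1], ind)] = dat[:, None]  # writes into x in place
assert np.allclose(x[np.ix_(ind, [-1], ind)], dat[:, None])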

Defining a matrix with unknown size in python

I want to use a matrix in my Python code but I don't know the exact size of my matrix to define it.
For other matrices, I have used np.zeros(a), where a is known.
What should I do to define a matrix with unknown size?
In this case, maybe an approach is to use a python list and append to it until it has the desired size, then cast it to a np array.
pseudocode:
matrix = []
while not matrix_full:      # whatever your stopping condition is
    matrix.append(elt)      # elt: the next row to add
matrix = np.array(matrix)
You could write a function that tries to modify the np.array, and expands it if it encounters an IndexError:
import numpy as np

x = np.random.normal(size=(2, 2))
val = 1.0  # the value to write
r, c = (5, 10)
try:
    x[r, c] = val
except IndexError:
    r0, c0 = x.shape
    r_ = r + 1 - r0
    c_ = c + 1 - c0
    if r_ > 0:
        x = np.concatenate([x, np.zeros((r_, x.shape[1]))], axis=0)
    if c_ > 0:
        x = np.concatenate([x, np.zeros((x.shape[0], c_))], axis=1)
    x[r, c] = val  # retry now that x is big enough
There are problems with this implementation, though. First, it copies the array and builds a concatenation of it, which becomes a bottleneck if you do it many times. Second, the code above only works when modifying a single element; you could extend it to slices with more effort, or go the whole nine yards and create a new class inheriting from np.ndarray that overrides the .__getitem__ and .__setitem__ methods.
Or you could just use a huge matrix, or better yet, see if you can avoid having to work with matrices of unknown size.
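As a sketch of that last subclassing idea: a true ndarray subclass is awkward to grow in place, so a small wrapper class with __setitem__ and __getitem__ gives much the same effect (GrowArray is a made-up name):
import numpy as np

class GrowArray:
    """Wrapper (not an ndarray subclass) that grows on out-of-bounds writes."""
    def __init__(self, shape=(1, 1)):
        self.a = np.zeros(shape)

    def __setitem__(self, idx, val):
        r, c = idx
        r0, c0 = self.a.shape
        if r >= r0 or c >= c0:
            new = np.zeros((max(r + 1, r0), max(c + 1, c0)))
            new[:r0, :c0] = self.a  # copy old contents into the bigger array
            self.a = new
        self.a[r, c] = val

    def __getitem__(self, idx):
        return self.a[idx]

g = GrowArray()
g[5, 10] = 3.0
print(g.a.shape)  # (6, 11)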
If you have a python generator you can use np.fromiter:
def gen():
    yield 1
    yield 2
    yield 3
In [11]: np.fromiter(gen(), dtype='int64')
Out[11]: array([1, 2, 3])
Beware: if you pass an infinite iterator, you will most likely crash Python, so it's often a good idea to cap the length (with the count argument):
In [21]: from itertools import count # an infinite iterator
In [22]: np.fromiter(count(), dtype='int64', count=3)
Out[22]: array([0, 1, 2])
Best practice is usually to either pre-allocate (if you know the size) or build the array as a list first (using list.append). But lists don't build in 2d very well, which I assume you want since you specified a "matrix."
In that case, I'd suggest pre-allocating an oversize scipy.sparse matrix. These can be defined to have a size much larger than your memory, and lil_matrix or dok_matrix can be built sequentially. Then you can pare it down once you enter all of your data.
import numpy as np
from scipy.sparse import dok_matrix

dummy = dok_matrix((1000000, 1000000))  # as big as you think you might need
for i, j, data in generator():
    dummy[i, j] = data
s = np.array(list(dummy.keys())).max() + 1
M = dummy.tocsr()[:s, :s]  # or .tobsr(), .toarray(), ...
This way you build your array as a Dictionary of Keys (dictionaries support dynamic assignment much better than ndarray does), but you still have a matrix-like output that can be (somewhat) efficiently used for math, even in a partially built state.

Select cells randomly from NumPy array - without replacement

I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).
I'm transitioning from IDL where I can find a nice way to do this, but I assume that NumPy has a nice way to do this too. What would you suggest?
Update: I should have stated that I'm trying to do this on 2D arrays, and therefore get a set of 2D indices back.
How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?
If you need to change the array in-place then you can create an index array like this:
your_array = <some numpy array>
index_array = numpy.arange(your_array.size)
numpy.random.shuffle(index_array)
print(your_array[index_array[:10]])
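For the 2D case from the update, one way (a sketch) is to shuffle flat indices and convert them back to (row, col) pairs with np.unravel_index:
import numpy as np

arr = np.arange(12).reshape(3, 4)
flat = np.random.permutation(arr.size)          # every cell exactly once
rows, cols = np.unravel_index(flat, arr.shape)  # back to 2D indices
for r, c in zip(rows, cols):
    pass  # process arr[r, c] here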
All of these answers seemed a little convoluted to me.
I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in a random order.
The following code will do this in a simple and straight-forward manner:
#!/usr/bin/python
import numpy as np

# Define a two-dimensional array
# Use any number of dimensions, and dimensions of any size
d = np.zeros(30).reshape((5, 6))

# Get a list of indices for an array of this shape
indices = list(np.ndindex(d.shape))

# Shuffle the indices in-place
np.random.shuffle(indices)

# Access array elements using the indices to do cool stuff
for i in indices:
    d[i] = 5
print(d)
Printing d verifies that all elements have been accessed.
Note that the array can have any number of dimensions and that the dimensions can be of any size.
The only downside to this approach is that if d is large, then indices may become pretty sizable. Therefore, it would be nice to have a generator. Sadly, I can't think of how to build a shuffled iterator offhand.
Extending the nice answer from @WoLpH:
For a 2D array I think it will depend on what you want or need to know about the indices.
You could do something like this:
data = np.arange(25).reshape((5,5))
x, y = np.where( a = a)
idx = zip(x,y)
np.random.shuffle(idx)
OR
data = np.arange(25).reshape((5, 5))
grid = np.indices(data.shape)
idx = list(zip(grid[0].ravel(), grid[1].ravel()))
np.random.shuffle(idx)
You can then use the list idx to iterate over randomly ordered 2D array indices as you wish, and to get the values at that index out of the data which remains unchanged.
Note: You could also generate the randomly ordered indices via itertools.product, in case you are more comfortable with that set of tools; a sketch follows.
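A sketch of that itertools.product route, with the same data as above:
import numpy as np
from itertools import product

data = np.arange(25).reshape(5, 5)
idx = list(product(range(data.shape[0]), range(data.shape[1])))
np.random.shuffle(idx)  # shuffles the (row, col) pairs in place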
Use random.sample to generate ints in 0 .. A.size with no duplicates,
then split them into index pairs:
import random
import numpy as np

def randint2_nodup(nsample, A):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array(random.sample(range(A.size), nsample))  # no-dup ints
    return sample // A.shape[1], sample % A.shape[1]  # pairs

if __name__ == "__main__":
    import sys
    nsample = 8
    ncol = 5
    exec("\n".join(sys.argv[1:]))  # override defaults: python this.py nsample=4 ...
    A = np.arange(0, 2*ncol).reshape((2, ncol))
    r = randint2_nodup(nsample, A)
    print("r:", r)
    print("A[r]:", A[r])
    for jk in zip(*r):
        print(jk, A[jk])
Let's say you have an array of data points of size 8x3
data = np.arange(50,74).reshape(8,-1)
If you truly want to sample, as you say, all the indices as 2d pairs, the most compact way to do this that I can think of is:
# generate a permutation of data's size, coerced to data's shape
idxs = divmod(np.random.permutation(data.size), data.shape[1])
# iterate over it
for x, y in zip(*idxs):
    # do something to data[x, y] here
    pass
More generally, though, one often does not need to access a 2d array as a 2d array simply to shuffle it, in which case one can be yet more compact: just make a 1d view onto the array and save yourself some index-wrangling.
flat_data = data.ravel()
flat_idxs = np.random.permutation(flat_data.size)
for i in flat_idxs:
    # do something to flat_data[i] here
    pass
This will still permute the 2d "original" array as you'd like. To see this, try:
flat_data[12] = 1000000
print(data[4, 0])
# prints 1000000
People using NumPy version 1.7 or later can also use the built-in function numpy.random.choice.
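For instance, passing replace=False samples without replacement; for the 2D case you can sample flat indices and unravel them (a sketch):
import numpy as np

a = np.arange(20).reshape(4, 5)
flat = np.random.choice(a.size, size=a.size, replace=False)  # all cells, no repeats
rows, cols = np.unravel_index(flat, a.shape)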
