Write to multidimensional arrays by index - python

I need to write the processed data to a multidimensional array cache, but I don't know how to do this easily.
A simple example:
import numpy as np
x = np.random.rand(5,2,5,3)
ind = np.array([True,True,False,True,False])
dat = np.random.rand(3,3,3)
What I want to do is:
x[ind,-1][:,ind] = dat
But the boolean index in x[ind,-1] produces a copy, so the second indexing step assigns into that temporary copy and the data is never written into x.
I'm looking for a simple and straightforward way to do this. Thank you.
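To see why the chained assignment has no effect, here is a small check. It is a sketch using the real NumPy function np.shares_memory, which reports whether two arrays overlap in memory; the names view, tmp and before are just illustrative. Basic slicing (integers and slices only) returns a view, while any boolean or integer-array index switches to advanced indexing, which returns a new array.

```python
import numpy as np

x = np.random.rand(5, 2, 5, 3)
ind = np.array([True, True, False, True, False])
dat = np.random.rand(3, 3, 3)

# Basic slicing returns a view that shares memory with x.
view = x[:, -1]
print(np.shares_memory(view, x))   # True

# A boolean mask triggers advanced indexing, which makes a copy.
tmp = x[ind, -1]
print(np.shares_memory(tmp, x))    # False

# The chained assignment therefore only modifies a temporary copy.
before = x.copy()
x[ind, -1][:, ind] = dat
print(np.array_equal(x, before))   # True: x is unchanged
```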

Use np.ix_ -
x[np.ix_(ind,[-1],ind)] = dat[:,None]
Another one, in two lines, again with np.ix_ -
r,c = np.ix_(ind,ind)
x[r,-1,c] = dat
Another one using the integer indices from the mask -
indx = np.flatnonzero(ind)
x[indx[:,None],-1,indx] = dat
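A small check that the three assignments above are interchangeable. This sketch assumes NumPy 1.17 or later for the seeded np.random.default_rng (used only to make the run reproducible), and writes into copies of x so the methods can be compared side by side.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 2, 5, 3))
ind = np.array([True, True, False, True, False])
dat = rng.random((3, 3, 3))

# Method 1: one np.ix_ call; the length-1 axis from [-1] needs dat[:, None].
x1 = x.copy()
x1[np.ix_(ind, [-1], ind)] = dat[:, None]

# Method 2: open mesh over the two masked axes, scalar -1 in the middle.
x2 = x.copy()
r, c = np.ix_(ind, ind)
x2[r, -1, c] = dat

# Method 3: integer positions from the mask, broadcast as column and row.
x3 = x.copy()
indx = np.flatnonzero(ind)
x3[indx[:, None], -1, indx] = dat

# All three write the same values, and reading them back returns dat.
print(np.array_equal(x1, x2) and np.array_equal(x2, x3))  # True
print(np.array_equal(x1[ind, -1][:, ind], dat))           # True
```

Reading with the chained expression is fine; only assigning through it fails, because the copy is discarded.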

Related

Replace elements in Numpy array by value and location

I am working on a program which will create contour data out of numpy arrays, and trying to avoid calls to matplotlib.
I have an array of length L which contains NxN arrays of booleans. I want to convert this into an LxNxN array where, for example, the "True"s in the first inner array get replaced by "red", in the second, by "blue" and so forth.
The following code works as expected:
import numpy as np
def new_layer(N,p):
    return np.random.choice(a=[False,True],size=(N,N),p=[p,1-p])
a = np.array([new_layer(3,0.5),new_layer(3,0.5),new_layer(3,0.5)]).astype('object')
colors = np.array(["red","green","blue"])
for i in range(np.shape(a)[0]):
    b = a[i]
    b[np.where(b==True)] = colors[i]
    a[i] = b
print(a)
But I am wondering if there is a way to accomplish the same using Numpy's built-in tools, e.g., indexing. I am a newcomer to Numpy and I suspect there is a better way to do this but I can't think what it would be. Thank you.
You could use np.copyto:
np.copyto(a, colors[:, None, None], where=a.astype(bool))
Here's one way -
a_bool = a.astype(bool)
a[a_bool] = np.repeat(colors,a_bool.sum((1,2)))
Another one, extending colors to 3D -
a_bool = a.astype(bool)
colors3D = np.broadcast_to(colors[:,None,None],a.shape)
a[a_bool] = colors3D[a_bool]
You can use a combination of boolean indexing and np.indices, using a (cast to bool) as a mask on itself. Then you could replace the for loop with this single line (although I don't think it is necessarily a good idea):
a[a.astype(bool)] = colors[np.indices(a.shape)[0][a.astype(bool)]]
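A sketch that checks the four answers above against the loop from the question. It assumes a seeded np.random.default_rng generator (NumPy 1.17+) only so the run is reproducible; each variant works on its own copy of the same object array.

```python
import numpy as np

rng = np.random.default_rng(42)

def new_layer(N, p):
    # True with probability 1 - p, as in the question.
    return rng.choice([False, True], size=(N, N), p=[p, 1 - p])

base = np.array([new_layer(3, 0.5) for _ in range(3)]).astype(object)
colors = np.array(["red", "green", "blue"])

# Reference result: the loop from the question.
ref = base.copy()
for i in range(ref.shape[0]):
    layer = ref[i]                    # a view into ref
    layer[layer == True] = colors[i]  # object array, so compare with ==

mask = base.astype(bool)

a1 = base.copy()
np.copyto(a1, colors[:, None, None], where=mask)

a2 = base.copy()
a2[mask] = np.repeat(colors, mask.sum((1, 2)))

a3 = base.copy()
a3[mask] = np.broadcast_to(colors[:, None, None], base.shape)[mask]

a4 = base.copy()
a4[mask] = colors[np.indices(base.shape)[0][mask]]

for result in (a1, a2, a3, a4):
    print(np.array_equal(result, ref))  # True for each
```

The np.repeat variant relies on boolean indexing visiting elements in C order, so all selected cells of layer 0 come first, then layer 1, and so on.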
Also, for the new_layer function you could just use np.random.rand(N,N) > p. Since np.random.rand draws uniformly from [0, 1), each entry is True with probability 1 - p, which matches the distribution of your np.random.choice call.
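A minimal sketch of that shorter new_layer, assuming the newer np.random.default_rng generator instead of the legacy np.random.rand (both draw uniformly from [0, 1)). The empirical fraction of True values should land close to 1 - p.

```python
import numpy as np

rng = np.random.default_rng(0)

def new_layer(N, p):
    # Each entry is True with probability 1 - p, matching
    # np.random.choice([False, True], size=(N, N), p=[p, 1 - p]).
    return rng.random((N, N)) > p

layer = new_layer(1000, 0.3)
print(layer.dtype, layer.shape)  # bool (1000, 1000)
print(layer.mean())              # close to 0.7
```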

Defining a matrix with unknown size in python

I want to use a matrix in my Python code but I don't know the exact size of my matrix to define it.
For other matrices, I have used np.zeros(a), where a is known.
What should I do to define a matrix with unknown size?
In this case, one approach is to use a Python list and append to it until it has the desired size, then convert it to a NumPy array.
pseudocode:
matrix = []
while matrix not full:
    matrix.append(elt)
matrix = np.array(matrix)
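A runnable version of that pseudocode, assuming a hypothetical read_rows() data source; here "matrix not full" simply becomes "until the source is exhausted".

```python
import numpy as np

def read_rows():
    # Hypothetical data source: yields equal-length rows until it runs out.
    for k in range(4):
        yield [k, k ** 2, k ** 3]

matrix = []
for row in read_rows():
    matrix.append(row)       # cheap: lists grow in amortized O(1)
matrix = np.array(matrix)    # one conversion at the end
print(matrix.shape)          # (4, 3)
```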
You could write code that tries to modify the np.array and expands it if it encounters an IndexError:
import numpy as np
x = np.random.normal(size=(2,2))
r,c = (5,10)
val = 1.0 # the value to store at x[r,c]
try:
    x[r,c] = val
except IndexError:
    r0,c0 = x.shape
    r_ = r+1-r0 # number of missing rows
    c_ = c+1-c0 # number of missing columns
    if r_ > 0:
        x = np.concatenate([x,np.zeros((r_,x.shape[1]))], axis = 0)
    if c_ > 0:
        x = np.concatenate([x,np.zeros((x.shape[0],c_))], axis = 1)
    x[r,c] = val # retry now that x is large enough
There are problems with this implementation, though. First, every expansion copies the whole array into a new, concatenated one, which can become a bottleneck if you do it many times. Second, the code I provided only works if you're modifying a single element. You could extend it to slices, which would take more effort; or you can go the whole nine yards and create a subclass of np.ndarray that overrides the .__getitem__ and .__setitem__ methods.
Or you could just use a huge matrix, or better yet, see if you can avoid having to work with matrices of unknown size.
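The expand-on-demand idea from the answer above can be wrapped in a reusable function. This is a sketch under stated assumptions: set_growing is a hypothetical helper name, it checks the bounds explicitly instead of catching IndexError, it uses np.pad for the zero padding, and it only handles non-negative indices. It still copies the array on every growth step, so the bottleneck caveat applies.

```python
import numpy as np

def set_growing(x, r, c, val):
    """Set x[r, c] = val, zero-padding x first if (r, c) is out of bounds.

    Returns the (possibly new) array; only non-negative indices are handled.
    """
    extra_rows = max(0, r + 1 - x.shape[0])
    extra_cols = max(0, c + 1 - x.shape[1])
    if extra_rows or extra_cols:
        # np.pad returns a new, larger array (a full copy).
        x = np.pad(x, ((0, extra_rows), (0, extra_cols)), mode="constant")
    x[r, c] = val
    return x

x = np.zeros((2, 2))
x = set_growing(x, 5, 10, 7.0)
print(x.shape, x[5, 10])  # (6, 11) 7.0
```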
If you have a python generator you can use np.fromiter:
def gen():
    yield 1
    yield 2
    yield 3
In [11]: np.fromiter(gen(), dtype='int64')
Out[11]: array([1, 2, 3])
Beware: if you pass an infinite iterator, np.fromiter keeps consuming it until memory runs out, which will most likely crash Python, so it's often a good idea to cap the length with the count argument:
In [21]: from itertools import count # an infinite iterator
In [22]: np.fromiter(count(), dtype='int64', count=3)
Out[22]: array([0, 1, 2])
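np.fromiter fills a 1-D array, so for a "matrix" one sketch is to flatten the rows with itertools.chain.from_iterable and reshape afterwards. This assumes every row has the same, known length; rows() is a hypothetical stand-in for your generator.

```python
import itertools
import numpy as np

def rows():
    # Hypothetical row source; every row must have the same length.
    yield (1, 2, 3)
    yield (4, 5, 6)

n_cols = 3
flat = np.fromiter(itertools.chain.from_iterable(rows()), dtype='int64')
matrix = flat.reshape(-1, n_cols)  # number of rows inferred from the total
print(matrix)
```

If the total number of elements is known, passing it as count lets np.fromiter allocate the output once instead of growing it.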
Best practice is usually to either pre-allocate (if you know the size) or build the array as a list first (using list.append). But lists don't build in 2d very well, which I assume you want since you specified a "matrix."
In that case, I'd suggest pre-allocating an oversize scipy.sparse matrix. These can be defined to have a size much larger than your memory, and lil_matrix or dok_matrix can be built sequentially. Then you can pare it down once you enter all of your data.
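from scipy.sparse import dok_matrix
dummy = dok_matrix((1000000, 1000000)) # as big as you think you might need
s = 0
for i, j, data in generator(): # generator() stands for your own source of (row, col, value) triples
    dummy[i,j] = data
    s = max(s, i + 1, j + 1) # track the size actually used
M = dummy.tocsr()[:s,:s] # slice in CSR form, then convert with .tocoo(), .toarray(), ... if needed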
This way you build your array as a Dictionary of Keys (dictionaries support dynamic assignment much better than ndarrays do), but still have a matrix-like output that can be used (somewhat) efficiently for math, even in a partially built state.
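A self-contained run of the DOK approach, assuming SciPy is installed and using a hypothetical triples() generator in place of generator(). Only stored entries consume memory in the oversized dok_matrix; the used extent is tracked while filling and the result is sliced in CSR format.

```python
import numpy as np
from scipy.sparse import dok_matrix

def triples():
    # Hypothetical stand-in for generator(): yields (row, col, value).
    yield 0, 0, 1.0
    yield 2, 1, 5.0
    yield 1, 3, 2.5

dummy = dok_matrix((1_000_000, 1_000_000))  # oversized on purpose
s = 0
for i, j, value in triples():
    dummy[i, j] = value
    s = max(s, i + 1, j + 1)  # occupied extent so far

M = dummy.tocsr()[:s, :s]     # CSR matrices support slicing
print(M.shape)                # (4, 4)
print(M.toarray())
```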

List of Lists to 2D Array in Python

I have a list of lists in Python that holds a mix of values; some are strings and some are integers.
data = [[0,1,2],["a", "b", "c"]]
I am wondering if there is an easy way to convert a list like that, of any length, to a 2D Array without using NumPy. I am working with System.Array because that's the format required.
I understand that I can create a new instance of an Array and then use for loops to write all data from list to it. I was just curious if there is a nice Pythonic way of doing that.
from System import Array
x = len(data)
y = len(data[0])
arr = Array.CreateInstance(object, x, y)
Then I can loop through my data and set the arr values, right?
arr = Array.CreateInstance(object, x, y)
for i in range(len(data)):
    for j in range(len(data[0])):
        arr.SetValue(data[i][j], i, j)
I want to avoid looping like that if possible. Thank you.
P.S. This is for Excel Interop, where I can set a whole Range in Excel by assigning an Array to it. That's why I want to convert a list to an Array.
One thing I am wondering about: Array is a typed object, so is it possible to set its elements to either a string or an integer? I think I might be constrained to only one type. Right? If so, is there any other data type that I can use?
Does creating it as an Array of object ensure that I can mix str and int inside it?
Also, I thought I could use this:
arr= Array[Array[object]](map(object, data))
but it throws an error. Any ideas?
You can use Array.CreateInstance to create a single-dimensional or multidimensional array. Since Array.CreateInstance takes a Type as its first argument, you can specify any type you want. For example:
from System import Array, String, Int32
# gives you an array of strings
myArrayOfString = Array.CreateInstance(String, 3)
# gives you an array of integers
myArrayOfInteger = Array.CreateInstance(Int32, 3)
# gives you a Python list holding both arrays (not a single 2D System.Array)
myArrayOfStringAndInteger = [myArrayOfString, myArrayOfInteger]
Hope this helps. Also see the MSDN documentation for examples of how to use Array.CreateInstance.

Appending arrays in numpy

I have a loop that reads through a file until the end is reached. On each pass through the loop, I extract a 1D numpy array. I want to append this array as a new row of another, 2D numpy array. That is, I might read in something of the form
x = [1,2,3]
and I want to append it to something of the form
z = [[0,0,0],
[1,1,1]]
I know I can simply do z = numpy.append(z,[x],axis = 0) and achieve my desired result of
z = [[0,0,0],
[1,1,1],
[1,2,3]]
My issue is that on the first pass through the loop I don't have anything to append to yet, because the first array read in is the first row of the 2D array. I don't want to write an if statement to handle the first case because that is ugly. If I were working with lists I could simply do z = [] before the loop and, every time I read in an array, do z.append(x) to achieve my desired result. However, I can find no way of doing a similar procedure in numpy. I can create an empty numpy array, but then I can't append to it in the way I want. Can anyone help? Am I making any sense?
EDIT:
After some more research, I found another workaround that does technically do what I want, although I think I will go with the solution given by @Roger Fan, given that numpy appending is very slow. I'm posting it here just so it's out there.
I can still define z = [] before the loop. Then I append my arrays with `np.append(z, x)`. This will ultimately give me something like
z = [0,0,0,1,1,1,1,2,3]
Then, because all the arrays I read in are of the same size, after the loop I can simply resize with `np.resize(z, (n, m))` and get what I'm after.
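A runnable sketch of that flatten-then-reshape workaround, with a hardcoded list standing in for the arrays read from the file. Two details worth knowing: np.append on the empty list [] produces a float64 array, and each call copies everything accumulated so far. This sketch uses reshape instead of np.resize, because np.resize silently repeats or truncates data when the sizes don't match, whereas reshape raises an error.

```python
import numpy as np

rows_read = [np.array([0, 0, 0]), np.array([1, 1, 1]), np.array([1, 2, 3])]

z = []                    # np.append accepts a plain empty list
for x in rows_read:
    z = np.append(z, x)   # flattens and copies on every call
z = z.reshape(-1, 3)      # every row has 3 values
print(z)
```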
Don't do it. Read the whole file into one array, using for example numpy.genfromtxt().
With this one array, you can then loop over the rows, loop over the columns, and perform other operations using slices.
Alternatively, you can create a regular list, append a lot of arrays to that list, and in the end generate your desired array from the list using either numpy.array(list_of_arrays) or, for more control, numpy.vstack(list_of_arrays).
The idea in this second approach is "delayed array creation": find and organize your data first, and then create the desired array once, already in its final form.
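Both suggestions in a small sketch, assuming a whitespace-separated text file; io.StringIO stands in for the real file so the example is self-contained.

```python
import io
import numpy as np

text = "0 0 0\n1 1 1\n1 2 3\n"  # stands in for the file contents

# Option 1: let NumPy parse the whole file into one 2-D array.
z1 = np.genfromtxt(io.StringIO(text))
print(z1.shape)  # (3, 3)

# Option 2: delayed array creation from a list of 1-D rows.
rows = []
for line in io.StringIO(text):
    rows.append(np.array([float(v) for v in line.split()]))
z2 = np.vstack(rows)  # one allocation, already in final form
print(np.array_equal(z1, z2))  # True
```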
As @heltonbiker mentioned in his answer, something like np.genfromtxt is going to be the best way to do this if it fits your needs. Otherwise, I suggest reading the answers to this question about appending to numpy arrays. Basically, numpy array appending is extremely slow and should be avoided whenever possible. There are two much better (and faster by about 20x) solutions:
If you know the length in advance, you can preallocate your array and assign to it.
length_of_file = 5000
results = np.empty(length_of_file)
with open('myfile.txt', 'r') as f:
    for i, line in enumerate(f):
        results[i] = processing_func(line)
Otherwise, just keep a list of lists or list of arrays and convert it to a numpy array all at once.
results = []
with open('myfile.txt', 'r') as f:
    for line in f:
        results.append(processing_func(line))
results = np.array(results)
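A self-contained version of both patterns, assuming a hypothetical processing_func that parses one number per line and an io.StringIO object in place of myfile.txt. With preallocation, the number of lines must match the allocated length; otherwise the unused slots of np.empty keep arbitrary values.

```python
import io
import numpy as np

def processing_func(line):
    # Hypothetical per-line processing: parse one number per line.
    return float(line)

text = "1.5\n2.0\n3.25\n"

# Preallocate when the number of lines is known up front.
n_lines = 3
pre = np.empty(n_lines)
with io.StringIO(text) as f:
    for i, line in enumerate(f):
        pre[i] = processing_func(line)

# Otherwise collect in a list and convert once at the end.
collected = []
with io.StringIO(text) as f:
    for line in f:
        collected.append(processing_func(line))
collected = np.array(collected)

print(np.array_equal(pre, collected))  # True
```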

Null numpy array to be appended to

I'm writing feature selection code. Basically, I get the output from the featureselection function and concatenate it to the numpy array data:
data=np.zeros([1,4114]) # put feature length here
for i in range(1,N):
    filename=splitpath+str(i)+'.tiff'
    feature=featureselection(filename)
    data=np.vstack((data, feature))
data=data[1:,:] # remove the first zeros row
However, this is not a robust implementation as I need to know feature length (4114) beforehand.
Is there a null (empty) numpy array I can start from, like the [] we have for Python lists?
Appending to a numpy array in a loop is inefficient. There might be some situations where it cannot be avoided, but this doesn't seem to be one of them. If you know the size of the array that you'll end up with, it's best to just pre-allocate the array, something like this:
data = np.zeros([N - 1, 4114]) # one row per file 1..N-1
for i in range(1, N):
    filename = splitpath+str(i)+'.tiff'
    feature = featureselection(filename)
    data[i - 1] = feature # rows start at 0, file numbers at 1
Sometimes you don't know the size of the final array. There are several ways to deal with this case, but the simplest is probably to use a temporary list, something like:
data = []
for i in range(1,N):
    filename = splitpath+str(i)+'.tiff'
    feature = featureselection(filename)
    data.append(feature)
data = np.array(data)
Just for completeness, you can also do data = np.zeros([0, 4114]), but I would recommend against that and suggest one of the methods above.
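For completeness, a sketch of what np.zeros([0, k]) gives you: an array with zero rows but a fixed column count, which np.vstack accepts as a starting point. The feature length and the np.full stand-in for featureselection are hypothetical. Note that it still requires knowing k up front and copies data on every iteration, which is why the answer recommends the other methods.

```python
import numpy as np

n_features = 4                    # hypothetical feature length
data = np.zeros((0, n_features))  # zero rows, fixed column count
print(data.shape)                 # (0, 4)

for k in range(3):
    feature = np.full(n_features, k, dtype=float)  # stand-in for featureselection(...)
    data = np.vstack((data, feature))              # copies data each time

print(data.shape)                 # (3, 4)
```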
If you don't want to assume the size before creating the first array, you can use lazy initialization.
data = None
for i in range(1,N):
    filename=splitpath+str(i)+'.tiff'
    feature=featureselection(filename)
    if data is None:
        data = np.zeros(( 0, feature.size ))
    data = np.vstack((data, feature))
if data is None:
    print('no features')
else:
    print(data.shape)
