Trying to convert a MATLAB array to a Python array - python

I have this MATLAB code that I need to translate to Python; however, there is an issue with creating a new column in the firings array. In MATLAB, the code builds an n-by-2 matrix that starts out empty, and I want to be able to do the same in Python. Using NumPy, I created fired = np.where(v >= 30). However, Python creates a tuple rather than an array, so it throws an error:
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
This is the MATLAB code that I would like converted to Python:
firings=[];
firings=[firings; t+0*fired, fired];
Help is appreciated! Thanks!

np.where returns a tuple of index arrays, one per input dimension, so for a 1D array the result is a one-element tuple. For the 1D case, you need to access the first element of the result of np.where:
fired = np.where(v >= 30)[0]
You can then go ahead and concatenate the matrices. A suggestion provided by user @Divakar is to use np.flatnonzero, which equivalently finds the indices of the non-zero values in a NumPy array, already flattened into a 1D array, for fewer headaches:
fired = np.flatnonzero(v >= 30)
Take note that the concatenation logic will not work if no matches were found in fired. You will need to take this into account in your concatenating logic. The convenient thing with MATLAB is that you can concatenate empty matrices and they simply have no effect on the result.
Also note that there is no concept of a row vector or a column vector in NumPy; a 1D array is simply a 1D array. If you want to force the array to be a column vector as you have it in MATLAB, you need to introduce a singleton axis in the second dimension. Note that this only works provided that np.where gave you matching results. After that, you can use np.vstack and np.hstack to vertically and horizontally concatenate arrays to do what you ask. First create a blank 2D array, then do what we just covered:
firings = np.empty((0, 2))  # Blank 2D array with 0 rows and 2 columns, so vstack with (n, 2) blocks works

# Some code here...
# ...
# ...

# fired = find(v >= 30); % From MATLAB
fired = np.where(v >= 30)[0]
# or you can use...
# fired = np.flatnonzero(v >= 30)

if np.size(fired) != 0:
    fired = fired[:, None]  # Introduce singleton axis

    # Update firings with two column vectors
    # firings = [firings; t + 0 * fired, fired]; % From MATLAB
    firings = np.vstack([firings, np.hstack([t + 0 * fired, fired])])
Here np.size finds the total number of elements in a NumPy array. If np.where found no matches, the number of elements in fired is 0, so the if statement only executes when at least one element of v satisfies v >= 30.
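For reference, here is a minimal runnable version of the above, with made-up values for v and t standing in for whatever your simulation actually produces:

import numpy as np

v = np.array([10.0, 35.0, 5.0, 42.0])  # hypothetical membrane values
t = 3                                  # hypothetical current time step

firings = np.empty((0, 2))             # like firings = []; in MATLAB

fired = np.flatnonzero(v >= 30)
if np.size(fired) != 0:
    fired = fired[:, None]             # column vector of shape (n, 1)
    firings = np.vstack([firings, np.hstack([t + 0 * fired, fired])])

print(firings)
# [[3. 1.]
#  [3. 3.]]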

If you use numpy, you can define an ndarray and grow it with np.append:

import numpy as np

firings = np.ndarray(shape=(1, 2))  # allocates an uninitialized 1x2 array
firings[0, :] = (1., 2.)            # fill the first row
firings = np.append(firings, [[3., 4.]], axis=0)
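Running this (the first row holds garbage until it is assigned) gives:

print(firings)
# [[1. 2.]
#  [3. 4.]]

Keep in mind that np.append copies the whole array on every call, so growing an array this way is fine for a few rows but slow in a long loop.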

Related

How can I scale (x-axes) and shift data within array in Python?

I have an array of data that represents some signal f(x). Is there a way to perform operations that give me an array of f(ax + b) using only the first array?
For the "+ b" shifting part I use numpy.insert to insert an array of zeros to shift the signal left or right, but I can't figure out how to do f(ax). Please keep in mind that I do not want a*f(x); simple multiplication of the array by a constant is not an option.
Edit: Unfortunately I have no access to the function that generated the first array. I think resampling functions are the ones that will solve the rescaling issue.
Depending on the size of the array there are several solutions; the simplest is to access the array f as f[a*x + b] and check whether that is a valid index. Here is code that creates the shifted array:
import numpy as np

def scale_shift(f, a, b):
    i = np.arange(len(f)) * a + b
    y = f[i[(0 <= i) & (i < len(f))]]
    return y

n = 10
f = np.random.rand(n)
print(scale_shift(f, 2, 1))
Note that the length of the new array will depend on the shift. You can use % if you want to wrap around the boundaries.
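In case it helps, here is a minimal sketch of that wrap-around variant, using % so out-of-range indices fold back into the array instead of being dropped:

import numpy as np

def scale_shift_wrap(f, a, b):
    # Same idea as scale_shift above, but indices wrap around via modulo
    i = (np.arange(len(f)) * a + b) % len(f)
    return f[i]

f = np.arange(10.0)  # toy signal where each value equals its index
print(scale_shift_wrap(f, 2, 1))
# [1. 3. 5. 7. 9. 1. 3. 5. 7. 9.]

Unlike the clipped version, the output always has the same length as the input.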

Filtered Numpy Array Changes Number of Dimensions

I'm having trouble getting used to Numpy arrays (I'm a Matlab user). When I try to select just a range of values from an array, I see the resulting array has an extra dimension:
ioi = np.nonzero((self.data_array[0,:] >= range_start) & (self.data_array[0,:] <= range_end))
print("self.data_array.shape = {0}".format(self.data_array.shape))
print("self.data_array.shape[:,ioi] = {0}".format(self.data_array[:,ioi].shape))
The result is:
self.data_array.shape = (5, 50000)
self.data_array.shape[:,ioi] = (5, 1, 408)
I also see that ioi is a tuple. I don't know if that has anything to do with it.
What is happening here to create that extra dimension and what should I do, in the most direct way, to get an array shape of (5,408) in this case?
The simplest and most efficient thing would be to get rid of the np.nonzero call, and use logical indexing just as one would in Matlab. Here's an example. (I'm using random data of the same shape, FYI.)
>>> data = np.random.randn(5, 5000)
>>> start, end = -0.5, 0.5
>>> ioi = (data[0] > start) & (data[0] < end)
>>> print(ioi.shape)
(5000,)
>>> print(ioi.sum())
1900
>>> print(data[:, ioi].shape)
(5, 1900)
The np.nonzero call is not usually needed. Just like Matlab's find function, it's slow compared with logical indexing, and usually one's goal can be more efficiently accomplished with logical indexing. np.nonzero, just like find, should mostly be used only when you need the actual index values themselves.
As you suspected, the reason for the extra dimensions is that tuples are handled differently from other types of indexing arrays in NumPy. This is to allow more flexible indexing, such as with slices, ellipses, etc. See this useful page for an in-depth explanation, especially the last section.
There are at least two other options to solve the problem. One is to use the ioi tuple, as returned from np.nonzero, directly as your only index into the data array, as in self.data_array[ioi]. Part of why you have an extra dimension is that you actually have two sets of indices in your call: the slice (:) and the tuple ioi. np.nonzero is guaranteed to return a tuple exactly for this reason, so that its output can always be used to directly index the source array.
The last option is to call np.squeeze on the returned array, but I'd opt for one of the above first.
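For concreteness, here is a quick sketch of those fixes on random data shaped like the question's (the match count varies from run to run):

import numpy as np

data = np.random.randn(5, 50000)
ioi = np.nonzero((data[0] >= -0.5) & (data[0] <= 0.5))  # a 1-tuple of index arrays

# Unpack the tuple so the column index is a plain 1D array:
print(data[:, ioi[0]].shape)  # (5, n_matches)

# Or squeeze the singleton axis out of the original result:
print(np.squeeze(data[:, ioi], axis=1).shape)  # (5, n_matches)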

Speeding up fancy indexing with numpy

I have two numpy arrays, each with a shape of (10000, 10000).
One is a value array and the other is an index array.
Value=np.random.rand(10000,10000)
Index=np.random.randint(0,1000,(10000,10000))
I want to make a list (or a 1D numpy array) by summing the entries of the Value array grouped by the Index array. For example, for each index i, find the positions where the Index array equals i and sum the corresponding entries of the Value array:
for i in range(1000):
    NewArray[i] = np.sum(Value[np.where(Index == i)])
However, this is too slow, since I have to run this loop over 300,000 arrays.
I tried to come up with some logical indexing method like
NewArray[Index] += Value[Index]
But it didn't work.
The next thing I tried was using a dictionary:
for k, v in zip(Index.flatten(), Value.flatten()):
    NewDict[k].append(v)
and
for i in NewDict:
    NewDict[i] = np.sum(NewDict[i])
But it was slow too.
Is there any smart way to speed this up?
I had two thoughts. First, try masking; it speeds this up by about 4x:
for i in range(1000):
    NewArray[i] = np.sum(Value[Index == i])
Alternatively, you can sort your arrays to put the values you're adding together in contiguous memory. Masking or using where() has to gather all your values together each time you call sum on the slice. By front-loading this gathering, you might be able to speed things up considerably:
# flatten your arrays
vals = Value.ravel()
inds = Index.ravel()

s = np.argsort(inds)       # the indices that will sort your Index array
v_sorted = vals[s].copy()  # the copy orders the values in memory instead of just providing a view
i_sorted = inds[s].copy()

# 1 greater than your max index; this gives you the end boundary of each run
searches = np.searchsorted(i_sorted, np.arange(0, i_sorted[-1] + 2))

for i in range(len(searches) - 1):
    st = searches[i]
    nd = searches[i + 1]
    NewArray[i] = v_sorted[st:nd].sum()
This method takes 26 seconds on my computer versus 400 using the old way. Good luck. If you want to read more about contiguous memory and performance, check this discussion out.
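Not mentioned above, but worth knowing for this grouped-sum pattern: np.bincount does the whole reduction in one vectorized call. A minimal sketch (array sizes shrunk here so it runs quickly):

import numpy as np

Value = np.random.rand(1000, 1000)
Index = np.random.randint(0, 1000, (1000, 1000))

# Sum Value entries grouped by their Index label (labels 0..999)
NewArray = np.bincount(Index.ravel(), weights=Value.ravel(), minlength=1000)

The scatter-add the question attempted can also be done in place with np.add.at (e.g. np.add.at(out, Index.ravel(), Value.ravel()) on out = np.zeros(1000)), but for a plain grouped sum, np.bincount is usually faster.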

Random array from list of arrays by numpy.random.choice()

I have list of arrays similar to lstB and want to pick random collection of 2D arrays. The problem is that numpy somehow does not treat objects in lists equally:
lstA = [numpy.array(0), numpy.array(1)]
lstB = [numpy.array([0,1]), numpy.array([1,0])]
print(numpy.random.choice(lstA)) # returns 0 or 1
print(numpy.random.choice(lstB)) # returns ValueError: must be 1-dimensional
Is there an elegant fix for this?
Let's call it semi-elegant...
# force 1d object array
swap = lstB[0]
lstB[0] = None
arrB = np.array(lstB)
# reinsert value
arrB[0] = swap
# and clean up
lstB[0] = swap
# draw
numpy.random.choice(arrB)
# array([1, 0])
Explanation: the problem you encountered appears to be that numpy, when converting the input list to an array, will make as deep an array as it can. Since all your list elements are sequences of the same length, this will be 2D. The hack shown here forces it to make a 1D array of object dtype instead by temporarily inserting an incompatible element.
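As an aside, if you do want such a 1D object array, a less hacky construction is to preallocate it and fill it element by element (a small sketch of that idiom):

arrB = np.empty(len(lstB), dtype=object)  # 1D array of Python objects
for k, a in enumerate(lstB):
    arrB[k] = a                           # each slot holds a whole subarray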
However, I personally would not use the object-array route here, because if you draw multiple subarrays this way you'll get a 1D array of arrays, which is probably not what you want and is tedious to convert.
So I'd actually second what one of the comments recommends, i.e. draw ints and then use advanced indexing into np.array(lstB).
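A minimal sketch of that recommendation (the variable names are just for illustration):

import numpy as np

lstB = [np.array([0, 1]), np.array([1, 0])]
arrB = np.array(lstB)                      # shape (2, 2)
idx = np.random.choice(len(lstB), size=3)  # draw row positions, with replacement
print(arrB[idx])                           # three randomly chosen rows as a (3, 2) array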

How to build a numpy array row by row in a for loop?

This is basically what I am trying to do:
array = np.array()  # initialize the array; this is where the error described below is thrown
for i in xrange(?):  # in the full version this loop goes through a file whose length I won't know until I've read it; the point of the question is whether you can build the array without knowing its exact size beforehand
    A = random.randint(0, 10)
    B = random.randint(0, 10)
    C = random.randint(0, 10)
    D = random.randint(0, 10)
    row = [A, B, C, D]
    array[i:] = row  # this is supposed to add a row to the array with A,B,C,D as column values
This code doesn't work. First of all, it complains: TypeError: Required argument 'object' (pos 1) not found. But I don't know the final size of the array.
Second, I know the last line is incorrect, but I am not sure how to express this in python/numpy. So how can I do this?
A numpy array must be created with a fixed size. You can create a small one (e.g., one row) and then append rows one at a time, but that will be inefficient. There is no way to efficiently grow a numpy array gradually to an undetermined size. You need to decide ahead of time what size you want it to be, or accept that your code will be inefficient. Depending on the format of your data, you can possibly use something like numpy.loadtxt or various functions in pandas to read it in.
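If you can put an upper bound on the row count, one way to follow that advice is to preallocate and trim; a small sketch, with a made-up row source standing in for the file:

import numpy as np

rows_in = [[1, 2, 3, 4], [5, 6, 7, 8]]  # hypothetical source of unknown length
max_rows = 1000                         # assumed upper bound
arr = np.empty((max_rows, 4))
n = 0
for row in rows_in:
    arr[n] = row
    n += 1
arr = arr[:n]  # trim the unused rows
print(arr.shape)  # (2, 4)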
Use a list of 1D numpy arrays, or a list of lists, and then convert it to a numpy 2D array (or use more nesting and get more dimensions if you need to).
import numpy as np

a = []
for i in range(5):
    a.append(np.array([1, 2, 3]))  # or a.append([1, 2, 3])
a = np.asarray(a)  # a list of 1D arrays (or lists) becomes a 2D array
print(a.shape)
print(a)
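Running this prints:

(5, 3)
[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]]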
