Many functions, such as in1d and setdiff1d, are designed for 1-D arrays. One workaround for applying them to N-dimensional arrays is to make numpy treat each row (or any higher-dimensional sub-array) as a single value.
One approach I found is in this answer to "Get intersecting rows across two 2D numpy arrays" by Joe Kington.
The following code is taken from that answer. Joe Kington's task was to find the rows common to two arrays A and B while trying to use in1d.
import numpy as np
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
nrows, ncols = A.shape
dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
         'formats': ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)
I am hoping you can help me with any of the following three questions. First, I do not understand the mechanism behind this method. Can you explain it to me?
Second, are there other ways to let numpy treat a subarray as one object?
One more open question: does Joe's approach have any drawbacks? I mean, could treating rows as single values cause problems? Sorry, this question is pretty broad.
Let me post what I have learned. The method Joe used relies on structured arrays, which let users define what a single cell/element contains.
Let's look at the first example in the documentation and its description.
x = np.array([(1,2.,'Hello'), (2,3.,"World")],
             dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
Here we have created a one-dimensional array of length 2. Each element of this array is a structure that contains three items, a 32-bit integer, a 32-bit float, and a string of length 10 or less.
Without passing in dtype, however, we will get a 2 by 3 matrix.
With this method, and a properly set dtype, we can let numpy treat a row of a higher-dimensional array as a single element.
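As a quick check of the description above, here is a minimal sketch that builds the same data with and without the dtype. The variable names rows and plain are mine, not from the documentation.

```python
import numpy as np

rows = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

# With a structured dtype, each tuple becomes one element of a 1-D array.
x = np.array(rows, dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

# Without a dtype, numpy builds an ordinary 2-D array instead.
plain = np.array(rows)

print(x.shape)      # (2,)
print(plain.shape)  # (2, 3)
print(x['foo'])     # the integer field of every element
```

Individual fields are reached by name, so x['foo'] is itself a regular integer array.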
Another trick Joe showed is that we don't really need to build a new numpy array to achieve this. We can use the view method (see ndarray.view) to change the way numpy interprets the data. ndarray.view has a Notes section that I think you should read before using this method. I cannot guarantee that there are no side effects. The paragraph below is from that Notes section and seems to call for caution.
For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
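Given that caveat, one cautious way to apply Joe's trick is to force a C-contiguous copy before taking the view. The sketch below is my own wrapper around his code; the helper name rows_as_records is hypothetical, and it assumes a 2-D input whose rows should be compared as whole units.

```python
import numpy as np

def rows_as_records(a):
    """View each row of a 2-D array as one structured element."""
    # Make the memory layout predictable before reinterpreting it.
    a = np.ascontiguousarray(a)
    ncols = a.shape[1]
    dtype = np.dtype({'names': ['f{}'.format(i) for i in range(ncols)],
                      'formats': ncols * [a.dtype]})
    # The view has shape (nrows, 1); flatten it to (nrows,).
    return a.view(dtype).reshape(-1)

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

common = np.intersect1d(rows_as_records(A), rows_as_records(B))
# Convert the structured result back to a plain 2-D array.
common = common.view(A.dtype).reshape(-1, A.shape[1])
print(common)  # rows [1, 4] and [3, 6]
```

The copy made by np.ascontiguousarray costs memory for inputs that are transposed or sliced, but it means the result no longer depends on how the input happened to be stored.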
Other references
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html
I want to initialise an array that will hold some data. I have created a random matrix (using np.empty) and then multiplied it by np.nan. Is there anything wrong with that? Or is there a better practice that I should stick to?
To further explain my situation: I have data I need to store in an array. Say I have 8 rows of data. The number of elements in each row is not equal, so my matrix row length needs to be as long as the longest row. In other rows, some elements will not be filled. I don't want to use zeros since some of my data might actually be zeros.
I realise I could use some value I know my data will never take, but NaN is definitely clearer. I'm just wondering whether that can cause any issues later in processing. I realise I need to use nanmax instead of max and so on.
I have created a random matrix (using np.empty) and then multiplied it by np.nan. Is there anything wrong with that? Or is there a better practice that I should stick to?
You can use np.full, for example:
np.full((100, 100), np.nan)
However depending on your needs you could have a look at numpy.ma for masked arrays or scipy.sparse for sparse matrices. It may or may not be suitable, though. Either way you may need to use different functions from the corresponding module instead of the normal numpy ufuncs.
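To make the two suggestions concrete, here is a small sketch for the ragged-rows case from the question. The sample data and the variable names are made up for illustration.

```python
import numpy as np

# Rows of unequal length; note that 0.0 is a legitimate data value.
rows = [[1.0, 0.0, 3.0], [4.0], [5.0, 6.0]]

width = max(len(r) for r in rows)
padded = np.full((len(rows), width), np.nan)
for i, r in enumerate(rows):
    padded[i, :len(r)] = r

# NaN-aware reductions skip the padding.
row_max = np.nanmax(padded, axis=1)

# A masked array hides the padding from ordinary reductions.
masked = np.ma.masked_invalid(padded)
row_sum = masked.sum(axis=1)

print(row_max)
print(row_sum)
```

np.full creates the NaN-filled array directly, so there is no need to multiply an uninitialised np.empty array by np.nan.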
One way I like to do it, which probably isn't the best but is easy to remember, is to add a 'nans' function to the numpy module like this:
import numpy as np
def nans(n):
    # Build a length-n array in which every element is NaN
    return np.array([np.nan for i in range(n)])
setattr(np, 'nans', nans)
and now you can simply use np.nans as if it were np.zeros:
np.nans(10)
How would I translate the following into Python from Matlab? I'm still trying to wrap my head around lists/matrices and arrays in numpy, etc.
outframe(:,[4:4:nout-1]) = 0.25*inframe(:,[1:n-1]) + 0.75*inframe(:,[2:n])
pos=(beamnum>0)*(beamnum<=nbeams)*(binnum>0)*(binnum<=nbins)*((beamnum-1)*nbins+binnum)
for index =1:512:
outarray(index,:) =uint8(interp1([1:n],inarray64(index,:),[1:.25:n],method))
(There's other stuff, these are just the particular statements I'm not sure how to make sense of. I have numpy imported,
The main workhorse in numpy is the ndarray (or array). It will for the most part replace matlab matrices when you translate code. Like a matlab matrix, the ndarray stores homogeneous data (ie float64) and is optimized for numerical operations.
The numpy matrix is a subclass of the ndarray which can be convenient for some linear algebra intensive applications. Here is more info about the differences between the two.
The python list is more like a matlab cell array (though not exactly the same). It's one of the basic python data structures, but in scientific applications I find that it comes up most often when you need to hold heterogeneous data. (Or when you're doing something very simple and don't want to go to the trouble of creating a numpy array).
Your code above can be converted almost verbatim to Python by using the ndarray, replacing () with [] for indexing, and taking into account that indexing starts at 1 in MATLAB and at 0 in Python,
i.e. the first element in MATLAB is element 1, and in Python it is element 0.
Let's try this line by line:
outframe(:,[4:4:nout-1]) = 0.25*inframe(:,[1:n-1]) + 0.75*inframe(:,[2:n])
would translate into "English" as: all rows of outframe, but only every 4th column, from column 4 up to nout-1 (i.e. 4, 8, ...). I assume you understand what the inframe references mean.
pos=(beamnum>0)*(beamnum<=nbeams)*(binnum>0)*(binnum<=nbins)*((beamnum-1)*nbins+binnum)
Presumably beamnum is a vector and (beamnum>0) returns a vector of {0,1} whose elements are 1 where the respective beamnum element is >0, and 0 otherwise. The rest of it is clear, I hope.
The second last line is a for-loop and the last line should hopefully be clear.
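For reference, the three statements could be written in NumPy roughly as follows. This is only a sketch: the sample sizes and data are invented, nout is assumed to equal the length of 1:.25:n, method is assumed to be 'linear' (np.interp only does linear interpolation), and the rounding before the uint8 conversion is my approximation of MATLAB's uint8.

```python
import numpy as np

n = 4
nout = 4 * (n - 1) + 1  # assumed: number of points in 1:.25:n

# outframe(:,[4:4:nout-1]) = 0.25*inframe(:,[1:n-1]) + 0.75*inframe(:,[2:n])
inframe = np.arange(2 * n, dtype=float).reshape(2, n)
outframe = np.zeros((2, nout))
outframe[:, 3:nout - 1:4] = 0.25 * inframe[:, 0:n - 1] + 0.75 * inframe[:, 1:n]

# pos=(beamnum>0)*(beamnum<=nbeams)*(binnum>0)*(binnum<=nbins)*(...)
nbeams, nbins = 2, 4
beamnum = np.array([0, 1, 2, 3])
binnum = np.array([1, 2, 5, 1])
valid = (beamnum > 0) & (beamnum <= nbeams) & (binnum > 0) & (binnum <= nbins)
pos = valid * ((beamnum - 1) * nbins + binnum)

# for index = 1:512, outarray(index,:) = uint8(interp1(...)), linear case only
inarray64 = np.array([[0.0, 4.0, 8.0, 12.0]])
x_old = np.arange(1, n + 1)
x_new = np.linspace(1, n, nout)
outarray = np.empty((inarray64.shape[0], nout), dtype=np.uint8)
for index in range(inarray64.shape[0]):
    values = np.interp(x_new, x_old, inarray64[index, :])
    # Round and clip before casting; a bare astype would truncate and wrap.
    outarray[index, :] = np.clip(np.rint(values), 0, 255).astype(np.uint8)
```

Note that Python slices exclude their stop value, which is why the MATLAB range 4:4:nout-1 becomes 3:nout-1:4.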
Maybe this is a simple issue, but I could not find any information about it so far.
For an optimization in numpy I need an array of functions. The number of functions I need depends on the current object which shall be optimized.
I have already figured out how to create these functions dynamically, but now I would like to store them in an array like this:
myArray = zeros(x)
for i in range(x):
    myArray[i] = createFunction(i)
If I run this I get a type mismatch:
float() argument must be a string or a number, not 'function'
Creating the array directly works well:
myArray = array([createFunction(0)...])
But because I don't know the number of functions I need, this is exactly what I want to prevent.
Ah, I get it. You really do mean an array of functions.
The type mismatch error arises because the call to zeros creates an array of floats by default. So your original would work if instead you did myArray = numpy.empty(x, dtype=object) (note that empty makes more sense than zeros here). The slightly more pythonic version is to use a list comprehension:
myArray = numpy.array([createFunction(i) for i in range(x)]).
But you might not need to create a numpy array at all, depending on what you want to do with it:
myArray = [createFunction(i) for i in range(x)]
If you want to avoid the list, it might be better to use numpy.fromfunction along with numpy.vectorize:
myArray = numpy.fromfunction(numpy.vectorize(createFunction),
                             shape=(x,), dtype=object)
where (x,) is a tuple giving the shape of the array. The call to vectorize is needed because fromfunction assumes that the function can work on an array of inputs and return an array of scalars, and vectorize converts a function to do exactly that. The dtype=object is needed since otherwise numpy tries to create an array of floats.
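Here is a minimal, runnable sketch of the object-dtype approach. The create_function helper is a hypothetical stand-in for the asker's createFunction, which is not shown in the question.

```python
import numpy as np

def create_function(i):
    """Hypothetical factory: returns a function that adds i to its input."""
    def add_offset(value):
        return value + i
    return add_offset

x = 3

# An object array can hold arbitrary Python objects, including functions.
my_array = np.empty(x, dtype=object)
for i in range(x):
    my_array[i] = create_function(i)

# Each element is an ordinary callable.
results = [f(10) for f in my_array]
print(results)
```

A plain list built with a comprehension behaves the same way here; the array form is mainly useful when you need NumPy-style indexing or reshaping of the functions.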
Maybe you can use
myArray = array([createFunction(i) for i in range(x)])
If you need an array of functions, is it possible not to use NumPy? NumPy arrays have C-style types, and the default is float. If you can, just use a standard Python list. But if you absolutely must use NumPy, try defining the array like so:
import numpy as np
a = np.empty([x], dtype=np.dtype(np.object_))
Or however you need it to be with that dtype.
Numpy arrays are homogeneous. That is all elements of a numpy array are of the same type -- python is duck-typed, numpy isn't. This is part of what makes matrix operations on numpy arrays and matrices so fast. However, because of this a data type must be known when the array is first created. Numpy is generally very good at inferring the data type. The problem comes when creating an empty or zeroed array. Since there are no elements to examine numpy must guess the data type. Numpy defaults to numpy.float64 if it isn't given a data type at array creation time. This is a decent choice as numpy is typically used in scientific or engineering areas where floating point numbers are required. This is also why numpy is complaining -- because it can't store your functions as 64-bit floating point numbers.
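The default and the resulting failure can be reproduced in a few lines. This is an illustrative sketch; the variable names are mine, and the built-in len merely stands in for any function.

```python
import numpy as np

print(np.zeros(3).dtype)         # float64, the default when nothing is given
print(np.array([1.5, 2]).dtype)  # float64, inferred from the elements

float_slots = np.zeros(3)
failed = False
try:
    float_slots[0] = len  # a function cannot be converted to a float
except (TypeError, ValueError):
    failed = True

# With an explicit object dtype the same assignment succeeds.
object_slots = np.zeros(3, dtype=object)
object_slots[0] = len
```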
The quick solution is to let numpy know the data type you want. eg.
myArray = numpy.zeros(x, dtype=object)
Note that the data type cannot be just any class; it must be something numpy can convert to an instance of numpy.dtype (for advanced use you can create additional dtypes at runtime that numpy can then manipulate). Functions are stored with the object dtype (which means any generic Python object). I do not think you will get any performance benefit from using numpy to store arrays of functions. Perhaps you would be better off creating generator functions and chaining them, converting to a numpy array once you know the result will be a number.
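funcs = [createFunction(i) for i in range(x)]
def getItemFromEachFunction(i):
    # fromfunction supplies float indices, so convert before indexing the list
    return funcs[int(i)]()
# vectorize lets the helper accept the index array that fromfunction passes in
arr = numpy.fromfunction(numpy.vectorize(getItemFromEachFunction), (x,))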
I have some data represented in a 1300x1341 matrix. I would like to split this matrix into several pieces (e.g. 9) so that I can loop over and process them. The data needs to stay ordered in the sense that x[0,1] stays below (or above if you like) x[0,0] and beside x[1,1].
It is as if you had imaged the data and drawn 2 vertical and 2 horizontal lines over the image to mark out the 9 parts.
If I use numpy's reshape (e.g. matrix.reshape(9,260,745) or any other combination of 9,260,745) it doesn't yield the required structure, since the above-mentioned ordering is lost...
Did I misunderstand the reshape method or can it be done this way?
What other pythonic/numpy way is there to do this?
Sounds like you need numpy.split() or perhaps its sibling numpy.array_split() (see their documentation). They split an array into sub-arrays without rearranging the numbers the way reshape does; numpy.split requires the array to divide evenly, while numpy.array_split allows sections of unequal size.
I haven't tested this, but something like:
numpy.array_split(numpy.zeros((1300,1341)), 9)
should do the trick along the first axis; to cut the columns as well, call it again on each piece with axis=1.
reshape, to quote its docs,
Gives a new shape to an array without changing its data.
In other words, it does not move the array's data around at all -- it just affects the array's dimension. You, on the other hand, seem to require slicing; again quoting:
It is possible to slice and stride arrays to extract arrays of the same number of dimensions, but of different sizes than the original. The slicing and striding works exactly the same way it does for lists and tuples except that they can be applied to multiple dimensions as well.
So for example thearray[0:260, 0:745] is the "upper leftmost" part, thearray[260:520, 0:745] the "upper left-of-center" part, and so forth. You could keep references to the various parts in a list (or a dict with appropriate keys) to process them separately.
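Combining the two answers, a 3x3 grid of tiles can be sketched by splitting along the rows first and then splitting each band along the columns. The array contents are placeholder data; np.array_split is used because 1300 does not divide evenly by 3.

```python
import numpy as np

data = np.arange(1300 * 1341).reshape(1300, 1341)

# Split into 3 horizontal bands, then each band into 3 vertical pieces.
tiles = [tile
         for band in np.array_split(data, 3, axis=0)
         for tile in np.array_split(band, 3, axis=1)]

for tile in tiles:
    print(tile.shape)
```

Each tile is a view onto the original data, so neighbouring elements keep their relative positions, and writing into a tile changes data as well.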
I have a few functions that return an array of data corresponding to parameters ranges.
Example: for a 2d array a, the a_{ij} value corresponds to the parameter set (param1_i, param2_j). How do I return the result and keep the parameter-value correspondence?
Calling the function for every pair (param1_i, param2_j) and returning one value at a time would take ages (it is far more efficient to do it all in one go)
Break the function into (many) smaller functions and make usage difficult? (the point is to get the values for a range of parameters, 1 value is completely useless)
The best I can come up with is make a new numpy dtype, for example for a 2d array:
tagged2d = np.dtype( [('vals', float), ('params', float, (2,))] )
so that a['vals'][i,j] contains the values and a['params'][i,j] the corresponding parameters.
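To show how such a dtype behaves, here is a short sketch that fills it for a small grid. The parameter values and the product used for vals are placeholders, not part of the original problem.

```python
import numpy as np

tagged2d = np.dtype([('vals', float), ('params', float, (2,))])

param1 = np.array([0.1, 0.2])
param2 = np.array([1.0, 2.0, 3.0])

a = np.zeros((param1.size, param2.size), dtype=tagged2d)
# Field access returns views, so these assignments write into a.
a['params'][..., 0] = param1[:, None]
a['params'][..., 1] = param2[None, :]
a['vals'] = param1[:, None] * param2[None, :]  # placeholder computation

print(a['vals'].shape)    # (2, 3)
print(a['params'].shape)  # (2, 3, 2)
print(a['params'][1, 2])  # the parameter pair behind a['vals'][1, 2]
```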
Any thoughts? Maybe I should just return 2 arrays, one with values, other with parameter tuples?
I recommend your last suggestion... just return two arrays {'values': a, 'params':params}.
There are a few reasons for this.
Primarily, your other solution (using dtype and recarrays) tangles too many things together. For example, what about quantities derived from a that correspond to the same parameters... do you make a new recarray and a new copy of the parameters for that? Even something as simple as 2*a becoming the salient quantity will require that you make difficult decisions.
Recarrays have limitations and this is so easily solved in other ways that it's not worth accepting those limitations.
If you want an easier interrelation between the returned terms, you could put the items in a class. For example, you could have a method that takes a param pair and returns the corresponding result. This way, you wouldn't be limited by the recarray, you could still construct whatever convenience relationship between the two you like, and you could easily make backward-compatible changes to behavior, etc.
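A class along those lines might look like the sketch below. The class name ParameterGrid, its attributes, and the exact-match lookup are all my own assumptions about what such a wrapper could offer.

```python
import numpy as np

class ParameterGrid:
    """Hypothetical container pairing a 2-D result with its parameter axes."""

    def __init__(self, param1, param2, values):
        self.param1 = np.asarray(param1)
        self.param2 = np.asarray(param2)
        self.values = np.asarray(values)
        if self.values.shape != (self.param1.size, self.param2.size):
            raise ValueError("values must have shape (len(param1), len(param2))")

    def value_at(self, p1, p2):
        # Exact-match lookup; raises IndexError if a parameter is absent.
        i = np.flatnonzero(self.param1 == p1)[0]
        j = np.flatnonzero(self.param2 == p2)[0]
        return self.values[i, j]

param1 = np.array([0.1, 0.2])
param2 = np.array([1.0, 2.0, 3.0])
values = np.add.outer(param1, param2)  # placeholder for the real computation

grid = ParameterGrid(param1, param2, values)
print(grid.value_at(0.2, 3.0))
```

Derived quantities such as 2 * grid.values can reuse the same parameter arrays without copying them, which is the flexibility the structured-dtype approach lacks.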