numpy structured array inconsistency - python

I'm writing a library that uses NumPy arrays and I have a scalar operation I would like to perform on any dtype. This works fine for most structured arrays, however I run into a problem when creating structured arrays with multiple dimensions for structured elements. As an example,
x = np.zeros(10, np.dtype('3float32,int8'))
print(x.dtype)
print(x.shape)
shows
[('f0', '<f4', (3,)), ('f1', 'i1')]
(10,)
but
x = np.zeros(10, np.dtype('3float32'))
print(x.dtype)
print(x.shape)
yields
float32
(10, 3)
that is, creating a structured array with a single multidimensional field appears to instead expand the array shape. This means that the number of dimensions for the last example is 2, not 1 as I was expecting. Is there anything I'm missing here, or a known workaround?

Use the same dtype notation as displayed in the first working example:
In [92]: x = np.zeros(3, np.dtype([('f0','<f4',(3,))]))
In [93]: x
Out[93]:
array([([0., 0., 0.],), ([0., 0., 0.],), ([0., 0., 0.],)],
dtype=[('f0', '<f4', (3,))])
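I don't normally use the string shorthand. A single string with a repeat count produces a subarray dtype with no field name, and np.zeros folds the subarray shape into the array shape: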
In [99]: np.dtype('3float32')
Out[99]: dtype(('<f4', (3,))) # no field name assigned
In [100]: np.zeros(3,_)
Out[100]:
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]], dtype=float32)
A comma-separated string, on the other hand, creates named fields:
In [102]: np.dtype('3float32,i4')
Out[102]: dtype([('f0', '<f4', (3,)), ('f1', '<i4')])
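For a library that has to accept any dtype, one possible workaround is to detect a bare subarray dtype and wrap it in a one-field structured dtype before allocating. This is a minimal sketch; the helper name as_record_dtype and the default field name 'f0' are assumptions made for illustration, not part of NumPy:

```python
import numpy as np

def as_record_dtype(dtype, field_name="f0"):
    """Wrap a bare subarray dtype in a single named field (hypothetical helper)."""
    dtype = np.dtype(dtype)
    if dtype.subdtype is None:
        # Plain or already-structured dtype: nothing to wrap.
        return dtype
    base, shape = dtype.subdtype
    return np.dtype([(field_name, base, shape)])

sub = np.dtype((np.float32, (3,)))      # subarray dtype, like the one shown for '3float32'
x = np.zeros(10, as_record_dtype(sub))
print(x.shape)        # (10,)
print(x["f0"].shape)  # (10, 3)
```

With the wrapper in place the array stays one-dimensional, and the (3,) subarray is reached through the field name.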

Related

Indexing with ndarray in the same way as using tuples

I'd like to index my 2d-array using a 1d-array of size two in the same way a tuple or basic indexing would be used. I have the indices as np.ndarrays for convenience when it comes to manipulations, but currently I'm converting them back and forth to tuples.
a = np.zeros((5, 5))
ix = np.array([3, 2])
>>> a[3, 2]
0.0
>>> a[(3, 2)]
0.0
>>> a[ix]
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
I've tried it with reshaping the array a bunch of ways, e.g. shapes (2, 1) and (1, 2), but no luck. Also couldn't find an entry from the documentation.
Is there a way?
Pass ix as a tuple for indexing, not an array/list, since the latter will specify a selection of rows, rather than a single cell.
So either a[tuple(ix)] or a[(*ix,)] will work.
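A short sketch of the tuple conversion, plus the same idea extended to several index pairs at once. The array contents and the names pairs, single and many are made up for illustration:

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

ix = np.array([3, 2])
single = a[tuple(ix)]            # same as a[3, 2]

# Several (row, col) pairs stored as rows of one array:
pairs = np.array([[3, 2], [0, 4], [1, 1]])
many = a[tuple(pairs.T)]         # same as a[pairs[:, 0], pairs[:, 1]]

print(single, many)
```

Keeping the indices as arrays for manipulation and converting with tuple(...) only at the point of indexing avoids most of the back-and-forth.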

Python Median Filter for 1D numpy array

I have a numpy.array with a dimension dim_array. I'd like to apply a median filter like scipy.signal.medfilt(data, window_len).
This doesn't work with my numpy.array, maybe because its shape is (dim_array, 1) and not (dim_array,).
How can I obtain such a filter?
As a follow-up question, how can I obtain other filters, i.e. min, max, mean?
Based on this post, we could create sliding windows to get a 2D array with each window as a row. These windows are merely views into the data array, so they consume no extra memory and are pretty efficient. Then we simply apply the reduction along each row (axis=1).
Thus, for example, a sliding median could be computed like so -
np.median(strided_app(data, window_len,1),axis=1)
For the other reductions, just swap in the respective function: np.min, np.max or np.mean. Please note this is meant as a generic solution that works with any NumPy reduction accepting an axis argument.
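The strided_app helper is not defined in this answer. As a stand-in, the sketch below uses numpy.lib.stride_tricks.sliding_window_view (available from NumPy 1.20), which likewise returns a view of overlapping windows. The sample data is made up:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

data = np.array([1.0, 5.0, 2.0, 8.0, 3.0])
window_len = 3

# Each row is one window; the result is a read-only view, not a copy.
windows = sliding_window_view(data, window_len)   # shape (3, 3)

sliding_median = np.median(windows, axis=1)
sliding_min = np.min(windows, axis=1)
sliding_max = np.max(windows, axis=1)
sliding_mean = np.mean(windows, axis=1)

print(sliding_median)
```

Note that this yields len(data) - window_len + 1 values, since only complete windows are used and the edges are not padded.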
For the best performance, one should still look at functions built specifically for the purpose. For the four requested filters, there are built-ins:
Median : scipy.signal.medfilt
Max : scipy.ndimage.maximum_filter1d
Min : scipy.ndimage.minimum_filter1d
Mean : scipy.ndimage.uniform_filter1d
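A hedged sketch of how these built-ins might be called on the data from the question. It assumes the input really has shape (dim_array, 1) and flattens it first; the sample values are made up:

```python
import numpy as np
from scipy.signal import medfilt
from scipy.ndimage import maximum_filter1d, minimum_filter1d, uniform_filter1d

column = np.array([[1.0], [5.0], [2.0], [8.0], [3.0]])  # shape (5, 1)
data = column.ravel()                                    # shape (5,)
window_len = 3                                           # medfilt needs an odd size

med = medfilt(data, kernel_size=window_len)
mx = maximum_filter1d(data, size=window_len)
mn = minimum_filter1d(data, size=window_len)
avg = uniform_filter1d(data, size=window_len)

print(med.shape, mx.shape, mn.shape, avg.shape)
```

All four return an array of the same length as the input, but they treat the edges differently: medfilt pads with zeros, while the scipy.ndimage filters default to mode='reflect'.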
Applying a median filter with a window size of 1 leaves the array unchanged, which gives us the freedom to apply the median filter row-wise or column-wise by setting the window size to 1 along the other axis.
For example, this code
from scipy.ndimage import median_filter
import numpy as np
arr = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
median_filter(arr, size=3, cval=0, mode='constant')
# cval=0 with mode='constant' pads the input with zeros wherever the window
# overlaps the edges; chosen here only to make the result easy to check by hand
outputs the expected array, filtered with a (3, 3) window:
array([[0., 2., 0.],
[2., 5., 3.],
[0., 5., 0.]])
because median_filter applies a scalar size to every dimension, so we get the same result with:
median_filter(arr, size=(3, 3), cval=0, mode='constant')
Now, we can also apply median_filter row-wise by setting the first element of size to 1:
median_filter(arr, size=(1, 3), cval=0, mode='constant')
Output:
array([[1., 2., 2.],
[4., 5., 5.],
[7., 8., 8.]])
And column-wise, with the same logic, by setting the second element of size to 1:
median_filter(arr, size=(3, 1), cval=0, mode='constant')
Output:
array([[1., 2., 3.],
[4., 5., 6.],
[4., 5., 6.]])

numpy recarray from CSV dtype has many columns but shape says just one row, why is that?

My CSV has a mix of string and numeric columns. numpy.recfromcsv accurately inferred them (woo-hoo), giving a dtype of
dtype=[('null', 'S7'), ('00', '<f8'), ('nsubj', 'S20'), ('g', 'S1'), ...
So a mix of strings and numbers as you can see. But numpy.shape(csv) gives me
(133433,)
Which confuses me, since dtype implied it was column aware. Furthermore it accesses intuitively:
csv[1]
> ('def', 0.0, 'prep_to', 'g', 'query_w', 'indef', 0.0, ...
I also get the error
cannot perform reduce with flexible type
on operations like .all(), even when used on a numeric column. I'm not sure whether I'm really working with a table-like entity (two dimensions) or just one list of something. Why is the dtype inconsistent with the shape?
A recarray is an array of records. Each record can have multiple fields. A record is sort of like a struct in C.
If the shape of the recarray is (133433,) then the recarray is a 1-dimensional
array of records.
The fields of the recarray may be accessed by name-based
indexing. For example, csv['nsubj'] is essentially equivalent to
np.array([record['nsubj'] for record in csv])
This special name-based indexing supports the illusion that a 1-dimensional recarray is a 2-dimensional array -- csv[intval] selects rows, csv[fieldname] selects "columns". However, under the hood and strictly
speaking if the shape is (133433,) then it is 1-dimensional.
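A small sketch of the row/field distinction on a made-up structured array. The field names label, score and relation are illustrative and are not taken from the CSV in the question:

```python
import numpy as np

table = np.array(
    [(b"abc", 0.0, b"nsubj"), (b"def", 1.5, b"prep_to")],
    dtype=[("label", "S7"), ("score", "<f8"), ("relation", "S20")],
)

print(table.shape)         # (2,): one axis, each element is a whole record
print(table[1])            # one record ("row")
print(table["score"])      # one field ("column") as a plain float array

# Reductions are applied to a single numeric field, not to the records.
print(table["score"].all())
```

Selecting the field first gives an ordinary float array, so reductions such as .all() or .sum() behave as usual.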
Note that not all recarrays are 1-dimensional.
It is possible to have a higher-dimensional recarray:
In [142]: arr = np.zeros((3,2), dtype=[('foo', 'int'), ('bar', 'float')])
In [143]: arr
Out[143]:
array([[(0, 0.0), (0, 0.0)],
[(0, 0.0), (0, 0.0)],
[(0, 0.0), (0, 0.0)]],
dtype=[('foo', '<i8'), ('bar', '<f8')])
In [144]: arr.shape
Out[144]: (3, 2)
This is a 2-dimensional array, whose elements are records.
Here are the bar field values in the arr[:, 0] slice:
In [148]: arr[:, 0]['bar']
Out[148]: array([ 0., 0., 0.])
Here are all the bar field values in the 2D array:
In [151]: arr['bar']
Out[151]:
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
In [160]: arr['bar'].all()
Out[160]: False
Note that an alternative to using recarrays is pandas DataFrames.
There are a lot more methods available for manipulating DataFrames than recarrays. You might find them more convenient.

fill off diagonal of numpy array fails

I'm trying to fill the offset diagonals of a matrix:
loss_matrix = np.zeros((125,125))
np.diagonal(loss_matrix, 3).fill(4)
ValueError: assignment destination is read-only
Two questions:
1) Without iterating over indexes, how can I set the offset diagonals of a numpy array?
2) Why is the result of np.diagonal read only? The documentation for numpy.diagonal reads: "In NumPy 1.10, it will return a read/write view and writing to the returned array will alter your original array."
np.__version__
'1.10.1'
Judging by the discussion on the NumPy issue tracker, it looks like the feature is stuck in limbo and they never got around to fixing the documentation to say it was delayed.
If you need writability, you can force it. This will only work on NumPy 1.9 and up, since np.diagonal makes a copy on lower versions:
diag = np.diagonal(loss_matrix, 3)
# It's not writable. MAKE it writable.
diag.setflags(write=True)
diag.fill(4)
On an older version, np.diagflat constructs a new array from a diagonal:
In [180]: M=np.diagflat(np.ones(125-3)*4,3)
In [181]: M.shape
Out[181]: (125, 125)
In [182]: M.diagonal(3)
Out[182]:
array([ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.,... 4.])
In [183]: np.__version__
Out[183]: '1.8.2'
Effectively it does this (working from its Python code):
res = np.zeros((125, 125))
i = np.arange(122)
fi = i+3+i*125
res.flat[fi] = 4
That is, it finds the indices of the diagonal in the flattened array.
I can also get fi with:
In [205]: i=np.arange(0,122)
In [206]: np.ravel_multi_index((i,i+3),(125,125))
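Another option that avoids both the read-only view and the flat-index arithmetic is to index the offset diagonal directly with a pair of index arrays. A minimal sketch, where the names n, k and rows are chosen for illustration:

```python
import numpy as np

n, k = 125, 3
loss_matrix = np.zeros((n, n))

rows = np.arange(n - k)          # 0 .. 121
loss_matrix[rows, rows + k] = 4  # element (i, i + k) lies on the k-th superdiagonal

print(np.diagonal(loss_matrix, k)[:5])
```

This writes in place into the existing array and does not depend on whether np.diagonal returns a writable view.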

How do I add a column to a python (matrix) multi-dimensional array? [duplicate]

Possible Duplicate:
What's the simplest way to extend a numpy array in 2 dimensions?
I've been frustrated as a Matlab user switching over to Python because I don't know all the tricks and get stuck hacking together code until it works. Below is an example where I have a matrix that I want to add a dummy column to. Surely, there is a simpler way than the zip-vstack-zip method below. It works, but it is totally a noob attempt. Please enlighten me. Thank you in advance for taking the time for this tutorial.
# BEGIN CODE
from pylab import *
# Find that unlike most things in python i must build a dummy matrix to
# add stuff in a for loop.
H = ones((4,10-1))
print "shape(H):"
print shape(H)
print H
### enter for loop to populate dummy matrix with interesting data...
# stuff happens in the for loop, which is awesome and off topic.
### exit for loop
# more awesome stuff happens...
# Now I need a new column on H
H = zip(*vstack((zip(*H),ones(4)))) # THIS SEEMS LIKE THE DUMB WAY TO DO THIS...
print "shape(H):"
print shape(H)
print H
# in conclusion. I found a hack job solution to adding a column
# to a numpy matrix, but I'm not happy with it.
# Could someone educate me on the better way to do this?
# END CODE
Use np.column_stack:
In [12]: import numpy as np
In [13]: H = np.ones((4,10-1))
In [14]: x = np.ones(4)
In [15]: np.column_stack((H,x))
Out[15]:
array([[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
In [16]: np.column_stack((H,x)).shape
Out[16]: (4, 10)
There are several functions that let you concatenate arrays in different dimensions:
np.vstack along axis=0
np.hstack along axis=1
np.dstack along axis=2
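In your case, np.hstack looks like what you want, provided the new column is first reshaped to 2D (for example x[:, np.newaxis]), because np.hstack will not join a 2D array with a 1D one. np.column_stack does that reshaping for you: it treats 1D inputs as columns and stacks 2D inputs as they are.
Of course, nothing prevents you from doing it the hard way: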
>>> new = np.empty((a.shape[0], a.shape[1]+1), dtype=a.dtype)
>>> new.T[:a.shape[1]] = a.T
Here, we created an empty array with an extra column, then set the first columns to a (using the transpose operator T, so that new.T has an extra row compared to a.T). The last column of new is still uninitialized at this point and has to be assigned separately, e.g. new[:, -1] = 1.
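The three routes can be compared side by side. This is a sketch using the 4x9 matrix from the question and a made-up new column x:

```python
import numpy as np

H = np.ones((4, 9))
x = np.zeros(4)              # new column, 1D

# hstack needs matching dimensions, so turn x into a (4, 1) column first.
with_hstack = np.hstack((H, x[:, np.newaxis]))

# column_stack accepts the 1D array directly and treats it as a column.
with_column_stack = np.column_stack((H, x))

# The manual route: allocate, copy the old block, then fill the new column.
manual = np.empty((H.shape[0], H.shape[1] + 1), dtype=H.dtype)
manual[:, :-1] = H
manual[:, -1] = x

print(with_hstack.shape)
```

All three produce the same (4, 10) array; np.column_stack is the shortest when the new column starts out one-dimensional.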
